CN107203509B - Title generation method and device - Google Patents


Info

Publication number
CN107203509B
Authority
CN
China
Prior art keywords
news
title
strings
word strings
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710262158.XA
Other languages
Chinese (zh)
Other versions
CN107203509A (en
Inventor
王洪俊
肖诗斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tols Information Technology Co ltd
Original Assignee
BEIJING TRS INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING TRS INFORMATION TECHNOLOGY CO LTD
Priority to CN201710262158.XA
Publication of CN107203509A
Application granted
Publication of CN107203509B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/258Heading extraction; Automatic titling; Numbering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a title generation method and device. The title generation method comprises the following steps: acquiring original titles of all news documents in a first news set and splicing the original titles into title text strings, wherein the first news set comprises at least one news document related to the same news event; extracting a high-frequency word string from the title text string, and filtering the extracted high-frequency word string; and determining the word string with highest occurrence frequency in the filtered high-frequency word strings as the title of the first news set. By adopting the technical scheme of the embodiment of the invention, a high-quality short title can be automatically generated for the news document, the semantic effect and the conciseness of the title are ensured, the calculation difficulty of short title generation is reduced, and the method has higher adaptability.

Description

Title generation method and device
Technical Field
The invention relates to the technical field of computers, and more particularly, to a title generation method and apparatus.
Background
News documents typically have long titles, usually 20-30 words, which limits the number of news items that can be displayed on a news web page. To display more news on the page, the title of a news document can be compressed or rewritten so that it becomes shorter without affecting its semantics.
Currently, title compression methods for news documents mainly shorten the title based on set rules or grammar patterns. For example, based on a set rule, a shorter synonym or abbreviation replaces the corresponding word string in the title, or a core sentence or key sentence of the news document is acquired to replace the title. As another example, grammar patterns are learned from a database, and a shorter title is generated based on those patterns.
However, because the coverage of set rules is limited and grammar patterns are confined to the scope of the database, the semantic effect and conciseness of news titles generated from set rules or grammar patterns are difficult to guarantee, and the titles cannot be compressed effectively.
Disclosure of Invention
The embodiment of the invention provides a title generation method and device, which are used for automatically generating high-quality short titles for news documents.
According to an aspect of an embodiment of the present invention, there is provided a title generation method including: acquiring original titles of all news documents in a first news set and splicing the original titles into title text strings, wherein the first news set comprises at least one news document related to the same news event; extracting a high-frequency word string from the title text string, and filtering the extracted high-frequency word string; and determining the word string with highest occurrence frequency in the filtered high-frequency word strings as the title of the first news set.
Optionally, the method further comprises: obtaining the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
Optionally, obtaining the first news set by clustering the second news set includes: calculating content similarity among the news documents in the second news set; determining at least one candidate news set based on the content similarity; and determining the first news set from the at least one candidate news set.
Optionally, acquiring the original titles of the news documents in the first news set and splicing them into the title text string comprises: setting punctuation marks between adjacent original titles in the title text string; and/or replacing corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, the filtering the extracted high-frequency word string includes: filtering word strings which do not appear at the head or tail of the original title from the extracted high-frequency word strings; and/or filtering word strings comprising punctuation marks from the extracted high-frequency word strings; and/or filtering out word strings with word string lengths smaller than a set length threshold value from the extracted high-frequency word strings.
According to another aspect of the embodiment of the present invention, there is also provided a title generation apparatus, including: an acquisition module, configured to acquire the original titles of all news documents in a first news set and splice the original titles into a title text string, wherein the first news set comprises at least one news document related to the same news event; an extraction and filtering module, configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings; and a generating module, configured to determine the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set.
Optionally, the apparatus further comprises: a clustering module, configured to obtain the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
Optionally, the clustering module includes: a calculating unit, configured to calculate content similarity among the news documents in the second news set; and a determining unit, configured to determine at least one candidate news set according to the content similarity and determine the first news set from the at least one candidate news set.
Optionally, the acquisition module includes: a setting unit, configured to set punctuation marks between adjacent original titles in the title text string; and/or a replacing unit, configured to replace corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, the extraction and filtering module comprises a filtering unit, configured to: filter out, from the extracted high-frequency word strings, word strings that do not appear at the head or tail of an original title; and/or filter out word strings that include punctuation marks; and/or filter out word strings whose length is smaller than a set length threshold.
According to the title generation method and device of the embodiments of the invention, the original titles of a plurality of news documents related to the same news event are acquired and spliced into a title text string; high-frequency word strings are extracted from the title text string and filtered to retain those that conform to title characteristics; and the filtered word string with the highest frequency is determined to be the new title. A high-quality short title is thereby generated for each news document, and the semantic effect and conciseness of the title are ensured. Moreover, the computational difficulty of short title generation is reduced, and the method has higher adaptability.
Drawings
Fig. 1 is a flowchart showing steps of a title generation method according to a first embodiment of the present invention;
Fig. 2 is a flowchart showing steps of a title generation method according to a second embodiment of the present invention;
Fig. 3 is a block diagram showing the construction of a title generation apparatus according to a third embodiment of the present invention;
Fig. 4 is a block diagram showing the construction of a title generation apparatus according to a fourth embodiment of the present invention.
Detailed Description
The following description of embodiments of the present invention will be made in further detail with reference to the drawings (like numerals designate like elements throughout the several views) and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present invention are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
Example 1
Referring to fig. 1, a flowchart illustrating steps of a title generation method according to a first embodiment of the present invention is shown.
The title generation method of the present embodiment includes the steps of:
Step S102: acquire the original titles of all news documents in the first news set and splice them into a title text string.
Wherein the first news collection includes at least one news document pertaining to the same news event.
In this embodiment, one or more news documents in the first news collection pertain to the same news event, which may be any news event. One or more news documents in the first news collection each have an original title.
After the first news set is acquired, the original titles of all news documents in the first news set are extracted, and the acquired original titles are spliced into one long text string to form the title text string.
Step S104: extract high-frequency word strings from the title text string, and filter the extracted high-frequency word strings.
The high-frequency word string is a word string with a length exceeding a preset length (for example, the length of two English words or two Chinese characters) in the title text string and the occurrence number exceeding a preset number (for example, twice).
For example, high-frequency word strings are extracted from a title text string spliced from the original titles of the first news set shown in Table 1. After the high-frequency word strings are extracted, a filtering operation is performed on them to filter out word strings whose characteristics do not conform to those of a title. In this embodiment, neither the extraction method nor the filtering rules for high-frequency word strings are limited.
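The definition of a high-frequency word string in Step S104 can be sketched as an n-gram count over the title text string. This is only an illustration, not the patent's prescribed extraction method (which the embodiment leaves open): the function name, the inclusive thresholds `min_len` and `min_count`, and the `max_len` cap are all assumptions, and the input is assumed to be already split into tokens (words, or characters for Chinese text).

```python
from collections import Counter


def extract_high_freq_strings(tokens, min_len=2, min_count=2, max_len=10):
    """Count contiguous token n-grams and keep the frequent ones.

    tokens    -- the title text string, already split into words or characters
    min_len   -- smallest n-gram length kept (assumed reading of "preset length")
    min_count -- smallest occurrence count kept (assumed reading of "preset number")
    max_len   -- assumed cap on n-gram length to bound the amount of counting
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of n tokens across the whole token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


tokens = "storm hits coast . storm hits city . storm hits coast".split()
frequent = extract_high_freq_strings(tokens)
```

In this toy input, n-grams that contain the separator token (such as `(".", "storm")`) also pass the frequency threshold, which is why a later filtering step is needed.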
Step S106: determine the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set.
The filtered high-frequency word strings basically conform to title characteristics, and the one with the highest occurrence frequency is selected as the new title of each news document in the first news set. Using the filtered highest-frequency word string as the title has two benefits: on the one hand, it preserves the semantic effect of the new title, because the string expresses the news event that all news documents in the first news set refer to and satisfies the basic characteristics of a title; on the other hand, it amounts to regenerating a short title for each news document in the first news set, which ensures the conciseness of the title.
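A minimal sketch of the selection in Step S106, assuming the filtered candidates are held in a mapping from word string to occurrence count. The function name `pick_title` and the tie-break (prefer the longer string when counts are equal) are assumptions; the source only states that the highest-frequency string is chosen.

```python
def pick_title(filtered_counts):
    """Return the most frequent candidate, or None if nothing survived filtering.

    filtered_counts -- dict mapping candidate word string -> occurrence count
    Ties on count are broken in favour of the longer string (an assumption).
    """
    if not filtered_counts:
        return None
    return max(filtered_counts, key=lambda s: (filtered_counts[s], len(s)))


title = pick_title({"storm hits coast": 4, "hits coast": 4, "rescue teams": 2})
```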
According to the title generation method provided by the embodiment of the invention, the original titles of a plurality of news documents related to the same news event are acquired to be spliced into the title text strings, then the high-frequency word strings are extracted from the title text strings, the extracted high-frequency word strings are filtered to screen the high-frequency word strings conforming to the title characteristics, and then the filtered highest-frequency word strings are determined to be new titles, so that a high-quality short title is generated for each news document, and the semantic effect and the conciseness of the title are ensured.
Compared with the method for shortening the title based on the setting rule and the grammar mode in the prior art, the title generation method provided by the embodiment of the invention does not need to set a complex short title generation rule, and reduces the calculation difficulty of short title generation; moreover, the titles of all news documents can be acquired for splicing, screened and compressed without considering the setting rules and the coverage range of the database, and high-quality short titles can be automatically generated, so that the method has higher adaptability.
The title generation method of the present embodiment may be executed and implemented by any device having a corresponding data processing capability, including, but not limited to, a server side corresponding to a news web page.
Example 2
Referring to fig. 2, a flowchart illustrating steps of a title generation method according to a second embodiment of the present invention is shown.
The title generation method of the present embodiment includes the steps of:
Step S202: obtain the first news set by clustering a second news set.
Wherein the second news set includes at least the first news set.
In this embodiment, the second news set includes at least one news document related to at least one news event, that is, the second news set may include other news documents related to other news events in addition to the at least one news document related to the same news event in the first news set.
By clustering the second news set, a class of news documents about the same news event is obtained as the first news set. In an alternative embodiment, content similarity among the news documents in the second news set is calculated, at least one candidate news set is determined according to the content similarity, and the first news set is determined from the at least one candidate news set.
Specifically, the content similarity between news documents, for example the cosine similarity (the cosine of the angle between the news document vectors), may be calculated by performing word segmentation and vectorization on each news document in the second news set. If the content similarity between two news documents is greater than a preset similarity threshold (e.g., 0.5), the two news documents may be determined to relate to the same news event. That is, a plurality of news documents whose content similarity is greater than the similarity threshold may be determined to concern the same news event and may further be determined as a candidate news set. One or more candidate news sets may be determined from the second news set, and one of them may be determined to be the first news set.
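The similarity-based grouping of Step S202 might look like the following sketch. Several choices here are assumptions rather than details from the source: whitespace splitting stands in for word segmentation, raw term counts stand in for the document vectors, and grouping is a greedy single pass that compares each document with the first member of each existing group. Only the cosine measure and the 0.5 example threshold come from the text.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.5):
    """Group document indices into candidate news sets.

    A document joins the first group whose first member it resembles
    (similarity > threshold); otherwise it starts a new group.
    """
    vectors = [Counter(doc.lower().split()) for doc in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) > threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "storm hits east coast tonight",
    "storm hits east coast hard",
    "team wins cup final",
]
candidate_sets = cluster_documents(docs)
```

Any of the returned groups can then serve as the first news set.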
Step S204: the original titles of all news documents in the first news set are acquired and spliced into title text strings.
Wherein the first news collection includes at least one news document pertaining to the same news event.
After the first news collection is determined, the original headlines of each news document in the first news collection are extracted to be spliced into a headline text string.
Optionally, when splicing the original titles into the title text string, punctuation marks can be set between adjacent original titles to separate them, which prevents word strings from being formed across the end of one original title and the start of the next. Preferably, the same punctuation mark is set between every pair of adjacent original titles to reduce the amount of calculation. For example, a period is set at the end of each original title. In addition, periods may be used to replace the space characters within each original title.
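The splicing described above can be sketched in a few lines. The function name and the `replace_spaces` flag are assumptions for illustration; the period separator follows the example in the text.

```python
def splice_titles(titles, sep=".", replace_spaces=False):
    """Join original titles into one title text string.

    Each title is terminated with `sep` so that no word string can span two
    titles. If `replace_spaces` is set, whitespace runs inside a title are
    also replaced by `sep`.
    """
    parts = []
    for title in titles:
        text = title.strip()
        if replace_spaces:
            text = sep.join(text.split())
        parts.append(text + sep)
    return "".join(parts)


title_text = splice_titles(["Storm hits coast", "Coast braces for storm"])
```

Replacing spaces is left off by default in this sketch because it suits titles written without word spacing (such as Chinese), whereas for space-delimited languages it would break every multi-word candidate.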
In this embodiment, after the original titles of the news documents in the first news set are extracted, corresponding word strings in each original title are replaced with synonyms (shorter than the word string being replaced) or abbreviations to shorten the word strings, so that when a replaced word string is used as the title, the title length can be further shortened.
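A sketch of the replacement step, assuming a hand-built lookup table from long forms to shorter synonyms or abbreviations. The table contents and the function name are hypothetical, and plain substring replacement is used for brevity, so a real system would need word-boundary handling.

```python
def shorten_title(title, replacements):
    """Replace listed word strings with shorter synonyms or abbreviations.

    replacements -- dict mapping long form -> short form (hypothetical table)
    Entries whose "short" form is not actually shorter are ignored.
    """
    # Apply longer keys first so a long phrase is not partly rewritten
    # by a shorter key that happens to be contained in it.
    for long_form in sorted(replacements, key=len, reverse=True):
        short_form = replacements[long_form]
        if len(short_form) < len(long_form):
            title = title.replace(long_form, short_form)
    return title


table = {"United Nations": "UN", "New York City": "NYC"}
short = shorten_title("United Nations meets in New York City", table)
```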
Step S206: a high frequency word string is extracted from the title text string.
In an alternative implementation, an n-gram statistical method is adopted: word strings whose length is greater than a preset length and whose number of occurrences exceeds a preset number are extracted from the title text string as high-frequency word strings. If the extracted high-frequency word strings include a same-frequency substring, that substring is filtered out. For example, if the word strings "chinese" and "chinese people" both appear 4 times in the title text string, then because "chinese people" contains "chinese", "chinese" is a same-frequency substring of "chinese people", and only "chinese people" is extracted as a high-frequency word string.
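The same-frequency substring rule can be illustrated as follows, assuming the candidates are held as a mapping from word string to occurrence count. The function name is an assumption, and the quadratic comparison is chosen for clarity rather than speed.

```python
def drop_same_freq_substrings(counts):
    """Remove any candidate contained in a longer candidate of equal frequency.

    counts -- dict mapping candidate word string -> occurrence count
    """
    kept = {}
    for s, c in counts.items():
        covered = any(
            other != s and counts[other] == c and s in other
            for other in counts
        )
        if not covered:
            kept[s] = c
    return kept


counts = {"Chinese": 4, "Chinese people": 4, "people": 5}
kept = drop_same_freq_substrings(counts)
```

Here "Chinese" is dropped because "Chinese people" contains it and occurs equally often, while "people" stays because its count differs.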
Step S208: filter out, from the extracted high-frequency word strings, word strings that do not appear at the head or tail of an original title, word strings that include punctuation marks, and word strings whose length is less than a set length threshold.
In this embodiment, word strings which do not appear at the beginning or end of the original title are filtered from the extracted high-frequency word strings; and/or filtering word strings comprising punctuation marks from the extracted high-frequency word strings; and/or filtering out word strings with word string lengths smaller than a set length threshold value from the extracted high-frequency word strings. The word strings which do not appear at the beginning or end of the original title are less likely to become the title, the word strings comprising punctuation marks cannot usually become the title, and word strings with the word string length smaller than the set length threshold are insufficient for clearly expressing news events, so that the word strings are filtered, and the extracted high-frequency word strings can be more in line with the title characteristics.
In other embodiments, one or more of the above three kinds of word strings that do not fit the title characteristics may be filtered out of the extracted high-frequency word strings, and other kinds of word strings that do not fit the title characteristics may also be filtered out.
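The three filtering rules of Step S208 can be sketched together. The function name, the default `min_len` of 4 characters and the punctuation set are assumptions. "Appears at the head or tail of the original title" is read here as "is a prefix or suffix of at least one original title", which is one plausible interpretation.

```python
import string


def filter_candidates(counts, titles, min_len=4,
                      punctuation=string.punctuation + "。，、；：？！"):
    """Keep only candidates that look like titles.

    counts -- dict mapping candidate word string -> occurrence count
    titles -- the original titles the candidates were extracted from
    """
    kept = {}
    for s, c in counts.items():
        # Rule 1: must start or end at least one original title.
        at_edge = any(t.startswith(s) or t.endswith(s) for t in titles)
        # Rule 2: must not contain any punctuation mark.
        has_punct = any(ch in punctuation for ch in s)
        # Rule 3: must reach the length threshold (in characters).
        if at_edge and not has_punct and len(s) >= min_len:
            kept[s] = c
    return kept


titles = ["Storm hits coast", "Storm hits coast again", "Report: storm hits coast"]
counts = {"Storm hits coast": 2, "hits": 3, "coast.Storm": 1, "St": 2, "Report:": 1}
kept = filter_candidates(counts, titles)
```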
Step S210: determine the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set.
The title generation method of this embodiment may be regarded as an alternative specific implementation of the title generation method of the first embodiment; for steps the two have in common, refer to the implementation of the corresponding steps in the first embodiment.
According to the title generation method of this embodiment, news items related to the same news event are aggregated through clustering; the original titles of those news items are then extracted and spliced into a title text string; high-frequency word strings are extracted from the title text string and screened for conformity with title characteristics based on features such as their position and length; and the highest-frequency word string that conforms to the title characteristics is selected as the new title. A high-quality short title is thus generated for each news document, and the semantic effect and conciseness of the title are ensured. Moreover, the high-quality short title is generated automatically, the computational difficulty of short title generation is reduced, and the method has high adaptability.
Example 3
Referring to fig. 3, there is shown a block diagram of a title generation apparatus according to a third embodiment of the present invention.
The title generation apparatus of the present embodiment includes an acquisition module 302, an extraction and filtering module 304, and a generating module 306. The acquisition module 302 is configured to acquire the original titles of the news documents in a first news set and splice the original titles into a title text string, where the first news set includes at least one news document related to the same news event. The extraction and filtering module 304 is configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings. The generating module 306 is configured to determine the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set.
According to the title generation apparatus provided by the embodiment of the invention, the original titles of a plurality of news documents related to the same news event are acquired and spliced into a title text string; high-frequency word strings are extracted from the title text string and filtered to retain those that conform to title characteristics; and the filtered word string with the highest frequency is determined to be the new title. A high-quality short title is thereby generated for each news document, and the semantic effect and conciseness of the title are ensured. Moreover, the computational difficulty of short title generation is reduced, and the apparatus has higher adaptability.
Example 4
Referring to fig. 4, there is shown a block diagram of a title generating apparatus according to a fourth embodiment of the present invention.
The title generation apparatus of this embodiment includes an acquisition module 402, an extraction and filtering module 404, and a generating module 406. The acquisition module 402 is configured to acquire the original titles of the news documents in a first news set and splice the original titles into a title text string, where the first news set includes at least one news document related to the same news event. The extraction and filtering module 404 is configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings. The generating module 406 is configured to determine, as the title of the first news set, the word string with the highest occurrence frequency among the filtered high-frequency word strings.
Optionally, the title generation apparatus of this embodiment further includes a clustering module 408, configured to obtain the first news set by clustering a second news set, where the second news set includes at least the first news set.
Optionally, the clustering module 408 includes a calculating unit 4082 and a determining unit 4084, where the calculating unit 4082 is configured to calculate content similarity among the news documents in the second news set, and the determining unit 4084 is configured to determine at least one candidate news set according to the content similarity and determine the first news set from the at least one candidate news set.
Optionally, the acquisition module 402 includes a setting unit 4022 and/or a replacing unit 4024, where the setting unit 4022 is configured to set punctuation marks between adjacent original titles in the title text string, and the replacing unit 4024 is configured to replace corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, the extraction and filtering module 404 includes an extraction unit 4042 and a filtering unit 4044, where the extraction unit 4042 is configured to extract high-frequency word strings from the title text string. The filtering unit 4044 is configured to filter out, from the extracted high-frequency word strings, word strings that do not appear at the head or tail of an original title; and/or to filter out word strings that include punctuation marks; and/or to filter out word strings whose length is smaller than a set length threshold.
The title generation apparatus of the present embodiment is used to implement the title generation method of the first embodiment or the second embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
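The module structure of the apparatus could be wired together roughly as below. This is a sketch under stated assumptions, not the patented implementation: the class and method names are hypothetical, titles are assumed to be space-delimited, candidates are word n-grams counted inside each separator-delimited segment (so no candidate contains the separator or spans two titles), and ties on frequency are broken by length. The clustering module and replacing unit are omitted for brevity.

```python
from collections import Counter


class AcquisitionModule:
    """Splices original titles into one title text string (cf. module 402)."""

    def __init__(self, sep="."):
        self.sep = sep

    def run(self, titles):
        return "".join(t.strip() + self.sep for t in titles)


class ExtractionFilteringModule:
    """Extracts frequent word n-grams and filters them (cf. module 404)."""

    def __init__(self, sep=".", min_words=2, min_count=2):
        self.sep = sep
        self.min_words = min_words
        self.min_count = min_count

    def run(self, title_text, titles):
        counts = Counter()
        # Count n-grams per segment so candidates never cross a separator.
        for segment in title_text.split(self.sep):
            words = segment.split()
            for n in range(self.min_words, len(words) + 1):
                for i in range(len(words) - n + 1):
                    counts[" ".join(words[i:i + n])] += 1
        # Keep frequent candidates that start or end some original title.
        return {
            s: c for s, c in counts.items()
            if c >= self.min_count
            and any(t.startswith(s) or t.endswith(s) for t in titles)
        }


class GeneratingModule:
    """Picks the most frequent surviving candidate (cf. module 406)."""

    def run(self, counts):
        if not counts:
            return None
        # Tie-break on length is an assumption, not stated in the source.
        return max(counts, key=lambda s: (counts[s], len(s)))


class TitleGenerator:
    """Chains the three modules in the order the embodiment describes."""

    def __init__(self):
        self.acquire = AcquisitionModule()
        self.extract_filter = ExtractionFilteringModule()
        self.generate_title = GeneratingModule()

    def generate(self, titles):
        text = self.acquire.run(titles)
        candidates = self.extract_filter.run(text, titles)
        return self.generate_title.run(candidates)


titles = [
    "storm hits coast overnight",
    "storm hits coast again",
    "rescuers rush as storm hits coast",
]
short_title = TitleGenerator().generate(titles)
```

Keeping each module behind a single `run` method mirrors the note in the text that components may be split or combined as implementation needs dictate.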
It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present invention may be split into more components/steps, or two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the objects of the embodiments of the present invention.
The methods according to embodiments of the present invention described above may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a memory component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the processing methods described herein. Further, when a general-purpose computer accesses code for implementing the processes shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing the processes shown herein.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present invention.
The above embodiments are only for illustrating the embodiments of the present invention, but not for limiting the embodiments of the present invention, and various changes and modifications may be made by one skilled in the relevant art without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of the embodiments of the present invention should be defined by the claims.

Claims (2)

1. A title generation method, comprising: acquiring original titles of all news documents in a first news set and splicing the original titles into a title text string, wherein the first news set comprises at least one news document related to the same news event;
extracting high-frequency word strings from the title text string, and filtering the extracted high-frequency word strings; determining the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set;
the method further comprising: obtaining the first news set by clustering a second news set, wherein the second news set at least comprises the first news set;
wherein obtaining the first news set by clustering the second news set comprises:
calculating content similarity among the news documents in the second news set;
determining at least one candidate news set according to the content similarity, and determining the first news set from the at least one candidate news set;
wherein acquiring the original titles of all news documents in the first news set and splicing the original titles into the title text string comprises:
setting a punctuation mark between adjacent original titles in the title text string; and/or
replacing corresponding word strings in the original titles with synonyms or short words;
wherein filtering the extracted high-frequency word strings comprises:
filtering out, from the extracted high-frequency word strings, word strings that do not appear at the head or tail of an original title; and/or
filtering out, from the extracted high-frequency word strings, word strings that contain punctuation marks; and/or
filtering out, from the extracted high-frequency word strings, word strings whose length is smaller than a set length threshold.
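The steps of claim 1 (splicing, extraction, filtering, selection) can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how word strings are delimited, so the sketch simply counts character substrings. The function names, the separator character, the thresholds `min_count`, `max_len`, and `min_len`, and the rule of breaking frequency ties in favor of the longer string are all assumptions made for illustration.

```python
from collections import Counter

SEPARATOR = "。"  # assumed punctuation mark placed between adjacent titles
PUNCTUATION = set("。，、；：！？,.;:!?")  # assumed punctuation set; spaces are allowed


def build_title_text(titles, replacements=None):
    """Apply optional synonym/short-word replacements, then splice the titles."""
    replacements = replacements or {}
    normalized = []
    for title in titles:
        for old, new in replacements.items():
            title = title.replace(old, new)
        normalized.append(title)
    return SEPARATOR.join(normalized), normalized


def extract_frequent_strings(text, min_count=2, max_len=30):
    """Count every substring of up to max_len characters; keep the repeated ones."""
    counts = Counter(
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(i + max_len, len(text)) + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_count}


def generate_title(titles, replacements=None, min_len=4):
    """Return the most frequent surviving string, or None if nothing survives."""
    text, normalized = build_title_text(titles, replacements)
    candidates = extract_frequent_strings(text)
    kept = {
        s: c
        for s, c in candidates.items()
        if len(s) >= min_len                                   # length threshold
        and not any(ch in PUNCTUATION for ch in s)             # no punctuation inside
        and any(t.startswith(s) or t.endswith(s) for t in normalized)  # head or tail
    }
    if not kept:
        return None
    # Highest frequency wins; preferring the longer string on ties is an assumption.
    return max(kept, key=lambda s: (kept[s], len(s)))


titles = [
    "Apple launches new phone today",
    "Apple launches new phone in Beijing",
    "Breaking: Apple launches new phone",
]
print(generate_title(titles))
```

Because the separator is itself a punctuation mark, the punctuation filter also discards any candidate that would span two adjacent titles, which is one plausible reason for inserting it during splicing.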
2. A title generation apparatus, comprising:
an acquisition module, configured to acquire original titles of all news documents in a first news set and splice the original titles into a title text string, wherein the first news set comprises at least one news document related to the same news event;
an extraction and filtering module, configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings;
a generation module, configured to determine the word string with the highest occurrence frequency among the filtered high-frequency word strings as the title of the first news set;
the apparatus further comprising:
a clustering module, configured to obtain the first news set by clustering a second news set, wherein the second news set at least comprises the first news set;
wherein the clustering module comprises:
a calculating unit, configured to calculate content similarity among the news documents in the second news set;
a determining unit, configured to determine at least one candidate news set according to the content similarity, and determine the first news set from the at least one candidate news set;
wherein the acquisition module comprises:
a setting unit, configured to set a punctuation mark between adjacent original titles in the title text string; and/or
a replacing unit, configured to replace corresponding word strings in the original titles with synonyms or short words;
wherein the extraction and filtering module comprises a filtering unit configured to:
filter out, from the extracted high-frequency word strings, word strings that do not appear at the head or tail of an original title; and/or
filter out, from the extracted high-frequency word strings, word strings that contain punctuation marks; and/or
filter out, from the extracted high-frequency word strings, word strings whose length is smaller than a set length threshold.
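The clustering step (calculating content similarity, forming candidate news sets, and choosing the first news set) can be sketched as follows. The claims do not fix a similarity measure or a selection rule, so this sketch assumes bag-of-words cosine similarity, a greedy single-pass grouping against each cluster's first member, a hypothetical `threshold` of 0.5, and selection of the largest candidate set. All function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_news(documents, threshold=0.5):
    """Group documents into candidate news sets (lists of document indices)."""
    vectors = [Counter(doc.lower().split()) for doc in documents]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            # Compare against the first member of the cluster (an assumed shortcut).
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


def pick_first_news_set(clusters):
    """Assumed selection rule: take the largest candidate news set."""
    return max(clusters, key=len)


second_news_set = [
    "apple launches new phone in beijing",
    "apple launches new phone today",
    "heavy rain floods southern city",
    "new phone launched by apple in beijing",
]
candidates = cluster_news(second_news_set)
first_news_set = pick_first_news_set(candidates)
print(candidates, first_news_set)
```

Whitespace tokenization is used only to keep the example short; Chinese news text would need a word segmenter or character n-grams instead.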
CN201710262158.XA 2017-04-20 2017-04-20 Title generation method and device Active CN107203509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710262158.XA CN107203509B (en) 2017-04-20 2017-04-20 Title generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710262158.XA CN107203509B (en) 2017-04-20 2017-04-20 Title generation method and device

Publications (2)

Publication Number Publication Date
CN107203509A CN107203509A (en) 2017-09-26
CN107203509B CN107203509B (en) 2023-06-20

Family

ID=59904977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710262158.XA Active CN107203509B (en) 2017-04-20 2017-04-20 Title generation method and device

Country Status (1)

Country Link
CN (1) CN107203509B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509417B (en) * 2018-03-20 2022-03-15 腾讯科技(深圳)有限公司 Title generation method and device, storage medium and server
CN110895586B (en) * 2018-08-22 2023-07-14 深圳市雅阅科技有限公司 Method, device, computer equipment and storage medium for generating news page
CN112446207A (en) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 Title generation method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000311167A (en) * 1999-04-28 2000-11-07 Sharp Corp Device and method for document processing and storage medium used for same
CN1472675A (en) * 2002-07-29 2004-02-04 明日工作室股份有限公司 Title generation testing method and system thereof
CN1955952A (en) * 2005-10-25 2007-05-02 国际商业机器公司 System and method for automatically extracting by-line information
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN101751455A (en) * 2009-12-31 2010-06-23 浙江大学 Method for automatically generating title by adopting artificial intelligence technology
CN105354333A (en) * 2015-12-07 2016-02-24 天云融创数据科技(北京)有限公司 Topic extraction method based on news text
CN105765566A (en) * 2013-06-27 2016-07-13 谷歌公司 Automatic generation of headlines

Also Published As

Publication number Publication date
CN107203509A (en) 2017-09-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 101, 1st to 7th floors, Building 3, Yard 6, Jianfeng Road (South Extension), Haidian District, Beijing, 100070

Patentee after: TOLS INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 14b04, 14th floor, Jinqiu international building, 6 Zhichun Road, Haidian District, Beijing 100088

Patentee before: BEIJING TRS INFORMATION TECHNOLOGY Co.,Ltd.
