CN107203509A - Title generation method and device - Google Patents
- Publication number
- CN107203509A (application CN201710262158.XA)
- Authority
- CN
- China
- Prior art keywords
- news
- word string
- title
- high frequency
- set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Embodiments of the present invention provide a title generation method and device. The title generation method includes: obtaining the original title of each news document in a first news set and splicing the original titles into a title text string, wherein the first news set includes at least one news document about the same news event; extracting high-frequency word strings from the title text string and filtering the extracted high-frequency word strings; and determining the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set. With the technical solution of the embodiments, a high-quality short title can be generated automatically for news documents, the semantic effect and propriety of the title are ensured, the computational difficulty of short-title generation is reduced, and the approach has high adaptability.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a title generation method and device.
Background
The title of a news document is usually long, typically 20 to 30 characters, which limits the number of news items that can be displayed on a news web page. To display more news on a web page, the titles of news documents can be compressed or rewritten so that the title length is shortened without affecting the meaning of the title.
At present, title compression methods for news documents shorten the title mainly on the basis of preset rules or grammatical patterns. For example, under preset rules, the corresponding word strings in a title are replaced with shorter synonyms or abbreviations, or a core sentence or key sentence of the news document is taken to replace the title. As another example, under grammatical patterns, the grammatical patterns of title generation are learned from a database in order to generate a shorter title.
However, because the coverage of preset rules is limited and grammatical patterns are constrained by the scope of the database, the semantic effect and propriety of news titles generated from preset rules or grammatical patterns cannot be guaranteed, and titles cannot be compressed effectively.
Summary of the invention
Embodiments of the present invention provide a title generation method and device for automatically generating high-quality short titles for news documents.
According to one aspect of the embodiments of the present invention, a title generation method is provided, including: obtaining the original title of each news document in a first news set and splicing the original titles into a title text string, wherein the first news set includes at least one news document about the same news event; extracting high-frequency word strings from the title text string and filtering the extracted high-frequency word strings; and determining the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
Optionally, the method further includes: obtaining the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
Optionally, obtaining the first news set by clustering the second news set includes: calculating the content similarity between the news documents in the second news set; and determining at least one candidate news set according to the content similarity, and determining the first news set from the at least one candidate news set.
Optionally, obtaining the original title of each news document in the first news set and splicing the original titles into a title text string includes: setting a punctuation mark between adjacent original titles in the title text string; and/or replacing corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, filtering the extracted high-frequency word strings includes: filtering out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or end of an original title; and/or filtering out, from the extracted high-frequency word strings, word strings that include a punctuation mark; and/or filtering out, from the extracted high-frequency word strings, word strings whose length is less than a set length threshold.
According to another aspect of the embodiments of the present invention, a title generating device is also provided, including: an acquisition module, configured to obtain the original title of each news document in a first news set and splice the original titles into a title text string, wherein the first news set includes at least one news document about the same news event; an extraction and filtering module, configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings; and a generation module, configured to determine the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
Optionally, the device further includes: a clustering module, configured to obtain the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
Optionally, the clustering module includes: a computing unit, configured to calculate the content similarity between the news documents in the second news set; and a determining unit, configured to determine at least one candidate news set according to the content similarity and to determine the first news set from the at least one candidate news set.
Optionally, the acquisition module includes: a setting unit, configured to set a punctuation mark between adjacent original titles in the title text string; and/or a replacement unit, configured to replace corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, the extraction and filtering module includes a filter unit, configured to: filter out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or end of an original title; and/or filter out, from the extracted high-frequency word strings, word strings that include a punctuation mark; and/or filter out, from the extracted high-frequency word strings, word strings whose length is less than a set length threshold.
With the title generation method and device of the embodiments of the present invention, the respective original titles of multiple news documents about the same news event are obtained and spliced into a title text string; high-frequency word strings are then extracted from the title text string and filtered to select those that match the features of a title; and the most frequent filtered word string is determined as the new title. A high-quality short title is thus generated for the news documents, and the semantic effect and propriety of the title are ensured. Moreover, the computational difficulty of short-title generation is reduced, and the approach has high adaptability.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a title generation method according to Embodiment One of the present invention;
Fig. 2 is a flowchart of the steps of a title generation method according to Embodiment Two of the present invention;
Fig. 3 is a structural block diagram of a title generating device according to Embodiment Three of the present invention;
Fig. 4 is a structural block diagram of a title generating device according to Embodiment Four of the present invention.
Detailed description
Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings (the same reference numerals denote the same elements throughout the drawings) and the embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present invention are used only to distinguish different steps, devices, modules, and the like; they carry no particular technical meaning and do not indicate any necessary logical order among them.
Embodiment One
Referring to Fig. 1, a flowchart of the steps of a title generation method according to Embodiment One of the present invention is shown.
The title generation method of this embodiment includes the following steps:
Step S102: Obtain the original title of each news document in a first news set and splice the original titles into a title text string.
The first news set includes at least one news document about the same news event.
In this embodiment, the one or more news documents in the first news set concern the same news event, which may be any news event. Each of the news documents in the first news set has an original title. For example, Table 1 shows an example of a first news set.
After the first news set is obtained, the original title of each news document in the first news set is extracted, and the extracted original titles are spliced into one long text string to form the title text string.
Step S104: Extract high-frequency word strings from the title text string, and filter the extracted high-frequency word strings.
A high-frequency word string is a word string in the title text string whose length exceeds a preset length (for example, the length of two English words or two Chinese characters) and whose number of occurrences exceeds a preset number (for example, two).
For example, for the title text string spliced from the original titles of the first news set shown in Table 1, the extracted high-frequency word strings may include "Liu Shishi's wedding", "Liu Shishi", "wedding dress", "Wu Qilong", "wedding", "Wu Qilong Liu Shishi", and so on. After the high-frequency word strings are extracted, they are filtered to remove word strings whose features do not match those of a title. This embodiment does not limit how high-frequency word strings are extracted or which filtering rules are applied.
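As a non-limiting illustration of the definition above, the following Python sketch counts character n-grams in a title text string and keeps those whose length and number of occurrences exceed the two thresholds. The function name, the parameter names, and the `max_len` cap are hypothetical and are not taken from the embodiment; counting at the character level is likewise an assumption (for English text the units could instead be words).

```python
from collections import Counter


def extract_high_frequency_strings(text, length_threshold=2, count_threshold=2, max_len=12):
    """Count every character n-gram of text and keep the frequent ones.

    A string is kept when its length exceeds length_threshold and its
    number of occurrences exceeds count_threshold. max_len is a
    hypothetical cap that only bounds the amount of work.
    """
    counts = Counter()
    for n in range(length_threshold + 1, max_len + 1):
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return {s: c for s, c in counts.items() if c > count_threshold}


# Three identical toy titles, each terminated by a period.
frequent = extract_high_frequency_strings("abcd.abcd.abcd.")
```

In this toy input, "abcd" occurs three times and is kept, whereas "d.a" occurs only twice and "ab" is too short, so neither is returned.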
Step S106: Determine the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
The filtered high-frequency word strings essentially match the features of a title. From the set of filtered high-frequency word strings, the word string with the highest frequency of occurrence is selected as the new title of the news documents in the first news set. Using the most frequent filtered word string as the title ensures, on the one hand, the semantic effect of the new title: it can express the news event that the news documents in the first news set refer to, which is the essential feature of a title. On the other hand, using a word string as the title amounts to regenerating a short title for the news documents in the first news set, which ensures the propriety of the title.
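The selection in step S106 can be sketched as follows, assuming the filtered candidates are held in a dictionary that maps each word string to its number of occurrences. The function name is hypothetical, and breaking ties in favor of the longer string is an assumption that the embodiment does not specify.

```python
def choose_title(filtered_counts):
    """Return the most frequent candidate, or None if nothing survived filtering.

    Ties on frequency are broken by preferring the longer string
    (an assumption made only for this sketch).
    """
    if not filtered_counts:
        return None
    best, _ = max(filtered_counts.items(), key=lambda kv: (kv[1], len(kv[0])))
    return best


title = choose_title({"abc": 3, "abcde": 3, "xyz": 2})
```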
With the title generation method according to this embodiment of the present invention, the respective original titles of multiple news documents about the same news event are obtained and spliced into a title text string; high-frequency word strings are then extracted from the title text string and filtered to select those that match the features of a title; and the most frequent filtered word string is determined as the new title. A high-quality short title is thus generated for the news documents, and the semantic effect and propriety of the title are ensured.
Compared with prior-art methods that shorten titles on the basis of preset rules and grammatical patterns, the title generation method of this embodiment does not require complex short-title generation rules to be defined, which reduces the computational difficulty of short-title generation. Moreover, the coverage of preset rules and of a database need not be considered: the titles of the news documents can simply be obtained, spliced, screened, and compressed to generate a high-quality short title automatically, so the approach has high adaptability.
The title generation method of this embodiment may be performed by any device with the corresponding data processing capability, including but not limited to the server side of a news web page.
Embodiment Two
Referring to Fig. 2, a flowchart of the steps of a title generation method according to Embodiment Two of the present invention is shown.
The title generation method of this embodiment includes the following steps:
Step S202: Obtain the first news set by clustering a second news set.
The second news set includes at least the first news set.
In this embodiment, the second news set includes at least one news document about at least one news event. That is, in addition to the at least one news document about the same news event that makes up the first news set, the second news set may also include other news documents about other news events.
The second news set is clustered to obtain a class of news documents about the same news event, which serves as the first news set. In an optional implementation, the content similarity between the news documents in the second news set is calculated, at least one candidate news set is determined according to the content similarity, and the first news set is determined from the at least one candidate news set.
Specifically, the content similarity between news documents may be calculated by performing word segmentation and vectorization on each news document in the second news set, for example by calculating the cosine similarity between news document vectors. If the content similarity between two news documents is greater than a preset similarity threshold (for example, 0.5), the two news documents can be determined to be about the same news event. That is, multiple news documents whose content similarity is greater than the similarity threshold can be determined to be news documents about the same news event, and these news documents are then determined to be a candidate news set. One or more candidate news sets may be determined from the second news set, and any one of them may be determined as the first news set.
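A minimal sketch of this clustering step is given below. It makes several assumptions that the embodiment leaves open: word segmentation is approximated by whitespace splitting, vectorization by raw term counts, and documents are grouped transitively (two documents land in the same candidate set if a chain of above-threshold pairs links them). All function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.5):
    """Group document indices whose pairwise similarity exceeds threshold."""
    vectors = [Counter(doc.lower().split()) for doc in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


docs = [
    "star wedding held in bali",
    "star wedding photos from bali",
    "central bank raises interest rates",
    "bank raises rates again",
]
candidate_sets = cluster_documents(docs)
```

Here the first two documents share three of five terms (similarity 0.6) and the last two share three terms (similarity about 0.67), so two candidate sets are formed; either could then be taken as the first news set.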
Step S204: Obtain the original title of each news document in the first news set and splice the original titles into a title text string.
The first news set includes at least one news document about the same news event.
After the first news set is determined, the original title of each news document in the first news set is extracted, and the original titles are spliced into one title text string.
Optionally, when the original titles are spliced into the title text string, a punctuation mark may be set between adjacent original titles to separate them, which prevents a word string from forming across the end of one original title and the beginning of the next. Preferably, the same punctuation mark is set between all adjacent original titles to reduce the amount of computation. For example, a period is placed at the end of each original title. In addition, the space characters within each original title may also be replaced with periods.
In this embodiment, after the original titles of the news documents in the first news set are extracted, corresponding word strings in each original title are replaced with synonyms (synonyms shorter than the word string to be replaced) or abbreviations to shorten the word strings, so that if a replaced word string is chosen as the title, the title length is shortened further.
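The splicing options described for step S204 can be sketched as follows. The function name, the `replacements` mapping (long form to shorter synonym or abbreviation), and the `replace_spaces` flag are hypothetical; the embodiment does not prescribe where such a mapping comes from.

```python
def splice_titles(titles, replacements=None, separator="。", replace_spaces=False):
    """Join original titles into one title text string.

    Each title is terminated by the same separator so that no word string
    can span two titles. Optionally, long forms are replaced by shorter
    synonyms or abbreviations, and spaces are replaced by the separator.
    """
    replacements = replacements or {}
    parts = []
    for title in titles:
        title = title.strip()
        for long_form, short_form in replacements.items():
            title = title.replace(long_form, short_form)
        if replace_spaces:
            title = title.replace(" ", separator)
        if title:
            parts.append(title)
    return "".join(part + separator for part in parts)


text = splice_titles(
    ["United Nations summit opens", " Summit opens "],
    replacements={"United Nations": "UN"},
    separator=".",
)
```

With these inputs the result is "UN summit opens.Summit opens.". Replacing spaces is left off by default because, for space-delimited languages such as English, it would split every word into its own segment.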
Step S206: Extract high-frequency word strings from the title text string.
In an optional implementation, n-gram statistics are used to extract from the title text string the word strings whose length exceeds a preset length and whose number of occurrences exceeds a preset number; these serve as the high-frequency word strings. If the extracted high-frequency word strings include a same-frequency substring, the same-frequency substring is filtered out. For example, if the word strings "China" and "Chinese people" each occur 4 times in the title text string, and "Chinese people" contains "China" (as it does in the Chinese source text), then "China" is a same-frequency substring of "Chinese people", and only the word string "Chinese people" is extracted as a high-frequency word string.
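The removal of same-frequency substrings can be sketched as below, assuming the n-gram counts are already available as a dictionary. The function name is hypothetical, and the quadratic comparison is chosen for clarity rather than efficiency.

```python
def drop_same_frequency_substrings(counts):
    """Remove every string that is contained in a longer string with the same count."""
    kept = {}
    for s, c in counts.items():
        covered = any(
            s != other and s in other and c == other_count
            for other, other_count in counts.items()
        )
        if not covered:
            kept[s] = c
    return kept


kept = drop_same_frequency_substrings({"wedding": 4, "royal wedding": 4, "royal": 5})
```

In this illustrative input, "wedding" is dropped because it always occurs inside "royal wedding" (both occur 4 times), while "royal" is kept because it occurs 5 times and therefore also appears on its own.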
Step S208: Filter out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or end of an original title, word strings that include a punctuation mark, and word strings whose length is less than a set length threshold.
In this embodiment, word strings that do not occur at the beginning or end of an original title are filtered out from the extracted high-frequency word strings; and/or word strings that include a punctuation mark are filtered out; and/or word strings whose length is less than a set length threshold are filtered out. The reason is that word strings that do not occur at the beginning or end of an original title are unlikely to become a title, word strings that include a punctuation mark generally cannot serve as a title, and word strings shorter than the set length threshold are insufficient to express the news event clearly. Filtering out these word strings therefore makes the extracted high-frequency word strings match title features more closely.
Note that in other embodiments, one or more of the above three kinds of word strings that do not match title features may be filtered out from the extracted high-frequency word strings, and other word strings that do not match title features may be filtered out as well.
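The three filtering rules of step S208 can be sketched as follows. The function name, the `min_len` default, and the particular set of characters treated as punctuation are assumptions made for illustration only.

```python
import string

# ASCII punctuation plus a few common Chinese marks (an illustrative set).
PUNCTUATION = set(string.punctuation) | set("。，、；：？！（）《》")


def filter_candidates(counts, titles, min_len=4):
    """Keep only candidates that match the three title features.

    A candidate is kept if it occurs at the beginning or end of at least one
    original title, contains no punctuation mark, and is at least min_len
    characters long.
    """
    kept = {}
    for s, c in counts.items():
        at_boundary = any(t.startswith(s) or t.endswith(s) for t in titles)
        has_punctuation = any(ch in PUNCTUATION for ch in s)
        if at_boundary and not has_punctuation and len(s) >= min_len:
            kept[s] = c
    return kept


titles = ["star wedding in bali", "photos: star wedding"]
counts = {"star wedding": 3, "ar wed": 3, "photos: star": 3, "sta": 3}
filtered = filter_candidates(counts, titles)
```

Only "star wedding" survives: "ar wed" never starts or ends a title, "photos: star" contains a colon, and "sta" is shorter than the length threshold.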
Step S210: Determine the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
The title generation method of this embodiment may be regarded as an optional specific implementation of the title generation method of Embodiment One; for identical steps, refer to the way the corresponding steps are performed in Embodiment One.
The title generation method of this embodiment of the present invention gathers news about the same event together by clustering, extracts the original titles of these news documents and splices them into a title text string, extracts high-frequency word strings from the title text string, and selects those that match the features of a title on the basis of features such as the position and length of the word string. The most frequent word string that matches title features is then selected as the new title, so that a high-quality short title is generated for the news documents and the semantic effect and propriety of the title are ensured. Moreover, high-quality short titles are generated automatically, the computational difficulty of short-title generation is reduced, and the approach has high adaptability.
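Steps S204 to S210 can be combined into one compact sketch, shown below under stated assumptions: counting is done on characters, the separator is a Chinese full stop, ties are broken in favor of the longer string, and the thresholds and the `max_len` cap are hypothetical defaults. Clustering (step S202) and synonym replacement are omitted for brevity.

```python
from collections import Counter

SEP = "。"  # separator placed after every original title


def generate_title(titles, length_threshold=2, count_threshold=1, max_len=20):
    """Return a short title for a set of titles about one event, or None."""
    titles = [t.strip() for t in titles if t.strip()]
    text = "".join(t + SEP for t in titles)  # S204: splice

    # S206: n-gram statistics, keeping strings above both thresholds.
    counts = Counter()
    for n in range(length_threshold + 1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    frequent = {s: c for s, c in counts.items() if c > count_threshold}

    # S206: drop strings contained in a longer string with the same count.
    frequent = {
        s: c for s, c in frequent.items()
        if not any(s != o and s in o and c == oc for o, oc in frequent.items())
    }

    # S208: keep strings without the separator that start or end a title.
    candidates = {
        s: c for s, c in frequent.items()
        if SEP not in s and any(t.startswith(s) or t.endswith(s) for t in titles)
    }

    # S210: most frequent candidate; ties go to the longer string.
    if not candidates:
        return None
    return max(candidates.items(), key=lambda kv: (kv[1], len(kv[0])))[0]


short_title = generate_title([
    "star wedding photos released",
    "star wedding held in bali",
    "fans celebrate star wedding",
])
```

For these three toy titles, "star wedding" occurs three times, begins or ends every title, and is therefore selected as the short title.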
Embodiment Three
Referring to Fig. 3, a structural block diagram of a title generating device according to Embodiment Three of the present invention is shown.
The title generating device of this embodiment includes an acquisition module 302, an extraction and filtering module 304, and a generation module 306. The acquisition module 302 is configured to obtain the original title of each news document in a first news set and splice the original titles into a title text string, wherein the first news set includes at least one news document about the same news event. The extraction and filtering module 304 is configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings. The generation module 306 is configured to determine the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
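The module structure of the device can be pictured in software as three pluggable callables wired in sequence. The sketch below is only one possible arrangement: the class name, field names, and the toy stand-in modules are hypothetical, and the toy extraction step merely counts whole repeated titles rather than implementing the full extraction and filtering logic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TitleGeneratingDevice:
    acquire: Callable[[List[str]], str]                  # role of acquisition module 302
    extract_and_filter: Callable[[str], Dict[str, int]]  # role of extraction and filtering module 304
    generate: Callable[[Dict[str, int]], Optional[str]]  # role of generation module 306

    def run(self, original_titles: List[str]) -> Optional[str]:
        text = self.acquire(original_titles)
        candidates = self.extract_and_filter(text)
        return self.generate(candidates)


def toy_extract(text: str) -> Dict[str, int]:
    # Toy stand-in: treat each spliced title as one candidate and keep repeats.
    counts = Counter(part for part in text.split("。") if part)
    return {s: c for s, c in counts.items() if c > 1}


device = TitleGeneratingDevice(
    acquire=lambda titles: "".join(t + "。" for t in titles),
    extract_and_filter=toy_extract,
    generate=lambda c: max(c, key=c.get) if c else None,
)
result = device.run(["a b", "a b", "c"])
```

Keeping the three roles as separate callables mirrors the block diagram and lets each module be replaced independently.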
With the title generating device provided by this embodiment of the present invention, the respective original titles of multiple news documents about the same news event are obtained and spliced into a title text string; high-frequency word strings are then extracted from the title text string and filtered to select those that match the features of a title; and the most frequent filtered word string is determined as the new title. A high-quality short title is thus generated for the news documents, and the semantic effect and propriety of the title are ensured. Moreover, the computational difficulty of short-title generation is reduced, and the approach has high adaptability.
Embodiment Four
Referring to Fig. 4, a structural block diagram of a title generating device according to Embodiment Four of the present invention is shown.
The title generating device of this embodiment includes an acquisition module 402, an extraction and filtering module 404, and a generation module 406. The acquisition module 402 is configured to obtain the original title of each news document in a first news set and splice the original titles into a title text string, wherein the first news set includes at least one news document about the same news event. The extraction and filtering module 404 is configured to extract high-frequency word strings from the title text string and filter the extracted high-frequency word strings. The generation module 406 is configured to determine the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
Optionally, the title generating device of this embodiment further includes a clustering module 408, configured to obtain the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
Optionally, the clustering module 408 includes a computing unit 4082 and a determining unit 4084. The computing unit 4082 is configured to calculate the content similarity between the news documents in the second news set. The determining unit 4084 is configured to determine at least one candidate news set according to the content similarity and to determine the first news set from the at least one candidate news set.
Optionally, the acquisition module 402 includes a setting unit 4022 and/or a replacement unit 4024. The setting unit 4022 is configured to set a punctuation mark between adjacent original titles in the title text string. The replacement unit 4024 is configured to replace corresponding word strings in the original titles with synonyms or abbreviations.
Optionally, the extraction and filtering module 404 includes an extraction unit 4042 and a filter unit 4044. The extraction unit 4042 is configured to extract high-frequency word strings from the title text string. The filter unit 4044 is configured to filter out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or end of an original title; and/or to filter out, from the extracted high-frequency word strings, word strings that include a punctuation mark; and/or to filter out, from the extracted high-frequency word strings, word strings whose length is less than a set length threshold.
The title generating device of this embodiment is used to implement the title generation method of Embodiment One or Embodiment Two above and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
It should be noted that, depending on implementation needs, each component or step described in the embodiments of the present invention may be split into more components or steps, and two or more components or steps, or partial operations of components or steps, may be combined into a new component or step to achieve the purpose of the embodiments of the present invention.
The above method according to the embodiments of the present invention may be implemented in hardware or firmware, or implemented as software or computer code storable in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described here is implemented. In addition, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing that processing.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed here can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
The above implementations are intended only to illustrate the embodiments of the present invention, not to limit them. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention shall be defined by the claims.
Claims (10)
1. A title generation method, characterized by including:
obtaining the original title of each news document in a first news set and splicing the original titles into a title text string, wherein the first news set includes at least one news document about the same news event;
extracting high-frequency word strings from the title text string, and filtering the extracted high-frequency word strings;
determining the word string that occurs most frequently among the filtered high-frequency word strings as the title of the first news set.
2. The method according to claim 1, characterized by further including:
obtaining the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
3. The method according to claim 2, characterized in that obtaining the first news set by clustering the second news set includes:
calculating the content similarity between the news documents in the second news set;
determining at least one candidate news set according to the content similarity, and determining the first news set from the at least one candidate news set.
4. The method according to claim 1, characterized in that obtaining the original title of each news document in the first news set and splicing the original titles into a title text string includes:
setting a punctuation mark between adjacent original titles in the title text string; and/or,
replacing corresponding word strings in the original titles with synonyms or abbreviations.
5. The method according to any one of claims 1 to 4, characterized in that filtering the extracted high-frequency word strings includes:
filtering out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or end of an original title; and/or,
filtering out, from the extracted high-frequency word strings, word strings that include a punctuation mark; and/or,
filtering out, from the extracted high-frequency word strings, word strings whose length is less than a set length threshold.
6. A title generation device, characterized by comprising:
an acquisition module, configured to obtain the original title of each news document in a first news set and splice the original titles into a title text string, wherein the first news set includes at least one news document about the same news event;
an extraction and filtering module, configured to extract high-frequency word strings from the title text string and to filter the extracted high-frequency word strings;
a generation module, configured to determine the word string with the highest frequency of occurrence among the filtered high-frequency word strings as the title of the first news set.
7. The device according to claim 6, characterized by further comprising:
a clustering module, configured to obtain the first news set by clustering a second news set, wherein the second news set includes at least the first news set.
8. The device according to claim 7, characterized in that the clustering module includes:
a computing unit, configured to calculate the content similarity between the news documents in the second news set;
a determining unit, configured to determine at least one candidate news set according to the content similarity, and to determine the first news set from the at least one candidate news set.
9. The device according to claim 6, characterized in that the acquisition module includes:
a setting unit, configured to set a punctuation mark between each pair of adjacent original titles in the title text string; and/or
a replacement unit, configured to replace corresponding word strings in the original titles with synonyms or abbreviations.
10. The device according to any one of claims 6 to 9, characterized in that the extraction and filtering module includes a filtering unit, the filtering unit being configured to:
filter out, from the extracted high-frequency word strings, word strings that do not occur at the beginning or the end of a sentence of the original titles; and/or
filter out, from the extracted high-frequency word strings, word strings that include a punctuation mark; and/or
filter out, from the extracted high-frequency word strings, word strings whose length is less than a set length threshold.
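The method of claims 1, 4 and 5 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation disclosed in the specification. The function names, the `|` separator, the punctuation set, the character n-gram counting used to find repeated word strings, the default thresholds, and the tie-break in favour of the longer string are all hypothetical choices made for this example. The boundary filter is also simplified to test the start and end of each whole title rather than of each sentence within a title.

```python
from collections import Counter

SEPARATOR = "|"  # assumed punctuation mark placed between adjacent titles
PUNCTUATION = set("|,.;:!?，。；：！？、")  # assumed punctuation set


def splice_titles(titles, separator=SEPARATOR):
    """Join the original titles into one title text string."""
    return separator.join(t.strip() for t in titles if t.strip())


def extract_high_frequency_strings(text, min_len=2, max_len=30, min_count=2):
    """Count every character n-gram and keep those seen at least min_count times."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def filter_strings(candidates, titles, min_len=4):
    """Apply the three filters of claim 5 to the candidate word strings."""
    kept = {}
    for s, c in candidates.items():
        if len(s) < min_len:  # shorter than the length threshold
            continue
        if any(ch in PUNCTUATION for ch in s):  # contains a punctuation mark
            continue
        if not any(t.startswith(s) or t.endswith(s) for t in titles):
            continue  # never occurs at the start or end of a title
        kept[s] = c
    return kept


def generate_title(titles, min_len=4, max_len=30, min_count=2):
    """Return the most frequent surviving word string, or None if none survive."""
    cleaned = [t.strip() for t in titles if t.strip()]
    text = splice_titles(cleaned)
    candidates = extract_high_frequency_strings(text, 2, max_len, min_count)
    kept = filter_strings(candidates, cleaned, min_len)
    if not kept:
        return None
    # Highest frequency wins; ties go to the longer string (an assumption).
    return max(kept, key=lambda s: (kept[s], len(s)))


titles = [
    "Typhoon hits coast as thousands evacuate",
    "Typhoon hits coast, flights cancelled",
    "Typhoon hits coast",
]
print(generate_title(titles))  # prints: Typhoon hits coast
```

Because every substring of a frequent word string is at least as frequent as the string itself, frequency alone tends to favour short fragments. The length threshold and the sentence-boundary filter of claim 5 remove most such fragments, and the tie-break used here selects the longest of the remaining equally frequent candidates.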
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710262158.XA CN107203509B (en) | 2017-04-20 | 2017-04-20 | Title generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107203509A (en) | 2017-09-26 |
CN107203509B (en) | 2023-06-20 |
Family
ID=59904977
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710262158.XA (Active; CN107203509B (en)) | 2017-04-20 | 2017-04-20 | Title generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107203509B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108509417A (en) * | 2018-03-20 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Title generation method and equipment, storage medium, server |
CN110895586A (en) * | 2018-08-22 | 2020-03-20 | 腾讯科技(深圳)有限公司 | Method and device for generating news page, computer equipment and storage medium |
WO2022116435A1 (en) * | 2020-12-01 | 2022-06-09 | 平安科技(深圳)有限公司 | Title generation method and apparatus, electronic device and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000311167A (en) * | 1999-04-28 | 2000-11-07 | Sharp Corp | Device and method for document processing and storage medium used for same |
CN1472675A (en) * | 2002-07-29 | 2004-02-04 | 明日工作室股份有限公司 | Title generation testing method and system thereof |
CN1955952A (en) * | 2005-10-25 | 2007-05-02 | 国际商业机器公司 | System and method for automatically extracting by-line information |
CN101174273A (en) * | 2007-12-04 | 2008-05-07 | 清华大学 | News event detecting method based on metadata analysis |
CN101751455A (en) * | 2009-12-31 | 2010-06-23 | 浙江大学 | Method for automatically generating title by adopting artificial intelligence technology |
CN105354333A (en) * | 2015-12-07 | 2016-02-24 | 天云融创数据科技(北京)有限公司 | Topic extraction method based on news text |
CN105765566A (en) * | 2013-06-27 | 2016-07-13 | 谷歌公司 | Automatic generation of headlines |
Also Published As
Publication number | Publication date |
---|---|
CN107203509B (en) | 2023-06-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110874531B (en) | Topic analysis method and device and storage medium | |
KR102268875B1 (en) | System and method for inputting text into electronic devices | |
US7765098B2 (en) | Machine translation using vector space representations | |
Benajiba et al. | ANERsys 2.0: Conquering the NER task for the Arabic language by combining the maximum entropy with POS-tag information. | |
CN107463548B (en) | Phrase mining method and device | |
US9965460B1 (en) | Keyword extraction for relationship maps | |
US20130007020A1 (en) | Method and system of extracting concepts and relationships from texts | |
RU2618374C1 (en) | Identifying collocations in the texts in natural language | |
CN103154939A (en) | Statistical machine translation method using dependency forest | |
CN112445912B (en) | Fault log classification method, system, device and medium | |
CN107203509A (en) | Title generation method and device | |
Jain et al. | Context sensitive text summarization using k means clustering algorithm | |
JP2006251843A (en) | Synonym pair extracting device, and computer program therefor | |
CN112784009A (en) | Subject term mining method and device, electronic equipment and storage medium | |
CN107577713B (en) | Text handling method based on electric power dictionary | |
JP6867963B2 (en) | Summary Evaluation device, method, program, and storage medium | |
Sangati et al. | Multiword expression identification with recurring tree fragments and association measures | |
CN113761161A (en) | Text keyword extraction method and device, computer equipment and storage medium | |
CN111259661B (en) | New emotion word extraction method based on commodity comments | |
CN108776705B (en) | Text full-text accurate query method, device, equipment and readable medium | |
JP2009277099A (en) | Similar document retrieval device, method and program, and computer readable recording medium | |
JP2007011973A (en) | Information retrieval device and information retrieval program | |
Nghiem et al. | Using MathML parallel markup corpora for semantic enrichment of mathematical expressions | |
Sembok et al. | A rule and template based stemming algorithm for Arabic language | |
CN111625579B (en) | Information processing method, device and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 101, 1st to 7th floors, Building 3, Yard 6, Jianfeng Road (South Extension), Haidian District, Beijing, 100070; Patentee after: TOLS INFORMATION TECHNOLOGY Co.,Ltd.; Address before: 14b04, 14th floor, Jinqiu international building, 6 Zhichun Road, Haidian District, Beijing 100088; Patentee before: BEIJING TRS INFORMATION TECHNOLOGY Co.,Ltd. |