CN111753533B

CN111753533B - Title text generation method, device, computer storage medium and electronic equipment

Info

Publication number: CN111753533B
Application number: CN201910338380.2A
Authority: CN
Inventors: 郭昆; 陶通; 赫阳
Original assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2019-04-25
Filing date: 2019-04-25
Publication date: 2024-04-05
Anticipated expiration: 2039-04-25
Also published as: CN111753533A

Abstract

The disclosure relates to the technical field of computers, and in particular to a title text generation method and device, a storage medium and electronic equipment. The method comprises the following steps: obtaining the confusion degree of primary candidate titles; filtering the primary candidate titles according to the confusion degree to obtain secondary candidate titles; acquiring the click probability of the secondary candidate titles; and sorting the secondary candidate titles based on the click probability, and determining target candidate titles from the sorted secondary candidate titles. By combining the confusion degree and the click probability of the candidate titles to sort and filter them, the method and the device improve the accuracy and logicality of the target candidate titles and increase their appeal to users.

Description

Title text generation method, device, computer storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technology, and more particularly, to a title text generation method, a title text generation apparatus, a computer storage medium, and an electronic device.

Background

With the development of computer and internet technology, internet platforms generally need to increase the access volume of their content (such as views of item information or news information). Given the massive amount of data, providing high-quality, attractive title text has become a non-trivial problem for increasing content access and helping users find target content in the shortest time.

In the related art, titles of related contents are mainly generated based on preset rules or language models. However, these methods struggle to balance the accuracy and fluency of the title text against its appeal to users. On one hand, quality evaluation of the generated titles is lacking, the degree of fit between a title and the related content is difficult to control, and the accuracy of the generated titles is low; on the other hand, efforts to increase the accuracy of generated titles often ignore the potential appeal of the title to the user.

Accordingly, there is a need to provide a new title text generation method.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure, and thus may include information that does not constitute prior art already known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure aims to provide a method and apparatus for generating a title text, a computer storage medium, and an electronic device, so as to overcome, at least to some extent, the difficulty of balancing the accuracy, fluency, and user appeal of the generated title text.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the present disclosure, there is provided a title text generation method including: obtaining the confusion degree of the first-level candidate titles; filtering the primary candidate titles according to the confusion degree to obtain secondary candidate titles; acquiring the click probability of the secondary candidate title; and sorting the secondary candidate titles based on the click probability, and determining a final title from the sorted secondary candidate titles.

In an exemplary embodiment of the present disclosure, before obtaining the confusion degree of the primary candidate titles, the method further includes: extracting item tag words and target keywords from text information corresponding to the current item, and generating the primary candidate titles according to the item tag words and the target keywords.

In an exemplary embodiment of the present disclosure, the primary candidate title includes at least two target keywords therein; the obtaining the confusion degree of the first-level candidate title comprises the following steps: calculating the co-occurrence probability of the target keyword pairs in the primary candidate titles; and determining the confusion degree according to the co-occurrence probability of the target keyword pair, wherein the target keyword pair consists of any two adjacent target keywords in the primary candidate title.

In an exemplary embodiment of the present disclosure, the target keyword pair includes a first keyword and a second keyword in sequence; calculating the co-occurrence probability of the target keyword pairs in the primary candidate title comprises the following steps: acquiring the target item word corresponding to the primary candidate title; acquiring a first probability of occurrence of a first target title in a preset title library, wherein the first target title is a title containing the target item word and the target keyword pair; acquiring a second probability of occurrence of a second target title in the preset title library, wherein the second target title is a title containing the target item word and the first keyword; and taking the ratio of the first probability to the second probability to obtain the co-occurrence probability.

In an exemplary embodiment of the present disclosure, determining the confusion degree according to the co-occurrence probability of the target keyword pairs, wherein each target keyword pair consists of any two adjacent target keywords in the primary candidate title, includes: calculating the reciprocal of the geometric mean of the co-occurrence probabilities and taking it as the confusion degree.

In an exemplary embodiment of the present disclosure, the filtering the primary candidate title according to the confusion degree to obtain a secondary candidate title includes: sorting the primary candidate titles from low to high according to the confusion degree to form a first sequence; sequentially intercepting a first preset number of primary candidate titles from the first sequence, and taking the first preset number of primary candidate titles as the secondary candidate titles.

In an exemplary embodiment of the present disclosure, the target keyword includes a concept word and an item word; the obtaining the click probability of the secondary candidate title comprises the following steps: acquiring a first click rate of a third target title, wherein the third target title is a title containing target concept words and the target object words; acquiring a second click rate corresponding to the title containing the target object word; comparing the first click rate with the second click rate to obtain a third click rate corresponding to the target concept word; and acquiring the click probability of the secondary candidate title according to the third click rate.

In an exemplary embodiment of the disclosure, obtaining the click probability of the secondary candidate title according to the third click rate includes: calculating the geometric mean of the third click rates, and taking the geometric mean as the click probability.
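The click probability described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes that "comparing" the first click rate with the second click rate means taking their ratio, and the names `click_probability`, `ctr_with_concept`, and `ctr_item` are hypothetical.

```python
import math
from typing import Mapping, Sequence

def click_probability(
    concepts: Sequence[str],
    ctr_with_concept: Mapping[str, float],  # first click rate, per concept word (assumed input)
    ctr_item: float,                        # second click rate, titles with the item word only
) -> float:
    """Geometric mean of the per-concept third click rates."""
    if not concepts or ctr_item <= 0:
        raise ValueError("need at least one concept word and a positive item click rate")
    # Third click rate: first click rate relative to the second (assumed to be a ratio).
    third = [ctr_with_concept[c] / ctr_item for c in concepts]
    if any(r <= 0 for r in third):
        return 0.0  # a zero rate drives the geometric mean to zero
    # Geometric mean computed in log space.
    return math.exp(sum(math.log(r) for r in third) / len(third))
```

Under the ratio assumption the result is a relative score that can exceed 1, which does not matter when it is used only to rank titles.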

In an exemplary embodiment of the disclosure, the sorting the secondary candidate titles based on the click probability, and determining a target candidate title from the sorted secondary candidate titles, includes: sorting the secondary candidate titles from high to low according to the click probability to form a second sequence; performing de-duplication processing on the secondary candidate titles in the second sequence according to the confusion degree and the target keywords corresponding to the secondary candidate titles so as to obtain a third sequence; obtaining the lowest confusion degree corresponding to the second-level candidate title in the third sequence, and filtering the third sequence according to the lowest confusion degree to obtain a fourth sequence; sequentially intercepting a second preset number of secondary candidate titles from the fourth sequence, and taking the second preset number of secondary candidate titles as the target candidate titles.

In one exemplary embodiment of the present disclosure, the order of the primary candidate title corresponds to the number of concept words in the target keywords included in the primary candidate title.

According to an aspect of the present disclosure, there is provided a title text generating apparatus including: the confusion degree acquisition module is used for acquiring the confusion degree of the first-level candidate titles; the filtering module is used for filtering the primary candidate titles according to the confusion degree so as to obtain secondary candidate titles; the click probability acquisition module is used for acquiring the click probability of the secondary candidate title; and the determining module is used for sorting the secondary candidate titles based on the click probability and determining a final title from the sorted secondary candidate titles.

According to an aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the title text generation method of any one of the above.

According to one aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the title text generation method of any one of the above via execution of the executable instructions.

The title text generation method in the exemplary embodiments of the present disclosure comprehensively sorts and filters the candidate titles by combining their confusion degree and click probability to determine the final title. On one hand, primary candidate titles with poor logicality are filtered out according to the confusion degree, so that the quality and readability of the final title are improved as a whole; on the other hand, the click probability of a secondary candidate title characterizes, to some extent, the appeal of the title to the user, so that the final title, determined after sorting based on the click probability, takes the potential user appeal of the title into account. The present disclosure thus balances the readability, logicality, and user appeal of the generated title.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:

FIG. 1 illustrates a flowchart of a title text generation method according to an exemplary embodiment of the present disclosure;

FIG. 2 illustrates a flowchart for forming primary candidate titles according to an exemplary embodiment of the present disclosure;

FIG. 3 illustrates a flowchart for calculating co-occurrence probabilities for target keyword pairs in primary candidate titles according to an exemplary embodiment of the present disclosure;

FIG. 4 illustrates a flowchart for obtaining click probabilities for secondary candidate titles according to an exemplary embodiment of the present disclosure;

FIG. 5 illustrates a flowchart for determining target candidate titles based on secondary candidate titles according to an exemplary embodiment of the present disclosure;

FIG. 6 illustrates a schematic configuration diagram of a title text generating apparatus according to an exemplary embodiment of the present disclosure;

FIG. 7 illustrates a schematic diagram of a storage medium according to an exemplary embodiment of the present disclosure; and

FIG. 8 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus detailed descriptions thereof will be omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more software-hardened modules, or in different networks and/or processor devices and/or microcontroller devices.

In the related art, title text is mainly generated in two ways: by extracting effective information from the related content based on preset rules and combining the effective information to generate a title; or by training a language model to learn the information represented by title text and using it to assist in generating the title text, for example by training an LSTM network (Long Short-Term Memory network) to output the corresponding title text based on feature extraction.

Accordingly, the title text generation method in the related art has the following drawbacks: on the one hand, titles are generated according to a specified mode, and an evaluation process for the quality of the titles is lacking, for example, whether the generated titles are smooth, have higher fitting degree with related contents (such as object attribute information), have complete semantic information and the like; on the other hand, titles obtained through model training improve the accuracy of titles to a certain extent, but are difficult to hit points of interest of users, so that titles lack user appeal.

Many platforms (e.g., enterprise recruitment information platforms, hospital information platforms, e-commerce platforms, online dining platforms, etc.) typically provide related content for various items or information objects, including related images, title information, and other descriptive content, in order to attract users. As one of the fastest and most efficient ways for users to learn about an item, the title text plays a significant role. For example, a title that accurately reflects the item's advantages and captures the user's points of interest will attract more users.

Based on this, an exemplary embodiment of the present disclosure first provides a title text generation method. Referring to FIG. 1, the title text generation method includes the following steps:

Step S110: obtaining the confusion degree of the primary candidate titles;

Step S120: filtering the primary candidate titles according to the confusion degree to obtain secondary candidate titles;

Step S130: acquiring the click probability of the secondary candidate titles;

Step S140: sorting the secondary candidate titles based on the click probability, and determining a final title from the sorted secondary candidate titles.

According to the title text generation method in the present exemplary embodiment, on one hand, primary candidate titles that are not fluent and have poor logicality are filtered out according to the confusion degree, so that the quality and readability of the final title are improved as a whole; on the other hand, the click probability of a secondary candidate title characterizes, to some extent, the appeal of the title to the user, so that the final title, determined after sorting based on the click probability, takes the potential user appeal of the title into account. The present disclosure thus balances the readability, logicality, and user appeal of the generated title.
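Steps S110 to S140 amount to a filter-then-rank pipeline, which can be sketched in Python as follows. The sketch is illustrative only: the scoring callables `perplexity_of` and `click_prob_of` and the cutoffs `keep_n` and `final_n` are hypothetical names standing in for the confusion degree, the click probability, and the preset numbers, none of which are given concrete values in the disclosure.

```python
from typing import Callable, List

def generate_titles(
    primary: List[str],
    perplexity_of: Callable[[str], float],   # hypothetical scorer for step S110
    click_prob_of: Callable[[str], float],   # hypothetical scorer for step S130
    keep_n: int,                             # assumed "first preset number"
    final_n: int = 1,                        # assumed number of final titles
) -> List[str]:
    # S110 + S120: keep the keep_n candidates with the lowest confusion degree.
    secondary = sorted(primary, key=perplexity_of)[:keep_n]
    # S130 + S140: rank the survivors by click probability, highest first.
    ranked = sorted(secondary, key=click_prob_of, reverse=True)
    return ranked[:final_n]
```

Filtering before ranking means that a title with a high click probability but a poor confusion degree never reaches the final selection.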

The title text generation method in the exemplary embodiments of the present disclosure is further described below, taking the generation of title text for an item as an example.

In step S110, the confusion degree of the primary candidate titles is acquired.

In an exemplary embodiment of the present disclosure, the primary candidate titles are candidate titles obtained by extracting target keywords from multiple information sources and then preprocessing, filtering, and combining them. The confusion degree (perplexity) is an index used in natural language processing to measure the quality of a language model; in the present disclosure, it indicates whether a primary candidate title is clearly expressed, logically coherent, and readable: the lower the confusion degree, the better the corresponding primary candidate title is in these respects. For example, of the titles "bluetooth wireless smart earphone" and "bluetooth wireless full-automatic watch", the latter is obviously illogical and unclear, so the confusion degree of "bluetooth wireless smart earphone" is lower than that of "bluetooth wireless full-automatic watch".

Specifically, before the confusion degree of the primary candidate title is obtained, an item tag word and target keywords are first extracted from the text information corresponding to the current item, and the primary candidate title is generated from the item tag word and the target keywords. A target keyword is a modifier phrase or item attribute word that accurately reflects some characteristic of the item and is correctly and rigorously worded; target keywords comprise concept words and item words. For example, concept words include "winter", "waterproof", and "spicy and hot"; item words include "pencil", "earphone", and "computer". FIG. 2 shows a flowchart for forming primary candidate titles; as shown in FIG. 2, the process comprises the following steps:

In step S210, extraction of a target keyword is performed on text information corresponding to the current item.

In an exemplary embodiment of the present disclosure, the target keywords include concept words and item words. They are key components of the title text and are extracted from multiple information sources by means of a sequence labeling model. Sequence labeling models may include a CRF (Conditional Random Field) model, an LSTM (Long Short-Term Memory network) model, a CRF-BiLSTM (Conditional Random Field with Bi-directional Long Short-Term Memory network) model, and so forth; the multiple information sources include item supply information, distribution channel information, demand information, item technology information, and the like. Specifically, concept words of various preset types may be extracted; Table 1 shows examples of concept words extracted, by preset type, from the multiple information sources of an item.

TABLE 1

Preset type	Concept words
Regional attributes	"America", "Beijing", "sea lake"
Seasonal attributes	"spring and autumn", "winter", "early spring"
Crowd attributes	"boy", "lover", "elder"
Scene attributes	"bathroom", "kitchen", "office"

It should be noted that Table 1 is only a partial example of the concept words extracted from the multiple information sources; more types of concept words may be extracted according to actual requirements, for example functional attributes (such as bluetooth), taste attributes (such as spicy and hot), style attributes (such as retro), material attributes (such as plastic), and style attributes (such as in-ear type). Likewise, item words such as "mobile phone", "fan", "book", and "drone" may be extracted from the multiple information sources, which will not be described in detail in this disclosure.

In step S220, the target keywords are filtered to obtain a set of compliant keywords.

In an exemplary embodiment of the present disclosure, the target keywords extracted in step S210 may contain offensive words, words unsuitable for use in a title, or concept words that conflict with the item word; these are screened out to obtain a set of compliant keywords. For example, words such as "top", "king", "famous", "precise", and "extreme" are filtered out, as are concept words that conflict with the item word, such as "full-automatic" with the item word "watch".
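The screening in step S220 can be sketched as a lookup against two lists. The following Python sketch is a minimal illustration under stated assumptions: the contents of `BLOCKED_WORDS` and `CONFLICTS` are taken from the examples above only, and both names, like `filter_keywords`, are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Illustrative data only; the disclosure does not publish its actual lists.
BLOCKED_WORDS: Set[str] = {"top", "king", "famous", "precise", "extreme"}
CONFLICTS: Set[Tuple[str, str]] = {("full-automatic", "watch")}  # (concept word, item word)

def filter_keywords(concepts: Iterable[str], item_word: str) -> List[str]:
    """Return the concept words that are neither blocked nor in conflict with item_word."""
    return [
        c for c in concepts
        if c not in BLOCKED_WORDS and (c, item_word) not in CONFLICTS
    ]
```

Because conflicts are stored per (concept word, item word) pair, "full-automatic" is rejected for "watch" but remains available for other item words.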

In step S230, item tag words are extracted from the text information corresponding to the current item to form an item prefix.

In an exemplary embodiment of the present disclosure, the item tag is a word, symbol, design, etc. that distinguishes an item from other items, and comprehensively reflects the position the item occupies in the user's mind; the item tag words are words describing these features, and may be, for example, brand words. The present disclosure therefore first extracts the item tag words and forms them into an item prefix according to a preset text format. If the Chinese item tag word or the English item tag word is empty, acquisition of the item tag word fails and the corresponding title has no item prefix. If the Chinese item tag word is identical to the English item tag word, the Chinese item tag word is taken as the item prefix of the current item. If one of the two item tag words contains the other, the one with more text content is taken as the item prefix of the current item. Finally, if the Chinese item tag word differs from the English item tag word, the item prefix is formed as "Chinese item tag word (English item tag word)"; if this prefix is overlong, the Chinese item tag word alone is taken as the item prefix of the current item. For example, if the extracted Chinese item tag word is the Chinese name for Sony and the English item tag word is "SONY", the item prefix formed is "Sony (SONY)". Of course, other ways of forming the item prefix may be selected according to the actual situation; the present disclosure includes, but is not limited to, the above way of forming the item prefix from the item tag words.
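The prefix rules above can be sketched as a small Python function. This is an illustration rather than the disclosed implementation: the function name `build_item_prefix` and the length threshold `max_len` are assumptions, since the disclosure only says the prefix may be "overlong" without giving a limit.

```python
def build_item_prefix(zh_tag: str, en_tag: str, max_len: int = 20) -> str:
    """Form an item prefix from a Chinese and an English item tag word.

    max_len is an assumed threshold; the disclosure does not specify one.
    """
    zh, en = zh_tag.strip(), en_tag.strip()
    if not zh or not en:
        return ""                      # tag acquisition failed: no prefix
    if zh == en:
        return zh                      # identical tags: use the Chinese one
    if zh in en or en in zh:
        return max(zh, en, key=len)    # inclusion: keep the tag with more text
    combined = f"{zh} ({en})"
    # Fall back to the Chinese tag alone when the combined form is too long.
    return combined if len(combined) <= max_len else zh
```

The order of the checks matters: the equality test runs before the inclusion test, so identical tags never reach the length comparison.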

In step S240, a first-level candidate title is generated from the item prefix and the target keyword.

In an exemplary embodiment of the present disclosure, a primary candidate title is formed in the format "item prefix + concept word + item word" from the obtained item prefix and the target keywords (concept words and item words) in the keyword set, for example "Sony bluetooth headset" and "Sony bluetooth wireless headset", which contain two and three target keywords respectively. Accordingly, the primary candidate titles may be divided into groups according to the number of target keywords they contain, and the groups may be defined as first-order text, second-order text, third-order text, and so on, where the order of a primary candidate title corresponds to the number of concept words among its target keywords. The specific formats are as follows: first-order text is item prefix + concept word 1 + item word, such as "Sony (SONY) bluetooth headset"; second-order text is item prefix + concept word 1 + concept word 2 + item word, such as "Sony (SONY) bluetooth wireless headset"; third-order text is item prefix + concept word 1 + concept word 2 + concept word 3 + item word, such as "Sony (SONY) bluetooth wireless sports headset".
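The grouping of primary candidate titles by order can be sketched as follows. The disclosure does not say how concept-word combinations are enumerated, so this Python sketch assumes that every ordered selection is generated with `itertools.permutations` and left for later filtering; `build_primary_candidates` and `max_order` are hypothetical names, and joining with spaces suits English examples only.

```python
from itertools import permutations
from typing import Dict, List, Sequence

def build_primary_candidates(
    prefix: str, concepts: Sequence[str], item_word: str, max_order: int = 3
) -> Dict[int, List[str]]:
    """Group candidate titles by order (the number of concept words they contain)."""
    groups: Dict[int, List[str]] = {}
    for order in range(1, min(max_order, len(concepts)) + 1):
        groups[order] = [
            # filter(None, ...) drops an empty prefix so no leading space appears.
            " ".join(filter(None, [prefix, *combo, item_word]))
            for combo in permutations(concepts, order)
        ]
    return groups
```

Generating all orderings produces many invalid titles, which is why a subsequent filtering step on concept-word position and type is needed.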

In step S250, the obtained primary candidate titles are subjected to filtering processing.

In the exemplary embodiment of the present disclosure, the concept words in a primary candidate title are ordered: each type of concept word has a position priority, and the higher the priority, the earlier the concept word appears in the title. Primary candidate titles can therefore be filtered according to the types of their concept words. In addition, concept words of the same type may appear only once in the title text, so primary candidate titles can also be filtered according to how often each type of concept word occurs. The present disclosure includes, but is not limited to, the position priorities specified in Table 2 below:

TABLE 2

Priority level	Preset type
3	Regional attributes
2	Seasonal attributes, crowd attributes, scene attributes
1	Style attributes, texture attributes, style attributes, function attributes, taste attributes

For example, if the concept word 1 belongs to the crowd attribute and the concept word 2 belongs to the style attribute, the concept word 1 is located before the concept word 2 in the formed first-level candidate titles, for example, "lover Korean bathroom", and the first-level candidate titles, for example, "Korean lover bathroom", are filtered out.

It should be noted that, if the first-level candidate titles are divided into multiple groups according to the number of target keywords included in the first-level candidate titles, filtering operations of the first-level candidate titles may be performed on each group, so as to finally obtain multiple groups of first-level candidate titles, which will not be described in detail in this disclosure.
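The two filtering rules of step S250 can be sketched as a single validity check. In the Python sketch below, `TYPE_PRIORITY` encodes the priorities of Table 2 with shortened type names, and `is_valid_order` is a hypothetical helper; treating concept words of equal priority as freely orderable is an assumption, since the disclosure does not address ties.

```python
from typing import Dict, Sequence

# Position priorities from Table 2 (a higher value means an earlier position in the title).
TYPE_PRIORITY: Dict[str, int] = {
    "regional": 3,
    "seasonal": 2, "crowd": 2, "scene": 2,
    "style": 1, "texture": 1, "function": 1, "taste": 1,
}

def is_valid_order(concept_types: Sequence[str]) -> bool:
    """Check one candidate, given the types of its concept words in title order."""
    if len(set(concept_types)) != len(concept_types):
        return False                          # each type may appear only once
    ranks = [TYPE_PRIORITY[t] for t in concept_types]
    # Priorities must never increase from left to right (ties allowed by assumption).
    return all(a >= b for a, b in zip(ranks, ranks[1:]))
```

For the example above, a crowd attribute followed by a style attribute passes the check, while the reverse order is rejected.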

Further, after a plurality of primary candidate titles are formed through steps S210 to S250, the confusion degree of each primary candidate title is acquired. A primary candidate title in the present disclosure includes at least two target keywords. The confusion degree is obtained as follows: based on a preset title library, the co-occurrence probability of each target keyword pair in the primary candidate title is calculated, and the confusion degree of the primary candidate title is determined from these co-occurrence probabilities. A target keyword pair consists of any two adjacent target keywords in the primary candidate title, that is, a first keyword followed by a second keyword; if a primary candidate title includes N target keywords, it includes N-1 target keyword pairs. For example, for the primary candidate title "Sony wireless bluetooth headset", the target keyword pairs are "wireless bluetooth" and "bluetooth headset", and the co-occurrence probability of each of the two pairs is calculated. FIG. 3 shows a flowchart for calculating the co-occurrence probabilities of target keyword pairs in primary candidate titles; as shown in FIG. 3, the process comprises the following steps:

In step S310, a target item word corresponding to the primary candidate title is acquired.

In the exemplary embodiment of the present disclosure, preset item words exist for different primary candidate titles, so that a target item word corresponding to a primary candidate title may be obtained from a database, and the co-occurrence probability of a target keyword pair at the target item word granularity may be calculated.

In step S320, a first probability of occurrence of a first target title in the preset title library is calculated, wherein the first target title is a title including a target item word and a target keyword pair.

In an exemplary embodiment of the present disclosure, the first probability of occurrence of the first target title, which contains a target keyword pair of the primary candidate title together with the target item word, is calculated for each target keyword pair in turn. For example, at the granularity of the target item word "earphone", the first probability is the probability of occurrence of titles containing the target keyword pair "wireless bluetooth" and "earphone", i.e., P₁(first keyword, second keyword, target item word).

In step S330, a second probability of occurrence of a second target title in the preset title library is calculated, wherein the second target title is a title including the target item word and the first keyword.

In the exemplary embodiment of the present disclosure, adjacent target keywords are taken in sequence from the head to the tail of the primary candidate title to determine the target keyword pairs. In a primary candidate title such as "Sony bluetooth wireless headset", "wireless" is the second keyword of the target keyword pair "bluetooth wireless" and the first keyword of the target keyword pair "wireless headset". For any target keyword pair, the second probability of occurrence of the second target title can therefore be expressed as P₂(first keyword, target item word).

In step S340, the first probability is compared with the second probability to obtain the co-occurrence probability.

In an exemplary embodiment of the present disclosure, the co-occurrence probability of the target keyword pairs in the primary candidate title may be evaluated under a Unigram probability model (Unigram model) and used to determine the confusion degree of the primary candidate title. The co-occurrence probability is obtained from the first probability and the second probability by the following formula:

$$\mathrm{Uni}(Word_1, Word_2 \mid Product_*) = P(Word_2 \mid Word_1, Product_*) = \frac{P_1(Word_1, Word_2, Product_*)}{P_2(Word_1, Product_*)}$$

where $\mathrm{Uni}(Word_1, Word_2 \mid Product_*)$ denotes the co-occurrence of the target keyword pair in the primary candidate title at the granularity of the target item word under the Unigram model, $P(Word_2 \mid Word_1, Product_*)$ is the co-occurrence probability, $P_1(Word_1, Word_2, Product_*)$ is the first probability of occurrence of the first target title, $P_2(Word_1, Product_*)$ is the second probability of occurrence of the second target title, $Word_1$ is the first keyword, $Word_2$ is the second keyword, and $Product_*$ is the target item word.
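Steps S310 to S340 can be sketched by counting titles in the preset title library, since the library size cancels in the ratio of the two probabilities. The Python sketch below rests on several assumptions not stated in the disclosure: titles are already tokenized into word lists, a title "contains" a target keyword pair only when the two keywords are adjacent and in order, and an unseen context yields 0.0. The name `cooccurrence_prob` is hypothetical.

```python
from typing import List, Sequence

def cooccurrence_prob(
    library: Sequence[List[str]], first: str, second: str, item_word: str
) -> float:
    """Estimate P(second | first, item_word) from a tokenized title library."""
    def has_pair(title: List[str]) -> bool:
        # Assumed reading: the pair must occur adjacently and in order.
        return any(a == first and b == second for a, b in zip(title, title[1:]))

    # Second target titles: contain the item word and the first keyword.
    with_first = [t for t in library if item_word in t and first in t]
    if not with_first:
        return 0.0                      # unseen context: assumed fallback
    # First target titles: additionally contain the keyword pair.
    with_pair = sum(1 for t in with_first if has_pair(t))
    # P1 / P2: the library size cancels, leaving a ratio of counts.
    return with_pair / len(with_first)
```

A production system would likely precompute these counts per item word rather than scanning the library for each pair.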

Finally, the confusion degree (perplexity) of the primary candidate title is determined from the co-occurrence probabilities of the target keyword pairs in the primary candidate title. For a primary candidate title whose target keywords and target item word are, in order, $\{Word_1, Word_2, \ldots, Product_*\}$, the confusion degree may be calculated as the reciprocal of the geometric mean of the co-occurrence probabilities, as shown in the following formula:

$$\mathrm{Perplexity}(Word \mid Product_*) = \left( \prod_{i=1}^{n-1} P(Word_{i+1} \mid Word_i, Product_*) \right)^{-\frac{1}{n-1}}$$

where $\mathrm{Perplexity}(Word \mid Product_*)$ is the confusion degree, $Word_i$ is the first keyword and $Word_{i+1}$ is the second keyword of the $i$-th target keyword pair, and $n$ is the number of words in the sequence, so that the title contains $n-1$ target keyword pairs.

It should be noted that the confusion degree of the primary candidate titles may also be determined in other ways, for example by taking the reciprocal of the arithmetic mean or of the quadratic mean of the co-occurrence probabilities; the present disclosure is not limited to the above-described method of determining the confusion degree of the primary candidate titles from the co-occurrence probability.
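The reciprocal-of-geometric-mean calculation can be sketched in Python as follows. The function name and the handling of zero probabilities are assumptions for illustration only; the sum of logarithms is used simply to avoid underflow when many small probabilities are multiplied.

```python
import math
from typing import Sequence


def perplexity(cooccurrence_probs: Sequence[float]) -> float:
    """Reciprocal of the geometric mean of the pairwise co-occurrence probabilities."""
    if not cooccurrence_probs:
        raise ValueError("a title needs at least one target keyword pair")
    if any(p <= 0.0 for p in cooccurrence_probs):
        # Assumption: a keyword pair never seen in the library makes the
        # title maximally confusing instead of raising an error.
        return math.inf
    log_mean = sum(math.log(p) for p in cooccurrence_probs) / len(cooccurrence_probs)
    return math.exp(-log_mean)


# Two keyword pairs with co-occurrence probabilities 0.25 and 1.0:
# geometric mean = sqrt(0.25) = 0.5, so the confusion degree is about 2.
score = perplexity([0.25, 1.0])
```

A lower value means the adjacent keywords of the title co-occur more often in the title library, which is why the later filtering step keeps the titles with the lowest confusion degree.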

In step S120, the primary candidate titles are filtered according to the confusion degree, so as to obtain secondary candidate titles.

In an exemplary embodiment of the present disclosure, titles with a higher confusion degree are filtered out of the obtained primary candidate titles, and the remaining titles are taken as the secondary candidate titles. The process comprises the following steps: first, the primary candidate titles are sorted from low to high confusion degree to form a first sequence; then, a first preset number of primary candidate titles are taken in order from the start of the first sequence to obtain the secondary candidate titles. Sorting and filtering the titles by the confusion degree of the primary candidate titles retains the titles that are smoother, more logical, and more readable, so that quality evaluation of the title text is carried out during title generation and the quality of the generated title text is improved.
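A minimal Python sketch of this sort-and-truncate step is given below. The function name, the `(title, confusion degree)` tuple layout, and the sample scores are hypothetical and serve only to illustrate the first sequence and the first preset number.

```python
from typing import List, Tuple

ScoredTitle = Tuple[str, float]  # (title text, confusion degree)


def filter_by_perplexity(
    primary: List[ScoredTitle], first_preset_number: int
) -> List[ScoredTitle]:
    """Sort primary candidates by ascending confusion degree and keep the first N."""
    first_sequence = sorted(primary, key=lambda pair: pair[1])
    return first_sequence[:first_preset_number]


# Hypothetical primary candidate titles with made-up confusion degrees.
primary = [
    ("sony headset wireless bluetooth", 9.4),
    ("sony wireless bluetooth headset", 1.5),
    ("sony bluetooth wireless headset", 2.0),
]
secondary = filter_by_perplexity(primary, first_preset_number=2)
```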

In step S130, the click probability of the secondary candidate title is acquired.

In the exemplary embodiment of the present disclosure, the click probability of a secondary candidate title is an evaluation of how likely the item is to be clicked under that title: the higher the click probability, the more likely the secondary candidate title is to be clicked, so the click probability reflects, to a certain extent, how attractive the secondary candidate title is to users. Fig. 4 shows a flowchart for obtaining the click probability of a secondary candidate title. As shown in Fig. 4, the process comprises the following steps: in step S410, a first click rate of a third target title is acquired, wherein the third target title is a title including a target concept word and the target item word, and the target concept word is any concept word in the secondary candidate title; in step S420, a second click rate corresponding to titles including the target item word is acquired; in step S430, the first click rate is compared with the second click rate to obtain a third click rate corresponding to the target concept word; in step S440, the click probability of the secondary candidate title is acquired according to the third click rate.

For example, for the secondary candidate title "sony bluetooth wireless headset", a first click rate of the third target title, i.e., a title containing the target item word "headset" and the target concept word "bluetooth", is calculated first; then, a second click rate corresponding to titles containing the target item word "headset" is calculated; finally, the first click rate is compared with the second click rate to obtain the third click rate corresponding to the target concept word "bluetooth". The third click rate corresponding to the target concept word "wireless" can be obtained in the same way, which is not described in detail in the present disclosure. The click probability of the secondary candidate title is then determined from the third click rates corresponding to the target concept words in the secondary candidate title; for example, the geometric mean of the third click rates may be taken as the click probability of the secondary candidate title. For a secondary candidate title whose target concept words are, in order, $Word = \{Word_1, Word_2, \ldots\}$ and whose target item word is $Product_*$, the click probability can be obtained by the following formula:

$$P_{click}(Word \mid Product_*) = \left( \prod_{i=1}^{m} P_{click}(Word_i \mid Product_*) \right)^{\frac{1}{m}}$$

where $P_{click}(Word \mid Product_*)$ is the click probability of the secondary candidate title, $P_{click}(Word_i \mid Product_*)$ is the third click rate corresponding to the $i$-th target concept word, and $m$ is the number of target concept words. The click probability of the secondary candidate title may also be determined by taking the arithmetic mean, the quadratic mean, or the like of the third click rates; the present disclosure is not limited to the above-described method of obtaining the click probability of the secondary candidate title from the third click rate.
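Steps S410 to S440 can be sketched in Python as follows. The sketch assumes that "comparing" the first click rate with the second click rate means taking their ratio, by analogy with step S340; the function names and the sample click rates are hypothetical.

```python
import math
from typing import Sequence


def third_click_rate(first_click_rate: float, second_click_rate: float) -> float:
    """Click rate of titles with (concept word, item word) relative to titles with the item word."""
    if second_click_rate <= 0.0:
        return 0.0  # assumption: no click data for the item word yields 0
    return first_click_rate / second_click_rate


def click_probability(third_click_rates: Sequence[float]) -> float:
    """Geometric mean of the third click rates of all target concept words."""
    if not third_click_rates or any(r <= 0.0 for r in third_click_rates):
        return 0.0  # assumption: a concept word without clicks zeroes the score
    log_mean = sum(math.log(r) for r in third_click_rates) / len(third_click_rates)
    return math.exp(log_mean)


# Hypothetical click rates for the title "sony bluetooth wireless headset".
rate_bluetooth = third_click_rate(first_click_rate=0.06, second_click_rate=0.05)
rate_wireless = third_click_rate(first_click_rate=0.015, second_click_rate=0.05)
title_click_probability = click_probability([rate_bluetooth, rate_wireless])
```

Under the ratio reading, a third click rate above 1 indicates that the concept word lifts the click rate relative to titles that merely contain the item word, so the resulting score is best treated as a relative ranking signal rather than a calibrated probability.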

In step S140, the secondary candidate titles are ranked based on the click probability, and target candidate titles are determined from the ranked secondary candidate titles.

In an exemplary embodiment of the present disclosure, the secondary candidate titles are ranked based on the obtained click probabilities, and the target candidate titles are determined from the ranked secondary candidate titles. Fig. 5 shows a flowchart for determining the target candidate titles based on the secondary candidate titles. As shown in Fig. 5, the process comprises:

In step S510, the secondary candidate titles are ranked from high to low according to the click probability to form a second sequence.

In step S520, the secondary candidate titles in the second sequence are subjected to deduplication processing according to the confusion degree and the target keywords corresponding to the secondary candidate titles, so as to obtain a third sequence.

In an exemplary embodiment of the present disclosure, the confusion degree of a secondary candidate title is the same as the confusion degree it had as a primary candidate title. The deduplication processing retains, among different secondary candidate titles composed of the same target keywords, only the one with the lowest confusion degree. For example, "sony wireless bluetooth headset" and "sony bluetooth wireless headset" are composed of the same target keywords, so only the one of the two titles with the lower confusion degree is retained.

In step S530, the lowest confusion degree among the secondary candidate titles in the third sequence is obtained, and filtering processing is performed on the third sequence according to the lowest confusion degree, so as to obtain a fourth sequence.

In an exemplary embodiment of the present disclosure, first, the lowest confusion degree among the secondary candidate titles in the third sequence is acquired; then, the secondary candidate titles whose confusion degree is higher than the lowest confusion degree by more than a preset percentage are deleted to obtain the fourth sequence. Deleting these titles keeps the confusion degrees of the secondary candidate titles in the fourth sequence balanced, that is, it ensures that the differences in accuracy, logic, and readability among the secondary candidate titles do not exceed a preset threshold.

In step S540, a second preset number of the secondary candidate titles are sequentially intercepted from the fourth sequence, and the second preset number of the secondary candidate titles are used as the target candidate titles.

In an exemplary embodiment of the present disclosure, since the secondary candidate titles in the fourth sequence are already sorted by click probability, a second preset number of secondary candidate titles are taken in order directly from the fourth sequence as the target candidate titles, where the second preset number is less than or equal to the first preset number. Sorting and filtering the titles based on the click probability of the secondary candidate titles takes into account the potential attraction of the title text to users, so that the title text is attractive to a certain extent. In addition, deduplicating and filtering the secondary candidate titles according to their confusion degree and target keyword composition improves the overall smoothness, logic, and readability of the target candidate titles.
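The four steps of Fig. 5 can be sketched together in Python as follows. The `Candidate` class, the use of an unordered keyword set to represent "the same target keyword composition", and the expression of the preset percentage as a fraction are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Candidate:
    keywords: Tuple[str, ...]   # target keywords in title order
    perplexity: float           # confusion degree carried over from the primary stage
    click_probability: float


def select_targets(
    candidates: List[Candidate],
    preset_percentage: float,
    second_preset_number: int,
) -> List[Candidate]:
    # Second sequence: sort by click probability, high to low.
    second_seq = sorted(candidates, key=lambda c: c.click_probability, reverse=True)

    # Third sequence: for titles made of the same keywords, keep the least confusing one.
    best: Dict[FrozenSet[str], Candidate] = {}
    for c in second_seq:
        key = frozenset(c.keywords)
        if key not in best or c.perplexity < best[key].perplexity:
            best[key] = c
    third_seq = [c for c in second_seq if best[frozenset(c.keywords)] is c]
    if not third_seq:
        return []

    # Fourth sequence: drop titles too far above the lowest confusion degree.
    lowest = min(c.perplexity for c in third_seq)
    limit = lowest * (1.0 + preset_percentage)
    fourth_seq = [c for c in third_seq if c.perplexity <= limit]

    # Target candidates: the first N titles, still ordered by click probability.
    return fourth_seq[:second_preset_number]


# Hypothetical secondary candidates with made-up scores.
a = Candidate(("sony", "wireless", "bluetooth", "headset"), 1.5, 0.6)
b = Candidate(("sony", "bluetooth", "wireless", "headset"), 2.0, 0.7)
c = Candidate(("sony", "noise", "cancelling", "headset"), 1.6, 0.5)
d = Candidate(("sony", "sport", "headset"), 3.0, 0.9)
targets = select_targets([a, b, c, d], preset_percentage=0.2, second_preset_number=2)
```

In this example `b` is removed as a duplicate of `a`, and `d` is removed despite its high click probability because its confusion degree exceeds the lowest value by more than 20 percent, which shows how the confusion degree acts as a guard on the click-based ranking.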

It should be noted that, with continued reference to step S240, if the primary candidate titles are initially divided into a plurality of groups according to the number of target keywords they include, the subsequent operations, such as sorting and filtering based on the confusion degree and the click probability, are performed independently within each group. This yields a plurality of groups of titles with different numbers of target keywords, so that title text of different lengths can be provided for users with different behavioral habits, which improves the flexibility with which title text is generated and used; the details are not repeated in the present disclosure.
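The grouping described above can be sketched in Python as follows; the function name and the tuple representation of titles are assumptions for illustration. Each resulting group would then be sorted and filtered on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Title = Tuple[str, ...]  # target keywords in title order


def group_by_keyword_count(titles: Iterable[Title]) -> Dict[int, List[Title]]:
    """Group candidate titles by the number of target keywords they contain."""
    groups: Dict[int, List[Title]] = defaultdict(list)
    for title in titles:
        groups[len(title)].append(title)
    return dict(groups)


groups = group_by_keyword_count(
    [
        ("sony", "headset"),
        ("sony", "wireless", "headset"),
        ("bose", "headset"),
    ]
)
```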

In addition, in an exemplary embodiment of the present disclosure, a title text generating apparatus is also provided. Referring to fig. 6, the title text generating apparatus 600 may include a confusion degree acquisition module 610, a filtering module 620, a click probability acquisition module 630, and a determination module 640. Specifically:

a confusion degree acquisition module 610, configured to acquire the confusion degree of the primary candidate titles;

a filtering module 620, configured to perform filtering processing on the primary candidate title according to the confusion degree, so as to obtain a secondary candidate title;

a click probability acquisition module 630, configured to acquire a click probability of the secondary candidate title;

a determination module 640, configured to rank the secondary candidate titles based on the click probability, and determine target candidate titles from the ranked secondary candidate titles.
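One way to picture how the four modules cooperate is the Python sketch below. The class name, its fields, and the simplification of the determination module to a plain sort and truncation (without deduplication and confusion-based filtering) are assumptions for illustration and do not describe the apparatus of Fig. 6 in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Title = Tuple[str, ...]


@dataclass
class TitleTextGenerator:
    perplexity_of: Callable[[Title], float]          # role of module 610
    click_probability_of: Callable[[Title], float]   # role of module 630
    first_preset_number: int
    second_preset_number: int

    def generate(self, primary: List[Title]) -> List[Title]:
        # Modules 610 and 620: score by confusion degree, keep the least confusing titles.
        secondary = sorted(primary, key=self.perplexity_of)[: self.first_preset_number]
        # Modules 630 and 640: rank by click probability, keep the top titles.
        ranked = sorted(secondary, key=self.click_probability_of, reverse=True)
        return ranked[: self.second_preset_number]


# Hypothetical scores supplied as lookup tables.
perp = {("a", "x"): 1.0, ("b", "x"): 3.0, ("c", "x"): 2.0}
click = {("a", "x"): 0.1, ("b", "x"): 0.9, ("c", "x"): 0.5}
generator = TitleTextGenerator(perp.__getitem__, click.__getitem__, 2, 1)
result = generator.generate(list(perp))
```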

In an exemplary embodiment of the present disclosure, the title text generating device further includes a title generating module, configured to extract an item tag word and a target keyword in text information corresponding to a current item, and generate the first-level candidate title according to the item tag word and the target keyword.

In an exemplary embodiment of the present disclosure, the primary candidate title includes at least two target keywords; the confusion degree acquisition module may include a co-occurrence probability calculating unit, configured to calculate the co-occurrence probability of the target keyword pairs in the primary candidate title, and a confusion degree determining unit, configured to determine the confusion degree according to the co-occurrence probability of the target keyword pairs, wherein a target keyword pair consists of any two adjacent target keywords in the primary candidate title.

In an exemplary embodiment of the present disclosure, the target keyword pair includes a first keyword and a second keyword in sequence; the confusion degree acquisition module may further include a target item word acquisition unit, configured to acquire a target item word corresponding to the primary candidate title; a first probability acquisition unit, configured to acquire a first probability of occurrence of a first target title in a preset title library, wherein the first target title is a title containing the target item word and the target keyword pair; a second probability acquisition unit, configured to acquire a second probability of occurrence of a second target title in the preset title library, wherein the second target title is a title containing the target item word and the first keyword; and a ratio acquisition unit, configured to compare the first probability with the second probability to obtain the co-occurrence probability.

In an exemplary embodiment of the present disclosure, the inverse of the geometric mean of the co-occurrence probabilities is taken and determined as the confusion degree.

In an exemplary embodiment of the present disclosure, there are a plurality of primary candidate titles; the filtering module may include a ranking unit, configured to rank the primary candidate titles from low to high according to the confusion degree to form a first sequence; and a data interception unit, configured to intercept a first preset number of primary candidate titles in sequence from the first sequence, and take the first preset number of primary candidate titles as the secondary candidate titles.

In an exemplary embodiment of the present disclosure, the target keyword includes a concept word and an item word; the click probability acquisition module may include a first click rate acquisition unit, configured to acquire a first click rate of a third target title, wherein the third target title is a title including a target concept word and the target item word; a second click rate acquisition unit, configured to acquire a second click rate corresponding to titles including the target item word; and a third click rate acquisition unit, configured to compare the first click rate with the second click rate to obtain a third click rate corresponding to the target concept word; and the click probability of the secondary candidate title is acquired according to the third click rate.

In an exemplary embodiment of the present disclosure, the geometric mean of each of the third click rates is determined and used as the click probability.

In an exemplary embodiment of the present disclosure, the determining module may include a ranking unit configured to rank the secondary candidate titles from high to low according to the click probability to form a second sequence; the duplicate removal processing unit is used for carrying out duplicate removal processing on the second-level candidate titles in the second sequence according to the confusion degree and the target keyword corresponding to the second-level candidate titles so as to acquire a third sequence; the filtering unit is used for acquiring the lowest confusion degree corresponding to the second-level candidate titles in the third sequence, and filtering the third sequence according to the lowest confusion degree so as to acquire a fourth sequence; the data intercepting unit is used for intercepting a second preset number of secondary candidate titles in sequence from the fourth sequence, and taking the second preset number of secondary candidate titles as the target candidate titles.

In an exemplary embodiment of the present disclosure, the order of the primary candidate title corresponds to the number of concept words in the target keywords included in the primary candidate title.

Since each functional module of the title text generating apparatus of the exemplary embodiment of the present disclosure is the same as that in the inventive embodiment of the title text generating method described above, a detailed description thereof will be omitted.

It should be noted that although several modules or units of the title text generating apparatus are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Furthermore, in exemplary embodiments of the present disclosure, a computer storage medium capable of implementing the above-described method is also provided, on which a program product capable of implementing the method described above in the present specification is stored. In some possible embodiments, the various aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.

Referring to fig. 7, a program product 700 for implementing the above-described method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided. Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 800 according to such an embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.

As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.

Wherein the storage unit stores program code that is executable by the processing unit 810 such that the processing unit 810 performs steps according to various exemplary embodiments of the present disclosure described in the above section of the present specification.

The storage unit 820 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 8201 and/or cache memory 8202, and may further include Read Only Memory (ROM) 8203.

Storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 830 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 800, and/or any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, electronic device 800 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 860. As shown, network adapter 860 communicates with other modules of electronic device 800 over bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A title text generation method, characterized by comprising:

obtaining the confusion degree of the first-level candidate titles;

filtering the primary candidate titles according to the confusion degree to obtain secondary candidate titles;

acquiring the click probability of the secondary candidate title;

sorting the secondary candidate titles based on the click probability, and determining target candidate titles from the sorted secondary candidate titles;

the first-level candidate title includes at least two target keywords, and the obtaining the confusion degree of the first-level candidate title includes:

calculating the co-occurrence probability of the target keyword pairs in the primary candidate titles;

solving the reciprocal of the geometric mean of the co-occurrence probability, and determining the reciprocal as the confusion degree, wherein the target keyword pair consists of any two adjacent target keywords in the primary candidate title, and the target keyword pair sequentially comprises a first keyword and a second keyword;

The calculating the co-occurrence probability of the target keyword pair in the primary candidate title comprises the following steps: acquiring target object words corresponding to the primary candidate titles; acquiring a first probability of occurrence of a first target title in a preset title library, wherein the first target title is a title comprising the target object word and the target keyword pair; acquiring a second probability of occurrence of a second target title in the preset title library, wherein the second target title is a title containing the target object word and the first keyword; the first probability is compared with the second probability to obtain the co-occurrence probability.

2. The title text generation method according to claim 1, wherein before the obtaining of the confusion degree of the primary candidate title, the method further comprises:

extracting article tag words and target keywords in text information corresponding to the current article, and generating the first-level candidate titles according to the article tag words and the target keywords.

3. The title text generation method according to claim 2, wherein the primary candidate title is a plurality of;

the filtering processing is performed on the primary candidate titles according to the confusion degree to obtain secondary candidate titles, including:

Sorting the primary candidate titles from low to high according to the confusion degree to form a first sequence;

sequentially intercepting a first preset number of primary candidate titles from the first sequence, and taking the first preset number of primary candidate titles as the secondary candidate titles.

4. The title text generation method according to claim 1, wherein the target keyword includes a concept word and an item word;

the obtaining the click probability of the secondary candidate title comprises the following steps:

acquiring a first click rate of a third target title, wherein the third target title is a title containing target concept words and the target object words;

acquiring a second click rate corresponding to the title containing the target object word;

comparing the first click rate with the second click rate to obtain a third click rate corresponding to the target concept word;

and acquiring the click probability of the secondary candidate title according to the third click rate.

5. The title text generation method according to claim 4, wherein the obtaining the click probability of the secondary candidate title according to the third click rate includes:

and calculating the geometric mean of each third click rate, and taking the geometric mean as the click probability.

6. The title text generation method according to claim 1, wherein the ranking the secondary candidate titles based on the click probability and determining a target candidate title from the ranked secondary candidate titles comprises:

sorting the secondary candidate titles from high to low according to the click probability to form a second sequence;

performing de-duplication processing on the secondary candidate titles in the second sequence according to the confusion degree and the target keywords corresponding to the secondary candidate titles so as to obtain a third sequence;

obtaining the lowest confusion degree corresponding to the second-level candidate title in the third sequence, and filtering the third sequence according to the lowest confusion degree to obtain a fourth sequence;

sequentially intercepting a second preset number of secondary candidate titles from the fourth sequence, and taking the second preset number of secondary candidate titles as the target candidate titles.

7. The method according to any one of claims 1 to 6, wherein the order of the primary candidate title corresponds to the number of concept words in the target keyword included in the primary candidate title.

8. A title text generating apparatus, the apparatus comprising:

the confusion degree acquisition module is used for acquiring the confusion degree of the first-level candidate titles;

the filtering module is used for filtering the primary candidate titles according to the confusion degree so as to obtain secondary candidate titles;

the click probability acquisition module is used for acquiring the click probability of the secondary candidate title;

the determining module is used for sorting the secondary candidate titles based on the click probability and determining target candidate titles from the sorted secondary candidate titles.

9. A storage medium having stored thereon a computer program which, when executed by a processor, implements the title text generation method according to any one of claims 1 to 7.

10. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the title text generation method of any one of claims 1 to 7 via execution of the executable instructions.