CN110852078A - Method and device for generating title - Google Patents


Publication number
CN110852078A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810844000.8A
Other languages
Chinese (zh)
Inventor
李俊涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd

Abstract

The invention discloses a method and a device for generating a title, and relates to the field of computer technology. One embodiment of the method comprises: extracting search keywords from historical search data; clustering the search keywords to obtain a keyword data set; analyzing the keyword data set to obtain core keywords and the core weights of the core keywords; and generating a title based on the core weights of the core keywords. This embodiment can improve the hit conversion rate and the user experience; the generated title accurately contains the product features that users care about and is consistent with the content of the product.

Description

Method and device for generating title
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a title.
Background
With the popularization of networks and the development of computer technologies, networks have already been integrated into the lives of people and bring convenience to the lives. At present, people are accustomed to network life, purchasing goods, viewing news, searching papers, and the like through the network.
Generally, a user searches and browses related products by means of keywords. When each website platform displays its product (such as commodity, news or paper), in order to reduce occupied pages or facilitate browsing of users, only the title of the product is displayed, and the title is a short introduction generated according to the content or characteristics of the product. Currently, there are two main methods for generating titles:
1. Extractive: abstract sentences and words for the title are extracted from the original text of the product, and text compression is performed on them to generate the title;
2. Generative: abstract sentences and words for the title are generated freely, without necessarily being extracted from the original text, and text compression is performed on them to generate the title.
At present, in order to improve the search hit rate of products and thereby increase the purchase volume of goods, the reading volume of news or papers, and the like, most website platforms make product titles cover keywords of as many categories as possible, which increases the text length of the titles.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
Titles generated by the prior art suffer from insufficient precision, overlong or redundant text, and similar problems. A keyword search therefore tends to return too many products, or products that are inconsistent with the keywords, so the hit conversion rate is low and the user experience is poor.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for generating a title, which can improve hit conversion rate and user experience; the generated title accurately contains the product characteristics concerned by the user and accords with the content of the product.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of generating a title.
The title generation method in the embodiment of the invention comprises the following steps: extracting search keywords from historical search data; clustering the search keywords to obtain a keyword data set; analyzing the keyword data set to obtain core keywords and core weights of the core keywords; and generating a title based on the core weight of the core keyword.
Optionally, analyzing the keyword dataset to obtain a core keyword and a core weight of the core keyword includes: calculating the average value of the keyword data set as a core keyword; distributing an initial weight value for the core keyword; calculating similarity values between the keyword data sets using a bidirectional neural network; and adjusting the initial weight value based on the similarity value to obtain an adjusted weight value, and selecting the maximum adjusted weight value as a core weight value.
Optionally, adjusting the initial weight based on the similarity value to obtain an adjusted weight includes: adjusting the initial weight by using a weight processing formula to obtain an adjusted weight, and selecting the largest adjusted weight as the core weight, where the weight processing formula is W = v + α(x - v), W is the adjusted weight, v is the initial weight, x is an adjusted independent variable, and α is the similarity value.
Optionally, clustering the search keywords includes: clustering the search keywords by using a K-means algorithm, a K-center point algorithm, a density-based clustering algorithm or a Gaussian mixture model.
Optionally, clustering the search keywords to obtain a keyword data set includes: randomly selecting K search keywords as cluster centroid points; dividing the search keywords of the same category as each cluster centroid point into a keyword cluster; calculating the average value of each keyword cluster as a new cluster centroid point; re-dividing the search keywords of the same category as each new cluster centroid point into a keyword cluster; and taking the keyword clusters obtained when the cluster centroid points no longer change, or when the number of divisions reaches a preset value, as the keyword data set.
Optionally, extracting the search keywords from the historical search data includes: clustering the historical search data to obtain a historical data set; and calculating the average value of the historical data set as a search keyword.
Optionally, the method further comprises: extracting real-time keywords based on the real-time search data; clustering the real-time keywords to obtain a real-time data set; analyzing the real-time data set to obtain a check keyword and a check weight of the check keyword; and checking or adjusting the title based on the checking weight value of the checking keyword.
To achieve the above object, according to another aspect of embodiments of the present invention, there is provided an apparatus for generating a title.
The title generation device of the embodiment of the invention comprises: the extraction module is used for extracting search keywords from historical search data; the clustering module is used for clustering the search keywords to obtain a keyword data set; the analysis module is used for analyzing the keyword data set to obtain a core keyword and a core weight of the core keyword; and the generating module is used for generating a title based on the core weight of the core keyword.
Optionally, the analysis module is further configured to: calculating the average value of the keyword data set as a core keyword; distributing an initial weight value for the core keyword; calculating similarity values between the keyword data sets using a bidirectional neural network; and adjusting the initial weight value based on the similarity value to obtain an adjusted weight value, and selecting the maximum adjusted weight value as a core weight value.
Optionally, the analysis module is further configured to adjust the initial weight by using a weight processing formula to obtain an adjusted weight, and select the largest adjusted weight as the core weight, where the weight processing formula is W = v + α(x - v), W is the adjusted weight, v is the initial weight, x is an adjusted independent variable, and α is the similarity value.
Optionally, the clustering module is further configured to: cluster the search keywords by using a K-means algorithm, a K-center point algorithm, a density-based clustering algorithm or a Gaussian mixture model.
Optionally, the clustering module is further configured to: randomly select K search keywords as cluster centroid points; divide the search keywords of the same category as each cluster centroid point into a keyword cluster; calculate the average value of each keyword cluster as a new cluster centroid point; re-divide the search keywords of the same category as each new cluster centroid point into a keyword cluster; and take the keyword clusters obtained when the cluster centroid points no longer change, or when the number of divisions reaches a preset value, as the keyword data set.
Optionally, the extraction module is further configured to: cluster the historical search data to obtain a historical data set; and calculate the average value of the historical data set as a search keyword.
Optionally, the apparatus further comprises: the checking module is used for extracting real-time keywords based on the real-time search data; clustering the real-time keywords to obtain a real-time data set; analyzing the real-time data set to obtain a check keyword and a check weight of the check keyword; and checking or adjusting the title based on the checking weight value of the checking keyword.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an electronic device that generates a title.
An electronic device for generating a title according to an embodiment of the present invention includes: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method of generating a title according to an embodiment of the present invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, implements a method of generating a title of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: search keywords are extracted from historical search data; the search keywords are clustered to obtain a keyword data set; the keyword data set is analyzed to obtain core keywords and the core weights of the core keywords; and a title is generated based on the core weights of the core keywords. Because these technical means generate the title from real user operation behavior, they overcome the technical problems of imprecise titles, overlong or redundant text, low hit conversion rate and poor user experience, and thereby achieve the technical effects of improving the hit conversion rate and the user experience, with the generated title accurately containing the product features that users care about and being consistent with the content of the product.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is a schematic diagram of the main steps of a method of generating a title according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main flow of a method of generating a title according to one referential embodiment of the present invention;
Fig. 3 is a schematic diagram of the main flow of clustering search keywords according to one referential embodiment of the present invention;
Fig. 4 is a schematic diagram of the main modules of an apparatus for generating a title according to an embodiment of the present invention;
Fig. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
Fig. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of main steps of a method of generating a title according to an embodiment of the present invention.
As shown in fig. 1, the method for generating a title according to the embodiment of the present invention mainly includes the following steps:
Step S101: extracting search keywords from historical search data.
In the prior art, in order to cover as many keywords as possible, the generated title usually has long text and redundant content, so many of the products returned by a search are inconsistent with the keywords and the user experience is poor. Moreover, because the text of the title is too long or redundant, a keyword search tends to return too many products, which lowers the hit conversion rate.
In general, the title of a product greatly affects the hit conversion rate of the product: a title that is imprecise, too long, or too brief may reduce the hit conversion rate. The hit conversion rate refers to the probability that a user who finds a product through a keyword search then performs an operation such as purchasing, viewing or collecting. In order to improve the user experience and the hit conversion rate, the embodiment of the invention provides a method for generating a title, in which the title is generated according to real user operation behavior.
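By way of illustration only, the hit conversion rate defined above can be estimated as a simple ratio. The function name and the counts below are hypothetical and are not taken from the embodiment; the sketch assumes that the number of keyword-search hits on a product and the number of subsequent operations (purchasing, viewing or collecting) are already available.

```python
def hit_conversion_rate(search_hits, conversions):
    """Estimate the hit conversion rate as conversions / search hits.

    search_hits: times the product was found through a keyword search.
    conversions: times such a hit was followed by a purchase, view or collect.
    Both inputs are hypothetical counters, assumed to be collected elsewhere.
    """
    if search_hits <= 0:
        return 0.0  # no hits yet, so no measurable conversion
    return conversions / search_hits


# Hypothetical counts: 200 search hits, 30 of which led to an operation.
rate = hit_conversion_rate(search_hits=200, conversions=30)
```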
In order to enable a user to find the expected product quickly and accurately, that is, to ensure that the generated title contains the product features the user cares about and is consistent with the content of the product, in the embodiment of the present invention search keywords are extracted from the historical search data corresponding to the product. The historical search data is the content (keywords) searched by users: if a user searches for a certain keyword, finds the product, and then performs an operation such as purchasing, viewing or collecting on the product, that keyword is historical search data of the product. For example, if a user searches for "computer paper" and then collects "paper A" from the search results, "computer paper" can be used as historical search data of the product "paper A". Search keywords can be extracted from the historical search data in several ways. The keywords in the historical search data can be counted to obtain the search frequency of each keyword, and several keywords with high search frequency can be selected as the search keywords. Alternatively, search keywords can be extracted based on an algorithm model, such as the TensorFlow artificial intelligence learning system, a framework that feeds complex data structures into artificial neural networks for analysis and processing.
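The frequency-based option described above can be sketched as follows. This is a minimal illustration only: the function name, the `top_n` parameter and the sample history are assumptions introduced for the example, and the normalization (trimming and lower-casing) is a design choice the embodiment does not specify.

```python
from collections import Counter


def extract_search_keywords(history, top_n=3):
    """Return the top_n most frequently searched keywords.

    history: list of keyword strings that led to an operation on the product.
    Normalization by strip/lower is an assumption made for this sketch.
    """
    counts = Counter(kw.strip().lower() for kw in history if kw.strip())
    return [kw for kw, _ in counts.most_common(top_n)]


# Hypothetical historical search data for one product.
history = [
    "computer paper",
    "computer paper",
    "neural network paper",
    "computer paper",
    "neural network paper",
    "survey",
]
top_keywords = extract_search_keywords(history, top_n=2)
```

With this sample, "computer paper" (three searches) and "neural network paper" (two searches) are selected, and "survey" is dropped.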
In the embodiment of the present invention, step S101 may be implemented by: clustering the historical search data to obtain a historical data set; and calculating the average value of the historical data set as a search keyword.
Clustering refers to identifying the rules inherent in data and classifying the data into several classes according to those rules. The historical search data is classified, and the historical search data in each cluster forms a historical data set; historical search data in the same historical data set has higher similarity values, and historical search data in different historical data sets has lower similarity values. The average value of a historical data set refers to the historical search data in that set with the minimum distance from the cluster center, namely the historical search data closest to the cluster center. The distance of the historical search data from the cluster center can be measured using the Minkowski distance, the Manhattan distance, the Chebyshev distance, or the like.
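As a hedged sketch of the "average value" described above, the following example picks the item of a cluster that lies closest to the cluster center. It assumes that each item of historical search data has already been mapped to a numeric vector (the embodiment does not specify how) and takes the cluster center to be the arithmetic mean of those vectors; all names and vectors are hypothetical.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cluster_representative(items, distance=minkowski):
    """Return the key of the item whose vector is closest to the center.

    items: dict mapping a keyword to its (assumed, precomputed) vector.
    """
    vectors = list(items.values())
    dim = len(vectors[0])
    center = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return min(items, key=lambda name: distance(items[name], center))


# Hypothetical vectors for one historical data set.
cluster = {
    "computer paper": [1.0, 1.0],
    "computer science paper": [2.0, 2.0],
    "laptop": [6.0, 6.0],
}
representative = cluster_representative(cluster)
```

Here the center is (3, 3), so "computer science paper" is the closest item under the Euclidean, Manhattan and Chebyshev distances alike.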
Step S102: clustering the search keywords to obtain a keyword data set.
The search keywords extracted in step S101 may be classified, and the search keywords of the same category are put into the same keyword data set.
In the embodiment of the present invention, the search keywords may be clustered in the following manner: clustering the search keywords by using a K-means algorithm, a K-center point algorithm, a density-based clustering algorithm, a Gaussian mixture model, or the like.
The K-means algorithm (K-Means) divides n search keywords into K clusters such that search keywords in the same cluster have high similarity values while search keywords in different clusters have low similarity values. The K-center point algorithm (K-medoid) selects a search keyword as a center point, identifies the cluster by that center point, and assigns the remaining search keywords to the corresponding clusters according to their categories. The density-based clustering algorithm (DBSCAN) finds the maximal sets of density-connected search keywords. The Gaussian mixture model (GMM) gives the probability that each search keyword belongs to each cluster and assigns the search keyword to the cluster with the highest probability.
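The final assignment rule of the Gaussian mixture model mentioned above can be illustrated in isolation. The sketch below does not fit a mixture model; it assumes the per-cluster membership probabilities have already been computed by some model and only shows the "assign to the most probable cluster" step. The keywords, cluster names and probabilities are hypothetical.

```python
def assign_to_most_probable_cluster(memberships):
    """Map each keyword to the cluster with the highest membership probability.

    memberships: dict keyword -> dict cluster_name -> probability
    (assumed to come from an already fitted mixture model).
    """
    return {
        keyword: max(probs, key=probs.get)
        for keyword, probs in memberships.items()
    }


# Hypothetical membership probabilities.
memberships = {
    "wireless headphones": {"audio": 0.85, "computers": 0.15},
    "gaming laptop": {"audio": 0.10, "computers": 0.90},
}
assignments = assign_to_most_probable_cluster(memberships)
```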
In the embodiment of the present invention, step S102 may be implemented by: randomly selecting K search keywords as cluster centroid points; dividing the search keywords of the same category as each cluster centroid point into a keyword cluster; calculating the average value of each keyword cluster as a new cluster centroid point; re-dividing the search keywords of the same category as each new cluster centroid point into a keyword cluster; and taking the keyword clusters obtained when the cluster centroid points no longer change, or when the number of iterations reaches a preset value, as the keyword data set.
For the same product, different search keywords correspond to different cluster centroid points, and selecting different cluster centroid points may yield different keyword clusters. The cluster centroid points can therefore be reselected according to the result of the previous clustering and the clustering repeated until the keyword clusters are relatively stable (that is, the cluster centroid points no longer change) or the number of repeated clusterings (that is, the number of iterations) reaches a preset value. The average value of a keyword cluster refers to the search keyword in the keyword cluster with the smallest distance from the cluster center, that is, the search keyword closest to the cluster center. Iteration refers to repeatedly performing a series of operation steps in which each subsequent quantity is derived from the previous one. The final clustering result (that is, the keyword data set) can be calculated by the above method.
Step S103: analyzing the keyword data set to obtain the core keywords and the core weights of the core keywords.
The core keyword is the data that best represents the category of a keyword data set. By analyzing the keyword data set, the one search keyword that best embodies the user's expectation can be found among the search keywords of a given category; this search keyword is the core keyword. The core weight reflects how much the core keyword contributes to the user's decision to perform an operation such as purchasing, viewing or collecting, that is, the degree to which the core keyword influences the user's decision.
In the embodiment of the present invention, step S103 may be implemented by: calculating the average value of the keyword data set as the core keyword; assigning an initial weight to the core keyword; calculating similarity values between keyword data sets by using a bidirectional neural network; and adjusting the initial weight based on the similarity values to obtain adjusted weights, and selecting the maximum adjusted weight as the core weight.
The average value of a keyword data set refers to the search keyword in the keyword data set with the smallest distance from the cluster center, that is, the search keyword closest to the cluster center. In step S102 the different search keywords are clustered, but the correlation between keyword data sets (that is, the correlation between search keywords) remains unclear. The correlation between keyword data sets can therefore be analyzed using a bidirectional neural network, and the core weight of the core keyword calculated from the result. The initial weight assigned to the core keyword may be the proportion of the core keyword in the keyword data set, or the degree to which the category of the core keyword influences the user's decision. The similarity value quantifies the correlation between any two keyword data sets, and each keyword data set may correspond to multiple similarity values; adjusting the initial weight can therefore yield multiple adjusted weights, and the largest adjusted weight can be selected as the core weight of the core keyword.
In the embodiment of the invention, adjusting the initial weight based on the similarity value to obtain the adjusted weight can be implemented as follows: the initial weight is adjusted using a weight processing formula to obtain adjusted weights, and the maximum adjusted weight is selected as the core weight. The weight processing formula is W = v + α(x - v), where W is the adjusted weight, v is the initial weight, x is an adjusted independent variable, and α is the similarity value.
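The weight processing formula can be illustrated with a short numerical sketch. The function names and the sample values are hypothetical; in particular, the embodiment does not state how the adjusted independent variable x is chosen, so x = 1.0 below is purely an assumption, and the similarity values are assumed to have been produced elsewhere (for example by the bidirectional neural network).

```python
def adjusted_weight(v, x, alpha):
    """Weight processing formula W = v + alpha * (x - v)."""
    return v + alpha * (x - v)


def core_weight(v, x, similarities):
    """Apply the formula once per similarity value and keep the maximum."""
    return max(adjusted_weight(v, x, alpha) for alpha in similarities)


# Hypothetical inputs: initial weight 0.4, adjusted independent variable 1.0,
# and three similarity values for one keyword data set.
w = core_weight(v=0.4, x=1.0, similarities=[0.2, 0.5, 0.9])
```

Since W = v + α(x - v) can be rewritten as W = (1 - α)v + αx, the formula interpolates between v (at α = 0) and x (at α = 1). In this example the largest adjusted weight comes from α = 0.9 and is approximately 0.94.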
Step S104: generating a title based on the core weights of the core keywords.
After the core weights of the core keywords are obtained, a preset number of core keywords with the largest core weights can be selected to form the title of the product; the core keywords whose core weights exceed a preset value can be selected to form the title of the product; or several core keywords with larger core weights can be selected and the title of the product generated from them based on a certain algorithm; and so on.
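The first two selection options described above (a preset number of keywords, or a preset weight threshold) can be sketched as follows. The function name, parameters, sample weights, and the choice to join keywords with a space in descending weight order are all assumptions made for illustration; the embodiment does not prescribe how the selected keywords are ordered or combined.

```python
def generate_title(core_weights, max_keywords=None, min_weight=None, separator=" "):
    """Build a title from core keywords ranked by core weight.

    core_weights: dict core keyword -> core weight.
    max_keywords: keep at most this many keywords (preset number), if given.
    min_weight: keep only keywords whose weight exceeds this preset value, if given.
    """
    ranked = sorted(core_weights.items(), key=lambda kv: kv[1], reverse=True)
    if min_weight is not None:
        ranked = [(kw, w) for kw, w in ranked if w > min_weight]
    if max_keywords is not None:
        ranked = ranked[:max_keywords]
    return separator.join(kw for kw, _ in ranked)


# Hypothetical core keywords and core weights for one product.
weights = {
    "wireless": 0.82,
    "noise-cancelling": 0.91,
    "headphones": 0.97,
    "cheap": 0.35,
}
title_by_count = generate_title(weights, max_keywords=3)
title_by_threshold = generate_title(weights, min_weight=0.5)
```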
In the embodiment of the present invention, the method for generating a title may further include: extracting real-time keywords based on the real-time search data; clustering the real-time keywords to obtain a real-time data set; analyzing the real-time data set to obtain a check keyword and a check weight of the check keyword; and checking or adjusting the title based on the checking weight of the checking keyword.
Steps S101 to S104 generate a title based on historical search data, which amounts to a prediction of user operation behavior, so the generated title is itself a prediction. Real-time search data can therefore be collected, and the generated title checked or adjusted based on a certain volume of real-time search data or on the real-time search data within a certain period. If the check keywords are the same as the core keywords and the check weights of the check keywords are the same as the core weights of the core keywords, the generated title basically meets user expectations and does not need to be adjusted. If the check keywords differ from the core keywords, or the check weights of the check keywords differ from the core weights of the core keywords, the title can be regenerated, the word order of the title can be adjusted, or words of the title can be replaced. The real-time search data may be analyzed and processed using the same method as steps S101 to S104.
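A minimal sketch of the check described above is given below. It compares the keywords and weights derived from real-time search data with the core keywords and core weights. The function name and sample values are hypothetical, and the numeric tolerance is an assumption added for the example, since the embodiment only states that the weights are "the same" or "different".

```python
import math


def title_needs_adjustment(core, check, tolerance=1e-9):
    """Return True when the check result deviates from the core result.

    core: dict core keyword -> core weight (from historical search data).
    check: dict check keyword -> check weight (from real-time search data).
    tolerance: assumed absolute tolerance when comparing weights.
    """
    if set(core) != set(check):
        return True  # the keywords themselves differ
    return any(
        not math.isclose(core[kw], check[kw], abs_tol=tolerance)
        for kw in core
    )


# Hypothetical results from the historical and the real-time analysis.
core = {"headphones": 0.97, "wireless": 0.82}
same = title_needs_adjustment(core, {"headphones": 0.97, "wireless": 0.82})
drifted = title_needs_adjustment(core, {"headphones": 0.97, "wireless": 0.60})
replaced = title_needs_adjustment(core, {"headphones": 0.97, "bluetooth": 0.82})
```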
According to the method for generating a title of the embodiment of the invention, search keywords are extracted from historical search data; the search keywords are clustered to obtain a keyword data set; the keyword data set is analyzed to obtain core keywords and the core weights of the core keywords; and a title is generated based on the core weights of the core keywords. Because these technical means generate the title from real user operation behavior, they overcome the technical problems of imprecise titles, overlong or redundant text, low hit conversion rate and poor user experience, and thereby achieve the technical effects of improving the hit conversion rate and the user experience, with the generated title accurately containing the product features that users care about and being consistent with the content of the product.
Fig. 2 is a schematic diagram of a main flow of a method of generating a title according to one referential embodiment of the present invention.
As shown in fig. 2, the method for generating a title according to the embodiment of the present invention can be implemented as the following processes:
Step S201: acquiring historical search data of a product:
The historical search data is the keywords searched by users. If a user searches for a certain keyword, finds the product, and then performs an operation such as purchasing, viewing or collecting on the product, that keyword is historical search data of the product.
Step S202: extracting search keywords from the historical search data of the product:
The historical search data can be clustered, with the historical search data in each cluster forming a historical data set, and the average value of each historical data set calculated; this average value is the search keyword.
Step S203: clustering the search keywords to obtain a keyword data set:
The search keywords extracted in step S202 may be classified, and the search keywords of the same category are put into the same keyword data set. It should be noted that the clustering of the search keywords may be implemented by using a K-means algorithm, a K-center point algorithm, a density-based clustering algorithm, or a Gaussian mixture model.
Step S204: analyzing the keyword data set:
The keyword data set is analyzed to obtain the core keywords and the core weights of the core keywords. The average value of each keyword data set is calculated to obtain the core keyword; the interrelation between keyword data sets is analyzed using a bidirectional neural network to obtain the similarity values between the keyword data sets; the initial weight is adjusted using a weight processing formula to obtain adjusted weights; and the maximum adjusted weight is selected as the core weight. The weight processing formula is W = v + α(x - v), where W is the adjusted weight, v is the initial weight, x is an adjusted independent variable, and α is the similarity value.
Step S205: generating a title for the product based on the core weights of the core keywords.
After the core weights of the core keywords are obtained in step S204, a preset number of core keywords with the largest core weights may be selected to form the title of the product; the core keywords whose core weights exceed a preset value may be selected to form the title of the product; or several core keywords with larger core weights may be selected and the title of the product generated from them based on a certain algorithm; and so on.
Step S206: checking or adjusting the generated title based on real-time search data:
The generated title can be checked or adjusted based on a certain volume of real-time search data or on the real-time search data within a certain period. If the check keywords are the same as the core keywords and the check weights of the check keywords are the same as the core weights of the core keywords, the generated title basically meets user expectations and does not need to be adjusted. If the check keywords differ from the core keywords, or the check weights of the check keywords differ from the core weights of the core keywords, the title can be regenerated, the word order of the title can be adjusted, or words of the title can be replaced. It should be noted that the real-time search data may be analyzed and processed using the same method as steps S202 to S205.
Fig. 3 is a schematic diagram of a main flow of clustering search keywords according to one referential embodiment of the present invention.
As shown in fig. 3, clustering search keywords in the method for generating a title according to the embodiment of the present invention can be implemented according to the following processes:
Step S301: randomly selecting K search keywords as cluster centroid points:
K denotes the pre-specified number of keyword clusters.
Step S302: dividing the search keywords with the same category as the clustering centroid points into a keyword cluster to obtain K keyword clusters:
For each search keyword, the keyword cluster to which it should belong is calculated using the following formula:
$$c^{(i)} := \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert$$
wherein $c^{(i)}$ represents the keyword cluster, among the K keyword clusters, whose cluster centroid point has the smallest distance to the search keyword; $\arg\min_{j}$ represents the set of all arguments $j$ that minimize the distance between the cluster centroid point and the search keyword; $x^{(i)}$ represents a search keyword; and $\mu_j$ represents the cluster centroid point of keyword cluster $j$ (for $j = c^{(i)}$, the keyword cluster to which the search keyword $x^{(i)}$ belongs).
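The assignment of step S302 can be sketched as follows, assuming that search keywords and cluster centroid points are represented as numeric vectors and that the distance is Euclidean (the squared distance is used because it has the same minimizer). The function name and the sample vectors are hypothetical.

```python
def assign_clusters(points, centroids):
    """For each point, return the index j of the nearest centroid.

    points: list of vectors, one per search keyword (assumed embedding).
    centroids: list of K vectors, one per cluster centroid point.
    Ties are resolved in favor of the lowest index.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


# Hypothetical keyword vectors and two centroid points.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [9.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = assign_clusters(points, centroids)
```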
Step S303: calculating the average value of the keyword clusters as a new cluster centroid point:
For each keyword cluster, the cluster centroid point of the keyword cluster is recalculated using the following formula to obtain a new cluster centroid point:
$$\mu_j := \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}}$$
wherein $\mu_j$ represents the cluster centroid point of keyword cluster $j$; $m$ represents the number of search keywords; $c^{(i)}$ represents the keyword cluster, among the K keyword clusters, whose cluster centroid point has the smallest distance to the search keyword; $x^{(i)}$ represents a search keyword; and $1\{c^{(i)} = j\}$ equals 1 when the search keyword $x^{(i)}$ belongs to keyword cluster $j$ and 0 otherwise.
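The centroid update of step S303 can be sketched as the mean of the vectors assigned to each keyword cluster. The function name and data are hypothetical, and returning `None` for a cluster with no members is an assumption of this sketch; the embodiment does not say how an empty cluster is handled.

```python
def update_centroids(points, labels, k):
    """Recompute each centroid as the mean of the points assigned to it.

    points: list of vectors, one per search keyword (assumed embedding).
    labels: cluster index of each point, as produced by the assignment step.
    k: number of keyword clusters.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    # An empty cluster has no mean; None marks it (an assumption of this sketch).
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else None
        for j in range(k)
    ]


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [9.0, 10.0]]
labels = [0, 0, 1, 1]
new_centroids = update_centroids(points, labels, k=2)
```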
Step S304: re-dividing the search keywords with the same category as the new cluster centroid points into a keyword cluster.
Step S305: judging whether the new cluster centroid point from step S303 is the same as the previous cluster centroid point (initially that of step S301), or whether the number of iterations has reached a preset value; if either condition holds, executing step S306; otherwise, returning to step S303.
Step S306: taking the keyword clusters from step S304 as the keyword data set:
when the cluster centroid points no longer change, or the number of iterations reaches the preset value, the keyword clusters are relatively stable and can be used as the keyword data set.
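Steps S301 to S306 correspond to the standard K-means procedure and can be sketched as follows. The sketch assumes that each search keyword has already been mapped to a numeric vector (the embodiment does not say how), and the names `k_means`, `max_iter`, and `seed` are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two keyword vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise average of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(points: List[Vector], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # S301: K random keywords
    assignments: List[int] = []
    for _ in range(max_iter):                      # S305: iteration limit
        # S302 / S304: assign each keyword to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        # S303: the new centroid is the average of each cluster's members;
        # an empty cluster keeps its old centroid (an assumption of this sketch).
        new_centroids = []
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:             # S305: centroids unchanged
            break
        centroids = new_centroids
    return centroids, assignments                  # S306: the keyword data set


vectors = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assignments = k_means(vectors, k=2)
print(sorted(centroids), assignments)
```

Exact equality is used for the "centroid no longer changes" test because the sketch works on a tiny example; a real implementation would more likely compare against a small tolerance.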
Fig. 4 is a schematic diagram of main blocks of an apparatus for generating a title according to an embodiment of the present invention.
As shown in fig. 4, the apparatus 200 for generating a title according to an embodiment of the present invention includes: an extraction module 401, a clustering module 402, an analysis module 403, and a generation module 404.
Wherein:
an extraction module 401, configured to extract search keywords from historical search data;
a clustering module 402, configured to cluster the search keywords to obtain a keyword data set;
an analysis module 403, configured to analyze the keyword data set to obtain core keywords and core weights of the core keywords;
a generation module 404, configured to generate a title based on the core weights of the core keywords.
In this embodiment of the present invention, the analysis module 403 may further be configured to: calculating the average value of the keyword data set as a core keyword; distributing an initial weight value for the core keyword; calculating similarity values between the keyword data sets using a bidirectional neural network; and adjusting the initial weight value based on the similarity value to obtain an adjusted weight value, and selecting the maximum adjusted weight value as a core weight value.
In this embodiment of the present invention, the analysis module 403 may be further configured to adjust the initial weight by using a weight processing formula to obtain an adjusted weight, and select the largest adjusted weight as a core weight, where the weight processing formula is W = v + α(x − v), W is the adjusted weight, v is the initial weight, x is an adjusted independent variable, and α is a similarity value.
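A minimal numeric sketch of the weight processing formula W = v + α(x − v) follows. The embodiment does not define the adjusted independent variable x in detail, so treating it as a target value toward which the initial weight is pulled is an assumption here, as are the function names and the sample numbers.

```python
import math
from typing import Iterable


def adjust_weight(v: float, x: float, alpha: float) -> float:
    """Weight processing formula: W = v + alpha * (x - v)."""
    return v + alpha * (x - v)


def core_weight(v: float, x: float, similarities: Iterable[float]) -> float:
    """Adjust the initial weight once per similarity value and keep the maximum."""
    return max(adjust_weight(v, x, alpha) for alpha in similarities)


# With alpha = 0 the weight stays at v; with alpha = 1 it becomes x.
unchanged = adjust_weight(0.5, 1.0, 0.0)
replaced = adjust_weight(0.5, 1.0, 1.0)
best = core_weight(0.5, 1.0, [0.2, 0.8, 0.4])
print(unchanged, replaced, best)
```

The formula is a linear interpolation between v and x, so a higher similarity value moves the adjusted weight further from the initial weight and closer to x.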
In this embodiment of the present invention, the clustering module 402 may further be configured to: cluster the search keywords by using a K-means algorithm, a K-medoids algorithm, a density-based clustering algorithm, or a Gaussian mixture model.
In this embodiment of the present invention, the clustering module 402 may further be configured to: randomly select K search keywords as clustering centroid points; divide the search keywords of the same category as the cluster centroid points into a keyword cluster; calculate the average value of the keyword clusters as a new cluster centroid point; re-divide the search keywords with the same category as the new cluster centroid points into a keyword cluster; and take the keyword clusters whose cluster centroid points no longer change, or whose number of divisions has reached a preset value, as the keyword data set.
In this embodiment of the present invention, the extraction module 401 may further be configured to: cluster historical search data to obtain a historical data set; and calculate the average value of the historical data set as a search keyword.
The apparatus may further include: a checking module (not shown), configured to extract real-time keywords based on real-time search data; cluster the real-time keywords to obtain a real-time data set; analyze the real-time data set to obtain check keywords and check weights of the check keywords; and check or adjust the title based on the check weights of the check keywords.
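The way the four modules hand data to one another can be sketched as a simple pipeline. Everything below is hypothetical scaffolding: the `TitlePipeline` class, the `toy_*` stand-ins, the use of relative frequency as a weight, and the ordering of title words by descending weight are assumptions made for illustration and are not the embodiment's actual extraction, clustering, analysis, or generation logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TitlePipeline:
    extract: Callable[[List[str]], List[str]]               # extraction module 401
    cluster: Callable[[List[str]], List[List[str]]]         # clustering module 402
    analyze: Callable[[List[List[str]]], Dict[str, float]]  # analysis module 403
    generate: Callable[[Dict[str, float]], str]             # generation module 404

    def run(self, history: List[str]) -> str:
        keywords = self.extract(history)
        dataset = self.cluster(keywords)
        core = self.analyze(dataset)
        return self.generate(core)


def toy_extract(history: List[str]) -> List[str]:
    """Stand-in: split each historical query into words."""
    return [word for query in history for word in query.split()]


def toy_cluster(keywords: List[str]) -> List[List[str]]:
    """Stand-in: group identical words together."""
    groups: Dict[str, List[str]] = {}
    for word in keywords:
        groups.setdefault(word, []).append(word)
    return list(groups.values())


def toy_analyze(dataset: List[List[str]]) -> Dict[str, float]:
    """Stand-in: use relative frequency as the core weight."""
    total = sum(len(group) for group in dataset)
    return {group[0]: len(group) / total for group in dataset}


def toy_generate(core: Dict[str, float]) -> str:
    """Stand-in: order keywords by descending weight, ties alphabetically."""
    return " ".join(sorted(core, key=lambda k: (-core[k], k)))


pipeline = TitlePipeline(toy_extract, toy_cluster, toy_analyze, toy_generate)
title = pipeline.run(["red shoes", "running shoes", "red running shoes"])
print(title)
```

Keeping each module behind a plain callable mirrors the modular apparatus description: any stage, for example the clustering stage, can be swapped for another algorithm without touching the others.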
According to the title generating device of the embodiment of the present invention, search keywords are extracted from historical search data; the search keywords are clustered to obtain a keyword data set; the keyword data set is analyzed to obtain core keywords and core weights of the core keywords; and a title is generated based on the core weights of the core keywords. Because the title is generated from real user operation behavior, the device overcomes the technical problems of inaccurate, overlong, or redundant titles, low hit conversion rate, and poor user experience, and achieves the technical effect of improving the hit conversion rate and the user experience; the generated title accurately contains the product features that users care about and matches the content of the product.
Fig. 5 illustrates an exemplary system architecture 500 of a method of generating a title or an apparatus for generating a title to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves as the medium providing communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the method for generating a title provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the apparatus for generating a title is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an extraction module, a clustering module, an analysis module, and a generation module. The names of these modules do not in some cases constitute a limitation on the module itself, and for example, the extraction module may also be described as a "module that extracts search keywords from historical search data".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: step S101: extracting search keywords from historical search data; step S102: clustering the search keywords to obtain a keyword data set; step S103: analyzing the keyword data set to obtain core keywords and core weights of the core keywords; step S104: generating a title based on the core weights of the core keywords.
According to the technical scheme of the embodiment of the present invention, search keywords are extracted from historical search data; the search keywords are clustered to obtain a keyword data set; the keyword data set is analyzed to obtain core keywords and core weights of the core keywords; and a title is generated based on the core weights of the core keywords. Because the title is generated from real user operation behavior, the scheme overcomes the technical problems of inaccurate, overlong, or redundant titles, low hit conversion rate, and poor user experience, and achieves the technical effect of improving the hit conversion rate and the user experience; the generated title accurately contains the product features that users care about and matches the content of the product.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (16)

1. A method of generating a title, comprising:
extracting search keywords from historical search data;
clustering the search keywords to obtain a keyword data set;
analyzing the keyword data set to obtain core keywords and core weights of the core keywords;
and generating a title based on the core weight of the core keyword.
2. The method of claim 1, wherein analyzing the keyword dataset to obtain core keywords and core weights for the core keywords comprises:
calculating the average value of the keyword data set as a core keyword;
distributing an initial weight value for the core keyword;
calculating similarity values between the keyword data sets using a bidirectional neural network;
and adjusting the initial weight value based on the similarity value to obtain an adjusted weight value, and selecting the maximum adjusted weight value as a core weight value.
3. The method of claim 2, wherein adjusting the initial weight value based on the similarity value comprises:
adjusting the initial weight by using a weight processing formula to obtain an adjusted weight, and selecting the maximum adjusted weight as a core weight;
wherein, the weight processing formula is as follows:
W = v + α(x − v), wherein W is the adjusted weight, v is the initial weight, x is the argument of the adjustment, and α is the similarity value.
4. The method of claim 1, wherein clustering the search keywords comprises:
clustering the search keywords by using a K-means algorithm, a K-medoids algorithm, a density-based clustering algorithm, or a Gaussian mixture model.
5. The method of claim 1, wherein clustering the search keywords to obtain a keyword dataset comprises:
randomly selecting K search keywords as clustering centroid points;
dividing the search keywords of the same category as the cluster centroid points into a keyword cluster;
calculating the average value of the keyword clusters as a new cluster centroid point;
re-dividing the search keywords with the same category as the new cluster centroid points into a keyword cluster;
and taking the keyword clusters whose cluster centroid points no longer change, or whose number of divisions has reached a preset value, as the keyword data set.
6. The method of claim 1, wherein extracting search keywords from historical search data comprises:
clustering historical search data to obtain a historical data set;
and calculating the average value of the historical data set as a search keyword.
7. The method according to any one of claims 1-6, further comprising:
extracting real-time keywords based on the real-time search data;
clustering the real-time keywords to obtain a real-time data set;
analyzing the real-time data set to obtain a check keyword and a check weight of the check keyword;
and checking or adjusting the title based on the checking weight value of the checking keyword.
8. An apparatus for generating a title, comprising:
the extraction module is used for extracting search keywords from historical search data;
the clustering module is used for clustering the search keywords to obtain a keyword data set;
the analysis module is used for analyzing the keyword data set to obtain a core keyword and a core weight of the core keyword;
and the generating module is used for generating a title based on the core weight of the core keyword.
9. The apparatus of claim 8, wherein the analysis module is further configured to:
calculating the average value of the keyword data set as a core keyword;
distributing an initial weight value for the core keyword;
calculating similarity values between the keyword data sets using a bidirectional neural network;
and adjusting the initial weight value based on the similarity value to obtain an adjusted weight value, and selecting the maximum adjusted weight value as a core weight value.
10. The apparatus of claim 9, wherein the analysis module is further configured to:
adjusting the initial weight by using a weight processing formula to obtain an adjusted weight, and selecting the maximum adjusted weight as a core weight;
wherein, the weight processing formula is as follows:
W = v + α(x − v), wherein W is the adjusted weight, v is the initial weight, x is the argument of the adjustment, and α is the similarity value.
11. The apparatus of claim 8, wherein the clustering module is further configured to:
cluster the search keywords by using a K-means algorithm, a K-medoids algorithm, a density-based clustering algorithm, or a Gaussian mixture model.
12. The apparatus of claim 8, wherein the clustering module is further configured to:
randomly selecting K search keywords as clustering centroid points;
dividing the search keywords of the same category as the cluster centroid points into a keyword cluster;
calculating the average value of the keyword clusters as a new cluster centroid point;
re-dividing the search keywords with the same category as the new cluster centroid points into a keyword cluster;
and taking the keyword clusters whose cluster centroid points no longer change, or whose number of divisions has reached a preset value, as the keyword data set.
13. The apparatus of claim 8, wherein the extraction module is further configured to:
clustering historical search data to obtain a historical data set;
and calculating the average value of the historical data set as a search keyword.
14. The apparatus according to any one of claims 8-13, wherein the apparatus further comprises:
the checking module is used for extracting real-time keywords based on the real-time search data; clustering the real-time keywords to obtain a real-time data set; analyzing the real-time data set to obtain a check keyword and a check weight of the check keyword; and checking or adjusting the title based on the checking weight value of the checking keyword.
15. An electronic device that generates a title, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201810844000.8A 2018-07-27 2018-07-27 Method and device for generating title Pending CN110852078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810844000.8A CN110852078A (en) 2018-07-27 2018-07-27 Method and device for generating title

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810844000.8A CN110852078A (en) 2018-07-27 2018-07-27 Method and device for generating title

Publications (1)

Publication Number Publication Date
CN110852078A true CN110852078A (en) 2020-02-28

Family

ID=69594817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810844000.8A Pending CN110852078A (en) 2018-07-27 2018-07-27 Method and device for generating title

Country Status (1)

Country Link
CN (1) CN110852078A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401046A (en) * 2020-04-13 2020-07-10 贝壳技术有限公司 Method and device for generating house source title, storage medium and electronic equipment
CN111401046B (en) * 2020-04-13 2023-09-29 贝壳技术有限公司 House source title generation method and device, storage medium and electronic equipment
CN114363664A (en) * 2021-12-31 2022-04-15 北京达佳互联信息技术有限公司 Method and device for generating video collection title

Similar Documents

Publication Publication Date Title
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
CN106354856B (en) Artificial intelligence-based deep neural network enhanced search method and device
CN107145485B (en) Method and apparatus for compressing topic models
CN113688310B (en) Content recommendation method, device, equipment and storage medium
CN112100396A (en) Data processing method and device
CN107609192A (en) The supplement searching method and device of a kind of search engine
CN111191825A (en) User default prediction method and device and electronic equipment
CN113204621A (en) Document storage method, document retrieval method, device, equipment and storage medium
CN110750707A (en) Keyword recommendation method and device and electronic equipment
CN112116426A (en) Method and device for pushing article information
CN110245357B (en) Main entity identification method and device
CN110852078A (en) Method and device for generating title
CN111737607B (en) Data processing method, device, electronic equipment and storage medium
CN111435406A (en) Method and device for correcting database statement spelling errors
CN110807097A (en) Method and device for analyzing data
CN110852057A (en) Method and device for calculating text similarity
CN114036921A (en) Policy information matching method and device
CN109902152B (en) Method and apparatus for retrieving information
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN110750708A (en) Keyword recommendation method and device and electronic equipment
CN111368036B (en) Method and device for searching information
CN114528378A (en) Text classification method and device, electronic equipment and storage medium
CN112784046A (en) Text clustering method, device and equipment and storage medium
CN111274383B (en) Object classifying method and device applied to quotation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination