CN113987176A

CN113987176A - Content popularity tag generation method, intelligent terminal and storage medium

Info

Publication number: CN113987176A
Application number: CN202111242678.7A
Authority: CN
Inventors: 王嘉诚; 李晶
Original assignee: Easy City Square Network Technology Co ltd
Current assignee: Easy City Square Network Technology Co ltd
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2022-01-28

Abstract

The invention discloses a content popularity tag generation method, an intelligent terminal and a storage medium. The content popularity tag generation method comprises the following steps: obtaining user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data; clustering the content entity tags, and determining a plurality of clusters corresponding to the user comment data; and generating a content popularity tag corresponding to the user comment data according to the clusters. Because the content entity tags are extracted from the user comment data and the content popularity tag is generated from the clustered content entity tags, the content popularity tags in the user comment data can be obtained accurately.

Description

Content popularity tag generation method, intelligent terminal and storage medium

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a content hotness tag generation method, an intelligent terminal, and a storage medium.

Background

With the rapid development of internet technology, more and more users share their shopping experiences and product evaluations on social websites such as microblogs and forums. These online product evaluations are an important information resource for enterprises to understand user requirements, and an important reference for users selecting products. However, existing user comment data is generally sorted by publication time, so a user has to read the comments one by one in chronological order and cannot pick out popular comment content from massive user comments. For an enterprise facing a large amount of user comment data, how to better mine users' important requirements and feedback on products, and thereby assist decision making, is likewise a difficulty.

Thus, there is still a need for improvement and development of the prior art.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a content popularity tag generation method, an intelligent terminal and a storage medium, aiming at solving the problem that the existing user comment data is generally sorted according to publication time, and a user cannot obtain popular comment content from massive user comment data.

The technical scheme adopted by the invention for solving the problems is as follows:

in a first aspect, an embodiment of the present invention provides a method for generating a content hotness label, where the method includes:

obtaining user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data;

clustering the content entity labels, and determining a plurality of clustering clusters corresponding to the user comment data;

and generating a content heat label corresponding to the user comment data according to the clustering clusters.

The content popularity tag generation method, wherein the step of extracting a plurality of content entity tags corresponding to the user comment data from the user comment data includes:

preprocessing the user comment data to obtain preprocessed user comment data;

and analyzing the preprocessed user comment data, and extracting a plurality of content entity labels corresponding to the user comment data from the preprocessed user comment data.

The content popularity tag generation method, wherein the clustering the content entity tags and the determining the clustering clusters corresponding to the user comment data includes:

vectorizing the content entity tags to obtain a plurality of vectorized tags;

clustering the vectorization labels, and determining a plurality of clustering clusters corresponding to the user comment data.

The content popularity tag generation method, wherein the clustering the vectorization tags and the determining the clustering clusters corresponding to the user comment data includes:

acquiring preset neighborhood parameters, and determining a core object sample set according to the vectorization labels and the neighborhood parameters;

and determining a plurality of clustering clusters corresponding to the user comment data according to the core object sample set and the neighborhood parameters.

The content popularity tag generation method, wherein the neighborhood parameters include a neighborhood distance threshold and a neighborhood quantity threshold, and the step of determining a core object sample set according to the vectorization tags and the neighborhood parameters includes:

determining a neighborhood subsample set corresponding to each vectorization label according to the vectorization labels and the neighborhood distance threshold;

and determining a core object sample set according to the neighborhood subsample set corresponding to each vectorization label.

The content popularity tag generation method, wherein the step of determining a plurality of cluster clusters corresponding to the user comment data according to the core object sample set and the neighborhood parameters includes:

selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters, and updating the core object set;

when the core object set does not meet a preset first condition, continuing to execute the steps of selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters and updating the core object set until the core object set meets the preset first condition;

and determining the cluster corresponding to each target core object as a plurality of cluster clusters corresponding to the user comment data.

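The content popularity tag generation method, wherein the step of determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters includes: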

initializing a current cluster core object queue according to the target core object, and updating a current cluster sample set and the current cluster core object queue according to the current cluster core object queue and the neighborhood parameters;

when the updated current cluster core object queue does not meet a preset second condition, continuing to execute the step of updating the current cluster sample set and the current cluster core object queue according to the current cluster core object queue and the neighborhood parameters until the current cluster core object queue meets the preset second condition;

and determining the updated current cluster sample set as a cluster corresponding to the target core object.

The content popularity tag generation method, wherein the step of generating the content popularity tag corresponding to the user comment data according to the plurality of clustering clusters includes:

acquiring the number of clustering labels and clustering center labels of each clustering cluster in the clustering clusters, and sequencing the clustering center labels according to the number of the clustering labels;

and generating a content heat label corresponding to the user comment data according to the sorted clustering center label.

In a second aspect, an embodiment of the present invention further provides a content hotness tag generation apparatus, where the apparatus includes:

the data acquisition module is used for acquiring user comment data and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data;

the data clustering module is used for clustering the content entity labels and determining a plurality of clustering clusters corresponding to the user comment data;

and the label generation module is used for generating a content popularity label corresponding to the user comment data according to the clustering clusters.

In a third aspect, an embodiment of the present invention provides an intelligent terminal, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for executing the steps of the content popularity tag generation method according to any one of the above items.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the content popularity tag generation method as described in any one of the above.

The invention has the following beneficial effects: in the embodiment of the invention, user comment data is first obtained and a plurality of content entity tags are extracted from it; the content entity tags are then clustered to determine a plurality of clusters corresponding to the user comment data; finally, the content popularity tag corresponding to the user comment data is generated according to the clusters. Because the content popularity tag is generated from the clustered content entity tags, the content popularity tags in the user comment data can be obtained accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a content popularity tag generation method according to an embodiment of the present invention;

Fig. 2 is a functional block diagram of a content popularity tag generation apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of the internal structure of an intelligent terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It should be noted that, if directional indications (such as up, down, left, right, front, back, …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.

User comment data is an important information resource for enterprises to understand user requirements, and an important reference for users selecting products. However, existing user comment data is generally sorted by publication time, so a user has to read the comments one by one in chronological order and cannot pick out popular comment content from massive user comments. For an enterprise facing a large amount of user comment data, how to better mine users' important requirements and feedback on products, and thereby assist decision making, is likewise a difficulty.

In order to solve the problems in the prior art, the embodiment provides a content popularity tag generation method, by which a content popularity tag in user comment data can be accurately obtained, and the problem that popular comment content cannot be obtained from massive user comment data in the prior art is solved. In specific implementation, firstly, user comment data is obtained, a plurality of content entity labels corresponding to the user comment data are extracted from the user comment data, then the plurality of content entity labels are clustered, a plurality of clustering clusters corresponding to the user comment data are determined, and finally, content heat labels corresponding to the user comment data are generated according to the clustering clusters.

Exemplary method

This embodiment provides a content popularity tag generation method, which can be applied to an intelligent terminal. Specifically, as shown in Fig. 1, the method comprises:

step S100, user comment data are obtained, and a plurality of content entity labels corresponding to the user comment data are extracted from the user comment data.

Specifically, the user comment data is comment data produced after a user purchases or uses a commodity or service, and is obtained from existing e-commerce website platforms. The content entity tags are combinations of specific structures extracted from the user comment data. In this embodiment, in order to generate content popularity tags for massive user comment data, the user comment data is first acquired, and a plurality of content entity tags are then extracted from it using an open-source dependency syntax analysis model, so that the content popularity tags can be generated from these content entity tags in the subsequent steps. Extracting the content entity tags with a dependency syntax analysis model yields more accurate and richer content than the traditional word segmentation extraction method.

In a specific embodiment, the step of extracting a plurality of content entity tags corresponding to the user comment data from the user comment data in step S100 includes:

step S110, preprocessing the user comment data to obtain preprocessed user comment data;

step S120, analyzing the preprocessed user comment data, and extracting a plurality of content entity labels corresponding to the user comment data from the preprocessed user comment data.

Considering that massive comment data may contain irregular data, such as advertisements or comments irrelevant to the subject, in this embodiment the user comment data is preprocessed after it is acquired, so as to remove data that is obviously irregular. Removing irregular data mainly comprises phrase deletion, mechanical compression and text deduplication: phrase deletion deletes phrases that contribute nothing to the judgment, mechanical compression removes repeated words or information within a phrase, and text deduplication removes repeated text data.
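The three preprocessing operations can be illustrated with a minimal Python sketch. It is not the patented implementation: the stop-phrase list `MEANINGLESS_PHRASES`, the function names and the regular-expression heuristics are all assumptions chosen for illustration, and the compression rule is deliberately crude (it would also collapse runs of identical digits).

```python
import re

# Hypothetical stop-phrase list; a real system would curate this per domain.
MEANINGLESS_PHRASES = ("click the link", "follow me")

def delete_phrases(text, phrases=MEANINGLESS_PHRASES):
    """Phrase deletion: strip phrases that contribute nothing to the analysis."""
    for phrase in phrases:
        text = text.replace(phrase, "")
    # Normalize whitespace left behind by the deletions.
    return " ".join(text.split())

def mechanical_compression(text):
    """Mechanical compression: collapse immediate repetitions."""
    # A word repeated back to back: "good good good" -> "good".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)
    # A character repeated three or more times: "sooooo" -> "so".
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    return text

def deduplicate(comments):
    """Text deduplication: keep the first occurrence of each non-empty comment."""
    seen = set()
    unique = []
    for comment in comments:
        if comment and comment not in seen:
            seen.add(comment)
            unique.append(comment)
    return unique

def preprocess(comments):
    """Apply phrase deletion, mechanical compression, then deduplication."""
    cleaned = [mechanical_compression(delete_phrases(c)) for c in comments]
    return deduplicate(cleaned)
```

Deduplication runs last so that comments which only become identical after cleaning are also merged.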

After preprocessing, the preprocessed user comment data is analyzed with an open-source dependency syntax analysis model, and combinations of specific structures are extracted from it as the plurality of content entity tags corresponding to the user comment data. Open-source dependency syntax analysis models include Stanford CoreNLP, HanLP, LTP from Harbin Institute of Technology, FudanNLP and the like. The preprocessed user comment data can be analyzed according to actual service requirements, the part of speech of the dependent word, the part of speech of the head word, and the dependency relationship between the dependent word and the head word, to obtain the content entity tags corresponding to the user comment data.
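As a rough illustration of rule-based extraction over a dependency parse, the sketch below works on tokens that are assumed to have been parsed already; it does not call any of the parsers named above. The `Token` structure and the `RULES` set are hypothetical, and the tag and relation names (`NOUN`, `ADJ`, `nsubj`, `amod`) follow Universal Dependencies conventions as an assumption about the parser output.

```python
from typing import NamedTuple

class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    word: str
    pos: str      # part-of-speech tag
    head: int     # index of the head word, 0 for the root
    deprel: str   # dependency relation between this word and its head

# Hypothetical rules: (dependent POS, head POS, relation) triples to keep.
RULES = {
    ("NOUN", "ADJ", "nsubj"),  # e.g. "battery" as subject of "durable"
    ("ADJ", "NOUN", "amod"),   # e.g. "fast" modifying "delivery"
}

def extract_entity_tags(tokens):
    """Return 'word word' tags for every dependent/head pair matching a rule."""
    by_idx = {t.idx: t for t in tokens}
    tags = []
    for dep in tokens:
        head = by_idx.get(dep.head)
        if head is None:  # the root has no head word
            continue
        if (dep.pos, head.pos, dep.deprel) in RULES:
            # Keep the two words in sentence order.
            first, second = sorted((dep, head), key=lambda t: t.idx)
            tags.append(f"{first.word} {second.word}")
    return tags

# "The battery is durable", parsed by hand for the example.
sentence = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "battery", "NOUN", 4, "nsubj"),
    Token(3, "is", "AUX", 4, "cop"),
    Token(4, "durable", "ADJ", 0, "root"),
]
tags = extract_entity_tags(sentence)
```

Changing `RULES` is how the "actual service requirements" mentioned above would be expressed in this sketch.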

Step S200, clustering the content entity tags, and determining a plurality of clusters corresponding to the user comment data.

In consideration of the fact that the data size of the content entity labels extracted from the massive user comment data is huge and there are many similar content entity labels, in this embodiment, after the content entity labels corresponding to the user comment data are obtained, the content entity labels are clustered by using a clustering algorithm, and the content entity labels with similar semantics are clustered together to determine a plurality of clustering clusters corresponding to the user comment data.

In a specific embodiment, step S200 specifically includes:

step S210, vectorizing the content entity tags to obtain a plurality of vectorized tags;

step S220, clustering the vectorization labels, and determining a plurality of clustering clusters corresponding to the user comment data.

Each cluster is formed by grouping vectorized tags with similar semantics, that is, each cluster consists of vectorized tags whose meanings are close. Because a clustering algorithm cannot operate directly on the text of the content entity tags, this embodiment first vectorizes the content entity tags with a pre-trained word vector model to obtain vectorized tags; an existing word vector model such as word2vec may be used. The vectorized tags are then clustered by the clustering algorithm to determine the clusters corresponding to the user comment data. Combining a word vector model with a clustering algorithm gives richer semantic expression than the traditional lexicon matching method and makes it easy to merge synonymous content entity tags.
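One simple way to turn a multi-word tag into a single vector is to average the vectors of its words. The text does not say how the vectors are combined, so the averaging, the function name and the toy three-dimensional embedding table below are all assumptions; a real system would look words up in a pre-trained model instead.

```python
# Toy 3-dimensional embeddings standing in for a pre-trained word vector model.
TOY_EMBEDDINGS = {
    "battery": [0.9, 0.1, 0.0],
    "durable": [0.8, 0.2, 0.1],
    "delivery": [0.0, 0.9, 0.2],
    "fast": [0.1, 0.8, 0.3],
}

def vectorize_tag(tag, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of the tag's known words; None if no word is known."""
    vectors = [embeddings[w] for w in tag.split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vector = vectorize_tag("battery durable")
```

Tags for which no word has an embedding return `None` and would have to be dropped or handled separately before clustering.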

In an embodiment, step S220 specifically includes:

step S221, obtaining preset neighborhood parameters, and determining a core object sample set according to the vectorization labels and the neighborhood parameters;

step S222, determining a plurality of clustering clusters corresponding to the user comment data according to the core object sample set and the neighborhood parameters.

In order to measure the semantic similarity of the vectorized tags, neighborhood parameters are preset in this embodiment; they include a neighborhood distance threshold $\epsilon$ and a neighborhood quantity threshold MinPts. When the vectorized tags are clustered, the preset neighborhood parameters are first obtained, a core object sample set is determined according to the vectorized tags and the neighborhood parameters, and a plurality of clusters corresponding to the user comment data are then determined according to the core object sample set and the neighborhood parameters.

In one embodiment, step S221 specifically includes:

step S2211, determining a neighborhood subsample set corresponding to each vectorization label according to the vectorization labels and the neighborhood distance threshold;

step S2212, determining a core object sample set according to the neighborhood subsample set corresponding to each vectorization label.

In a specific embodiment, when determining the core object sample set, the distance between every pair of vectorized tags is first calculated with a preset distance measure, and the neighborhood subsample set $N_\epsilon(x_j)$ corresponding to each vectorized tag $x_j$ is determined from these distances. The neighborhood subsample set $N_\epsilon(x_j)$ consists of the vectorized tags whose distance to $x_j$ is not greater than the neighborhood distance threshold $\epsilon$. Available distance measures include the Euclidean distance, the Mahalanobis distance, the Chebyshev distance, the cosine of the included angle, and the like.
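Three of the distance measures named above, together with the neighborhood lookup, might be sketched as follows. The function names are illustrative; the cosine measure is expressed as one minus the cosine similarity so that smaller values mean closer tags, which is an assumption about how the cosine would be used as a distance. The Mahalanobis distance is omitted because it needs an inverse covariance matrix.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity, so that smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm

def epsilon_neighborhood(j, points, eps, dist=euclidean):
    """Indices i with dist(points[i], points[j]) <= eps; j itself is included."""
    return {i for i, p in enumerate(points) if dist(p, points[j]) <= eps}
```

For word vectors the cosine-based measure is a common choice because it ignores vector length, but the threshold then has to be chosen on the scale of that measure rather than of the Euclidean distance.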

After the neighborhood subsample set $N_\epsilon(x_j)$ of each vectorized tag is obtained, the number of subsamples in $N_\epsilon(x_j)$ is compared with the neighborhood quantity threshold MinPts. The vectorized tags whose neighborhood subsample sets contain at least MinPts subsamples together form the core object sample set.
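Steps S2211 and S2212 can be sketched in a few lines of Python. The sketch assumes Euclidean distance (via `math.dist`, Python 3.8+) and counts each tag as a member of its own neighborhood; the function name and the sample points are illustrative only.

```python
import math

def find_core_objects(points, eps, min_pts):
    """Return (neighborhoods, core).

    neighborhoods[j] is the set of indices within eps of points[j];
    core is the set of indices whose neighborhood holds at least min_pts samples.
    """
    neighborhoods = [
        {i for i, q in enumerate(points) if math.dist(p, q) <= eps}
        for p in points
    ]
    core = {j for j, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    return neighborhoods, core

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
neighborhoods, core = find_core_objects(points, eps=1.0, min_pts=3)
```

With these sample points only the first point has three samples within distance 1, so it is the only core object. The pairwise loop is quadratic in the number of tags; a spatial index would be needed for very large tag sets.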

In an embodiment, the step S222 specifically includes:

step S2221, selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters, and updating the core object set;

step S2222, when the core object set does not satisfy a preset first condition, continuing to execute the steps of selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters, and updating the core object set until the core object set satisfies the preset first condition;

step S2223, determining the cluster corresponding to each target core object as a plurality of cluster clusters corresponding to the user comment data.

When determining the clusters from the core object sample set, the embodiment first initializes the core object set $\Omega$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$ (where $D$ is the set of all vectorized tags) and the cluster partition $C = \emptyset$. A target core object $o$ is then randomly selected from the core object set $\Omega$, the cluster corresponding to $o$ is determined according to $o$ and the neighborhood parameters, and $\Omega$ is updated. While $\Omega$ does not satisfy the preset first condition, the steps of selecting a target core object from $\Omega$, determining its cluster according to the target core object and the neighborhood parameters, and updating $\Omega$ are repeated; once the first condition is satisfied, the clusters corresponding to the target core objects are determined as the plurality of clusters corresponding to the user comment data. The preset first condition is that the core object set is empty, i.e. $\Omega = \emptyset$.

In a specific embodiment, the step of determining the cluster corresponding to the target core object according to the target core object and the neighborhood parameter in step S2221 includes:

step S22211, initializing a current cluster core object queue according to the target core object, and updating a current cluster sample set and a current cluster core object queue according to the current cluster core object queue and the neighborhood parameters;

step S22212, when the updated current cluster core object queue does not satisfy the preset second condition, continuing to execute the step of updating the current cluster sample set and the current cluster core object queue according to the current cluster core object queue and the neighborhood parameters until the current cluster core object queue satisfies the preset second condition;

step S22213, the updated current cluster sample set is determined as the cluster corresponding to the target core object.

In this embodiment, when determining the cluster corresponding to a target core object $o$, the current cluster core object queue is first initialized as $\Omega_{cur} = \{o\}$, the cluster number is updated as $k = k + 1$, the current cluster sample set is initialized as $C_k = \{o\}$, and the unvisited sample set is updated as $\Gamma = \Gamma - \{o\}$. The current cluster sample set and the current cluster core object queue are then updated according to $\Omega_{cur}$ and the neighborhood parameters. The updated queue $\Omega_{cur}$ is compared with the preset second condition; while it does not satisfy that condition, the update step is repeated, and once it does, the updated current cluster sample set is determined as the cluster corresponding to the target core object. The preset second condition is that the current cluster core object queue is empty, i.e. $\Omega_{cur} = \emptyset$.

In a specific embodiment, the update of the current cluster sample set and the current cluster core object queue proceeds as follows: a core object $o'$ is randomly selected from the current cluster core object queue $\Omega_{cur}$; its neighborhood subsample set $N_\epsilon(o')$ is found using the neighborhood distance threshold $\epsilon$; with $\Delta = N_\epsilon(o') \cap \Gamma$, the current cluster sample set is updated according to $C_k = C_k \cup \Delta$, the unvisited sample set according to $\Gamma = \Gamma - \Delta$, and the current cluster core object queue according to $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$.
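The procedure of steps S2221 to S22213 corresponds to the well-known DBSCAN clustering algorithm. A minimal pure-Python sketch follows, under several assumptions: Euclidean distance, vectorized tags given as tuples, a core object set that is updated by removing the members of each finished cluster (the text does not spell out this update), and illustrative function and variable names.

```python
import math
import random

def dbscan(points, eps, min_pts, seed=0):
    """Return a list of clusters, each a set of indices into points."""
    rng = random.Random(seed)
    n = len(points)
    neighborhoods = [
        {i for i in range(n) if math.dist(points[i], points[j]) <= eps}
        for j in range(n)
    ]
    omega = {j for j in range(n) if len(neighborhoods[j]) >= min_pts}  # core object set
    unvisited = set(range(n))  # unvisited sample set, initially all samples
    clusters = []              # cluster partition
    while omega:               # first condition: stop when the core object set is empty
        o = rng.choice(sorted(omega))   # randomly selected target core object
        queue = {o}                     # current cluster core object queue
        cluster = {o}                   # current cluster sample set
        unvisited.discard(o)
        while queue:           # second condition: stop when the queue is empty
            o2 = queue.pop()            # arbitrary core object from the queue
            delta = neighborhoods[o2] & unvisited
            cluster |= delta
            unvisited -= delta
            queue |= delta & omega      # newly reached core objects expand further
        clusters.append(cluster)
        omega -= cluster       # assumed update: drop core objects already clustered
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
clusters = dbscan(points, eps=1.0, min_pts=3)
```

Samples that are never reached from any core object, such as the isolated last point in the example, remain in the unvisited set and belong to no cluster.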

Step S300, generating a content popularity tag corresponding to the user comment data according to the clusters.

The content popularity tags are the most popular tags among the content entity tags. The number of cluster tags is the number of content entity tags that make up a cluster; in general, the more content entity tags a cluster contains, the more popular those tags are. The cluster center tag is the tag at the center of a cluster. In this embodiment, after the clusters corresponding to the user comment data are determined, the number of cluster tags and the cluster center tag of each cluster are obtained, the cluster center tags are sorted by the number of cluster tags, and the content popularity tags corresponding to the user comment data are generated from the sorted cluster center tags. The sorting may be in descending or ascending order. For example, the cluster center tags are sorted in descending order of the number of cluster tags, and the first K cluster center tags after sorting are taken to generate the Top-K user comment content popularity tags.
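The ranking step can be sketched as follows. The text does not define how the cluster center tag is chosen, so the sketch assumes it is the member tag whose vector lies closest to the cluster centroid; the function name and the sample data are hypothetical. Clusters are ranked by descending size so that the largest cluster comes first.

```python
import math

def top_k_popularity_tags(clusters, k):
    """clusters: list of dicts mapping tag text -> vector.

    Returns up to k (center_tag, cluster_size) pairs, largest cluster first.
    """
    ranked = []
    for cluster in clusters:
        vectors = list(cluster.values())
        dim = len(vectors[0])
        centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        # Assumption: the cluster center tag is the member closest to the centroid.
        center_tag = min(cluster, key=lambda t: math.dist(cluster[t], centroid))
        ranked.append((center_tag, len(cluster)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:k]

sample_clusters = [
    {"fast delivery": [1.0, 1.0]},
    {
        "battery durable": [0.0, 0.0],
        "battery lasts": [0.2, 0.0],
        "battery good": [0.1, 0.0],
    },
]
top = top_k_popularity_tags(sample_clusters, k=1)
```

Returning the cluster size alongside the tag lets a caller display how many comments support each popularity tag.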

Therefore, in the embodiment of the invention, firstly, the user comment data is obtained, the content entity labels corresponding to the user comment data are extracted from the user comment data, then, the content entity labels are clustered, the clustering clusters corresponding to the user comment data are determined, and finally, the content heat labels corresponding to the user comment data are generated according to the clustering clusters.

Exemplary device

As shown in Fig. 2, an embodiment of the present invention provides a content popularity tag generation apparatus, including: a data acquisition module 210, a data clustering module 220, and a tag generation module 230. Specifically, the data acquisition module 210 is configured to obtain user comment data and extract a plurality of content entity tags corresponding to the user comment data from the user comment data. The data clustering module 220 is configured to cluster the content entity tags and determine a plurality of clusters corresponding to the user comment data. The tag generation module 230 is configured to generate a content popularity tag corresponding to the user comment data according to the clusters.
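The three-module structure can be pictured as a small pipeline object. The class name, field names and the trivial stand-in functions below are hypothetical; they only show how the modules hand data to one another and do not implement real extraction or clustering.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class ContentPopularityTagGenerator:
    acquire: Callable[[Sequence[str]], List[str]]     # data acquisition module 210
    cluster: Callable[[List[str]], List[List[str]]]   # data clustering module 220
    generate: Callable[[List[List[str]]], List[str]]  # tag generation module 230

    def run(self, comments):
        tags = self.acquire(comments)      # comments -> content entity tags
        clusters = self.cluster(tags)      # tags -> clusters
        return self.generate(clusters)     # clusters -> popularity tags

def group_identical(tags):
    """Stand-in clustering: identical tags form one cluster."""
    groups = {}
    for tag in tags:
        groups.setdefault(tag, []).append(tag)
    return list(groups.values())

generator = ContentPopularityTagGenerator(
    acquire=lambda comments: [c.strip().lower() for c in comments],
    cluster=group_identical,
    generate=lambda clusters: [c[0] for c in sorted(clusters, key=len, reverse=True)],
)
result = generator.run(["Fast delivery", "good battery ", "fast delivery"])
```

Keeping the modules as interchangeable callables makes it straightforward to swap in a different extraction or clustering strategy without touching the rest of the pipeline.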

Based on the above embodiment, the present invention further provides an intelligent terminal, a schematic block diagram of which may be as shown in Fig. 3. The intelligent terminal comprises a processor, a memory, a network interface, a display screen and a temperature sensor which are connected through a system bus. The processor of the intelligent terminal provides calculation and control capability. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the intelligent terminal is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the content popularity tag generation method. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the intelligent terminal is arranged inside the intelligent terminal in advance and used for detecting the operating temperature of internal equipment.

It will be understood by those skilled in the art that the block diagram shown in fig. 3 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.

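In one embodiment, an intelligent terminal is provided that includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data; clustering the content entity tags, and determining a plurality of clusters corresponding to the user comment data; and generating a content popularity tag corresponding to the user comment data according to the clusters.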

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

In summary, the present invention discloses a content popularity tag generation method, an intelligent terminal and a storage medium. The method includes: obtaining user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data; clustering the content entity tags, and determining a plurality of clusters corresponding to the user comment data; and generating a content popularity tag corresponding to the user comment data according to the clusters. Because the content entity tags are extracted from the user comment data and the content popularity tag is generated from the clustered content entity tags, the content popularity tags in the user comment data can be obtained accurately.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

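1. A method for generating content popularity tags, the method comprising:

obtaining user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the user comment data;

clustering the content entity tags, and determining a plurality of clusters corresponding to the user comment data;

and generating a content popularity tag corresponding to the user comment data according to the clusters.

2. The method according to claim 1, wherein the step of extracting a plurality of content entity tags corresponding to the user comment data from the user comment data includes:

preprocessing the user comment data to obtain preprocessed user comment data;

and analyzing the preprocessed user comment data, and extracting a plurality of content entity tags corresponding to the user comment data from the preprocessed user comment data.

3. The method according to claim 1, wherein the step of clustering the content entity tags and determining the clusters corresponding to the user comment data comprises:

vectorizing the content entity tags to obtain a plurality of vectorized tags;

and clustering the vectorized tags, and determining a plurality of clusters corresponding to the user comment data.

4. The method according to claim 3, wherein the step of clustering the vectorized tags and determining the clusters corresponding to the user comment data comprises:

acquiring preset neighborhood parameters, and determining a core object sample set according to the vectorized tags and the neighborhood parameters;

and determining a plurality of clusters corresponding to the user comment data according to the core object sample set and the neighborhood parameters.

5. The method of claim 4, wherein the neighborhood parameters comprise a neighborhood distance threshold and a neighborhood quantity threshold, and wherein the step of determining the core object sample set according to the vectorized tags and the neighborhood parameters comprises:

determining a neighborhood subsample set corresponding to each vectorized tag according to the vectorized tags and the neighborhood distance threshold;

and determining the core object sample set according to the neighborhood subsample set corresponding to each vectorized tag.

6. The method according to claim 4, wherein the step of determining a plurality of clusters corresponding to the user comment data according to the core object sample set and the neighborhood parameters comprises:

selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters, and updating the core object set;

when the core object set does not meet a preset first condition, continuing to execute the steps of selecting a target core object from the core object sample set, determining a cluster corresponding to the target core object according to the target core object and the neighborhood parameters, and updating the core object set, until the core object set meets the preset first condition;

and determining the clusters corresponding to the target core objects as the plurality of clusters corresponding to the user comment data.

7. The method according to claim 6, wherein the step of determining the cluster corresponding to the target core object according to the target core object and the neighborhood parameters comprises:

initializing a current cluster core object queue according to the target core object, and updating a current cluster sample set and the current cluster core object queue according to the current cluster core object queue and the neighborhood parameters;

when the updated current cluster core object queue does not meet a preset second condition, continuing to execute the step of updating the current cluster sample set and the current cluster core object queue according to the current cluster core object queue and the neighborhood parameters, until the current cluster core object queue meets the preset second condition;

and determining the updated current cluster sample set as the cluster corresponding to the target core object.

8. The method according to claim 1, wherein the step of generating the content popularity tag corresponding to the user comment data according to the plurality of clusters comprises:

acquiring the number of cluster tags and the cluster center tag of each cluster in the clusters, and sorting the cluster center tags according to the number of cluster tags;

and generating the content popularity tag corresponding to the user comment data according to the sorted cluster center tags.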

9. An intelligent terminal comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the steps of the content popularity tag generation method of any one of claims 1-8.

10. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the content popularity tag generation method according to any one of claims 1 to 8.