CN110442674B

CN110442674B - Label propagation clustering method, terminal equipment, storage medium and device

Info

Publication number: CN110442674B
Application number: CN201910504157.0A
Authority: CN
Inventors: 尹帆; 张广凯; 宋中山; 覃俊; 郑禄; 吴经龙
Original assignee: South Central University for Nationalities
Current assignee: South Central Minzu University
Priority date: 2019-06-11
Filing date: 2019-06-11
Publication date: 2021-09-14
Anticipated expiration: 2039-06-11
Also published as: CN110442674A

Abstract

The invention discloses a label propagation clustering method, terminal equipment, storage medium and device. The method comprises the following steps: performing word segmentation on the texts in a sample text set to obtain the frequent words of each text; extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information through a preset mapping relation; computing a node influence value for each text node in the heterogeneous text network through a preset node influence relation, and obtaining target labels according to the node influence values; computing a total similarity value between texts in the heterogeneous text network through a preset total similarity relation, and obtaining target text nodes according to the total similarity values; and propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain the clustering result clusters. The technical scheme of the invention addresses the high randomness of label propagation and the low accuracy and reliability of the resulting clustering.

Description

Label propagation clustering method, terminal equipment, storage medium and device

Technical Field

The present invention relates to the field of label propagation and clustering technologies, and in particular to a label propagation clustering method, a terminal device, a storage medium, and a device.

Background

At present, fields such as agricultural production, information retrieval, and financial and biological information processing must process large amounts of data before the data can be used. A common approach is to mark the data with labels, propagate the labels, and then cluster the data. For example, when analyzing pest damage to crops, the symptoms observed on the damaged crops are first labeled, and it is then judged whether they belong to a given pest type. A label propagation algorithm can quickly cluster these symptoms to produce a result, so that the pest can then be treated. However, label propagation is highly random, and the clusters it produces from the labeled data have low accuracy and reliability.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main object of the invention is to provide a label propagation clustering method, a terminal device, a storage medium, and a device that address the technical problems of high randomness in label propagation and low clustering accuracy and reliability.

In order to achieve the above object, the present invention provides a label propagation clustering method, which includes the following steps:

performing word segmentation processing on the texts in the sample text set to obtain frequent words of each text;

extracting text information of the text from the sample text set, and constructing a heterogeneous text network according to the text information through a preset mapping relation;

computing, through a preset node influence relation, a node influence value for each text node in the heterogeneous text network, and obtaining target labels according to the node influence values;

computing, through a preset total similarity relation, a total similarity value between texts in the heterogeneous text network, and obtaining target text nodes according to the total similarity values;

and propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain the clustering result clusters.

Preferably, the performing word segmentation processing on the texts in the sample text set to obtain frequent words of each text specifically includes:

performing word segmentation and part-of-speech tagging on the texts in the sample text set through FNLP to obtain feature words;

performing a TF-IDF operation on the feature words to obtain their term frequency and inverse document frequency;

computing the weight of each feature word from its term frequency and inverse document frequency through a preset weight relation;

and comparing the weight of each feature word with a preset frequent-word threshold, obtaining target feature words according to the comparison result, and taking the target feature words as the frequent words of the text.

Preferably, the extracting text information of the text from the sample text set, and constructing a heterogeneous text network according to the text information through a preset mapping relationship specifically includes:

extracting text information of the text from the sample text set;

and, according to the text information and through a preset mapping relation, setting directed edges between the nodes linked by that information so as to construct a heterogeneous text network.

Preferably, computing the node influence values through the preset node influence relation and obtaining the target labels according to the node influence values specifically includes:

computing, through a preset node influence relation, the node influence value of each text node in the heterogeneous text network;

and comparing each node influence value with a preset node influence threshold, obtaining target texts according to the comparison result, and taking the frequent words of the target texts as target labels.

Preferably, computing the total similarity between texts in the heterogeneous text network through a preset total similarity relation and obtaining target text nodes according to the total similarity specifically includes:

constructing a frequent word-text matrix from the frequent words and the texts to obtain the vector of each text, and computing the intrinsic feature similarity between texts from the text vectors through a preset cosine similarity relation;

computing, in the heterogeneous text network, the extrinsic feature similarity between texts through a preset path similarity relation;

computing the total similarity of the texts from the intrinsic and extrinsic feature similarities through a preset total similarity relation;

and obtaining target text nodes according to the total similarity.

Preferably, obtaining target text nodes according to the total similarity specifically includes:

comparing the total similarity with a preset text total similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison result.

Preferably, propagating the target labels among the target text nodes and clustering the texts that carry the same target label to obtain the clustering result clusters specifically includes:

if a target text node is connected by a one-way directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of that edge;

if a target text node has no directed edge or has a bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence values and propagating the target labels among them in the resulting order;

and clustering the texts that carry the same target label to obtain the clustering result clusters.

In addition, to achieve the above object, the present invention further provides a terminal device, including a memory, a processor, and a label propagation clustering program stored on the memory and executable on the processor. When executed by the processor, the program implements the steps of the label propagation clustering method described above.

In addition, to achieve the above object, the present invention further provides a storage medium storing a label propagation clustering program which, when executed by a processor, implements the steps of the label propagation clustering method described above.

In addition, to achieve the above object, the present invention further provides a label propagation clustering device, including:

a frequent word acquisition module, configured to perform word segmentation on the texts in the sample text set to obtain the frequent words of each text;

a heterogeneous text network construction module, configured to extract text information of the texts from the sample text set and construct a heterogeneous text network from the text information through a preset mapping relation;

a target label acquisition module, configured to compute a node influence value for each text node in the heterogeneous text network through a preset node influence relation and obtain target labels according to the node influence values;

a target text node acquisition module, configured to compute a total similarity value between texts in the heterogeneous text network through a preset total similarity relation and obtain target text nodes according to the total similarity values;

and a propagation and clustering module, configured to propagate the target labels among the target text nodes and cluster the texts that carry the same target label to obtain the clustering result clusters.

In the invention, word segmentation is performed on the texts in the sample text set to obtain the frequent words of each text. Text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from it through a preset mapping relation. A node influence value is computed for each text node in the network through a preset node influence relation, and target labels are obtained according to the node influence values. A total similarity value between texts in the network is computed through a preset total similarity relation, and target text nodes are obtained according to the total similarity values. Finally, the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain the clustering result clusters. The technical scheme of the invention thereby addresses the high randomness of label propagation and the low accuracy and reliability of the clustering.

Drawings

Fig. 1 is a schematic structural diagram of a terminal device in the hardware operating environment according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of a first embodiment of the label propagation clustering method according to the present invention;

Fig. 3 is a schematic flow chart of a second embodiment of the label propagation clustering method according to the present invention;

Fig. 4 is a block diagram of a first embodiment of the label propagation clustering device according to the present invention.

The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a terminal device in a hardware operating environment according to an embodiment of the present invention.

As shown in Fig. 1, the terminal device may include a processor 1001 such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen (Display) and may optionally further include a standard wired interface and a wireless interface. In the present invention, the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. Alternatively, the memory 1005 may be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the terminal device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in Fig. 1, the memory 1005 is a computer storage medium. It may contain an operating system, a network communication module, a user interface module, and a label propagation clustering program.

In the terminal device shown in Fig. 1, the network interface 1004 is mainly used to connect to a backend server and exchange data with it. The user interface 1003 is mainly used to connect a peripheral and exchange data with it. The terminal device calls the label propagation clustering program stored in the memory 1005 through the processor 1001 and executes the label propagation clustering method provided by the embodiments of the present invention.

Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:

extracting text information of the texts from the sample text set;

and obtaining target text nodes according to the total similarity.

In this embodiment, word segmentation is performed on the texts in the sample text set to obtain the frequent words of each text. Text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from it through a preset mapping relation. A node influence value is computed for each text node in the network through a preset node influence relation, and target labels are obtained according to the node influence values. A total similarity value between texts in the network is computed through a preset total similarity relation, and target text nodes are obtained according to the total similarity values. Finally, the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain the clustering result clusters. The technical scheme of the invention thereby addresses the high randomness of label propagation and the low accuracy and reliability of the clustering.

Based on the above hardware structure, embodiments of the label propagation clustering method of the present invention are proposed.

Referring to Fig. 2, which is a schematic flow chart of the first embodiment of the label propagation clustering method, the first embodiment of the label propagation clustering method is proposed.

In the first embodiment, the label propagation clustering method includes the following steps:

Step S10: performing word segmentation on the texts in the sample text set to obtain the frequent words of each text.

It is understood that, in this embodiment, a text is a written representation of language. Linguistically, it is usually a sentence or a combination of sentences with a complete and coherent meaning. A text may be a sentence, a paragraph, or a chapter.

In a specific implementation, a sample text set is collected in advance, and its texts are segmented and part-of-speech tagged to obtain feature words. The term frequency and inverse document frequency of the feature words are then computed, and the frequent words of each text are obtained through the preset weight relation.

Step S20: extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information through a preset mapping relation.

It should be noted that, in this embodiment, the text information includes the follow relationships among the authors of the texts and the liking, forwarding, and citation of the texts.

In a specific implementation, text information of the texts is extracted from the sample text set. Then, according to the text information and through a preset mapping relation, directed edges are set between the nodes linked by that information to construct a heterogeneous text network.

Step S30: computing, through a preset node influence relation, a node influence value for each text node in the heterogeneous text network, and obtaining target labels according to the node influence values.

It should be noted that, in this embodiment, each node influence value is compared with a preset node influence threshold, and target texts are obtained according to the comparison result. The frequent words of the target texts are used as target labels.

Step S40: computing, through a preset total similarity relation, a total similarity value between texts in the heterogeneous text network, and obtaining target text nodes according to the total similarity values.

It should be noted that, in this embodiment, the intrinsic feature similarity is obtained from the frequent words through the preset cosine similarity relation, and the extrinsic feature similarity is obtained through the preset path similarity relation. Finally, the total similarity of the texts is computed from the intrinsic and extrinsic feature similarities through the preset total similarity relation to obtain target text nodes.

Step S50: propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain the clustering result clusters.

It should be noted that, in this embodiment, a label propagation algorithm is used. The target labels are propagated among the target text nodes, and the texts that carry the same target label are then clustered into clustering result clusters, which completes the process.

It is worth noting that this embodiment introduces a weighted directed heterogeneous text network and mines multi-dimensional text features for the similarity calculation, which improves the accuracy and reliability of the clustering result.

In the first embodiment, word segmentation is performed on the texts in the sample text set to obtain the frequent words of each text. Text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from it through a preset mapping relation. A node influence value is computed for each text node in the network through a preset node influence relation, and target labels are obtained according to the node influence values. A total similarity value between texts in the network is computed through a preset total similarity relation, and target text nodes are obtained according to the total similarity values. Finally, the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain the clustering result clusters. The technical scheme of the invention thereby addresses the high randomness of label propagation and the low accuracy and reliability of the clustering.

Referring to Fig. 3, which is a schematic flow chart of the second embodiment of the label propagation clustering method of the present invention, the second embodiment is proposed on the basis of the first embodiment shown in Fig. 2.

In the second embodiment, the step S10 specifically includes:

Step S11: performing word segmentation and part-of-speech tagging on the texts in the sample text set through FNLP (a machine-learning-based toolkit for Chinese natural language processing) to obtain feature words. A TF-IDF operation is then performed on the feature words to obtain their term frequency and inverse document frequency. TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining; TF is the term frequency and IDF is the inverse document frequency.

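In this embodiment, the TF-IDF operation uses the following formulas:

$$tf_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}}$$

and

$$idf_i = \log \frac{|D|}{|\{ j : t_i \in d_j \}|}$$

These formulas give the term frequency $tf_{ij}$ and the inverse document frequency $idf_i$, where $i$ and $j$ are positive integers. Here $n_{ij}$ is the number of occurrences of feature word $t_i$ in text $d_j$, the denominator of $tf_{ij}$ is the total number of word occurrences in $d_j$, $|D|$ is the number of texts in the sample text set, and $|\{ j : t_i \in d_j \}|$ is the number of texts that contain $t_i$.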

Step S12: computing the weight of each feature word from its term frequency and inverse document frequency through a preset weight relation; then comparing each weight with a preset frequent-word threshold, obtaining target feature words according to the comparison result, and taking the target feature words as the frequent words of the text.

It should be noted that, in this embodiment, the preset weight relation is $w_i = tf_{ij} \times idf_i$, which gives the weight $w_i$ of each feature word. Each weight $w_i$ is compared with the preset frequent-word threshold, and the feature words whose weight $w_i$ exceeds that threshold become the frequent words $f_i$ of the text.
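The weighting and frequent-word selection of Steps S11 and S12 can be sketched as follows. This is a minimal Python sketch, not the patent's implementation. It assumes the texts have already been segmented into token lists (the FNLP toolkit itself is not used here) and applies the standard TF-IDF form with a natural logarithm. The names `tf_idf_weights` and `frequent_words`, the sample tokens, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: w_i} dict per text, with w_i = tf_ij * idf_i.

    docs: list of token lists, one per text (segmentation assumed done).
    """
    n_docs = len(docs)
    # Document frequency: number of texts that contain each word at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        doc_weights = {}
        for word, n_ij in counts.items():
            tf = n_ij / total                    # term frequency in this text
            idf = math.log(n_docs / df[word])    # inverse document frequency
            doc_weights[word] = tf * idf
        weights.append(doc_weights)
    return weights


def frequent_words(docs, threshold):
    """Keep, for each text, the words whose weight exceeds the threshold."""
    return [{w for w, v in ws.items() if v > threshold}
            for ws in tf_idf_weights(docs)]


docs = [["pest", "leaf", "leaf"], ["pest", "root"], ["rain"]]
print(frequent_words(docs, threshold=0.3))  # [{'leaf'}, {'root'}, {'rain'}]
```

In this example, "pest" appears in two of the three texts, so its low idf keeps it below the threshold. A word that occurs in every text gets idf = 0 and can never become a frequent word.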

Further, the step S20 specifically includes:

Step S21: extracting text information of the texts from the sample text set.

It should be noted that, in this embodiment, the text information includes the follow relationships among the authors of the texts and the liking, forwarding, and citation of the texts. Each text and each of its authors is taken as a node.

Step S22: according to the text information and through a preset mapping relation, setting directed edges between the nodes linked by that information so as to construct a heterogeneous text network.

It should be noted that, in this embodiment, directed edges are added between nodes that have one of the preset mapping relationships:

- between two author nodes marked as having a follow relationship;
- between an author node and a text node that the author has forwarded;
- between two text nodes marked as having a citation relationship.

In addition, consider two author nodes that are not marked as having a follow relationship. If one author likes or comments on the texts of the other, and the proportion of such texts exceeds a preset follow-probability threshold, a directed edge is also added between them. In abstract form:

If (u_i comments on or likes d_j)
{
    add edge u_i → d_j to the network;
}
If (u_i follows u_j)
{
    add edge u_i → u_j to the network;
}
Else if (u_i does not follow u_j and the probability that u_i follows u_j is greater than the preset follow-probability threshold)
{
    add edge u_i → u_j to the network;
}

A two-dimensional heterogeneous text network is constructed according to these rules. The correspondence between relationships and edges in the network is shown in the following table:

Network relationship	Representation
Author u₁ publishes text d₁	u₁-d₁
Author u₁ follows author u₂	E_u12: u₁ → u₂
Author u₁ likes or comments on text d₄	E_ud14: u₁ → d₄
Text d₁ cites text d₂	E_d12: d₁ → d₂

It is easy to see that a multi-dimensional heterogeneous text network can likewise be constructed from more node types and their feature information.
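The edge rules of Step S22 can be illustrated with a small Python sketch that uses plain sets of (source, target) pairs. Several choices here are assumptions made for illustration:

- the input layout and the function name `build_network`;
- the reading of the "proportion of texts" rule as the share of u_j's published texts that u_i has liked or commented on;
- treating the undirected publication link u-d as node ownership rather than as a directed edge.

```python
from collections import defaultdict


def build_network(published, follows, interactions, forwards, citations,
                  follow_prob_threshold):
    """Return the set of directed edges (src, dst) of the text network.

    published:    dict author -> set of text ids the author published
    follows:      set of (u_i, u_j): u_i follows u_j
    interactions: set of (u_i, d_j): u_i liked or commented on d_j
    forwards:     set of (u_i, d_j): u_i forwarded d_j
    citations:    set of (d_i, d_j): d_i cites d_j
    """
    edges = set(follows) | set(interactions) | set(forwards) | set(citations)

    # Texts each author has liked or commented on.
    touched = defaultdict(set)
    for author, text in interactions:
        touched[author].add(text)

    # Inferred follow edges: u_i does not follow u_j, but u_i interacts
    # with more than the threshold share of u_j's texts.
    for u_i in set(published) | set(touched):
        for u_j, texts in published.items():
            if u_i == u_j or (u_i, u_j) in follows or not texts:
                continue
            share = len(touched[u_i] & texts) / len(texts)
            if share > follow_prob_threshold:
                edges.add((u_i, u_j))
    return edges


edges = build_network(
    published={"u1": {"d1"}, "u2": {"d2", "d3"}},
    follows=set(),
    interactions={("u1", "d2"), ("u1", "d3")},
    forwards=set(),
    citations={("d2", "d1")},
    follow_prob_threshold=0.5,
)
print(sorted(edges))
```

In this example, u1 has interacted with both of u2's texts (share 1.0 > 0.5), so an inferred edge u1 → u2 is added alongside the explicit ones.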

Further, the step S30 specifically includes:

Step S31: computing, through a preset node influence relation, the node influence value of each text node in the heterogeneous text network.

It should be noted that, in this embodiment, the preset node influence relation is the following iterative formula:

$$s_i(t+1) = \sum_{j=1}^{N+1} \frac{a_{ji}}{k_j}\, s_j(t)$$

This formula gives the node influence value. Here $a_{ji} = 1$ if node $j$ is directly connected to node $i$, and $a_{ji} = 0$ otherwise. $k_j$ is the degree of the $j$-th node, so $a_{ji}/k_j$ is the probability that a random walk at node $j$ moves to node $i$. In the initial state, $s_i(0) = 1$ for every node except the initial node $g$, and $s_g(0) = 0$. Finally, the node influence value of node $g$ is distributed evenly over the other $N$ nodes:

$$s_i = s_i(t_c) + \frac{s_g(t_c)}{N}$$

where $s_g(t_c)$ is the node influence value of node $g$ at steady state and $t_c$ is the number of iterations needed to reach convergence.
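The iteration in Step S31 uses an extra node g linked to every node, unit initial scores with $s_g(0) = 0$, and an even redistribution of $s_g$ at convergence. This closely resembles the LeaderRank algorithm, and the Python sketch below follows that reading, which is an assumption. The sketch makes four further assumptions: it treats $k_j$ as the out-degree of node $j$ in the directed network, links g and every node in both directions, and stops once the largest score change falls below `tol`. The name `leader_rank` and its parameters are hypothetical.

```python
def leader_rank(nodes, edges, tol=1e-10, max_iter=10_000):
    """Node influence via a LeaderRank-style random walk.

    nodes: list of node ids; edges: iterable of directed (src, dst) pairs
    whose endpoints are all in `nodes`.
    """
    ground = object()                  # extra node g; cannot collide with ids
    out = {v: {ground} for v in nodes}     # every node links to g
    out[ground] = set(nodes)               # g links to every node
    for src, dst in edges:
        out[src].add(dst)

    # Initial state: s_i(0) = 1 for ordinary nodes, s_g(0) = 0.
    score = {v: 1.0 for v in nodes}
    score[ground] = 0.0
    for _ in range(max_iter):
        new = {v: 0.0 for v in out}
        for j, targets in out.items():
            share = score[j] / len(targets)   # (a_ji / k_j) * s_j(t)
            for i in targets:
                new[i] += share
        delta = max(abs(new[v] - score[v]) for v in out)
        score = new
        if delta < tol:
            break

    # Spread g's steady-state score evenly over the N ordinary nodes.
    n = len(nodes)
    return {v: score[v] + score[ground] / n for v in nodes}


scores = leader_rank(["a", "b", "c"], [("a", "b"), ("c", "b")])
print(max(scores, key=scores.get))  # b
```

Both a and c point to b, so b ends with the highest influence. If a graph has no edges besides those to g, the walk is periodic and never settles, which is why `max_iter` bounds the loop.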

Step S32: comparing each node influence value with the preset node influence threshold, obtaining target texts according to the comparison result, and taking the frequent words of the target texts as target labels.

It should be noted that, in this embodiment, the texts of the text nodes whose node influence value exceeds the preset node influence threshold are taken as target texts. The frequent words of these target texts are used as target labels.
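Step S32 reduces to a threshold filter, sketched below in Python. The node influence values and per-text frequent words are assumed to be available as plain dictionaries, and `select_target_labels` is a hypothetical name. Ordering the result by descending influence is an added convenience that the text does not require.

```python
def select_target_labels(influence, frequent, threshold):
    """Return {target text: its frequent words}, highest influence first.

    influence: text id -> node influence value
    frequent:  text id -> set of frequent words of that text
    """
    targets = sorted((t for t, s in influence.items() if s > threshold),
                     key=lambda t: influence[t], reverse=True)
    return {t: frequent.get(t, set()) for t in targets}


influence = {"d1": 1.5, "d2": 0.4, "d3": 1.1}
frequent = {"d1": {"aphid"}, "d2": {"rain"}, "d3": {"blight"}}
labels = select_target_labels(influence, frequent, threshold=1.0)
print(labels)  # {'d1': {'aphid'}, 'd3': {'blight'}}
```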

Further, the step S40 specifically includes:

Step S41: constructing a frequent word-text matrix from the frequent words and the texts to obtain the vector of each text, and computing the intrinsic feature similarity between texts from the text vectors through a preset cosine similarity relation.

It should be noted that, in this embodiment, a frequent word-text matrix M is constructed from the mined frequent words $f_i$ and the texts. M is a 0-1 matrix whose rows correspond to the frequent words $f_i$ and whose columns correspond to the texts $d_j$. Each entry is assigned according to whether the text contains the frequent word, in the following abstract form:

if (frequent word f_i ∈ d_j)
{
    M[i][j] = 1;
}
else
{
    M[i][j] = 0;
}

Each text $d_j$ is thus represented by an n-dimensional vector of 0s and 1s, for example $d_j = \{1, 0, \ldots\}$, where n is the number of frequent words. The intrinsic feature similarity $S_{In}(d_i, d_j)$ between texts is then computed with the preset cosine similarity relation:

$$S_{In}(d_i, d_j) = \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert} = \frac{\sum_{k=1}^{n} d_{ik} d_{jk}}{\sqrt{\sum_{k=1}^{n} d_{ik}^2} \, \sqrt{\sum_{k=1}^{n} d_{jk}^2}}$$

That is, the similarity is the cosine of the angle between the two n-dimensional text vectors.
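Step S41 can be sketched in Python as follows. The matrix rows follow the frequent words $f_i$, the columns follow the texts $d_j$, and a text vector is a column of M. The function names and sample words are hypothetical. Returning 0.0 for an all-zero vector is an added convention, since the cosine is undefined there.

```python
import math


def word_text_matrix(frequent_words, texts):
    """M[i][j] = 1 if frequent word f_i occurs in text d_j, else 0."""
    return [[1 if f in d else 0 for d in texts] for f in frequent_words]


def text_vector(M, j):
    """The n-dimensional 0-1 vector of text d_j (column j of M)."""
    return [row[j] for row in M]


def cosine(u, v):
    """Intrinsic feature similarity: cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


words = ["aphid", "leaf", "rain"]
texts = [{"aphid", "leaf"}, {"leaf"}, {"rain"}]
M = word_text_matrix(words, texts)
print(round(cosine(text_vector(M, 0), text_vector(M, 1)), 4))  # 0.7071
```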

Step S42: computing, in the heterogeneous text network, the extrinsic feature similarity between texts through a preset path similarity relation.

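It should be noted that, in this embodiment, meta paths composed of the weighted directed edges are used as the basis. A meta path has the form

$$P = A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$$

Each relation $R_l$ on the text information carries an attribute function $\delta_l(R_l)$ with a fixed value. The similarity between author nodes is calculated with the preset path similarity relation, which yields the extrinsic feature similarity $S_{Out}(d_i, d_j)$ of the texts:

$$s(x, y) = \frac{2 \times |\{ p_{x \rightsquigarrow y} : p_{x \rightsquigarrow y} \in P \}|}{|\{ p_{x \rightsquigarrow x} : p_{x \rightsquigarrow x} \in P \}| + |\{ p_{y \rightsquigarrow y} : p_{y \rightsquigarrow y} \in P \}|}$$

where $P$ is the meta path, $x$ and $y$ are objects of the same type, and $p_{x \rightsquigarrow y}$ is a path instance of $P$ between $x$ and $y$.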
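Suppose the path similarity relation takes the PathSim form: twice the number of x-y path instances divided by the sum of the x-x and y-y counts. It can then be computed from a commuting matrix, as in the Python sketch below. The sketch takes one relation as a (possibly δ-weighted) adjacency matrix A from objects of type X (e.g. texts) to objects of type Y (e.g. authors who liked or commented on them). It forms M = A·Aᵀ for the symmetric meta path X → Y → X and applies the PathSim ratio. The choice of meta path, the weighting, and all names here are assumptions.

```python
def commuting_matrix(A):
    """M = A · A^T: (weighted) counts of X -> Y -> X path instances."""
    n = len(A)
    return [[sum(a * b for a, b in zip(A[x], A[y])) for y in range(n)]
            for x in range(n)]


def path_sim(M, x, y):
    """PathSim: 2 * |paths x~>y| / (|paths x~>x| + |paths y~>y|)."""
    denom = M[x][x] + M[y][y]
    return 2 * M[x][y] / denom if denom else 0.0


# Rows: texts d0..d2; columns: authors u0..u2 who liked/commented on them.
A = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
M = commuting_matrix(A)
print(path_sim(M, 0, 1))
```

Here d0 and d1 share one interacting author, while d0 has two, so their similarity is 2·1/(2+1) = 2/3. An object is always fully similar to itself.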

Step S43: computing the total similarity of the texts from the intrinsic and extrinsic feature similarities through a preset total similarity relation; then comparing the total similarity with a preset text total similarity threshold, and obtaining target text nodes in the heterogeneous text network according to the comparison result.

It should be noted that, in this embodiment, the preset total similarity relation is

$$S(d_i, d_j) = S_{In}(d_i, d_j) \cdot W_{In} + S_{Out}(d_i, d_j) \cdot W_{Out}$$

which gives the total similarity $S(d_i, d_j)$. Here $W_{In}$ and $W_{Out}$ are the weights assigned to the intrinsic and extrinsic feature similarity, respectively. The text nodes in the heterogeneous text network whose total similarity exceeds the preset text total similarity threshold are taken as target text nodes.
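The weighted combination and threshold of Step S43 can be sketched in Python as below. The text compares a pairwise similarity against a threshold but then selects nodes. This sketch therefore assumes that a text node becomes a target node if it belongs to at least one pair whose total similarity exceeds the threshold. The function name, weights, and threshold value are illustrative only.

```python
def select_target_nodes(pair_sims, w_in, w_out, threshold):
    """Combine intrinsic and extrinsic similarity and keep strong pairs.

    pair_sims: {(d_i, d_j): (S_In, S_Out)}
    Returns ({pair: S_total} above threshold, set of target text nodes).
    """
    kept = {}
    for pair, (s_in, s_out) in pair_sims.items():
        s_total = s_in * w_in + s_out * w_out   # S = S_In*W_In + S_Out*W_Out
        if s_total > threshold:
            kept[pair] = s_total
    nodes = {d for pair in kept for d in pair}
    return kept, nodes


sims = {("d0", "d1"): (0.8, 0.6), ("d0", "d2"): (0.0, 0.2)}
kept, nodes = select_target_nodes(sims, w_in=0.5, w_out=0.5, threshold=0.5)
print(sorted(nodes))  # ['d0', 'd1']
```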

Further, the step S50 specifically includes:

Step S51: if a target text node is connected by a one-way directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of that edge; then clustering the texts that carry the same target label to obtain the clustering result clusters.

Step S52: if a target text node has no directed edge or has a bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence values and propagating the target labels among them in the resulting order; then clustering the texts that carry the same target label to obtain the clustering result clusters.

It should be noted that, in this embodiment, the sorting result is obtained by sorting the target text nodes in descending order of their node influence values.
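The two propagation rules of Steps S51 and S52 can be sketched in Python as follows. The text does not spell out the exact update rule, so this is only one possible reading, and every name here is hypothetical. The sketch assumes the following:

- labels are hashable values (e.g. a single frequent word);
- a labelled node passes its label only to nodes that have none yet;
- S51 runs along one-way edges until nothing changes;
- S52 makes a single pass over the remaining pairs in descending order of node influence.

```python
from collections import defaultdict


def propagate_and_cluster(nodes, directed, undirected, seeds, influence):
    """Spread target labels over target text nodes and group by label.

    nodes:      target text node ids
    directed:   set of one-way edges (src, dst) between target nodes
    undirected: set of frozenset({a, b}) pairs with no edge or a two-way edge
    seeds:      node -> initial target label
    influence:  node -> node influence value
    """
    labels = dict(seeds)

    # S51: push labels along edge direction until nothing changes.
    changed = True
    while changed:
        changed = False
        for src, dst in sorted(directed):
            if src in labels and dst not in labels:
                labels[dst] = labels[src]
                changed = True

    # S52: visit nodes by descending influence; a labelled node hands its
    # label to unlabelled neighbours.
    neighbours = defaultdict(set)
    for pair in undirected:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    for node in sorted(nodes, key=lambda n: influence.get(n, 0.0), reverse=True):
        if node in labels:
            for nb in sorted(neighbours[node]):
                labels.setdefault(nb, labels[node])

    # Cluster: texts that carry the same target label form one cluster.
    clusters = defaultdict(set)
    for node in nodes:
        if node in labels:
            clusters[labels[node]].add(node)
    return dict(clusters)


result = propagate_and_cluster(
    nodes=["d1", "d2", "d3", "d4", "d5"],
    directed={("d1", "d2"), ("d2", "d3")},
    undirected={frozenset({"d4", "d5"})},
    seeds={"d1": "aphid", "d4": "blight"},
    influence={"d1": 2.0, "d2": 1.0, "d3": 0.5, "d4": 1.5, "d5": 0.2},
)
print(result)
```

S52 fixes the visiting order by node influence. This contrasts with classic label propagation, which updates nodes in random order, and is presumably how the scheme aims to reduce propagation randomness.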

In the second embodiment, word segmentation is performed on the texts in the sample text set to obtain the frequent words of each text. Text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from it through a preset mapping relation. A node influence value is computed for each text node in the network through a preset node influence relation, and target labels are obtained according to the node influence values. A total similarity value between texts in the network is computed through a preset total similarity relation, and target text nodes are obtained according to the total similarity values. Finally, the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain the clustering result clusters. The technical scheme of the invention thereby addresses the high randomness of label propagation and the low accuracy and reliability of the clustering.

In addition, an embodiment of the present invention further provides a storage medium, where a tag propagation clustering program is stored on the storage medium, and when executed by a processor, the tag propagation clustering program implements the following operations:

Further, when executed by the processor, the label propagation clustering program also implements the following operations:

extracting text information of the text from the sample text set;

and acquiring a target text node according to the total similarity threshold.

according to the total similarity threshold value;

In addition, referring to fig. 4, an embodiment of the present invention further provides a label propagation clustering apparatus, which includes:

a frequent word obtaining module 10, configured to perform word segmentation on the texts in the sample text set to obtain the frequent words of each text;

a heterogeneous text network construction module 20, configured to extract text information of the texts from the sample text set and construct a heterogeneous text network from the text information through a preset mapping relationship;

a target label obtaining module 30, configured to generate node influence thresholds for the corresponding text nodes in the heterogeneous text network through a preset node influence relationship, and obtain target labels according to the node influence thresholds;

a target text node obtaining module 40, configured to generate a total similarity threshold between the texts in the heterogeneous text network through a preset total similarity relationship, and obtain target text nodes according to the total similarity threshold; and

a propagation and clustering module 50, configured to propagate the target labels among the target text nodes and cluster the texts corresponding to the same target label to obtain clustering result clusters.
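To make the data flow through modules 10 to 50 concrete, here is a minimal end-to-end Python sketch. This section of the patent does not give the preset mapping relation, node influence relationship, or total similarity relation, so every one of them below is an assumption and not the patent's method:
- whitespace tokenisation stands in for word segmentation;
- the heterogeneous network links text nodes to word nodes;
- a text's influence is the number of text-word-text paths leaving it;
- texts whose influence reaches a threshold become seeds labelled with their own id;
- "total similarity" is replaced by Jaccard similarity of frequent-word sets.

All function names, example texts, and threshold values are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Set


def frequent_words(text: str, min_count: int = 1) -> Set[str]:
    """Module 10 (assumed): whitespace split stands in for word segmentation."""
    counts = Counter(text.lower().split())
    return {w for w, c in counts.items() if c >= min_count}


def build_network(words_by_doc: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Module 20 (assumed mapping): text nodes linked to word nodes 'w:<word>'."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for doc, words in words_by_doc.items():
        for w in words:
            graph[doc].add("w:" + w)
            graph["w:" + w].add(doc)
    return graph


def node_influence(graph: Dict[str, Set[str]], docs: List[str]) -> Dict[str, int]:
    """Module 30 (assumed): count text-word-text paths that leave each text node."""
    return {d: sum(len(graph[w]) - 1 for w in graph[d]) for d in docs}


def similarity_graph(words_by_doc: Dict[str, Set[str]], threshold: float) -> Dict[str, List[str]]:
    """Module 40 (assumed): link texts whose Jaccard similarity reaches the threshold."""
    adj: Dict[str, List[str]] = {d: [] for d in words_by_doc}
    for a, b in combinations(sorted(words_by_doc), 2):
        union = words_by_doc[a] | words_by_doc[b]
        if union and len(words_by_doc[a] & words_by_doc[b]) / len(union) >= threshold:
            adj[a].append(b)
            adj[b].append(a)
    return adj


def propagate(adj: Dict[str, List[str]], infl: Dict[str, int],
              seeds: Dict[str, str], max_iter: int = 20) -> Dict[str, List[str]]:
    """Module 50: propagate labels in descending-influence order, then group texts."""
    labels: Dict[str, Optional[str]] = {d: seeds.get(d) for d in adj}
    order = sorted(adj, key=lambda d: (-infl[d], d))  # descending influence, ties by id
    for _ in range(max_iter):
        changed = False
        for d in order:
            if d in seeds:
                continue  # assumption: seed texts keep their target label
            votes: Counter = Counter()
            for nb in adj[d]:
                if labels[nb] is not None:
                    votes[labels[nb]] += infl[nb]  # assumption: influence-weighted vote
            if votes:
                best = max(votes, key=lambda lab: (votes[lab], lab))
                if best != labels[d]:
                    labels[d] = best
                    changed = True
        if not changed:
            break
    clusters: Dict[str, List[str]] = defaultdict(list)
    for d, lab in labels.items():
        if lab is not None:
            clusters[lab].append(d)
    return dict(clusters)


def cluster_texts(texts: Dict[str, str], influence_threshold: int,
                  similarity_threshold: float) -> Dict[str, List[str]]:
    words = {d: frequent_words(t) for d, t in texts.items()}
    infl = node_influence(build_network(words), list(texts))
    seeds = {d: d for d, v in infl.items() if v >= influence_threshold}  # target labels
    adj = similarity_graph(words, similarity_threshold)                  # target text nodes
    return propagate(adj, infl, seeds)


texts = {
    "d1": "apple banana fruit",
    "d2": "banana fruit juice",
    "d3": "apple fruit pie",
    "d4": "rust compiler code",
    "d5": "compiler code bug",
    "d6": "code bug fix",
}
print(cluster_texts(texts, influence_threshold=4, similarity_threshold=0.3))
# {'d1': ['d1', 'd2', 'd3'], 'd5': ['d4', 'd5', 'd6']}
```

With these toy inputs, texts d1 and d5 reach the influence threshold and become seeds, and the two topical groups separate cleanly. Real Chinese text would need a proper word segmenter in place of `str.split`. The patent's own influence and similarity formulas would replace the assumed ones.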

For other embodiments or specific implementations of the label propagation clustering apparatus of the present invention, refer to the method embodiments above. They are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words are to be interpreted as names.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and can certainly also be implemented by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A label propagation clustering method is characterized by comprising the following steps:

propagating the target labels among the target text nodes, and clustering texts corresponding to the same target labels to obtain a clustering result cluster;

wherein propagating the target labels among the target text nodes and clustering the texts corresponding to the same target label to obtain a clustering result cluster specifically includes:

2. The label propagation clustering method according to claim 1, wherein performing word segmentation on the texts in the sample text set to obtain the frequent words of each text specifically comprises:

3. The label propagation clustering method according to claim 1, wherein extracting text information of the texts from the sample text set and constructing a heterogeneous text network from the text information through a preset mapping relationship specifically comprises:

extracting text information of the text from the sample text set;

4. The label propagation clustering method according to any one of claims 1 to 3, wherein generating node influence threshold values for the corresponding text nodes in the heterogeneous text network through a preset node influence relationship, and obtaining a target label according to the node influence threshold values, specifically comprises:

5. The label propagation clustering method according to any one of claims 1 to 3, wherein generating a total similarity threshold between the texts in the heterogeneous text network through a preset total similarity relationship, and obtaining a target text node according to the total similarity threshold, specifically comprises:

and acquiring a target text node according to the total similarity threshold.

6. The label propagation clustering method according to claim 5, wherein obtaining the target text node according to the total similarity threshold specifically comprises:

according to the total similarity threshold value;

7. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a label propagation clustering program stored on the memory and executable on the processor, wherein the label propagation clustering program, when executed by the processor, implements the steps of the label propagation clustering method according to any one of claims 1 to 6.

8. A storage medium, characterized in that a label propagation clustering program is stored on the storage medium, and the label propagation clustering program, when executed by a processor, implements the steps of the label propagation clustering method according to any one of claims 1 to 6.

9. A label propagation clustering device, characterized in that the label propagation clustering device comprises:

the propagation and clustering module is used for propagating the target labels among the target text nodes and clustering texts corresponding to the same target labels to obtain a clustering result cluster;

the propagation and clustering module is further configured to, when the target text nodes are connected by a directed edge in the heterogeneous text network, propagate the target label between the target text nodes along the direction of the directed edge;

the propagation and clustering module is further configured to, when the target text nodes are connected by an undirected or bidirectional edge in the heterogeneous text network, rank the target text nodes according to their corresponding node influence thresholds to obtain a ranking result, and propagate the target label among the target text nodes according to the ranking result;

and the propagation and clustering module is also used for clustering the texts corresponding to the same target label to obtain a clustering result cluster.