CN110442674A - Label propagation clustering method, terminal device, storage medium, and apparatus - Google Patents

Label propagation clustering method, terminal device, storage medium, and apparatus

Info

Publication number
CN110442674A
CN110442674A (application CN201910504157.0A)
Authority
CN
China
Prior art keywords
text
node
target
propagated
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910504157.0A
Other languages
Chinese (zh)
Other versions
CN110442674B (en)
Inventor
尹帆
张广凯
宋中山
覃俊
郑禄
吴经龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN201910504157.0A
Publication of CN110442674A
Application granted
Publication of CN110442674B
Active (current legal status)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes

Abstract

The invention discloses a label propagation clustering method, a terminal device, a storage medium, and an apparatus. The method comprises: obtaining the frequent words of each text; extracting text information of the texts from a sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships; generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds; generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds; propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.

Description

Label propagation clustering method, terminal device, storage medium, and apparatus
Technical field
The present invention relates to the field of label propagation and clustering technology, and in particular to a label propagation clustering method, a terminal device, a storage medium, and an apparatus.
Background technique
At present, fields such as agricultural production, information retrieval, and finance all require large amounts of data to be processed before they can be used, and labels are usually propagated over the data before it is clustered. For example, when analyzing crop pests and diseases, the damaged crops must first be labeled with the observed damage, and it must then be judged which kind of pest the damage belongs to; a label propagation algorithm can be used to cluster such phenomena to obtain a result, so that a remedy for the pest can finally be applied. However, this label propagation algorithm suffers from randomness, and the accuracy and confidence of clustering the labeled data are not high.
The above content is provided only to help understand the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a label propagation clustering method, a terminal device, a storage medium, and an apparatus, aiming to solve the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
To achieve the above object, the present invention provides a label propagation clustering method, which comprises the following steps:
performing word segmentation on the texts in a sample text set to obtain the frequent words of each text;
extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships;
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds;
generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds;
propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters.
Preferably, performing word segmentation on the texts in the sample text set to obtain the frequent words of each text specifically includes:
performing word segmentation and part-of-speech tagging on the texts in the sample text set with FNLP to obtain feature words;
performing a TF-IDF operation on the feature words to obtain the term frequency and inverse document frequency of the feature words;
generating weight thresholds of the feature words from the term frequency and the inverse document frequency according to a preset weight correspondence;
comparing the weight thresholds of the feature words with a preset frequent word threshold, obtaining target feature words according to the comparison results, and taking the target feature words as the frequent words of the texts.
Preferably, extracting the text information of the texts from the sample text set and constructing a heterogeneous text network from the text information according to the preset mapping relationships specifically includes:
extracting the text information of the texts from the sample text set;
setting directed edges between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
Preferably, generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to the preset node influence relationship and obtaining target labels according to the node influence thresholds specifically includes:
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to the preset node influence relationship;
comparing the node influence thresholds with a preset node influence threshold, obtaining target texts according to the comparison results, and taking the frequent words of the target texts as the target labels.
Preferably, generating total similarity thresholds between the texts in the heterogeneous text network according to the preset total similarity relationship and obtaining target text nodes according to the total similarity thresholds specifically includes:
constructing a frequent word-text matrix from the frequent words and the texts to obtain the text vector of each text, and generating the internal feature similarity thresholds between the texts from the text vectors according to a preset cosine similarity relationship;
generating the external feature similarity thresholds between the texts in the heterogeneous text network according to a preset path similarity relationship;
generating the total similarity thresholds of the texts from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship;
obtaining the target text nodes according to the total similarity thresholds.
Preferably, obtaining target text nodes according to the total similarity thresholds specifically includes:
according to the total similarity thresholds, comparing the total similarity thresholds with a preset text total similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison results.
Preferably, propagating the target labels among the target text nodes and clustering the texts that carry the same target label to obtain cluster result clusters specifically includes:
if a target text node lies on a directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of the directed edge;
if a target text node lies on an undirected or bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence thresholds to obtain a ranking, and propagating the target labels among the target text nodes according to the ranking;
clustering the texts that carry the same target label to obtain cluster result clusters.
In addition, to achieve the above object, the present invention also proposes a terminal device, which includes a memory, a processor, and a label propagation clustering program stored on the memory and executable on the processor, wherein the label propagation clustering program, when executed by the processor, implements the steps of the label propagation clustering method described above.
In addition, to achieve the above object, the present invention also proposes a storage medium on which a label propagation clustering program is stored, wherein the label propagation clustering program, when executed by a processor, implements the steps of the label propagation clustering method described above.
In addition, to achieve the above object, the present invention also proposes a label propagation clustering apparatus, which includes:
a frequent word obtaining module, configured to perform word segmentation on the texts in a sample text set to obtain the frequent words of each text;
a heterogeneous text network construction module, configured to extract text information of the texts from the sample text set and construct a heterogeneous text network from the text information according to preset mapping relationships;
a target label obtaining module, configured to generate node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship and obtain target labels according to the node influence thresholds;
a target text node obtaining module, configured to generate total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship and obtain target text nodes according to the total similarity thresholds;
a propagation and clustering module, configured to propagate the target labels among the target text nodes and cluster the texts that carry the same target label to obtain cluster result clusters.
In the present invention, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of a terminal device in the hardware running environment involved in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the label propagation clustering method of the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the label propagation clustering method of the present invention;
Fig. 4 is a structural block diagram of a first embodiment of the label propagation clustering apparatus of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a terminal device in the hardware running environment involved in an embodiment of the present invention.
As shown in Fig. 1, the terminal device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display), and may optionally also include standard wired and wireless interfaces; in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (RAM), or a non-volatile memory (NVM) such as a magnetic disk storage; the memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not constitute a limitation of the terminal device, which may include more or fewer components than illustrated, or combine certain components, or have a different component layout.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a label propagation clustering program.
In the terminal device shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with the background server; the user interface 1003 is mainly used to connect to peripherals and exchange data with the peripherals; the terminal device calls, through the processor 1001, the label propagation clustering program stored in the memory 1005 and executes the label propagation clustering method provided by the embodiments of the present invention, which includes the following operations:
performing word segmentation on the texts in a sample text set to obtain the frequent words of each text;
extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships;
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds;
generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds;
propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
performing word segmentation and part-of-speech tagging on the texts in the sample text set with FNLP to obtain feature words;
performing a TF-IDF operation on the feature words to obtain the term frequency and inverse document frequency of the feature words;
generating weight thresholds of the feature words from the term frequency and the inverse document frequency according to a preset weight correspondence;
comparing the weight thresholds of the feature words with a preset frequent word threshold, obtaining target feature words according to the comparison results, and taking the target feature words as the frequent words of the texts.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
extracting the text information of the texts from the sample text set;
setting directed edges between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to the preset node influence relationship;
comparing the node influence thresholds with a preset node influence threshold, obtaining target texts according to the comparison results, and taking the frequent words of the target texts as the target labels.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
constructing a frequent word-text matrix from the frequent words and the texts to obtain the text vector of each text, and generating the internal feature similarity thresholds between the texts from the text vectors according to a preset cosine similarity relationship;
generating the external feature similarity thresholds between the texts in the heterogeneous text network according to a preset path similarity relationship;
generating the total similarity thresholds of the texts from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship;
obtaining the target text nodes according to the total similarity thresholds.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
according to the total similarity thresholds, comparing the total similarity thresholds with a preset text total similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison results.
Further, the processor 1001 may call the label propagation clustering program stored in the memory 1005 and also perform the following operations:
if a target text node lies on a directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of the directed edge;
if a target text node lies on an undirected or bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence thresholds to obtain a ranking, and propagating the target labels among the target text nodes according to the ranking;
clustering the texts that carry the same target label to obtain cluster result clusters.
In this embodiment, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
Based on the above hardware structure, embodiments of the label propagation clustering method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the label propagation clustering method of the present invention; the first embodiment of the label propagation clustering method of the present invention is proposed.
In the first embodiment, the label propagation clustering method comprises the following steps:
Step S10: performing word segmentation on the texts in a sample text set to obtain the frequent words of each text.
It should be understood that, in this embodiment, a text is a form of written language; from a literary point of view, it is usually a sentence or a combination of sentences with complete, systematic meaning; a text may be a sentence, a paragraph, or a chapter, which will not be enumerated here.
In a specific implementation, a sample text set is collected in advance, word segmentation and part-of-speech tagging are performed on the texts in the sample text set to obtain feature words, the term frequency and inverse document frequency of the feature words are obtained, and the frequent words of each text are then obtained according to the preset weight correspondence.
Step S20: extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships.
It should be noted that, in this embodiment, the text information includes follow relationships between the authors of the texts and information about likes, forwards, and citations of the texts, which will not be enumerated here.
In a specific implementation, the text information of the texts is extracted from the sample text set, and directed edges are set between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
Step S30: generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds.
It should be noted that, in this embodiment, the node influence thresholds are compared with a preset node influence threshold, target texts are obtained according to the comparison results, and the frequent words of the target texts are taken as the target labels.
Step S40: generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds.
It should be noted that, in this embodiment, the internal feature similarity thresholds are obtained from the frequent words according to the preset cosine similarity relationship; at the same time, the external feature similarity thresholds are obtained in the heterogeneous text network according to the preset path similarity relationship; finally, the total similarity thresholds of the texts are generated from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship, so as to obtain the target text nodes.
Step S50: propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters.
It should be noted that, in this embodiment, a label propagation algorithm is used: the target labels are propagated among the target text nodes, and finally the texts that carry the same target label are clustered to obtain cluster result clusters, at which point the whole process ends.
It is worth noting that this embodiment introduces a weighted directed heterogeneous text network and mines multidimensional features of the texts for similarity calculation, which improves the accuracy and confidence of the clustering results.
In the first embodiment, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the second embodiment of the label propagation clustering method of the present invention; based on the first embodiment shown in Fig. 2, the second embodiment of the label propagation clustering method of the present invention is proposed.
In the second embodiment, the step S10 specifically includes:
Step S11: performing word segmentation and part-of-speech tagging on the texts in the sample text set with FNLP (a machine-learning-based toolkit for Chinese natural language text processing) to obtain feature words; and performing a TF-IDF (term frequency-inverse document frequency, a common weighting technique for information retrieval and data mining, where TF means term frequency and IDF means inverse document frequency) operation on the feature words to obtain the term frequency and inverse document frequency of the feature words.
It should be noted that, in this embodiment, the TF-IDF operation, i.e. the standard calculation formulas tf_ij = n_ij / Σ_k n_kj and idf_i = log(|D| / |{j : t_i ∈ d_j}|), is used to obtain the term frequency tf_ij and the inverse document frequency idf_i, where n_ij is the number of occurrences of feature word t_i in text d_j, |D| is the number of texts in the sample text set, and i and j are positive integers.
Step S12: generating weight thresholds of the feature words from the term frequency and the inverse document frequency according to a preset weight correspondence; comparing the weight thresholds of the feature words with a preset frequent word threshold, obtaining target feature words according to the comparison results, and taking the target feature words as the frequent words of the texts.
It should be noted that, in this embodiment, the preset weight correspondence, i.e. the calculation formula w_i = tf_ij · idf_i, is used to obtain the weight threshold w_i of each feature word; the weight threshold w_i is compared with the preset frequent word threshold, and the feature words whose weight threshold w_i is greater than the preset frequent word threshold are mined as the frequent words f_i of the text.
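As a rough illustration of steps S11-S12, the sketch below computes tf-idf weights and keeps the words whose weight exceeds a frequent-word threshold; the tokenised input and the threshold value are placeholder assumptions (the patent itself uses FNLP for Chinese segmentation and part-of-speech tagging), not the patented implementation.

import math
from collections import Counter

def frequent_words(docs_tokens, threshold=0.1):
    """docs_tokens: list of token lists, one per text, after segmentation and POS tagging."""
    n_docs = len(docs_tokens)
    df = Counter()                                   # number of texts containing each word
    for tokens in docs_tokens:
        df.update(set(tokens))
    result = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1
        kept = set()
        for word, n in counts.items():
            tf = n / total                           # term frequency tf_ij
            idf = math.log(n_docs / df[word])        # inverse document frequency idf_i
            if tf * idf > threshold:                 # weight w_i = tf_ij * idf_i vs. frequent word threshold
                kept.add(word)
        result.append(kept)
    return result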
Further, the step S20 specifically includes:
Step S21: extracting the text information of the texts from the sample text set.
It should be noted that, in this embodiment, the text information includes follow relationships between the authors of the texts and information about likes, forwards, and citations of the texts, which will not be enumerated here; each text and its corresponding author are taken as nodes.
Step S22: setting directed edges between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
It should be noted that, in this embodiment, a new directed edge is added between two author nodes that have a follow relationship, between an author node and the text node it forwards, and between text nodes that have a citation relationship, i.e. between nodes that fall under the above preset mapping relationships. In addition, for author nodes without an explicit follow relationship, if the percentage of one author's texts that another author likes or comments on exceeds a preset follow probability threshold, a new directed edge is also added. This is represented abstractly as follows:
If (u_i likes or comments on d_j)
{
    add the edge u_i → d_j to the network
}
If (u_i follows u_j)
{
    add the edge u_i → u_j to the network
}
Else if (u_i does not follow u_j and the probability that u_i follows u_j is greater than the preset follow probability threshold)
{
    add the edge u_i → u_j to the network
}
A two-dimensional heterogeneous text network is constructed according to the above rules; the different edge types in the network correspond to the mapping relationships described above (likes/comments, follows, forwards, and citations).
It is easy to see that a multidimensional heterogeneous text network can likewise be constructed from multiple node types and their characteristic information, which will not be enumerated here.
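The edge-construction rules above can be sketched as follows; the input format, the helper name, and the follow probability threshold value are illustrative assumptions, and a plain set of (source, target) pairs stands in for the weighted directed network.

def build_heterogeneous_network(likes, follows, forwards, cites, texts_by_author,
                                follow_prob_threshold=0.5):
    """likes/follows/forwards/cites: iterables of (source, target) pairs; returns directed edges."""
    edges = set()
    for u, d in likes:                    # u_i likes or comments on d_j  ->  edge u_i -> d_j
        edges.add((u, d))
    for u, d in forwards:                 # forwarding author -> forwarded text
        edges.add((u, d))
    for d1, d2 in cites:                  # citing text -> cited text
        edges.add((d1, d2))
    for u1, u2 in follows:                # follower -> followed author
        edges.add((u1, u2))
    # implicit follow edge: u1 likes/comments on a large enough share of u2's texts
    liked_by = {}
    for u, d in likes:
        liked_by.setdefault(u, set()).add(d)
    for u1, liked in liked_by.items():
        for u2, docs in texts_by_author.items():
            if u1 != u2 and docs and len(liked & set(docs)) / len(docs) > follow_prob_threshold:
                edges.add((u1, u2))
    return edges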
Further, the step S30 specifically includes:
Step S31: generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to the preset node influence relationship.
It should be noted that, in this embodiment, the preset node influence relationship, i.e. the iterative formula s_i(t+1) = Σ_j (a_ij / k_j) · s_j(t), is used to obtain the node influence thresholds, where a_ij = 1 if node i and node j are directly connected and 0 otherwise, k_j is the degree of node j, and a_ij / k_j represents the random walk transition probability between node i and node j. In the initial state, s_i(0) = 1 for all nodes except a ground node g, and s_g(0) = 0. Finally, the node influence score of node g is distributed evenly to the other N nodes, with the calculation formula S_i = s_i(t_c) + s_g(t_c) · N⁻¹, where s_g(t_c) is the node influence score of node g in the steady state and t_c is the number of iterations to convergence.
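Read as a LeaderRank-style iteration, the influence computation described above can be sketched as follows; the ground-node name, the convergence tolerance, and the iteration cap are assumed parameters rather than values taken from the patent.

def node_influence(adj, nodes, ground="g", tol=1e-6, max_iter=1000):
    """adj: dict node -> set of neighbours; a ground node g (not in nodes) is linked to every node."""
    neigh = {v: set(adj.get(v, ())) | {ground} for v in nodes}
    neigh[ground] = set(nodes)
    s = {v: 1.0 for v in nodes}                      # s_i(0) = 1 for every node except g
    s[ground] = 0.0                                  # s_g(0) = 0
    for _ in range(max_iter):
        # s_i(t+1) = sum_j (a_ij / k_j) * s_j(t): each neighbour spreads its score evenly
        new = {i: sum(s[j] / len(neigh[j]) for j in neigh[i]) for i in neigh}
        if max(abs(new[v] - s[v]) for v in neigh) < tol:
            s = new
            break
        s = new
    n = len(nodes)
    return {v: s[v] + s[ground] / n for v in nodes}  # S_i = s_i(t_c) + s_g(t_c) / N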
Step S32: comparing the node influence thresholds with a preset node influence threshold, obtaining target texts according to the comparison results, and taking the frequent words of the target texts as the target labels.
It should be noted that, in this embodiment, for the text nodes whose node influence threshold is greater than the preset node influence threshold, the corresponding texts are mined as the target texts, and the frequent words of the target texts are taken as the target labels.
Further, the step S40 specifically includes:
Step S41: constructing a frequent word-text matrix from the frequent words and the texts to obtain the text vector of each text, and generating the internal feature similarity thresholds between the texts from the text vectors according to the preset cosine similarity relationship.
It should be noted that, in this embodiment, the mined frequent words f_i and the texts are used to construct a frequent word-text matrix M, where M is a 0-1 matrix.
Each entry of M is determined by whether the text contains the frequent word, represented abstractly as follows:
If (frequent word f_i ∈ d_j)
{
    M[i][j] = 1;
}
else
{
    M[i][j] = 0;
}
In this way, each text d_j is represented as an n-dimensional vector of 0s and 1s, e.g. d_j = {1, 0, …}; the preset cosine similarity relationship is then used to calculate the internal feature similarity threshold SIn_dij between the texts, where the calculation formula of the preset cosine similarity relationship is SIn_dij = (d_i · d_j) / (‖d_i‖ · ‖d_j‖), i.e. the cosine of the angle between each pair of n-dimensional text vectors.
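As a rough illustration of step S41, the sketch below builds the 0-1 frequent word-text matrix and computes the pairwise cosine similarities; the function name, the vocabulary ordering, and the dictionary output format are illustrative assumptions, not part of the patent.

import math

def internal_similarity(frequent_words_per_text):
    """frequent_words_per_text: list of sets, the frequent words of each text d_j."""
    vocab = sorted(set().union(*frequent_words_per_text))
    # 0-1 frequent word-text matrix M: entry is 1 if the frequent word occurs in the text
    vectors = [[1 if w in words else 0 for w in vocab] for words in frequent_words_per_text]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    n = len(vectors)
    return {(i, j): cosine(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)}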
Step S42: generating the external feature similarity thresholds between the texts in the heterogeneous text network according to the preset path similarity relationship.
It should be noted that, in this embodiment, in each weighted directed meta path, the attribute function δ_l(R_l) defined on the text information relationship R_l is a fixed value; the preset path similarity relationship is used to calculate the similarity between nodes, i.e. the external feature similarity of the texts SOut_dij is calculated with the formula SOut(x, y) = 2 · |{p_x⇝y : p ∈ P}| / (|{p_x⇝x : p ∈ P}| + |{p_y⇝y : p ∈ P}|), where P is the meta path and x and y are objects of the same type.
Step S43: generating the total similarity thresholds of the texts from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship; comparing the total similarity thresholds with a preset text total similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison results.
It should be noted that, in this embodiment, the preset total similarity relationship, i.e. the calculation formula S_dij = SIn_dij · W_In + SOut_dij · W_Out, is used to obtain the total similarity threshold S_dij, where W_In and W_Out are the weights assigned to the internal feature similarity and the external feature similarity respectively; the text nodes in the heterogeneous text network whose total similarity threshold is greater than the preset text total similarity threshold are taken as the target text nodes.
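Combining the internal and external similarities, a minimal sketch of steps S42-S43 is given below; the path-count input, the weights w_in and w_out, and the threshold value are illustrative assumptions rather than the patented parameters.

def pathsim(path_counts, x, y):
    """path_counts[(a, b)]: number of meta-path instances from a to b under the meta path P."""
    xy = path_counts.get((x, y), 0)
    xx = path_counts.get((x, x), 0)
    yy = path_counts.get((y, y), 0)
    return 2 * xy / (xx + yy) if (xx + yy) else 0.0  # SOut(x, y) = 2|p_x~y| / (|p_x~x| + |p_y~y|)

def target_text_nodes(internal, path_counts, pairs, w_in=0.6, w_out=0.4, threshold=0.5):
    """Keep texts whose total similarity S_dij = SIn*W_In + SOut*W_Out exceeds the preset threshold."""
    targets = set()
    for i, j in pairs:
        total = internal.get((i, j), 0.0) * w_in + pathsim(path_counts, i, j) * w_out
        if total > threshold:
            targets.update((i, j))                   # both texts become target text nodes
    return targets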
Further, the step S50 specifically includes:
Step S51: if a target text node lies on a directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of the directed edge; and clustering the texts that carry the same target label to obtain cluster result clusters.
Step S52: if a target text node lies on an undirected or bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence thresholds to obtain a ranking, and propagating the target labels among the target text nodes according to the ranking; and clustering the texts that carry the same target label to obtain cluster result clusters.
It should be noted that, in this embodiment, the ranking is obtained by arranging the target text nodes in descending order of their node influence thresholds.
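A minimal sketch of the propagation rule in steps S51-S52, assuming seed labels, edge lists, and influence scores are available as plain Python dictionaries and lists; the names and the tie-breaking rule are illustrative.

def propagate_labels(seed_labels, directed_edges, undirected_edges, influence):
    """seed_labels: dict node -> target label (the frequent word of an influential text)."""
    assigned = dict(seed_labels)
    # Step S51: along directed edges the label moves in the direction of the edge
    for src, dst in directed_edges:
        if src in assigned and dst not in assigned:
            assigned[dst] = assigned[src]
    # Step S52: on undirected/bidirectional edges, visit edges in descending order of node influence
    ordered = sorted(undirected_edges,
                     key=lambda e: max(influence.get(e[0], 0), influence.get(e[1], 0)),
                     reverse=True)
    for u, v in ordered:
        hi, lo = (u, v) if influence.get(u, 0) >= influence.get(v, 0) else (v, u)
        if hi in assigned and lo not in assigned:
            assigned[lo] = assigned[hi]               # the more influential node passes its label on
    clusters = {}
    for node, label in assigned.items():              # texts with the same target label form one cluster
        clusters.setdefault(label, []).append(node)
    return clusters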
In the second embodiment, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
In addition, an embodiment of the present invention also proposes a storage medium on which a label propagation clustering program is stored; when executed by a processor, the label propagation clustering program implements the following operations:
performing word segmentation on the texts in a sample text set to obtain the frequent words of each text;
extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships;
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds;
generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds;
propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
performing word segmentation and part-of-speech tagging on the texts in the sample text set with FNLP to obtain feature words;
performing a TF-IDF operation on the feature words to obtain the term frequency and inverse document frequency of the feature words;
generating weight thresholds of the feature words from the term frequency and the inverse document frequency according to a preset weight correspondence;
comparing the weight thresholds of the feature words with a preset frequent word threshold, obtaining target feature words according to the comparison results, and taking the target feature words as the frequent words of the texts.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
extracting the text information of the texts from the sample text set;
setting directed edges between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to the preset node influence relationship;
comparing the node influence thresholds with a preset node influence threshold, obtaining target texts according to the comparison results, and taking the frequent words of the target texts as the target labels.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
constructing a frequent word-text matrix from the frequent words and the texts to obtain the text vector of each text, and generating the internal feature similarity thresholds between the texts from the text vectors according to a preset cosine similarity relationship;
generating the external feature similarity thresholds between the texts in the heterogeneous text network according to a preset path similarity relationship;
generating the total similarity thresholds of the texts from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship;
obtaining the target text nodes according to the total similarity thresholds.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
according to the total similarity thresholds, comparing the total similarity thresholds with a preset text total similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison results.
Further, when executed by the processor, the label propagation clustering program also implements the following operations:
if a target text node lies on a directed edge in the heterogeneous text network, propagating the target labels among the target text nodes along the direction of the directed edge;
if a target text node lies on an undirected or bidirectional edge in the heterogeneous text network, sorting the target text nodes by their node influence thresholds to obtain a ranking, and propagating the target labels among the target text nodes according to the ranking;
clustering the texts that carry the same target label to obtain cluster result clusters.
In this embodiment, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
In addition, referring to Fig. 4, an embodiment of the present invention also proposes a label propagation clustering apparatus, which includes:
a frequent word obtaining module 10, configured to perform word segmentation on the texts in a sample text set to obtain the frequent words of each text.
It should be understood that, in this embodiment, a text is a form of written language; from a literary point of view, it is usually a sentence or a combination of sentences with complete, systematic meaning; a text may be a sentence, a paragraph, or a chapter, which will not be enumerated here.
In a specific implementation, a sample text set is collected in advance, word segmentation and part-of-speech tagging are performed on the texts in the sample text set to obtain feature words, the term frequency and inverse document frequency of the feature words are obtained, and the frequent words of each text are then obtained according to the preset weight correspondence.
a heterogeneous text network construction module 20, configured to extract text information of the texts from the sample text set and construct a heterogeneous text network from the text information according to preset mapping relationships.
It should be noted that, in this embodiment, the text information includes follow relationships between the authors of the texts and information about likes, forwards, and citations of the texts, which will not be enumerated here.
In a specific implementation, the text information of the texts is extracted from the sample text set, and directed edges are set between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
a target label obtaining module 30, configured to generate node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship and obtain target labels according to the node influence thresholds.
It should be noted that, in this embodiment, the node influence thresholds are compared with a preset node influence threshold, target texts are obtained according to the comparison results, and the frequent words of the target texts are taken as the target labels.
a target text node obtaining module 40, configured to generate total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship and obtain target text nodes according to the total similarity thresholds.
It should be noted that, in this embodiment, the internal feature similarity thresholds are obtained from the frequent words according to the preset cosine similarity relationship; at the same time, the external feature similarity thresholds are obtained in the heterogeneous text network according to the preset path similarity relationship; finally, the total similarity thresholds of the texts are generated from the internal feature similarity thresholds and the external feature similarity thresholds according to the preset total similarity relationship, so as to obtain the target text nodes.
a propagation and clustering module 50, configured to propagate the target labels among the target text nodes and cluster the texts that carry the same target label to obtain cluster result clusters.
It should be noted that, in this embodiment, a label propagation algorithm is used: the target labels are propagated among the target text nodes, and finally the texts that carry the same target label are clustered to obtain cluster result clusters, at which point the whole process ends.
It is worth noting that this embodiment introduces a weighted directed heterogeneous text network and mines multidimensional features of the texts for similarity calculation, which improves the accuracy and confidence of the clustering results.
In this embodiment, word segmentation is performed on the texts in a sample text set to obtain the frequent words of each text; text information of the texts is extracted from the sample text set, and a heterogeneous text network is constructed from the text information according to preset mapping relationships; node influence thresholds are generated for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and target labels are obtained according to the node influence thresholds; total similarity thresholds between the texts are generated in the heterogeneous text network according to a preset total similarity relationship, and target text nodes are obtained according to the total similarity thresholds; the target labels are propagated among the target text nodes, and the texts that carry the same target label are clustered to obtain cluster result clusters. The technical solution of the present invention solves the technical problems of randomness in label propagation and of low accuracy and low confidence of clustering.
For other embodiments or specific implementations of the label propagation clustering apparatus of the present invention, reference may be made to the above method embodiments, which will not be described again here.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and the like does not indicate any order; these words may be interpreted as names.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as a read-only memory (ROM)/random access memory (RAM), a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents of the specification and accompanying drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A label propagation clustering method, characterized in that the label propagation clustering method comprises the following steps:
performing word segmentation on the texts in a sample text set to obtain the frequent words of each text;
extracting text information of the texts from the sample text set, and constructing a heterogeneous text network from the text information according to preset mapping relationships;
generating node influence thresholds for the corresponding text nodes in the heterogeneous text network according to a preset node influence relationship, and obtaining target labels according to the node influence thresholds;
generating total similarity thresholds between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtaining target text nodes according to the total similarity thresholds;
propagating the target labels among the target text nodes, and clustering the texts that carry the same target label to obtain cluster result clusters.
2. The label propagation clustering method according to claim 1, characterized in that performing word segmentation on the texts in the sample text set to obtain the frequent words of each text specifically includes:
performing word segmentation and part-of-speech tagging on the texts in the sample text set with FNLP to obtain feature words;
performing a TF-IDF operation on the feature words to obtain the term frequency and inverse document frequency of the feature words;
generating weight thresholds of the feature words from the term frequency and the inverse document frequency according to a preset weight correspondence;
comparing the weight thresholds of the feature words with a preset frequent word threshold, obtaining target feature words according to the comparison results, and taking the target feature words as the frequent words of the texts.
3. The label propagation clustering method according to claim 1, characterized in that extracting the text information of the texts from the sample text set and constructing a heterogeneous text network from the text information according to the preset mapping relationships specifically includes:
extracting the text information of the texts from the sample text set;
setting directed edges between the text nodes that carry the text information according to the preset mapping relationships, so as to construct the heterogeneous text network.
4. The label propagation clustering method according to any one of claims 1 to 3, characterized in that generating the node influence threshold for each corresponding text node in the heterogeneous text network according to the preset node influence relationship and obtaining the target labels according to the node influence threshold specifically comprises:
generating the node influence threshold for each corresponding text node in the heterogeneous text network according to the preset node influence relationship;
comparing the node influence threshold with a preset node influence threshold, obtaining target texts according to the comparison result, and taking the frequent words of the target texts as the target labels.
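The preset node influence relationship is defined in the description rather than in claim 4, so the sketch below simply approximates node influence by normalized degree and keeps the frequent words of texts whose influence exceeds a hypothetical threshold; it is an illustration under those assumptions, not the patented relationship:

```python
# Illustrative sketch of claim 4: node influence approximated by normalized
# degree (an assumption), texts above a hypothetical threshold become target
# texts, and their frequent words become the target labels.
def target_labels(g, frequent, preset_influence_threshold=0.1):
    # g: heterogeneous text network (networkx graph)
    # frequent: dict text id -> list of frequent words
    text_nodes = [n for n, d in g.nodes(data=True) if d.get("kind") == "text"]
    influence = {n: g.degree(n) / max(1, g.number_of_nodes() - 1)
                 for n in text_nodes}
    labels = {}
    for n in text_nodes:
        if influence[n] > preset_influence_threshold:   # target text
            labels[n] = frequent.get(n, [])             # its frequent words become labels
    return influence, labels
```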
5. The label propagation clustering method according to any one of claims 1 to 3, characterized in that generating the total similarity threshold between the texts in the heterogeneous text network according to the preset total similarity relationship and obtaining the target text nodes according to the total similarity threshold specifically comprises:
constructing a frequent word-text matrix from the frequent words and the texts to obtain a text vector for each text, and applying a preset cosine similarity relationship to the text vectors to generate an internal feature similarity threshold between the texts;
generating an external feature similarity threshold between the texts in the heterogeneous text network according to a preset path similarity relationship;
generating the total similarity threshold of the texts from the internal feature similarity threshold and the external feature similarity threshold according to the preset total similarity relationship;
obtaining the target text nodes according to the total similarity threshold.
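A hedged sketch of claim 5's combination step: cosine similarity over the frequent word-text vectors as the internal feature similarity, a shared-neighbour ratio in the heterogeneous network as a stand-in for the path similarity, and an equally weighted sum as the total similarity; the stand-in measure and the 0.5/0.5 weights are assumptions, not the relationships defined in the description:

```python
# Illustrative sketch of claim 5: internal similarity = cosine of the
# frequent word-text vectors; external similarity = shared-neighbour ratio
# (a stand-in for the preset path similarity); total = weighted sum.
import numpy as np

def total_similarity(vec_a, vec_b, g, node_a, node_b, alpha=0.5):
    # vec_a, vec_b: rows of the frequent word-text matrix for the two texts
    # g: heterogeneous text network (directed networkx graph); alpha: assumed weight
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    internal = float(np.dot(vec_a, vec_b) / denom) if denom else 0.0  # cosine similarity
    neigh_a, neigh_b = set(g.successors(node_a)), set(g.successors(node_b))
    union = neigh_a | neigh_b
    external = len(neigh_a & neigh_b) / len(union) if union else 0.0  # path-similarity stand-in
    return alpha * internal + (1 - alpha) * external
```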
6. The label propagation clustering method according to claim 5, characterized in that obtaining the target text nodes according to the total similarity threshold specifically comprises:
comparing the total similarity threshold with a preset total text similarity threshold, and obtaining the target text nodes in the heterogeneous text network according to the comparison result.
7. The label propagation clustering method according to any one of claims 1 to 6, characterized in that propagating the target labels between the target text nodes and clustering the texts having the same target labels to obtain the result clusters specifically comprises:
if the target text nodes are target text nodes connected by a directed edge in the heterogeneous text network, propagating the target labels between the target text nodes along the direction of the directed edge;
if the target text nodes are target text nodes connected by an undirected edge or a bidirectional edge in the heterogeneous text network, sorting them according to the node influence thresholds corresponding to the target text nodes to obtain a sorting result, and propagating the target labels between the target text nodes according to the sorting result;
clustering the texts having the same target labels to obtain the result clusters.
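A minimal sketch of the propagation and clustering behaviour recited in claim 7, assuming simple dictionary-based data structures: labels move along directed edges in the edge's direction, while on undirected or bidirectional edges the endpoint with the higher node influence propagates first; texts sharing a target label end up in the same result cluster:

```python
# Illustrative sketch of claim 7; edge kinds, set-valued labels and a single
# propagation pass are assumptions made for brevity.
def propagate_labels(edges, labels, influence):
    # edges: list of (u, v, kind) with kind in {"directed", "undirected"}
    # labels: dict node -> set of target labels; influence: dict node -> float
    for u, v, kind in edges:
        if kind == "directed":
            # a directed edge propagates labels along its direction
            labels.setdefault(v, set()).update(labels.get(u, set()))
        else:
            # undirected/bidirectional: higher-influence endpoint propagates first
            src, dst = sorted((u, v), key=lambda n: influence.get(n, 0.0),
                              reverse=True)
            labels.setdefault(dst, set()).update(labels.get(src, set()))
    # texts that share a target label form one result cluster
    clusters = {}
    for node, labs in labels.items():
        for lab in labs:
            clusters.setdefault(lab, set()).add(node)
    return clusters
```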
8. A terminal device, characterized in that the terminal device comprises: a memory, a processor, and a label propagation clustering program stored in the memory and executable on the processor, wherein the label propagation clustering program, when executed by the processor, implements the steps of the label propagation clustering method according to any one of claims 1 to 7.
9. A storage medium, characterized in that a label propagation clustering program is stored on the storage medium, and the label propagation clustering program, when executed by a processor, implements the steps of the label propagation clustering method according to any one of claims 1 to 7.
10. A label propagation clustering apparatus, characterized in that the label propagation clustering apparatus comprises:
a frequent word acquisition module, configured to perform word segmentation on the texts in a sample text set to obtain frequent words of each text;
a heterogeneous text network construction module, configured to extract text information of the texts from the sample text set and construct a heterogeneous text network from the text information according to a preset mapping relationship;
a target label acquisition module, configured to generate a node influence threshold for each corresponding text node in the heterogeneous text network according to a preset node influence relationship, and obtain target labels according to the node influence threshold;
a target text node acquisition module, configured to generate a total similarity threshold between the texts in the heterogeneous text network according to a preset total similarity relationship, and obtain target text nodes according to the total similarity threshold;
a propagation and clustering module, configured to propagate the target labels between the target text nodes, and cluster the texts having the same target labels to obtain result clusters.
CN201910504157.0A 2019-06-11 2019-06-11 Label propagation clustering method, terminal equipment, storage medium and device Active CN110442674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910504157.0A CN110442674B (en) 2019-06-11 2019-06-11 Label propagation clustering method, terminal equipment, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910504157.0A CN110442674B (en) 2019-06-11 2019-06-11 Label propagation clustering method, terminal equipment, storage medium and device

Publications (2)

Publication Number Publication Date
CN110442674A true CN110442674A (en) 2019-11-12
CN110442674B CN110442674B (en) 2021-09-14

Family

ID=68429199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910504157.0A Active CN110442674B (en) 2019-06-11 2019-06-11 Label propagation clustering method, terminal equipment, storage medium and device

Country Status (1)

Country Link
CN (1) CN110442674B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102768670A (en) * 2012-05-31 2012-11-07 哈尔滨工程大学 Webpage clustering method based on node property label propagation
US20140067808A1 (en) * 2012-09-06 2014-03-06 International Business Machines Corporation Distributed Scalable Clustering and Community Detection
US8832091B1 (en) * 2012-10-08 2014-09-09 Amazon Technologies, Inc. Graph-based semantic analysis of items
CN106951524A (en) * 2017-03-21 2017-07-14 哈尔滨工程大学 Overlapping community discovery method based on node influence power
CN108364234A (en) * 2018-03-08 2018-08-03 重庆邮电大学 A kind of microblogging community discovery method propagated based on node influence power label
CN108959453A (en) * 2018-06-14 2018-12-07 中南民族大学 Information extracting method, device and readable storage medium storing program for executing based on text cluster

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191882A (en) * 2019-12-17 2020-05-22 安徽大学 Method and device for identifying influential developers in heterogeneous information network
CN111191882B (en) * 2019-12-17 2022-11-25 安徽大学 Method and device for identifying influential developers in heterogeneous information network
CN112699237A (en) * 2020-12-24 2021-04-23 百度在线网络技术(北京)有限公司 Label determination method, device and storage medium

Also Published As

Publication number Publication date
CN110442674B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN110837550B (en) Knowledge graph-based question answering method and device, electronic equipment and storage medium
US7194466B2 (en) Object clustering using inter-layer links
EP2866421B1 (en) Method and apparatus for identifying a same user in multiple social networks
CN112148987B (en) Message pushing method based on target object activity and related equipment
US9536201B2 (en) Identifying associations in data and performing data analysis using a normalized highest mutual information score
US20230102337A1 (en) Method and apparatus for training recommendation model, computer device, and storage medium
CN109739978A (en) A kind of Text Clustering Method, text cluster device and terminal device
WO2021143267A1 (en) Image detection-based fine-grained classification model processing method, and related devices
CN105868108A (en) Instruction-set-irrelevant binary code similarity detection method based on neural network
Belle et al. Serial coalescent simulations suggest a weak genealogical relationship between Etruscans and modern Tuscans
CN108154198A (en) Knowledge base entity normalizing method, system, terminal and computer readable storage medium
WO2019071904A1 (en) Bayesian network-based question-answering apparatus, method and storage medium
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN112183881A (en) Public opinion event prediction method and device based on social network and storage medium
CN110929145A (en) Public opinion analysis method, public opinion analysis device, computer device and storage medium
Paez et al. Inducing non-orthogonal and non-linear decision boundaries in decision trees via interactive basis functions
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
CN110442674A (en) Clustering method, terminal device, storage medium and the device that label is propagated
WO2022227171A1 (en) Method and apparatus for extracting key information, electronic device, and medium
WO2021120588A1 (en) Method and apparatus for language generation, computer device, and storage medium
CN115248890A (en) User interest portrait generation method and device, electronic equipment and storage medium
CN111667018A (en) Object clustering method and device, computer readable medium and electronic equipment
WO2020252925A1 (en) Method and apparatus for searching user feature group for optimized user feature, electronic device, and computer nonvolatile readable storage medium
CN116204709A (en) Data processing method and related device
CN110941638A (en) Application classification rule base construction method, application classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant