CN111831820A - News and case correlation analysis method based on case element guidance and deep clustering

News and case correlation analysis method based on case element guidance and deep clustering

Info

Publication number
CN111831820A
Authority
CN
China
Prior art keywords
case
clustering
news
text
representation
Prior art date
Legal status
Granted
Application number
CN202010166279.6A
Other languages
Chinese (zh)
Other versions
CN111831820B (en)
Inventor
余正涛
李云龙
高盛祥
郭军军
相艳
线岩团
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202010166279.6A
Publication of CN111831820A
Application granted
Publication of CN111831820B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/34 - Browsing; Visualisation therefor
    • G06F 16/345 - Summarisation for human users

Abstract

The invention relates to a news and case correlation analysis method based on case element guidance and deep clustering. First, important sentences are extracted to represent each text. Second, cases are characterized by their case elements, which are used to initialize the cluster centers and to guide the clustering search process. Finally, a convolutional auto-encoder is used to obtain the text representation, and the network is jointly trained with a combination of reconstruction loss and clustering loss so that the text representation moves closer to its case; the text representation and the clustering process are unified in the same framework, and the auto-encoder parameters and the clustering model parameters are updated alternately to realize text clustering. The method addresses the problem that current clustering algorithms lack effective guidance information for the news and case correlation analysis task, which causes clustering divergence and reduces the accuracy of the results; it gives full play to the guiding role of case elements in the clustering process and in the vectorized text representation, and effectively improves the accuracy of the clustering results.

Description

News and case correlation analysis method based on case element guidance and deep clustering
Technical Field
The invention relates to a news and case correlation analysis method based on case element guidance and deep clustering, and belongs to the technical field of natural language processing.
Background
Public opinion analysis in the case domain is carried out on news texts related to a case. The purpose of news and case correlation analysis is to judge whether a news text is related to a case; it is an important link in case-domain public opinion analysis and is of great significance. News and case correlation analysis can be regarded as a text clustering process: news texts describing the same case are clustered into the same case cluster.
Currently, related research on text clustering can be divided into statistics-based and deep-learning-based methods. However, for the news and case correlation analysis task, existing methods lack effective guidance information, which easily causes clustering divergence and reduces the accuracy of the results.
Disclosure of Invention
The invention provides a news and case correlation analysis method based on case element guidance and deep clustering, to solve the problems that existing clustering methods lack effective guidance information for the news and case correlation analysis task, easily cause clustering divergence, and reduce the accuracy of the results.
The technical scheme of the invention is as follows: the news and case correlation analysis method based on case element guidance and deep clustering comprises the following steps:
Step1, compressing the case-related news texts using several summarization techniques: summaries of each news text are extracted with several summarization methods, the summaries are combined by a voting method, and the important information is extracted to represent the text, realizing text compression;
Step2, representing each case by the mean of its case element word vectors to obtain the vectorized case representation;
Step3, passing the compressed news text data through a convolutional auto-encoder to obtain the vectorized text representation, where an encoder based on the Text-CNN model is used, a deconvolution network forms the decoder, and the minimum mean square error loss serves as the reconstruction loss of the convolutional auto-encoder;
Step4, initializing the cluster centers with the vectorized case representations, unifying the vectorized text representation and the clustering process in the same framework, and alternately updating the auto-encoder parameters and the clustering model parameters to realize text clustering.
For a given set of news text vectors {H_i}, i = 1, 2, ..., N, where H_i is the vectorized representation of the i-th news document obtained through the convolutional auto-encoder, the task is to divide the N news texts into k case clusters, i.e., C = {C_1, ..., C_r, ..., C_k}.
Further, the Step1 includes the specific steps of:
Step1.1, the multi-summary text compression task is first formalized as follows: let a news text be S = {S_1, S_2, ..., S_p}, containing p sentences in total, and let the summaries generated by q methods be L_1^v, L_2^v, ..., L_q^v, abbreviated L_1^v : L_q^v, where each summary contains v sentences and the summaries contain o distinct sentences in total; the goal is to select z sentences from L_1^v : L_q^v as the compressed text;
defining the i-th summarization method as f_i(·), then:
L_i^v = f_i(S)    (1)
Here, seven summarization methods are used to summarize the news text, namely Lead, Luhn, LSA, LexRank, TextRank, SumBasic and KL-Sum, so that i ∈ [1,7] and q = 7;
the z sentences that occur most frequently across the summaries are selected as the compressed text; when frequencies are equal, the sentence appearing earlier in the document is preferred. In addition, the news headline is regarded as part of the news and is topical and factual, so the headline information is also added to the compressed text.
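As an illustration of the voting step, the following is a minimal sketch; the summarizer callables, the sentence list, and all names are placeholders rather than the patent's implementation (in practice, libraries such as sumy provide Luhn, LSA, LexRank, TextRank, SumBasic and KL-Sum summarizers):

```python
from collections import Counter

def compress(sentences, title, summarizers, z=3):
    """Combine q extractive summaries by voting: keep the z sentences that
    occur most often across the summaries, breaking ties by the earlier
    position in the document; the headline is always added (assumption:
    prepended)."""
    votes = Counter()
    for summarize in summarizers:            # each f_i(S) returns v sentences
        for sent in summarize(sentences):
            votes[sent] += 1
    # sort by (frequency descending, position in the document ascending)
    ranked = sorted(votes, key=lambda s: (-votes[s], sentences.index(s)))
    return [title] + ranked[:z]

# toy usage with two trivial stand-in "summarizers"
doc = ["sentence 1", "sentence 2", "sentence 3", "sentence 4"]
lead3 = lambda S: S[:3]       # Lead-style summary
tail3 = lambda S: S[-3:]
print(compress(doc, "headline", [lead3, tail3]))
```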
Further, Step2 includes:
Case elements are a structured presentation of a case, and a case can be characterized by its case elements. Let E_r = {e_1, e_2, ..., e_m} be the case element set of the r-th case, containing m case elements in total; each case element e_i can be characterized as a d-dimensional word vector w_i, i.e., E_r = {w_1, w_2, ..., w_m};
Mitchel et al found that vector addition is a simple and effective semantic combination method. By taking the idea as a reference, the case is vectorized by the mean value of the word vectors of the case elements: let Cen ber∈RdFor the vectorized representation of the r-th case, the calculation method is as follows:
Cen_r = (1/m) Σ_{i=1}^{m} w_i    (2)
Assuming there are k cases in total and denoting the set of case representations by Cen:
Cen = {Cen_1, ..., Cen_r, ..., Cen_k}    (3).
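A minimal numpy sketch of equations (2)-(3); `word_vec` (an element-to-vector lookup) and `case_elements` are hypothetical names introduced for illustration:

```python
import numpy as np

def init_centers(case_elements, word_vec):
    """Eq. (2): Cen_r is the mean of the r-th case's element word vectors;
    stacking the k rows gives the set Cen of eq. (3)."""
    return np.stack([np.mean([word_vec[e] for e in elems], axis=0)
                     for elems in case_elements])      # shape (k, d)

# toy usage: 2 cases characterized by 3 case elements, d = 3
word_vec = {"place": np.array([1., 0., 0.]),
            "person": np.array([0., 1., 0.]),
            "event": np.array([0., 0., 1.])}
cen = init_centers([["place", "person"], ["person", "event"]], word_vec)
print(cen.shape)   # (2, 3): one representation per case
```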
Further, Step3 includes:
constructing a word vector matrix for the compressed document, selecting a convolutional auto-encoder, and training the network with the reconstruction loss and the clustering loss.
The specific steps of Step3 are as follows:
Step3.1, let X be the compressed sentence set of a news text S, and let x_i ∈ R^k be the word vector of the i-th of the n words in X; the news text is then represented as:
X = x_1 ⊕ x_2 ⊕ ... ⊕ x_n    (4)
where ⊕ denotes concatenation; the sentence set X is thus constructed as a document word matrix of dimension n × k;
A Text-CNN text classification model is adopted as the encoder. For an input single-channel document word matrix x ∈ R^{n×k}, the latent representation of the τ-th feature map is:
c_τ = σ(x * W_τ + b_τ)    (5)
where W_τ ∈ R^{a×k} is the τ-th convolution kernel, a is the height of the convolution kernel, σ is the activation function, * denotes the 2-D convolution operation, and b_τ is the bias term of the τ-th convolution; since narrow convolution is used, c_τ ∈ R^{n−a+1};
Max pooling is applied to c_τ to obtain h_τ ∈ R, namely:
h_τ = max(c_τ)    (6)
Since the cluster centers have dimension d, d convolution kernels are needed to convolve the input document word matrix; max pooling is applied to each feature map, and the h_τ are finally concatenated to obtain the vectorized text representation H ∈ R^d, namely:
H = h_1 ⊕ h_2 ⊕ ... ⊕ h_d    (7)
The decoder is constructed with a deconvolution network: first, each h_τ is unpooled back to g_τ ∈ R^{n−a+1}; then each g_τ is deconvolved to reconstruct the document word matrix:
x̂ = σ( Σ_{τ∈T} g_τ * W_τ^T + ξ )    (8)
where σ is the activation function, T denotes the set of all feature maps, W_τ^T is the transpose of the corresponding convolution kernel, * is the 2-D convolution operation, and ξ is the bias term;
The minimum mean square error loss is used as the reconstruction loss of the convolutional auto-encoder:
Loss_n(θ) = (1/N) Σ_{i=1}^{N} || x̂_i − x_i ||²    (9)
where θ denotes the parameters of the convolutional auto-encoder.
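One way to realize equations (5)-(9) is sketched below in PyTorch; the document length, the tanh output activation, and the averaging of the three kernel-height branches in the decoder are illustrative assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Sketch of the convolutional auto-encoder: a Text-CNN encoder
    (narrow convolution + max pooling, eqs. 5-7) and a deconvolution
    decoder (inverse pooling + transposed convolution, eq. 8)."""
    def __init__(self, n_words=60, emb_dim=300, heights=(3, 4, 5), n_filters=100):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, (a, emb_dim)) for a in heights])      # eq. (5)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d((n_words - a + 1, 1), return_indices=True)
             for a in heights])                                            # eq. (6)
        self.unpools = nn.ModuleList(
            [nn.MaxUnpool2d((n_words - a + 1, 1)) for a in heights])       # inverse pooling
        self.deconvs = nn.ModuleList(
            [nn.ConvTranspose2d(n_filters, 1, (a, emb_dim)) for a in heights])  # eq. (8)

    def forward(self, x):          # x: (batch, 1, n, k) document word matrix
        feats, idxs = [], []
        for conv, pool in zip(self.convs, self.pools):
            c = torch.relu(conv(x))       # c_tau: (batch, F, n-a+1, 1)
            h, idx = pool(c)              # h_tau: (batch, F, 1, 1)
            feats.append(h)
            idxs.append(idx)
        H = torch.cat([h.flatten(1) for h in feats], dim=1)   # eq. (7): (batch, d)
        recons = [torch.tanh(deconv(unpool(h, idx)))          # g_tau, then eq. (8)
                  for h, idx, unpool, deconv
                  in zip(feats, idxs, self.unpools, self.deconvs)]
        x_hat = torch.stack(recons).mean(0)   # merge branch reconstructions (assumption)
        return H, x_hat

model = ConvAutoEncoder()
x = torch.randn(8, 1, 60, 300)                     # batch of compressed documents
H, x_hat = model(x)
recon_loss = nn.functional.mse_loss(x_hat, x)      # eq. (9)
```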
Further, in Step4, the texts are clustered during the forward computation of the convolutional auto-encoder.
Further, the cluster centers are updated iteratively by combining the previous cluster center with the newly assigned cluster center of the current round.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
Step4.1, for a given set of news text vectors {H_i}, i = 1, 2, ..., N, where H_i is the vectorized representation of the i-th news document obtained by the convolutional auto-encoder, the task is to divide the N news texts into k case clusters, i.e., C = {C_1, ..., C_r, ..., C_k}, where C_r is the r-th case cluster. k-means is one of the most widely used clustering algorithms; its loss function is:
Loss_c = Σ_{i=1}^{N} || H_i − M s_i ||²    (10)
where M ∈ R^{d×k} is the cluster-center matrix and s_i ∈ {0,1}^k is the case cluster assignment indicator of the i-th news text, with 1^T s_i = 1.
The r-th case cluster partition is updated as:
C_r = C_r ∪ {H_i}, if s_{r,i} = 1    (11)
During the iterative update, each news text is assigned to the cluster whose center is nearest; specifically, s_i is updated by the rule:
s_{r,i} = 1 if r = argmin_j || H_i − M_j ||², and s_{r,i} = 0 otherwise    (12)
where M_j denotes the j-th column of M.
The cluster-center matrix M is initialized with the case vectorization set Cen, each column of M being a Cen_r. Considering that news reports cover different sides of a case, the information of the news texts under a case is also merged into the case characterization vector, making the case characterization more reasonable. Specifically, during clustering, the previous cluster center Cen_r^{t−1} and the newly assigned cluster center H̄_r^t of the current round are combined to update the cluster center and obtain the new case characterization; the r-th case cluster center is updated as:
Cen_r^t = (1 − α_r^t) · Cen_r^{t−1} + α_r^t · H̄_r^t    (13)
where H̄_r^t is the mean vector of the news texts assigned to the r-th case cluster in round t, namely:
H̄_r^t = (1 / n_r^t) Σ_{H_i ∈ C_r^t} H_i    (14)
and α_r^t is the weight coefficient of the r-th case cluster, computed by equation (15) from n_r^t, the number of news texts assigned to the r-th case cluster in round t (the explicit form of (15) survives only as an image in the source).
Training the network under the guidance of the auto-encoder reconstruction loss constrains the text representation, while training under the guidance of the clustering loss pushes the text representation closer to its case. Therefore, the network is jointly trained with a combination of the reconstruction loss and the clustering loss of the convolutional auto-encoder; the loss function is defined as follows:
Loss = λ · Loss_c + (1 − λ) · Loss_n(θ)    (16)
where λ ∈ [0,1] is a hyper-parameter balancing Loss_c and Loss_n(θ).
In the early stage of the clustering iteration, the auto-encoder has not yet learned a good text representation, which would distort the case characterization and produce poor clustering results. Suppose the joint training runs for T rounds in total: in the first J rounds only the parameters of the convolutional auto-encoder are updated, with λ = 0, so the loss reduces to the reconstruction loss Loss_n(θ); in the remaining T − J rounds the clustering process is added to the forward computation, and the joint loss is used.
After iteratively updating X = {X_1, X_2, ..., X_N} with the proposed method for the given number of rounds, the news text set converges into different case clusters, yielding the final clustering result.
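Putting the pieces together, a minimal joint-training loop under equation (16) with J warm-up rounds at λ = 0 might look as follows; it reuses the `ConvAutoEncoder` and `cluster_round` sketches above, and the single-batch setup with T = 25, J = 5 and λ = 0.1 (matching the experimental settings described later) is illustrative:

```python
import torch
import torch.nn as nn

model = ConvAutoEncoder()                    # from the sketch above
x = torch.randn(8, 1, 60, 300)               # compressed document word matrices
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-5)
T, J, lam = 25, 5, 0.1
centers = torch.rand(6, 300)                 # in practice: init_centers(...), eqs. (2)-(3)

for t in range(T):
    H, x_hat = model(x)
    recon = nn.functional.mse_loss(x_hat, x)               # Loss_n(theta), eq. (9)
    if t < J:
        loss = recon                                       # warm-up rounds: lambda = 0
    else:
        a, c = cluster_round(H.detach().numpy(), centers.numpy())
        centers = torch.as_tensor(c, dtype=torch.float32)
        assign = torch.as_tensor(a, dtype=torch.long)
        cluster = ((H - centers[assign]) ** 2).sum(1).mean()   # Loss_c, eq. (10)
        loss = lam * cluster + (1 - lam) * recon               # eq. (16)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```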
The invention has the beneficial effects that:
1. First, important sentences are extracted to represent each text; second, cases are characterized by their case elements, which are used to initialize the cluster centers and to guide the clustering search process; finally, a convolutional auto-encoder is used to obtain the text representation, the network is jointly trained with reconstruction loss and clustering loss so that the text representation moves closer to its case, the text representation and the clustering process are unified in the same framework, and the auto-encoder parameters and the clustering model parameters are updated alternately to realize text clustering;
2. The method addresses the problem that current clustering algorithms lack effective guidance information for the news and case correlation analysis task, which causes clustering divergence and reduces result accuracy; it gives full play to the guiding role of case elements in the clustering process and in the vectorized text representation, and effectively improves the accuracy of the clustering results.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Example 1: as shown in fig. 1, the news and case correlation analysis method based on case element guidance and deep clustering specifically includes:
step1, collecting related case news documents and defining related case elements.
In Step1, the case-related news documents are collected and organized by writing web crawlers to crawl the related news texts.
The case elements in Step1 are defined by analyzing the composition of case elements in the judgment documents on China Judgments Online (中国裁判文书网) and by considering the characteristics of case-related news texts.
Specifically, a total of 5970 news texts related to 6 hot cases were crawled, as shown in Table 1. Three elements, namely "case location, persons involved in the case, and case description", are defined as the case elements, as shown in Table 2.
TABLE 1 Case-related news text data set
(The table is provided as an image in the original and is not reproduced here.)
TABLE 2 Case element list
(The table is provided as an image in the original and is not reproduced here.)
Step2, compressing the case-related news text by using a plurality of summarization technologies;
step3, representing the case by using the mean value of the case element word vectors to obtain the vectorization representation of the case;
step4, the compressed news text data is passed through a convolution self-encoder to obtain a text vectorization representation; constructing a word vector matrix for the compressed document, selecting a convolution self-encoder, and jointly training a network by utilizing reconstruction loss and clustering loss;
step5, utilizing the vectorization representation of the case to initialize a clustering center, unifying the text vectorization representation and the clustering process into the same frame, and alternately updating the self-encoder parameters and the clustering model parameters to realize text clustering.
Further, in Step2, several summarization methods are adopted to extract summaries of the news texts, a voting method is used to combine the summaries, and the important information is extracted to represent the text, realizing text compression.
Further, the Step2 includes the specific steps of:
Step2.1, the multi-summary text compression task is first formalized as follows: let a news text be S = {S_1, S_2, ..., S_p}, containing p sentences in total, and let the summaries generated by q methods be L_1^v, L_2^v, ..., L_q^v, abbreviated L_1^v : L_q^v, where each summary contains v sentences and the summaries contain o distinct sentences in total; the goal is to select z sentences from L_1^v : L_q^v as the compressed text;
defining the i-th summarization method as f_i(·), then:
L_i^v = f_i(S)    (1)
Here, seven summarization methods are used to summarize the news text, namely Lead, Luhn, LSA, LexRank, TextRank, SumBasic and KL-Sum, so that i ∈ [1,7] and q = 7;
the z sentences that occur most frequently across the summaries are selected as the compressed text; when frequencies are equal, the sentence appearing earlier in the document is preferred. In addition, the news headline is regarded as part of the news and is topical and factual, so the headline information is also added to the compressed text.
Further, Step3 includes:
if Er={e1,e2,...emThe case element set of the r-th case includes m case elements in total, and each case element eiIt can be characterized as a d-dimensional word vector wiI.e. Er={w1,w2,...wm};
The case is then vectorized as the mean of its case element word vectors: let Cen_r ∈ R^d be the vectorized representation of the r-th case, computed as follows:
Cen_r = (1/m) Σ_{i=1}^{m} w_i    (2)
Assuming there are k cases in total and denoting the set of case representations by Cen:
Cen = {Cen_1, ..., Cen_r, ..., Cen_k}    (3).
the specific steps of Step4 are as follows:
Step4.1, let X be the compressed sentence set of a news text S, and let x_i ∈ R^k be the word vector of the i-th of the n words in X; the news text is then represented as:
X = x_1 ⊕ x_2 ⊕ ... ⊕ x_n    (4)
where ⊕ denotes concatenation; the sentence set X is thus constructed as a document word matrix of dimension n × k;
A Text-CNN text classification model is adopted as the encoder. For an input single-channel document word matrix x ∈ R^{n×k}, the latent representation of the τ-th feature map is:
c_τ = σ(x * W_τ + b_τ)    (5)
where W_τ ∈ R^{a×k} is the τ-th convolution kernel, a is the height of the convolution kernel, σ is the activation function, * denotes the 2-D convolution operation, and b_τ is the bias term of the τ-th convolution; since narrow convolution is used, c_τ ∈ R^{n−a+1};
Max pooling is applied to c_τ to obtain h_τ ∈ R, namely:
h_τ = max(c_τ)    (6)
Since the cluster centers have dimension d, d convolution kernels are needed to convolve the input document word matrix; max pooling is applied to each feature map, and the h_τ are finally concatenated to obtain the vectorized text representation H ∈ R^d, namely:
H = h_1 ⊕ h_2 ⊕ ... ⊕ h_d    (7)
The decoder is constructed with a deconvolution network: first, each h_τ is unpooled back to g_τ ∈ R^{n−a+1}; then each g_τ is deconvolved to reconstruct the document word matrix:
x̂ = σ( Σ_{τ∈T} g_τ * W_τ^T + ξ )    (8)
where σ is the activation function, T denotes the set of all feature maps, W_τ^T is the transpose of the corresponding convolution kernel, * is the 2-D convolution operation, and ξ is the bias term;
The minimum mean square error loss is used as the reconstruction loss of the convolutional auto-encoder:
Loss_n(θ) = (1/N) Σ_{i=1}^{N} || x̂_i − x_i ||²    (9)
where θ denotes the parameters of the convolutional auto-encoder.
Further, in Step5, the texts are clustered during the forward computation of the convolutional auto-encoder.
Further, the cluster centers are updated iteratively by combining the previous cluster center with the newly assigned cluster center of the current round.
The clustering performance is evaluated by comparing the clustering result with the labels of the texts in the data set; accuracy (ACC) and normalized mutual information (NMI) are selected as the evaluation indexes, where accuracy is defined as:
ACC = tr(ŝ^T s) / N
where ŝ^T is the transpose of the clustering-result matrix ŝ, s is the label matrix of the texts in the data set, tr(·) is the trace of a matrix, and N is the total number of news texts.
Normalized mutual information (NMI) measures the similarity between two distributions; for the clustering task, it measures the similarity between the true labels and the clustering result:
NMI(Y, C) = MI(Y, C) / sqrt( H(Y) · H(C) )
where MI(·,·) is the mutual information, H(·) is the information entropy, and NMI ∈ [0,1]; the larger the value, the better the clustering effect.
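For reference, both indexes can be computed with standard tooling: a sketch using scipy's Hungarian solver for the best cluster-to-label mapping in ACC, and scikit-learn's NMI (whose default normalization may differ from the formula above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(labels, preds):
    """ACC: the fraction of texts correctly grouped under the best
    one-to-one mapping between clusters and true labels."""
    k = max(labels.max(), preds.max()) + 1
    count = np.zeros((k, k), dtype=int)
    for y, c in zip(labels, preds):
        count[c, y] += 1                        # contingency matrix
    row, col = linear_sum_assignment(-count)    # maximize matched counts
    return count[row, col].sum() / len(labels)

labels = np.array([0, 0, 1, 1, 2, 2])           # true case labels
preds = np.array([1, 1, 0, 0, 2, 2])            # clustering result
print(clustering_accuracy(labels, preds))            # 1.0
print(normalized_mutual_info_score(labels, preds))   # 1.0
```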
To make the comparison more convincing, 2 vector space models, 1 topic model and 3 word-vector-based distributed representation methods are selected to characterize the documents, each combined with the k-means clustering algorithm and compared with the proposed method. The feature dimension of the vector-space-model baselines is 2000, and the dimensions of the remaining baseline methods are all 300. The distributed document representation methods use the same compressed text as this method. Specifically: (1) TFIDF-1: each word in the document is a feature item, weighted by TF-IDF; (2) TFIDF-2: context words with a window size of 2 are the feature items, weighted by TF-IDF; (3) LDA: a topic model is used to obtain the document representation; (4) MeanWV (Mean Word Embedding): the average word vector of the document; (5) TWE (Topical Word Embedding): the concatenation of the average topic vector and the average word vector represents the document; (6) TopicVec: the concatenation of the document topic vector and the average word vector represents the document.
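As an example of how the vector-space baselines are constructed, a TFIDF-1-style pipeline with k-means might look as follows; the corpus is a placeholder and the 2000-feature cap follows the dimension stated above (for Chinese text, a tokenizer would be supplied to the vectorizer):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["compressed text of news 1 ...",        # placeholder corpus
        "compressed text of news 2 ...",
        "compressed text of news 3 ..."]
tfidf = TfidfVectorizer(max_features=2000)      # TFIDF-1: words as feature items
X = tfidf.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                               # case-cluster assignment per text
```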
The following hyper-parameter settings are adopted for the invention: (1) for the document compression module, the number of sentences extracted by each summarizer is set to 3, and the number of sentences in the combined summary is also set to 3; (2) for the convolutional auto-encoding module, the dimension of the input word vectors is 300; three convolution kernel heights are used, namely 3, 4 and 5, with 100 kernels of each height; the optimizer is Adam, the learning rate is 0.01, and the L2 regularization weight is 0.00001; (3) for the clustering module, the embedding dimension of the case elements is 300; the number of iteration rounds is set to 25, and the clustering loss is not used to optimize the network in the first 5 rounds; the hyper-parameter balancing the clustering loss and the auto-encoder loss is set to 0.1.
Table 3 compares the clustering effect of this method and the baseline methods under 4, 5 and 6 cases. The experimental results show that this method outperforms the baselines on both accuracy and normalized mutual information.
Table 3 Comparison of experimental results of the method herein and the baseline methods
(The table is provided as an image in the original and is not reproduced here.)
As can be seen from the experimental results in Table 3, the LDA-based text characterization yields a relatively poor clustering effect, mainly because LDA does not fit this task well: the aim is to cluster news texts of the same case into the same case cluster, so each case corresponds to a single topic, whereas LDA assumes that a news text contains multiple topics; the clustering result is therefore not ideal. The clustering methods based on vector-space text representations achieve good results: for case-related public opinion data, news texts of different cases differ considerably, and TF-IDF measures how representative a word is of a document, so it distinguishes documents well; TFIDF-2 in particular considers 2-gram features and captures part of the context information. The distributed document representation methods, which represent documents with word embeddings or topic embeddings, achieve effects close to TFIDF-2.
This method uses a convolutional auto-encoder to extract features and compose the semantics of the text, so the text representation carries n-gram features; at the same time, the clustering loss provides guidance, allowing the model to learn a task-relevant text representation. In addition, the case elements are used to initialize the cluster centers, guiding the clustering process. This method outperforms the baselines on the averages of both indexes; for example, under 6 cases, its accuracy is 4.16% higher than TFIDF-2 and its normalized mutual information is 9.20% higher.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the invention is not limited to those embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (7)

1. The news and case correlation analysis method based on case element guidance and deep clustering, characterized by comprising the following steps:
step1, compressing the case-related news text by using a plurality of summarization technologies;
step2, representing the case by using the mean value of the case element word vectors to obtain the vectorization representation of the case;
step3, the compressed news text data is passed through a convolution self-encoder to obtain a text vectorization representation;
step4, utilizing the vectorization representation of the case to initialize a clustering center, unifying the text vectorization representation and the clustering process into the same frame, and alternately updating the self-encoder parameters and the clustering model parameters to realize text clustering.
2. The news and case correlation analysis method based on case element guidance and deep clustering of claim 1, wherein: in Step1, several summarization methods are adopted to extract summaries of the news text, a voting method is used to combine the summaries, and the important information is extracted to represent the text, realizing text compression.
3. The news and case correlation analysis method based on case element guidance and deep clustering according to claim 1 or 2, characterized in that: the specific steps of Step1 are as follows:
Step1.1, firstly formalizing the multi-summary text compression task as follows: let a news text be S = {S_1, S_2, ..., S_p}, containing p sentences in total, and let the summaries generated by q methods be L_1^v, L_2^v, ..., L_q^v, abbreviated L_1^v : L_q^v, where each summary contains v sentences and the summaries contain o distinct sentences in total; the goal is to select z sentences from L_1^v : L_q^v as the compressed text;
defining the i-th summarization method as f_i(·), then:
L_i^v = f_i(S)    (1)
here, seven summarization methods are used to summarize the news text, namely Lead, Luhn, LSA, LexRank, TextRank, SumBasic and KL-Sum, so that i ∈ [1,7] and q = 7;
and selecting the z sentences that occur most frequently across the summaries as the compressed text.
4. The news and case correlation analysis method based on case element guidance and deep clustering of claim 1, wherein: the Step2 includes:
if Er={e1,e2,...emThe case element set of the r-th case includes m case elements in total, and each case element eiIt can be characterized as a d-dimensional word vector wiI.e. Er={w1,w2,...wm};
Then the case is vectorized with the mean of the word vectors of the case elements: let Cen ber∈RdFor the vectorized representation of the r-th case, the calculation method is as follows:
Figure FDA0002407580500000011
assuming there are a total of k cases, using Cen to represent the set of cases, then:
Cen={Cen1,...,Cenr,...,Cenk} (3)。
5. the news and case correlation analysis method based on case element guidance and deep clustering of claim 1, wherein: step3 comprises the following steps:
constructing a word vector matrix for the compressed document, selecting a convolutional auto-encoder, and training the network with the reconstruction loss and the clustering loss.
6. The news and case correlation analysis method based on case element guidance and deep clustering of claim 1, wherein: in Step4, the texts are clustered during the forward computation of the convolutional auto-encoder.
7. The news and case correlation analysis method based on case element guidance and deep clustering of claim 6, wherein: the cluster centers are updated iteratively by combining the previous cluster center with the newly assigned cluster center of the current round.
CN202010166279.6A 2020-03-11 2020-03-11 News and case correlation analysis method based on case element guidance and deep clustering Active CN111831820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166279.6A CN111831820B (en) 2020-03-11 2020-03-11 News and case correlation analysis method based on case element guidance and deep clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010166279.6A CN111831820B (en) 2020-03-11 2020-03-11 News and case correlation analysis method based on case element guidance and deep clustering

Publications (2)

Publication Number Publication Date
CN111831820A (en) 2020-10-27
CN111831820B (en) 2022-07-19

Family

ID=72913341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166279.6A Active CN111831820B (en) 2020-03-11 2020-03-11 News and case correlation analysis method based on case element guidance and deep clustering

Country Status (1)

Country Link
CN (1) CN111831820B (en)

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN113158079A (en) * 2021-04-22 2021-07-23 昆明理工大学 Case public opinion timeline generation method based on difference case elements
CN113191411A (en) * 2021-04-22 2021-07-30 杭州卓智力创信息技术有限公司 Electronic sound image file management method based on photo group
WO2022228127A1 (en) * 2021-04-29 2022-11-03 京东科技控股股份有限公司 Element text processing method and apparatus, electronic device, and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN108898180A (en) * 2018-06-28 2018-11-27 中国人民解放军国防科技大学 Depth clustering method for single-particle cryoelectron microscope images
US20190005976A1 (en) * 2017-07-03 2019-01-03 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Method and system for enhancing a speech signal of a human speaker in a video using visual information
CN109272992A (en) * 2018-11-27 2019-01-25 北京粉笔未来科技有限公司 A kind of spoken language assessment method, device and a kind of device for generating spoken appraisal model
CN109492157A (en) * 2018-10-24 2019-03-19 华侨大学 Based on RNN, the news recommended method of attention mechanism and theme characterizing method
US20190304437A1 (en) * 2018-03-29 2019-10-03 Tencent Technology (Shenzhen) Company Limited Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition
CN110533545A (en) * 2019-07-12 2019-12-03 长春工业大学 Side community discovery algorithm based on the sparse self-encoding encoder of depth
CN110717332A (en) * 2019-07-26 2020-01-21 昆明理工大学 News and case similarity calculation method based on asymmetric twin network

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
US20190005976A1 (en) * 2017-07-03 2019-01-03 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Method and system for enhancing a speech signal of a human speaker in a video using visual information
US20190304437A1 (en) * 2018-03-29 2019-10-03 Tencent Technology (Shenzhen) Company Limited Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition
CN108898180A (en) * 2018-06-28 2018-11-27 中国人民解放军国防科技大学 Depth clustering method for single-particle cryoelectron microscope images
CN109492157A (en) * 2018-10-24 2019-03-19 华侨大学 Based on RNN, the news recommended method of attention mechanism and theme characterizing method
CN109272992A (en) * 2018-11-27 2019-01-25 北京粉笔未来科技有限公司 A kind of spoken language assessment method, device and a kind of device for generating spoken appraisal model
CN110533545A (en) * 2019-07-12 2019-12-03 长春工业大学 Side community discovery algorithm based on the sparse self-encoding encoder of depth
CN110717332A (en) * 2019-07-26 2020-01-21 昆明理工大学 News and case similarity calculation method based on asymmetric twin network

Non-Patent Citations (4)

Title
A. ALQAHTANI 等: "A deep convolutional auto-encoder with embedded clustering", 《2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
CHEN D 等: "Unsupervised multi-manifold clustering by learning deep representation", 《PROCEEDINGS OF THE 31ST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 *
Y. LI 等: "Spatial Fuzzy Clustering and Deep Auto-encoder for Unsupervised Change Detection in Synthetic Aperture Radar Images", 《IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM》 *
谢娟英 等: "深度卷积自编码图像聚类算法", 《计算机科学与探索》 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113158079A (en) * 2021-04-22 2021-07-23 昆明理工大学 Case public opinion timeline generation method based on difference case elements
CN113191411A (en) * 2021-04-22 2021-07-30 杭州卓智力创信息技术有限公司 Electronic sound image file management method based on photo group
CN113158079B (en) * 2021-04-22 2022-06-17 昆明理工大学 Case public opinion timeline generation method based on difference case elements
CN113191411B (en) * 2021-04-22 2023-02-07 杭州卓智力创信息技术有限公司 Electronic sound image file management method based on photo group
WO2022228127A1 (en) * 2021-04-29 2022-11-03 京东科技控股股份有限公司 Element text processing method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111831820B (en) 2022-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant