CN114661903A - Deep semi-supervised text clustering method, device and medium combining user intention - Google Patents

Deep semi-supervised text clustering method, device and medium combining user intention

Info

Publication number
CN114661903A
Authority
CN
China
Prior art keywords
text
intention
clustering
matrix
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210208434.5A
Other languages
Chinese (zh)
Other versions
CN114661903B (en)
Inventor
黄瑞章
李静楠
秦永彬
陈艳平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University
Original Assignee
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN202210208434.5A priority Critical patent/CN114661903B/en
Publication of CN114661903A publication Critical patent/CN114661903A/en
Application granted granted Critical
Publication of CN114661903B publication Critical patent/CN114661903B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a deep semi-supervised text clustering method, device and medium combining user intention. The method comprises the following steps: step one, constructing an intention information matrix; step two, performing vector mapping on the text and extracting features from the text vectors through a neural network; step three, optimizing the encoder with the intention information matrix to obtain a better feature representation; step four, using KL divergence to assist optimization and obtain an initial clustering result; step five, constructing an optimization function and using the intention information to guide the clustering direction of the clusters. On the basis of the given pairwise-constraint supervision information, a deep neural network is fully exploited to mine the intention information and fuse it into the feature representation, while the same intention information supervises the clustering process. This effectively addresses three problems of semi-supervised text clustering, namely differences in text representation, insufficient supervision, and neglect of user intention, improves the accuracy of the clustering result, and yields a clustering result better suited to downstream tasks.

Description

Deep semi-supervised text clustering method, device and medium combining user intention
Technical Field
The invention belongs to the technical field of information extraction and text processing, and particularly relates to a deep semi-supervised text clustering method, device and medium combining user intention.
Background
With the advent of the information age, large-scale data appears before humans in the form of text. Text clustering, which groups similar text documents into the same class, is one of the most important algorithms in the field of data mining. Traditional unsupervised text clustering partitions documents into clusters according to inter-document similarity and requires no data attributes for the partition. As application scenarios diversify and downstream tasks differentiate, different users have different partition intentions for the same batch of data and need to steer the clustering result according to those intentions. For example, given the same batch of news texts, user A intends to partition them by the 'region' the news belongs to, while user B intends to partition them by the news 'topic'. Different intentions lead to different clustering results. A traditional unsupervised clustering algorithm, however, can only partition the data according to its intrinsic structure and cannot take the intention information provided by the user into account. In practical applications, therefore, the user supplies different supervision information according to different downstream task requirements, and that supervision information is used to guide the clustering, yielding semi-supervised text clustering. Semi-supervised clustering is a learning method that combines semi-supervised learning with cluster analysis, and it has received wide attention and application in machine learning. A semi-supervised text clustering algorithm groups documents with the help of a small amount of supervision information; by using this information effectively, it improves algorithm performance and reduces computational complexity. At the theoretical level, research on semi-supervised text clustering can provide theoretical support for other natural language processing technologies and is a line of natural language processing research worth pursuing.
Semi-supervised text clustering has been widely studied in machine learning from different angles, and a large number of semi-supervised text clustering algorithms have been proposed for various problems. Semi-supervised clustering methods fall into the following three types. Constraint-based semi-supervised clustering adds constraint information on top of traditional clustering to push the clustering effect toward its best. Distance-based semi-supervised clustering transforms the similarity measure between samples during data preprocessing to obtain a new metric function under which positively constrained sample pairs become closer and negatively constrained pairs move apart. Semi-supervised clustering combining constraints and distance merges the former two methods into a new algorithm and can achieve a better clustering effect. These methods, however, have the following shortcomings. First, the text representation difference problem: in practical applications text expression varies, and different user clustering intentions call for different emphases in the representation. Second, intention supervision is weak: the supervision information can only guide the structural partition of a small number of text samples and cannot accurately express the user's overall clustering intention. Finally, user intention is ignored: for the same batch of data samples, different clustering results that satisfy the user's intention under a specific application scenario and downstream task requirement cannot be obtained.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a deep semi-supervised text clustering method, device and medium combining user intention.
The invention is realized by the following technical scheme. It provides a deep semi-supervised text clustering method combining user intention, which specifically comprises the following steps:
step one: processing the constraint information given by the user into an intention matrix;
step two: learning an initial feature representation of the text through a pre-trained deep autoencoder;
step three: performing similarity normalization on the initial feature representation, computing a fitting loss against the intention matrix, and continuously back-propagating to adjust and optimize the encoder parameters to obtain the final feature representation;
step four: clustering the obtained feature vectors using KL divergence to obtain text clustering pseudo labels;
step five: computing a loss function, namely the optimization function, over the obtained pseudo labels using the intention information matrix, and iteratively optimizing step three to obtain a text clustering result that finally accords with the user intention.
Further, in step one, according to the pairwise constraint information given by the user, the association relationships between data points are mined to construct an intention information matrix of size n × n, where n is the size of the data set.
Further, in step two, the text is vectorized; the vector mapping can be performed using term frequency (TF), term frequency-inverse document frequency (TF-IDF), or Word2Vec.
Further, in step three, the initial feature representation obtained in step two is matrix-multiplied with itself to obtain a similarity matrix covering all the text data, and a Similarity Loss is computed between the similarity matrix and the intention information matrix; the encoder from step two is fine-tuned by minimizing the similarity loss, finally yielding a text feature representation fused with the semantic information of the user intention.
Further, in step four, the distribution Q of the text vectors is obtained from step three; to give the distribution higher confidence, a target distribution P is further computed from Q, the difference loss between the two distributions is computed with the KL divergence formula, and minimizing this loss helps the model learn a high-confidence distribution, refining the model parameters and cluster centroids to obtain the clustered pseudo-label result.
Further, in step five, a label information matrix of size n × n is constructed from the pseudo labels obtained in step four, a new optimization function is constructed to compute the loss between the label information matrix and the intention information matrix, this loss is minimized to optimize and guide the clustering process, and iteration finally yields the optimal clustering result, thereby achieving the goal of guiding the clustering process with the constraint information and obtaining a text clustering result combining the user intention.
Further, the optimization function is of the form:
Loss_{IDC} = \sum_{(x_i, x_j) \in \text{Must-link}} L(x_i, x_j)^{+} + \sum_{(x_i, x_j) \in \text{Cannot-link}} L(x_i, x_j)^{-}
where \hat{y}_i and \hat{y}_j denote the clusters to which samples x_i and x_j are assigned; Must-link means that the two sample points must belong to the same class, and Cannot-link means that the two sample points must not belong to the same class.
The invention provides an electronic device comprising a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the above deep semi-supervised text clustering method combining user intention.
The invention proposes a computer-readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the deep semi-supervised text clustering method combining user intention.
The invention has the beneficial effects that:
(1) The method can mine the user intention and fuse it into semi-supervised text clustering, obtaining, in a targeted manner, a feature representation that better expresses the user intention and a clustering result that better meets the user's needs, adapting to different downstream tasks. (2) Guiding the clustering process with intention alleviates the weak supervision strength of the supervision information and provides a new line of thought for subsequent research on semi-supervised text clustering. (3) Given the important role text clustering plays in the field of natural language processing, semi-supervised text clustering that introduces the user intention can obtain a better clustering result, suits different application scenarios, provides more useful support, and has considerable theoretical significance and practical value.
Drawings
FIG. 1 is a technical roadmap for the present invention;
FIG. 2 is a diagram of a process model of the present invention;
FIG. 3 is a schematic diagram of the intent mining and fusion method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The deep semi-supervised text clustering method combining user intention can obtain a corresponding clustering result while considering the intentions of different users, which is the technical problem the invention solves. In the invention, the constraint-pair information provided by the user serves as the supervision information; the user intention is mined from it, and a feature representation fusing the user intention is obtained. During clustering, the intention information guides the clustering direction, so the intention-supervision goal is achieved in two stages, yielding a clustering result that satisfies the user intention and meets the requirements of different application scenarios and downstream tasks.
The invention provides a deep semi-supervised text clustering method combining user intention. The technical solution is as follows: an encoding scheme for mining and fusing the intention information is provided. From the perspective of making full use of the supervision information, a neural network technique is introduced and its strength in extracting high-dimensional abstract features is fully exploited; the intention information is mined and fused into the feature representation of the text, and the goal of intention-guided clustering is achieved through an iteratively optimized clustering process, thereby improving the accuracy of the clustering result and effectively solving the existing problems.
With reference to figs. 1 to 3, the invention provides a deep semi-supervised text clustering method combining user intention, which specifically comprises:
step one: processing the constraint information given by the user into an intention matrix;
step two: learning an initial feature representation of the text through a pre-trained deep autoencoder;
step three: performing similarity normalization on the initial feature representation, computing a fitting loss against the intention matrix, and continuously back-propagating to adjust and optimize the encoder parameters to obtain the final feature representation;
step four: clustering the obtained feature vectors using KL divergence to obtain text clustering pseudo labels;
step five: computing a loss function, namely the optimization function, over the obtained pseudo labels using the intention information matrix, and iteratively optimizing step three to obtain a text clustering result that finally accords with the user intention.
In step one, according to the pairwise constraint information given by the user, the association relationships between data points are mined to construct an intention information matrix of size n × n, where n is the size of the data set. Constructing this matrix digitally encodes the user intention and facilitates the subsequent computation of each part's loss function.
In step two, the text is vectorized; the vector mapping can be performed using term frequency (TF), term frequency-inverse document frequency (TF-IDF), or Word2Vec. The vectorized representation of text is often high-dimensional, so to avoid the curse of dimensionality during training, the method pre-trains a neural-network autoencoder for feature representation learning, obtaining the initial feature representation of the text.
In step three, the initial feature representation obtained in step two is matrix-multiplied with itself to obtain a similarity matrix covering all the text data, and a Similarity Loss is computed between the similarity matrix and the intention information matrix; the encoder from step two is fine-tuned by minimizing the similarity loss, finally yielding a text feature representation fused with the semantic information of the user intention. The feature representation output by this step is used in the subsequent clustering process.
In step four, the distribution Q of the text vectors is obtained from step three; to give the distribution higher confidence, a target distribution P is further computed from Q, the difference loss between the two distributions is computed with the KL divergence formula, and minimizing this loss helps the model learn a high-confidence distribution, refining the model parameters and cluster centroids and thereby obtaining the clustered pseudo-label result.
In step five, a label information matrix of size n × n is constructed from the pseudo labels obtained in step four, a new optimization function is constructed to compute the loss between the label information matrix and the intention information matrix, this loss is minimized to optimize and guide the clustering process, and iteration finally yields the optimal clustering result, thereby achieving the goal of guiding the clustering process with the constraint information and obtaining a text clustering result combining the user intention.
According to the method, the intention information optimizes and guides both the encoder and the clustering process; combined with the user intention, a clustering result meeting the requirements can be obtained, and experiments verify that the model achieves better performance.
Example: as shown in figs. 1 to 3, a deep semi-supervised text clustering method combining user intention comprises the following steps. Step one: constructing an intention information matrix. Step two: performing vector mapping on the text and extracting features from the text vectors through a neural network. Step three: optimizing the encoder with the intention information matrix to obtain a better feature representation. Step four: using KL divergence to assist optimization and obtain an initial clustering result. Step five: constructing an optimization function and using the intention information to guide the clustering direction of the clusters.
In step one, according to the pairwise constraint information given by the user, the association relationships between data points are mined to construct an intention information matrix R of size n × n, where n is the size of the data set. Given X as the original text data samples, the value of each entry r_ij of the matrix represents the constraint relationship between samples x_i and x_j and takes one of three values: r_ij = 1 indicates that x_i and x_j are in the same cluster, r_ij = -1 indicates that x_i and x_j are not in the same cluster, and r_ij = 0 indicates that the pair is tentatively unconstrained. Constructing this matrix digitally encodes the user intention and facilitates the subsequent computation of each part's loss function.
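A minimal sketch of this construction (the helper name build_intention_matrix and the list-of-index-pairs input format are illustrative assumptions, not part of the patent):

```python
import numpy as np

def build_intention_matrix(n, must_link, cannot_link):
    """Build the n x n intention matrix R from pairwise constraints.

    r_ij =  1 : x_i and x_j are in the same cluster (Must-link)
    r_ij = -1 : x_i and x_j are not in the same cluster (Cannot-link)
    r_ij =  0 : the pair is tentatively unconstrained
    """
    R = np.zeros((n, n), dtype=np.int8)
    for i, j in must_link:
        R[i, j] = R[j, i] = 1    # constraints are unordered, so keep R symmetric
    for i, j in cannot_link:
        R[i, j] = R[j, i] = -1
    return R

# Example: five documents; the user says 0 and 1 share a cluster, 0 and 4 do not.
R = build_intention_matrix(5, must_link=[(0, 1)], cannot_link=[(0, 4)])
```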
In step two, the text is vectorized; this step may use term frequency (TF), term frequency-inverse document frequency (TF-IDF), Word2Vec, or similar methods for the mapping. The vectorized representation of text is often high-dimensional, so to avoid the curse of dimensionality during training, the method pre-trains a neural-network autoencoder for feature representation learning, obtaining the initial feature representation of the text.
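A hedged sketch of step two, pairing TF-IDF vectorization with a small PyTorch autoencoder (the layer widths, epoch count, and learning rate are illustrative assumptions; only the bottleneck d = 10 is taken from the embodiment below):

```python
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["stock markets rally", "league title race", "markets fall on rate fears"]
X = torch.tensor(TfidfVectorizer().fit_transform(texts).toarray(),
                 dtype=torch.float32)

class AutoEncoder(nn.Module):
    def __init__(self, in_dim, d=10):                 # d = 10 per the embodiment
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                     nn.Linear(500, d))
        self.decoder = nn.Sequential(nn.Linear(d, 500), nn.ReLU(),
                                     nn.Linear(500, in_dim))

    def forward(self, x):
        z = self.encoder(x)                           # initial feature representation Z
        return z, self.decoder(z)

ae = AutoEncoder(X.shape[1])
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(100):                                  # pre-training by reconstruction
    z, x_rec = ae(X)
    loss = nn.functional.mse_loss(x_rec, X)
    opt.zero_grad(); loss.backward(); opt.step()
```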
In step three, the intention mining and feature-representation fusion part mainly solves the text representation difference problem; after the text data has been feature-encoded, fusing the intention information is the key. The initial feature representation obtained in step two is matrix-multiplied with itself to obtain a similarity matrix covering all the text data, a Similarity Loss is computed between the similarity matrix and the intention information matrix, and the intention information is fused by minimizing this loss.
In fig. 2, X represents the original data samples, and Z represents the feature representation obtained after the original feature distribution passes through the Encoder module. The invention constructs an IMA module to mine and fuse the user intention information, performing fused intention encoding with the intention matrix R obtained in step one; the technical principle is shown in fig. 3.
In fig. 3, Z is the feature-vector representation obtained from the Encoder part, of size n × d, where n is the size of the data set and d = 10. The invention multiplies Z by its own transpose to obtain an n × n matrix W, i.e., the similarity matrix between samples in the data set. Two thresholds, _up and _down, are then set through a normalization algorithm to normalize the similarity matrix, yielding a new matrix S whose values follow the rule:
S_{ij} = \begin{cases} 1, & W_{ij} \geq \_up \\ 0, & W_{ij} \leq \_down \\ W_{ij}, & \text{otherwise} \end{cases}
the model intention matrix R designed by the invention continuously optimizes the matrix S and recalls the Encoder part. The Loss function algorithm applied here is Similarity Loss, which can jointly measure the self-Similarity and relative Similarity of the sample pairs, so that it can optimize the correlation coding between the sample pairs through iteration. And (5) finely adjusting the encoder in the second step by minimizing the loss, and finally obtaining the text feature representation fused with the semantic information of the user intention. The feature representation distribution output by this step is used for the subsequent clustering process.
In step four, the distribution Q of the text vectors is obtained from step three; to give the distribution higher confidence, a target distribution P is further computed from Q. The method sets a clustering loss function that computes the difference loss between the two distributions with the KL divergence formula; minimizing this loss helps the model learn a high-confidence distribution and refines the model parameters and cluster centroids, thereby obtaining the clustered pseudo-label result.
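A sketch of step four under the assumption that Q and P follow the standard deep-embedded-clustering formulation (Student's t soft assignments and a sharpened target), which the text's description of Q, P, and the KL loss matches but does not spell out:

```python
import torch

def soft_assign(Z, centroids, alpha=1.0):
    # Soft assignment q_ij of each text vector to each cluster centroid
    # (Student's t kernel; assumed, since the patent only names Q).
    d2 = torch.cdist(Z, centroids) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Higher-confidence target P computed from Q by squaring and renormalizing.
    w = q ** 2 / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

def clustering_loss(q, p):
    # KL(P || Q): minimizing it refines the encoder parameters and centroids.
    return (p * (p / q).log()).sum(dim=1).mean()

# q = soft_assign(z, centroids)
# p = target_distribution(q).detach()    # treat P as a fixed target
# pseudo_labels = q.argmax(dim=1)        # text clustering pseudo labels
```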
In step five, a label information matrix of size n × n is constructed from the pseudo labels obtained in step four, and a new optimization function is constructed to compute the loss between this matrix and the intention information matrix. This loss is minimized to optimize and guide the clustering process.
The purpose of the intention-guided clustering process is to solve the weak-supervision problem: to find clusters that satisfy the given constraints to the greatest extent while taking the user intention into account. The guidance strength is enhanced by learning the similarity relationships between the constrained information pairs, and after continuous iteration the clustering result is optimized along a specific direction. To this end, the invention sets an optimization function of the form:
Loss_{IDC} = \sum_{(x_i, x_j) \in \text{Must-link}} L(x_i, x_j)^{+} + \sum_{(x_i, x_j) \in \text{Cannot-link}} L(x_i, x_j)^{-}
where \hat{y}_i and \hat{y}_j denote the clusters to which samples x_i and x_j are assigned; Must-link means that the two sample points must belong to the same class, and Cannot-link means that the two sample points must not belong to the same class. For the two constraint relations, the following pairing-cost formulas are set:
Must-link (same-pair) pairing cost:
L(x_p, x_q)^{+} = D_{KL}(P^{*} \| Q) + D_{KL}(Q^{*} \| P)
Cannot-link (different-pair) pairing cost:
L(x_p, x_q)^{-} = L_n(D_{KL}(P^{*} \| Q), \sigma) + L_n(D_{KL}(Q^{*} \| P), \sigma)
L_n(e, \sigma) = \max(0, \sigma - e)
where σ is a set parameter to prevent overfitting. By iteratively minimizing Loss_{IDC}, the optimal clustering result can finally be obtained, thereby achieving the goal of guiding the clustering process with the constraint information and obtaining a text clustering result combining the user intention.
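A sketch of the pairing costs, reading the starred P* and Q* as stop-gradient copies of the two samples' assignment distributions (that reading, like the helper names, is an assumption; the text does not define the star):

```python
import torch

def kl_div(p, q, eps=1e-9):
    # KL divergence between two discrete distributions (1-D tensors).
    return (p * ((p + eps) / (q + eps)).log()).sum()

def pair_cost(q_i, q_j, must_link, sigma=2.0):
    # q_i, q_j: soft assignment distributions of samples x_p and x_q.
    p_star, q_star = q_i.detach(), q_j.detach()
    if must_link:
        # Same-pair cost pulls the two distributions together.
        return kl_div(p_star, q_j) + kl_div(q_star, q_i)
    # Different-pair cost: the hinge L_n(e, sigma) = max(0, sigma - e)
    # pushes the distributions at least sigma apart; sigma guards overfitting.
    return (torch.clamp(sigma - kl_div(p_star, q_j), min=0.0) +
            torch.clamp(sigma - kl_div(q_star, q_i), min=0.0))

# Loss_IDC is the sum of pair_cost over all Must-link (must_link=True) and
# Cannot-link (must_link=False) pairs, minimized by iteration alongside the
# clustering loss of step four.
```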
In conclusion, the deep semi-supervised text clustering method combining user intention has excellent performance.
The invention provides an electronic device comprising a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the above deep semi-supervised text clustering method combining user intention.
The invention proposes a computer-readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the deep semi-supervised text clustering method combining user intention.
The deep semi-supervised text clustering method, device and medium combining user intention have been introduced in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (9)

1. A deep semi-supervised text clustering method combining user intention, characterized by specifically comprising the following steps:
step one: processing the constraint information given by the user into an intention matrix;
step two: learning an initial feature representation of the text through a pre-trained deep autoencoder;
step three: performing similarity normalization on the initial feature representation, computing a fitting loss against the intention matrix, and continuously back-propagating to adjust and optimize the encoder parameters to obtain the final feature representation;
step four: clustering the obtained feature vectors using KL divergence to obtain text clustering pseudo labels;
step five: computing a loss function, namely the optimization function, over the obtained pseudo labels using the intention information matrix, and iteratively optimizing step three to obtain a text clustering result that finally accords with the user intention.
2. The method according to claim 1, wherein in step one, the association relationships between data points are mined according to the pairwise constraint information given by the user, so as to construct an intention information matrix of size n × n, where n is the size of the data set.
3. The method according to claim 2, wherein in step two, the text is vectorized, and the vector mapping is performed using term frequency (TF), term frequency-inverse document frequency (TF-IDF), or Word2Vec.
4. The method according to claim 3, wherein in step three, the initial feature representation obtained in step two is matrix-multiplied with itself to obtain a similarity matrix covering all the text data, and a Similarity Loss is computed between the similarity matrix and the intention information matrix; the encoder from step two is fine-tuned by minimizing the similarity loss, finally yielding a text feature representation fused with the semantic information of the user intention.
5. The method according to claim 4, wherein in step four, the distribution Q of the text vectors is obtained from step three; to give the distribution higher confidence, a target distribution P is further computed from Q, the difference loss between the two distributions is computed with the KL divergence formula, and minimizing this loss helps the model learn a high-confidence distribution, refining the model parameters and cluster centroids to obtain the clustered pseudo-label result.
6. The method according to claim 5, wherein in step five, a label information matrix of size n × n is constructed from the pseudo labels obtained in step four, a new optimization function is constructed to compute the loss between the label information matrix and the intention information matrix, this loss is minimized to optimize and guide the clustering process, and iteration finally yields the optimal clustering result, thereby achieving the goal of guiding the clustering process with the constraint information and obtaining a text clustering result combining the user intention.
7. The method of claim 6, wherein the optimization function is of the form:
Loss_{IDC} = \sum_{(x_i, x_j) \in \text{Must-link}} L(x_i, x_j)^{+} + \sum_{(x_i, x_j) \in \text{Cannot-link}} L(x_i, x_j)^{-}
where \hat{y}_i and \hat{y}_j denote the clusters to which samples x_i and x_j are assigned; Must-link means that the two sample points must belong to the same class, and Cannot-link means that the two sample points must not belong to the same class.
8. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1-7 when executing the computer program.
9. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202210208434.5A 2022-03-03 2022-03-03 Deep semi-supervised text clustering method, device and medium combining user intention Active CN114661903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208434.5A CN114661903B (en) 2022-03-03 2022-03-03 Deep semi-supervised text clustering method, device and medium combining user intention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210208434.5A CN114661903B (en) 2022-03-03 2022-03-03 Deep semi-supervised text clustering method, device and medium combining user intention

Publications (2)

Publication Number Publication Date
CN114661903A true CN114661903A (en) 2022-06-24
CN114661903B CN114661903B (en) 2024-07-09

Family

ID=82027540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210208434.5A Active CN114661903B (en) 2022-03-03 2022-03-03 Deep semi-supervised text clustering method, device and medium combining user intention

Country Status (1)

Country Link
CN (1) CN114661903B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049697A (en) * 2023-01-10 2023-05-02 苏州科技大学 Interactive clustering quality improving method based on user intention learning
CN117875318A (en) * 2023-02-27 2024-04-12 同心县启胜新能源科技有限公司 Temperature and humidity control method and system for livestock breeding based on Internet of things and cloud platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300564A1 (en) * 2016-04-19 2017-10-19 Sprinklr, Inc. Clustering for social media data
CN110309302A (en) * 2019-05-17 2019-10-08 江苏大学 A kind of uneven file classification method and system of combination SVM and semi-supervised clustering
CN110516068A (en) * 2019-08-23 2019-11-29 贵州大学 A kind of various dimensions Text Clustering Method based on metric learning
US20200074280A1 (en) * 2018-08-28 2020-03-05 Apple Inc. Semi-supervised learning using clustering as an additional constraint
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300564A1 (en) * 2016-04-19 2017-10-19 Sprinklr, Inc. Clustering for social media data
US20200074280A1 (en) * 2018-08-28 2020-03-05 Apple Inc. Semi-supervised learning using clustering as an additional constraint
CN110309302A (en) * 2019-05-17 2019-10-08 江苏大学 A kind of uneven file classification method and system of combination SVM and semi-supervised clustering
CN110516068A (en) * 2019-08-23 2019-11-29 贵州大学 A kind of various dimensions Text Clustering Method based on metric learning
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
钟将; 刘龙海; 梁传伟: "Active semi-supervised text clustering based on pairwise constraints" (基于成对约束的主动半监督文本聚类), 计算机工程 (Computer Engineering), no. 13, 5 July 2011 (2011-07-05) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049697A (en) * 2023-01-10 2023-05-02 苏州科技大学 Interactive clustering quality improving method based on user intention learning
CN117875318A (en) * 2023-02-27 2024-04-12 同心县启胜新能源科技有限公司 Temperature and humidity control method and system for livestock breeding based on Internet of things and cloud platform

Also Published As

Publication number Publication date
CN114661903B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
Long et al. Sentiment analysis of text based on bidirectional LSTM with multi-head attention
CN111444340B (en) Text classification method, device, equipment and storage medium
CN113204952B (en) Multi-intention and semantic slot joint identification method based on cluster pre-analysis
CN106383877B (en) Social media online short text clustering and topic detection method
CN114661903B (en) Deep semi-supervised text clustering method, device and medium combining user intention
CN111325029B (en) Text similarity calculation method based on deep learning integrated model
CN109344399B (en) Text similarity calculation method based on stacked bidirectional lstm neural network
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN107085581A (en) Short text classification method and device
CN109933792B (en) Viewpoint type problem reading and understanding method based on multilayer bidirectional LSTM and verification model
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN113672718A (en) Dialog intention recognition method and system based on feature matching and field self-adaption
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN116775497A (en) Database test case generation demand description coding method
CN113032556A (en) Method for forming user portrait based on natural language processing
CN112446405A (en) User intention guiding method for home appliance customer service and intelligent home appliance
CN116167833B (en) Internet financial risk control system and method based on federal learning
CN117216012A (en) Theme modeling method, apparatus, electronic device, and computer-readable storage medium
Jing et al. Chinese text sentiment analysis based on transformer model
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN115526174A (en) Deep learning model fusion method for finance and economics text emotional tendency classification
CN115906845A (en) E-commerce commodity title naming entity identification method
CN113204971B (en) Scene self-adaptive Attention multi-intention recognition method based on deep learning
CN110516068B (en) Multi-dimensional text clustering method based on metric learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant