CN115098672A - User demand discovery method and system based on multi-view deep clustering - Google Patents

User demand discovery method and system based on multi-view deep clustering

Info

Publication number
CN115098672A
Authority
CN
China
Prior art keywords
view
clustering
representing
consistency
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210510779.6A
Other languages
Chinese (zh)
Inventor
杨颖�
蒋文文
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202210510779.6A
Publication of CN115098672A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a user demand discovery method, system, storage medium and electronic device based on multi-view deep clustering, and relates to the technical field of data mining. First, a plurality of texts, each containing a single user requirement description, are acquired and vectorized. Next, multi-view text representation features are obtained from the vectorized texts. Then, the text representation features of each view are input into the deep clustering network provided by the invention, and a user demand clustering result is obtained using a deep clustering algorithm in which multi-view consistency and diversity cooperate. By establishing a multi-view collaborative learning mechanism, the method retains view diversity information while mining consistency information, makes full use of the complementary information and the underlying consistency among multi-view data, and improves the accuracy of the user demand clustering result, so that both the representative majority-class viewpoints and the novel minority-class viewpoints concerning user demands in user-generated content are effectively mined.

Description

User demand discovery method and system based on multi-view deep clustering
Technical Field
The invention relates to the technical field of data mining, in particular to a user demand discovery method and system based on multi-view deep clustering, a storage medium and electronic equipment.
Background
As society develops, enterprise marketing is shifting from being production-driven to being driven by user demand. Mining user demands, and designing or updating products and services to meet them, is therefore an important part of the sustainable development of an enterprise.
At present, the first step in mining user demands is demand collection. The development of big data and internet technology has broadened the channels for collecting user demand data. User-generated content such as online reviews, social media posts and blogs is the main way in which users express their personal experience of products or services, and it provides rich text data for demand mining. It is a promising data source: it has high analytical value and good timeliness, and original user content can be obtained quickly and at low cost. However, the large volume of unlabeled user-generated content is repetitive and often uninformative, which makes building an effective demand mining model an urgent problem.
Disclosure of Invention
(I) Technical problem to be solved
To address the shortcomings of the prior art, the invention provides a user demand discovery method, system, storage medium and electronic device based on multi-view deep clustering, which solve the technical problem that demand mining cannot be carried out effectively on user-generated content.
(II) Technical solution
To achieve the above purpose, the invention adopts the following technical solution:
A user demand discovery method based on multi-view deep clustering comprises the following steps:
S1, acquiring a plurality of texts, each containing a single user requirement description, and vectorizing the texts;
S2, obtaining multi-view text representation features from the vectorized texts;
S3, inputting the text representation features of each view into the deep clustering network provided by the invention, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
Preferably, obtaining the multi-view text representation features in S2 comprises:
constructing a text convolutional neural network based on a max-pooling strategy and an average-pooling strategy, and a bidirectional long short-term memory (BiLSTM) network, respectively, to obtain three-view text representation features that capture both local features and contextual features.
Preferably, the deep clustering network comprises an autoencoder, composed of an encoding layer and a decoding layer, and a clustering layer, wherein the encoding layer is composed of a plurality of diversity encoders and one consistency encoder;
S3 comprises:
S31, inputting the text representation features of each view into the corresponding diversity encoder for convolution transformation to obtain the diversity deep encoded features of that single view; and inputting the text representation features of all views into the consistency encoder for convolution transformation to obtain consistency deep encoded features containing the information of all views;
S32, inputting the diversity deep encoded features and the consistency deep encoded features respectively into a clustering layer based on KL divergence to obtain the user demand clustering result.
Preferably, in the training phase, the loss function of the autoencoder is as follows:
[Formula (1): equation image missing from the source text]
where $L_{loss}$ denotes the sample reconstruction loss function;
$N$ is the number of views, and $M$ is the number of samples;
$x_j^{(i)}$ denotes the text vector of the $j$-th sample in the $i$-th view;
$f^{(i)}$ denotes the encoding function of the $i$-th view, and $\Theta$ denotes the encoder parameters;
$g^{(i)}$ denotes the decoding function of the $i$-th view, and $\Omega$ denotes the decoder parameters.
Preferably, in the training phase, the loss function of the clustering layer is as follows:
[Formula (2): equation image missing from the source text]
where $L_C$ denotes the clustering loss function;
$\lambda_1$ denotes the clustering loss adjustment coefficient;
$Q_i$ and $P_i$ denote, respectively, the clustering soft label distribution and the target distribution corresponding to the diversity deep encoded features of the $i$-th view; $Q$ and $P$ denote, respectively, the clustering soft label distribution and the target distribution shared by all views.
Preferably, the diversity deep encoded features and the consistency deep encoded features are denoted by $Z_i$, $i = 1, 2, \dots, N$, and $Z$, respectively;
the number of clusters $K$ is specified, and the diversity deep encoded features $Z_i$, $i = 1, 2, \dots, N$, are each input into a K-means clustering algorithm to generate the centroids of $K$ initial clusters for each view, from which the soft label distribution $Q_i$ and the target distribution $P_i$ of each view are calculated:
$Q_i = [q_{jc}^{(i)}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (3)
$P_i = [p_{jc}^{(i)}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (4)
where
[Formulas (5) and (6), defining $q_{jc}^{(i)}$ and $p_{jc}^{(i)}$: equation images missing from the source text]
$z_j^{(i)}$ denotes the encoded feature information of the $j$-th sample in the $i$-th view;
$\mu_c^{(i)}$ denotes the centroid of the $c$-th cluster obtained by K-means clustering of the $M$ sample features of the $i$-th view;
$q_{jc}^{(i)}$ denotes the probability that the $j$-th sample in the $i$-th view belongs to the $c$-th cluster;
$p_{jc}^{(i)}$ denotes the reference probability that the $j$-th sample in the $i$-th view belongs to the $c$-th cluster.
Preferably, the consistency deep encoded features $Z$ are input into a K-means clustering algorithm to generate the centroids of $K$ initial clusters, and the soft label distribution $Q$ and the target distribution $P$ shared by all views are calculated:
$Q = [q_{jc}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (7)
$P = [p_{jc}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (8)
where
[Formulas (9) and (10), defining $q_{jc}$ and $p_{jc}$: equation images missing from the source text]
$z_j$ denotes the feature information of the $j$-th sample in the consistency deep encoded features;
$\mu_c$ denotes the centroid of the $c$-th cluster obtained by K-means clustering of the consistency deep encoded features;
$q_{jc}$ denotes the probability that the $j$-th sample belongs to the $c$-th cluster;
$p_{jc}$ denotes the reference probability that the $j$-th sample belongs to the $c$-th cluster.
Preferably, in the training phase, the total loss function of the deep clustering network is as follows:
$L = L_{loss} + L_C + L_R$ (11)
[Formula (12), defining $L_R$: equation image missing from the source text]
where $L$ denotes the total loss function of the deep clustering network, and $L_R$ denotes the parameter regularization term;
$\lambda_2$ denotes the consistency regularization term adjustment coefficient, and $\lambda_3$ denotes the diversity regularization term adjustment coefficient;
$Q_{i_1}$ denotes the soft label distribution of the $i_1$-th view ($i_1 = 1, \dots, N$), and $Q_{i_2}$ denotes the soft label distribution of the $i_2$-th view ($i_2 = 1, \dots, N$).
A user demand discovery system based on multi-view deep clustering comprises:
a text acquisition module, configured to execute S1: acquiring a plurality of texts, each containing a single user requirement description, and vectorizing the texts;
a feature acquisition module, configured to execute S2: obtaining multi-view text representation features from the vectorized texts;
a result clustering module, configured to execute S3: inputting the text representation features of each view into the deep clustering network provided by the invention, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
A storage medium storing a computer program for user demand discovery based on multi-view deep clustering, wherein the computer program causes a computer to execute the user demand discovery method as described above.
An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the user need discovery method as described above.
(III) Advantageous effects
The invention provides a user demand discovery method, system, storage medium and electronic device based on multi-view deep clustering. Compared with the prior art, it has the following advantageous effects:
First, a plurality of texts, each containing a single user requirement description, are acquired and vectorized. Next, multi-view text representation features are obtained from the vectorized texts. Then, the text representation features of each view are input into the deep clustering network provided by the invention, and a user demand clustering result is obtained using a deep clustering algorithm in which multi-view consistency and diversity cooperate. By establishing a multi-view collaborative learning mechanism, the method retains view diversity information while mining consistency information, makes full use of the complementary information and the underlying consistency among multi-view data, and improves the accuracy of the user demand clustering result, so that both the representative majority-class viewpoints and the novel minority-class viewpoints concerning user demands in user-generated content are effectively mined.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a user demand discovery method based on multi-view deep clustering according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of a deep clustering network according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of protection of the present invention.
The embodiments of the present application provide a user demand discovery method, system, storage medium and electronic device based on multi-view deep clustering, which solve the technical problem that demand mining cannot be carried out effectively on user-generated content.
To solve the above technical problem, the general idea of the embodiments of the present application is as follows:
In the embodiment of the invention, a plurality of texts, each containing a single user requirement description, are first acquired and vectorized. Next, multi-view text representation features are obtained from the vectorized texts. Then, the text representation features of each view are input into the deep clustering network provided by the invention, and a user demand clustering result is obtained using a deep clustering algorithm in which multi-view consistency and diversity cooperate. By establishing a multi-view collaborative learning mechanism, the method retains view diversity information while mining consistency information, makes full use of the complementary information and the underlying consistency among multi-view data, and improves the accuracy of the user demand clustering result, so that both the representative majority-class viewpoints and the novel minority-class viewpoints concerning user demands in user-generated content are effectively mined.
For a better understanding of the above technical solution, it is described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment:
As shown in Fig. 1, an embodiment of the present invention provides a user demand discovery method based on multi-view deep clustering, comprising:
S1, acquiring a plurality of texts, each containing a single user requirement description, and vectorizing the texts;
S2, obtaining multi-view text representation features from the vectorized texts;
S3, inputting the text representation features of each view into a pre-trained deep clustering network, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
By establishing a multi-view collaborative learning mechanism, the embodiment of the invention retains view diversity information while mining consistency information, makes full use of the complementary information and the underlying consistency among multi-view data, and improves the accuracy of the user demand clustering result, so that both the representative majority-class viewpoints and the novel minority-class viewpoints concerning user demands in user-generated content are effectively mined.
Each step of the above technical solution is described in detail below:
In step S1, a plurality of texts, each containing a single user requirement description, are acquired and vectorized.
User-generated content is collected from online reviews, social media, blogs and the like. Let $x_1, x_2, \dots, x_M$ denote $M$ texts, each containing a single user requirement description; the texts are vectorized using the word2vec technique.
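The vectorization in step S1 can be illustrated with a minimal sketch. It assumes a tiny hand-written embedding table (`TOY_EMBEDDINGS`, hypothetical) in place of a trained word2vec model, whitespace tokenization, and zero-padding to a fixed length; none of these details are specified in the source.

```python
from typing import Dict, List

# Hypothetical toy embedding table standing in for a trained word2vec model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "battery": [0.9, 0.1, 0.0],
    "life": [0.7, 0.2, 0.1],
    "short": [0.1, 0.8, 0.1],
    "screen": [0.0, 0.2, 0.9],
}
DIM = 3


def vectorize(text: str, max_len: int = 4) -> List[List[float]]:
    """Map a text to a fixed-length sequence of word vectors.

    Unknown words and padding positions become zero vectors.
    """
    tokens = text.lower().split()[:max_len]
    rows = [list(TOY_EMBEDDINGS.get(tok, [0.0] * DIM)) for tok in tokens]
    rows += [[0.0] * DIM for _ in range(max_len - len(rows))]
    return rows


texts = ["battery life short", "screen"]
vectorized = [vectorize(t) for t in texts]
```

Each text becomes a `max_len` by `DIM` matrix, which is the shape that a convolutional or recurrent text encoder would typically consume.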
In step S2, multi-view text representation features are obtained from the vectorized texts.
To improve the utilization of the information in user requirement texts, this step constructs a text convolutional neural network based on a max-pooling strategy and an average-pooling strategy, and a bidirectional long short-term memory (BiLSTM) network, respectively, and obtains three-view text representation features that capture both local features and contextual features.
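The difference between the max-pooling and average-pooling views can be sketched as follows. This is an illustration under stated assumptions, not the network of the invention: the kernel values, the window size of 2 and the ReLU activation are all hypothetical, and the BiLSTM view is omitted.

```python
from typing import List

Matrix = List[List[float]]


def conv1d(tokens: Matrix, kernels: List[Matrix]) -> Matrix:
    """Slide each kernel (window x dim) over the token sequence.

    Returns one row of ReLU activations per window position,
    with one column per kernel.
    """
    window = len(kernels[0])
    out = []
    for start in range(len(tokens) - window + 1):
        row = []
        for kernel in kernels:
            s = sum(
                tokens[start + w][d] * kernel[w][d]
                for w in range(window)
                for d in range(len(kernel[w]))
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(feature_map: Matrix) -> List[float]:
    """Keep the strongest response of each kernel over all positions."""
    return [max(col) for col in zip(*feature_map)]


def avg_pool(feature_map: Matrix) -> List[float]:
    """Keep the mean response of each kernel over all positions."""
    return [sum(col) / len(col) for col in zip(*feature_map)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, 1.0]],
]
fmap = conv1d(tokens, kernels)
view_max = max_pool(fmap)  # one view of the text
view_avg = avg_pool(fmap)  # a second, different view of the same text
```

The same convolutional feature map yields two different fixed-length representations, which is why the two pooling strategies can serve as separate views.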
In step S3, the text representation features of each view are input into the deep clustering network provided by the invention, and a user demand clustering result is obtained using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
As shown in Fig. 2, the deep clustering network provided by the invention comprises an autoencoder, composed of an encoding layer and a decoding layer, and a clustering layer, wherein: the encoding layer consists of a plurality of diversity encoders and one consistency encoder; the decoding layer is fully symmetrical in structure to the encoding layer and comprises a plurality of diversity decoders and one consistency decoder; the input of the clustering layer is the output of all the encoders, so clustering operates on the deep features produced by the encoding layer.
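The data flow through the encoding layer can be sketched as below. The sketch makes several assumptions that are not in the source: a plain linear map stands in for the convolution transformation, the consistency encoder receives the concatenation of all views, and the class name `MultiViewEncodingLayer` is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product, used here as a stand-in encoder."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultiViewEncodingLayer:
    """One diversity encoder per view plus one shared consistency encoder."""

    def __init__(self, diversity_weights: List[Matrix], consistency_weights: Matrix):
        self.diversity_weights = diversity_weights
        self.consistency_weights = consistency_weights

    def encode(self, views: List[Vector]) -> Tuple[List[Vector], Vector]:
        # Each view goes through its own diversity encoder, giving Z_i.
        z_div = [linear(w, x) for w, x in zip(self.diversity_weights, views)]
        # All views together go through the consistency encoder, giving Z.
        # Concatenating the views is an assumption of this sketch.
        joined = [v for x in views for v in x]
        z_con = linear(self.consistency_weights, joined)
        return z_div, z_con


views = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
layer = MultiViewEncodingLayer(
    diversity_weights=[identity, identity, identity],
    consistency_weights=[[1.0] * 6, [0.5] * 6],
)
z_div, z_con = layer.encode(views)
```

The clustering layer would then receive both `z_div` (one feature vector per view) and `z_con` (one vector shared by all views).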
The embodiment of the invention uses a deep neural network to extract deep semantic features from user-generated text. This avoids the drawbacks of manual feature extraction, which is time-consuming, labor-intensive and highly subjective, and allows user-generated content to be clustered quickly, so that the representative demands of the majority and the novel demands of the minority are discovered in time, helping enterprises update their products and services. In addition, an end-to-end deep clustering network makes the extracted deep semantic features better suited to the clustering task. It avoids the disconnect between upstream and downstream tasks that arises when traditional feature engineering and the clustering algorithm are independent of each other, allows a globally optimal feature representation to be constructed for the specific clustering algorithm, and improves the clustering effect.
In addition, to fuse multi-view diversity and consistency information and to feed the clustering result back into the update of the deep network parameters, the following loss functions and parameter training strategy are designed.
The parameter training process comprises the following two steps:
(1) The parameters of the deep autoencoder are pre-trained with the goal of minimizing the network reconstruction loss. The loss function $L_{loss}$ is:
[Formula (1): equation image missing from the source text]
where $L_{loss}$ denotes the sample reconstruction loss function;
$N$ is the number of views, and $M$ is the number of samples;
$x_j^{(i)}$ denotes the text vector of the $j$-th sample in the $i$-th view;
$f^{(i)}$ denotes the encoding function of the $i$-th view, and $\Theta$ denotes the encoder parameters;
$g^{(i)}$ denotes the decoding function of the $i$-th view, and $\Omega$ denotes the decoder parameters.
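A reconstruction loss over $N$ views and $M$ samples might look like the following sketch. The formula itself is not legible in this text, so the squared Euclidean distance and the unnormalized sum are assumptions, and the toy encoder and decoder functions are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def reconstruction_loss(
    views: List[List[Vector]],
    encoders: List[Callable[[Vector], Vector]],
    decoders: List[Callable[[Vector], Vector]],
) -> float:
    """Sum of squared reconstruction errors over N views and M samples.

    views[i][j] is the text vector of sample j in view i.
    The squared Euclidean distance is an assumption of this sketch.
    """
    total = 0.0
    for samples, enc, dec in zip(views, encoders, decoders):
        for x in samples:
            x_hat = dec(enc(x))
            total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


# Toy encoder/decoder pairs: halve then double (lossless), or zero out (lossy).
def halve(x: Vector) -> Vector:
    return [v / 2 for v in x]


def double(x: Vector) -> Vector:
    return [v * 2 for v in x]


def zero(x: Vector) -> Vector:
    return [0.0 for _ in x]


views = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 2.0]]]
loss_perfect = reconstruction_loss(views, [halve, halve], [double, double])
loss_lossy = reconstruction_loss(views, [halve, zero], [double, double])
```

A lossless encoder/decoder pair gives zero loss, while discarding the second view's information in the encoder leaves a positive loss that pre-training would try to reduce.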
(2) After the pre-trained parameters of the encoding and decoding layers are obtained, the multi-view data are input into the encoders to obtain the diversity deep encoded features of each single view and the consistency deep encoded features containing the information of all views, denoted by $Z_i$, $i = 1, 2, \dots, N$, and $Z$, respectively. The clustering result is then obtained according to the following steps.
(2-1) Calculate the cluster distribution of the diversity deep features.
The number of clusters $K$ is specified, and the diversity deep encoded features $Z_i$, $i = 1, 2, \dots, N$, are each input into a K-means clustering algorithm to generate the centroids of $K$ initial clusters for each view, from which the soft label distribution $Q_i$ and the target distribution $P_i$ of each view are calculated:
$Q_i = [q_{jc}^{(i)}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (2)
$P_i = [p_{jc}^{(i)}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (3)
where
[Formulas (4) and (5), defining $q_{jc}^{(i)}$ and $p_{jc}^{(i)}$: equation images missing from the source text]
$z_j^{(i)}$ denotes the encoded feature information of the $j$-th sample in the $i$-th view;
$\mu_c^{(i)}$ denotes the centroid of the $c$-th cluster obtained by K-means clustering of the $M$ sample features of the $i$-th view;
$q_{jc}^{(i)}$ denotes the probability that the $j$-th sample in the $i$-th view belongs to the $c$-th cluster;
$p_{jc}^{(i)}$ denotes the reference probability that the $j$-th sample in the $i$-th view belongs to the $c$-th cluster.
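The K-means initialization of the cluster centroids can be sketched with plain Lloyd iterations. The source only names K-means, so the choice of the first $K$ points as starting centroids and the fixed iteration count are simplifications of this sketch.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: mean of the points assigned to each centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy encoded features for one view: two well-separated groups.
z_view = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(z_view, k=2)
```

Running this once per view on $Z_i$ and once on $Z$ would give the per-view and shared centroids from which the soft label distributions are computed.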
(2-2) Calculate the cluster distribution of the consistency deep features.
The consistency deep encoded features $Z$ are input into a K-means clustering algorithm to generate the centroids of $K$ initial clusters, and the soft label distribution $Q$ and the target distribution $P$ shared by all views are calculated:
$Q = [q_{jc}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (6)
$P = [p_{jc}]_{M \times K},\ j = 1, 2, \dots, M;\ c = 1, 2, \dots, K$ (7)
where
[Formulas (8) and (9), defining $q_{jc}$ and $p_{jc}$: equation images missing from the source text]
$z_j$ denotes the feature information of the $j$-th sample in the consistency deep encoded features;
$\mu_c$ denotes the centroid of the $c$-th cluster obtained by K-means clustering of the consistency deep encoded features;
$q_{jc}$ denotes the probability that the $j$-th sample belongs to the $c$-th cluster;
$p_{jc}$ denotes the reference probability that the $j$-th sample belongs to the $c$-th cluster.
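Because the defining formulas for the soft label and target probabilities are not legible in this text, the sketch below substitutes a commonly used choice from KL-divergence-based deep clustering: a Student's t kernel for the soft assignment and a squared, frequency-normalized sharpening for the target. This is an assumption and may differ from the formulas of the invention.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def soft_assign(z: List[Vector], centroids: List[Vector]) -> Matrix:
    """Soft labels via a Student's t kernel (one degree of freedom), normalized per sample."""
    q = []
    for zj in z:
        sims = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zj, mu))) for mu in centroids]
        total = sum(sims)
        q.append([s / total for s in sims])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpen q: square each entry, divide by the cluster's soft frequency, renormalize."""
    freq = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        weights = [v * v / f for v, f in zip(row, freq)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


z = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
centroids = [[0.0, 0.5], [4.0, 4.0]]
Q = soft_assign(z, centroids)          # M x K soft label distribution
P = target_distribution(Q)             # M x K target (reference) distribution
```

Under this choice the target distribution puts more weight on confident assignments than the soft labels do, which is what lets the KL-based clustering loss pull samples toward their most likely cluster.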
In addition, in the embodiment of the present invention, the loss function of the clustering layer is designed as follows:
[Formula (10): equation image missing from the source text]
where $L_C$ denotes the clustering loss function;
$\lambda_1$ denotes the clustering loss adjustment coefficient;
$Q_i$ and $P_i$ denote, respectively, the clustering soft label distribution and the target distribution corresponding to the diversity deep encoded features of the $i$-th view; $Q$ and $P$ denote, respectively, the clustering soft label distribution and the target distribution shared by all views.
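A KL-divergence-based clustering loss over the per-view and shared distributions could be sketched as follows. The exact formula is not legible here, so the direction KL(P || Q) and, in particular, which term the coefficient $\lambda_1$ multiplies are assumptions of this sketch.

```python
import math
from typing import List

Matrix = List[List[float]]


def kl_divergence(p: Matrix, q: Matrix, eps: float = 1e-12) -> float:
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pv * math.log((pv + eps) / (qv + eps))
        for p_row, q_row in zip(p, q)
        for pv, qv in zip(p_row, q_row)
    )


def clustering_loss(
    p_views: List[Matrix],
    q_views: List[Matrix],
    p_shared: Matrix,
    q_shared: Matrix,
    lambda_1: float,
) -> float:
    """Per-view KL terms plus a lambda_1-weighted shared term.

    Which term lambda_1 multiplies is an assumption of this sketch.
    """
    per_view = sum(kl_divergence(p, q) for p, q in zip(p_views, q_views))
    return per_view + lambda_1 * kl_divergence(p_shared, q_shared)


Q1 = [[0.9, 0.1], [0.2, 0.8]]
P1 = [[0.99, 0.01], [0.05, 0.95]]
loss_same = clustering_loss([Q1], [Q1], Q1, Q1, lambda_1=0.5)
loss_diff = clustering_loss([P1], [Q1], P1, Q1, lambda_1=0.5)
```

The loss vanishes when every soft label distribution already matches its target and grows as they diverge.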
A total loss function that simultaneously takes into account the sample reconstruction loss $L_{loss}$, the clustering loss $L_C$ and the parameter regularization term $L_R$ is constructed as shown in formula (11). A stochastic gradient descent algorithm is used to optimize the total loss function $L$, yielding the optimal parameters and the soft label distribution $Q$ and target distribution $P$ shared by all views.
The total loss function of the deep clustering network is as follows:
$L = L_{loss} + L_C + L_R$ (11)
[Formula (12), defining $L_R$: equation image missing from the source text]
where $L$ denotes the total loss function of the deep clustering network, and $L_R$ denotes the parameter regularization term;
$\lambda_2$ denotes the consistency regularization term adjustment coefficient, and $\lambda_3$ denotes the diversity regularization term adjustment coefficient;
$Q_{i_1}$ denotes the soft label distribution of the $i_1$-th view ($i_1 = 1, \dots, N$), and $Q_{i_2}$ denotes the soft label distribution of the $i_2$-th view ($i_2 = 1, \dots, N$).
In the embodiment of the invention, the KL divergence is used in the loss function to measure the difference between two distributions. This greatly simplifies the loss function and aligns the cluster distribution parameters of each view with the consistency cluster distribution parameters while retaining the diversity information of the views.
After the algorithm converges, the final cluster assignment is generated from the globally optimized shared clustering soft label distribution $Q$ according to formula (13):
$s_j = \arg\max_{c} q_{jc}$ (13)
where $s_j$ denotes the clustering result of the $j$-th sample, $j = 1, 2, \dots, M$; $c = 1, 2, \dots, K$; and $q_{jc}$, the entry in the $j$-th row and $c$-th column of the shared clustering soft label distribution $Q$, denotes the probability that the $j$-th sample belongs to the $c$-th cluster.
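The final assignment step can be illustrated with a short sketch. It assumes that formula (13) selects, for each sample, the cluster with the highest probability in the shared soft label distribution; the function name `assign_clusters` and the 1-based cluster indices are choices of this sketch.

```python
from typing import List


def assign_clusters(q_shared: List[List[float]]) -> List[int]:
    """For each sample j, pick the cluster c with the largest q_jc (1-based index)."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in q_shared]


# Toy shared soft label distribution Q with M = 3 samples and K = 3 clusters.
Q = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
s = assign_clusters(Q)
```

Grouping the original texts by the values in `s` then gives the clusters of user demands.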
Correspondingly, S3 specifically comprises:
S31, inputting the text representation features of each view into the corresponding diversity encoder for convolution transformation to obtain the diversity deep encoded features of that single view; and inputting the text representation features of all views into the consistency encoder for convolution transformation to obtain consistency deep encoded features containing the information of all views;
S32, inputting the diversity deep encoded features and the consistency deep encoded features respectively into a clustering layer based on KL divergence to obtain the user demand clustering result.
An embodiment of the invention provides a user demand discovery system based on multi-view deep clustering, comprising:
a text acquisition module, configured to execute S1: acquiring a plurality of texts, each containing a single user requirement description, and vectorizing the texts;
a feature acquisition module, configured to execute S2: obtaining multi-view text representation features from the vectorized texts;
a result clustering module, configured to execute S3: inputting the text representation features of each view into the deep clustering network provided by the invention, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
An embodiment of the present invention provides a storage medium storing a computer program for user requirement discovery based on multi-view deep clustering, wherein the computer program enables a computer to execute the user requirement discovery method as described above.
An embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the user need discovery method as described above.
It can be understood that the user demand discovery system, storage medium and electronic device based on multi-view deep clustering provided in the embodiments of the present invention correspond to the user demand discovery method based on multi-view deep clustering provided in the embodiments of the present invention. For the explanation, examples and advantageous effects of the relevant content, reference may be made to the corresponding parts of the user demand discovery method, which are not repeated here.
In summary, compared with the prior art, the invention has the following advantageous effects:
1. By establishing a multi-view collaborative learning mechanism, the embodiment of the invention retains view diversity information while mining consistency information, makes full use of the complementary information and the underlying consistency among multi-view data, and improves the accuracy of the user demand clustering result, so that both the representative majority-class viewpoints and the novel minority-class viewpoints concerning user demands in user-generated content are effectively mined.
2. The embodiment of the invention uses a deep neural network to extract deep semantic features from user-generated text. This avoids the drawbacks of manual feature extraction, which is time-consuming, labor-intensive and highly subjective, and allows user-generated content to be clustered quickly, so that the representative demands of the majority and the novel demands of the minority are discovered in time, helping enterprises update their products and services. In addition, an end-to-end deep clustering network makes the extracted deep semantic features better suited to the clustering task. It avoids the disconnect between upstream and downstream tasks that arises when traditional feature engineering and the clustering algorithm are independent of each other, allows a globally optimal feature representation to be constructed for the specific clustering algorithm, and improves the clustering effect.
3. In the embodiment of the invention, the KL divergence is used in the loss function to measure the difference between two distributions. This greatly simplifies the loss function and aligns the cluster distribution parameters of each view with the consistency cluster distribution parameters while retaining the diversity information of the views.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A user demand discovery method based on multi-view deep clustering, characterized by comprising the following steps:
S1, acquiring a plurality of texts containing the description of a single user's requirements, and vectorizing the texts;
S2, obtaining multi-view text representation features from the vectorized texts;
and S3, inputting the text representation features of each view into a pre-trained deep clustering network, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
2. The user demand discovery method according to claim 1, wherein obtaining the multi-view text representation features in S2 comprises:
respectively constructing a text convolutional neural network and a bidirectional long short-term memory (BiLSTM) network based on a max pooling strategy and an average pooling strategy, and obtaining three-view text representation features that account for both local features and context features.
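The two pooling strategies named in claim 2 can be illustrated with a minimal sketch. It applies max pooling and average pooling over the position axis of a convolutional feature map. The feature-map values and the function names `max_pool_over_time` and `avg_pool_over_time` are hypothetical and do not come from the patent; the convolution and the BiLSTM themselves are omitted.

```python
from typing import List


def max_pool_over_time(feature_map: List[List[float]]) -> List[float]:
    """Keep the strongest activation of each channel across all positions."""
    return [max(channel) for channel in zip(*feature_map)]


def avg_pool_over_time(feature_map: List[List[float]]) -> List[float]:
    """Average each channel across all positions."""
    n_positions = len(feature_map)
    return [sum(channel) / n_positions for channel in zip(*feature_map)]


# Hypothetical feature map: 3 token positions x 2 convolution channels.
feature_map = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
max_view = max_pool_over_time(feature_map)  # [3.0, 4.0]
avg_view = avg_pool_over_time(feature_map)  # [2.0, 2.0]
```

Max pooling keeps only the most salient local feature per channel, while average pooling retains a summary of the whole text, which is one plausible reason the two strategies yield different views of the same input.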
3. The user demand discovery method according to claim 1 or 2, wherein the deep clustering network comprises an auto-encoder composed of an encoding layer and a decoding layer, and a clustering layer, wherein the encoding layer is composed of a plurality of diversity encoders and a consistency encoder;
wherein S3 comprises:
S31, inputting the text representation features of each view into the corresponding diversity encoder for convolution transformation, and obtaining the diversity depth coding features of each single view; inputting the text representation features of all views into the consistency encoder for convolution transformation, and obtaining consistency depth coding features containing the information of all views;
and S32, inputting the diversity depth coding features and the consistency depth coding features respectively into a KL-divergence-based clustering layer, and obtaining the user demand clustering result.
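The data flow of S31 can be sketched as follows. This is only an illustration under stated assumptions: the encoders are passed in as arbitrary callables (the toy stand-ins below are not the convolution transformations of the claim), and feeding "all views" to the consistency encoder is modeled as simple concatenation, which the claim does not specify. The name `encode_views` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def encode_views(
    view_features: Sequence[Vector],
    diversity_encoders: Sequence[Encoder],
    consistency_encoder: Encoder,
) -> Tuple[List[Vector], Vector]:
    """Return one diversity code per view plus one consistency code for all views."""
    # Each view passes through its own diversity encoder.
    diversity_codes = [enc(x) for enc, x in zip(diversity_encoders, view_features)]
    # Assumption: the consistency encoder sees the concatenation of all views.
    concatenated = [value for x in view_features for value in x]
    return diversity_codes, consistency_encoder(concatenated)


# Toy stand-in encoders (hypothetical): sum for diversity, mean for consistency.
views = [[1.0, 2.0], [3.0], [4.0, 5.0]]
sum_encoder = lambda v: [sum(v)]
mean_encoder = lambda v: [sum(v) / len(v)]

diversity_codes, consistency_code = encode_views(
    views, [sum_encoder] * 3, mean_encoder
)
```

In S32 each of these codes would then be passed to the clustering layer; that step is not shown here.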
4. The user demand discovery method according to claim 3, wherein, during the training phase, the loss function of the autoencoder is as follows:
Figure FDA0003639385940000021
wherein L_loss represents the sample reconstruction loss function;
N is the number of views, and M is the number of samples;
Figure FDA0003639385940000023 represents the text vector of the jth sample in the ith view;
Figure FDA0003639385940000024 represents the encoding function of the ith view, and θ represents the encoder parameters;
Figure FDA0003639385940000025 represents the decoding function of the ith view, and Ω represents the decoder parameters.
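The formula image for the reconstruction loss is not reproduced in this text. As a hedged illustration of the quantities defined above, the sketch below sums a squared Euclidean error between each text vector and its decoded encoding over all N views and M samples. The squared-error form, the absence of averaging, and the name `reconstruction_loss` are assumptions, not the patent's confirmed formula.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def reconstruction_loss(
    views: Sequence[Sequence[Vector]],
    encoders: Sequence[Callable[[Vector], Vector]],
    decoders: Sequence[Callable[[Vector], Vector]],
) -> float:
    """Sum of squared errors between x and decode(encode(x)).

    Assumption: squared Euclidean error, summed (not averaged)
    over all views i and all samples j.
    """
    total = 0.0
    for view, encode, decode in zip(views, encoders, decoders):
        for x in view:
            x_hat = decode(encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


# Toy example: N = 2 views, M = 2 samples per view.
views = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
identity = lambda v: list(v)
halve = lambda v: [0.5 * a for a in v]

perfect = reconstruction_loss(views, [identity, identity], [identity, identity])
lossy = reconstruction_loss(views, [halve, identity], [identity, identity])
```

With identity encoders and decoders the loss is zero; halving the first view's encoding makes the first view's reconstruction inexact and the loss positive.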
5. The user demand discovery method according to claim 4, wherein, during the training phase, the loss function of the clustering layer is as follows:
Figure FDA0003639385940000022
wherein L_C represents the clustering loss function;
λ_1 represents the clustering loss adjustment coefficient;
Q_i and P_i respectively represent the clustering soft label distribution and the target distribution corresponding to the diversity depth coding features of the ith view; Q and P respectively represent the clustering soft label distribution and the target distribution shared by all views.
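The description states that KL divergence measures the difference between the soft label distribution and the target distribution. The sketch below computes KL(P || Q) for M×K matrices and then combines the shared term with the per-view terms. The specific combination, with λ_1 weighting the sum of per-view terms, is an assumption made for illustration, since the formula image is not reproduced here; `kl_divergence` and `clustering_loss` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def kl_divergence(p: Matrix, q: Matrix, eps: float = 1e-12) -> float:
    """KL(P || Q) summed over all samples (rows) and clusters (columns)."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        for p_jc, q_jc in zip(p_row, q_row):
            if p_jc > 0.0:  # terms with p = 0 contribute nothing
                total += p_jc * math.log(p_jc / max(q_jc, eps))
    return total


def clustering_loss(
    p: Matrix, q: Matrix, per_view: Sequence[Tuple[Matrix, Matrix]], lam1: float
) -> float:
    """Assumed form: KL(P || Q) + lam1 * sum_i KL(P_i || Q_i)."""
    view_terms = sum(kl_divergence(p_i, q_i) for p_i, q_i in per_view)
    return kl_divergence(p, q) + lam1 * view_terms


# One sample, two clusters: a confident target against a uniform soft label.
P_shared = [[1.0, 0.0]]
Q_shared = [[0.5, 0.5]]
shared_term = kl_divergence(P_shared, Q_shared)  # equals ln 2
total = clustering_loss(P_shared, Q_shared, [(Q_shared, Q_shared)], lam1=0.1)
```

Because the single per-view pair in the example has identical distributions, its KL term is zero and the total equals the shared term.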
6. The user demand discovery method according to claim 5, wherein
the diversity depth coding features and the consistency depth coding features are denoted by Z_i, i = 1, 2, …, N, and Z, respectively;
the number of clusters K is specified, the diversity depth coding features Z_i, i = 1, 2, …, N, are each input into a K-means clustering algorithm to generate the centroids of K initial clusters, and the soft label distribution Q_i and the target distribution P_i of each view are calculated:
Figure FDA0003639385940000031
Figure FDA0003639385940000032
wherein
Figure FDA0003639385940000033
Figure FDA0003639385940000034
Figure FDA0003639385940000035 represents the encoded feature information of the jth sample in the ith view;
Figure FDA0003639385940000036 represents the centroid of the cth cluster obtained by K-means clustering of the M sample features of the ith view;
Figure FDA0003639385940000037 represents the probability that the jth sample in the ith view belongs to the cth cluster;
Figure FDA0003639385940000038 represents the reference probability that the jth sample in the ith view belongs to the cth cluster;
and/or inputting the consistency depth coding features Z into a K-means clustering algorithm to generate the centroids of K initial clusters, and calculating the soft label distribution Q and the target distribution P shared by all views:
Q = [q_{jc}]_{M×K}, j = 1, 2, …, M; c = 1, 2, …, K (7)
P = [p_{jc}]_{M×K}, j = 1, 2, …, M; c = 1, 2, …, K (8)
wherein
Figure FDA0003639385940000039
Figure FDA00036393859400000310
z_j represents the feature information of the jth sample in the consistency depth coding features;
μ_c represents the centroid of the cth cluster obtained by K-means clustering of the consistency depth coding features;
q_{jc} represents the probability that the jth sample belongs to the cth cluster;
p_{jc} represents the reference probability that the jth sample belongs to the cth cluster.
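The formula images for q_{jc} and p_{jc} are not reproduced in this text. A widely used choice in deep embedded clustering, shown here purely as an assumption, is a Student's t kernel (one degree of freedom) for the soft assignment and a squared, frequency-normalized sharpening for the target distribution. The sketch takes the centroids as given (in the claim they come from K-means); `soft_assignment` and `target_distribution` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def soft_assignment(z: Matrix, centroids: Matrix) -> Matrix:
    """q_jc proportional to 1 / (1 + ||z_j - mu_c||^2), normalized over clusters."""
    q = []
    for z_j in z:
        sims = [
            1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z_j, mu_c)))
            for mu_c in centroids
        ]
        norm = sum(sims)
        q.append([s / norm for s in sims])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """p_jc proportional to q_jc^2 / sum_j q_jc, normalized over clusters."""
    freq = [sum(column) for column in zip(*q)]  # soft cluster frequencies
    p = []
    for q_row in q:
        weights = [q_jc ** 2 / f_c for q_jc, f_c in zip(q_row, freq)]
        norm = sum(weights)
        p.append([w / norm for w in weights])
    return p


# Toy data: M = 3 samples, K = 2 clusters, 2-dimensional codes.
z = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
centroids = [[0.0, 0.0], [4.0, 0.0]]
Q = soft_assignment(z, centroids)
P = target_distribution(Q)
```

Under this assumed form the target distribution is a sharpened version of the soft labels, so confident assignments in Q become more confident in P; the same two functions would apply per view to Z_i and to the shared features Z.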
7. The user demand discovery method according to claim 5, wherein, during the training phase, the total loss function of the deep clustering network is as follows:
L = L_loss + L_C + L_R (11)
Figure FDA0003639385940000041
wherein L represents the total loss function of the deep clustering network, and L_R represents the parameter regularization term; λ_2 represents the consistency regularization term adjustment coefficient, and λ_3 represents the diversity regularization term adjustment coefficient;
Figure FDA0003639385940000042 denotes the soft label distribution of the i_1-th view, and
Figure FDA0003639385940000043 denotes the soft label distribution of the i_2-th view.
8. A user demand discovery system based on multi-view deep clustering, comprising:
a text acquisition module, configured to execute S1: acquiring a plurality of texts containing the description of a single user's requirements, and vectorizing the texts;
a feature obtaining module, configured to execute S2: obtaining multi-view text representation features from the vectorized texts;
and a result clustering module, configured to execute S3: inputting the text representation features of each view into the deep clustering network provided by the invention, and obtaining a user demand clustering result using a deep clustering algorithm in which multi-view consistency and diversity cooperate.
9. A storage medium storing a computer program for user demand discovery based on multi-view deep clustering, wherein the computer program causes a computer to execute the user demand discovery method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the user demand discovery method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210510779.6A CN115098672A (en) 2022-05-11 2022-05-11 User demand discovery method and system based on multi-view deep clustering

Publications (1)

Publication Number Publication Date
CN115098672A true CN115098672A (en) 2022-09-23

Family

ID=83287505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210510779.6A Pending CN115098672A (en) 2022-05-11 2022-05-11 User demand discovery method and system based on multi-view deep clustering

Country Status (1)

Country Link
CN (1) CN115098672A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778233A (en) * 2023-06-07 2023-09-19 中国人民解放军国防科技大学 Incomplete depth multi-view semi-supervised classification method based on graph neural network
CN116778233B (en) * 2023-06-07 2024-02-06 中国人民解放军国防科技大学 Incomplete depth multi-view semi-supervised classification method based on graph neural network
CN116935169A (en) * 2023-09-13 2023-10-24 腾讯科技(深圳)有限公司 Training method for draft graph model and draft graph method
CN116935169B (en) * 2023-09-13 2024-01-02 腾讯科技(深圳)有限公司 Training method for draft graph model and draft graph method

Similar Documents

Publication Publication Date Title
CN111581401B (en) Local citation recommendation system and method based on depth correlation matching
CN109325112B (en) A kind of across language sentiment analysis method and apparatus based on emoji
CN115098672A (en) User demand discovery method and system based on multi-view deep clustering
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN109857871B (en) User relationship discovery method based on social network mass contextual data
Lu et al. A method based on GA-CNN-LSTM for daily tourist flow prediction at scenic spots
CN107122455A (en) A kind of network user's enhancing method for expressing based on microblogging
CN112214604A (en) Training method of text classification model, text classification method, device and equipment
CN111581376B (en) Automatic knowledge graph construction system and method
CN108197294A (en) A kind of text automatic generation method based on deep learning
CN107357899B (en) Short text sentiment analysis method based on sum-product network depth automatic encoder
CN107577782B (en) Figure similarity depicting method based on heterogeneous data
CN112183085A (en) Machine reading understanding method and device, electronic equipment and computer storage medium
CN110781319B (en) Common semantic representation and search method and device for cross-media big data
CN113822776B (en) Course recommendation method, device, equipment and storage medium
CN109918477A (en) A kind of distributed search resources bank selection method based on variation self-encoding encoder
US20180198860A1 (en) Irc-infoid data standardization for use in a plurality of mobile applications
CN115758758A (en) Inverse synthesis prediction method, medium, and apparatus based on similarity feature constraint
CN114692605A (en) Keyword generation method and device fusing syntactic structure information
CN115952424A (en) Graph convolution neural network clustering method based on multi-view structure
Ponce et al. Forest conservation and renewable energy consumption: an ARDL approach
Liu et al. Construction of power fault knowledge graph based on deep learning
CN113239143B (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN117036833B (en) Video classification method, apparatus, device and computer readable storage medium
CN113343643A (en) Supervised-based multi-model coding mapping recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination