CN109117436A - Automatic synonym discovery method and system based on topic model - Google Patents

Automatic synonym discovery method and system based on topic model

Info

Publication number
CN109117436A
Authority
CN
China
Prior art keywords
topic, word, synonym, words, clustering
Prior art date
Legal status
Withdrawn
Application number
CN201710492902.5A
Other languages
Chinese (zh)
Inventor
曲德君
李进岭
曹大军
杨冠军
郁抒思
Current Assignee
Shanghai Xinfeifan E-Commerce Co Ltd
Original Assignee
Shanghai Xinfeifan E-Commerce Co Ltd
Priority date: 2017-06-26
Filing date: 2017-06-26
Publication date: 2019-01-01
Application filed by Shanghai Xinfeifan E-Commerce Co Ltd filed Critical Shanghai Xinfeifan E-Commerce Co Ltd
Priority to CN201710492902.5A
Publication of CN109117436A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/237: Lexical tools
    • G06F 40/247: Thesauruses; Synonyms

Abstract

The invention discloses an automatic synonym discovery method based on a topic model, comprising at least the following steps: importing the data in which synonyms are to be found; performing word segmentation on the imported data according to the information in a database; constructing a topic model and performing topic-model clustering; performing minimum-correlation clustering on the topic clusters; and outputting the synonyms. The method requires neither prior knowledge nor manual labeling, realizes automatic clustering of synonyms, and improves the efficiency of synonym discovery. It also alleviates the problem of semantic similarity to a certain extent, and no manual intervention is needed during implementation except for the final screening, so the efficiency of automatic synonym discovery is greatly improved.

Description

Automatic synonym discovery method and system based on topic model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for automatically discovering synonyms based on a topic model.
Background
With the development of the information age, the scale of web text data keeps growing, so natural language processing is becoming ever more important; new words appear constantly, and the importance of automatic semantic analysis techniques, such as automatic synonym discovery, grows by the day. Existing mainstream automatic synonym discovery algorithms require prior knowledge to construct reference text patterns for synonym discovery, which limits their efficiency; in another reference-text pattern-matching approach, the parts of speech and semantics of known words must be manually labeled in advance in order to construct the reference text patterns.
Referring to fig. 1, in existing systems synonym discovery must be assisted by manual screening; because automatic synonym discovery methods have a certain error rate, existing synonym discovery methods are all inefficient.
The patent application CN201410156107.5 claims a synonym determination method, a synonym search method, and a synonym server; however, as far as the technical solutions in that application's documents can be understood, the solutions it gives cannot improve the efficiency of synonym discovery.
Disclosure of Invention
The invention aims to provide an automatic synonym discovery method based on a topic model: a topic model is constructed by analyzing the co-occurrence probability of words, words of the same topic are gathered into the same cluster by Gibbs sampling, and the words of each cluster are further clustered by an iterative minimum-correlation method to obtain candidate synonym groups.
The technical scheme adopted by the invention for solving the technical problems is as follows:
An automatic synonym discovery method based on a topic model at least comprises the following steps:
importing data of synonyms to be found;
performing word segmentation processing on the imported data according to the information of the database;
constructing a topic model and performing topic-model clustering;
performing minimum correlation clustering on the topic clusters;
outputting the synonyms.
Wherein, a step of manually screening synonyms is further included after the step of outputting synonyms.
Wherein the topic model may be a latent Dirichlet allocation (LDA) model, and the clustering step at least comprises:
sampling from the Dirichlet distribution Dir(α) to generate the topic distribution θ_i of document i, where α is a user-preset Dirichlet parameter describing how evenly topics are distributed over documents, and θ_i is one sample of Dir(α);
sampling from the topic distribution θ_i to generate the topic z_{i,j} of the j-th word of document i;
sampling from the Dirichlet distribution Dir(β) (β being another Dirichlet parameter) to generate the word distribution φ_{z_{i,j}} of topic z_{i,j};
sampling from the word distribution φ_{z_{i,j}} to finally generate the word w_{i,j}.
Wherein topic-model clustering proceeds under the precondition that the topics of all other words are fixed; the posterior probability P that a word z_i belongs to topic j is

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

where W is the total number of words, T is the total number of latent topics, α and β are the user-set parameters above, $n_{-i,j}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(d_i)}$ is the number of words in document $d_i$ assigned to topic j when $z_i$ is excluded, and $n_{-i,\cdot}^{(d_i)}$ is the total number of other words in document $d_i$.
Wherein the topic clustering is Gibbs-sampling topic clustering, comprising at least the following steps:
A. randomly assign each word in the document set to a topic;
B. for each word of the document set, tentatively assign it to each topic in turn, compute the probability P that the word belongs to that topic, and finally assign the word to the topic with the highest P;
C. iteratively execute step B until the probability change in each iteration is below a user-given threshold.
When minimum-correlation clustering is performed on the topic clusters, the co-occurrence of words in the document set is measured by the Pearson correlation coefficient. For a word $w_i$ belonging to topic T, let $r_{i,k}$ be the number of occurrences of the word in document $d_k$, and construct a vector $\vec{r}_i$ whose length equals the number of documents in the document set and whose k-th entry is $r_{i,k}$. Then, for each pair of words in a topic, the Pearson correlation coefficient ρ between $\vec{r}_i$ and $\vec{r}_j$ is

$$\rho_{i,j} = \frac{(\vec{r}_i - \bar{r}_i) \cdot (\vec{r}_j - \bar{r}_j)}{\lVert \vec{r}_i - \bar{r}_i \rVert \, \lVert \vec{r}_j - \bar{r}_j \rVert}$$

wherein $\bar{r}_i$ is the mean of the entries of $\vec{r}_i$, and $\rho_{i,j}$ is the cosine of the angle between the two centered vectors $\vec{r}_i - \bar{r}_i$ and $\vec{r}_j - \bar{r}_j$.
Wherein the minimum-correlation clustering at least comprises:
A1. randomly assign each word in topic T to a cluster;
B1. for each word, compute the Pearson correlation coefficient between the word's vector and the average vector of each cluster (excluding the word itself), and select the cluster with the lowest Pearson correlation coefficient as the cluster to which the word belongs;
C1. iteratively execute step B1 until the change in the correlation coefficient in each iteration is below the threshold.
An automatic synonym discovery system based on a topic model, which comprises a database storing natural language processing information, and is characterized by at least comprising:
the data import module is used for importing data of synonyms to be found;
the word segmentation processing module is used for carrying out word segmentation processing on the imported data according to the information of the database;
the topic model clustering module is used for constructing a topic model and clustering the topic model;
the minimum correlation clustering module is used for performing minimum correlation clustering on the theme clusters;
and the synonym output module is used for outputting synonym data.
The invention has the following beneficial effects:
the invention discloses a synonym automatic discovery method based on a topic model. And constructing a topic model by analyzing the mutual occurrence probability of the words, and gathering the words expressing the same topic. Then, the topics are further clustered into alternative synonym groups by a minimal correlation clustering method. According to the method, prior knowledge and manual labeling are not needed, automatic clustering of synonyms is achieved, and the efficiency of synonym discovery is improved; the problem of semantic similarity is solved to a certain extent, and manual intervention is not needed except for final screening in the implementation process, so that the efficiency of automatic synonym discovery is greatly improved.
Drawings
FIG. 1 is a flow diagram of a method of a synonym discovery system in the prior art;
FIG. 2 is a flowchart illustrating a method for automatically discovering synonyms based on a topic model according to the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to the following embodiments and the accompanying drawings.
The invention provides an automatic synonym discovery method based on a topic model, which at least comprises the following steps:
importing data of synonyms to be found;
performing word segmentation processing on the imported data according to the information of the database;
constructing a topic model and performing topic-model clustering;
performing minimum correlation clustering on the topic clusters;
outputting the synonyms.
In the method of the present invention, a step of manually screening synonyms follows the step of outputting synonyms; manual screening can be implemented with existing techniques and is therefore not detailed in this embodiment.
In the present invention, the topic model may be a latent Dirichlet allocation (LDA) model, and the clustering step at least comprises:
sampling from the Dirichlet distribution Dir(α) to generate the topic distribution θ_i of document i, where α is a user-preset Dirichlet parameter describing how evenly topics are distributed over documents, and θ_i is one sample of Dir(α);
sampling from the topic distribution θ_i to generate the topic z_{i,j} of the j-th word of document i;
sampling from the Dirichlet distribution Dir(β) (β being another Dirichlet parameter) to generate the word distribution φ_{z_{i,j}} of topic z_{i,j};
sampling from the word distribution φ_{z_{i,j}} to finally generate the word w_{i,j}.
Topic-model clustering proceeds under the precondition that the topics of all other words are fixed; the posterior probability P that the topic z_i of a certain word equals topic j is

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

where W is the total number of words, T is the total number of latent topics, α and β are the user-set parameters above, $n_{-i,j}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(d_i)}$ is the number of words in document $d_i$ assigned to topic j when $z_i$ is excluded, and $n_{-i,\cdot}^{(d_i)}$ is the total number of other words in document $d_i$.
The topic clustering is Gibbs-sampling topic clustering, comprising at least the following steps:
A. randomly assign each word in the document set to a topic;
B. for each word of the document set, tentatively assign it to each topic in turn, compute the probability P that the word belongs to that topic, and finally assign the word to the topic with the highest P;
C. iteratively execute step B until the probability change in each iteration is below a user-given threshold.
In addition, when minimum-correlation clustering is performed on the topic clusters, the co-occurrence of words in the document set is measured by the Pearson correlation coefficient. For a word $w_i$ belonging to topic T, let $r_{i,k}$ be the number of occurrences of the word in document $d_k$, and construct a vector $\vec{r}_i$ whose length equals the number of documents in the document set and whose k-th entry is $r_{i,k}$. Then, for each pair of words in a topic, the Pearson correlation coefficient ρ between $\vec{r}_i$ and $\vec{r}_j$ is

$$\rho_{i,j} = \frac{(\vec{r}_i - \bar{r}_i) \cdot (\vec{r}_j - \bar{r}_j)}{\lVert \vec{r}_i - \bar{r}_i \rVert \, \lVert \vec{r}_j - \bar{r}_j \rVert}$$

wherein $\bar{r}_i$ is the mean of the entries of $\vec{r}_i$, and $\rho_{i,j}$ is the cosine of the angle between the two centered vectors $\vec{r}_i - \bar{r}_i$ and $\vec{r}_j - \bar{r}_j$.
The minimum-correlation clustering comprises at least the following steps:
A1. randomly assign each word in topic T to a cluster;
B1. for each word, compute the Pearson correlation coefficient between the word's vector and the average vector of each cluster (excluding the word itself), and select the cluster with the lowest Pearson correlation coefficient as the cluster to which the word belongs;
C1. iteratively execute step B1 until the change in the correlation coefficient in each iteration is below the threshold.
In addition, the present invention further provides a system for automatically discovering synonyms based on a topic model using the above method; the system comprises a database storing natural language processing information and, referring to fig. 2, at least comprises:
the data import module is used for importing data of synonyms to be found;
the word segmentation processing module is used for carrying out word segmentation processing on the imported data according to the information of the database;
the topic model clustering module is used for constructing a topic model and clustering the topic model;
the minimum correlation clustering module is used for performing minimum correlation clustering on the theme clusters;
and the synonym output module is used for outputting synonym data.
The method can be applied to the analysis of massive unlabeled text material, and is particularly suitable for a cold-start corpus lacking known synonym data.
As further shown in fig. 2, the processing flow of the system of the present invention comprises data import, word segmentation, topic-model clustering, minimum-correlation clustering, synonym output, and an optional manual screening module.

Data import means that a certain amount of text is imported into the system as basic data; a text may be a news article, an advertisement, an academic paper, and so on, and is typically part of a collection of documents relevant to a particular business of an enterprise.

Word segmentation means that, because Chinese text has no separators between words, the imported text set must be segmented into words before further processing and semantic analysis. Word segmentation can be implemented with existing techniques and is therefore not detailed here (a minimal sketch appears at the end of this overview).

In addition, manual labeling is adopted in the embodiment of the invention. Manual labeling is generally divided into part-of-speech labeling and semantic labeling of the segmented result: part-of-speech labeling marks whether a word is a verb, a noun, a conjunction, and so on; semantic labeling further classifies words into preset categories, for example "husky" belongs to the animal category and "pen" belongs to the office-supply category. The occurrence patterns of words are then analyzed and frequent semantic sequences, i.e., templates, are found. Words that appear at the same position in the same semantic sequence may be synonyms; for example, "today's English exam" maps to the semantic sequence "time-course-verb", so the "exam" of "tomorrow's math exam", which occupies the same position in the same semantic sequence, may be a synonym of the "exam" of "today's English exam". Template generation can be implemented entirely with existing techniques and is not described further.

Synonym output means that, for a keyword input by the user, the system searches the template library for a template that fits the keyword, uses that template to find candidate synonyms of the keyword in the input text set, and finally determines the synonym relationships through manual screening. In the embodiment of the present invention, the data import, word segmentation, manual labeling, template generation, and synonym output of fig. 2 can all be implemented with the same techniques as in fig. 1.
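The patent names no specific segmentation tool; as a purely illustrative sketch, the open-source jieba library and the sample sentences below are assumptions.

```python
# A minimal word-segmentation sketch. The jieba library and the sample
# sentences are illustrative assumptions; the patent names no tool.
import jieba

documents = [
    "今天的英语考试",   # "today's English exam"
    "明天的数学考试",   # "tomorrow's math exam"
]

# Segment each document into a list of words.
segmented = [jieba.lcut(doc) for doc in documents]
print(segmented)   # e.g. [['今天', '的', '英语', '考试'], ...]
```

Any other segmenter with a comparable tokenize-to-list interface would serve the same role in the pipeline.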
In this embodiment, topic-model clustering is performed as follows.

The generation frequency of each word in a natural language text is studied according to the latent Dirichlet allocation (LDA) model. The model posits a latent set of topics: semantically, each document in the document set is an expression of one or more topics in that set, and each word in a document can be attributed to some topic. According to the LDA model, a document is generated as follows (a minimal sketch of this generative process follows the list):
a) sample from the Dirichlet distribution Dir(α) to generate the topic distribution θ_i of document i, where α is a Dirichlet parameter describing how evenly topics are distributed over documents and is generally preset by the user; θ_i is one sample of Dir(α);
b) sample from the topic distribution θ_i to generate the topic z_{i,j} of the j-th word of document i;
c) sample from the Dirichlet distribution Dir(β) (β being another Dirichlet parameter) to generate the word distribution φ_{z_{i,j}} of topic z_{i,j};
d) sample from the word distribution φ_{z_{i,j}} to finally generate the word w_{i,j}.
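A minimal sketch of this generative process, assuming illustrative values for the vocabulary size, topic count, and document lengths (none of which are given in the patent):

```python
# LDA generative process, steps a)-d) above, sketched with numpy.
import numpy as np

rng = np.random.default_rng(0)
W, T, n_docs, doc_len = 1000, 20, 5, 50  # assumed sizes, not from the patent
alpha, beta = 0.1, 0.01                  # user-preset Dirichlet parameters

# c) word distribution of every topic: phi_t ~ Dir(beta)
phi = rng.dirichlet(np.full(W, beta), size=T)

docs = []
for i in range(n_docs):
    theta_i = rng.dirichlet(np.full(T, alpha))      # a) theta_i ~ Dir(alpha)
    z_i = rng.choice(T, size=doc_len, p=theta_i)    # b) topic z_{i,j} ~ theta_i
    words = [rng.choice(W, p=phi[t]) for t in z_i]  # d) w_{i,j} ~ phi_{z_{i,j}}
    docs.append(words)
```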
In fact, because the distribution of the latent topics is unknown, fitting is generally performed in reverse: the topic distribution is inferred from the distribution of words in the document set so that the inferred topic distribution matches the actual word distribution as closely as possible. Common fitting methods include the maximum-likelihood method and the Gibbs sampling method, both of which have computational complexity O(N × T × i), i.e., they are on the same level. Under the precondition that the topics of all other words are fixed, the posterior probability P that a certain word z_i belongs to topic j is computed as (formula 1):

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

In formula 1, W is the total number of words, T is the total number of latent topics, and α and β are the user-set parameters above; $n_{-i,j}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(d_i)}$ is the number of words in document $d_i$ assigned to topic j when $z_i$ is excluded, and $n_{-i,\cdot}^{(d_i)}$ is the total number of other words in document $d_i$.
Therefore, according to formula 1, Gibbs-sampling topic clustering is implemented as follows (a minimal sampler sketch follows the list):
a) randomly assign each word in the document set to a topic;
b) according to formula 1, for each word of the document set, compute the probability P of the word belonging to each topic in turn, and finally assign the word to the topic with the highest P;
c) iteratively execute step b) until the probability change in each iteration is below a user-given threshold.
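A minimal sketch of steps a)-c), reusing docs, W, T, alpha, and beta from the previous sketch; the fixed iteration count stands in for the patent's convergence threshold and is an assumption:

```python
# Collapsed Gibbs topic clustering per formula 1. Per step b) of the
# patent, each word is reassigned to its highest-probability topic.
import numpy as np

def gibbs_topic_clustering(docs, W, T, alpha, beta, iters=50):
    n_wt = np.zeros((W, T))          # word-topic counts  n^{(w)}_j
    n_dt = np.zeros((len(docs), T))  # doc-topic counts   n^{(d)}_j
    n_t = np.zeros(T)                # words per topic    n^{(.)}_j
    z = []                           # current topic assignment of each word
    for d, doc in enumerate(docs):   # step a): random initial assignment
        z_d = np.random.randint(T, size=len(doc))
        z.append(z_d)
        for w, t in zip(doc, z_d):
            n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1
    for _ in range(iters):           # step c): iterate step b)
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                t_old = z[d][j]      # exclude z_i from all counts
                n_wt[w, t_old] -= 1; n_dt[d, t_old] -= 1; n_t[t_old] -= 1
                p = ((n_wt[w] + beta) / (n_t + W * beta)
                     * (n_dt[d] + alpha) / (n_dt[d].sum() + T * alpha))
                t_new = int(np.argmax(p))  # step b): highest-P topic
                z[d][j] = t_new
                n_wt[w, t_new] += 1; n_dt[d, t_new] += 1; n_t[t_new] += 1
    return z
```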
In this embodiment, the input of topic-model clustering is the word-segmented document set, and the output is the topic cluster to which each word belongs. Minimum-correlation clustering is further explained below:
1. Words belonging to the same topic are not necessarily synonyms. For example, in news about a disaster, the words "landslide" and "earthquake" describe the same event and belong to the same topic, but they are not synonyms because they belong to different semantic groups. Therefore, the words within a topic need to be clustered further into groups, such that the words in each group are more likely to be synonyms.
Judging from the context-pattern content of existing solutions, synonyms typically appear at the same pattern position in different documents; that is, synonyms typically do not appear in the same document. Therefore, the words within a topic can be further clustered by their co-occurrence: a group of words that rarely co-occur with one another is more likely to be a group of synonyms.
2. During minimum-correlation clustering, the co-occurrence of words in the document set can be measured by the Pearson correlation coefficient. For a word $w_i$ belonging to topic T, let $r_{i,k}$ be the number of occurrences of this word in document $d_k$. Thus, for each word, a vector $\vec{r}_i$ can be constructed whose length equals the number of documents in the document set and whose k-th entry is $r_{i,k}$. Then, for each pair of words in a topic, the Pearson correlation coefficient ρ between $\vec{r}_i$ and $\vec{r}_j$ is (formula 2):

$$\rho_{i,j} = \frac{(\vec{r}_i - \bar{r}_i) \cdot (\vec{r}_j - \bar{r}_j)}{\lVert \vec{r}_i - \bar{r}_i \rVert \, \lVert \vec{r}_j - \bar{r}_j \rVert}$$

In formula 2, $\bar{r}_i$ denotes the mean of the entries of $\vec{r}_i$, and $\rho_{i,j}$ is the cosine of the angle between the two centered vectors $\vec{r}_i - \bar{r}_i$ and $\vec{r}_j - \bar{r}_j$. The larger the angle, the smaller ρ and the fewer co-occurrences between $w_i$ and $w_j$; within the same topic, the less two words co-occur, the more likely they are synonyms. A short sketch of formula 2 follows.
Words within the same topic are then further clustered by the minimum-correlation clustering method to generate candidate synonym groups. Minimum-correlation clustering means generating a batch of clusters such that the pairwise Pearson correlation coefficients between the words within each cluster are minimal.
The minimum-correlation clustering is implemented as follows (a minimal sketch follows the list):
a) randomly assign each word in topic T to a cluster;
b) for each word, compute the Pearson correlation coefficient between the word's vector and the average vector of each cluster (excluding the word itself), and select the cluster with the lowest Pearson correlation coefficient as the cluster to which the word belongs;
c) iteratively execute step b) until the change in the correlation coefficient in each iteration is below the threshold.
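A minimal sketch of steps a)-c), reusing the centering of the pearson sketch; the cluster count, threshold, and iteration cap are illustrative assumptions:

```python
# Iterative minimum-correlation clustering of the words of one topic.
import numpy as np

def min_correlation_clustering(counts, n_clusters=5, tol=1e-4, max_iters=100):
    n_words = counts.shape[0]
    labels = np.random.randint(n_clusters, size=n_words)  # step a)
    prev_total = np.inf
    for _ in range(max_iters):
        total = 0.0
        for i in range(n_words):                          # step b)
            rhos = np.full(n_clusters, np.inf)
            for c in range(n_clusters):
                members = [k for k in range(n_words)
                           if labels[k] == c and k != i]  # exclude the word
                if not members:
                    continue
                avg = counts[members].mean(axis=0)        # cluster average vector
                r_i = counts[i] - counts[i].mean()
                r_c = avg - avg.mean()
                denom = np.linalg.norm(r_i) * np.linalg.norm(r_c)
                rhos[c] = r_i @ r_c / denom if denom else 0.0
            labels[i] = int(np.argmin(rhos))              # lowest correlation wins
            total += rhos[labels[i]]
        if abs(prev_total - total) < tol:                 # step c): convergence
            break
        prev_total = total
    return labels
```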
In the invention, the input of minimum-correlation clustering is the topic cluster to which each word belongs, and the output is the candidate synonym groups that further subdivide each topic, where the words in each group belong to the same semantic category and are synonyms of one another. An end-to-end sketch chaining the pieces above follows.
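A minimal end-to-end sketch; all names (docs, W, T, alpha, beta, gibbs_topic_clustering, min_correlation_clustering) come from the sketches above and are illustrative:

```python
# From segmented documents to candidate synonym groups per topic.
import numpy as np

z = gibbs_topic_clustering(docs, W, T, alpha, beta)  # topic of every word

for t in range(T):
    # distinct words assigned to topic t
    words_t = sorted({w for d, doc in enumerate(docs)
                      for w, zt in zip(doc, z[d]) if zt == t})
    if len(words_t) < 2:
        continue
    # occurrence matrix: counts[i, k] = r_{i,k} for word i in document k
    index = {w: i for i, w in enumerate(words_t)}
    counts = np.zeros((len(words_t), len(docs)))
    for k, doc in enumerate(docs):
        for w in doc:
            if w in index:
                counts[index[w], k] += 1
    groups = min_correlation_clustering(counts)      # candidate synonym groups
    print(f"topic {t}:", groups)
```

The resulting groups are candidates only; per the method, a final manual screening step confirms the synonym relationships.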
In summary, the method of the present invention constructs a topic model by analyzing the co-occurrence probability of words and gathers together the words that express the same topic; the topics are then further clustered into candidate synonym groups by the minimum-correlation clustering method. The invention alleviates the problem of semantic similarity to a certain extent, and no manual intervention is needed except for the final screening, so the efficiency of automatic synonym discovery is greatly improved.
The sequence of the above embodiments is only for convenience of description and does not imply any ranking of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. An automatic synonym discovery method based on a topic model, characterized by at least comprising the following steps:
importing data of synonyms to be found;
performing word segmentation processing on the imported data according to the information of the database;
constructing a topic model and performing topic-model clustering;
performing minimum correlation clustering on the topic clusters;
outputting the synonyms.
2. The method according to claim 1, further comprising a step of manually screening synonyms after the step of outputting synonyms.
3. The method according to claim 1, wherein the topic model is a latent Dirichlet allocation model, and the clustering step at least comprises:
sampling from the Dirichlet distribution Dir(α) to generate the topic distribution θ_i of document i, where α is a user-preset Dirichlet parameter describing how evenly topics are distributed over documents, and θ_i is one sample of Dir(α);
sampling from the topic distribution θ_i to generate the topic z_{i,j} of the j-th word of document i;
sampling from the Dirichlet distribution Dir(β) (β being another Dirichlet parameter) to generate the word distribution φ_{z_{i,j}} of topic z_{i,j};
sampling from the word distribution φ_{z_{i,j}} to finally generate the word w_{i,j}.
4. The method according to claim 1, wherein topic-model clustering proceeds under the precondition that the topics of all other words are fixed, and the posterior probability P that a word z_i belongs to topic j is

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

wherein W is the total number of words, T is the total number of latent topics, α and β are the user-set parameters above, $n_{-i,j}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic j when $z_i$ is excluded, $n_{-i,j}^{(d_i)}$ is the number of words in document $d_i$ assigned to topic j when $z_i$ is excluded, and $n_{-i,\cdot}^{(d_i)}$ is the total number of other words in document $d_i$.
5. The method according to claim 4, wherein the topic clustering is Gibbs-sampling topic clustering, comprising at least the following steps:
A. randomly assign each word in the document set to a topic;
B. for each word of the document set, tentatively assign it to each topic in turn, compute the probability P that the word belongs to that topic, and finally assign the word to the topic with the highest P;
C. iteratively execute step B until the probability change in each iteration is below a user-given threshold.
6. The method according to claim 1, wherein, when minimum-correlation clustering is performed on the topic clusters, the co-occurrence of words in the document set is measured by the Pearson correlation coefficient: for a word $w_i$ belonging to topic T, let $r_{i,k}$ be the number of occurrences of the word in document $d_k$, and construct a vector $\vec{r}_i$ whose length equals the number of documents in the document set and whose k-th entry is $r_{i,k}$; then, for each pair of words in a topic, the Pearson correlation coefficient ρ between $\vec{r}_i$ and $\vec{r}_j$ is

$$\rho_{i,j} = \frac{(\vec{r}_i - \bar{r}_i) \cdot (\vec{r}_j - \bar{r}_j)}{\lVert \vec{r}_i - \bar{r}_i \rVert \, \lVert \vec{r}_j - \bar{r}_j \rVert}$$

wherein $\bar{r}_i$ is the mean of the entries of $\vec{r}_i$, and $\rho_{i,j}$ is the cosine of the angle between the two centered vectors $\vec{r}_i - \bar{r}_i$ and $\vec{r}_j - \bar{r}_j$.
7. The method according to claim 6, wherein the minimum-correlation clustering at least comprises:
A1. randomly assign each word in topic T to a cluster;
B1. for each word, compute the Pearson correlation coefficient between the word's vector and the average vector of each cluster (excluding the word itself), and select the cluster with the lowest Pearson correlation coefficient as the cluster to which the word belongs;
C1. iteratively execute step B1 until the change in the correlation coefficient in each iteration is below the threshold.
8. A topic model-based automatic synonym discovery system using the method of claim 1, comprising a database storing natural language processing information, and further comprising at least:
the data import module is used for importing data of synonyms to be found;
the word segmentation processing module is used for carrying out word segmentation processing on the imported data according to the information of the database;
the topic model clustering module is used for constructing a topic model and clustering the topic model;
the minimum correlation clustering module is used for performing minimum correlation clustering on the theme clusters;
and the synonym output module is used for outputting synonym data.
CN201710492902.5A 2017-06-26 2017-06-26 Synonym automatic discovering method and its system based on topic model Withdrawn CN109117436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710492902.5A CN109117436A (en) 2017-06-26 2017-06-26 Synonym automatic discovering method and its system based on topic model

Publications (1)

Publication Number Publication Date
CN109117436A true CN109117436A (en) 2019-01-01

Family

ID=64733933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710492902.5A Withdrawn CN109117436A (en) 2017-06-26 2017-06-26 Synonym automatic discovering method and its system based on topic model

Country Status (1)

Country Link
CN (1) CN109117436A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991168A (en) * 2019-12-05 2020-04-10 京东方科技集团股份有限公司 Synonym mining method, synonym mining device, and storage medium
WO2021109787A1 (en) * 2019-12-05 2021-06-10 京东方科技集团股份有限公司 Synonym mining method, synonym dictionary application method, medical synonym mining method, medical synonym dictionary application method, synonym mining apparatus and storage medium
US11977838B2 (en) 2019-12-05 2024-05-07 Boe Technology Group Co., Ltd. Synonym mining method, application method of synonym dictionary, medical synonym mining method, application method of medical synonym dictionary, synonym mining device and storage medium
CN110991168B (en) * 2019-12-05 2024-05-17 京东方科技集团股份有限公司 Synonym mining method, synonym mining device, and storage medium
CN111898366A (en) * 2020-07-29 2020-11-06 平安科技(深圳)有限公司 Document subject word aggregation method and device, computer equipment and readable storage medium
CN111898366B (en) * 2020-07-29 2022-08-09 平安科技(深圳)有限公司 Document subject word aggregation method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190101