CN110083828A - Text clustering method and device - Google Patents
- Publication number
- CN110083828A (application number CN201910250896.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- clustered
- feature words
- vector
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention relates to a text clustering method and device, which solve the long clustering time, low efficiency, and poor results of existing text clustering. The text clustering method of the present invention comprises the following steps: collect data to build a text library; extract all feature words in the text library; from the frequency with which each feature word occurs among all feature words in the library, compute the weight of each feature word; and save the feature words and their weights to a database. Then collect each text to be clustered and extract its feature words. From the feature words in each text to be clustered and their weights in the database, obtain the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered. Finally, cluster the texts to be clustered using this feature vector. The method of the present invention effectively shortens clustering time, improves clustering efficiency, and achieves a good clustering result.
Description
Technical field
The present invention relates to the field of intelligent analysis of natural-language text, and in particular to a text clustering method and device.
Background art

Text clustering is an application in the field of intelligent natural-language text analysis: it groups similar texts together based on the similarity between them, helping users analyze and process text data of the same category.

Current text clustering methods mainly comprise supervised learning and unsupervised learning. Supervised methods require the category of every text in the training set to be known in advance; a model is built to capture the relationship between training texts and their categories, and texts of unknown category are then classified. The drawback of this approach is that a text belonging to none of the predefined categories cannot be assigned one.

If, on the other hand, no labeled text data exists, problems such as text classification and sentiment analysis can only fall back on traditional unsupervised methods. These largely compute sentence vectors from word vectors and then cluster by sentence similarity, yielding labeled sets of text data as the clustering result. However, existing clustering methods must recount the term frequencies of the feature words in the texts to be clustered every time in order to derive their weights; when the texts to be clustered are large in scale, this lengthens clustering time and lowers efficiency. Moreover, in existing weighting schemes, high-frequency feature words receive high relative weights, so the influence of the feature words other than the dominant ones on the whole text to be clustered is not adequately considered, and the clustering result is relatively poor.
Summary of the invention
In view of the above analysis, the present invention aims to provide a text clustering method and device that solve the long clustering time, low efficiency, and poor results of existing text clustering.

The objective of the present invention is mainly achieved through the following technical solutions:

In one aspect, a text clustering method is provided, comprising the following steps:

collecting data to build a text library, extracting all feature words in the text library, computing the weight of each feature word from the frequency with which it occurs among all feature words in the library, and saving the feature words and their weights to a database;

collecting each text to be clustered and extracting the feature words in each text to be clustered;

from the feature words in each text to be clustered and their weights in the database, obtaining the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;

clustering the texts to be clustered using their feature vector.
On the basis of the above scheme, the present invention also provides the following improvements:

Further, computing the weight of each feature word from the frequency with which it occurs among all feature words in the text library specifically comprises the following operations:

if the frequency of a feature word is below a frequency threshold, the feature word is discarded;

the reciprocal of each remaining feature word's frequency is taken as that feature word's weight.
Further, after the text library or a text to be clustered is obtained, its data is segmented into words and stop words are removed, yielding all feature words in the text library or the text to be clustered.
Further, obtaining the word vector of each feature word from the feature words in each text to be clustered and their weights in the database specifically comprises the following operation:

a word2vec model is trained on the feature words in the texts to be clustered, and the trained word2vec model is used to obtain the word vector of each feature word. The word vector of each feature word is denoted v_{1×D}, where D is the dimensionality of the word-vector space.
Further, the sentence vector of each text to be clustered is obtained by the following operation: from the feature words contained in each text to be clustered, its sentence vector is computed, where the sentence vector V_s of the s-th text to be clustered is expressed as

V_s = (1/N_s) · Σ_{i=1}^{N_s} w_{s,i} · v_{s,i}    (1)

where N_s is the number of word vectors contained in the s-th sentence to be clustered; v_{s,i} is the i-th word vector of the s-th sentence to be clustered; and w_{s,i}, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
Further, the feature vector of all texts to be clustered is obtained in the following manner: from the sentence vector of each sentence to be clustered, the feature vector S_{N×D} of the texts to be clustered is constructed as

S_{N×D} = [V_1, V_2, ..., V_N]^T    (2)

where N is the number of all sentences to be clustered and D is the dimensionality of the sentence vectors, equal to the dimensionality of the word vectors.
Further, clustering the texts to be clustered using their feature vector comprises the following operations:

singular value decomposition is applied to the feature vector S_{N×D} of the texts to be clustered, yielding the smoothed sentence-vector matrix S'_{N×D} of the whole text;

the texts to be clustered are then clustered by applying a clustering algorithm to the smoothed matrix S'_{N×D}.
Further, a hierarchical clustering algorithm realizes the clustering of the texts to be clustered:

each vector in the matrix S'_{N×D} is treated as an individual cluster;

the cosine distance between different clusters is computed, and sentence vectors whose cosine distance is below a given threshold are merged into one cluster; this step is repeated until all vectors of the texts to be clustered are classified.
In another aspect, a text clustering device corresponding to the above text clustering method is provided, the device comprising:

a term-weight computation module, which collects data to form a text library, extracts all feature words in the library, computes the weight of each feature word from the frequency with which it occurs among all feature words in the library, and saves the feature words and their weights to a database;

a feature-word acquisition module, which collects each text to be clustered and extracts the feature words in it;

a feature-vector acquisition module, which, from the feature words in each text to be clustered and their weights in the database, obtains the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;

a text clustering module, which clusters the texts to be clustered using their feature vector.

Further, computing the weight of each feature word from its frequency among all feature words in the library specifically comprises the following operations: if the frequency of a feature word is below a frequency threshold, the feature word is discarded; the reciprocal of each remaining feature word's frequency is taken as that feature word's weight.
Beneficial effects of the present invention: by collecting a large amount of varied network data in advance, the invention obtains a large vocabulary of feature words whose weights effectively characterize the probability of each word appearing in an ordinary sentence. Using these weights directly as the weights of the corresponding word vectors in the texts to be clustered effectively shortens computation time, and the larger the scale of the texts to be clustered, the more pronounced the saving. Meanwhile, under the weighting scheme of the present method, the higher a word's frequency of occurrence, the smaller its weight, so that when the sentence vectors of the texts to be clustered are computed, the weight of the dominant feature words is reduced and the influence of the other feature words on the whole text to be clustered is fully considered, effectively improving the clustering result.

In the present invention, the above technical solutions may also be combined with one another to realize further preferred combinations. Other features and advantages of the invention will be set forth in the following description; some advantages will become apparent from the description or be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained through what is specifically pointed out in the description, the claims, and the drawings.
Brief description of the drawings

The drawings serve only to illustrate specific embodiments and are not to be construed as limiting the invention; throughout the drawings, identical reference symbols denote identical components.

Fig. 1 is a flow chart of the text clustering method in the first embodiment of the invention;

Fig. 2 shows part of the texts to be clustered in the second embodiment of the invention;

Fig. 3 shows part of the clustering result in the second embodiment of the invention;

Fig. 4 is a schematic diagram of the text clustering device in the third embodiment of the invention.
Detailed description of the embodiments

Preferred embodiments of the present invention are described below with reference to the drawings, which form part of the application and, together with the embodiments, serve to explain the principles of the invention; they are not intended to limit its scope.

The first embodiment of the present invention discloses a text clustering method whose flow chart is shown in Fig. 1, comprising the following steps:
Step S1: collect various data from the network to form a text library, extract all feature words in the library, compute the weight of each feature word from the frequency with which it occurs among all feature words in the library, and save the feature words and their weights to a database.

In this embodiment, a web crawler collects varied network data, such as news and encyclopedia data, to form the text library. Such data has wide coverage, large volume, and good representativeness, which guarantees that the computed feature-word frequencies represent how often the words occur in an ordinary natural-language environment.

After the text library is obtained, its data is segmented into words and stop words are removed, yielding all feature words in the library.
Here, word segmentation means dividing a text into morphemes. The present invention places no restriction on the segmentation method, as long as the feature words in the texts to be clustered can be obtained.

Stop words are function words without substantive meaning, such as "the"; removing them improves the quality of the feature words and the efficiency of text processing.
After all feature words in the text library are obtained, the frequency with which each feature word occurs among all feature words in the library is computed:

if a feature word's frequency is below the frequency threshold, the word is rarely used and is discarded. This, on one hand, shrinks the vocabulary; on the other hand, sentence-vector computation can then ignore these words, preventing their large weights from distorting the sentence vectors;

the reciprocal of each remaining feature word's frequency is taken as that word's weight.

The frequency thus determines the word's weight in the text library: the larger the weight, the greater the word's importance in the library, and the smaller the weight, the lesser its importance.
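The weighting step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the threshold value and toy corpus are assumptions:

```python
from collections import Counter

def feature_word_weights(corpus_words, freq_threshold=2):
    """Discard feature words occurring fewer than `freq_threshold` times,
    then use the reciprocal of each surviving word's count as its weight."""
    counts = Counter(corpus_words)
    return {word: 1.0 / count
            for word, count in counts.items()
            if count >= freq_threshold}

words = ["news", "news", "news", "market", "market", "rare"]
weights = feature_word_weights(words)
# The frequent word "news" gets a smaller weight than "market";
# "rare" falls below the threshold and is dropped.
```

Because the weights are computed once over the whole library and stored in the database, later clustering runs never need to recount frequencies, which is the source of the claimed speedup.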
Step S2: collect each text to be clustered and extract the feature words in each text to be clustered.

The texts to be clustered collected in the present invention are a subset of the data in the text library; that is, the library is guaranteed to contain every feature word of the texts to be clustered. The invention places no restriction on the concrete form of the texts to be clustered: they may be of any subject matter, for example Internet news data obtained by a web crawler or Chinese Wikipedia data. Nor is any particular file format required, as long as the text data to be clustered can be read normally.

The same word segmentation and stop-word removal as above yields all feature words in each text to be clustered.
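The segmentation and stop-word step can be sketched as below. Whitespace splitting stands in for a real morpheme-level segmenter (which the patent deliberately leaves unspecified), and the stop-word set is an illustrative fragment:

```python
STOP_WORDS = {"the", "a", "of", "and"}  # illustrative fragment, not a real list

def extract_feature_words(text, stop_words=STOP_WORDS):
    """Split a text into tokens and drop stop words.
    Whitespace splitting is a placeholder for a proper word segmenter."""
    return [token for token in text.lower().split()
            if token not in stop_words]

print(extract_feature_words("The price of oil and gas rose"))
# → ['price', 'oil', 'gas', 'rose']
```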
Step S3: from the feature words in each text to be clustered and their weights in the database, obtain the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered.

Step S31: train a word2vec model on the feature words in the texts to be clustered, and use the trained word2vec model to obtain the word vector of each feature word.

word2vec is an open-source software tool for generating word vectors. Given a corpus, its optimized training models map each word of every sentence quickly and efficiently to a real-valued vector in a D-dimensional space, and these vectors capture syntactic and semantic features. Its core architectures are CBOW and Skip-gram.

The word vector of each feature word obtained by the present invention is denoted v_{1×D}, where D is the dimensionality of the word-vector space.
Step S32: from the feature words contained in each text to be clustered, compute the sentence vector of each text to be clustered, where the sentence vector V_s of the s-th text to be clustered is expressed as

V_s = (1/N_s) · Σ_{i=1}^{N_s} w_{s,i} · v_{s,i}    (1)

where N_s is the number of word vectors contained in the s-th sentence to be clustered; v_{s,i} is the i-th word vector of the s-th sentence to be clustered; and w_{s,i}, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
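The sentence vector of step S32 amounts to a weighted average of the sentence's word vectors. A pure-Python sketch with toy two-dimensional vectors and assumed weights:

```python
def sentence_vector(word_vectors, weights):
    """Compute V_s = (1/N_s) * sum_i w_{s,i} * v_{s,i}, the weighted
    average of a sentence's word vectors."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for d in range(dim):
            total[d] += w * vec[d]
    return [component / n for component in total]

vecs = [[1.0, 0.0], [0.0, 1.0]]   # toy word vectors of one sentence
ws = [0.5, 0.25]                  # their stored database weights
print(sentence_vector(vecs, ws))  # → [0.25, 0.125]
```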
Step S33: from the sentence vector of each sentence to be clustered, construct the feature vector S_{N×D} of the texts to be clustered:

S_{N×D} = [V_1, V_2, ..., V_N]^T    (2)

where N is the number of all sentences to be clustered and D is the dimensionality of the sentence vectors, equal to the dimensionality of the word vectors.
Step S4: cluster the texts to be clustered using their feature vector.

Step S41: apply singular value decomposition to the feature vector S_{N×D} of the texts to be clustered, obtaining the smoothed sentence-vector matrix S'_{N×D} of the whole text.

The singular value decomposition finds the partial principal axes of the feature vector of the texts to be clustered; removing these principal axes from the feature vector achieves the smoothing effect.
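A NumPy sketch of this smoothing step. Here the single dominant right-singular vector is removed from every row, in the style of SIF sentence-embedding smoothing; treating the "partial principal axes" as exactly one component is an assumption, since the patent does not fix their number:

```python
import numpy as np

def remove_principal_component(S):
    """Subtract each row's projection onto the first right-singular
    vector of S, yielding the smoothed matrix S'."""
    _, _, vt = np.linalg.svd(S, full_matrices=False)
    u1 = vt[0]                          # dominant direction, unit norm
    return S - np.outer(S @ u1, u1)

S = np.array([[1.0, 1.0], [1.0, 1.1], [0.9, 1.0]])
S_smooth = remove_principal_component(S)
# Every smoothed row is orthogonal to the removed dominant direction.
```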
Step S42: cluster the texts to be clustered by applying a hierarchical clustering algorithm to the smoothed sentence-vector matrix S'_{N×D}:

treat each vector in the matrix S'_{N×D} as an individual cluster;

compute the cosine distance between different clusters, and merge sentence vectors whose cosine distance is below a given threshold into one cluster; repeat this step until all vectors of the texts to be clustered are classified.
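The merge loop of step S42 can be sketched in pure Python as below (a naive single-linkage-style variant; the threshold value and toy vectors are illustrative):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def hierarchical_cluster(vectors, threshold=0.1):
    """Start from singleton clusters and repeatedly merge any two clusters
    containing a pair of vectors closer than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine_distance(vectors[a], vectors[b]) < threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

vecs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(hierarchical_cluster(vecs, threshold=0.05))  # → [[0, 1], [2]]
```

This naive loop is O(n³); production code would use an optimized agglomerative implementation, but the stopping behavior matches the patent's description: merging stops once no pair of clusters is within the threshold.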
By collecting a large amount of varied network data in advance, the present invention obtains a large vocabulary of feature words whose weights effectively characterize the probability of each word appearing in an ordinary sentence. Using these weights directly as the weights of the corresponding word vectors in the texts to be clustered effectively shortens computation time, and the larger the scale of the texts to be clustered, the more pronounced the saving. Meanwhile, under the weighting scheme of the present method, the higher a word's frequency of occurrence, the smaller its weight, so that when the sentence vectors of the texts to be clustered are computed, the weight of the dominant feature words is reduced, the influence of the other feature words on the whole text to be clustered is fully considered, and the clustering result is effectively improved.
The second embodiment of the present invention discloses an application example of the text clustering method, with the following steps:

first, the database storing the feature words and their weights is obtained with the method described above;

using a web crawler, data is crawled from Sohu News as the texts to be clustered of this embodiment; part of their content is shown in Fig. 2;

the texts to be clustered are then classified with the above text clustering method, producing the clustering result, part of which is shown in Fig. 3.

This application example demonstrates that the text clustering method of the present application can cluster similar texts and that the clustering result is accurate.
The third embodiment of the present invention provides a text clustering device, shown schematically in Fig. 4 and corresponding to the above text clustering method. The device comprises:

a term-weight computation module, which collects data to form a text library, extracts all feature words in the library, computes the weight of each feature word from the frequency with which it occurs among all feature words in the library, and saves the feature words and their weights to a database;

a feature-word acquisition module, which collects each text to be clustered and extracts the feature words in it;

a feature-vector acquisition module, which, from the feature words in each text to be clustered and their weights in the database, obtains the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;

a text clustering module, which clusters the texts to be clustered using their feature vector.
Further, computing the weight of each feature word from its frequency among all feature words in the library specifically comprises the following operations: if the frequency of a feature word is below a frequency threshold, the feature word is discarded; the reciprocal of each remaining feature word's frequency is taken as that feature word's weight.
For the specific implementation of the device embodiment of the present invention, refer to the method embodiment above; it is not repeated here. Since this embodiment shares its principle with the method embodiment, the device also has the corresponding technical effects.

Those skilled in the art will understand that all or part of the processes of the above embodiments can be carried out by a computer program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a magnetic disk, an optical disc, a read-only memory, or a random access memory.
The foregoing are only preferred embodiments of the present invention, but the scope of protection of the invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the invention shall be covered by the protection scope of the invention.
Claims (10)
1. A text clustering method, characterized by comprising the following steps:
collecting data to build a text library, extracting all feature words in the text library, computing the weight of each feature word from the frequency with which it occurs among all feature words in the library, and saving the feature words and their weights to a database;
collecting each text to be clustered and extracting the feature words in each text to be clustered;
from the feature words in each text to be clustered and their weights in the database, obtaining the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
clustering the texts to be clustered using their feature vector.
2. The method according to claim 1, characterized in that computing the weight of each feature word from the frequency with which it occurs among all feature words in the text library specifically comprises the following operations:
if the frequency of a feature word is below a frequency threshold, the feature word is discarded;
the reciprocal of each remaining feature word's frequency is taken as that feature word's weight.
3. The method according to claim 1 or 2, characterized in that after the text library or a text to be clustered is obtained, its data is segmented into words and stop words are removed, yielding all feature words in the text library or the text to be clustered.
4. The method according to claim 3, characterized in that obtaining the word vector of each feature word from the feature words in each text to be clustered and their weights in the database specifically comprises the following operation:
a word2vec model is trained on the feature words in the texts to be clustered, and the trained word2vec model is used to obtain the word vector of each feature word, denoted v_{1×D}, where D is the dimensionality of the word-vector space.
5. The method according to claim 4, characterized in that the sentence vector of each text to be clustered is obtained by the following operation:
from the feature words contained in each text to be clustered, the sentence vector of each text to be clustered is computed, where the sentence vector V_s of the s-th text to be clustered is expressed as
V_s = (1/N_s) · Σ_{i=1}^{N_s} w_{s,i} · v_{s,i}    (1)
where N_s is the number of word vectors contained in the s-th sentence to be clustered; v_{s,i} is the i-th word vector of the s-th sentence to be clustered; and w_{s,i}, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
6. The method according to claim 5, characterized in that the feature vector of all texts to be clustered is obtained in the following manner:
from the sentence vector of each sentence to be clustered, the feature vector S_{N×D} of the texts to be clustered is constructed as
S_{N×D} = [V_1, V_2, ..., V_N]^T    (2)
where N is the number of all sentences to be clustered and D is the dimensionality of the sentence vectors, equal to the dimensionality of the word vectors.
7. The method according to claim 1 or 6, characterized in that clustering the texts to be clustered using their feature vector comprises the following operations:
singular value decomposition is applied to the feature vector S_{N×D} of the texts to be clustered, yielding the smoothed sentence-vector matrix S'_{N×D};
the texts to be clustered are then clustered by applying a clustering algorithm to the smoothed matrix S'_{N×D}.
8. The method according to claim 7, characterized in that a hierarchical clustering algorithm realizes the clustering of the texts to be clustered:
each vector in the matrix S'_{N×D} is treated as an individual cluster;
the cosine distance between different clusters is computed, and sentence vectors whose cosine distance is below a given threshold are merged into one cluster; this step is repeated until all vectors of the texts to be clustered are classified.
9. A text clustering device using the text clustering method of any one of claims 1-8, characterized in that the device comprises:
a term-weight computation module, which collects data to form a text library, extracts all feature words in the library, computes the weight of each feature word from the frequency with which it occurs among all feature words in the library, and saves the feature words and their weights to a database;
a feature-word acquisition module, which collects each text to be clustered and extracts the feature words in it;
a feature-vector acquisition module, which, from the feature words in each text to be clustered and their weights in the database, obtains the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
a text clustering module, which clusters the texts to be clustered using their feature vector.
10. The device according to claim 9, characterized in that computing the weight of each feature word from the frequency with which it occurs among all feature words in the text library specifically comprises the following operations:
if the frequency of a feature word is below a frequency threshold, the feature word is discarded;
the reciprocal of each remaining feature word's frequency is taken as that feature word's weight.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910250896.1A CN110083828A (en) | 2019-03-29 | 2019-03-29 | A kind of Text Clustering Method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110083828A (en) | 2019-08-02
Family
ID=67413950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910250896.1A Pending CN110083828A (en) | 2019-03-29 | 2019-03-29 | A kind of Text Clustering Method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083828A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111368081A (en) * | 2020-03-03 | 2020-07-03 | 支付宝(杭州)信息技术有限公司 | Method and system for determining selected text content |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778158A (en) * | 2015-03-04 | 2015-07-15 | 新浪网技术(中国)有限公司 | Method and device for representing text |
CN104834735A (en) * | 2015-05-18 | 2015-08-12 | 大连理工大学 | Automatic document summarization extraction method based on term vectors |
CN105005589A (en) * | 2015-06-26 | 2015-10-28 | 腾讯科技(深圳)有限公司 | Text classification method and text classification device |
CN105022840A (en) * | 2015-08-18 | 2015-11-04 | 新华网股份有限公司 | News information processing method, news recommendation method and related devices |
CN106599072A (en) * | 2016-11-21 | 2017-04-26 | 东软集团股份有限公司 | Text clustering method and device |
CN108595706A (en) * | 2018-05-10 | 2018-09-28 | 中国科学院信息工程研究所 | A kind of document semantic representation method, file classification method and device based on theme part of speech similitude |
CN109101479A (en) * | 2018-06-07 | 2018-12-28 | 苏宁易购集团股份有限公司 | A kind of clustering method and device for Chinese sentence |
CN109508456A (en) * | 2018-10-22 | 2019-03-22 | 网易(杭州)网络有限公司 | A kind of text handling method and device |
Similar Documents
Publication | Title |
---|---|
Jiang et al. | Text classification using novel term weighting scheme-based improved TF-IDF for internet media reports |
CN102662931B | Semantic role labeling method based on synergetic neural network |
CN107229610A | The analysis method and device of a kind of affection data |
CN110427463A | Search statement response method, device and server and storage medium |
WO2018086401A1 | Cluster processing method and device for questions in automatic question and answering system |
Lin et al. | Deep structured scene parsing by learning with image descriptions |
CN106897262A | A kind of file classification method and device and treating method and apparatus |
CN113961705A | Text classification method and server |
EP3799640A1 | Semantic parsing of natural language query |
CN108090178A | A kind of text data analysis method, device, server and storage medium |
CN109960791A | Judge the method and storage medium, terminal of text emotion |
CN103020167A | Chinese text classification method for computer |
CN110929028A | Log classification method and device |
CN114997288A | Design resource association method |
CN112989813A | Scientific and technological resource relation extraction method and device based on pre-training language model |
Gavval et al. | CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM |
CN108846142A | A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing |
CN110083828A | A kind of Text Clustering Method and device |
CN110309513B | Text dependency analysis method and device |
JP2019082860A | Generation program, generation method and generation device |
CN110209895A | Vector index method, apparatus and equipment |
CN112069322B | Text multi-label analysis method and device, electronic equipment and storage medium |
Li et al. | Evaluating BERT on cloud-edge time series forecasting and sentiment analysis via prompt learning |
CN114462673A | Methods, systems, computing devices, and readable media for predicting future events |
Xiao et al. | Domain ontology learning enhanced by optimized relation instance in dbpedia |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 2021-07-29
Address after: 519000 Guangdong Zhuhai science and technology innovation coastal high beam Software Park
Applicant after: YUANGANG SOFTWARE Co.,Ltd.
Applicant after: Zhuhai Yuanguang Mobile Interconnection Technology Co.,Ltd.
Address before: 519000 room 105-4675, No. 6, Baohua Road, Hengqin new area, Zhuhai, Guangdong
Applicant before: Zhuhai Yuanguang Mobile Interconnection Technology Co.,Ltd.