CN110083828A - Text Clustering Method and Device - Google Patents

Text Clustering Method and Device

Info

Publication number
CN110083828A
CN110083828A (application CN201910250896.1A)
Authority
CN
China
Prior art keywords
text
clustered
feature words
vector
feature
Prior art date
Legal status
Pending
Application number
CN201910250896.1A
Other languages
Chinese (zh)
Inventor
王晓琳
Current Assignee
Zhuhai Yuanguang Mobile Interconnection Technology Co.,Ltd.
Yuanguang Software Co Ltd
Original Assignee
Zhuhai Yuanguang Mobile Interconnection Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhuhai Yuanguang Mobile Interconnection Technology Co Ltd
Priority to CN201910250896.1A
Publication of CN110083828A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data; Database structures therefor; File system structures therefor
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Abstract

The present invention relates to a text clustering method and device that solve the long clustering time, low efficiency, and poor quality of existing text clustering. The text clustering method of the present invention comprises the following steps: collecting data to construct a text library; obtaining all feature words in the text library; obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and saving the feature words and their weights into a database; collecting each text to be clustered and obtaining its feature words; according to the feature words in each text to be clustered and their weights in the database, obtaining the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered; and clustering the texts to be clustered using that feature vector. The method of the present invention effectively shortens the text clustering time, improves clustering efficiency, and achieves better clustering quality.

Description

Text Clustering Method and Device
Technical field
The present invention relates to the field of intelligent analysis of natural language text, and more particularly to a text clustering method and device.
Background technique
Text clustering is an application in the field of intelligent analysis of natural language text. It uses the similarity between texts to aggregate similar texts, making it easier for users to analyze and process text data of the same category.
Current text clustering methods mainly fall into supervised learning and unsupervised learning. Supervised methods need to know in advance the category of each text in the training set; a model is built to capture the relationship between the training texts and their categories, which is then used to classify text data of unknown category. The drawback of this approach is that text data belonging to none of the predefined categories cannot be assigned a category.
On the other hand, without labeled text data, problems such as text classification and sentiment analysis can only be tackled with traditional unsupervised methods. Unsupervised methods generally compute sentence vectors from word vectors and then cluster by sentence similarity, producing labeled text data sets as the clustering result. However, existing text clustering methods must recompute the word frequencies of the feature words in the texts to be clustered every time in order to obtain the corresponding weights; when the texts to be clustered are large in scale, this step lengthens the clustering time and lowers clustering efficiency. Meanwhile, in existing weight computation methods, feature words with high word frequency also receive high relative weights, which fails to fully account for the influence of feature words other than the main feature words on the whole text to be clustered, so the clustering quality is relatively poor.
Summary of the invention
In view of the above analysis, the present invention aims to provide a text clustering method and device that solve the long clustering time, low efficiency, and poor quality of existing text clustering.
The purpose of the present invention is mainly achieved through the following technical solutions:
On the one hand, a text clustering method is provided, comprising the following steps:
collecting data to construct a text library, obtaining all feature words in the text library, obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and saving the feature words and their weights into a database;
collecting each text to be clustered and obtaining the feature words in each text to be clustered;
according to the feature words in each text to be clustered and their weights in the database, obtaining the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
clustering the texts to be clustered using the feature vector of the texts to be clustered.
On the basis of the above scheme, the present invention also makes the following improvements:
Further, obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library specifically comprises the following operations:
if the frequency of a feature word is less than a frequency threshold, rejecting that feature word;
taking the inverse of the word frequency of each remaining feature word as that feature word's weight.
Further, after the text library or the texts to be clustered are obtained, the data in the text library or the texts to be clustered are segmented into words and stop words are removed, yielding all feature words in the text library or the texts to be clustered.
Further, obtaining the word vector of each feature word according to the feature words in each text to be clustered and their weights in the database specifically comprises the following operation:
training a word2vec model with the feature words in the texts to be clustered, and using the trained word2vec model to obtain the word vector of each feature word; the word vector of each feature word is denoted v1×D, where D is the dimension of the word-vector space.
Further, the sentence vector of each text to be clustered is obtained by the following operation:
according to the feature words contained in each text to be clustered, computing the sentence vector of each text, where the sentence vector Vs of the s-th text to be clustered is:

Vs = (1/Ns) Σi=1..Ns ws,i vs,i    (1)

where Ns is the number of word vectors contained in the s-th sentence to be clustered; vs,i is the i-th word vector of the s-th sentence; and ws,i, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
Further, the feature vector of all texts to be clustered is obtained in the following manner:
according to the sentence vector of each sentence in the texts to be clustered, constructing the feature vector SN*D of the texts to be clustered:

SN*D = [V1, V2, ..., VN]T    (2)

where N is the number of all sentences to be clustered and D is the dimension of the sentence vectors, equal to the dimension of the word vectors.
Further, clustering the texts to be clustered using their feature vector comprises the following operations:
performing singular value decomposition on the feature vector SN*D of the texts to be clustered to obtain the smoothed sentence-vector matrix S'N*D of the entire text;
clustering the texts to be clustered by applying a clustering algorithm to the smoothed sentence-vector matrix S'N*D.
Further, the clustering of the texts to be clustered is realized with a hierarchical clustering algorithm:
treating each vector in the matrix S'N*D as an individual cluster;
computing the cosine distance between different clusters and merging into one cluster any sentence vectors whose cosine distance is less than a given threshold; repeating this step until all vectors in the texts to be clustered have been classified.
On the other hand, a text clustering device corresponding to the above text clustering method is provided, the device comprising:
a feature-word weight computation module, configured to collect data to constitute a text library, obtain all feature words in the text library, obtain the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and save the feature words and their weights into a database;
a feature-word acquisition module, configured to collect each text to be clustered and obtain the feature words in each text to be clustered;
a feature-vector acquisition module, configured to obtain, according to the feature words in each text to be clustered and their weights in the database, the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
a text clustering module, configured to cluster the texts to be clustered using their feature vector.
Further, obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library specifically comprises the following operations:
if the frequency of a feature word is less than a frequency threshold, rejecting that feature word;
taking the inverse of the word frequency of each remaining feature word as that feature word's weight.
The present invention has the following beneficial effects. By collecting a large amount of varied network data in advance, the invention obtains a large number of feature words whose weight information effectively characterizes the probability of each word appearing in an ordinary sentence. Using these stored weights directly as the weights of the corresponding word vectors in the texts to be clustered effectively shortens the computation time; the larger the scale of the texts to be clustered, the more pronounced the saving. Meanwhile, with the feature-word weighting set by the method, the higher a word's frequency of occurrence, the smaller its weight, so that when computing the sentence vectors of the texts to be clustered, the weight of the main feature words is reduced and the influence of the other feature words on the entire text is fully considered, effectively improving the clustering quality.
In the present invention, the above technical solutions can also be combined with each other to realize more preferable combined schemes. Other features and advantages of the invention will be set forth in the following description; certain advantages will become apparent from the description or be understood through practicing the invention. The objectives and other advantages of the invention can be realized and obtained through what is particularly pointed out in the description, the claims, and the accompanying drawings.
Detailed description of the invention
The accompanying drawings are only for the purpose of showing specific embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference symbols denote the same components.
Fig. 1 is a flow chart of the text clustering method in the first embodiment of the present invention;
Fig. 2 shows part of the texts to be clustered in the second embodiment of the present invention;
Fig. 3 shows part of the clustering result in the second embodiment of the present invention;
Fig. 4 is a schematic diagram of the text clustering device in the third embodiment of the present invention.
Specific embodiment
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, which form a part of the application and, together with the embodiments, serve to explain the principle of the invention without limiting its scope.
In the first embodiment of the present invention, a text clustering method is disclosed. Its flow chart is shown in Fig. 1, and it comprises the following steps:
Step S1: collect various data on the network to constitute a text library, obtain all feature words in the text library, obtain the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and save the feature words and their weights into a database.
In this embodiment, various network data such as news and encyclopedia articles are collected by a web crawler algorithm to constitute the text library. Such data has wide coverage, large volume, and good representativeness, which ensures that the computed frequency of each feature word represents the frequency with which it occurs in a general natural language environment.
After the text library is obtained, the data in it is segmented into words and stop words are removed, yielding all feature words in the text library.
Here, word segmentation refers to dividing text according to morphemes. The present invention places no restriction on the segmentation method, as long as the feature words in the texts can be obtained.
Stop words are function words without substantive meaning, for example Chinese particles such as "的" or the English "the"; removing them improves the quality of the feature words and the efficiency of text processing.
After all feature words in the text library are obtained, the frequency with which each feature word occurs among all feature words of the text library is computed:
if the frequency of a feature word is less than the frequency threshold, this indicates that few people use the word, and the word is rejected. On the one hand this reduces the size of the vocabulary; on the other hand, the word vectors of these words can then be ignored when computing sentence vectors, preventing their large inverse-frequency weights from distorting the sentence-vector representation;
the inverse of the word frequency of each remaining feature word is taken as that feature word's weight.
The word frequency thus determines the weight of a feature word in the text library: the larger the weight, the greater the significance of that word in the text library, and vice versa.
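The weighting step above can be sketched as follows; the function name and the value of the frequency threshold are illustrative, not taken from the patent.

```python
from collections import Counter

def feature_word_weights(corpus_tokens, min_count=2):
    """Weight every feature word in the text library.

    Words occurring fewer than `min_count` times are rejected; each
    remaining word is weighted by the inverse of its frequency, so
    frequent words receive smaller weights. `min_count` stands in
    for the patent's unspecified frequency threshold.
    """
    counts = Counter(w for doc in corpus_tokens for w in doc)
    return {w: 1.0 / c for w, c in counts.items() if c >= min_count}

weights = feature_word_weights([["market", "news", "market"],
                                ["news", "policy", "market"]])
# "market" occurs 3 times -> weight 1/3; "news" twice -> 1/2;
# "policy" occurs once and is rejected.
```

Because the weights are computed once over the whole library and saved, clustering runs need only look them up rather than recount word frequencies.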
Step S2: collect each text to be clustered and obtain the feature words in each text to be clustered.
The content of the texts to be clustered collected in the present invention is a subset of the data in the text library; that is, the text library is guaranteed to contain every feature word of the texts to be clustered. The present invention places no restriction on the concrete form of the texts to be clustered: they can be texts on any subject, such as Internet news data obtained by a web crawler algorithm or Chinese Wikipedia data, and no requirement is placed on their file format, as long as the text data to be clustered can be read normally.
Through the same word segmentation and stop-word removal as above, all feature words in each text to be clustered are obtained.
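A minimal sketch of the segmentation and stop-word step. The patent does not prescribe a segmentation method, so the regex tokenizer and the tiny stop list below are assumptions; for Chinese text a dedicated segmenter (e.g. jieba) would replace the regex.

```python
import re

STOP_WORDS = {"的", "了", "the", "a", "of"}  # illustrative stop list

def extract_feature_words(text):
    """Split a text into tokens and drop stop words.

    The regex split suits space-delimited languages; Chinese would
    need a real word segmenter in its place.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = extract_feature_words("The price of oil rose sharply")
# -> ['price', 'oil', 'rose', 'sharply']
```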
Step S3: according to the feature words in each text to be clustered and their weights in the database, obtain the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered.
Step S31: train a word2vec model with the feature words in the texts to be clustered, and use the trained word2vec model to obtain the word vector of each feature word.
Word2vec is an open-source software tool for generating word vectors. Given a corpus, its optimized training model quickly and effectively maps each word of a sentence to a real-valued vector in a D-dimensional space, and these vectors capture syntactic and semantic features; its core architectures include CBOW and Skip-gram.
The word vector of each feature word obtained by the present invention is denoted v1×D, where D is the dimension of the word-vector space.
Step S32: according to the feature words contained in each text to be clustered, compute the sentence vector of each text, where the sentence vector Vs of the s-th text to be clustered is:

Vs = (1/Ns) Σi=1..Ns ws,i vs,i    (1)

where Ns is the number of word vectors contained in the s-th sentence to be clustered; vs,i is the i-th word vector of the s-th sentence; and ws,i, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
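The sentence-vector computation (a weight-scaled average of word vectors, following the definitions of Ns, vs,i, and ws,i above) can be sketched as:

```python
import numpy as np

def sentence_vector(tokens, word_vecs, weights):
    """Vs = (1/Ns) * sum over i of ws,i * vs,i.

    `word_vecs` maps a feature word to its 1 x D vector; `weights`
    maps it to the inverse-frequency weight stored in the database.
    Tokens missing from either mapping are skipped, mirroring the
    rejection of below-threshold words.
    """
    scaled = [weights[t] * word_vecs[t] for t in tokens
              if t in word_vecs and t in weights]
    return np.mean(scaled, axis=0)

wv = {"oil": np.array([1.0, 0.0]), "price": np.array([0.0, 1.0])}
w = {"oil": 0.5, "price": 0.25}
v = sentence_vector(["oil", "price"], wv, w)
# (0.5*[1,0] + 0.25*[0,1]) / 2 -> [0.25, 0.125]
```

Stacking these row vectors then yields the feature matrix of equation (2) in the next step.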
Step S33: according to the sentence vector of each sentence in the texts to be clustered, construct the feature vector SN*D of the texts to be clustered:

SN*D = [V1, V2, ..., VN]T    (2)

where N is the number of all sentences to be clustered and D is the dimension of the sentence vectors, equal to the dimension of the word vectors.
Step S4: cluster the texts to be clustered using the feature vector of the texts to be clustered.
Step S41: perform singular value decomposition on the feature vector SN*D of the texts to be clustered to obtain the smoothed sentence-vector matrix S'N*D of the entire text.
The singular value decomposition finds the main axes of the feature vector of the texts to be clustered; removing these main axes from the feature vector achieves the smoothing effect.
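A minimal sketch of the smoothing step, under the assumption that a single dominant axis (the first right singular vector) is treated as the main axis to remove; the patent does not fix how many axes are removed.

```python
import numpy as np

def remove_principal_component(S):
    """Subtract from every row of the N x D matrix S its projection
    onto the first right singular vector, i.e. the direction shared
    most strongly by all sentence vectors."""
    # Rows of Vt are the right singular vectors, ordered by singular value.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    u = Vt[0]                          # dominant axis, shape (D,)
    return S - np.outer(S @ u, u)      # remove each row's component along u

S = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [1.0, -1.0]])
S_smooth = remove_principal_component(S)
# The rows lying along the dominant direction [1, 1] are flattened
# to ~0, while the orthogonal row [1, -1] is left unchanged.
```

This removes the component common to most sentences, so the remaining directions carry the sentence-specific information used for clustering.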
Step S42: cluster the texts to be clustered by applying a hierarchical clustering algorithm to the smoothed sentence-vector matrix S'N*D:
treat each vector in the matrix S'N*D as an individual cluster;
compute the cosine distance between different clusters and merge into one cluster any sentence vectors whose cosine distance is less than a given threshold; repeat this step until all vectors in the texts to be clustered have been classified.
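The merge-until-done loop above corresponds to agglomerative clustering cut at a distance threshold. This sketch uses scipy; the threshold value and the choice of average linkage are illustrative, not values from the patent.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_sentences(S_smooth, threshold=0.4):
    """Merge clusters whose cosine distance is below `threshold`
    and return an integer cluster label for each row."""
    Z = linkage(S_smooth, method="average", metric="cosine")
    return fcluster(Z, t=threshold, criterion="distance")

S = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
labels = cluster_sentences(S)
# The first two rows are nearly parallel (cosine distance ~0.006),
# so they share a label; the third row is orthogonal and stands alone.
```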
By collecting a large amount of varied network data in advance, the present invention obtains a large number of feature words whose weight information effectively characterizes the probability of each word appearing in an ordinary sentence. Using these weights directly as the weights of the corresponding word vectors in the texts to be clustered effectively shortens the computation time; the larger the scale of the texts to be clustered, the more pronounced the saving. Meanwhile, with the feature-word weighting set by the method, the higher a word's frequency of occurrence, the smaller its weight, so that when computing the sentence vectors of the texts to be clustered, the weight of the main feature words is reduced and the influence of the other feature words on the entire text is fully considered, effectively improving the clustering quality.
In the second embodiment of the present invention, an application example of the text clustering method is disclosed. The steps are as follows:
first, the database storing the feature words and their weights is obtained with the method described above;
using a web crawler algorithm, data from Sohu News is crawled as the texts to be clustered of this embodiment; part of their content is shown in Fig. 2;
the texts to be clustered are classified with the above text clustering method to obtain the clustering result, part of which is shown in Fig. 3.
This application example demonstrates that the text clustering method of the present application can realize the clustering of similar texts, and that the clustering result is fairly accurate.
In the third embodiment of the present invention, a text clustering device corresponding to the above text clustering method is provided. Its schematic diagram is shown in Fig. 4, and the device comprises:
a feature-word weight computation module, configured to collect data to constitute a text library, obtain all feature words in the text library, obtain the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and save the feature words and their weights into a database;
a feature-word acquisition module, configured to collect each text to be clustered and obtain the feature words in each text to be clustered;
a feature-vector acquisition module, configured to obtain, according to the feature words in each text to be clustered and their weights in the database, the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
a text clustering module, configured to cluster the texts to be clustered using their feature vector.
Further, obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library specifically comprises the following operations:
if the frequency of a feature word is less than a frequency threshold, rejecting that feature word;
taking the inverse of the word frequency of each remaining feature word as that feature word's weight.
For the specific implementation of this device embodiment, refer to the method embodiment above; it is not repeated here. Since this embodiment shares its principle with the method embodiment, the device also has the corresponding technical effects.
Those skilled in the art will understand that all or part of the flow of the above embodiment methods can be completed by a computer program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a magnetic disk, an optical disc, a read-only memory, or a random access memory.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the invention is not limited thereto. Any changes or substitutions that can easily be conceived by anyone skilled in the art within the technical scope disclosed by the invention shall be covered by the protection scope of the invention.

Claims (10)

1. A text clustering method, characterized by comprising the following steps:
collecting data to construct a text library, obtaining all feature words in the text library, obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and saving the feature words and their weights into a database;
collecting each text to be clustered and obtaining the feature words in each text to be clustered;
according to the feature words in each text to be clustered and their weights in the database, obtaining the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
clustering the texts to be clustered using the feature vector of the texts to be clustered.
2. The method according to claim 1, characterized in that obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library specifically comprises the following operations:
if the frequency of a feature word is less than a frequency threshold, rejecting that feature word;
taking the inverse of the word frequency of each remaining feature word as that feature word's weight.
3. The method according to claim 1 or 2, characterized in that after the text library or the texts to be clustered are obtained, the data in the text library or the texts to be clustered are segmented into words and stop words are removed, yielding all feature words in the text library or the texts to be clustered.
4. The method according to claim 3, characterized in that obtaining the word vector of each feature word according to the feature words in each text to be clustered and their weights in the database specifically comprises the following operation:
training a word2vec model with the feature words in the texts to be clustered, and using the trained word2vec model to obtain the word vector of each feature word, the word vector of each feature word being denoted v1×D, where D is the dimension of the word-vector space.
5. The method according to claim 4, characterized in that the sentence vector of each text to be clustered is obtained by the following operation:
according to the feature words contained in each text to be clustered, computing the sentence vector of each text, where the sentence vector Vs of the s-th text to be clustered is:

Vs = (1/Ns) Σi=1..Ns ws,i vs,i    (1)

where Ns is the number of word vectors contained in the s-th sentence to be clustered; vs,i is the i-th word vector of the s-th sentence; and ws,i, the weight of the i-th word vector of the s-th sentence, is the weight of that feature word in the database.
6. The method according to claim 5, characterized in that the feature vector of all texts to be clustered is obtained in the following manner:
according to the sentence vector of each sentence in the texts to be clustered, constructing the feature vector SN*D of the texts to be clustered:

SN*D = [V1, V2, ..., VN]T    (2)

where N is the number of all sentences to be clustered and D is the dimension of the sentence vectors, equal to the dimension of the word vectors.
7. The method according to claim 1 or 6, characterized in that clustering the texts to be clustered using their feature vector comprises the following operations:
performing singular value decomposition on the feature vector SN*D of the texts to be clustered to obtain the smoothed sentence-vector matrix S'N*D of the entire text;
clustering the texts to be clustered by applying a clustering algorithm to the smoothed sentence-vector matrix S'N*D.
8. The method according to claim 7, characterized in that the clustering of the texts to be clustered is realized with a hierarchical clustering algorithm:
treating each vector in the matrix S'N*D as an individual cluster;
computing the cosine distance between different clusters and merging into one cluster any sentence vectors whose cosine distance is less than a given threshold; repeating this step until all vectors in the texts to be clustered have been classified.
9. A text clustering device applying the text clustering method of any one of claims 1-8, characterized in that the device comprises:
a feature-word weight computation module, configured to collect data to constitute a text library, obtain all feature words in the text library, obtain the weight of each feature word from the frequency with which it occurs among all feature words of the text library, and save the feature words and their weights into a database;
a feature-word acquisition module, configured to collect each text to be clustered and obtain the feature words in each text to be clustered;
a feature-vector acquisition module, configured to obtain, according to the feature words in each text to be clustered and their weights in the database, the word vector of each feature word, the sentence vector of each text to be clustered, and the feature vector of all texts to be clustered;
a text clustering module, configured to cluster the texts to be clustered using their feature vector.
10. The device according to claim 9, characterized in that obtaining the weight of each feature word from the frequency with which it occurs among all feature words of the text library specifically comprises the following operations:
if the frequency of a feature word is less than a frequency threshold, rejecting that feature word;
taking the inverse of the word frequency of each remaining feature word as that feature word's weight.
CN201910250896.1A 2019-03-29 2019-03-29 Text clustering method and device Pending CN110083828A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910250896.1A CN110083828A (en) 2019-03-29 2019-03-29 Text clustering method and device


Publications (1)

Publication Number Publication Date
CN110083828A (en) 2019-08-02

Family

ID=67413950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910250896.1A Pending CN110083828A (en) Text clustering method and device

Country Status (1)

Country Link
CN (1) CN110083828A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368081A (en) * 2020-03-03 2020-07-03 支付宝(杭州)信息技术有限公司 Method and system for determining selected text content

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778158A (en) * 2015-03-04 2015-07-15 新浪网技术(中国)有限公司 Method and device for representing text
CN104834735A (en) * 2015-05-18 2015-08-12 大连理工大学 Automatic document summarization extraction method based on term vectors
CN105005589A (en) * 2015-06-26 2015-10-28 腾讯科技(深圳)有限公司 Text classification method and text classification device
CN105022840A (en) * 2015-08-18 2015-11-04 新华网股份有限公司 News information processing method, news recommendation method and related devices
CN106599072A (en) * 2016-11-21 2017-04-26 东软集团股份有限公司 Text clustering method and device
CN108595706A (en) * 2018-05-10 2018-09-28 中国科学院信息工程研究所 A kind of document semantic representation method, file classification method and device based on theme part of speech similitude
CN109101479A (en) * 2018-06-07 2018-12-28 苏宁易购集团股份有限公司 A kind of clustering method and device for Chinese sentence
CN109508456A (en) * 2018-10-22 2019-03-22 网易(杭州)网络有限公司 A kind of text handling method and device


Similar Documents

Publication Publication Date Title
Jiang et al. Text classification using novel term weighting scheme-based improved TF-IDF for internet media reports
CN102662931B (en) Semantic role labeling method based on synergetic neural network
CN107229610A (en) The analysis method and device of a kind of affection data
CN110427463A (en) Search statement response method, device and server and storage medium
WO2018086401A1 (en) Cluster processing method and device for questions in automatic question and answering system
Lin et al. Deep structured scene parsing by learning with image descriptions
CN106897262A (en) A kind of file classification method and device and treating method and apparatus
CN113961705A (en) Text classification method and server
EP3799640A1 (en) Semantic parsing of natural language query
CN108090178A (en) A kind of text data analysis method, device, server and storage medium
CN109960791A (en) Judge the method and storage medium, terminal of text emotion
CN103020167A (en) Chinese text classification method for computer
CN110929028A (en) Log classification method and device
CN114997288A (en) Design resource association method
CN112989813A (en) Scientific and technological resource relation extraction method and device based on pre-training language model
Gavval et al. CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM
CN108846142A (en) A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing
CN110083828A (en) A kind of Text Clustering Method and device
CN110309513B (en) Text dependency analysis method and device
JP2019082860A (en) Generation program, generation method and generation device
CN110209895A (en) Vector index method, apparatus and equipment
CN112069322B (en) Text multi-label analysis method and device, electronic equipment and storage medium
Li et al. Evaluating BERT on cloud-edge time series forecasting and sentiment analysis via prompt learning
CN114462673A (en) Methods, systems, computing devices, and readable media for predicting future events
Xiao et al. Domain ontology learning enhanced by optimized relation instance in dbpedia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210729

Address after: 519000 Yuanguang Software Park, Technology Innovation Coast, Zhuhai, Guangdong

Applicant after: YUANGUANG SOFTWARE Co.,Ltd.

Applicant after: Zhuhai Yuanguang Mobile Interconnection Technology Co.,Ltd.

Address before: 519000 room 105-4675, No. 6, Baohua Road, Hengqin new area, Zhuhai, Guangdong

Applicant before: Zhuhai Yuanguang Mobile Interconnection Technology Co.,Ltd.