CN106991171A - Topic discovery method based on an intelligent campus information service platform - Google Patents
- Publication number
- CN106991171A (application number CN201710216639.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- service platform
- information service
- cluster
- message
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The present invention provides a topic discovery method based on an intelligent campus information service platform. The method concerns techniques for discovering hot topics on campus. It analyzes commonly used text clustering algorithms and text representation models, uses a word segmentation system to segment message texts and extract keywords, and proposes a knowledge representation model for message texts. Building on the vector space model, it uses the word frequencies counted over the messages to determine the initial cluster centers and refines the Hook Jeeves algorithm. Compared with the related art, the method can obtain accurate public opinion patterns and important elements, so that public opinion is identified correctly. The intelligent topic clustering process runs faster and better, and the clustering precision remains high when the number of reported messages is large.
Description
Technical field
The present invention relates to the field of hot topic discovery, and more particularly to a topic discovery method based on an intelligent campus information service platform.
Background technology
The continuous development of computer networks has increasingly enriched campus life. Online information has become an important part of campus life, and the Internet has become an important place for students to obtain information and communicate.
How to handle the mass of data on the network effectively, extract the hot topics in it, or find the information one wants has long troubled network users. Hot topic discovery identifies, from various information sources, the topics in each field that attract wide attention within a given period, which helps students obtain important current information quickly.
Therefore, it is necessary to provide a topic discovery method for an intelligent campus information service platform to implement the above technical solution.
Summary of the invention
An object of the present invention is to provide a topic discovery method based on an intelligent campus information service platform, so as to meet users' need to discover sudden hot topics in network forums in real time.
The present invention provides a topic discovery method based on an intelligent campus information service platform, comprising:
Step 1: establishing an intelligent campus information service platform and collecting campus-themed messages from the Internet to form a message database;
Step 2: performing text preprocessing on the message texts in the database, the text preprocessing being word segmentation processing that includes semantic ambiguity analysis, unregistered-word extraction, keyword extraction and stop-word processing;
Step 3: performing feature extraction on the preprocessed text, the text feature extraction using independent evaluation methods, which include information gain, the χ² statistic and the document frequency algorithm. Information gain computes a weight for each feature item with respect to the text categories, and the feature words carrying more category information are obtained by the following formula (1),
where P(c_i) is the probability that a text in the text collection belongs to category c_i, P(t) is the probability that a text in the collection contains feature word t, P(c_i | t) is the probability that a text containing feature word t belongs to the predefined category c_i, P(c_i | ¬t) is the probability that a text not containing feature word t belongs to category c_i, and n is the number of text categories;
The χ² statistic assesses the importance of a feature item by quantifying the amount of text information the feature item carries, and is computed by the following formula (2),
where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i;
The document frequency algorithm assesses a feature by counting the number of documents that contain it;
Step 4: representing the text with a specified knowledge representation model built from the extracted feature words;
Step 5: having the computer process the text knowledge representation model with a clustering algorithm, so that texts with the same topic together form a topic library, the topic library being the hot topic library.
Compared with the related art, the topic discovery method based on an intelligent campus information service platform provided by the present invention can obtain accurate public opinion patterns and important elements, so that public opinion is identified correctly. The intelligent topic clustering process runs faster and better, and the clustering precision remains high when the number of reported messages is large.
Brief description of the drawings
Fig. 1 is a structural diagram of the campus hot topic discovery module of the present invention;
Fig. 2 is a flow chart of topic discovery according to the present invention;
Fig. 3 is a flow chart of the text preprocessing in Fig. 2;
Fig. 4 is a flow chart of the text representation model in Fig. 2;
Fig. 5 is a test chart of the (C_Det)_Norm values of the clustering algorithms of the present invention.
Embodiment
Please refer to Fig. 1 and Fig. 2 together, where Fig. 1 is a structural diagram of the campus hot topic discovery module of the present invention and Fig. 2 is a flow chart of topic discovery according to the present invention. The topic discovery method based on an intelligent campus information service platform provided by the present invention comprises:
Step 1: establishing an intelligent campus information service platform and collecting campus-themed messages from the Internet to form a message database.
Step 2: performing text preprocessing on the message texts in the database, the text preprocessing specifically including semantic ambiguity analysis, unregistered-word extraction, keyword extraction and stop-word processing. Please also refer to Fig. 3, which is a flow chart of the text preprocessing in Fig. 2. The campus hot topic discovery module uses the ICTCLAS word segmentation system, filters the roughly segmented words against a given stop-word list, deletes modal particles, auxiliary words and conjunctions, and finally outputs a Chinese word list.
The Chinese word segmentation in step 2 uses any one or a combination of statistical word segmentation, the N-shortest-path method and string-matching word segmentation.
Statistical word segmentation treats characters that frequently appear next to each other in Chinese text as candidate words: it counts how often each character appears adjacent to each other character in the text and uses these counts to estimate the probability that the combination is a word. Before counting, a threshold is set; if the frequency of a character combination exceeds the threshold, the two adjacent characters can be combined into a word.
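The threshold rule above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `merge_frequent_pairs`, the greedy left-to-right merge order, and the use of raw pair counts in place of a probability are all assumptions.

```python
from collections import Counter


def merge_frequent_pairs(text: str, threshold: int) -> list[str]:
    """Merge adjacent character pairs whose count in `text` exceeds `threshold`."""
    # Count every pair of neighbouring characters in the text.
    pair_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair_counts[pair] > threshold:
            tokens.append(pair)      # frequent pair: treat it as one word
            i += 2
        else:
            tokens.append(text[i])   # otherwise keep the single character
            i += 1
    return tokens
```

In practice the counts would come from a large corpus rather than from the text being segmented, which is one reason the description recommends combining this approach with other segmentation methods.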
At present, the main statistics-based segmentation models are the hidden Markov model, the maximum probability method and the noisy-channel model. Statistics-based segmentation must enumerate all possible combinations of neighboring characters, so segmentation takes relatively long. It should therefore be used in combination with other segmentation methods rather than on its own; used this way, statistical segmentation yields results that accurately reflect the semantics of the text.
The N-shortest-path method is based on the idea of path-based segmentation. Each word of the Chinese text that appears in the word dictionary is treated as an edge of a path graph, and each edge is given a weight as its length. The N-shortest-path method segments the text according to these edge lengths, and the result set is the set of the N shortest paths through the graph. When several paths have the same length, they are all inserted into the path set. After path segmentation, the word segmentation result of the Chinese text is obtained.
String-matching word segmentation, also called mechanical segmentation, is a relatively simple segmentation method. It is easy to implement but recognizes new words poorly. During matching, a character string that is found to be identical to an entry in the vocabulary is determined to be a word. The vocabulary can also be extended with domain-specific nouns and special names so that these are segmented as single words.
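One common form of string matching is forward maximum matching, sketched below. The description does not say which matching direction or strategy is used, so the choice of forward maximum matching, the function name and the `max_len` parameter are assumptions.

```python
def forward_max_match(text: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Scan left to right, always taking the longest vocabulary entry that matches."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

Words missing from the vocabulary fall apart into single characters here, which illustrates why the method handles new words poorly and why extending the vocabulary with domain terms helps.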
Step 3: performing feature extraction on the preprocessed text, the text feature extraction using independent evaluation methods, which include information gain, the χ² statistic and the document frequency algorithm.
Information gain computes a weight for each feature item with respect to the text categories. How much category information a feature word carries is judged by its information gain value, so that the feature words carrying more category information are selected. It is calculated by the following formula (1),
where P(c_i) is the probability that a text in the text collection belongs to category c_i, P(t) is the probability that a text in the collection contains feature word t, P(c_i | t) is the probability that a text containing feature word t belongs to the predefined category c_i, P(c_i | ¬t) is the probability that a text not containing feature word t belongs to category c_i, and n is the number of text categories.
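Formula (1) itself is not reproduced in this text. A form consistent with the symbols defined above is the standard information gain, IG(t) = -Σ P(c_i) log P(c_i) + P(t) Σ P(c_i | t) log P(c_i | t) + P(¬t) Σ P(c_i | ¬t) log P(c_i | ¬t), with each sum running over the n categories. The sketch below computes that quantity; treating it as the patent's formula (1), the base-2 logarithm and the function name are assumptions.

```python
import math


def information_gain(p_c, p_t, p_c_given_t, p_c_given_not_t):
    """Standard information gain of a feature word t (assumed form of formula (1)).

    p_c:             [P(c_1), ..., P(c_n)]
    p_t:             P(t)
    p_c_given_t:     [P(c_1 | t), ..., P(c_n | t)]
    p_c_given_not_t: [P(c_1 | not t), ..., P(c_n | not t)]
    """
    def plogp(p):
        # By convention 0 * log 0 is taken as 0.
        return p * math.log2(p) if p > 0 else 0.0

    class_entropy = -sum(plogp(p) for p in p_c)
    with_term = p_t * sum(plogp(p) for p in p_c_given_t)
    without_term = (1.0 - p_t) * sum(plogp(p) for p in p_c_given_not_t)
    return class_entropy + with_term + without_term
```

A word that perfectly separates two equally likely categories gets the maximum gain of 1 bit, while a word that is equally likely in both categories gets a gain of 0.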
The χ² statistic assesses the importance of a feature item by quantifying the amount of text information the feature item carries. A larger statistic indicates that the feature represents the topic of the text content more fully. It is computed by the following formula (2),
where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i.
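Formula (2) is likewise missing from this text. With A, B, C, D and N read as the usual 2×2 contingency counts, the standard statistic is χ²(t_i, C_j) = N (AD - CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below computes it; that this is the patent's exact formula, and the function name, are assumptions.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of feature t_i for cluster C_j (assumed form of formula (2)).

    a: texts in C_j containing t_i        b: texts outside C_j containing t_i
    c: texts in C_j without t_i           d: texts outside C_j without t_i
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the feature or the cluster is empty or universal
    return n * (a * d - c * b) ** 2 / denominator
```

A feature that appears in every text of the cluster and nowhere else gets the largest value, and a feature distributed identically inside and outside the cluster gets 0.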
The document frequency algorithm is one of the most basic feature evaluation methods. It assesses a feature by counting the number of documents that contain it. A feature item that appears in most documents, or in only a few texts, has a document frequency that is too high or too low, and is excluded.
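The two-sided filter described above can be sketched as follows. The parameter names `min_df` and `max_ratio` and the specific cut-off convention are assumptions; the description only says that values that are too high or too low are excluded.

```python
from collections import Counter


def filter_by_document_frequency(documents, min_df, max_ratio):
    """Keep terms found in at least `min_df` documents and at most `max_ratio` of all documents.

    documents: list of token lists, one per text.
    """
    # Count each term once per document, however often it occurs inside it.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    upper_limit = max_ratio * len(documents)
    return {term for term, count in doc_freq.items() if min_df <= count <= upper_limit}
```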
Step 4: representing the text with a specified knowledge representation model built from the extracted feature words. Please also refer to Fig. 4, which is a flow chart of the text representation model in Fig. 2. The hot topic discovery module represents message texts with the knowledge representation model. The steps are as follows: take the segmented words obtained from preprocessing as the feature selection samples; reduce the dimensionality of the text knowledge representation model using the relevant feature selection rules; compute the weight of each selected text feature to obtain a weighted feature vector; and store the weighted feature vector in the database for subsequent cluster analysis.
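The description does not say how the feature weights are computed. As one plausible illustration, the sketch below uses TF-IDF weighting restricted to the selected features; the choice of TF-IDF, the natural logarithm and the function name are assumptions, not part of the patent.

```python
import math
from collections import Counter


def tfidf_vectors(documents, features):
    """Build one sparse weighted feature vector (term -> weight) per document.

    documents: list of token lists; features: set of selected feature words.
    """
    n_docs = len(documents)
    # Document frequency of each selected feature.
    doc_freq = Counter(t for doc in documents for t in set(doc) if t in features)
    vectors = []
    for doc in documents:
        term_freq = Counter(t for t in doc if t in features)
        vectors.append({t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq})
    return vectors
```

The resulting dictionaries are the kind of weighted feature vectors that would be stored in the database for the clustering step.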
The campus hot topic discovery model takes the importance of the campus message topic into account. An ordinary vector space model, however, only models the feature items of the message text, although the topic is important for presenting campus messages. The campus message knowledge representation model therefore represents a campus message as PK = (C, id, F_1, wf_1, F_2, wf_2, ..., F_i, wf_i), where C is the section (column) to which the message belongs, id is the unique identifier that distinguishes messages, F_i is the value of field i, and wf_i is its corresponding weight, representing the value of the message text.
Because a computer cannot process text data directly, the text is first represented with a specified model so that the computer can process it with a clustering algorithm. The knowledge representation models include the probabilistic model, the Boolean model, the vector space model and the language model.
The probabilistic model is based on Bayesian theory. It has the advantage of ranking documents by probability of relevance, and the results can be adjusted to user needs to achieve higher accuracy. However, the model creates a heavy workload for text clustering. It also does not consider the meaning of the words in the text, which reduces the accuracy of the text representation.
The Boolean model is a simple way of representing text, based on Boolean algebra and set theory. A text is marked 1 or 0 according to whether a feature is present, and the similarity of two messages is determined by the proportion of text features present in both. The Boolean model also has shortcomings: its ability to represent documents is relatively poor and it discards most of the characteristics of the document itself, so it is often used as a baseline against which other similarity models are compared.
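The "proportion of features present in both" can be made concrete in several ways. The sketch below uses the Jaccard ratio (shared features divided by all features present in either text); that choice and the function name are assumptions.

```python
def boolean_similarity(features_a: set[str], features_b: set[str]) -> float:
    """Share of features present in both texts among the features present in either."""
    union = features_a | features_b
    if not union:
        return 0.0  # two texts with no features are treated as dissimilar
    return len(features_a & features_b) / len(union)
```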
The similarity in the vector space model can be calculated by the cos θ value between vectors:
A document is treated as a vector in n-dimensional space. For a given document D(t_1, w_1; t_2, w_2; ...; t_n, w_n), t_i is a feature item of the text and w_i is the weight of that feature item, expressing its importance to the text content. Taking the n feature items as n coordinate axes, w_i is the value on the corresponding axis, so the text is abstracted as a vector in the multidimensional coordinate system. The key steps in building a vector space model are determining the feature items of the text and establishing the importance of each feature item by calculating its weight.
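The cosine formula is not reproduced in this text. For two documents with weights w_1k and w_2k on feature k, the usual definition is cos θ = Σ w_1k w_2k / (sqrt(Σ w_1k²) · sqrt(Σ w_2k²)). A minimal sketch over sparse vectors follows; the dictionary representation and the function name are assumptions.

```python
import math


def cosine_similarity(doc_a: dict[str, float], doc_b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse weighted feature vectors."""
    # Only features present in both documents contribute to the dot product.
    dot = sum(weight * doc_b.get(term, 0.0) for term, weight in doc_a.items())
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)
```

Because the value depends only on the angle between the vectors, a long message and a short one about the same topic can still score close to 1.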
The language model is a model based on probability and statistics. Language models generally fall into two categories: grammar-rule models from linguistics and statistical language models. The statistical approach is the mainstream one: a corpus is processed and the probability distributions of its linguistic phenomena are counted, thereby capturing the linguistic knowledge contained in the corpus.
Step 5: having the computer process the text knowledge representation model with a clustering algorithm, so that texts with the same topic together form a topic library, the topic library being the hot topic library.
Hot topic discovery is the core of the text clustering algorithm: text clustering groups messages into topic clusters and derives new topics from them. The basic idea is to compare each message report with the topic clusters that already exist. If the similarity is higher than a given threshold, the message is inserted into that topic cluster. If the similarity is lower, the news report starts a new topic cluster.
The clustering algorithm is any one or a combination of a partition clustering algorithm, a hierarchical clustering algorithm and an incremental clustering algorithm.
The partition clustering algorithm assumes that each text can be assigned to exactly one set, and calculates the similarity between each text and each set in order to classify the text into the corresponding set. Intelligent campus hot topic discovery is implemented on the basis of the K-Means algorithm. The K-Means clustering algorithm selects k cluster centers in advance and then iterates to obtain the clusters.
The traditional K-Means algorithm selects the k initial cluster centers at random, which strongly affects the clustering result. To solve this problem, topic word frequencies are collected before clustering, and k texts that can separate the topics are selected as the initial cluster centers of the algorithm. The specific steps are as follows:
1) Select the title of each message article in the sample set to form the title set {T_1, T_2, ..., T_n};
2) Segment the n extracted titles into words and count the frequency of occurrence of each word in the titles;
3) Sort the topic words by frequency and select the k keywords with the highest frequencies to form the topic feature set {wt_1, wt_2, ..., wt_k};
4) Form the initial message samples from k groups of documents according to the keyword set, i.e. D_i = {w_i1, w_i2, ..., w_in}, where w_ij is the j-th text containing feature word wt_i and n is the number of texts containing wt_i;
5) For each text in D_i, compute its similarity to the remaining texts in D_i and sum the resulting similarity values. The message with the largest similarity sum is taken as the representative text for topic word wt_i, giving k representative texts in total;
6) Set a threshold and compute the similarity between the k representative texts. If the similarity of two representative texts exceeds the threshold, merge the two center points. If the similarity between all representative texts is below the threshold, go to step 9);
7) Take the (k+1)-th feature word obtained in step 2), then continue from step 4);
8) Finally obtain k representative texts;
9) Use the k representative texts as initial cluster centers and cluster with the K-Means algorithm.
The texts selected in this way are the k initial center points of the K-Means clustering algorithm, which improves the accuracy of clustering.
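Steps 1) to 5) of the center selection can be sketched as below. This is an illustration under several assumptions: titles and texts are already segmented into token lists, similarity is measured with a Jaccard ratio on token sets rather than the weighted vectors of step 4, the merging loop of steps 6) and 7) is omitted, and all function names are hypothetical.

```python
from collections import Counter


def jaccard(tokens_a, tokens_b) -> float:
    """Stand-in similarity: shared tokens divided by all tokens in either text."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pick_initial_centers(titles, texts, k):
    """Return the indices of k representative texts to seed K-Means.

    titles, texts: parallel lists of token lists (title words and body words).
    """
    # Steps 1-3: count title words and keep the k most frequent ones.
    word_freq = Counter(word for title in titles for word in title)
    topic_words = [word for word, _ in word_freq.most_common(k)]
    centers = []
    for word in topic_words:
        # Step 4: the group of messages whose title or body contains the topic word.
        group = [i for i in range(len(texts)) if word in titles[i] or word in texts[i]]
        # Step 5: the representative is the text with the largest similarity sum.
        best = max(group, key=lambda i: sum(jaccard(texts[i], texts[j]) for j in group if j != i))
        centers.append(best)
    return centers
```

The returned indices would then be turned into vectors and passed to a K-Means implementation as its starting centers in place of a random draw.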
The K-Means algorithm requires the number of result clusters to be fixed in advance, which is difficult in practice, and the algorithm cannot handle newly inserted text objects. It therefore needs to be combined, according to actual requirements, with one of the other two methods or with other methods.
The hierarchical clustering algorithm divides text categories into appropriate levels, and the levels change as the text types change. According to the direction of clustering, it falls into two major classes: top-down splitting and bottom-up merging.
The incremental clustering algorithm is the Single-pass algorithm. It takes the first text as the initial cluster center and compares the similarity of each subsequent text with the existing clusters. A text whose similarity is higher than the preset value is inserted into the cluster; when the similarity is low, a new cluster center is created automatically. The algorithm processes a large sequence of news reports, and the order in which they are input has some influence on the clustering.
The incremental clustering algorithm adapts to new text samples and solves the problem that the K-Means algorithm cannot handle new text objects. The message texts are input to the algorithm one after another in order of publication time, clusters are formed dynamically during clustering, and the initial number of clusters does not need to be fixed in advance. It is suited to processing online information and is commonly used for online topic detection.
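A minimal Single-pass sketch is shown below. It assumes that each cluster is represented by its first member (the description only speaks of a cluster center), that texts arrive already sorted by publication time, and that similarity is a Jaccard ratio on token sets; the function names are hypothetical.

```python
def jaccard(tokens_a, tokens_b) -> float:
    """Stand-in similarity: shared tokens divided by all tokens in either text."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def single_pass(texts, threshold, similarity=jaccard):
    """Cluster texts in arrival order; returns a list of clusters of text indices."""
    clusters = []  # the first index in each cluster acts as its center
    for index, text in enumerate(texts):
        best_cluster, best_sim = None, 0.0
        for cluster in clusters:
            sim = similarity(text, texts[cluster[0]])
            if sim > best_sim:
                best_cluster, best_sim = cluster, sim
        if best_cluster is not None and best_sim > threshold:
            best_cluster.append(index)   # similar enough: join the existing topic
        else:
            clusters.append([index])     # otherwise start a new topic cluster
    return clusters
```

Because each text is seen once and never reassigned, feeding the same texts in a different order can produce different clusters, which is the order sensitivity noted above.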
The topic discovery method based on an intelligent campus information service platform provided by the present invention verifies the correctness of the clustering results by comparing the hot topic discovery algorithm with text clustering algorithms, and the following experimental data are analyzed. The test indicators are introduced first.
Assume that the total number of message samples tested is n and that, for a topic i, a of the n samples are news reports related to topic i. The topic clustering algorithm assigns m messages to topic i, of which b actually belong to topic i. The probability that the algorithm misses a correct message is given by formula (4):
The messages that the algorithm wrongly clusters into topic i are false alarms, and the false-alarm probability of the algorithm is defined by formula (5).
The detection cost of the topic detection and tracking system is calculated by formula (6).
In formula (6), C_Miss is the cost incurred when the algorithm misses a news report that actually belongs to topic i, and C_Fa is the cost incurred when a news report that does not belong to topic i is grouped into i. The experiment aims to assign as many correct messages as possible to topic i; in pursuing this, the system also assigns many messages that do not belong to topic i. The experiment therefore assumes that the influence of C_Miss is relatively high and that of C_Fa relatively low, with C_Miss = 1.0 and C_Fa = 0.1. P_Target and P_Non-target are coefficients obtained from a large number of past clustering experiments, with P_Target = 0.02 and P_Non-target = 0.98. The smaller the (C_Det)_Norm value, the more accurate the algorithm.
Please refer to Fig. 5, which is a test chart of the (C_Det)_Norm values of the clustering algorithms of the present invention. The figure is a comparative histogram of the (C_Det)_Norm values of three algorithms: the Single-pass algorithm, the K-Means algorithm and the intelligent campus topic clustering algorithm. As the number of samples processed in the experiment increases, the C_Det values of the intelligent campus topic discovery algorithm, the K-Means algorithm and the Single-pass algorithm also increase. This shows that the clustering precision of the algorithms decreases as the number of input samples grows. With 100 test messages, the clustering correctness of the K-Means algorithm and the Single-pass algorithm does not differ noticeably. When the number of test messages increases to 800, the Single-pass algorithm is clearly more accurate than the K-Means algorithm, mainly because K-Means is affected by its initial cluster centers: when more texts are tested, it is difficult to pick k suitable centers at random. The intelligent campus hot topic algorithm solves this problem, so its (C_Det)_Norm value is little affected by the sample size.
Compared with the related art, the topic discovery method based on an intelligent campus information service platform provided by the present invention can obtain accurate public opinion patterns and important elements, so that public opinion is identified correctly. The intelligent topic clustering process runs faster and better, and the clustering precision remains high when the number of reported messages is large.
The foregoing are merely embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (7)
1. A topic discovery method based on an intelligent campus information service platform, characterized by comprising:
Step 1: establishing an intelligent campus information service platform and collecting campus-themed messages from the Internet to form a message database;
Step 2: performing text preprocessing on the message texts in the database, the text preprocessing being word segmentation processing that includes semantic ambiguity analysis, unregistered-word extraction, keyword extraction and stop-word processing;
Step 3: performing feature extraction on the preprocessed text, the text feature extraction using independent evaluation methods, which include information gain, the χ² statistic and the document frequency algorithm, wherein information gain computes a weight for each feature item with respect to the text categories, and the feature words carrying more category information are obtained by the following formula (1),
where P(c_i) is the probability that a text in the text collection belongs to category c_i, P(t) is the probability that a text in the collection contains feature word t, P(c_i | t) is the probability that a text containing feature word t belongs to the predefined category c_i, P(c_i | ¬t) is the probability that a text not containing feature word t belongs to category c_i, and n is the number of text categories;
the χ² statistic assesses the importance of a feature item by quantifying the amount of text information the feature item carries, and is computed by the following formula (2),
where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i;
the document frequency algorithm assesses a feature by counting the number of documents that contain it;
Step 4: representing the text with a specified knowledge representation model built from the extracted feature words;
Step 5: having the computer process the text knowledge representation model with a clustering algorithm, so that texts with the same topic together form a topic library, the topic library being the hot topic library.
2. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the Chinese word segmentation in step 2 uses any one or a combination of statistical word segmentation, the N-shortest-path method and string-matching word segmentation.
3. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the knowledge representation models in step 4 include the probabilistic model, the Boolean model, the vector space model and the language model.
4. The topic discovery method based on an intelligent campus information service platform according to claim 3, characterized in that the similarity in the vector space model can be calculated by the cos θ value between vectors:
a document is treated as a vector in n-dimensional space; for a given document D(t_1, w_1; t_2, w_2; ...; t_n, w_n), t_i is a feature item of the text and w_i is the weight of that feature item, expressing its importance to the text content; taking the n feature items as n coordinate axes, w_i is the value on the corresponding axis, so the text is abstracted as a vector in the multidimensional coordinate system; the key steps in building a vector space model are determining the feature items of the text and establishing the importance of each feature item by calculating its weight.
5. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the clustering algorithm in step 5 is any one or a combination of a partition clustering algorithm, a hierarchical clustering algorithm and an incremental clustering algorithm.
6. The topic discovery method based on an intelligent campus information service platform according to claim 5, characterized in that the partition clustering algorithm is the K-Means algorithm, which selects k cluster centers in advance and iterates to obtain the clusters.
7. The topic discovery method based on an intelligent campus information service platform according to claim 5, characterized in that the incremental clustering algorithm is the Single-pass algorithm, which takes the first text as the initial cluster center and compares the similarity of each subsequent text with the existing clusters; a text whose similarity is higher than the preset value is inserted into the cluster, and when the similarity is low, a new cluster center is created automatically.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710216639.7A CN106991171A (en) | 2017-03-25 | 2017-03-25 | Topic discovery method based on an intelligent campus information service platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106991171A true CN106991171A (en) | 2017-07-28 |
Family
ID=59415240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710216639.7A Pending CN106991171A (en) | Topic discovery method based on an intelligent campus information service platform | 2017-03-25 | 2017-03-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991171A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862070A (en) * | 2017-11-22 | 2018-03-30 | 华南理工大学 | Online class based on text cluster discusses the instant group technology of short text and system |
CN109102903A (en) * | 2018-07-09 | 2018-12-28 | 康美药业股份有限公司 | A kind of topic prediction technique and system for health consultation platform |
CN111339784A (en) * | 2020-03-06 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Automatic new topic mining method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140040370A1 (en) * | 2012-08-01 | 2014-02-06 | Tagged, Inc. | Content feed for facilitating topic discovery in social networking environments |
US20160110446A1 (en) * | 2013-12-02 | 2016-04-21 | Qbase, LLC | Method for disambiguated features in unstructured text |
Non-Patent Citations (1)
Title |
---|
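Quan Zengquan: "Research on Smart Campus Applications Based on the Mobile Internet", China Master's Theses Full-text Database, Information Science and Technology * |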
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108376151B (en) | Question classification method and device, computer equipment and storage medium | |
CN110825877A (en) | Semantic similarity analysis method based on text clustering | |
CN106294593B (en) | In conjunction with the Relation extraction method of subordinate clause grade remote supervisory and semi-supervised integrated study | |
CN103207913B (en) | The acquisition methods of commercial fine granularity semantic relation and system | |
CN104346379B (en) | A kind of data element recognition methods of logic-based and statistical technique | |
CN107577785A (en) | A kind of level multi-tag sorting technique suitable for law identification | |
WO2022126810A1 (en) | Text clustering method | |
CN109241530A (en) | A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks | |
CN106294344A (en) | Video retrieval method and device | |
CN109670039A (en) | Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering | |
CN107145514B (en) | Chinese sentence pattern classification method based on decision tree and SVM mixed model | |
CN110232395A (en) | A kind of fault diagnosis method of electric power system based on failure Chinese text | |
CN110222250B (en) | Microblog-oriented emergency trigger word identification method | |
CN111597328B (en) | New event theme extraction method | |
CN111581967B (en) | News theme event detection method combining LW2V with triple network | |
CN112732916A (en) | BERT-based multi-feature fusion fuzzy text classification model | |
CN108959305A (en) | A kind of event extraction method and system based on internet big data | |
CN110472203B (en) | Article duplicate checking and detecting method, device, equipment and storage medium | |
CN110413791A (en) | File classification method based on CNN-SVM-KNN built-up pattern | |
CN113254643B (en) | Text classification method and device, electronic equipment and text classification program | |
CN109960727A (en) | For the individual privacy information automatic testing method and system of non-structured text | |
CN105609116A (en) | Speech emotional dimensions region automatic recognition method | |
CN102880631A (en) | Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method | |
CN106528527A (en) | Identification method and identification system for out of vocabularies | |
CN111144106A (en) | Two-stage text feature selection method under unbalanced data set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170728 |