CN106991171A - Topic discovery method based on an intelligent campus information service platform - Google Patents

Publication number
CN106991171A
Authority
CN
China
Prior art keywords
text
service platform
information service
cluster
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710216639.7A
Other languages
Chinese (zh)
Inventor
王凤领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hezhou University
Original Assignee
Hezhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hezhou University
Priority to CN201710216639.7A
Publication of CN106991171A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The present invention provides a topic discovery method based on an intelligent campus information service platform. The method concerns techniques for discovering hot topics on campus. It analyzes commonly used text clustering algorithms and text representation models, performs word segmentation and keyword extraction on message texts with a word segmentation system, and proposes a knowledge representation model for message texts. Building on the vector space model, it uses message word-frequency statistics to determine the initial cluster centers and to refine the Hooke-Jeeves algorithm. Compared with the related art, the method can obtain accurate public-opinion patterns and key elements and thus characterize public opinion correctly. The intelligent campus topic clustering process runs faster and more effectively, and clustering precision remains high even when the number of reported messages is large.

Description

Topic discovery method based on an intelligent campus information service platform
Technical field
The present invention relates to the field of hot-topic discovery, and more particularly to a topic discovery method based on an intelligent campus information service platform.
Background
The continuing development of computer networks has increasingly enriched campus life. Online information has become an important part of campus life, and the Internet has become an important place for students to obtain information and to communicate.
How to handle the mass of data on the network effectively, extract the hot topics in it, and obtain the information one actually wants has long been a problem for network users. Hot-topic discovery finds, from a variety of information sources, the topics in each field that have attracted wide attention within a given period, which helps students obtain important current information and keep up with it quickly.
It is therefore necessary to provide a topic discovery method on an intelligent campus information service platform to achieve the above.
Summary of the invention
The object of the present invention is to provide a topic discovery method based on an intelligent campus information service platform, so as to meet users' need to discover sudden hot topics in network forums in real time.
The present invention provides a topic discovery method based on an intelligent campus information service platform, comprising:
Step 1: set up an intelligent campus information service platform and collect campus-related messages from the Internet to form a message database;
Step 2: perform text preprocessing on the message texts in the database, where the preprocessing is word segmentation and includes semantic disambiguation analysis, unregistered (out-of-vocabulary) word extraction, keyword extraction, and stop-word processing;
Step 3: perform feature extraction on the preprocessed text using independent evaluation methods, which include information gain, the χ² statistic, and the document frequency algorithm. Information gain weights each feature item by how much class information it carries for the text clusters, and formula (1) is used to select the feature words that carry the most class information:

$$IG(t) = -\sum_{i=1}^{n} P(c_i)\log P(c_i) + P(t)\sum_{i=1}^{n} P(c_i \mid t)\log P(c_i \mid t) + P(\bar{t})\sum_{i=1}^{n} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t}) \quad (1)$$

where P(c_i) is the probability that a text in the collection belongs to class c_i, P(t) is the probability that a text contains feature word t, P(c_i | t) is the probability that a text containing t belongs to the predefined class c_i, P(c_i | t̄) is the probability that a text not containing t belongs to class c_i, and n is the number of text classes;
The χ² statistic assesses the importance of a feature item by quantifying the amount of text information it carries, and is computed by formula (2):

$$\chi^2(t_i, C_j) = \frac{N \times (A \times D - C \times B)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)} \quad (2)$$

where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i;
The document frequency algorithm assesses a feature by counting the number of documents that contain it;
Step 4: represent the extracted feature words with a knowledge representation model;
Step 5: the computer processes the text knowledge representation model with a clustering algorithm, and texts on the same topic are grouped together into a topic library, which is the hot-topic library.
Compared with the related art, the topic discovery method based on an intelligent campus information service platform provided by the present invention can obtain accurate public-opinion patterns and key elements and thus characterize public opinion correctly. The intelligent campus topic clustering process runs faster and more effectively, and clustering precision remains high even when the number of reported messages is large.
Brief description of the drawings
Fig. 1 is a structural diagram of the campus hot-topic discovery module of the present invention;
Fig. 2 is a flow chart of topic discovery according to the present invention;
Fig. 3 is a flow chart of the text preprocessing in Fig. 2;
Fig. 4 is a flow chart of the text representation model in Fig. 2;
Fig. 5 is a test chart of the (C_Det)_Norm values of the clustering algorithms of the present invention.
Detailed description
Please refer to Fig. 1 and Fig. 2 together, where Fig. 1 is a structural diagram of the campus hot-topic discovery module of the present invention and Fig. 2 is a flow chart of topic discovery according to the present invention. The topic discovery method based on an intelligent campus information service platform provided by the present invention comprises:
Step 1: set up an intelligent campus information service platform and collect campus-related messages from the Internet to form a message database.
Step 2: perform text preprocessing on the message texts in the database. The preprocessing specifically includes semantic disambiguation analysis, unregistered (out-of-vocabulary) word extraction, keyword extraction, and stop-word processing. Please refer to Fig. 3, the flow chart of the text preprocessing in Fig. 2. The campus hot-topic discovery module uses the ICTCLAS word segmentation system, filters the rough segmentation output against a given stop-word list, deletes modal particles, auxiliary words, and conjunctions, and finally outputs a Chinese word list.
The Chinese word segmentation in step 2 uses any one or a combination of the statistical method, the N-shortest-path method, and the string-matching method.
The statistical method treats characters that frequently appear next to each other in Chinese text as likely words: it counts how often each character occurs adjacent to other characters in the text to estimate the probability that they form a word. A threshold is set before counting, and if the frequency of a character combination exceeds the threshold, the two adjacent characters are combined into one word.
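The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the segmentation system used by the platform: the function name `merge_frequent_pairs`, the toy corpus, and the threshold value are all assumptions, and only two-character combinations are considered.

```python
from collections import Counter


def merge_frequent_pairs(texts, threshold):
    """Return adjacent character pairs whose count exceeds `threshold`."""
    pair_counts = Counter()
    for text in texts:
        # Count every pair of neighbouring characters in the text.
        for left, right in zip(text, text[1:]):
            pair_counts[left + right] += 1
    # A pair is treated as a word only when it is seen often enough.
    return {pair for pair, count in pair_counts.items() if count > threshold}


# Hypothetical toy corpus: each two-character word appears in two messages.
corpus = ["校园网络", "校园生活", "网络生活"]
print(merge_frequent_pairs(corpus, threshold=1))
```

With this corpus the pairs 校园, 网络, and 生活 each occur twice and are kept, while accidental neighbours such as 园网 occur once and are dropped.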
At present, the main statistical segmentation models include the hidden Markov model, the maximum-probability method, and the noisy-channel model. Statistical segmentation must enumerate every possible combination of neighbouring characters, so segmentation time is relatively long; it is therefore combined with other segmentation methods rather than used alone. Statistical segmentation can produce results that accurately reflect the semantics of the text.
The N-shortest-path method is based on the idea of path segmentation. Each word of the Chinese text that appears in the lexicon is treated as an edge of a path graph, and each edge is assigned a weight as its length. The method segments the text along the paths with the smallest total edge length, and the result set is the set of shortest paths in the graph. When paths of equal length are encountered during segmentation, they are inserted into the path set together. After path segmentation, the word segmentation result of the Chinese text is obtained.
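As a rough illustration of the path idea, the sketch below handles only the simplest case, N = 1, with every edge given a length of 1, so the shortest path is the segmentation with the fewest words. The function name `shortest_path_segment`, the unit edge weights, and the toy vocabulary are assumptions; a full N-shortest-path segmenter would keep the N best paths at each node.

```python
def shortest_path_segment(text, vocabulary):
    """Segment `text` along the path with the fewest edges (N = 1 case)."""
    n = len(text)
    # best[i] = (number of edges, tokens) for the cheapest path from node 0 to node i.
    best = [None] * (n + 1)
    best[0] = (0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            # Single characters are always allowed so every text has a path.
            if len(word) == 1 or word in vocabulary:
                cost = best[start][0] + 1
                if best[end] is None or cost < best[end][0]:
                    best[end] = (cost, best[start][1] + [word])
    return best[n][1]


# Hypothetical lexicon for illustration only.
vocab = {"校园", "热点", "话题", "热点话题"}
print(shortest_path_segment("校园热点话题", vocab))  # ['校园', '热点话题']
```

Here the two-edge path 校园 / 热点话题 beats the three-edge path 校园 / 热点 / 话题, which is why the longer lexicon entry wins.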
String-matching segmentation, also called mechanical segmentation, is a relatively simple method. It is easy to implement but recognizes new words poorly. A character string is compared with the vocabulary, and if it matches a word in the vocabulary it is taken as a word. The vocabulary can also be extended with domain nouns and proper names so that they are segmented as single words.
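One common form of string-matching segmentation is forward maximum matching, sketched below. The text does not say which matching direction or window size is used, so the forward scan, the `max_len` parameter, and the toy vocabulary are assumptions made for illustration.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy left-to-right segmentation against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical vocabulary; "发现" is deliberately missing.
vocab = {"校园", "热点", "话题", "热点话题"}
print(forward_max_match("校园热点话题发现", vocab))
# ['校园', '热点话题', '发', '现']
```

The word 发现 is not in the vocabulary, so it falls apart into single characters, which illustrates the weakness with new words noted above.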
Step 3: perform feature extraction on the preprocessed text using independent evaluation methods, which include information gain, the χ² statistic, and the document frequency algorithm.
Information gain weights each feature item by how much class information it carries for the text clusters. How much class information a feature word carries is judged by the size of its information gain value, so that the feature words carrying the most class information are selected. It is computed by formula (1):

$$IG(t) = -\sum_{i=1}^{n} P(c_i)\log P(c_i) + P(t)\sum_{i=1}^{n} P(c_i \mid t)\log P(c_i \mid t) + P(\bar{t})\sum_{i=1}^{n} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t}) \quad (1)$$

where P(c_i) is the probability that a text in the collection belongs to class c_i, P(t) is the probability that a text contains feature word t, P(c_i | t) is the probability that a text containing t belongs to the predefined class c_i, P(c_i | t̄) is the probability that a text not containing t belongs to class c_i, and n is the number of text classes;
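Formula (1) can be evaluated directly from document counts, as in the sketch below. The input layout (a list of `(set_of_terms, class_label)` pairs), the function names, and the base-2 logarithm are assumptions; the text does not fix the logarithm base, and a different base only rescales the values.

```python
import math
from collections import Counter


def _plogp(probabilities):
    # Sum of p * log2(p); zero probabilities contribute nothing.
    return sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gain(term, docs):
    """docs: list of (set_of_terms, class_label) pairs."""
    classes = {label for _, label in docs}
    all_labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]

    def class_probs(labels):
        # Class distribution within a subset of documents (empty subset -> no terms).
        counts = Counter(labels)
        return [counts[c] / len(labels) for c in classes] if labels else []

    p_t = len(with_term) / len(docs)
    return (-_plogp(class_probs(all_labels))
            + p_t * _plogp(class_probs(with_term))
            + (1 - p_t) * _plogp(class_probs(without_term)))


# Hypothetical labelled messages.
docs = [({"exam", "campus"}, "study"), ({"exam", "campus"}, "study"),
        ({"match", "campus"}, "sport"), ({"match", "campus"}, "sport")]
print(information_gain("exam", docs))    # 1.0: "exam" fully separates the classes
print(information_gain("campus", docs))  # 0.0: "campus" appears everywhere
```

A word that occurs in every message carries no class information, which is why its gain is zero.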
The χ² statistic assesses the importance of a feature item by quantifying the amount of text information it carries. A large value indicates that the feature item represents the topic of the text content well. It is computed by formula (2):

$$\chi^2(t_i, C_j) = \frac{N \times (A \times D - C \times B)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)} \quad (2)$$

where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i;
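Formula (2) reduces to arithmetic on the four counts of a 2×2 contingency table. The sketch below assumes the usual reading of those counts (A: in the cluster with the term, B: outside the cluster with the term, C: in the cluster without the term, D: outside the cluster without the term) and takes N as their sum; the function name and the zero-denominator guard are additions for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term/cluster pair from its 2x2 counts."""
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # The term or the cluster covers all or none of the texts.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


print(chi_square(10, 0, 0, 10))  # 20.0: the term occurs only inside the cluster
print(chi_square(5, 5, 5, 5))    # 0.0: the term is independent of the cluster
```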
The document frequency algorithm is one of the most basic feature evaluation methods. It evaluates a feature item by counting the number of documents that contain it. A feature item that is contained in the great majority of documents, or in only a few texts, has a value that is too high or too low and is excluded.
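A minimal sketch of this filter follows. The lower and upper bounds `min_df` and `max_df` are hypothetical parameters, since the text does not state the cut-off values.

```python
from collections import Counter


def select_by_document_frequency(docs, min_df, max_df):
    """Keep terms whose document frequency lies in [min_df, max_df]."""
    df = Counter()
    for terms in docs:
        # Each document counts a term at most once.
        df.update(set(terms))
    return {term for term, count in df.items() if min_df <= count <= max_df}


# Hypothetical tokenized messages.
docs = [["campus", "exam", "the"], ["campus", "the"],
        ["the", "match"], ["the", "exam"]]
print(select_by_document_frequency(docs, min_df=2, max_df=3))
```

In this example "the" appears in all four documents and "match" in only one, so both are dropped and "campus" and "exam" remain.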
Step 4: represent the extracted feature words with a knowledge representation model. Please refer to Fig. 4, the flow chart of the text representation model in Fig. 2. The hot-topic discovery module represents message texts with a knowledge representation model. The steps are as follows: take the segmented words from preprocessing as the samples for feature selection; reduce the dimensionality of the text knowledge representation model with the relevant feature selection rules; compute the weighted feature vector from the weights of the selected text features; store the weighted feature vector in the database for subsequent cluster analysis.
The campus hot-topic discovery model takes the importance of campus message topics into account. The ordinary vector space model, however, models only the feature items of the message report text, which matters greatly for how campus message topics are presented. The campus message knowledge representation model PK = (C, id, F_1, wf_1, F_2, wf_2, ..., F_i, wf_i) can be used to represent a campus message topic, where C is the column to which the message belongs, id is the unique identifier that distinguishes messages, F_i is the value of field i, and wf_i is its corresponding weight, representing the value of the message text.
Because text data cannot be processed directly by a computer, the text is first represented in a specified model so that the computer can process it with a clustering algorithm. Knowledge representation models include the probabilistic model, the Boolean model, the vector space model, and the language model.
The probabilistic model is based on Bayesian theory. It has the advantage of ranking documents by probability of relevance, and it can adjust the results to user needs to achieve higher accuracy. However, the model imposes a huge workload on text clustering, and it does not consider the meaning of the words in the text, which reduces the accuracy of the text representation.
The Boolean model is a simple text representation based on Boolean algebra and set theory. A text is marked 1 or 0 according to whether a feature is present, and the similarity of two messages is determined from the proportion of text features that appear in both. The Boolean model also has shortcomings: its ability to represent documents is relatively poor, since it discards most of the characteristics of the document itself, so it is often used as a sub-model for comparison with other similarity models.
In the vector space model, similarity can be computed from the cos θ value between vectors:

$$Sim(D_1, D_2) = \cos\theta = \frac{\sum_{k=1}^{n} w_{1k} \times w_{2k}}{\sqrt{\left(\sum_{k=1}^{n} w_{1k}^2\right)\left(\sum_{k=1}^{n} w_{2k}^2\right)}} \quad (3)$$

A document is treated as a vector in n-dimensional space. For a given document D(t_1, w_1; t_2, w_2; ...; t_n, w_n), t_i is a feature item of the text and w_i is the weight of that feature item, expressing its importance to the text content. Taking the i-th feature item as the i-th coordinate axis, w_i is the value on the corresponding axis, so the text is abstracted as a vector in multidimensional coordinates. The key steps in building a vector space model are determining the feature items of the text and establishing the importance of each by computing its weight.
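The cosine similarity between two weighted documents can be sketched as follows. Representing each document as a dictionary from feature term to weight is an assumption made for brevity; terms missing from a document are treated as having weight 0.

```python
import math


def cosine_similarity(doc1, doc2):
    """doc1, doc2: dicts mapping feature term -> weight."""
    # Only terms present in both documents contribute to the dot product.
    shared = set(doc1) & set(doc2)
    dot = sum(doc1[t] * doc2[t] for t in shared)
    norm1 = math.sqrt(sum(w * w for w in doc1.values()))
    norm2 = math.sqrt(sum(w * w for w in doc2.values()))
    if norm1 == 0 or norm2 == 0:
        # An empty document has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical weighted feature vectors.
d1 = {"exam": 3.0, "schedule": 4.0}
d2 = {"exam": 3.0}
print(cosine_similarity(d1, d2))  # approximately 0.6
```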
The language model is based on probability and statistics. Language models generally fall into two classes: those based on linguistic rule grammars and those based on statistics. The statistical approach is the mainstream: a corpus is processed and the probability distributions of its linguistic phenomena are counted, which yields the linguistic knowledge contained in the corpus.
Step 5: the computer processes the text knowledge representation model with a clustering algorithm, and texts on the same topic are grouped together into a topic library, which is the hot-topic library.
Hot-topic discovery is the core application of text clustering: text clustering forms topic clusters and derives new topics from them. The basic idea is to compare each message report with the existing topic clusters. If the similarity is higher than a given threshold, the message is inserted into that topic cluster; if it is lower, the report starts a new topic cluster.
The clustering algorithm is any one or a combination of a partition clustering algorithm, a hierarchical clustering algorithm, and an incremental clustering algorithm.
Partition clustering assumes that each text can be assigned to exactly one set, and computes the similarity between each text and each set in order to classify the text into the corresponding set. Intelligent campus hot-topic discovery is implemented on the basis of the K-Means algorithm, which preselects k cluster centers and performs iterative operations to achieve clustering.
The traditional K-Means algorithm selects the k initial cluster centers at random, which strongly affects the clustering result. To solve this problem, topic-word frequencies are collected before clustering, and k texts that can separate the topics are selected as the initial cluster centers. The specific steps are as follows:
1) Select the title of each message article from the sample set to form the title set {T_1, T_2, ..., T_n};
2) Segment the n extracted titles into words and count the frequency of each word in the titles;
3) Sort the topic-word frequencies and select the k words with the highest frequency as keywords to form the topic feature set {w_t1, w_t2, ..., w_tk};
4) Form the initial message samples as k groups of documents according to the keyword set, i.e. D_i = {w_i1, w_i2, ..., w_in}, where w_ij is the j-th text containing feature word w_ti and n is the number of texts containing w_ti;
5) Compare each text in D_i with the remaining texts in D_i to obtain n similarity values, and sum them. The message with the largest similarity sum is taken as the representative text for topic word w_ti, giving k representative texts in all;
6) Set a threshold and compute the similarity between the k representative texts. If the similarity exceeds the threshold, the two center points are merged. If the similarity between all representative texts is below the threshold, go to step 9);
7) Take the (k+1)-th feature word obtained in step 2), then continue with step 4);
8) Finally obtain k representative texts;
9) Use the k representative texts as the initial cluster centers and cluster with the K-Means algorithm.
The texts selected in this way serve as the k initial center points of the K-Means clustering algorithm, which improves the accuracy of clustering.
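Steps 1) to 5) can be sketched as below. This is a simplified illustration under several assumptions: titles are already segmented into token sets, Jaccard overlap stands in for the similarity measure (the text does not name one for this step), and the merge-and-refill loop of steps 6) to 8) is omitted. The names `jaccard` and `initial_centers` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Set overlap used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_centers(docs, k):
    """docs: list of token sets. Returns indices of k representative texts."""
    # Steps 2-3: count word frequencies and keep the k most frequent words.
    freq = Counter(word for doc in docs for word in doc)
    keywords = [word for word, _ in freq.most_common(k)]
    centers = []
    for keyword in keywords:
        # Step 4: group the texts that contain the keyword.
        group = [i for i, doc in enumerate(docs) if keyword in doc]
        # Step 5: pick the text with the largest summed similarity to its group.
        best = max(group, key=lambda i: sum(
            jaccard(docs[i], docs[j]) for j in group if j != i))
        centers.append(best)
    return centers


# Hypothetical segmented titles.
docs = [{"exam", "schedule"}, {"exam", "schedule", "room"}, {"exam", "room"},
        {"exam", "final"}, {"football", "match"},
        {"football", "match", "ticket"}, {"football", "ticket"}]
print(initial_centers(docs, k=2))  # [1, 5]
```

The returned indices identify the texts whose feature vectors would be passed to K-Means as starting centers in place of random choices.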
The K-Means algorithm requires the number of result clusters to be fixed in advance, but in practice that number is hard to determine, and the algorithm cannot handle newly inserted text objects. It therefore needs to be combined, according to actual needs, with one of the other two methods or with other methods.
The hierarchical clustering algorithm divides text categories into appropriate levels, which change as the text types change. By the direction of clustering, it falls into two classes: divisive, splitting from the top level down, and agglomerative, merging from the bottom level up.
The incremental clustering algorithm is the Single-pass algorithm. It takes the first text as the initial cluster center and compares the similarity of each other text against it; a text whose similarity is higher than the preset value is inserted into the cluster, and when the similarity is low a new cluster center is created automatically. The algorithm processes a long sequence of news reports, and different input orders affect the clustering result to some extent.
The incremental clustering algorithm adapts to new text samples and so solves the problem that K-Means cannot handle new text objects. Message texts are fed to the algorithm one by one in order of publication time, clusters are formed dynamically during clustering, and the number of initial clusters need not be fixed in advance. It is suited to processing online information and is commonly used for online topic detection.
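A minimal Single-pass sketch follows. Two simplifications are assumed: each cluster is represented by its first member rather than by an updated centroid, and Jaccard overlap on token sets is used as the similarity measure. The function names and the threshold value are hypothetical.

```python
def jaccard(a, b):
    """Set overlap used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass(docs, threshold, similarity=jaccard):
    """Cluster docs in arrival order; returns lists of document indices."""
    clusters = []  # the first index in each list acts as that cluster's center
    for index, doc in enumerate(docs):
        best_cluster, best_score = None, 0.0
        for cluster in clusters:
            score = similarity(doc, docs[cluster[0]])
            if score > best_score:
                best_cluster, best_score = cluster, score
        if best_cluster is not None and best_score > threshold:
            # Similar enough: join the closest existing cluster.
            best_cluster.append(index)
        else:
            # Otherwise this text becomes the center of a new cluster.
            clusters.append([index])
    return clusters


# Hypothetical messages in publication order.
docs = [{"exam", "schedule"}, {"exam", "schedule", "room"},
        {"football", "match"}, {"football", "ticket", "match"}]
print(single_pass(docs, threshold=0.5))  # [[0, 1], [2, 3]]
```

Because each text is compared only with clusters that already exist, feeding the same messages in a different order can produce different clusters, which is the order sensitivity noted above.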
The topic discovery method based on an intelligent campus information service platform provided by the present invention verifies the correctness of its clustering results by comparing the hot-topic discovery algorithm with text clustering algorithms, and the following experimental data are analyzed. The test indices are introduced first.
Assume the total number of message samples tested is n. For a topic i, a of the n samples are news reports relevant to topic i, the topic clustering algorithm assigns m messages to topic i, and b of those m actually belong to topic i. The probability that the algorithm misses correct messages is given by formula (4):
The algorithm then wrongly clusters 100 message reports into topic i; the false-alarm probability of the algorithm is defined by formula (5).
The detection cost of the tracking and topic detection system is computed by formula (6).
In formula (6), C_Miss is the cost incurred when the algorithm misses a news report that belongs to topic i, and C_Fa is the cost incurred when a report that does not belong to topic i is assigned to it. The experiment aims to assign as many correct messages as possible to topic i; in pursuing this, the system will also assign many messages that do not belong to topic i. The experiment therefore assumes that C_Miss has a relatively high impact and C_Fa a relatively low one, setting C_Miss = 1.0 and C_Fa = 0.1. P_Target and P_Non-target are coefficients obtained from a large number of past clustering experiments, with P_Target = 0.02 and P_Non-target = 0.98. The smaller the (C_Det)_Norm value, the better the accuracy of the algorithm.
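Formulas (4) to (6) are referenced by number only in this text. The sketch below therefore assumes the standard topic detection and tracking (TDT) definitions, which use the same constants as quoted above: miss probability (a − b)/a, false-alarm probability (m − b)/(n − a), and a detection cost normalized by min(C_Miss·P_Target, C_Fa·P_Non-target). These definitions and the function name `detection_cost` are assumptions, not formulas confirmed by the source.

```python
def detection_cost(n, a, m, b, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Return (P_miss, P_fa, normalized detection cost) under TDT conventions."""
    p_miss = (a - b) / a       # relevant reports that were not found
    p_fa = (m - b) / (n - a)   # irrelevant reports wrongly assigned to the topic
    p_non_target = 1.0 - p_target
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * p_non_target
    # Normalize by the cost of the better trivial system (accept all or reject all).
    norm = min(c_miss * p_target, c_fa * p_non_target)
    return p_miss, p_fa, c_det / norm


# Hypothetical counts: 1000 samples, 100 relevant, 110 detected, 90 of them correct.
p_miss, p_fa, cost = detection_cost(n=1000, a=100, m=110, b=90)
print(round(p_miss, 3), round(p_fa, 4), round(cost, 4))
```

A perfect detector (m = b = a) gives a normalized cost of 0, and lower values indicate better clustering, in line with the statement above.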
Please refer to Fig. 5, the (C_Det)_Norm test chart of the clustering algorithms of the present invention. The figure is a histogram comparing the (C_Det)_Norm values of three algorithms: the Single-pass algorithm, the K-Means algorithm, and the intelligent campus topic clustering algorithm. As the number of samples processed in the experiment increases, the C_Det values of the intelligent campus topic discovery algorithm, the K-Means algorithm, and the Single-pass algorithm also increase, which shows that clustering precision decreases as the number of input samples grows. With 100 test messages, the difference in clustering correctness between K-Means and Single-pass is not obvious; when the number of test messages increases to 800, Single-pass is clearly more accurate than K-Means. This is mainly because K-Means is affected by its initial cluster centers: when the set of test texts is larger, it is difficult to select k suitable centers at random. The intelligent campus hot-topic algorithm solves this problem, so its (C_Det)_Norm value is little affected by sample size.
Compared with the related art, the topic discovery method based on an intelligent campus information service platform provided by the present invention can obtain accurate public-opinion patterns and key elements and thus characterize public opinion correctly. The intelligent campus topic clustering process runs faster and more effectively, and clustering precision remains high even when the number of reported messages is large.
The above is only an embodiment of the present invention and does not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (7)

1. A topic discovery method based on an intelligent campus information service platform, characterized by comprising:
Step 1: set up an intelligent campus information service platform and collect campus-related messages from the Internet to form a message database;
Step 2: perform text preprocessing on the message texts in the database, where the preprocessing is word segmentation and includes semantic disambiguation analysis, unregistered (out-of-vocabulary) word extraction, keyword extraction, and stop-word processing;
Step 3: perform feature extraction on the preprocessed text using independent evaluation methods, which include information gain, the χ² statistic, and the document frequency algorithm, where information gain weights each feature item by how much class information it carries for the text clusters, and formula (1) is used to select the feature words that carry the most class information,
$$IG(t) = -\sum_{i=1}^{n} P(c_i)\log P(c_i) + P(t)\sum_{i=1}^{n} P(c_i \mid t)\log P(c_i \mid t) + P(\bar{t})\sum_{i=1}^{n} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t}) \quad (1)$$
where P(c_i) is the probability that a text in the collection belongs to class c_i, P(t) is the probability that a text contains feature word t, P(c_i | t) is the probability that a text containing t belongs to the predefined class c_i, P(c_i | t̄) is the probability that a text not containing t belongs to class c_i, and n is the number of text classes;
The χ² statistic assesses the importance of a feature item by quantifying the amount of text information it carries, and is computed by formula (2),
$$\chi^2(t_i, C_j) = \frac{N \times (A \times D - C \times B)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)} \quad (2)$$
where N is the number of texts extracted, C_j is a cluster, A is the number of texts in C_j that contain feature item t_i, B is the number of texts outside C_j that contain t_i, C is the number of texts in C_j that do not contain t_i, and D is the number of texts outside C_j that do not contain t_i;
The document frequency algorithm assesses a feature by counting the number of documents that contain it;
Step 4: represent the extracted feature words with a knowledge representation model;
Step 5: the computer processes the text knowledge representation model with a clustering algorithm, and texts on the same topic are grouped together into a topic library, which is the hot-topic library.
2. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the Chinese word segmentation in step 2 uses any one or a combination of the statistical method, the N-shortest-path method, and the string-matching method.
3. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the knowledge representation model in step 4 includes the probabilistic model, the Boolean model, the vector space model, and the language model.
4. The topic discovery method based on an intelligent campus information service platform according to claim 3, characterized in that the similarity in the vector space model can be computed from the cos θ value between vectors:
$$Sim(D_1, D_2) = \cos\theta = \frac{\sum_{k=1}^{n} w_{1k} \times w_{2k}}{\sqrt{\left(\sum_{k=1}^{n} w_{1k}^2\right)\left(\sum_{k=1}^{n} w_{2k}^2\right)}} \quad (3)$$
A document is treated as a vector in n-dimensional space. For a given document D(t_1, w_1; t_2, w_2; ...; t_n, w_n), t_i is a feature item of the text and w_i is the weight of that feature item, expressing its importance to the text content. Taking the i-th feature item as the i-th coordinate axis, w_i is the value on the corresponding axis, so the text is abstracted as a vector in multidimensional coordinates. The key steps in building a vector space model are determining the feature items of the text and establishing the importance of each by computing its weight.
5. The topic discovery method based on an intelligent campus information service platform according to claim 1, characterized in that the clustering algorithm in step 5 is any one or a combination of a partition clustering algorithm, a hierarchical clustering algorithm, and an incremental clustering algorithm.
6. The topic discovery method based on an intelligent campus information service platform according to claim 5, characterized in that the partition clustering algorithm is the K-Means algorithm, which preselects k cluster centers and performs iterative operations to achieve clustering.
7. The topic discovery method based on an intelligent campus information service platform according to claim 5, characterized in that the incremental clustering algorithm is the Single-pass algorithm, which takes the first text as the initial cluster center and compares the similarity of each other text against it; a text whose similarity is higher than the preset value is inserted into the cluster, and when the similarity is low a new cluster center is created automatically.
CN201710216639.7A 2017-03-25 2017-03-25 Topic discovery method based on an intelligent campus information service platform Pending CN106991171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710216639.7A CN106991171A (en) 2017-03-25 2017-03-25 Topic discovery method based on an intelligent campus information service platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710216639.7A CN106991171A (en) 2017-03-25 2017-03-25 Topic based on Intelligent campus information service platform finds method

Publications (1)

Publication Number Publication Date
CN106991171A true CN106991171A (en) 2017-07-28

Family

ID=59415240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710216639.7A Pending CN106991171A (en) 2017-03-25 2017-03-25 Topic based on Intelligent campus information service platform finds method

Country Status (1)

Country Link
CN (1) CN106991171A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862070A (en) * 2017-11-22 2018-03-30 华南理工大学 Online class based on text cluster discusses the instant group technology of short text and system
CN109102903A (en) * 2018-07-09 2018-12-28 康美药业股份有限公司 A kind of topic prediction technique and system for health consultation platform
CN111339784A (en) * 2020-03-06 2020-06-26 支付宝(杭州)信息技术有限公司 Automatic new topic mining method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140040370A1 (en) * 2012-08-01 2014-02-06 Tagged, Inc. Content feed for facilitating topic discovery in social networking environments
US20160110446A1 (en) * 2013-12-02 2016-04-21 Qbase, LLC Method for disambiguated features in unstructured text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
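全增权 (Quan Zengquan): "Research on smart campus applications based on the mobile Internet" (基于移动互联网的智慧校园应用研究), China Master's Theses Full-text Database, Information Science and Technology series *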

Similar Documents

Publication Publication Date Title
CN108376151B (en) Question classification method and device, computer equipment and storage medium
CN110825877A (en) Semantic similarity analysis method based on text clustering
CN106294593B (en) In conjunction with the Relation extraction method of subordinate clause grade remote supervisory and semi-supervised integrated study
CN103207913B (en) The acquisition methods of commercial fine granularity semantic relation and system
CN104346379B (en) A kind of data element recognition methods of logic-based and statistical technique
CN107577785A (en) A kind of level multi-tag sorting technique suitable for law identification
WO2022126810A1 (en) Text clustering method
CN109241530A (en) A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks
CN106294344A (en) Video retrieval method and device
CN109670039A (en) Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering
CN107145514B (en) Chinese sentence pattern classification method based on decision tree and SVM mixed model
CN110232395A (en) A kind of fault diagnosis method of electric power system based on failure Chinese text
CN110222250B (en) Microblog-oriented emergency trigger word identification method
CN111597328B (en) New event theme extraction method
CN111581967B (en) News theme event detection method combining LW2V with triple network
CN112732916A (en) BERT-based multi-feature fusion fuzzy text classification model
CN108959305A (en) A kind of event extraction method and system based on internet big data
CN110472203B (en) Article duplicate checking and detecting method, device, equipment and storage medium
CN110413791A (en) File classification method based on CNN-SVM-KNN built-up pattern
CN113254643B (en) Text classification method and device, electronic equipment and text classification program
CN109960727A (en) For the individual privacy information automatic testing method and system of non-structured text
CN105609116A (en) Speech emotional dimensions region automatic recognition method
CN102880631A (en) Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method
CN106528527A (en) Identification method and identification system for out of vocabularies
CN111144106A (en) Two-stage text feature selection method under unbalanced data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170728