CN103345471B - Accessible text presentation method based on multi-manifold association matrix factorization - Google Patents
- Publication number
- CN103345471B
- Authority
- CN
- China
- Prior art keywords
- text
- matrix
- word
- manifold
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An accessible text presentation method based on multi-manifold association matrix factorization. After web texts are crawled from the Internet, the following operations are performed on them: first, the texts are segmented into words and their statistical features, namely term frequency and inverse document frequency, are extracted to form TF-IDF vector representations; next, several text manifolds and word manifolds are constructed, and multi-manifold association matrix factorization, which takes the duality between texts and words into account, yields low-dimensional representations of the texts and of the words; finally, the low-dimensional text representations are clustered so that texts on the same or similar topics are placed in one group, and the text information is then presented group by group. The advantage of the method is that it better helps users with disabilities browse Internet text by topic and quickly displays sets of web texts on the same topic, improving the user experience.
Description
Technical field
The present invention relates to the technical field of accessible text presentation methods, and in particular to an accessible text presentation method based on multi-manifold association matrix factorization.
Background art
China has a large population made up of diverse groups. One important group is people with disabilities, whose total number has reached 85 million. They are an important force in building a harmonious society and developing the national economy, and a group that governments at all levels and organizations of all kinds focus on helping. Statistical reports on China's disabled community show that the number of people with various disabilities has risen year by year over the past few decades. In the information age driven by big data, more and more people with disabilities use the fast and convenient Internet to obtain information for daily study and life, and they have become a very important group among Internet users. On this huge information-sharing platform, text accounts for the overwhelming share of the information presented: the vast majority of content, such as topical news, sports reports, and book and film reviews, reaches users with disabilities in textual form. Compared with other users, many people with disabilities find it difficult to browse the web information they need effectively because of various physical or psychological impairments, while the amount of text on the Internet is overwhelming. There is therefore an urgent need for an accessible text presentation method that makes it easier for people with disabilities to read text on the Internet.
It is known that the web information provided on websites is loosely organized and lacks centralized categorization, while users with disabilities are often interested only in web texts on particular topics. This creates a conflict between the abundant but disordered text information and the difficulty people with disabilities have in reading the web pages that interest them. In particular, for people with hearing loss or limb disabilities, searching for and reading web text on the Internet is time-consuming and easily causes fatigue and drowsiness. If the text in all kinds of web pages could be quickly grouped into small sets by topic and then presented to users with disabilities topic by topic, the burden of reading web text would be eased, and both reading efficiency and the user experience would improve.
In the fields of information retrieval and data mining, texts are mainly clustered on the basis of the cosine similarity between web texts, forming text sets on various topics. After TF-IDF features are extracted from web documents and each text is represented as a vector, clustering algorithms from data mining such as k-means can exploit the interdependence between texts and words to divide web texts into several subsets by topic and present them to the user.
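The conventional pipeline just described (cosine similarity between TF-IDF vectors, then k-means) can be sketched as follows. This is a minimal NumPy illustration, not code from the patent: the function names `cosine_similarity_matrix` and `kmeans` are hypothetical, texts are assumed to be stored as columns of a words × texts matrix, and the k-means is plain Lloyd iteration with a seeded random initialization.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the columns of X (one text per column)."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0                 # leave all-zero columns as zeros
    Xn = X / norms
    return Xn.T @ Xn

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of `points`; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# four texts (columns) over four words (rows): texts 0-1 and 2-3 share vocabulary
X = np.array([[1, 2, 0, 0],
              [2, 1, 0, 0],
              [0, 0, 1, 3],
              [0, 0, 2, 1]], dtype=float)
Xn = X / np.linalg.norm(X, axis=0)          # unit-length text vectors
labels = kmeans(Xn.T, k=2)                  # cluster texts, one row per text
```

Running k-means on length-normalized vectors ties it to cosine similarity, because for unit vectors the squared Euclidean distance is 2 − 2·cos θ.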
Summary of the invention
To help users with disabilities browse web texts on the same topic quickly and conveniently, and thereby improve their reading experience, the present invention proposes an accessible text presentation method based on multi-manifold association matrix factorization. The method comprises the following steps:
1. After web texts are crawled from the Internet, the following operations are performed on them:
1) Segment each text into words and extract its statistical features, namely term frequency and inverse document frequency, to form a TF-IDF vector representation of the text;
2) Construct several text manifolds and word manifolds, and apply multi-manifold association matrix factorization, which takes the duality between texts and words into account, to obtain low-dimensional representations of the texts and of the words;
3) Cluster the low-dimensional text representations so that texts on the same or similar topics are placed in one group, and then present the text information group by group.
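Step 3) does not name a clustering algorithm. The sketch below is a hypothetical illustration rather than the patent's procedure: it assigns each text to the column with the largest value in its row of the n × c_s low-dimensional text representation $V_s$ (defined in step 3.2 below), then groups text titles by topic for presentation. Running k-means on the rows of $V_s$ would be an equally valid choice; `group_texts_by_topic` and the sample titles are made up.

```python
import numpy as np
from collections import defaultdict

def group_texts_by_topic(V_s, titles):
    """Group text titles by topic for presentation.

    V_s: n x c_s low-dimensional text representation (one row per text).
    Assumed rule: each text goes to the topic column with the largest value.
    """
    labels = np.argmax(V_s, axis=1)
    groups = defaultdict(list)
    for title, label in zip(titles, labels):
        groups[int(label)].append(title)
    return dict(sorted(groups.items()))     # topics in ascending order

V_s = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.7, 0.3]])
groups = group_texts_by_topic(V_s, ["news A", "review B", "news C"])
```

Each value in `groups` is the list of texts that would be shown together under one topic.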
2. The extraction of statistical features in step 1) comprises the following steps:
1.1) Each web text is treated as a document, and two kinds of statistics are extracted from it: term frequency (TF) and inverse document frequency (IDF). If m distinct words occur in the texts, each text is represented by an m-dimensional TF-IDF vector;
1.2) The TF-IDF representations of all texts are normalized in a uniform way.
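Steps 1.1 and 1.2 can be sketched as below, under several assumptions the patent does not fix: the texts are already segmented into word lists, TF is the raw count divided by the text length, IDF is log(n / document frequency), and the "uniform normalization" is L2 normalization of each text vector. The name `tfidf_matrix` is hypothetical. The result is laid out as an m × n matrix (one column per text), matching the layout of the data matrix $X_s$ used later.

```python
import math
from collections import Counter

import numpy as np

def tfidf_matrix(docs):
    """Return (X, vocab): X is m x n TF-IDF (rows = words, columns = texts).

    `docs` is a list of already-segmented texts, each a list of word tokens.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    m, n = len(vocab), len(docs)
    # document frequency: number of texts that contain each word
    df = Counter(w for doc in docs for w in set(doc))
    X = np.zeros((m, n))
    for j, doc in enumerate(docs):
        for w, c in Counter(doc).items():
            tf = c / len(doc)                 # term frequency within text j
            idf = math.log(n / df[w])         # inverse document frequency
            X[index[w], j] = tf * idf
    # assumed normalization: scale each text column to unit L2 norm
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    return X / norms, vocab

X, vocab = tfidf_matrix([["a", "b"], ["a", "c"], ["a", "b", "b"]])
```

A word that occurs in every text (here `"a"`) gets IDF 0 and therefore contributes nothing to any text vector.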
3. The construction of several text manifolds and word manifolds in step 2) comprises the following steps:
2.1) A manifold structure reflects the intrinsic structure of the data and is built from a graph Laplacian matrix; the text manifold and the word manifold reflect the intrinsic structure of the text data and of the word data, respectively;
2.2) Construct the graph Laplacian $L_s$ of the texts. First obtain n web texts from the Internet, and denote the feature representation of the i-th text by $\mathbf{x}_i^s$ and that of the j-th text by $\mathbf{x}_j^s$. Each text is treated as a vertex of an undirected graph: if two texts are close in Euclidean distance, their vertices are joined by an edge that is assigned a weight. This yields an undirected graph reflecting the manifold structure of the text data. The weights between texts form an $n \times n$ weight matrix $W_s$. The sum of each column of $W_s$ is placed on the diagonal of a diagonal matrix $D_s$, i.e. $(D_s)_{jj} = \sum_i (W_s)_{ij}$, with all off-diagonal elements of $D_s$ set to 0; the text graph Laplacian is then $L_s = D_s - W_s$;
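The construction of $L_s = D_s - W_s$ can be sketched for binary weights as follows. The patent only says two vertices are joined when the texts are "relatively close"; a symmetrized k-nearest-neighbour rule is assumed here, and `knn_adjacency`, `graph_laplacian` and the parameter `k` are hypothetical names.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric boolean adjacency over the columns of X.

    Assumed neighbourhood rule: i and j are joined if either one is among the
    other's k nearest neighbours in Euclidean distance.
    """
    P = np.asarray(X, dtype=float).T                # one row per vertex
    n = P.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of vertices")
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                    # no self-loops
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return A | A.T

def graph_laplacian(W):
    """L = D - W, with D diagonal holding the column sums of W."""
    D = np.diag(W.sum(axis=0))
    return D - W

# binary-weight text Laplacian for four 1-D "texts" at positions 0, 1, 10, 11
X_s = np.array([[0.0, 1.0, 10.0, 11.0]])
W_s = knn_adjacency(X_s, k=1).astype(float)
L_s = graph_laplacian(W_s)
```

Because $W_s$ is symmetric, column sums equal row sums, and $L_s$ is symmetric positive semidefinite with rows summing to zero.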
2.3) Several text graph Laplacians $L_s$ are constructed by assigning different weights $W_s$ to the connected edges of the undirected graph, using three weighting schemes: binary weights, cosine similarity, and Gaussian kernel weights. If $\mathbf{x}_i^s$ and $\mathbf{x}_j^s$ are far apart in Euclidean distance, i.e. their two vertices are not connected, the edge weight of the two texts is 0. If they are close, i.e. their two vertices are connected by an edge, then:
A. with binary weights, the edge weight of the two texts is 1;
B. with cosine similarity, the edge weight of the two texts is $(\mathbf{x}_i^s)^T \mathbf{x}_j^s$ (the cosine of the angle between them when both vectors have unit length), where $(\cdot)^T$ denotes the transpose of a vector or matrix;
C. with Gaussian kernel weights, the edge weight of the two texts is $\exp\left(-\|\mathbf{x}_i^s - \mathbf{x}_j^s\|^2 / (2\sigma^2)\right)$, where $\|\cdot\|$ denotes the $\ell_2$ norm of a vector and the real parameter $\sigma > 0$ is the bandwidth of the Gaussian kernel; setting different bandwidths gives different Gaussian kernel weights;
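The three weighting schemes can be sketched as one function that fills in weights only on existing edges. Assumptions: cosine similarity is computed with explicit normalization (equal to $(\mathbf{x}_i)^T\mathbf{x}_j$ for unit vectors), the Gaussian kernel is taken as $\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / (2\sigma^2))$, and `edge_weights` with its `scheme` strings is a hypothetical interface.

```python
import numpy as np

def edge_weights(X, A, scheme, sigma=1.0):
    """Weight matrix over the columns of X, kept only on the edges of A.

    scheme: "binary", "cosine" or "gaussian"; sigma is the Gaussian bandwidth.
    """
    P = np.asarray(X, dtype=float).T                # one row per vertex
    if scheme == "binary":
        W = np.ones((P.shape[0], P.shape[0]))
    elif scheme == "cosine":
        norms = np.linalg.norm(P, axis=1)
        norms[norms == 0] = 1.0
        Pn = P / norms[:, None]
        W = Pn @ Pn.T                               # cosine between vertices
    elif scheme == "gaussian":
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-d2 / (2.0 * sigma ** 2))        # assumed 2*sigma^2 scaling
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return np.where(A, W, 0.0)                      # unconnected pairs get 0

# two texts joined by an edge; columns are (1, 0) and (1, 1)
A = np.array([[False, True], [True, False]])
X = np.array([[1.0, 1.0],
              [0.0, 1.0]])
weights = {s: edge_weights(X, A, s) for s in ("binary", "cosine", "gaussian")}
```

Calling this with the three schemes, and with several values of `sigma` for the Gaussian kernel, then applying $L = D - W$ to each result, gives one way to obtain the several text Laplacians of step 2.3.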
2.4) Construct the graph Laplacian $L_f$ of the words. By the duality between texts and words, each word is represented by an n-dimensional feature vector; denote the representation of the i-th word by $\mathbf{x}_i^f$ and that of the j-th word by $\mathbf{x}_j^f$. Each word is treated as a vertex of an undirected graph: if two words are close in Euclidean distance, their vertices are joined by an edge that is assigned a weight. This yields an undirected graph reflecting the manifold structure of the word data. The weights between words form an $m \times m$ weight matrix $W_f$. The sum of each column of $W_f$ is placed on the diagonal of a diagonal matrix $D_f$, with all off-diagonal elements of $D_f$ set to 0; the word graph Laplacian is then $L_f = D_f - W_f$;
2.5) Several word graph Laplacians $L_f$ are constructed in the same way as the several text graph Laplacians $L_s$.
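Because of the text-word duality, the word graph can reuse the text-graph routine on the transposed data matrix: each word's n-dimensional feature vector is a row of the m × n text matrix. The sketch below assumes a symmetrized k-NN graph with Gaussian weights $\exp(-d^2 / (2\sigma^2))$; `knn_gaussian_laplacian` is a hypothetical helper, not a function from the patent.

```python
import numpy as np

def knn_gaussian_laplacian(X, k, sigma):
    """Graph Laplacian over the columns of X: symmetrized k-NN graph with
    Gaussian edge weights (assumed neighbourhood rule and kernel scaling)."""
    P = np.asarray(X, dtype=float).T                # one row per vertex
    n = P.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of vertices")
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    masked = d2 + np.diag(np.full(n, np.inf))       # exclude self from neighbours
    nearest = np.argsort(masked, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = True
    A |= A.T
    W = np.where(A, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)
    return np.diag(W.sum(axis=0)) - W

rng = np.random.default_rng(0)
X_s = rng.random((5, 3))                            # m = 5 words, n = 3 texts
L_s = knn_gaussian_laplacian(X_s, k=1, sigma=1.0)   # n x n text Laplacian
L_f = knn_gaussian_laplacian(X_s.T, k=2, sigma=1.0) # m x m word Laplacian
```

The only change between the two calls is the transpose, which is exactly the duality the step relies on.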
4. The multi-manifold association matrix factorization in step 2) comprises the following steps:
3.1) Suppose n texts are obtained from the Internet and they involve $c_s$ topics. The feature representation of each text is a column vector, so all texts together form an $m \times n$ data matrix $X_s$. The texts are composed of m words involving $c_f$ topics; the feature representation of each word is a column vector, so all words together form an $n \times m$ data matrix $X_f$. Because of the dual relationship between texts and words, $X_f = X_s^T$. The text and word data matrices are merged into an $(n+m) \times (n+m)$ association matrix
$$Z = \begin{pmatrix} O & X_s \\ X_f & O \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words;
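A sketch of the association matrix, assuming the block layout $Z = \begin{pmatrix} O & X_s \\ X_f & O \end{pmatrix}$ with $X_f = X_s^T$, an m × m top-left zero block and an n × n bottom-right zero block. The helper name `association_matrix` is hypothetical.

```python
import numpy as np

def association_matrix(X_s):
    """Z = [[O, X_s], [X_f, O]] with X_f = X_s^T.

    X_s: m x n text data matrix (rows = words, columns = texts).
    Returns an (m + n) x (m + n) matrix; the top-left zero block is m x m.
    """
    X_s = np.asarray(X_s, dtype=float)
    m, n = X_s.shape
    X_f = X_s.T                                     # n x m word data matrix
    return np.block([[np.zeros((m, m)), X_s],
                     [X_f, np.zeros((n, n))]])

X_s = np.arange(6, dtype=float).reshape(2, 3)      # m = 2 words, n = 3 texts
Z = association_matrix(X_s)
```

Since $X_f = X_s^T$, the resulting $Z$ is symmetric.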
3.2) The text data matrix is factorized into three parts, $X_s \approx V_f S_f V_s^T$, where the $m \times c_f$ matrix $V_f$ is the low-dimensional representation of the words, the $n \times c_s$ matrix $V_s$ is the low-dimensional representation of the texts, and the $c_f \times c_s$ matrix $S_f$ is a compressed representation of the word data. Similarly, the word data matrix is factorized into three parts, $X_f \approx V_s S_s V_f^T$, where the $c_s \times c_f$ matrix $S_s$ is a compressed representation of the text data. This gives an $(n+m) \times (c_f+c_s)$ association low-dimensional representation matrix
$$V = \begin{pmatrix} V_f & O \\ O & V_s \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words and the numbers of topics involved, and a $(c_f+c_s) \times (c_f+c_s)$ association low-dimensional representation matrix
$$S = \begin{pmatrix} O & S_f \\ S_s & O \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of topics involved in the texts and words. With these block forms, $V S V^T = \begin{pmatrix} O & V_f S_f V_s^T \\ V_s S_s V_f^T & O \end{pmatrix}$, so that $Z \approx V S V^T$;
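The block factors can be checked numerically. Assuming $V = \begin{pmatrix} V_f & O \\ O & V_s \end{pmatrix}$ and $S = \begin{pmatrix} O & S_f \\ S_s & O \end{pmatrix}$, the product $V S V^T$ reproduces the off-diagonal layout of the association matrix, with $V_f S_f V_s^T$ in the text block and $V_s S_s V_f^T$ in the word block. `block_factors` is a hypothetical helper and the factor values are random.

```python
import numpy as np

def block_factors(V_f, V_s, S_f, S_s):
    """Assemble the block matrices V = diag(V_f, V_s) and S = [[O, S_f], [S_s, O]]."""
    m, c_f = V_f.shape
    n, c_s = V_s.shape
    V = np.block([[V_f, np.zeros((m, c_s))],
                  [np.zeros((n, c_f)), V_s]])       # (m+n) x (c_f+c_s)
    S = np.block([[np.zeros((c_f, c_f)), S_f],
                  [S_s, np.zeros((c_s, c_s))]])     # (c_f+c_s) x (c_f+c_s)
    return V, S

rng = np.random.default_rng(1)
m, n, c_f, c_s = 5, 4, 2, 3
V_f, V_s = rng.random((m, c_f)), rng.random((n, c_s))
S_f, S_s = rng.random((c_f, c_s)), rng.random((c_s, c_f))
V, S = block_factors(V_f, V_s, S_f, S_s)
R = V @ S @ V.T                                     # approximates Z
```

Since $Z$ is symmetric when $X_f = X_s^T$, choosing $S_s = S_f^T$ makes $V S V^T$ symmetric as well; the patent keeps $S_f$ and $S_s$ as separate symbols.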
3.3) According to the different weighting schemes, q text manifolds and q word manifolds are constructed, i.e. $L_s^1, L_s^2, \ldots, L_s^q$ and $L_f^1, L_f^2, \ldots, L_f^q$, from which q association manifold matrices of size $(n+m) \times (n+m)$ are built; the i-th association manifold matrix is
$$L_i = \begin{pmatrix} L_f^i & O \\ O & L_s^i \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words. To better approximate the true data manifold, each manifold is given a weight coefficient $\mu_i > 0$, and the manifolds are combined linearly as
$$L = \sum_{i=1}^{q} \mu_i L_i, \qquad \sum_{i=1}^{q} \mu_i = 1;$$
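A sketch of the multi-manifold combination, assuming $L_i = \begin{pmatrix} L_f^i & O \\ O & L_s^i \end{pmatrix}$ (word block first, to match the block order of $V$) and $L = \sum_i \mu_i L_i$ with $\mu_i > 0$ and $\sum_i \mu_i = 1$. `combined_laplacian` is a hypothetical name, and the identity matrices below stand in for real graph Laplacians.

```python
import numpy as np

def combined_laplacian(L_f_list, L_s_list, mu):
    """L = sum_i mu_i * blockdiag(L_f^i, L_s^i), with mu_i > 0 and sum(mu) = 1."""
    mu = np.asarray(mu, dtype=float)
    if not (len(L_f_list) == len(L_s_list) == len(mu)):
        raise ValueError("need one weight per (word, text) Laplacian pair")
    if np.any(mu <= 0) or not np.isclose(mu.sum(), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    m = L_f_list[0].shape[0]
    n = L_s_list[0].shape[0]
    L = np.zeros((m + n, m + n))
    for w, L_f, L_s in zip(mu, L_f_list, L_s_list):
        L_i = np.block([[L_f, np.zeros((m, n))],
                        [np.zeros((n, m)), L_s]])   # i-th association manifold
        L += w * L_i
    return L

L = combined_laplacian([np.eye(2), 2 * np.eye(2)],
                       [np.eye(3), np.zeros((3, 3))],
                       mu=[0.25, 0.75])
```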
3.4) The multi-manifold association matrix factorization minimizes the regularized objective function
$$\min_{V,\,S,\,\boldsymbol{\mu}} \; \|Z - V S V^T\|_F^2 + \alpha \, \mathrm{Tr}\left(V^T L V\right) + \beta \, \|\boldsymbol{\mu}\|^2,$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\|\cdot\|$ is the $\ell_2$ norm of a vector, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_q)^T$, $\mathrm{Tr}(\cdot)$ is the trace of a matrix, and the regularization factors $\alpha > 0$ and $\beta > 0$ respectively control the contribution of the manifold structure and prevent overfitting. The low-dimensional text representation obtained by solving this objective approximates the intrinsic structure of the original text data while preserving the local geometry of both the text data and the word data, so that texts on the same topic lie as close to one another as possible.
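The patent does not state how the objective is optimized. The sketch below only evaluates $\|Z - VSV^T\|_F^2 + \alpha\,\mathrm{Tr}(V^T L V) + \beta\|\boldsymbol{\mu}\|^2$, the form assumed here from the symbols defined in step 3.4, for given inputs. That is useful for monitoring whatever solver is chosen. The function name `objective` is hypothetical, and the inputs below are arbitrary small matrices that exercise the formula (this `L` is not built from real graphs).

```python
import numpy as np

def objective(Z, V, S, L, mu, alpha, beta):
    """||Z - V S V^T||_F^2 + alpha * Tr(V^T L V) + beta * ||mu||^2."""
    fit = np.linalg.norm(Z - V @ S @ V.T, "fro") ** 2   # reconstruction error
    smooth = np.trace(V.T @ L @ V)                      # manifold smoothness term
    reg = float(np.dot(mu, mu))                         # squared l2 norm of mu
    return fit + alpha * smooth + beta * reg

V = np.eye(2)
S = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = S.copy()                                # V S V^T reproduces Z exactly
L = np.array([[1.0, -1.0], [-1.0, 1.0]])
J = objective(Z, V, S, L, mu=np.array([0.5, 0.5]), alpha=2.0, beta=4.0)
```

With the block-diagonal $L_i$ above, the smoothness term splits as $\sum_i \mu_i\left[\mathrm{Tr}(V_f^T L_f^i V_f) + \mathrm{Tr}(V_s^T L_s^i V_s)\right]$, so each word and text manifold penalizes its own low-dimensional representation separately.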
The accessible text presentation method based on multi-manifold association matrix factorization proposed by the present invention has the following advantages: it exploits the duality between texts and words and clusters the statistical feature representations of the texts, so that similar texts are presented in groups; it is applicable to all types of web text and requires no manual back-end operation; and it can help people with disabilities read web text without barriers, as well as help ordinary users read text more efficiently.
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Detailed description of the invention
The present invention is further described below with reference to the drawings:
An accessible text presentation method based on multi-manifold association matrix factorization comprises the following steps:
1. After texts are crawled from the Internet, the following operations are performed on them:
1) Segment each text into words and extract its statistical features, namely term frequency and inverse document frequency, to form a TF-IDF vector representation of the text;
2) Construct several text manifolds and word manifolds, and apply multi-manifold association matrix factorization, which takes the duality between texts and words into account, to obtain low-dimensional representations of the texts and of the words;
3) Cluster the low-dimensional text representations so that texts on the same or similar topics are placed in one group, and then present the text information group by group.
The extraction of statistical features in step 1) comprises the following steps:
1.1) Each web text is treated as a document, and two kinds of statistics are extracted from it: term frequency (TF) and inverse document frequency (IDF). If m distinct words occur in the texts, each text is represented by an m-dimensional TF-IDF vector;
1.2) The TF-IDF representations of all texts are normalized in a uniform way.
The construction of several text manifolds and word manifolds in step 2) comprises the following steps:
2.1) A manifold structure reflects the intrinsic structure of the data and is built from a graph Laplacian matrix; the text manifold and the word manifold reflect the intrinsic structure of the text data and of the word data, respectively;
2.2) Construct the graph Laplacian $L_s$ of the texts. First obtain n web texts from the Internet, and denote the feature representation of the i-th text by $\mathbf{x}_i^s$ and that of the j-th text by $\mathbf{x}_j^s$. Each text is treated as a vertex of an undirected graph: if two texts are close in Euclidean distance, their vertices are joined by an edge that is assigned a weight. This yields an undirected graph reflecting the manifold structure of the text data. The weights between texts form an $n \times n$ weight matrix $W_s$. The sum of each column of $W_s$ is placed on the diagonal of a diagonal matrix $D_s$, i.e. $(D_s)_{jj} = \sum_i (W_s)_{ij}$, with all off-diagonal elements of $D_s$ set to 0; the text graph Laplacian is then $L_s = D_s - W_s$;
2.3) Several text graph Laplacians $L_s$ are constructed by assigning different weights $W_s$ to the connected edges of the undirected graph, using three weighting schemes: binary weights, cosine similarity, and Gaussian kernel weights. If $\mathbf{x}_i^s$ and $\mathbf{x}_j^s$ are far apart in Euclidean distance, i.e. their two vertices are not connected, the edge weight of the two texts is 0. If they are close, i.e. their two vertices are connected by an edge, then:
A. with binary weights, the edge weight of the two texts is 1;
B. with cosine similarity, the edge weight of the two texts is $(\mathbf{x}_i^s)^T \mathbf{x}_j^s$ (the cosine of the angle between them when both vectors have unit length), where $(\cdot)^T$ denotes the transpose of a vector or matrix;
C. with Gaussian kernel weights, the edge weight of the two texts is $\exp\left(-\|\mathbf{x}_i^s - \mathbf{x}_j^s\|^2 / (2\sigma^2)\right)$, where $\|\cdot\|$ denotes the $\ell_2$ norm of a vector and the real parameter $\sigma > 0$ is the bandwidth of the Gaussian kernel; setting different bandwidths gives different Gaussian kernel weights;
2.4) Construct the graph Laplacian $L_f$ of the words. By the duality between texts and words, each word is represented by an n-dimensional feature vector; denote the representation of the i-th word by $\mathbf{x}_i^f$ and that of the j-th word by $\mathbf{x}_j^f$. Each word is treated as a vertex of an undirected graph: if two words are close in Euclidean distance, their vertices are joined by an edge that is assigned a weight. This yields an undirected graph reflecting the manifold structure of the word data. The weights between words form an $m \times m$ weight matrix $W_f$. The sum of each column of $W_f$ is placed on the diagonal of a diagonal matrix $D_f$, with all off-diagonal elements of $D_f$ set to 0; the word graph Laplacian is then $L_f = D_f - W_f$;
2.5) Several word graph Laplacians $L_f$ are constructed in the same way as the several text graph Laplacians $L_s$.
The multi-manifold association matrix factorization in step 2) comprises the following steps:
3.1) Suppose n texts are obtained from the Internet and they involve $c_s$ topics. The feature representation of each text is a column vector, so all texts together form an $m \times n$ data matrix $X_s$. The texts are composed of m words involving $c_f$ topics; the feature representation of each word is a column vector, so all words together form an $n \times m$ data matrix $X_f$. Because of the dual relationship between texts and words, $X_f = X_s^T$. The text and word data matrices are merged into an $(n+m) \times (n+m)$ association matrix
$$Z = \begin{pmatrix} O & X_s \\ X_f & O \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words;
3.2) The text data matrix is factorized into three parts, $X_s \approx V_f S_f V_s^T$, where the $m \times c_f$ matrix $V_f$ is the low-dimensional representation of the words, the $n \times c_s$ matrix $V_s$ is the low-dimensional representation of the texts, and the $c_f \times c_s$ matrix $S_f$ is a compressed representation of the word data. Similarly, the word data matrix is factorized into three parts, $X_f \approx V_s S_s V_f^T$, where the $c_s \times c_f$ matrix $S_s$ is a compressed representation of the text data. This gives an $(n+m) \times (c_f+c_s)$ association low-dimensional representation matrix
$$V = \begin{pmatrix} V_f & O \\ O & V_s \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words and the numbers of topics involved, and a $(c_f+c_s) \times (c_f+c_s)$ association low-dimensional representation matrix
$$S = \begin{pmatrix} O & S_f \\ S_s & O \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of topics involved in the texts and words. With these block forms, $V S V^T = \begin{pmatrix} O & V_f S_f V_s^T \\ V_s S_s V_f^T & O \end{pmatrix}$, so that $Z \approx V S V^T$;
3.3) According to the different weighting schemes, q text manifolds and q word manifolds are constructed, i.e. $L_s^1, L_s^2, \ldots, L_s^q$ and $L_f^1, L_f^2, \ldots, L_f^q$, from which q association manifold matrices of size $(n+m) \times (n+m)$ are built; the i-th association manifold matrix is
$$L_i = \begin{pmatrix} L_f^i & O \\ O & L_s^i \end{pmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words. To better approximate the true data manifold, each manifold is given a weight coefficient $\mu_i > 0$, and the manifolds are combined linearly as
$$L = \sum_{i=1}^{q} \mu_i L_i, \qquad \sum_{i=1}^{q} \mu_i = 1;$$
3.4) The multi-manifold association matrix factorization minimizes the regularized objective function
$$\min_{V,\,S,\,\boldsymbol{\mu}} \; \|Z - V S V^T\|_F^2 + \alpha \, \mathrm{Tr}\left(V^T L V\right) + \beta \, \|\boldsymbol{\mu}\|^2,$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\|\cdot\|$ is the $\ell_2$ norm of a vector, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_q)^T$, $\mathrm{Tr}(\cdot)$ is the trace of a matrix, and the regularization factors $\alpha > 0$ and $\beta > 0$ respectively control the contribution of the manifold structure and prevent overfitting. The low-dimensional text representation obtained by solving this objective approximates the intrinsic structure of the original text data while preserving the local geometry of both the text data and the word data, so that texts on the same topic lie as close to one another as possible.
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be implemented. The scope of protection of the present invention should not be construed as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive of on the basis of the inventive concept.
Claims (1)
1. An accessible text presentation method based on multi-manifold association matrix factorization, characterized in that, after web texts are crawled from the Internet, the following operations are performed on them:
1) segmenting each text into words and extracting its statistical features, namely term frequency and inverse document frequency, to form a TF-IDF vector representation of the text;
2) constructing several text manifolds and word manifolds, and applying multi-manifold association matrix factorization, which takes the duality between texts and words into account, to obtain low-dimensional representations of the texts and of the words;
3) clustering the low-dimensional text representations so that texts on the same or similar topics are placed in one group, and then presenting the text information group by group;
wherein the extraction of statistical features in step 1) comprises the following steps:
1.1) treating each web text as a document and extracting two kinds of statistics from it, namely term frequency (TF) and inverse document frequency (IDF); if m distinct words occur in the texts, each text is represented by an m-dimensional TF-IDF vector;
1.2) normalizing the TF-IDF representations of all texts in a uniform way;
wherein the construction of several text manifolds and word manifolds in step 2) comprises the following steps:
2.1) a manifold structure reflects the intrinsic structure of the data and is built from a graph Laplacian matrix; the text manifold and the word manifold reflect the intrinsic structure of the text data and of the word data, respectively;
2.2) constructing the graph Laplacian $L_s$ of the texts: first obtaining n web texts from the Internet, the feature representation of the i-th text being $\mathbf{x}_i^s$ and that of the j-th text being $\mathbf{x}_j^s$; treating each text as a vertex of an undirected graph and, if two texts are close in Euclidean distance, joining their vertices by an edge that is assigned a weight, thereby obtaining an undirected graph reflecting the manifold structure of the text data; the weights between texts form an $n \times n$ weight matrix $W_s$; the sum of each column of $W_s$ is placed on the diagonal of a diagonal matrix $D_s$, with all off-diagonal elements of $D_s$ set to 0, and the text graph Laplacian is obtained as $L_s = D_s - W_s$;
2.3) constructing several text graph Laplacians $L_s$ by assigning different weights $W_s$ to the connected edges of the undirected graph, using three weighting schemes: binary weights, cosine similarity, and Gaussian kernel weights; if $\mathbf{x}_i^s$ and $\mathbf{x}_j^s$ are far apart in Euclidean distance, i.e. their two vertices are not connected, the edge weight of the two texts is 0; if they are close, i.e. their two vertices are connected by an edge, then:
A. with binary weights, the edge weight of the two texts is 1;
B. with cosine similarity, the edge weight of the two texts is $(\mathbf{x}_i^s)^T \mathbf{x}_j^s$ (the cosine of the angle between them when both vectors have unit length), where $(\cdot)^T$ denotes the transpose of a vector or matrix;
C. with Gaussian kernel weights, the edge weight of the two texts is $\exp\left(-\|\mathbf{x}_i^s - \mathbf{x}_j^s\|^2 / (2\sigma^2)\right)$, where $\|\cdot\|$ denotes the $\ell_2$ norm of a vector and the real parameter $\sigma > 0$ is the bandwidth of the Gaussian kernel; setting different bandwidths gives different Gaussian kernel weights;
2.4) constructing the graph Laplacian $L_f$ of the words: by the duality between texts and words, each word is represented by an n-dimensional feature vector, the representation of the i-th word being $\mathbf{x}_i^f$ and that of the j-th word being $\mathbf{x}_j^f$; treating each word as a vertex of an undirected graph and, if two words are close in Euclidean distance, joining their vertices by an edge that is assigned a weight, thereby obtaining an undirected graph reflecting the manifold structure of the word data; the weights between words form an $m \times m$ weight matrix $W_f$; the sum of each column of $W_f$ is placed on the diagonal of a diagonal matrix $D_f$, with all off-diagonal elements of $D_f$ set to 0, and the word graph Laplacian is obtained as $L_f = D_f - W_f$;
2.5) constructing several word graph Laplacians $L_f$ in the same way as the several text graph Laplacians $L_s$;
wherein the multi-manifold association matrix factorization in step 2) comprises the following steps:
3.1) supposing that n texts are obtained from the Internet and involve $c_s$ topics, the feature representation of each text being a column vector, so that all texts together form an $m \times n$ data matrix $X_s$; the texts are composed of m words involving $c_f$ topics, the feature representation of each word being a column vector, so that all words together form an $n \times m$ data matrix $X_f$; because of the dual relationship between texts and words, $X_f = X_s^T$; the text and word data matrices are merged into an $(n+m) \times (n+m)$ association matrix $Z = \begin{pmatrix} O & X_s \\ X_f & O \end{pmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words;
3.2) factorizing the text data matrix into three parts, $X_s \approx V_f S_f V_s^T$, where the $m \times c_f$ matrix $V_f$ is the low-dimensional representation of the words, the $n \times c_s$ matrix $V_s$ is the low-dimensional representation of the texts, and the $c_f \times c_s$ matrix $S_f$ is a compressed representation of the word data; similarly, factorizing the word data matrix into three parts, $X_f \approx V_s S_s V_f^T$, where the $c_s \times c_f$ matrix $S_s$ is a compressed representation of the text data; this gives an $(n+m) \times (c_f+c_s)$ association low-dimensional representation matrix $V = \begin{pmatrix} V_f & O \\ O & V_s \end{pmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words and the numbers of topics involved, and a $(c_f+c_s) \times (c_f+c_s)$ association low-dimensional representation matrix $S = \begin{pmatrix} O & S_f \\ S_s & O \end{pmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of topics involved in the texts and words;
3.3) constructing q text manifolds and q word manifolds according to the different weighting schemes, i.e. $L_s^1, \ldots, L_s^q$ and $L_f^1, \ldots, L_f^q$, and building q association manifold matrices of size $(n+m) \times (n+m)$, the i-th of which is $L_i = \begin{pmatrix} L_f^i & O \\ O & L_s^i \end{pmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words; to better approximate the true data manifold, each manifold is given a weight coefficient $\mu_i > 0$, forming the linear combination of multiple manifolds $L = \sum_{i=1}^{q} \mu_i L_i$ subject to $\sum_{i=1}^{q} \mu_i = 1$;
3.4) minimizing, by the multi-manifold association matrix factorization, the regularized objective function
$$\min_{V,\,S,\,\boldsymbol{\mu}} \; \|Z - V S V^T\|_F^2 + \alpha \, \mathrm{Tr}\left(V^T L V\right) + \beta \, \|\boldsymbol{\mu}\|^2,$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\|\cdot\|$ is the $\ell_2$ norm of a vector, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_q)^T$, $\mathrm{Tr}(\cdot)$ is the trace of a matrix, and the regularization factors $\alpha > 0$ and $\beta > 0$ respectively control the contribution of the manifold structure and prevent overfitting; the low-dimensional text representation obtained by solving this objective approximates the intrinsic structure of the original text data while preserving the local geometry of both the text data and the word data, so that texts on the same topic lie as close to one another as possible.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310217406.0A CN103345471B (en) | 2013-06-03 | 2013-06-03 | Accessible text presentation method based on multi-manifold association matrix factorization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310217406.0A CN103345471B (en) | 2013-06-03 | 2013-06-03 | Accessible text presentation method based on multi-manifold association matrix factorization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103345471A | 2013-10-09 |
CN103345471B | 2016-08-10 |
Family ID: 49280266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310217406.0A Active CN103345471B (en) | Accessible text presentation method based on multi-manifold association matrix factorization | 2013-06-03 | 2013-06-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103345471B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844769A (en) * | 2017-02-27 | 2017-06-13 | 百度在线网络技术(北京)有限公司 | With reference to the pattern of passing through and in limited time reading model information flow recommend method and apparatus |
CN107967483B (en) * | 2017-11-09 | 2021-03-30 | 上海电机学院 | Approximate image compression coding method |
CN108334494B (en) * | 2018-01-23 | 2022-01-25 | 创新先进技术有限公司 | Method and device for constructing user relationship network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101986295A (en) * | 2010-10-28 | 2011-03-16 | 浙江大学 | Image clustering method based on manifold sparse coding |
CN102831161A (en) * | 2012-07-18 | 2012-12-19 | 天津大学 | Semi-supervised learning-to-rank method for image retrieval based on manifold regularization |
- 2013-06-03: Application CN201310217406.0A filed in China (CN103345471B, status: Active)
Non-Patent Citations (2)
Title |
---|
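Web document classification algorithm based on manifold learning and SVM; Wang Ziqiang et al.; Computer Engineering; 2009-08-31; Vol. 35, No. 15; pp. 38-40 * |
Duality strategy for weight computation in text clustering; Bu Dongbo et al.; Journal of Software; 2002-11-30; Vol. 13, No. 11; pp. 2083-2089 * |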
Also Published As
Publication number | Publication date |
---|---|
CN103345471A (en) | 2013-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cheng et al. | Scene recognition with objectness | |
Kawade et al. | Sentiment analysis: machine learning approach | |
Chen et al. | Visual and textual sentiment analysis using deep fusion convolutional neural networks | |
CN110674298B (en) | Deep learning mixed topic model construction method | |
CN111782759B (en) | Question-answering processing method and device and computer readable storage medium | |
CN103345471B (en) | A kind of accessible text exhibiting method decomposed based on multiple manifold incidence matrix | |
CN105335350A (en) | Language identification method based on ensemble learning | |
CN115017320A (en) | E-commerce text clustering method and system combining bag-of-words model and deep learning model | |
CN101408893A (en) | Method for rapidly clustering documents | |
Rezaei et al. | Event detection in twitter by deep learning classification and multi label clustering virtual backbone formation | |
Li et al. | Adding semantics to email clustering | |
CN113743079A (en) | Text similarity calculation method and device based on co-occurrence entity interaction graph | |
Min et al. | Building user interest profiles from wikipedia clusters | |
Hua et al. | A character-level method for text classification | |
Liu et al. | LIRIS-Imagine at ImageCLEF 2011 Photo Annotation Task. | |
He et al. | Construction of Diachronic Ontologies from People's Daily of Fifty Years. | |
Wang et al. | Integrating roberta fine-tuning and user writing styles for authorship attribution of short texts | |
Merson et al. | A text mining approach to identify and analyse prominent issues from public complaints | |
Xue et al. | Web page classification based on SVM | |
Lin et al. | [Retracted] Digital Library Information Integration System Based on Big Data and Deep Learning | |
CN113515624A (en) | Text classification method for emergency news | |
Zheng et al. | A short-text oriented clustering method for hot topics extraction | |
Göbel et al. | Table modelling, extraction and processing | |
Zhou et al. | Novel Classification Method for Short Texts with Few Words | |
Sheeba et al. | Unsupervised Hidden Topic Framework for Extracting Keywords (Synonym, Homonym, Hyponymy and Polysemy) and Topics in Meeting Transcripts |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |