CN103345471B - An accessible text presentation method based on multi-manifold association matrix factorization

Publication number: CN103345471B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201310217406.0A
Other languages: Chinese (zh)
Other versions: CN103345471A (en)
Inventors: 卜佳俊, 李平, 陈纯, 王北斗, 高珊
Current assignee: Zhejiang University (ZJU) (as listed; may be inaccurate)
Original assignee: Zhejiang University (ZJU)
Prior art keywords: text, matrix, word, manifold, data
Classification: Information Retrieval, DB Structures and FS Structures Therefor

Abstract

An accessible text presentation method based on multi-manifold association matrix factorization. After web page text is crawled from the Internet, it is processed as follows. First, the text is segmented into words and its statistical features, namely term frequency and inverse document frequency, are extracted to form a TF-IDF vector representation of the text. Next, several text manifolds and word manifolds are constructed, and a multi-manifold association matrix factorization that accounts for the duality between texts and words yields low-dimensional representations of both the texts and the words. Finally, the low-dimensional text representations are clustered so that texts on the same or similar themes fall into one group, and the text information is then presented group by group. The advantage of the method is that it helps users with disabilities browse Internet text by theme and quickly brings up the set of web page texts on a given theme, which improves the user experience.

Description

An accessible text presentation method based on multi-manifold association matrix factorization
Technical field
The present invention relates to accessible text presentation, and in particular to an accessible text presentation method based on multi-manifold association matrix factorization.
Background
China has a large and diverse population. An important part of it is the population of people with disabilities, which has reached 85 million. They are an important force in building a harmonious society and developing the national economy, and a group to which governments at all levels and organizations of all kinds give priority support. Statistical reports on China's disability community show that the number of people with disabilities of every kind has risen year by year over the past few decades. In the data-driven information age, more and more people with disabilities use the convenience of the Internet to obtain information for daily study and life, and they have become a significant group of Internet users. On the Internet, a vast information-sharing platform, text is by far the dominant medium: topical news, sports reports, book and film reviews, and most other information reach users with disabilities in text form. Compared with other users, many people with disabilities find it hard to browse the web information they need because of physical or psychological impairments, and the volume of text on the Internet is overwhelming. A barrier-free text presentation method is therefore urgently needed to make it easier for people with disabilities to read text on the Internet.
It is well known that the information on most websites is loosely organized and lacks centralized, categorized management, while a user with a disability is typically interested in reading web page text on only a few particular themes. The result is a conflict between the abundance and disorder of text information and the difficulty such users have in reaching the pages that interest them. For people with hearing loss or limb disabilities in particular, searching for and reading web page text is time-consuming and readily causes fatigue and drowsiness. If the text in web pages could be quickly sorted into small sets by theme and then presented to the user theme by theme, the burden of reading web page text would be eased, and both reading efficiency and the user experience would improve.
In information retrieval and data mining, text clustering is mainly based on the cosine similarity of web page texts, which yields a text set for each theme. After TF-IDF features have been extracted from the web documents and the texts have been represented as vectors, a clustering algorithm from data mining such as k-means can, drawing on the interdependence between texts and words, partition the web page texts into subsets by theme for presentation to the user.
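The cosine similarity on which this conventional approach rests can be sketched in a few lines. The function name `cosine_similarity_matrix` and the toy matrix are illustrative assumptions, not part of the patent; each column of `X` stands for the feature vector of one text.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the columns of X (one text per column)."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for empty texts
    Xn = X / norms
    return Xn.T @ Xn


# Toy data (hypothetical): 3 words x 3 texts.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 2.0, 0.0]])
sim = cosine_similarity_matrix(X)
```

Texts 0 and 1 use the same words in the same proportions, so their similarity is 1, while text 2 shares no words with them and scores 0.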
Summary of the invention
To help users with disabilities browse web page texts on the same theme quickly and conveniently, and so improve their reading experience, the present invention proposes an accessible text presentation method based on multi-manifold association matrix factorization. The method comprises the following steps:
1. After web page text is crawled from the Internet, the following operations are performed on the text:
1) Segment the text into words and extract its statistical features, namely term frequency and inverse document frequency, to form the TF-IDF vector representation of the text;
2) Construct several text manifolds and word manifolds, and apply multi-manifold association matrix factorization, which accounts for the duality between texts and words, to obtain low-dimensional representations of the texts and of the words;
3) Cluster the low-dimensional text representations so that texts on the same or similar themes are placed in one group, and then present the text information group by group.
2. The extraction of text statistical features in step 1) comprises:
1.1) Treat each web page text as a document and extract two statistics from it: term frequency (TF) and inverse document frequency (IDF). If m distinct words occur in the texts, each text is represented by an m-dimensional TF-IDF feature vector;
1.2) Normalize the TF-IDF feature representations of all texts in a uniform way.
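Steps 1.1) and 1.2) can be sketched as follows. This is a minimal illustration under stated assumptions: the texts are already segmented into word lists, the IDF uses a smoothed logarithmic variant, and normalization scales each text vector to unit l2 length. The patent fixes none of these choices, and the name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(tokenized_docs):
    """Return (X, vocab) where X is m x n: one unit-norm TF-IDF column per text."""
    vocab = sorted({w for doc in tokenized_docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    m, n = len(vocab), len(tokenized_docs)

    # Document frequency: number of texts that contain each word.
    df = Counter(w for doc in tokenized_docs for w in set(doc))
    # Smoothed IDF (an assumption; the source does not fix the variant).
    idf = np.array([math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab])

    X = np.zeros((m, n))
    for j, doc in enumerate(tokenized_docs):
        for w, count in Counter(doc).items():
            X[index[w], j] = count / len(doc)   # term frequency
    X *= idf[:, None]

    # Step 1.2: normalize every text vector to unit l2 length.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    return X / norms, vocab


docs = [["manifold", "text", "text"], ["word", "manifold"], ["word", "word", "graph"]]
X_s, vocab = tfidf_matrix(docs)   # X_s has shape (m, n) = (4, 3)
```

Storing one text per column matches the m × n text data matrix used in the factorization step.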
3. The construction of several text manifolds and word manifolds in step 2) comprises:
2.1) A manifold structure reflects the intrinsic structure of data and is constructed through a graph Laplacian matrix. The text manifold and the word manifold reflect the intrinsic structure of the text data and of the word data respectively;
2.2) Construct the graph Laplacian matrix $L_s$ of the texts. First obtain n web page texts from the Internet; let $x_i^s$ denote the feature representation of the i-th text and $x_j^s$ that of the j-th text. Treat each text as a vertex of an undirected graph. If the Euclidean distance between two texts is small, connect the corresponding vertices with an edge and assign the edge a weight. This yields an undirected graph that reflects the manifold structure of the text data. The weights between texts form an n × n weight matrix $W_s$. Sum each column of $W_s$ and place the sums on the diagonal of a diagonal matrix $D_s$, whose off-diagonal elements are all 0. The graph Laplacian matrix of the texts is then $L_s = D_s - W_s$;
2.3) Construct several text graph Laplacian matrices $L_s$ by assigning different weights $W_s$ to the edges of the undirected graph, using three weighting schemes: binary weights, cosine similarity, and Gaussian kernel weights. If $x_i^s$ and $x_j^s$ are far apart in Euclidean distance, so that no edge joins the two vertices, the edge weight of the two texts is 0. If they are close, so that an edge joins the two vertices, then:
A. with binary weights, the edge weight of the two texts is 1;
B. with cosine similarity, the edge weight of the two texts is $\frac{(x_i^s)^T x_j^s}{\|x_i^s\|\,\|x_j^s\|}$, where $(\cdot)^T$ denotes the transpose of a vector or matrix;
C. with Gaussian kernel weights, the edge weight of the two texts is $\exp\left(-\frac{\|x_i^s - x_j^s\|^2}{2\sigma^2}\right)$, where $\|\cdot\|$ denotes the $l_2$ norm of a vector and the real parameter $\sigma > 0$ is the bandwidth of the Gaussian kernel; setting different bandwidths gives different Gaussian kernel weights;
2.4) Construct the graph Laplacian matrix $L_f$ of the words. By the duality between texts and words, the feature representation of each word has dimension n; let $x_i^f$ denote the feature representation of the i-th word and $x_j^f$ that of the j-th word. Treat each word as a vertex of an undirected graph. If the Euclidean distance between two words is small, connect the corresponding vertices with an edge and assign the edge a weight. This yields an undirected graph that reflects the manifold structure of the word data. The weights between words form an m × m weight matrix $W_f$. Sum each column of $W_f$ and place the sums on the diagonal of a diagonal matrix $D_f$, whose off-diagonal elements are all 0. The graph Laplacian matrix of the words is then $L_f = D_f - W_f$;
2.5) Construct several word graph Laplacian matrices $L_f$ in the same way as the several text graph Laplacian matrices $L_s$.
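Steps 2.2) to 2.5) can be sketched with one routine that is applied to the text matrix and to its transpose. Several details here are assumptions rather than the patent's prescription: "close in Euclidean distance" is interpreted as a symmetric k-nearest-neighbour rule, the Gaussian weight is taken as exp(-d²/(2σ²)), and the name `graph_laplacian` and the toy data are hypothetical.

```python
import numpy as np


def graph_laplacian(X, k=2, scheme="binary", sigma=1.0):
    """Laplacian L = D - W of a symmetric k-nearest-neighbour graph.

    X holds one sample per column (texts: m x n; for words pass X.T).
    """
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)

    # Connect i and j when either is among the other's k nearest neighbours.
    adj = np.zeros((n, n), dtype=bool)
    order = np.argsort(dist2, axis=1)
    for i in range(n):
        neighbours = [j for j in order[i] if j != i][:k]
        adj[i, neighbours] = True
    adj = adj | adj.T

    if scheme == "binary":
        weights = np.ones((n, n))
    elif scheme == "cosine":
        norms = np.sqrt(sq)
        norms[norms == 0] = 1.0
        weights = (X.T @ X) / np.outer(norms, norms)
    elif scheme == "gaussian":
        weights = np.exp(-dist2 / (2.0 * sigma**2))
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    W = np.where(adj, weights, 0.0)   # weight 0 where no edge exists
    D = np.diag(W.sum(axis=0))        # column sums on the diagonal
    return D - W


rng = np.random.default_rng(0)
X_s = rng.random((5, 4))              # toy data: m = 5 words, n = 4 texts
schemes = ("binary", "cosine", "gaussian")
text_laplacians = [graph_laplacian(X_s, scheme=s) for s in schemes]    # each n x n
word_laplacians = [graph_laplacian(X_s.T, scheme=s) for s in schemes]  # each m x m
```

Passing `X_s.T` reuses the same code for words, which mirrors the duality between the two data matrices.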
4. The multi-manifold association matrix factorization in step 2) comprises:
3.1) Suppose n texts are obtained from the Internet and that they cover $c_s$ themes. The feature representation of each text is one column of a matrix, so all texts together form an m × n data matrix $X_s$. The texts are made up of m words covering $c_f$ themes; the feature representation of each word is one column of a matrix, so all words together form an n × m data matrix $X_f$. Because of the duality between texts and words, $X_f = X_s^T$. The text and word data matrices are merged into an (n+m) × (n+m) association matrix
$$R = \begin{bmatrix} O & X_f \\ X_s & O \end{bmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words;
3.2) Factorize the text data matrix into three parts, $X_s = V_f S_f V_s^T$, where the m × $c_f$ matrix $V_f$ is the low-dimensional representation of the words, the n × $c_s$ matrix $V_s$ is the low-dimensional representation of the texts, and the $c_f$ × $c_s$ matrix $S_f$ is the compressed representation of the word data. Similarly, factorize the word data matrix into three parts, $X_f = V_s S_s V_f^T$, where the $c_s$ × $c_f$ matrix $S_s$ is the compressed representation of the text data. This gives the (n+m) × ($c_s$+$c_f$) association low-dimensional representation matrix
$$V = \begin{bmatrix} V_s & O \\ O & V_f \end{bmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words and by the numbers of themes involved, and the ($c_s$+$c_f$) × ($c_s$+$c_f$) association matrix
$$S = \begin{bmatrix} O & S_s \\ S_f & O \end{bmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of themes of the texts and words. With the blocks in this order, $V S V^T$ has the same block structure as $R$;
3.3) Using the different weighting schemes, construct q text manifolds $L_s^1, \dots, L_s^q$ and q word manifolds $L_f^1, \dots, L_f^q$, and build q association manifold matrices of size (n+m) × (n+m). The i-th association manifold matrix is
$$L_i = \begin{bmatrix} L_s^i & O \\ O & L_f^i \end{bmatrix},$$
where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words. To approximate the true data manifold more closely, assign each manifold a weight coefficient $\mu_i > 0$ and form the linear combination of the manifolds, $L = \sum_{i=1}^{q} \mu_i L_i$, subject to $\sum_{i=1}^{q} \mu_i = 1$;
3.4) The multi-manifold association matrix factorization minimizes the regularized objective function
$$\min_{V} \left\{ \left\| R - V S V^T \right\|_F^2 + \alpha \, \mathrm{Tr}\!\left[ V^T \left( \sum_{i=1}^{q} \mu_i L_i \right) V \right] + \beta \left\| \mu \right\|^2 \right\},$$
$$\text{s.t.} \quad \sum_{i=1}^{q} \mu_i = 1, \quad \mu \ge 0, \quad V \ge 0,$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\|\cdot\|$ is the $l_2$ norm of a vector, and $\mathrm{Tr}(\cdot)$ is the trace of a matrix. The regularization factors $\alpha > 0$ and $\beta > 0$ control the contribution of the manifold structure and guard against overfitting, respectively. The low-dimensional text representation obtained by solving this problem approximates the intrinsic structure of the original text data while preserving the local geometry of both the text data and the word data, so that texts on the same theme lie as close together as possible.
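The block structure of steps 3.1) to 3.4) can be checked numerically. The sketch below only assembles the matrices and evaluates the objective; it does not implement a solver, which the text above does not spell out. The helper names `block_diag2`, `anti_diag2` and `objective`, the toy sizes, and the identity matrices standing in for the graph Laplacians are all assumptions made for illustration.

```python
import numpy as np


def block_diag2(A, B):
    """Return [[A, O], [O, B]] with zero blocks of matching size."""
    return np.block([
        [A, np.zeros((A.shape[0], B.shape[1]))],
        [np.zeros((B.shape[0], A.shape[1])), B],
    ])


def anti_diag2(top_right, bottom_left):
    """Return [[O, top_right], [bottom_left, O]]."""
    r1, c2 = top_right.shape
    r2, c1 = bottom_left.shape
    return np.block([
        [np.zeros((r1, c1)), top_right],
        [bottom_left, np.zeros((r2, c2))],
    ])


def objective(R, V, S, laplacians, mu, alpha, beta):
    """Reconstruction error + manifold regularizer + penalty on mu."""
    L = sum(w * Li for w, Li in zip(mu, laplacians))
    fit = np.linalg.norm(R - V @ S @ V.T, "fro") ** 2
    smooth = np.trace(V.T @ L @ V)
    return fit + alpha * smooth + beta * float(np.dot(mu, mu))


rng = np.random.default_rng(0)
n, m, c_s, c_f = 6, 5, 2, 3                      # toy sizes (hypothetical)
V_s, V_f = rng.random((n, c_s)), rng.random((m, c_f))
S_f = rng.random((c_f, c_s))
S_s = S_f.T                                      # makes X_f equal X_s^T exactly

X_s = V_f @ S_f @ V_s.T                          # m x n text data matrix
X_f = V_s @ S_s @ V_f.T                          # n x m word data matrix

R = anti_diag2(X_f, X_s)                         # (n+m) x (n+m)
V = block_diag2(V_s, V_f)                        # (n+m) x (c_s+c_f)
S = anti_diag2(S_s, S_f)                         # (c_s+c_f) x (c_s+c_f)

# Placeholder Laplacians: identity blocks stand in for the graph Laplacians.
laplacians = [block_diag2(np.eye(n), np.eye(m)) for _ in range(3)]
mu = np.full(3, 1.0 / 3.0)                       # weights sum to 1

value = objective(R, V, S, laplacians, mu, alpha=0.1, beta=0.01)
```

Because the toy data are generated from the factors themselves, the reconstruction term vanishes here and only the two regularization terms contribute to `value`.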
The accessible text presentation method based on multi-manifold association matrix factorization proposed by the present invention has the following advantages: it exploits the duality of texts and words and clusters the statistical feature representations of the texts so that similar texts are presented in groups; it applies to web page text of all types and requires no manual back-end operation; and it can be used both to help people with disabilities read web page text without barriers and to help general users read text more efficiently.
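The final grouping step can be illustrated as follows. The description names k-means only as an example of a clustering algorithm, so the plain Lloyd iteration below, the function names `kmeans` and `group_by_theme`, and the toy low-dimensional representation are assumptions for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of `points`; returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):              # keep the old centre if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def group_by_theme(titles, labels):
    """Collect text titles into one list per cluster label."""
    groups = {}
    for title, label in zip(titles, labels):
        groups.setdefault(int(label), []).append(title)
    return groups


# Toy low-dimensional text representation V_s (n = 6 texts, c_s = 2).
V_s = np.array([[0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
                [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]])
titles = ["sports-1", "sports-2", "sports-3", "film-1", "film-2", "film-3"]

labels = kmeans(V_s, k=2)
for theme, members in sorted(group_by_theme(titles, labels).items()):
    print(f"Theme {theme}: {', '.join(members)}")
```

Each printed line corresponds to one group of texts that would be shown together; the numeric theme labels are arbitrary and depend on the initialization.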
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description
With reference to the drawing, the embodiment carries out the method in the steps set out in the Summary of the invention above.
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the present invention should not be regarded as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive of on the basis of the inventive concept.

Claims (1)

1. An accessible text presentation method based on multi-manifold association matrix factorization, characterized in that, after web page text is crawled from the Internet, the following operations are performed on the text:
1) segmenting the text into words and extracting its statistical features, namely term frequency and inverse document frequency, to form the TF-IDF vector representation of the text;
2) constructing several text manifolds and word manifolds, and applying multi-manifold association matrix factorization, which accounts for the duality between texts and words, to obtain low-dimensional representations of the texts and of the words;
3) clustering the low-dimensional text representations so that texts on the same or similar themes are placed in one group, and then presenting the text information group by group;
wherein the extraction of text statistical features in step 1) comprises:
1.1) treating each web page text as a document and extracting two statistics from it, term frequency (TF) and inverse document frequency (IDF); if m distinct words occur in the texts, each text is represented by an m-dimensional TF-IDF feature vector;
1.2) normalizing the TF-IDF feature representations of all texts in a uniform way;
wherein the construction of several text manifolds and word manifolds in step 2) comprises:
2.1) a manifold structure reflects the intrinsic structure of data and is constructed through a graph Laplacian matrix; the text manifold and the word manifold reflect the intrinsic structure of the text data and of the word data respectively;
2.2) constructing the graph Laplacian matrix $L_s$ of the texts: first obtaining n web page texts from the Internet, the feature representation of the i-th text being $x_i^s$ and that of the j-th text being $x_j^s$; treating each text as a vertex of an undirected graph and, if the Euclidean distance between two texts is small, connecting the corresponding vertices with an edge and assigning the edge a weight, thereby building an undirected graph that reflects the manifold structure of the text data; the weights between texts forming an n × n weight matrix $W_s$; summing each column of $W_s$ and placing the sums on the diagonal of a diagonal matrix $D_s$, whose off-diagonal elements are all 0; and obtaining the graph Laplacian matrix of the texts as $L_s = D_s - W_s$;
2.3) constructing several text graph Laplacian matrices $L_s$ by assigning different weights $W_s$ to the edges of the undirected graph, using three weighting schemes: binary weights, cosine similarity, and Gaussian kernel weights; if $x_i^s$ and $x_j^s$ are far apart in Euclidean distance, so that no edge joins the two vertices, the edge weight of the two texts is 0; if they are close, so that an edge joins the two vertices, then:
A. with binary weights, the edge weight of the two texts is 1;
B. with cosine similarity, the edge weight of the two texts is $\frac{(x_i^s)^T x_j^s}{\|x_i^s\|\,\|x_j^s\|}$, where $(\cdot)^T$ denotes the transpose of a vector or matrix;
C. with Gaussian kernel weights, the edge weight of the two texts is $\exp\left(-\frac{\|x_i^s - x_j^s\|^2}{2\sigma^2}\right)$, where $\|\cdot\|$ denotes the $l_2$ norm of a vector and the real parameter $\sigma > 0$ is the bandwidth of the Gaussian kernel; setting different bandwidths gives different Gaussian kernel weights;
2.4) constructing the graph Laplacian matrix $L_f$ of the words: by the duality between texts and words, the feature representation of each word has dimension n, the feature representation of the i-th word being $x_i^f$ and that of the j-th word being $x_j^f$; treating each word as a vertex of an undirected graph and, if the Euclidean distance between two words is small, connecting the corresponding vertices with an edge and assigning the edge a weight, thereby building an undirected graph that reflects the manifold structure of the word data; the weights between words forming an m × m weight matrix $W_f$; summing each column of $W_f$ and placing the sums on the diagonal of a diagonal matrix $D_f$, whose off-diagonal elements are all 0; and obtaining the graph Laplacian matrix of the words as $L_f = D_f - W_f$;
2.5) constructing several word graph Laplacian matrices $L_f$ in the same way as the several text graph Laplacian matrices $L_s$;
wherein the multi-manifold association matrix factorization in step 2) comprises:
3.1) supposing that n texts are obtained from the Internet and that they cover $c_s$ themes, the feature representation of each text being one column of a matrix, so that all texts together form an m × n data matrix $X_s$; the texts being made up of m words covering $c_f$ themes, the feature representation of each word being one column of a matrix, so that all words together form an n × m data matrix $X_f$; because of the duality between texts and words, $X_f = X_s^T$; merging the text and word data matrices into an (n+m) × (n+m) association matrix $R = \begin{bmatrix} O & X_f \\ X_s & O \end{bmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words;
3.2) factorizing the text data matrix into three parts, $X_s = V_f S_f V_s^T$, where the m × $c_f$ matrix $V_f$ is the low-dimensional representation of the words, the n × $c_s$ matrix $V_s$ is the low-dimensional representation of the texts, and the $c_f$ × $c_s$ matrix $S_f$ is the compressed representation of the word data; similarly, factorizing the word data matrix into three parts, $X_f = V_s S_s V_f^T$, where the $c_s$ × $c_f$ matrix $S_s$ is the compressed representation of the text data; thereby obtaining the (n+m) × ($c_s$+$c_f$) association low-dimensional representation matrix $V = \begin{bmatrix} V_s & O \\ O & V_f \end{bmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words and by the numbers of themes involved, and the ($c_s$+$c_f$) × ($c_s$+$c_f$) association matrix $S = \begin{bmatrix} O & S_s \\ S_f & O \end{bmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of themes of the texts and words;
3.3) using the different weighting schemes, constructing q text manifolds $L_s^1, \dots, L_s^q$ and q word manifolds $L_f^1, \dots, L_f^q$, and building q association manifold matrices of size (n+m) × (n+m), the i-th association manifold matrix being $L_i = \begin{bmatrix} L_s^i & O \\ O & L_f^i \end{bmatrix}$, where O denotes an all-zero matrix whose dimensions are determined by the numbers of texts and words; to approximate the true data manifold more closely, assigning each manifold a weight coefficient $\mu_i > 0$ and forming the linear combination of the manifolds, $L = \sum_{i=1}^{q} \mu_i L_i$, subject to $\sum_{i=1}^{q} \mu_i = 1$;
3.4) minimizing, by the multi-manifold association matrix factorization, the regularized objective function
$$\min_{V} \left\{ \left\| R - V S V^T \right\|_F^2 + \alpha \, \mathrm{Tr}\!\left[ V^T \left( \sum_{i=1}^{q} \mu_i L_i \right) V \right] + \beta \left\| \mu \right\|^2 \right\}, \quad \text{s.t.} \quad \sum_{i=1}^{q} \mu_i = 1, \quad \mu \ge 0, \quad V \ge 0,$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\|\cdot\|$ is the $l_2$ norm of a vector, $\mathrm{Tr}(\cdot)$ is the trace of a matrix, and the regularization factors $\alpha > 0$ and $\beta > 0$ control the contribution of the manifold structure and guard against overfitting, respectively; the low-dimensional text representation obtained by solving this problem approximates the intrinsic structure of the original text data while preserving the local geometry of both the text data and the word data, so that texts on the same theme lie as close together as possible.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201310217406.0A (CN103345471B) | 2013-06-03 | 2013-06-03 | An accessible text presentation method based on multi-manifold association matrix factorization

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201310217406.0A (CN103345471B) | 2013-06-03 | 2013-06-03 | An accessible text presentation method based on multi-manifold association matrix factorization

Publications (2)

Publication Number | Publication Date
CN103345471A | 2013-10-09
CN103345471B | 2016-08-10

Family

ID=49280266

Family Applications (1)

Application Number | Status | Title | Priority Date | Filing Date
CN201310217406.0A (CN103345471B) | Active | An accessible text presentation method based on multi-manifold association matrix factorization | 2013-06-03 | 2013-06-03

Country Status (1)

Country | Link
CN (1) | CN103345471B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN106844769A * | 2017-02-27 | 2017-06-13 | 百度在线网络技术(北京)有限公司 | Information flow recommendation method and apparatus combining a pass-through mode and a time-limited reading mode
CN107967483B * | 2017-11-09 | 2021-03-30 | 上海电机学院 | Approximate image compression coding method
CN108334494B * | 2018-01-23 | 2022-01-25 | 创新先进技术有限公司 | Method and device for constructing user relationship network

Citations (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN101986295A * | 2010-10-28 | 2011-03-16 | 浙江大学 | Image clustering method based on manifold sparse coding
CN102831161A * | 2012-07-18 | 2012-12-19 | 天津大学 | Semi-supervised learning-to-rank method for image retrieval based on manifold regularization

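Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

Title | Authors | Journal | Date | Volume and issue | Pages
Web document classification algorithm based on manifold learning and SVM (基于流形学习和SVM的Web文档分类算法) * | 王自强 (Wang Ziqiang) et al. | 《计算机工程》 (Computer Engineering) | 2009-08-31 | Vol. 35, No. 15 | 38-40
Duality strategy for weight computation in text clustering (文本聚类中权重计算的对偶性策略) * | 卜东波 (Bu Dongbo) et al. | 《软件学报》 (Journal of Software) | 2002-11-30 | Vol. 13, No. 11 | 2083-2089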

Also Published As

Publication number | Publication date
CN103345471A | 2013-10-09

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant