CN102521402A - Text filtering system and method - Google Patents
- Publication number
- CN102521402A
- Authority
- CN
- China
- Prior art keywords
- text
- filtering
- ontology
- filtered
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a text filtering system and method. The system comprises at least: an ontology library building module, which builds an ontology library according to the user's filtering requirements; an adaptive learning module, which trains on a set of filtering samples to dynamically adjust the ontology library built by the ontology library building module so that it gradually approaches the user's filtering requirements; and a text filtering module, which preprocesses the text to be filtered, extracts a feature word set, and performs similarity matching to obtain the relevance between the text to be filtered and the ontology, and then filters the text according to that relevance. The invention can accurately express the user's filtering model, can learn autonomously during filtering to adjust the ontology-based user filtering model, and can dynamically adjust the filtering threshold to achieve a better filtering effect.
Description
Technical Field
The present invention relates to a text filtering system and method, and in particular to an ontology-based adaptive text filtering system and method.
Background Art
Text filtering has long been an active research topic in information retrieval and filtering, and the literature in China and abroad describes a variety of methods for implementing it.
Current text filtering methods mainly include fuzzy-clustering text filtering based on genetic algorithms, text filtering using improved classification algorithms, text filtering using adaptive learning filtering algorithms, and text filtering that relies on an ontology alone. The genetic-algorithm-based fuzzy clustering method directly clusters each individual in the population using a fuzzy similarity matrix and then evaluates the population's fitness with the proposed fitness function according to the clustering result; its filtering accuracy, however, depends on the quality of the clustering, and it cannot express the user's filtering requirements well. The method using an improved classification algorithm filters undesirable text by improving the traditional KNN algorithm at the data level; it likewise does not express the user's requirements precisely enough. The method using an adaptive learning filtering algorithm can learn adaptively from a training sample set and adjust the filtering model, but its expression of the user's filtering requirements is also not precise enough. For the ontology-only method, filtering accuracy depends on how the ontology is built: if the ontology library is not created accurately enough, the accuracy of text filtering suffers greatly.
In summary, prior-art text filtering methods either express the user's requirements imprecisely or depend on an ontology library whose imprecise construction degrades filtering accuracy. An improved technical means is therefore needed to solve this problem.
Summary of the Invention
To overcome the above shortcomings of the prior art, the main object of the present invention is to provide a text filtering system and method that can accurately express the user's filtering model, learn autonomously during filtering to adjust the ontology-based user filtering model, and dynamically adjust the filtering threshold to achieve a better filtering effect.
To achieve the above and other objects, the present invention provides a text filtering system comprising at least:
an ontology library building module, configured to build an ontology library according to the user's filtering requirements;
an adaptive learning module, which trains on a set of filtering samples to dynamically adjust the ontology library built by the ontology library building module so that it gradually approaches the user's filtering requirements; and
a text filtering module, which preprocesses the text to be filtered, extracts a feature word set, and performs similarity matching to obtain the relevance between the text to be filtered and the ontology, and filters the text to be filtered according to that relevance.
Further, the ontology library building module comprises at least:
a domain determination module, configured to determine, according to the user's filtering requirements, the domain and scope that the ontology to be built covers;
a collection and analysis module, configured to collect and analyze information within the domain of the ontology, identify the key concepts and the relationships among them, and express them in precise terms; and
an ontology framework building module, configured to build the ontology framework from the collection and analysis results.
Further, the ontology is represented by a triple Topic(C, P, S), where C is the set of concept classes abstracted from noun concepts in the filtering domain that share the same attributes and behavioral structure; P describes the attributes of concepts and relationships; and S denotes the structural relationships between classes, such as parent class and subclass.
Further, the adaptive learning module uses an incremental iterative method to train on a set of filtering samples and thereby dynamically adjust the ontology library built by the ontology library building module.
Further, the text filtering module comprises at least:
a preprocessing module, configured to remove stop words from the text to be filtered;
a feature word set extraction module, configured to extract from the text to be filtered the feature words that express its content, assign each a weight according to its position and frequency, and sum the weights of identical feature words to form the text feature word set;
a similarity calculation module, which calculates the relevance between the text to be filtered and the ontology according to the vector space model; and
a filtering module, which filters the text to be filtered according to the relevance and a set threshold.
Further, the filtering module filters out those texts to be filtered whose relevance is below the threshold.
To achieve the above and other objects, the present invention also provides a text filtering method comprising at least the following steps:
building an ontology library according to the user's filtering requirements;
training on a set of filtering samples to dynamically adjust the ontology library so that it gradually approaches the user's filtering requirements; and
preprocessing the text to be filtered, extracting a feature word set, and performing similarity matching to obtain the relevance between the text to be filtered and the ontology, and filtering the text to be filtered according to that relevance.
Further, the step of building an ontology library according to the user's filtering requirements comprises at least the following steps:
determining, according to the user's filtering requirements, the domain and scope that the ontology to be built covers;
collecting and analyzing information within the domain of the ontology, identifying the key concepts and the relationships among them, and expressing them in precise terms; and
building the ontology framework from the collection and analysis results.
Further, the dynamic adjustment of the ontology library is implemented with an incremental iterative method.
Further, the step of filtering the text to be filtered comprises at least the following steps:
removing stop words from the text to be filtered;
extracting the feature words that express the content of the text to be filtered, assigning each a weight according to its position and frequency, and summing the weights of identical feature words to form the text feature word set;
calculating the relevance between the text to be filtered and the ontology according to the vector space model; and
filtering the text to be filtered according to the relationship between a set threshold and the relevance.
Compared with the prior art, the text filtering system and method of the present invention express the user's filtering requirements fairly precisely by building an ontology library. To bring the ontology library still closer to those requirements, the invention applies adaptive learning: it trains on a set of samples and dynamically adjusts part of the ontology library. This overcomes the low filtering accuracy that results when the traditional feature vector method, or the general method of building an ontology library, expresses user requirements imprecisely. In addition, in the filtering stage the invention uses the vector space model to calculate the similarity between the text to be filtered and the ontology library, filters out texts whose similarity is below the threshold, and can dynamically adjust the filtering threshold to achieve a better filtering effect. Practice has shown that this ontology-based adaptive text filtering method achieves relatively high filtering accuracy.
Brief Description of the Drawings
FIG. 1 is a system architecture diagram of a text filtering system according to the present invention;
FIG. 2 is a flowchart of the steps of a text filtering method according to the present invention.
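Detailed Description
Embodiments of the present invention are described below through specific examples with reference to the accompanying drawings; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other specific examples, and the details in this specification may be modified and changed for different viewpoints and applications without departing from the spirit of the invention.
FIG. 1 is a system architecture diagram of a text filtering system according to the present invention. As shown in FIG. 1, the text filtering system comprises at least an ontology library building module 10, an adaptive learning module 11, and a text filtering module 12.
The ontology library building module 10 builds an ontology library according to the user's filtering requirements and comprises at least a domain determination module 101, a collection and analysis module 102, and an ontology framework building module 103. The domain determination module 101 first determines, according to the user's filtering requirements, the domain and scope that the ontology to be built covers. The collection and analysis module 102 collects and analyzes information within the domain of the ontology, identifies the key concepts and the relationships among them, and expresses them in precise terms. For example, in a preferred embodiment of the invention, the ontology is represented by a triple Topic(C, P, S), where C is the set of concept classes abstracted from noun concepts in the filtering domain that share the same attributes and behavioral structure; P describes the attributes of concepts and relationships; and S denotes the structural relationships between classes, such as parent class and subclass. C is represented with the vector space model (VSM) using pairs Ci(Keyi, Weighti), where Keyi is a keyword and Weighti is that keyword's weight. The ontology framework building module 103 builds the ontology framework from the results produced by the collection and analysis module 102.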
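The Topic(C, P, S) representation described above can be sketched as a small data structure. This is only an illustration: the class names `Concept` and `Topic`, the field layout, the `keyword_vector` helper, and the sample keywords are assumptions made here, not part of the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    """One concept class C_i, stored as VSM pairs (Key_i, Weight_i)."""
    name: str
    keywords: Dict[str, float] = field(default_factory=dict)  # Key_i -> Weight_i


@dataclass
class Topic:
    """Hypothetical container for the ontology triple Topic(C, P, S)."""
    concepts: Dict[str, Concept] = field(default_factory=dict)          # C
    properties: Dict[str, Dict[str, str]] = field(default_factory=dict)  # P
    structure: List[Tuple[str, str, str]] = field(default_factory=list)  # S

    def keyword_vector(self) -> Dict[str, float]:
        """Merge all concept keywords into one vector (assumed helper).

        Weights of a keyword that appears in several concepts are summed.
        """
        merged: Dict[str, float] = {}
        for concept in self.concepts.values():
            for key, weight in concept.keywords.items():
                merged[key] = merged.get(key, 0.0) + weight
        return merged


# Illustrative data only: two concept classes linked by a subclass relation.
topic = Topic()
topic.concepts["gambling"] = Concept("gambling", {"casino": 0.9, "bet": 0.6})
topic.concepts["lottery"] = Concept("lottery", {"bet": 0.2, "jackpot": 0.8})
topic.structure.append(("lottery", "subclass_of", "gambling"))
vector = topic.keyword_vector()
```

Flattening the concepts into one keyword vector is one possible way to compare the ontology with a document vector later; the patent does not specify how the concept vectors are combined.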
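The adaptive learning module 11 trains on a set of filtering samples to dynamically adjust the ontology library built by the ontology library building module 10 so that it gradually approaches the user's filtering requirements. In a preferred embodiment of the invention, the adaptive learning module 11 trains on the filtering samples with an incremental iterative method. A fixed value m is set as the size of the window used to observe how many new documents that need to be filtered out have appeared; m is set flexibly according to the parameter n of the evaluation metric, and the number of training iterations is set to 5. During incremental iterative training, the number of feature terms added in each round must be determined so as to avoid introducing more noise: from the newly found effective feature values, a certain number are selected and added to the existing ontology library, enriching the model of the user's filtering requirements. As learning continues, the ontology library therefore moves closer and closer to the user's filtering requirements, and the number of features it still needs gradually decreases.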
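A minimal sketch of such an incremental iterative update follows. The patent fixes only the number of iterations (5) and the idea of capping how many feature terms are added per round; the scoring rule (document frequency in relevant samples minus document frequency in irrelevant ones), the `per_round` cap, the weight formula, and the function name are all assumptions made for illustration. The observation window m is not modelled.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[List[str], bool]  # (tokens, is_relevant), an assumed sample format


def incremental_update(ontology: Dict[str, float],
                       samples: List[Sample],
                       rounds: int = 5,
                       per_round: int = 2) -> Dict[str, float]:
    """Grow a keyword->weight ontology vector over `rounds` training passes."""
    updated = dict(ontology)
    batch_size = max(1, -(-len(samples) // rounds))  # ceiling division
    for r in range(rounds):
        seen = samples[: (r + 1) * batch_size]  # training set grows each round
        pos, neg = Counter(), Counter()
        for tokens, is_relevant in seen:
            (pos if is_relevant else neg).update(set(tokens))
        # Assumed score: relevant document frequency minus irrelevant one.
        scores = {t: pos[t] - neg[t] for t in pos if t not in updated}
        candidates = [t for t in scores if scores[t] > 0]
        # Cap the number of new feature terms per round to limit noise.
        best = sorted(candidates, key=lambda t: (-scores[t], t))[:per_round]
        n_pos = max(1, sum(1 for _, rel in seen if rel))
        for term in best:
            updated[term] = scores[term] / n_pos  # assumed weight in (0, 1]
    return updated


# Illustrative samples only.
samples = [
    (["casino", "bonus", "win"], True),
    (["meeting", "agenda"], False),
    (["casino", "jackpot", "win"], True),
    (["win", "match", "team"], False),
    (["jackpot", "bonus"], True),
]
result = incremental_update({"casino": 1.0}, samples, rounds=5, per_round=1)
```

Terms that occur only in irrelevant samples never enter the ontology vector, while terms seen early in relevant samples are added one at a time, which mirrors the patent's concern about adding too many features at once.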
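The text filtering module 12 preprocesses the text to be filtered, extracts a feature word set, and performs similarity matching, and then filters the text according to its relevance to the ontology. It comprises at least a preprocessing module 121, a feature word set extraction module 122, a similarity calculation module 123, and a filtering module 124. The preprocessing module 121 applies preprocessing operations such as stop-word removal to the text to be filtered. The feature word set extraction module 122 extracts the feature words that express the content of the text and assigns each a weight according to its position and frequency; the weights of identical feature words are summed, forming the text feature word set Ti = {(Word1k, Weight1k)}, so that the text to be filtered is represented as a feature vector. The similarity calculation module 123 relies on the vector space model, in which the cosine of the angle between two feature vectors expresses their relevance, to calculate the relevance Simj between a text to be filtered and the ontology. The filtering module 124 then filters the text according to the relevance Simj and a set threshold, that is, it filters out texts whose relevance is below the threshold.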
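The four sub-modules can be sketched end to end as below. The stop-word list, the whitespace tokenizer, the position weights (a title word counting twice as much as a body word), the threshold value, and all function names are assumptions chosen for illustration; the patent states only that weights depend on position and frequency and that texts below the threshold are filtered out.

```python
import math
from collections import defaultdict
from typing import Dict

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}  # illustrative only


def feature_word_set(title: str, body: str,
                     title_weight: float = 2.0,
                     body_weight: float = 1.0) -> Dict[str, float]:
    """Remove stop words, weight words by position, sum repeated words."""
    weights: Dict[str, float] = defaultdict(float)
    for text, weight in ((title, title_weight), (body, body_weight)):
        for token in text.lower().split():
            if token not in STOP_WORDS:
                weights[token] += weight  # frequency enters through the sum
    return dict(weights)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def passes_filter(title: str, body: str,
                  ontology_vector: Dict[str, float],
                  threshold: float = 0.3) -> bool:
    """Keep a text only if its relevance reaches the threshold."""
    return cosine(feature_word_set(title, body), ontology_vector) >= threshold


ontology_vector = {"casino": 1.0, "jackpot": 0.5}  # illustrative weights
kept = passes_filter("casino jackpot", "win the jackpot", ontology_vector)  # True
dropped = passes_filter("team meeting", "agenda of the meeting", ontology_vector)  # False
```

A real deployment for Chinese text would need a word segmenter instead of `split()`; that step is outside what the patent describes.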
图2为本发明一种文本过滤方法的步骤流程图。如图2所示,本发明一种文本过滤方法,至少包括如下步骤:FIG. 2 is a flow chart of the steps of a text filtering method in the present invention. As shown in Figure 2, a kind of text filtering method of the present invention comprises the following steps at least:
步骤201,根据用户的过滤需求建立本体库。在该步骤中,首先根据用户的过滤需求,明确要构建的本体所覆盖的领域和范围确定本体的领域与范围;然后在本体所涉及的领域范围内进行信息的收集和分析,明确重点概念和概念之间的关系,并且用精确的术语表达出来;最后,建立本体框架。在本发明较佳实施例中,本体采取三元组Topic(C,P,S)来表示,其中:C表示由过滤领域内的名词概念抽象出来,具有相同属性和行为结构的概念类的集合;P描述概念和关系的属性;S表示类之间的结构关系,如父类、子类等,C采用向量空间模型(VSM)来表示,使用二元组Ci(Keyi,Weighti),其中Keyi表示关键词,Weighti表示关键词的权重。In
步骤202,对一组过滤样本进行训练学习以对所建立的本体库动态调整,使其逐渐接近于用户的过滤需求。在本发明较佳实施例中,采用增量式迭代方法对一组过滤样本进行训练,设定固定值m作为观察新的需要被过滤掉的文档出现数量的窗口大小,根据评测指标的参数n来灵活设置,并设训练迭代次数为5,在增量迭代训练过程中,需要确定每次增加的特征项数目,以避免产生更多的噪音,根据增加的有效特征值,选取一定数量的增加到已有的本体库中,丰富用户的过滤需求模型,因此随着不断的学习,本体库越来越接近于用户的过滤需求,本体库所必需的特征也逐渐减少。
步骤203,对待过滤文本进行预处理、抽取特征词集与相似度匹配处理后,根据待过滤文本与本体的相关度对待过滤文本进行过滤。其具体过程如下:首先对待过滤文本经过去除停用词等预处理操作;然后抽取出表达文本内容的特征词,并根据特征词不同的位置及频率赋予相应的权重,相同的特征词权重值相加,形成文本特征词集Ti={(Word1k,Weight1k)},这样待过滤的文本采用了特征向量来表示;接着根据向量空间模型,两特征向量夹角的余弦值可以表示它们的相关度。由此可以计算出一个待过滤的文本与本体的相关度Simj;最后根据设定的阈值与相关度Simj的关系对待过滤文本进行过滤,即对低于阈值的文本进行过滤。Step 203: After preprocessing the text to be filtered, extracting feature word sets and matching similarity, the text to be filtered is filtered according to the correlation between the text to be filtered and the ontology. The specific process is as follows: firstly, the filtered text is subjected to preprocessing operations such as removing stop words; then the feature words expressing the content of the text are extracted, and corresponding weights are given according to the different positions and frequencies of the feature words. Add to form the text feature word set Ti={(Word1k, Weight1k)}, so that the text to be filtered is represented by a feature vector; then according to the vector space model, the cosine value of the angle between the two feature vectors can represent their degree of correlation. Thus, a correlation degree Sim j between the text to be filtered and the ontology can be calculated; finally, the text to be filtered is filtered according to the relationship between the set threshold and the correlation degree Sim j , that is, texts below the threshold are filtered.
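The cosine relevance used in step 203 can be written out in its standard vector space model form. The patent does not print the formula; the symbols below are assumptions: w_{jk} is the weight of the k-th feature word in text j, c_k is the weight of the same word in the ontology vector, n is the size of the shared vocabulary, and θ is the set threshold.

```latex
\mathrm{Sim}_j
  = \cos\theta_j
  = \frac{\sum_{k=1}^{n} w_{jk}\, c_k}
         {\sqrt{\sum_{k=1}^{n} w_{jk}^{2}}\;\sqrt{\sum_{k=1}^{n} c_k^{2}}},
\qquad
\text{text } j \text{ is filtered out if } \mathrm{Sim}_j < \theta .
```

Because all weights are non-negative, Sim_j lies between 0 (no shared feature words) and 1 (identical direction), which is what makes a single threshold meaningful.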
It can be seen that, because an ontology can explicitly define the concepts of a domain and the relationships among them, the text filtering system and method of the present invention express the user's filtering requirements fairly precisely by building an ontology library. To bring the ontology library still closer to those requirements, the invention applies adaptive learning: it trains on a set of samples and dynamically adjusts part of the ontology library. This overcomes the low filtering accuracy that results when the traditional feature vector method, or the general method of building an ontology library, expresses user requirements imprecisely. In addition, in the filtering stage the invention uses the vector space model to calculate the similarity between the text to be filtered and the ontology library, filters out texts whose similarity is below the threshold, and can dynamically adjust the filtering threshold to achieve a better filtering effect. Practice has shown that this ontology-based adaptive text filtering method achieves relatively high filtering accuracy.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the invention. The scope of protection of the present invention shall therefore be as set out in the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110440801.6A CN102521402B (en) | 2011-12-23 | 2011-12-23 | Text filtering system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110440801.6A CN102521402B (en) | 2011-12-23 | 2011-12-23 | Text filtering system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102521402A true CN102521402A (en) | 2012-06-27 |
CN102521402B CN102521402B (en) | 2014-02-19 |
Family
ID=46292315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110440801.6A Expired - Fee Related CN102521402B (en) | 2011-12-23 | 2011-12-23 | Text filtering system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102521402B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102880636A (en) * | 2012-08-03 | 2013-01-16 | 深圳证券信息有限公司 | Bad information detection method and server |
CN103034726A (en) * | 2012-12-18 | 2013-04-10 | 上海电机学院 | Text filtering system and method |
CN103902619A (en) * | 2012-12-28 | 2014-07-02 | 中国移动通信集团公司 | Internet public opinion monitoring method and system |
CN104615714A (en) * | 2015-02-05 | 2015-05-13 | 北京中搜网络技术股份有限公司 | Blog duplicate removal method based on text similarities and microblog channel features |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
CN108428382A (en) * | 2018-02-14 | 2018-08-21 | 广东外语外贸大学 | It is a kind of spoken to repeat methods of marking and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751409A (en) * | 2008-11-28 | 2010-06-23 | 上海电机学院 | Application of immune system in search engine |
CN101794311A (en) * | 2010-03-05 | 2010-08-04 | 南京邮电大学 | Fuzzy data mining based automatic classification method of Chinese web pages |
CN101901247A (en) * | 2010-03-29 | 2010-12-01 | 北京师范大学 | A vertical search engine method and system constrained by domain ontology |
-
2011
- 2011-12-23 CN CN201110440801.6A patent/CN102521402B/en not_active Expired - Fee Related
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102880636A (en) * | 2012-08-03 | 2013-01-16 | 深圳证券信息有限公司 | Bad information detection method and server |
CN103034726A (en) * | 2012-12-18 | 2013-04-10 | 上海电机学院 | Text filtering system and method |
CN103034726B (en) * | 2012-12-18 | 2016-05-25 | 上海电机学院 | Text filtering system and method |
CN103902619A (en) * | 2012-12-28 | 2014-07-02 | 中国移动通信集团公司 | Internet public opinion monitoring method and system |
CN103902619B (en) * | 2012-12-28 | 2018-10-23 | 中国移动通信集团公司 | A kind of network public-opinion monitoring method and system |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
CN104615714A (en) * | 2015-02-05 | 2015-05-13 | 北京中搜网络技术股份有限公司 | Blog duplicate removal method based on text similarities and microblog channel features |
CN104615714B (en) * | 2015-02-05 | 2019-05-24 | 北京中搜云商网络技术有限公司 | Blog article rearrangement based on text similarity and microblog channel feature |
CN108428382A (en) * | 2018-02-14 | 2018-08-21 | 广东外语外贸大学 | It is a kind of spoken to repeat methods of marking and system |
Also Published As
Publication number | Publication date |
---|---|
CN102521402B (en) | 2014-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103034726B (en) | Text filtering system and method | |
CN102289522B (en) | Method of intelligently classifying texts | |
CN109902289B (en) | News video theme segmentation method oriented to fuzzy text mining | |
CN106960025B (en) | A personalized document recommendation method based on domain knowledge graph | |
CN107301171A (en) | A kind of text emotion analysis method and system learnt based on sentiment dictionary | |
CN103942340A (en) | Microblog user interest recognizing method based on text mining | |
CN107766324A (en) | A kind of text coherence analysis method based on deep neural network | |
CN107423339A (en) | Popular microblogging Forecasting Methodology based on extreme Gradient Propulsion and random forest | |
CN104866558B (en) | A kind of social networks account mapping model training method and mapping method and system | |
CN102521402A (en) | Text filtering system and method | |
CN107291886A (en) | A kind of microblog topic detecting method and system based on incremental clustering algorithm | |
CN102929861A (en) | Method and system for calculating text emotion index | |
CN102779510A (en) | Speech emotion recognition method based on feature space self-adaptive projection | |
CN111597328B (en) | New event theme extraction method | |
CN107679031B (en) | Advertisement and blog identification method based on stacking noise reduction self-coding machine | |
CN112132096B (en) | Behavior modal identification method of random configuration network for dynamically updating output weight | |
CN108710611A (en) | A kind of short text topic model generation method of word-based network and term vector | |
CN112347761B (en) | BERT-based drug relation extraction method | |
CN108804651A (en) | A kind of Social behaviors detection method based on reinforcing Bayes's classification | |
CN108710609A (en) | A kind of analysis method of social platform user information based on multi-feature fusion | |
CN110457711A (en) | A topic recognition method for social media events based on keywords | |
CN103778206A (en) | Method for providing network service resources | |
CN109697288A (en) | A kind of example alignment schemes based on deep learning | |
CN112489689A (en) | Cross-database voice emotion recognition method and device based on multi-scale difference confrontation | |
CN108268461A (en) | A kind of document sorting apparatus based on hybrid classifer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20140219 Termination date: 20161223 |