CN103729466A - Name country identification method based on WEB and GBBoosting algorithms - Google Patents
Name country identification method based on WEB and GBBoosting algorithms
- Publication number
- CN103729466A (application CN201410019885.XA)
- Authority
- CN
- China
- Prior art keywords
- rightarrow
- gbboosting
- algorithm
- web
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for identifying the country of origin of personal names based on the WEB and the GBBoosting algorithm, belonging to the technical field of WEB data mining. The method comprises the following steps. Step 1: extract the names of university scholars using WEB data extraction technology. Step 2: construct the GBBoosting algorithm by building weak classifiers, each of which outputs a weak classification hypothesis for an input sample, and fusing the weights of all weak classifiers into a strong classifier. Step 3: identify the country to which each name belongs using the GBBoosting algorithm. The method effectively solves the problem that names cannot be classified when two countries spell personal names similarly. It is also easier to implement than other existing classification methods and is better suited to engineering practice such as semantic country labeling of personal names or cities.
Description
Technical field
The invention belongs to the technical field of WEB data mining, and specifically relates to a method for identifying the country of origin of personal names based on the WEB and the GBBoosting algorithm.
Background
With the rapid development of the Internet and the growing abundance of WEB resources, WEB semantic analysis and text classification techniques have in recent years been widely applied in WEB data mining, in order to mine useful and meaningful data from massive volumes of information quickly and accurately. WEB-based applications have changed users' habits and ways of working to some extent, and are welcomed by a growing number of users.
Classification methods such as KNN and Bayesian classifiers have achieved good results in many fields. For example, Xie Mei et al. applied KNN to image processing and proposed a segmentation method with intensity inhomogeneity correction for MR images based on the KNN classification algorithm (Patent No. 201010583560.6, published 2011.07.27); Yang Liu et al. applied Bayesian classification to computer software and proposed an intelligent SMS classification and search method based on improved Bayesian classification (Patent No. 201310356056.6, published 2013.12.04). In the task of classifying personal names by country, however, the accuracy of these methods leaves room for improvement. In particular, when two countries spell personal names similarly, their accuracy is only marginally better than random guessing. These algorithms are therefore severely limited for name-country classification.
To address these shortcomings, the invention proposes GBBoosting, a Boosting-based algorithm aimed at the name-country classification problem. Compared with other classification algorithms, it substantially improves classification accuracy and recall, and it performs particularly well when two countries spell personal names similarly. Applying the GBBoosting algorithm to recognition scenarios such as the country of a personal name or of a city, in order to annotate names or cities with country semantics and in turn to serve the fast-growing social networking field, has considerable practical significance and broad application prospects.
Summary of the invention
In view of this, the object of the invention is to provide a method for identifying the country of origin of personal names based on the WEB and the GBBoosting algorithm. The method extracts the names of university scholars using WEB data extraction technology, constructs weak classifiers that each output a weak classification hypothesis for an input sample, fuses the weights of all weak classifiers into a strong classifier, and finally uses the GBBoosting algorithm to identify the country to which a name belongs.
To achieve this object, the invention provides the following technical solution:
A method for identifying the country of origin of personal names based on the WEB and the GBBoosting algorithm, comprising the following steps. Step 1: extract the names of university scholars using WEB data extraction technology. Step 2: construct the GBBoosting algorithm by building weak classifiers, each of which outputs a weak classification hypothesis for an input sample, and fusing the weights of all weak classifiers into a strong classifier. Step 3: identify the country to which each name belongs using the GBBoosting algorithm.
Further, in Step 1, the university's college pages are obtained through the GOOGLE search engine interface; semantic analysis of the college page then locates the page listing the college's scholars; finally, named entity recognition and semantic analysis are used to extract the scholar information from that page.
Further, in Step 2, the construction of a weak classifier specifically comprises:
1) Represent the two types of training text as vectors;
2) Compute the intermediate vector of the two training-text vectors by the corresponding formula (not reproduced here);
3) Compute the vector perpendicular to the intermediate vector by the corresponding formula (not reproduced here); for any test vector $a_i$, if $(w_i \cdot a_i) > 0$, label $a_i$ as $+1$, and if $(w_i \cdot a_i) < 0$, label $a_i$ as $-1$;
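The formulas for the intermediate vector and its perpendicular vector are not legible in this text. As a hedged illustration only, assuming the two training-text vectors $\vec a$ and $\vec b$ have been normalized to unit length, one choice consistent with the description is the bisector for the intermediate vector and the difference for the perpendicular vector. The symbols $\vec m$ and $\vec w$ are labels introduced here and may differ from the patent's own notation:

```latex
\vec m = \tfrac{1}{2}\,(\vec a + \vec b), \qquad
\vec w = \vec a - \vec b, \qquad
\vec w \cdot \vec m
  = \tfrac{1}{2}\left(\lVert \vec a \rVert^{2} - \lVert \vec b \rVert^{2}\right)
  = 0 .
```

Under this assumption, $\vec w \cdot \vec a = 1 - \vec a \cdot \vec b \ge 0$ and $\vec w \cdot \vec b = -(1 - \vec a \cdot \vec b) \le 0$, so the sign test with threshold 0 places the two training vectors on opposite sides whenever they are distinct.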
The weak classifiers are iterated and their weights are fused to form a strong classifier, as follows:
First, given two training sets $D_1 = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ and $D_2 = (y_1, y_2, \ldots, y_i, \ldots, y_n)$ and a test set $D_{Test} = (z_1, z_2, \ldots, z_i, \ldots, z_n)$, represent the training sets $D_1$, $D_2$ and the test set $D_{Test}$ in vector form;
Second, 1) randomly select $M$ ($N/5 < M < N$) samples from each of $D_1$ and $D_2$ to form the subsets $D_{11}$ and $D_{21}$, then add the vectors in each subset component-wise and normalize the sums to obtain two vectors; 2) following the construction of the linear classifier, obtain the vector perpendicular to the intermediate vector of these two vectors, which yields the weak classifier $h(x)_1$; after $p$ iterations, $p$ different perpendicular vectors and hence $p$ weak classifiers $h(x)_1, h(x)_2, \ldots, h(x)_p$ are obtained; finally, $H(x) = h(x)_1 + h(x)_2 + \cdots + h(x)_p$, that is, $H(x) = \sum_{j=1}^{p} h(x)_j$.
Further, in Step 3, the GBBoosting algorithm is applied to the names of the university scholars to identify the country to which each scholar belongs.
The beneficial effects of the invention are as follows: the invention provides a method for identifying the country of origin of personal names based on the WEB and the GBBoosting algorithm, which effectively solves the problem that names cannot be classified when two countries spell personal names similarly. The method is also easier to implement than other existing classification methods and is better suited to engineering practice such as semantic country labeling of personal names or cities.
Description of drawings
To make the object, technical solution, and beneficial effects of the invention clearer, the following drawings are provided:
Figure 1 is the high-level flow chart of the method of the invention;
Figure 2 is a diagram of the vector similarity calculation;
Figure 3 is a diagram of the weak classifier construction;
Figure 4 is the detailed flow chart of the method.
Detailed description
The preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Figure 1 is the high-level flow chart of the method. As shown in the figure, the method comprises the following steps. Step 1: extract the names of university scholars using WEB data extraction technology. Step 2: construct the GBBoosting algorithm by building weak classifiers, each of which outputs a weak classification hypothesis for an input sample, and fusing the weights of all weak classifiers into a strong classifier. Step 3: identify the country to which each name belongs using the GBBoosting algorithm.
Figure 4 is the detailed flow chart of the method; the specific implementation steps are described below with reference to Figure 4.
1. Extract the names of university scholars through WEB data extraction technology
1) Search for "university+computerscience" with the GOOGLE search engine to find the college homepage.
2) From the college homepage, find the page that lists all scholars of the college. The names of a school's scholars are generally listed under the corresponding college (department), so once the URL of the college (department) is found, the names and homepage addresses of its scholars can be obtained. This step looks for the URL of the computer science college (department) of the university in question. Comparing the URLs of the college (department) page and the college's scholar page yields two rules:
① The latter address contains the former address.
② The latter address also contains one of the features "people", "faculty", or "faculty&Advisors".
It suffices to traverse all links on the computer science college (department) page and keep the URLs that satisfy both rules and whose link text is "faculty" or "people". Experiments show that this generally leaves two URLs, because a college usually has a "people" menu with "faculty" as one of its submenu links, and the second URL is the one required. So when two URLs remain, the second is chosen; otherwise the first is chosen. Opening the selected URL then gives the names of all scholars and their personal homepages.
3) Extract the names and homepages of all scholars from the faculty page of the computer science college (department): extract all links on the faculty page, find the text of each link, and use named entity recognition to decide whether the text is a personal name.
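The two URL rules and the "second of two" selection can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `select_faculty_url`, the `(href, link_text)` input format, and the example URLs are assumptions, and fetching and parsing the department page is left out.

```python
from urllib.parse import urljoin

# Features that rule 2 looks for in the URL (from the text, lower-cased).
URL_FEATURES = ("people", "faculty", "faculty&advisors")
# Link texts the description accepts for a candidate link.
LINK_TEXTS = ("faculty", "people")


def select_faculty_url(department_url, links):
    """Pick the scholar-listing URL from (href, link_text) pairs.

    Both arguments are hypothetical inputs: the department page URL and
    the links already extracted from that page.
    """
    base = department_url.rstrip("/")
    candidates = []
    for href, text in links:
        url = urljoin(department_url, href)
        contains_base = base in url  # rule 1: latter address contains former
        has_feature = any(f in url.lower() for f in URL_FEATURES)  # rule 2
        text_ok = text.strip().lower() in LINK_TEXTS
        if contains_base and has_feature and text_ok and url not in candidates:
            candidates.append(url)
    if not candidates:
        return None
    # Two hits usually mean a "people" menu plus its "faculty" submenu;
    # the description keeps the second one, otherwise the first.
    return candidates[1] if len(candidates) >= 2 else candidates[0]
```

A call such as `select_faculty_url("https://cs.example.edu/", [("/people/", "People"), ("/people/faculty/", "Faculty")])` would return the faculty submenu URL, which would then be passed to the name-extraction step.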
2. Implement the GBBoosting algorithm: construct weak classifiers, each of which outputs a weak classification hypothesis for an input sample, and fuse the weights of all weak classifiers into a strong classifier.
The weak classifier builds on simple vector-space similarity, which compares two texts by the inner product of their vectors, that is, by the angle between the two vectors. As shown in Figure 2, the more similar two texts are, the smaller the angle between their vectors and the larger the cosine of that angle. As shown in Figure 3, the weak classifier improves on simple vector-space similarity by constructing a simple linear classifier. The steps are as follows:
Step 1: Represent the two types of training text as vectors.
Step 2: Compute the intermediate vector of the two training-text vectors and the vector perpendicular to it (formulas not reproduced here).
Step 3: This gives a $d$-dimensional vector and a threshold of 0. For any test vector $a_i$, if $(w_i \cdot a_i) > 0$, label $a_i$ as $+1$; if $(w_i \cdot a_i) < 0$, label $a_i$ as $-1$.
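The weak classifier described above can be sketched in plain Python. Because the patent's formulas are not legible in this text, the sketch assumes that the intermediate vector is the bisector of the two unit-length class vectors and that the perpendicular vector is their difference `a - b`; the function names are illustrative, not taken from the patent.

```python
import math


def normalize(v):
    """Scale v to unit length (v must not be the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between u and v; larger means more similar."""
    return dot(normalize(u), normalize(v))


def build_weak_classifier(a, b):
    """Return the weight vector w for class vectors a (+1) and b (-1).

    Assumption: with a and b normalized, their bisector (a + b) / 2 is the
    intermediate vector and w = a - b is perpendicular to it.
    """
    a, b = normalize(a), normalize(b)
    return [x - y for x, y in zip(a, b)]


def classify(w, x):
    """Label x by the sign of w . x, with threshold 0 as in the text."""
    score = dot(w, x)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0  # the text does not say how a score of exactly 0 is labeled
```

With `w = build_weak_classifier([1, 0], [0, 1])`, a test vector close to the first class, such as `[0.9, 0.1]`, gets a positive score, and one close to the second class gets a negative score.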
The weak classifier is the basis of the GBBoosting algorithm: each weak classifier outputs a weak classification hypothesis for an input sample, and the weights of all weak classifiers are fused into a strong classifier. Given two training sets $D_1 = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ and $D_2 = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, $M$ samples are randomly selected from each of $D_1$ and $D_2$ to generate two vectors, and the vector perpendicular to the intermediate vector of these two vectors is computed. Each sample in the test set $D_{Test} = (z_1, z_2, \ldots, z_i, \ldots, z_n)$ is then dot-multiplied with the resulting perpendicular vector $V$, and the sample's class is determined by the sign of the result. The steps are as follows:
Step 1: Given the two training sets $D_1 = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ and $D_2 = (y_1, y_2, \ldots, y_i, \ldots, y_n)$ and the test set $D_{Test} = (z_1, z_2, \ldots, z_i, \ldots, z_n)$, represent the training sets $D_1$, $D_2$ and the test set $D_{Test}$ in vector form.
Step 2: 1) Randomly select $M$ ($N/5 < M < N$) samples from each of $D_1$ and $D_2$ to form the subsets $D_{11}$ and $D_{21}$, then add the vectors in each subset component-wise and normalize the sums to obtain two vectors. 2) Following the construction of the linear classifier, obtain the vector perpendicular to the intermediate vector of these two vectors, which yields the weak classifier $h(x)_1$. After $p$ iterations, $p$ different perpendicular vectors and hence $p$ weak classifiers $h(x)_1, h(x)_2, \ldots, h(x)_p$ are obtained.
Step 3: $H(x) = h(x)_1 + h(x)_2 + \cdots + h(x)_p$, that is, $H(x) = \sum_{j=1}^{p} h(x)_j$.
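The three steps above can be sketched as a small ensemble in plain Python. This is an illustration under stated assumptions, not the patent's code: both training sets are taken to have the same size N, each weak classifier is assumed to be the difference of the two normalized subset vectors (perpendicular to their bisector), and each $h(x)_j$ is taken to be the raw dot product so that the sign of the sum gives the label. The names `unit_sum`, `train_gbboosting`, and `predict` are hypothetical.

```python
import math
import random


def unit_sum(vectors):
    """Add vectors component-wise and scale the sum to unit length."""
    total = [sum(column) for column in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total]


def train_gbboosting(d1, d2, rounds, seed=0):
    """Build `rounds` weak classifiers from random subsets of d1 and d2.

    Assumes len(d1) == len(d2) == N with N >= 2, and that each weight
    vector is a - b for the two unit subset vectors a and b.
    """
    rng = random.Random(seed)
    n = len(d1)
    weights = []
    for _ in range(rounds):
        # Pick a subset size M with N/5 < M < N.
        m = rng.randint(n // 5 + 1, n - 1)
        a = unit_sum(rng.sample(d1, m))
        b = unit_sum(rng.sample(d2, m))
        weights.append([x - y for x, y in zip(a, b)])
    return weights


def predict(weights, z):
    """H(z) = h_1(z) + ... + h_p(z); the sign of the sum is the label."""
    total = sum(sum(wi * zi for wi, zi in zip(w, z)) for w in weights)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # ties are not specified in the text
```

Drawing a fresh random subset in every round is what makes the p perpendicular vectors differ from one another; the fixed `seed` is only there to make the sketch reproducible.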
Finally, the preferred embodiments above serve only to illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail through these preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made without departing from the scope defined by the claims of the invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410019885.XA CN103729466B (en) | 2014-01-16 | 2014-01-16 | Name country origin recognition methods based on WEB and GBBoosting algorithms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410019885.XA CN103729466B (en) | 2014-01-16 | 2014-01-16 | Name country origin recognition methods based on WEB and GBBoosting algorithms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103729466A true CN103729466A (en) | 2014-04-16 |
CN103729466B CN103729466B (en) | 2017-07-04 |
Family
ID=50453540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410019885.XA Expired - Fee Related CN103729466B (en) | 2014-01-16 | 2014-01-16 | Name country origin recognition methods based on WEB and GBBoosting algorithms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103729466B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104484412A (en) * | 2014-12-16 | 2015-04-01 | 芜湖乐锐思信息咨询有限公司 | Big data analysis system based on multiform processing |
CN108108371A (en) * | 2016-11-24 | 2018-06-01 | 北京国双科技有限公司 | A kind of file classification method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080168070A1 (en) * | 2007-01-08 | 2008-07-10 | Naphade Milind R | Method and apparatus for classifying multimedia artifacts using ontology selection and semantic classification |
CN101609450A (en) * | 2009-04-10 | 2009-12-23 | 南京邮电大学 | Web page classification method based on training set |
CN102142078A (en) * | 2010-02-03 | 2011-08-03 | 中国科学院自动化研究所 | Method for detecting and identifying targets based on component structure model |
US20130218872A1 (en) * | 2012-02-16 | 2013-08-22 | Benzion Jair Jehuda | Dynamic filters for data extraction plan |
CN103400471A (en) * | 2013-08-12 | 2013-11-20 | 电子科技大学 | Detecting system and detecting method for fatigue driving of driver |
- 2014-01-16: CN application CN201410019885.XA, patent CN103729466B, status: Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
肖江,张亚非: "Boosting算法在文本自动分类中的应用", 《解放军理工大学学报(自然科学版)》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104484412A (en) * | 2014-12-16 | 2015-04-01 | 芜湖乐锐思信息咨询有限公司 | Big data analysis system based on multiform processing |
CN108108371A (en) * | 2016-11-24 | 2018-06-01 | 北京国双科技有限公司 | A kind of file classification method and device |
CN108108371B (en) * | 2016-11-24 | 2021-06-29 | 北京国双科技有限公司 | Text classification method and device |
Also Published As
Publication number | Publication date |
---|---|
CN103729466B (en) | 2017-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740148B (en) | Text emotion analysis method combining BiLSTM with Attention mechanism | |
CN104899298B (en) | A kind of microblog emotional analysis method based on large-scale corpus feature learning | |
CN107480125B (en) | Relation linking method based on knowledge graph | |
WO2019071754A1 (en) | Method for sensing image privacy on the basis of deep learning | |
CN107526799A (en) | A kind of knowledge mapping construction method based on deep learning | |
CN102708164B (en) | Method and system for calculating movie expectation | |
CN103034726B (en) | Text filtering system and method | |
CN110472652B (en) | Small sample classification method based on semantic guidance | |
CN104636325B (en) | A kind of method based on Maximum-likelihood estimation determination Documents Similarity | |
CN105912716A (en) | Short text classification method and apparatus | |
CN110830489B (en) | Method and system for detecting counterattack type fraud website based on content abstract representation | |
CN104036010A (en) | Semi-supervised CBOW based user search term subject classification method | |
CN104361059B (en) | A kind of harmful information identification and Web page classification method based on multi-instance learning | |
CN103324708A (en) | Method of transfer learning from long text to short text | |
CN104700100A (en) | Feature extraction method for high spatial resolution remote sensing big data | |
Fengmei et al. | FSFP: Transfer learning from long texts to the short | |
CN105893484A (en) | Microblog Spammer recognition method based on text characteristics and behavior characteristics | |
CN102693316B (en) | Linear generalization regression model based cross-media retrieval method | |
CN109918648B (en) | A Rumor Depth Detection Method Based on Dynamic Sliding Window Feature Scoring | |
CN115718792A (en) | Sensitive information extraction method based on natural semantic processing and deep learning | |
Hao et al. | Similarity evaluation between graphs: a formal concept analysis approach | |
Zhang et al. | Enhanced semantic similarity learning framework for image-text matching | |
CN106445914A (en) | Microblog emotion classifier establishing method and device | |
CN103729466B (en) | Name country origin recognition methods based on WEB and GBBoosting algorithms | |
CN110866087A (en) | An Entity-Oriented Text Sentiment Analysis Method Based on Topic Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20170704 |