CN101334843B - Pattern recognition characteristic extraction method and apparatus - Google Patents
- Publication number
- CN101334843B (application CN200710118156A)
- Authority
- CN
- China
- Prior art keywords
- feature
- variable
- variables
- class
- contribution degree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a feature extraction method and apparatus for pattern recognition, designed to effectively avoid the subjectivity of manually pre-specifying the number of features to select, as earlier feature extraction methods do. The feature extraction method comprises the steps of: determining discrete feature variables and a class variable from the original pattern information of the samples, and preprocessing the feature variables and the class variable; setting a joint contribution degree threshold; determining the joint contribution degree between combinations of feature variables and the class variable; and obtaining a combination of feature variables whose joint contribution degree is greater than or equal to the set joint contribution degree threshold. The feature extraction apparatus comprises a numerical preprocessing module, a threshold setting module, a joint contribution degree determination module, and a feature extraction module. The feature extraction method and apparatus of the present invention can be widely applied to feature extraction from discrete digital image information, fingerprint information, facial pattern information, voice information, or handwritten/printed character information.
Description
Technical Field
The present invention relates to the field of pattern recognition, and in particular to a feature extraction method and apparatus for pattern recognition.
Background Art
A pattern is information with a temporal and spatial distribution obtained by observing specific individual things; the category to which a pattern belongs, or the set of patterns in the same category, is called a pattern class (or simply a class). Pattern recognition is the assignment of patterns to be recognized to their respective pattern classes on the basis of certain measurements or observations.
Research on pattern recognition focuses mainly on two aspects: how organisms (including humans) perceive objects, and how pattern recognition can be realized with computers for a given task.
A computer pattern recognition system basically consists of three interrelated but distinct processes: data generation, pattern analysis, and pattern classification. Data generation quantizes the original information of the input pattern and converts it into a vector, a form that a computer can easily process. Pattern analysis processes the data, including feature selection, feature extraction, dimensionality compression, and determination of the possible classes. Pattern classification uses the information obtained from pattern analysis to train the computer and formulate discrimination criteria for classifying the patterns to be recognized.
Feature extraction within pattern analysis is essential for efficient pattern classification. Pattern classification arises in many fields, such as image classification, speech recognition, biotechnology, and medicine. Classification efficiency has always been a central concern of pattern classification research: in many practical problems there are a great many candidate feature variables, and taking every available feature variable into account makes classification far too slow to be usable in practice. It is therefore necessary to extract features, use the feature subset obtained by feature extraction as the input of an objective classifier, train the classifier, and classify with the feature subset, thereby improving classification efficiency.
Feature extraction searches for a feature subspace that minimizes the loss of information, where the amount of information is measured by the mutual information between the feature subspace and the class variable. A feature extraction method considers not only the correlation between the feature variables and the class variable, but also the correlation among the feature variables themselves.
Feature extraction can be applied in traditional Chinese medicine (TCM). Syndrome differentiation and treatment is the core of TCM: syndrome differentiation is a method of understanding and diagnosing diseases using TCM theory, and a syndrome is a complex of symptoms of unknown etiology that indicates an abnormality in the body. Symptoms in the broad sense include not only the information from the four diagnostic methods but also gender, constitution, emotion, stress, diet, living habits, and many other factors. During syndrome differentiation there are so many symptoms and signs that it is difficult for a doctor to take all observed symptoms into account. Different symptoms and signs play different roles in syndrome differentiation, and finding the most informative set of symptoms and signs to serve as the differentiation standard for a given syndrome is a very important problem in the TCM community.
Feature extraction can likewise be applied to pattern recognition of digital images. Digital images are classified according to the gray values of their pixels, and an image contains a great many pixels (commonly 1280×960, 640×480, 320×240, 160×120, and so on). Feeding all pixels into a pattern classifier would be extremely inefficient, so feature extraction is also an important research topic for image classification. In image feature extraction, each pixel is treated as a feature variable, and the pixels most useful for classification are selected as the input of the objective classifier.
Regarding methods of feature variable extraction: correlation analysis is the basis for selecting highly informative feature sets, and feature variables can be selected according to their degree of correlation with the class variable.
There are many statistical methods for analyzing correlation. The simplest is the correlation coefficient method, but it applies only to linear correlation, whereas many practical problems involve nonlinear relationships. The commonly used nonlinear statistical method is logistic regression, which requires the feature variables to be mutually independent, a condition that many practical problems cannot satisfy. More importantly, the regression coefficients of logistic regression do not directly reflect the degree of correlation between a feature variable and the class variable; the correlation must be determined via the odds ratio (OR), and the OR value has no direct physical meaning. Principal component analysis and factor analysis can also be used for correlation analysis, but both can only analyze linear relationships between variables and cannot measure arbitrary correlations.
The entropy-based mutual information method, by contrast, can not only analyze the correlation between numerical variables (discrete and continuous) but also measure arbitrary correlations between variables. Mutual information is one of the core concepts of entropy theory and an important measure of the adaptivity of nonlinear complex systems; in essence it captures the transfer of information between things and the statistical dependence between random variables. It has been applied in many fields, particularly pattern recognition.
Compared with traditional methods, entropy-based mutual information has the following main advantages:
1) It can measure both linear and nonlinear correlations between variables;
2) Unlike the nonlinear analysis of logistic regression, the entropy-based mutual information method imposes no mutual-independence requirement on the variables analyzed;
3) The entropy-based mutual information method can analyze the correlation not only between numerical variables (discrete and continuous) but also between graded (ordinal) variables and symbolic variables.
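As a concrete illustration of this measure (a minimal sketch, not part of the patent; function and variable names are ours), the mutual information between two discrete variables can be estimated directly from co-occurrence counts over paired samples:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired observations of two
    discrete variables, using frequency counts as probabilities."""
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    cxy = Counter(zip(xs, ys))
    # I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x)p(y)) )
    return sum((c / n) * log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

# Two perfectly dependent binary variables share 1 bit of information;
# two independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # → 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # → 0.0
```

Because the estimate is count-based, it applies equally to ordinal and symbolic values, which is exactly the advantage listed in item 3) above.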
The optimal feature selection method would evaluate every combination of features, which usually leads to combinatorial explosion, so studying efficient feature extraction methods is a very important problem. Many researchers have worked on this problem, and several effective feature extraction methods have been proposed to address the combinatorial issue. In these methods, however, the number of features to select is usually specified manually in advance, which inevitably introduces personal subjectivity and is therefore not a good truncation criterion.
Summary of the Invention
One object of the present invention is to provide a feature extraction method for pattern recognition that can effectively avoid the subjectivity of pre-specifying the number of features to select.
To achieve the above object, the technical solution adopted by the present invention is as follows:
The feature extraction method for pattern recognition comprises the steps of:
determining discrete feature variables and a class variable from the original pattern information of the samples, and preprocessing the feature variables and the class variable;
setting a joint contribution degree threshold;
determining the joint contribution degree between combinations of the feature variables and the class variable; and
obtaining a combination of feature variables whose joint contribution degree is greater than or equal to the set joint contribution degree threshold.
In existing feature extraction methods, the number of features to select is usually specified manually in advance, which inevitably introduces personal subjectivity. To address this problem, the present invention proposes a new definition of contribution degree based on mutual information, and uses a specified threshold on the joint contribution degree, instead of a specified number of features, as the truncation criterion for feature extraction. According to the specified threshold, the combinations of feature variables whose joint contribution degree is greater than or equal to the set threshold are extracted, yielding a feature subspace that minimizes the loss of information and effectively avoiding the subjectivity of earlier feature extraction.
Another object of the present invention is to provide a feature extraction apparatus for pattern recognition that can effectively avoid the subjectivity of pre-specifying the number of features to select.
To achieve this object, the technical solution adopted is as follows:
The feature extraction apparatus for pattern recognition comprises:
a numerical preprocessing module for determining discrete feature variables and a class variable from the original pattern information of the samples and preprocessing them; determining the possible values of each feature variable and of the class variable; and setting up a feature subset, initialized to the empty set;
a threshold setting module for setting a joint contribution degree threshold;
a joint contribution degree determination module for determining the joint contribution degree between the feature subset and the class variable; and
a feature extraction module for obtaining, according to the joint contribution degree, a feature subset whose joint contribution degree is greater than or equal to the set joint contribution degree threshold.
In existing feature extraction, the number of features to select is usually specified manually in advance, which inevitably introduces personal subjectivity. To address this problem, the present invention proposes a new definition of contribution degree based on mutual information, and uses the joint contribution degree threshold preset by the threshold setting module, instead of a specified number of features, as the truncation criterion. The joint contribution degree determination module determines the joint contribution degree between the feature subset and the class variable; according to the preset threshold, the feature extraction module extracts a feature subset whose joint contribution degree is greater than or equal to the threshold, yielding a feature subspace that minimizes information loss and effectively avoiding the subjectivity of earlier feature extraction.
Brief Description of the Drawings
Fig. 1 is a flowchart of the pattern recognition method of the present invention;
Fig. 2 is a system block diagram of the pattern recognition apparatus of the present invention;
Fig. 3 is a schematic diagram of the mutual information between each symptom and the syndrome in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the contribution degree of each symptom in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the joint contribution degree of the selected symptoms in an embodiment of the present invention.
Detailed Description of the Embodiments
For a better understanding of the present invention, it is described in detail below with reference to the accompanying drawings and specific embodiments.
Feature extraction selects the most important combination of features so that the loss of information is minimized; from a practical standpoint, this saves a great deal of classification time.
The present invention proposes a feature extraction method and apparatus based on a new truncation criterion, aimed chiefly at feature extraction for discrete variables. The method and apparatus define a new form of joint contribution degree based on mutual information and use a specified threshold on the joint contribution degree, instead of a specified number of features, as the truncation criterion: combinations of feature variables whose joint contribution degree is greater than or equal to the set threshold are extracted, yielding a feature subspace that minimizes information loss and effectively avoiding the subjectivity of earlier feature extraction. In addition, the sample-based method of computing joint mutual information proposed by the present invention greatly reduces the amount of computation.
The new contribution degree based on mutual information is defined as follows:
Definition: let I(Xi; Y), i = 1, 2, …, n, denote the mutual information between each feature variable and the class variable, and let I(X; Y) denote the total joint mutual information. The mutual-information-based contribution degree of each feature variable is defined as:
ri = I(Xi; Y) / I(X; Y), i = 1, 2, …, n
The joint contribution degree between a subset S of the feature variable set X and the class variable Y is:
rs = I(S; Y) / I(X; Y)
Remark: by the properties of Shannon-entropy-based mutual information, the more feature variables there are, the larger their mutual information with the class variable; hence both the contribution degree and the joint contribution degree take values in the range [0, 1].
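The two ratios defined above are straightforward to compute once the mutual-information values are known. The following is an illustrative sketch (the function names and the numbers are ours, not the patent's data):

```python
def contribution(mi_feature, mi_total):
    """r_i = I(X_i;Y) / I(X;Y): one feature's share of the total
    joint mutual information with the class variable."""
    return mi_feature / mi_total

def joint_contribution(mi_subset, mi_total):
    """r_S = I(S;Y) / I(X;Y) for a feature subset S."""
    return mi_subset / mi_total

# Illustrative numbers only: a subset carrying 1.65 of a total
# 1.73 bits has a joint contribution of about 0.954.
print(round(joint_contribution(1.65, 1.73), 3))  # → 0.954
```

With a threshold of, say, 0.95, such a subset would be accepted; a subset with a smaller share would trigger further feature additions.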
The concrete procedure for feature extraction based on the joint contribution degree is as follows:
Given an already selected feature subset S, the algorithm selects from the feature set X the next feature variable such that the new subset S ← {S, Xi} formed by adding it to S has the largest mutual information with the class variable. For a feature variable to be selected, the information it provides should not already be contained in the selected subset S. For example, if two feature variables Xi and Xj are highly correlated, then I(Xi; Xj) is large, and once one of them has been selected, the chance of the other being selected drops sharply.
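This greedy forward selection under the threshold-based stopping rule can be sketched as follows (assumptions: `joint_mi(subset)` is a caller-supplied routine returning I(S;Y) for a candidate subset, and `total_mi` is I(X;Y); the toy stand-in below is purely illustrative):

```python
def greedy_select(features, joint_mi, total_mi, threshold):
    """Repeatedly add the feature that maximizes I(S ∪ {X_i}; Y),
    stopping once the joint contribution r_S = I(S;Y)/I(X;Y)
    reaches the preset threshold."""
    selected, remaining = [], list(features)
    while remaining:
        best = max(remaining, key=lambda f: joint_mi(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        if joint_mi(selected) / total_mi >= threshold:
            break
    return selected

# Toy stand-in for I(S;Y): each feature adds a fixed amount, capped
# at the total (real mutual information is not additive like this).
gains = {"s1": 0.9, "s2": 0.5, "s3": 0.2}
toy_mi = lambda subset: min(1.6, sum(gains[f] for f in subset))
print(greedy_select(gains, toy_mi, 1.6, 0.8))  # → ['s1', 's2']
```

Note that because the selection criterion is the joint mutual information of the enlarged subset, a feature that is highly correlated with an already selected one gains little and is naturally passed over, matching the behavior described above.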
The feature extraction method for pattern recognition of the present invention comprises the steps of: determining discrete feature variables and a class variable from the original pattern information of the samples, and preprocessing them; setting a joint contribution degree threshold; determining the joint contribution degree between combinations of feature variables and the class variable; and obtaining a combination of feature variables whose joint contribution degree is greater than or equal to the set threshold.
With reference to Fig. 1, and taking the TCM problem of syndrome differentiation and treatment as an example, the feature extraction method of the present invention is used to process symptom information observed from the human body, and comprises the following concrete steps:
Step 1: determine the discrete feature variables and the class variable from the original pattern information of the samples, and preprocess them; combine all feature variables into a feature variable set and determine the possible values of each feature variable; determine the possible values of the class variable; and set up a feature subset, initialized to the empty set.
Clinical data of 1022 cases of blood stasis syndrome were analyzed. These data record 71 human-body symptoms, whose recorded values constitute the original pattern information; all symptoms are represented by discrete feature variables. Some symptoms (feature variables) have two values, coded 0 and 1, and some have four values, coded 0, 1, 2, and 3. The TCM syndrome is represented by a class variable with five values, corresponding to five syndromes: qi deficiency with blood stasis, qi stagnation with blood stasis, yang deficiency with blood stasis, intermingled phlegm and stasis, and static blood obstructing the collaterals.
Step 2: set the joint contribution degree threshold.
The threshold takes values in [0, 1]. Its specific value is usually determined by practical requirements: the larger the threshold, the more symptoms are extracted. Empirically, the threshold generally lies in the range [0.9, 0.98]. In this embodiment the joint contribution degree threshold is set to 0.95.
Step 3: determine the joint contribution degree between combinations of symptoms and the syndrome, comprising the following steps:
S300: determine the mutual information between each symptom and the syndrome;
S301: determine the symptom that maximizes the mutual information between symptom and syndrome, remove that symptom from the symptom set, and add it to the feature subset;
S302: determine the joint contribution degree between the feature subset and the syndrome.
In step S300, the mutual information between each symptom and the syndrome is determined by the formula:
I(Xi; Y) = Σxi Σy p(xi, y) log( p(xi, y) / (p(xi) p(y)) )
This mutual information formula is obtained as follows:
Let the n feature variables be denoted by the set X = {X1, X2, …, Xn}, with probability density functions p(x1), p(x2), …, p(xn) respectively.
The Shannon entropy of the class variable Y can be expressed as:
H(Y) = −Σy p(y) log p(y)
The joint entropy between a feature variable Xi and the class variable Y can be expressed as:
H(Xi, Y) = −Σxi Σy p(xi, y) log p(xi, y)
where Xi may be replaced by a subset of the feature variable set X, i.e., the joint entropy generalizes to n feature variables. The mutual information between the class variable Y and a feature variable Xi can then be expressed equivalently as:
I(Xi; Y) = H(Xi) + H(Y) − H(Xi, Y)
where Xi may likewise be replaced by a subset of the feature variable set X.
The feature variables, the class variable, and their joint probability distribution are obtained by statistical methods, specifically:
Let the n feature variables be denoted by the set X = {X1, X2, …, Xn}, where the variable Xi takes mi values, i.e., Xi ∈ {xi(1), xi(2), …, xi(mi)}.
The feature variables, the class variable, and their joint probability distribution can then be obtained statistically: each probability is estimated by the relative frequency of the corresponding value among the N samples, i.e., p(xi) ≈ N(xi)/N, p(y) ≈ N(y)/N, and p(xi, y) ≈ N(xi, y)/N, where N(·) denotes the number of samples taking the given value.
The mutual information between each symptom and the syndrome computed in this way is shown in Fig. 3.
Between step S300 and step S301 there is a further step: removing from the symptom set those symptoms whose mutual information with the syndrome is smaller than a predetermined value.
After the mutual information between each symptom and the syndrome has been obtained with the above formula, the mutual information of some symptoms turns out to be very small; these symptoms can be ignored, and feature extraction is performed on the retained symptom set. This has little effect on correct classification and greatly reduces the time required for feature extraction.
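This pruning step amounts to a simple filter (a sketch; the mapping from symptom names to mutual-information values and the cutoff value are illustrative assumptions, not the patent's figures):

```python
def prefilter(mi, min_mi=0.01):
    """Keep only symptoms whose mutual information with the syndrome
    reaches a small cutoff; the rest are dropped before the search."""
    return [s for s, v in mi.items() if v >= min_mi]

# Hypothetical per-symptom mutual information values:
mi = {"chest pain": 0.31, "insomnia": 0.12, "numbness": 0.004}
print(prefilter(mi))  # → ['chest pain', 'insomnia']
```

Only the retained symptoms enter the combinatorial search of step S301, which is what saves the extraction time.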
In step S302, the joint contribution degree between the feature subset and the syndrome is determined by the formula:
rs = I(S; Y) / I(X; Y)
where rs denotes the joint contribution degree;
I(S; Y) denotes the joint mutual information, computed by the formula I(S; Y) = Σs Σy p(s, y) log( p(s, y) / (p(s) p(y)) ), where s ranges over the value combinations of the feature subset S; and
I(X; Y) denotes the total joint mutual information.
The method for determining the total joint mutual information is described below.
By the definition of the contribution degree, the total joint mutual information between the symptom set and the syndrome must be computed. When it is computed with the conventional mutual information formula, the amount of computation is enormous, and a combinatorial explosion occurs when there are many symptoms. For example, with 30 symptoms of 4 values each, mapped to 2 classes, about 1.15×10^18 combined values would have to be computed, which is infeasible in practice. Statistics show, however, that with a limited number of samples the probability of most combinations is 0, so the total joint mutual information can be computed from the samples without enumerating the specific symptom combinations. This computation method is described below.
Let B = (B1, B2, …, BN)^T be a frequency vector whose entries count the samples whose feature-variable (symptom) values are all equal; its computation is described below. Let D = (Dij), i = 1, 2, …, N; j = 1, 2, …, k, be a frequency matrix counting the samples whose feature-variable (symptom) values are all equal and whose class-variable (syndrome) value is also equal, and let E = (E1, E2, …, Ek)^T be a frequency vector counting the samples with equal class-variable (syndrome) values. The algorithm is realized by the following steps:
Step S3031: Given the training sample set T, initialize the parameters: set every element of the vector B to 1, and every element of the matrix D and the vector E to 0.
Step S3032: The following procedure obtains the frequencies used to compute the probabilities.
For i = 1, 2, …, N and j = i+1, i+2, …, N:
if B_i = 0, proceed to the next iteration;
otherwise:
if y_i = c_l, then E_l = E_l + 1, l = 1, 2, …, k;
if x_i = x_j, then B_i = B_i + 1 and B_j = 0;
if x_i = x_j and y_i = c_l, then D_il = D_il + 1, l = 1, 2, …, k.
Step S3033: Compute the total joint mutual information; with the frequencies above, the estimate takes the form I(X;Y) = Σ_{i=1..N} Σ_{l=1..k} (D_il/N)·log(N·D_il/(B_i·E_l)).
Note: when D_il × B_i × E_l equals 0, the corresponding logarithmic term is taken to be 0.
With this algorithm the total joint mutual information I(X;Y) is easy to compute, and when the sample size is not very large the computational cost drops dramatically. For example, when N = 2000, n = 30, and k = 2, only the loop over sample pairs is needed to compute the joint probabilities; the cost is independent of the number of feature variables (symptoms) and of the number of values each feature variable (symptom) can take.
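Steps S3031–S3033 can be sketched in Python. This is a sketch under assumptions, not the patent's literal procedure: the function and variable names are ours, the pseudocode's bookkeeping is read so that B_i counts the duplicates of the i-th distinct pattern, D_il its per-class counts, and E_l the class totals, and the final sum is the plug-in estimate whose log term is taken as 0 whenever D_il·B_i·E_l = 0.

```python
import math

def total_joint_mi(X, y, classes):
    """Estimate I(X;Y) by merging duplicate feature vectors (steps S3031-S3033).

    X: list of N feature tuples; y: list of N class labels.
    B[i]: multiplicity of the i-th distinct pattern (0 marks a duplicate),
    D[i][l]: count of that pattern with class l, E[l]: class totals.
    """
    N = len(X)
    k = len(classes)
    idx = {c: l for l, c in enumerate(classes)}
    B = [1] * N
    D = [[0] * k for _ in range(N)]
    E = [0] * k

    for i in range(N):
        if B[i] == 0:              # x_i already merged into an earlier pattern
            continue
        E[idx[y[i]]] += 1
        D[i][idx[y[i]]] += 1
        for j in range(i + 1, N):
            if B[j] != 0 and X[j] == X[i]:
                B[i] += 1
                B[j] = 0           # mark duplicate so it is skipped later
                E[idx[y[j]]] += 1
                D[i][idx[y[j]]] += 1

    mi = 0.0
    for i in range(N):
        for l in range(k):
            if D[i][l] * B[i] * E[l] != 0:   # convention: 0 * log(...) = 0
                mi += (D[i][l] / N) * math.log2(N * D[i][l] / (B[i] * E[l]))
    return mi
```

The pair loop touches each sample pair at most once, so the cost depends on the sample count rather than on the number of possible value combinations, which is the point of the patent's fast algorithm.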
For this embodiment, the total joint mutual information between the 71 symptoms and the syndromes computes to 1.7342.
By the definition of the mutual-information-based contribution degree of each feature variable, the contribution degree of each symptom is easy to compute; the individual contribution degrees of all symptoms are shown in Figure 4.
Step 4: Obtain the combination of symptoms whose joint contribution degree is greater than or equal to the set joint contribution threshold. This specifically includes:
comparing the determined joint contribution degree with the set threshold;
if the determined joint contribution degree is greater than or equal to the set threshold, retrieving the feature subset;
if the determined joint contribution degree is less than the set threshold, then, among the combinations of the feature subset with each remaining symptom in the symptom set, determining the symptom that maximizes the mutual information between the combination and the syndrome, removing that symptom from the symptom set and adding it to the feature subset, and then returning to step 3.
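The loop of steps three and four — grow the feature subset greedily until its joint contribution degree r_s = I(S;Y)/I(X;Y) reaches the threshold — can be sketched as follows. The function names and the plug-in mutual-information estimator are our illustrative assumptions, not the patent's implementation.

```python
import math
from collections import Counter

def mutual_info(cols, y):
    """Plug-in estimate of the mutual information between discrete values."""
    n = len(y)
    pxy = Counter(zip(cols, y))
    px = Counter(cols)
    py = Counter(y)
    return sum((c / n) * math.log2(n * c / (px[x] * py[t]))
               for (x, t), c in pxy.items())

def select_features(X, y, threshold=0.95):
    """Greedy forward selection until r_s = I(S;Y)/I(X;Y) >= threshold.

    X: samples as tuples of discrete feature values.
    Returns the chosen feature indices in selection order.
    """
    n_feat = len(X[0])
    total = mutual_info([tuple(s) for s in X], y)      # I(X;Y), the denominator
    if total == 0:
        return []
    remaining = set(range(n_feat))
    chosen = []
    while remaining:
        # the candidate maximizing I(S ∪ {f}; Y), as in steps three and four
        best = max(remaining,
                   key=lambda f: mutual_info(
                       [tuple(s[g] for g in chosen + [f]) for s in X], y))
        chosen.append(best)
        remaining.remove(best)
        joint = mutual_info([tuple(s[g] for g in chosen) for s in X], y)
        if joint / total >= threshold:                 # joint contribution met
            break
    return chosen
```

Because the stopping rule is the contribution ratio rather than a fixed feature count, the number of selected features (9 in this embodiment) falls out of the data instead of being specified in advance.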
Through feature extraction, 9 symptoms were selected, with a joint contribution degree of 0.9711; the result is shown in Figure 5. The order of selection was: irritability, hemilateral numbness, chest tightness, insomnia, fatigue and weakness, occupation, varicose sublingual veins, dark purple tongue, and dark complexion. This means that these 9 symptoms have the largest joint contribution degree and carry the most information for differentiating the five syndromes.
To verify that the selected symptom combination carries the most information, an effective check is to use these symptoms for syndrome differentiation. A multi-class support vector machine was chosen as the classifier, configured with penalty parameter C = 20 and a radial basis kernel of width σ² = 0.1. 863 samples served as training samples and the remaining 159 as test samples. With all symptoms as the SVM input, 107 test samples were correctly classified after training, a classification accuracy of 0.6729. With the feature-extracted symptom combination as input, 123 samples were correctly classified, an accuracy of 0.7736. The accuracy exceeds that obtained with all symptoms as input because the full symptom set contains noise; feature extraction reduces this noise, so the extracted symptom combination is the most informative one.
If the conventional mutual-information calculation were used in this feature-extraction example, a combinatorial explosion would occur and the computation could not be carried out in practice, whereas with the fast algorithm for discrete-variable mutual information proposed here, the feature extraction completes in about 2 hours.
Another embodiment of the present invention applies it to real-time recognition of the digit characters on integrated circuit (IC) cards.
This embodiment rapidly recognizes the card number printed on manufactured IC cards, in order to check whether the printed number matches the entered number. Each card carries 32 printed digits, composed of the Arabic numerals 0-9.
First, the digits printed on the IC card are captured with an image acquisition card to produce a digital image; next, image processing segments the printed digits into 32 digit regions, each 8×10 pixels; then each digit region is recognized to determine the digit it represents. Six such IC cards must be processed per second.
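The segmentation step above — cutting the captured image into 32 digit regions of 8×10 pixels each — can be sketched as follows for an already-binarized image. In the real system the capture and segmentation come from the image acquisition hardware; the fixed-grid slicing and the names here are illustrative assumptions.

```python
def split_digit_regions(image, n_digits=32, width=8, height=10):
    """Cut a row of printed digits into fixed-size regions.

    image: list of `height` rows, each a list of n_digits*width 0/1 pixels.
    Returns n_digits regions, each flattened to an 80-tuple — the 80 binary
    feature variables (pixels) used in the feature extraction below.
    """
    regions = []
    for d in range(n_digits):
        region = []
        for r in range(height):
            # take this digit's column slice from every row
            region.extend(image[r][d * width:(d + 1) * width])
        regions.append(tuple(region))
    return regions
```

Each flattened 80-tuple then plays the role of the sample's feature-variable vector, with the digit 0-9 as the class variable.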
Applying the feature extraction method in pattern recognition of the present invention, feature extraction is performed on each digit region through the following steps:
S01: Determine the discrete feature variables and class variables from the samples' raw pattern information and preprocess them; collect all feature variables into a feature variable set and determine the possible values of each feature variable; determine the possible values of the class variable; define a feature subset and initialize it to the empty set.
Here the raw pattern information is the gray value of the pixels in the digital image on the IC card, the feature variables are the image's pixels, and the class variable is the digit value. Each feature variable (pixel) has two gray values, 0 and 1, and the feature variable set consists of the 80 pixels. The digit regions fall into 10 classes, the digits 0-9.
S02: Set the joint contribution threshold.
In this embodiment the joint contribution threshold is set to 0.95.
S03: Determine the joint contribution degree between combinations of pixels and the digit, specifically through the following steps:
S031: determine the mutual information between each pixel and the digit;
S032: determine the pixel that maximizes the mutual information with the digit, remove that pixel from the pixel set, and add it to the feature subset;
S033: determine the joint contribution degree between the feature subset and the digit.
In step S031, the mutual information between each pixel and the digit is computed with the mutual information formula given above, I(X_i;Y) = Σ p(x,y)·log[ p(x,y) / (p(x)p(y)) ].
Between steps S031 and S032 there is a further step: removing from the pixel set those pixels whose mutual information with the digit is below a predetermined value.
After the mutual information between each pixel and the digit is obtained with the mutual information formula above, some pixels turn out to have very small mutual information. These pixels can be ignored and feature extraction performed on the retained pixel set, without much effect on correct classification, which greatly reduces the time spent on feature extraction.
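The filtering step just described — discard pixels whose individual mutual information with the digit falls below a predetermined value before running the greedy selection — can be sketched like this. The helper names and the cutoff value are illustrative assumptions, not values from the patent.

```python
import math
from collections import Counter

def single_feature_mi(values, labels):
    """I(X_i; Y) for one discrete feature column (plug-in estimate)."""
    n = len(labels)
    pxy = Counter(zip(values, labels))
    px = Counter(values)
    py = Counter(labels)
    return sum((c / n) * math.log2(n * c / (px[v] * py[t]))
               for (v, t), c in pxy.items())

def prefilter(X, y, min_mi=0.01):
    """Keep only the feature (pixel) indices whose individual mutual
    information with the class reaches `min_mi` (cutoff is an assumption)."""
    n_feat = len(X[0])
    return [f for f in range(n_feat)
            if single_feature_mi([s[f] for s in X], y) >= min_mi]
```

Running the forward selection only over the surviving indices is what saves most of the extraction time, since each discarded pixel would otherwise be re-evaluated in every greedy round.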
In step S033, the joint contribution degree between the feature subset and the digit is determined by the formula
r_s = I(S;Y) / I(X;Y),
where r_s denotes the joint contribution degree, I(S;Y) the joint mutual information, and I(X;Y) the total joint mutual information.
S04: Obtain the combination of pixels whose joint contribution degree is greater than or equal to the set joint contribution threshold, specifically:
compare the determined joint contribution degree with the set threshold;
if the determined joint contribution degree is greater than or equal to the set threshold, retrieve the feature subset;
if it is less than the set threshold, then, among the combinations of the feature subset with each remaining pixel in the pixel set, determine the pixel that maximizes the mutual information between the combination and the digit, remove that pixel from the pixel set, add it to the feature subset, and return to step S033.
With this feature extraction method, only 21 pixels are needed to reach the expected recognition performance, greatly improving the efficiency of recognizing the card number printed on IC cards.
As shown in Figure 2, the present invention also provides a feature extraction device for pattern recognition, comprising:
a numerical preprocessing module 10, which determines the discrete feature variables and class variables from the samples' raw pattern information and preprocesses them;
a threshold setting module 20, for setting the joint contribution threshold;
a joint contribution determining module 30, for determining the joint contribution degree of the class variable and the feature subset defined by the numerical preprocessing module;
a feature extraction module 40, for retrieving, according to the joint contribution degree, the feature subset whose joint contribution degree is greater than or equal to the set threshold.
The joint contribution determining module 30 comprises:
a mutual information determining unit 301, for determining the mutual information between each feature variable and the class variable;
a maximum determining unit 303, for determining, from the mutual information, the feature variable that maximizes the mutual information with the class variable, removing it from the feature variable set and adding it to the subset of the feature variable set; and, for the combination of the feature subset with each feature variable in the feature variable set, determining the feature variable that maximizes the mutual information between the combination and the class variable, removing it from the feature variable set and adding it to the feature subset;
a joint contribution determining unit 304, for determining the joint contribution degree of the feature subset and the class variable.
To save feature extraction time, a filtering unit 302 sits between the mutual information determining unit and the maximum determining unit; it removes from the feature variable set those feature variables whose mutual information with the class variable is below a predetermined value. Once the mutual information between each symptom and the syndrome has been obtained with the mutual information formula above, some symptoms have very small mutual information; these symptoms can be ignored and feature extraction performed on the retained symptom set, without much effect on correct classification, which greatly reduces the time spent on feature extraction.
The feature extraction module 40 comprises:
a comparing unit 401, for comparing the determined joint contribution degree with the set joint contribution threshold;
an extracting unit 402, for extracting the feature subset whose joint contribution degree is greater than or equal to the set threshold.
If the joint contribution degree determined by the comparing unit 401 is greater than or equal to the set threshold, the extracting unit 402 extracts the feature subset. If it is less than the set threshold, the mutual information determining unit 301 determines the mutual information between the class variable and the combination of the feature subset with each feature variable in the feature variable set, the maximum determining unit 303 determines the feature variable that maximizes this mutual information, removes it from the feature variable set and adds it to the feature subset, and the joint contribution determining unit 304 then determines the joint contribution degree of the feature subset.
The joint contribution threshold set by the threshold setting module generally lies in the range [0.9, 0.98].
The feature extraction method and device in pattern recognition of the present invention are aimed mainly at feature extraction for discrete variables. They define a new form of joint contribution degree; feature extraction based on this joint contribution degree effectively avoids the subjectivity of earlier feature extraction methods that specify the number of selected features in advance, speeds up extraction, and can be applied widely to feature extraction from discrete digital image information, fingerprint information, facial feature information, speech information, or handwritten/printed character information.