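WO2018149084A1 - Related variable identification method, apparatus, terminal, and storage medium - Google Patents
Related variable identification method, apparatus, terminal, and storage medium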
- Publication number
- WO2018149084A1 (PCT/CN2017/090578)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- variable
- correlation coefficient
- target
- group number
- variables
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
Definitions
- the present invention relates to the field of computer processing, and in particular, to a related variable identification method, apparatus, terminal, and storage medium.
- a related variable identification method, apparatus, terminal, and storage medium are provided.
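- A related variable identification method includes:
- acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
- searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking them as target correlation coefficients, and marking the target correlation coefficients;
- clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group;
- adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
- highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.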
- a related variable identification device includes:
- an obtaining module configured to obtain a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
- a search module configured to search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take them as target correlation coefficients, and mark the target correlation coefficients;
- a clustering module configured to cluster the related variables into the same group according to the target correlation coefficients, and assign a unique group number to the group;
- an adjustment module configured to adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
- a display module configured to highlight the marked target correlation coefficients in the adjusted correlation coefficient table.
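- A terminal includes a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the following steps:
- acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
- searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking them as target correlation coefficients, and marking the target correlation coefficients;
- clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group;
- adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
- highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.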
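- One or more non-volatile readable storage media store computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
- acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
- searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking them as target correlation coefficients, and marking the target correlation coefficients;
- clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group;
- adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
- highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.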
- FIG. 1 is a block diagram showing the internal structure of a terminal in one embodiment.
- FIG. 2 is a flow chart of a related variable identification method in one embodiment.
- FIG. 3A is a schematic diagram of a partial recognition result of a conventional method.
- FIG. 3B is a schematic diagram of a partial recognition result in one embodiment.
- FIG. 4 is a flow chart of a method for clustering related variables into the same group according to the target correlation coefficients in one embodiment.
- FIG. 5 is a flow chart of a method for assigning a group number to a target variable that has not been grouped in one embodiment.
- FIG. 6 is a flow chart of a related variable identification method in another embodiment.
- FIG. 7 is a block diagram showing the structure of a related variable identification apparatus in one embodiment.
- FIG. 8 is a block diagram showing the structure of a clustering module in one embodiment.
- FIG. 9 is a block diagram showing the structure of a related variable identification apparatus in another embodiment.
- The internal structure of the terminal 102 is shown in FIG. 1. It includes a processor, an internal memory, a non-volatile storage medium, a network interface, a display screen, and an input device connected through a system bus.
- the non-volatile storage medium of the terminal 102 stores an operating system and computer readable instructions executable by the processor to implement a related variable identification method suitable for the terminal 102.
- the processor is used to provide computing and control capabilities to support the operation of the entire terminal.
- the internal memory in the terminal provides an environment for the operation of the operating system and computer readable instructions in the non-volatile storage medium.
- the network interface is used to connect to the network for communication.
- the display screen of the terminal 102 may be a liquid crystal display or an electronic ink display screen.
- The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the electronic device, or an external keyboard, touchpad, or mouse.
- The terminal may be a tablet computer, a laptop computer, a desktop computer, or the like.
- A person skilled in the art will understand that the structure shown in FIG. 1 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the terminal to which the solution is applied.
- A specific terminal may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- a related variable identification method is proposed, which is applicable to the terminal shown in FIG. 1, and includes the following steps:
- Step 202: Acquire a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients between a plurality of variables.
- The correlation coefficient table is obtained in advance by calculating the correlation between each pair of variables, and it records the correlation coefficient of every pair. The absolute value of a correlation coefficient represents the strength of the correlation between the two variables: the larger the absolute value, the stronger the correlation.
- The terminal acquires the correlation coefficient table to be processed according to a user instruction. Table 1 is a schematic of the correlation coefficient table in one embodiment.
- A1, A2, A3, ... An each denote a variable, and every pair of variables corresponds to one correlation coefficient.
- For example, the correlation coefficient of A1 and A2 is 0.007482 (the cell at row A1, column A2).
- The magnitude of the absolute value of each correlation coefficient indicates how strongly the two variables are correlated: the larger the absolute value, the higher the correlation.
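As a minimal sketch of how such a table might be held in memory, the pairwise values shown in Table 1 can be stored once per unordered pair and looked up symmetrically. The names `TABLE_1` and `corr` are illustrative only and do not appear in the application.

```python
# Pairwise coefficients taken from Table 1; each unordered pair is stored once.
TABLE_1 = {
    frozenset(("A1", "A2")): 0.007482,
    frozenset(("A1", "A3")): 0.027993,
    frozenset(("A2", "A3")): 0.835227,
    frozenset(("A1", "An")): 0.684049,
    frozenset(("A2", "An")): 0.472902,
    frozenset(("A3", "An")): -0.616960,
}


def corr(table, a, b):
    """Return the correlation coefficient of variables a and b, in either order."""
    if a == b:
        return 1.0  # the diagonal of the table: a variable against itself
    return table[frozenset((a, b))]


# Row A1, column A2 and row A2, column A1 hold the same coefficient.
print(corr(TABLE_1, "A1", "A2"), corr(TABLE_1, "A2", "A1"))
```

Storing each pair once keeps the table symmetric by construction; a full square matrix would work equally well.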
- Step 204: Search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take them as target correlation coefficients, and mark the target correlation coefficients.
- A threshold for the correlation coefficient is preset in the terminal. Two variables are considered related only when the absolute value of their correlation coefficient is greater than the preset threshold; otherwise they are considered unrelated.
- For example, with the preset threshold set to 0.75, an absolute value greater than 0.75 indicates that the two variables are related.
- The terminal traverses the entire correlation coefficient table, finds every correlation coefficient whose absolute value is greater than the preset threshold (> 0.75), takes these as the target correlation coefficients, and marks them to facilitate subsequent processing.
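The traversal in step 204 can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the function name `find_target_coefficients` is hypothetical, the table is assumed to be a square list of lists, and the diagonal (a variable against itself) is skipped because it would otherwise always exceed the threshold.

```python
def find_target_coefficients(matrix, names, threshold=0.75):
    """Return {(name_i, name_j): r} for every pair of distinct variables with |r| > threshold."""
    marked = {}
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the table is symmetric
            r = matrix[i][j]
            if abs(r) > threshold:
                marked[(names[i], names[j])] = r
    return marked


# Values from Table 1 of the application.
names = ["A1", "A2", "A3", "An"]
matrix = [
    [1.0, 0.007482, 0.027993, 0.684049],
    [0.007482, 1.0, 0.835227, 0.472902],
    [0.027993, 0.835227, 1.0, -0.616960],
    [0.684049, 0.472902, -0.616960, 1.0],
]
targets = find_target_coefficients(matrix, names)
print(targets)  # only the A2/A3 pair exceeds 0.75 in absolute value
```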
- Step 206: Cluster the related variables into the same group according to the target correlation coefficients, and assign a unique group number to the group.
- A correlation coefficient in the table represents the correlation between two variables, and two variables are considered related only when the absolute value of their correlation coefficient is greater than the preset threshold. Relatedness of two variables is therefore defined as follows: if the absolute value of the correlation coefficient is greater than the preset threshold, the two corresponding variables are related. Relatedness of three variables is defined as follows: if the absolute value of the correlation coefficient of A and B is greater than the preset threshold, and the absolute value of the correlation coefficient of B and C is greater than the preset threshold, then A, B, and C are related. In other words, three related variables need not all be pairwise related; it is enough that two related pairs share a common variable.
- For example, if A is related to B and B is related to C, the two pairs share the common variable B, which links the three, so A, B, and C are all related. Relatedness among more than three variables follows in the same way.
- A target correlation coefficient is a correlation coefficient whose absolute value is greater than the preset threshold, so the two variables corresponding to a target correlation coefficient are necessarily related.
- Two related variables are called a pair of related variables. If two pairs of related variables share a variable, all the variables in the two pairs are related. Further, if another variable is related to any variable in the two pairs, that variable is also related to the other three.
- The terminal clusters the related variables into the same group and assigns a group number to the group, so that variables having the same group number are related.
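The transitive definition of relatedness given in step 206 matches the notion of connected components in a graph whose edges are the target correlation coefficients. The sketch below uses a union-find structure, a standard technique substituted here for illustration; it is not the traversal procedure the application describes. The function name `cluster_related` and the numbering of groups from 1 in order of first appearance are assumptions.

```python
def cluster_related(pairs):
    """Group variables linked, directly or through shared variables, by related pairs.

    Returns {variable: group_number}, numbering groups from 1 in order of first appearance.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    group_of, number_of_root = {}, {}
    for v in parent:  # dicts preserve insertion order
        root = find(v)
        if root not in number_of_root:
            number_of_root[root] = len(number_of_root) + 1
        group_of[v] = number_of_root[root]
    return group_of


# A-B and B-C share the variable B, so A, B, and C land in one group.
groups = cluster_related([("A", "B"), ("B", "C"), ("D", "E")])
print(groups)
```

Note that A and C end up in the same group even though no target correlation coefficient links them directly.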
- Step 208: Adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent.
- The terminal gives related variables the same group number and then rearranges the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent.
- In this way, the related variables are gathered together and can subsequently be identified quickly.
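A minimal sketch of the reordering in step 208, assuming group numbers are available as a dictionary. The function name `reorder_adjacent` is hypothetical, and placing ungrouped variables at the end in their original relative order is an assumption; the application does not state where such variables go.

```python
def reorder_adjacent(names, group_of):
    """Return names reordered so that variables sharing a group number are contiguous.

    Grouped variables come first, ordered by group number; ungrouped variables
    keep their original relative order at the end (an assumption of this sketch).
    """
    grouped = [n for n in names if n in group_of]
    ungrouped = [n for n in names if n not in group_of]
    grouped.sort(key=lambda n: group_of[n])  # list.sort is stable
    return grouped + ungrouped


order = reorder_adjacent(
    ["A1", "A2", "A3", "A4", "A5", "A6"],
    {"A1": 1, "A4": 2, "A3": 1, "A6": 2},
)
print(order)
```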
- Step 210: Highlight the marked target correlation coefficients in the adjusted correlation coefficient table.
- FIG. 3B is a schematic diagram of a partial recognition result in one embodiment. Because the number of variables is often large, FIG. 3B shows only part of the recognition result.
- The cells shaded gray are the target correlation coefficients, that is, the correlation coefficients whose absolute value is greater than 0.75.
- The variables corresponding to a cluster of target correlation coefficients are related variables. The original order of the variables is broken so that related variables are gathered together, and the correlation among multiple variables can then be seen clearly in the figure. This enables rapid identification of multiple related variables and helps speed up the modeling process.
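To illustrate the highlighting of step 210 in plain text, the sketch below wraps each target correlation coefficient in square brackets as a stand-in for the gray shading of FIG. 3B. The function name `render_highlighted`, the tab-separated layout, and the six-decimal formatting are all assumptions made for this example.

```python
def render_highlighted(matrix, names, threshold=0.75):
    """Return the table as tab-separated text, wrapping target coefficients in [brackets]."""
    lines = ["\t".join(["var"] + names)]
    for i, row in enumerate(matrix):
        cells = [names[i]]
        for j, r in enumerate(row):
            text = f"{r:.6f}"
            if i != j and abs(r) > threshold:
                text = f"[{text}]"  # stand-in for the gray shading
            cells.append(text)
        lines.append("\t".join(cells))
    return "\n".join(lines)


rendered = render_highlighted([[1.0, 0.835227], [0.835227, 1.0]], ["A2", "A3"])
print(rendered)
```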
- In summary, the correlation coefficient table is acquired; the target correlation coefficients, whose absolute value is greater than the preset threshold, are found in the table; the related variables are clustered into the same group according to the target correlation coefficients, and the group is assigned a unique group number; the order of the variables in the table is adjusted according to the group numbers so that variables having the same group number are adjacent; and the marked target correlation coefficients in the adjusted table are highlighted.
- The related variables are then gathered together and the corresponding target correlation coefficients are highlighted, which enables rapid identification of multiple related variables and thereby speeds up data modeling.
- The step of clustering the related variables into the same group according to the target correlation coefficients and assigning a unique group number to the group includes:
- Step 206A: Acquire a target variable to be clustered and determine whether it has already been grouped. If it has, proceed to step 206B; if not, proceed to step 206C.
- The related variables are clustered by traversal: a variable to be clustered, referred to as the target variable, is selected, and the other variables related to it are then found.
- It is first determined whether the target variable has been grouped. If it has, the first variables related to the target variable are obtained according to the target correlation coefficients, since the two variables corresponding to a target correlation coefficient are related. The second variables having the same group number as a first variable are then found, and the group numbers of the first and second variables are changed to the group number of the target variable. If the target variable has not been grouped, the first variables related to it and the second variables having the same group number as a first variable are obtained, and a new group number is assigned to the target variable, the first variables, and the second variables.
- Step 206B: According to the target correlation coefficients, acquire the first variables related to the target variable and the second variables having the same group number as a first variable, and change the group numbers of the first and second variables to the group number of the target variable.
- If the target variable has already been grouped, it already has a group number. The terminal first finds the first variables related to the target variable, and then finds the second variables that share a group number with a first variable. A first variable and a second variable having the same group number are already known to be related, so this finds both the first variables, which are directly related to the target variable, and the second variables, which are indirectly related to it. All variables related to the target variable are thereby found, and their group numbers are changed to the group number of the target variable.
- Here, "first variable" denotes a variable directly related to the target variable, and "second variable" denotes a variable related to the target variable through a first variable. Each term denotes a type of variable and does not limit the quantity.
- For example, assume the target variable is A. The first variables related to A are found first, say B and C. The second variables sharing a group number with the first variables are then found, say D, which has the same group number as B, and E, which has the same group number as C. The group numbers of B, C, D, and E are then all set to the group number of A.
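The A, B, C, D, E example of step 206B can be sketched as below. The function name `merge_into_target_group` and the dictionary-based representation of group numbers and related pairs are assumptions made for illustration.

```python
def merge_into_target_group(group_of, related, target):
    """Step 206B sketch: give every variable related to `target` the target's group number.

    group_of: {variable: group number} for variables grouped so far (mutated in place).
    related:  {variable: set of variables linked to it by a target correlation coefficient}.
    """
    target_group = group_of[target]
    first = related.get(target, set())  # variables directly related to the target
    first_groups = {group_of[v] for v in first if v in group_of}
    # Second variables: everything sharing a group number with some first variable.
    second = {v for v, g in group_of.items() if g in first_groups}
    for v in first | second:
        group_of[v] = target_group
    return group_of


# A is the target (group 1); B and D share group 2; C and E share group 3.
group_of = {"A": 1, "B": 2, "D": 2, "C": 3, "E": 3}
related = {"A": {"B", "C"}}
merge_into_target_group(group_of, related, "A")
print(group_of)  # B, C, D, and E now carry A's group number
```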
- Step 206C: According to the target correlation coefficients, acquire the first variables related to the target variable and the second variables having the same group number as a first variable, and assign a new group number to the target variable, the first variables, and the second variables.
- If the target variable to be clustered has not been grouped, it has no group number. The first variables related to it and the second variables having the same group number as a first variable are still acquired, and a new group number is then assigned to the target variable, the first variables, and the second variables.
- The new group number may be assigned in increasing order. For example, if G represents the current total number of groups, the new group number is G+1.
- The step of acquiring the first variables related to the target variable and the second variables having the same group number as a first variable, and assigning a new group number to the target variable, the first variables, and the second variables includes:
- Step 502: If the target variable is not grouped, acquire the first variables related to the target variable and the second variables having the same group number as a first variable.
- If the target variable is not grouped, it has no group number yet. The first variables, that is, the variables directly related to the target variable, are obtained according to the target correlation coefficients. If a first variable has already been placed in a group, the second variables having the same group number as that first variable must also be obtained. A new group number is then assigned to the target variable, the first variables, and the second variables.
- Step 504: Uniformly assign the group number G+1 to the target variable, the first variables, and the second variables, where G represents the current total number of groups.
- Group numbers are assigned in increasing order. That is, when the target variable has not yet been grouped, the first variables related to it and the second variables having the same group number as a first variable are found, and the target variable, the first variables, and the second variables are all given the group number G+1, where G represents the current total number of groups.
- For example, suppose A1 is related to A2, A2 is related to A3, and A4 is related to A5.
- A1 is first taken as the target variable, and the first variables related to A1 are searched for. Only A2 is found. Because A2 has not been grouped yet and has no group number, there is no second variable with the same group number as A2, so A2 is the only variable found to be related to A1. A1 and A2 are therefore assigned the group number G+1. Since no group exists yet, G is initially 0, and the group number assigned to A1 and A2 is 1.
- A2 is then taken as the target variable. A2 has now been grouped. The first variables related to A2 are searched for in the same way, which yields A1 and A3, and the second variables having the same group number as each first variable are then obtained.
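The traversal of steps 206A to 206C, applied to the A1 to A5 example, can be sketched as follows. Several points are assumptions of this sketch and are not stated in the application: the function name `assign_group_numbers`, the use of a running counter for G (it counts group numbers handed out so far, which keeps every new number unique), and the choice to leave a variable with no related variable ungrouped.

```python
def assign_group_numbers(variables, pairs):
    """Traverse variables in order, applying steps 206A-206C; return {variable: group number}."""
    related = {}
    for a, b in pairs:
        related.setdefault(a, set()).add(b)
        related.setdefault(b, set()).add(a)

    group_of = {}
    total = 0  # G: number of group numbers handed out so far (assumption)
    for target in variables:
        first = related.get(target, set())  # directly related variables
        if not first:
            continue  # assumption: a variable with no related variable stays ungrouped
        first_groups = {group_of[v] for v in first if v in group_of}
        second = {v for v, g in group_of.items() if g in first_groups}
        if target in group_of:   # step 206B: reuse the target's group number
            number = group_of[target]
        else:                    # step 206C: assign G + 1
            total += 1
            number = total
        for v in {target} | first | second:
            group_of[v] = number
    return group_of


result = assign_group_numbers(
    ["A1", "A2", "A3", "A4", "A5"],
    [("A1", "A2"), ("A2", "A3"), ("A4", "A5")],
)
print(result)  # A1, A2, A3 share group 1; A4 and A5 share group 2
```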
- The step of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent, includes: adjusting variables having the same group number to be adjacent, and arranging the variables in the correlation coefficient table in descending order of group number.
- That is, variables with the same group number are gathered together, and the variables in the correlation coefficient table are then arranged by group number from largest to smallest, which makes it easier to identify the related variables in a regular way.
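The descending arrangement can be sketched by permuting the rows and columns of the table together. The function name `reorder_table` is hypothetical, the sample coefficients are made up for the example, and giving ungrouped variables the key 0 (so that they follow every group) is an assumption.

```python
def reorder_table(matrix, names, group_of):
    """Permute rows and columns so groups are contiguous, larger group numbers first."""
    # Ungrouped variables get key 0 so that they sort after every group (assumption).
    order = sorted(
        range(len(names)),
        key=lambda i: group_of.get(names[i], 0),
        reverse=True,  # sorted() remains stable with reverse=True
    )
    new_names = [names[i] for i in order]
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_matrix, new_names


names = ["A1", "A2", "A3", "A4"]
matrix = [  # illustrative values, not taken from the application
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.4],
    [0.2, 0.8, 0.4, 1.0],
]
new_matrix, new_names = reorder_table(matrix, names, {"A1": 1, "A3": 1, "A2": 2, "A4": 2})
print(new_names)
```

Applying the same permutation to rows and columns keeps the table symmetric and places each group's target correlation coefficients in a contiguous block near the diagonal.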
- The related variable identification method further includes:
- Step 212: Use principal component analysis to select, from the target correlation coefficients corresponding to the same group number, one correlation coefficient representing the group.
- After the representative correlation coefficient is selected by principal component analysis, subsequent processing is performed according to it; for example, a linear regression model is established according to the selected correlation coefficient.
- a related variable identification device 700 comprising:
- the obtaining module 702 is configured to obtain a correlation coefficient table to be processed, and the correlation coefficient table records correlation coefficients of the plurality of variables with each other.
- the searching module 704 is configured to search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take them as target correlation coefficients, and mark the target correlation coefficients.
- the clustering module 706 is configured to group the related multiple variables into the same group according to the target correlation coefficient, and assign a unique group number to the group.
- the adjusting module 708 is configured to adjust the order of the variables in the correlation coefficient table according to the group number of the group, and adjust the plurality of variables having the same group number to the adjacent variables.
- the display module 710 is configured to highlight the target correlation coefficient with the mark in the adjusted correlation coefficient table.
- the clustering module 706 includes:
- the determining module 706A is configured to acquire a target variable to be clustered, and determine whether the target variable has been grouped.
- the group number modification module 706B is configured to: if the target variable has been grouped, acquire a first variable related to the target variable and a second variable that is the same as the first variable group number according to the target correlation coefficient, and the first variable and the second variable The group number is modified to be the same as the group number of the target variable.
- the group number assigning module 706C is configured to acquire, when the target variable is not grouped, a first variable related to the target variable and a second variable that is the same as the first variable group number, and allocate the target variable, the first variable, and the second variable A new group number.
- The group number assigning module is further configured to: if the target variable is not grouped, acquire the first variables related to the target variable and the second variables having the same group number as a first variable, and, following the rule of incrementing group numbers, uniformly assign the group number G+1 to the target variable, the first variables, and the second variables, where G represents the current total number of groups.
- the adjustment module is further configured to adjust a plurality of variables having the same group number to adjacent variables, and adjust the order of the variables in the correlation coefficient table in descending order of the size of the group number.
- In another embodiment, a related variable identification apparatus 900 is proposed. The apparatus further includes:
- a screening module 712 configured to use principal component analysis to select, from the target correlation coefficients corresponding to the same group number, one correlation coefficient representing the group.
- Each module of the above related variable identification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof.
- the network interface may be an Ethernet card or a wireless network card.
- The above modules may be embedded in hardware in the terminal, or may be stored as software in the memory of the terminal, so that the processor can invoke them to perform the operations corresponding to each module.
- the processor can be a central processing unit (CPU), a microprocessor, a microcontroller, or the like.
- When the computer readable instructions in the terminal of FIG. 1 are executed by the processor, the processor performs the following steps: acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients between a plurality of variables; searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking them as target correlation coefficients, and marking the target correlation coefficients; clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.
- The step, performed by the processor, of clustering the related variables into the same group according to the target correlation coefficients and assigning a unique group number to the group includes: acquiring a target variable to be clustered and determining whether the target variable has been grouped; if the target variable has been grouped, acquiring, according to the target correlation coefficients, the first variables related to the target variable and the second variables having the same group number as a first variable, and changing the group numbers of the first and second variables to the group number of the target variable; and if the target variable has not been grouped, acquiring, according to the target correlation coefficients, the first variables related to the target variable and the second variables having the same group number as a first variable, and assigning a new group number to the target variable, the first variables, and the second variables.
- The step of acquiring the first variables related to the target variable and the second variables having the same group number as a first variable, and assigning a new group number to the target variable, the first variables, and the second variables includes: if the target variable is not grouped, acquiring the first variables related to the target variable and the second variables having the same group number as a first variable; and, following the rule of incrementing group numbers, uniformly assigning the group number G+1 to the target variable, the first variables, and the second variables, where G represents the current total number of groups.
- The step, performed by the processor, of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent, includes: adjusting variables having the same group number to be adjacent, and arranging the variables in the correlation coefficient table in descending order of group number.
- The processor is further configured to perform the following step: using principal component analysis to select, from the target correlation coefficients corresponding to the same group number, one correlation coefficient representing the group.
- The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or may be a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
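A related variable identification method, including: acquiring a correlation coefficient table that records the correlation coefficients of a plurality of variables with each other; searching the correlation coefficient table for target correlation coefficients whose absolute value is greater than a preset threshold, and marking the target correlation coefficients; clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.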
Description
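This application claims priority to Chinese Patent Application No. 201710087590X, filed with the Chinese Patent Office on February 17, 2017 and entitled "Related Variable Identification Method and Apparatus", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of computer processing, and in particular to a related variable identification method, apparatus, terminal, and storage medium.
In data modeling, when variables are highly correlated with each other, that is, when the absolute value of the correlation coefficient is large, the variables exhibit strong collinearity, which distorts the model. The correlation between variables therefore has to be handled during modeling. Conventional correlation handling can only display data for pairs of related variables, whereas three or more related variables must be identified manually. Because the amount of data to be processed is often large, manual identification is time-consuming and laborious, which slows down data modeling.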
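Summary of the Invention
According to various embodiments of the present application, a related variable identification method, apparatus, terminal, and storage medium are provided.
A related variable identification method includes:
acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients;
clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group;
adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.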
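A related variable identification apparatus includes:
an obtaining module, configured to obtain a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
a search module, configured to search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and mark the target correlation coefficients;
a clustering module, configured to cluster the related variables into the same group according to the target correlation coefficients, and assign a unique group number to the group;
an adjustment module, configured to adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
a display module, configured to highlight the marked target correlation coefficients in the adjusted correlation coefficient table.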
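A terminal includes a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the following steps:
acquiring a correlation coefficient table to be processed, where the correlation coefficient table records the correlation coefficients of a plurality of variables with each other;
searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients;
clustering the related variables into the same group according to the target correlation coefficients, and assigning a unique group number to the group;
adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number are adjacent; and
highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.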
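One or more non-volatile readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables;
searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients;
clustering a plurality of related variables into a same group according to the target correlation coefficients, and assigning a unique group number to the group;
adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and
highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.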
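Details of one or more embodiments of the present invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present invention will become apparent from the description, the drawings, and the claims.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.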
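FIG. 1 is a block diagram of the internal structure of a terminal in one embodiment;
FIG. 2 is a flowchart of a related variable identification method in one embodiment;
FIG. 3A is a schematic diagram of part of the identification result of a conventional method;
FIG. 3B is a schematic diagram of part of the identification result in one embodiment;
FIG. 4 is a flowchart of a method for clustering a plurality of related variables into a same group according to the target correlation coefficients in one embodiment;
FIG. 5 is a flowchart of a method for assigning a group number to a target variable that has not been grouped in one embodiment;
FIG. 6 is a flowchart of a related variable identification method in another embodiment;
FIG. 7 is a structural block diagram of a related variable identification apparatus in one embodiment;
FIG. 8 is a structural block diagram of a clustering module in one embodiment;
FIG. 9 is a structural block diagram of a related variable identification apparatus in another embodiment.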
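To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.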
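As shown in FIG. 1, in one embodiment, the internal structure of the terminal 102 includes a processor, an internal memory, a non-volatile storage medium, a network interface, a display screen, and an input device connected through a system bus. The non-volatile storage medium of the terminal 102 stores an operating system and computer-readable instructions, which can be executed by the processor to implement a related variable identification method applicable to the terminal 102. The processor provides computing and control capabilities and supports the operation of the entire terminal. The internal memory of the terminal provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The network interface is used to connect to a network for communication. The display screen of the terminal 102 may be a liquid crystal display, an electronic ink display, or the like. The input device may be a touch layer covering the display screen; a button, trackball, or touchpad provided on the housing of the electronic device; or an external keyboard, touchpad, mouse, or the like. The terminal may be a tablet computer, a notebook computer, a desktop computer, or the like. A person skilled in the art will understand that the structure shown in FIG. 1 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the terminal to which the solution is applied; a specific terminal may include more or fewer components than shown, combine certain components, or have a different arrangement of components.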
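As shown in FIG. 2, in one embodiment, a related variable identification method is provided. The method can be applied to the terminal shown in FIG. 1 and includes the following steps:
Step 202: obtain a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables.
In this embodiment, data modeling on the terminal often involves many variables. When the correlation between variables is high, that is, when the absolute value of the correlation coefficient between them is large, the variables exhibit strong collinearity, which easily distorts the model. To avoid model distortion, highly correlated variables need to be identified and handled accordingly. The correlation coefficient table is computed in advance from the pairwise correlations of the variables and records the correlation coefficient of every pair of variables. The absolute value of a correlation coefficient represents the correlation between the two variables: the larger the absolute value, the stronger the correlation. Specifically, to identify highly correlated variables, the terminal first obtains the correlation coefficient table to be processed according to a user instruction. Table 1 is a schematic of a correlation coefficient table in one embodiment: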
Table 1
Variable name | A1 | A2 | A3 | ... | An |
A1 | 1 | 0.007482 | 0.027993 | ... | 0.684049 |
A2 | 0.007482 | 1 | 0.835227 | ... | 0.472902 |
A3 | 0.027993 | 0.835227 | 1 | ... | -0.616960 |
... | ... | ... | ... | 1 | ... |
An | 0.684049 | 0.472902 | -0.616960 | ... | 1 |
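Here A1, A2, A3, ..., An denote the variables, and every pair of variables corresponds to one correlation coefficient. For example, the correlation coefficient of A1 and A2 is 0.007482 (the value at row A1, column A2, or equivalently at row A2, column A1). The absolute value of each correlation coefficient represents the correlation between the two variables: the larger the absolute value, the higher the correlation.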
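As an illustration of how such a table might be produced in advance, the following minimal sketch computes pairwise coefficients from raw samples. The source does not say which correlation measure is used; the Pearson coefficient, the function names `pearson` and `correlation_table`, and the sample data are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples (assumed measure)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(samples):
    """Build a symmetric table: table[a][b] is the coefficient of variables a and b."""
    names = list(samples)
    return {
        a: {b: 1.0 if a == b else pearson(samples[a], samples[b]) for b in names}
        for a in names
    }

# Hypothetical sample data, one list of observations per variable.
samples = {
    "A1": [1.0, 2.0, 3.0, 4.0],
    "A2": [2.0, 4.0, 6.0, 8.0],
    "A3": [4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(samples)
```

The nested dictionary mirrors the layout of Table 1: the entry for row A1 and column A2 equals the entry for row A2 and column A1.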
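Step 204: search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and mark the target correlation coefficients.
In this embodiment, a threshold for the correlation coefficient is set in the terminal in advance. Two variables are considered related only when the absolute value of their correlation coefficient is greater than the preset threshold; otherwise the two variables are considered unrelated. The correlation coefficient table is searched for coefficients whose absolute value is greater than the preset threshold, the coefficients found are taken as target correlation coefficients, and these target correlation coefficients are marked. For example, with a preset threshold of 0.75, any pair of variables whose coefficient has an absolute value greater than 0.75 is related. The entire table is traversed to find every such coefficient, each one is taken as a target correlation coefficient, and each is marked to facilitate the subsequent processing.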
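The search-and-mark step can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `find_target_pairs`, the use of a set of unordered pairs as the "mark", and the decision to skip the diagonal (a variable's coefficient with itself) are assumptions. The values are taken from Table 1.

```python
def find_target_pairs(table, threshold=0.75):
    """Return the unordered variable pairs whose |coefficient| exceeds the threshold."""
    names = list(table)
    marked = set()
    for i, a in enumerate(names):
        # Only look at the upper triangle; the table is symmetric and the
        # diagonal (a variable with itself) is skipped by assumption.
        for b in names[i + 1:]:
            if abs(table[a][b]) > threshold:
                marked.add(frozenset((a, b)))
    return marked

table = {
    "A1": {"A1": 1.0, "A2": 0.007482, "A3": 0.027993},
    "A2": {"A1": 0.007482, "A2": 1.0, "A3": 0.835227},
    "A3": {"A1": 0.027993, "A2": 0.835227, "A3": 1.0},
}
marked = find_target_pairs(table)
```

With the threshold of 0.75 used in the text, only the pair A2 and A3 is marked. Taking `abs` before comparing means a strongly negative coefficient would be marked as well.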
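Step 206: cluster a plurality of related variables into a same group according to the target correlation coefficients, and assign a unique group number to the group.
In this embodiment, a correlation coefficient in the table represents the correlation of two variables. Two variables are considered related only when the absolute value of their coefficient is greater than the preset threshold; otherwise they are unrelated. Relatedness of two variables is therefore defined as follows: if the absolute value of the correlation coefficient is greater than the preset threshold, the two corresponding variables are related. Relatedness of three variables is defined as follows: if the absolute value of the coefficient of A and B is greater than the preset threshold and the absolute value of the coefficient of B and C is greater than the preset threshold, then A, B, and C are related. In other words, three related variables need not all be pairwise related; it is enough that two related pairs share a common variable. When A is related to B and B is related to C, the common variable B links the three, so A, B, and C are all related regardless of whether the absolute value of the coefficient of A and C exceeds the preset threshold. Relatedness of more than three variables follows by the same reasoning. Specifically, because a target correlation coefficient is a coefficient whose absolute value is greater than the preset threshold, the two variables corresponding to a target correlation coefficient are necessarily related. Two related variables are called a pair of related variables. If two pairs of related variables share a variable, all the variables in the two pairs are related. Further, if another variable is related to any variable in these two pairs, that variable is also related to these three variables, and so on. For example, if A is related to B, B to C, C to D, and D to E, then A, B, C, D, and E are all related. That is, for a set of related variables, any other variable that is related to any one of them is related to all of them. In this embodiment, the terminal clusters the related variables into a same group and assigns a group number to the group, so that variables having the same group number are related.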
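The transitive definition of relatedness given above amounts to finding connected components in a graph whose edges are the marked pairs. The sketch below uses a union-find structure, which is a standard technique swapped in for illustration and is not the traversal procedure the source describes later; the names `group_related` and `find`, and numbering groups in order of first appearance, are assumptions.

```python
def group_related(names, pairs):
    """Assign the same group number to variables linked by a chain of related pairs."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    # Number the groups 1, 2, ... in order of first appearance.
    number_of_root = {}
    groups = {}
    for name in names:
        root = find(name)
        if root not in number_of_root:
            number_of_root[root] = len(number_of_root) + 1
        groups[name] = number_of_root[root]
    return groups

names = ["A", "B", "C", "D", "E", "F"]
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
groups = group_related(names, pairs)
```

In this example A through E end up in one group even though A and E are never directly compared, while the unlinked variable F (hypothetical, added for contrast) receives its own group number.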
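Step 208: adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables.
In this embodiment, after the terminal has set the group numbers of related variables to the same value, it adjusts the variables having the same group number to be adjacent, that is, it rearranges the order of the variables in the correlation coefficient table according to the group numbers. The related variables are thereby gathered together, which makes them quick to identify afterwards.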
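A minimal sketch of the reordering, assuming the variable order is held as a list of names and the grouping as a dictionary (both hypothetical representations, as is the function name `reorder_by_group`):

```python
def reorder_by_group(names, groups):
    """Stable sort by group number so that members of a group become adjacent."""
    return sorted(names, key=lambda name: groups[name])

names = ["A1", "A2", "A3", "A4", "A5"]
groups = {"A1": 1, "A2": 2, "A3": 1, "A4": 2, "A5": 1}
ordered = reorder_by_group(names, groups)
```

Because Python's `sorted` is stable, variables within a group keep their original relative order. The same order has to be applied to both the rows and the columns of the table so that it stays symmetric.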
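Step 210: highlight the marked target correlation coefficients in the adjusted correlation coefficient table.
In this embodiment, after the variables having the same group number have been adjusted to be adjacent, the marked target correlation coefficients in the adjusted correlation coefficient table are highlighted. FIG. 3B is a schematic diagram of part of the identification result in one embodiment (because the number of variables is usually large, FIG. 3B shows only part of the result). The cells with gray shading are the target correlation coefficients, that is, the coefficients whose absolute value is greater than 0.75. The variables corresponding to a cluster of adjacent target correlation coefficients are related variables. Gathering the related variables together breaks the original ordering, so the correlation among multiple variables can be seen clearly from the figure. Multiple related variables are thus identified quickly, which helps speed up modeling. The conventional approach can only identify and display the data for pairs of related variables. To identify three or more related variables, it hides the rows and columns of the table that contain no target correlation coefficient, as shown in FIG. 3A, and then relies on manual identification. As FIG. 3A shows, the target correlation coefficients in the conventional approach are scattered, and three or more related variables have to be identified manually.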
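The highlighting step can be illustrated with a plain-text rendering. The source highlights with gray cell shading (FIG. 3B); wrapping a value in asterisks is only a stand-in for that, and the function name `render` and the tab-separated layout are assumptions.

```python
def render(table, order, marked):
    """Return the table as text rows, wrapping marked coefficients in asterisks."""
    lines = ["\t".join(["var"] + order)]
    for row in order:
        cells = [row]
        for col in order:
            text = f"{table[row][col]:.2f}"
            if frozenset((row, col)) in marked:
                text = f"*{text}*"  # stand-in for gray shading
            cells.append(text)
        lines.append("\t".join(cells))
    return "\n".join(lines)

table = {
    "A1": {"A1": 1.0, "A2": 0.007482, "A3": 0.027993},
    "A2": {"A1": 0.007482, "A2": 1.0, "A3": 0.835227},
    "A3": {"A1": 0.027993, "A2": 0.835227, "A3": 1.0},
}
marked = {frozenset(("A2", "A3"))}
output = render(table, ["A2", "A3", "A1"], marked)
```

After reordering, the marked cells for A2 and A3 sit next to each other near the diagonal, which is the visual effect the embodiment relies on.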
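In this embodiment, a correlation coefficient table is obtained; target correlation coefficients whose absolute value is greater than a preset threshold are found in the table; a plurality of related variables are clustered into a same group according to the target correlation coefficients, and a unique group number is assigned to the group; the order of the variables in the table is adjusted according to the group numbers so that variables having the same group number become adjacent; and the marked target correlation coefficients in the adjusted table are highlighted. The related variables are now gathered together, and highlighting the corresponding target correlation coefficients makes multiple related variables quick to identify, which speeds up data modeling.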
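As shown in FIG. 4, in one embodiment, the step of clustering a plurality of related variables into a same group according to the target correlation coefficients and assigning a unique group number to the group includes:
Step 206A: obtain a target variable to be clustered and determine whether the target variable has already been grouped. If it has been grouped, go to step 206B; if it has not been grouped, go to step 206C.
In this embodiment, the related variables are clustered by traversal. First, a target variable to be clustered is determined, and then the other variables related to this target variable are found. Specifically, the variable about to be clustered is called the target variable, and it is determined whether the target variable has already been grouped. If it has been grouped, the first variables related to the target variable are obtained according to the target correlation coefficients, because the two variables corresponding to a target correlation coefficient are related. The second variables, which have the same group number as the first variables, are then found, and the group numbers of the first variables and the second variables are changed to the group number of the target variable. If the target variable has not been grouped, the first variables related to the target variable and the second variables having the same group number as the first variables are obtained, and a new group number is assigned to the target variable, the first variables, and the second variables.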
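Step 206B: obtain, according to the target correlation coefficients, the first variables related to the target variable and the second variables having the same group number as the first variables, and change the group numbers of the first variables and the second variables to the group number of the target variable.
In this embodiment, if the target variable has already been grouped in the terminal, the target variable already has a group number. The first variables related to the target variable are found, and then the second variables having the same group number as the first variables are found. A first variable and a second variable having the same group number are already related to each other. In this way, the first variables, which are directly related to the target variable, and the second variables, which are indirectly related to it, are all found, which gives every variable related to the target variable. The group numbers of all these variables are then changed to the group number of the target variable, that is, the group numbers of the first variables and the second variables are changed to the group number of the target variable. Here, a first variable denotes a variable directly related to the target variable, and a second variable denotes a variable related to the target variable through a first variable. "First variable" and "second variable" each denote a class of variables and do not limit their number. For example, suppose the target variable is A. The first variables related to A are found first; suppose they are B and C. The second variables having the same group number as the first variables are then found; suppose D has the same group number as B and E has the same group number as C. The group numbers of B, C, D, and E are then all set to the group number of the target variable A.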
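Step 206C: obtain, according to the target correlation coefficients, the first variables related to the target variable and the second variables having the same group number as the first variables, and assign a new group number to the target variable, the first variables, and the second variables.
In this embodiment, if the target variable to be clustered has not been grouped, it does not yet have a group number. The first variables related to the target variable and the second variables having the same group number as the first variables are still obtained, and a new group number is then assigned to the target variable, the first variables, and the second variables. Specifically, the new group number can be assigned in increasing order; for example, if G denotes the current total number of groups, the new group number is G+1.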
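As shown in FIG. 5, the step of, if the target variable has not been grouped, obtaining the first variables related to the target variable and the second variables having the same group number as the first variables, and assigning a new group number to the target variable, the first variables, and the second variables includes:
Step 502: if the target variable has not been grouped, obtain the first variables related to the target variable and the second variables having the same group number as the first variables.
In this embodiment, if the target variable has not been grouped, it does not yet have a group number. The first variables related to the target variable are obtained, that is, the first variables directly related to the target variable are obtained according to the correlation coefficients. When a first variable has already been grouped, the second variables having the same group number as that first variable also need to be obtained. A new group number is then assigned uniformly to the target variable, the first variables, and the second variables.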
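Step 504: uniformly set the group numbers of the target variable, the first variables, and the second variables to G+1, where G denotes the current total number of groups.
In this embodiment, a variable that has not been grouped does not yet have a group number. So that the relationships among the variables can be seen more intuitively afterwards, groups are numbered in increasing order. That is, when the target variable has not been grouped, the first variables related to the target variable and the second variables having the same group number as the first variables are found, and the group numbers of the target variable, the first variables, and the second variables are uniformly set to G+1, where G denotes the current total number of groups.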
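In one specific embodiment, suppose there are five variables, A1, A2, A3, A4, and A5, and the correlation coefficients in the table show that A1 is related to A2, A2 is related to A3, and A4 is related to A5. Initially none of A1, A2, A3, A4, and A5 has been grouped. Using the traversal clustering method, A1 is taken as the target variable first, and the first variables related to A1 are found. The only first variable found is A2. Because A2 has not been grouped and has no group number, no second variable having the same group number as A2 exists, so the only variable found to be related to A1 is A2. A1 and A2 are assigned the group number G+1. Because no group existed before, G is initially 0, so the group number assigned to A1 and A2 is 1. A2 is then taken as the target variable. A2 has already been grouped, and the first variables related to A2 are likewise found; they are A1 and A3. The second variables having the same group number as each first variable are then obtained. Because A3 has not been grouped, no second variable having the same group number as A3 exists. A1 has been grouped, but the only variable having the same group number as A1 is A2 itself. The variables found to be related to A2 are therefore only A1 and A3, and the group numbers of A1 and A3 are set to that of A2, namely 1. Proceeding in the same way, all related variables are clustered into the same group, while unrelated variables fall into different groups.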
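The traversal of steps 206A to 206C can be sketched as follows and run on the five-variable example above. This is one reading of the described procedure under stated assumptions, not a definitive implementation: the function name `cluster_by_traversal` and the `related` mapping are hypothetical, and G is kept as a counter of group numbers handed out so far so that a new number can never collide with an existing one after groups are merged (the source only says G is the current total number of groups).

```python
def cluster_by_traversal(names, related):
    """related maps each variable to the set of variables directly related to it."""
    group = {}          # variable -> group number; absent means not yet grouped
    total_groups = 0    # G, kept here as the count of numbers handed out (assumption)

    for target in names:
        # First variables: directly related to the target.
        first = set(related.get(target, ()))
        # Second variables: already share a group number with some first variable.
        first_groups = {group[v] for v in first if v in group}
        second = {v for v, g in group.items() if g in first_groups}

        if target in group:
            number = group[target]      # step 206B: reuse the target's number
        else:
            total_groups += 1           # step 206C: allocate G + 1
            number = total_groups

        for v in {target} | first | second:
            group[v] = number
    return group

related = {
    "A1": {"A2"},
    "A2": {"A1", "A3"},
    "A3": {"A2"},
    "A4": {"A5"},
    "A5": {"A4"},
}
result = cluster_by_traversal(["A1", "A2", "A3", "A4", "A5"], related)
```

On this input A1, A2, and A3 receive group number 1 and A4 and A5 receive group number 2, matching the walk-through in the text. In this sketch a variable with no related variables still receives its own new group number.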
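In one embodiment, the step of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables, includes: adjusting the variables having the same group number to be adjacent variables, and adjusting the order of the variables in the correlation coefficient table in descending order of group number.
In this embodiment, to identify related variables quickly, the variables having the same group number are adjusted to be adjacent, that is, the variables having the same group number are gathered together, and the order of the variables in the correlation coefficient table is then adjusted in descending order of group number. This makes the related variables easier to identify in a regular pattern.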
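The descending-order arrangement can be sketched on a matrix representation, permuting rows and columns together. The function name `reorder_table`, the list-of-lists layout, and the example values are assumptions for illustration.

```python
def reorder_table(names, matrix, groups):
    """Reorder rows and columns together so group numbers run from largest to smallest."""
    order = sorted(range(len(names)), key=lambda i: groups[names[i]], reverse=True)
    new_names = [names[i] for i in order]
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_names, new_matrix

names = ["A1", "A4", "A2"]
groups = {"A1": 1, "A4": 2, "A2": 1}
matrix = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]
new_names, new_matrix = reorder_table(names, matrix, groups)
```

Here A4 (group 2) moves to the front and A1 and A2 (group 1) become adjacent, so their 0.9 coefficient lands next to the diagonal. The sort remains stable with `reverse=True`, so members of a group keep their original relative order.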
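As shown in FIG. 6, in one embodiment, the above related variable identification method further includes:
Step 212: use principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.
In this embodiment, after the terminal has highlighted the marked target correlation coefficients in the adjusted correlation coefficient table and multiple related variables have been identified, principal component analysis is used to eliminate collinearity: one correlation coefficient representing the group is selected from the target correlation coefficients corresponding to the same group number, and subsequent processing, such as building a linear regression model, is then performed according to the selected correlation coefficient.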
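The source does not specify how principal component analysis yields a representative for a group. One possible reading, offered purely as an assumption, is to take the correlation sub-matrix of a group, approximate its first principal component, and keep the variable with the largest absolute loading. The sketch below does this with power iteration; the function names `first_component` and `representative` and the example matrix are hypothetical, and the sketch returns a representative variable rather than a coefficient.

```python
import math

def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def representative(names, matrix):
    """Pick the variable with the largest absolute loading on the first component."""
    loadings = first_component(matrix)
    best = max(range(len(names)), key=lambda i: abs(loadings[i]))
    return names[best]

# Hypothetical correlation sub-matrix for one group of three variables.
group_names = ["A", "B", "C"]
group_matrix = [
    [1.0, 0.9, 0.7],
    [0.9, 1.0, 0.9],
    [0.7, 0.9, 1.0],
]
chosen = representative(group_names, group_matrix)
```

In this example B is chosen because it is strongly correlated with both A and C. The all-ones starting vector is a simplification: for groups containing strongly negative coefficients it could be nearly orthogonal to the leading eigenvector, and a production implementation would use a proper eigensolver.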
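As shown in FIG. 7, in one embodiment, a related variable identification apparatus 700 is provided. The apparatus includes:
an obtaining module 702, configured to obtain a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables;
a search module 704, configured to search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and mark the target correlation coefficients;
a clustering module 706, configured to cluster a plurality of related variables into a same group according to the target correlation coefficients, and assign a unique group number to the group;
an adjusting module 708, configured to adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and
a display module 710, configured to highlight the marked target correlation coefficients in the adjusted correlation coefficient table.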
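How the five modules might fit together in software can be sketched as one class with a method per module. Everything here is an assumption made for illustration: the class name `RelatedVariableIdentifier`, the method names, the label-propagation clustering (a simple stand-in for the traversal the source describes), and bracket notation in place of visual highlighting.

```python
class RelatedVariableIdentifier:
    def __init__(self, threshold=0.75):
        self.threshold = threshold

    def obtain(self, names, matrix):                 # obtaining module 702
        return {a: dict(zip(names, row)) for a, row in zip(names, matrix)}

    def search(self, table):                         # search module 704
        return {frozenset((a, b)) for a in table for b in table
                if a != b and abs(table[a][b]) > self.threshold}

    def cluster(self, table, marked):                # clustering module 706
        # Propagate the smallest label along marked pairs until nothing changes.
        label = {name: index for index, name in enumerate(table)}
        changed = True
        while changed:
            changed = False
            for pair in marked:
                a, b = tuple(pair)
                low = min(label[a], label[b])
                if label[a] != low or label[b] != low:
                    label[a] = label[b] = low
                    changed = True
        numbers = {}
        return {name: numbers.setdefault(label[name], len(numbers) + 1)
                for name in table}

    def adjust(self, table, groups):                 # adjusting module 708
        return sorted(table, key=lambda name: groups[name])

    def display(self, table, order, marked):         # display module 710
        rows = []
        for a in order:
            cells = [f"[{table[a][b]:.2f}]" if frozenset((a, b)) in marked
                     else f"{table[a][b]:.2f}" for b in order]
            rows.append(a + " " + " ".join(cells))
        return "\n".join(rows)

    def run(self, names, matrix):
        table = self.obtain(names, matrix)
        marked = self.search(table)
        groups = self.cluster(table, marked)
        order = self.adjust(table, groups)
        return order, groups, self.display(table, order, marked)

order, groups, text = RelatedVariableIdentifier().run(
    ["A1", "A2", "A3", "A4"],
    [[1.0, 0.1, 0.8, 0.2],
     [0.1, 1.0, 0.3, 0.9],
     [0.8, 0.3, 1.0, 0.1],
     [0.2, 0.9, 0.1, 1.0]],
)
```

Keeping each module as a separate method mirrors the block diagram of FIG. 7 and lets any one stage, such as the clustering, be replaced without touching the others.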
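As shown in FIG. 8, in one embodiment, the clustering module 706 includes:
a determining module 706A, configured to obtain a target variable to be clustered and determine whether the target variable has already been grouped;
a group number modifying module 706B, configured to, if the target variable has already been grouped, obtain, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and change the group numbers of the first variable and the second variable to the group number of the target variable; and
a group number assigning module 706C, configured to, if the target variable has not been grouped, obtain a first variable related to the target variable and a second variable having the same group number as the first variable, and assign a new group number to the target variable, the first variable, and the second variable.
In one embodiment, the group number assigning module is further configured to, if the target variable has not been grouped, obtain a first variable related to the target variable and a second variable having the same group number as the first variable, and uniformly set the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
In one embodiment, the adjusting module is further configured to adjust the variables having the same group number to be adjacent variables, and adjust the order of the variables in the correlation coefficient table in descending order of group number.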
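As shown in FIG. 9, in one embodiment, a related variable identification apparatus 900 is provided which, in addition to modules 702 to 710, further includes:
a screening module 712, configured to use principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.
Each module in the above related variable identification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The network interface may be an Ethernet card, a wireless network card, or the like. The above modules may be embedded in or independent of the processor of the terminal in hardware form, or may be stored in the memory of the terminal in software form, so that the processor can invoke them to perform the operations corresponding to each module. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.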
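In one embodiment, when the computer-readable instructions in the terminal of FIG. 1 are executed by the processor, the processor performs the following steps: obtaining a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables; searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients; clustering a plurality of related variables into a same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.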
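In one embodiment, the step, performed by the processor, of clustering a plurality of related variables into a same group according to the target correlation coefficients and assigning a unique group number to the group includes: obtaining a target variable to be clustered, and determining whether the target variable has already been grouped; if the target variable has already been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and changing the group numbers of the first variable and the second variable to the group number of the target variable; and if the target variable has not been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable.
In one embodiment, the step, performed by the processor, of, if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable includes: if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable; and uniformly setting the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
In one embodiment, the step, performed by the processor, of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables, includes: adjusting the variables having the same group number to be adjacent variables, and adjusting the order of the variables in the correlation coefficient table in descending order of group number.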
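In one embodiment, the processor is further configured to perform the following step: using principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.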
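A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. The protection scope of this patent shall therefore be subject to the appended claims.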
Claims (20)
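- A related variable identification method, comprising: obtaining a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables; searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients; clustering a plurality of related variables into a same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.
- The method according to claim 1, wherein clustering a plurality of related variables into a same group according to the target correlation coefficients and assigning a unique group number to the group comprises: obtaining a target variable to be clustered, and determining whether the target variable has already been grouped; if the target variable has already been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and changing the group numbers of the first variable and the second variable to the group number of the target variable; and if the target variable has not been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable.
- The method according to claim 2, wherein, if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable comprises: if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable; and uniformly setting the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
- The method according to claim 1, wherein adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables, comprises: adjusting the variables having the same group number to be adjacent variables, and adjusting the order of the variables in the correlation coefficient table in descending order of group number.
- The method according to claim 1, further comprising: using principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.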
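- A related variable identification apparatus, comprising: an obtaining module, configured to obtain a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables; a search module, configured to search the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, take the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and mark the target correlation coefficients; a clustering module, configured to cluster a plurality of related variables into a same group according to the target correlation coefficients, and assign a unique group number to the group; an adjusting module, configured to adjust the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and a display module, configured to highlight the marked target correlation coefficients in the adjusted correlation coefficient table.
- The apparatus according to claim 6, wherein the clustering module comprises: a determining module, configured to obtain a target variable to be clustered and determine whether the target variable has already been grouped; a group number modifying module, configured to, if the target variable has already been grouped, obtain, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and change the group numbers of the first variable and the second variable to the group number of the target variable; and a group number assigning module, configured to, if the target variable has not been grouped, obtain a first variable related to the target variable and a second variable having the same group number as the first variable, and assign a new group number to the target variable, the first variable, and the second variable.
- The apparatus according to claim 7, wherein the group number assigning module is further configured to, if the target variable has not been grouped, obtain a first variable related to the target variable and a second variable having the same group number as the first variable, and uniformly set the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
- The apparatus according to claim 6, wherein the adjusting module is further configured to adjust the variables having the same group number to be adjacent variables, and adjust the order of the variables in the correlation coefficient table in descending order of group number.
- The apparatus according to claim 6, further comprising: a screening module, configured to use principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.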
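- A terminal, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps: obtaining a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables; searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients; clustering a plurality of related variables into a same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.
- The terminal according to claim 11, wherein the step, performed by the processor, of clustering a plurality of related variables into a same group according to the target correlation coefficients and assigning a unique group number to the group comprises: obtaining a target variable to be clustered, and determining whether the target variable has already been grouped; if the target variable has already been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and changing the group numbers of the first variable and the second variable to the group number of the target variable; and if the target variable has not been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable.
- The terminal according to claim 12, wherein the step, performed by the processor, of, if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable comprises: if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable; and uniformly setting the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
- The terminal according to claim 11, wherein the step, performed by the processor, of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables, comprises: adjusting the variables having the same group number to be adjacent variables, and adjusting the order of the variables in the correlation coefficient table in descending order of group number.
- The terminal according to claim 11, wherein the processor is further configured to perform the following step: using principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.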
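- One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining a correlation coefficient table to be processed, the correlation coefficient table recording the correlation coefficients between a plurality of variables; searching the correlation coefficient table for correlation coefficients whose absolute value is greater than a preset threshold, taking the correlation coefficients whose absolute value is greater than the preset threshold as target correlation coefficients, and marking the target correlation coefficients; clustering a plurality of related variables into a same group according to the target correlation coefficients, and assigning a unique group number to the group; adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables; and highlighting the marked target correlation coefficients in the adjusted correlation coefficient table.
- The non-volatile readable storage medium according to claim 16, wherein the step, performed by the processor, of clustering a plurality of related variables into a same group according to the target correlation coefficients and assigning a unique group number to the group comprises: obtaining a target variable to be clustered, and determining whether the target variable has already been grouped; if the target variable has already been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and changing the group numbers of the first variable and the second variable to the group number of the target variable; and if the target variable has not been grouped, obtaining, according to the target correlation coefficients, a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable.
- The non-volatile readable storage medium according to claim 17, wherein the step, performed by the processor, of, if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable, and assigning a new group number to the target variable, the first variable, and the second variable comprises: if the target variable has not been grouped, obtaining a first variable related to the target variable and a second variable having the same group number as the first variable; and uniformly setting the group numbers of the target variable, the first variable, and the second variable to G+1 according to a rule of increasing group numbers, where G denotes the current total number of groups.
- The non-volatile readable storage medium according to claim 16, wherein the step, performed by the processor, of adjusting the order of the variables in the correlation coefficient table according to the group numbers, so that variables having the same group number become adjacent variables, comprises: adjusting the variables having the same group number to be adjacent variables, and adjusting the order of the variables in the correlation coefficient table in descending order of group number.
- The non-volatile readable storage medium according to claim 16, wherein the processor is further configured to perform the following step: using principal component analysis to select, from the target correlation coefficients corresponding to a same group number, one correlation coefficient representing the group.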
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710087590.X | 2017-02-17 | ||
CN201710087590.XA CN106940803B (zh) | 2017-02-17 | 2017-02-17 | Related variable identification method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018149084A1 true WO2018149084A1 (zh) | 2018-08-23 |
Family
ID=59468745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/090578 WO2018149084A1 (zh) | 2017-02-17 | 2017-06-28 | Related variable identification method, apparatus, terminal, and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN106940803B (zh) |
TW (1) | TWI662472B (zh) |
WO (1) | WO2018149084A1 (zh) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104252627A (zh) * | 2013-06-28 | 2014-12-31 | 广州华多网络科技有限公司 | SVM classifier training sample acquisition method, training method, and system thereof |
US20150120639A1 (en) * | 2013-10-30 | 2015-04-30 | Samsung Sds Co., Ltd. | Apparatus and method for classifying data and system for collecting data |
CN106156791A (zh) * | 2016-06-15 | 2016-11-23 | 北京京东尚科信息技术有限公司 | Business data classification method and apparatus |
CN106324405A (zh) * | 2016-09-07 | 2017-01-11 | 南京工程学院 | Transformer fault diagnosis method based on improved principal component analysis |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7617176B2 (en) * | 2004-07-13 | 2009-11-10 | Microsoft Corporation | Query-based snippet clustering for search result grouping |
JP4308871B2 (ja) * | 2005-10-12 | 2009-08-05 | 学校法人東京電機大学 | Brain function data analysis method, brain function analysis device, and brain function analysis program |
US20070255512A1 (en) * | 2006-04-28 | 2007-11-01 | Delenstarr Glenda C | Methods and systems for facilitating analysis of feature extraction outputs |
TWI340345B (en) * | 2006-08-10 | 2011-04-11 | Uniminer Inc | Method for selecting critical variables |
JP4368905B2 (ja) * | 2007-05-11 | 2009-11-18 | シャープ株式会社 | Graph drawing device and method, yield analysis method and yield improvement support system executing the method, program, and computer-readable recording medium |
TWI451336B (zh) * | 2011-12-20 | 2014-09-01 | Univ Nat Cheng Kung | Method for screening modeling samples for a prediction model and computer program product thereof |
CN103473255A (zh) * | 2013-06-06 | 2013-12-25 | 中国科学院深圳先进技术研究院 | Data clustering method, system, and data processing device |
CN104281569B (zh) * | 2013-07-01 | 2017-08-01 | 富士通株式会社 | Construction apparatus and method, classification apparatus and method, and electronic device |
CN103699653A (zh) * | 2013-12-26 | 2014-04-02 | 沈阳航空航天大学 | Data clustering method and apparatus |
CN105956628B (zh) * | 2016-05-13 | 2021-01-26 | 北京京东尚科信息技术有限公司 | Data classification method and apparatus for data classification |
CN106339354B (zh) * | 2016-08-17 | 2018-11-20 | 盐城师范学院 | Method for visualizing high-dimensional data in a cloud computing network based on improved PCA |
-
2017
- 2017-02-17 CN CN201710087590.XA patent/CN106940803B/zh active Active
- 2017-06-28 WO PCT/CN2017/090578 patent/WO2018149084A1/zh active Application Filing
- 2017-11-30 TW TW106141760A patent/TWI662472B/zh active
Also Published As
Publication number | Publication date |
---|---|
CN106940803B (zh) | 2018-04-17 |
CN106940803A (zh) | 2017-07-11 |
TW201832070A (zh) | 2018-09-01 |
TWI662472B (zh) | 2019-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10353810B2 (en) | Software testing with minimized test suite | |
US11003625B2 (en) | Method and apparatus for operating on file | |
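WO2019019769A1 (zh) | Service function implementation method and apparatus, computer device, and storage medium | |
US10078502B2 (en) | Verification of a model of a GUI-based application | |
CN109977366B (zh) | Directory generation method and apparatus | |
US11288266B2 (en) | Candidate projection enumeration based query response generation | |
WO2019056494A1 (zh) | Chart generation method and apparatus, computer device, and storage medium | |
CN109376153B (zh) | NiFi-based system and method for writing data into a graph database | |
US12045918B2 (en) | Techniques for analyzing command usage of software applications | |
WO2016101751A1 (zh) | Master-slave balancing method and apparatus in a distributed storage system | |
CN108255976B (zh) | Data sorting method and apparatus, storage medium, and electronic device | |
JP6268435B2 (ja) | Database reconfiguration method, database reconfiguration program, and database reconfiguration device | |
US11361195B2 (en) | Incremental update of a neighbor graph via an orthogonal transform based indexing | |
US10114951B2 (en) | Virus signature matching method and apparatus | |
CN104700255B (zh) | Multi-process processing method, apparatus, and system | |
WO2018149084A1 (zh) | Related variable identification method, apparatus, terminal, and storage medium | |
WO2017067459A1 (zh) | Desktop data loading method and apparatus | |
CN105733921A (zh) | Next-generation sequencing analysis system and next-generation sequencing analysis method thereof | |
CN111078671A (zh) | Method, apparatus, device, and medium for modifying data table fields | |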
EP2892018A1 (en) | Automated compilation of graph input for the hipergraph solver | |
US9135300B1 (en) | Efficient sampling with replacement | |
US20150039633A1 (en) | Duplicate station detection system | |
JP2020525963A (ja) | Media feature comparison method and apparatus | |
US9715748B2 (en) | Method and apparatus for graphical data interaction and vizualization of graphs via paths | |
US11816245B2 (en) | Method for analysis on interim result data of de-identification procedure, apparatus for the same, computer program for the same, and recording medium storing computer program thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17897093 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/11/2019) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17897093 Country of ref document: EP Kind code of ref document: A1 |