WO2022009408A1 - Information processing apparatus, information processing method, and storage medium - Google Patents
Information processing apparatus, information processing method, and storage medium
- Publication number
- WO2022009408A1 (PCT/JP2020/026973, JP2020026973W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- classes
- class
- information processing
- variation
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- This disclosure relates to information processing devices, information processing methods and storage media.
- Patent Document 1 discloses an example of a projection matrix generation method used for dimension reduction.
- This disclosure aims to provide an information processing device, an information processing method, and a storage medium that realize dimensionality reduction with better class separation.
- According to one aspect of this disclosure, an information processing apparatus is provided that includes an acquisition means for acquiring a plurality of data each classified into one of a plurality of classes, and a calculation means for calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between a first class and a second class among the plurality of classes, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- According to another aspect of this disclosure, an information processing apparatus is provided that includes an acquisition means for acquiring a plurality of data each classified into one of a plurality of classes, and a calculation means for calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- According to another aspect of this disclosure, an information processing method executed by a computer is provided that includes a step of acquiring a plurality of data each classified into one of a plurality of classes, and a step of calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between a first class and a second class among the plurality of classes, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- According to another aspect of this disclosure, an information processing method executed by a computer is provided that includes a step of acquiring a plurality of data each classified into one of a plurality of classes, and a step of calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- According to another aspect of this disclosure, a storage medium is provided that stores a program for causing a computer to execute an information processing method including a step of acquiring a plurality of data each classified into one of a plurality of classes, and a step of calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between a first class and a second class among the plurality of classes, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- According to another aspect of this disclosure, a storage medium is provided that stores a program for causing a computer to execute an information processing method including a step of acquiring a plurality of data each classified into one of a plurality of classes, and a step of calculating, based on an objective function including statistics of the plurality of data, a projection matrix used for dimension reduction of the plurality of data, wherein the objective function includes a ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- The information processing device of the present embodiment is a device that calculates a projection matrix used for dimensionality reduction of input data. Further, the information processing apparatus of the present embodiment may be provided with a determination function of performing a determination, such as person identification, on data obtained by applying feature selection with the projection matrix to the input data. This data may be, for example, feature amount data extracted from biometric information. In this case, the information processing device may be a biometric matching device that confirms the identity of a person based on biometric information.
- In the following, the information processing apparatus of the present embodiment is assumed to be a biometric matching apparatus having both a training function for calculating the projection matrix and a determination function based on the projection matrix, but the present invention is not limited thereto.
- FIG. 1 is a block diagram showing a hardware configuration example of the information processing device 1.
- the information processing device 1 of the present embodiment may be, for example, a computer such as a PC (Personal Computer), a processing server, a smartphone, or a microcomputer.
- the information processing device 1 includes a processor 101, a memory 102, a communication I / F (Interface) 103, an input device 104, and an output device 105.
- Each part of the information processing apparatus 1 is connected to each other via a bus, wiring, a driving device, etc. (not shown).
- The processor 101 is a processing unit including one or more arithmetic processing circuits such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or a TPU (Tensor Processing Unit).
- the processor 101 performs a predetermined operation according to a program stored in a memory 102 or the like, and also has a function of controlling each part of the information processing apparatus 1.
- The memory 102 can include a volatile storage medium that provides a temporary memory area necessary for the operation of the processor 101, and a non-volatile storage medium that non-temporarily stores information such as the data to be processed and the operation program of the information processing apparatus 1.
- An example of the volatile storage medium is RAM (Random Access Memory).
- Examples of the non-volatile storage medium include ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), flash memory, and the like.
- Communication I / F103 is a communication interface based on standards such as Ethernet (registered trademark), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
- the communication I / F 103 is a module for communicating with other devices such as a data server and a sensor device.
- the input device 104 is a keyboard, a pointing device, a button, or the like, and is used by the user to operate the information processing device 1. Examples of pointing devices include mice, trackballs, touch panels, pen tablets and the like.
- the input device 104 may include a sensor device such as a camera or a microphone. These sensor devices can be used to acquire biometric information.
- The output device 105 is a device that presents information to the user, such as a display device or a speaker.
- the input device 104 and the output device 105 may be integrally formed as a touch panel.
- the information processing device 1 is composed of one device, but the configuration of the information processing device 1 is not limited to this.
- the information processing device 1 may be a system composed of a plurality of devices. Further, devices other than these may be added to the information processing device 1, and some devices may not be provided. Further, some devices may be replaced with other devices having similar functions. Further, some functions of the present embodiment may be provided by other devices via a network, or the functions of the present embodiment may be distributed and realized by a plurality of devices.
- For example, the memory 102 may include cloud storage, which is a storage device provided in another device. In this way, the hardware configuration of the information processing apparatus 1 can be changed as appropriate.
- FIG. 2 is a functional block diagram of the information processing apparatus 1 according to the present embodiment.
- The information processing apparatus 1 includes a projection matrix calculation unit 110, a first feature extraction unit 121, a second feature extraction unit 131, a feature selection unit 132, a determination unit 133, an output unit 134, a training data storage unit 141, a projection matrix storage unit 142, and a target data storage unit 143.
- the projection matrix calculation unit 110 includes a separation degree calculation unit 111, a constraint setting unit 112, and a projection matrix update unit 113.
- the processor 101 performs predetermined arithmetic processing by executing the program stored in the memory 102. Further, the processor 101 controls each part of the memory 102, the communication I / F 103, the input device 104, and the output device 105 based on the program. As a result, the processor 101 realizes the functions of the projection matrix calculation unit 110, the first feature extraction unit 121, the second feature extraction unit 131, the feature selection unit 132, the determination unit 133, and the output unit 134. Further, the memory 102 realizes the functions of the training data storage unit 141, the projection matrix storage unit 142, and the target data storage unit 143.
- the first feature extraction unit 121 and the projection matrix calculation unit 110 may be more generally referred to as acquisition means and calculation means, respectively.
- the information processing device 1 may be divided into a training device that performs training using training data and a determination device that makes a determination on the target data.
- the training device may include a projection matrix calculation unit 110, a first feature extraction unit 121, and a training data storage unit 141.
- the determination device may include a second feature extraction unit 131, a feature selection unit 132, a determination unit 133, an output unit 134, and a target data storage unit 143.
- FIG. 3 is a flowchart showing an outline of the training process performed in the information processing apparatus 1 according to the present embodiment.
- the training process of the present embodiment is started when a command for the training process using the training data is given to the information processing apparatus 1 by, for example, a user operation or the like.
- The timing at which the training process of the present embodiment is performed is not particularly limited; it may be when the information processing apparatus 1 acquires the training data, or the training process may be repeatedly executed at predetermined time intervals.
- The training data stored in advance in the training data storage unit 141 is classified into one of a plurality of classes; alternatively, the training data may be acquired from another device such as a data server when the training process is executed.
- the first feature extraction unit 121 acquires training data from the training data storage unit 141.
- Information indicating which of the plurality of classes is classified in advance by the user or the like is associated with this training data.
- For example, this training data is sensor data acquired from a living body, an object, or the like, and
- the plurality of classes may be identification numbers or the like that identify the person, object, or the like from which the training data was acquired.
- In step S12, the first feature extraction unit 121 extracts feature amount data from the training data.
- In step S13, the projection matrix calculation unit 110 calculates the projection matrix.
- the calculated projection matrix is stored in the projection matrix storage unit 142.
- the feature amount data is multidimensional data, and dimension reduction may be required in order to appropriately perform a determination based on the feature amount data.
- the projection matrix calculation unit 110 performs training for determining a projection matrix for dimension reduction based on the training data. Details of the processing in step S13 will be described later.
- the feature amount data extracted from the training data in advance may be stored in the training data storage unit 141, in which case the process of step S12 may be omitted.
- FIG. 4 is a flowchart showing an outline of the determination process performed in the information processing apparatus 1 according to the present embodiment.
- the determination process of the present embodiment is started when the information processing apparatus 1 is instructed to perform the determination process using the target data, for example, by a user operation or the like.
- The timing at which the determination process of the present embodiment is performed is not particularly limited; it may be when the information processing apparatus 1 acquires the target data, or the determination process may be repeatedly executed at predetermined time intervals.
- It is assumed that the projection matrix is stored in the projection matrix storage unit 142 in advance and that the target data is stored in the target data storage unit 143;
- alternatively, the target data may be acquired from another device when the determination process is executed.
- In step S21, the second feature extraction unit 131 acquires the target data from the target data storage unit 143.
- This target data is unknown data to be determined in this determination process.
- In step S22, the second feature extraction unit 131 extracts feature amount data from the target data.
- In step S23, the feature selection unit 132 executes feature selection on the target data based on the projection matrix. Specifically, this process reduces the dimension of the target data by applying the projection matrix to the target data. In other words, the feature selection unit 132 performs a process of reducing the number of features by selecting features that well reflect the properties of the target data, for example as in the sketch below.
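- A minimal sketch of this projection (the function and variable names are illustrative, not taken from the disclosure):

```python
import numpy as np

def select_features(x, W):
    """Dimension reduction as in step S23: project a d-dimensional feature vector to r dimensions."""
    x = np.asarray(x).reshape(-1)   # d-dimensional feature amount data
    W = np.asarray(W)               # d x r projection matrix
    return W.T @ x                  # r-dimensional feature after selection
```

- For example, `y = select_features(target_features, W)` yields the reduced feature vector that is passed on to the determination unit 133.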
- In step S24, the determination unit 133 makes a determination based on the feature amount data after feature selection. For example, if the determination in the determination unit 133 is a class classification, this determination is a process of determining the class to which the input feature amount data belongs. Further, for example, if the determination in the determination unit 133 is person identification in biometric matching, this determination is a process of determining whether or not the person from whom the target data was acquired is the same as a registered person.
- In step S25, the output unit 134 outputs the determination result of the determination unit 133.
- the output destination may be the memory 102 in the information processing device 1 or another device.
- Before describing the projection matrix calculation of the present embodiment, LDA (Linear Discriminant Analysis) and WLDA (Worst-case Linear Discriminant Analysis), which are related to the process of the present embodiment, are outlined.
- LDA: Linear Discriminant Analysis
- WLDA: Worst-case Linear Discriminant Analysis
- Let d be the dimensionality of the training data, n the number of training data, x_i the d-dimensional vector representing the i-th training data, C the number of classes, and r the number of dimensions after dimension reduction.
- The projection matrix W is a real matrix of d rows and r columns, as shown in the following equation (1). By applying the projection matrix W to the training data x_i, the dimension can be reduced from d to r.
- The matrices S_b and S_w are defined by the following equations (3) to (6).
- argmax(·) denotes the argument that gives the maximum value of the function inside the parentheses,
- tr(·) denotes the trace of a square matrix, and
- W^T denotes the transpose of W.
- Equation (5) gives the intraclass mean of x_i in the k-th class π_k, and
- equation (6) gives the sample mean of all the training data. Therefore, the matrix S_b indicates the average of the interclass variances, and the matrix S_w indicates the average of the intraclass variances. That is, LDA roughly determines a projection matrix W that maximizes the ratio of the term indicating the average interclass variation of the training data to the term indicating the average intraclass variation of the training data. Since this method focuses only on averages during optimization, it neglects the risk of confusion between critical classes, for example when data are distributed so that only parts of different classes overlap.
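- For reference, the following Python sketch computes the LDA statistics described above (class means, overall mean, average between-class and within-class scatter) and the classical eigenvector solution; it illustrates standard LDA and does not reproduce equations (2) to (6) themselves.

```python
import numpy as np

def lda_projection(X, y, r):
    """Classical LDA sketch: X is (n, d) training data, y is (n,) class labels, r is the output dimension."""
    classes = np.unique(y)
    mu = X.mean(axis=0)                              # sample mean of all training data (cf. equation (6))
    d = X.shape[1]
    S_b = np.zeros((d, d))                           # average between-class scatter
    S_w = np.zeros((d, d))                           # average within-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                       # intraclass mean (cf. equation (5))
        diff = (mu_c - mu).reshape(-1, 1)
        S_b += len(Xc) * diff @ diff.T
        S_w += (Xc - mu_c).T @ (Xc - mu_c)
    S_b /= len(X)
    S_w /= len(X)
    # Maximize average interclass variation relative to average intraclass variation:
    # take the eigenvectors of pinv(S_w) @ S_b for the r largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:r]].real                # d x r projection matrix
```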
- The matrix I_r denotes the identity matrix of r rows and r columns, and
- s.t. (subject to) indicates a constraint condition.
- The matrices S_ij and S_k are defined by the following equations (9) and (10).
- Equation (8) is a constraint condition called an orthonormal constraint.
- the orthonormal constraint has the function of limiting the scale of each column of the projection matrix W and eliminating redundancy.
- Equation (13) defines the set representing the solution space after the constraint condition is relaxed.
- Here, 0_d denotes the zero matrix of d rows and d columns, and
- I_d denotes the identity matrix of d rows and d columns.
- Equation (14) indicates that the matrix (M_e - 0_d) is positive semidefinite and that the matrix (I_d - M_e) is positive semidefinite. A condition of this form is called a semidefinite constraint.
- Using equations (11) and (13), the optimization problem of equations (7) and (8) can be relaxed as in equations (15) and (16) below.
- In this transformation, the property that the trace of a matrix product is invariant under cyclic permutation of the factors (when the matrix sizes are compatible) is used.
- The matrix S_ij included in the objective function of WLDA indicates the variance between two classes, and the matrix S_i indicates the variance within a class. Therefore, WLDA roughly determines a projection matrix W that maximizes the ratio of the term indicating the minimum interclass variation of the training data to the term indicating the maximum intraclass variation of the training data. This method considers the worst-case combination of classes in the training data. Therefore, unlike LDA, which focuses only on averages, even when data are distributed so that only parts of classes overlap, the optimization widens the interclass distance at such a critical part.
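- As a rough illustration of this worst-case criterion, the following sketch evaluates, for a candidate projection W, the minimum pairwise interclass variation divided by the maximum intraclass variation; the scatter matrices are assumed to have been computed beforehand per equations (9) and (10).

```python
import numpy as np

def wlda_objective(W, S_pair, S_class):
    """Worst-case LDA criterion for a candidate d x r projection W.

    S_pair[(i, j)] : interclass scatter matrix of classes i and j (cf. equation (9))
    S_class[k]     : intraclass scatter matrix of class k (cf. equation (10))
    """
    between_min = min(np.trace(W.T @ S @ W) for S in S_pair.values())   # worst (smallest) class pair
    within_max = max(np.trace(W.T @ S @ W) for S in S_class.values())   # worst (largest) class spread
    return between_min / within_max
```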
- By solving the relaxed problem, the projection matrix W can be calculated.
- However, the pair of classes that gives the minimum interclass variation in the numerator of an objective function such as equation (15) and the class that gives the maximum intraclass variation in the denominator may be different classes. In such a case, the class that determines the denominator's intraclass variation is unrelated to the critical part, and the optimization may be insufficient.
- Therefore, in the present embodiment, the objective function of the optimization problem of equation (15) is modified from that of the above-mentioned WLDA.
- The projection matrix calculation process of this embodiment will now be described.
- The optimization problem in the projection matrix calculation process of this embodiment is as shown in the following equations (17) to (19). Note that n_i and n_j in equation (18) indicate the numbers of data in the classes with indexes i and j, respectively.
- The matrix S_ij included in the objective function of the present embodiment is a matrix (first term) indicating the interclass variance between the i-th class (first class) and the j-th class (second class). The matrix S̄_{i,j} is a matrix (second term) indicating the weighted average of the intraclass variances of the two classes used for calculating that interclass variance.
- That is, the first function, which is the numerator of the fraction in equation (17), includes the first term indicating the interclass variation between the first class and the second class, and the second function, which is the denominator of the fraction in equation (17), includes the second term indicating the intraclass variation of at least one of the first class and the second class.
- In this embodiment, a projection matrix W that maximizes the minimum value, over the plurality of classes, of the ratio of the first function to the second function is roughly determined.
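- The criterion of this embodiment can be sketched as follows for a candidate W; the scatter matrices and class sizes are assumed given, and the code illustrates the structure of equations (17) and (18) rather than their exact notation.

```python
import numpy as np

def proposed_objective(W, S_pair, S_class, counts):
    """Criterion of the present embodiment: for each class pair, the denominator uses only that pair.

    S_pair[(i, j)] : interclass scatter of classes i and j (first term)
    S_class[k]     : intraclass scatter of class k
    counts[k]      : number of training data in class k (n_i, n_j in equation (18))
    """
    ratios = []
    for (i, j), S_ij in S_pair.items():
        n_i, n_j = counts[i], counts[j]
        S_bar = (n_i * S_class[i] + n_j * S_class[j]) / (n_i + n_j)    # weighted average (second term)
        ratios.append(np.trace(W.T @ S_ij @ W) / np.trace(W.T @ S_bar @ W))
    return min(ratios)   # the projection matrix W is chosen to maximize this minimum
```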
- FIG. 5 is a diagram schematically showing the relationship between the variance of a plurality of classes and the orientation of the projection axis.
- FIG. 5 schematically shows the distribution of training data classified into a plurality of classes.
- the training data is two-dimensional for the sake of simplification of the illustration, and the projection matrix that reduces the two-dimensional data to one dimension is calculated.
- the first and second axes of FIG. 5 correspond to the two dimensions of the training data.
- The elliptical dashed lines indicate the intraclass variances of classes CL1, CL2, and CL3. Roughly speaking, the training data of each class can be considered to be distributed within the corresponding dashed ellipse.
- The rectangular dots arranged within the dashed ellipses of classes CL1, CL2, and CL3 indicate the intraclass mean of each class.
- Arrow A1 indicates the direction of the projection axis that can be calculated when WLDA is used.
- the direction of the arrow A1 is slightly different from the direction that minimizes the influence of the region R, that is, the direction of the minimum width of the region R.
- The reason for this is that the intraclass variance of class CL3 is very large. Since the direction that minimizes the influence of the intraclass dispersion of class CL3 is the short-axis direction of the ellipse of class CL3 in FIG. 5, the direction of the arrow A1 is also close to that short-axis direction. In this case, the projection axis does not minimize the influence of the overlapping portion of class CL1 and class CL2.
- Arrow A2 indicates the direction of the projection axis that can be calculated when the projection matrix calculation process of the present embodiment is used.
- the direction of the arrow A2 is close to the direction that minimizes the influence of the region R, that is, the direction of the minimum width of the region R.
- In the present embodiment, the intraclass variance is calculated from the same classes as those used for calculating the interclass variance. Therefore, in the example of FIG. 5, the orientation of the projection axis is optimized without being affected by the intraclass variance of class CL3, and the orientation of the projection axis is determined so as to minimize the influence of the region R.
- As described above, in the present embodiment, the intraclass variance is calculated from the same classes as those used for calculating the interclass variance.
- Accordingly, the critical parts where multiple classes overlap are emphasized in the optimization.
- Therefore, according to the present embodiment, the information processing apparatus 1 that realizes dimensionality reduction with better class separation is provided.
- FIG. 6 is a flowchart showing an outline of the projection matrix calculation process performed in the information processing apparatus 1 according to the present embodiment.
- In step S131, the projection matrix calculation unit 110 sets the value of k to 0.
- Here, k is a loop counter variable for the loop processing that optimizes the matrix Γ.
- Steps S133 to S137 constitute the loop process for optimizing the matrix Γ.
- In the following, a variable corresponding to the loop counter value k, that is, a variable in the k-th iteration, may be written with the argument k.
- In step S133, the projection matrix calculation unit 110 increments the value of k, that is, increases the value of k by 1.
- In step S134, the separation degree calculation unit 111 calculates the value of the separation degree λ_k used for the optimization.
- The separation degree λ_k is determined by the following equation (20) based on equation (17) and the matrix Γ_{k-1} obtained in the (k-1)-th iteration. Although the proof is omitted, this optimization algorithm is known to converge because the separation degree λ_k is non-decreasing as k increases and is bounded above.
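- The convergence argument alluded to here is the standard monotone-convergence statement; written out (a general fact, not the omitted proof itself):

```latex
\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k \le \cdots \le \Lambda
\quad\Longrightarrow\quad
\lim_{k\to\infty} \lambda_k = \sup_{k} \lambda_k \le \Lambda
```

- That is, a non-decreasing sequence that is bounded above by some value Λ necessarily converges.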
- Equation (21) is the objective of the semidefinite programming problem, and
- equations (22) and (23) are the constraints of the semidefinite programming problem.
- Here, t in equations (21) and (22) is an auxiliary variable.
- In step S135, the constraint setting unit 112 calculates the above equations (22) and (23) based on the training data and the matrix Γ_{k-1} of the previous iteration, and sets the constraints of the semidefinite programming problem.
- In step S136, the projection matrix update unit 113 solves the semidefinite programming problem of the above equations (21) to (23) to calculate the matrix Γ_k of the k-th iteration. Since the semidefinite programming problem of equations (21) to (23) is a convex optimization problem that is relatively easy to solve, it can be solved using an existing solver, for example as sketched below.
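- Since equations (21) to (23) are not reproduced in this text, the following cvxpy sketch shows one plausible Dinkelbach-style subproblem consistent with the description (an auxiliary variable t, constraints built from the previous separation degree, and a relaxed feasible set like that of equations (13) and (14)); the exact constraint set of the disclosure may differ.

```python
import cvxpy as cp
import numpy as np

def solve_subproblem(S_pair, S_bar_pair, lam_prev, d, r):
    """Hypothetical form of the k-th SDP subproblem (equations (21)-(23) are not reproduced here).

    Maximize t such that tr(S_ij @ Gamma) - lam_prev * tr(S_bar_ij @ Gamma) >= t for every class
    pair (i, j), with Gamma in the relaxed set 0 <= Gamma <= I, tr(Gamma) = r.
    """
    Gamma = cp.Variable((d, d), symmetric=True)
    t = cp.Variable()
    constraints = [Gamma >> 0, Gamma << np.eye(d), cp.trace(Gamma) == r]
    for key in S_pair:
        constraints.append(
            cp.trace(S_pair[key] @ Gamma) - lam_prev * cp.trace(S_bar_pair[key] @ Gamma) >= t
        )
    cp.Problem(cp.Maximize(t), constraints).solve(solver=cp.SCS)
    return Gamma.value
```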
- In step S137, the projection matrix update unit 113 determines whether or not the matrix Γ has converged in the k-th iteration. This determination can be made, for example, based on whether or not the following equation (24) is satisfied. Here, ε in equation (24) is a threshold value for the determination, and the matrix Γ is determined to have converged when equation (24) holds for a sufficiently small ε set in advance.
- When it is determined that the matrix Γ_k has converged (Yes in step S137), the process proceeds to step S138, and the optimization ends with the matrix Γ_k at that time taken as the optimized matrix Γ. When it is determined that the matrix Γ_k has not converged (No in step S137), the process returns to step S133, and the optimization is continued.
- In step S138, the projection matrix update unit 113 calculates the projection matrix W by performing eigenvalue decomposition on the optimized matrix Γ.
- Specifically, d eigenvalues and the corresponding eigenvectors are calculated from the d-by-d matrix Γ.
- Let D be a diagonal matrix whose diagonal components are the calculated d eigenvalues,
- and let V be an orthogonal matrix in which the calculated d eigenvectors (column vectors) are arranged as its columns; the matrix Γ can then be expressed as in equation (25).
- From D and V, the projection matrix W of d rows and r columns can be calculated.
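- A minimal sketch of this step (assuming, as is common, that the eigenvectors for the r largest eigenvalues are taken as the columns of W):

```python
import numpy as np

def projection_from_gamma(Gamma, r):
    """Recover a d x r projection matrix W from the optimized matrix by eigenvalue decomposition (step S138)."""
    eigvals, eigvecs = np.linalg.eigh(Gamma)   # D = diag(eigvals), V = eigvecs for the symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    return eigvecs[:, order[:r]]               # eigenvectors for the r largest eigenvalues
```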
- the calculated projection matrix W is stored in the projection matrix storage unit 142.
- As described above, in the present embodiment, the optimization problem of equations (17) to (19) is solved to calculate the matrix Γ, and the matrix Γ is further subjected to eigenvalue decomposition to calculate the projection matrix W.
- In this way, the optimum projection matrix W that is the solution of the optimization problem of equations (17) to (19) can be obtained.
- the optimization procedure or the method of calculating the projection matrix W from the matrix ⁇ is not limited to this, as long as the projection matrix W can be obtained from the optimization problem of equations (17) to (19).
- the algorithm may be modified as appropriate.
- The min included in the objective function of equation (17) can be changed as appropriate according to the form of the objective function, and is not limited to this as long as the combination of i and j is determined based on some criterion. However, it is desirable that the objective function include min or max, because the combination of the most influential classes can then be considered.
- The matrix S̄_{i,j} in equation (18) is not limited to an average, and may be any matrix using at least one of the matrices S_i and S_j. However, since the two classes can then be treated equally, it is desirable that S̄_{i,j} be a weighted average over the two classes as in equation (18).
- This embodiment is a modification of the objective function in the optimization problem shown in the equations (17) to (19) of the first embodiment.
- the configuration of this embodiment is the same as that of the first embodiment except for the difference in mathematical formulas due to this modification. That is, the hardware configuration, block diagram, flowchart, and the like of the present embodiment are substantially the same as those of FIGS. 1 to 4 and 6 of the first embodiment. Therefore, the description of the part that overlaps with the first embodiment in the present embodiment will be omitted.
- the optimization problem in the projection matrix calculation process of this embodiment is as shown in the following equations (26) and (27).
- The matrix S_ij and the matrix Γ are the same as those in the above equation (17).
- The matrices S_b and S_w are the same as those defined by the above equations (3) to (6).
- The matrix S̄_{i,j} is the same as that defined by the above equation (18).
- The coefficient γ is a positive real number.
- The difference from the optimization problem of the first embodiment is that the regularization terms γS_b and γS_w described above are added.
- γS_b is a regularization term (third term) indicating the average of the interclass variation in LDA, and
- γS_w is a regularization term (fourth term) indicating the average of the intraclass variation in LDA. That is, in the present embodiment, the objective function of the first embodiment and the objective function of LDA are combined by weighted addition according to the coefficient γ.
- In the first embodiment, in order to emphasize the critical parts where a plurality of classes overlap, optimization focusing on the worst-case combination of classes is performed. With such an optimization method, when there are outliers in the training data, an optimization that depends excessively on the outliers may be performed.
- In the present embodiment, since regularization terms indicating the average of the interclass variance and the average of the intraclass variance in LDA are introduced, not only the worst case but also the averages are considered to some extent. Therefore, in addition to obtaining the same effect as the first embodiment, the introduction of the LDA-based regularization terms has the effect of improving robustness against outliers that may be included in the training data.
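- Read together with supplementary notes 4 and 5 below (the first function gains the third term, the second function the fourth term), the regularized criterion of this embodiment can be sketched as follows; treating γ as a simple additive weight on both terms is an assumption consistent with, but not verbatim from, equations (26) and (27).

```python
import numpy as np

def regularized_objective(W, S_pair, S_bar_pair, S_b, S_w, gamma):
    """Second-embodiment style criterion: each pairwise ratio is regularized by the LDA averages."""
    ratios = []
    for key in S_pair:
        num = np.trace(W.T @ S_pair[key] @ W) + gamma * np.trace(W.T @ S_b @ W)       # 1st + 3rd terms
        den = np.trace(W.T @ S_bar_pair[key] @ W) + gamma * np.trace(W.T @ S_w @ W)   # 2nd + 4th terms
        ratios.append(num / den)
    return min(ratios)
```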
- In step S134, the separation degree calculation unit 111 calculates the value of the separation degree λ_k used for the optimization.
- The separation degree λ_k is determined by the following equation (28) based on equation (26) and the matrix Γ_{k-1} obtained in the (k-1)-th iteration.
- Equation (29) is the objective of the semidefinite programming problem, and
- equations (30) and (31) are the constraints of the semidefinite programming problem.
- Here, t in equations (29) and (30) is an auxiliary variable.
- Since the semidefinite programming problem of equations (29) to (31) is a convex optimization problem as in the first embodiment, it can be solved in the same manner as in the first embodiment.
- The processing of steps S135 to S138 is the same as in the first embodiment except that the formulas used are the above equations (29) to (31), and thus its description will be omitted. Therefore, the optimum projection matrix W can be calculated for the optimization problem of the present embodiment as in the first embodiment.
- This embodiment is a modification of the objective function in the optimization problem shown in the equations (17) to (19) of the first embodiment.
- the configuration of this embodiment is the same as that of the first embodiment except for the difference in mathematical formulas due to this modification. That is, the hardware configuration, block diagram, flowchart, and the like of the present embodiment are substantially the same as those of FIGS. 1 to 4 and 6 of the first embodiment. Therefore, the description of the part that overlaps with the first embodiment in the present embodiment will be omitted.
- the optimization problem in the projection matrix calculation process of this embodiment is as shown in the following equations (32) and (33).
- The matrix S_ij and the matrix Γ are the same as those in the above equation (17).
- The matrices S_b and S_w are the same as those defined by the above equations (3) to (6).
- The matrix S_i is the same as that defined by the above equation (10).
- The coefficient γ is a positive real number.
- In the present embodiment, the regularization terms γS_b and γS_w are added to the objective function of the optimization problem of WLDA, as in the second embodiment.
- γS_b is a regularization term (third term) indicating the average of the interclass variation in LDA, and
- γS_w is a regularization term (fourth term) indicating the average of the intraclass variation in LDA. That is, in the present embodiment, the objective function of WLDA and the objective function of LDA are combined by weighted addition according to the coefficient γ.
- In WLDA, optimization focusing on the worst-case combination of classes is performed in order to emphasize the critical parts where multiple classes overlap.
- With such an optimization method, when there are outliers in the training data, an optimization that depends excessively on the outliers may be performed.
- In the present embodiment, since regularization terms indicating the average of the interclass variance and the average of the intraclass variance in LDA are introduced, not only the worst case but also the averages are considered to some extent. Therefore, in addition to obtaining the same effect as WLDA, the introduction of the LDA-based regularization terms has the effect of improving robustness against outliers that may be included in the training data.
- Therefore, according to the present embodiment as well, the information processing apparatus 1 that realizes dimensionality reduction with better class separation is provided.
- In step S134, the separation degree calculation unit 111 calculates the value of the separation degree λ_k used for the optimization.
- The separation degree λ_k is determined by the following equation (34) based on equation (32) and the matrix Γ_{k-1} obtained in the (k-1)-th iteration.
- Equation (35) is the objective of the semidefinite programming problem, and
- equations (36) to (38) are the constraints of the semidefinite programming problem.
- Here, s and t in equations (35) to (37) are auxiliary variables.
- Since the semidefinite programming problem of equations (35) to (38) is a convex optimization problem as in the first embodiment, it can be solved in the same manner as in the first embodiment.
- The processing of steps S135 to S138 is the same as in the first embodiment except that the formulas used are the above equations (35) to (38), and thus its description will be omitted. Therefore, the optimum projection matrix W can be calculated for the optimization problem of the present embodiment as in the first embodiment.
- In the first to third embodiments, the type of data to be processed is not particularly limited.
- For example, the data to be processed may be feature amount data extracted from biometric information.
- Such feature amount data is multidimensional data and may be difficult to process as it is.
- By applying the dimension reduction described above, the determination using the feature amount data can be made more appropriate.
- the following fourth embodiment shows a specific example of an apparatus to which the determination result by feature extraction using the projection matrix W calculated by the information processing apparatus 1 of the first to third embodiments can be applied.
- Ear acoustic collation is a technique for distinguishing between persons by collating the acoustic characteristics of the head, including the ear canal, of a person. Since the acoustic characteristics of the ear canal differ from person to person, they are suitable as biometric information used for personal verification. Therefore, ear acoustic collation may be used for user determination of a hearable device such as an earphone. Note that ear acoustic collation may be used not only for distinguishing between persons but also for determining the wearing state of the hearable device.
- FIG. 7 is a schematic diagram showing the overall configuration of the information processing system according to the present embodiment.
- the information processing system includes an information processing device 1 and an earphone 2 that can be wirelessly connected to each other.
- the earphone 2 includes an earphone control device 20, a speaker 26, and a microphone 27.
- the earphone 2 is an audio device that can be worn on the head of the user 3, particularly the ear, and is typically a wireless earphone, a wireless headset, or the like.
- the speaker 26 functions as a sound wave generating unit that emits a sound wave toward the ear canal of the user 3 when worn, and is arranged on the mounting surface side of the earphone 2.
- the microphone 27 is arranged on the mounting surface side of the earphone 2 so that the microphone 27 can receive the sound wave echoed by the ear canal of the user 3 at the time of wearing.
- the earphone control device 20 controls the speaker 26 and the microphone 27 and communicates with the information processing device 1.
- Note that sound such as sound waves and voices includes inaudible sound whose frequency or sound pressure level is outside the audible range.
- the information processing device 1 is the same device as described in the first to third embodiments.
- the information processing device 1 is, for example, a computer communicably connected to the earphone 2 and performs biological collation based on acoustic information.
- the information processing device 1 further controls the operation of the earphone 2, transmits voice data for generating a sound wave emitted from the earphone 2, receives voice data obtained from the sound wave received by the earphone 2, and the like.
- the information processing apparatus 1 transmits the compressed data of the music to the earphone 2.
- the information processing device 1 transmits voice data of business instructions to the earphone 2.
- the voice data of the utterance of the user 3 may be further transmitted from the earphone 2 to the information processing device 1.
- the information processing device 1 and the earphone 2 may be connected by wire. Further, the information processing device 1 and the earphone 2 may be configured as an integrated device, or another device may be included in the information processing system.
- FIG. 8 is a block diagram showing a hardware configuration example of the earphone control device 20.
- the earphone control device 20 includes a processor 201, a memory 202, a speaker I / F 203, a microphone I / F 204, a communication I / F 205, and a battery 206. Each part of the earphone control device 20 is connected to each other via a bus, wiring, a driving device, etc. (not shown).
- the description of the processor 201, the memory 202, and the communication I / F 205 will be omitted because they overlap with the first embodiment.
- the speaker I / F 203 is an interface for driving the speaker 26.
- the speaker I / F 203 includes a digital-to-analog conversion circuit, an amplifier, and the like.
- the speaker I / F 203 converts voice data into an analog signal and supplies it to the speaker 26. As a result, the speaker 26 emits a sound wave based on the voice data.
- the microphone I / F204 is an interface for acquiring a signal from the microphone 27.
- the microphone I / F 204 includes an analog-to-digital conversion circuit, an amplifier, and the like.
- the microphone I / F 204 converts an analog signal generated by a sound wave received by the microphone 27 into a digital signal. As a result, the earphone control device 20 acquires voice data based on the received sound wave.
- the battery 206 is, for example, a secondary battery and supplies the power required for the operation of the earphone 2.
- the earphone 2 can operate wirelessly without being connected to an external power source by wire.
- The battery 206 may not be provided.
- the hardware configuration shown in FIG. 8 is an example, and devices other than these may be added, and some devices may not be provided. Further, some devices may be replaced with other devices having similar functions.
- the earphone 2 may further include an input device such as a button so that the operation by the user 3 can be received, and further includes a display device such as a display and an indicator lamp for providing information to the user 3. You may.
- the hardware configuration shown in FIG. 8 can be appropriately changed.
- FIG. 9 is a functional block diagram of the earphone 2 and the information processing device 1 according to the present embodiment.
- the information processing apparatus 1 includes an acoustic characteristic acquisition unit 151, a second feature extraction unit 131, a feature selection unit 132, a determination unit 133, an output unit 134, a target data storage unit 143, and a projection matrix storage unit 142. Since the structure of the block diagram of the earphone 2 is the same as that of FIG. 7, the description thereof will be omitted.
- the functions of the functional blocks of the information processing apparatus 1 other than the acoustic characteristic acquisition unit 151 are the same as those described in the first embodiment. It is assumed that the projection matrix W that has been trained in advance is stored in the projection matrix storage unit 142, and the functional block for training is not shown in FIG. The specific contents of the processing performed by each functional block will be described later.
- Each of the above-mentioned functions may be realized by the information processing device 1, by the earphone control device 20, or by the information processing device 1 and the earphone control device 20 in cooperation with each other.
- each functional block related to acquisition and determination of acoustic information is assumed to be provided in the information processing apparatus 1.
- FIG. 10 is a flowchart showing an outline of the biological collation process performed by the information processing apparatus 1 according to the present embodiment. The operation of the information processing apparatus 1 will be described with reference to FIG.
- the biological collation process of FIG. 10 is executed, for example, when the user 3 starts using the earphone 2 by operating the earphone 2.
- the biological collation process of FIG. 10 may be executed every time a predetermined time elapses when the power of the earphone 2 is on.
- In step S26, the acoustic characteristic acquisition unit 151 instructs the earphone control device 20 to emit an inspection sound.
- The earphone control device 20 transmits an inspection signal to the speaker 26, and the speaker 26 emits an inspection sound generated based on the inspection signal toward the ear canal of the user 3.
- As the inspection signal, a signal containing frequency components in a predetermined range, such as a chirp signal, an M-sequence (Maximum Length Sequence) signal, white noise, or an impulse signal, can be used.
- the inspection sound may be an audible sound whose frequency and sound pressure level are within the audible range. In this case, by making the user 3 perceive the sound wave at the time of collation, it is possible to inform the user 3 that the collation is being performed. Further, the inspection sound may be an inaudible sound whose frequency or sound pressure level is out of the audible range. In this case, the sound wave can be less likely to be perceived by the user 3, and the comfort at the time of use is improved.
- In step S27, the microphone 27 receives the echo sound (ear acoustic sound) from the ear canal and the like and converts it into an electric signal in the time domain. This electric signal is sometimes called an acoustic signal.
- the microphone 27 transmits an acoustic signal to the earphone control device 20, and the earphone control device 20 transmits an acoustic signal to the information processing device 1.
- In step S28, the acoustic characteristic acquisition unit 151 acquires acoustic characteristics in the frequency domain based on the sound wave propagating through the user's head.
- This acoustic characteristic can be, for example, a frequency spectrum obtained by converting an acoustic signal in the time domain into a frequency domain using an algorithm such as a fast Fourier transform.
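- A minimal sketch of such a conversion (windowing and averaging over repeated inspection sounds, which a real implementation would likely add, are omitted):

```python
import numpy as np

def acoustic_characteristic(signal, sample_rate):
    """Step S28 sketch: convert a time-domain echo signal into a frequency-domain magnitude spectrum."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal))                      # magnitude of the FFT
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)   # corresponding frequencies in Hz
    return freqs, spectrum
```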
- In step S29, the target data storage unit 143 stores the acquired acoustic characteristics as the target data for feature amount extraction.
- Since steps S21 to S25 are the same as those in FIG. 4, duplicated explanations will be omitted.
- However, the processing of each step can be embodied as follows, although it is not limited to this.
- The process of extracting feature amount data from the target data in step S22 may be, for example, a process of extracting a logarithmic spectrum, mel-cepstrum coefficients, linear prediction analysis coefficients, or the like from the acoustic characteristics.
- the feature selection process in step S23 may be a process of reducing the dimension by applying a projection matrix to the multidimensional vector which is the feature amount data extracted in step S22.
- The determination process in step S24 may be a process of determining whether or not the feature amount data of the user 3 matches any of the feature amount data of one or more registrants registered in advance, for example as in the sketch below.
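- A minimal sketch of such a matching decision; cosine similarity and a fixed threshold are illustrative choices, as the disclosure does not specify the matching score.

```python
import numpy as np

def verify_user(probe_feature, registered_features, threshold):
    """Step S24 sketch: decide whether the projected probe feature matches any registered template."""
    p = probe_feature / np.linalg.norm(probe_feature)
    best = max(float(p @ (t / np.linalg.norm(t))) for t in registered_features)
    return best >= threshold, best   # (match decision, best similarity score)
```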
- the determination result output in step S25 is used, for example, for controlling permission or disapproval of use of the earphone 2.
- Although an example of ear acoustic collation has been described in this embodiment, the present disclosure can be similarly applied to biometric collation using other biometric information. Examples of applicable biometric information include the face, iris, fingerprint, palm print, vein, voice, pinna, and gait.
- As described above, according to the present embodiment, by using the projection matrix obtained by the configurations of the first to third embodiments, an information processing device 1 capable of appropriately reducing the dimension of the feature amount data extracted from biometric information is provided.
- FIG. 11 is a functional block diagram of the information processing apparatus 4 according to the fifth embodiment.
- the information processing device 4 includes an acquisition unit 401 and a calculation unit 402.
- the acquisition means 401 acquires a plurality of data, each of which is classified into one of a plurality of classes.
- the calculation means 402 calculates a projection matrix used for dimensionality reduction of a plurality of data based on an objective function including statistics of the plurality of data.
- The objective function includes a first function including a first term indicating interclass variation of the plurality of data between a first class and a second class among the plurality of classes, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- According to the present embodiment, the information processing apparatus 4 that realizes dimensionality reduction with better class separation is provided.
- FIG. 11 is a functional block diagram of the information processing apparatus 4 according to the sixth embodiment.
- the information processing device 4 includes an acquisition unit 401 and a calculation unit 402.
- the acquisition means 401 acquires a plurality of data, each of which is classified into one of a plurality of classes.
- the calculation means 402 calculates a projection matrix used for dimensionality reduction of a plurality of data based on an objective function including statistics of the plurality of data.
- The objective function includes the ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating the interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating the intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- According to the present embodiment, the information processing apparatus 4 that realizes dimensionality reduction with better class separation is provided.
- In the above embodiments, the variance is used as an example of an index of intraclass variation or interclass variation, but a statistic other than the variance may be used as long as it can serve as an index of variation.
- A processing method in which a program for operating the configuration of the embodiment so as to realize the functions of the above-described embodiments is recorded in a storage medium, the program recorded in the storage medium is read out as code, and the program is executed in a computer is also included in the scope of each embodiment. That is, a computer-readable storage medium is also included in the scope of each embodiment. Further, not only the storage medium in which the above-mentioned program is recorded but also the program itself is included in each embodiment. Further, one or more components included in the above-described embodiments may be a circuit, such as an ASIC or an FPGA, configured to realize the function of each component.
- the storage medium for example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD (Compact Disk) -ROM, a magnetic tape, a non-volatile memory card, or a ROM can be used.
- The program recorded on the storage medium is not limited to one that executes the processing by itself; one that operates on an OS (Operating System) and executes the processing in cooperation with other software or with the function of an expansion board is also included in the scope of each embodiment.
- The functions of the above-described embodiments may also be provided in the form of SaaS (Software as a Service).
- An information processing apparatus wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between the first class and the second class, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- the objective function comprises a minimum or maximum value of the ratio of the first function to the second function across the plurality of classes.
- the information processing apparatus according to Appendix 1.
- the second function includes a weighted average of the intraclass variation of the plurality of data in the first class and the intraclass variation of the plurality of data in the second class.
- the information processing apparatus according to Appendix 1 or 2.
- the first function further includes a third term that indicates the average of the interclass variation of the plurality of data across the plurality of classes.
- the second function further comprises a fourth term that indicates the average intraclass variation of the plurality of data across the plurality of classes.
- An information processing apparatus wherein the objective function includes the ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating the interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating the intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- the calculation means determines the projection matrix by performing optimization that maximizes or minimizes the objective function under predetermined constraints.
- the information processing apparatus according to any one of Supplementary note 1 to 5.
- the data is feature amount data extracted from biological information.
- the information processing apparatus according to any one of Supplementary note 1 to 6.
- An information processing method wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between the first class and the second class, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- An information processing method wherein the objective function includes the ratio of the minimum value, over the plurality of classes, of a first function including a first term indicating the interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to the maximum value, over the plurality of classes, of a second function including a second term indicating the intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
- A storage medium storing a program for causing a computer to execute an information processing method wherein the objective function includes a first function including a first term indicating interclass variation of the plurality of data between the first class and the second class, and a second function including a second term indicating intraclass variation of the plurality of data in at least one of the first class and the second class.
- A storage medium storing a program for executing an information processing method in which the objective function includes a ratio of a minimum value, over the plurality of classes, of a first function including a first term indicating interclass variation of the plurality of data and a third term indicating the average of the interclass variation of the plurality of data over the plurality of classes, to a maximum value, over the plurality of classes, of a second function including a second term indicating intraclass variation of the plurality of data and a fourth term indicating the average of the intraclass variation of the plurality of data over the plurality of classes.
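The supplementary notes above define the objective function only in words. The NumPy sketch below gives one plausible numerical reading of that criterion: a pairwise between-class scatter term (optionally combined with its average over all classes, the "third term") divided by a weighted within-class scatter term (optionally combined with its average, the "fourth term"), evaluated either as the minimum of the per-pair ratios or as the minimum numerator over the maximum denominator. The trace-based scalarization, the equal pair weighting parameter `alpha`, and the function names are illustrative assumptions, not the implementation prescribed by the application.

```python
import numpy as np

def class_statistics(X, y):
    """Per-class means and covariances of feature data X, plus the global mean.

    X: (n_samples, n_features) array; y: (n_samples,) integer class labels.
    """
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    covs = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    return classes, means, covs, X.mean(axis=0)

def pairwise_objectives(W, X, y, alpha=0.5, use_average_terms=True):
    """Evaluate two readings of the class-pairwise criterion for a projection W.

    For every class pair (i, j):
      numerator   = between-class scatter of i vs j            (first term)
      denominator = alpha-weighted within-class scatter of i, j (second term)
    If use_average_terms is True, the average between-class scatter over all
    classes (third term) and the average within-class scatter (fourth term)
    are added to numerator and denominator, respectively.
    Returns (minimum of per-pair ratios, minimum numerator / maximum denominator).
    """
    classes, means, covs, mu = class_statistics(X, y)

    Sb_avg = sum(np.outer(means[c] - mu, means[c] - mu) for c in classes) / len(classes)
    Sw_avg = sum(covs[c] for c in classes) / len(classes)

    numerators, denominators = [], []
    for idx, i in enumerate(classes):
        for j in classes[idx + 1:]:
            d = means[i] - means[j]
            Sb_ij = np.outer(d, d)                             # pairwise between-class scatter
            Sw_ij = alpha * covs[i] + (1.0 - alpha) * covs[j]  # weighted within-class scatter
            num = np.trace(W.T @ Sb_ij @ W)
            den = np.trace(W.T @ Sw_ij @ W)
            if use_average_terms:
                num += np.trace(W.T @ Sb_avg @ W)
                den += np.trace(W.T @ Sw_avg @ W)
            numerators.append(num)
            denominators.append(den)

    min_ratio = min(n / d for n, d in zip(numerators, denominators))
    min_over_max = min(numerators) / max(denominators)
    return min_ratio, min_over_max

if __name__ == "__main__":
    # Hypothetical data: 3 classes of 10-D features, projected to 2-D by a
    # random orthonormal W (a stand-in for the optimized projection matrix).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, size=(50, 10)) for c in range(3)])
    y = np.repeat([0, 1, 2], 50)
    W = np.linalg.qr(rng.normal(size=(10, 2)))[0]
    print(pairwise_objectives(W, X, y))
```

The optimization step described above (maximizing or minimizing the objective under predetermined constraints) could then be approximated, for instance, by projected gradient ascent with re-orthonormalization of W, or by comparing candidate projection matrices and keeping the best-scoring one; that optimizer and the orthonormality constraint are again assumptions rather than the application's prescribed procedure.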
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to an information processing device comprising: an acquisition means that acquires a plurality of data items, each of which has been classified into one class among a plurality of classes; and a calculation means that calculates, on the basis of an objective function including a statistic for the plurality of data items, a projection matrix used for dimensionality reduction of the plurality of data items. The objective function includes: a first function including a first term indicating the interclass variation of the plurality of data items between a first class and a second class of the plurality of classes; and a second function including a second term indicating the intraclass variation of the plurality of data items in the first class and/or the second class.
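As a minimal illustration of what the "projection matrix for dimensionality reduction" in the abstract does, the sketch below projects feature vectors (for example, feature data extracted from biological information) onto a lower-dimensional subspace. The random orthonormal `W` is only a stand-in; in the described device it would be the matrix obtained by optimizing the objective function, as sketched earlier.

```python
import numpy as np

def reduce_dimension(X, W):
    """Map d-dimensional feature vectors to k dimensions (k < d) using the
    columns of the projection matrix W of shape (d, k)."""
    return X @ W

# Hypothetical example: 128-D feature vectors reduced to 16 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 128))                  # placeholder feature data
W = np.linalg.qr(rng.normal(size=(128, 16)))[0]  # stand-in projection matrix
print(reduce_dimension(X, W).shape)              # (200, 16)
```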
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/026973 WO2022009408A1 (fr) | 2020-07-10 | 2020-07-10 | Dispositif de traitement d'informations, procédé de traitement d'informations, et support d'enregistrement |
US18/014,676 US20230259580A1 (en) | 2020-07-10 | 2020-07-10 | Information processing apparatus, information processing method, and storage medium |
JP2022534611A JP7533584B2 (ja) | 2020-07-10 | 2020-07-10 | 情報処理装置、情報処理方法及び記憶媒体 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/026973 WO2022009408A1 (fr) | 2020-07-10 | 2020-07-10 | Dispositif de traitement d'informations, procédé de traitement d'informations, et support d'enregistrement |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022009408A1 true WO2022009408A1 (fr) | 2022-01-13 |
Family
ID=79552369
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/026973 WO2022009408A1 (fr) | 2020-07-10 | 2020-07-10 | Dispositif de traitement d'informations, procédé de traitement d'informations, et support d'enregistrement |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230259580A1 (fr) |
JP (1) | JP7533584B2 (fr) |
WO (1) | WO2022009408A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003177785A (ja) * | 2001-12-10 | 2003-06-27 | Nec Corp | 線形変換行列計算装置及び音声認識装置 |
- 2020
- 2020-07-10 JP JP2022534611A patent/JP7533584B2/ja active Active
- 2020-07-10 US US18/014,676 patent/US20230259580A1/en active Pending
- 2020-07-10 WO PCT/JP2020/026973 patent/WO2022009408A1/fr active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003177785A (ja) * | 2001-12-10 | 2003-06-27 | Nec Corp | 線形変換行列計算装置及び音声認識装置 |
Non-Patent Citations (1)
Title |
---|
SU BING; DING XIAOQING; CHANGSONG LIU; YING WU: "Heteroscedastic max-min distance analysis", 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 7 June 2015 (2015-06-07), pages 4539 - 4547, XP032793910, DOI: 10.1109/CVPR.2015.7299084 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022009408A1 (fr) | 2022-01-13 |
US20230259580A1 (en) | 2023-08-17 |
JP7533584B2 (ja) | 2024-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11705105B2 (en) | Speech synthesizer for evaluating quality of synthesized speech using artificial intelligence and method of operating the same | |
CN109564759A (zh) | 说话人识别 | |
Mansour et al. | Voice recognition using dynamic time warping and mel-frequency cepstral coefficients algorithms | |
US10311865B2 (en) | System and method for automated speech recognition | |
Wu et al. | Robust multifactor speech feature extraction based on Gabor analysis | |
Gunawan et al. | Development of quranic reciter identification system using MFCC and GMM classifier | |
Ramanarayanan et al. | Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories | |
Chin et al. | Speaker identification using discriminative features and sparse representation | |
JP7519021B2 (ja) | 情報処理装置、情報処理方法及び記憶媒体 | |
JP2015175859A (ja) | パターン認識装置、パターン認識方法及びパターン認識プログラム | |
WO2022009408A1 (fr) | Dispositif de traitement d'informations, procédé de traitement d'informations, et support d'enregistrement | |
Schafer et al. | Noise-robust speech recognition through auditory feature detection and spike sequence decoding | |
JP4946330B2 (ja) | 信号分離装置及び方法 | |
Al-Talabani | Automatic speech emotion recognition-feature space dimensionality and classification challenges | |
Mostafa et al. | Voiceless Bangla vowel recognition using sEMG signal | |
CN116964669A (zh) | 用于产生音频信号的系统和方法 | |
Folorunso et al. | Laughter signature, a new approach to gender recognition | |
US11017782B2 (en) | Speaker classification | |
Can et al. | A Review of Recent Machine Learning Approaches for Voice Authentication Systems | |
KR102709425B1 (ko) | 발화 상상 시 뇌파 기반 음성 합성 방법 및 장치 | |
WO2024157726A1 (fr) | Dispositif de traitement d'informations, procédé de traitement d'informations et support de stockage | |
US20240161747A1 (en) | Electronic device including text to speech model and method for controlling the same | |
Trabelsi et al. | Dynamic sequence-based learning approaches on emotion recognition systems | |
Bhargava | Vocal source separation using spectrograms and spikes, applied to speech and birdsong | |
US20210397649A1 (en) | Recognition apparatus, recognition method, and computer-readable recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20944423 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2022534611 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20944423 Country of ref document: EP Kind code of ref document: A1 |