WO2022157973A1 - Information processing system, information processing method, and computer program - Google Patents
Information processing system, information processing method, and computer program
- Publication number
- WO2022157973A1 (PCT/JP2021/002439)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- likelihood ratio
- likelihood
- information processing
- learning
- processing system
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Definitions
- This disclosure relates to the technical field of information processing systems, information processing methods, and computer programs that process information related to classification, for example.
- Patent Document 1 discloses a technique for classifying series data into one of a plurality of predetermined classes by sequentially acquiring and analyzing multiple elements included in the series data.
- Patent Document 2 discloses classifying the movement trajectories included in the image subset into subclasses, assigning the same subclass label to trajectories with a high subclass sharing ratio, and classifying each subclass.
- Patent Document 4 discloses optimizing the parameters of an identification device by updating the parameters so that the loss function including the log-likelihood ratio becomes small.
- This disclosure aims to improve the related technology described above.
- One aspect of the information processing system of this disclosure comprises: an acquisition unit that acquires a plurality of elements included in series data; a calculation unit that calculates a likelihood ratio indicating the likelihood of the class to which the series data belongs, based on at least two consecutive elements among the plurality of elements; a classification unit that classifies the series data into at least one class among a plurality of classes that are classification candidates, based on the likelihood ratio; and a learning unit that performs learning regarding the calculation of the likelihood ratio using a log-sum-exp type loss function.
- One aspect of the information processing method of this disclosure acquires a plurality of elements included in series data, calculates a likelihood ratio indicating the likelihood of the class to which the series data belongs based on at least two consecutive elements among the plurality of elements, classifies the series data into at least one class among a plurality of classes that are classification candidates based on the likelihood ratio, and performs learning regarding the calculation of the likelihood ratio using a log-sum-exp type loss function.
- One aspect of the computer program of the present disclosure operates a computer so as to acquire a plurality of elements included in series data, calculate a likelihood ratio indicating the likelihood of the class to which the series data belongs based on at least two consecutive elements among the plurality of elements, classify the series data into at least one class among a plurality of classes that are classification candidates based on the likelihood ratio, and perform learning regarding the calculation of the likelihood ratio using a log-sum-exp type loss function.
- FIG. 1 is a block diagram showing the hardware configuration of the information processing system according to the first embodiment
- FIG. 2 is a block diagram showing the functional configuration of the information processing system according to the first embodiment
- FIG. 3 is a flow chart showing the flow of operation of the classification device in the information processing system according to the first embodiment
- FIG. 4 is a flow chart showing the flow of operation of the learning unit in the information processing system according to the first embodiment
- FIG. 5 is a flow chart showing the flow of operation of the learning unit in the information processing system according to the second embodiment
- FIG. 6 is a matrix diagram showing an example of likelihood ratios considered by the learning unit in the information processing system according to the second embodiment
- FIG. 7 is a flow chart showing the flow of operation of the learning unit in the information processing system according to the third embodiment
- FIG. 8 is a flow chart showing the flow of operation of the learning unit in the information processing system according to the fourth embodiment
- FIG. 9 is a matrix diagram showing an example of likelihood ratios considered by the learning unit in the information processing system according to the fourth embodiment
- FIG. 22 is a block diagram showing a functional configuration of an information processing system according to a seventh embodiment
- FIG. 16 is a flow chart showing the flow of operation of a classification device in an information processing system according to the seventh embodiment
- FIG. 22 is a block diagram showing a functional configuration of an information processing system according to an eighth embodiment
- FIG. 21 is a flow chart showing the flow of operations of a likelihood ratio calculation unit in an information processing system according to the eighth embodiment
- FIG. 21 is a flow chart showing the flow of operation of a classification device in an information processing system according to a ninth embodiment
- An information processing system according to the first embodiment will be described with reference to FIGS. 1 to 4.
- FIG. 1 is a block diagram showing the hardware configuration of an information processing system according to the first embodiment.
- the information processing system 1 includes a processor 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, and a storage device 14.
- the information processing system 1 may further include an input device 15 and an output device 16 .
- Processor 11 , RAM 12 , ROM 13 , storage device 14 , input device 15 and output device 16 are connected via data bus 17 .
- the processor 11 reads a computer program.
- processor 11 is configured to read a computer program stored in at least one of the RAM 12, the ROM 13, and the storage device 14.
- the processor 11 may read a computer program stored in a computer-readable recording medium using a recording medium reader (not shown).
- the processor 11 may acquire (that is, read) a computer program from a device (not shown) arranged outside the information processing system 1 via a network interface.
- the processor 11 controls the RAM 12, the storage device 14, the input device 15 and the output device 16 by executing the read computer program.
- the processor 11 implements functional blocks for performing classification using likelihood ratios and related learning processing.
- Examples of the processor 11 include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor), and an ASIC (Application Specific Integrated Circuit).
- the processor 11 may use one of the examples described above, or may use a plurality of them in parallel.
- the RAM 12 temporarily stores computer programs executed by the processor 11.
- the RAM 12 temporarily stores data temporarily used by the processor 11 while the processor 11 is executing the computer program.
- the RAM 12 may be, for example, a D-RAM (Dynamic RAM).
- the ROM 13 stores computer programs executed by the processor 11 .
- the ROM 13 may also store other fixed data.
- the ROM 13 may be, for example, a P-ROM (Programmable ROM).
- the storage device 14 stores data that the information processing system 1 saves for a long period of time.
- Storage device 14 may act as a temporary storage device for processor 11 .
- the storage device 14 may include, for example, at least one of a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device.
- the input device 15 is a device that receives input instructions from the user of the information processing system 1 .
- Input device 15 may include, for example, at least one of a keyboard, mouse, and touch panel.
- the input device 15 may be a dedicated controller (operation terminal).
- the input device 15 may include a terminal owned by the user (for example, a smart phone, a tablet terminal, or the like).
- the input device 15 may be a device capable of voice input including, for example, a microphone.
- the output device 16 is a device that outputs information about the information processing system 1 to the outside.
- the output device 16 may be a display device (for example, display) capable of displaying information about the information processing system 1 .
- the display device here may be a television monitor, a personal computer monitor, a smart phone monitor, a tablet terminal monitor, or a monitor of other mobile terminals.
- the display device may be a large monitor, digital signage, or the like installed in various facilities such as stores.
- the output device 16 may be a device that outputs information in a format other than an image.
- the output device 16 may be a speaker that outputs information about the information processing system 1 by voice.
- FIG. 2 is a block diagram showing the functional configuration of the information processing system according to the first embodiment.
- the information processing system 1 includes a classification device 10 and a learning section 300 .
- the classification device 10 is a device that classifies input series data into classes, and is configured with a data acquisition unit 50, a likelihood ratio calculation unit 100, and a class classification unit 200 as processing blocks for realizing this function.
- the learning unit 300 is configured to be able to execute learning processing regarding the classification device 10 .
- the classification device 10 may be configured to include the learning unit 300 .
- Each of the data acquisition unit 50, the likelihood ratio calculation unit 100, the class classification unit 200, and the learning unit 300 may be implemented by the above-described processor 11 (see FIG. 1).
- the data acquisition unit 50 is configured to be able to acquire a plurality of elements included in series data.
- the data acquisition unit 50 may directly acquire data from an arbitrary data acquisition device (for example, a camera, a microphone, etc.), or may read data that has been acquired in advance by a data acquisition device and stored in storage or the like.
- the data acquisition unit 50 may be configured to acquire data from each of the plurality of cameras.
- the elements of the series data acquired by the data acquisition unit 50 are configured to be output to the likelihood ratio calculation unit 100 .
- series data is data that includes a plurality of elements arranged in a predetermined order, and an example thereof is time series data. More specific examples of series data include moving image data and audio data, but are not limited to these.
- the likelihood ratio calculator 100 is configured to be able to calculate the likelihood ratio based on at least two consecutive elements among the plurality of elements acquired by the data acquisition unit 50 .
- the “likelihood ratio” here is an index indicating the likelihood of the class to which the series data belongs.
- a specific example of the likelihood ratio and a specific calculation method will be described in detail in other embodiments described later.
- the class classification unit 200 is configured to be able to classify series data based on the likelihood ratios calculated by the likelihood ratio calculation unit 100 .
- the class classification unit 200 selects at least one class to which series data belongs from among a plurality of classes that are classification candidates.
- a plurality of classes that are classification candidates may be set in advance.
- the plurality of classes that are classification candidates may be set as appropriate by the user, or may be set as appropriate based on the type of series data to be handled.
- the learning unit 300 uses the loss function to learn about calculating the likelihood ratio. Specifically, learning regarding the calculation of the likelihood ratio is performed so that the class classification based on the likelihood ratio is performed accurately.
- the loss function used by the learning unit 300 according to the present embodiment is a log-sum-exp type loss function; more specifically, a function in which a logarithm is taken of a sum of exponentials.
- a loss function may be preset as a function that satisfies such a definition. A specific example of the loss function will be described in detail in another embodiment described later.
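The patent defers concrete formulas to later embodiments. Purely as an illustrative sketch (the exact loss used by the learning unit 300 is not given at this point), a log-sum-exp type loss over log-likelihood ratios could look like the following, where the inputs are hypothetical log-likelihood ratios with a wrong class in the numerator and the correct class in the denominator:

```python
import numpy as np

def log_sum_exp_loss(llrs):
    """Log-sum-exp type loss: log(1 + sum_k exp(llr_k)).

    Each llr_k is a log-likelihood ratio with a wrong class in the
    numerator and the correct class in the denominator, so the loss
    approaches 0 when the correct class dominates (all llr_k << 0)
    and grows roughly linearly in the largest llr_k otherwise.
    """
    m = float(np.max(llrs))
    # Numerically stable: log(sum exp x) = m + log(sum exp(x - m)),
    # then log(1 + s) via logaddexp(0, log s).
    log_s = m + np.log(np.sum(np.exp(llrs - m)))
    return float(np.logaddexp(0.0, log_s))

# Correct class far more likely (large negative LLRs): loss is close to 0.
print(log_sum_exp_loss(np.array([-10.0, -12.0])))
# A wrong class dominates with LLR = 5: loss is roughly 5.
print(log_sum_exp_loss(np.array([5.0, -1.0])))
```

Note that the logarithm encloses a sum of exponentials, which is what makes this a "log-sum-exp type" function in the sense used above.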
- FIG. 3 is a flow chart showing the operation flow of the classification device in the information processing system according to the first embodiment.
- the data acquisition unit 50 first acquires the elements included in the series data (step S11).
- the data acquisition unit 50 outputs the acquired elements of the series data to the likelihood ratio calculation unit 100 .
- the likelihood ratio calculator 100 calculates the likelihood ratio based on the two or more acquired elements (step S12).
- the class classification unit 200 performs class classification based on the calculated likelihood ratio (step S13).
- the class classification may determine one class to which the series data belongs, or may determine a plurality of classes to which the series data are highly likely to belong.
- the class classification unit 200 may output the result of class classification to a display or the like. Further, the class classification unit 200 may output the result of class classification by voice through a speaker or the like.
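As an illustration only — the patent does not specify a decision rule at this point — steps S11 to S13 can be sketched as a sequential procedure that accumulates per-class log-likelihoods and classifies once one class's log-likelihood ratio against every other class exceeds a hypothetical threshold (the toy model and threshold below are assumptions, not the patent's):

```python
import numpy as np

def classify_stream(elements, log_likelihood_fn, num_classes, threshold=2.0):
    """Sketch of steps S11-S13: acquire elements in order, update the
    likelihood ratios, and classify once some class is confident enough."""
    cumulative = np.zeros(num_classes)           # running log-likelihood per class
    for x in elements:                           # S11: acquire the next element
        cumulative += log_likelihood_fn(x)       # S12: update log-likelihoods
        llr = cumulative[:, None] - cumulative[None, :]  # pairwise log-likelihood ratios
        np.fill_diagonal(llr, np.inf)            # the diagonal (log 1 = 0) is ignored
        confident = np.where(llr.min(axis=1) > threshold)[0]
        if confident.size:                       # S13: classify when confident
            return int(confident[0])
    return int(np.argmax(cumulative))            # fall back to the best class so far

# Toy two-class model: class 1 emits symbol "1" more often than class 0.
def toy_log_likelihood(x):
    probs = np.array([[0.8, 0.2],    # p(x | y = 0) for symbols 0 and 1
                      [0.4, 0.6]])   # p(x | y = 1)
    return np.log(probs[:, x])

print(classify_stream([1, 1, 1, 1, 1], toy_log_likelihood, num_classes=2))  # → 1
```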
- FIG. 4 is a flow chart showing the operation flow of the learning unit in the information processing system according to the first embodiment.
- training data is first input to the learning unit 300 (step S101).
- the training data may be configured, for example, as a set of series data and information on the correct class to which the series data belongs (that is, correct data).
- the learning unit 300 adjusts the parameters (specifically, the parameters of the model for calculating the likelihood ratio) so that the calculated loss function becomes smaller (step S103). That is, the learning unit 300 optimizes the parameters of the model for calculating the likelihood ratio.
- For optimizing the parameters, existing techniques can be appropriately adopted. An example of an optimization technique is the error backpropagation method, but other techniques may be used.
- the learning unit 300 determines whether or not all learning has been completed (step S104).
- the learning unit 300 may determine whether or not all learning has been completed, depending on whether or not all training data has been input, for example.
- the learning section 300 may determine whether or not all learning has been completed based on whether or not a predetermined period of time has elapsed since the start of learning.
- the learning unit 300 may determine whether or not all learning has been completed by determining whether or not the processing from steps S101 to S103 described above has been looped a predetermined number of times.
- If it is determined that all learning has been completed (step S104: YES), the series of processes ends. On the other hand, if it is determined that not all learning has been completed (step S104: NO), the learning unit 300 starts the process from step S101 again. As a result, the learning process using the training data is repeated, and the parameters are adjusted to be closer to optimal.
- the learning unit 300 performs learning regarding calculation of likelihood ratios used for class classification. Especially in this embodiment, learning is performed using a log-sum-exp type loss function.
- Using a log-sum-exp type loss function for learning the likelihood ratio improves convergence in the stochastic gradient descent method. More specifically, larger gradients can be assigned to cases that are relatively difficult to classify using the likelihood ratio (for example, a hard class, a hard frame, or a hard example), so convergence is accelerated and learning becomes more efficient. For example, training a DNN (Deep Neural Network) takes a relatively long time, so improving convergence in this way enables extremely efficient learning.
- As existing techniques that can be used when learning difficult classifications, there is a method of weighting the loss function by multiplying it by an appropriate coefficient (so-called loss weighting), but this method requires empirical rules and tuning when deciding the coefficients. There is also a known method of learning by inputting difficult-to-classify data many times while allowing duplication (so-called oversampling), but it then takes many steps to see all of the data, which slows convergence. Alternatively, there is a known method of emphasizing difficult data by deleting data that is easy to classify (so-called undersampling), but some degradation in performance is then inevitable. In contrast, if learning is performed using the log-sum-exp type loss function described in this embodiment, efficient learning can be performed while avoiding these problems.
- An information processing system 1 according to the second embodiment will be described with reference to FIGS. 5 and 6.
- It should be noted that the second embodiment differs from the above-described first embodiment only in a part of its operation; other parts may be the same as in the first embodiment. Therefore, in the following, portions different from the first embodiment will be described in detail, and descriptions of other overlapping portions will be omitted as appropriate.
- FIG. 5 is a flow chart showing the operation flow of the information processing system according to the second embodiment.
- the same reference numerals are given to the same processes as those shown in FIG.
- training data is first input to the learning unit 300 (step S101).
- the learning unit 300 calculates a loss function using the input training data. Specifically, it calculates a loss function that considers likelihood ratios in which the likelihood of belonging to one class is the denominator and the likelihood of belonging to another class is the numerator.
- This loss function is a function that increases the likelihood ratio when the correct class to which the series data belongs is in the numerator of the likelihood ratio, and decreases the likelihood ratio when the correct class is in the denominator of the likelihood ratio.
- This loss function is also a log-sum-exp type loss function, as in the first embodiment. The likelihood ratio considered in the loss function will be described later in detail with specific examples.
- the learning unit 300 adjusts the parameters so that the calculated loss function becomes smaller (step S103). That is, the learning unit 300 optimizes the parameters of the model for calculating the likelihood ratio. After that, the learning unit 300 determines whether or not all learning has been completed (step S104). If it is determined that all learning has been completed (step S104: YES), a series of processing ends. On the other hand, if it is determined that all learning has not been completed (step S104: NO), the learning section 300 starts the process from step S101 again.
- FIG. 6 is a matrix diagram showing an example of likelihood ratios considered by a learning unit in the information processing system according to the second embodiment.
- the likelihood ratios are considered in matrix form, where p(X|y = 0) denotes the likelihood indicating the likelihood that the series data X is "class 0", and similarly for the other classes.
- In the first row of the matrix, the numerators of the logarithmic likelihood ratios (hereinafter simply referred to as "likelihood ratios") are all p(X|y = 0); in the second row, the numerators are all p(X|y = 1); and in the third row, the numerators are all p(X|y = 2).
- Likewise, in the first column of the matrix, the denominators of the likelihood ratios are all p(X|y = 0); in the second column, the denominators are all p(X|y = 1); and in the third column, the denominators are all p(X|y = 2).
- the likelihood ratios on the diagonal of the matrix have the same likelihood in the denominator and the numerator. Specifically, log{p(X|y = 0)/p(X|y = 0)} in the first row from the top and the first column from the left, log{p(X|y = 1)/p(X|y = 1)} in the second row from the top and the second column from the left, and log{p(X|y = 2)/p(X|y = 2)} in the third row from the top and the third column from the left each have the same denominator and numerator.
- the likelihood ratios located opposite each other across the diagonal have their denominators and numerators reversed. For example, in log{p(X|y = 0)/p(X|y = 1)} in the first row from the top and the second column from the left and log{p(X|y = 1)/p(X|y = 0)} in the second row from the top and the first column from the left, the denominator and numerator are reversed. Similarly, in log{p(X|y = 0)/p(X|y = 2)} in the first row from the top and the third column from the left and log{p(X|y = 2)/p(X|y = 0)} in the third row from the top and the first column from the left, the denominator and numerator are reversed.
- the likelihood ratios on the diagonal, in which the denominator and the numerator are the same, are all log 1, whose value is zero. For this reason, these likelihood ratios would contribute essentially meaningless values even if they were considered in the loss function. Therefore, the likelihood ratios on the diagonal, in which the denominator and the numerator are the same, are not considered in the loss function.
- the number of likelihood ratios remaining after excluding the likelihood ratios on the diagonal is N × (N − 1), where N is the number of classes.
- The learning unit 300 calculates the loss function considering the likelihood ratios of these N × (N − 1) patterns (i.e., the likelihood ratios in the matrix excluding those on the diagonal). A specific example of the loss function considering the likelihood ratios of N × (N − 1) patterns will be described in detail in another embodiment described later.
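With toy per-class likelihoods, the matrix of FIG. 6 and the N × (N − 1) count of off-diagonal ratios can be illustrated as follows (a sketch under assumed values, not the patent's implementation):

```python
import numpy as np

def llr_matrix(log_likelihoods):
    """Matrix of pairwise log-likelihood ratios log{p(X|y=i) / p(X|y=j)}.
    Row i holds the ratios with class i in the numerator; column j holds
    those with class j in the denominator; the diagonal is log 1 = 0."""
    ll = np.asarray(log_likelihoods)
    return ll[:, None] - ll[None, :]

ll = np.log(np.array([0.7, 0.2, 0.1]))    # toy p(X|y=k) for N = 3 classes
M = llr_matrix(ll)
off_diagonal = M[~np.eye(3, dtype=bool)]  # only these enter the loss
print(len(off_diagonal))                  # → 6, i.e. N * (N - 1)
```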
- As described above, in the second embodiment, the likelihood indicating the likelihood of belonging to one class is used as the denominator, the likelihood indicating the likelihood of belonging to another class is used as the numerator, and learning is performed using a loss function that considers the likelihood ratios of the N × (N − 1) patterns.
- the loss function used in the second embodiment is a log-sum-exp type loss function as in the first embodiment. Therefore, it is possible to improve the convergence in the stochastic gradient descent method, and as a result, it is possible to perform efficient learning.
- FIG. 7 is a flow chart showing the operation flow of the information processing system according to the third embodiment.
- the same reference numerals are assigned to the same processes as those shown in FIG.
- training data is first input to the learning unit 300 (step S101).
- the learning unit 300 calculates a loss function using the input training data.
- a loss function is calculated taking into consideration part of the N × (N − 1) patterns of likelihood ratios whose denominator is the likelihood indicating the likelihood of belonging to one class and whose numerator is the likelihood indicating the likelihood of belonging to another class (step S301). That is, the learning unit 300 according to the third embodiment does not consider all of the N × (N − 1) patterns of likelihood ratios described in the second embodiment, but only some of them.
- As in the second embodiment, this loss function is also a function that increases the likelihood ratio when the correct class to which the series data belongs is in the numerator of the likelihood ratio, and decreases the likelihood ratio when the correct class is in the denominator of the likelihood ratio.
- This loss function is also a log-sum-exp type loss function, as in the first embodiment.
- the learning unit 300 adjusts the parameters so that the calculated loss function becomes smaller (step S103). After that, the learning unit 300 determines whether or not all learning has been completed (step S104). If it is determined that all learning has been completed (step S104: YES), a series of processing ends. On the other hand, if it is determined that all learning has not been completed (step S104: NO), the learning section 300 starts the process from step S101 again.
- some of the likelihood ratios to be considered in the loss function may be selected in advance by the user or the like, or may be automatically selected by the learning unit 300.
- the learning unit 300 may select the likelihood ratios according to a preset rule.
- the learning unit 300 may determine whether to make a selection based on the calculated likelihood ratio value.
- An example of selecting some likelihood ratios to be considered in the loss function is, for example, selecting only the likelihood ratios of one row or one column in the matrix shown in FIG. 6.
- For example, as the likelihood ratios to be considered in the loss function, only the likelihood ratios in the first row of the matrix shown in FIG. 6 may be selected, only the likelihood ratios in the second row may be selected, or only the likelihood ratios in the third row may be selected.
- only the likelihood ratios in the first column of the matrix may be selected, only the likelihood ratios in the second column may be selected, or only the likelihood ratios in the third column may be selected.
- Alternatively, only the likelihood ratios of several rows or several columns in the matrix may be selected. Specifically, only the likelihood ratios in the first and second rows of the matrix may be selected, only the likelihood ratios in the second and third rows may be selected, or only the likelihood ratios in the third and first rows may be selected. Likewise, only the likelihood ratios in the first and second columns of the matrix may be selected, only the likelihood ratios in the second and third columns may be selected, or only the likelihood ratios in the third and first columns may be selected.
- likelihood ratios considered in the loss function may be randomly selected without regard to rows or columns.
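Whichever rule is used, selecting rows, columns, or a random subset amounts to boolean masking over the matrix while always excluding the diagonal; a sketch with a stand-in matrix (the values are placeholders, not real likelihood ratios):

```python
import numpy as np

N = 3
M = np.arange(N * N, dtype=float).reshape(N, N)  # stand-in for the LLR matrix
diagonal = np.eye(N, dtype=bool)

# Keep only the second row (as in "only the likelihood ratios in the
# second row may be selected"); the diagonal entry never enters the loss.
row_mask = np.zeros((N, N), dtype=bool)
row_mask[1, :] = True
selected = M[row_mask & ~diagonal]
print(selected)  # → [3. 5.]: 2 of the N * (N - 1) ratios remain

# A random subset, ignoring rows and columns, is just another mask.
rng = np.random.default_rng(0)
random_mask = (rng.random((N, N)) < 0.5) & ~diagonal
```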
- As described above, in the third embodiment, the likelihood indicating the likelihood of belonging to one class is used as the denominator, the likelihood indicating the likelihood of belonging to another class is used as the numerator, and learning is performed using a loss function that considers a portion of the likelihood ratios of the N × (N − 1) patterns.
- By selecting the likelihood ratios to be considered in the loss function from among the N × (N − 1) patterns, learning can be performed more efficiently than when all of the N × (N − 1) patterns of likelihood ratios are considered.
- learning efficiency can be improved by selecting only likelihood ratios that have a relatively large effect on the loss function and not selecting likelihood ratios that have a relatively small effect on the loss function.
- the loss function used in the third embodiment is also a log-sum-exp type loss function, as in each of the above-described embodiments. Therefore, the convergence in the stochastic gradient descent method can be improved, and as a result, more efficient learning can be performed.
- An information processing system 1 according to the fourth embodiment will be described with reference to FIGS. 8 and 9.
- The fourth embodiment describes a specific selection example for the above-described third embodiment (that is, an example of selecting some of the likelihood ratios to be considered in the loss function); other parts may be the same as in the third embodiment. Therefore, in the following, portions different from the already described embodiments will be described in detail, and descriptions of other overlapping portions will be omitted as appropriate.
- FIG. 8 is a flow chart showing the operation flow of the information processing system according to the fourth embodiment.
- the same reference numerals are assigned to the same processes as those shown in FIG.
- training data is first input to the learning unit 300 (step S101).
- the learning unit 300 calculates a loss function using the input training data. Specifically, the learning unit 300 calculates a loss function considering the likelihood ratios in which the correct class is in the numerator (step S401). That is, the learning unit 300 according to the fourth embodiment selects, as the partial set of the N × (N − 1) patterns described in the third embodiment, the likelihood ratios in which the correct class is in the numerator. As in the second and third embodiments, this loss function is also a function that increases the likelihood ratio when the correct class to which the series data belongs is in the numerator of the likelihood ratio, and decreases the likelihood ratio when the correct class is in the denominator. This loss function is also a log-sum-exp type loss function, as in the first embodiment. A specific example of the loss function considering the likelihood ratios in which the correct class is in the numerator will be described in detail in another embodiment described later.
- Next, the learning unit 300 adjusts the parameters so that the calculated loss function becomes smaller (step S103). After that, the learning unit 300 determines whether or not all learning has been completed (step S104). If it is determined that all learning has been completed (step S104: YES), the series of processes ends. On the other hand, if it is determined that not all learning has been completed (step S104: NO), the learning unit 300 starts the process again from step S101.
- FIG. 9 is a matrix diagram showing an example of likelihood ratios considered by a learning unit in the information processing system according to the fourth embodiment.
- In the fourth embodiment, the likelihood ratios are arranged like an alternating (antisymmetric) matrix, as already explained in the second embodiment (see FIG. 6). The learning unit 300 selects, from among the N×(N−1) patterns of likelihood ratios excluding those on the diagonal of such a matrix, the likelihood ratios in which the correct class is in the numerator, and considers them in the loss function.
- For example, when the correct class is class 1, the learning unit 300 selects, from among the N×(N−1) patterns of likelihood ratios, those in which class 1 is in the numerator and considers them in the loss function. Specifically, only the likelihood ratios in the second row from the top of FIG. 9 (excluding the likelihood ratio on the diagonal) are selected and considered in the loss function. In this case, log{p(X|y1)/p(X|y0)} in the second row from the top and the first column from the left, and log{p(X|y1)/p(X|y2)} in the second row from the top and the third column from the left, are considered in the loss function. That is, the likelihood ratios not shaded in gray in FIG. 9 are taken into account in the loss function.
- Alternatively, when the correct class is class 0, the learning unit 300 may select, from among the N×(N−1) patterns of likelihood ratios, those in which class 0 is in the numerator and consider them in the loss function. Specifically, only the likelihood ratios in the first row from the top of FIG. 9 (excluding the likelihood ratio on the diagonal) are selected and considered in the loss function. In this case, log{p(X|y0)/p(X|y1)} in the first row from the top and the second column from the left, and log{p(X|y0)/p(X|y2)} in the first row from the top and the third column from the left, are considered in the loss function.
- Similarly, when the correct class is class 2, the learning unit 300 may select, from among the N×(N−1) patterns of likelihood ratios, those in which class 2 is in the numerator and consider them in the loss function. Specifically, only the likelihood ratios in the third row from the top of FIG. 9 (excluding the likelihood ratio on the diagonal) are selected and considered in the loss function. In this case, log{p(X|y2)/p(X|y0)} in the third row from the top and the first column from the left, and log{p(X|y2)/p(X|y1)} in the third row from the top and the second column from the left, are considered in the loss function.
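As an illustrative sketch (not taken from the patent), the row selection described above can be written with NumPy; the log-likelihood values and the correct class index below are hypothetical.

```python
import numpy as np

# Log-likelihoods log p(X | y_k) for N = 3 classes (illustrative values).
log_likelihoods = np.array([-2.0, -0.5, -3.0])

# Full matrix of log-likelihood ratios: lam[k, l] = log{p(X|y_k) / p(X|y_l)}.
# This matrix is antisymmetric, and its diagonal is all zeros.
lam = log_likelihoods[:, None] - log_likelihoods[None, :]

correct_class = 1  # suppose the correct class of this series is class 1

# Select the likelihood ratios in which the correct class is in the numerator:
# the row of the correct class, excluding the diagonal entry.
mask = np.ones(len(log_likelihoods), dtype=bool)
mask[correct_class] = False
selected = lam[correct_class, mask]

print(selected)  # log{p(X|y1)/p(X|y0)} and log{p(X|y1)/p(X|y2)}
```

Only these selected entries would then enter the loss function, rather than all N×(N−1) off-diagonal entries.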
- As described above, the information processing system 1 according to the fourth embodiment performs learning using a loss function that considers, among the N×(N−1) patterns, the likelihood ratios in which the correct class is in the numerator. If such a loss function is used, it becomes possible, as in each of the above-described embodiments, to appropriately select the class to which the series data belong. Moreover, in the fourth embodiment, only the likelihood ratios in which the correct class is in the numerator (in other words, the likelihood ratios that may have a large impact on the loss function) are considered in the loss function, so learning can be performed more efficiently than when all N×(N−1) patterns of likelihood ratios are considered.
- the loss function used in the fourth embodiment is also a log-sum-exp type loss function, as in each of the above-described embodiments. Therefore, the convergence in the stochastic gradient descent method can be improved, and as a result, more efficient learning can be performed.
- The fifth embodiment describes a specific example of the loss function used in the above-described first to fourth embodiments; the device configuration and operation flow may be the same as in the first to fourth embodiments. Therefore, in the following, portions that differ from the already described embodiments will be described in detail, and descriptions of overlapping portions will be omitted as appropriate.
- Equation (1) is a loss function corresponding to the configuration that considers the likelihood ratio that the correct class is in the numerator, as described in the fourth embodiment.
- K is the number of classes
- M is the number of data
- T is the time series length.
- i is a subscript in the row direction
- l is a subscript in the column direction (that is, a subscript indicating the row number and column number in the matrix shown in FIG. 6, etc.).
- ⁇ is the likelihood ratio, and in Equation (1) above, represents the logarithmic likelihood ratio at the label yk row, l-th column at time t.
- The above Equation (1) has the form "log(Σexp(x))", with the sum inside the log, and the gradient is large for the term that dominates the sum inside the log. Therefore, for example, convergence in the stochastic gradient descent method is faster than with loss functions of the form "Σlog(1+exp(x))" or "Σlog(x)".
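This gradient property can be checked numerically. The following sketch (illustrative values, not from the patent) evaluates the partial derivatives of log(Σexp(x)), which are softmax weights, so the dominant summand receives almost all of the gradient:

```python
import math

# For f(x) = log(sum_i exp(x_i)), df/dx_j = exp(x_j) / sum_i exp(x_i),
# i.e. the softmax weight of x_j: the dominant term in the sum gets a
# gradient near 1, while the others get gradients near 0.
x = [3.0, 0.5, -1.0]  # illustrative summands
denom = sum(math.exp(v) for v in x)
grads = [math.exp(v) / denom for v in x]
print(grads)

# In contrast, for g(x) = sum_j log(1 + exp(x_j)), each term's gradient is
# the sigmoid 1 / (1 + exp(-x_j)), computed independently of the others.
sig = [1.0 / (1.0 + math.exp(-v)) for v in x]
print(sig)
```

Because the softmax weights sum to 1, updates concentrate on the dominant term, which is consistent with the faster convergence the text attributes to the log-sum-exp form.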
- When the loss function is transformed as in Equation (2) above, there are multiple variations depending on which sums are placed inside the log. For example, in Equation (2), only the sum over K is placed inside the log, but only the sum over M, or only the sum over T, may be placed inside the log instead. Alternatively, the two sums over M and T, the two sums over M and K, or the two sums over T and K may be placed inside the log.
- Which sums to place inside the log may be determined in consideration of the influence of each term. It suffices to set in advance which loss function (that is, with which sums inside the log) is to be used; alternatively, the system may be configured so that the user can select it as appropriate.
- the learning unit 300 uses loss functions such as the above formulas (1) and (2). Therefore, it is possible to improve the convergence in the stochastic gradient descent method, and as a result, it is possible to perform efficient learning.
- For a loss function that includes multiple sums, as in Equation (2), selecting at least one sum to be placed inside the log and leaving the remaining sums outside the log changes the convergence behavior. As a result, more efficient learning can be performed by appropriately setting which of the plurality of sums are placed inside the log.
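The variations above can be sketched as follows; the tensor of exp-arguments and its sizes are hypothetical, and `logsumexp` places the chosen sums inside the log while the remaining sums stay outside:

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

# x[i, t, k]: one exp-argument per data item i, time t, and class k
# (illustrative random values; M = 4 data items, T = 5 steps, K = 3 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 3))

# Variation A: only the sum over K inside the log (as in Equation (2));
# the sums over M and T remain outside.
loss_k_inside = logsumexp(x, axis=2).sum()

# Variation B: only the sum over T inside the log.
loss_t_inside = logsumexp(x, axis=1).sum()

# Variation C: the two sums over T and K inside the log.
loss_tk_inside = logsumexp(x.reshape(4, -1), axis=1).sum()

print(loss_k_inside, loss_t_inside, loss_tk_inside)
```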
- The sixth embodiment, like the fifth embodiment, describes a specific example of the loss function used in the above-described first to fourth embodiments; the device configuration and operation flow may be the same as in the first to fourth embodiments. Therefore, in the following, portions that differ from the already described embodiments will be described in detail, and descriptions of overlapping portions will be omitted as appropriate.
- Equation (3) above is a loss function corresponding to the configuration that considers all the likelihood ratios of the N ⁇ (N ⁇ 1) patterns described in the second embodiment.
- K is the number of classes
- M is the number of data
- T is the time series length.
- i is a subscript in the row direction
- l is a subscript in the column direction (that is, a subscript indicating the row number and column number in the matrix shown in FIG. 6, etc.).
- ⁇ is the Kronecker delta, which is "1" if the indices match, and "0" otherwise.
- ⁇ is the likelihood ratio, and in the above formula (3), represents the logarithmic likelihood ratio at the label yk row, l-th column at time t.
- the loss function may be weighted. For example, weighting the above equation (4) results in the following equation (5).
- w_it and w'_itkl in the above Equation (5) are weighting factors. These may be, for example, values determined by empirical rules or by tuning. Alternatively, weighting may be performed using only one of the weighting factors w_it and w'_itkl.
- the weighting in equation (5) described above is merely an example, and weighting may be performed by multiplying a term different from equation (5) by a weighting factor, for example.
- Although an example of weighting a loss function such as Equation (4) has been described here, other log-sum-exp type loss functions can be weighted in the same manner. For example, Equation (3) before the transformation may be weighted, or Equations (1) and (2) described in the fifth embodiment may be weighted.
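Since Equation (5) itself is not reproduced in this text, the following is only a hedged sketch of how per-sample and per-term weights might enter a log-sum-exp type loss; the placement of the weights and all values are assumptions, not the patent's definition.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

# lam[i, t, k]: exp-arguments per data item i, time t, class k (illustrative).
rng = np.random.default_rng(1)
lam = rng.normal(size=(4, 5, 3))

# w[i, t]: weight per data item and time step, e.g. reflecting how difficult
# that sample is to classify; w2[i, t, k]: weight per individual term.
# Both are hypothetical values determined by empirical rules or tuning.
w = np.full((4, 5), 1.0)
w2 = np.full((4, 5, 3), 1.0)

# Weighted log-sum-exp type loss: per-term weights scale the exponentials
# inside the log (as log(w2) added to lam), per-(i, t) weights scale the
# resulting log terms outside.
loss = np.sum(w * logsumexp(np.log(w2) + lam, axis=2)) / lam.shape[0]
print(loss)
```

With all weights equal to one, the expression reduces to the unweighted loss, which is a quick sanity check on the weight placement.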
- the learning unit 300 uses loss functions such as Equations (3), (4), and (5) above. Therefore, it is possible to improve the convergence in the stochastic gradient descent method, and as a result, it is possible to perform efficient learning. Further, more efficient learning can be performed by weighting as in Equation (5).
- An information processing system 1 according to the seventh embodiment will be described with reference to FIGS. 10 and 11.
- The seventh embodiment differs from the above-described first to sixth embodiments only in part of the configuration and operation (specifically, the configuration and operation of the classification device 10); the other portions may be the same as in the first to sixth embodiments. Therefore, in the following, portions that differ from the already described embodiments will be described in detail, and descriptions of overlapping portions will be omitted as appropriate.
- FIG. 10 is a block diagram showing the functional configuration of an information processing system according to the seventh embodiment.
- In FIG. 10, elements similar to the components shown in FIG. 2 are given the same reference numerals.
- the likelihood ratio calculator 100 in the classification device 10 includes a first calculator 110 and a second calculator 120 .
- each of the first calculation unit 110 and the second calculation unit 120 may be implemented by, for example, the above-described processor 11 (see FIG. 1).
- the first calculation unit 110 is configured to be able to calculate individual likelihood ratios based on two consecutive elements included in series data.
- the individual likelihood ratio is calculated as a likelihood ratio indicating the likelihood of a class to which two consecutive elements belong.
- the first calculation unit 110 may, for example, sequentially obtain elements included in the series data from the data obtaining unit 50 and sequentially calculate individual likelihood ratios based on two consecutive elements.
- the individual likelihood ratio calculated by the first calculator 110 is configured to be output to the second calculator 120 .
- the second calculator 120 is configured to be able to calculate the integrated likelihood ratio based on the plurality of individual likelihood ratios calculated by the first calculator 110 .
- the integrated likelihood ratio is calculated as a likelihood ratio indicating the likelihood of a class to which the plurality of elements considered in each of the plurality of individual likelihood ratios belong.
- the integrated likelihood ratio is calculated as a likelihood ratio indicating the likelihood of a class to which series data composed of multiple elements belong.
- the integrated likelihood ratio calculated by the second calculation unit 120 is configured to be output to the class classification unit 200 .
- the class classification unit 200 classifies the series data based on the integrated likelihood ratio.
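The two-stage computation above can be sketched as follows. This assumes, purely for illustration, that the individual log-likelihood ratios of consecutive element pairs may be summed to integrate them (a conditional-independence assumption not stated in the source); the two-class setup and the Gaussian probability models are hypothetical.

```python
import math

def individual_llr(prev, curr):
    # First stage: individual log-likelihood ratio for two consecutive
    # elements, log{p(prev, curr | class 1) / p(prev, curr | class 0)}.
    # Each class is modeled as a hypothetical 1-D Gaussian with unit variance.
    def log_gauss(x, mean):
        return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi)
    return (log_gauss(prev, 1.0) + log_gauss(curr, 1.0)) - (
        log_gauss(prev, 0.0) + log_gauss(curr, 0.0))

def integrated_llr(elements):
    # Second stage: integrate the individual likelihood ratios of all
    # consecutive pairs (illustrative summation rule).
    return sum(individual_llr(a, b) for a, b in zip(elements, elements[1:]))

series = [0.9, 1.1, 1.0, 0.8]  # series data leaning toward class 1
print(integrated_llr(series))  # positive => class 1 more likely
```

The class classification unit would then compare such an integrated value against thresholds, or across classes, to decide the class of the series.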
- the learning unit 300 may perform learning as the entire likelihood ratio calculation unit 100 (that is, the first calculation unit 110 and the second calculation unit 120 together).
- the learning may be performed separately for the first calculator 110 and the second calculator 120 .
- the learning unit 300 may be separately provided as a first learning unit that performs learning only on the first calculation unit 110 and a second learning unit that performs learning only on the second calculation unit 120 . In this case, only one of the first learning section and the second learning section may be provided.
- FIG. 11 is a flow chart showing the operation flow of the classification device in the information processing system according to the seventh embodiment.
- the data acquisition unit 50 first acquires the elements included in the series data (step S21).
- the data acquisition unit 50 outputs the acquired elements of the series data to the first calculation unit 110 .
- the first calculator 110 calculates an individual likelihood ratio based on the two consecutive elements that have been obtained (step S22).
- the second calculator 120 calculates an integrated likelihood ratio based on the plurality of individual likelihood ratios calculated by the first calculator 110 (step S23).
- the class classification unit 200 performs class classification based on the calculated integrated likelihood ratio (step S24).
- the class classification may determine one class to which the series data belongs, or may determine a plurality of classes to which the series data are highly likely to belong.
- the class classification unit 200 may output the result of class classification to a display or the like. Further, the class classification unit 200 may output the result of class classification by voice through a speaker or the like.
- In the information processing system 1 according to the seventh embodiment, individual likelihood ratios are first calculated based on two elements, and then an integrated likelihood ratio is calculated based on a plurality of individual likelihood ratios.
- Based on the integrated likelihood ratio calculated in this way, it is possible to appropriately select the class to which the series data belong.
- By training the classification device 10, which calculates the individual likelihood ratios and the integrated likelihood ratio, with the log-sum-exp type loss function described in each of the above embodiments, convergence in the stochastic gradient descent method can be improved. Therefore, efficient learning becomes possible.
- An information processing system 1 according to the eighth embodiment will be described with reference to FIGS. 12 and 13.
- FIG. 12 is a block diagram showing the functional configuration of an information processing system according to the eighth embodiment.
- In FIG. 12, elements similar to the components shown in FIG. 10 are given the same reference numerals.
- the likelihood ratio calculator 100 in the classification device 10 includes a first calculator 110 and a second calculator 120 .
- The first calculator 110 includes an individual likelihood ratio calculator 111 and a first storage unit 112.
- the second calculator 120 includes an integrated likelihood ratio calculator 121 and a second storage 122 .
- each of the individual likelihood ratio calculator 111 and the integrated likelihood ratio calculator 121 may be realized by, for example, the above-described processor 11 (see FIG. 1).
- each of the first storage unit 112 and the second storage unit 122 may be implemented by, for example, the above-described storage device 14 (see FIG. 1).
- The individual likelihood ratio calculation unit 111 is configured to be able to calculate an individual likelihood ratio based on two successive elements among the elements sequentially acquired by the data acquisition unit 50. More specifically, the individual likelihood ratio calculation unit 111 calculates the individual likelihood ratio based on the newly acquired element and the past data stored in the first storage unit 112. The information stored in the first storage unit 112 can be read by the individual likelihood ratio calculation unit 111. When the first storage unit 112 stores past individual likelihood ratios, the individual likelihood ratio calculation unit 111 may read the stored past individual likelihood ratios and calculate a new individual likelihood ratio that takes the newly acquired element into account. On the other hand, when the first storage unit 112 stores the elements themselves acquired in the past, the individual likelihood ratio calculation unit 111 may calculate the individual likelihood ratio from the stored past element and the newly acquired element.
- the integrated likelihood ratio calculation unit 121 is configured to be able to calculate an integrated likelihood ratio based on a plurality of individual likelihood ratios.
- Integrated likelihood ratio calculation section 121 uses the individual likelihood ratios calculated by individual likelihood ratio calculation section 111 and the past integrated likelihood ratios stored in second storage section 122 to calculate new integrated likelihood ratios. Calculate the degree ratio.
- Information stored in the second storage unit 122 (that is, past integrated likelihood ratios) is configured to be readable by the integrated likelihood ratio calculation unit 121 .
- FIG. 13 is a flow chart showing the flow of operations of a likelihood ratio calculator in the information processing system according to the eighth embodiment.
- First, the individual likelihood ratio calculation unit 111 in the first calculation unit 110 reads past data from the first storage unit 112 (step S31).
- The past data is, for example, the result of processing, by the individual likelihood ratio calculation unit 111, of the element acquired immediately before the element acquired this time by the data acquisition unit 50 (in other words, the individual likelihood ratio calculated for the immediately preceding element). Alternatively, the past data may be the immediately preceding element itself.
- Then, the individual likelihood ratio calculation unit 111 calculates a new individual likelihood ratio (that is, the individual likelihood ratio for the element acquired this time by the data acquisition unit 50) (step S32).
- Individual likelihood ratio calculation section 111 outputs the calculated individual likelihood ratio to second calculation section 120 .
- Individual likelihood ratio calculation section 111 may store the calculated individual likelihood ratio in first storage section 112 .
- the integrated likelihood ratio calculator 121 in the second calculator 120 reads the past integrated likelihood ratios from the second storage unit 122 (step S33).
- The past integrated likelihood ratio is, for example, the result of processing, by the integrated likelihood ratio calculation unit 121, of the element acquired immediately before the element acquired this time by the data acquisition unit 50 (in other words, the integrated likelihood ratio calculated for the immediately preceding element).
- Then, the integrated likelihood ratio calculation unit 121 calculates a new integrated likelihood ratio (that is, the integrated likelihood ratio for the element acquired this time by the data acquisition unit 50) based on the individual likelihood ratio calculated by the individual likelihood ratio calculation unit 111 and the past integrated likelihood ratio read from the second storage unit 122 (step S34).
- Integrated likelihood ratio calculation section 121 outputs the calculated integrated likelihood ratio to class classification section 200 .
- the integrated likelihood ratio calculator 121 may store the calculated integrated likelihood ratio in the second storage unit 122 .
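The flow of steps S31 to S34 can be sketched as a small stateful class. This is an illustrative sketch, not the patent's implementation; in particular, the integration rule (adding each new individual log-likelihood ratio to the stored running total) is an assumption.

```python
class LikelihoodRatioCalculator:
    """Sketch of the first/second calculators with their storage units."""

    def __init__(self, individual_llr):
        self.individual_llr = individual_llr  # function of (prev, curr)
        self.prev_element = None      # first storage: the past element
        self.integrated = 0.0         # second storage: past integrated LLR

    def update(self, element):
        # Steps S31/S32: read past data, then compute the individual
        # likelihood ratio for the newly acquired element.
        if self.prev_element is None:
            self.prev_element = element
            return self.integrated
        ind = self.individual_llr(self.prev_element, element)
        self.prev_element = element
        # Steps S33/S34: read the past integrated likelihood ratio and
        # combine it with the new individual likelihood ratio.
        self.integrated += ind
        return self.integrated

# Hypothetical individual LLR: (a - 0.5) + (b - 0.5) per consecutive pair.
calc = LikelihoodRatioCalculator(lambda a, b: (a - 0.5) + (b - 0.5))
for x in [0.9, 1.1, 1.0, 0.8]:
    llr = calc.update(x)
print(llr)
```

Storing only the previous element and the running integrated value keeps the per-step cost constant regardless of the series length, which is the point of the two storage units.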
- In the information processing system 1 according to the eighth embodiment, the integrated likelihood ratio is calculated using the past data stored in the first storage unit 112 and the second storage unit 122. Also in this case, by performing learning with the log-sum-exp type loss function described in each of the above embodiments, convergence in the stochastic gradient descent method can be improved. Therefore, efficient learning becomes possible.
- The ninth embodiment differs from the above-described first to eighth embodiments only in part of the operation (specifically, the operation of the class classification unit 200); the other portions may be the same as in the first to eighth embodiments. Therefore, in the following, portions that differ from the already described embodiments will be described in detail, and descriptions of overlapping portions will be omitted as appropriate.
- FIG. 14 is a flow chart showing the operation flow of the classification device in the information processing system according to the ninth embodiment.
- In FIG. 14, the same reference numerals are assigned to the same processes as those shown in FIG. 3.
- the data acquisition unit 50 first acquires the elements included in the series data (step S11).
- the data acquisition unit 50 outputs the acquired elements of the series data to the likelihood ratio calculation unit 100 .
- the likelihood ratio calculator 100 calculates the likelihood ratio based on the two or more acquired elements (step S12).
- The class classification unit 200 performs class classification based on the calculated likelihood ratio. In particular, in the ninth embodiment, the class classification unit 200 selects and outputs a plurality of classes to which the series data may belong (step S41). That is, the class classification unit 200 does not determine the single class to which the series data belong, but determines a plurality of classes to which the series data are likely to belong. More specifically, the class classification unit 200 executes a process of selecting k classes (where k is a natural number equal to or less than n) from n classes (where n is a natural number) prepared as classification candidates.
- the class classification unit 200 may output information about k classes to which series data may belong to a display or the like. Also, the class classification unit 200 may output information about k classes to which the series data may belong by voice through a speaker or the like.
- the class classification unit 200 may rearrange and output them. For example, the class classification section 200 may rearrange the information on the k classes in descending order of likelihood ratio and output the sorted information. Alternatively, the class classification unit 200 may output each piece of information about k classes in a different manner for each class. For example, the class classification unit 200 may output a class with a high likelihood ratio in an emphasized display mode, while outputting a class with a low likelihood ratio in a non-emphasized display mode. In the case of highlighting, for example, the size or color of the displayed object may be changed, or the displayed object may be animated.
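The k-of-n selection, descending sort, and emphasized display described above can be sketched as follows (the class names, likelihood-ratio scores, and emphasis threshold are hypothetical):

```python
# Hypothetical likelihood-ratio scores for n = 5 candidate classes.
scores = {"shirt": 2.3, "shoes": 0.4, "bag": 1.7, "hat": -0.8, "watch": 0.9}

k = 3  # select k classes (k <= n) to which the series data may belong

# Rearrange in descending order of likelihood ratio and keep the top k.
top_k = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

for name, score in top_k:
    # A class with a high likelihood ratio could be shown emphasized,
    # e.g. with a different size, color, or animation in an actual display.
    marker = "**" if score >= 1.0 else "  "
    print(f"{marker}{name}: {score:+.1f}")
```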
- The information processing system 1 according to the ninth embodiment may be used, for example, to propose products that a user is likely to be interested in on a web shopping site. Specifically, the information processing system 1 may select, from the n products handled on the site (that is, n classes), k products that the user is likely to be interested in (that is, k classes), and output them to the user (where k is a number smaller than n). In this case, examples of the series data to be input include past purchase histories and browsing histories.
- In addition, in some cases an image of the user can be captured by a mounted camera. In that case, the user's emotion may be estimated from the image, and stores and products corresponding to that emotion may be suggested. Alternatively, the user's line of sight may be estimated from the image (that is, the part the user is looking at may be estimated) to suggest stores and products that the user is likely to be interested in.
- Alternatively, the user's attributes (for example, gender and age) may be estimated, and the n classes may be weighted according to the estimated information.
- The information processing system 1 according to the ninth embodiment can also be used for criminal investigation. For example, when trying to find the true culprit from among a plurality of suspects, if only the one suspect who is most likely to be the culprit is selected, a serious problem arises if the selection is wrong.
- The information processing system 1 according to the ninth embodiment can also be applied to the analysis of radar images. Since radar images tend to have low definition because of their nature, it is difficult for a machine alone to accurately determine, for example, what is shown in an image. However, the information processing system 1 according to the present embodiment can select and output k candidates that are highly likely to appear in the radar image. Therefore, the k candidates can first be output, and the user can then make the final decision among them. For example, if "dog", "cat", "ship", and "tank" are listed as candidates for a radar image of a port, the user can easily recognize that "ship", which is highly related to a port, is what appears in the image.
- The application examples described above are merely examples; beneficial effects can be achieved by applying the information processing system 1 according to the present embodiment in any situation where it is required to select k candidates from n candidates.
- A processing method in which a program for operating the configuration of each embodiment so as to realize the functions of the above-described embodiments is recorded on a recording medium, and the program recorded on the recording medium is read as code and executed by a computer, is also included in the scope of each embodiment. That is, a computer-readable recording medium is also included in the scope of each embodiment. In addition to the recording medium on which the above program is recorded, the program itself is also included in each embodiment.
- For example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a non-volatile memory card, or a ROM can be used as the recording medium.
- Not only a program that executes the processing by itself when recorded on the recording medium, but also a program that operates on an OS and executes the processing in cooperation with other software or the functions of an expansion board, is included in the scope of each embodiment.
- An information processing system according to supplementary note 1 comprises: acquisition means for acquiring a plurality of elements included in series data; calculation means for calculating, based on at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of a class to which the series data belong; classification means for classifying the series data into at least one class among a plurality of classes that are classification candidates, based on the likelihood ratio; and learning means for performing learning relating to the calculation of the likelihood ratio using a log-sum-exp type loss function.
- An information processing system according to supplementary note 2 is the information processing system according to supplementary note 1, wherein the learning means performs the learning using a loss function that takes into account likelihood ratios of N×(N−1) patterns, each of which has as its denominator a likelihood indicating the likelihood that the series data belong to one of N classes (where N is a natural number) that are classification candidates, and as its numerator a likelihood indicating the likelihood of belonging to another class.
- An information processing system according to supplementary note 3 is the information processing system according to supplementary note 2, wherein the learning means performs the learning using a loss function that takes into account some of the likelihood ratios of the N×(N−1) patterns.
- An information processing system according to supplementary note 4 is the information processing system according to supplementary note 3, wherein the learning means performs the learning using a loss function that takes into account, among the N×(N−1) patterns, the likelihood ratios in which the correct class is in the numerator.
- An information processing system according to supplementary note 5 is the information processing system according to any one of supplementary notes 1 to 4, wherein the loss function includes a plurality of sums, and at least one of the plurality of sums is placed inside the log of the log-sum-exp type.
- An information processing system according to supplementary note 6 is characterized in that the loss function includes a weighting factor corresponding to the difficulty of classifying the series data.
- An information processing system according to supplementary note 7 is characterized in that the likelihood ratio is an integrated likelihood ratio calculated by taking into account a plurality of individual likelihood ratios, each calculated based on two consecutive elements included in the series data.
- An information processing system according to supplementary note 8 is characterized in that the acquisition means sequentially acquires the plurality of elements included in the series data, and the calculation means calculates a new integrated likelihood ratio using the individual likelihood ratio calculated based on the newly acquired element and the previously calculated integrated likelihood ratio.
- An information processing method according to supplementary note 9 acquires a plurality of elements included in series data; calculates, based on at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of a class to which the series data belong; classifies the series data into at least one class among a plurality of classes that are classification candidates, based on the likelihood ratio; and performs learning relating to the calculation of the likelihood ratio using a log-sum-exp type loss function.
- A computer program according to supplementary note 10 operates a computer to: acquire a plurality of elements included in series data; calculate, based on at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of a class to which the series data belong; classify the series data into at least one class among a plurality of classes that are classification candidates, based on the likelihood ratio; and perform learning relating to the calculation of the likelihood ratio using a log-sum-exp type loss function.
- A recording medium according to supplementary note 11 is a recording medium on which the computer program according to supplementary note 10 is recorded.
Abstract
Description
第1実施形態に係る情報処理システムについて、図1から図4を参照して説明する。
まず、図1を参照しながら、第1実施形態に係る情報処理システムのハードウェア構成について説明する。図1は、第1実施形態に係る情報処理システムのハードウェア構成を示すブロック図である。
次に、図2を参照しながら、第1実施形態に係る情報処理システム1の機能的構成について説明する。図2は、第1実施形態に係る情報処理システムの機能的構成を示すブロック図である。
次に、図3を参照しながら、第1実施形態に係る情報処理システム1における分類装置10の動作(具体的には、学習後のクラス分類動作)の流れについて説明する。図3は、第1実施形態に係る情報処理システムにおける分類装置の動作の流れを示すフローチャートである。
次に、図4を参照しながら、第1実施形態に係る情報処理システム1における学習部300の動作(即ち、尤度比の算出に関する学習動作)の流れについて説明する。図4は、第1実施形態に係る情報処理システムにおける学習部の動作の流れを示すフローチャートである。
次に、第1実施形態に係る情報処理システム1によって得られる技術的効果について説明する。
第2実施形態に係る情報処理システム1について、図5及び図6を参照して説明する。なお、第2実施形態は、上述した第1実施形態と比較して一部の動作が異なるのみであり、例えば装置構成(図1及び図2参照)や、分類装置10の動作(図3参照)等については、第1実施形態と同様であってよい。このため、以下では、第1実施形態と異なる部分について詳しく説明し、他の重複する部分については適宜説明を省略するものとする。
まず、図5を参照しながら、第2実施形態に係る情報処理システム1における学習部300の動作の流れについて説明する。図5は、第2実施形態に係る情報処理システムの動作の流れを示すフローチャートである。なお、図5では、図4で示した処理と同様の処理に同一の符号を付している。
次に、図6を参照しながら、上述した学習部300による学習動作において考慮される尤度比(即ち、損失関数の算出に考慮される尤度比)について、具体的に説明する。図6は、第2実施形態に係る情報処理システムにおける学習部が考慮する尤度比の一例を示すマトリクス図である。
次に、第2実施形態に係る情報処理システム1によって得られる技術的効果について説明する。
An information processing system 1 according to the third embodiment will be described with reference to FIG. 7. The third embodiment differs from the second embodiment described above only in part of its operations, and the other parts may be the same as in the second embodiment. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
First, the flow of operations of the learning unit 300 in the information processing system 1 according to the third embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the flow of operations of the information processing system according to the third embodiment. In FIG. 7, steps that are the same as those shown in FIG. 4 are given the same reference signs.
Subsequently, an example of selecting the likelihood ratios to be taken into account in the loss function (i.e., an example of selecting a subset of the N×(N-1) patterns of likelihood ratios) will be described in concrete terms.
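One concrete selection, made explicit later in the fourth embodiment and supplementary note 4, keeps only the ratios whose numerator is the ground-truth class. A minimal sketch under that assumption (the ratio values are illustrative stand-ins):

```python
# Hypothetical log-likelihood ratios for N = 3 classes, indexed by
# (numerator_class, denominator_class); the values are illustrative only.
llr = {(0, 1): 1.5, (0, 2): -0.3, (1, 0): -1.5,
       (1, 2): -1.8, (2, 0): 0.3, (2, 1): 1.8}

def select_ratios(llr, true_class):
    """Keep only the likelihood ratios whose numerator is the correct class,
    i.e. N - 1 of the N * (N - 1) patterns."""
    return {pair: value for pair, value in llr.items() if pair[0] == true_class}

selected = select_ratios(llr, true_class=0)
```

Restricting the loss to these N-1 ratios per sample is one way to reduce the number of terms from N×(N-1) while still comparing the correct class against every other class.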
Next, the technical effects obtained by the information processing system 1 according to the third embodiment will be described.
An information processing system 1 according to the fourth embodiment will be described with reference to FIG. 8 and FIG. 9. The fourth embodiment describes a concrete selection example for the third embodiment described above (i.e., an example of selecting the subset of likelihood ratios to be taken into account in the loss function), and the other parts may be the same as in the third embodiment. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
First, the flow of operations of the learning unit 300 in the information processing system 1 according to the fourth embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart showing the flow of operations of the information processing system according to the fourth embodiment. In FIG. 8, steps that are the same as those shown in FIG. 4 are given the same reference signs.
Next, the likelihood ratios taken into account in the learning operation by the learning unit 300 described above (i.e., the likelihood ratios taken into account when calculating the loss function) will be described in concrete terms with reference to FIG. 9. FIG. 9 is a matrix diagram showing an example of the likelihood ratios taken into account by the learning unit in the information processing system according to the fourth embodiment.
Next, the technical effects obtained by the information processing system 1 according to the fourth embodiment will be described.
An information processing system 1 according to the fifth embodiment will be described. The fifth embodiment describes a concrete example of the loss function used in the first to fourth embodiments described above, and the device configuration and operation flow may be the same as in the first to fourth embodiments. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
An example of the log-sum-exp-type loss function used in the information processing system 1 according to the fifth embodiment is given by equation (1) below. Here, the input data set (pairs of data and labels) is assumed to be {X_i, y_i} for i = 1, ..., N.
The loss function of equation (1) can be rewritten as equation (2) below.
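Equations (1) and (2) are not reproduced on this page, so the sketch below shows one common log-sum-exp-type loss over log-likelihood ratios purely as an illustration; the specific row layout and the per-sample form log(1 + Σ exp(-ratio)) are assumptions, not necessarily the patented equations.

```python
import math

def lse_loss(llr_rows, labels):
    """Illustrative log-sum-exp-type loss over log-likelihood ratios.

    llr_rows[i][k] is assumed to hold the log-likelihood ratio of the
    correct class y_i against class k for sample i; the per-sample loss
    log(1 + sum over k != y_i of exp(-ratio)) is averaged over the data
    set. A correctly classified sample with large positive ratios
    contributes a loss near log(1) = 0.
    """
    total = 0.0
    for row, y in zip(llr_rows, labels):
        total += math.log(1.0 + sum(math.exp(-r)
                                    for k, r in enumerate(row) if k != y))
    return total / len(labels)
```

The log-sum-exp structure makes the loss smooth in every ratio while behaving like a maximum over the competing classes, which is the usual motivation for this family of losses.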
Next, the technical effects obtained by the information processing system 1 according to the fifth embodiment will be described.
An information processing system 1 according to the sixth embodiment will be described. Like the fifth embodiment, the sixth embodiment describes a concrete example of the loss function used in the first to fourth embodiments described above, and the device configuration and operation flow may be the same as in the first to fourth embodiments. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
An example of the log-sum-exp-type loss function used in the information processing system 1 according to the sixth embodiment is given by equation (3) below. Here, the input data set (pairs of data and labels) is assumed to be {X_i, y_i} for i = 1, ..., N.
The loss function of equation (3) can be rewritten as equation (4) below.
The loss function may also be weighted. For example, applying weights to equation (4) yields equation (5) below.
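Equation (5) is likewise not reproduced here; the sketch below simply attaches a per-sample weight, for example reflecting how hard a sequence is to classify (cf. supplementary note 6), to the illustrative log-sum-exp loss. Both the base loss form and the multiplicative weighting are assumptions made for illustration.

```python
import math

def weighted_lse_loss(llr_rows, labels, weights):
    """Illustrative weighted variant of a log-sum-exp-type loss: each
    sample's term is scaled by a weight w_i, e.g. chosen larger for
    sequences that are harder to classify, so the optimizer focuses on
    them. The concrete weighting of equation (5) is not reproduced in
    this text."""
    total = 0.0
    for row, y, w in zip(llr_rows, labels, weights):
        term = math.log(1.0 + sum(math.exp(-r)
                                  for k, r in enumerate(row) if k != y))
        total += w * term
    return total / len(labels)
```

With all weights set to 1 this reduces to the unweighted average, so the weighted form strictly generalizes the plain one.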
Next, the technical effects obtained by the information processing system 1 according to the sixth embodiment will be described.
An information processing system 1 according to the seventh embodiment will be described with reference to FIG. 10 and FIG. 11. The seventh embodiment differs from the first to sixth embodiments described above only in part of its configuration and operations (specifically, the configuration and operations of the classification device 10), and the other parts may be the same as in the first to sixth embodiments. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
First, the functional configuration of the information processing system 1 according to the seventh embodiment will be described with reference to FIG. 10. FIG. 10 is a block diagram showing the functional configuration of the information processing system according to the seventh embodiment. In FIG. 10, elements that are the same as the components shown in FIG. 2 are given the same reference signs.
Next, the flow of operations of the classification device 10 in the information processing system 1 according to the seventh embodiment (specifically, the class classification operation performed after learning) will be described with reference to FIG. 11. FIG. 11 is a flowchart showing the flow of operations of the classification device in the information processing system according to the seventh embodiment.
Next, the technical effects obtained by the information processing system 1 according to the seventh embodiment will be described.
An information processing system 1 according to the eighth embodiment will be described with reference to FIG. 12 and FIG. 13. The eighth embodiment differs from the seventh embodiment described above only in part of its configuration and operations (specifically, the configuration and operations of the likelihood ratio calculation unit 100), and the other parts may be the same as in the seventh embodiment. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
First, the functional configuration of the information processing system 1 according to the eighth embodiment will be described with reference to FIG. 12. FIG. 12 is a block diagram showing the functional configuration of the information processing system according to the eighth embodiment. In FIG. 12, elements that are the same as the components shown in FIG. 2 and FIG. 10 are given the same reference signs.
Next, the flow of the likelihood ratio calculation operation (i.e., the operation of the likelihood ratio calculation unit 100) in the information processing system 1 according to the eighth embodiment will be described with reference to FIG. 13. FIG. 13 is a flowchart showing the flow of operations of the likelihood ratio calculation unit in the information processing system according to the eighth embodiment.
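The recursive update described for this embodiment (and in supplementary note 8), in which a new integrated likelihood ratio is computed from the previously stored integrated value and the individual ratio obtained from the two most recently acquired elements, can be sketched as follows. The plain additive update in log space is an assumption for illustration; the publication's exact recursion is not reproduced on this page.

```python
class IntegratedLLR:
    """Minimal sketch of a recursive integrated log-likelihood ratio:
    on each step, the newest individual ratio (computed from two
    consecutive elements) is folded into the stored integrated value,
    mirroring the roles of the individual likelihood ratio calculation
    unit 111 and the integrated likelihood ratio calculation unit 121."""

    def __init__(self):
        self.value = 0.0       # integrated log-likelihood ratio so far
        self.num_updates = 0   # number of individual ratios folded in

    def update(self, individual_llr):
        # In log space the simplest integration is a running sum of the
        # individual log-likelihood ratios; this is the assumed update.
        self.value += individual_llr
        self.num_updates += 1
        return self.value
```

Because only the latest integrated value needs to be stored (here in the attribute standing in for the second storage unit 122), the per-step cost stays constant no matter how long the sequence grows.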
Next, the technical effects obtained by the information processing system 1 according to the eighth embodiment will be described.
An information processing system 1 according to the ninth embodiment will be described with reference to FIG. 14. The ninth embodiment differs from the first to eighth embodiments described above only in part of its operations (specifically, the operations of the class classification unit 200), and the other parts may be the same as in the first to eighth embodiments. Accordingly, the parts that differ from the embodiments already described are described in detail below, and descriptions of overlapping parts are omitted as appropriate.
First, the flow of operations of the classification device 10 in the information processing system 1 according to the ninth embodiment (specifically, the class classification operation performed after learning) will be described with reference to FIG. 14. FIG. 14 is a flowchart showing the flow of operations of the classification device in the information processing system according to the ninth embodiment. In FIG. 14, steps that are the same as those described with reference to FIG. 3 are given the same reference signs.
The configuration that outputs k classes out of the n classes described above will now be illustrated with several concrete application examples.
The information processing system 1 according to the ninth embodiment may be used on a web shopping site to suggest products that a user is likely to be interested in. Specifically, from the n products handled by the site (i.e., n classes), the information processing system 1 may select k products that the user is likely to be interested in (i.e., k classes) and output them to the user, where k is smaller than n. In this case, examples of the input sequential data include past purchase histories and browsing histories.
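The top-k selection in the shopping example above can be sketched as follows; the score values are hypothetical stand-ins for the class-wise likelihood-ratio-based scores that the class classification unit 200 would actually produce.

```python
def top_k_classes(scores, k):
    """Return the indices of the k highest-scoring classes, best first
    (k is assumed smaller than the number of classes n)."""
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return order[:k]

# Hypothetical likelihood-ratio-based scores for n = 5 products (classes).
scores = [0.2, 1.7, -0.5, 0.9, 0.4]
recommended = top_k_classes(scores, k=3)
```

The same routine covers the other examples in this embodiment (suspects in an investigation, candidates for a radar image): only the interpretation of the classes changes, not the selection logic.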
The information processing system 1 according to the ninth embodiment can also be used in criminal investigations. For example, when identifying the true culprit among a plurality of suspects, selecting only the single person most likely to be the culprit causes serious problems if that selection turns out to be wrong. With the information processing system 1 according to this embodiment, however, the top k suspects most likely to be the culprit can be selected and output. Specifically, from sequential data whose elements are pieces of information about each of the plurality of suspects, the classes corresponding to the top k people most likely to be the culprit may be selected and output. In this way, the true culprit can be found appropriately by, for example, investigating the several suspects most likely to be the culprit.
The information processing system 1 according to the ninth embodiment can also be applied to the analysis of radar images. Because radar images are often of low clarity by nature, it is difficult for a machine alone to determine accurately what, for example, appears in such an image. With the information processing system 1 according to this embodiment, however, the k candidates most likely to appear in the radar image can be selected and output. The system can therefore first output k candidates and let the user make the final judgment among them. For example, if "dog," "cat," "ship," and "tank" are listed as candidates for what appears in a radar image of a harbor, the user can easily judge that a "ship," which is highly relevant to a harbor, appears in the image.
The embodiments described above may be further described as, but are not limited to, the following supplementary notes.
The information processing system described in supplementary note 1 is an information processing system comprising: an acquisition means for acquiring a plurality of elements included in sequential data; a calculation means for calculating, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs; a classification means for classifying the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and a learning means for performing learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
The information processing system described in supplementary note 2 is the information processing system described in supplementary note 1, wherein the learning means performs the learning by using a loss function that takes into account N×(N-1) patterns of the likelihood ratio, each having as its denominator a likelihood indicating that the sequential data belongs to one of N classes serving as classification candidates of the sequential data (where N is a natural number) and as its numerator a likelihood indicating that the sequential data belongs to another of the classes.
The information processing system described in supplementary note 3 is the information processing system described in supplementary note 2, wherein the learning means performs the learning by using a loss function that takes into account a subset of the likelihood ratios among the N×(N-1) patterns.
The information processing system described in supplementary note 4 is the information processing system described in supplementary note 3, wherein the learning means performs the learning by using a loss function that takes into account, among the N×(N-1) patterns, the likelihood ratios whose numerator is the correct class.
The information processing system described in supplementary note 5 is the information processing system described in any one of supplementary notes 1 to 4, wherein the loss function includes a plurality of sums, at least one of which is included inside the log-sum-exp form.
The information processing system described in supplementary note 6 is the information processing system described in any one of supplementary notes 1 to 5, wherein the loss function includes a weight coefficient corresponding to how difficult the sequential data is to classify.
The information processing system described in supplementary note 7 is the information processing system described in any one of supplementary notes 1 to 6, wherein the likelihood ratio is an integrated likelihood ratio calculated by taking into account a plurality of individual likelihood ratios, each calculated on the basis of two consecutive elements included in the sequential data.
The information processing system described in supplementary note 8 is the information processing system described in supplementary note 7, wherein the acquisition means sequentially acquires the plurality of elements included in the sequential data, and the calculation means calculates a new integrated likelihood ratio by using the individual likelihood ratio calculated on the basis of a newly acquired element and the integrated likelihood ratio calculated in the past.
The information processing method described in supplementary note 9 is an information processing method comprising: acquiring a plurality of elements included in sequential data; calculating, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs; classifying the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and performing learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
The computer program described in supplementary note 10 is a computer program that causes a computer to: acquire a plurality of elements included in sequential data; calculate, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs; classify the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and perform learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
The recording medium described in supplementary note 11 is a recording medium on which the computer program described in supplementary note 10 is recorded.
11 Processor
14 Storage device
10 Classification device
50 Data acquisition unit
100 Likelihood ratio calculation unit
110 First calculation unit
111 Individual likelihood ratio calculation unit
112 First storage unit
120 Second calculation unit
121 Integrated likelihood ratio calculation unit
122 Second storage unit
200 Class classification unit
300 Learning unit
Claims (10)
- An information processing system comprising:
an acquisition means for acquiring a plurality of elements included in sequential data;
a calculation means for calculating, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs;
a classification means for classifying the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and
a learning means for performing learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
- The information processing system according to claim 1, wherein the learning means performs the learning by using a loss function that takes into account N×(N-1) patterns of the likelihood ratio, each having as its denominator a likelihood indicating that the sequential data belongs to one of N classes serving as classification candidates of the sequential data (where N is a natural number) and as its numerator a likelihood indicating that the sequential data belongs to another of the classes.
- The information processing system according to claim 2, wherein the learning means performs the learning by using a loss function that takes into account a subset of the likelihood ratios among the N×(N-1) patterns.
- The information processing system according to claim 3, wherein the learning means performs the learning by using a loss function that takes into account, among the N×(N-1) patterns, the likelihood ratios whose numerator is the correct class.
- The information processing system according to any one of claims 1 to 4, wherein the loss function includes a plurality of sums, at least one of which is included inside the log-sum-exp form.
- The information processing system according to any one of claims 1 to 5, wherein the loss function includes a weight coefficient corresponding to how difficult the sequential data is to classify.
- The information processing system according to any one of claims 1 to 6, wherein the likelihood ratio is an integrated likelihood ratio calculated by taking into account a plurality of individual likelihood ratios, each calculated on the basis of two consecutive elements included in the sequential data.
- The information processing system according to claim 7, wherein the acquisition means sequentially acquires the plurality of elements included in the sequential data, and
the calculation means calculates a new integrated likelihood ratio by using the individual likelihood ratio calculated on the basis of a newly acquired element and the integrated likelihood ratio calculated in the past.
- An information processing method comprising:
acquiring a plurality of elements included in sequential data;
calculating, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs;
classifying the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and
performing learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
- A computer program that causes a computer to:
acquire a plurality of elements included in sequential data;
calculate, on the basis of at least two consecutive elements among the plurality of elements, a likelihood ratio indicating the likelihood of the class to which the sequential data belongs;
classify the sequential data into at least one of a plurality of classes serving as classification candidates on the basis of the likelihood ratio; and
perform learning related to the calculation of the likelihood ratio by using a log-sum-exp-type loss function.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022576936A JPWO2022157973A1 (ja) | 2021-01-25 | 2021-01-25 | |
PCT/JP2021/002439 WO2022157973A1 (ja) | 2021-01-25 | 2021-01-25 | 情報処理システム、情報処理方法、及びコンピュータプログラム |
US18/272,959 US20240086424A1 (en) | 2021-01-25 | 2021-01-25 | Information processing system, information processing method, and computer program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/002439 WO2022157973A1 (ja) | 2021-01-25 | 2021-01-25 | 情報処理システム、情報処理方法、及びコンピュータプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022157973A1 true WO2022157973A1 (ja) | 2022-07-28 |
Family
ID=82548651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/002439 WO2022157973A1 (ja) | 2021-01-25 | 2021-01-25 | 情報処理システム、情報処理方法、及びコンピュータプログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240086424A1 (ja) |
JP (1) | JPWO2022157973A1 (ja) |
WO (1) | WO2022157973A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024079854A1 (ja) * | 2022-10-13 | 2024-04-18 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び記録媒体 |
WO2024079853A1 (ja) * | 2022-10-13 | 2024-04-18 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び記録媒体 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007114413A (ja) * | 2005-10-19 | 2007-05-10 | Toshiba Corp | 音声非音声判別装置、音声区間検出装置、音声非音声判別方法、音声区間検出方法、音声非音声判別プログラムおよび音声区間検出プログラム |
WO2020194497A1 (ja) * | 2019-03-26 | 2020-10-01 | 日本電気株式会社 | 情報処理装置、個人識別装置、情報処理方法及び記憶媒体 |
2021
- 2021-01-25: US application 18/272,959 (US20240086424A1), status active, pending
- 2021-01-25: JP application 2022576936 (JPWO2022157973A1), status active, pending
- 2021-01-25: PCT application PCT/JP2021/002439 (WO2022157973A1), status active, application filing
Non-Patent Citations (1)
Title |
---|
UCHIBE, EIJI ET AL.: "Imitation learning based on entropy-regularized reinforcement learning", THE 33RD ANNUAL CONFERENCE OF THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 7 June 2019 (2019-06-07) - 2 March 2021 (2021-03-02), pages 1 - 4, XP081742725, Retrieved from the Internet <URL:https://www.jstage.jst.go.jp/article/pjsai/JSAI2019/0/JSAI2019_1I3J203/_pdf/-char/ja> * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022157973A1 (ja) | 2022-07-28 |
US20240086424A1 (en) | 2024-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446430B (zh) | 产品推荐的方法、装置、计算机设备及可读存储介质 | |
US9424493B2 (en) | Generic object detection in images | |
CN112632385A (zh) | 课程推荐方法、装置、计算机设备及介质 | |
CN106293074B (zh) | 一种情绪识别方法和移动终端 | |
CN112784778B (zh) | 生成模型并识别年龄和性别的方法、装置、设备和介质 | |
WO2022157973A1 (ja) | 情報処理システム、情報処理方法、及びコンピュータプログラム | |
US20170103284A1 (en) | Selecting a set of exemplar images for use in an automated image object recognition system | |
CN110363084A (zh) | 一种上课状态检测方法、装置、存储介质及电子 | |
CN111461168A (zh) | 训练样本扩充方法、装置、电子设备及存储介质 | |
US11605002B2 (en) | Program, information processing method, and information processing apparatus | |
CN108369664A (zh) | 调整神经网络的大小 | |
US11809519B2 (en) | Semantic input sampling for explanation (SISE) of convolutional neural networks | |
CN111737473A (zh) | 文本分类方法、装置及设备 | |
CN114399808A (zh) | 一种人脸年龄估计方法、系统、电子设备及存储介质 | |
CN113886697A (zh) | 基于聚类算法的活动推荐方法、装置、设备及存储介质 | |
Bajwa et al. | A multifaceted independent performance analysis of facial subspace recognition algorithms | |
CN108229572B (zh) | 一种参数寻优方法及计算设备 | |
CN116503608A (zh) | 基于人工智能的数据蒸馏方法及相关设备 | |
CN113961765B (zh) | 基于神经网络模型的搜索方法、装置、设备和介质 | |
Dey et al. | Mood recognition in online sessions using machine learning in realtime | |
US11042837B2 (en) | System and method for predicting average inventory with new items | |
US20240054400A1 (en) | Information processing system, information processing method, and computer program | |
JP7099254B2 (ja) | 学習方法、学習プログラム及び学習装置 | |
KR20210035622A (ko) | 시계열 데이터 유사도 계산 시스템 및 방법 | |
CN113837811B (zh) | 一种电梯广告点位推荐方法、装置、计算机设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21921081 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022576936 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18272959 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21921081 Country of ref document: EP Kind code of ref document: A1 |