WO2021145185A1 - Behavior recognition device, behavior recognition method, program, and recording medium - Google Patents

Behavior recognition device, behavior recognition method, program, and recording medium Download PDF

Info

Publication number
WO2021145185A1
Authority
WO
WIPO (PCT)
Prior art keywords
situation
behavior
pattern
learning
information data
Application number
PCT/JP2020/048361
Other languages
French (fr)
Japanese (ja)
Inventor
黒田 大介
高橋 一徳
由仁 宮内
Original Assignee
Necソリューションイノベータ株式会社
Application filed by Necソリューションイノベータ株式会社 filed Critical Necソリューションイノベータ株式会社
Priority to JP2021571127A priority Critical patent/JP7231286B2/en
Publication of WO2021145185A1 publication Critical patent/WO2021145185A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion

Definitions

  • the present invention relates to a behavior recognition device, a behavior recognition method, a program, and a recording medium.
  • Deep learning uses a calculation method called back propagation to calculate the output error when a large amount of teacher data is input to a multi-layer neural network, and learns so that the error is minimized.
  • Patent Documents 1 to 3 disclose a neural network processing apparatus that makes it possible to construct a neural network with little labor and a small amount of arithmetic processing by defining a large-scale neural network as a combination of a plurality of subnetworks. Further, Patent Document 4 discloses a structure optimization device that optimizes a neural network.
  • Patent Document 1: Japanese Unexamined Patent Publication No. 2001-051968; Patent Document 2: Japanese Unexamined Patent Publication No. 2002-251601; Patent Document 3: Japanese Unexamined Patent Publication No. 2003-317073; Patent Document 4: Japanese Unexamined Patent Publication No. H09-091263
  • According to one aspect of the invention, the situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another.
  • The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another.
  • The behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest degree of conformity to the situation information data; when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person, and when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person. A behavior recognition device, a behavior recognition method, and a program that cause a computer to function as means for performing these operations are thereby provided.
  • FIG. 1 is a schematic view showing a configuration example of an action recognition device according to the first embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing a configuration example of a situation learning / identification unit in the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 3 is a schematic view showing a configuration example of a neural network unit in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 4 is a schematic diagram showing a configuration example of a learning cell in the situation learning / identifying unit of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 5 is a schematic view showing a configuration example of a usage learning unit in the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit.
  • FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit.
  • FIG. 9 is a diagram showing an example of situation information data.
  • FIG. 10 is a diagram showing an example of a usage learning model.
  • FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model.
  • FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model.
  • FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the second embodiment of the present invention.
  • FIG. 1 is a schematic view showing a configuration example of an action recognition device according to the present embodiment.
  • FIG. 2 is a schematic diagram showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the present embodiment.
  • FIG. 3 is a schematic diagram showing a configuration example of a neural network unit in the situation learning / identification unit of the behavior recognition device according to the present embodiment.
  • FIG. 4 is a schematic diagram showing a configuration example of a learning cell in the situation learning / identifying unit of the behavior recognition device according to the present embodiment.
  • FIG. 5 is a schematic view showing a configuration example of a usage learning unit in the behavior recognition device according to the present embodiment.
  • the image acquisition unit 100 is a functional block having a function of acquiring an image from an external camera or storage device (not shown).
  • the image acquired by the image acquisition unit 100 includes a plurality of images taken with respect to the same subject at different times, and is, for example, a moving image.
  • An image suitable for processing in the situation grasping unit 200 can be appropriately selected as the image, and may include, for example, an RGB image or a depth image.
  • the situation grasping unit 200 may have a function of performing time series analysis of the subject.
  • For short-duration time series analysis of the subject, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like can be applied.
  • For long-duration time series analysis of the subject, for example, Memory Networks can be applied.
  • the situation learning / identification unit 300 is a functional block having a function of generating situation information data based on the information received from the situation grasping unit 200.
  • The situation information data is data in which a pattern that maps the information received from the situation grasping unit 200 and an estimation result indicating the behavior of the person estimated from that information are associated with each other. Details of the situation information data will be described later.
  • a situation learning model that estimates the behavior of a person from the information received from the situation grasping unit 200 is constructed.
  • the situation learning / identification unit 300 combines the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
  • Here, as an example of the situation learning / identification unit 300, a situation learning / identification unit 300 having a function of performing learning based on the information received from the situation grasping unit 200 and generating the situation learning model will be described with reference to FIG. 2.
  • The situation learning model is not particularly limited as long as it receives the information from the situation grasping unit 200 as input and outputs the estimated behavior of the person; it may be rule-based, for example (a minimal sketch of such a rule base is given below). In that case, the situation learning / identification unit 300 does not necessarily need a function of performing learning based on the information received from the situation grasping unit 200.
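As a concrete illustration of such a rule-based alternative, the following is a minimal sketch and not the patent's own rules; the element names follow the book-reading example used later in this description, while the specific conditions and the function name are assumptions for illustration.

```python
# Minimal sketch of a rule-based situation learning model (illustrative only).
# The element names follow the book-reading example used later in this
# description; the specific rules are assumptions, not the patent's own rules.

def estimate_behavior(book_state: str, book_position: str, sitting: str) -> str:
    """Map grasped situation elements to an estimated behavior (value)."""
    if sitting == "not sitting":
        return "standing"
    if book_state == "open" and book_position == "near face":
        return "sitting and reading the book"
    return "sitting but not reading the book"

# Example: estimate_behavior("open", "near face", "sitting deeply")
# -> "sitting and reading the book"
```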
  • the situation learning / identification unit 300 includes a situation information data generation unit 310, a neural network unit 320, a determination unit 330, a learning unit 340, an identification unit 350, and an output unit 360.
  • the learning unit 340 may be composed of a weight correction unit 342 and a learning cell generation unit 344.
  • the situation information data generation unit 310 has a function of generating pattern data representing information related to the behavior of a person or the situation of an object in an image based on the information received from the situation grasping unit 200. Further, the situation information data generation unit 310 has a function of combining the information received from the situation grasping unit 200 and the information output from the situation learning model to generate the situation information data.
  • A weighting coefficient ω for giving a predetermined weighting to the element value I is set on each of the branches (axons) connecting the cells 42 and the cells 44.
  • For example, weighting coefficients ω_1j, ω_2j, ..., ω_ij, ..., ω_Mj are set on the branches connecting the cells 42_1, 42_2, ..., 42_i, ..., 42_M to the cell 44_j, as shown in FIG. 5.
  • The cell 44_j then performs the operation shown in equation (1), O_j = Σ_i (ω_ij × I_i) for i = 1 to M, and outputs the output value O_j.
  • A single cell 44, the branches (input nodes) that input the element values I_1 to I_M to the cell 44, and the branch (output node) that outputs the output value O from the cell 44 may be collectively referred to as a learning cell 46.
  • The determination unit 330 compares the correlation value between the plurality of element values of the pattern data and the output value of the learning cell 46 with a predetermined threshold value, and determines whether the correlation value is equal to or greater than, or less than, the threshold value.
  • An example of the correlation value is the likelihood regarding the output value of the learning cell 46.
  • the function of the determination unit 330 may be provided in each of the learning cells 46.
  • the learning target data is taken into the situation information data generation unit 310.
  • the situation information data generation unit 310 extracts element values indicating the characteristics of the captured learning target data, and generates predetermined pattern data.
  • The element values I_1 to I_M of the pattern data input to the neural network unit 320 are input to the cells 44_1 to 44_N via the cells 42_1 to 42_M.
  • As a result, outputs O_1 to O_N can be obtained from the cells 44_1 to 44_N.
  • Each output value O is calculated based on equation (1).
  • Next, the correlation value between the element values I_1 to I_M and the output value O of each learning cell 46 (here, the likelihood P regarding the output value of the learning cell) is calculated.
  • the method for calculating the likelihood P is not particularly limited.
  • For example, the likelihood P_j of the learning cell 46_j can be calculated based on equation (2): P_j = O_j / Σ_i ω_ij, where the sum is taken over the plurality of input nodes i = 1 to M.
  • Equation (2) shows that the likelihood P_j is expressed as the ratio of the output value O_j of the learning cell 46_j to the cumulative value of the weighting coefficients ω_ij of the plurality of input nodes of the learning cell 46_j.
  • In other words, the likelihood P_j represents the ratio of the output value of the learning cell 46_j when the plurality of element values are input to the maximum value of the output of the learning cell 46_j determined by the weighting coefficients ω_ij of the plurality of input nodes.
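To make equations (1) and (2) concrete, the following is a minimal sketch of the learning-cell computation described above; the class and variable names are assumptions for illustration, and the element values are taken to lie in [0, 1].

```python
# Minimal sketch of a learning cell: weighted-sum output (equation (1))
# and likelihood as the ratio to the maximum possible output (equation (2)).
# Names and the [0, 1] range of element values are illustrative assumptions.
from typing import List

class LearningCell:
    def __init__(self, weights: List[float], behavior: str):
        self.weights = weights      # omega_ij for each input node
        self.behavior = behavior    # behavior (category) linked to this cell

    def output(self, elements: List[float]) -> float:
        # Equation (1): O_j = sum_i omega_ij * I_i
        return sum(w * x for w, x in zip(self.weights, elements))

    def likelihood(self, elements: List[float]) -> float:
        # Equation (2): P_j = O_j / sum_i omega_ij
        total_weight = sum(self.weights)
        return self.output(elements) / total_weight if total_weight > 0 else 0.0
```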
  • Then, the weighting coefficients ω of the input nodes of the learning cell 46 having the largest likelihood P among the learning cells 46 are updated. In this way, the information of learning target data whose likelihood P is equal to or greater than a predetermined threshold value is accumulated in the weighting coefficients ω of the respective input nodes.
  • On the other hand, when the likelihood P is less than the predetermined threshold value, a new learning cell 46 linked to the corresponding category is generated, as sketched below.
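A minimal sketch of this learning step, reusing the LearningCell sketch above; the concrete weight-update rule (blending each weight toward the corresponding element value with a learning rate) and the helper names are assumptions, since the text here only states that the weights of the best-matching cell are updated or a new cell is generated.

```python
# Illustrative sketch of one learning step: update the best-matching cell's
# weights when its likelihood reaches the threshold, otherwise add a new cell.
# The concrete weight-update rule (learning_rate blend) is an assumption.
def learn_one_pattern(cells, elements, behavior, threshold=0.8, learning_rate=0.1):
    best = max(cells, key=lambda c: c.likelihood(elements), default=None)
    if best is not None and best.likelihood(elements) >= threshold:
        # Accumulate the pattern information into the input-node weights.
        best.weights = [w + learning_rate * (x - w)
                        for w, x in zip(best.weights, elements)]
    else:
        # Generate a new learning cell linked to the behavior (category).
        cells.append(LearningCell(weights=list(elements), behavior=behavior))
    return cells
```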
  • the above-mentioned situation learning model can be constructed in the neural network unit 320.
  • the situation information data generation unit 310 extracts element values indicating the characteristics of the captured information and generates predetermined pattern data.
  • Then, the element values I_1 to I_M of the pattern data are input to the neural network unit 320 that has performed the learning described above.
  • Each learning cell 46 then outputs an output value O in accordance with the element values I_1 to I_M.
  • Next, the identification unit 350 calculates the correlation value between the element values I_1 to I_M and the output value O of each learning cell 46 (here, the likelihood P regarding the output value of the learning cell).
  • the method for calculating the likelihood P is not particularly limited.
  • the behavior of the person estimated from the pattern data is identified based on the calculated likelihood P of all the learning cells 46.
  • The method of identifying the behavior of the person is not particularly limited. For example, the behavior associated with the learning cell 46 having the highest likelihood P among all the learning cells 46 can be identified as the behavior estimated from the pattern data. Alternatively, a predetermined number of learning cells 46 may be extracted from all the learning cells 46 in descending order of likelihood P, and the behavior most frequently associated with the extracted learning cells 46 can be identified as the behavior estimated from the pattern data. Both strategies are sketched below.
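A minimal sketch of the two identification strategies just described, reusing the LearningCell sketch above; the function names and the top-k default are assumptions.

```python
# Illustrative identification: (a) behavior of the single most likely cell,
# (b) majority vote over the k most likely cells.
from collections import Counter

def identify_by_max_likelihood(cells, elements):
    # Behavior of the cell with the highest likelihood P.
    best = max(cells, key=lambda c: c.likelihood(elements))
    return best.behavior

def identify_by_top_k_vote(cells, elements, k=5):
    # Behavior most frequently associated with the k most likely cells.
    top = sorted(cells, key=lambda c: c.likelihood(elements), reverse=True)[:k]
    return Counter(c.behavior for c in top).most_common(1)[0][0]
```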
  • the situation information data acquisition unit 410 has a function of acquiring the situation information data generated by the situation information data generation unit 310 from the situation learning / identification unit 300.
  • In the usage learning model generation unit 430, suppose that information indicating a state in which the person is "sitting shallowly (weak)" is mapped in a pattern of the situation information data, and the user considers that a "sitting deeply (strong)" state should also be associated with the situation at that time. In such a case, the usage learning model generation unit 430 additionally maps information indicating the "sitting deeply (strong)" state onto the pattern of the situation information data based on the comment from the user, and generates a new pattern. The usage learning model generation unit 430 can generate the new pattern by mapping the information to predetermined coordinates according to words such as "weak", "medium", and "strong" input by the user via a keyboard or the like, as sketched below.
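The mapping from words such as "weak", "medium", and "strong" to pattern coordinates could be sketched as follows; the coordinate layout (one column per element, one row per level in a 3 × 3 grid) is modeled on the example of FIGS. 8 and 9, while the dictionaries and function name are assumptions.

```python
# Illustrative mapping of a user's word ("weak"/"medium"/"strong") to a cell
# in the 3 x 3 pattern. The row-per-level and element-per-column assignment
# is an assumption modeled on the rules of FIG. 8.
LEVEL_ROW = {"weak": 0, "medium": 1, "strong": 2}
ELEMENT_COL = {"book state": 0, "book position": 1, "sitting condition": 2}

def add_user_comment(pattern, element: str, level: str):
    """Set the cell for (element, level) to 1 in a 3x3 pattern (list of lists)."""
    row = LEVEL_ROW[level]
    col = ELEMENT_COL[element]
    pattern[row][col] = 1
    return pattern
```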
  • The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model.
  • FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the present embodiment.
  • FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit.
  • FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit.
  • FIG. 9 is a diagram showing an example of situation information data.
  • FIG. 10 is a diagram showing an example of a usage learning model.
  • FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model.
  • FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model.
  • the image acquisition unit 100 acquires a plurality of images of the same subject taken at different times from a camera or a storage device (step S101).
  • the plurality of images acquired by the image acquisition unit 100 are, for example, images of each frame of a moving image. In this case, it is not always necessary to acquire the images of all the frames, and the images may be thinned out as appropriate.
  • the image to be acquired may be any image suitable for grasping the situation of the subject, and can be appropriately selected. For example, RGB images and depth images acquired by an RGB camera and an infrared camera can be applied.
  • the image acquired by the image acquisition unit 100 may be input to the situation grasping unit 200 as it is, or may be temporarily stored in a storage device (not shown).
  • The situation grasping unit 200 recognizes the person or object appearing in each of the images acquired by the image acquisition unit 100 by using a known image recognition technique, for example an image recognition technique using deep learning, and grasps the situation (step S102).
  • The situation of the person may be, for example, whether the person is sitting shallowly or deeply on the chair.
  • Examples of the situation of the object include whether it is open or closed, and whether it is near the face of the person.
  • the situation learning / identification unit 300 generates situation information data based on the information received from the situation grasping unit 200 (step S103).
  • The generated situation information data includes pattern data of the first layer, in which the degree of each element indicating the situation of the person or object is mapped in a plurality of stages, and information on the situation (value) estimated as the behavior of the person from the pattern data of the first layer.
  • The situation (value) estimated as the behavior of the person is information acquired by applying the pattern data of the first layer to the situation learning model.
  • The situation information data further includes pattern data of the second layer, in which the duration of each element indicating the situation of the person or object is mapped in a plurality of stages.
  • “book state”, “book position”, and “sitting condition” are used as three elements indicating the situation of a person or an object, and the degree of each element is mapped in three stages.
  • It is assumed that information as shown in FIG. 7 has been obtained as the three elements indicating the situation of the person or object and the situation (value) estimated in each case.
  • FIG. 9 is an example in which the information of frames 18 to 21 shown in FIG. 7 is represented as situation information data according to the rules shown in FIG.
  • The situation information data includes the patterns of the first layer and the second layer and the value, corresponding to the image of each frame. A minimal sketch of this data structure is given below.
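As a concrete picture of the data described above, the following is a minimal sketch of the situation information data for one frame, with a first-layer (degree) pattern, a second-layer (duration) pattern, and the estimated value; the field and class names are assumptions, while the 3 × 3 binary layout follows FIG. 9.

```python
# Illustrative structure of situation information data for one frame.
# Field names are assumptions; the 3 x 3 binary patterns follow FIG. 9.
from dataclasses import dataclass
from typing import List

Pattern = List[List[int]]  # 3 x 3 matrix of 0/1 cells

@dataclass
class SituationInformationData:
    first_layer: Pattern   # degree of each element, mapped in stages
    second_layer: Pattern  # duration of each element, mapped in stages
    value: str             # behavior estimated by the situation learning model

example = SituationInformationData(
    first_layer=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    second_layer=[[0, 1, 0], [0, 1, 0], [0, 0, 1]],
    value="sitting but not reading the book",
)
```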
  • The behavior identification unit 440 applies the usage learning model to the situation information data corresponding to the image of each frame generated by the situation learning / identification unit 300, and verifies the estimation result of the situation learning (step S104). Specifically, the pattern of the situation information data is compared with the patterns of the usage learning model, and the usage learning model is searched for a model that is highly compatible with the situation information data.
  • The behavior identification unit 440 then recognizes the behavior of the person based on the verification result of step S104 (step S105). Specifically, when the usage learning model contains no model that is highly compatible with the situation information data, the value of the situation information data is recognized as the behavior of the person. On the other hand, when the usage learning model contains a model that is highly compatible with the situation information data, the value of that model is recognized as the behavior of the person.
  • the storage unit 450 stores a usage learning model including a plurality of models as shown as model 1 and model 2 in FIG. 10, for example.
  • In model 1, since the book is closed, the situation learning model judges that the person "is sitting but not reading the book"; however, because the time for which the book has been closed is short, the model encourages a reconsideration to "sitting and reading the book".
  • In model 2, since the book is half closed, the situation learning model likewise judges that the person "is sitting but not reading the book"; however, because the time for which the book has been half closed is short, the model encourages a reconsideration to "sitting and reading the book".
  • The action identification unit 440 compares the situation information data corresponding to the image of each frame with each of the models of the usage learning model stored in the storage unit 450, and extracts the model most suitable for the situation information data from the usage learning model. It then determines whether to apply the value of the situation information data or the value of the extracted model according to the degree of conformity between the situation information data and the extracted model.
  • the method of determining the suitability of the situation information data and the usage learning model is not particularly limited, but for example, a method of using the inner product value of the pattern of the situation information data and the pattern of the usage learning model can be mentioned.
  • In the present embodiment, the situation information data and the usage learning model each include nine cells arranged in a 3 × 3 matrix as the patterns of the first layer and the second layer (see FIG. 9 and FIG. 10).
  • the value of each cell is 0 or 1.
  • the value of the cell corresponding to the level of each element indicating the situation of the person or the object is 1, and the value of the other cells is 0.
  • cells having a value of 1 are painted black.
  • the inner product value of the pattern of the first layer of the situation information data and the pattern of the first layer of the usage learning model is calculated (step S201).
  • the inner product value of the pattern of the situation information data and the pattern of the usage learning model is calculated by multiplying the values of cells having the same coordinates and adding up the multiplied values of each coordinate.
  • Assume that the values of the cells constituting the pattern of the situation information data are A, B, C, D, E, F, G, H, and I, and that the values of the cells constituting the pattern of the usage learning model to be compared are 1, 0, 0, 0, 1, 0, 0, 0, 1.
  • the inner product value of the pattern of the situation information data and the pattern of the usage learning model is A ⁇ 1 + B ⁇ 0 + C ⁇ 0 + D ⁇ 0 + E ⁇ 1 + F ⁇ 0 + G ⁇ 0 + H ⁇ 0 + I ⁇ 1.
  • the inner product value calculated in this way is normalized by dividing by the number of cells having a value of 1 among the cells included in the status information data.
  • the calculation and normalization of the inner product value for the situation information data is performed for each of the plurality of models included in the usage learning model.
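The inner-product comparison and normalization described above can be sketched as follows; the function names are assumptions.

```python
# Inner product of two 3 x 3 binary patterns (cell-by-cell product, summed),
# normalized by the number of 1-valued cells in the situation information data.
def inner_product(pattern_a, pattern_b) -> int:
    return sum(a * b for row_a, row_b in zip(pattern_a, pattern_b)
                     for a, b in zip(row_a, row_b))

def normalized_fit(situation_pattern, model_pattern) -> float:
    ones = sum(sum(row) for row in situation_pattern)
    return inner_product(situation_pattern, model_pattern) / ones if ones else 0.0
```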
  • In step S204, it is determined whether or not there are two or more models having the maximum inner product value. As a result of the determination, when there is only one model having the maximum inner product value ("No" in step S204), the process proceeds to step S205, in which the value of the model having the maximum inner product value in the first layer is recognized as the action of the person, and the process of step S104 ends. On the other hand, when there are two or more models having the maximum inner product value ("Yes" in step S204), the process proceeds to step S206.
  • In step S206, for each of the second-layer patterns of the two or more models having the maximum inner product value, the inner product value with the second-layer pattern of the situation information data is calculated and normalized.
  • the process of calculating the inner product value and the normalization is the same as the process for the pattern of the first layer.
  • In step S207, it is determined whether or not there are two or more models having the maximum inner product value in the second layer.
  • When there is only one such model, the process proceeds to step S208, in which the value of the model having the maximum inner product value in the second layer is recognized as the action of the person, and the process of step S104 ends.
  • When there are two or more such models, the process proceeds to step S209.
  • In step S209, it is determined whether or not, among the two or more models having the maximum inner product value in the second layer, there is a model that does not include an element whose duration is shorter than a predetermined time (a short-time element).
  • When there is no such model, the process proceeds to step S210, in which the value of the previous frame is recognized as the action of the person, and the process of step S104 ends.
  • When there is such a model, the process proceeds to step S211.
  • In step S211, the value of the model that does not include a short-time element is determined as the behavior of the person, and the process of step S104 ends. If there are a plurality of models that do not include short-time elements, the latest model is selected. Note that the predetermined time used as the criterion for determining whether an element is a short-time element can be set as appropriate for each of the plurality of elements representing the situation. The overall flow of steps S201 to S211 is sketched below.
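Putting steps S201 to S211 together, a minimal sketch of the decision flow might look as follows, reusing normalized_fit and the SituationInformationData sketch above; the model fields (first_layer, second_layer, value, has_short_time_element) and the handling of "the latest model" are simplified assumptions.

```python
# Illustrative sketch of the first-layer / second-layer tie-breaking flow
# (steps S201-S211). Model fields and tie-breaking details are assumptions.
def recognize_behavior(situation, models, previous_value: str) -> str:
    # S201-S203: normalized inner product on the first layer for every model.
    fits = [normalized_fit(situation.first_layer, m.first_layer) for m in models]
    best = max(fits)
    candidates = [m for m, f in zip(models, fits) if f == best]
    if len(candidates) == 1:                      # S204 -> S205
        return candidates[0].value
    # S206: tie-break on the second layer (duration pattern).
    fits2 = [normalized_fit(situation.second_layer, m.second_layer) for m in candidates]
    best2 = max(fits2)
    candidates = [m for m, f in zip(candidates, fits2) if f == best2]
    if len(candidates) == 1:                      # S207 -> S208
        return candidates[0].value
    # S209: prefer models that contain no short-time element.
    no_short = [m for m in candidates if not m.has_short_time_element]
    if not no_short:                              # S209 -> S210
        return previous_value
    return no_short[-1].value                     # S211: take the latest such model
```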
  • The information on the behavior of the person recognized by the usage learning unit 400 can be used as information for executing various actions. For example, when the action of a person sitting on a chair and starting to read a book is recognized, an action such as turning on a light can be executed. Alternatively, when the action of the person stopping reading and standing up is recognized, an action such as turning off the light can be executed. Further, the information on the behavior of the person recognized by the usage learning unit 400 may be fed back to the situation learning / identification unit 300 and used for learning of the neural network unit 320.
  • With the behavior recognition device according to the present embodiment, even if a large amount of learning data of a person closing or opening a book is not prepared, the situation can be learned appropriately simply by inputting a comment in that state and performing usage learning. Therefore, for example, a series of actions such as a person sitting down and starting to read a book, then closing the book after a while and stopping reading, can be appropriately recognized with simple learning.
  • FIG. 13 is a schematic view showing a hardware configuration example of the action recognition device according to the present embodiment.
  • the behavior recognition device 1000 can be realized by a hardware configuration similar to that of a general information processing device.
  • the action recognition device 1000 may include a CPU (Central Processing Unit) 500, a main storage unit 502, a communication unit 504, and an input / output interface unit 506.
  • The CPU 500 is a control and arithmetic device that performs the overall control and arithmetic processing of the action recognition device 1000.
  • the main storage unit 502 is a storage unit used for a data work area or a data temporary storage area, and may be configured by a memory such as a RAM (Random Access Memory).
  • the communication unit 504 is an interface for transmitting and receiving data via a network.
  • the input / output interface unit 506 is an interface for connecting to an external output device 510, an input device 512, a storage device 514, and the like to transmit and receive data.
  • the CPU 500, the main storage unit 502, the communication unit 504, and the input / output interface unit 506 are connected to each other by the system bus 508.
  • The storage device 514 may be composed of, for example, a ROM (Read Only Memory), a hard disk device composed of a magnetic disk, a non-volatile memory such as a semiconductor memory, or the like.
  • the main storage unit 502 can be used as a work area for constructing a neural network unit 320 including a plurality of learning cells 46 and executing an operation.
  • the CPU 500 functions as a control unit that controls arithmetic processing in the neural network unit 320 constructed in the main storage unit 502.
  • the storage device 514 can store the learning cell information (situation learning model) including the information about the learned learning cell 46. Further, by reading the learning cell information stored in the storage device 514 and configuring the main storage unit 502 to construct the neural network unit 320, it is possible to construct a learning environment for various situation information data.
  • the storage unit 450 for storing the usage learning model may be configured by the storage device 514. It is desirable that the CPU 500 is configured to execute arithmetic processing in a plurality of learning cells 46 of the neural network unit 320 constructed in the main storage unit 502 in parallel.
  • the communication unit 504 is a communication interface based on standards such as Ethernet (registered trademark) and Wi-Fi (registered trademark), and is a module for communicating with other devices.
  • the learning cell information may be received from another device via the communication unit 504. For example, frequently used learning cell information can be stored in the storage device 514, and less frequently used learning cell information can be configured to be read from another device.
  • the output device 510 includes a display such as a liquid crystal display device.
  • the output device 510 can be used as a display device for presenting the situation information data and the information on the behavior estimated by the situation learning / identification unit 300 to the user at the time of learning the usage learning unit 400. Further, the user can be notified of the learning result and the action decision via the output device 510.
  • the input device 512 is a keyboard, a mouse, a touch panel, or the like, and is used for the user to input predetermined information to the action recognition device 1000, for example, a user episode at the time of learning of the usage learning unit 400.
  • the status information data can also be configured to be read from another device via the communication unit 504.
  • the input device 512 can be used as a means for inputting the situation information data.
  • each part of the action recognition device 1000 can be realized in terms of hardware by mounting circuit components that are hardware components such as LSI (Large Scale Integration) in which a program is incorporated.
  • a program that provides the function can be stored in the storage device 514, loaded into the main storage unit 502, and executed by the CPU 500, so that the program can be realized by software.
  • the configuration of the action recognition device 1000 shown in FIG. 1 does not necessarily have to be configured as one independent device.
  • A part of the image acquisition unit 100, the situation grasping unit 200, the situation learning / identification unit 300, and the usage learning unit 400 (for example, the situation learning / identification unit 300 and the usage learning unit 400) may be arranged on the cloud, and a behavior recognition system may be constructed by these components.
  • FIG. 14 is a schematic view showing a configuration example of the action recognition device according to the present embodiment.
  • the action recognition device 1000 includes a situation information data generation unit 310, an action identification unit 440, and a storage unit 450.
  • the situation information data generation unit 310 has a function of generating situation information data based on the situation of the subject in the image of the subject including a person.
  • the storage unit 450 stores the usage learning model.
  • the behavior identification unit 440 has a function of identifying a person's behavior based on situation information data and a usage learning model.
  • The situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another.
  • The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with each other.
  • the behavior identification unit extracts the model with the highest degree of suitability for the situation information data from among the multiple models of the usage learning model. Then, when the goodness of fit of the extracted model is equal to or higher than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person. When the goodness of fit of the extracted model is less than a predetermined threshold value, the behavior estimated by the situation information data is determined to be the behavior of a person.
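A minimal sketch of this fit-threshold decision, reusing normalized_fit and the SituationInformationData sketch above; the function name, the model fields, the use of the first-layer fit alone for brevity, and the threshold value are assumptions.

```python
# If the best-fitting model reaches the threshold, use its behavior;
# otherwise fall back to the behavior estimated in the situation information data.
def decide_behavior(situation, models, threshold: float = 0.9) -> str:
    scored = [(normalized_fit(situation.first_layer, m.first_layer), m) for m in models]
    best_fit, best_model = max(scored, key=lambda t: t[0])
    return best_model.value if best_fit >= threshold else situation.value
```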
  • A program for operating the configuration of the embodiment so as to realize the functions of the above-described embodiment may be recorded on a recording medium, and the program recorded on the recording medium may be read out as code and executed by a computer.
  • a computer-readable recording medium is also included in the scope of each embodiment.
  • not only the recording medium on which the above-mentioned program is recorded but also the program itself is included in each embodiment.
  • When the degree of conformity of the extracted model is equal to or greater than the predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person.
  • There is thus also provided a behavior recognition method characterized in that, when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This behavior recognition device includes: a data generating unit which generates status information data on the basis of the status of a subject; a storage unit which stores a usage learning model; and a behavior identifying unit which identifies a behavior of a person on the basis of the status information data and the usage learning model. The status information data associate a pattern indicating a relationship between a plurality of elements representing the status, and the degree and duration thereof, with a behavior estimated from the status. The usage learning model includes a plurality of models associating, for a specific status, a pattern indicating the relationship between the plurality of elements and the degree and duration thereof, with a behavior estimated from the specific status. The behavior identifying unit extracts the model having the highest degree of fit to the status information data, from within the usage learning model, and determines the behavior of the person on the basis of the degree of fit of the extracted model.

Description

Behavior recognition device, behavior recognition method, program, and recording medium
 The present invention relates to a behavior recognition device, a behavior recognition method, a program, and a recording medium.
 In recent years, deep learning using a multi-layer neural network has been attracting attention as a machine learning method. Deep learning uses a calculation method called backpropagation to calculate the output error when a large amount of teacher data is input to a multi-layer neural network, and performs learning so that the error is minimized.
 Patent Documents 1 to 3 disclose a neural network processing apparatus that makes it possible to construct a neural network with little labor and a small amount of arithmetic processing by defining a large-scale neural network as a combination of a plurality of subnetworks. Further, Patent Document 4 discloses a structure optimization device that optimizes a neural network.
Patent Document 1: Japanese Unexamined Patent Publication No. 2001-051968; Patent Document 2: Japanese Unexamined Patent Publication No. 2002-251601; Patent Document 3: Japanese Unexamined Patent Publication No. 2003-317073; Patent Document 4: Japanese Unexamined Patent Publication No. H09-091263
 The application of deep learning has also been studied for behavior recognition, that is, for recognizing a person's gestures and behavior. However, deep learning requires a large amount of high-quality teacher data, and learning takes a long time. Patent Documents 1 to 4 propose methods for reducing the labor and the amount of arithmetic processing required to construct a neural network, but in order to further reduce the system load and the like, it has been desired to perform learning and recognition with higher accuracy using a simpler algorithm.
 An object of the present invention is to provide a behavior recognition device, a behavior recognition method, a program, and a recording medium capable of recognizing the behavior of a person appearing in an image with a simple algorithm and with high accuracy.
 According to one aspect of the present invention, there is provided a behavior recognition device including: a situation information data generation unit that generates situation information data based on the situation of a subject, including a person, in an image of the subject; a storage unit that stores a usage learning model; and a behavior identification unit that identifies the behavior of the person based on the situation information data and the usage learning model. The situation information data generation unit generates the situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another. The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another. The behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest degree of conformity to the situation information data; when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person, and when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.
 According to another aspect of the present invention, there is provided a behavior recognition method including: generating, based on the situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another; extracting, from a usage learning model including a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another, the model having the highest degree of conformity to the situation information data; determining, when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model to be the behavior of the person; and determining, when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data to be the behavior of the person.
 According to still another aspect of the present invention, there is provided a program that causes a computer to function as: means for generating, based on the situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another; means for storing a usage learning model including a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another; and means for extracting, from the usage learning model, the model having the highest degree of conformity to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, and determining the behavior estimated from the situation information data to be the behavior of the person when the degree of conformity of the extracted model is less than the predetermined threshold value.
 According to the present invention, the behavior of a person appearing in an image can be recognized with a simpler algorithm and with higher accuracy.
FIG. 1 is a schematic view showing a configuration example of the behavior recognition device according to the first embodiment of the present invention. FIG. 2 is a schematic view showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the first embodiment of the present invention. FIG. 3 is a schematic view showing a configuration example of the neural network unit in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention. FIG. 4 is a schematic view showing a configuration example of a learning cell in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention. FIG. 5 is a schematic view showing a configuration example of the usage learning unit in the behavior recognition device according to the first embodiment of the present invention. FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the first embodiment of the present invention. FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit. FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit. FIG. 9 is a diagram showing an example of situation information data. FIG. 10 is a diagram showing an example of a usage learning model. FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model. FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model. FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the first embodiment of the present invention. FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the second embodiment of the present invention.
 [First Embodiment]
 The schematic configuration of the behavior recognition device according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 5. FIG. 1 is a schematic view showing a configuration example of the behavior recognition device according to the present embodiment. FIG. 2 is a schematic view showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the present embodiment. FIG. 3 is a schematic view showing a configuration example of the neural network unit in the situation learning / identification unit of the behavior recognition device according to the present embodiment. FIG. 4 is a schematic view showing a configuration example of a learning cell in the situation learning / identification unit of the behavior recognition device according to the present embodiment. FIG. 5 is a schematic view showing a configuration example of the usage learning unit in the behavior recognition device according to the present embodiment.
 As shown in FIG. 1, for example, the behavior recognition device 1000 according to the present embodiment may be composed of an image acquisition unit 100, a situation grasping unit 200, a situation learning / identification unit 300, and a usage learning unit 400.
 The image acquisition unit 100 is a functional block having a function of acquiring an image from an external camera or storage device (not shown). The image acquired by the image acquisition unit 100 includes a plurality of images taken of the same subject at different times, for example a moving image. An image suitable for processing in the situation grasping unit 200 can be selected as appropriate, and may include, for example, an RGB image or a depth image.
 The situation grasping unit 200 is a functional block having a function of recognizing the subject (person, object) appearing in each of the images acquired by the image acquisition unit 100 and grasping its situation, using a known image recognition technique, for example an image recognition technique using deep learning. Known devices and methods can be used as appropriate for person recognition and object recognition in the situation grasping unit 200. For example, devices and methods applicable to person recognition include Kinect (registered trademark), Face Grapher, OpenPose, Pose Net, Pose Proposal Networks, and DensePose. Devices and methods applicable to object recognition include SSD (Single Shot Multibox Detector), YOLOv3, and Mask R-CNN.
 The situation grasping unit 200 may also have a function of performing time series analysis of the subject. For short-duration time series analysis of the subject, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like can be applied. For long-duration time series analysis of the subject, for example, Memory Networks can be applied.
 The situation learning / identification unit 300 is a functional block having a function of generating situation information data based on the information received from the situation grasping unit 200. The situation information data is data in which a pattern mapping the information received from the situation grasping unit 200 and an estimation result indicating the behavior of the person estimated from that information are associated with each other. Details of the situation information data will be described later.
 A situation learning model that estimates the behavior of the person from the information received from the situation grasping unit 200 is constructed in the situation learning / identification unit 300. The situation learning / identification unit 300 combines the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
 Here, as an example of the situation learning / identification unit 300, a situation learning / identification unit 300 having a function of performing learning based on the information received from the situation grasping unit 200 and generating the situation learning model will be described with reference to FIG. 2. The situation learning model is not particularly limited as long as it receives the information from the situation grasping unit 200 as input and outputs the estimated behavior of the person, and may be rule-based, for example. In that case, the situation learning / identification unit 300 does not necessarily need to have a function of performing learning based on the information received from the situation grasping unit 200.
 As shown in FIG. 2, for example, the situation learning/identification unit 300 may be composed of a situation information data generation unit 310, a neural network unit 320, a determination unit 330, a learning unit 340, an identification unit 350, and an output unit 360. The learning unit 340 may be composed of a weight correction unit 342 and a learning cell generation unit 344.
 The situation information data generation unit 310 has a function of generating, based on the information received from the situation grasping unit 200, pattern data representing information related to the behavior of the person and the situation of the objects appearing in the image. The situation information data generation unit 310 also has a function of combining the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
 As shown in FIG. 3, for example, the neural network unit 320 may be composed of a two-layer artificial neural network including an input layer and an output layer. The input layer includes at least a number of cells (neurons) 42 corresponding to the number of element values contained in one piece of pattern data. For example, if one piece of pattern data contains M element values, the input layer includes at least M cells 42_1, 42_2, ..., 42_i, ..., 42_M. The output layer includes at least a number of cells (neurons) 44 corresponding to the number of behaviors to be estimated. For example, the output layer includes N cells 44_1, 44_2, ..., 44_j, ..., 44_N corresponding to the number of behaviors to be estimated. Each of the cells 44 constituting the output layer is associated with one of the behaviors to be estimated. When the neural network unit 320 is trained using teacher data, the output layer includes at least a number of cells 44 corresponding to the number of behaviors associated with the teacher data.
 The M element values I_1, I_2, ..., I_i, ..., I_M of the situation information data are input to the cells 42_1, 42_2, ..., 42_i, ..., 42_M of the input layer, respectively. Each of the cells 42_1, 42_2, ..., 42_i, ..., 42_M outputs the element value I input to it to each of the cells 44_1, 44_2, ..., 44_j, ..., 44_N.
 A weighting coefficient ω for applying a predetermined weight to the element value I is set on each of the branches (axons) connecting the cells 42 and the cells 44. For example, weighting coefficients ω_1j, ω_2j, ..., ω_ij, ..., ω_Mj are set on the branches connecting the cells 42_1, 42_2, ..., 42_i, ..., 42_M to the cell 44_j, as shown in FIG. 5, for example. The cell 44_j thereby performs the operation shown in the following equation (1) and outputs the output value O_j.

  O_j = Σ_(i=1…M) ω_ij · I_i   … (1)
 In this specification, one cell 44, the branches (input nodes) that input the element values I_1 to I_M to that cell 44, and the branch (output node) that outputs the output value O from that cell 44 may be collectively referred to as a learning cell 46.
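 As a minimal sketch of how a single learning cell might evaluate equation (1), the following Python fragment accumulates the weighted element values of one pattern. The class and attribute names are assumptions introduced only for illustration; they do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LearningCell:
    """Illustrative learning cell 46: one output cell plus its weighted input nodes."""
    weights: List[float]       # weighting coefficients omega_ij of the input nodes
    behavior: str = "unknown"  # behavior (category) this cell is associated with

    def output(self, elements: List[float]) -> float:
        """Equation (1): O_j = sum_i omega_ij * I_i."""
        return sum(w * x for w, x in zip(self.weights, elements))
```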
 The determination unit 330 compares the correlation value between the plurality of element values of the pattern data and the output value of the learning cell 46 with a predetermined threshold value, and determines whether the correlation value is equal to or greater than the threshold value or less than the threshold value. An example of the correlation value is the likelihood associated with the output value of the learning cell 46. The function of the determination unit 330 may instead be provided in each of the learning cells 46.
 The learning unit 340 is a functional block that trains the neural network unit 320 according to the determination result of the determination unit 330. The weight correction unit 342 updates the weighting coefficients ω set on the input nodes of the learning cell 46 when the correlation value is equal to or greater than the predetermined threshold value. The learning cell generation unit 344 adds a new learning cell 46 to the neural network unit 320 when the correlation value is less than the predetermined threshold value.
 The identification unit 350 identifies the behavior of the person estimated from the pattern data based on the correlation values between the plurality of element values of the pattern data and the output values of the learning cells 46. The output unit 360 outputs the identification result of the identification unit 350.
 Next, the learning method in the situation learning/identification unit 300 will be briefly described.
 First, as an initial state, a number of learning cells 46 corresponding to the number of categories of teacher information associated with the learning target data (the behaviors of a person that the neural network unit 320 is to learn) are set in the neural network unit 320.
 Next, the learning target data is taken into the situation information data generation unit 310. The situation information data generation unit 310 then extracts element values indicating the features of the imported learning target data and generates predetermined pattern data.
 Next, the plurality of element values of the pattern data are input to the neural network unit 320. The element values I_1 to I_M of the pattern data input to the neural network unit 320 are input to the cells 44_1 to 44_N via the cells 42_1 to 42_M. As a result, outputs O_1 to O_N are obtained from the cells 44_1 to 44_N. At this time, since the weighting coefficients ω are set on the input nodes of the learning cells 46, the output values O are calculated based on equation (1).
 Next, based on the output value O of the learning cell 46, the determination unit 330 calculates the correlation value between the element values I_1 to I_M and the output value O of the learning cell 46 (here, the likelihood P associated with the output value of the learning cell). The method of calculating the likelihood P is not particularly limited. For example, the likelihood P_j of the learning cell 46_j can be calculated based on the following equation (2).

  P_j = O_j / Σ_(i=1…M) ω_ij   … (2)
 Equation (2) indicates that the likelihood P_j is expressed as the ratio of the output value O_j of the learning cell 46_j to the cumulative value of the weighting coefficients ω_ij of the plurality of input nodes of the learning cell 46_j. In other words, the likelihood P_j is the ratio of the output value of the learning cell 46_j when the plurality of element values are input to the maximum output of the learning cell 46_j determined by the weighting coefficients ω_ij of the plurality of input nodes.
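 Continuing the illustration, the likelihood of equation (2) could be computed as the ratio of the cell output to the sum of its weighting coefficients, which is the maximum output obtainable when every element value is 1. The function below is a sketch under that assumption, not a definitive implementation.

```python
from typing import List


def likelihood(weights: List[float], elements: List[float]) -> float:
    """Equation (2): P = O / sum(omega), where O = sum(omega_i * I_i).

    The denominator is the largest output the cell can produce for binary
    element values, so P lies in [0, 1] in that case.
    """
    output = sum(w * x for w, x in zip(weights, elements))
    total_weight = sum(weights)
    return output / total_weight if total_weight > 0 else 0.0
```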
 Next, the determination unit 330 compares the calculated value of the likelihood P with a predetermined threshold value and determines whether or not the value of the likelihood P is equal to or greater than the threshold value.
 If, among the learning cells 46 associated with the category of the teacher information of the imported learning target data, one or more learning cells 46 have a likelihood P equal to or greater than the threshold value, the weighting coefficients ω of the input nodes of the learning cell 46 having the largest likelihood P among the learning cells 46 associated with that category are updated. In this way, the information of the learning target data whose likelihood P is equal to or greater than the predetermined threshold value is accumulated in the weighting coefficients ω of the respective input nodes.
 On the other hand, if none of the learning cells 46 associated with the category of the teacher information of the imported learning target data has a likelihood P equal to or greater than the threshold value, a new learning cell 46 associated with that category is generated.
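 The learning step just described (update the best-matching cell of the category if its likelihood reaches the threshold, otherwise create a new cell) could be sketched as follows. The concrete weight-update rule shown here, which moves the weights toward the input pattern, is only an assumed placeholder, since the specification does not give the update formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    weights: List[float]   # omega_ij for the input nodes
    category: str          # behavior associated with this learning cell


def one_pass_learn(cells: Dict[str, List[Cell]], category: str,
                   pattern: List[float], threshold: float,
                   rate: float = 0.1) -> None:
    """One learning step: a single pass per sample, no back-propagation."""
    def likelihood(c: Cell) -> float:
        total = sum(c.weights)
        return sum(w * x for w, x in zip(c.weights, pattern)) / total if total else 0.0

    candidates = [c for c in cells.setdefault(category, []) if likelihood(c) >= threshold]
    if candidates:
        best = max(candidates, key=likelihood)
        # Assumed update rule: accumulate the sample into the weights.
        best.weights = [w + rate * (x - w) for w, x in zip(best.weights, pattern)]
    else:
        # No sufficiently similar cell exists for this category: add a new one.
        cells[category].append(Cell(weights=list(pattern), category=category))
```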
 By repeatedly training the neural network unit 320 in this way, the situation learning model described above can be constructed in the neural network unit 320.
 The above learning method does not rely on the error back-propagation method used in deep learning and the like, and learning is possible in a single pass. The learning process of the neural network unit 320 can therefore be simplified. In addition, since each learning cell 46 is independent, learning data can easily be added, deleted, and updated.
 The learning method and identification method using the above algorithm are described in detail in, for example, International Application No. PCT/JP2018/042781 by the same applicant.
 Next, the identification method in the situation learning/identification unit 300 will be briefly described.
 First, the information received from the situation grasping unit 200 is taken into the situation information data generation unit 310. The situation information data generation unit 310 then extracts element values indicating the features of the imported information and generates predetermined pattern data.
 Next, the element values I_1 to I_M of the pattern data are input to the neural network unit 320 trained as described above. The element values I_1 to I_M input to the neural network unit 320 are input to each learning cell 46 via the cells 42_1 to 42_M. As a result, an output value O corresponding to the element values I_1 to I_M is obtained from every learning cell 46.
 Next, based on the output value O output from each learning cell 46, the identification unit 350 calculates the correlation value between the element values I_1 to I_M and the output value O of the learning cell 46 (here, the likelihood P associated with the output value of the learning cell). The method of calculating the likelihood P is not particularly limited.
 Next, the behavior of the person estimated from the pattern data is identified based on the calculated likelihoods P of all the learning cells 46. The method of identifying the behavior of the person is not particularly limited. For example, the behavior associated with the learning cell 46 having the largest likelihood P among all the learning cells 46 can be identified as the behavior estimated from the pattern data. Alternatively, a predetermined number of learning cells 46 may be extracted from all the learning cells 46 in descending order of likelihood P, and the behavior most frequently associated with the extracted learning cells 46 may be identified as the behavior estimated from the pattern data.
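 A minimal sketch of the first identification variant (picking the behavior tied to the learning cell with the highest likelihood) might look like this; the data layout is an assumption made for illustration.

```python
from typing import List, Tuple


def identify(cells: List[Tuple[str, List[float]]], pattern: List[float]) -> str:
    """Return the behavior tied to the learning cell with the largest likelihood.

    Each entry of `cells` is (behavior, weights); likelihood follows equation (2).
    """
    def likelihood(weights: List[float]) -> float:
        total = sum(weights)
        return sum(w * x for w, x in zip(weights, pattern)) / total if total else 0.0

    behavior, _ = max(cells, key=lambda c: likelihood(c[1]))
    return behavior
```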
 The usage learning unit 400 is a functional block having a function of generating a usage learning model based on the user's evaluation of the situation information data generated by the situation learning/identification unit 300, and of identifying the behavior of the person based on the situation information data and the usage learning model.
 As shown in FIG. 5, for example, the usage learning unit 400 may be composed of a situation information data acquisition unit 410, an evaluation acquisition unit 420, a usage learning model generation unit 430, a behavior identification unit 440, and a storage unit 450.
 The situation information data acquisition unit 410 has a function of acquiring the situation information data generated by the situation information data generation unit 310 from the situation learning/identification unit 300.
 The evaluation acquisition unit 420 has a function of acquiring a user's (advisor's) evaluation of the situation information data. This evaluation includes information that prompts reconsideration of the situation indicated by the situation information data; it is, so to speak, know-how that the user gives to the situation learning model. The user's evaluation of the situation information data can be given, for example, by the user typing a comment on a keyboard while watching the video used for the situation learning. The user's evaluation of the situation information data can also be given at the same time that the situation learning is performed.
 The usage learning model generation unit 430 has a function of generating the usage learning model based on the situation information data and the user's evaluation of the situation information data. The usage learning model may include data in which a pattern obtained by mapping the information received from the situation grasping unit 200 is associated with the behavior of the person according to the user's evaluation. The usage learning model generated by the usage learning model generation unit 430 is stored in the storage unit 450.
 The usage learning model generation unit 430 may also have a function of performing further mapping based on the user's evaluation (comment) of the situation information data and generating a new pattern. In this case, the usage learning model may be data in which a new pattern obtained by mapping the information indicated in the user's comment is associated with the behavior of the person according to the user's evaluation of that pattern.
 For example, suppose that information indicating that a person is "sitting shallowly (weak)" is mapped in the pattern of the situation information data, and that the user considers that a "sitting deeply (strong)" state is also required for the situation at that time. In such a case, the usage learning model generation unit 430 additionally maps information indicating the "sitting deeply (strong)" state onto the pattern of the situation information data based on the comment from the user, and generates a new pattern. The usage learning model generation unit 430 can generate the new pattern, for example, by mapping the information to predetermined coordinates according to words such as "weak", "medium", and "strong" input by the user via a keyboard or the like.
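 For illustration only, mapping a user's comment word onto a predetermined cell coordinate could be as simple as a lookup table like the one below. The row/column assignments are assumptions chosen to show the idea, not the layout defined by the specification.

```python
# Assumed layout: rows = elements (book state, book position, sitting condition),
# columns = level ("weak" -> 0, "medium" -> 1, "strong" -> 2).
LEVEL_COLUMN = {"weak": 0, "medium": 1, "strong": 2}
ELEMENT_ROW = {"book_state": 0, "book_position": 1, "sitting": 2}


def add_comment_to_pattern(pattern, element: str, level: str):
    """Set the cell indicated by a user comment (e.g. 'sitting strong') to 1."""
    row, col = ELEMENT_ROW[element], LEVEL_COLUMN[level]
    pattern[row][col] = 1
    return pattern


# Example: the user asks for a "sitting deeply (strong)" state to be added.
pattern = [[0, 0, 0], [0, 0, 0], [1, 0, 0]]   # currently "sitting shallowly (weak)"
add_comment_to_pattern(pattern, "sitting", "strong")
```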
 The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model generated by the usage learning model generation unit 430.
 Next, a behavior recognition method using the behavior recognition device according to the present embodiment will be described with reference to FIGS. 6 to 12. FIG. 6 is a flowchart showing the behavior recognition method using the behavior recognition device according to the present embodiment. FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the images acquired by the image acquisition unit. FIG. 8 is a diagram showing an example of rules for mapping the information grasped by the situation grasping unit. FIG. 9 is a diagram showing an example of the situation information data. FIG. 10 is a diagram showing an example of the usage learning model. FIG. 11 is a flowchart showing a method of recognizing the behavior of a person based on the situation information data and the usage learning model. FIG. 12 is a diagram illustrating a method of calculating the inner product of a pattern of the situation information data and a pattern of the usage learning model.
 Here, for ease of understanding, the description will be supplemented as appropriate by assuming the recognition of a series of actions in which 1) a person sits in a chair and starts reading a book, 2) the person closes and opens the book while reading it, and 3) after reading for a while, the person closes the book and stops reading. It is assumed that a situation learning model that estimates the behavior of the person from the state of the book, the position of the book, and the state of the person has been constructed in the situation learning/identification unit 300.
 First, the image acquisition unit 100 acquires, from a camera or a storage device, a plurality of images of the same subject taken at different times (step S101). The plurality of images acquired by the image acquisition unit 100 are, for example, the images of the individual frames of a moving image. In this case, it is not always necessary to acquire the images of all the frames, and frames may be thinned out as appropriate. The images to be acquired may be any images suitable for grasping the situation of the subject and can be selected as appropriate. For example, RGB images and depth images acquired by an RGB camera and an infrared camera can be used. The images acquired by the image acquisition unit 100 may be input to the situation grasping unit 200 as they are, or may be temporarily stored in a storage device (not shown).
 Next, the situation grasping unit 200 applies a known image recognition technique, for example an image recognition technique based on deep learning, to each of the images acquired by the image acquisition unit 100 to recognize the persons and objects appearing in the image and to grasp their situation (step S102).
 For example, when a person holding a book and sitting in a chair appears in the image, the situation of the person includes whether the person is sitting shallowly in the chair or sitting deeply in the chair. The situation of the object (the book) includes, for example, whether it is open or closed and whether it is near the person's face.
 Next, the situation learning/identification unit 300 generates situation information data based on the information received from the situation grasping unit 200 (step S103). The generated situation information data includes first-layer pattern data in which the degree of each element indicating the situation of the person or object is mapped in a plurality of levels, and information on the situation (value) estimated as the behavior of the person from the first-layer pattern data. The situation (value) estimated as the behavior of the person is information obtained by applying the first-layer pattern data to the situation learning model. In addition, second-layer pattern data in which the duration of each element indicating the situation of the person or object is mapped in a plurality of levels is attached to the situation information data.
 For example, suppose that "state of the book", "position of the book", and "sitting condition" are used as three elements indicating the situation of the person or object, and that the degree of each element is mapped in three levels. In this case, suppose that, for each of the images of the 18th to 22nd frames, the three elements indicating the situation of the person or object and the situation (value) estimated in that case are obtained as the information shown in FIG. 7.
 In such a case, each piece of information in FIG. 7 can be mapped as pattern data by using, for example, the rules shown in FIG. 8. The rules shown in FIG. 8 are an example in which three levels are provided for each element and mapped onto a 3x3 pattern. For the state of the book in the first layer, for example, three levels can be assumed: "closed", "open", and "an intermediate state (middle)". For the position of the book, for example, three levels can be assumed: "near", "far", and "an intermediate state (middle)". For the sitting condition, for example, three levels can be assumed: "sitting shallowly (weak)", "sitting firmly (strong)", and "an intermediate state (medium)". For the duration in the second layer, three levels can be assumed for each element: "short", "long", and "an intermediate state (medium)".
 FIG. 9 is an example in which the information of frames 18 to 21 shown in FIG. 7 is represented as situation information data according to the rules shown in FIG. 8. The situation information data includes the first-layer and second-layer patterns and the value corresponding to the image of each frame.
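 Under the assumption that each row of the 3x3 pattern corresponds to one element (book state, book position, sitting condition) and each column to one of its three levels, the first-layer pattern of FIG. 9 could be built from the per-frame information roughly as follows. The concrete row/column assignment is an assumption; FIG. 8 defines the actual rule.

```python
from typing import Dict, List

ELEMENTS = ["book_state", "book_position", "sitting"]   # assumed row order
LEVELS = {                                               # assumed column order per element
    "book_state": ["closed", "middle", "open"],
    "book_position": ["near", "middle", "far"],
    "sitting": ["weak", "medium", "strong"],
}


def to_pattern(frame_info: Dict[str, str]) -> List[List[int]]:
    """Map one frame's element levels onto a 3x3 binary pattern (first layer).

    An analogous table of duration levels ("short"/"medium"/"long") would be
    used to build the second-layer pattern.
    """
    pattern = [[0, 0, 0] for _ in ELEMENTS]
    for row, element in enumerate(ELEMENTS):
        col = LEVELS[element].index(frame_info[element])
        pattern[row][col] = 1
    return pattern


# Example frame: book open, near the face, sitting firmly.
first_layer = to_pattern({"book_state": "open", "book_position": "near", "sitting": "strong"})
```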
 Next, the behavior identification unit 440 applies the usage learning model to the situation information data corresponding to the image of each frame generated by the situation learning/identification unit 300 and verifies the estimation result of the situation learning (step S104). Specifically, the patterns of the situation information data are compared with the patterns of the usage learning model, and the usage learning model is searched for a model that closely matches the situation information data.
 Next, the behavior identification unit 440 recognizes the behavior of the person based on the verification result in step S104 (step S105). Specifically, when there is no model in the usage learning model that closely matches the situation information data, the value of the situation information data is recognized as the behavior of the person. On the other hand, when there is a model in the usage learning model that closely matches the situation information data, the value of that model is recognized as the behavior of the person.
 The storage unit 450 stores a usage learning model including a plurality of models, for example model 1 and model 2 shown in FIG. 10. Model 1 prompts reconsideration in the case where the situation learning model judges "sitting but not reading a book" because the book is closed, but the time for which the book has been closed is short, so the behavior should instead be "sitting and reading a book". Model 2 prompts reconsideration in the case where the situation learning model judges "sitting but not reading a book" because the book is half closed, but the time for which the book has been closed is short, so the behavior should instead be "sitting and reading a book".
 The behavior identification unit 440 compares the situation information data corresponding to the image of each frame with each of the models of the usage learning model stored in the storage unit 450, and extracts from the usage learning model the model that best matches the situation information data. Then, according to the degree of match between the situation information data and the extracted model, it determines whether to apply the value of the situation information data or the value of the extracted model.
 The method of determining the degree of match between the situation information data and the usage learning model is not particularly limited; one example is a method using the inner product of a pattern of the situation information data and a pattern of the usage learning model.
 A method of determining the degree of match between the situation information data and the usage learning model using the inner product of a pattern of the situation information data and a pattern of the usage learning model will be described below with reference to FIGS. 11 and 12.
 Here, for simplicity of explanation, it is assumed that the situation information data and the usage learning model each include, as the first-layer and second-layer patterns, nine cells arranged in a 3x3 matrix (see FIGS. 9 and 10). The value of each cell is 0 or 1. The value of the cell corresponding to the level of each element indicating the situation of the person or object is 1, and the values of the other cells are 0. In FIGS. 9 and 10, cells having a value of 1 are painted black.
 First, the inner product of the first-layer pattern of the situation information data and the first-layer pattern of the usage learning model is calculated (step S201). The inner product of a pattern of the situation information data and a pattern of the usage learning model is calculated by multiplying the values of cells at the same coordinates and summing the products over all coordinates. For example, as shown in FIG. 12, suppose that the values of the cells constituting the pattern of the situation information data are A, B, C, D, E, F, G, H, and I, and that the values of the cells constituting the pattern of the usage learning model to be compared are 1, 0, 0, 0, 1, 0, 0, 0, 1. In this case, the inner product of the pattern of the situation information data and the pattern of the usage learning model is A×1 + B×0 + C×0 + D×0 + E×1 + F×0 + G×0 + H×0 + I×1. The inner product calculated in this way is normalized by dividing it by the number of cells having a value of 1 in the situation information data. The calculation and normalization of the inner product for the situation information data are performed for each of the plurality of models included in the usage learning model.
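 The inner product and normalization of step S201 could be expressed as below; this is a direct transcription of the calculation just described, with each pattern represented, for illustration, as a flat list of nine 0/1 cell values.

```python
from typing import List


def normalized_inner_product(situation: List[int], model: List[int]) -> float:
    """Multiply cell values at the same coordinates, sum them, then divide by
    the number of cells set to 1 in the situation information data."""
    dot = sum(s * m for s, m in zip(situation, model))
    ones = sum(situation)
    return dot / ones if ones else 0.0


# Example inspired by FIG. 12: model cells are 1,0,0,0,1,0,0,0,1.
situation = [1, 0, 0, 0, 1, 0, 0, 0, 1]   # cells A..I with three cells set to 1
model = [1, 0, 0, 0, 1, 0, 0, 0, 1]
score = normalized_inner_product(situation, model)   # -> 1.0
```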
 Next, the model with the largest normalized inner product is extracted from the plurality of models of the usage learning model, and it is determined whether or not the inner product of that model is equal to or greater than a predetermined threshold value (step S202). The larger the normalized inner product, the higher the degree of match with the situation information data. The threshold value used for the determination serves as a criterion for judging whether it is appropriate to apply the model to the situation information data, and can be set as appropriate. If it is determined that the largest inner product is less than the threshold value ("No" in step S202), the process proceeds to step S203, the value of the situation information data is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if it is determined that the largest inner product is equal to or greater than the threshold value ("Yes" in step S202), the process proceeds to step S204.
 In step S204, it is determined whether or not there are two or more models having the largest inner product. If only one model has the largest inner product ("No" in step S204), the process proceeds to step S205, the value of the model having the largest first-layer inner product is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there are two or more models having the largest inner product ("Yes" in step S204), the process proceeds to step S206.
 In step S206, for the second-layer pattern of each of the two or more models having the largest first-layer inner product, the inner product with the second-layer pattern of the situation information data is calculated and normalized. The calculation and normalization of the inner product are the same as for the first-layer patterns.
 Next, in step S207, it is determined whether or not there are two or more models having the largest inner product. If only one model has the largest inner product ("No" in step S207), the process proceeds to step S208, the value of the model having the largest second-layer inner product is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there are two or more models having the largest inner product ("Yes" in step S207), the process proceeds to step S209.
 In step S209, it is determined whether or not, among the two or more models having the largest second-layer inner product, there is a model that does not include an element whose duration is shorter than a predetermined time (a short-duration element). If there is no model that does not include a short-duration element ("No" in step S209), the process proceeds to step S210, the value of the previous frame is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there is a model that does not include a short-duration element ("Yes" in step S209), the process proceeds to step S211. In step S211, the value of the model that does not include a short-duration element is determined as the behavior of the person, and the process of step S104 ends. If there are two or more models that do not include a short-duration element, the most recent model is selected. The predetermined time used as the criterion for determining whether an element is a short-duration element can be set as appropriate for each of the plurality of elements representing the situation.
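 Putting the verification steps S202 to S211 together, the selection logic could be sketched as follows. The Model structure, the way short-duration elements are flagged, and the assumption that the model list is ordered oldest to newest are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    layer1: List[int]          # third pattern (degree levels)
    layer2: List[int]          # fourth pattern (duration levels)
    value: str                 # behavior the model suggests
    has_short_element: bool    # True if some element's duration is below its limit


def dot_norm(a: List[int], b: List[int]) -> float:
    ones = sum(a)
    return sum(x * y for x, y in zip(a, b)) / ones if ones else 0.0


def verify(situation1: List[int], situation2: List[int], situation_value: str,
           previous_value: str, models: List[Model], threshold: float) -> str:
    # S201/S202: best first-layer match; fall back to the situation value if weak.
    scores = [dot_norm(situation1, m.layer1) for m in models]
    best = max(scores) if scores else 0.0
    if best < threshold:
        return situation_value                                    # S203
    top = [m for m, s in zip(models, scores) if s == best]
    if len(top) == 1:
        return top[0].value                                       # S205
    # S206/S207: break the tie with the second-layer (duration) patterns.
    scores2 = [dot_norm(situation2, m.layer2) for m in top]
    best2 = max(scores2)
    top2 = [m for m, s in zip(top, scores2) if s == best2]
    if len(top2) == 1:
        return top2[0].value                                      # S208
    # S209-S211: prefer a model without short-duration elements; otherwise keep
    # the behavior recognized in the previous frame.
    long_only = [m for m in top2 if not m.has_short_element]
    return long_only[-1].value if long_only else previous_value   # latest model wins
```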
 The information on the behavior of the person recognized by the usage learning unit 400 can be used as information for executing various actions. For example, when the behavior of a person sitting in a chair and starting to read a book is recognized, an action such as turning on the lighting can be executed. Alternatively, when the behavior of the person stopping reading and standing up is recognized, an action such as turning off the lighting can be executed. The information on the behavior of the person recognized by the usage learning unit 400 may also be fed back to the situation learning/identification unit 300 and used for training the neural network unit 320.
 With existing situation recognition techniques using deep learning, for example, if the system has been trained to judge that a person is reading whenever it recognizes a seated person and a book, it cannot recognize that the person has stopped reading. In addition, when learning is performed frame by frame, if the book is closed and opened within a short time, the system recognizes, for each state, that the person is reading or not reading the book. To improve this, it is necessary to prepare a large amount of learning data of a person closing and opening a book and to perform training on it.
 In contrast, the behavior recognition device according to the present embodiment can learn the situation appropriately simply by performing usage learning with a comment entered in that state, without preparing a large amount of learning data of a person closing and opening a book. Therefore, a series of actions, such as a person sitting down and starting to read a book, closing the book after a while, and stopping reading, can be appropriately recognized with simple learning.
 Next, a hardware configuration example of the behavior recognition device 1000 according to the present embodiment will be described with reference to FIG. 13. FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the present embodiment.
 As shown in FIG. 13, for example, the behavior recognition device 1000 can be realized with a hardware configuration similar to that of a general information processing device. For example, the behavior recognition device 1000 may include a CPU (Central Processing Unit) 500, a main storage unit 502, a communication unit 504, and an input/output interface unit 506.
 The CPU 500 is a control and arithmetic device that performs the overall control and arithmetic processing of the behavior recognition device 1000. The main storage unit 502 is a storage unit used as a work area for data and as a temporary save area for data, and may be configured with a memory such as a RAM (Random Access Memory). The communication unit 504 is an interface for transmitting and receiving data via a network. The input/output interface unit 506 is an interface for connecting to an external output device 510, input device 512, storage device 514, and the like to transmit and receive data. The CPU 500, the main storage unit 502, the communication unit 504, and the input/output interface unit 506 are connected to one another by a system bus 508. The storage device 514 may be configured, for example, as a hard disk device composed of a non-volatile memory such as a ROM (Read Only Memory), a magnetic disk, or a semiconductor memory.
 The main storage unit 502 can be used as a work area for constructing the neural network unit 320 including the plurality of learning cells 46 and for executing its operations. The CPU 500 functions as a control unit that controls the arithmetic processing in the neural network unit 320 constructed in the main storage unit 502. The storage device 514 can store learning cell information (the situation learning model) including information about the trained learning cells 46. By reading the learning cell information stored in the storage device 514 and constructing the neural network unit 320 in the main storage unit 502, a learning environment for various kinds of situation information data can be constructed. The storage unit 450 that stores the usage learning model may also be configured by the storage device 514. The CPU 500 is preferably configured to execute the arithmetic processing of the plurality of learning cells 46 of the neural network unit 320 constructed in the main storage unit 502 in parallel.
 The communication unit 504 is a communication interface based on a standard such as Ethernet (registered trademark) or Wi-Fi (registered trademark), and is a module for communicating with other devices. The learning cell information may be received from another device via the communication unit 504. For example, frequently used learning cell information can be stored in the storage device 514, while infrequently used learning cell information can be read from another device.
 The output device 510 includes a display such as a liquid crystal display device. The output device 510 can be used as a display device for presenting the situation information data and the information on the behavior estimated by the situation learning/identification unit 300 to the user during the learning of the usage learning unit 400. The user can also be notified of learning results and behavior decisions via the output device 510. The input device 512 is a keyboard, a mouse, a touch panel, or the like, and is used by the user to input predetermined information into the behavior recognition device 1000, for example user episodes during the learning of the usage learning unit 400.
 The situation information data may also be read from another device via the communication unit 504. Alternatively, the input device 512 can be used as a means for inputting the situation information data.
 The functions of the respective units of the behavior recognition device 1000 according to the present embodiment can be realized in hardware by mounting circuit components, that is, hardware components such as an LSI (Large Scale Integration) incorporating the program. Alternatively, they can be realized in software by storing a program that provides the functions in the storage device 514, loading the program into the main storage unit 502, and executing it on the CPU 500.
 The configuration of the behavior recognition device 1000 shown in FIG. 1 does not necessarily have to be realized as a single independent device. For example, some of the image acquisition unit 100, the situation grasping unit 200, the situation learning/identification unit 300, and the usage learning unit 400, for example the situation learning/identification unit 300 and the usage learning unit 400, may be arranged on a cloud, and a behavior recognition system may be constructed with them.
 As described above, according to the present embodiment, it is possible to recognize the behavior of a person appearing in an image with a simpler algorithm and with higher accuracy.
 [Second Embodiment]
 A behavior recognition device according to a second embodiment of the present invention will be described with reference to FIG. 14. Components similar to those of the behavior recognition device according to the first embodiment are denoted by the same reference numerals, and their description is omitted or simplified. FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the present embodiment.
 As shown in FIG. 14, the behavior recognition device 1000 according to the present embodiment has a situation information data generation unit 310, a behavior identification unit 440, and a storage unit 450.
 The situation information data generation unit 310 has a function of generating situation information data based on the situation of the subject in an image of a subject including a person. The storage unit 450 stores a usage learning model. The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model.
 The situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degrees, a second pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the situation are associated with one another.
 The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degrees, a fourth pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the specific situation are associated with one another.
 The behavior identification unit extracts, from the plurality of models of the usage learning model, the model that best matches the situation information data. When the degree of match of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person. When the degree of match of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.
 In this way, according to the present embodiment as well, it is possible to recognize the behavior of a person appearing in an image with a simpler algorithm and with higher accuracy.
 [Modified Embodiments]
 The present invention is not limited to the above embodiments, and various modifications are possible.
 For example, an example in which a part of the configuration of any of the embodiments is added to another embodiment, or replaced with a part of the configuration of another embodiment, is also an embodiment of the present invention.
 In the above embodiments, the behavior of a person sitting in a chair and reading a book has been described as an application example of the present invention, but the present invention can be widely applied to the recognition of various behaviors of a person appearing in an image.
 A processing method in which a program that operates the configuration of the above embodiments so as to realize their functions is recorded on a recording medium, the program recorded on the recording medium is read out as code, and the code is executed on a computer is also included in the scope of each embodiment. That is, computer-readable recording media are also included in the scope of each embodiment. In addition to the recording medium on which the above program is recorded, the program itself is also included in each embodiment.
 As the recording medium, for example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a non-volatile memory card, or a ROM can be used. In addition, not only processing executed by the program recorded on the recording medium alone, but also processing that runs on an OS and is executed in cooperation with the functions of other software or an expansion board is included in the scope of each embodiment.
 The above embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention should not be interpreted in a limited manner based on them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.
 Part or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
 (Appendix 1)
 A behavior recognition device comprising:
 a situation information data generation unit that generates situation information data based on the situation of a subject, including a person, in an image of the subject;
 a storage unit that stores a usage learning model; and
 a behavior identification unit that identifies the behavior of the person based on the situation information data and the usage learning model,
 wherein the situation information data generation unit generates the situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degrees, a second pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the situation are associated with one another,
 wherein the usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degrees, a fourth pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the specific situation are associated with one another, and
 wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, the model that best matches the situation information data, determines the behavior estimated by the extracted model to be the behavior of the person when the degree of match of the extracted model is equal to or greater than a predetermined threshold value, and determines the behavior estimated from the situation information data to be the behavior of the person when the degree of match of the extracted model is less than the predetermined threshold value.
 (Appendix 2)
 The behavior recognition device according to Appendix 1, wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, a model including the third pattern having the highest goodness of fit to the first pattern of the situation information data.
 (Appendix 3)
 The behavior recognition device according to Appendix 2, wherein the behavior identification unit determines that the goodness of fit of the third pattern to the first pattern is higher as the inner product value between the element values of the first pattern and the element values of the third pattern is larger.
 (Appendix 4)
 The behavior recognition device according to Appendix 2 or 3, wherein, when a plurality of models include the third pattern having the highest goodness of fit to the first pattern of the situation information data, the behavior identification unit extracts, from among the models including the third pattern having the highest goodness of fit, a model including the fourth pattern having the highest goodness of fit to the second pattern of the situation information data.
 (Appendix 5)
 The behavior recognition device according to Appendix 4, wherein the behavior identification unit determines that the goodness of fit of the fourth pattern to the second pattern is higher as the inner product value between the element values of the second pattern and the element values of the fourth pattern is larger.
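 A non-limiting sketch of the selection procedure of Appendices 2 to 5 follows, using the inner product of element values as the goodness-of-fit score and breaking ties on the first pattern by the second pattern. The exact-equality tie test is a simplification (an implementation might use a tolerance), and the SituationInfo and Model structures are the illustrative ones assumed in the sketch after Appendix 1.

    from typing import List

    def dot(a: List[float], b: List[float]) -> float:
        # Inner product of two element-value vectors, used as the goodness-of-fit score.
        return sum(x * y for x, y in zip(a, b))

    def select_model(info: "SituationInfo", models: List["Model"]) -> "Model":
        # Appendices 2 and 3: keep the models whose third pattern best fits the first pattern.
        best_fit1 = max(dot(info.pattern1, m.pattern3) for m in models)
        candidates = [m for m in models if dot(info.pattern1, m.pattern3) == best_fit1]
        if len(candidates) == 1:
            return candidates[0]
        # Appendices 4 and 5: several models tie on the first pattern, so compare the
        # duration patterns and keep the best fit of the fourth pattern to the second.
        return max(candidates, key=lambda m: dot(info.pattern2, m.pattern4))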
 (Appendix 6)
 The behavior recognition device according to Appendix 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and the plurality of models including the fourth pattern having the highest goodness of fit include a model containing an element whose duration is shorter than a predetermined time, the behavior identification unit extracts, from among the models including the fourth pattern having the highest goodness of fit, a model containing no element whose duration is shorter than the predetermined time.
 (Appendix 7)
 The behavior recognition device according to Appendix 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and all of the plurality of models including the fourth pattern having the highest goodness of fit contain an element whose duration is shorter than a predetermined time, the behavior identification unit determines the behavior applied in the previous frame to be the behavior in the current frame.
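 A non-limiting sketch of the duration handling of Appendices 6 and 7: among candidate models tied on the fourth-pattern fit, prefer one whose contained elements all last at least a predetermined time; if every candidate contains a too-short element, reuse the behavior applied in the previous frame. Treating a duration of zero as "element not contained in the model" is an assumption made only for this illustration.

    from typing import List

    def resolve_duration_tie(candidates: List["Model"], min_duration: float,
                             prev_behavior: str) -> str:
        # A duration of 0 is taken to mean the element is not contained in the model.
        def has_short_element(m: "Model") -> bool:
            return any(0 < d < min_duration for d in m.pattern4)

        long_enough = [m for m in candidates if not has_short_element(m)]
        if long_enough:
            # Appendix 6: extract a model with no element shorter than the predetermined time.
            return long_enough[0].behavior
        # Appendix 7: every candidate contains a too-short element, so keep the
        # behavior applied in the previous frame.
        return prev_behavior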
 (Appendix 8)
 The behavior recognition device according to any one of Appendices 1 to 7, wherein the information on the behavior estimated by each of the plurality of models is information given by a user as an evaluation corresponding to the specific situation.
 (Appendix 9)
 The behavior recognition device according to any one of Appendices 1 to 8, wherein the image is a moving image including images of a plurality of frames, and the situation information data generation unit generates the situation information data for each of the images of the plurality of frames.
 (Appendix 10)
 The behavior recognition device according to any one of Appendices 1 to 9, further comprising a situation learning unit that learns, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
 wherein the situation learning unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as learning target data, and a learning unit that trains the neural network unit,
 the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values, and
 the learning unit updates the weighting coefficients of the plurality of input nodes of a learning cell or adds a new learning cell to the neural network unit in accordance with the output value of the learning cell.
 (Appendix 11)
 The behavior recognition device according to Appendix 10, wherein the learning unit updates the weighting coefficients of the plurality of input nodes of the learning cell when a correlation value between the plurality of element values and the output value of the learning cell is equal to or greater than a predetermined threshold.
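 A non-limiting sketch of the learning step of Appendices 10 and 11: each learning cell is represented here simply by its input-node weight vector, its output node emits the weighted sum of the element values, and a cosine-style correlation between the input and the weights stands in for the correlation used in the embodiments. The update rate of 0.1 and all names are assumptions for illustration only.

    from typing import List
    import numpy as np

    def learning_step(cells: List[np.ndarray], x: np.ndarray,
                      corr_threshold: float) -> List[np.ndarray]:
        # x holds the element values of the situation; each cell holds its input-node weights.
        if not cells:
            cells.append(x.copy())          # the first sample becomes the first learning cell
            return cells
        corrs = [float(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12))
                 for w in cells]
        best = int(np.argmax(corrs))
        if corrs[best] >= corr_threshold:
            # Correlation at or above the threshold: update the weighting coefficients of that cell.
            cells[best] = cells[best] + 0.1 * (x - cells[best])
        else:
            # Otherwise add a new learning cell to the neural network unit.
            cells.append(x.copy())
        return cells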
 (Appendix 12)
 The behavior recognition device according to any one of Appendices 1 to 9, further comprising a situation identification unit that identifies, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
 wherein the situation identification unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as identification target data, and an identification unit that identifies the identification target data based on an output of the neural network unit,
 the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values,
 each of the plurality of learning cells is associated with one of a plurality of categories indicating teacher information,
 the plurality of input nodes of a learning cell are configured such that each of the plurality of element values is input with a predetermined weight corresponding to the associated category,
 the identification unit estimates, based on the output value of the learning cell and the category associated with the learning cell, the category to which the identification target data belongs as the behavior of the person estimated from the situation, and
 the situation information data generation unit generates the situation information data based on a result estimated by the situation identification unit.
 (Appendix 13)
 The behavior recognition device according to Appendix 12, wherein the identification unit estimates the category associated with the learning cell having the largest correlation value between the plurality of element values and the output value of the learning cell as the behavior of the person estimated from the situation.
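 A non-limiting sketch of the identification step of Appendices 12 and 13: every learning cell is tied to a teacher category, and the category of the cell whose weights correlate most strongly with the input element values is returned as the estimated behavior. The correlation measure and all names are illustrative assumptions, reusing the cell representation assumed in the sketch after Appendix 11.

    from typing import List
    import numpy as np

    def identify_category(cells: List[np.ndarray], categories: List[str],
                          x: np.ndarray) -> str:
        # categories[i] is the teacher category associated with cells[i].
        corrs = [float(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12))
                 for w in cells]
        return categories[int(np.argmax(corrs))]   # behavior estimated from the situation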
 (Appendix 14)
 A behavior recognition method comprising:
 generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
 extracting, from a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, the model having the highest goodness of fit to the situation information data;
 determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold; and
 determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
 (Appendix 15)
 A program causing a computer to function as:
 means for generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
 means for storing a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another; and
 means for extracting, from the usage learning model, the model having the highest goodness of fit to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
 (Appendix 16)
 A computer-readable recording medium on which the program according to Appendix 15 is recorded.
 This application claims priority based on Japanese Patent Application No. 2020-005536 filed on January 17, 2020, the entire disclosure of which is incorporated herein by reference.
42, 44 ... cell
46 ... learning cell
100 ... image acquisition unit
200 ... situation grasping unit
300 ... situation learning/identification unit
310 ... situation information data generation unit
320 ... neural network unit
330 ... judgment unit
340 ... learning unit
342 ... weight correction unit
344 ... learning cell generation unit
350 ... identification unit
360 ... output unit
400 ... usage learning unit
410 ... situation information data acquisition unit
420 ... evaluation acquisition unit
430 ... usage learning model generation unit
440 ... behavior identification unit
450 ... storage unit
500 ... CPU
502 ... main storage unit
504 ... communication unit
506 ... input/output interface unit
508 ... system bus
510 ... output device
512 ... input device
514 ... storage device

Claims (16)

  1. A behavior recognition device comprising:
     a situation information data generation unit that generates situation information data based on a situation of a subject, including a person, in an image of the subject;
     a storage unit that stores a usage learning model; and
     a behavior identification unit that identifies a behavior of the person based on the situation information data and the usage learning model,
     wherein the situation information data generation unit generates the situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another,
     the usage learning model includes a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, and
     the behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest goodness of fit to the situation information data, determines the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determines the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  2. The behavior recognition device according to claim 1, wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, a model including the third pattern having the highest goodness of fit to the first pattern of the situation information data.
  3. The behavior recognition device according to claim 2, wherein the behavior identification unit determines that the goodness of fit of the third pattern to the first pattern is higher as the inner product value between the element values of the first pattern and the element values of the third pattern is larger.
  4. The behavior recognition device according to claim 2 or 3, wherein, when a plurality of models include the third pattern having the highest goodness of fit to the first pattern of the situation information data, the behavior identification unit extracts, from among the models including the third pattern having the highest goodness of fit, a model including the fourth pattern having the highest goodness of fit to the second pattern of the situation information data.
  5. The behavior recognition device according to claim 4, wherein the behavior identification unit determines that the goodness of fit of the fourth pattern to the second pattern is higher as the inner product value between the element values of the second pattern and the element values of the fourth pattern is larger.
  6. The behavior recognition device according to claim 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and the plurality of models including the fourth pattern having the highest goodness of fit include a model containing an element whose duration is shorter than a predetermined time, the behavior identification unit extracts, from among the models including the fourth pattern having the highest goodness of fit, a model containing no element whose duration is shorter than the predetermined time.
  7. The behavior recognition device according to claim 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and all of the plurality of models including the fourth pattern having the highest goodness of fit contain an element whose duration is shorter than a predetermined time, the behavior identification unit determines the behavior applied in the previous frame to be the behavior in the current frame.
  8. The behavior recognition device according to any one of claims 1 to 7, wherein the information on the behavior estimated by each of the plurality of models is information given by a user as an evaluation corresponding to the specific situation.
  9. The behavior recognition device according to any one of claims 1 to 8, wherein the image is a moving image including images of a plurality of frames, and the situation information data generation unit generates the situation information data for each of the images of the plurality of frames.
  10. The behavior recognition device according to any one of claims 1 to 9, further comprising a situation learning unit that learns, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
     wherein the situation learning unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as learning target data, and a learning unit that trains the neural network unit,
     the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values, and
     the learning unit updates the weighting coefficients of the plurality of input nodes of a learning cell or adds a new learning cell to the neural network unit in accordance with the output value of the learning cell.
  11. The behavior recognition device according to claim 10, wherein the learning unit updates the weighting coefficients of the plurality of input nodes of the learning cell when a correlation value between the plurality of element values and the output value of the learning cell is equal to or greater than a predetermined threshold.
  12. The behavior recognition device according to any one of claims 1 to 9, further comprising a situation identification unit that identifies, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
     wherein the situation identification unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as identification target data, and an identification unit that identifies the identification target data based on an output of the neural network unit,
     the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values,
     each of the plurality of learning cells is associated with one of a plurality of categories indicating teacher information,
     the plurality of input nodes of a learning cell are configured such that each of the plurality of element values is input with a predetermined weight corresponding to the associated category,
     the identification unit estimates, based on the output value of the learning cell and the category associated with the learning cell, the category to which the identification target data belongs as the behavior of the person estimated from the situation, and
     the situation information data generation unit generates the situation information data based on a result estimated by the situation identification unit.
  13. The behavior recognition device according to claim 12, wherein the identification unit estimates the category associated with the learning cell having the largest correlation value between the plurality of element values and the output value of the learning cell as the behavior of the person estimated from the situation.
  14. A behavior recognition method comprising:
     generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
     extracting, from a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, the model having the highest goodness of fit to the situation information data;
     determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold; and
     determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  15. A program causing a computer to function as:
     means for generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
     means for storing a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another; and
     means for extracting, from the usage learning model, the model having the highest goodness of fit to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  16. A computer-readable recording medium on which the program according to claim 15 is recorded.
PCT/JP2020/048361 2020-01-17 2020-12-24 Behavior recognition device, behavior recognition method, program, and recording medium WO2021145185A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021571127A JP7231286B2 (en) 2020-01-17 2020-12-24 Action recognition device, action recognition method, program and recording medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-005536 2020-01-17
JP2020005536 2020-01-17

Publications (1)

Publication Number Publication Date
WO2021145185A1 true WO2021145185A1 (en) 2021-07-22

Family

ID=76863684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/048361 WO2021145185A1 (en) 2020-01-17 2020-12-24 Behavior recognition device, behavior recognition method, program, and recording medium

Country Status (2)

Country Link
JP (1) JP7231286B2 (en)
WO (1) WO2021145185A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019128804A (en) * 2018-01-24 2019-08-01 株式会社日立製作所 Identification system and identification method
WO2019240047A1 (en) * 2018-06-11 2019-12-19 Necソリューションイノベータ株式会社 Behavior learning device, behavior learning method, behavior learning system, program, and recording medium

Also Published As

Publication number Publication date
JP7231286B2 (en) 2023-03-01
JPWO2021145185A1 (en) 2021-07-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20914650; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021571127; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20914650; Country of ref document: EP; Kind code of ref document: A1)