WO2021145185A1 - Behavior recognition device, behavior recognition method, program, and recording medium - Google Patents

Behavior recognition device, behavior recognition method, program, and recording medium Download PDF

Info

Publication number
WO2021145185A1
Authority
WO
WIPO (PCT)
Prior art keywords
situation
behavior
pattern
learning
information data
Application number
PCT/JP2020/048361
Other languages
French (fr)
Japanese (ja)
Inventor
黒田 大介
高橋 一徳
由仁 宮内
Original Assignee
Necソリューションイノベータ株式会社
Application filed by Necソリューションイノベータ株式会社 filed Critical Necソリューションイノベータ株式会社
Priority to JP2021571127A priority Critical patent/JP7231286B2/en
Publication of WO2021145185A1 publication Critical patent/WO2021145185A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion

Definitions

  • the present invention relates to a behavior recognition device, a behavior recognition method, a program, and a recording medium.
  • Deep learning uses a calculation method called back propagation to calculate the output error when a large amount of teacher data is input to a multi-layer neural network, and learns so that the error is minimized.
  • Patent Documents 1 to 3 disclose a neural network processing apparatus that makes it possible to construct a neural network with little labor and a small amount of arithmetic processing by defining a large-scale neural network as a combination of a plurality of subnetworks. Further, Patent Document 4 discloses a structure optimization device that optimizes a neural network.
  • Patent Document 1: Japanese Unexamined Patent Publication No. 2001-051968; Patent Document 2: Japanese Unexamined Patent Publication No. 2002-251601; Patent Document 3: Japanese Unexamined Patent Publication No. 2003-317073; Patent Document 4: Japanese Unexamined Patent Publication No. H09-091263
  • According to one aspect of the invention, the situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another.
  • The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another.
  • The behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest degree of conformity to the situation information data; when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person, and when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person. A behavior recognition device, a behavior recognition method, and a program that cause a computer to function as means for performing these operations are thereby provided.
  • FIG. 1 is a schematic view showing a configuration example of an action recognition device according to the first embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing a configuration example of a situation learning / identification unit in the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 3 is a schematic view showing a configuration example of a neural network unit in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 4 is a schematic diagram showing a configuration example of a learning cell in the situation learning / identifying unit of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 5 is a schematic view showing a configuration example of a usage learning unit in the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit.
  • FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit.
  • FIG. 9 is a diagram showing an example of situation information data.
  • FIG. 10 is a diagram showing an example of a usage learning model.
  • FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model.
  • FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model.
  • FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the first embodiment of the present invention.
  • FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the second embodiment of the present invention.
  • FIG. 1 is a schematic view showing a configuration example of an action recognition device according to the present embodiment.
  • FIG. 2 is a schematic diagram showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the present embodiment.
  • FIG. 3 is a schematic diagram showing a configuration example of a neural network unit in the situation learning / identification unit of the behavior recognition device according to the present embodiment.
  • FIG. 4 is a schematic diagram showing a configuration example of a learning cell in the situation learning / identifying unit of the behavior recognition device according to the present embodiment.
  • FIG. 5 is a schematic view showing a configuration example of a usage learning unit in the behavior recognition device according to the present embodiment.
  • the image acquisition unit 100 is a functional block having a function of acquiring an image from an external camera or storage device (not shown).
  • the image acquired by the image acquisition unit 100 includes a plurality of images taken with respect to the same subject at different times, and is, for example, a moving image.
  • An image suitable for processing in the situation grasping unit 200 can be appropriately selected as the image, and may include, for example, an RGB image or a depth image.
  • the situation grasping unit 200 may have a function of performing time series analysis of the subject.
  • For short-duration time series analysis of the subject, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like can be applied.
  • For long-duration time series analysis of the subject, for example, Memory Networks can be applied.
  • the situation learning / identification unit 300 is a functional block having a function of generating situation information data based on the information received from the situation grasping unit 200.
  • The situation information data is data in which a pattern that maps the information received from the situation grasping unit 200 and an estimation result indicating the behavior of the person estimated from that information are associated with each other. Details of the situation information data will be described later.
  • a situation learning model that estimates the behavior of a person from the information received from the situation grasping unit 200 is constructed.
  • the situation learning / identification unit 300 combines the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
  • Here, as an example of the situation learning / identification unit 300, a situation learning / identification unit 300 having a function of performing learning based on the information received from the situation grasping unit 200 and generating the situation learning model will be described with reference to FIG. 2.
  • The situation learning model is not particularly limited as long as it receives the information from the situation grasping unit 200 as input and outputs the estimated behavior of the person; it may be rule-based, for example (a minimal sketch of such a rule base is given below). In that case, the situation learning / identification unit 300 does not necessarily need a function of performing learning based on the information received from the situation grasping unit 200.
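As a concrete illustration of such a rule-based alternative, the following is a minimal sketch and not the patent's own rules; the element names follow the book-reading example used later in this description, while the specific conditions and the function name are assumptions for illustration.

```python
# Minimal sketch of a rule-based situation learning model (illustrative only).
# The element names follow the book-reading example used later in this
# description; the specific rules are assumptions, not the patent's own rules.

def estimate_behavior(book_state: str, book_position: str, sitting: str) -> str:
    """Map grasped situation elements to an estimated behavior (value)."""
    if sitting == "not sitting":
        return "standing"
    if book_state == "open" and book_position == "near face":
        return "sitting and reading the book"
    return "sitting but not reading the book"

# Example: estimate_behavior("open", "near face", "sitting deeply")
# -> "sitting and reading the book"
```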
  • the situation learning / identification unit 300 includes a situation information data generation unit 310, a neural network unit 320, a determination unit 330, a learning unit 340, an identification unit 350, and an output unit 360.
  • the learning unit 340 may be composed of a weight correction unit 342 and a learning cell generation unit 344.
  • the situation information data generation unit 310 has a function of generating pattern data representing information related to the behavior of a person or the situation of an object in an image based on the information received from the situation grasping unit 200. Further, the situation information data generation unit 310 has a function of combining the information received from the situation grasping unit 200 and the information output from the situation learning model to generate the situation information data.
  • A weighting coefficient ω for giving a predetermined weighting to the element value I is set on each of the branches (axons) connecting the cells 42 and the cells 44.
  • For example, weighting coefficients ω_1j, ω_2j, ..., ω_ij, ..., ω_Mj are set on the branches connecting the cells 42_1, 42_2, ..., 42_i, ..., 42_M to the cell 44_j, as shown in FIG. 5.
  • The cell 44_j then performs the operation shown in equation (1), O_j = Σ_i (ω_ij × I_i) for i = 1 to M, and outputs the output value O_j.
  • A single cell 44, the branches (input nodes) that input the element values I_1 to I_M to the cell 44, and the branch (output node) that outputs the output value O from the cell 44 may be collectively referred to as a learning cell 46.
  • The determination unit 330 compares the correlation value between the plurality of element values of the pattern data and the output value of the learning cell 46 with a predetermined threshold value, and determines whether the correlation value is equal to or greater than, or less than, the threshold value.
  • An example of the correlation value is the likelihood regarding the output value of the learning cell 46.
  • the function of the determination unit 330 may be provided in each of the learning cells 46.
  • the learning target data is taken into the situation information data generation unit 310.
  • the situation information data generation unit 310 extracts element values indicating the characteristics of the captured learning target data, and generates predetermined pattern data.
  • The element values I_1 to I_M of the pattern data input to the neural network unit 320 are input to the cells 44_1 to 44_N via the cells 42_1 to 42_M.
  • As a result, outputs O_1 to O_N can be obtained from the cells 44_1 to 44_N.
  • Each output value O is calculated based on equation (1).
  • Next, the correlation value between the element values I_1 to I_M and the output value O of each learning cell 46 (here, the likelihood P regarding the output value of the learning cell) is calculated.
  • the method for calculating the likelihood P is not particularly limited.
  • For example, the likelihood P_j of the learning cell 46_j can be calculated based on equation (2): P_j = O_j / Σ_i ω_ij, where the sum is taken over the plurality of input nodes i = 1 to M.
  • Equation (2) shows that the likelihood P_j is expressed as the ratio of the output value O_j of the learning cell 46_j to the cumulative value of the weighting coefficients ω_ij of the plurality of input nodes of the learning cell 46_j.
  • In other words, the likelihood P_j represents the ratio of the output value of the learning cell 46_j when the plurality of element values are input to the maximum value of the output of the learning cell 46_j determined by the weighting coefficients ω_ij of the plurality of input nodes.
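To make equations (1) and (2) concrete, the following is a minimal sketch of the learning-cell computation described above; the class and variable names are assumptions for illustration, and the element values are taken to lie in [0, 1].

```python
# Minimal sketch of a learning cell: weighted-sum output (equation (1))
# and likelihood as the ratio to the maximum possible output (equation (2)).
# Names and the [0, 1] range of element values are illustrative assumptions.
from typing import List

class LearningCell:
    def __init__(self, weights: List[float], behavior: str):
        self.weights = weights      # omega_ij for each input node
        self.behavior = behavior    # behavior (category) linked to this cell

    def output(self, elements: List[float]) -> float:
        # Equation (1): O_j = sum_i omega_ij * I_i
        return sum(w * x for w, x in zip(self.weights, elements))

    def likelihood(self, elements: List[float]) -> float:
        # Equation (2): P_j = O_j / sum_i omega_ij
        total_weight = sum(self.weights)
        return self.output(elements) / total_weight if total_weight > 0 else 0.0
```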
  • Then, the weighting coefficients ω of the input nodes of the learning cell 46 having the largest likelihood P among the learning cells 46 are updated. In this way, the information of learning target data whose likelihood P is equal to or greater than a predetermined threshold value is accumulated in the weighting coefficients ω of the respective input nodes.
  • On the other hand, when the likelihood P is less than the predetermined threshold value, a new learning cell 46 linked to the corresponding category is generated, as sketched below.
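A minimal sketch of this learning step, reusing the LearningCell sketch above; the concrete weight-update rule (blending each weight toward the corresponding element value with a learning rate) and the helper names are assumptions, since the text here only states that the weights of the best-matching cell are updated or a new cell is generated.

```python
# Illustrative sketch of one learning step: update the best-matching cell's
# weights when its likelihood reaches the threshold, otherwise add a new cell.
# The concrete weight-update rule (learning_rate blend) is an assumption.
def learn_one_pattern(cells, elements, behavior, threshold=0.8, learning_rate=0.1):
    best = max(cells, key=lambda c: c.likelihood(elements), default=None)
    if best is not None and best.likelihood(elements) >= threshold:
        # Accumulate the pattern information into the input-node weights.
        best.weights = [w + learning_rate * (x - w)
                        for w, x in zip(best.weights, elements)]
    else:
        # Generate a new learning cell linked to the behavior (category).
        cells.append(LearningCell(weights=list(elements), behavior=behavior))
    return cells
```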
  • the above-mentioned situation learning model can be constructed in the neural network unit 320.
  • the situation information data generation unit 310 extracts element values indicating the characteristics of the captured information and generates predetermined pattern data.
  • Then, the element values I_1 to I_M of the pattern data are input to the neural network unit 320 that has performed the learning described above.
  • Each learning cell 46 then outputs an output value O in accordance with the element values I_1 to I_M.
  • Next, the identification unit 350 calculates the correlation value between the element values I_1 to I_M and the output value O of each learning cell 46 (here, the likelihood P regarding the output value of the learning cell).
  • the method for calculating the likelihood P is not particularly limited.
  • the behavior of the person estimated from the pattern data is identified based on the calculated likelihood P of all the learning cells 46.
  • The method of identifying the behavior of the person is not particularly limited. For example, the behavior associated with the learning cell 46 having the highest likelihood P among all the learning cells 46 can be identified as the behavior estimated from the pattern data. Alternatively, a predetermined number of learning cells 46 may be extracted from all the learning cells 46 in descending order of likelihood P, and the behavior most frequently associated with the extracted learning cells 46 can be identified as the behavior estimated from the pattern data. Both strategies are sketched below.
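A minimal sketch of the two identification strategies just described, reusing the LearningCell sketch above; the function names and the top-k default are assumptions.

```python
# Illustrative identification: (a) behavior of the single most likely cell,
# (b) majority vote over the k most likely cells.
from collections import Counter

def identify_by_max_likelihood(cells, elements):
    # Behavior of the cell with the highest likelihood P.
    best = max(cells, key=lambda c: c.likelihood(elements))
    return best.behavior

def identify_by_top_k_vote(cells, elements, k=5):
    # Behavior most frequently associated with the k most likely cells.
    top = sorted(cells, key=lambda c: c.likelihood(elements), reverse=True)[:k]
    return Counter(c.behavior for c in top).most_common(1)[0][0]
```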
  • the situation information data acquisition unit 410 has a function of acquiring the situation information data generated by the situation information data generation unit 310 from the situation learning / identification unit 300.
  • In the usage learning model generation unit 430, suppose that information indicating a state in which the person is "sitting shallowly (weak)" is mapped in a pattern of the situation information data, and the user considers that a "sitting deeply (strong)" state should also be associated with the situation at that time. In such a case, the usage learning model generation unit 430 additionally maps information indicating the "sitting deeply (strong)" state onto the pattern of the situation information data based on the comment from the user, and generates a new pattern. The usage learning model generation unit 430 can generate the new pattern by mapping the information to predetermined coordinates according to words such as "weak", "medium", and "strong" input by the user via a keyboard or the like, as sketched below.
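The mapping from words such as "weak", "medium", and "strong" to pattern coordinates could be sketched as follows; the coordinate layout (one column per element, one row per level in a 3 × 3 grid) is modeled on the example of FIGS. 8 and 9, while the dictionaries and function name are assumptions.

```python
# Illustrative mapping of a user's word ("weak"/"medium"/"strong") to a cell
# in the 3 x 3 pattern. The row-per-level and element-per-column assignment
# is an assumption modeled on the rules of FIG. 8.
LEVEL_ROW = {"weak": 0, "medium": 1, "strong": 2}
ELEMENT_COL = {"book state": 0, "book position": 1, "sitting condition": 2}

def add_user_comment(pattern, element: str, level: str):
    """Set the cell for (element, level) to 1 in a 3x3 pattern (list of lists)."""
    row = LEVEL_ROW[level]
    col = ELEMENT_COL[element]
    pattern[row][col] = 1
    return pattern
```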
  • The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model.
  • FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the present embodiment.
  • FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit.
  • FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit.
  • FIG. 9 is a diagram showing an example of situation information data.
  • FIG. 10 is a diagram showing an example of a usage learning model.
  • FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model.
  • FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model.
  • the image acquisition unit 100 acquires a plurality of images of the same subject taken at different times from a camera or a storage device (step S101).
  • the plurality of images acquired by the image acquisition unit 100 are, for example, images of each frame of a moving image. In this case, it is not always necessary to acquire the images of all the frames, and the images may be thinned out as appropriate.
  • the image to be acquired may be any image suitable for grasping the situation of the subject, and can be appropriately selected. For example, RGB images and depth images acquired by an RGB camera and an infrared camera can be applied.
  • the image acquired by the image acquisition unit 100 may be input to the situation grasping unit 200 as it is, or may be temporarily stored in a storage device (not shown).
  • The situation grasping unit 200 recognizes the person or object appearing in each of the images acquired by the image acquisition unit 100 by using a known image recognition technique, for example an image recognition technique using deep learning, and grasps the situation (step S102).
  • The situation of the person may be, for example, whether the person is sitting shallowly or deeply on the chair.
  • Examples of the situation of the object include whether it is open or closed, and whether it is near the face of the person.
  • the situation learning / identification unit 300 generates situation information data based on the information received from the situation grasping unit 200 (step S103).
  • The generated situation information data includes pattern data of the first layer, in which the degree of each element indicating the situation of the person or object is mapped in a plurality of stages, and information on the situation (value) estimated as the behavior of the person from the pattern data of the first layer.
  • The situation (value) estimated as the behavior of the person is information acquired by applying the pattern data of the first layer to the situation learning model.
  • The situation information data further includes pattern data of the second layer, in which the duration of each element indicating the situation of the person or object is mapped in a plurality of stages.
  • “book state”, “book position”, and “sitting condition” are used as three elements indicating the situation of a person or an object, and the degree of each element is mapped in three stages.
  • It is assumed that information as shown in FIG. 7 has been obtained as the three elements indicating the situation of the person or object and the situation (value) estimated in each case.
  • FIG. 9 is an example in which the information of frames 18 to 21 shown in FIG. 7 is represented as situation information data according to the rules shown in FIG.
  • The situation information data includes the patterns of the first layer and the second layer and the value, corresponding to the image of each frame. A minimal sketch of this data structure is given below.
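As a concrete picture of the data described above, the following is a minimal sketch of the situation information data for one frame, with a first-layer (degree) pattern, a second-layer (duration) pattern, and the estimated value; the field and class names are assumptions, while the 3 × 3 binary layout follows FIG. 9.

```python
# Illustrative structure of situation information data for one frame.
# Field names are assumptions; the 3 x 3 binary patterns follow FIG. 9.
from dataclasses import dataclass
from typing import List

Pattern = List[List[int]]  # 3 x 3 matrix of 0/1 cells

@dataclass
class SituationInformationData:
    first_layer: Pattern   # degree of each element, mapped in stages
    second_layer: Pattern  # duration of each element, mapped in stages
    value: str             # behavior estimated by the situation learning model

example = SituationInformationData(
    first_layer=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    second_layer=[[0, 1, 0], [0, 1, 0], [0, 0, 1]],
    value="sitting but not reading the book",
)
```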
  • The behavior identification unit 440 applies the usage learning model to the situation information data corresponding to the image of each frame generated by the situation learning / identification unit 300, and verifies the estimation result of the situation learning (step S104). Specifically, the pattern of the situation information data is compared with the patterns of the usage learning model, and the usage learning model is searched for a model that is highly compatible with the situation information data.
  • The behavior identification unit 440 then recognizes the behavior of the person based on the verification result of step S104 (step S105). Specifically, when the usage learning model contains no model that is highly compatible with the situation information data, the value of the situation information data is recognized as the behavior of the person. On the other hand, when the usage learning model contains a model that is highly compatible with the situation information data, the value of that model is recognized as the behavior of the person.
  • the storage unit 450 stores a usage learning model including a plurality of models as shown as model 1 and model 2 in FIG. 10, for example.
  • In model 1, since the book is closed, the situation learning model judges that the person "is sitting but not reading the book"; however, because the time for which the book has been closed is short, the model encourages a reconsideration to "sitting and reading the book".
  • In model 2, since the book is half closed, the situation learning model likewise judges that the person "is sitting but not reading the book"; however, because the time for which the book has been half closed is short, the model encourages a reconsideration to "sitting and reading the book".
  • The action identification unit 440 compares the situation information data corresponding to the image of each frame with each of the models of the usage learning model stored in the storage unit 450, and extracts the model most suitable for the situation information data from the usage learning model. It then determines whether to apply the value of the situation information data or the value of the extracted model according to the degree of conformity between the situation information data and the extracted model.
  • the method of determining the suitability of the situation information data and the usage learning model is not particularly limited, but for example, a method of using the inner product value of the pattern of the situation information data and the pattern of the usage learning model can be mentioned.
  • In the present embodiment, the situation information data and the usage learning model each include nine cells arranged in a 3 × 3 matrix as the patterns of the first layer and the second layer (see FIG. 9 and FIG. 10).
  • the value of each cell is 0 or 1.
  • the value of the cell corresponding to the level of each element indicating the situation of the person or the object is 1, and the value of the other cells is 0.
  • cells having a value of 1 are painted black.
  • the inner product value of the pattern of the first layer of the situation information data and the pattern of the first layer of the usage learning model is calculated (step S201).
  • the inner product value of the pattern of the situation information data and the pattern of the usage learning model is calculated by multiplying the values of cells having the same coordinates and adding up the multiplied values of each coordinate.
  • Assume that the values of the cells constituting the pattern of the situation information data are A, B, C, D, E, F, G, H, and I, and that the values of the cells constituting the pattern of the usage learning model to be compared are 1, 0, 0, 0, 1, 0, 0, 0, 1.
  • the inner product value of the pattern of the situation information data and the pattern of the usage learning model is A ⁇ 1 + B ⁇ 0 + C ⁇ 0 + D ⁇ 0 + E ⁇ 1 + F ⁇ 0 + G ⁇ 0 + H ⁇ 0 + I ⁇ 1.
  • the inner product value calculated in this way is normalized by dividing by the number of cells having a value of 1 among the cells included in the status information data.
  • the calculation and normalization of the inner product value for the situation information data is performed for each of the plurality of models included in the usage learning model.
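The inner-product comparison and normalization described above can be sketched as follows; the function names are assumptions.

```python
# Inner product of two 3 x 3 binary patterns (cell-by-cell product, summed),
# normalized by the number of 1-valued cells in the situation information data.
def inner_product(pattern_a, pattern_b) -> int:
    return sum(a * b for row_a, row_b in zip(pattern_a, pattern_b)
                     for a, b in zip(row_a, row_b))

def normalized_fit(situation_pattern, model_pattern) -> float:
    ones = sum(sum(row) for row in situation_pattern)
    return inner_product(situation_pattern, model_pattern) / ones if ones else 0.0
```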
  • In step S204, it is determined whether or not there are two or more models having the maximum inner product value. As a result of the determination, when there is only one model having the maximum inner product value ("No" in step S204), the process proceeds to step S205, in which the value of the model having the maximum inner product value in the first layer is recognized as the action of the person, and the process of step S104 ends. On the other hand, when there are two or more models having the maximum inner product value ("Yes" in step S204), the process proceeds to step S206.
  • In step S206, for each of the second-layer patterns of the two or more models having the maximum inner product value, the inner product value with the second-layer pattern of the situation information data is calculated and normalized.
  • the process of calculating the inner product value and the normalization is the same as the process for the pattern of the first layer.
  • In step S207, it is determined whether or not there are two or more models having the maximum inner product value in the second layer.
  • When there is only one such model, the process proceeds to step S208, in which the value of the model having the maximum inner product value in the second layer is recognized as the action of the person, and the process of step S104 ends.
  • When there are two or more such models, the process proceeds to step S209.
  • In step S209, it is determined whether or not, among the two or more models having the maximum inner product value in the second layer, there is a model that does not include an element whose duration is shorter than a predetermined time (a short-time element).
  • When there is no such model, the process proceeds to step S210, in which the value of the previous frame is recognized as the action of the person, and the process of step S104 ends.
  • When there is such a model, the process proceeds to step S211.
  • In step S211, the value of the model that does not include a short-time element is determined as the behavior of the person, and the process of step S104 ends. If there are a plurality of models that do not include short-time elements, the latest model is selected. Note that the predetermined time used as the criterion for determining whether an element is a short-time element can be set as appropriate for each of the plurality of elements representing the situation. The overall flow of steps S201 to S211 is sketched below.
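Putting steps S201 to S211 together, a minimal sketch of the decision flow might look as follows, reusing normalized_fit and the SituationInformationData sketch above; the model fields (first_layer, second_layer, value, has_short_time_element) and the handling of "the latest model" are simplified assumptions.

```python
# Illustrative sketch of the first-layer / second-layer tie-breaking flow
# (steps S201-S211). Model fields and tie-breaking details are assumptions.
def recognize_behavior(situation, models, previous_value: str) -> str:
    # S201-S203: normalized inner product on the first layer for every model.
    fits = [normalized_fit(situation.first_layer, m.first_layer) for m in models]
    best = max(fits)
    candidates = [m for m, f in zip(models, fits) if f == best]
    if len(candidates) == 1:                      # S204 -> S205
        return candidates[0].value
    # S206: tie-break on the second layer (duration pattern).
    fits2 = [normalized_fit(situation.second_layer, m.second_layer) for m in candidates]
    best2 = max(fits2)
    candidates = [m for m, f in zip(candidates, fits2) if f == best2]
    if len(candidates) == 1:                      # S207 -> S208
        return candidates[0].value
    # S209: prefer models that contain no short-time element.
    no_short = [m for m in candidates if not m.has_short_time_element]
    if not no_short:                              # S209 -> S210
        return previous_value
    return no_short[-1].value                     # S211: take the latest such model
```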
  • The information on the behavior of the person recognized by the usage learning unit 400 can be used as information for executing various actions. For example, when the action of a person sitting on a chair and starting to read a book is recognized, an action such as turning on a light can be executed. Alternatively, when the action of the person stopping reading and standing up is recognized, an action such as turning off the light can be executed. Further, the information on the behavior of the person recognized by the usage learning unit 400 may be fed back to the situation learning / identification unit 300 and used for learning of the neural network unit 320.
  • With the behavior recognition device according to the present embodiment, even if a large amount of learning data of a person closing or opening a book is not prepared, the situation can be learned appropriately simply by inputting a comment in that state and performing usage learning. Therefore, for example, a series of actions such as a person sitting down and starting to read a book, then closing the book after a while and stopping reading, can be appropriately recognized with simple learning.
  • FIG. 13 is a schematic view showing a hardware configuration example of the action recognition device according to the present embodiment.
  • the behavior recognition device 1000 can be realized by a hardware configuration similar to that of a general information processing device.
  • the action recognition device 1000 may include a CPU (Central Processing Unit) 500, a main storage unit 502, a communication unit 504, and an input / output interface unit 506.
  • The CPU 500 is a control and arithmetic device that performs the overall control and arithmetic processing of the action recognition device 1000.
  • the main storage unit 502 is a storage unit used for a data work area or a data temporary storage area, and may be configured by a memory such as a RAM (Random Access Memory).
  • the communication unit 504 is an interface for transmitting and receiving data via a network.
  • the input / output interface unit 506 is an interface for connecting to an external output device 510, an input device 512, a storage device 514, and the like to transmit and receive data.
  • the CPU 500, the main storage unit 502, the communication unit 504, and the input / output interface unit 506 are connected to each other by the system bus 508.
  • The storage device 514 may be composed of, for example, a ROM (Read Only Memory), a hard disk device composed of a magnetic disk, a non-volatile memory such as a semiconductor memory, or the like.
  • the main storage unit 502 can be used as a work area for constructing a neural network unit 320 including a plurality of learning cells 46 and executing an operation.
  • the CPU 500 functions as a control unit that controls arithmetic processing in the neural network unit 320 constructed in the main storage unit 502.
  • the storage device 514 can store the learning cell information (situation learning model) including the information about the learned learning cell 46. Further, by reading the learning cell information stored in the storage device 514 and configuring the main storage unit 502 to construct the neural network unit 320, it is possible to construct a learning environment for various situation information data.
  • the storage unit 450 for storing the usage learning model may be configured by the storage device 514. It is desirable that the CPU 500 is configured to execute arithmetic processing in a plurality of learning cells 46 of the neural network unit 320 constructed in the main storage unit 502 in parallel.
  • the communication unit 504 is a communication interface based on standards such as Ethernet (registered trademark) and Wi-Fi (registered trademark), and is a module for communicating with other devices.
  • the learning cell information may be received from another device via the communication unit 504. For example, frequently used learning cell information can be stored in the storage device 514, and less frequently used learning cell information can be configured to be read from another device.
  • the output device 510 includes a display such as a liquid crystal display device.
  • the output device 510 can be used as a display device for presenting the situation information data and the information on the behavior estimated by the situation learning / identification unit 300 to the user at the time of learning the usage learning unit 400. Further, the user can be notified of the learning result and the action decision via the output device 510.
  • the input device 512 is a keyboard, a mouse, a touch panel, or the like, and is used for the user to input predetermined information to the action recognition device 1000, for example, a user episode at the time of learning of the usage learning unit 400.
  • the status information data can also be configured to be read from another device via the communication unit 504.
  • the input device 512 can be used as a means for inputting the situation information data.
  • each part of the action recognition device 1000 can be realized in terms of hardware by mounting circuit components that are hardware components such as LSI (Large Scale Integration) in which a program is incorporated.
  • a program that provides the function can be stored in the storage device 514, loaded into the main storage unit 502, and executed by the CPU 500, so that the program can be realized by software.
  • the configuration of the action recognition device 1000 shown in FIG. 1 does not necessarily have to be configured as one independent device.
  • A part of the image acquisition unit 100, the situation grasping unit 200, the situation learning / identification unit 300, and the usage learning unit 400 (for example, the situation learning / identification unit 300 and the usage learning unit 400) may be arranged on the cloud, and a behavior recognition system may be constructed by these components.
  • FIG. 14 is a schematic view showing a configuration example of the action recognition device according to the present embodiment.
  • the action recognition device 1000 includes a situation information data generation unit 310, an action identification unit 440, and a storage unit 450.
  • the situation information data generation unit 310 has a function of generating situation information data based on the situation of the subject in the image of the subject including a person.
  • the storage unit 450 stores the usage learning model.
  • the behavior identification unit 440 has a function of identifying a person's behavior based on situation information data and a usage learning model.
  • The situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another.
  • The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with each other.
  • the behavior identification unit extracts the model with the highest degree of suitability for the situation information data from among the multiple models of the usage learning model. Then, when the goodness of fit of the extracted model is equal to or higher than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person. When the goodness of fit of the extracted model is less than a predetermined threshold value, the behavior estimated by the situation information data is determined to be the behavior of a person.
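A minimal sketch of this fit-threshold decision, reusing normalized_fit and the SituationInformationData sketch above; the function name, the model fields, the use of the first-layer fit alone for brevity, and the threshold value are assumptions.

```python
# If the best-fitting model reaches the threshold, use its behavior;
# otherwise fall back to the behavior estimated in the situation information data.
def decide_behavior(situation, models, threshold: float = 0.9) -> str:
    scored = [(normalized_fit(situation.first_layer, m.first_layer), m) for m in models]
    best_fit, best_model = max(scored, key=lambda t: t[0])
    return best_model.value if best_fit >= threshold else situation.value
```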
  • A program for operating the configuration of the embodiment so as to realize the functions of the above-described embodiment may be recorded on a recording medium, and the program recorded on the recording medium may be read out as code and executed by a computer.
  • a computer-readable recording medium is also included in the scope of each embodiment.
  • not only the recording medium on which the above-mentioned program is recorded but also the program itself is included in each embodiment.
  • When the degree of conformity of the extracted model is equal to or greater than the predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person.
  • There is thus also provided a behavior recognition method characterized in that, when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This behavior recognition device includes: a data generating unit which generates status information data on the basis of the status of a subject; a storage unit which stores a usage learning model; and a behavior identifying unit which identifies a behavior of a person on the basis of the status information data and the usage learning model. The status information data associate a pattern indicating a relationship between a plurality of elements representing the status, and the degree and duration thereof, with a behavior estimated from the status. The usage learning model includes a plurality of models associating, for a specific status, a pattern indicating the relationship between the plurality of elements and the degree and duration thereof, with a behavior estimated from the specific status. The behavior identifying unit extracts the model having the highest degree of fit to the status information data, from within the usage learning model, and determines the behavior of the person on the basis of the degree of fit of the extracted model.

Description

Behavior recognition device, behavior recognition method, program, and recording medium
 The present invention relates to a behavior recognition device, a behavior recognition method, a program, and a recording medium.
 In recent years, deep learning using a multi-layer neural network has been attracting attention as a machine learning method. Deep learning uses a calculation method called backpropagation to calculate the output error when a large amount of teacher data is input to a multi-layer neural network, and performs learning so that the error is minimized.
 Patent Documents 1 to 3 disclose a neural network processing apparatus that makes it possible to construct a neural network with little labor and a small amount of arithmetic processing by defining a large-scale neural network as a combination of a plurality of subnetworks. Further, Patent Document 4 discloses a structure optimization device that optimizes a neural network.
Patent Document 1: Japanese Unexamined Patent Publication No. 2001-051968; Patent Document 2: Japanese Unexamined Patent Publication No. 2002-251601; Patent Document 3: Japanese Unexamined Patent Publication No. 2003-317073; Patent Document 4: Japanese Unexamined Patent Publication No. H09-091263
 The application of deep learning has also been studied for behavior recognition, that is, for recognizing a person's gestures and behavior. However, deep learning requires a large amount of high-quality teacher data, and learning takes a long time. Patent Documents 1 to 4 propose methods for reducing the labor and the amount of arithmetic processing required to construct a neural network, but in order to further reduce the system load and the like, it has been desired to perform learning and recognition with higher accuracy using a simpler algorithm.
 An object of the present invention is to provide a behavior recognition device, a behavior recognition method, a program, and a recording medium capable of recognizing the behavior of a person appearing in an image with a simple algorithm and with high accuracy.
 According to one aspect of the present invention, there is provided a behavior recognition device including: a situation information data generation unit that generates situation information data based on the situation of a subject, including a person, in an image of the subject; a storage unit that stores a usage learning model; and a behavior identification unit that identifies the behavior of the person based on the situation information data and the usage learning model. The situation information data generation unit generates the situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another. The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another. The behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest degree of conformity to the situation information data; when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person, and when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.
 According to another aspect of the present invention, there is provided a behavior recognition method including: generating, based on the situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another; extracting, from a usage learning model including a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another, the model having the highest degree of conformity to the situation information data; determining, when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model to be the behavior of the person; and determining, when the degree of conformity of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data to be the behavior of the person.
 According to still another aspect of the present invention, there is provided a program that causes a computer to function as: means for generating, based on the situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degree, a second pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the situation are associated with one another; means for storing a usage learning model including a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degree, a fourth pattern mapping the relationship between the plurality of elements and information representing their duration, and the behavior of the person estimated from the specific situation are associated with one another; and means for extracting, from the usage learning model, the model having the highest degree of conformity to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the degree of conformity of the extracted model is equal to or greater than a predetermined threshold value, and determining the behavior estimated from the situation information data to be the behavior of the person when the degree of conformity of the extracted model is less than the predetermined threshold value.
 According to the present invention, the behavior of a person appearing in an image can be recognized with a simpler algorithm and with higher accuracy.
FIG. 1 is a schematic view showing a configuration example of the behavior recognition device according to the first embodiment of the present invention. FIG. 2 is a schematic view showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the first embodiment of the present invention. FIG. 3 is a schematic view showing a configuration example of the neural network unit in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention. FIG. 4 is a schematic view showing a configuration example of a learning cell in the situation learning / identification unit of the behavior recognition device according to the first embodiment of the present invention. FIG. 5 is a schematic view showing a configuration example of the usage learning unit in the behavior recognition device according to the first embodiment of the present invention. FIG. 6 is a flowchart showing a behavior recognition method using the behavior recognition device according to the first embodiment of the present invention. FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the image acquired by the image acquisition unit. FIG. 8 is a diagram showing an example of a rule for mapping the information grasped by the situation grasping unit. FIG. 9 is a diagram showing an example of situation information data. FIG. 10 is a diagram showing an example of a usage learning model. FIG. 11 is a flowchart showing a method of recognizing a person's behavior based on the situation information data and the usage learning model. FIG. 12 is a diagram illustrating a method of calculating the inner product value of the pattern of the situation information data and the pattern of the usage learning model. FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the first embodiment of the present invention. FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the second embodiment of the present invention.
 [First Embodiment]
 The schematic configuration of the behavior recognition device according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 5. FIG. 1 is a schematic view showing a configuration example of the behavior recognition device according to the present embodiment. FIG. 2 is a schematic view showing a configuration example of the situation learning / identification unit in the behavior recognition device according to the present embodiment. FIG. 3 is a schematic view showing a configuration example of the neural network unit in the situation learning / identification unit of the behavior recognition device according to the present embodiment. FIG. 4 is a schematic view showing a configuration example of a learning cell in the situation learning / identification unit of the behavior recognition device according to the present embodiment. FIG. 5 is a schematic view showing a configuration example of the usage learning unit in the behavior recognition device according to the present embodiment.
 As shown in FIG. 1, for example, the behavior recognition device 1000 according to the present embodiment may be composed of an image acquisition unit 100, a situation grasping unit 200, a situation learning / identification unit 300, and a usage learning unit 400.
 The image acquisition unit 100 is a functional block having a function of acquiring an image from an external camera or storage device (not shown). The image acquired by the image acquisition unit 100 includes a plurality of images taken of the same subject at different times, for example a moving image. An image suitable for processing in the situation grasping unit 200 can be selected as appropriate, and may include, for example, an RGB image or a depth image.
 The situation grasping unit 200 is a functional block having a function of recognizing the subject (person, object) appearing in each of the images acquired by the image acquisition unit 100 and grasping its situation, using a known image recognition technique, for example an image recognition technique using deep learning. Known devices and methods can be used as appropriate for person recognition and object recognition in the situation grasping unit 200. For example, devices and methods applicable to person recognition include Kinect (registered trademark), Face Grapher, OpenPose, Pose Net, Pose Proposal Networks, and DensePose. Devices and methods applicable to object recognition include SSD (Single Shot Multibox Detector), YOLOv3, and Mask R-CNN.
 The situation grasping unit 200 may also have a function of performing time series analysis of the subject. For short-duration time series analysis of the subject, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like can be applied. For long-duration time series analysis of the subject, for example, Memory Networks can be applied.
 The situation learning / identification unit 300 is a functional block having a function of generating situation information data based on the information received from the situation grasping unit 200. The situation information data is data in which a pattern mapping the information received from the situation grasping unit 200 and an estimation result indicating the behavior of the person estimated from that information are associated with each other. Details of the situation information data will be described later.
 A situation learning model that estimates the behavior of the person from the information received from the situation grasping unit 200 is constructed in the situation learning / identification unit 300. The situation learning / identification unit 300 combines the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
 Here, as an example of the situation learning / identification unit 300, a situation learning / identification unit 300 having a function of performing learning based on the information received from the situation grasping unit 200 and generating the situation learning model will be described with reference to FIG. 2. The situation learning model is not particularly limited as long as it receives the information from the situation grasping unit 200 as input and outputs the estimated behavior of the person, and may be rule-based, for example. In that case, the situation learning / identification unit 300 does not necessarily need to have a function of performing learning based on the information received from the situation grasping unit 200.
 As shown in FIG. 2, for example, the situation learning/identification unit 300 may be composed of a situation information data generation unit 310, a neural network unit 320, a determination unit 330, a learning unit 340, an identification unit 350, and an output unit 360. The learning unit 340 may be composed of a weight correction unit 342 and a learning cell generation unit 344.
 The situation information data generation unit 310 has a function of generating, based on the information received from the situation grasping unit 200, pattern data representing information related to the behavior of the person and the situation of the objects appearing in the image. The situation information data generation unit 310 also has a function of combining the information received from the situation grasping unit 200 with the information output from the situation learning model to generate the situation information data.
 As shown in FIG. 3, for example, the neural network unit 320 may be composed of a two-layer artificial neural network including an input layer and an output layer. The input layer includes at least a number of cells (neurons) 42 corresponding to the number of element values contained in one piece of pattern data. For example, if one piece of pattern data contains M element values, the input layer includes at least M cells 42_1, 42_2, ..., 42_i, ..., 42_M. The output layer includes at least a number of cells (neurons) 44 corresponding to the number of behaviors to be estimated. For example, the output layer includes N cells 44_1, 44_2, ..., 44_j, ..., 44_N corresponding to the number of behaviors to be estimated. Each of the cells 44 constituting the output layer is associated with one of the behaviors to be estimated. When the neural network unit 320 is trained using teacher data, the output layer includes at least a number of cells 44 corresponding to the number of behaviors associated with the teacher data.
 The M element values I_1, I_2, ..., I_i, ..., I_M of the situation information data are input to the cells 42_1, 42_2, ..., 42_i, ..., 42_M of the input layer, respectively. Each of the cells 42_1, 42_2, ..., 42_i, ..., 42_M outputs the element value I input to it to each of the cells 44_1, 44_2, ..., 44_j, ..., 44_N.
 A weighting coefficient ω for applying a predetermined weight to the element value I is set on each of the branches (axons) connecting the cells 42 and the cells 44. For example, weighting coefficients ω_1j, ω_2j, ..., ω_ij, ..., ω_Mj are set on the branches connecting the cells 42_1, 42_2, ..., 42_i, ..., 42_M to the cell 44_j, as shown in FIG. 5, for example. The cell 44_j thereby performs the operation shown in the following equation (1) and outputs the output value O_j.

  O_j = Σ_(i=1…M) ω_ij · I_i   … (1)
 In this specification, one cell 44, the branches (input nodes) that input the element values I_1 to I_M to that cell 44, and the branch (output node) that outputs the output value O from that cell 44 may be collectively referred to as a learning cell 46.
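 As a minimal sketch of how a single learning cell might evaluate equation (1), the following Python fragment accumulates the weighted element values of one pattern. The class and attribute names are assumptions introduced only for illustration; they do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LearningCell:
    """Illustrative learning cell 46: one output cell plus its weighted input nodes."""
    weights: List[float]       # weighting coefficients omega_ij of the input nodes
    behavior: str = "unknown"  # behavior (category) this cell is associated with

    def output(self, elements: List[float]) -> float:
        """Equation (1): O_j = sum_i omega_ij * I_i."""
        return sum(w * x for w, x in zip(self.weights, elements))
```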
 The determination unit 330 compares the correlation value between the plurality of element values of the pattern data and the output value of the learning cell 46 with a predetermined threshold value, and determines whether the correlation value is equal to or greater than the threshold value or less than the threshold value. An example of the correlation value is the likelihood associated with the output value of the learning cell 46. The function of the determination unit 330 may instead be provided in each of the learning cells 46.
 The learning unit 340 is a functional block that trains the neural network unit 320 according to the determination result of the determination unit 330. The weight correction unit 342 updates the weighting coefficients ω set on the input nodes of the learning cell 46 when the correlation value is equal to or greater than the predetermined threshold value. The learning cell generation unit 344 adds a new learning cell 46 to the neural network unit 320 when the correlation value is less than the predetermined threshold value.
 The identification unit 350 identifies the behavior of the person estimated from the pattern data based on the correlation values between the plurality of element values of the pattern data and the output values of the learning cells 46. The output unit 360 outputs the identification result of the identification unit 350.
 Next, the learning method in the situation learning/identification unit 300 will be briefly described.
 First, as an initial state, a number of learning cells 46 corresponding to the number of categories of teacher information associated with the learning target data (the behaviors of a person that the neural network unit 320 is to learn) are set in the neural network unit 320.
 Next, the learning target data is taken into the situation information data generation unit 310. The situation information data generation unit 310 then extracts element values indicating the features of the imported learning target data and generates predetermined pattern data.
 Next, the plurality of element values of the pattern data are input to the neural network unit 320. The element values I_1 to I_M of the pattern data input to the neural network unit 320 are input to the cells 44_1 to 44_N via the cells 42_1 to 42_M. As a result, outputs O_1 to O_N are obtained from the cells 44_1 to 44_N. At this time, since the weighting coefficients ω are set on the input nodes of the learning cells 46, the output values O are calculated based on equation (1).
 Next, based on the output value O of the learning cell 46, the determination unit 330 calculates the correlation value between the element values I_1 to I_M and the output value O of the learning cell 46 (here, the likelihood P associated with the output value of the learning cell). The method of calculating the likelihood P is not particularly limited. For example, the likelihood P_j of the learning cell 46_j can be calculated based on the following equation (2).

  P_j = O_j / Σ_(i=1…M) ω_ij   … (2)
 Equation (2) indicates that the likelihood P_j is expressed as the ratio of the output value O_j of the learning cell 46_j to the cumulative value of the weighting coefficients ω_ij of the plurality of input nodes of the learning cell 46_j. In other words, the likelihood P_j is the ratio of the output value of the learning cell 46_j when the plurality of element values are input to the maximum output of the learning cell 46_j determined by the weighting coefficients ω_ij of the plurality of input nodes.
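 Continuing the illustration, the likelihood of equation (2) could be computed as the ratio of the cell output to the sum of its weighting coefficients, which is the maximum output obtainable when every element value is 1. The function below is a sketch under that assumption, not a definitive implementation.

```python
from typing import List


def likelihood(weights: List[float], elements: List[float]) -> float:
    """Equation (2): P = O / sum(omega), where O = sum(omega_i * I_i).

    The denominator is the largest output the cell can produce for binary
    element values, so P lies in [0, 1] in that case.
    """
    output = sum(w * x for w, x in zip(weights, elements))
    total_weight = sum(weights)
    return output / total_weight if total_weight > 0 else 0.0
```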
 Next, the determination unit 330 compares the calculated value of the likelihood P with a predetermined threshold value and determines whether or not the value of the likelihood P is equal to or greater than the threshold value.
 If, among the learning cells 46 associated with the category of the teacher information of the imported learning target data, one or more learning cells 46 have a likelihood P equal to or greater than the threshold value, the weighting coefficients ω of the input nodes of the learning cell 46 having the largest likelihood P among the learning cells 46 associated with that category are updated. In this way, the information of the learning target data whose likelihood P is equal to or greater than the predetermined threshold value is accumulated in the weighting coefficients ω of the respective input nodes.
 On the other hand, if none of the learning cells 46 associated with the category of the teacher information of the imported learning target data has a likelihood P equal to or greater than the threshold value, a new learning cell 46 associated with that category is generated.
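 The learning step just described (update the best-matching cell of the category if its likelihood reaches the threshold, otherwise create a new cell) could be sketched as follows. The concrete weight-update rule shown here, which moves the weights toward the input pattern, is only an assumed placeholder, since the specification does not give the update formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    weights: List[float]   # omega_ij for the input nodes
    category: str          # behavior associated with this learning cell


def one_pass_learn(cells: Dict[str, List[Cell]], category: str,
                   pattern: List[float], threshold: float,
                   rate: float = 0.1) -> None:
    """One learning step: a single pass per sample, no back-propagation."""
    def likelihood(c: Cell) -> float:
        total = sum(c.weights)
        return sum(w * x for w, x in zip(c.weights, pattern)) / total if total else 0.0

    candidates = [c for c in cells.setdefault(category, []) if likelihood(c) >= threshold]
    if candidates:
        best = max(candidates, key=likelihood)
        # Assumed update rule: accumulate the sample into the weights.
        best.weights = [w + rate * (x - w) for w, x in zip(best.weights, pattern)]
    else:
        # No sufficiently similar cell exists for this category: add a new one.
        cells[category].append(Cell(weights=list(pattern), category=category))
```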
 By repeatedly training the neural network unit 320 in this way, the situation learning model described above can be constructed in the neural network unit 320.
 The above learning method does not rely on the error back-propagation method used in deep learning and the like, and learning is possible in a single pass. The learning process of the neural network unit 320 can therefore be simplified. In addition, since each learning cell 46 is independent, learning data can easily be added, deleted, and updated.
 The learning method and identification method using the above algorithm are described in detail in, for example, International Application No. PCT/JP2018/042781 by the same applicant.
 Next, the identification method in the situation learning/identification unit 300 will be briefly described.
 First, the information received from the situation grasping unit 200 is taken into the situation information data generation unit 310. The situation information data generation unit 310 then extracts element values indicating the features of the imported information and generates predetermined pattern data.
 Next, the element values I_1 to I_M of the pattern data are input to the neural network unit 320 trained as described above. The element values I_1 to I_M input to the neural network unit 320 are input to each learning cell 46 via the cells 42_1 to 42_M. As a result, an output value O corresponding to the element values I_1 to I_M is obtained from every learning cell 46.
 Next, based on the output value O output from each learning cell 46, the identification unit 350 calculates the correlation value between the element values I_1 to I_M and the output value O of the learning cell 46 (here, the likelihood P associated with the output value of the learning cell). The method of calculating the likelihood P is not particularly limited.
 Next, the behavior of the person estimated from the pattern data is identified based on the calculated likelihoods P of all the learning cells 46. The method of identifying the behavior of the person is not particularly limited. For example, the behavior associated with the learning cell 46 having the largest likelihood P among all the learning cells 46 can be identified as the behavior estimated from the pattern data. Alternatively, a predetermined number of learning cells 46 may be extracted from all the learning cells 46 in descending order of likelihood P, and the behavior most frequently associated with the extracted learning cells 46 may be identified as the behavior estimated from the pattern data.
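 A minimal sketch of the first identification variant (picking the behavior tied to the learning cell with the highest likelihood) might look like this; the data layout is an assumption made for illustration.

```python
from typing import List, Tuple


def identify(cells: List[Tuple[str, List[float]]], pattern: List[float]) -> str:
    """Return the behavior tied to the learning cell with the largest likelihood.

    Each entry of `cells` is (behavior, weights); likelihood follows equation (2).
    """
    def likelihood(weights: List[float]) -> float:
        total = sum(weights)
        return sum(w * x for w, x in zip(weights, pattern)) / total if total else 0.0

    behavior, _ = max(cells, key=lambda c: likelihood(c[1]))
    return behavior
```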
 The usage learning unit 400 is a functional block having a function of generating a usage learning model based on the user's evaluation of the situation information data generated by the situation learning/identification unit 300, and of identifying the behavior of the person based on the situation information data and the usage learning model.
 As shown in FIG. 5, for example, the usage learning unit 400 may be composed of a situation information data acquisition unit 410, an evaluation acquisition unit 420, a usage learning model generation unit 430, a behavior identification unit 440, and a storage unit 450.
 The situation information data acquisition unit 410 has a function of acquiring the situation information data generated by the situation information data generation unit 310 from the situation learning/identification unit 300.
 The evaluation acquisition unit 420 has a function of acquiring a user's (advisor's) evaluation of the situation information data. This evaluation includes information that prompts reconsideration of the situation indicated by the situation information data; it is, so to speak, know-how that the user gives to the situation learning model. The user's evaluation of the situation information data can be given, for example, by the user typing a comment on a keyboard while watching the video used for the situation learning. The user's evaluation of the situation information data can also be given at the same time that the situation learning is performed.
 The usage learning model generation unit 430 has a function of generating the usage learning model based on the situation information data and the user's evaluation of the situation information data. The usage learning model may include data in which a pattern obtained by mapping the information received from the situation grasping unit 200 is associated with the behavior of the person according to the user's evaluation. The usage learning model generated by the usage learning model generation unit 430 is stored in the storage unit 450.
 The usage learning model generation unit 430 may also have a function of performing further mapping based on the user's evaluation (comment) of the situation information data and generating a new pattern. In this case, the usage learning model may be data in which a new pattern obtained by mapping the information indicated in the user's comment is associated with the behavior of the person according to the user's evaluation of that pattern.
 For example, suppose that information indicating that a person is "sitting shallowly (weak)" is mapped in the pattern of the situation information data, and that the user considers that a "sitting deeply (strong)" state is also required for the situation at that time. In such a case, the usage learning model generation unit 430 additionally maps information indicating the "sitting deeply (strong)" state onto the pattern of the situation information data based on the comment from the user, and generates a new pattern. The usage learning model generation unit 430 can generate the new pattern, for example, by mapping the information to predetermined coordinates according to words such as "weak", "medium", and "strong" input by the user via a keyboard or the like.
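 For illustration only, mapping a user's comment word onto a predetermined cell coordinate could be as simple as a lookup table like the one below. The row/column assignments are assumptions chosen to show the idea, not the layout defined by the specification.

```python
# Assumed layout: rows = elements (book state, book position, sitting condition),
# columns = level ("weak" -> 0, "medium" -> 1, "strong" -> 2).
LEVEL_COLUMN = {"weak": 0, "medium": 1, "strong": 2}
ELEMENT_ROW = {"book_state": 0, "book_position": 1, "sitting": 2}


def add_comment_to_pattern(pattern, element: str, level: str):
    """Set the cell indicated by a user comment (e.g. 'sitting strong') to 1."""
    row, col = ELEMENT_ROW[element], LEVEL_COLUMN[level]
    pattern[row][col] = 1
    return pattern


# Example: the user asks for a "sitting deeply (strong)" state to be added.
pattern = [[0, 0, 0], [0, 0, 0], [1, 0, 0]]   # currently "sitting shallowly (weak)"
add_comment_to_pattern(pattern, "sitting", "strong")
```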
 The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model generated by the usage learning model generation unit 430.
 Next, a behavior recognition method using the behavior recognition device according to the present embodiment will be described with reference to FIGS. 6 to 12. FIG. 6 is a flowchart showing the behavior recognition method using the behavior recognition device according to the present embodiment. FIG. 7 is a diagram showing an example of information grasped by the situation grasping unit from the images acquired by the image acquisition unit. FIG. 8 is a diagram showing an example of rules for mapping the information grasped by the situation grasping unit. FIG. 9 is a diagram showing an example of the situation information data. FIG. 10 is a diagram showing an example of the usage learning model. FIG. 11 is a flowchart showing a method of recognizing the behavior of a person based on the situation information data and the usage learning model. FIG. 12 is a diagram illustrating a method of calculating the inner product of a pattern of the situation information data and a pattern of the usage learning model.
 Here, for ease of understanding, the description will be supplemented as appropriate by assuming the recognition of a series of actions in which 1) a person sits in a chair and starts reading a book, 2) the person closes and opens the book while reading it, and 3) after reading for a while, the person closes the book and stops reading. It is assumed that a situation learning model that estimates the behavior of the person from the state of the book, the position of the book, and the state of the person has been constructed in the situation learning/identification unit 300.
 First, the image acquisition unit 100 acquires, from a camera or a storage device, a plurality of images of the same subject taken at different times (step S101). The plurality of images acquired by the image acquisition unit 100 are, for example, the images of the individual frames of a moving image. In this case, it is not always necessary to acquire the images of all the frames, and frames may be thinned out as appropriate. The images to be acquired may be any images suitable for grasping the situation of the subject and can be selected as appropriate. For example, RGB images and depth images acquired by an RGB camera and an infrared camera can be used. The images acquired by the image acquisition unit 100 may be input to the situation grasping unit 200 as they are, or may be temporarily stored in a storage device (not shown).
 Next, the situation grasping unit 200 applies a known image recognition technique, for example an image recognition technique based on deep learning, to each of the images acquired by the image acquisition unit 100 to recognize the persons and objects appearing in the image and to grasp their situation (step S102).
 For example, when a person holding a book and sitting in a chair appears in the image, the situation of the person includes whether the person is sitting shallowly in the chair or sitting deeply in the chair. The situation of the object (the book) includes, for example, whether it is open or closed and whether it is near the person's face.
 Next, the situation learning/identification unit 300 generates situation information data based on the information received from the situation grasping unit 200 (step S103). The generated situation information data includes first-layer pattern data in which the degree of each element indicating the situation of the person or object is mapped in a plurality of levels, and information on the situation (value) estimated as the behavior of the person from the first-layer pattern data. The situation (value) estimated as the behavior of the person is information obtained by applying the first-layer pattern data to the situation learning model. In addition, second-layer pattern data in which the duration of each element indicating the situation of the person or object is mapped in a plurality of levels is attached to the situation information data.
 For example, suppose that "state of the book", "position of the book", and "sitting condition" are used as three elements indicating the situation of the person or object, and that the degree of each element is mapped in three levels. In this case, suppose that, for each of the images of the 18th to 22nd frames, the three elements indicating the situation of the person or object and the situation (value) estimated in that case are obtained as the information shown in FIG. 7.
 In such a case, each piece of information in FIG. 7 can be mapped as pattern data by using, for example, the rules shown in FIG. 8. The rules shown in FIG. 8 are an example in which three levels are provided for each element and mapped onto a 3x3 pattern. For the state of the book in the first layer, for example, three levels can be assumed: "closed", "open", and "an intermediate state (middle)". For the position of the book, for example, three levels can be assumed: "near", "far", and "an intermediate state (middle)". For the sitting condition, for example, three levels can be assumed: "sitting shallowly (weak)", "sitting firmly (strong)", and "an intermediate state (medium)". For the duration in the second layer, three levels can be assumed for each element: "short", "long", and "an intermediate state (medium)".
 FIG. 9 is an example in which the information of frames 18 to 21 shown in FIG. 7 is represented as situation information data according to the rules shown in FIG. 8. The situation information data includes the first-layer and second-layer patterns and the value corresponding to the image of each frame.
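 Under the assumption that each row of the 3x3 pattern corresponds to one element (book state, book position, sitting condition) and each column to one of its three levels, the first-layer pattern of FIG. 9 could be built from the per-frame information roughly as follows. The concrete row/column assignment is an assumption; FIG. 8 defines the actual rule.

```python
from typing import Dict, List

ELEMENTS = ["book_state", "book_position", "sitting"]   # assumed row order
LEVELS = {                                               # assumed column order per element
    "book_state": ["closed", "middle", "open"],
    "book_position": ["near", "middle", "far"],
    "sitting": ["weak", "medium", "strong"],
}


def to_pattern(frame_info: Dict[str, str]) -> List[List[int]]:
    """Map one frame's element levels onto a 3x3 binary pattern (first layer).

    An analogous table of duration levels ("short"/"medium"/"long") would be
    used to build the second-layer pattern.
    """
    pattern = [[0, 0, 0] for _ in ELEMENTS]
    for row, element in enumerate(ELEMENTS):
        col = LEVELS[element].index(frame_info[element])
        pattern[row][col] = 1
    return pattern


# Example frame: book open, near the face, sitting firmly.
first_layer = to_pattern({"book_state": "open", "book_position": "near", "sitting": "strong"})
```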
 Next, the behavior identification unit 440 applies the usage learning model to the situation information data corresponding to the image of each frame generated by the situation learning/identification unit 300 and verifies the estimation result of the situation learning (step S104). Specifically, the patterns of the situation information data are compared with the patterns of the usage learning model, and the usage learning model is searched for a model that closely matches the situation information data.
 Next, the behavior identification unit 440 recognizes the behavior of the person based on the verification result in step S104 (step S105). Specifically, when there is no model in the usage learning model that closely matches the situation information data, the value of the situation information data is recognized as the behavior of the person. On the other hand, when there is a model in the usage learning model that closely matches the situation information data, the value of that model is recognized as the behavior of the person.
 The storage unit 450 stores a usage learning model including a plurality of models, for example model 1 and model 2 shown in FIG. 10. Model 1 prompts reconsideration in the case where the situation learning model judges "sitting but not reading a book" because the book is closed, but the time for which the book has been closed is short, so the behavior should instead be "sitting and reading a book". Model 2 prompts reconsideration in the case where the situation learning model judges "sitting but not reading a book" because the book is half closed, but the time for which the book has been closed is short, so the behavior should instead be "sitting and reading a book".
 The behavior identification unit 440 compares the situation information data corresponding to the image of each frame with each of the models of the usage learning model stored in the storage unit 450, and extracts from the usage learning model the model that best matches the situation information data. Then, according to the degree of match between the situation information data and the extracted model, it determines whether to apply the value of the situation information data or the value of the extracted model.
 The method of determining the degree of match between the situation information data and the usage learning model is not particularly limited; one example is a method using the inner product of a pattern of the situation information data and a pattern of the usage learning model.
 A method of determining the degree of match between the situation information data and the usage learning model using the inner product of a pattern of the situation information data and a pattern of the usage learning model will be described below with reference to FIGS. 11 and 12.
 Here, for simplicity of explanation, it is assumed that the situation information data and the usage learning model each include, as the first-layer and second-layer patterns, nine cells arranged in a 3x3 matrix (see FIGS. 9 and 10). The value of each cell is 0 or 1. The value of the cell corresponding to the level of each element indicating the situation of the person or object is 1, and the values of the other cells are 0. In FIGS. 9 and 10, cells having a value of 1 are painted black.
 First, the inner product of the first-layer pattern of the situation information data and the first-layer pattern of the usage learning model is calculated (step S201). The inner product of a pattern of the situation information data and a pattern of the usage learning model is calculated by multiplying the values of cells at the same coordinates and summing the products over all coordinates. For example, as shown in FIG. 12, suppose that the values of the cells constituting the pattern of the situation information data are A, B, C, D, E, F, G, H, and I, and that the values of the cells constituting the pattern of the usage learning model to be compared are 1, 0, 0, 0, 1, 0, 0, 0, 1. In this case, the inner product of the pattern of the situation information data and the pattern of the usage learning model is A×1 + B×0 + C×0 + D×0 + E×1 + F×0 + G×0 + H×0 + I×1. The inner product calculated in this way is normalized by dividing it by the number of cells having a value of 1 in the situation information data. The calculation and normalization of the inner product for the situation information data are performed for each of the plurality of models included in the usage learning model.
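 The inner product and normalization of step S201 could be expressed as below; this is a direct transcription of the calculation just described, with each pattern represented, for illustration, as a flat list of nine 0/1 cell values.

```python
from typing import List


def normalized_inner_product(situation: List[int], model: List[int]) -> float:
    """Multiply cell values at the same coordinates, sum them, then divide by
    the number of cells set to 1 in the situation information data."""
    dot = sum(s * m for s, m in zip(situation, model))
    ones = sum(situation)
    return dot / ones if ones else 0.0


# Example inspired by FIG. 12: model cells are 1,0,0,0,1,0,0,0,1.
situation = [1, 0, 0, 0, 1, 0, 0, 0, 1]   # cells A..I with three cells set to 1
model = [1, 0, 0, 0, 1, 0, 0, 0, 1]
score = normalized_inner_product(situation, model)   # -> 1.0
```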
 Next, the model with the largest normalized inner product is extracted from the plurality of models of the usage learning model, and it is determined whether or not the inner product of that model is equal to or greater than a predetermined threshold value (step S202). The larger the normalized inner product, the higher the degree of match with the situation information data. The threshold value used for the determination serves as a criterion for judging whether it is appropriate to apply the model to the situation information data, and can be set as appropriate. If it is determined that the largest inner product is less than the threshold value ("No" in step S202), the process proceeds to step S203, the value of the situation information data is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if it is determined that the largest inner product is equal to or greater than the threshold value ("Yes" in step S202), the process proceeds to step S204.
 In step S204, it is determined whether or not there are two or more models having the largest inner product. If only one model has the largest inner product ("No" in step S204), the process proceeds to step S205, the value of the model having the largest first-layer inner product is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there are two or more models having the largest inner product ("Yes" in step S204), the process proceeds to step S206.
 In step S206, for the second-layer pattern of each of the two or more models having the largest first-layer inner product, the inner product with the second-layer pattern of the situation information data is calculated and normalized. The calculation and normalization of the inner product are the same as for the first-layer patterns.
 Next, in step S207, it is determined whether or not there are two or more models having the largest inner product. If only one model has the largest inner product ("No" in step S207), the process proceeds to step S208, the value of the model having the largest second-layer inner product is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there are two or more models having the largest inner product ("Yes" in step S207), the process proceeds to step S209.
 In step S209, it is determined whether or not, among the two or more models having the largest second-layer inner product, there is a model that does not include an element whose duration is shorter than a predetermined time (a short-duration element). If there is no model that does not include a short-duration element ("No" in step S209), the process proceeds to step S210, the value of the previous frame is recognized as the behavior of the person, and the process of step S104 ends. On the other hand, if there is a model that does not include a short-duration element ("Yes" in step S209), the process proceeds to step S211. In step S211, the value of the model that does not include a short-duration element is determined as the behavior of the person, and the process of step S104 ends. If there are two or more models that do not include a short-duration element, the most recent model is selected. The predetermined time used as the criterion for determining whether an element is a short-duration element can be set as appropriate for each of the plurality of elements representing the situation.
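 Putting the verification steps S202 to S211 together, the selection logic could be sketched as follows. The Model structure, the way short-duration elements are flagged, and the assumption that the model list is ordered oldest to newest are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    layer1: List[int]          # third pattern (degree levels)
    layer2: List[int]          # fourth pattern (duration levels)
    value: str                 # behavior the model suggests
    has_short_element: bool    # True if some element's duration is below its limit


def dot_norm(a: List[int], b: List[int]) -> float:
    ones = sum(a)
    return sum(x * y for x, y in zip(a, b)) / ones if ones else 0.0


def verify(situation1: List[int], situation2: List[int], situation_value: str,
           previous_value: str, models: List[Model], threshold: float) -> str:
    # S201/S202: best first-layer match; fall back to the situation value if weak.
    scores = [dot_norm(situation1, m.layer1) for m in models]
    best = max(scores) if scores else 0.0
    if best < threshold:
        return situation_value                                    # S203
    top = [m for m, s in zip(models, scores) if s == best]
    if len(top) == 1:
        return top[0].value                                       # S205
    # S206/S207: break the tie with the second-layer (duration) patterns.
    scores2 = [dot_norm(situation2, m.layer2) for m in top]
    best2 = max(scores2)
    top2 = [m for m, s in zip(top, scores2) if s == best2]
    if len(top2) == 1:
        return top2[0].value                                      # S208
    # S209-S211: prefer a model without short-duration elements; otherwise keep
    # the behavior recognized in the previous frame.
    long_only = [m for m in top2 if not m.has_short_element]
    return long_only[-1].value if long_only else previous_value   # latest model wins
```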
 The information on the behavior of the person recognized by the usage learning unit 400 can be used as information for executing various actions. For example, when the behavior of a person sitting in a chair and starting to read a book is recognized, an action such as turning on the lighting can be executed. Alternatively, when the behavior of the person stopping reading and standing up is recognized, an action such as turning off the lighting can be executed. The information on the behavior of the person recognized by the usage learning unit 400 may also be fed back to the situation learning/identification unit 300 and used for training the neural network unit 320.
 With existing situation recognition techniques using deep learning, for example, if the system has been trained to judge that a person is reading whenever it recognizes a seated person and a book, it cannot recognize that the person has stopped reading. In addition, when learning is performed frame by frame, if the book is closed and opened within a short time, the system recognizes, for each state, that the person is reading or not reading the book. To improve this, it is necessary to prepare a large amount of learning data of a person closing and opening a book and to perform training on it.
 In contrast, the behavior recognition device according to the present embodiment can learn the situation appropriately simply by performing usage learning with a comment entered in that state, without preparing a large amount of learning data of a person closing and opening a book. Therefore, a series of actions, such as a person sitting down and starting to read a book, closing the book after a while, and stopping reading, can be appropriately recognized with simple learning.
 Next, a hardware configuration example of the behavior recognition device 1000 according to the present embodiment will be described with reference to FIG. 13. FIG. 13 is a schematic view showing a hardware configuration example of the behavior recognition device according to the present embodiment.
 As shown in FIG. 13, for example, the behavior recognition device 1000 can be realized with a hardware configuration similar to that of a general information processing device. For example, the behavior recognition device 1000 may include a CPU (Central Processing Unit) 500, a main storage unit 502, a communication unit 504, and an input/output interface unit 506.
 The CPU 500 is a control and arithmetic device that performs the overall control and arithmetic processing of the behavior recognition device 1000. The main storage unit 502 is a storage unit used as a work area for data and as a temporary save area for data, and may be configured with a memory such as a RAM (Random Access Memory). The communication unit 504 is an interface for transmitting and receiving data via a network. The input/output interface unit 506 is an interface for connecting to an external output device 510, input device 512, storage device 514, and the like to transmit and receive data. The CPU 500, the main storage unit 502, the communication unit 504, and the input/output interface unit 506 are connected to one another by a system bus 508. The storage device 514 may be configured, for example, as a hard disk device composed of a non-volatile memory such as a ROM (Read Only Memory), a magnetic disk, or a semiconductor memory.
 The main storage unit 502 can be used as a work area for constructing the neural network unit 320 including the plurality of learning cells 46 and for executing its operations. The CPU 500 functions as a control unit that controls the arithmetic processing in the neural network unit 320 constructed in the main storage unit 502. The storage device 514 can store learning cell information (the situation learning model) including information about the trained learning cells 46. By reading the learning cell information stored in the storage device 514 and constructing the neural network unit 320 in the main storage unit 502, a learning environment for various kinds of situation information data can be constructed. The storage unit 450 that stores the usage learning model may also be configured by the storage device 514. The CPU 500 is preferably configured to execute the arithmetic processing of the plurality of learning cells 46 of the neural network unit 320 constructed in the main storage unit 502 in parallel.
 The communication unit 504 is a communication interface based on a standard such as Ethernet (registered trademark) or Wi-Fi (registered trademark), and is a module for communicating with other devices. The learning cell information may be received from another device via the communication unit 504. For example, frequently used learning cell information can be stored in the storage device 514, while infrequently used learning cell information can be read from another device.
 The output device 510 includes a display such as a liquid crystal display device. The output device 510 can be used as a display device for presenting the situation information data and the information on the behavior estimated by the situation learning/identification unit 300 to the user during the learning of the usage learning unit 400. The user can also be notified of learning results and behavior decisions via the output device 510. The input device 512 is a keyboard, a mouse, a touch panel, or the like, and is used by the user to input predetermined information into the behavior recognition device 1000, for example user episodes during the learning of the usage learning unit 400.
 The situation information data may also be read from another device via the communication unit 504. Alternatively, the input device 512 can be used as a means for inputting the situation information data.
 The functions of the respective units of the behavior recognition device 1000 according to the present embodiment can be realized in hardware by mounting circuit components, that is, hardware components such as an LSI (Large Scale Integration) incorporating the program. Alternatively, they can be realized in software by storing a program that provides the functions in the storage device 514, loading the program into the main storage unit 502, and executing it on the CPU 500.
 The configuration of the behavior recognition device 1000 shown in FIG. 1 does not necessarily have to be realized as a single independent device. For example, some of the image acquisition unit 100, the situation grasping unit 200, the situation learning/identification unit 300, and the usage learning unit 400, for example the situation learning/identification unit 300 and the usage learning unit 400, may be arranged on a cloud, and a behavior recognition system may be constructed with them.
 As described above, according to the present embodiment, it is possible to recognize the behavior of a person appearing in an image with a simpler algorithm and with higher accuracy.
 [Second Embodiment]
 A behavior recognition device according to a second embodiment of the present invention will be described with reference to FIG. 14. Components similar to those of the behavior recognition device according to the first embodiment are denoted by the same reference numerals, and their description is omitted or simplified. FIG. 14 is a schematic view showing a configuration example of the behavior recognition device according to the present embodiment.
 As shown in FIG. 14, the behavior recognition device 1000 according to the present embodiment has a situation information data generation unit 310, a behavior identification unit 440, and a storage unit 450.
 The situation information data generation unit 310 has a function of generating situation information data based on the situation of the subject in an image of a subject including a person. The storage unit 450 stores a usage learning model. The behavior identification unit 440 has a function of identifying the behavior of the person based on the situation information data and the usage learning model.
 The situation information data generation unit generates situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degrees, a second pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the situation are associated with one another.
 The usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degrees, a fourth pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the specific situation are associated with one another.
 The behavior identification unit extracts, from the plurality of models of the usage learning model, the model that best matches the situation information data. When the degree of match of the extracted model is equal to or greater than a predetermined threshold value, the behavior estimated by the extracted model is determined to be the behavior of the person. When the degree of match of the extracted model is less than the predetermined threshold value, the behavior estimated from the situation information data is determined to be the behavior of the person.
 In this way, according to the present embodiment as well, it is possible to recognize the behavior of a person appearing in an image with a simpler algorithm and with higher accuracy.
 [Modified Embodiments]
 The present invention is not limited to the above embodiments, and various modifications are possible.
 For example, an example in which a part of the configuration of any of the embodiments is added to another embodiment, or replaced with a part of the configuration of another embodiment, is also an embodiment of the present invention.
 In the above embodiments, the behavior of a person sitting in a chair and reading a book has been described as an application example of the present invention, but the present invention can be widely applied to the recognition of various behaviors of a person appearing in an image.
 A processing method in which a program that operates the configuration of the above embodiments so as to realize their functions is recorded on a recording medium, the program recorded on the recording medium is read out as code, and the code is executed on a computer is also included in the scope of each embodiment. That is, computer-readable recording media are also included in the scope of each embodiment. In addition to the recording medium on which the above program is recorded, the program itself is also included in each embodiment.
 As the recording medium, for example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a non-volatile memory card, or a ROM can be used. In addition, not only processing executed by the program recorded on the recording medium alone, but also processing that runs on an OS and is executed in cooperation with the functions of other software or an expansion board is included in the scope of each embodiment.
 The above embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention should not be interpreted in a limited manner based on them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.
 Part or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
 (Appendix 1)
 A behavior recognition device comprising:
 a situation information data generation unit that generates situation information data based on the situation of a subject, including a person, in an image of the subject;
 a storage unit that stores a usage learning model; and
 a behavior identification unit that identifies the behavior of the person based on the situation information data and the usage learning model,
 wherein the situation information data generation unit generates the situation information data in which a first pattern mapping the relationship between a plurality of elements representing the situation and information representing their degrees, a second pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the situation are associated with one another,
 wherein the usage learning model includes a plurality of models in which, for a specific situation, a third pattern mapping the relationship between the plurality of elements and information representing their degrees, a fourth pattern mapping the relationship between the plurality of elements and information representing their durations, and the behavior of the person estimated from the specific situation are associated with one another, and
 wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, the model that best matches the situation information data, determines the behavior estimated by the extracted model to be the behavior of the person when the degree of match of the extracted model is equal to or greater than a predetermined threshold value, and determines the behavior estimated from the situation information data to be the behavior of the person when the degree of match of the extracted model is less than the predetermined threshold value.
 (Appendix 2)
 The behavior recognition device according to Appendix 1, wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, a model including the third pattern having the highest goodness of fit to the first pattern of the situation information data.
 (Appendix 3)
 The behavior recognition device according to Appendix 2, wherein the behavior identification unit determines that the goodness of fit of the third pattern to the first pattern is higher as the inner product value between the element values of the first pattern and the element values of the third pattern is larger.
 (Appendix 4)
 The behavior recognition device according to Appendix 2 or 3, wherein, when a plurality of models include the third pattern having the highest goodness of fit to the first pattern of the situation information data, the behavior identification unit extracts, from among the models including the third pattern having the highest goodness of fit, a model including the fourth pattern having the highest goodness of fit to the second pattern of the situation information data.
 (Appendix 5)
 The behavior recognition device according to Appendix 4, wherein the behavior identification unit determines that the goodness of fit of the fourth pattern to the second pattern is higher as the inner product value between the element values of the second pattern and the element values of the fourth pattern is larger.
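 A non-limiting sketch of the selection procedure of Appendices 2 to 5 follows, using the inner product of element values as the goodness-of-fit score and breaking ties on the first pattern by the second pattern. The exact-equality tie test is a simplification (an implementation might use a tolerance), and the SituationInfo and Model structures are the illustrative ones assumed in the sketch after Appendix 1.

    from typing import List

    def dot(a: List[float], b: List[float]) -> float:
        # Inner product of two element-value vectors, used as the goodness-of-fit score.
        return sum(x * y for x, y in zip(a, b))

    def select_model(info: "SituationInfo", models: List["Model"]) -> "Model":
        # Appendices 2 and 3: keep the models whose third pattern best fits the first pattern.
        best_fit1 = max(dot(info.pattern1, m.pattern3) for m in models)
        candidates = [m for m in models if dot(info.pattern1, m.pattern3) == best_fit1]
        if len(candidates) == 1:
            return candidates[0]
        # Appendices 4 and 5: several models tie on the first pattern, so compare the
        # duration patterns and keep the best fit of the fourth pattern to the second.
        return max(candidates, key=lambda m: dot(info.pattern2, m.pattern4))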
 (Appendix 6)
 The behavior recognition device according to Appendix 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and the plurality of models including the fourth pattern having the highest goodness of fit include a model containing an element whose duration is shorter than a predetermined time, the behavior identification unit extracts, from among the models including the fourth pattern having the highest goodness of fit, a model containing no element whose duration is shorter than the predetermined time.
 (Appendix 7)
 The behavior recognition device according to Appendix 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and all of the plurality of models including the fourth pattern having the highest goodness of fit contain an element whose duration is shorter than a predetermined time, the behavior identification unit determines the behavior applied in the previous frame to be the behavior in the current frame.
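 A non-limiting sketch of the duration handling of Appendices 6 and 7: among candidate models tied on the fourth-pattern fit, prefer one whose contained elements all last at least a predetermined time; if every candidate contains a too-short element, reuse the behavior applied in the previous frame. Treating a duration of zero as "element not contained in the model" is an assumption made only for this illustration.

    from typing import List

    def resolve_duration_tie(candidates: List["Model"], min_duration: float,
                             prev_behavior: str) -> str:
        # A duration of 0 is taken to mean the element is not contained in the model.
        def has_short_element(m: "Model") -> bool:
            return any(0 < d < min_duration for d in m.pattern4)

        long_enough = [m for m in candidates if not has_short_element(m)]
        if long_enough:
            # Appendix 6: extract a model with no element shorter than the predetermined time.
            return long_enough[0].behavior
        # Appendix 7: every candidate contains a too-short element, so keep the
        # behavior applied in the previous frame.
        return prev_behavior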
 (Appendix 8)
 The behavior recognition device according to any one of Appendices 1 to 7, wherein the information on the behavior estimated by each of the plurality of models is information given by a user as an evaluation corresponding to the specific situation.
 (Appendix 9)
 The behavior recognition device according to any one of Appendices 1 to 8, wherein the image is a moving image including images of a plurality of frames, and the situation information data generation unit generates the situation information data for each of the images of the plurality of frames.
 (Appendix 10)
 The behavior recognition device according to any one of Appendices 1 to 9, further comprising a situation learning unit that learns, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
 wherein the situation learning unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as learning target data, and a learning unit that trains the neural network unit,
 the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values, and
 the learning unit updates the weighting coefficients of the plurality of input nodes of a learning cell or adds a new learning cell to the neural network unit in accordance with the output value of the learning cell.
 (Appendix 11)
 The behavior recognition device according to Appendix 10, wherein the learning unit updates the weighting coefficients of the plurality of input nodes of the learning cell when a correlation value between the plurality of element values and the output value of the learning cell is equal to or greater than a predetermined threshold.
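 A non-limiting sketch of the learning step of Appendices 10 and 11: each learning cell is represented here simply by its input-node weight vector, its output node emits the weighted sum of the element values, and a cosine-style correlation between the input and the weights stands in for the correlation used in the embodiments. The update rate of 0.1 and all names are assumptions for illustration only.

    from typing import List
    import numpy as np

    def learning_step(cells: List[np.ndarray], x: np.ndarray,
                      corr_threshold: float) -> List[np.ndarray]:
        # x holds the element values of the situation; each cell holds its input-node weights.
        if not cells:
            cells.append(x.copy())          # the first sample becomes the first learning cell
            return cells
        corrs = [float(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12))
                 for w in cells]
        best = int(np.argmax(corrs))
        if corrs[best] >= corr_threshold:
            # Correlation at or above the threshold: update the weighting coefficients of that cell.
            cells[best] = cells[best] + 0.1 * (x - cells[best])
        else:
            # Otherwise add a new learning cell to the neural network unit.
            cells.append(x.copy())
        return cells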
 (Appendix 12)
 The behavior recognition device according to any one of Appendices 1 to 9, further comprising a situation identification unit that identifies, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
 wherein the situation identification unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as identification target data, and an identification unit that identifies the identification target data based on an output of the neural network unit,
 the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values,
 each of the plurality of learning cells is associated with one of a plurality of categories indicating teacher information,
 the plurality of input nodes of a learning cell are configured such that each of the plurality of element values is input with a predetermined weight corresponding to the associated category,
 the identification unit estimates, based on the output value of the learning cell and the category associated with the learning cell, the category to which the identification target data belongs as the behavior of the person estimated from the situation, and
 the situation information data generation unit generates the situation information data based on a result estimated by the situation identification unit.
 (Appendix 13)
 The behavior recognition device according to Appendix 12, wherein the identification unit estimates the category associated with the learning cell having the largest correlation value between the plurality of element values and the output value of the learning cell as the behavior of the person estimated from the situation.
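 A non-limiting sketch of the identification step of Appendices 12 and 13: every learning cell is tied to a teacher category, and the category of the cell whose weights correlate most strongly with the input element values is returned as the estimated behavior. The correlation measure and all names are illustrative assumptions, reusing the cell representation assumed in the sketch after Appendix 11.

    from typing import List
    import numpy as np

    def identify_category(cells: List[np.ndarray], categories: List[str],
                          x: np.ndarray) -> str:
        # categories[i] is the teacher category associated with cells[i].
        corrs = [float(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12))
                 for w in cells]
        return categories[int(np.argmax(corrs))]   # behavior estimated from the situation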
 (Appendix 14)
 A behavior recognition method comprising:
 generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
 extracting, from a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, the model having the highest goodness of fit to the situation information data;
 determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold; and
 determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
 (Appendix 15)
 A program causing a computer to function as:
 means for generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
 means for storing a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another; and
 means for extracting, from the usage learning model, the model having the highest goodness of fit to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
 (Appendix 16)
 A computer-readable recording medium on which the program according to Appendix 15 is recorded.
 This application claims priority based on Japanese Patent Application No. 2020-005536 filed on January 17, 2020, the entire disclosure of which is incorporated herein by reference.
42, 44 ... cell
46 ... learning cell
100 ... image acquisition unit
200 ... situation grasping unit
300 ... situation learning/identification unit
310 ... situation information data generation unit
320 ... neural network unit
330 ... judgment unit
340 ... learning unit
342 ... weight correction unit
344 ... learning cell generation unit
350 ... identification unit
360 ... output unit
400 ... usage learning unit
410 ... situation information data acquisition unit
420 ... evaluation acquisition unit
430 ... usage learning model generation unit
440 ... behavior identification unit
450 ... storage unit
500 ... CPU
502 ... main storage unit
504 ... communication unit
506 ... input/output interface unit
508 ... system bus
510 ... output device
512 ... input device
514 ... storage device

Claims (16)

  1. A behavior recognition device comprising:
     a situation information data generation unit that generates situation information data based on a situation of a subject, including a person, in an image of the subject;
     a storage unit that stores a usage learning model; and
     a behavior identification unit that identifies a behavior of the person based on the situation information data and the usage learning model,
     wherein the situation information data generation unit generates the situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another,
     the usage learning model includes a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, and
     the behavior identification unit extracts, from the plurality of models of the usage learning model, the model having the highest goodness of fit to the situation information data, determines the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determines the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  2. The behavior recognition device according to claim 1, wherein the behavior identification unit extracts, from the plurality of models of the usage learning model, a model including the third pattern having the highest goodness of fit to the first pattern of the situation information data.
  3. The behavior recognition device according to claim 2, wherein the behavior identification unit determines that the goodness of fit of the third pattern to the first pattern is higher as the inner product value between the element values of the first pattern and the element values of the third pattern is larger.
  4. The behavior recognition device according to claim 2 or 3, wherein, when a plurality of models include the third pattern having the highest goodness of fit to the first pattern of the situation information data, the behavior identification unit extracts, from among the models including the third pattern having the highest goodness of fit, a model including the fourth pattern having the highest goodness of fit to the second pattern of the situation information data.
  5. The behavior recognition device according to claim 4, wherein the behavior identification unit determines that the goodness of fit of the fourth pattern to the second pattern is higher as the inner product value between the element values of the second pattern and the element values of the fourth pattern is larger.
  6. The behavior recognition device according to claim 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and the plurality of models including the fourth pattern having the highest goodness of fit include a model containing an element whose duration is shorter than a predetermined time, the behavior identification unit extracts, from among the models including the fourth pattern having the highest goodness of fit, a model containing no element whose duration is shorter than the predetermined time.
  7. The behavior recognition device according to claim 4 or 5, wherein, when a plurality of models include the fourth pattern having the highest goodness of fit to the second pattern of the situation information data and all of the plurality of models including the fourth pattern having the highest goodness of fit contain an element whose duration is shorter than a predetermined time, the behavior identification unit determines the behavior applied in the previous frame to be the behavior in the current frame.
  8. The behavior recognition device according to any one of claims 1 to 7, wherein the information on the behavior estimated by each of the plurality of models is information given by a user as an evaluation corresponding to the specific situation.
  9. The behavior recognition device according to any one of claims 1 to 8, wherein the image is a moving image including images of a plurality of frames, and the situation information data generation unit generates the situation information data for each of the images of the plurality of frames.
  10. The behavior recognition device according to any one of claims 1 to 9, further comprising a situation learning unit that learns, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
     wherein the situation learning unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as learning target data, and a learning unit that trains the neural network unit,
     the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values, and
     the learning unit updates the weighting coefficients of the plurality of input nodes of a learning cell or adds a new learning cell to the neural network unit in accordance with the output value of the learning cell.
  11. The behavior recognition device according to claim 10, wherein the learning unit updates the weighting coefficients of the plurality of input nodes of the learning cell when a correlation value between the plurality of element values and the output value of the learning cell is equal to or greater than a predetermined threshold.
  12. The behavior recognition device according to any one of claims 1 to 9, further comprising a situation identification unit that identifies, based on the situation of the subject in the image, the behavior of the person estimated from the situation,
     wherein the situation identification unit includes a neural network unit to which an element value of each of the plurality of elements representing the situation is input as identification target data, and an identification unit that identifies the identification target data based on an output of the neural network unit,
     the neural network unit includes a plurality of learning cells each including a plurality of input nodes that apply predetermined weights to the plurality of element values and an output node that adds and outputs the weighted element values,
     each of the plurality of learning cells is associated with one of a plurality of categories indicating teacher information,
     the plurality of input nodes of a learning cell are configured such that each of the plurality of element values is input with a predetermined weight corresponding to the associated category,
     the identification unit estimates, based on the output value of the learning cell and the category associated with the learning cell, the category to which the identification target data belongs as the behavior of the person estimated from the situation, and
     the situation information data generation unit generates the situation information data based on a result estimated by the situation identification unit.
  13. The behavior recognition device according to claim 12, wherein the identification unit estimates the category associated with the learning cell having the largest correlation value between the plurality of element values and the output value of the learning cell as the behavior of the person estimated from the situation.
  14. A behavior recognition method comprising:
     generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
     extracting, from a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another, the model having the highest goodness of fit to the situation information data;
     determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold; and
     determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  15. A program causing a computer to function as:
     means for generating, based on a situation of a subject, including a person, in an image of the subject, situation information data in which a first pattern mapping a relationship between a plurality of elements representing the situation and information representing degrees of the elements, a second pattern mapping a relationship between the plurality of elements and information representing durations of the elements, and a behavior of the person estimated from the situation are associated with one another;
     means for storing a usage learning model including a plurality of models in each of which a third pattern mapping, for a specific situation, a relationship between the plurality of elements and information representing the degrees of the elements, a fourth pattern mapping a relationship between the plurality of elements and information representing the durations of the elements, and a behavior of the person estimated from the specific situation are associated with one another; and
     means for extracting, from the usage learning model, the model having the highest goodness of fit to the situation information data, determining the behavior estimated by the extracted model to be the behavior of the person when the goodness of fit of the extracted model is equal to or greater than a predetermined threshold, and determining the behavior estimated by the situation information data to be the behavior of the person when the goodness of fit of the extracted model is less than the predetermined threshold.
  16. A computer-readable recording medium on which the program according to claim 15 is recorded.
PCT/JP2020/048361 2020-01-17 2020-12-24 Behavior recognition device, behavior recognition method, program, and recording medium WO2021145185A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021571127A JP7231286B2 (en) 2020-01-17 2020-12-24 Action recognition device, action recognition method, program and recording medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-005536 2020-01-17
JP2020005536 2020-01-17

Publications (1)

Publication Number Publication Date
WO2021145185A1 true WO2021145185A1 (en) 2021-07-22

Family

ID=76863684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/048361 WO2021145185A1 (en) 2020-01-17 2020-12-24 Behavior recognition device, behavior recognition method, program, and recording medium

Country Status (2)

Country Link
JP (1) JP7231286B2 (en)
WO (1) WO2021145185A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019128804A (en) * 2018-01-24 2019-08-01 株式会社日立製作所 Identification system and identification method
WO2019240047A1 (en) * 2018-06-11 2019-12-19 Necソリューションイノベータ株式会社 Behavior learning device, behavior learning method, behavior learning system, program, and recording medium

Also Published As

Publication number Publication date
JP7231286B2 (en) 2023-03-01
JPWO2021145185A1 (en) 2021-07-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20914650; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021571127; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20914650; Country of ref document: EP; Kind code of ref document: A1)