CN113850114A - Motion recognition device, motion recognition method, and storage medium


Info

Publication number
CN113850114A
Authority
CN
China
Prior art keywords
start point
coordinates
distance
operator
common
Prior art date
Legal status
Pending
Application number
CN202110516167.3A
Other languages
Chinese (zh)
Inventor
内田滋穂里
西行健太
Current Assignee
Omron Corp
Original Assignee
Omron Corp
Priority date
Filing date
Publication date
Application filed by Omron Corp
Publication of CN113850114A

Landscapes

  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided are a motion recognition device, a motion recognition method, and a storage medium capable of individually recognizing common motions performed in different work processes. The motion recognition device (10) includes: an acquisition unit (11) that acquires motion information including the elapsed time of a work, a common motion to be recognized that is commonly included in different work processes, and skeleton data of a worker; a conversion unit (12) that, based on the coordinates corresponding to each of a plurality of body parts included in the skeleton data, calculates the distance between each coordinate and a start point coordinate corresponding to a start point part serving as the starting point among the body parts, and converts each coordinate included in the skeleton data into the calculated distance; and a learning unit (13) that learns a model for motion recognition that outputs information indicating a motion of the worker belonging to any one of the common motions, based on the motion information in which the coordinates have been converted into distances by the conversion unit.

Description

Motion recognition device, motion recognition method, and storage medium
Technical Field
The invention relates to a motion recognition device, a motion recognition method and a storage medium.
Background
Conventionally, cameras, sensors, and the like are installed on production lines in factories and the like, and the operations of workers are recognized and evaluated based on operation information obtained from those devices. For example, in patent document 1 below, whether a non-standard operation deviating from the standard operation has occurred is determined based on information obtained by motion capture or the like.
In patent document 1, an operation performed in a particular work step is assumed, and it is determined whether that operation is abnormal.
Patent document 1: international publication No. 2018/131630
Disclosure of Invention
However, even when the operations performed in different work steps belong to the same operation type, the concrete movements often differ from step to step. For example, the process of fastening a screw and the process of fitting a housing both include a common operation of moving a hand to a work object and grasping (holding) it, but the grasping movements themselves differ. Specifically, in the screw-fastening step, for example, a component placed in a component box is gripped with one hand, while an electric screwdriver suspended by a cable in front of and above the operator is gripped with the other hand. In contrast, in the housing-fitting step, for example, a housing placed in a component box is gripped with both hands.
In patent document 1, when a common operation whose concrete movements differ for each work process is set as the recognition target, the common operation performed in one particular work process can be recognized. However, to recognize the same common operation performed in another work process, a separate design or the like is required so that the operation in that other process can also be recognized.
Accordingly, the present invention provides an operation recognition device, an operation recognition method, and an operation recognition program capable of recognizing common operations performed in different work processes.
A motion recognition device according to an aspect of the present invention includes: an acquisition unit that acquires motion information including the elapsed time of a work, a common motion to be recognized that is commonly included in different work processes, and skeleton data of a worker; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in the skeleton data, calculates the distance between each coordinate and a start point coordinate corresponding to a start point part serving as the starting point among the body parts, and converts each coordinate included in the skeleton data into the calculated distance; and a learning unit that learns a model for motion recognition that outputs information indicating a motion of the worker belonging to any one of the common motions, based on the motion information in which the coordinates have been converted into distances by the conversion unit.
According to this aspect, the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data and the start point coordinates corresponding to the start point part can be calculated, and each coordinate of the bone data can be replaced with the distance obtained by the calculation. Further, it is possible to learn a model for motion recognition that outputs information indicating a motion of an operator belonging to any one of the common motions based on motion information obtained by replacing each coordinate of the skeleton data with the distance data.
An operation recognition device according to another aspect of the present invention includes: an acquisition unit that acquires time-series information relating to an operation of an operator; a conversion unit that calculates a distance between each coordinate and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point of the body portion, and converts each coordinate included in the bone data into the calculated distance, based on coordinates corresponding to the plurality of body portions included in the bone data extracted from the time-series information; and an operation recognition unit that inputs the distance converted by the conversion unit into a learned model and recognizes an operation of the operator based on information indicating the operation of the operator output from the learned model, wherein the operation of the operator belongs to any one of common operations to be recognized that are commonly included in different work processes.
According to this aspect, it is possible to calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data extracted from the time-series information of the operator to be identified and the start point coordinates corresponding to the start point part, and replace each coordinate of the bone data with the distance obtained by the calculation. Then, by inputting the distance data after replacement into the learned model, it is possible to recognize the movement of the operator belonging to any one of the common movements.
In each of the above embodiments, the starting point may be either the neck or the waist.
This makes it possible to determine the starting point position near the center of the movement of the operator to be identified.
A motion recognition method according to another aspect of the present invention is a motion recognition method executed by a processor, including the steps of: acquiring operation information including elapsed time of a work, a common operation to be identified commonly included in different work processes, and skeleton data of a worker; calculating a distance between each coordinate and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, and converting each coordinate included in the bone data into the calculated distance; and learning a model for motion recognition, the model being: and outputting information indicating an action of an operator, which belongs to any one of the common actions, based on the action information obtained by converting the coordinates into the distance.
According to this aspect, the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data and the start point coordinates corresponding to the start point part can be calculated, and each coordinate of the bone data can be replaced with the distance obtained by the calculation. Further, it is possible to learn a model for motion recognition that outputs information indicating a motion of an operator belonging to any one of the common motions based on motion information obtained by replacing each coordinate of the skeleton data with the distance data.
A motion recognition method according to another aspect of the present invention is a motion recognition method executed by a processor, including the steps of: acquiring time-series information related to the motion of an operator; calculating, based on the coordinates corresponding to each of a plurality of body parts included in the skeleton data extracted from the time-series information, the distance between each coordinate and a start point coordinate corresponding to a start point part serving as the starting point among the body parts, and converting each coordinate included in the skeleton data into the calculated distance; and inputting the converted distance into a learned model and recognizing, based on information indicating the motion of the operator output from the learned model, a motion of the operator belonging to any one of common motions to be recognized that are commonly included in different work processes.
According to this aspect, it is possible to calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data extracted from the time-series information of the operator to be identified and the start point coordinates corresponding to the start point part, and replace each coordinate of the bone data with the distance obtained by the calculation. Then, by inputting the distance data after replacement into the learned model, it is possible to recognize the movement of the operator belonging to any one of the common movements.
A storage medium according to another aspect of the present invention stores a motion recognition program for causing a computer to function as: an acquisition unit that acquires motion information including the elapsed time of a work, a common motion to be recognized that is commonly included in different work processes, and skeleton data of a worker; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in the skeleton data, calculates the distance between each coordinate and a start point coordinate corresponding to a start point part serving as the starting point among the body parts, and converts each coordinate included in the skeleton data into the calculated distance; and a learning unit that learns a model for motion recognition that outputs information indicating a motion of the worker belonging to any one of the common motions, based on the motion information in which the coordinates have been converted into distances by the conversion unit.
According to this aspect, the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data and the start point coordinates corresponding to the start point part can be calculated, and each coordinate of the bone data can be replaced with the distance obtained by the calculation. Further, it is possible to learn a model for motion recognition that outputs information indicating a motion of an operator belonging to any one of the common motions based on motion information obtained by replacing each coordinate of the skeleton data with the distance data.
A storage medium according to another aspect of the present invention stores a motion recognition program for causing a computer to function as: an acquisition unit that acquires time-series information on the motion of an operator; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in the skeleton data extracted from the time-series information, calculates the distance between each coordinate and a start point coordinate corresponding to a start point part serving as the starting point among the body parts, and converts each coordinate included in the skeleton data into the calculated distance; and a motion recognition unit that inputs the distance converted by the conversion unit into a learned model and recognizes, based on information indicating the motion of the operator output from the learned model, a motion of the operator belonging to any one of common motions to be recognized that are commonly included in different work processes.
According to this aspect, it is possible to calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the bone data extracted from the time-series information of the operator to be identified and the start point coordinates corresponding to the start point part, and replace each coordinate of the bone data with the distance obtained by the calculation. Then, by inputting the distance data after replacement into the learned model, it is possible to recognize the movement of the operator belonging to any one of the common movements.
According to the present invention, it is possible to provide an operation recognition device, an operation recognition method, and an operation recognition program capable of recognizing a common operation performed in different work processes.
Drawings
Fig. 1 is a diagram illustrating an outline of a motion recognition system according to an embodiment of the present invention.
Fig. 2 is a schematic view showing an example of a holding operation in a process of fastening a screw.
Fig. 3 is a schematic view showing an example of a holding operation in the process of fitting the housing.
Fig. 4 is a diagram illustrating a functional configuration of the motion recognition system and the motion recognition device.
Fig. 5 is a diagram showing an example of the operation information stored in the operation recognition device.
Fig. 6 is a diagram schematically showing a state in which the distance between the coordinates corresponding to the neck of the bone data and each of the other coordinates is calculated.
Fig. 7 is a diagram illustrating a hardware configuration of the motion recognition device.
Fig. 8 is a flowchart for explaining an example of the operation of the motion recognition device in the learning mode.
Fig. 9 is a flowchart for explaining the order of the conversion processing shown in fig. 8.
Fig. 10 is a flowchart for explaining an example of an operation in the operation recognition mode of the operation recognition device.
Detailed Description
Embodiments according to one aspect of the present invention (hereinafter referred to as "the present embodiment") will be described below with reference to the drawings. In the drawings, portions denoted by the same reference numerals have the same or similar structure.
§ 1 Application example
First, an example of a scenario to which the present invention is applied will be described with reference to fig. 1. In the motion recognition system 100 according to the present embodiment, the image sensors 20a, 20b, and 20c capture the motions of the operator A performed in a certain work area R, and the motion recognition device 10, which acquires the captured video, recognizes common motions of the operator A using a learned model. Here, a common motion is a motion commonly included in different work steps (a motion belonging to the same motion type). Such common motions include, for example, holding, conveyance, and adjustment, which are performed in common across different work steps.
Holding is defined as a motion of moving a hand to a work object and grasping it, for example, grasping a part or a tool. Conveyance is defined as a motion of moving the work object to a destination, for example, moving the work object to a carrier for assembled products or to an assembly jig. Adjustment is defined as a motion of bringing the work toward a target state, for example, assembling components together.
In the present embodiment, the case where the common operation is the holding operation is described as an example, but the present invention can be similarly applied to other common operations such as conveyance and adjustment.
The learned model is a model for motion recognition that has been learned as follows: the information processing apparatus receives, as input, motion information including distance data and the like generated based on skeleton data of an operator, and outputs information indicating the motion of the operator. The distance data input at the time of learning is data indicating a distance between coordinates corresponding to each of the plurality of body parts included in the skeleton data of the operator and start point coordinates corresponding to a start point part that is a start point among the body parts.
The starting point can be, for example, the neck, waist, or the like. Here, it is preferable that the neck portion is set as the starting point portion when the upper body of the operator is the recognition target, and the waist portion is set as the starting point portion when the entire body of the operator is the recognition target. In this way, it is preferable to determine the starting point portion in the vicinity of the center of the movement of the operator to be recognized.
The distance data will be further described with reference to fig. 2 and 3. Fig. 2 illustrates a motion in the screw-fastening process in which the operator Aa extends the right hand toward an electric screwdriver D suspended by a cable in front of and above the operator Aa. Fig. 3 illustrates a motion in the housing-fitting process in which the operator Ab extends both hands to grasp a housing C placed in a component box.
In both fig. 2 and fig. 3, the operators Aa and Ab perform holding motions. However, as holding motions, the movement of the right hand of the operator Aa and that of the operator Ab are completely different. To recognize such holding motions, it has conventionally been necessary to generate a separate learning model or the like for each work process.
In contrast, the present invention can recognize the holding motion performed in each of the different work processes by replacing each coordinate included in the operator's skeleton data with its distance from a certain body part, and learning by associating the characteristic changes of those distances with the operator's holding motion. This is described concretely below.
In fig. 2 and 3, when the right-hand movement of each operator Aa, Ab is analyzed relative to the neck, the movement of the right hand in a direction away from the neck is a feature common to both holding motions. In the screw-fastening process shown in fig. 2, when the operator Aa grips the electric screwdriver D with the right hand, the right hand moves away from the neck. Similarly, in the housing-fitting process shown in fig. 3, when the operator Ab grasps the housing C, the right hand likewise moves away from the neck.
Therefore, by learning the right-hand distance data in association with the holding motion of each work process, the motion of the operator holding the work object with the right hand can be recognized with high accuracy in each work process.
As described above, according to the motion recognition device 10 of the present embodiment, it is possible to recognize common motions such as a gripping motion performed in different work processes.
§ 2 Configuration example
[ functional Structure ]
Next, an example of the functional configurations of the motion recognition system 100 and the motion recognition device 10 according to the present embodiment will be described with reference to fig. 4. The motion recognition system 100 includes three image sensors 20a, 20b, and 20c and the motion recognition device 10. Hereinafter, the three image sensors 20a, 20b, and 20c are collectively referred to as the image sensor 20 unless they need to be distinguished. The motion recognition device 10 includes, as functional components, for example, an acquisition unit 11, a conversion unit 12, a learning unit 13, a motion recognition unit 14, and a storage unit 19. The storage unit 19 stores, for example, a video 19a, motion information 19b, and a learned model 19c.
Here, in the present embodiment, a case will be described where the motion recognition device 10 has a function (learning mode) of learning a model for motion recognition and a function (motion recognition mode) of recognizing a motion of an operator, but the functions may be separately provided in separate devices.
The details of the functional configurations of the motion recognition system 100 and the motion recognition device 10 will be described in turn.
< image sensor >
The image sensor 20 is, for example, a general-purpose camera, and captures video that includes scenes of the operator A working in the work area R. The image sensor 20 includes, for example, a detection unit as a functional component. The detection unit detects the movement of the operator A and outputs video representing that movement as time-series information.
Here, the time-series information is not limited to video. For example, it may be information on coordinates representing the movement of the operator A measured by a motion capture device provided in place of the image sensor 20.
The image sensors 20a, 20b, and 20c are arranged so as to capture the entire work area R and the entire body of the operator A. For example, each of the image sensors 20a, 20b, and 20c may be arranged so as to capture the entire work area R and the entire body of the operator A by itself, or each may capture the entire work area R and only a part of the operator A, with the entire work area R and the entire body of the operator A covered by combining their fields of view. The image sensors 20a, 20b, and 20c may also capture the work area R and the operator A at different magnifications. Three image sensors are not required; it suffices to provide at least one.
< acquisition part >
The acquisition unit 11 acquires time-series information (video in the present embodiment) on the motions performed by the operator A from the image sensor 20. The time-series information acquired by the acquisition unit 11 is transferred to the storage unit 19 and stored as the video 19a. The acquisition unit 11 also reads the video 19a stored in the storage unit 19.
The acquisition unit 11 extracts, from the frames of the video 19a, skeleton data representing the movement of the operator's skeleton. The skeleton data can be expressed as coordinates (x, y) corresponding to each of a plurality of body parts. In the present embodiment, the case of two-dimensional coordinates (x, y) is described, but the present invention is equally applicable to three-dimensional coordinates (x, y, z). In addition to the coordinate values, information indicating the confidence of each coordinate value may be attached.
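As an illustration only (the patent does not prescribe a data layout), skeleton data for one frame could be held as a mapping from body part to 2D coordinates, optionally extended with a per-joint confidence value; the joint names and numbers here are hypothetical:

```python
# Hypothetical skeleton data for one video frame: body part -> (x, y).
skeleton_frame = {
    "neck": (0.52, 0.31),
    "waist": (0.51, 0.62),
    "right_hand": (0.78, 0.55),
    "left_hand": (0.30, 0.58),
}

# Optionally carry a confidence score alongside each coordinate pair.
skeleton_with_conf = {part: (x, y, 0.9) for part, (x, y) in skeleton_frame.items()}
```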
The skeleton data is transferred to the storage unit 19 and stored as part of the motion information 19b. The acquisition unit 11 also reads the motion information 19b stored in the storage unit 19.
The motion information 19b will be described with reference to fig. 5. The motion information 19b includes, for example, an elapsed-time item, a right-hand motion item, a left-hand motion item, and a skeleton-data item as data fields. The elapsed-time item stores the elapsed time measured from the start of the first of all the work steps. The interval of the elapsed time can be set arbitrarily, for example per video frame, or per fixed period such as every 1 second.
The right-hand motion item stores information indicating to which of the common motions to be recognized the right-hand motion belongs, and the left-hand motion item stores the corresponding information for the left hand. The information stored in these items can be registered, for example, by checking each motion of the right and left hands at each elapsed time while referring to the video, and entering the confirmed motion.
The skeleton-data item of the motion information 19b stores the skeleton data extracted from the video at the corresponding elapsed time. The skeleton data stored in this item is replaced with distance data calculated by the conversion processing described later.
The first row of motion information shown in fig. 5 is based on the video at the point 1 second after the start of the first step. It stores information indicating that the operator's right-hand motion belongs to none of the common motions while the left-hand motion belongs to holding, together with the skeleton data extracted from the video at that time. The last row shown in fig. 5 is based on the video at the point 2 minutes 53 seconds after the start of the first step. It stores information indicating that the motions of both the right hand and the left hand belong to adjustment among the common motions, together with the skeleton data extracted from the video at that time.
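The motion-information rows described for fig. 5 might be encoded as follows; the field names, label strings, and coordinate values are assumptions for illustration, not taken from the patent:

```python
# Illustrative encoding of two rows of the motion-information table.
motion_info = [
    # 1 s after the first step starts: the right hand matches no common
    # motion; the left hand is performing "holding".
    {"elapsed_s": 1, "right_hand": None, "left_hand": "holding",
     "skeleton": {"neck": (0.5, 0.3), "left_hand": (0.4, 0.5)}},
    # 2 min 53 s in: both hands are performing "adjustment".
    {"elapsed_s": 173, "right_hand": "adjustment", "left_hand": "adjustment",
     "skeleton": {"neck": (0.5, 0.3), "left_hand": (0.45, 0.45)}},
]
```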
< converting part >
The conversion unit 12 shown in fig. 4 performs conversion processing that converts each coordinate included in the skeleton data into a distance from the start point coordinate. In the conversion processing, the distance between each coordinate and the start point coordinate corresponding to the start point part is calculated based on the coordinates corresponding to each of the plurality of body parts included in the skeleton data, and each coordinate included in the skeleton data is replaced with the calculated distance.
Here, the conversion unit 12 has a common function in the learning mode and the motion recognition mode, but the input skeleton data is different. In the learning mode, the distance is calculated based on the skeleton data included in the motion information generated using the video of the learning object. On the other hand, in the motion recognition mode, the distance is calculated based on the skeleton data extracted from the video of the operator to be recognized.
Fig. 6 schematically shows the neck set as the start point part a, with the distances between the start point coordinate corresponding to the neck and the coordinates corresponding to the other body parts being calculated.
By converting each coordinate of the skeleton data shown in fig. 6 into a distance from the start point coordinate, the following can be recognized. For example, if the distance corresponding to the right hand becomes longer over time, it can be recognized how far the right hand has moved (extended) away from the neck. Conversely, if the distance corresponding to the right hand becomes shorter over time, it can be recognized how far the right hand has moved (retracted) toward the neck.
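The extend/retract interpretation above can be sketched as a toy function over a right-hand distance series; the function name and labels are illustrative, not part of the patent:

```python
def hand_trend(distances):
    """Classify a joint's distance series relative to the start point:
    a growing distance means the hand is moving away ('extend'),
    a shrinking one means it is moving closer ('retract')."""
    delta = distances[-1] - distances[0]
    if delta > 0:
        return "extend"
    if delta < 0:
        return "retract"
    return "still"

trend = hand_trend([0.1, 0.2, 0.4])  # distance grows over time
```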
The distance d_i from the start point coordinate can be calculated, for example, by the following equation (1), where the start point coordinate corresponding to the start point part a is (x_a, y_a) and the coordinate corresponding to an arbitrary body part i is (x_i, y_i) (i is a variable). The coordinates corresponding to an arbitrary body part i may also include the start point coordinate itself.
d_i = (|x_i - x_a|^p + |y_i - y_a|^p)^(1/p) … equation (1)
The exponent p in equation (1) can be specified arbitrarily, for example as 1 or 2, but is not limited to these values.
Based on equation (1), the conversion unit 12 calculates the distance d_i and converts the coordinates (x_i, y_i) corresponding to an arbitrary body part i into, for example, (d_i, d_i).
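As a minimal sketch (not the patent's actual implementation), equation (1) and the coordinate-to-distance conversion might look like this in Python; the joint names are hypothetical, while the (d_i, d_i) replacement convention follows the text:

```python
def to_distances(skeleton, start_part="neck", p=2):
    """Apply equation (1): replace each joint's (x, y) with its Minkowski
    distance d_i to the start point, stored as (d_i, d_i) as in the text."""
    xa, ya = skeleton[start_part]
    converted = {}
    for part, (x, y) in skeleton.items():
        d = (abs(x - xa) ** p + abs(y - ya) ** p) ** (1.0 / p)
        converted[part] = (d, d)
    return converted

skeleton = {"neck": (0.0, 0.0), "right_hand": (3.0, 4.0), "left_hand": (-3.0, 4.0)}
dist = to_distances(skeleton, p=2)  # right hand is 5.0 away from the neck
```

With p=2 this is the Euclidean distance; p=1 gives the Manhattan distance, matching the note that p can be chosen freely.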
Here, the conversion unit 12 may further include an adjustment unit that performs adjustment processing on the skeleton data extracted from the video. The adjustment unit performs the adjustment processing before the conversion processing.
The adjustment processing includes, for example, time-series supplementation of the skeleton data, normalization by height (body type), time-series smoothing of the skeleton data, shifting of the skeleton data, and noise addition. Each of these processes is described below.
The time-series supplementation of the skeleton data is processing that fills in missing samples in the skeleton data based on temporally adjacent skeleton data. Missing data arises, for example, when a body part is occluded by the operator's posture and cannot be captured.
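A minimal stand-in for this supplementation step, assuming missing samples are marked as None and interpolated linearly from the nearest known neighbors (the patent does not specify the interpolation method, and this sketch assumes the first and last samples are present):

```python
def fill_missing(track):
    """Fill None samples of one joint coordinate by linear interpolation
    between the nearest preceding and following known samples.
    Assumes the first and last samples of the track are known."""
    known = [(i, v) for i, v in enumerate(track) if v is not None]
    out = list(track)
    for i, v in enumerate(track):
        if v is None:
            prev = max(k for k in known if k[0] < i)  # nearest known before i
            nxt = min(k for k in known if k[0] > i)   # nearest known after i
            frac = (i - prev[0]) / (nxt[0] - prev[0])
            out[i] = prev[1] + frac * (nxt[1] - prev[1])
    return out

filled = fill_missing([1.0, None, 3.0, None, 5.0])
```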
The normalization by height (body type) is processing that normalizes the skeleton data with respect to body size in order to absorb body-size differences such as those between men and women. As the normalization, for example, the skeleton data is divided by the length of the trunk (for example, the length from the nose to the waist) to generate normalized data.
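One way to sketch this normalization, assuming the trunk length is the nose-to-waist distance as in the example above (joint names are illustrative):

```python
import math

def normalize_by_trunk(skeleton, top="nose", bottom="waist"):
    """Divide every coordinate by the trunk length (here: nose-to-waist
    distance) to absorb body-size differences between operators."""
    (x1, y1), (x2, y2) = skeleton[top], skeleton[bottom]
    trunk = math.hypot(x2 - x1, y2 - y1)
    return {part: (x / trunk, y / trunk) for part, (x, y) in skeleton.items()}

sk = {"nose": (0.0, 0.0), "waist": (0.0, 2.0), "right_hand": (1.0, 1.0)}
norm = normalize_by_trunk(sk)  # trunk length 2.0, so all coordinates halve
```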
The time-series smoothing of the skeleton data removes noise from its changes along the time axis. As the smoothing, for example, a Gaussian filter is applied to the skeleton data to generate smoothed data.
The shift processing of the skeleton data aligns the skeleton data to a common origin, unifying the reference point so that inconsistencies between motions are suppressed and the features of a motion are easier to recognize. For example, the entire skeleton is translated so that the neck joint is located at the origin, generating the shifted skeleton data.
The noise addition processing adds noise to the skeleton data to augment it virtually. The noise applied to the skeleton data may be generated, for example, as random values within a range plausible for skeleton data.
Here, it is preferable that the adjustment unit executes all of the above adjustment processes in the learning mode, and executes all of them except the noise addition processing in the motion recognition mode.
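The adjustment processing above (normalization, smoothing, shift, and learning-mode-only noise addition) can be sketched roughly as follows. All names and parameter values are illustrative assumptions, and a simple moving average stands in for the Gaussian filter:

```python
import numpy as np

def adjust(frames, trunk_len, neck_idx, add_noise=False, rng=None):
    """frames: (T, parts, 2) time series of skeleton coordinates.
    Illustrative sketch only; not the patent's implementation."""
    out = frames / trunk_len                         # body-shape normalization by trunk length
    kernel = np.ones(3) / 3.0                        # moving-average stand-in for the
    out = np.apply_along_axis(                       # Gaussian time-series smoothing
        lambda v: np.convolve(v, kernel, mode="same"), 0, out)
    out = out - out[:, neck_idx:neck_idx + 1, :]     # shift: neck joint to the origin
    if add_noise:                                    # noise augmentation, learning mode only
        rng = rng or np.random.default_rng(0)
        out = out + rng.normal(scale=0.01, size=out.shape)
    return out
```

Passing `add_noise=True` only in the learning mode mirrors the preference stated above; time-series interpolation of missing parts is omitted from this sketch.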
< learning unit >
The learning unit 13 shown in fig. 4 functions in the learning mode. Based on the motion information 19b obtained by the conversion unit 12 converting each coordinate of the skeleton data into a distance, the learning unit 13 generates (learns) a model for motion recognition that outputs information indicating which of the common motions the worker's motion belongs to.
The model learned by the learning unit 13 is passed to the storage unit 19 and stored as the learned model 19c.
< motion recognition unit >
The motion recognition unit 14 functions in the motion recognition mode. It inputs the distance data, obtained by the conversion unit 12 converting each coordinate of the skeleton data of the worker to be recognized, into the learned model 19c, and recognizes the worker's motion from the model's output indicating which of the common motions the motion belongs to. A known motion recognition method such as ST-GCN (Spatial Temporal Graph Convolutional Networks) can be used for recognizing the worker's motion.
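A minimal sketch of the recognition-mode flow follows. The toy model below merely stands in for the learned model 19c; the actual model (for example an ST-GCN) is not reproduced here, and all names are assumptions:

```python
import numpy as np

def recognize(distance_seq, model):
    """`model` is assumed to map a distance-data sequence to one
    score per common motion; return the best-scoring motion index."""
    scores = model(distance_seq)
    return int(np.argmax(scores))   # index of the recognized common motion

# Toy stand-in model (NOT the patent's learned model): scores the second
# common motion by the mean distance value of the input sequence.
toy_model = lambda seq: np.array([1.0, float(seq.mean())])
```

In the actual device, `model` would be the learned model 19c loaded from the storage unit 19, and `distance_seq` the output of the conversion unit 12.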
[ hardware configuration ]
Next, an example of the hardware configuration of the motion recognition device 10 according to the present embodiment is described with reference to fig. 7. The motion recognition device 10 includes a CPU (Central Processing Unit) 10a corresponding to an arithmetic device, a RAM (Random Access Memory) 10b and a ROM (Read Only Memory) 10c corresponding to the storage unit 19, a communication device 10d, an input device 10e, and a display device 10f. These components are connected via a bus so that they can exchange data with one another. Although this embodiment describes the motion recognition device 10 as a single computer, it may be realized by a plurality of computers.
The CPU 10a functions as a control unit that executes programs stored in the RAM 10b or the ROM 10c to perform data calculation and processing. The CPU 10a receives various input data from the input device 10e and the communication device 10d, and displays the calculation results on the display device 10f or stores them in the RAM 10b or the ROM 10c.
The RAM 10b is composed of, for example, semiconductor memory elements and stores rewritable data. The ROM 10c is composed of, for example, semiconductor memory elements and stores readable data that cannot be rewritten.
The communication device 10d is an interface for connecting the motion recognition device 10 to external devices. It is connected to the image sensor 20 via a communication network such as a Local Area Network (LAN) or the Internet, and receives video from the image sensor 20.
The input device 10e is an interface for accepting input of data from a user, and can include, for example, a keyboard, a mouse, and a touch panel.
The display device 10f is an interface for visually displaying the calculation results of the CPU 10a, and may be configured by, for example, an LCD (Liquid Crystal Display).
The program may be provided on a computer-readable storage medium such as the RAM 10b or the ROM 10c, or via a communication network connected through the communication device 10d. In the motion recognition device 10, the CPU 10a executes the program to realize the operations of the acquisition unit 11, the conversion unit 12, the learning unit 13, and the motion recognition unit 14 shown in fig. 4. These physical configurations are examples and need not be independent; for example, the motion recognition device 10 may be an LSI (Large-Scale Integration) chip in which the CPU 10a, the RAM 10b, and the ROM 10c are integrated.
[ operation example ]
Fig. 8 is a flowchart showing an example of the operation of the motion recognition device 10 according to the present embodiment in the learning mode. This operation learns a model for motion recognition using motion information 19b obtained by converting each coordinate included in the skeleton data extracted from the learning video into a distance from the start point coordinates.
First, the acquisition unit 11 acquires from the storage unit 19 the motion information 19b generated based on the learning video 19a (step S101). The motion information 19b is generated from video capturing a worker performing work according to a work process that includes common motions, and its skeleton data item stores the worker's skeleton data corresponding to the video.
Next, the conversion unit 12 performs conversion processing that converts each coordinate included in the skeleton data into a distance from the start point coordinates (step S102). The procedure of this conversion processing is described later.
Next, the learning unit 13 learns a model for motion recognition that outputs information indicating which of the common motions the worker's motion belongs to, based on the motion information 19b including the distance data converted in step S102 (step S103). This operation then ends.
The procedure of the conversion processing performed in step S102 is described with reference to fig. 9. In this conversion processing, the neck part is set in advance as the start point part.
First, the acquisition unit 11 extracts one set of skeleton data per frame from the images of the video 19a, which has, for example, F frames (step S201). In other words, F sets of skeleton data are extracted by the acquisition unit 11.
Next, the conversion unit 12 calculates, based on equation (1) above, the distances between the coordinates corresponding to each of the plurality of body parts included in one set of skeleton data and the start point coordinates corresponding to the start point part (step S202).
Next, the conversion unit 12 converts each coordinate included in the skeleton data into the distance calculated in step S202 (step S203). The skeleton data item of the motion information 19b thus stores distance data.
Next, the conversion unit 12 determines whether the coordinates of all F sets of skeleton data have been converted into distances (step S204). If not, the process returns to step S202; if so, the conversion processing ends.
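The loop over steps S201 to S204 can be sketched as follows, one skeleton set per frame; names and array shapes are illustrative assumptions:

```python
import numpy as np

def convert_video(frames, start_idx, p=2):
    """frames: F skeleton sets, one per video frame, shape (F, parts, 2).
    Sketch of steps S202-S204; not the patent's implementation."""
    distance_data = []
    for skeleton in frames:                        # repeat until all F sets are done (S204)
        start = skeleton[start_idx]                # start point (neck) coordinates
        diffs = np.abs(skeleton - start)
        d = (diffs ** p).sum(axis=1) ** (1.0 / p)  # distances per equation (1) (S202)
        distance_data.append(np.stack([d, d], axis=1))  # coordinates -> distances (S203)
    return np.array(distance_data)                 # (F, parts, 2) distance data
```

The returned array plays the role of the distance data stored in the skeleton data item of the motion information 19b.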
Fig. 10 is a flowchart showing an example of the operation of the motion recognition device 10 according to the present embodiment in the motion recognition mode. This operation assumes that the video 19a capturing the motion of the worker to be recognized is already stored in the storage unit 19.
First, the acquisition unit 11 acquires the video 19a for motion recognition from the storage unit 19 (step S301). The video 19a captures a worker performing work according to a work process that includes common motions. The worker's skeleton data corresponding to the video is stored in the skeleton data item of the motion information 19b.
Next, the conversion unit 12 performs conversion processing that converts each coordinate included in the skeleton data into a distance from the start point coordinates (step S302). The procedure of this conversion processing is the same as in fig. 9 described above, so its description is omitted.
Next, the motion recognition unit 14 inputs the distance data converted in step S302 into the learned model 19c, and recognizes the motion of the operator based on the information indicating the motion of the operator belonging to any one of the common motions output from the learned model 19c (step S303). Then, this operation is ended.
As described above, the motion recognition device 10 of the present embodiment can calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the skeleton data extracted from video of a worker to be learned and the start point coordinates corresponding to the start point part, and replace each coordinate of the skeleton data with the calculated distance. It can then learn a model for motion recognition that outputs information indicating which of the common motions the worker's motion belongs to, based on the motion information in which each coordinate of the skeleton data has been replaced by the distance data.
Likewise, the distance between the coordinates corresponding to each of the plurality of body parts included in the skeleton data extracted from video of a worker to be recognized and the start point coordinates corresponding to the start point part can be calculated, and the coordinates of the skeleton data replaced with the calculated distances. By inputting the replaced distance data into the learned model, the motion of the worker belonging to any one of the common motions can be recognized.
Therefore, the motion recognition device 10 of the present embodiment can individually recognize common motions performed in different work processes.
The present invention is not limited to the above-described embodiments, and can be implemented in various other forms without departing from the spirit of the present invention. For example, embodiments of the present invention can be described as follows. However, the embodiments of the present invention are not limited to the embodiments described in the following description. The embodiments of the present invention may be replaced or combined with the descriptions in the attached notes.
[ additional notes 1]
A motion recognition device (10) is provided with:
an acquisition unit (11) that acquires operation information (19b), wherein the operation information (19b) includes the elapsed time of a work, a common operation that is a recognition target and is commonly included in different work processes, and skeleton data of a worker;
a conversion unit (12) that calculates a distance between each of the coordinates and a start point coordinate based on the coordinates corresponding to each of the plurality of body parts included in the skeleton data, and converts each of the coordinates included in the skeleton data into the distance obtained by the calculation, wherein the start point coordinate is a coordinate corresponding to a start point part that is determined as a start point among the body parts; and
a learning unit (13) that learns a model for motion recognition, the model outputting information indicating the motion of the operator, the motion belonging to any one of the common motions, based on the motion information (19b) obtained by converting the coordinates into the distance by the conversion unit (12).
[ appendix 2]
A motion recognition device (10) is provided with:
an acquisition unit (11) that acquires time-series information (19a) relating to the movement of an operator;
a conversion unit (12) that calculates a distance between each coordinate and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, based on coordinates corresponding to a plurality of body portions included in the skeleton data extracted from the time-series information (19a), and converts each coordinate included in the skeleton data into the distance obtained by the calculation; and
and an operation recognition unit (14) that inputs the distance converted by the conversion unit (12) into a learned model (19c) and recognizes the operation of the operator on the basis of information indicating the operation of the operator outputted from the learned model (19c), wherein the operation of the operator belongs to any one of common operations that are common recognition objects included in different work processes.
[ additional notes 3]
In the motion recognition device (10) recited in supplementary note 1 or 2, the starting point portion is either a neck portion or a waist portion.
[ additional notes 4]
A motion recognition method, being a motion recognition method executed by a processor (10a), comprising the steps of:
acquiring operation information (19b), wherein the operation information (19b) comprises the elapsed time of the operation, common operation which is a recognition object and is commonly contained in different operation procedures, and skeleton data of the operator;
calculating a distance between each of the coordinates and a start point coordinate based on coordinates corresponding to a plurality of body parts included in the skeleton data, the coordinates being a coordinate corresponding to a start point part that is determined as a start point among the body parts, and converting each of the coordinates included in the skeleton data into the distance obtained by the calculation; and
learning a model for motion recognition, the model outputting information indicating the motion of the operator, the motion belonging to any one of the common motions, based on the motion information (19b) obtained by converting the coordinates into the distance.
[ additional notes 5]
A motion recognition method, being a motion recognition method executed by a processor (10a), comprising the steps of:
acquiring time series information related to the action of an operator;
calculating a distance between each of the coordinates and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, based on coordinates corresponding to a plurality of body portions included in the skeleton data extracted from the time-series information (19a), and converting each of the coordinates included in the skeleton data into the distance obtained through the calculation; and
the distance after conversion is input into a learned model (19c), and the movement of the operator is recognized based on information indicating the movement of the operator, which is output from the learned model (19c), wherein the movement of the operator belongs to any one of common movements as recognition objects commonly included in different working procedures.
[ additional notes 6]
An operation recognition program for causing a computer to function as an acquisition unit (11), a conversion unit (12), and a learning unit (13),
the acquisition unit (11) acquires operation information (19b), the operation information (19b) including elapsed time of a work, common operation to be identified included in common in different work processes, and skeleton data of a worker,
the conversion unit (12) calculates a distance between each of the coordinates and a start point coordinate based on the coordinates corresponding to each of the plurality of body parts included in the skeleton data, the start point coordinate being a coordinate corresponding to a start point part that is a start point among the body parts, and converts each of the coordinates included in the skeleton data into the distance obtained by the calculation,
the learning unit (13) learns a model for motion recognition, the model outputting information indicating the motion of the operator, the motion belonging to any one of the common motions, based on the motion information (19b) obtained by converting the coordinates into the distance by the conversion unit (12).
[ additional notes 7]
An operation recognition program for causing a computer to function as an acquisition unit (11), a conversion unit (12), and an operation recognition unit (14),
the acquisition unit (11) acquires time-series information (19a) relating to the movement of the operator,
the conversion unit (12) calculates a distance between each coordinate and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, based on coordinates corresponding to a plurality of body portions included in the skeleton data extracted from the time-series information (19a), and converts each coordinate included in the skeleton data into the distance obtained by the calculation,
the motion recognition unit (14) inputs the distance converted by the conversion unit (12) into a learned model (19c), and recognizes the motion of the operator based on information indicating the motion of the operator outputted from the learned model (19c), wherein the motion of the operator belongs to any one of common motions as recognition objects commonly included in different work processes.
Description of the reference numerals
10 … motion recognition device; 10a … CPU; 10b … RAM; 10c … ROM; 10d … communication device; 10e … input device; 10f … display device; 11 … acquisition unit; 12 … conversion unit; 13 … learning unit; 14 … motion recognition unit; 19 … storage unit; 19a … video; 19b … motion information; 19c … learned model; 20a, 20b, 20c … image sensor; 100 … motion recognition system; A … worker; R … work area; O … origin.

Claims (7)

1. An operation recognition device is characterized by comprising:
an acquisition unit that acquires operation information including elapsed time of a work, a common operation to be identified that is commonly included in different work processes, and skeleton data of a worker;
a conversion unit that calculates a distance between each of coordinates included in the skeleton data and a start point coordinate corresponding to a start point portion that is a start point among the body portions, and converts each of the coordinates included in the skeleton data into the distance obtained through the calculation, based on the coordinates corresponding to each of the plurality of body portions; and
a learning unit that learns a model for motion recognition, the model outputting information indicating an action of the operator, the action belonging to any one of the common actions, based on the action information obtained by converting the coordinates into the distance by the conversion unit.
2. An operation recognition device is characterized by comprising:
an acquisition unit that acquires time-series information relating to an operation of an operator;
a conversion unit that calculates a distance between each of coordinates included in the skeleton data extracted from the time-series information and a start point coordinate corresponding to a start point portion that is a start point among the body portions, based on the coordinates corresponding to the plurality of body portions, and converts each of the coordinates included in the skeleton data into the distance obtained by the calculation; and
and an operation recognition unit that inputs the distance converted by the conversion unit into a learned model, and recognizes an operation of the operator based on information indicating the operation of the operator output from the learned model, wherein the operation of the operator belongs to any one of common operations to be recognized, the common operations being included in common in different work processes.
3. The motion recognition apparatus according to claim 1 or 2,
the starting point is either the neck or the waist.
4. An action recognition method executed by a processor, comprising the steps of:
acquiring operation information including elapsed time of a work, a common operation to be identified commonly included in different work processes, and skeleton data of a worker;
calculating a distance between each of the coordinates and a start point coordinate based on coordinates corresponding to a plurality of body parts included in the skeleton data, the coordinates being coordinates corresponding to a start point part that is determined as a start point among the body parts, and converting each of the coordinates included in the skeleton data into the distance obtained by the calculation; and
learning a model for motion recognition, the model outputting information indicating an action of the operator, the action belonging to any one of the common actions, based on the action information obtained by converting the coordinates into the distance.
5. An action recognition method executed by a processor, comprising the steps of:
acquiring time series information related to the action of an operator;
calculating a distance between each of the coordinates and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, based on coordinates corresponding to a plurality of body portions included in skeleton data extracted from the time-series information, and converting each of the coordinates included in the skeleton data into the distance obtained through the calculation; and
the distance after conversion is input to a learned model, and the movement of the operator belonging to any one of common movements as recognition objects commonly included in different work processes is recognized based on information indicating the movement of the operator output from the learned model.
6. A storage medium storing an operation recognition program, characterized in that,
the computer functions as an acquisition unit, a conversion unit, and a learning unit,
the acquisition unit acquires operation information including elapsed time of a work, common operation to be identified commonly included in different work processes, and skeleton data of a worker,
the conversion unit calculates a distance between each of the coordinates and a start point coordinate based on coordinates corresponding to a plurality of body parts included in the bone data, the start point coordinate being a coordinate corresponding to a start point part that is a start point among the body parts, and converts each of the coordinates included in the bone data into the distance obtained by the calculation,
the learning unit learns a model for motion recognition, the model outputting information indicating an action of the operator, the action belonging to any one of the common actions, based on the action information obtained by converting the coordinates into the distance by the conversion unit.
7. A storage medium storing an operation recognition program, characterized in that,
causes the computer to function as an acquisition unit, a conversion unit, and an operation recognition unit,
the acquisition unit acquires time-series information relating to the movement of the operator,
the conversion unit calculates a distance between each of the coordinates and a start point coordinate, which is a coordinate corresponding to a start point portion that is a start point among the body portions, based on coordinates corresponding to a plurality of body portions included in skeleton data extracted from the time-series information, and converts each of the coordinates included in the skeleton data into the distance obtained by the calculation,
the operation recognition unit inputs the distance converted by the conversion unit into a learned model, and recognizes the operation of the operator based on information indicating the operation of the operator outputted from the learned model, wherein the operation of the operator belongs to any one of common operations to be recognized, which are commonly included in different work processes.
CN202110516167.3A 2020-06-25 2021-05-12 Motion recognition device, motion recognition method, and storage medium Pending CN113850114A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-109441 2020-06-25
JP2020109441A JP2022006885A (en) 2020-06-25 2020-06-25 Motion recognition apparatus, motion recognition method, and motion recognition program

Publications (1)

Publication Number Publication Date
CN113850114A true CN113850114A (en) 2021-12-28

Family

ID=78972976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110516167.3A Pending CN113850114A (en) 2020-06-25 2021-05-12 Motion recognition device, motion recognition method, and storage medium

Country Status (2)

Country Link
JP (1) JP2022006885A (en)
CN (1) CN113850114A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637264A (en) * 2022-03-15 2022-06-17 广州明珞装备股份有限公司 Motion stability analysis method, system, device and storage medium
CN114637264B (en) * 2022-03-15 2024-05-17 广州明珞装备股份有限公司 Motion stability analysis method, system, device and storage medium

Also Published As

Publication number Publication date
JP2022006885A (en) 2022-01-13

Similar Documents

Publication Publication Date Title
US11338435B2 (en) Gripping system with machine learning
JP5471355B2 (en) 3D visual sensor
WO2020241037A1 (en) Learning device, learning method, learning program, automatic control device, automatic control method, and automatic control program
CN106256512A (en) Robot device including machine vision
JP4765075B2 (en) Object position and orientation recognition system using stereo image and program for executing object position and orientation recognition method
CN113412177A (en) Robot control device, robot system, and robot control method
CN113850114A (en) Motion recognition device, motion recognition method, and storage medium
CN113412178A (en) Robot control device, robot system, and robot control method
CN116127821A (en) Three-dimensional visual presentation method and platform for operation and maintenance data
US20210133442A1 (en) Element operation division device, element operation division method, storage medium, and element operation division system
US11030767B2 (en) Imaging apparatus and imaging system
JP2018122376A (en) Image processing device, robot control device, and robot
JP2019071008A (en) Detecting device and detecting method
WO2020150990A1 (en) Human body enactive representation processing method and system, and electronic apparatus and storage medium
CN112861596A (en) Motion recognition device, motion recognition method, storage medium, and motion recognition system
CN112602100B (en) Action analysis device, action analysis method, storage medium, and action analysis system
CN115171216A (en) Method for detecting and identifying collaboration behavior and related system
JP2022006708A (en) Motion recognition system, method, and program
CN113850113A (en) Action recognition system, method and storage medium
CN113836991B (en) Action recognition system, action recognition method, and storage medium
KR20200073588A (en) A production line monitoring system using motion recognition of a workekr and a monitoring method using the same
WO2021009933A1 (en) Information processing device, control method, and storage medium
WO2022153597A1 (en) Task level conversion device, task level conversion method, and task level conversion program
Olivas-Padilla et al. Computational ergonomics for task delegation in Human-Robot Collaboration: spatiotemporal adaptation of the robot to the human through contactless gesture recognition
WO2023119657A1 (en) Contact target estimation device, method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination