CN115273215A - Job recognition system and job recognition method - Google Patents

Job recognition system and job recognition method

Info

Publication number
CN115273215A
Authority
CN
China
Prior art keywords
job
work
recognition
information
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110473399.5A
Other languages
Chinese (zh)
Inventor
郑淼
裴雅超
原纯一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to CN202110473399.5A
Publication of CN115273215A
Pending legal-status Critical Current

Abstract

The invention provides a job recognition system and a job recognition method. The job recognition system includes: an acquisition unit that acquires video data obtained by imaging an area including a worker and a work object during a job; a recognition unit that recognizes motion information indicating a motion of the worker and object information indicating the work object used by the worker, by inputting the video data acquired by the acquisition unit into a recognition model that includes a motion recognition model obtained by learning human body motions in standard jobs and an object recognition model obtained by learning work objects in standard jobs; and a determination unit that determines the job type corresponding to the motion information and the object information recognized by the recognition unit.

Description

Job recognition system and job recognition method
Technical Field
The present invention relates to a job recognition system and a job recognition method.
Background
Human body action recognition has long been a popular research direction in computer vision, artificial intelligence, pattern recognition, and related fields, and has very wide application in human-computer interaction, virtual reality, video retrieval, security monitoring, and so on.
Existing human body action recognition methods mainly fall into two categories: methods based on wearable inertial sensors and methods based on computer vision. Wearable-sensor methods acquire accurate data, but they burden the wearer, are cumbersome to operate, and are difficult to deploy in practice. The mainstream research direction at present is computer-vision-based action recognition, in which a computer processes and analyzes original images or image sequences captured by a camera to learn and understand the actions and behaviors of the people in them. In actual industrial applications, however, scenes are complex: people are occluded, and various human-object interactions occur. Action recognition alone therefore struggles to meet the requirement of high-precision job recognition, which restricts industrial applications of action recognition such as ergonomic analysis.
For example, patent document CN112070027A discloses a network training and action recognition method. In the method disclosed there, multi-view video data of a task target is first captured in a network training stage, skeleton information of the person is extracted, and the skeleton information is input into a feature extraction network for pre-training, yielding a pre-trained model and completing model training. Then, in a real-time recognition stage, real-time human skeleton information is captured from the video and input into the trained model, which outputs the action recognition result. Because the training stage is based on multi-view human body data, the method can reduce skeleton-point noise caused by self-occlusion or object occlusion during recognition, and thus improve the accuracy of human body action recognition.
However, in the technology disclosed in the above patent document, the complexity of the work site limits the installation of multi-view cameras. When several people work together, the production line is long, or jobs intersect, the number, angles, and positions of the cameras become difficult to set, the unobstructed multi-view capture achievable for a single person is hard to reproduce, and deployment cost is high.
In an actual industrial application scenario, the job being performed is often strongly related not only to the movement of the human body but also to the work object. That is, when the same human body motion is performed on different work objects, the job types differ.
However, the above patent document analyzes only human body data and does not consider the work object, so erroneous judgments are possible and the application requirements cannot be satisfied.
Disclosure of Invention
The present invention has been made in view of the above-mentioned problems occurring in the prior art, and an object of the present invention is to provide a work recognition system and a work recognition method capable of at least improving the accuracy of work recognition.
One embodiment of the present invention provides a job recognition system including: an acquisition unit that acquires video data obtained by imaging an area including a worker and a work object during a job; a recognition unit that recognizes motion information indicating a motion of the worker and object information indicating the work object used by the worker, by inputting the video data acquired by the acquisition unit into a recognition model that includes a motion recognition model obtained by learning human body motions in standard jobs and an object recognition model obtained by learning work objects in standard jobs; and a determination unit that determines the job type corresponding to the motion information and the object information recognized by the recognition unit.
This avoids the various problems caused by arranging multiple cameras, and improves job recognition accuracy in complex industrial application scenes by introducing an object recognition model on top of the conventional human body action recognition model, combining human body action recognition and work object recognition into a bimodal recognition model.
Further, the job recognition system may further include a storage unit that stores, for each job type in the standard jobs, reference motion information indicating a human body motion and reference object information indicating a work object in association with each other; the determination unit determines the job type corresponding to the motion information and the object information recognized by the recognition unit with reference to the reference motion information and the reference object information stored in the storage unit.
Thus, the work type can be specified by referring to the human body motion and the work object for each of the predefined work types, and the work identification accuracy can be further improved.
In the job recognition system, the determination unit may refer to the reference motion information and the reference object information stored in the storage unit, and determine the corresponding job type when the motion information and the object information recognized by the recognition unit are successfully paired.
In this way, by determining the corresponding job type when the recognized action and job object match the standard job, the job belonging to the standard job can be recognized, which is advantageous for analyzing the work efficiency of the operator.
In the job recognition system, the determination unit may refer to the reference motion information and the reference object information stored in the storage unit, and determine the job type of the corresponding job as an invalid job when the motion information and the object information recognized by the recognition unit fail to pair.
In this way, when the recognized action and the work target do not match the standard work, the work type of the corresponding work is determined as the invalid work, so that the work which does not belong to the standard work can be recognized, which is advantageous for analyzing the work efficiency of the worker.
In the job recognition system, the job type determined by the determination unit may be displayed superimposed on the video data.
This enables the user to be intuitively presented with the job type of the job.
Further, the work recognition system may further include a work efficiency visualization means for performing statistics on the works of the work category for a predetermined time period for each of the work categories specified by the specification means and outputting a result of the statistics.
Thus, the work efficiency of the operator can be analyzed and evaluated based on the statistical results of the respective work categories.
In the work recognition system, the work efficiency visualization means may calculate, for each work type, a ratio of the work type in a predetermined time as the statistical result.
Therefore, the work efficiency of the operator can be more reliably and intuitively analyzed and evaluated based on the statistical result of each work type.
In the work recognition system, the motion recognition model may be obtained by extracting a set of feature values of frame data of a standard work for each of the work categories of the standard work, giving reference motion information, and performing learning using the set of feature values and the reference motion information for each of the work categories; the object recognition model is obtained by assigning reference object information to each of the work objects for each of the work categories in the standard work and learning the reference object information using frame data of each of the work categories in the standard work.
Therefore, the recognition precision of the human body action and the recognition precision of the operation object can be improved, and the operation recognition accuracy is improved based on the bimodal recognition model.
In the job recognition system, the feature amount may be two-dimensional coordinates of human skeleton points.
Thus, by analyzing and training on the two-dimensional coordinates of the human skeleton in space, the problem of the detection target being occluded in complex industrial application scenes can be avoided, and because multiple cameras need not be arranged, both deployment cost and model training time can be reduced.
An embodiment of the present invention also provides a job identifying method, including: an acquisition step of acquiring video data obtained by imaging an area including a worker and a work object in a work; a recognition step of recognizing motion information indicating a motion of the operator and object information indicating an operation object used by the operator by inputting the video data acquired in the acquisition step into a recognition model including a motion recognition model obtained by learning a human body motion in a standard operation and an object recognition model obtained by learning the operation object in the standard operation; and a determination step of determining a job type corresponding to the motion information and the object information recognized in the recognition step for each job included in the video data.
The above-described embodiments and effects of the job identification system according to the present invention can be realized by the job identification method, a program for causing a computer to execute the method, or a recording medium storing the program.
Drawings
Fig. 1 is a block diagram showing an example of a hardware configuration of a recognition system according to a first embodiment of the present invention.
Fig. 2 is a functional block diagram showing a job identifying system according to a first embodiment of the present invention.
Fig. 3 is a diagram for explaining a method of training a recognition model used in the job recognition system according to the first embodiment of the present invention.
Fig. 4 is a flowchart showing a process of a job identifying method executed by the job identifying system according to the first embodiment of the present invention.
Fig. 5 is a functional block diagram showing a job identifying system according to a second embodiment of the present invention.
Fig. 6 is a flowchart showing a process of a job identifying method executed by the job identifying system according to the second embodiment of the present invention.
Fig. 7 is a functional block diagram showing a job identifying system according to a third embodiment of the present invention.
Fig. 8 is a diagram showing an example of work efficiency visualization in the third embodiment of the present invention.
Fig. 9 is a flowchart showing a process of a job identifying method executed by the job identifying system according to the third embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and embodiments. The following description is only an example for the convenience of understanding the present invention and is not intended to limit the scope of the present invention. In the embodiments, the components included in the apparatus, the system, and the like may be changed, deleted, or added according to the actual situation, and the steps of the method may be changed, deleted, added, or changed in order according to the actual situation.
(first embodiment)
First, a hardware configuration of the job recognition system according to the present embodiment will be described. Fig. 1 is a block diagram showing an example of a hardware configuration of a job identification system 100 according to a first embodiment of the present invention.
As shown in fig. 1, the job recognition system 100 includes a camera 110, a computer 120, and a display terminal 130. The camera 110 is installed at a work site, and photographs an area including an operator and a work object. The computer 120 has at least a processor, a memory, and an interface connected by a data bus. The processor is, for example, a CPU, a microprocessor, or the like, and executes an application program stored in the memory to realize each functional block of the job identification system 100. The interface is, for example, a communication interface, and is capable of data communication with the display terminal 130. The display terminal 130 can display a screen related to the processing procedure and the result of the job recognition system 100. Here, the computer 120 and the display terminal 130 may be integrated. Further, the computer 120 and/or the display terminal 130 may have an input device such as a keyboard, a mouse, or a microphone for a user to input instructions.
The functional configuration of the job recognition system according to the present embodiment will be described below. Fig. 2 is a functional block diagram showing a job identifying system according to a first embodiment of the present invention.
As shown in fig. 2, job recognition system 100 includes acquisition section 101, recognition section 102, and determination section 103.
The acquisition unit 101 acquires video data obtained by imaging an area including a worker and a work object in a work. The video data can be obtained by, for example, capturing images of a worker who repeatedly performs a series of work processes and a work object used by the worker in real time. However, the video data may be data obtained by shooting in advance.
Recognition section 102 inputs the video data acquired by acquisition section 101 into a recognition model, thereby recognizing motion information indicating the motion of the operator and object information indicating the work object used by the operator. As shown in fig. 2, the recognition model includes an action recognition model obtained by learning a human body action in a standard work and an object recognition model obtained by learning a work object in the standard work. These recognition models can be obtained by any known machine learning method.
A method of generating a recognition model used in the present embodiment will be described with reference to fig. 3. Fig. 3 is a diagram for explaining a method of training a recognition model used in the work recognition system according to the first embodiment of the present invention.
First, job types in the standard job are defined (step S1). A standard job is a work method, centered on human movement, for producing efficiently in an operation order without waste. The standard jobs in an application scenario are classified by operation, and each operation class is associated with one job type. For example, if a standard job includes n actions, action 1 to action n (n is an integer of 1 or more), then n job types (job types 1 to n) are set corresponding to the n actions. Among job types 1 to n, both the actions and the work objects they use may differ; alternatively, two job types may share the same action but use different work objects. For example, two motions may both be a "pick up" motion, but one picks up a "screw" while the other picks up a "clamp". Similarly, job types may differ in action but use the same work object, for example "pick up" and "put down" performed on the same work object.
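The one-to-one correspondence described above between job types and (action, work object) pairs can be sketched as a simple lookup table. This is a minimal illustration only; the concrete names (`pick up`, `screw`, `clamp`) are taken from the text's examples, and the function name is an assumption.

```python
# Minimal sketch of the job-type definitions in step S1 (illustrative names).
# Each standard job type is defined by an (action, work object) pair; two job
# types may share an action but differ in object, or vice versa.
STANDARD_JOBS = {
    ("pick up", "screw"): "job type 1",
    ("pick up", "clamp"): "job type 2",   # same action, different object
    ("put down", "screw"): "job type 3",  # different action, same object
}

def job_type_for(action, work_object):
    """Return the job type defined for this action/object pair, if any."""
    return STANDARD_JOBS.get((action, work_object))
```

For instance, `job_type_for("pick up", "clamp")` yields a different job type than `job_type_for("pick up", "screw")`, even though the human motion is identical.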
Next, video data of a standard job is prepared, and frame data of each job type is extracted from the video data of the standard job (step S2).
Next, a training model is obtained using the frame data of each job type (step S3). Specifically, the frame data of each job type is processed: the feature amount of each frame is extracted to obtain a feature amount group, an action tag is assigned (that is, reference motion information is given), and an object tag is assigned for the work object (that is, reference object information is given). More specifically, for frame data 1 of job type 1, the two-dimensional human skeleton coordinates in each frame are extracted as feature amount group 1, the action tag "action 1" is assigned, and the object tag "work object 1" is assigned according to the work object used in job type 1. For frame data 2 of job type 2, the two-dimensional human skeleton coordinates in each frame are extracted as feature amount group 2, the action tag "action 2" is assigned, and the object tag "work object 2" is assigned according to the work object used in job type 2. Likewise for the other job types: for frame data n of job type n, the two-dimensional human skeleton coordinates in each frame are extracted as feature amount group n, the action tag "action n" is assigned, and the object tag "work object n" is assigned according to the work object used in job type n.
Here, two-dimensional human skeleton coordinates are a preferred example of the feature amount; by using them, both the multi-camera requirement and the noisy data caused by human bodies being occluded in the video can be avoided. Of course, other feature amounts may be used, such as the coordinates of other feature points, or the position, orientation, and movement trajectory of the hands.
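As one hedged illustration of turning per-frame 2D skeleton coordinates into a feature amount group, the keypoints of each frame can be flattened into a vector, here normalized relative to a root joint so the feature does not depend on where the person stands in the image. The joint layout and the root-relative normalization are assumptions for illustration; the patent only states that 2D skeleton coordinates are used as the feature amount.

```python
# Sketch: convert per-frame 2D skeleton keypoints into a feature amount group.
# Coordinates are re-expressed relative to a root joint (index 0) so the
# feature is translation-invariant (an assumption, not specified by the text).

def frame_feature(keypoints):
    """keypoints: list of (x, y) tuples, root joint first.
    Returns a flat feature vector [dx0, dy0, dx1, dy1, ...]."""
    rx, ry = keypoints[0]
    feat = []
    for x, y in keypoints:
        feat.extend([x - rx, y - ry])
    return feat

def feature_group(frames):
    """A feature amount group is simply the per-frame features of one clip."""
    return [frame_feature(f) for f in frames]
```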
The resulting data is then used for training. Specifically, feature amount groups 1 to n are combined with the action tags ("action 1" to "action n"), input into a deep neural network, and learned to generate the action recognition model. The network structure may be, for example, an LSTM, but is not limited to this, and other suitable network structures may be used. Frame data 1 to n of job types 1 to n are combined with the object tags ("work object 1" to "work object n"), input into a convolutional neural network, and learned to generate the object recognition model. Here, the network structure may be, for example, Darknet-53, but is not limited to this, and other suitable network structures may be used.
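Before either network is trained, the data of step S3 must be paired with its labels. A framework-agnostic sketch of that assembly, with the LSTM and Darknet-53 training itself deliberately omitted, might look like the following (the function name and list-based layout are assumptions):

```python
# Sketch: assemble the two training sets of step S3. Only the (sample, label)
# pairing described in the text is shown; the actual training of the LSTM
# action model and the convolutional object model is omitted.

def build_training_sets(frame_data, feature_groups, action_tags, object_tags):
    """All arguments are lists indexed by job type (1..n).
    Returns (action_training_set, object_training_set)."""
    # Action model input: feature amount group i paired with tag "action i".
    action_set = list(zip(feature_groups, action_tags))
    # Object model input: raw frame data i paired with tag "work object i".
    object_set = list(zip(frame_data, object_tags))
    return action_set, object_set
```

Each `(feature_group, action_tag)` pair would then be fed to the sequence network, and each `(frame_data, object_tag)` pair to the convolutional network.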
The function of training such a recognition model may be configured in an external computer separate from the job recognition system 100 and called by the job recognition system 100, or may be built in the job recognition system 100 as a function provided in the job recognition system 100.
Returning to fig. 2, the determination unit 103 determines the job type corresponding to the motion information and the object information recognized by the recognition unit 102. For example, when the recognition unit 102 recognizes "pick up" as the motion information by extracting the feature amount and recognizes "screw" as the object information, the determination unit 103 can determine the job type "pick up screw" corresponding to that motion information and object information.
In this way, the work recognition system 100 can sequentially recognize the work type corresponding to the movement of the operator and the work object to be used for the work in the video data.
Fig. 4 is a flowchart showing a process of a job identifying method executed by the job identifying system according to the first embodiment of the present invention.
As shown in fig. 4, in step S11, the acquisition unit 101 acquires video data obtained by imaging an area including the worker and the work object during a job.
In step S12, recognition section 102 inputs the video data acquired in step S11 to a recognition model including a motion recognition model that outputs motion information indicating the motion of the operator by inputting feature quantities extracted from each frame data of the video data and an object recognition model that outputs object information indicating an operation object used by the operator by inputting frame data of the video data.
In step S13, the determination unit 103 determines the job type corresponding to the motion information and the object information recognized in step S12, and the process returns.
According to the job recognition system and the job recognition method of the present embodiment, object recognition is introduced in addition to motion recognition, and the job type can be recognized more reliably than in the past by the bimodal recognition model based on the motion recognition model and the object recognition model, thereby improving the job recognition accuracy. In addition, the problem that a detection object is shielded in a complex industrial application scene can be avoided, so that the deployment cost can be reduced.
In the present embodiment, the job identification system 100 may further include a display unit that sequentially displays the job types specified by the specification unit 103 in a superimposed manner on the video data. This enables the user to be intuitively presented with the job type of the job.
In the present embodiment, when the recognition unit cannot recognize the worker's motion or the work object because it is not a learned motion or work object, the job type of the corresponding job may be determined to be an invalid job. Conversely, when the nature of the job is such that a certain motion alone clearly identifies the job type, that job type may be determined even if the work object cannot be recognized for some reason.
(second embodiment)
The job recognition system 100A according to the second embodiment and the job recognition method executed by the same will be described in detail below. In the present embodiment, differences from the first embodiment will be mainly described, and the same reference numerals are used for the same or similar portions as those in the first embodiment, and overlapping description will be omitted as appropriate.
In the first embodiment, as long as the action and the work object have been learned in advance, a job type corresponding to them is determined regardless of whether they belong to a standard job. However, at an actual work site, erroneous operations may occur, such as picking up the wrong work object or picking up an object unrelated to the job. Therefore, in the present embodiment, only jobs belonging to the standard jobs are recognized, or only jobs not belonging to them, or both are recognized separately.
Fig. 5 is a functional block diagram showing a job identifying system according to a second embodiment of the present invention. As shown in fig. 5, job identifying system 100A of the present embodiment includes storage section 104 in addition to acquiring section 101, identifying section 102, and determining section 103. The processing of the acquisition unit 101 and the recognition unit 102 is the same as that of the first embodiment, and a repetitive description thereof will not be made here.
The storage section 104 stores reference motion information indicating a motion of a human body and reference object information indicating an object of a work in association with each other for each work category in a standard work. The reference motion information of each job type corresponds to each motion described in the definition of the job type in the first embodiment and each motion label used for training the recognition model. The reference object information of each job type corresponds to each job object described in the definition of the job type in the first embodiment and each object label used for training the recognition model. The correspondence relationship between the job type, the reference operation information, and the reference object information is shown in table 1 below.
Table 1:
Job type 1: action 1, work object 1
Job type 2: action 2, work object 2
...
Job type n: action n, work object n
The determination unit 103 refers to the reference motion information and the reference object information stored in the storage unit 104, and determines the job type corresponding to the motion information and the object information recognized by the recognition unit 102.
Specifically, determination section 103 refers to the reference motion information and the reference object information stored in storage section 104, and determines the corresponding job type when the motion information and the object information recognized by recognition section 102 are successfully paired. For example, when the recognition unit 102 recognizes the action 2 and the work object 2, it can be seen from table 1 above that the action 2 and the work object 2 are matched, that is, the matching is successful. Then, the determination unit 103 determines the job type of the current job as the job type 2 corresponding to the action 2 and the job object 2. In this way, the job type belonging to the standard job can be identified.
Further, the specification unit 103 may specify the job type of the corresponding job as an invalid job when the pair of the operation information and the object information recognized by the recognition unit 102 fails by referring to the reference operation information and the reference object information stored in the storage unit 104. For example, when the recognition unit 102 recognizes the action 1 and the work object 2, referring to the above table 1, it can be seen that the action 1 and the work object 2 are not matched, that is, the pairing fails. Then, the determination unit 103 determines the job category of the job as an invalid job. In this way, a job that does not belong to the standard job can be identified, and the job type thereof can be determined as an invalid job.
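The pairing logic described above for the determination unit 103 can be sketched as a lookup against the stored reference table (cf. Table 1, whose rows pair action i with work object i). The table contents and function name below are illustrative, not from the patent.

```python
# Sketch of determination unit 103: pair the recognized action against the
# recognized work object using the stored reference table (cf. Table 1), and
# report an invalid job when pairing fails.
REFERENCE_TABLE = {  # job type -> (reference motion info, reference object info)
    "job type 1": ("action 1", "work object 1"),
    "job type 2": ("action 2", "work object 2"),
}

def determine_job(action, work_object):
    for job_type, (ref_action, ref_object) in REFERENCE_TABLE.items():
        if action == ref_action and work_object == ref_object:
            return job_type   # pairing succeeded: a job type in the standard job
    return "invalid job"      # pairing failed: not a standard job
```

For the example in the text, recognizing action 2 with work object 2 yields job type 2, while recognizing action 1 with work object 2 is a pairing failure and is reported as an invalid job.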
Further, the job type belonging to the standard job and the invalid job not belonging to the standard job may be separately identified.
In this way, the job identification system 100A can sequentially identify the movement of the operator and the job object used for the job in the video data, determine whether the job belongs to the standard job, and further determine the specific job type if the job belongs to the standard job.
Fig. 6 is a flowchart showing a process of a job recognition method executed by the job recognition system according to the second embodiment of the present invention.
In fig. 6, similarly to the first embodiment, in step S21 video data obtained by imaging an area including the worker and the work object during a job is acquired. In step S22, the video data acquired in step S21 is input into a recognition model that includes a motion recognition model and an object recognition model; the motion recognition model outputs motion information indicating the worker's motion from the feature amounts extracted from the video data, and the object recognition model outputs object information indicating the work object used by the worker from the frame data of the video data.
In step S23, the specification unit 103 refers to the reference operation information and the reference object information stored in the storage unit 104 in advance, and specifies the job type corresponding to the operation information and the object information recognized in step S22.
Specifically, when the operation information and the object information are recognized in step S22, the determination unit 103 refers to the reference operation information and the reference object information stored in the storage unit 104 in advance, and determines whether or not the recognized operation information and the recognized object information match each other, that is, whether or not the pairing is successful (step S231).
If it is determined that the pairing is successful (yes in step S231), since it indicates that the current job is the job type in the standard job, the job type corresponding to the recognized operation information and object information is specified (step S232), and the procedure returns. In this case, the determined job type may be displayed on the video in real time.
If it is determined that the pairing has failed (no in step S231), it indicates that the current job is not the job type in the standard job, and the job type is determined as an invalid job (step S233), and the process returns.
Here, the case where the job type and the invalid job in the standard job are recognized at the same time is described, but only one of them may be recognized as necessary.
According to the job recognition system and the job recognition method of the present embodiment, not only can job recognition accuracy be improved in complicated industrial application scenes, but jobs belonging to the standard job, their job types, and/or invalid jobs can also be recognized. Further, a manager can analyze and evaluate the work efficiency of the operator based on the recognition result.
(third embodiment)
Hereinafter, a job recognition system 100B and a job recognition method executed by the same according to a third embodiment will be described in detail with reference to fig. 7 to 9. In the present embodiment, differences from the second embodiment will be mainly described, and the same reference numerals are used for the same or similar portions as those in the second embodiment, and overlapping description will be appropriately omitted. Further, the present embodiment can be applied to the first embodiment.
Fig. 7 is a functional block diagram showing a job recognition system according to a third embodiment of the present invention. As shown in fig. 7, the job recognition system 100B according to the present embodiment includes a work efficiency visualization unit 105 in addition to the acquisition unit 101, the recognition unit 102, the specifying unit 103, and the storage unit 104. The configurations of the acquisition unit 101, the recognition unit 102, the specifying unit 103, and the storage unit 104 are the same as those in the second embodiment, and their description will not be repeated.
The work efficiency visualization unit 105 counts, for each job type specified by the specifying unit 103, the jobs of that job type within a predetermined time, and outputs the counted result. As the video data advances in time, the work efficiency visualization unit 105 updates the current statistical result in real time each time a job type is specified. The predetermined time may be set to any length as needed, for example several hours, or to the time from the start-up of the job recognition system to the current job.
Further, depending on the nature of the job, the statistical object may be the number of times each job type is specified within the predetermined time, or the time taken by the jobs of each job type within the predetermined time.
Further, the work efficiency visualization unit 105 may calculate, for each job type, the ratio of that job type within the predetermined time, and output the calculated ratio as the statistical result. This is particularly effective when the statistical object is time. Based on the statistical results, the manager can analyze and evaluate the work efficiency of the operator more reliably.
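The counting and ratio calculation described above can be sketched as a small accumulator that is updated each time a job type is specified. This is an illustrative sketch; the class and field names are assumptions, not the patent's design.

```python
# Hedged sketch of the work efficiency visualization unit 105's statistics:
# per-job-type occurrence counts and accumulated time, plus each type's
# share of the total observed time. All names are illustrative.

from collections import defaultdict

class EfficiencyStats:
    def __init__(self):
        self.counts = defaultdict(int)      # times each job type was specified
        self.seconds = defaultdict(float)   # time spent per job type

    def update(self, job_type, duration_s):
        """Called each time a job type is specified (real-time update)."""
        self.counts[job_type] += 1
        self.seconds[job_type] += duration_s

    def ratios(self):
        """Share of total observed time per job type."""
        total = sum(self.seconds.values()) or 1.0
        return {jt: s / total for jt, s in self.seconds.items()}
```

Both statistical objects mentioned in the text are covered: `counts` holds the number of times each job type is specified, and `seconds` with `ratios()` covers the time-based view.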
The statistical result may be output as a histogram so as to present it to the manager more intuitively. Fig. 8 is a diagram showing an example of the visualization of work efficiency in the third embodiment of the present invention. Fig. 8 shows, for each of the job types (1) to (5), the ratio of the time taken by the jobs of that job type to the predetermined time, where job types (1) to (4) belong to the standard job and job type (5) is the invalid job. From this, the manager can analyze the work efficiency of the operator, for example whether there are too many invalid jobs, or whether the proportion of time taken by a certain job type is so high that it delays the whole job flow.
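One possible realization of the histogram output of fig. 8 is a simple text rendering of the per-job-type time ratios. The function below is a sketch under that assumption; the job-type labels and percentages in the usage are invented for illustration.

```python
# Illustrative text histogram of per-job-type time ratios (one way to
# realize the histogram output of fig. 8; names/values are assumptions).

def render_histogram(ratios, width=20):
    """Render {job_type: ratio} as one bar line per job type."""
    lines = []
    for job_type, ratio in sorted(ratios.items()):
        bar = "#" * round(ratio * width)
        lines.append(f"{job_type:<16}{bar} {ratio:.0%}")
    return "\n".join(lines)
```

For example, `render_histogram({"(1) fastening": 0.6, "(5) invalid": 0.4})` produces one bar line per job type with its percentage, letting an over-long invalid-job bar stand out at a glance.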
Fig. 9 is a flowchart showing the processing of the job recognition method executed by the job recognition system according to the third embodiment of the present invention. In fig. 9, the processing of steps S31 to S33 (S331 to S333) is the same as that of steps S21 to S23 (S231 to S233) in the second embodiment, and its description will not be repeated.
In step S34, the work efficiency visualization unit 105 counts, based on the job types specified in step S332 and the invalid jobs determined in step S333, the jobs of each job type (including the invalid job) within the predetermined time, and outputs the statistical result. As the video data advances in time, the current statistical result is updated in real time each time a job type is specified.
According to the job recognition system and the job recognition method of the present embodiment, by compiling and outputting statistics from data with high recognition accuracy, the manager can analyze and evaluate the work efficiency of the worker in real time based on the statistical result for each job type.
The embodiments of the present invention have been described above with reference to the accompanying drawings. The above-described embodiments are merely specific examples of the present invention, which are provided for understanding the present invention and are not intended to limit the scope of the present invention. Those skilled in the art can make various modifications, combinations, and reasonable omissions of the elements of the embodiments based on the technical idea of the present invention, and the resulting means are also included in the scope of the present invention. For example, the above embodiments may be combined with each other, and the combined embodiments are also included in the scope of the present invention.

Claims (10)

1. A job recognition system, comprising:
an acquisition unit that acquires video data obtained by imaging an area including a worker and a work object in a work;
a recognition unit that recognizes motion information indicating a motion of the worker and object information indicating a work object used by the worker by inputting the video data acquired by the acquisition unit into a recognition model, the recognition model including a motion recognition model obtained by learning human body motions in a standard job and an object recognition model obtained by learning work objects in the standard job; and
a specifying unit that specifies a job type corresponding to the motion information and the object information recognized by the recognition unit.
2. The job recognition system according to claim 1, further comprising:
a storage unit that stores, in association with each job type in the standard job, reference motion information representing a human body motion and reference object information representing a work object;
wherein the specifying unit specifies the job type corresponding to the motion information and the object information recognized by the recognition unit with reference to the reference motion information and the reference object information stored in the storage unit.
3. The job recognition system according to claim 2, wherein
the specifying unit refers to the reference motion information and the reference object information stored in the storage unit, and specifies the corresponding job type when the motion information and the object information recognized by the recognition unit are successfully paired.
4. The job recognition system according to claim 3, wherein
the specifying unit refers to the reference motion information and the reference object information stored in the storage unit, and determines the job type of the corresponding job to be an invalid job when the pairing of the motion information and the object information recognized by the recognition unit fails.
5. The job recognition system according to any one of claims 1 to 4, further comprising
a display unit that displays the job type specified by the specifying unit superimposed on the video data.
6. The job recognition system according to any one of claims 1 to 4, further comprising
a work efficiency visualization unit that counts, for each job type specified by the specifying unit, the jobs of that job type within a predetermined time, and outputs a statistical result.
7. The job recognition system according to claim 6, wherein
the work efficiency visualization unit calculates, for each job type, the ratio of that job type within the predetermined time as the statistical result.
8. The job recognition system according to any one of claims 1 to 4, wherein
the motion recognition model is obtained by extracting, for each job type in the standard job, a set of feature quantities from frame data of the standard job, assigning reference motion information, and learning using the set of feature quantities and the reference motion information of each job type; and
the object recognition model is obtained by assigning, for each job type in the standard job, reference object information based on the work object, and learning using the frame data and the reference object information of each job type in the standard job.
9. The job recognition system according to claim 8, wherein
the feature quantities are two-dimensional coordinates of the human skeleton.
10. A job recognition method, comprising:
an acquisition step of acquiring video data obtained by imaging an area including a worker and a work object in a work;
a recognition step of recognizing motion information indicating a motion of the worker and object information indicating a work object used by the worker by inputting the video data acquired in the acquisition step into a recognition model, the recognition model including a motion recognition model obtained by learning human body motions in a standard job and an object recognition model obtained by learning work objects in the standard job; and
a specifying step of specifying a job type corresponding to the motion information and the object information recognized in the recognition step.
CN202110473399.5A 2021-04-29 2021-04-29 Job recognition system and job recognition method Pending CN115273215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473399.5A CN115273215A (en) 2021-04-29 2021-04-29 Job recognition system and job recognition method

Publications (1)

Publication Number Publication Date
CN115273215A true CN115273215A (en) 2022-11-01

Family

ID=83745481

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037289A (en) * 2023-10-10 2023-11-10 深圳联友科技有限公司 Personnel operation behavior recognition method and system based on machine vision

Legal Events

Date Code Title Description
PB01 Publication