CN113361362A - Method and device for identifying working behavior of farmers, electronic equipment and storage medium - Google Patents

Method and device for identifying working behavior of farmers, electronic equipment and storage medium

Info

Publication number
CN113361362A
CN113361362A (application CN202110604095.8A)
Authority
CN
China
Prior art keywords
module
model
farmers
working
data set
Prior art date
Legal status
Pending
Application number
CN202110604095.8A
Other languages
Chinese (zh)
Inventor
李想
赵文馨
李怡良
陈昕
卢韬
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202110604095.8A
Publication of CN113361362A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods

Abstract

The invention provides a method, a device, electronic equipment and a storage medium for identifying the working behavior of farmers. The method comprises the following steps: establishing a data set of farmer working behaviors and dividing the data set; constructing the P3D module and the ConvLSTM module of a deep learning model; combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model; training the P3DConvLSTM model on the data set; and identifying farmer working behavior with the trained P3DConvLSTM model. Targeting the two dimensions, spatial and temporal, of video analysis, the invention applies a deep-learning-based P3DConvLSTM model to farmer working behavior recognition and establishes an end-to-end trainable model for the task.

Description

Method and device for identifying working behavior of farmers, electronic equipment and storage medium
Technical Field
The invention relates to the field of image processing, in particular to a method and a device for identifying the working behavior of farmers, electronic equipment and a storage medium.
Background
The root cause of most safety problems with agricultural products and food is incomplete, opaque and asymmetric information. The most fundamental solution is to trace the entire production process of agricultural products: establish a reasonable and reliable traceability system, ensure that the information for each link of the supply chain is authentic, complete, transparent and hard to falsify, and let consumers see every planting and cultivation step of the product.
Common working behaviors of farmers are an important factor in the quality of agricultural products; spraying pesticides, for example, directly affects pesticide residues and can even create safety hazards. The working behavior of farmers is therefore important information for agricultural product traceability. However, existing work records are mainly filled in manually, and because of the many subjective factors involved, their reliability is low.
The wide deployment of cameras and the development of machine learning offer a new approach to identifying farmer working behavior. At present, however, no reliable identification method exists, so traceability systems cannot be supplied with effective information about farmers' work.
Disclosure of Invention
To address the current lack of a reliable method for farmer working behavior identification, and targeting the two dimensions, spatial and temporal, of video analysis, the invention provides a deep-learning-based P3DConvLSTM neural network model for farmer working behavior recognition and establishes an end-to-end trainable model for the task, so that effective information on farmers' work can be provided to a traceability system.
Specifically, the invention is realized by the following technical scheme:
in a first aspect, the invention provides a method for identifying the working behavior of farmers, which comprises the following steps:
establishing a data set of the working behaviors of farmers, and dividing the data set;
constructing a P3D module and a ConvLSTM module;
combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model;
training the P3DConvLSTM model based on the data set;
and identifying the working behavior of the farmer to be identified based on the trained P3DConvLSTM model.
Further, the establishing of the data set of the working behaviors of the farmers and the dividing of the data set comprise:
obtaining a data set of the working behaviors of the farmers based on the working videos of the farmers;
and dividing the data set into a training set and a verification set, and carrying out normalization processing on the training set and the verification set.
Further, the obtaining of the data set of the farmer work behaviors based on the farmer work videos comprises:
converting the video into a sequence of images;
dividing a sequence of images into a plurality of video segments;
eliminating the video segments without farmers or with too fast scene switching;
and extracting images from the video segments as samples by interval sampling, obtaining target images of farmer working behavior, and thereby obtaining the data set of farmer working behaviors.
Further, the P3D module includes:
a two-dimensional spatial convolution kernel and a one-dimensional time-domain convolution kernel.
Further, the P3DConvLSTM model is one of four variants:
a simplified model in which the C module is not retained;
a complex model in which the C module is not retained;
a simplified model in which the C module is retained;
a complex model in which the C module is retained.
Further, the ConvLSTM module comprises 2 HC-LSTM blocks, 2 C3D blocks and 3 fully connected layers, and its output is the classification result.
Further, combining the P3D module with the ConvLSTM module to establish the P3DConvLSTM model comprises:
and cascading the P3D module and the ConvLSTM module to obtain the P3DConvLSTM model.
In a second aspect, the present invention provides a device for identifying a working behavior of a farmer, comprising:
the data set establishing unit is used for establishing a data set of the working behaviors of farmers and dividing the data set;
the module construction unit is used for constructing a P3D module and a ConvLSTM module;
the model building unit is used for combining the P3D module with the ConvLSTM module to build a P3DConvLSTM model;
a model training unit to train the P3DConvLSTM model based on the data set;
and the behavior identification unit is used for identifying the working behavior of the farmer to be identified based on the trained P3DConvLSTM model.
In a third aspect, the present invention provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor executes the computer program to implement the steps of the method for identifying the working behavior of farmers as described in the first aspect.
In a fourth aspect, the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for identifying a work behavior of a farmer as described in the first aspect.
According to the method, a data set of farmer working behaviors is established from videos of farmers at work, the P3DConvLSTM model is built and trained on that data set, and farmer working behavior is identified with the trained model. This provides a reliable method for farmer working behavior identification and an end-to-end trainable model for the task, so that effective information on farmers' work can be supplied to a traceability system.
Drawings
To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a farmer working behavior recognition method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of three P3D modules according to one embodiment of the invention;
FIG. 3 is a block diagram of a P3DConvLSTM network structure according to one embodiment of the present invention;
FIG. 4 is a block diagram of a HC-LSTM network architecture employed in comparative experiments in accordance with the present invention;
FIG. 5 is a block diagram of a C3D network architecture employed in comparative experiments according to the present invention;
FIG. 6 is a diagram of the structure of the C3D _ Block module employed in a comparative experiment according to the present invention;
fig. 7 is a schematic view of a farmer working behavior recognition apparatus according to another embodiment of the present invention; and
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a farmer working behavior recognition method according to one embodiment of the present invention. Referring to fig. 1, the method may include the steps of:
step 101: establishing a data set of the working behaviors of farmers, and dividing the data set;
step 102: constructing a P3D module and a ConvLSTM module;
step 103: combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model;
step 104: training the P3DConvLSTM model based on the data set;
step 105: and identifying the working behavior of the peasant to be identified based on the trained P3DConvLSTM model.
Specifically, in this embodiment, it should be noted that in step 101 videos of farm work may be collected and samples generated with a semi-automated technique. Taking four farmer behaviors as examples (pesticide spraying, hoeing, weeding and rice transplanting), a data set of 577 samples was obtained, with a video frame rate of about 25 fps.
In this embodiment, the data preprocessing steps are as follows:
conversion: converting the video into an image sequence, and arranging and naming the images according to the sequence of the images in the video;
segmenting: grouping the images in sequence and combining each group into a video segment, each video segment being 50 frames in size;
manual screening: manually eliminating video segments without farmers or with too fast scene switching;
sampling: extracting images from each video segment by interval sampling as samples; in one example, frames are extracted at an interval of 5 frames;
extracting the farmer behavior area: a YOLOv3 model is adopted as the human body detector to detect the farmer behavior area in each image of the sample set; if several people appear in one image, the area of each human body in the first image of the sample set is taken as the reference and a nearest neighbor algorithm (KNN) is used for person association matching.
Through the above steps a farmer working behavior data set is obtained. The data set contains 53 videos in total, 13-14 videos per action class, divided into 119-163 samples per class, as shown in Table 1. The samples total 577; 120 are held out as the test set, and the remaining 457 are divided into training and validation sets at a ratio of 4:1. Because the data set is small, the training data are expanded by online data augmentation: each sample is flipped horizontally with probability 0.5. To increase sample complexity, a small portion of the samples is left unaugmented. Finally, the images are normalized to accelerate convergence of gradient descent to the optimal solution.
TABLE 1
Behavior            Videos  Samples
Pesticide spraying  14      119
Hoeing              13      146
Weeding             13      163
Rice transplanting  13      149
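The preprocessing and split arithmetic above (50-frame segments, 5-frame interval sampling, 120 test samples, remainder split 4:1) can be sketched as follows. This is an illustrative sketch, not code from the patent, and the exact train/validation counts (366/91) are our rounding assumption, since the patent only states the 4:1 ratio.

```python
# Illustrative sketch of the preprocessing arithmetic described above:
# 50-frame segments, 5-frame interval sampling, and a 120-sample test
# split with the remaining 457 samples divided 4:1 into train/val.

def segment_indices(n_frames, segment_len=50):
    """Group frame indices into consecutive fixed-size segments."""
    return [list(range(s, s + segment_len))
            for s in range(0, n_frames - segment_len + 1, segment_len)]

def interval_sample(segment, step=5):
    """Keep every `step`-th frame of a segment (frame-skipping sampling)."""
    return segment[::step]

def split_counts(n_samples=577, n_test=120, train_ratio=4, val_ratio=1):
    """Hold out the test set, then split the remainder train:val = 4:1."""
    rest = n_samples - n_test
    n_val = rest // (train_ratio + val_ratio)   # rounding is our assumption
    n_train = rest - n_val
    return n_train, n_val, n_test

frames = segment_indices(500)          # a 500-frame video -> ten 50-frame segments
sampled = interval_sample(frames[0])   # 50 frames at step 5 -> 10 frames per sample
print(len(frames), len(sampled), split_counts())  # 10 10 (366, 91, 120)
```

Segments that contain no farmer or switch scene too fast would be dropped manually before sampling, as the screening step describes.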
In step 102, P3D is a pseudo-3D residual network. Its basic idea is to approximate a conventional 3 × 3 × 3 three-dimensional convolution kernel with a 3 × 3 × 1 two-dimensional spatial convolution kernel and a 1 × 1 × 3 one-dimensional temporal convolution kernel. The invention designs three improved P3D modules, shown in FIG. 2, and cascades two or all three of them to form the P3D module of the P3DConvLSTM model. To decompose the complex 3D convolution into a 2D operation and a 1D operation, two convolution kernels are designed: (3,3,1), which performs 2D convolution at the spatial level only and extracts no temporal features, and (1,1,3), which performs 1D convolution at the temporal level only and extracts no spatial features. Some kernels in the P3D module design are (1,1,1); these serve to transform the data shape.
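The saving from this factorization can be made concrete by counting parameters: a full 3 × 3 × 3 kernel is replaced by a 3 × 3 × 1 spatial kernel plus a 1 × 1 × 3 temporal kernel. The sketch below is illustrative; the 64-channel width and the inclusion of a bias term are our assumptions, not figures from the patent.

```python
# Parameter-count sketch of the P3D factorization described above: a full
# 3x3x3 3-D convolution vs. a 3x3x1 spatial kernel followed by a 1x1x3
# temporal kernel. Channel widths (64 in/out) are illustrative.

def conv3d_params(kernel, c_in, c_out, bias=True):
    """Weights of one 3-D conv layer: prod(kernel) * c_in * c_out (+ biases)."""
    k = kernel[0] * kernel[1] * kernel[2]
    return k * c_in * c_out + (c_out if bias else 0)

c = 64
full = conv3d_params((3, 3, 3), c, c)           # plain 3-D convolution
factored = (conv3d_params((3, 3, 1), c, c)      # spatial 2-D step
            + conv3d_params((1, 1, 3), c, c))   # temporal 1-D step
print(full, factored)  # 110656 49280
```

At these widths the factorized pair needs less than half the parameters of the full 3-D kernel, which is why the decomposition suits the small data set discussed next.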
Because the data set is small, too many network layers would reduce the learning ability of the network. For the composition of the P3D module, four schemes are therefore designed: (1) C module not retained, simplified model (2 blocks & 29 layers); (2) C module not retained, complex model (2 blocks & 53 layers); (3) C module retained, simplified model (3 blocks & 37 layers); (4) C module retained, complex model (3 blocks & 69 layers).
In addition, in step 102, the ConvLSTM module of the invention comprises 2 HC-LSTM blocks, 2 C3D blocks and 3 fully connected layers, and its final output is the classification result. The ConvLSTM layers of the 2 HC-LSTM blocks use 3 × 3 convolution kernels, 64 kernels each; the pooling layers down-sample the features extracted by the convolutional layers with 2 × 2 max pooling. The convolutional layer of the C3D block uses 3 × 3 kernels, and the number of kernels is set to 3 because RGB images have 3 channels; it is followed by a pooling layer with a 1 × 7 × 7 pooling window. Finally, the feature vector passes through the 3 fully connected layers to produce the classification.
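The gate structure that the HC-LSTM blocks rely on can be illustrated with a minimal single-step ConvLSTM cell: every dense matrix multiply of a standard LSTM is replaced by a convolution, so the hidden state keeps its spatial layout. This NumPy sketch is not the patent's implementation; the toy sizes (6 × 6 input, 8 channels) are ours, whereas the patent's blocks use 64 kernels of size 3 × 3.

```python
# Minimal NumPy sketch of one ConvLSTM time step (toy sizes, illustrative).
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded 2-D convolution; x: (H, W, Cin), k: (kh, kw, Cin, Cout)."""
    kh, kw, _, cout = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    out = np.zeros(x.shape[:2] + (cout,))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.tensordot(xp[i:i + kh, j:j + kw], k, axes=3)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, W):
    """One ConvLSTM step: input/forget/output gates and a candidate state."""
    zi = sigmoid(conv2d_same(x, W['xi']) + conv2d_same(h, W['hi']))
    zf = sigmoid(conv2d_same(x, W['xf']) + conv2d_same(h, W['hf']))
    zo = sigmoid(conv2d_same(x, W['xo']) + conv2d_same(h, W['ho']))
    g = np.tanh(conv2d_same(x, W['xg']) + conv2d_same(h, W['hg']))
    c_new = zf * c + zi * g        # blend of old cell state and candidate
    h_new = zo * np.tanh(c_new)    # hidden state stays (H, W, channels)
    return h_new, c_new

rng = np.random.default_rng(0)
cin, ch, side = 3, 8, 6
W = {f'x{g}': rng.normal(0, 0.1, (3, 3, cin, ch)) for g in 'ifog'}
W.update({f'h{g}': rng.normal(0, 0.1, (3, 3, ch, ch)) for g in 'ifog'})
x = rng.normal(size=(side, side, cin))
h = np.zeros((side, side, ch))
c = np.zeros((side, side, ch))
h, c = convlstm_step(x, h, c, W)
print(h.shape, c.shape)  # (6, 6, 8) (6, 6, 8)
```

Because the state retains its H × W layout, the 2 × 2 max pooling described above can be applied to it directly, which is what lets the HC-LSTM blocks extract spatial features across time.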
In step 103, the P3D module is combined (cascaded) with the ConvLSTM module to build the P3DConvLSTM model. The structure of the P3DConvLSTM model is shown in fig. 3 (BN layers and activation layers are omitted); its main structure is the cascade of the P3D module and the ConvLSTM module. A_Block, B_Block and C_Block in the figure are the three P3D modules of fig. 2. The selection of the model hyper-parameters is discussed in detail below.
In step 104, the P3DConvLSTM model may be trained based on the farmer work activity data set obtained in step 101.
In step 105, farmers' work behaviors are identified based on the trained P3DConvLSTM model.
The effects of the present invention are described in detail below by comparison:
effect comparison object 1: HC-LSTM neural network model
The HC-LSTM network structure used in this comparative experiment is shown in FIG. 4. It comprises 3 HC-LSTM blocks, 1 C3D block and 3 fully connected layers, and its final output is the classification result. The ConvLSTM layers of the 3 HC-LSTM blocks use 3 × 3 convolution kernels, 64 kernels each; the pooling layers down-sample the extracted features with 2 × 2 max pooling. To prevent overfitting, a dropout layer is introduced in the third HC-LSTM block. The convolutional layer of the C3D block uses 3 × 3 kernels, with 3 kernels because RGB images have 3 channels, followed by a pooling layer with a 1 × 7 × 7 window. Finally, 3 fully connected layers produce the classification.
Effect contrast object 2: C3D neural network model
The network structure of C3D and the module structure of C3D_Block used in this comparative experiment are shown in fig. 5 and fig. 6, respectively. The C3D network is composed of 5 C3D_Blocks, and the convolution kernel size of each C3D_Block is 3 × 3 × 3.
Designing an experimental environment and an evaluation rule:
the process environment operates as follows: the operating system Windows 10, configure to i9-10900KCPU, 32GB memory, NVIDA GeForce RTX 3080, adopt tensoflow2.4.1 and keras 2.4.3 frameworks.
The experiment is divided into a definition phase, a training phase and an evaluation phase. The definition phase includes the definition of the model structure, loss function, optimizer, etc. The whole network is trained end to end; the dropout parameter is 0.5 and an Adam (adaptive moment estimation) optimizer updates the parameters. The initial learning rate is 0.001 and decays by a factor of 0.1 every 10 epochs, the momentum is 0.1, and 30 epochs are trained in total.
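The schedule above can be sketched as a step-decay function. We read "decays to 0.1 every 10 epochs" as a multiply-by-0.1 step decay (a common Keras pattern); that reading is our assumption.

```python
# Step-decay sketch of the training schedule described above: initial
# learning rate 0.001, multiplied by 0.1 every 10 epochs, 30 epochs total.

def learning_rate(epoch, initial=1e-3, factor=0.1, step=10):
    """LR for a given epoch under step decay: drops at epochs 10 and 20."""
    return initial * factor ** (epoch // step)

schedule = [learning_rate(e) for e in range(30)]
print(schedule[0], schedule[10], schedule[20])
```

In Keras such a function would typically be attached via a `LearningRateScheduler` callback; here it is shown standalone so the decay pattern is explicit.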
For the video classification problem, each record belongs to the positive or the negative class, and the model judges each record as positive (belonging to the positive class) or negative (belonging to the negative class). This yields four outcomes: correctly judged positive records (true positives, TP), incorrectly judged positive records (false positives, FP), correctly judged negative records (true negatives, TN) and incorrectly judged negative records (false negatives, FN). The evaluation uses the mean average precision (mAP) to represent how trustworthy the model's positive judgments are, and the average recall (Recall) to represent how completely the model detects a class:
Precision = TP / (TP + FP),    Recall = TP / (TP + FN)
where TP is the number of samples predicted positive that are actually positive, FP is the number predicted positive that are actually negative, and FN is the number of positive samples predicted negative.
F1 = 2 × Precision × Recall / (Precision + Recall)
F1 is the harmonic mean of precision and recall; it combines the precision and recall of a class into a single index.
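The three metrics above follow directly from the TP/FP/FN counts. The sketch below uses made-up toy counts, not results from the experiments.

```python
# Per-class metrics as defined above, computed from TP/FP/FN counts.
# The counts are toy numbers for illustration only.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

tp, fp, fn = 40, 10, 5
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f1(p, r), 3))  # 0.8 0.889 0.842
```

mAP and mRecall as used below are then the averages of these per-class values over the four behavior classes.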
Experiment and comparative analysis:
after 5 epochs, the model starts to converge and the accuracy gradually rises. Wherein, the convergence rate of P3DConvLSTM (2blocks & 53layers) and P3DConvLSTM (2blocks & 29layers) is faster and more stable. The accuracy of P3DConvLSTM (3blocks & 37layers) is the highest in the training process, P3DConvLSTM (3blocks & 69layers) is unstable, and the accuracy is large in the later period of training.
TABLE 2 experimental comparison of four protocols for P3DConvLSTM
(Table 2 is rendered as an image in the source; it compares the four P3DConvLSTM schemes on F1, mAP, recall and training time.)
As can be seen from Table 2, the highest F1 among the models is 0.9338 and the lowest is 0.8078, obtained by the C-module-retained simplified model and the C-module-not-retained complex model, respectively.
The F1 of the C-module-retained simplified model, 0.9338, is 1.03 times that of the C-module-retained complex model, yet its training time is about half. An overly deep network structure therefore causes some features to vanish during propagation and actually degrades the model. Moreover, the F1 of the C-module-retained complex model is 1.13 times that of the C-module-not-retained complex model, so even with a deep network structure the models that retain the C module perform better, and retaining the C module is preferable. The P3DConvLSTM (3 blocks & 37 layers) model is therefore considered to work best.
TABLE 3 comparison of the various model experiments
(Table 3 is rendered as an image in the source; it compares HC-LSTM, P3DConvLSTM and C3D on F1, mAP, mRecall and training time.)
Table 3 shows the performance and time efficiency of the HC-LSTM and P3DConvLSTM models designed herein and of C3D on the FLD. P3DConvLSTM and HC-LSTM perform better, exceeding C3D on the average of every evaluation metric: the F1, mAP and mRecall of HC-LSTM are 3.18%, 3.0% and 3.37% higher than those of C3D, and the F1, mAP and mRecall of P3DConvLSTM are 3.17%, 3.64% and 2.69% higher than those of C3D.
Although HC-LSTM outperforms C3D, its training time is longer. The hybrid model P3DConvLSTM is markedly faster, cutting training time by 34.13% relative to HC-LSTM while combining the advantages of both. Considering efficiency and generalization ability together, these results demonstrate the advantage of P3DConvLSTM in exploiting spatio-temporal information.
In the above embodiment, a data set of farmer working behaviors is established from videos of farmers at work, the P3DConvLSTM model is built and trained on that data set, and the model shows a clear advantage in exploiting spatio-temporal information. Identifying farmer working behavior with the trained P3DConvLSTM model is thus a reliable identification method, so effective information on farmers' work can be provided to a traceability system.
Fig. 7 is a schematic view of a farmer working behavior recognition apparatus according to another embodiment of the present invention.
In this embodiment, it should be noted that, referring to fig. 7, the farmer working behavior recognition apparatus according to an embodiment of the present invention may include a data set unit 701, a module building unit 702, a model building unit 703, a model training unit 704 and a behavior recognition unit 705. The data set unit 701 is used to establish a data set of farmer working behaviors and divide the data set; the module building unit 702 is used to construct the P3D module and the ConvLSTM module; the model building unit 703 is used to combine the P3D module with the ConvLSTM module to build the P3DConvLSTM model; the model training unit 704 is used to train the P3DConvLSTM model on the data set and obtain the final model parameters; and the behavior recognition unit 705 is used to recognize farmer working behavior with the trained P3DConvLSTM model.
Since the farmer working behavior recognition device of this embodiment can execute the farmer working behavior recognition method described in the above embodiment, its working principle and beneficial effects are similar and are not repeated here; see the description of the above embodiment for details.
In this embodiment, it should be noted that each unit in the apparatus according to the embodiment of the present invention may be integrated into a whole, or may be separately disposed. The units may be combined into one unit, or further divided into a plurality of sub-units.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 8: a processor 801, a memory 802, a communication interface 803, and a communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804;
the processor 801 is configured to call a computer program in the memory 802, and the processor implements all the steps of the above-mentioned farmer work behavior identification method when executing the computer program; for example, the processor implements the following processes when executing the computer program: establishing a data set of the working behaviors of farmers, and dividing the data set; constructing a P3D module and a ConvLSTM module; combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model; training the P3DConvLSTM model based on the data set; and identifying farmer work behavior based on the trained P3DConvLSTM model.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements all the steps of the above-mentioned farmer work activity identification method, for example, the processor implements the following processes when executing the computer program: establishing a data set of the working behaviors of farmers, and dividing the data set; constructing a P3D module and a ConvLSTM module; combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model; training the P3DConvLSTM model based on the data set, and identifying farmer work behavior based on the trained P3DConvLSTM model.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a computer program product, which includes a computer program that, when executed by a processor, implements all the steps of the above-mentioned farmer work behavior identification method; for example, the processor implements the following processes when executing the computer program: establishing a data set of the working behaviors of farmers, and dividing the data set; constructing a P3D module and a ConvLSTM module; combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model; training the P3DConvLSTM model based on the data set; and identifying farmer work behavior based on the trained P3DConvLSTM model.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method for identifying the working behaviors of farmers according to the embodiments or some parts of the embodiments.
Moreover, in the present invention, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art can combine different embodiments or examples, and features thereof, described in this specification, provided they do not contradict one another.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for identifying the working behavior of farmers is characterized by comprising the following steps:
establishing a data set of the working behaviors of farmers, and dividing the data set;
constructing a P3D module and a ConvLSTM module;
combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model;
training the P3DConvLSTM model based on the data set;
and identifying the working behavior of a farmer to be identified based on the trained P3DConvLSTM model.
2. The method for identifying the working behaviors of farmers as claimed in claim 1, wherein the establishing a data set of the working behaviors of farmers and dividing the data set comprises:
obtaining a data set of the working behaviors of the farmers based on the working videos of the farmers;
and dividing the data set into a training set and a verification set, and carrying out normalization processing on the training set and the verification set.
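As a hedged illustration of the normalization step in claim 2, one common scheme (the patent does not specify which it uses) is to scale pixel values to [0, 1] and standardize each channel of a clip:

```python
import numpy as np

def normalize_clip(clip):
    """Normalize a video clip of shape (frames, height, width, channels).

    Scaling to [0, 1] followed by per-channel standardization is one
    common choice; it is an assumption here, not taken from the patent.
    """
    clip = clip.astype(np.float32) / 255.0
    mean = clip.mean(axis=(0, 1, 2), keepdims=True)   # per-channel mean
    std = clip.std(axis=(0, 1, 2), keepdims=True) + 1e-6
    return (clip - mean) / std
```

The same statistics would typically be computed on the training set and reused on the verification set to avoid leakage.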
3. The method for identifying the working behaviors of farmers as claimed in claim 2, wherein the step of obtaining the data set of the working behaviors of farmers based on the working videos of farmers comprises the following steps:
converting the video into a sequence of images;
dividing the image sequence into a plurality of video segments;
eliminating video segments that contain no farmers or in which the scene changes too quickly;
and extracting images from the video segments as samples by interval sampling to obtain target images of the working behaviors of farmers, thereby obtaining the data set of the working behaviors of farmers.
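The interval sampling in claim 3 can be sketched as picking every k-th frame so each segment yields a fixed-length sample. The clip length of 16 frames is an assumed value, not one given by the patent:

```python
def sample_frames(frames, num_samples=16):
    """Pick num_samples frames at a fixed interval from a video segment.

    num_samples=16 is an illustrative clip length; the patent does not
    state how many frames are sampled per segment.
    """
    if len(frames) < num_samples:
        raise ValueError("segment shorter than the requested sample count")
    step = len(frames) // num_samples          # sampling interval
    return [frames[i * step] for i in range(num_samples)]
```

Segments shorter than the target length would be among those eliminated in the preceding filtering step.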
4. The method for identifying the working behaviors of farmers as claimed in claim 2, wherein the P3D module comprises: a two-dimensional spatial convolution kernel and a one-dimensional temporal convolution kernel.
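The point of the pseudo-3D factorization in claim 4 is that a 2D spatial kernel plus a 1D temporal kernel costs far fewer parameters than a full 3D kernel. A quick count, assuming the standard P3D kernel sizes (3×3×3 full versus 1×3×3 spatial plus 3×1×1 temporal; these specific sizes are not stated in the patent):

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3D convolution kernel (bias omitted)."""
    return c_in * c_out * kt * kh * kw

# Full 3x3x3 spatio-temporal kernel vs. the P3D-style factorization
# into a 1x3x3 spatial kernel followed by a 3x1x1 temporal kernel,
# both with 64 input and 64 output channels (illustrative sizes).
full = conv3d_params(64, 64, 3, 3, 3)
factored = (conv3d_params(64, 64, 1, 3, 3)    # spatial part
            + conv3d_params(64, 64, 3, 1, 1)) # temporal part
```

With these sizes the factorized pair needs fewer than half the weights of the full kernel (49,152 versus 110,592), which is what makes P3D-style models cheaper to train on video.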
5. The method for identifying the working behaviors of farmers as claimed in claim 1, wherein the P3DConvLSTM model comprises:
a simplified model in which module c is not retained;
a complex model in which module c is not retained;
a simplified model in which module c is retained;
and a complex model in which module c is retained.
6. The method for identifying the working behaviors of farmers as claimed in claim 1, wherein the ConvLSTM module comprises 2 HC-LSTM modules, 2 C3D modules, and 3 fully connected layers, and the output of the ConvLSTM module is the classification result.
7. The method for identifying the working behaviors of farmers as claimed in any one of claims 1 to 6, wherein combining the P3D module with the ConvLSTM module to establish a P3DConvLSTM model comprises:
and cascading the P3D module and the ConvLSTM module to obtain the P3DConvLSTM model.
8. A device for identifying the working behaviors of farmers, characterized by comprising:
a data set establishing unit, configured to establish a data set of the working behaviors of farmers and divide the data set;
a module construction unit, configured to construct a P3D module and a ConvLSTM module;
a model building unit, configured to combine the P3D module with the ConvLSTM module to establish a P3DConvLSTM model;
a model training unit, configured to train the P3DConvLSTM model based on the data set;
and a behavior identification unit, configured to identify the working behavior of a farmer to be identified based on the trained P3DConvLSTM model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for identifying the working behavior of farmers as claimed in any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for identifying the working behavior of a farmer as claimed in any one of claims 1 to 7.
CN202110604095.8A 2021-05-31 2021-05-31 Method and device for identifying working behavior of peasant, electronic equipment and storage medium Pending CN113361362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110604095.8A CN113361362A (en) 2021-05-31 2021-05-31 Method and device for identifying working behavior of peasant, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113361362A true CN113361362A (en) 2021-09-07

Family

ID=77530547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110604095.8A Pending CN113361362A (en) 2021-05-31 2021-05-31 Method and device for identifying working behavior of peasant, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113361362A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133498A (en) * 2002-10-08 2004-04-30 Sakae Shibusawa Precise agricultural method information management system
US20160353715A1 (en) * 2015-06-03 2016-12-08 Keltronix, Inc. Agricultural monitoring system using image analysis
CN109165596A (en) * 2018-08-24 2019-01-08 福建铁工机智能机器人有限公司 A kind of agricultural product source tracing method based on wisdom rural area AI system
US20190130562A1 (en) * 2017-11-02 2019-05-02 Siemens Healthcare Gmbh 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
CN110119709A (en) * 2019-05-11 2019-08-13 东南大学 A kind of driving behavior recognition methods based on space-time characterisation
CN110189140A (en) * 2019-04-16 2019-08-30 北京农业信息技术研究中心 Agricultural product based on block chain, which are traced to the source, deposits card method and deposit system of tracing to the source
CN111242007A (en) * 2020-01-10 2020-06-05 上海市崇明区生态农业科创中心 Farming behavior supervision method
US20200311828A1 (en) * 2017-08-02 2020-10-01 Bayer Business Services Gmbh A hand held device for economic agricultural management
US20210034846A1 (en) * 2019-08-01 2021-02-04 Korea Electronics Technology Institute Method and apparatus for recognizing sign language or gesture using 3d edm
US20210136996A1 (en) * 2018-05-02 2021-05-13 Supplant Ltd. Systems and methods for applying an agricultural practice to a target agricultural field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xingyu Xu, Xiaoyu Wu, Ge Wang, Huimin Wang: "Violent video classification based on spatial-temporal cues using deep learning", 2018 11th International Symposium on Computational Intelligence and Design, pages 319-322 *

Similar Documents

Publication Publication Date Title
Wiesner-Hanks et al. Millimeter-level plant disease detection from aerial photographs via deep learning and crowdsourced data
Mkonyi et al. Early identification of Tuta absoluta in tomato plants using deep learning
CN110222718B (en) Image processing method and device
CN111291809A (en) Processing device, method and storage medium
CN109543112A (en) A kind of sequence of recommendation method and device based on cyclic convolution neural network
CN110119479B (en) Restaurant recommendation method, restaurant recommendation device, restaurant recommendation equipment and readable storage medium
Mohimont et al. Computer vision and deep learning for precision viticulture
CN110516537A (en) A kind of face age estimation method based on from step study
Zhang et al. Deep learning based automatic grape downy mildew detection
Gao et al. A mobile application for plant recognition through deep learning
Rezk et al. An efficient plant disease recognition system using hybrid convolutional neural networks (cnns) and conditional random fields (crfs) for smart iot applications in agriculture
CN113536970A (en) Training method of video classification model and related device
Monigari et al. Plant leaf disease prediction
Zhang et al. AgriPest-YOLO: A rapid light-trap agricultural pest detection method based on deep learning
Suo et al. Casm-amfmnet: A network based on coordinate attention shuffle mechanism and asymmetric multi-scale fusion module for classification of grape leaf diseases
Minuţ et al. Creating a dataset and models based on convolutional neural networks to improve fruit classification
Nihar et al. Plant disease detection through the implementation of diversified and modified neural network algorithms
Wong et al. Development of species recognition models using Google teachable machine on shorebirds and waterbirds
Ranjith et al. An IoT based Monitoring System to Detect Animal in the Railway Track using Deep Learning Neural Network
Prasetyo et al. The implementation of CNN on website-based rice plant disease detection
Sheng et al. Disease diagnostic method based on cascade backbone network for apple leaf disease classification
Vora et al. An ensemble of convolutional neural networks to detect foliar diseases in apple plants
CN113361362A (en) Method and device for identifying working behavior of peasant, electronic equipment and storage medium
Singh et al. Leaf Disease Detection Using Deep Neural Network
Chen et al. Artificial intelligence for image processing in agriculture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination