CN115880558B - Farming behavior detection method and device, electronic equipment and storage medium

Info

Publication number
CN115880558B
Authority
CN
China
Prior art keywords
agronomic
module
behavior
network
image
Prior art date
Legal status
Active
Application number
CN202310194084.6A
Other languages
Chinese (zh)
Other versions
CN115880558A (en)
Inventor
杨信廷
潘良
周超
孙传恒
曾昱皓
王丁弘
刘锦涛
Current Assignee
Research Center of Information Technology of Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Research Center of Information Technology of Beijing Academy of Agriculture and Forestry Sciences
Priority date
Filing date
Publication date
Application filed by Research Center of Information Technology of Beijing Academy of Agriculture and Forestry Sciences
Priority to CN202310194084.6A
Publication of CN115880558A
Application granted
Publication of CN115880558B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A 40/80: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A 40/81: Aquaculture, e.g. of fish

Abstract

The invention provides an agronomic behavior detection method and device, an electronic device, and a storage medium, belonging to the technical field of image detection. The method comprises: acquiring an agronomic behavior image of a user; and inputting the agronomic behavior image into an agronomic behavior detection model to acquire the agronomic behavior category of the user output by the model. The agronomic behavior detection model is trained on agronomic behavior image samples and corresponding agronomic behavior class labels; it fuses the position features and behavior features of the user extracted from the agronomic behavior image and determines the agronomic behavior category of the user based on the fused features. The invention can accurately identify different user behaviors, effectively improving the precision and effect of agronomic behavior detection while greatly improving its efficiency.

Description

Farming behavior detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image detection technologies, and in particular, to a method and apparatus for detecting agronomic behaviors, an electronic device, and a storage medium.
Background
In the prior art, publication CN 111242007A discloses: continuously monitoring and collecting real-time video streams with cameras installed at agricultural planting bases, automatically identifying the farming behavior categories shown in the images or videos with a pre-trained behavior detection model, and extracting the corresponding key frame images and/or key video clips together with their farming behavior categories, so that supervision and video archiving of farming behaviors can be conveniently realized in large-scale agricultural production. In addition, publication CN 115294651A discloses: constructing a target monitoring model for farm tools and training it on various kinds of farm tool information to obtain a trained farm tool monitoring model; acquiring video data and analyzing the human poses in the video data with a behavior recognition model to obtain the joint-point coordinate feature matrix of each human body; and identifying the video data with the trained farm tool monitoring model to obtain a recognition result.
In aquaculture, farming activities mainly include inspection and pesticide application. Farming behavior detection is an important link in the aquaculture production process: by detecting when and how often personnel perform farming behaviors, the cultivation of objects such as fish, shrimps and crabs can be accurately controlled, which is of great significance for improving the quality of aquaculture products and for tracing how well procedures were implemented.
At present, farming behaviors in aquaculture are still supervised mainly through manual recording, which is very inefficient. Machine vision methods based on traditional machine learning have also been applied to detecting personnel's farming behaviors, but because their image feature extraction is simple, the extracted feature information is sparse and one-dimensional, and farming behaviors cannot be detected and identified accurately, so detection precision is low and the effect is poor.
Therefore, how to better perform agricultural behavior detection has become a technical problem to be solved in the industry.
Disclosure of Invention
The invention provides an agronomic behavior detection method and device, an electronic device, and a storage medium, so as to better detect agronomic behaviors.
The invention provides a method for detecting agronomic behaviors, which comprises the following steps:
acquiring an agricultural behavior image of a user;
inputting the agronomic behavior image into an agronomic behavior detection model, and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting features from the agronomic behavior image, and determining the agronomic behavior category of the user based on the fused features.
According to the agronomic behavior detection method provided by the invention, the agronomic behavior detection model comprises a backbone network, a neck network and a head network; the backbone network is constructed based on a coordinate attention module, an effective layer aggregation network module and a space pyramid pooling module and is used for extracting the position characteristics and the behavior characteristics of the user in the agronomic behavior image; the neck network is used for carrying out feature fusion on the feature images output by the backbone network; the head network is used for determining the agronomic behavior category of the user based on the feature fusion image output by the neck network.
According to the agronomic behavior detection method provided by the invention, the backbone network comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a first superposition module, the effective layer aggregation network module comprises an original effective layer aggregation network module and an improved effective layer aggregation network module, and the spatial pyramid pooling module comprises an original spatial pyramid pooling module and an improved spatial pyramid pooling module;
the output end of the first feature extraction layer is respectively connected with the input end of the second feature extraction layer and the input end of the third feature extraction layer; the first superposition module is used for carrying out superposition processing on the characteristic image output by the second characteristic extraction layer and the characteristic image output by the third characteristic extraction layer;
the first feature extraction layer comprises a convolution module and the original effective layer aggregation network module which are sequentially connected;
the second feature extraction layer comprises at least one first sub-network layer and the original spatial pyramid pooling module; each first sub-network layer is connected in sequence; the first sub-network layer comprises a downsampling module and the original effective layer aggregation network module which are sequentially connected;
the third feature extraction layer comprises at least one second sub-network layer and the improved spatial pyramid pooling module, and the second sub-network layers are sequentially connected; the second sub-network layer comprises a downsampling module and the improved effective layer aggregation network module which are connected in sequence;
the improved spatial pyramid pooling module is obtained by adding the coordinate attention module to the output end of the original spatial pyramid pooling module;
the improved effective layer aggregation network module comprises a first aggregation module, a second superposition module and the convolution module which are sequentially connected, wherein the first aggregation module is obtained by replacing each convolution module at the input side of the image splicing module in the original effective layer aggregation network module with the coordinate attention module, and the second superposition module is used for superposing the characteristic image input into the improved effective layer aggregation network module and the characteristic image output by the first aggregation module.
According to the agronomic behavior detection method provided by the invention, the agronomic behavior image is input into an agronomic behavior detection model, and the agronomic behavior category of the user output by the agronomic behavior detection model is obtained, comprising the following steps:
inputting the agronomic behavior image into the backbone network to obtain a plurality of target feature images which are output by the backbone network and comprise the position features and the behavior features of the user;
inputting a plurality of target feature images into the neck network to obtain the feature fusion image output by the neck network;
and inputting the feature fusion image into the head network to obtain the agronomic behavior category of the user output by the head network.
According to the agronomic behavior detection method provided by the invention, inputting the agronomic behavior image into the backbone network to obtain a plurality of target feature images which are output by the backbone network and contain the position features and the behavior features of the user comprises the following steps:
inputting the agronomic behavior image into the first feature extraction layer to obtain a first feature image;
inputting the first characteristic image into the second characteristic extraction layer to obtain a first target characteristic image output by each first sub-network layer and a second characteristic image output by the second characteristic extraction layer;
inputting the first characteristic image into the third characteristic extraction layer to obtain a third characteristic image output by the third characteristic extraction layer;
superposing the second characteristic image and the third characteristic image through the first superposition module to obtain a second target characteristic image;
and obtaining a plurality of target feature images based on each of the first target feature image and the second target feature image.
According to the agronomic behavior detection method provided by the invention, before the agronomic behavior image is input into the agronomic behavior detection model to obtain the agronomic behavior category of the user output by the agronomic behavior detection model, the method further comprises:
taking the agronomic behavior image samples and the corresponding agronomic behavior class labels as a group of training samples, and obtaining a plurality of groups of training samples;
and training the agronomic behavior detection model by utilizing the plurality of groups of training samples.
According to the agronomic behavior detection method provided by the invention, the training of the agronomic behavior detection model by utilizing the plurality of groups of training samples comprises the following steps:
for any group of training samples, inputting the training samples into the agronomic behavior detection model, and outputting the prediction probability corresponding to the training samples;
calculating a loss value according to the prediction probability corresponding to the training sample and the agronomic behavior class label corresponding to the training sample by using a preset loss function;
based on the loss value, adjusting model parameters of the agronomic behavior detection model until the loss value is smaller than a preset threshold value or the training times reach preset times;
and taking the model parameters obtained when the loss value is smaller than the preset threshold value or the training times reach the preset times as the model parameters of the trained agricultural behavior detection model, and completing the training of the agricultural behavior detection model.
The invention also provides a device for detecting the agronomic behavior, which comprises:
the acquisition module is used for acquiring the agronomic behavior image of the user;
the detection module is used for inputting the agronomic behavior image into an agronomic behavior detection model and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for determining the agronomic behavior category of the user based on the position features and the behavior features of the user obtained by extracting the features of the agronomic behavior image.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the agronomic behavior detection method as described in any of the above when executing the program.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the agronomic behavior detection method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the agronomic behavior detection method as described in any of the above.
According to the agronomic behavior detection method, device, electronic device and storage medium, the internal relation between the position and the behavior of a user is exploited to strengthen localization and recognition: the trained agronomic behavior detection model fuses the position features and behavior features of the user obtained by extracting features from the input agronomic behavior image, identifies the agronomic behavior of the user based on the fused features, and determines the agronomic behavior category of the user. Different user behaviors can thus be accurately identified, effectively improving the precision and effect of agronomic behavior detection while greatly improving its efficiency.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for detecting agronomic behaviors;
FIG. 2 is a schematic structural diagram of an agricultural behavior detection model in the agricultural behavior detection method provided by the invention;
FIG. 3 is a schematic structural diagram of an Elan module used in the method for detecting agronomic behaviors provided by the invention;
FIG. 4 is a schematic diagram of the SPP module used in the method for detecting agronomic behaviors according to the present invention;
FIG. 5 is a schematic structural view of the agricultural behavior detection device provided by the invention;
fig. 6 is a schematic diagram of the physical structure of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The agronomic behavior detection method, device, electronic equipment and storage medium of the present invention are described below with reference to fig. 1 to 6.
FIG. 1 is a schematic flow chart of the method for detecting agricultural behaviors, which is provided by the invention, and as shown in FIG. 1, the method comprises the following steps: step 110 and step 120.
Step 110, acquiring an agronomic behavior image of a user;
step 120, inputting the agronomic behavior image into an agronomic behavior detection model, and obtaining the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting the features of the agronomic behavior images, and determining the agronomic behavior category of the user based on the fused features.
Specifically, the agronomic behavior image of the user described in the embodiment of the invention refers to a behavior video image generated by filming the agronomic actions of the user. In this embodiment, the agronomic activities may include inspection activities and pesticide application activities.
In the embodiment of the invention, the image of the agricultural behavior of the user can be acquired by arranging the image pickup device at a plurality of view angles of the cultivation area to be detected and shooting the behavior and the action of the user from different angles.
The agronomic behavior category of the user described in the embodiments of the present invention refers to the recognition result obtained by recognizing the input agronomic behavior image; different agronomic behavior images may correspond to different agronomic behavior categories. The agronomic behavior categories of the user may specifically include two categories, namely "Inspection" and "Applying pesticides".
The agronomic behavior detection model described by the embodiment of the invention is obtained by training according to the agronomic behavior image sample and the corresponding agronomic behavior category label and is used for learning the internal relation between the behavior action information of the user under different agronomic behaviors, the position characteristics and the behavior characteristics of the user obtained by extracting the characteristics of the agronomic behavior image of the user are fused, and the agronomic behavior category is identified based on the fused characteristics, so that a high-precision agronomic behavior identification result is output.
It should be noted that, in the embodiment of the present invention, the agronomic behavior detection model may be constructed based on a deep neural network. The deep neural network may specifically be a target detection algorithm model, such as a YOLOv5 model, a YOLOv7 model, or the like, may also be a regional convolution neural network model (Regions with Convolutional Neural Network features, R-CNNs), or may be other deep neural networks for target detection, so as to realize recognition of agronomic behaviors, which is not specifically limited in the present invention.
In the embodiment of the invention, the model training samples are composed of a plurality of groups of agronomic behavior image samples carrying agronomic behavior class labels.
In an embodiment of the present invention, the agronomic performance class labels are predetermined according to the agronomic performance image samples and are in one-to-one correspondence with the agronomic performance image samples. That is, each agronomic action image sample in the training samples is preset to carry an agronomic action category label corresponding to the agronomic action image sample.
It will be appreciated that the agronomic behavior class labels may include the two class labels "Inspection" and "Applying pesticides".
Further, the agronomic behavior detection model can be trained by utilizing the agronomic behavior image sample and the corresponding agronomic behavior class label, and the agronomic behavior class corresponding to the agronomic behavior image output by the agronomic behavior detection model can be obtained after the obtained agronomic behavior image of the user is input into the agronomic behavior detection model.
In a specific embodiment, an image capturing apparatus, a light source, and an arithmetic processor may be provided, with the arithmetic processor connected to the image capturing apparatus and the light source respectively. Under the control of the arithmetic processor, the image capturing apparatus acquires video stream data of the user's agronomic behaviors in real time to obtain the agronomic behavior images of the user; the light source provides supplementary lighting for the image capturing apparatus when natural light is insufficient. After the image capturing apparatus collects the user's agronomic behavior video stream, it is transmitted to the arithmetic processor, which judges the user's agronomic behavior according to the trained agronomic behavior detection model, outputs the agronomic behavior class label, and records the time and number of inspection and pesticide application events, achieving the purpose of intelligent monitoring.
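To make this deployment concrete, the following is a minimal sketch of such a monitoring loop in Python; the checkpoint name, camera source, preprocessing, and event-recording step are illustrative assumptions rather than details fixed by this embodiment:

```python
import cv2
import torch

# Hypothetical trained checkpoint; the loading convention is an assumption.
model = torch.load("agronomic_behavior_detector.pt", map_location="cpu")
model.eval()

cap = cv2.VideoCapture(0)  # camera stream; index/URL depends on the deployment
with torch.no_grad():
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # BGR -> RGB, HWC -> CHW, scale pixel values to [0, 1]
        x = torch.from_numpy(frame[..., ::-1].copy()).permute(2, 0, 1).float() / 255.0
        pred = model(x.unsqueeze(0))  # detection boxes + behavior category per frame
        # record the time and count of "Inspection" / "Applying pesticides" events here
cap.release()
```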
The method provided by the embodiment of the invention can accurately detect the agronomic behaviors of the user and effectively detect when and how often the user performs "Inspection" and "Applying pesticides", which effectively improves the accuracy and effect of agronomic behavior detection and the production efficiency of aquaculture.
According to the agronomic behavior detection method, the internal relation between the position and the behavior of the user is exploited to strengthen localization and recognition: the trained agronomic behavior detection model fuses the position features and behavior features of the user obtained by extracting features from the input agronomic behavior image, identifies the agronomic behavior of the user based on the fused features, and determines the agronomic behavior category of the user. Different user behaviors can thus be accurately identified, effectively improving the precision and effect of agronomic behavior detection while greatly improving its efficiency.
Based on the foregoing embodiments, fig. 2 is a schematic structural diagram of an agronomic behavior detection model in the agronomic behavior detection method provided by the present invention, where, as shown in fig. 2, the agronomic behavior detection model includes a backbone network 1, a neck network 2 and a head network 3;
the backbone network 1 is constructed based on a coordinate attention module, an effective layer aggregation network module and a space pyramid pooling module and is used for extracting the position characteristics and the behavior characteristics of a user in an agricultural behavior image;
the neck network 2 is used for carrying out feature fusion on the feature images output by the backbone network 1;
the head network 3 is used to determine the agronomic behavior category of the user based on the feature fusion image output by the neck network 2.
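As a structural illustration only (not the patent's exact implementation), this three-stage data flow can be sketched as follows:

```python
import torch.nn as nn

class AgronomicBehaviorDetector(nn.Module):
    """Backbone -> Neck -> Head data flow; the three sub-networks are
    placeholders for the structures detailed in the sections below."""

    def __init__(self, backbone: nn.Module, neck: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # extracts position and behavior features
        self.neck = neck          # fuses the multi-scale feature images
        self.head = head          # predicts the agronomic behavior category

    def forward(self, image):
        feature_maps = self.backbone(image)  # several target feature images
        fused = self.neck(feature_maps)      # feature fusion images
        return self.head(fused)              # category label + detection boxes
```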
Specifically, in the embodiment of the invention, the agronomic behavior detection model can be constructed by adopting a target detection algorithm YOLO series model, such as a lightweight YOLO v7-tiny network model, which has faster network training and recognition speed and lower requirements on hardware equipment.
In order to enable the model to more accurately locate and identify the position and behavior of a user engaged in agronomic activities, the embodiment of the invention adopts a Coordinate Attention (CA) module and combines it with the Efficient Layer Aggregation Network (Elan) module and the Spatial Pyramid Pooling (SPP) module to improve and construct the Backbone network of the YOLO-series model; the position features and behavior features of the user in the agronomic behavior image are extracted through the improved Backbone network.
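For reference, a minimal PyTorch sketch of the Coordinate Attention block follows, per Hou et al. (2021); the channel-reduction ratio is an illustrative assumption, and the patent may configure it differently:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention: factorizes global pooling into two
    direction-aware pooled maps so the attention weights retain
    positional information along each spatial axis."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)  # reduction ratio is an assumption
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention
        return x * a_h * a_w
```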
Based on the foregoing embodiment, with continued reference to fig. 2, as shown in fig. 2, the backbone network 1 includes a first feature extraction layer 11, a second feature extraction layer 12, a third feature extraction layer 13, and a first superposition module 14, the Elan module includes an original Elan module and a modified Elan module, and the SPP module includes an original SPP module and a modified SPP module;
the output end of the first feature extraction layer 11 is connected with the input end of the second feature extraction layer 12 and the input end of the third feature extraction layer 13 respectively; the first superimposing module 14 is configured to perform a superimposing process on the feature image output by the second feature extraction layer and the feature image output by the third feature extraction layer;
the first feature extraction layer 11 comprises a convolution module and an original Elan module which are sequentially connected;
the second feature extraction layer 12 includes at least one first sub-network layer and an original SPP module; each first sub-network layer is connected in sequence; the first sub-network layer comprises a downsampling module and an original Elan module which are sequentially connected;
the third feature extraction layer 13 comprises at least one second sub-network layer and an improved SPP module, and the second sub-network layers are sequentially connected; the second sub-network layer comprises a downsampling module and an improved Elan module which are sequentially connected;
The improved SPP module is obtained by adding a coordinate attention module at the output end of the original SPP module;
the improved Elan module comprises a first aggregation module, a second superposition module and a convolution module which are sequentially connected, wherein the first aggregation module is obtained by replacing each convolution module at the input side of the image splicing module in the original Elan module with a CA module, and the second superposition module is used for carrying out superposition processing on the characteristic image of the input improved Elan module and the characteristic image output by the first aggregation module.
Specifically, the original Elan module described in the embodiment of the present invention refers to an active layer aggregation network module in a YOLO series model, for example, a lightweight YOLOv7-tiny network model is adopted, and the original Elan module is an Elan-tiny module in the YOLOv7-tiny network model.
The original SPP module described in the embodiment of the invention refers to a spatial pyramid pooling module in a YOLO series model, for example, a lightweight YOLO v7-tiny network model is adopted, and the original SPP module is an SPP-tiny module in the YOLO v7-tiny network model.
It should be noted that, the enhancement of the coordinate information can enable the network to have better positioning capability, and can better position and identify the farming behaviors of the user. In the embodiment of the present invention, in order to sufficiently acquire and utilize coordinate information in image data, a CA attention mechanism is introduced based on its excellent positioning capability, and improvement using the CA attention mechanism does not excessively increase the number of parameters and the operation time.
In the embodiment of the invention, the Elan-tiny module and SPP-tiny module of the Backbone network in YOLOv7 are improved using the CA attention mechanism and a feature correction network is constructed, thereby improving the recognition capability of the network model.
Fig. 3 is a schematic structural diagram of an Elan module used in the method for detecting an agricultural behavior provided by the present invention, as shown in fig. 3, fig. 3 (a) is an original Elan module, that is, an unmodified Elan-tiny module, and fig. 3 (b) is an improved Elan module adopted in the embodiment of the present invention, which is obtained by improving the original Elan module, and may be described as an Elan-CA module.
The convolution module described in the embodiment of the present invention may specifically be a CBS convolution module, which may sequentially perform a two-dimensional convolution (Conv2d) operation, a batch normalization (Batch Normalization, BN) operation, and a SiLU activation function operation.
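A minimal sketch of such a CBS block follows; the kernel size and stride are left as parameters, since the patent text does not fix them for every occurrence:

```python
import torch.nn as nn

class CBS(nn.Module):
    """Conv2d -> BatchNorm2d -> SiLU, the convolution module described above."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```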
In the embodiment of the invention, the Elan-CA module comprises a first aggregation module, a second superposition module (an ADD module) and a CBS convolution module, which are sequentially connected.
As can be seen from comparing the graph (a) with the graph (b) in fig. 3, the first aggregation module in this embodiment is obtained by replacing each CBS convolution module on the input side of the image stitching Concat module in the original Elan module with a CA module. Meanwhile, by adding a direct connection (shortcut), the second superposition module is utilized to carry out superposition processing on the characteristic image input by the improved Elan module and the characteristic image output by the first aggregation module, so that the correction of the image characteristics is realized.
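Since Fig. 3 is not reproduced here, the following sketch of the Elan-CA block (reusing the CoordinateAttention and CBS sketches above) fills in the branch count and channel widths as assumptions; it follows the description above, with CA modules feeding the Concat, an ADD shortcut with the block input, and a final CBS:

```python
import torch
import torch.nn as nn

class ElanCA(nn.Module):
    """Sketch of the improved Elan-CA module described above."""

    def __init__(self, channels: int):
        super().__init__()
        # First aggregation module: CA blocks replace the CBS convolutions
        # on the input side of the Concat (branch count assumed to be 3).
        self.ca1 = CoordinateAttention(channels)
        self.ca2 = CoordinateAttention(channels)
        self.ca3 = CoordinateAttention(channels)
        self.proj = CBS(4 * channels, channels, k=1)  # restores channel count after Concat
        self.cbs = CBS(channels, channels, k=1)       # final convolution module

    def forward(self, x):
        y1 = self.ca1(x)
        y2 = self.ca2(y1)
        y3 = self.ca3(y2)
        agg = self.proj(torch.cat([x, y1, y2, y3], dim=1))
        # Second superposition module: shortcut ADD of the block input and the
        # first aggregation module's output, followed by the final CBS.
        return self.cbs(agg + x)
```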
Fig. 4 is a schematic structural diagram of the SPP module used in the agronomic behavior detection method provided by the present invention. As shown in fig. 4, fig. 4 (a) is the original SPP module, that is, an unmodified SPP-tiny module, and fig. 4 (b) is the improved SPP module adopted in the embodiment of the present invention, which is obtained by improving the original SPP module and may be denoted as the SPP-CA module.
From a comparison of the graph (a) and the graph (b) in fig. 4, it can be seen that the SPP-CA module in this embodiment is obtained by adding a CA module to the output of the original SPP module.
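A corresponding sketch of the SPP-CA module follows, reusing the CBS and CoordinateAttention sketches above; the pooling kernel sizes (5, 9, 13) follow the common YOLO SPP configuration and are an assumption, since Fig. 4 is not reproduced here:

```python
import torch
import torch.nn as nn

class SPPCA(nn.Module):
    """Original SPP pooling pyramid with a CA block appended at its output."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        mid = c_in // 2
        self.cv1 = CBS(c_in, mid, k=1)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)]
        )
        self.cv2 = CBS(4 * mid, c_out, k=1)
        self.ca = CoordinateAttention(c_out)  # CA added at the SPP output

    def forward(self, x):
        y = self.cv1(x)
        y = torch.cat([y] + [p(y) for p in self.pools], dim=1)
        return self.ca(self.cv2(y))
```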
Further, with continued reference to fig. 2, in an embodiment of the present invention, the backbone network specifically includes a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and a first superposition module, where the second feature extraction layer and the third feature extraction layer are obtained by constructing a shunt using an Elan-CA module and an SPP-CA module.
Specifically, as shown in fig. 2, the second feature extraction layer may specifically include three first sub-network layers and an original SPP module, where each first sub-network layer is sequentially connected, the first sub-network layer includes a downsampling module, i.e., an MP module, and an original Elan module, i.e., an Elan-tiny module, which are sequentially connected, and finally, a feature image is output through the original SPP module.
The constructed sub-path is a third feature extraction layer, and specifically may include three second sub-network layers and an improved SPP module, i.e. an SPP-CA module, where each second sub-network layer is sequentially connected, and the second sub-network layer includes an MP module and an improved Elan module, i.e. an Elan-CA module, which are sequentially connected, and finally outputs a feature image through the SPP-CA module.
Further, in the backbone network, the feature image output by the second feature extraction layer and the feature image output by the third feature extraction layer are superposed by the first superposition module, so that the branch features are fused with the features of the original network path by ADD data fusion. This yields fused position and behavior features of the user and achieves the effect of correcting the image features.
According to the method provided by the embodiment of the invention, the Elan module and SPP module in YOLOv7 are improved using the CA attention mechanism and a feature correction network is constructed, so that the network can acquire position information and other feature information more comprehensively and with richer detail, improving the recognition capability of the network model and the recognition precision for the user's agronomic behaviors.
Furthermore, in the embodiment of the invention, the agronomic behavior detection model further comprises a Neck network and a Head network. Feature fusion can be performed on the feature images output by the Backbone network through the Neck network, improving the diversity and robustness of the image features. Finally, the Head network predicts the recognition result using all the features contained in the feature fusion image output by the Neck network, and outputs the agronomic behavior category of the user.
According to the method provided by the embodiment of the invention, by using the attention mechanism of the CA module and combining the Elan module and SPP module to construct the Backbone network, the improved YOLO model can more accurately locate and identify the position and behavior of a user engaged in agronomic activities, so the precision and effect of agronomic behavior recognition can be effectively improved.
Based on the foregoing embodiment, as an optional embodiment, inputting the agronomic performance image into the agronomic performance detection model, obtaining the agronomic performance category of the user output by the agronomic performance detection model includes:
inputting the agronomic behavior image into a backbone network to obtain a plurality of target feature images which are output by the backbone network and comprise the position features and the behavior features of the user;
inputting a plurality of target feature images into a neck network to obtain a feature fusion image output by the neck network;
and inputting the feature fusion image into a head network to obtain the agronomic behavior category of the user output by the head network.
Specifically, the target feature images described in the embodiment of the invention refer to the feature images obtained by the Backbone network extracting features from the input agronomic behavior image at different scales. Owing to the CA attention mechanism added to the Backbone network, the target feature images contain the position features and the behavior features of the user.
In the embodiment of the invention, the agronomic behavior image of the user is input into the Backbone network, and a plurality of target feature images can be output by the Backbone network through the joint processing of the CA module, the Elan module and the SPP module.
Based on the foregoing embodiment, as an optional embodiment, inputting the agronomic performance image into the backbone network, to obtain a plurality of target feature images including the location features and the performance features of the user output by the backbone network, including:
inputting the agronomic behavior image into a first feature extraction layer to obtain a first feature image;
inputting the first characteristic image into a second characteristic extraction layer to obtain a first target characteristic image output by each first sub-network layer and a second characteristic image output by the second characteristic extraction layer;
inputting the first characteristic image into a third characteristic extraction layer to obtain a third characteristic image output by the third characteristic extraction layer;
superposing the second characteristic image and the third characteristic image through a first superposition module to obtain a second target characteristic image;
and obtaining a plurality of target feature images based on the first target feature image and the second target feature image.
Specifically, the first target feature images described in the embodiment of the present invention are the feature images output by each first sub-network layer in the second feature extraction layer, which are input into the Neck network for feature fusion.
With continued reference to fig. 2, in this embodiment, the second feature extraction layer specifically includes three first sub-network layers and an SPP-tiny module, and the third feature extraction layer specifically includes three second sub-network layers and an SPP-CA module.
The agronomic behavior image of the user is input into the first feature extraction layer, where the CBS convolution module sequentially performs the two-dimensional convolution (Conv2d) operation, the BN operation and the SiLU activation function operation; the features output by the CBS convolution module are then input into the Elan-tiny module for processing, obtaining the first feature image.
In the embodiment of the invention, the first feature image is input into the second feature extraction layer. Feature extraction is performed on the first feature image by the first first sub-network layer to obtain the first target feature image output by that layer; this image is then processed by the second first sub-network layer to obtain the first target feature image corresponding to the second first sub-network layer; and that output is in turn input into the third first sub-network layer to obtain the first target feature image corresponding to the third first sub-network layer. In this way, the first target feature image output by each first sub-network layer is obtained.
Further, the first target feature image output by the third first sub-network layer is input into the original SPP module to extract features of the same dimension, finally obtaining the second feature image output by the second feature extraction layer.
Likewise, in this embodiment, the first feature image is input into the third feature extraction layer and sequentially undergoes the feature extraction processing of three second sub-network layers, each of which comprises an MP module and an Elan-CA module connected in sequence. The features extracted sequentially by the three second sub-network layers are input into the SPP-CA module to extract features of the same dimension, finally obtaining the third feature image output by the third feature extraction layer.
Further, in this embodiment, the first superimposition module superimposes and fuses the second feature image output by the second feature extraction layer and the third feature image output by the third feature extraction layer, so as to obtain a second target feature image.
Therefore, a plurality of target feature images for inputting the Neck network to perform feature fusion are obtained by obtaining the first target feature image and the second target feature image which are output by each first sub-network layer.
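Put together, the backbone forward pass described in this embodiment can be sketched as follows; the sub-layer wiring mirrors Fig. 2, and the concrete modules and channel widths are left to the caller as assumptions:

```python
import torch.nn as nn

class Backbone(nn.Module):
    """Two-branch feature-correction backbone sketched from the description above."""

    def __init__(self, stem, sub_layers1, spp_tiny, sub_layers2, spp_ca):
        super().__init__()
        self.stem = stem                               # CBS + Elan-tiny (first feature extraction layer)
        self.sub_layers1 = nn.ModuleList(sub_layers1)  # three MP + Elan-tiny sub-network layers
        self.spp_tiny = spp_tiny                       # original SPP module
        self.sub_layers2 = nn.ModuleList(sub_layers2)  # three MP + Elan-CA sub-network layers
        self.spp_ca = spp_ca                           # improved SPP-CA module

    def forward(self, image):
        f1 = self.stem(image)                 # first feature image
        targets, y = [], f1
        for layer in self.sub_layers1:        # second feature extraction layer
            y = layer(y)
            targets.append(y)                 # first target feature images
        f2 = self.spp_tiny(y)                 # second feature image
        z = f1
        for layer in self.sub_layers2:        # third feature extraction layer
            z = layer(z)
        f3 = self.spp_ca(z)                   # third feature image
        targets.append(f2 + f3)               # first superposition module -> second target
        return targets                        # inputs to the Neck network
```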
According to the method provided by the embodiment of the invention, a branch network is constructed with the improved Elan-CA and SPP-CA modules and combined with the feature extraction network of the original Backbone network to build a feature correction network, so that the Backbone network can acquire the position information and other feature information of the user's behavior more comprehensively. The network model thus has richer feature information with which to identify the user's agronomic behaviors, effectively improving its recognition precision and effect.
Further, in the embodiment of the invention, the plurality of target feature images output by the Backbone network are input into the Neck network for feature fusion. In the Neck network, a feature splicing (Concat) operation is performed on the feature image obtained by passing the second target feature image through a CBS convolution module and the feature image obtained by passing the first target feature image output by the third first sub-network layer through a CBS convolution module, and the resulting spliced image is passed through a CBS convolution module to output a first fusion feature image. The first fusion feature image sequentially passes through a CBS convolution module and an upsampling operation; the resulting feature image is concatenated with the first target feature image output by the second first sub-network layer, and the spliced image is passed through an Elan-tiny module to output a second fusion feature image. The second fusion feature image likewise passes through a CBS convolution module and upsampling; the result is concatenated with the first target feature image output by the first first sub-network layer, and the spliced image is passed through an Elan-tiny module to output a third fusion feature image. The third fusion feature image, after passing through a CBS convolution module, is concatenated with the second fusion feature image, and the spliced image is passed through an Elan-tiny module to output a fourth fusion feature image. The feature image obtained by passing the fourth fusion feature image through a CBS convolution module is concatenated with the first fusion feature image, and the spliced image is passed through an Elan-tiny module to output a fifth fusion feature image. The third, fourth and fifth fusion feature images thus obtained constitute the feature fusion images output by the Neck network.
Finally, the third, fourth and fifth fusion feature images output by the Neck network are input into the Head network. Through the CBS convolution modules and convolution layers in the Head network, image detection boxes of appropriate size for target localization are output together with the agronomic behavior category label of the user, completing fast and accurate recognition of the input agronomic behavior image.
According to the method provided by the embodiment of the invention, the target detection YOLO model is adopted and its backbone network is improved using the coordinate attention mechanism, which enhances the backbone network's ability to localize the feature information of user behaviors, provides richer feature information to the agronomic behavior detection model, and effectively improves the network model's recognition precision and capability for the user's agronomic behaviors.
Based on the foregoing embodiment, as an optional embodiment, before inputting the agronomic performance image into the agronomic performance detection model and acquiring the agronomic performance category of the user output by the agronomic performance detection model, the method further includes:
taking the agronomic behavior image samples and the corresponding agronomic behavior class labels as a group of training samples, and obtaining a plurality of groups of training samples;
And training the agronomic behavior detection model by utilizing a plurality of groups of training samples.
Specifically, in the embodiment of the present invention, before the agronomic performance image is input into the agronomic performance detection model, the agronomic performance detection model is further trained to obtain a trained agronomic performance detection model.
In the embodiment of the invention, shooting is performed by a plurality of prearranged image capturing devices facing different directions, with a video acquisition frame rate of 30 Hz and an original video size of 2304 × 1296 pixels, to obtain the original agronomic behavior video data; the original agronomic behavior video can then be annotated frame by frame to obtain the agronomic behavior image samples and the corresponding agronomic behavior category labels.
Further, all the acquired sample data are divided into a training set, a verification set and a test set. In this embodiment, 6218 agronomic behavior image samples may be taken as the training set, 621 agronomic behavior image samples as the verification set, and 612 agronomic behavior image samples as the test set, with the corresponding images and annotation information placed in the corresponding folders.
In the embodiment of the invention, the training set data is utilized to train the agricultural behavior detection model, and the specific training process is as follows:
The agronomic behavior image samples and the corresponding agronomic behavior class labels are taken as a group of training samples; that is, each agronomic behavior image sample carrying an agronomic behavior category label forms a group of training samples, thereby obtaining a plurality of groups of training samples.
In the embodiment of the invention, the agronomic behavior image samples are in one-to-one correspondence with the agronomic behavior class labels carried by the agronomic behavior image samples.
Then, after obtaining a plurality of groups of training samples, sequentially inputting the plurality of groups of training samples into the agronomic behavior detection model, and training the agronomic behavior detection model by utilizing the plurality of groups of training samples, namely:
the agronomic behavior image sample in each group of training samples and the carried agronomic behavior class label are input into the agronomic behavior detection model together, and the model parameters of the agronomic behavior detection model are adjusted by calculating loss function values from each output result of the model; once the preset training termination condition is met, the whole training process of the agronomic behavior detection model is completed, and the trained agronomic behavior detection model is obtained.
According to the method provided by the embodiment of the invention, the agronomic behavior image samples and the corresponding agronomic behavior class labels are used as a group of training samples, and the agronomic behavior detection model is trained by utilizing a plurality of groups of training samples, so that the model precision of the trained agronomic behavior detection model is improved.
Based on the foregoing embodiment, as an alternative embodiment, training the agronomic performance detection model using multiple sets of training samples includes:
for any group of training samples, inputting the training samples into an agronomic behavior detection model, and outputting the prediction probability corresponding to the training samples;
calculating a loss value according to the prediction probability corresponding to the training sample and the agronomic behavior class label corresponding to the training sample by using a preset loss function;
based on the loss value, adjusting model parameters of the agricultural behavior detection model until the loss value is smaller than a preset threshold value or the training times reach preset times;
and taking the model parameters obtained when the loss value is smaller than a preset threshold value or the training times reach the preset times as the model parameters of the trained agricultural behavior detection model, and completing training of the agricultural behavior detection model.
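A minimal sketch of this training loop follows; the optimizer choice (Adam), the loss threshold, and the data-loader interface are illustrative assumptions not fixed by the patent text:

```python
import torch

def train(model, loader, loss_fn, epochs: int = 100, threshold: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=0.001)
    for epoch in range(epochs):            # stop once training reaches the preset count
        for images, labels in loader:      # one group of training samples
            probs = model(images)          # prediction probability for the samples
            loss = loss_fn(probs, labels)  # preset loss function vs. class labels
            opt.zero_grad()
            loss.backward()
            opt.step()                     # adjust model parameters from the loss
            if loss.item() < threshold:    # or stop once the loss is below threshold
                return model
    return model
```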
Specifically, the preset loss function described in the embodiment of the present invention refers to a loss function preset in an agronomic behavior detection model, and is used for model evaluation; the preset threshold refers to a threshold preset by the model, and is used for obtaining a minimum loss value and completing model training; the preset times refer to the preset maximum times of model iterative training.
After a plurality of groups of training samples are obtained, for any group of training samples, the agronomic behavior image samples in each group of training samples and the agronomic behavior class labels carried by the agronomic behavior image samples are simultaneously input into an agronomic behavior detection model, and the prediction probability corresponding to the training samples is output.
On the basis, a preset loss function is utilized, and a loss value is calculated according to the prediction probability corresponding to the training sample and the agronomic behavior class label corresponding to the training sample.
Further, after the loss value is calculated, the current training iteration ends. Based on the loss value, the model parameters of the agronomic behavior detection model are then adjusted to update the weight parameters of each layer of the model, after which the next training iteration is carried out; this iterative process is repeated to train the model.
In the training process, if the training result of a certain group of training samples meets the preset training termination condition, if the loss value obtained by corresponding calculation is smaller than the preset threshold value, or the current iteration number reaches the preset number, the loss value of the model can be controlled within the convergence range, and the model training is ended. At this time, the obtained model parameters can be used as the model parameters of the trained agronomic behavior detection model, and the training of the agronomic behavior detection model is completed, so that the trained agronomic behavior detection model is obtained.
In one embodiment, on a 64-bit Windows 10 operating system platform, the agronomic behavior detection model is built with the Python language based on the PyTorch deep learning framework, and training of the model is completed on an NVIDIA GTX 2080 Ti GPU. The model training parameters may set the batch size to 10, the number of iterations to 100, and the learning rate to 0.001. Training uses 6218 video frames as input with an input data size of 3 × 1280, and the acceleration environment may employ CUDA 10.1 and cuDNN 7.6.5.
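Collected as a configuration sketch (key names are illustrative; the input size is quoted as stated above, without guessing the elided spatial dimension):

```python
train_cfg = {
    "os": "Windows 10 (64-bit)",
    "framework": "PyTorch",
    "language": "Python",
    "gpu": "NVIDIA GTX 2080 Ti",
    "batch_size": 10,
    "iterations": 100,
    "learning_rate": 0.001,
    "train_frames": 6218,
    "input_size": "3 x 1280",  # as stated in this embodiment
    "cuda": "10.1",
    "cudnn": "7.6.5",
}
```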
According to the method provided by the embodiment of the invention, the agronomic behavior detection model is repeatedly and iteratively trained by utilizing the plurality of groups of training samples, and the loss value of the agronomic behavior detection model is controlled within the convergence range, so that the accuracy of the recognition result of the agronomic behavior of the user output by the model is improved, and the recognition accuracy of the agronomic behavior of the user is improved.
In the embodiment of the invention, the weight parameters of each layer of the trained agronomic behavior detection model are verified using the verification set data. After the verification set data are processed by the Backbone network and the Neck network, the Generalized Intersection over Union loss (GIoU Loss) is adopted as the loss function of the bounding box in the Head network at the output end; that is, GIoU Loss is used to calculate the bounding-box loss, computed as follows:
GIoU = IoU(A, B) - (|C| - |A ∪ B|) / |C|,    GIoU Loss = 1 - GIoU
where A and B are the predicted and ground-truth bounding boxes and C is the smallest enclosing box containing both.
That is, the smallest enclosing box C of the two arbitrary boxes A and B is found such that C contains both A and B; the ratio of the area of C not covered by A and B to the total area of C is computed and subtracted from the IoU of A and B. Finally, the target boxes are filtered by non-maximum suppression, and the target detection box and confidence are output to obtain the agronomic behavior detection result.
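A minimal PyTorch sketch of this GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) form follows; it illustrates the computation above, not the patent's exact code:

```python
import torch

def giou_loss(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    # Intersection of A and B
    ix1 = torch.max(box_a[..., 0], box_b[..., 0])
    iy1 = torch.max(box_a[..., 1], box_b[..., 1])
    ix2 = torch.min(box_a[..., 2], box_b[..., 2])
    iy2 = torch.min(box_a[..., 3], box_b[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_a = (box_a[..., 2] - box_a[..., 0]) * (box_a[..., 3] - box_a[..., 1])
    area_b = (box_b[..., 2] - box_b[..., 0]) * (box_b[..., 3] - box_b[..., 1])
    union = area_a + area_b - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest enclosing box C of A and B
    cx1 = torch.min(box_a[..., 0], box_b[..., 0])
    cy1 = torch.min(box_a[..., 1], box_b[..., 1])
    cx2 = torch.max(box_a[..., 2], box_b[..., 2])
    cy2 = torch.max(box_a[..., 3], box_b[..., 3])
    area_c = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-7)

    giou = iou - (area_c - union) / area_c  # subtract the uncovered share of C
    return 1.0 - giou                       # GIoU loss
```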
In the embodiment of the invention, as described above, the agronomic behavior detection model is obtained by improving the Elan-tiny module and SPP-tiny module of the Backbone network in YOLOv7-tiny with the CA attention mechanism. Compared with the original YOLOv7-tiny network, the agronomic behavior detection model improves recognition precision: its mAP@0.5 can reach 99.5%, an improvement of 0.1% over the original YOLOv7-tiny network, and its mAP@0.5:0.95 can reach 84.8%, an improvement of 6.6% over the original YOLOv7-tiny network.
The agronomic behavior detection device provided by the invention is described below; the device described below and the agronomic behavior detection method described above may be referred to in correspondence with each other.
Fig. 5 is a schematic structural diagram of the agronomic behavior detection device provided by the present invention; as shown in Fig. 5, the device includes:
an acquisition module 510, configured to acquire an agronomic behavior image of a user;
a detection module 520, configured to input the agronomic behavior image into the agronomic behavior detection model and obtain the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for determining the agronomic behavior category of the user based on the position characteristics and the behavior characteristics of the user obtained by extracting the characteristics of the agronomic behavior image.
The agronomic behavior detection device in this embodiment may be used to execute the above-mentioned agronomic behavior detection method embodiment, and the principle and technical effects are similar, and are not repeated here.
According to the agronomic behavior detection device, by strengthening the internal correlation between the user's position and behavior for localization and recognition, the trained agronomic behavior detection model fuses the position features and behavior features extracted from the input agronomic behavior image of the user and identifies the user's agronomic behavior category based on the fused features. Different behavior actions of the user can thus be accurately identified, which effectively improves the precision and effect of agronomic behavior detection while greatly improving its efficiency.
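A minimal sketch of how the two modules of Fig. 5 could be wired together in Python — the weights file, the preprocessing, and all names are illustrative assumptions rather than the patent's interface:

import torch

class AgronomicBehaviorDetector:
    def __init__(self, weights_path: str):
        # Detection module 520: the trained agronomic behavior detection model,
        # assumed here to have been saved as a whole with torch.save.
        self.model = torch.load(weights_path, map_location="cpu")
        self.model.eval()

    def acquire(self, frame) -> torch.Tensor:
        # Acquisition module 510: turn a captured video frame, assumed to be
        # an (H, W, 3) uint8 NumPy array, into the model's input tensor.
        img = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        return img.unsqueeze(0)  # add a batch dimension

    @torch.no_grad()
    def detect(self, frame):
        # Returns the model's detections: boxes, confidences and the
        # agronomic behavior category for each detected user.
        return self.model(self.acquire(frame))

A call such as AgronomicBehaviorDetector("agronomic_yolov7_tiny.pt").detect(frame) would then reproduce the acquire-then-detect flow described above.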
Fig. 6 is a schematic diagram of the physical structure of an electronic device according to the present invention. As shown in Fig. 6, the electronic device may include: a processor 610, a communication interface (Communications Interface) 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform the agronomic behavior detection method provided by the methods described above, the method comprising: acquiring an agronomic behavior image of a user; inputting the agronomic behavior image into an agronomic behavior detection model, and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model; the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting features from the agronomic behavior image, and determining the agronomic behavior category of the user based on the fused features.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the agronomic behavior detection method provided by the methods described above, the method comprising: acquiring an agronomic behavior image of a user; inputting the agronomic behavior image into an agronomic behavior detection model, and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model; the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting features from the agronomic behavior image, and determining the agronomic behavior category of the user based on the fused features.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the agronomic behavior detection method provided by the methods described above, the method comprising: acquiring an agronomic behavior image of a user; inputting the agronomic behavior image into an agronomic behavior detection model, and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model; the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting features from the agronomic behavior image, and determining the agronomic behavior category of the user based on the fused features.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for detecting agronomic behavior, comprising:
acquiring an agronomic behavior image of a user;
inputting the agronomic behavior image into an agronomic behavior detection model, and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for fusing the position features and the behavior features of the user, which are obtained by extracting features of the agronomic behavior image, and determining the agronomic behavior category of the user based on the fused features;
The agronomic behavior detection model comprises a backbone network, a neck network and a head network; the backbone network is constructed based on a coordinate attention module, an effective layer aggregation network module and a spatial pyramid pooling module and is used for extracting the position characteristics and the behavior characteristics of the user in the agronomic behavior image; the neck network is used for carrying out feature fusion on the feature images output by the backbone network; the head network is used for determining the agronomic behavior category of the user based on the feature fusion image output by the neck network;
the effective layer aggregation network module comprises an original effective layer aggregation network module and an improved effective layer aggregation network module, and the spatial pyramid pooling module comprises an original spatial pyramid pooling module and an improved spatial pyramid pooling module;
the improved spatial pyramid pooling module is obtained by adding the coordinate attention module to the output end of the original spatial pyramid pooling module;
the improved effective layer aggregation network module comprises a first aggregation module, a second superposition module and a convolution module which are sequentially connected, wherein the first aggregation module is obtained by replacing each convolution module at the input side of the image splicing module in the original effective layer aggregation network module with the coordinate attention module, and the second superposition module is used for superposing the characteristic image input into the improved effective layer aggregation network module and the characteristic image output by the first aggregation module.
2. The agronomic behavior detection method according to claim 1, wherein the backbone network comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a first superposition module;
the output end of the first feature extraction layer is respectively connected with the input end of the second feature extraction layer and the input end of the third feature extraction layer; the first superposition module is used for carrying out superposition processing on the characteristic image output by the second characteristic extraction layer and the characteristic image output by the third characteristic extraction layer;
the first feature extraction layer comprises a convolution module and the original effective layer aggregation network module which are sequentially connected;
the second feature extraction layer comprises at least one first sub-network layer and the original spatial pyramid pooling module; each first sub-network layer is connected in sequence; the first sub-network layer comprises a downsampling module and the original effective layer aggregation network module which are sequentially connected;
the third feature extraction layer comprises at least one second sub-network layer and the improved spatial pyramid pooling module, and the second sub-network layers are sequentially connected; the second sub-network layer comprises a downsampling module and the improved effective layer aggregation network module which are connected in sequence.
3. The agronomic behavior detection method according to claim 2, wherein the inputting the agronomic behavior image into an agronomic behavior detection model, obtaining an agronomic behavior category of the user output by the agronomic behavior detection model, includes:
inputting the agronomic behavior image into the backbone network to obtain a plurality of target feature images which are output by the backbone network and comprise the position features and the behavior features of the user;
inputting a plurality of target feature images into the neck network to obtain the feature fusion image output by the neck network;
and inputting the feature fusion image into the head network to obtain the agronomic behavior category of the user output by the head network.
4. The agronomic behavior detection method according to claim 3, wherein the inputting the agronomic behavior image into the backbone network, obtaining a plurality of target feature images including the position feature and the behavior feature of the user output by the backbone network, includes:
inputting the agronomic behavior image into the first feature extraction layer to obtain a first feature image;
inputting the first characteristic image into the second characteristic extraction layer to obtain a first target characteristic image output by each first sub-network layer and a second characteristic image output by the second characteristic extraction layer;
Inputting the first characteristic image into the third characteristic extraction layer to obtain a third characteristic image output by the third characteristic extraction layer;
superposing the second characteristic image and the third characteristic image through the first superposition module to obtain a second target characteristic image;
and obtaining a plurality of target feature images based on each of the first target feature image and the second target feature image.
5. The agronomic behavior detection method according to any of claims 1 to 4, wherein prior to the inputting the agronomic behavior image into an agronomic behavior detection model, obtaining an agronomic behavior category of the user output by the agronomic behavior detection model, the method further comprises:
taking the agronomic behavior image samples and the corresponding agronomic behavior class labels as a group of training samples, and obtaining a plurality of groups of training samples;
and training the agronomic behavior detection model by utilizing the plurality of groups of training samples.
6. The agronomic behavior detection method according to claim 5, wherein training the agronomic behavior detection model using the plurality of groups of training samples includes:
for any group of training samples, inputting the training samples into the agronomic behavior detection model, and outputting the prediction probability corresponding to the training samples;
Calculating a loss value according to the prediction probability corresponding to the training sample and the agronomic behavior class label corresponding to the training sample by using a preset loss function;
based on the loss value, adjusting model parameters of the agronomic behavior detection model until the loss value is smaller than a preset threshold value or the training times reach preset times;
and taking the model parameters obtained when the loss value is smaller than the preset threshold value or the training times reach the preset times as the model parameters of the trained agronomic behavior detection model, and completing the training of the agronomic behavior detection model.
7. An agronomic behavior detection device, comprising:
the acquisition module is used for acquiring the agronomic behavior image of the user;
the detection module is used for inputting the agronomic behavior image into an agronomic behavior detection model and acquiring the agronomic behavior category of the user output by the agronomic behavior detection model;
the agronomic behavior detection model is obtained by training according to an agronomic behavior image sample and a corresponding agronomic behavior class label; the agronomic behavior detection model is used for determining the agronomic behavior category of the user based on the position features and the behavior features of the user obtained by extracting the features of the agronomic behavior image;
The agronomic behavior detection model comprises a backbone network, a neck network and a head network; the backbone network is constructed based on a coordinate attention module, an effective layer aggregation network module and a spatial pyramid pooling module and is used for extracting the position characteristics and the behavior characteristics of the user in the agronomic behavior image; the neck network is used for carrying out feature fusion on the feature images output by the backbone network; the head network is used for determining the agronomic behavior category of the user based on the feature fusion image output by the neck network;
the effective layer aggregation network module comprises an original effective layer aggregation network module and an improved effective layer aggregation network module, and the spatial pyramid pooling module comprises an original spatial pyramid pooling module and an improved spatial pyramid pooling module;
the improved spatial pyramid pooling module is obtained by adding the coordinate attention module to the output end of the original spatial pyramid pooling module;
the improved effective layer aggregation network module comprises a first aggregation module, a second superposition module and a convolution module which are sequentially connected, wherein the first aggregation module is obtained by replacing each convolution module at the input side of the image splicing module in the original effective layer aggregation network module with the coordinate attention module, and the second superposition module is used for superposing the characteristic image input into the improved effective layer aggregation network module and the characteristic image output by the first aggregation module.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the agronomic behavior detection method according to any one of claims 1 to 6 when executing the program.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the agronomic behavior detection method according to any one of claims 1 to 6.
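To make the claimed building blocks concrete, the following sketches outline them in PyTorch. First, a coordinate attention (CA) module in the general form proposed by Hou et al. (2021); the reduction ratio and activation are common defaults, not values fixed by the claims:

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate along height
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        xh = self.pool_h(x)                        # (n, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)    # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(yh))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w  # reweight features with direction-aware position cues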
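Building on the CoordinateAttention class above, the improved modules of claim 1 might look as follows; the pooling kernels, the branch count and the channel widths are assumptions, since the claim fixes only where the CA blocks sit:

import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int, k: int = 1) -> nn.Sequential:
    # Convolution module: Conv + BN + activation
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class ImprovedSPP(nn.Module):
    # The CA module is appended at the output end of the original SPP module.
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        mid = c_in // 2
        self.reduce = conv_block(c_in, mid)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)
        )
        self.fuse = conv_block(mid * 4, c_out)
        self.ca = CoordinateAttention(c_out)

    def forward(self, x):
        x = self.reduce(x)
        x = torch.cat([x] + [p(x) for p in self.pools], dim=1)
        return self.ca(self.fuse(x))

class ImprovedELAN(nn.Module):
    # First aggregation module: CA blocks replace the convolutions on the input
    # side of the concatenation; a second superposition module then adds the
    # block input back in, followed by a convolution module.
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(CoordinateAttention(channels) for _ in range(2))
        self.concat_conv = conv_block(2 * channels, channels)
        self.out_conv = conv_block(channels, channels, k=3)

    def forward(self, x):
        agg = self.concat_conv(torch.cat([b(x) for b in self.branches], dim=1))
        return self.out_conv(x + agg)  # superpose input with aggregated features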
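Finally, reusing the modules from the two sketches above, the two-branch backbone of claims 2 and 4 can be outlined as below; the layer counts and channel widths are placeholders, and the original ELAN and SPP modules of the second branch are abbreviated to plain convolutions:

import torch.nn as nn

class BackboneSketch(nn.Module):
    def __init__(self, c: int = 64):
        super().__init__()
        # First feature extraction layer: convolution module + (original) ELAN
        self.first = nn.Sequential(conv_block(3, c, k=3), conv_block(c, c, k=3))
        # Second layer: downsampling + original ELAN/SPP (abbreviated here)
        self.second = nn.Sequential(nn.MaxPool2d(2), conv_block(c, c, k=3))
        # Third layer: downsampling + improved ELAN, ending in the improved SPP
        self.third = nn.Sequential(nn.MaxPool2d(2), ImprovedELAN(c), ImprovedSPP(c, c))

    def forward(self, img):
        f1 = self.first(img)
        f2 = self.second(f1)   # sub-network outputs give the first target feature images
        f3 = self.third(f1)
        return f2, f2 + f3     # first superposition module yields the second target image

The neck and head networks of YOLOv7-tiny would then consume these target feature images as usual.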
CN202310194084.6A 2023-03-03 2023-03-03 Farming behavior detection method and device, electronic equipment and storage medium Active CN115880558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310194084.6A CN115880558B (en) 2023-03-03 2023-03-03 Farming behavior detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115880558A CN115880558A (en) 2023-03-31
CN115880558B true CN115880558B (en) 2023-05-26

Family

ID=85761852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310194084.6A Active CN115880558B (en) 2023-03-03 2023-03-03 Farming behavior detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115880558B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116469034A (en) * 2023-04-17 2023-07-21 中国农业大学 Pig monitoring and counting method based on improved YOLOv7 model
CN116580456A (en) * 2023-05-11 2023-08-11 中电金信软件有限公司 Behavior detection method, behavior detection device, computer equipment and storage medium
CN116343223A (en) * 2023-05-31 2023-06-27 南京畅洋科技有限公司 Character wheel type water meter reading method based on deep learning
CN117173444A (en) * 2023-06-08 2023-12-05 南京林业大学 Edge banding board appearance defect detection method and system based on improved YOLOv7 network model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022553A (en) * 2015-03-26 2016-10-12 塔塔咨询服务有限公司 System and method for agricultural activity monitoring and training

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830177A (en) * 2018-05-25 2018-11-16 深圳春沐源控股有限公司 Farming operations behavior checking method and device
CN111242007A (en) * 2020-01-10 2020-06-05 上海市崇明区生态农业科创中心 Farming behavior supervision method
CN112801164B (en) * 2021-01-22 2024-02-13 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of target detection model
CN115294651A (en) * 2022-08-03 2022-11-04 恒瑞通(福建)信息技术有限公司 Behavior analysis method based on farming scene and server


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant