CN115577256A - Abnormality detection method and apparatus for pig, electronic device, and storage medium - Google Patents

Abnormality detection method and apparatus for pig, electronic device, and storage medium

Info

Publication number
CN115577256A
CN115577256A
Authority
CN
China
Prior art keywords
audio
video data
data
abnormal
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110750379.8A
Other languages
Chinese (zh)
Inventor
陈建
李泽源
张程
李舜铭
王艺霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Chengdu ICT Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110750379.8A priority Critical patent/CN115577256A/en
Publication of CN115577256A publication Critical patent/CN115577256A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks

Abstract

Embodiments of the present application disclose a pig abnormality detection method and apparatus, an electronic device, and a storage medium, where the method comprises: collecting a plurality of pieces of audio-video data of pigs; performing feature extraction on each piece of audio-video data to obtain corresponding target features; inputting the target features of each piece of audio-video data among the plurality of pieces into a multi-instance learning model so as to train the multi-instance learning model; and detecting abnormal behaviors of the pigs by using the trained multi-instance learning model.

Description

Abnormality detection method and apparatus for pig, electronic device, and storage medium
Technical Field
The present application relates to the technical field of machine learning, and in particular, but not exclusively, to a pig abnormality detection method and apparatus, an electronic device, and a storage medium.
Background
In the related art, pig abnormality detection mainly uses an audio detection method or an image detection method. In the audio detection method, standard signals of abnormal live-pig sounds are collected, characteristic parameters are extracted from these standard signals, a sound recognition model is established from the characteristic parameters, and the sounds emitted by live pigs are then monitored in real time; the sound recognition model judges and recognizes whether an emitted sound is an abnormal sound (such as pig coughing or pig biting), from which the occurrence of abnormal behavior can be inferred. However, many abnormal behaviors of pigs produce no audio data that can be detected acoustically, and the audio manifestations of many different abnormal, stress, and pain states of pigs are very similar.
In the image detection method, environment information is calibrated and the backs of fattening pigs are marked; the fattening pigs are then detected as targets using computer vision technology, and an SVM (Support Vector Machine) learns the relationship between the pigs' environment information and their behavior from the detected behavior and environment information. However, abnormal pig behavior is usually dynamic and necessarily spans a time segment, so the abnormal behavior of a pig cannot be defined by a single image feature alone.
Disclosure of Invention
In view of this, embodiments of the present application provide a pig abnormality detection method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a pig abnormality detection method, the method comprising: collecting a plurality of pieces of audio-video data of pigs; performing feature extraction on each piece of audio-video data to obtain corresponding target features; inputting the target features of each piece of audio-video data among the plurality of pieces into a multi-instance learning model so as to train the multi-instance learning model; and detecting abnormal behaviors of the pigs by using the trained multi-instance learning model.
In a second aspect, an embodiment of the present application provides a pig abnormality detection apparatus, comprising: an acquisition module configured to collect a plurality of pieces of audio-video data of pigs; an extraction module configured to perform feature extraction on each piece of audio-video data to obtain corresponding target features; a training module configured to input the target features of each piece of audio-video data among the plurality of pieces into a multi-instance learning model so as to train the multi-instance learning model; and a detection module configured to detect abnormal behaviors of the pigs by using the trained multi-instance learning model.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing, when executing the computer program, the steps of the pig abnormality detection method according to any one of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the pig abnormality detection method according to any one of the embodiments of the present application.
In the embodiments of the present application, target features are obtained by performing feature extraction on the audio-video data and are input into the multi-instance learning model, which enhances the learning ability of the multi-instance learning model, strengthens its ability to separate anomalies, and improves the accuracy of classifying abnormal and non-abnormal audio-video data.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting abnormality in pigs according to an embodiment of the present application;
fig. 2 is a schematic diagram of data acquisition of audio/video data according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a method for extracting video features using a convolutional neural network according to an embodiment of the present application;
FIG. 4 is a diagram illustrating audio feature extraction according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating fusion of audio features and video features according to an embodiment of the present application;
FIG. 6 is a diagram illustrating multi-instance learning (MIL) network training and inference according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a data enhancement method according to an embodiment of the present application;
fig. 8 is a schematic flow chart of pig abnormality detection and early warning according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a pig abnormality detection apparatus according to an embodiment of the present application;
fig. 10 is a hardware entity diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application is further elaborated below with reference to the drawings and the embodiments.
Fig. 1 is a schematic flow chart of a method for detecting abnormality of a pig according to an embodiment of the present application, as shown in fig. 1, the method includes:
step 102: collecting a plurality of audio and video data of pigs;
An abnormality in a pig itself often manifests as a dual representation of behavior and audio; for example, fighting among pigs is often accompanied by biting sounds, and groans caused by disease are often accompanied by emaciation. Therefore, a plurality of pieces of audio-video data can be generated from the collected behavior and audio of the pigs. Each piece of audio-video data, which may also be called a sample data packet, may include at least one group of video data and corresponding audio data; the video data may be referred to as behavior data, and the audio data may be referred to as sound data.
Step 104: extracting the characteristics of each audio and video data to obtain corresponding target characteristics;
the audio characteristics can be obtained by carrying out characteristic extraction on the audio frequency of the pig, the video characteristics can be obtained by carrying out characteristic extraction on the behavior of the pig, the key audio characteristics and video characteristics can be reserved by carrying out characteristic extraction, and the audio characteristics and the video characteristics are processed to obtain the corresponding target characteristics;
step 106: inputting the target characteristics of each audio and video data in the audio and video data into a multi-example learning model so as to train the multi-example learning model;
the model training may also be referred to as model updating, a Multiple Instance Learning (MIL) model may also be referred to as a Multiple Instance Learning network, and the Multiple Instance Learning model may learn a data packet composed of Multiple instances, where the data packet may include audio and video data, and the instances may include audio features, image features, video features, and the like; in this embodiment, the audio/video data may include audio features and video features; in other embodiments, the audio-video data may also include two different video features, the audio-video data may further include two different audio features, the audio-video data may further include an audio feature and an image feature, and the audio-video data may further include a first audio feature, a first video feature, a second audio feature, and a second video feature, where the first audio feature corresponds to the first video feature, and the second audio feature corresponds to the second video feature.
Step 108: and detecting abnormal behaviors of the pigs by using the trained multi-example learning model.
In the embodiment of the application, the target characteristics are obtained by extracting the characteristics of the audio and video data, and are input into the multi-example learning model, so that the learning capacity of the multi-example learning model is enhanced, the abnormal division capacity of the multi-example learning model is stronger, and the accuracy of classifying abnormal audio and video data and non-abnormal audio and video data is improved.
The embodiment of the application also provides a pig abnormality detection method, which comprises the following steps:
step S202: collecting a plurality of audio and video data of pigs;
Each piece of audio-video data comprises audio data and video data. Fig. 2 is a schematic diagram of data acquisition of audio-video data according to an embodiment of the present application. Referring to fig. 2, an image acquisition device 201 may be used to acquire video data 202 of a pig, and an audio acquisition device 203 may be used to acquire audio data 204 of the pig; the image acquisition device 201 may be a camera, a video camera, or another device with a shooting function (such as a mobile phone or a tablet computer), and the audio acquisition device 203 may be a microphone, a sound pickup, an intercom, or another device with a recording function (such as a mobile phone or a tablet computer).
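As an illustration, one such capture loop might look as follows; this is a minimal sketch assuming OpenCV (which the embodiments of the present application use for video processing), and the device index and segment length are hypothetical:

    import cv2

    cap = cv2.VideoCapture(0)      # image acquisition device (e.g., a camera)
    frames = []
    for _ in range(15):            # one hypothetical 15-frame video segment
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    # audio would be recorded synchronously by the audio acquisition device,
    # with time as the shared reference dimension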
Step S204: performing feature extraction on video data in each piece of audio and video data by using a three-dimensional convolutional neural network to obtain video features corresponding to the audio and video data;
Fig. 3 is a schematic diagram of a method for extracting video features using a convolutional neural network according to an embodiment of the present application. Referring to fig. 3, a 2D (two-dimensional) convolutional neural network 301 may be used for feature extraction from a single image, a 2D multi-frame convolutional neural network 302 may be used for feature extraction from multiple images, and a 3D (three-dimensional) convolutional neural network 303 may be used for feature extraction over images and the time sequence, that is, feature extraction from a video stream. Video data and audio data are acquired in the data acquisition stage. Unlike image frames, which are distributed in a two-dimensional space, audio data is strongly correlated with the time sequence and belongs to a distribution over time; it cannot be directly spliced with the dimensions of the image stream, nor fused with the image stream. Therefore, feature extraction is performed on the video data using a three-dimensional convolutional neural network, so that the extracted video features can be fused with the audio features.
Referring to fig. 3, k × k may represent the size of the convolution kernel of the convolutional neural network, L may represent the number of images in the video stream, H × W may represent the size of the images, and d may represent the number of images in the selected video segment.
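As an illustration, the three-dimensional convolution described above might be sketched as follows, assuming PyTorch; the channel counts and clip size are illustrative rather than taken from the patent:

    import torch
    import torch.nn as nn

    L, H, W = 16, 112, 112                 # L frames of size H x W
    clip = torch.randn(1, 3, L, H, W)      # (batch, channels, time, height, width)

    # kernel_size=(d, k, k): d frames along time, k x k spatially, so features
    # are extracted jointly over the images and the time sequence
    conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                       kernel_size=(3, 3, 3), padding=1)
    video_features = conv3d(clip)
    print(video_features.shape)            # torch.Size([1, 64, 16, 112, 112])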
Step S206: performing cepstrum analysis on the audio data in each piece of audio and video data to obtain audio features corresponding to the audio and video data;
It should be noted that feature extraction can be performed not only on the video data in the audio-video data to obtain the corresponding video features, but also on the audio data in the audio-video data to obtain the corresponding audio features.
Step S208: fusing the audio characteristic and the video characteristic of each piece of audio and video data to obtain a corresponding target characteristic;
step S210: inputting the target characteristics of each audio and video data in the audio and video data into a multi-example learning model so as to train the multi-example learning model;
step S212: and detecting abnormal behaviors of the pigs by using the trained multi-example learning model.
In the embodiment of the application, the video data are subjected to feature extraction through the three-dimensional convolutional neural network, and the audio data are subjected to feature extraction through cepstrum analysis, so that the extraction of key video features in the video data and key audio features in the audio data can be more accurately realized, the key video features and the key audio features are fused to obtain fused target features, the learning capacity of the multi-example learning network is enhanced, the abnormal partition capacity of the multi-example learning network is stronger, and the accuracy of classifying abnormal audio and video data and non-abnormal audio and video data is improved.
The embodiment of the application also provides a method for detecting the abnormality of the pig, which comprises the following steps:
step S302: collecting a plurality of audio and video data of pigs;
step S304: performing feature extraction on video data in each piece of audio and video data by using a three-dimensional convolutional neural network to obtain video features corresponding to the audio and video data;
step S306: carrying out Fourier transform, logarithm taking and inverse Fourier transform on the audio data of each piece of audio and video data to obtain an envelope function of the corresponding audio and video data;
fig. 4 is a schematic diagram of audio feature extraction according to an embodiment of the present application, and referring to fig. 4, time domain information (e.g., a spectrogram 401) of an acquired audio may be converted into frequency domain information by using fourier transform, so as to obtain an audio frequency spectrum X [ k ] in an audio frequency spectrogram 402; further, the frequency domain signal X [ k ] may be split into a product of two parts: the envelope of the audio frequency spectrum and the details of the audio frequency spectrum, the frequency domain signal X k obtained at this time can be represented by formula (1):
X[k]=H[k]E[k] (1);
secondly, logarithms can be taken simultaneously on both sides of the formula (1), i.e. log operation is performed, and the logarithmic transformation process can be expressed by the formula (2):
log||X[k]||=log||H[k]||+log||E[k]|| (2);
wherein, | X [ k ] | represents the norm of X [ k ], | H [ k ] | represents the norm of H [ k ], and | E [ k ] | represents the norm of E [ k ].
Furthermore, inverse fourier transforms may be simultaneously performed on both sides of equation (2) to obtain a cepstrum x [ k ], and the inverse fourier transform process may be represented by equation (3):
x[k]=h[k]+e[k] (3);
in equation (3), h [ k ] describes the low frequency part in the cepstrum x [ k ], which is also the core feature of the audio data, and this embodiment of the application may name h [ k ] as an envelope function map 403 and name e [ k ] as a spectrum detail map 404.
Step S308: performing a transposition transformation on the envelope function of each piece of audio-video data to obtain the audio features of the corresponding audio-video data;
In order to match the image convolution term for joint training, the time-series data h[k] needs to be converted into one-dimensional data U (i.e., the envelope transposition block 405); the envelope function can be transposed to obtain the audio feature (i.e., the envelope transposition block 405), and the transposition can be represented by formula (4):
U = h[k]^T (4);
Since the audio data and the image data are collected synchronously with time as the reference dimension, the dimension of the envelope transposition block can be (1, L).
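As an illustration, the cepstrum analysis of formulas (1) to (4) might be sketched as follows, assuming NumPy; the cutoff n_low that separates the low-frequency envelope h[k] from the spectrum details is an illustrative assumption:

    import numpy as np

    def envelope_transposition_block(audio, n_low=32):
        X = np.fft.fft(audio)                  # time domain -> X[k] = H[k]E[k]
        log_mag = np.log(np.abs(X) + 1e-8)     # log||X[k]|| = log||H[k]|| + log||E[k]||
        cepstrum = np.fft.ifft(log_mag).real   # x[k] = h[k] + e[k]
        h = cepstrum[:n_low]                   # low-frequency part h[k]: the envelope
        return h.reshape(1, -1)                # U = h[k]^T, dimension (1, L)

    audio = np.random.randn(16000)             # a hypothetical synchronized audio clip
    U = envelope_transposition_block(audio)
    print(U.shape)                             # (1, 32)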
Step S310: updating the convolution kernel of the three-dimensional convolutional neural network by using the audio features of each piece of audio-video data to obtain an updated three-dimensional convolutional neural network;
Step S312: performing a convolution operation on the video features of the corresponding audio-video data by using the updated three-dimensional convolutional neural network to obtain the corresponding target features;
Fig. 5 is a schematic diagram illustrating the fusion of audio features and video features according to an embodiment of the present application. Referring to fig. 5, after feature extraction by the three-dimensional convolutional neural network (3D convolution) and the envelope transposition block, the convolution activation formula of the three-dimensional convolutional neural network is shown in formula (5):
y = Wx + b (5);
where y is the activation value, W is the convolution kernel parameter learned in training by the three-dimensional convolutional neural network, x is the value being activated, and b is the bias term. To enable the fusion of the video data with the envelope transposition block data (the audio data), this embodiment of the application proposes multiplying the envelope parameter with the convolution kernel of the three-dimensional convolutional neural network, as shown in formula (6):
W = W_f x_f (6);
where W_f is the value recorded by the envelope transposition block for the corresponding time period, and x_f is the convolution kernel parameter. Combining formula (5) and formula (6) gives formula (7):
y = W_f x_f x + b (7);
The physical meaning expressed by formula (7) is: first, the parameter W_f of the time slice (or time period) corresponding to the sound updates the convolution kernel x_f, and then the convolution kernel W_f x_f performs a convolution operation on the video slice to obtain the target features. In other words, the convolution kernel of the three-dimensional convolutional neural network is updated with the audio features 501 of each piece of audio-video data to obtain an updated three-dimensional convolutional neural network, and the updated three-dimensional convolutional neural network performs a convolution operation on the video features 502 of the corresponding audio-video data to obtain the corresponding target features 503.
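As an illustration, the fusion of formulas (5) to (7) might be sketched as follows, assuming PyTorch; treating W_f as a single scalar per time slice is an interpretive assumption:

    import torch
    import torch.nn.functional as F

    def fused_conv3d(video_slice, x_f, b, w_f):
        """video_slice: (1, C, d, H, W); x_f: kernel (out, C, kd, kh, kw); w_f: scalar."""
        W = w_f * x_f                                 # W = W_f x_f  (formula (6))
        return F.conv3d(video_slice, W, b, padding=1)  # y = W_f x_f x + b (formula (7))

    video_slice = torch.randn(1, 3, 15, 112, 112)  # video features of one time slice
    x_f = torch.randn(64, 3, 3, 3, 3)              # learned convolution kernel
    b = torch.zeros(64)                            # bias term
    w_f = torch.tensor(0.8)                        # envelope value for this time slice
    target_feature = fused_conv3d(video_slice, x_f, b, w_f)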
In this way, audio and video are organically combined to complete the extraction of multiple kinds of features. The achievable feature-extraction items are: extracting image features using a 2D convolutional neural network; extracting behavior features (video features) using a 3D convolutional neural network, and combining behavior features with behavior features through the network architecture; extracting combinations of sound features using the 3D convolution sequence, namely the envelope transposition block; or activating the convolution kernel with the audio features and performing the convolution operation on the video features with the activated kernel, thereby extracting behavior features and sound features together. Finally, two groups of behavior features and sound features can be extracted simultaneously.
Step S314: inputting the target features of each piece of audio-video data among the plurality of pieces into the multi-instance learning model so that the multi-instance learning model outputs the probability of the corresponding audio-video data;
Step S316: determining probability distribution information of the plurality of pieces of audio-video data according to the probability of each piece of audio-video data among the plurality of pieces;
As the collected video data and audio data keep growing, the probabilities inferred by the multi-instance learning network gradually become distinguishable. The probability distribution information of the plurality of pieces of audio-video data can be determined according to the probability corresponding to each piece of audio-video data, and it characterizes the distribution law of the probabilities corresponding to the audio-video data.
Step S318: determining the classification result of each piece of audio-video data according to the probability distribution information;
The classification result can be abnormal audio-video data or normal audio-video data: if a piece of audio-video data is abnormal audio-video data, the behavior and/or sound of the pig is abnormal; if it is normal audio-video data, neither the behavior nor the sound of the pig is abnormal.
In general, the number of abnormal pigs is far smaller than the number of normal pigs, so the audio-video data whose similar probabilities are concentrated in the normal distribution can be determined to be normal audio-video data, i.e., the audio-video data corresponding to normal pigs, while the audio-video data whose probabilities lie scattered in the normal distribution can be determined to be abnormal audio-video data, i.e., the audio-video data corresponding to abnormal pigs; the probability values of the various abnormal situations are offset from the probability values of the non-abnormal situations.
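As an illustration, this concentrated-versus-scattered split might be sketched as follows with NumPy; the 2-sigma deviation threshold is an assumption for illustration, not something the patent specifies:

    import numpy as np

    def split_by_probability_distribution(probs, k=2.0):
        probs = np.asarray(probs)
        mu, sigma = probs.mean(), probs.std()
        # concentrated near the center of the normal distribution -> normal pigs;
        # scattered away from it -> abnormal pigs
        is_abnormal = np.abs(probs - mu) > k * sigma
        return is_abnormal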
Step S320: updating the multi-instance learning model according to the classification result of each piece of audio-video data;
Fig. 6 is a schematic diagram of multi-instance learning (MIL) network training and inference in an embodiment of the present application. It differs most from the related art in two points: first, the training must proceed in an unsupervised learning manner; second, the output of the multi-instance learning network is a single probability value rather than multi-class label probabilities. How the unsupervised learning proceeds, how the unsupervised online iterative growth proceeds, and how the probability value strengthens the abnormality detection accuracy are detailed below.
Referring to fig. 6, the default pre-training weight file of the three-dimensional convolutional neural network may be used. The target features obtained by feature extraction and feature fusion of the video data and audio data are first reduced in dimension through a fully connected layer of size 4096, then pass in the same manner through fully connected layers of 1024 and 256 dimensions, and finally a scalar probability value is output. If supervised learning were available, the true categories could be compared with the predicted values, and the weights of each layer's convolution kernels updated through back-propagation. Hence, provided the video features and audio features have been effectively extracted by the three-dimensional convolutional neural network, after the default 3-layer fully connected stack the predicted probability values follow a normal distribution: the probability values generated by the large amount of non-abnormal data are similar and concentrated, so the non-abnormal audio-video data can be distributed in region 601, while the probability values of the various abnormal audio-video data deviate from the non-abnormal probability values and can, for example, be distributed in region 602; without learning and training, the deviation outside region 601 is small, and the probability distributions may even overlap and cross.
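As an illustration, the 3-layer fully connected head described above might be sketched as follows, assuming PyTorch; the input dimension and the sigmoid squashing to a scalar probability are illustrative assumptions consistent with the text:

    import torch.nn as nn

    class MILHead(nn.Module):
        """Reduces the fused C3D target feature to a scalar probability value."""
        def __init__(self, in_dim=8192):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 4096), nn.ReLU(),   # first reduction to 4096
                nn.Linear(4096, 1024), nn.ReLU(),     # then 1024
                nn.Linear(1024, 256), nn.ReLU(),      # then 256
                nn.Linear(256, 1), nn.Sigmoid(),      # scalar probability output
            )

        def forward(self, x):
            return self.net(x)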
It should be noted that a unified abnormal label can be added to the abnormal audio-video data, and a unified normal label to the normal audio-video data; further, different abnormal labels can be added to different abnormal sample data.
In the initial training stage, the multi-instance learning network has no learning ability and can only detect extreme anomalies. As the data volume grows, two sets of probability values, for extreme anomalies and non-extreme anomalies, gradually emerge in the distribution; once it is verified that the extreme anomalies really exist, the initial training task is complete.
By manually labeling the extreme anomalies for the multi-instance learning network, the network can make its two large sets (abnormal audio-video data and normal audio-video data) cohere internally and separate from each other. The network is manually pointed to the extreme anomalies and their existence is confirmed; then the differences between the probability values of the abnormal audio-video data are computed and fed back to the neural network (the multi-instance learning network), so that forward propagation produces smaller probability-value differences among the extreme anomalies, which is defined as cohesion. Similarly, the probability-value differences between abnormal and non-abnormal audio-video data are computed and made as large as possible, which is defined as separation. When the learning and training of extreme and non-extreme anomalies reaches a plateau, i.e., the loss value can no longer be reduced by iteration, it means that at least some identical extreme anomalies are feed-forward inferred to the same probability values, while other anomalies are distributed around them, in the vicinity of, and in the intermediate zone between, the extreme and the normal. At this point, the data whose probability values occur most often is manually selected for verification; if the verified anomaly types are similar, the weights are fixed as the pre-training weights for the next round of data training, and the middle-stage task is complete.
In the later training stage of the multi-instance learning network, different anomalies are distinguished from one another, including anomalies with similar behaviors but different sounds. In the same way as in the early and middle stages of learning and training, a certain class is manually determined through posterior labeling, and different abnormal labels are added to anomalies of different classes, e.g., different labels for the abnormal audio-video data in region 603 and in region 604. The feedback then adjusts the network again so that the output probability values separate: anomalies of the same class propagate forward to smaller probability-value differences and, similarly, the probability-value differences between anomalies of different classes are computed and made as large as possible, achieving the purpose of predicting pig anomalies.
Step S322: detecting abnormal behaviors of the pigs by using the trained multi-instance learning model.
In the embodiments of the present application, the convolution kernel of the three-dimensional convolutional neural network is updated with the audio features, and the updated three-dimensional convolutional neural network then performs the convolution operation on the video features, so the fusion of the audio features and the video features, and thus the target features, can be obtained more accurately. In addition, the embodiments of the present application adopt online learning: the abnormal and normal audio-video data only need to be labeled after the multi-instance learning network has detected an anomaly, avoiding the complexity of classification-detection algorithms that must label a large amount of abnormal data in advance. Online learning adapts more easily to the early stage when abnormal data is rare, improves the detection rate of abnormal audio-video data, and saves a large amount of labor cost and time cost.
The embodiment of the application also provides a pig abnormality detection method, which comprises the following steps:
step S402: collecting a plurality of audio and video data of pigs;
step S404: extracting the characteristics of each audio and video data to obtain corresponding target characteristics;
step S406: inputting the target characteristics of each audio and video data in the audio and video data into a multi-example learning model so that the multi-example learning model can output the probability of the corresponding audio and video data;
step S408: determining probability distribution information of the plurality of audio and video data according to the probability of each audio and video data in each plurality of audio and video data;
wherein the probability distribution information includes a normal distribution.
Step S410: determining the classification result of the audio and video data corresponding to the probability of the concentrated distribution in the normal distribution as normal audio and video data;
step S412: determining the classification result of other audio and video data except the normal audio and video data as abnormal audio and video data;
step S414: performing data enhancement on the abnormal audio and video data to obtain enhanced abnormal audio and video data; the data enhancement mode at least comprises the following steps: conventional enhancement, frame interpolation enhancement and reverse amplification enhancement;
fig. 7 is a schematic diagram of a data enhancement mode according to an embodiment of the present application, and referring to fig. 7, the data enhancement mode may include a conventional enhancement 701, an interpolation enhancement 702, and an inverse-playing enhancement 703.
Wherein the conventional enhancement can be performed by MI norm The conventional enhancement means commonly used for enhancing video stream data, and the principle of the conventional enhancement means comes from the field of image processing, such as image random cropping, distance-limited cropping, multi-graph fusion, color space transformation, salt-and-pepper noise, gaussian noise, rotation and mirror image.
Framing enhancement may be performed in MI iF It means that an exception must be generated by a normal transition, whereas a normal generation must be generated by an exception transition. An abnormal segment generally has a normal phase, a transition early phase, an abnormal continuous phase, a transition late phase and a normal phase. The conversion prediction of an abnormal segment from an abnormal stage to a normal stage is necessary, and the conversion prediction of an abnormal segment from a normal stage to an abnormal stage is uncertain. Therefore, the later transition stage of an abnormal behavior is more important than the earlier transition stage, the embodiment of the application can use the frame number (later transition frame) of the later transition stage for data enhancement, and under the condition that the sampling frequency of a camera is 15 frames, every 15 frames in video data in a multi-example learning network can be set as a video clip, and the breakpoint of the later transition frame used in the embodiment of the application can be the last 5 frames; the determination of the video segment can be represented by equation (8):
U MI =U T //15 (8);
wherein, U T Representing video data, U MI Representing a video segment (also referred to as a behavioral segment),// representing a rounding down division.
The later stage of the transition may be decomposed every 5 frames to obtain a stage decomposition set of the later stage of the transition, where the decomposition process of the later stage of the transition may be as shown in equation (9):
U time =U 0~4 +U 5~9 +U 10~14 ,x∈U time (9);
wherein, referring to formula (9), x can be expressed as a phase component of the late transition phaseEach video frame in the solution set; the last 5 frames U of the late stage of the transition can be taken 10~14 As late transition frame segments that need to be inserted into the video segment.
The frame-insertion formula can be expressed by formula (10):
MI_iF = U_MI + U_{10~14} (10);
where MI_iF represents the frame-inserted video segment (i.e., the video segment after the late-transition frames are inserted), U_MI represents a video segment, and U_{10~14} represents the late-transition frame segment.
The specific data enhancement modes include the following. Mode one: inserting the late-transition frames of the abnormal segment into each video segment. Mode two: inserting the late-transition frames of the abnormal segment before each video segment except the first. Mode three: inserting the late-transition frames of the abnormal segment into the video segments at segment intervals. Mode four: inserting the late-transition frames of the abnormal segment into each video segment at frame intervals.
Reverse-playback enhancement, denoted MI_back, follows principles similar to image mirroring and image rotation: mirroring, for example, cannot occur in reality, nor can walking upside down, but data can still be augmented this way to improve the robustness of the model. Even if a video time slice containing abnormal behavior is played backwards, the abnormal behavior can still be seen, so adding reverse playback of abnormal behavior segments can enhance the generalization of model training. Assuming the reverse of a normal behavior Act is -Act, the reverse of an abnormal behavior Abn is -Abn. The modes of reverse-playback enhancement are as follows. Mode one: reversing the whole time slice of the abnormal event (the abnormal segment). Mode two: reversing the continuous abnormal stage. Mode three: reversing the late transition stage. The reverse-playback formula can be expressed by formula (11):
MI_back = -U_MI (11);
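As an illustration, the frame-insertion and reverse-playback enhancements of formulas (8) to (11) might be sketched as follows with NumPy; representing the video as a frame array, taking the late-transition frames from the final segment, and showing mode one of each enhancement are illustrative choices:

    import numpy as np

    def frame_insertion(video, seg_len=15, tail=5):
        """Mode one of MI_iF: append the late-transition frames to every segment."""
        n_seg = len(video) // seg_len                    # U_MI = U_T // 15
        segments = [video[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]
        late = segments[-1][-tail:]   # U_{10~14}; assumes the late transition
                                      # stage falls in the final segment
        return [np.concatenate([seg, late]) for seg in segments]

    def reverse_playback(segment):
        """Mode one of MI_back: play the whole abnormal segment backwards."""
        return segment[::-1]                             # MI_back = -U_MI

    video = np.random.randn(150, 112, 112, 3)            # a hypothetical 150-frame clip
    augmented = frame_insertion(video)
    reversed_clip = reverse_playback(video)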
step S416: updating the multi-example learning model by utilizing each enhanced abnormal audio and video data and each normal audio and video data;
step S418: and detecting abnormal behaviors of the pigs by using the trained multi-example learning model.
The embodiment of the application also provides a pig abnormality detection method, which comprises the following steps:
step S502: collecting a plurality of audio and video data of pigs;
step S504: extracting the characteristics of each audio and video data to obtain corresponding target characteristics;
step S506: inputting the target characteristics of each audio and video data in the audio and video data into a multi-example learning model so that the multi-example learning model can output the probability of the corresponding audio and video data;
step S508: determining probability distribution information of the plurality of audio and video data according to the probability of each audio and video data in each plurality of audio and video data;
wherein the probability distribution information includes a normal distribution,
step S510: determining the classification result of the audio and video data corresponding to the probability of the concentrated distribution in the normal distribution as normal audio and video data;
step S512: determining the classification result of other audio and video data except the normal audio and video data as abnormal audio and video data;
step S514: determining a loss function of the multi-example learning model by using the prior probability and the posterior probability of each abnormal audio and video data and the prior probability and the posterior probability of each normal audio and video data;
the probability value distribution is stable, each type of abnormality can be marked through an artificial posterior mode, the audio and video data are expanded through a data enhancement mode, the multi-example learning network can learn the expanded audio and video data to enable the audio and video data of the same type to tend to a certain fixed probability value, the probability value can be normalized according to the embodiment of the application, the probability value is in the range of (0,1), 0.5 can be used as a boundary line for judging whether the audio and video data are abnormal or not, the probability value of the audio and video data tends to 1 to indicate that the audio and video data are more abnormal, and the probability value of the audio and video data tends to 0 to indicate that the audio and video data are more normal. And the minimum probability value in the probability values of the abnormal audio and video data is higher than the maximum probability value in the probability values of the normal sample data.
The prior probability of the abnormal audio and video data is the probability of the corresponding audio and video data output by the multi-example learning network, and the posterior probability of the abnormal audio and video data can be calculated by a Bayesian formula by using the prior probability and a likelihood function of the abnormal audio and video data; similarly, the prior probability of the normal audio and video data is the probability of the corresponding audio and video data output by the multi-instance learning network, and the posterior probability of the normal audio and video data can be calculated by using the prior probability and the likelihood function of the normal audio and video data through a Bayesian formula.
The value range of the prior probability value f(V_a) corresponding to abnormal audio-video data can be expressed by formula (12), the value range of the prior probability value f(V_n) of normal audio-video data by formula (13), and the magnitude relation between the two by formula (14):
0.5 ≤ f(V_a) < 1 (12);
0 < f(V_n) < 0.5 (13);
min_i f(V_a^i) > max_i f(V_n^i) (14);
where min_i f(V_a^i) represents the minimum among the probability values corresponding to the i pieces of abnormal audio-video data, and max_i f(V_n^i) represents the maximum among the probability values corresponding to the i pieces of normal audio-video data.
Let F(a) be the posterior determination of the current probability value of an abnormal segment (the posterior probability of the abnormal audio-video data), and F(n) the posterior determination of the current probability value of a normal segment (the posterior probability of the normal audio-video data); iterating continuously until the loss is minimal, the loss function of the multi-instance learning network can be represented by formula (15):
l(a, n) = |F(a) - f(V_a)| + |F(n) - f(V_n)| (15);
where |F(a) - f(V_a)| represents the absolute value of the difference between F(a) and f(V_a), and |F(n) - f(V_n)| represents the absolute value of the difference between F(n) and f(V_n).
To avoid over-training of the multi-instance learning network, a smoothing constraint and a sparsity constraint can be added to the loss function; the smoothing constraint can be represented by formula (16) and the sparsity constraint by formula (17):
l_1 = Σ_i (f(V_a^i) - f(V_a^{i+1}))^2 (16);
l_2 = Σ_i f(V_a^i) (17);
where γ_1 and γ_2 are hyper-parameters serving as penalty coefficients that express the required degree of smoothness and sparsity of the model (the multi-instance learning network), and l_1 and l_2 are the regular penalty terms.
The final loss function can be expressed by formula (18):
LOSS = l(a, n) + γ_1 l_1 + γ_2 l_2 + γ_3 ||W|| (18);
where γ_3 is a regular penalty coefficient, W is the parameter that needs to be learned through training, and ||W|| represents the norm of W. Through long-term posterior labeling, the multi-instance learning network can, via online learning, gradually improve the anomaly types it can infer, its detection accuracy, and its recall rate.
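As an illustration, the complete loss of formula (18), together with the terms of formulas (15) to (17), might be sketched as follows, assuming PyTorch; the hyper-parameter values are illustrative, not from the patent:

    import torch

    def mil_loss(F_a, f_Va, F_n, f_Vn, W, g1=1e-4, g2=1e-4, g3=0.01):
        """F_a, F_n: posterior probabilities; f_Va, f_Vn: predicted (prior) probabilities."""
        l_an = (F_a - f_Va).abs().mean() + (F_n - f_Vn).abs().mean()   # formula (15)
        l_1 = ((f_Va[1:] - f_Va[:-1]) ** 2).sum()  # smoothing constraint, formula (16)
        l_2 = f_Va.sum()                           # sparsity constraint, formula (17)
        return l_an + g1 * l_1 + g2 * l_2 + g3 * W.norm()  # formula (18)

    f_Va = torch.rand(8) * 0.5 + 0.5    # abnormal-segment probabilities in [0.5, 1)
    f_Vn = torch.rand(8) * 0.5          # normal-segment probabilities in (0, 0.5)
    F_a, F_n = torch.ones(8), torch.zeros(8)
    W = torch.randn(256)                # stand-in for the learned parameters
    loss = mil_loss(F_a, f_Va, F_n, f_Vn, W)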
Step S516: adjusting parameter weights of the multi-instance learning model using the loss function to update the multi-instance learning model;
step S518: and detecting abnormal behaviors of the pigs by using the trained multi-example learning model.
In the embodiments of the present application, data enhancement can solve the problem that negative sample data (i.e., abnormal sample data) is scarce among the sample data; the balance of positive and negative samples is improved, which further improves the detection rate of abnormal audio-video data. In addition, the parameter weights of the multi-instance learning network are adjusted using its loss function to obtain the updated multi-instance learning network, which can further improve the network's detection rate of abnormal audio-video data.
Deep learning technology is a branch of artificial intelligence and one way of realizing it, i.e., deep learning serves as a means of solving problems in artificial intelligence. Over more than 30 years it has developed into an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other subjects. Deep learning theory mainly designs and analyzes algorithms that allow computers to "learn" automatically; a deep learning algorithm automatically analyzes data to obtain rules and uses those rules to predict unknown data. Because learning algorithms involve a large number of statistical theories, deep learning is particularly closely related to inferential statistics and is also called statistical learning theory. In algorithm design, deep learning theory focuses on what is achievable and on realizing effective learning algorithms. Many inference problems are computationally intractable, so part of deep learning research develops tractable approximation algorithms.
Video processing technology generally refers to the various techniques for capturing, recording, processing, storing, transmitting, and reproducing a series of still images as electrical signals. In the embodiments of the present application, OpenCV (Open Source Computer Vision Library) is used to process video. OpenCV is a cross-platform computer vision and machine learning software library released under the BSD (Berkeley Software Distribution) license (open source), and it can run on the Linux, Windows, Android, and Mac OS operating systems. It is lightweight and efficient, consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby, and MATLAB, and implements many general algorithms in image processing and computer vision.
Anomaly detection technology identifies abnormal patterns that do not conform to expected behavior; it is also known as outlier detection and is mainly used to identify items, events, or observations that do not match an expected pattern or the other items in a data set. Abnormal items often translate into problems such as bank fraud, structural defects, medical problems, and text errors. Anomalies are also called outliers, novelties, noise, deviations, or exceptions.
In the related technology, the main modes for pig abnormality detection and early warning are as follows:
the first method comprises the following steps: audio detection method: utilize the standard signal of gathering the unusual sound of live pig, then right the unusual sound standard signal of live pig carries out extracting of characteristic parameter, according to characteristic parameter establishes the sound identification model, and the sound that the final real-time live pig of monitoring sent to judge through the sound identification model and discern whether the sound that the live pig sent is unusual sound (like pig cough sound, pig bite sound), and then can in time know whether the live pig takes place unusual action, so that timely taking corresponding measure guarantees the normal growth of swinery.
The second is the image detection method: environment information is calibrated and the backs of fattening pigs are marked; the fattening pigs are then detected as targets using computer vision technology, and an SVM (Support Vector Machine) learns the relationship between the pigs' environment information and their behavior from the detected behavior and environment information. This method mainly addresses the problem that traditional fattening-pig detection only tracks the pigs' behavior and ignores the influence of the pig-house environment on that behavior, thereby accurately monitoring the growth of fattening pigs and raising alarms on abnormal behavior of target pigs.
The first method has the following defects for pig abnormality detection:
First, the detectable pig abnormalities are limited: many abnormal behaviors of pigs produce no audio data that can be detected acoustically, and the audio manifestations of many different abnormal, stress, and pain states of pigs are very similar.
Second, audio data collection is difficult: pigs differ greatly from humans in both auditory resolution and sound production, so recording equipment generally suited to humans cannot be applied to pigs. For example, the human auditory frequency range is 20 to 20000 Hz (hertz), while that of pigs is 42 to 40500 Hz.
Third, data processing and algorithmic recognition are poor: since sound data is serialized data that is difficult to inspect and process manually, and since multiple sounds fuse together and are affected by noise, separating standard labeled audio is very difficult, which degrades the recognition accuracy of the algorithm.
The second method has the following defects for pig abnormality detection:
First, the data features are single: image data contains only two-dimensional image information. From the standpoint of data definition, abnormal pig behavior is usually dynamic and necessarily accompanied by a time slice, so it cannot be defined by a single image feature alone.
Second, algorithmic recognition is difficult: it is hard to define an anomaly at the root of the data. In algorithmic recognition, detection-based techniques can detect the pigs' position information, yet anomalies remain difficult to detect. In particular, data on abnormal pigs is hard to collect, and detection techniques rely on the premise of a large number of balanced positive and negative samples, so effective training is difficult to obtain.
Third, offline training is highly limited: a large amount of balancing and standardization must be performed on the data in advance, and if new anomalies appear, a large amount of new anomaly data must be labeled and the model retrained, so the method can hardly adapt universally to anomalies.
The technical problems to be solved by the embodiments of the present application include the following:
First, abnormal data itself occurs rarely, which leaves the positive and negative samples unbalanced, so algorithm training cannot proceed.
Second, data enhancement of abnormal video data depends strongly on image enhancement but is difficult along the time sequence; that is, only single images are processed by changing color, channels, rotation, and so on, and the relationships between images are not involved.
Third, because negative samples are few, they are reused readily during training; prolonging the training causes overfitting, while stopping the training fails to satisfy model training iteration.
Fourth, it is difficult to feed video and audio data into an algorithm in an integrated way, yet the ways pigs express anomalies are limited to these two observable phenomena, so a new algorithm capable of processing video and audio simultaneously is needed.
Fifth, because abnormal data is difficult to collect, the need for online learning in the algorithm is all the more urgent: an offline training approach can hardly cover the almost infinite variety of possible abnormal scenes, and the algorithm needs to adapt automatically to various abnormal scenes without a large amount of data labeling.
The embodiments of the present application provide a pig abnormality detection and early-warning system and method based on a three-dimensional convolutional neural network (C3D, 3-Dimensional Convolutional Neural Networks) and Multiple Instance Learning (MIL). Fig. 8 is a schematic flow diagram of pig abnormality detection and early warning in an embodiment of the present application; referring to fig. 8, the method includes the following steps:
step 801: collecting data;
Through data collection, the raw data for training and prediction is obtained, and some simple data processing is additionally performed, such as cropping, rotation, and scaling, to fit the input dimensions of the subsequent neural network; the raw data includes audio data and corresponding video data.
Referring to fig. 2, in order to better analyze and detect the type of anomaly, features need to be extracted both between behaviors and between sounds. The embodiments of the present application use the fused video and audio data as the model input, which not only compensates for the difficulty of detecting anomalies from single video-data features, but also compensates for the difficulty of confirming anomaly detection from audio data.
Step 802: C3D audio and video fusion feature extraction;
the C3D audio and video fusion feature extraction can be used for extracting features of the collected audio data and video data, so that key features are reserved, and the features can be conveniently used for carrying out abnormity detection and classification by subsequent network reasoning combination.
It should be noted that step 802 may include the following steps 8021 to 8023:
step 8021: C3D feature extraction;
referring to fig. 3, in the embodiment of the present application, a three-dimensional convolutional neural network 303 may be used to perform feature extraction on video data (video stream) of each piece of audio/video data, so as to obtain video features corresponding to the audio/video data.
Step 8022: converting the audio map into an envelope transposition block;
Referring to fig. 4, cepstrum analysis can be performed on the audio data. First, the time-domain information of the collected audio (e.g., diagram 401) is converted into frequency-domain information using the Fourier transform, yielding the audio spectrum X[k] in the audio spectrogram 402; then logarithms are taken on both sides of the audio spectrum X[k], and the inverse Fourier transform is applied to obtain the cepstrum x[k]. The cepstrum x[k] contains the low-frequency part h[k], which is also the core feature of the audio data; this embodiment names h[k] the envelope function diagram 403. The cepstrum x[k] also contains the spectrum detail diagram 404. The envelope function diagram 403 is converted into one-dimensional data U, i.e., the envelope transposition block 405.
Step 8023: fusing the 3D features with the envelope features;
Referring to fig. 5, the parameters of a convolution kernel are updated from the parameters of the corresponding time slice of the sound (audio feature 501), and the resulting kernel is then used to perform a convolution operation on the video slice (video feature 502), so as to obtain the target feature 503.
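A minimal sketch of this fusion step follows, assuming the kernel weights are generated from the audio feature by a small linear layer; the mapping from audio parameters to kernel parameters is not detailed here, so to_kernel is a hypothetical stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioConditionedFusion(nn.Module):
    """Generate a 3D conv kernel from the audio feature of a time
    slice, then convolve it over the video feature volume."""
    def __init__(self, audio_dim=64, ch=128, k=3):
        super().__init__()
        self.ch, self.k = ch, k
        # Hypothetical kernel generator: audio feature -> conv weights.
        self.to_kernel = nn.Linear(audio_dim, ch * ch * k * k * k)

    def forward(self, video_feat, audio_feat):
        # video_feat: (1, ch, T, H, W); audio_feat: (audio_dim,)
        weight = self.to_kernel(audio_feat).view(self.ch, self.ch,
                                                 self.k, self.k, self.k)
        return F.conv3d(video_feat, weight, padding=self.k // 2)
```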
Step 803: MIL training and inference;
Referring to fig. 6, MIL training and inference uses the fused target features to perform fitting and splitting through the MIL network, finally mapping them into a low-dimensional space so that abnormal and non-abnormal data become linearly separable there. The low-dimensional spatial distribution of the input video packet set is thus obtained through MIL.
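The MIL network architecture is not specified here, so the following is only a minimal sketch in PyTorch, assuming a small MLP that scores each fused instance feature and scores a bag (video packet) by its highest-scoring instance; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MILScorer(nn.Module):
    """Map fused target features to per-instance anomaly probabilities;
    a bag (video packet) is scored by its highest-scoring instance."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, bag):              # bag: (n_instances, feat_dim)
        scores = self.mlp(bag).squeeze(-1)
        return scores, scores.max()      # per-instance and bag-level score
```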
It should be noted that the learning process of the whole multi-example learning network still faces two difficulties. The first is that even if online learning solves the problem that prior labeled training data are too scarce to acquire, it cannot change the fact that abnormal data always form a small sample. With positive and negative samples unbalanced, the model (the multi-example learning network) may regard all samples as positive to boost accuracy: for example, if an abnormal event occurs only once in ten thousand events and no processing is applied, the model will learn that all events are normal and still meet 99.99% accuracy; yet if positive-negative balance were enforced, only two samples (one positive and one negative) would remain, which obviously cannot support effective training.
The second is how to express the loss measurement mathematically throughout the online learning and training process, so that driving the output probability values toward intra-class cohesion and inter-class separation can be realized programmatically. In the related technology, cross entropy is used as the loss measurement to describe the difference between the true label and the predicted label, and the weights of each layer are updated by backpropagating the gradient of that difference.
For the first problem, the embodiment of the present application may use data enhancement to mitigate the scarcity of negative sample data (i.e., abnormal sample data) in the sample set. For the second problem, the embodiment of the present application may adopt a newly constructed loss function, LOSS_MIL, to train the model, thereby solving the problem of how to learn iteratively.
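The exact form of LOSS_MIL is not spelled out at this point in the text, so the following is only a minimal sketch, assuming a hinge-style MIL ranking loss of the kind commonly used in multiple-instance anomaly detection; the margin value is illustrative.

```python
import torch

def mil_ranking_loss(abn_scores: torch.Tensor,
                     nrm_scores: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Hinge-style MIL ranking loss: the highest score in an abnormal
    bag should exceed the highest score in a normal bag by a margin.
    A stand-in formulation, not the patent's exact LOSS_MIL."""
    return torch.clamp(margin - abn_scores.max() + nrm_scores.max(), min=0.0)
```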
Step 804: obtaining a classification result.
Since most of the data distribution should belong to non-abnormal data and only a minimal part belongs to abnormal data, the largest portion of the classification result can be directly treated as non-abnormal data and automatically given non-abnormal labels, while the remaining portions (the data other than the non-abnormal data) are marked with abnormal labels. The various data under the abnormal label are preliminarily clustered, so only the anomaly name of each abnormal category needs to be annotated. Through data enhancement and expansion, abnormal data can be continuously learned online: according to the defined loss function, the distance between abnormal and non-abnormal data gradually increases, as does the distance between the various anomalies within the abnormal data, finally achieving the purpose and effect of pig abnormality detection and early warning.
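As a minimal sketch of this posterior labeling rule, the following treats the "concentrated" part of the distribution as the region within a few standard deviations of the median; the threshold is an assumption, since the patent does not define it.

```python
import numpy as np

def auto_label(probs: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Label samples whose predicted probability lies inside the dense
    central mass as non-abnormal (0), the outliers as abnormal (1).
    Using median +/- k*std as the concentrated region is an assumption."""
    center, spread = np.median(probs), probs.std()
    return (np.abs(probs - center) > k * spread).astype(int)
```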
The embodiment of the application also provides a pig abnormality detection method, which comprises the following steps:
Step S601: monitoring the state of the pigs in the pigsty by using a multi-example learning network;
wherein, video data and audio data of the pig can be collected simultaneously.
Step S602: performing probability value inference on the collected video data and audio data through a preset online learning platform.
A multi-example learning network can be deployed on the online learning platform. Feature extraction is first performed separately on the video data and the corresponding audio data to obtain video features and corresponding audio features, which are then fused to obtain the target features; the target features may be input into the multi-example learning network, which infers a probability value for them and outputs the inferred value. As the collected video and audio data continue to expand, the probability distribution inferred through the online learning platform gradually becomes distinguishable.
Step S603: based on the prior knowledge that normal cases far outnumber abnormal ones, abnormal labels are assigned manually a posteriori.
It should be noted that, because the number of normal pigs is generally far greater than the number of abnormal pigs, the data corresponding to the concentrated part of the probability distribution may be determined to be normal data, the data other than the normal data may be determined to be abnormal data, and then abnormal labels are attached to the abnormal data and normal labels to the normal data.
Step S604: performing data enhancement on the tag data;
Data enhancement can be performed on the labeled data, and the enhancement modes may include conventional enhancement, frame interpolation enhancement, and reverse amplification enhancement.
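Minimal sketches of the three enhancement modes follow. Interpreting "frame interpolation enhancement" as blending adjacent frames and "reverse amplification enhancement" as adding a time-reversed copy are assumptions, since the patent does not define these modes at this point.

```python
import torch

def flip_clip(clip: torch.Tensor) -> torch.Tensor:
    """Conventional enhancement: horizontal flip of a (C, T, H, W) clip."""
    return torch.flip(clip, dims=[3])

def interpolate_frames(clip: torch.Tensor) -> torch.Tensor:
    """Frame-interpolation enhancement (assumed meaning): insert blended
    midpoint frames between neighbours, nearly doubling temporal length."""
    mids = 0.5 * (clip[:, :-1] + clip[:, 1:])
    out = torch.stack([clip[:, :-1], mids], dim=2)     # interleave frames
    return torch.cat([out.flatten(1, 2), clip[:, -1:]], dim=1)

def reverse_clip(clip: torch.Tensor) -> torch.Tensor:
    """Reverse-amplification enhancement (assumed meaning): add a
    time-reversed copy of the clip to enlarge the abnormal set."""
    return torch.flip(clip, dims=[1])
```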
Step S605: learning with the probability values of the labeled data as the starting point, to realize intra-class cohesion and inter-class separation;
The labeled data are the data to which a normal label or an abnormal label has been added. The multi-example learning network can be adjusted through feedback from the data-enhanced labeled data: a loss function of the multi-example learning network is determined according to the prior probability and the posterior probability of the labeled data, and the parameter weights of the network are adjusted by means of this loss function to obtain an updated multi-example learning network. The updated network can then output probability values that achieve cohesion with outlier separation, i.e., extreme abnormal points propagate forward to smaller probability values; likewise, the difference between abnormal and non-abnormal probability values is computed so as to produce as large a probability-value gap as possible.
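A minimal sketch of one such feedback adjustment step follows, reusing the MILScorer sketch above. Combining a ranking term with a mean-gap separation term is an assumption, since the loss is described here only in terms of prior and posterior probabilities.

```python
import torch

def feedback_step(model, optimizer, abn_bag, nrm_bag, margin=1.0):
    """One online-learning update: score both bags, build a loss that
    pushes abnormal and normal score distributions apart, and update
    the network weights by backpropagation. The exact loss is assumed."""
    abn_scores, abn_top = model(abn_bag)
    nrm_scores, nrm_top = model(nrm_bag)
    rank = torch.clamp(margin - abn_top + nrm_top, min=0.0)
    separation = -(abn_scores.mean() - nrm_scores.mean())  # widen the gap
    loss = rank + 0.1 * separation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```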
Step S606: iterating until stable, so as to obtain discriminative detection of abnormal versus non-abnormal cases;
Step S607: manually verifying the specific abnormality type of each label category;
It should be noted that step S606 mainly distinguishes abnormal from non-abnormal (i.e., normal) cases, while step S607 mainly distinguishes between different abnormal conditions, for example, abnormalities with similar behaviors versus different behaviors. As with the learning of abnormal and non-abnormal cases, the specific abnormal category may be determined by manual posterior labeling, and the data carrying the specific abnormal-category label are then used again for feedback adjustment, so that the output probability values separate from one another: abnormal points of the same kind propagate forward to smaller probability-value differences, while the probability-value differences between abnormalities of different categories are computed so as to produce as large a gap as possible.
Step S608: repeating steps S604 to S606 to realize the extraction and learning of new abnormalities;
step S609: and (5) completing training to obtain a model with strong robustness and strong generalization so as to detect abnormal behaviors of the pigs.
The method and the device use pre-trained C3D to extract video and audio features, use an online learning mode to overcome the strong dependence of offline training in the related technology on a large amount of prior abnormal labeling data, and train the model using the specific principles above, the data fusion and enhancement modes, the Multiple Instance Learning (MIL) scheme, and the specially constructed loss function.
The embodiment of the application thus provides an online learning method for audio-video fusion anomaly detection, which analyzes and detects the relations between abnormal behaviors, between abnormal behaviors and abnormal sounds, and between abnormal sounds.
In the embodiment of the application, online learning adapts more easily to the early stage when abnormal data are rare; the detection rate is higher, since multi-feature fusion extraction of video and audio strengthens the model's learning ability; the anomaly-partition ability is stronger, since the relations between behaviors and sounds are learned, making anomalies with similar behaviors, or with similar sounds, easier to distinguish; and a large amount of labor and time cost can be saved, since online learning only requires labeling an anomaly after the algorithm has detected it, removing the burden on classification detection algorithms of labeling a large amount of abnormal data.
Based on the foregoing embodiments, the present application provides a pig abnormality detection apparatus, which includes modules that can be implemented by a processor in an electronic device; of course, the implementation can also be realized through a specific logic circuit; in the implementation process, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 9 is a schematic structural composition diagram of a pig abnormality detection device according to an embodiment of the present application; as shown in fig. 9, the device 900 includes an acquisition module 901, an extraction module 902, a training module 903, and a detection module 904, where:
the acquisition module 901 is used for acquiring a plurality of audio and video data of the pigs;
an extraction module 902, configured to perform feature extraction on each piece of audio/video data to obtain a corresponding target feature;
a training module 903, configured to input a target feature of each of the multiple pieces of audio-video data into a multi-example learning model, so as to train the multi-example learning model;
and a detection module 904, configured to detect abnormal behaviors of the pig by using the trained multi-instance learning model.
In an embodiment, each of the audio and video data includes audio data and video data, and the extracting module 902 includes: the first extraction unit is used for extracting the characteristics of the video data in each piece of audio and video data by using a three-dimensional convolution neural network to obtain the video characteristics of the corresponding audio and video data; the second extraction unit is used for performing cepstrum analysis on the audio data in each piece of audio and video data to obtain audio features corresponding to the audio and video data; and the fusion unit is used for fusing the audio features and the video features of each audio and video data to obtain corresponding target features.
In one embodiment, the training module 903 comprises: the input unit is used for inputting the target characteristics of each piece of audio and video data in the plurality of pieces of audio and video data into a multi-example learning model so as to enable the multi-example learning model to output the probability of the corresponding audio and video data; the first determining unit is used for determining probability distribution information of the plurality of audio and video data according to the probability of each piece of audio and video data in the plurality of audio and video data; the second determining unit is used for determining the classification result of each piece of audio and video data according to the probability distribution information; and the updating unit is used for updating the multi-example learning model according to the classification result of each piece of audio and video data.
In one embodiment, the probability distribution information includes a normal distribution, and the second determining unit includes: the first determining subunit is used for determining that the classification result of the audio and video data corresponding to the probability of the concentrated distribution in the normal distribution is normal audio and video data; and the second determining subunit is used for determining that the classification result of other audio and video data except the normal audio and video data is abnormal audio and video data.
In one embodiment, the update unit includes: the enhancer unit is used for performing data enhancement on the abnormal audio and video data to obtain enhanced abnormal audio and video data; the data enhancement mode at least comprises the following steps: conventional enhancement, frame interpolation enhancement and reverse amplification enhancement; and the first updating subunit is used for updating the multi-example learning model by using each enhanced abnormal audio and video data and each normal audio and video data.
In one embodiment, the update unit includes: the third determining subunit is configured to determine a loss function of the multi-example learning model by using the prior probability and the posterior probability of each abnormal audio/video data and the prior probability and the posterior probability of each normal audio/video data; a second updating subunit, configured to adjust the parameter weights of the multi-instance learning model by using the loss function, so as to update the multi-instance learning model.
In one embodiment, the second extraction unit includes: the first transformation subunit is used for carrying out Fourier transformation, logarithm taking and inverse Fourier transformation on the audio data of each piece of audio and video data to obtain an envelope function of the corresponding audio and video data; and the second conversion subunit is used for performing transposition conversion on the envelope function of each piece of audio and video data to obtain the audio characteristics of the corresponding audio and video data.
In one embodiment, the fusion unit includes: the third updating subunit is used for updating the convolution kernel of the three-dimensional convolution neural network by using the audio frequency characteristics of each piece of audio and video data to obtain an updated three-dimensional convolution neural network; and the convolution subunit is used for performing convolution operation on the video characteristics of the corresponding audio and video data by using the updated three-dimensional convolution neural network to obtain the corresponding target characteristics.
It should be noted that, in the embodiment of the present application, if the above-mentioned abnormality detection method for pigs is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or a part contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including a plurality of instructions for enabling an electronic device (which may be a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a video phone, a television, a sensing device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
Correspondingly, an embodiment of the present application provides an electronic device, fig. 10 is a schematic diagram of a hardware entity of the electronic device according to the embodiment of the present application, and as shown in fig. 10, the hardware entity of the electronic device 1000 includes: the pig abnormal detection method comprises a memory 1001 and a processor 1002, wherein the memory 1001 stores a computer program which can run on the processor 1002, and the processor 1002 executes the computer program to realize the steps of the pig abnormal detection method according to the embodiment.
The Memory 1001 is configured to store instructions and applications executable by the processor 1002, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 1002 and modules in the electronic device 1000, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
Correspondingly, the embodiment of the application provides a computer readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the abnormality detection method for the pig provided in the above embodiment.
Here, it should be noted that: the above description of the storage medium and device embodiments, similar to the above description of the method embodiments, has similar advantageous effects as the device embodiments. For technical details not disclosed in the embodiments of the storage medium and method of the present application, reference is made to the description of the embodiments of the apparatus of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or a part contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including a plurality of instructions for enabling a computer device (which may be a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a video phone, a television, a sensing device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments. Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict. The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for detecting abnormality in a pig, comprising:
collecting a plurality of audio and video data of pigs;
extracting the characteristics of each audio and video data to obtain corresponding target characteristics;
inputting the target characteristics of each audio and video data in the plurality of audio and video data into a multi-example learning model so as to train the multi-example learning model;
and detecting abnormal behaviors of the pigs by using the trained multi-example learning model.
2. The method according to claim 1, wherein each of the audio-video data includes audio data and video data, and the performing feature extraction on each of the audio-video data to obtain a corresponding target feature includes:
performing feature extraction on video data in each piece of audio and video data by using a three-dimensional convolutional neural network to obtain video features corresponding to the audio and video data;
performing cepstrum analysis on the audio data in each piece of audio and video data to obtain audio features corresponding to the audio and video data;
and fusing the audio characteristic and the video characteristic of each audio and video data to obtain a corresponding target characteristic.
3. The method according to claim 1, wherein the inputting the target feature of each of the plurality of audio-video data into a multi-example learning model to train the multi-example learning model comprises:
inputting the target characteristics of each piece of audio and video data in the plurality of audio and video data into a multi-example learning model so that the multi-example learning model can output the probability of the corresponding audio and video data;
determining probability distribution information of the plurality of audio and video data according to the probability of each piece of audio and video data in the plurality of audio and video data;
determining a classification result of each piece of audio and video data according to the probability distribution information;
and updating the multi-example learning model according to the classification result of each piece of audio and video data.
4. The method according to claim 3, wherein the probability distribution information includes a normal distribution, and the determining the classification result of each piece of audio-video data according to the probability distribution information includes:
determining the classification result of the audio and video data corresponding to the probability of the concentrated distribution in the normal distribution as normal audio and video data;
and determining that the classification result of other audio and video data except the normal audio and video data is abnormal audio and video data.
5. The method according to claim 4, wherein the updating the multi-instance learning model according to the classification result of each piece of audio-video data comprises:
performing data enhancement on the abnormal audio and video data to obtain enhanced abnormal audio and video data; the data enhancement mode at least comprises the following steps: conventional enhancement, frame interpolation enhancement and reverse amplification enhancement;
and updating the multi-example learning model by utilizing each enhanced abnormal audio and video data and each normal audio and video data.
6. The method according to claim 4, wherein the updating the multi-instance learning model according to the classification result of each piece of audio-video data comprises:
determining a loss function of the multi-example learning model by using the prior probability and the posterior probability of each abnormal audio and video data and the prior probability and the posterior probability of each normal audio and video data;
adjusting the parameter weights of the multi-instance learning model using the loss function to update the multi-instance learning model.
7. The method according to claim 2, wherein performing cepstrum analysis on the audio data in each of the audio and video data to obtain audio features corresponding to the audio and video data comprises:
carrying out Fourier transform, logarithm taking and inverse Fourier transform on the audio data of each piece of audio and video data to obtain an envelope function of the corresponding audio and video data;
and performing transposition transformation on the envelope function of each audio and video data to obtain the audio characteristics of the corresponding audio and video data.
8. The method according to claim 2, wherein the fusing the audio features and the video features of each of the audio-video data to obtain corresponding target features comprises:
updating the convolution kernel of the three-dimensional convolution neural network by using the audio frequency characteristics of each piece of audio and video data to obtain an updated three-dimensional convolution neural network;
and carrying out convolution operation on the video characteristics of the corresponding audio and video data by using the updated three-dimensional convolution neural network to obtain the corresponding target characteristics.
9. An abnormality detection apparatus for a pig, said apparatus comprising:
the acquisition module is used for acquiring a plurality of audio and video data of the pigs;
the extraction module is used for extracting the characteristics of each audio and video data to obtain corresponding target characteristics;
the training module is used for inputting the target characteristics of each piece of audio and video data in the plurality of pieces of audio and video data into a multi-example learning model so as to train the multi-example learning model;
and the detection module is used for detecting the abnormal behaviors of the pigs by utilizing the trained multi-example learning model.
10. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor when executing the program performs the steps in the method of abnormality detection for a pig according to any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for detecting abnormality in a pig according to any one of claims 1 to 8.
CN202110750379.8A 2021-07-02 2021-07-02 Abnormality detection method and apparatus for pig, electronic device, and storage medium Pending CN115577256A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110750379.8A CN115577256A (en) 2021-07-02 2021-07-02 Abnormality detection method and apparatus for pig, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN115577256A true CN115577256A (en) 2023-01-06

Family

ID=84580060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110750379.8A Pending CN115577256A (en) 2021-07-02 2021-07-02 Abnormality detection method and apparatus for pig, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN115577256A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination