CN112287893B - Sow lactation behavior identification method based on audio and video information fusion - Google Patents

Sow lactation behavior identification method based on audio and video information fusion

Info

Publication number
CN112287893B
Authority
CN
China
Prior art keywords
audio
behavior
lactation
sequence
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011336361.5A
Other languages
Chinese (zh)
Other versions
CN112287893A (en)
Inventor
杨阿庆
薛月菊
赵慧民
林智勇
刘晓勇
陈荣军
黄华盛
张磊
韩娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202011336361.5A
Publication of CN112287893A
Application granted
Publication of CN112287893B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P60/00 Technologies relating to agriculture, livestock or agroalimentary industries
    • Y02P60/80 Food processing, e.g. use of renewable energies or variable speed drives in handling, conveying or stacking
    • Y02P60/87 Re-use of by-products of food processing for fodder production

Abstract

The invention discloses a sow lactation behavior identification method based on audio and video information fusion, which comprises the following steps: acquiring top-view video and audio data of lactating sows; extracting the audio and the video frames from the original audio-video data, denoising and framing the audio to obtain an audio waveform diagram sequence, and extracting the corresponding optical flow image sequence from the original video frames; inputting the optical flow sequence and the video frames into a preset appearance-motion double-flow network to extract visual features, and inputting the audio waveform diagram sequence into a preset auditory feature extraction network to extract auditory features; splicing and fusing the visual and auditory features and inputting them into a preset long short-term memory network to extract temporal audiovisual features; and finally feeding the temporal audiovisual features into a full-connection layer and a softmax classifier for behavior classification and outputting the lactation behavior class. The invention uses the visual and auditory information accompanying lactation to identify sow lactation behavior in the pig farm environment, thereby acquiring sow lactation information, helping to discover abnormal lactation behavior in time so that effective measures can be taken, and improving the economic benefit of the pig farm.

Description

Sow lactation behavior identification method based on audio and video information fusion
Technical Field
The invention relates to the technical fields of intelligent livestock farming, multimodal information fusion and interactive behavior recognition, and in particular to a sow lactation behavior recognition method based on audio and video information fusion.
Background
Sow lactation is not only an important reflection of the sow's physical condition and maternal ability, but also a main factor affecting the survival and growth of piglets during the suckling period. Observing the nursing behavior of the sow and discovering abnormal nursing behavior in time enables pig farm managers to make effective intervention decisions promptly, improving the health of the sow and piglets and the economic benefit of the pig farm.
Currently, sow lactation is monitored mainly by manual observation, on site or via video, and by vision-based behavior monitoring methods. Manual observation requires a person to watch sow behavior on site or through surveillance video for long periods, record information such as the occurrence time, duration and frequency of lactation, and judge from experience whether the lactation behavior is abnormal. Sow lactation is an interactive behavior between the sow and the piglet group, and studies on vision-based sow lactation behavior recognition have rarely been reported; such methods rely mainly on visual cues expressed by the sow during lactation, such as the sow lying on her side with the udder exposed and the piglets lying at the udder performing suckling movements, and acquire lactation information automatically through algorithmic analysis. However, changes in pig house lighting and crowding and occlusion by the piglet group can make part of the visual information unavailable, which affects the judgment and analysis of lactation behavior.
In summary, existing methods for monitoring lactating sows are either unsuitable for, or have difficulty with, accurately identifying sow lactation behavior. In view of this, an automatic identification method for sow lactation is highly desirable in order to realize accurate monitoring of sow lactation behavior. Because sow lactation is accompanied by regular nursing sounds, the invention provides a sow lactation behavior identification method based on audio-video information fusion, in which the video and audio signals cooperate with and assist each other to achieve accurate monitoring of sow lactation behavior.
Disclosure of Invention
The invention aims to overcome the shortcoming that the prior art has difficulty accurately identifying sow lactation behavior, and provides a sow lactation behavior identification method based on audio and video information fusion, which provides data support for making effective management decisions in time and improves the health of the pigs and the economic benefit of the pig farm.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a sow lactation behavior identification method based on audio and video information fusion comprises the following steps:
1) Collecting audio and video data of sows in lactation period;
2) Data preprocessing: firstly separating audio and video data, then denoising and framing the audio data, obtaining an audio waveform diagram sequence, and finally extracting optical flow of the video data to obtain an optical flow image sequence;
3) Inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio waveform image sequence into a preset auditory feature extraction network to obtain auditory features;
4) Inputting the visual features and the auditory features into a long short-term memory network for further feature fusion and extraction to obtain temporal audiovisual features;
5) Feeding the temporal audiovisual features into a full-connection layer and a softmax classifier for behavior classification, thereby realizing automatic identification of sow lactation behavior.
In the step 1), a camera with a recording function is arranged directly above the pig house, and top-view video and audio data of the sow during the lactation period are collected.
Said step 2) comprises the steps of:
2.1 Separating audio and video data from the photographed audio and video data;
2.2 Processing the original audio signal by using a band-pass filter to obtain a denoised audio signal corresponding to the original audio signal;
2.3 Framing the denoised audio signal, wherein the frame length is 30ms, overlapping frames for 10ms, and converting the audio signal into an audio waveform diagram sequence;
2.4 Acquiring, by using an optical flow method, an optical flow image sequence of the lactating sow to be monitored from its original image sequence.
The step 3) comprises the following two processes:
a. inputting a video frame and an optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in a video from the video frame and the optical flow image sequence through a convolution layer, a downsampling layer and a full connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector; before inputting the video frame and the optical flow image sequence into the preset appearance-motion double-flow network, training the preset appearance-motion double-flow network is needed, and the method specifically comprises the following steps:
acquiring an original video frame with a lactation behavior mark and an optical flow image sequence; inputting the original video frame with the lactation behavior mark and the corresponding optical flow image sequence into the appearance-motion double-flow network for training, and obtaining optimal network parameters of the appearance-motion double-flow network;
b. inputting the audio waveform diagram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through the convolution layer, downsampling layer and full-connection layer of the auditory feature extraction network; before inputting the audio waveform diagram sequence into the preset auditory feature extraction network, the preset auditory feature extraction network needs to be trained, which specifically comprises the following steps:
acquiring raw audio data with a lactation behavior mark; denoising the original audio signal with a band-pass filter; framing the denoised audio signal with a frame length of 30 ms and an overlap of 10 ms between frames; converting the framed audio signal into an audio waveform diagram sequence corresponding to the original audio signal; inputting the audio waveform diagram sequence with the lactation behavior mark into the preset auditory feature extraction network for training, and obtaining the optimal network parameters of the auditory feature extraction network;
said step 4) comprises the steps of:
4.1 Stacking and splicing the one-dimensional visual feature vector and the one-dimensional auditory feature vector to obtain the audiovisual features;
4.2 Feeding the audiovisual features into a preset long short-term memory network for feature extraction, and outputting the temporal audiovisual features;
before the audiovisual features are fed into the preset long short-term memory network, the preset network is trained, which specifically comprises the following steps:
acquiring original video and audio sequence samples with behavior marks, acquiring corresponding optical flow image sequence samples according to the original video sequence with the behavior marks, and acquiring denoised audio waveform diagram sequence samples according to the original audio sequence samples with the behavior marks;
inputting the original video frames with behavior marks and the optical flow sequence into the preset appearance-motion double-flow network to extract visual features, and inputting the audio waveform diagram sequence samples into the preset auditory feature extraction network to extract auditory features;
and stacking and splicing the auditory features and the visual features, inputting them into the preset long short-term memory network for training, and obtaining the optimal network parameters.
Said step 5) comprises the steps of:
5.1 Inputting the temporal audiovisual features into a full-connection layer for further feature extraction and integration to obtain 2 feature values, corresponding respectively to lactation behavior and non-lactation behavior;
5.2 Inputting the feature values of lactation behavior and non-lactation behavior into a softmax classifier to calculate the probability values of the 2 classes, and taking the class with the largest probability as the behavior recognition result, thereby realizing recognition of sow lactation behavior.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides an automatic identification method for sow lactation behavior, through which sow lactation information is obtained and lactation capability can be further analyzed.
2. The invention combines video and audio information, which cooperate with and assist each other, so that the extracted features are rich and accurate.
3. The invention uses the video frame sequence, the optical flow sequence and the audio sequence to analyze the behavior from three aspects, namely spatial distribution, temporal motion and sound, so the recognition accuracy is high.
4. The visual feature expression capability is enhanced by adopting the appearance-motion double-flow network to integrate the optical flow and the video convolution feature.
5. The invention adopts the long short-term memory network to integrate the visual and auditory features, which enhances the expression capability of the behavior features over time.
6. In potential applications, the occurrence time, duration and frequency of sow lactation within a fixed period can be counted to study the pattern of sow lactation behavior (a counting sketch is given after this list). On one hand, the lactation information can be used to assess the health and welfare of sows and piglets; on the other hand, it provides a data reference for evaluating sow performance, thus supporting breeding selection and feeding management decisions on the pig farm and improving its economic benefit.
7. For behaviors accompanied by characteristic sounds, this approach of fusing multimodal video and audio data has strong applicability and can also be used to study other animal behaviors.
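For illustration of the statistics mentioned in item 6, the sketch below derives lactation bout start times, durations and an hourly frequency from a sequence of per-window classifier outputs. The 20-second decision window, the variable names and the class encoding (1 = lactation) are assumptions introduced for this example, not values specified by the invention.

```python
# Hypothetical post-processing of classifier outputs (1 = lactation, 0 = other):
# group contiguous lactation windows into bouts and compute an hourly frequency.
def nursing_bouts(predictions, window_s=20.0):
    """Return (start_seconds, duration_seconds) for each contiguous lactation run."""
    bouts, start = [], None
    for i, p in enumerate(list(predictions) + [0]):   # sentinel closes a trailing bout
        if p == 1 and start is None:
            start = i
        elif p != 1 and start is not None:
            bouts.append((start * window_s, (i - start) * window_s))
            start = None
    return bouts

preds = [0, 0, 1, 1, 1, 0, 1, 1, 0]                   # toy timeline of per-window decisions
bouts = nursing_bouts(preds)                          # [(40.0, 60.0), (120.0, 40.0)]
freq_per_hour = len(bouts) / (len(preds) * 20.0 / 3600.0)
```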
Drawings
FIG. 1 is a schematic overall flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the method of the present invention.
Fig. 3 is a schematic diagram of the structure of an appearance-motion dual-flow network.
Detailed Description
The invention will be further illustrated with reference to specific examples.
As shown in fig. 1 and fig. 2, the sow lactation behavior recognition method based on audio-video information fusion provided by the invention realizes recognition of sow lactation behavior in commercial pig houses based on audio-video multimodal data, and provides a reference for real-time monitoring and analysis of sow lactation behavior and health status. The method comprises the following steps:
1) Collecting audio and video data of sows in lactation period;
specifically, a camera with a recording function is arranged right above an actual commercial pig house, and daily behavior video and audio of the lactating sow are continuously shot. In the embodiment, a Haikang visual DS-2CD3T46FWD V2-I3 high-definition audio camera is adopted to obtain daily behavior video and audio data comprising the lactation behavior of the sow.
2) Data preprocessing: firstly separating audio and video data, then denoising and framing the audio data, obtaining an audio waveform diagram sequence, and finally extracting optical flow of the video data to obtain an optical flow image sequence; the method specifically comprises the following steps:
2.1) An audio/video editing and conversion tool is used to separate the audio and the video of the obtained audio-video data, extracting the audio data and the video frames;
2.2) Because there is noise from piglets, other pigs and machinery in an actual pig house, the invention first denoises the acquired original audio data in order to reduce the interference of noise with the judgment of sow nursing sounds. Before nursing, the sow emits rhythmic grunting sounds to attract the piglets to suckle, and guides the piglets to switch between udder massage and sucking by changing the grunting rate; based on this property, a band-pass filter suited to sow nursing sounds is applied to the original audio to reduce the influence of other noise on the nursing sounds;
2.3) To obtain two-dimensional image inputs suited to the convolutional networks of this embodiment, the denoised time-domain audio signal is framed with a frame length of 30 ms and an inter-frame overlap of 10 ms, and the audio signal is converted into an audio waveform diagram sequence;
2.4) An optical flow method is used to acquire the optical flow image sequence of the sow to be monitored from its original image sequence; each optical flow image records the motion intensity and motion direction of every pixel.
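As an illustration of this preprocessing stage, the following Python sketch covers band-pass denoising, 30 ms framing with 10 ms overlap, waveform-image rendering and dense optical flow extraction. The passband (500-2000 Hz), the 224x224 image size and all function names are assumptions made for the example, not values disclosed by the invention.

```python
# Sketch of the preprocessing described in steps 2.1)-2.4); parameters are illustrative.
import cv2
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_denoise(audio, sr, low_hz=500.0, high_hz=2000.0, order=4):
    """Suppress noise outside an assumed passband for sow nursing grunts."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)

def frame_audio(audio, sr, frame_ms=30, overlap_ms=10):
    """Split the signal into 30 ms frames whose neighbours overlap by 10 ms (20 ms hop)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * (frame_ms - overlap_ms) / 1000)
    return [audio[i:i + frame_len] for i in range(0, len(audio) - frame_len + 1, hop)]

def frame_to_waveform_image(frame, size=(224, 224)):
    """Render one audio frame as a two-dimensional waveform image usable by a CNN."""
    h, w = size
    img = np.zeros((h, w), dtype=np.uint8)
    x = np.linspace(0, w - 1, len(frame)).astype(int)
    y = ((frame - frame.min()) / (np.ptp(frame) + 1e-8) * (h - 1)).astype(int)
    img[h - 1 - y, x] = 255
    return img

def optical_flow_sequence(frames_gray):
    """Dense Farneback optical flow between consecutive grayscale video frames."""
    flows = []
    for prev, nxt in zip(frames_gray[:-1], frames_gray[1:]):
        # each flow map stores per-pixel (dx, dy), i.e. motion intensity and direction
        flows.append(cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                                  0.5, 3, 15, 3, 5, 1.2, 0))
    return flows
```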
3) Inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio waveform image sequence into a preset auditory feature extraction network to obtain auditory features; the specific cases are as follows:
a. inputting a video frame and an optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in the video from the video frame and the optical flow image sequence through a convolution layer, a downsampling layer and a full connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector;
in this embodiment, the preset appearance-motion double-flow network structure is shown in fig. 3. The network consists of an appearance stream and a motion stream, which take the video frames and the optical flow sequence as inputs respectively; each stream outputs feature maps of the same dimension after 5 convolution layers and 5 downsampling layers. The feature maps of the two streams are fused by concatenation and sent to 2 further convolution layers for additional feature extraction and fusion, and the fused feature maps are fed into 2 consecutive full-connection layers to output a one-dimensional visual feature representing appearance and motion;
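A minimal PyTorch sketch of such a double-flow structure is given below for illustration; the channel widths, kernel sizes, 224x224 input resolution and 512-dimensional output are assumptions, not parameters disclosed in this embodiment.

```python
# Assumed two-stream (appearance + motion) feature extractor: per stream 5 conv and
# 5 downsampling layers, concatenation fusion, 2 fusion conv layers, 2 FC layers.
import torch
import torch.nn as nn

def conv_pool_stack(in_ch, widths=(32, 64, 128, 256, 256)):
    layers, c = [], in_ch
    for w in widths:                           # 5 convolution layers, each followed by pooling
        layers += [nn.Conv2d(c, w, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2)]
        c = w
    return nn.Sequential(*layers), c

class AppearanceMotionNet(nn.Module):
    def __init__(self, flow_channels=2, feat_dim=512, input_size=224):
        super().__init__()
        self.appearance, ca = conv_pool_stack(3)            # RGB video-frame stream
        self.motion, cm = conv_pool_stack(flow_channels)    # optical-flow stream
        self.fuse = nn.Sequential(                          # 2 conv layers on the concatenation
            nn.Conv2d(ca + cm, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True))
        s = input_size // 32                                 # spatial size after 5 poolings
        self.fc = nn.Sequential(                             # 2 full-connection layers
            nn.Flatten(), nn.Linear(256 * s * s, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, feat_dim))

    def forward(self, frame, flow):
        fused = torch.cat([self.appearance(frame), self.motion(flow)], dim=1)
        return self.fc(self.fuse(fused))                     # one-dimensional visual feature
```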
the video frames and optical flow sequences need to be trained before they are input into the look-and-motion dual-flow network architecture. Since the network only implements feature extraction, to obtain the optimal network parameters of the network, 1 full convolution layer and soft maximum (softmax) layer are added for behavior classification after the network before training the network. In the training process, the number of the output behavior categories of the network is set to be 2, the output behavior categories are used for representing the lactation behavior and the non-lactation behavior, an original video frame with a lactation behavior mark and a corresponding optical flow image sequence are input into the appearance-motion double-flow network for forward propagation and backward feedback training, and the optimal network parameters of the appearance-motion double-flow network are obtained. The original video frame and the optical flow sequence are input into the convolution, downsampling and full connection layer of the trained appearance-motion double-flow network, and then one-dimensional visual characteristics are output.
b. Inputting the audio waveform diagram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through a convolution layer, a downsampling layer and a full-connection layer of the auditory feature extraction network;
in this embodiment, the preset auditory feature extraction network consists of 5 convolution layers, 5 downsampling layers and 2 full-connection layers;
The auditory feature extraction network needs to be trained before the audio waveform diagram sequence is input into it. Since this network only performs feature extraction, in order to obtain its optimal network parameters, one full convolution layer and a softmax layer are appended to the network for behavior classification before training, and the number of output behavior classes is set to 2, representing lactation behavior and non-lactation behavior;
The training comprises: obtaining original audio data with lactation behavior marks; denoising the original audio signal with a band-pass filter; framing the denoised audio signal with a frame length of 30 ms and an inter-frame overlap of 10 ms; converting the framed audio signal into an audio waveform diagram sequence; and inputting the audio waveform diagram sequence with lactation behavior marks into the preset auditory feature extraction network for forward propagation and backward feedback training to obtain the optimal network parameters of the auditory feature extraction network. The audio waveform diagram sequence is then input into the convolution, downsampling and full-connection layers of the trained auditory feature extraction network to output the one-dimensional auditory features.
4) The visual and auditory features are input into a long short-term memory network for further feature fusion and extraction to obtain the temporal audiovisual features, which specifically comprises the following steps:
4.1) Stacking and splicing the one-dimensional visual features and auditory features to obtain the audiovisual features;
4.2) Feeding the audiovisual features into a preset long short-term memory network for feature extraction and outputting one-dimensional temporal features; the long short-term memory network is set up and trained in advance. A long short-term memory network (LSTM) is a recurrent neural network suited to processing and predicting events in time series. An LSTM unit consists of a cell and three gates, namely an input gate, a forget gate and an output gate, which protect and control the cell state. In this embodiment the LSTM is designed for temporal feature extraction; its configuration can be set according to actual requirements and is not specifically limited here.
When the LSTM is trained, its parameters can be learned by forming a classification network together with the subsequent full-connection layer and softmax classifier. The specific training steps of the LSTM include:
acquiring original video and audio sequence samples with behavior marks, obtaining optical flow image sequence samples with behavior marks from the original video sequences, and obtaining denoised audio waveform diagram sequence samples with behavior marks from the original audio sequences; inputting the video and optical flow sequences with behavior marks into the preset appearance-motion double-flow network to extract visual features, and inputting the audio waveform diagram sequence samples into the preset auditory feature extraction network to extract auditory features; and stacking and splicing the auditory and visual features, inputting them into the preset long short-term memory network for forward propagation and backward feedback training, and obtaining the optimal network parameters.
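A hedged sketch of this fusion stage and its training loop follows; the hidden size, learning rate, epoch count and the helper names visual_net, auditory_net and loader are assumptions introduced for illustration, not the invention's settings.

```python
# Assumed fusion stage: concatenate per-time-step visual and auditory features, feed an
# LSTM, and classify with a 2-neuron full-connection layer trained by cross-entropy.
import torch
import torch.nn as nn

class AudioVisualLSTM(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=256, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(vis_dim + aud_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)      # lactation vs. non-lactation logits

    def forward(self, vis_seq, aud_seq):
        x = torch.cat([vis_seq, aud_seq], dim=-1)   # (batch, time, vis_dim + aud_dim)
        out, _ = self.lstm(x)
        return self.classifier(out[:, -1])          # decision from the last time step

def train_fusion(model, visual_net, auditory_net, loader, epochs=10, lr=1e-4):
    """Assumed loop: `loader` yields labelled (frames, flows, waveforms, labels) sequences."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()                 # cross-entropy over softmax probabilities
    for _ in range(epochs):
        for frames, flows, waveforms, labels in loader:
            b, t = frames.shape[:2]
            vis = visual_net(frames.flatten(0, 1), flows.flatten(0, 1)).view(b, t, -1)
            aud = auditory_net(waveforms.flatten(0, 1)).view(b, t, -1)
            loss = loss_fn(model(vis, aud), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```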
5) The temporal audiovisual features are fed into a full-connection layer and a softmax classifier for behavior classification, realizing automatic identification of sow lactation behavior, which specifically comprises the following steps:
5.1) Inputting the temporal audiovisual features into a full-connection layer for further feature extraction and integration, with the number of neurons in the full-connection layer set to 2, to obtain 2 feature values corresponding respectively to lactation behavior and non-lactation behavior;
5.2) Inputting the feature values of lactation behavior and non-lactation behavior into a softmax classifier to calculate the probability values of the 2 classes, and taking the class with the largest probability as the behavior recognition result, thereby realizing recognition of sow lactation behavior.
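A minimal sketch of this decision step, with an assumed class ordering (index 1 = lactation) and illustrative variable names:

```python
# Softmax over the two feature values and an argmax decision (assumed index order).
import torch

def classify(logits):                         # logits: tensor of shape (batch, 2)
    probs = torch.softmax(logits, dim=-1)     # probabilities of the 2 behavior classes
    return probs.argmax(dim=-1), probs        # predicted class and its probabilities
```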
In summary, the sow lactation behavior recognition method disclosed by the invention comprises: collecting top-view video-audio data of lactating sows; extracting the audio and the video frames from the original video-audio data; denoising and framing the audio to obtain an audio waveform diagram sequence, and extracting the corresponding optical flow image sequence from the original video frames; inputting the optical flow sequence and the video frames into a preset appearance-motion double-flow network to extract visual features, and inputting the audio waveform diagram sequence into a preset auditory feature extraction network to extract auditory features; splicing and fusing the visual and auditory features and inputting them into a preset long short-term memory network to extract temporal audiovisual features; and finally feeding the temporal audiovisual features into a full-connection layer and a softmax classifier for behavior classification and outputting the lactation behavior class, thereby realizing sow lactation behavior identification. This work identifies sow lactation behavior based on the fusion of video-audio multimodal information, provides reliable lactation information for pig farm managers, guides them to make timely and effective management decisions, improves the health and welfare of the pigs, provides data support for exploring the maternal behavior patterns of sows, and is worth popularizing.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection; therefore, any changes made according to the shapes and principles of the present invention should be covered by the protection scope of the present invention.

Claims (5)

1. The sow lactation behavior identification method based on audio and video information fusion is characterized by comprising the following steps of:
1) Collecting audio and video data of sows in lactation period;
2) Data preprocessing: firstly separating audio and video data, then denoising and framing the audio data, obtaining an audio waveform diagram sequence, and finally extracting optical flow of the video data to obtain an optical flow image sequence;
3) Inputting the video frames and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio waveform diagram sequence into a preset auditory feature extraction network to obtain auditory features; the appearance-motion double-flow network consists of an appearance stream and a motion stream, which take the video frames and the optical flow sequence as inputs respectively; each stream outputs feature maps of the same dimension after 5 convolution layers and 5 downsampling layers; the feature maps of the two streams are fused by concatenation and sent to 2 further convolution layers for feature extraction and fusion, and the fused feature maps are fed into 2 consecutive full-connection layers to output a one-dimensional visual feature representing appearance and motion;
4) Inputting the visual features and the auditory features into a long short-term memory network for further feature fusion and extraction to obtain temporal audiovisual features;
5) Feeding the temporal audiovisual features into a full-connection layer and a softmax classifier for behavior classification, thereby realizing automatic identification of sow lactation behavior.
2. The sow lactation identification method based on audio and video information fusion according to claim 1, characterized in that: in the step 1), a camera with a recording function is arranged directly above the pig house, and top-view video and audio data of the sow during the lactation period are collected.
3. The sow lactation identification method based on audio and video information fusion according to claim 1, wherein the step 2) comprises the following steps:
2.1 Separating audio and video data from the photographed audio and video data;
2.2 Processing the original audio signal by using a band-pass filter to obtain a denoised audio signal corresponding to the original audio signal;
2.3 Framing the denoised audio signal, wherein the frame length is 30ms, overlapping frames for 10ms, and converting the audio signal into an audio waveform diagram sequence;
2.4 Acquiring, by using an optical flow method, an optical flow image sequence of the lactating sow to be monitored from its original image sequence.
4. The sow lactation identification method based on audio and video information fusion according to claim 1, wherein the step 3) comprises the following two processes:
a. inputting a video frame and an optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in a video from the video frame and the optical flow image sequence through a convolution layer, a downsampling layer and a full connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector; before inputting the video frame and the optical flow image sequence into the preset appearance-motion double-flow network, training the preset appearance-motion double-flow network is needed, and the method specifically comprises the following steps:
acquiring an original video frame with a lactation behavior mark and an optical flow image sequence; inputting the original video frame with the lactation behavior mark and the corresponding optical flow image sequence into the appearance-motion double-flow network for training, and obtaining optimal network parameters of the appearance-motion double-flow network;
b. inputting the audio waveform diagram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through the convolution layer, downsampling layer and full-connection layer of the auditory feature extraction network; before inputting the audio waveform diagram sequence into the preset auditory feature extraction network, the preset auditory feature extraction network needs to be trained, which specifically comprises the following steps:
acquiring raw audio data with a lactation behavior mark; denoising the original audio signal with a band-pass filter; framing the denoised audio signal with a frame length of 30 ms and an overlap of 10 ms between frames; converting the framed audio signal into an audio waveform diagram sequence corresponding to the original audio signal; inputting the audio waveform diagram sequence with the lactation behavior mark into the preset auditory feature extraction network for training, and obtaining the optimal network parameters of the auditory feature extraction network;
said step 4) comprises the steps of:
4.1 Stacking and splicing the one-dimensional visual feature vector and the one-dimensional auditory feature vector to obtain the audiovisual features;
4.2 Feeding the audiovisual features into a preset long short-term memory network for feature extraction, and outputting the temporal audiovisual features;
before the audiovisual features are fed into the preset long short-term memory network, the preset network is trained, which specifically comprises the following steps:
acquiring original video and audio sequence samples with behavior marks, acquiring corresponding optical flow image sequence samples according to the original video sequence with the behavior marks, and acquiring denoised audio waveform diagram sequence samples according to the original audio sequence samples with the behavior marks;
inputting the original video frames with behavior marks and the optical flow sequence into the preset appearance-motion double-flow network to extract visual features, and inputting the audio waveform diagram sequence samples into the preset auditory feature extraction network to extract auditory features;
and stacking and splicing the auditory features and the visual features, inputting them into the preset long short-term memory network for training, and obtaining the optimal network parameters.
5. The sow lactation identification method based on audio and video information fusion according to claim 1, wherein the step 5) comprises the following steps:
5.1 Inputting the temporal audiovisual features into a full-connection layer for further feature extraction and integration to obtain 2 feature values, corresponding respectively to lactation behavior and non-lactation behavior;
5.2 Inputting the feature values of lactation behavior and non-lactation behavior into a softmax classifier to calculate the probability values of the 2 classes, and taking the class with the largest probability as the behavior recognition result, thereby realizing recognition of sow lactation behavior.
Application CN202011336361.5A, filed 2020-11-25, priority date 2020-11-25: Sow lactation behavior identification method based on audio and video information fusion. Status: Active. Granted publication: CN112287893B (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011336361.5A CN112287893B (en) 2020-11-25 2020-11-25 Sow lactation behavior identification method based on audio and video information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011336361.5A CN112287893B (en) 2020-11-25 2020-11-25 Sow lactation behavior identification method based on audio and video information fusion

Publications (2)

Publication Number Publication Date
CN112287893A CN112287893A (en) 2021-01-29
CN112287893B (en) 2023-07-18

Family

ID=74425526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011336361.5A Active CN112287893B (en) 2020-11-25 2020-11-25 Sow lactation behavior identification method based on audio and video information fusion

Country Status (1)

Country Link
CN (1) CN112287893B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299551A (en) * 2022-03-07 2022-04-08 深圳市海清视讯科技有限公司 Model training method, animal behavior identification method, device and equipment
CN114581749B (en) * 2022-05-09 2022-07-26 城云科技(中国)有限公司 Audio-visual feature fusion target behavior identification method and device and application

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
CN107609460A (en) * 2017-05-24 2018-01-19 南京邮电大学 A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
CN108200483A (en) * 2017-12-26 2018-06-22 中国科学院自动化研究所 Dynamically multi-modal video presentation generation method
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN109492535A (en) * 2018-10-12 2019-03-19 华南农业大学 A kind of sow Breast feeding behaviour recognition methods of computer vision
US10289912B1 (en) * 2015-04-29 2019-05-14 Google Llc Classifying videos using neural networks
CN110598658A (en) * 2019-09-18 2019-12-20 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network
CN111428789A (en) * 2020-03-25 2020-07-17 广东技术师范大学 Network traffic anomaly detection method based on deep learning
CN111461235A (en) * 2020-03-31 2020-07-28 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8135221B2 (en) * 2009-10-07 2012-03-13 Eastman Kodak Company Video concept classification using audio-visual atoms
US9946933B2 (en) * 2016-08-18 2018-04-17 Xerox Corporation System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture
US10685236B2 (en) * 2018-07-05 2020-06-16 Adobe Inc. Multi-model techniques to generate video metadata

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10289912B1 (en) * 2015-04-29 2019-05-14 Google Llc Classifying videos using neural networks
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
CN107609460A (en) * 2017-05-24 2018-01-19 南京邮电大学 A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
CN108200483A (en) * 2017-12-26 2018-06-22 中国科学院自动化研究所 Dynamically multi-modal video presentation generation method
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN109492535A (en) * 2018-10-12 2019-03-19 华南农业大学 A kind of sow Breast feeding behaviour recognition methods of computer vision
CN110598658A (en) * 2019-09-18 2019-12-20 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network
CN111428789A (en) * 2020-03-25 2020-07-17 广东技术师范大学 Network traffic anomaly detection method based on deep learning
CN111461235A (en) * 2020-03-31 2020-07-28 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A violent video detection method based on 3D convolutional networks; 宋伟; 张栋梁; 齐振国; 郑男; 信息网络安全 (Netinfo Security), Issue 12; 60-66 *
Sow oestrus behavior recognition based on MFO-LSTM; 王凯; 刘春红; 段青玲; 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), Issue 14; 219-227 *
Speech emotion recognition and personality analysis based on deep neural networks; 洪兆金; 魏晨阳; 庄媛; 王影; 王祎庭; 赵力; 信息化研究 (Informatization Research), Issue 01; 52-57 *
Special video classification with multimodal feature fusion and multi-task learning; 吴晓雨; 顾超男; 王生进; 光学精密工程 (Optics and Precision Engineering), Issue 05; 186-195 *

Also Published As

Publication number Publication date
CN112287893A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112287893B (en) Sow lactation behavior identification method based on audio and video information fusion
Wu et al. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms
CN106778555B (en) Cow rumination chewing and swallowing frequency statistical method based on machine vision
CN111294565A (en) Intelligent pig raising monitoring method and management terminal
CN108491807B (en) Real-time monitoring method and system for oestrus of dairy cows
CN110598658B (en) Convolutional network identification method for sow lactation behaviors
CN112580552A (en) Method and device for analyzing behavior of rats
CN115937251A (en) Multi-target tracking method for shrimps
CN115861721B (en) Livestock and poultry breeding spraying equipment state identification method based on image data
Ayadi et al. Dairy cow rumination detection: A deep learning approach
CN112287741A (en) Image processing-based farming operation management method and device
CN102706877A (en) Portable detecting system for diseases and insect pests of cotton and detecting method
Ruchay et al. Accurate 3d shape recovery of live cattle with three depth cameras
CN113785783A (en) Livestock grouping system and method
CN113762113A (en) Livestock parturition behavior monitoring method and device
Xi et al. Smart headset, computer vision and machine learning for efficient prawn farm management
CN116246223A (en) Multi-target tracking algorithm and health assessment method for dairy cows
Prema et al. Smart Farming: IoT based plant leaf disease detection and prediction using deep neural network with image processing
Jovanović et al. Splash detection in fish Plants surveillance videos using deep learning
Jin et al. An improved mask r-cnn method for weed segmentation
CN115272862A (en) Audio-visual cooperation based winged insect tracking and identifying method and device
CN114358163A (en) Food intake monitoring method and system based on twin network and depth data
CN113989745A (en) Non-contact monitoring method for feeding condition of ruminants
Xi et al. Smart Headset, Computer Vision and Machine Learning for Efficient Prawn Farm Management
CN111783720A (en) Cattle rumination behavior detection method based on gun-ball linkage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant