CN112287893A - Sow lactation behavior identification method based on audio and video information fusion - Google Patents

Sow lactation behavior identification method based on audio and video information fusion

Info

Publication number
CN112287893A
Authority
CN
China
Prior art keywords
audio
sequence
lactation
network
auditory
Prior art date
Legal status
Granted
Application number
CN202011336361.5A
Other languages
Chinese (zh)
Other versions
CN112287893B (en)
Inventor
杨阿庆
薛月菊
赵慧民
林智勇
刘晓勇
陈荣军
黄华盛
张磊
韩娜
Current Assignee
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202011336361.5A
Publication of CN112287893A
Application granted
Publication of CN112287893B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G06F18/285 - Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P60/00 - Technologies relating to agriculture, livestock or agroalimentary industries
    • Y02P60/80 - Food processing, e.g. use of renewable energies or variable speed drives in handling, conveying or stacking
    • Y02P60/87 - Re-use of by-products of food processing for fodder production

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a sow lactation behavior identification method based on audio and video information fusion, which comprises the following steps: collecting top-view video and audio data of sows during lactation; separating the audio track and video frames from the raw recording; denoising and framing the audio to obtain an audio oscillogram sequence; and extracting the corresponding optical flow image sequence from the original video frames. The video frames and the optical flow sequence are input into a preset appearance-motion double-flow network to extract visual features, and the audio oscillogram sequence is input into a preset auditory feature extraction network to extract auditory features; the visual and auditory features are concatenated and fused, then input into a preset long short-term memory network to extract time-sequence visual and auditory features, and finally sent into a full connection layer and a soft maximum layer for behavior classification, outputting the lactation behavior category. The method uses the visual and auditory information accompanying lactation to identify sow lactation behavior in the pig farm environment, thereby obtaining sow lactation information, helping to detect abnormal lactation behavior in time so that effective measures can be taken, and improving the economic benefit of the pig farm.

Description

Sow lactation behavior identification method based on audio and video information fusion
Technical Field
The invention relates to the technical fields of intelligent animal husbandry, multi-modal information fusion and interactive behavior recognition, and in particular to a sow lactation behavior recognition method based on audio and video information fusion.
Background
The lactation behavior of a sow is not only an important reflection of her physical health and maternal ability, but also a main factor influencing the survival and growth of piglets during the suckling period. By observing the sow's lactation behavior and detecting abnormal lactation in time, pig farm managers can make effective manual intervention decisions promptly, improving the health of sows and piglets and thereby the economic benefit of the pig farm.
At present, sow lactation behavior is monitored mainly by manual observation (on site or through surveillance video) and by vision-based behavior monitoring methods. Manual observation requires staff to watch sow behavior on site or through long recordings, record the onset time, duration, frequency and other information of lactation events, and judge from experience whether the lactation behavior is abnormal. Sow lactation is an interactive behavior between the sow and the piglet group, and vision-based recognition of it has rarely been reported; existing vision-based approaches rely mainly on visual cues during lactation, such as the sow lying on her side with the udder exposed and the piglets lying prone at the udder while performing suckling and massaging movements, and obtain lactation information automatically through algorithmic analysis. However, because of illumination changes in the pig house environment and crowding and occlusion within the piglet group, part of the visual information cannot be captured, which affects the judgment and analysis of lactation behavior.
In summary, existing sow lactation monitoring methods are either unsuitable for, or have difficulty with, identifying sow lactation behavior. It is therefore desirable to provide an automatic identification method that monitors sow lactation behavior accurately. Because sow lactation is accompanied by a regular nursing sound, the sow lactation behavior identification method based on audio and video information fusion achieves accurate monitoring through the mutual cooperation of the video and audio signals.
Disclosure of Invention
The invention aims to overcome the difficulty of the prior art in accurately identifying sow lactation behavior, and provides a sow lactation behavior identification method based on audio and video information fusion, which supplies data support for making effective management decisions in time and improves the health of the pigs and the economic benefit of the pig farm.
In order to achieve the above purpose, the technical solution provided by the invention is as follows: a sow lactation behavior identification method based on audio and video information fusion, comprising the following steps:
1) collecting audio and video data of sows in a lactation period;
2) data preprocessing: firstly, separating out audio and video data, then denoising and framing the audio data, acquiring an audio oscillogram sequence, and finally extracting optical flow from the video data to acquire an optical flow image sequence;
3) inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio oscillogram sequence into a preset auditory feature extraction network to obtain auditory features;
4) inputting the visual features and the auditory features into a long-term and short-term memory network for further feature fusion and extraction, and acquiring time sequence visual and auditory features;
5) sending the time sequence visual and auditory features into a full connection layer and a soft maximum classifier to perform behavior classification, realizing automatic identification of the lactation behavior of the sow.
In step 1), a camera with an audio recording function is installed directly above the pig pen, and top-view video and audio data of the sows during lactation are collected.
The step 2) comprises the following steps:
2.1) separating audio and video data from the shot audio and video data;
2.2) processing the original audio signal by using a band-pass filter to obtain a denoised audio signal corresponding to the original audio signal;
2.3) framing the denoised audio signal, wherein the frame length is 30ms, the frames are overlapped for 10ms, and the audio signal is converted into an audio oscillogram sequence;
2.4) acquiring an optical flow image sequence of the sow to be monitored during lactation from the corresponding original image sequence by using an optical flow method.
The step 3) comprises the following two treatments:
a. inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in the video from the video frame and the optical flow image sequence through a convolution layer, a down-sampling layer and a full-connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector; before inputting a video frame and an optical flow image sequence into a preset appearance-motion double-flow network, the preset appearance-motion double-flow network needs to be trained, and the method specifically comprises the following steps:
acquiring an original video frame with a lactation behavior mark and an optical flow image sequence; inputting an original video frame with a lactation behavior mark and a corresponding optical flow image sequence into an appearance-motion double-flow network for training, and acquiring optimal network parameters of the appearance-motion double-flow network;
b. inputting the audio oscillogram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through a convolution layer, a down-sampling layer and a full-connection layer of the auditory feature extraction network; before inputting the audio oscillogram sequence into the preset auditory feature extraction network, the preset auditory feature extraction network needs to be trained, and the method specifically comprises the following steps:
acquiring original audio data with a lactation behavior mark; denoising an original audio signal by adopting a band-pass filter; framing the denoised audio signal, wherein the frame length is 30ms, and the interframes are overlapped for 10 ms; converting the audio signal after framing into an audio oscillogram sequence to obtain an audio oscillogram sequence corresponding to the original audio signal; inputting the audio oscillogram sequence with the lactation behavior mark into a preset auditory feature extraction network for training to obtain the optimal network parameters of the auditory feature extraction network;
the step 4) comprises the following steps:
4.1) stacking and splicing the one-dimensional visual feature vector and the one-dimensional auditory feature vector to obtain visual and auditory features;
4.2) sending the visual and auditory characteristics into a preset long-short term memory network for characteristic extraction, and outputting time sequence visual and auditory characteristics;
before the visual and auditory characteristics are sent into a preset long-term and short-term memory network, the preset network needs to be trained, and the method specifically comprises the following steps:
acquiring original video and audio sequence samples with behavior marks, acquiring corresponding optical flow image sequence samples according to an original video sequence with the behavior marks, and acquiring denoised audio oscillogram sequence samples according to an original audio sequence sample with the behavior marks;
inputting an original video frame with a behavior mark and an optical flow sequence into a preset appearance-motion double-flow network to extract visual features, and inputting an audio oscillogram sequence sample into a preset auditory feature extraction network to extract auditory features;
and stacking and splicing the auditory characteristics and the visual characteristics, inputting a preset long-term and short-term memory network for training, and acquiring optimal network parameters.
The step 5) comprises the following steps:
5.1) inputting the time sequence visual and auditory features into a fully connected layer for further feature extraction and integration to obtain 2 feature values, corresponding respectively to lactation behavior and non-lactation behavior;
5.2) inputting the feature values of lactation behavior and non-lactation behavior into a soft maximum classifier, calculating the probability values of lactation and non-lactation corresponding to the 2 features, and taking the behavior class with the maximum probability as the behavior recognition result, thereby realizing sow lactation behavior recognition.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides an automatic identification method for sow lactation behavior, so that sow lactation information can be obtained and nursing ability can be further analyzed.
2. The invention fuses video and audio information; the two modalities cooperate and assist each other, so the extracted features are rich and accurate.
3. The invention uses the video frame sequence, the optical flow sequence and the audio sequence, analyzing spatial distribution, temporal motion and sound respectively, which gives high recognition accuracy.
4. The invention adopts an appearance-motion double-flow network to fuse optical flow and video convolutional features, enhancing the expressive power of the visual features.
5. The invention adopts a long short-term memory network to integrate the visual and auditory features, enhancing the expressive power of the behavior features over time.
6. As a potential application, the onset time, duration and frequency of sow lactation within a fixed period can be counted and used to study the sow's lactation pattern. This lactation information can be used to predict the health and welfare of sows and piglets, provide a data reference for judging the sow's maternal ability, and thus support breeding selection and feeding management decisions of the pig farm, improving its economic benefit.
7. For behaviors accompanied by characteristic sounds, the fusion of visual and audio multimodal data gives the method broad applicability, so it can also be used to study the behavior of other animals.
Drawings
FIG. 1 is a schematic overall flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the method of the present invention.
Fig. 3 is a schematic diagram of the appearance-motion dual-stream network structure.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1 and fig. 2, the sow lactation behavior recognition method based on audio and video information fusion provided by the invention recognizes the lactation behavior of commercially penned sows from audio and video multi-modal data, providing a reference for real-time monitoring and analysis of sow lactation behavior and health state, and comprises the following steps:
1) collecting audio and video data of sows in a lactation period;
Specifically, a camera with an audio recording function is installed directly above an actual commercial pig house where the sows are penned, to continuously record the daily behavior video and audio of lactating sows. In this embodiment, a Hikvision DS-2CD3T46FWD V2-I3 high-definition audio camera is used to acquire daily behavior video and audio data containing sow lactation behavior.
2) Data preprocessing: firstly, separating out audio and video data, then denoising and framing the audio data, acquiring an audio oscillogram sequence, and finally extracting optical flow from the video data to acquire an optical flow image sequence; the method specifically comprises the following steps:
2.1) An audio and video editing and conversion tool is used to separate the acquired recording, extracting the audio data and the video frames;
2.2) Because piglet squeals in the pen, squeals from other pens and the noise of machinery are loud, the acquired raw audio is first denoised in order to reduce the interference of noise with the judgment of the sow's nursing sound. Before suckling, a sow emits rhythmic grunting to attract the piglets to suck, and the change of the grunting rate guides the piglets to switch between massaging and sucking; based on this property, this embodiment filters the raw audio with a band-pass filter suited to the sow's nursing sound, reducing the influence of other noise on the nursing sound;
2.3) To suit the two-dimensional image input of a convolutional network, this embodiment frames the denoised time-domain audio signal with a frame length of 30 ms and an inter-frame overlap of 10 ms, and converts the framed signal into an audio oscillogram sequence;
2.4) Using an optical flow method, the optical flow image sequence of the monitored sow is obtained from the original image sequence of the monitored sow; each pixel of an optical flow image records the motion intensity and direction at that pixel.
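The following is a minimal, non-authoritative sketch of the preprocessing in steps 2.2) to 2.4). The Butterworth band-pass filter and its cut-off frequencies, the 224x224 oscillogram size and the Farneback optical flow routine are illustrative assumptions; the patent only fixes the 30 ms frame length and the 10 ms inter-frame overlap.

```python
# Preprocessing sketch only; band limits, image size and library choices are assumptions.
import cv2
import numpy as np
from scipy.signal import butter, sosfiltfilt

def denoise(audio, sr, low_hz=200.0, high_hz=2000.0):
    """Band-pass filter the raw audio (cut-off frequencies are placeholders)."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)

def frame_to_oscillograms(audio, sr, frame_ms=30, overlap_ms=10, size=(224, 224)):
    """Split the denoised signal into 30 ms frames with 10 ms overlap and
    rasterise each frame as a 2-D waveform image for the auditory CNN."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * (frame_ms - overlap_ms) / 1000)
    images = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len]
        canvas = np.zeros(size, dtype=np.uint8)
        xs = np.linspace(0, size[1] - 1, frame_len).astype(int)
        ys = (size[0] / 2 * (1 - frame / (np.abs(frame).max() + 1e-8))).astype(int)
        canvas[np.clip(ys, 0, size[0] - 1), xs] = 255   # draw the oscillogram trace
        images.append(canvas)
    return images

def optical_flow_sequence(frames):
    """Dense optical flow between consecutive video frames; every pixel of the
    result stores the motion vector (intensity and direction) at that pixel."""
    flows = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for img in frames[1:]:
        cur = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        flows.append(cv2.calcOpticalFlowFarneback(prev, cur, None,
                                                  0.5, 3, 15, 3, 5, 1.2, 0))
        prev = cur
    return flows
```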
3) Inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio oscillogram sequence into a preset auditory feature extraction network to obtain auditory features; the specific situation is as follows:
a. inputting a video frame and optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in a video from the video frame and optical flow image sequence through a convolution layer, a down-sampling layer and a full-connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector;
In this embodiment, the preset appearance-motion double-flow network structure is shown in fig. 3. The network consists of an appearance stream and a motion stream that take the video frame and the optical flow sequence as input; after 5 convolutional layers and 5 down-sampling layers, each stream outputs a feature map of the same dimension. The two feature maps are fused by concatenation and passed through 2 further convolutional layers for feature extraction and fusion, and the fused feature map is then fed into 2 consecutive fully connected layers, which output a one-dimensional visual feature representing appearance and motion;
Before video frames and optical flow sequences are input into the appearance-motion double-flow network, the network must be trained. Because the network itself only performs feature extraction, one full convolution layer and a soft maximum (softmax) layer are appended behind it for behavior classification during training so that the optimal network parameters can be obtained. During training the number of output behavior categories is set to 2, representing lactation behavior and non-lactation behavior; original video frames with lactation labels and the corresponding optical flow image sequences are fed into the appearance-motion double-flow network for forward propagation and back-propagation training, yielding the optimal parameters of the appearance-motion double-flow network. Afterwards, the original video frames and the optical flow sequence are passed through the convolutional, down-sampling and fully connected layers of the trained appearance-motion double-flow network, which output the one-dimensional visual feature.
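As an illustration only, a minimal PyTorch sketch of such an appearance-motion double-flow structure (5 convolutional plus 5 down-sampling layers per stream, concatenation fusion, 2 further convolutional layers and 2 fully connected layers) might look as follows. The channel counts, kernel sizes, 224x224 input resolution and 256-dimensional output are assumptions, not values fixed by the patent.

```python
# Hedged sketch of an appearance-motion double-flow feature extractor.
import torch
import torch.nn as nn

def conv_pool_stack(in_ch):
    """Five convolution layers, each followed by a down-sampling (max-pool) layer."""
    chans = [in_ch, 32, 64, 128, 256, 256]
    layers = []
    for c_in, c_out in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                   nn.MaxPool2d(2)]
    return nn.Sequential(*layers)

class AppearanceMotionNet(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.appearance = conv_pool_stack(3)    # RGB video frame
        self.motion = conv_pool_stack(2)        # optical flow (x and y components)
        self.fuse = nn.Sequential(              # 2 conv layers on the concatenated maps
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True))
        self.fc = nn.Sequential(                # 2 fully connected layers -> 1-D visual feature
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, feat_dim))

    def forward(self, frame, flow):
        a = self.appearance(frame)              # (B, 256, 7, 7) for a 224x224 input
        m = self.motion(flow)
        return self.fc(self.fuse(torch.cat([a, m], dim=1)))
```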
b. Inputting the audio oscillogram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through a convolution layer, a down-sampling layer and a full-connection layer of the auditory feature extraction network;
In this embodiment, the preset auditory feature extraction network consists of 5 convolutional layers, 5 down-sampling layers and 2 fully connected layers;
The network needs to be trained before the audio oscillogram sequence is input into the preset auditory feature extraction network. Because the network itself only performs feature extraction, one full convolution layer and a softmax layer are appended behind it for behavior classification during training in order to obtain the optimal network parameters, and the number of output behavior categories is set to 2, representing lactation behavior and non-lactation behavior;
Original audio data with lactation behavior labels are acquired; the raw audio signal is denoised with a band-pass filter; the denoised signal is framed with a frame length of 30 ms and an inter-frame overlap of 10 ms; and the framed audio is converted into an audio oscillogram sequence corresponding to the original signal. The labelled audio oscillogram sequence is input into the preset auditory feature extraction network for forward propagation and back-propagation training, yielding the optimal network parameters of the auditory feature extraction network. The audio oscillogram sequence is then passed through the convolutional, down-sampling and fully connected layers of the trained auditory feature extraction network, which output the one-dimensional auditory feature.
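A companion sketch of the auditory feature extraction network (5 convolutional layers, 5 down-sampling layers and 2 fully connected layers) is shown below; it reuses conv_pool_stack from the previous sketch, and the single-channel 224x224 input and 256-dimensional output are again assumptions.

```python
# Hedged sketch of the auditory feature extractor; builds on conv_pool_stack above.
import torch.nn as nn

class AuditoryNet(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = conv_pool_stack(1)       # single-channel oscillogram image
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, feat_dim))

    def forward(self, oscillogram):              # (B, 1, 224, 224) waveform image
        return self.fc(self.backbone(oscillogram))
```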
4) Inputting the visual and auditory features into a long short-term memory network for further feature fusion and extraction, and acquiring the time sequence visual and auditory features, specifically comprising the following steps:
4.1) stacking and splicing the one-dimensional visual features and the auditory features to obtain visual and auditory features;
4.2) The visual and auditory features are sent into a preset long short-term memory network for feature extraction, and one-dimensional time sequence features are output; this network is configured and trained in advance. A long short-term memory network (LSTM) is a recurrent neural network suited to processing and predicting time-series events. An LSTM unit consists of a cell and three gates, namely an input gate, a forget gate and an output gate, which protect and control the cell state. The LSTM in this embodiment is used for time sequence feature extraction and may be configured according to actual requirements; it is not specifically limited here.
During LSTM training, a classification network can be formed together with the subsequent full connection layer and soft maximum classifier so that the network parameters can be trained. The specific training steps of the LSTM are as follows:
Original video and audio sequence samples with behavior labels are acquired; optical flow image sequence samples with behavior labels are obtained from the labelled original video sequences, and denoised audio oscillogram sequence samples with behavior labels are obtained from the labelled original audio sequences. The labelled video frames and optical flow sequences are input into the preset appearance-motion double-flow network to extract visual features, and the audio oscillogram sequence samples are input into the preset auditory feature extraction network to extract auditory features. The auditory and visual features are then stacked and concatenated, input into the preset long short-term memory network for forward propagation and back-propagation training, and the optimal network parameters are obtained.
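The fusion and temporal modelling step could be sketched as follows: the per-frame visual and auditory feature vectors are concatenated and the resulting sequence is passed through an LSTM. The hidden size, batch-first layout and use of the last time step as the clip-level feature are assumptions made for illustration.

```python
# Hedged sketch of audio-visual feature fusion followed by an LSTM.
import torch
import torch.nn as nn

class AVTemporalFusion(nn.Module):
    def __init__(self, visual_dim=256, audio_dim=256, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(visual_dim + audio_dim, hidden_dim, batch_first=True)

    def forward(self, visual_seq, audio_seq):
        # visual_seq, audio_seq: (batch, time, feat_dim) sequences produced by the
        # appearance-motion double-flow network and the auditory network
        fused = torch.cat([visual_seq, audio_seq], dim=2)   # stack / concatenate
        out, _ = self.lstm(fused)
        return out[:, -1, :]     # time-sequence visual-auditory feature of the clip
```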
5) Sending the time sequence visual and auditory features into a full connection layer and a soft maximum classifier for behavior classification to realize automatic identification of the lactation behavior of the sow, specifically comprising the following steps:
5.1) The time sequence visual and auditory features are input into a fully connected layer for further feature extraction and integration; the number of neurons in this layer is set to 2, giving 2 feature values that correspond to lactation behavior and non-lactation behavior respectively;
5.2) The feature values of lactation behavior and non-lactation behavior are input into a soft maximum classifier, which calculates the probability values of lactation and non-lactation corresponding to the 2 features; the behavior class with the maximum probability is taken as the behavior recognition result, thereby realizing sow lactation behavior recognition.
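A sketch of this classification head, assuming the 256-dimensional temporal feature of the previous sketch and an arbitrary class ordering, is given below.

```python
# Hedged sketch of the step-5 classification head: FC layer with 2 neurons + softmax.
import torch
import torch.nn as nn

classifier = nn.Linear(256, 2)            # 2 neurons: lactation / non-lactation

def recognise(timing_feature):
    """timing_feature: (batch, 256) output of the LSTM fusion module."""
    logits = classifier(timing_feature)
    probs = torch.softmax(logits, dim=1)  # soft maximum (softmax) probabilities
    return probs.argmax(dim=1)            # class with the maximum probability (order assumed)
```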
In conclusion, the sow lactation behavior identification method disclosed by the invention collects top-view video and audio data of sows during lactation, extracts the audio track and video frames from the raw recording, denoises and frames the audio to obtain an audio oscillogram sequence, and extracts the corresponding optical flow image sequence from the original video frames. The optical flow sequence and video frames are input into a preset appearance-motion double-flow network to extract visual features, and the audio oscillogram sequence is input into a preset auditory feature extraction network to extract auditory features; the visual and auditory features are concatenated and fused, fed into a preset long short-term memory network to extract time-sequence visual and auditory features, and finally sent into a full connection layer and a soft maximum layer for behavior classification, outputting the lactation behavior category and realizing recognition of sow lactation behavior. This work identifies sow lactation behavior by fusing video and audio multi-modal information, provides reliable lactation information for pig farm managers, guides them to make timely and effective management decisions, improves pig health and welfare, offers data support for exploring the sow's maternal behavior patterns, and is worth popularizing.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the invention is not limited thereto; therefore, changes made according to the shape and principle of the present invention shall be covered by its protection scope.

Claims (5)

1. A sow lactation behavior identification method based on audio and video information fusion is characterized by comprising the following steps:
1) collecting audio and video data of sows in a lactation period;
2) data preprocessing: firstly, separating out audio and video data, then denoising and framing the audio data, acquiring an audio oscillogram sequence, and finally extracting optical flow from the video data to acquire an optical flow image sequence;
3) inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network for feature extraction to obtain visual features, and inputting the audio oscillogram sequence into a preset auditory feature extraction network to obtain auditory features;
4) inputting the visual features and the auditory features into a long-term and short-term memory network for further feature fusion and extraction, and acquiring time sequence visual and auditory features;
5) sending the time sequence visual and auditory features into a full connection layer and a soft maximum classifier to perform behavior classification, realizing automatic identification of the lactation behavior of the sow.
2. The sow lactation behavior identification method based on audio-video information fusion as claimed in claim 1, which is characterized in that: in the step 1), a camera with a recording function is installed right above the pigsty, and overlooking video and audio data of the sows in the lactation period are collected.
3. The sow lactation behavior identification method based on audio-video information fusion as claimed in claim 1, wherein the step 2) comprises the following steps:
2.1) separating audio and video data from the shot audio and video data;
2.2) processing the original audio signal by using a band-pass filter to obtain a denoised audio signal corresponding to the original audio signal;
2.3) framing the denoised audio signal, wherein the frame length is 30ms, the frames are overlapped for 10ms, and the audio signal is converted into an audio oscillogram sequence;
2.4) acquiring an optical flow image sequence of the sow to be monitored during lactation from the corresponding original image sequence by using an optical flow method.
4. The sow lactation behavior identification method based on audio-video information fusion as claimed in claim 1, wherein the step 3) comprises the following two processes:
a. inputting the video frame and the optical flow image sequence into a preset appearance-motion double-flow network, extracting corresponding appearance-motion characteristics in the video from the video frame and the optical flow image sequence through a convolution layer, a down-sampling layer and a full-connection layer of the appearance-motion double-flow network, and outputting a one-dimensional visual characteristic vector; before inputting a video frame and an optical flow image sequence into a preset appearance-motion double-flow network, the preset appearance-motion double-flow network needs to be trained, and the method specifically comprises the following steps:
acquiring an original video frame with a lactation behavior mark and an optical flow image sequence; inputting an original video frame with a lactation behavior mark and a corresponding optical flow image sequence into an appearance-motion double-flow network for training, and acquiring optimal network parameters of the appearance-motion double-flow network;
b. inputting the audio oscillogram sequence into a preset auditory feature extraction network, and outputting a one-dimensional auditory feature vector through a convolution layer, a down-sampling layer and a full-connection layer of the auditory feature extraction network; before inputting the audio oscillogram sequence into the preset auditory feature extraction network, the preset auditory feature extraction network needs to be trained, and the method specifically comprises the following steps:
acquiring original audio data with a lactation behavior mark; denoising an original audio signal by adopting a band-pass filter; framing the denoised audio signal, wherein the frame length is 30ms, and the interframes are overlapped for 10 ms; converting the audio signal after framing into an audio oscillogram sequence to obtain an audio oscillogram sequence corresponding to the original audio signal; inputting the audio oscillogram sequence with the lactation behavior mark into a preset auditory feature extraction network for training to obtain the optimal network parameters of the auditory feature extraction network;
the step 4) comprises the following steps:
4.1) stacking and splicing the one-dimensional visual feature vector and the one-dimensional auditory feature vector to obtain visual and auditory features;
4.2) sending the visual and auditory characteristics into a preset long-short term memory network for characteristic extraction, and outputting time sequence visual and auditory characteristics;
before the visual and auditory characteristics are sent into a preset long-term and short-term memory network, the preset network needs to be trained, and the method specifically comprises the following steps:
acquiring original video and audio sequence samples with behavior marks, acquiring corresponding optical flow image sequence samples according to an original video sequence with the behavior marks, and acquiring denoised audio oscillogram sequence samples according to an original audio sequence sample with the behavior marks;
inputting an original video frame with a behavior mark and an optical flow sequence into a preset appearance-motion double-flow network to extract visual features, and inputting an audio oscillogram sequence sample into a preset auditory feature extraction network to extract auditory features;
and stacking and splicing the auditory characteristics and the visual characteristics, inputting a preset long-term and short-term memory network for training, and acquiring optimal network parameters.
5. The sow lactation behavior identification method based on audio-video information fusion as claimed in claim 1, wherein the step 5) comprises the following steps:
5.1) inputting the time sequence visual and auditory features into a fully connected layer for further feature extraction and integration to obtain 2 feature values, corresponding respectively to lactation behavior and non-lactation behavior;
5.2) inputting the feature values of lactation behavior and non-lactation behavior into a soft maximum classifier, calculating the probability values of lactation and non-lactation corresponding to the 2 features, and taking the behavior class with the maximum probability as the behavior recognition result, thereby realizing sow lactation behavior recognition.
CN202011336361.5A 2020-11-25 2020-11-25 Sow lactation behavior identification method based on audio and video information fusion Active CN112287893B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011336361.5A (granted as CN112287893B) | 2020-11-25 | 2020-11-25 | Sow lactation behavior identification method based on audio and video information fusion

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011336361.5A (granted as CN112287893B) | 2020-11-25 | 2020-11-25 | Sow lactation behavior identification method based on audio and video information fusion

Publications (2)

Publication Number | Publication Date
CN112287893A | 2021-01-29
CN112287893B | 2023-07-18

Family

ID=74425526

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011336361.5A (Active, granted as CN112287893B) | Sow lactation behavior identification method based on audio and video information fusion | 2020-11-25 | 2020-11-25

Country Status (1)

Country Link
CN (1) CN112287893B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299551A (en) * 2022-03-07 2022-04-08 深圳市海清视讯科技有限公司 Model training method, animal behavior identification method, device and equipment
CN114581749A (en) * 2022-05-09 2022-06-03 城云科技(中国)有限公司 Audio-visual feature fusion target behavior identification method and device and application

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110081082A1 (en) * 2009-10-07 2011-04-07 Wei Jiang Video concept classification using audio-visual atoms
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
CN107609460A (en) * 2017-05-24 2018-01-19 南京邮电大学 A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
US20180053057A1 (en) * 2016-08-18 2018-02-22 Xerox Corporation System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture
CN108200483A (en) * 2017-12-26 2018-06-22 中国科学院自动化研究所 Dynamically multi-modal video presentation generation method
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN109492535A (en) * 2018-10-12 2019-03-19 华南农业大学 A kind of sow Breast feeding behaviour recognition methods of computer vision
US10289912B1 (en) * 2015-04-29 2019-05-14 Google Llc Classifying videos using neural networks
CN110598658A (en) * 2019-09-18 2019-12-20 华南农业大学 Convolutional network identification method for sow lactation behaviors
US20200012862A1 (en) * 2018-07-05 2020-01-09 Adobe Inc. Multi-model Techniques to Generate Video Metadata
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network
CN111428789A (en) * 2020-03-25 2020-07-17 广东技术师范大学 Network traffic anomaly detection method based on deep learning
CN111461235A (en) * 2020-03-31 2020-07-28 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110081082A1 (en) * 2009-10-07 2011-04-07 Wei Jiang Video concept classification using audio-visual atoms
US10289912B1 (en) * 2015-04-29 2019-05-14 Google Llc Classifying videos using neural networks
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
US20180053057A1 (en) * 2016-08-18 2018-02-22 Xerox Corporation System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture
CN107609460A (en) * 2017-05-24 2018-01-19 南京邮电大学 A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
CN108200483A (en) * 2017-12-26 2018-06-22 中国科学院自动化研究所 Dynamically multi-modal video presentation generation method
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
US20200012862A1 (en) * 2018-07-05 2020-01-09 Adobe Inc. Multi-model Techniques to Generate Video Metadata
CN109492535A (en) * 2018-10-12 2019-03-19 华南农业大学 A kind of sow Breast feeding behaviour recognition methods of computer vision
CN110598658A (en) * 2019-09-18 2019-12-20 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network
CN111428789A (en) * 2020-03-25 2020-07-17 广东技术师范大学 Network traffic anomaly detection method based on deep learning
CN111461235A (en) * 2020-03-31 2020-07-28 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WU XIAOYU; GU CHAONAN; WANG SHENGJIN: "Special video classification with multimodal feature fusion and multi-task learning", Optics and Precision Engineering
SONG WEI; ZHANG DONGLIANG; QI ZHENGUO; ZHENG NAN: "A violent video detection method based on three-dimensional convolutional networks", Netinfo Security
HONG ZHAOJIN; WEI CHENYANG; ZHUANG YUAN; WANG YING; WANG ?TING; ZHAO LI: "Speech emotion recognition and personality analysis based on deep neural networks", Informatization Research
WANG KAI; LIU CHUNHONG; DUAN QINGLING: "Sow estrus behavior recognition based on MFO-LSTM", Transactions of the Chinese Society of Agricultural Engineering

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299551A (en) * 2022-03-07 2022-04-08 深圳市海清视讯科技有限公司 Model training method, animal behavior identification method, device and equipment
CN114581749A (en) * 2022-05-09 2022-06-03 城云科技(中国)有限公司 Audio-visual feature fusion target behavior identification method and device and application
CN114581749B (en) * 2022-05-09 2022-07-26 城云科技(中国)有限公司 Audio-visual feature fusion target behavior identification method and device and application
WO2023216609A1 (en) * 2022-05-09 2023-11-16 城云科技(中国)有限公司 Target behavior recognition method and apparatus based on visual-audio feature fusion, and application

Also Published As

Publication number Publication date
CN112287893B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Pegoraro et al. Automated video monitoring of insect pollinators in the field
Wu et al. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms
CN112287893B (en) Sow lactation behavior identification method based on audio and video information fusion
CN106778555B (en) Cow rumination chewing and swallowing frequency statistical method based on machine vision
CN110598658B (en) Convolutional network identification method for sow lactation behaviors
CN111294565A (en) Intelligent pig raising monitoring method and management terminal
CN112257564B (en) Aquatic product quantity statistical method, terminal equipment and storage medium
CN110942045A (en) Intelligent fish tank feeding system based on machine vision
CN103248703A (en) Automatic monitoring system and method for live pig action
CN111860203B (en) Abnormal pig identification device, system and method based on image and audio mixing
CN111968159A (en) Simple and universal fish video image track tracking method
CN115861721B (en) Livestock and poultry breeding spraying equipment state identification method based on image data
CN115937251A (en) Multi-target tracking method for shrimps
Nava et al. Learning visual localization of a quadrotor using its noise as self-supervision
CN116721370A (en) Method for capturing and identifying interesting animal behavior fragments aiming at long-time video monitoring
CN116267236A (en) Cluster fruit picking robot
CN116246223A (en) Multi-target tracking algorithm and health assessment method for dairy cows
CN112766171B (en) Spraying method, device, system and medium
CN112396033A (en) Bird background rhythm detection method and device, terminal equipment and storage medium
CN109002791A (en) A kind of system and method automatically tracking milk cow Ruminant behavior based on video
CN116978099B (en) Lightweight sheep identity recognition model construction method and recognition model based on sheep face
Dhivya Performance evaluation of image processing filters Towads strawberry leaf disease
JP7368916B1 (en) Pollen identification device using learning
CN113962336B (en) Real-time cattle face ID coding method
Cheng et al. ResNet-based dairy daily behavior recognition

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant