CN114005056A - Scenic spot feeding behavior identification method, device and system - Google Patents

Scenic spot feeding behavior identification method, device and system

Info

Publication number
CN114005056A
Authority
CN
China
Prior art keywords
video
animal
area
tourist
video clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111239654.6A
Other languages
Chinese (zh)
Inventor
王春鹏
张思国
范绪
李小龙
唐捷
杨金澄
陈智
周维
邓建均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Zhisheng Huilv Technology Co ltd
Original Assignee
Sichuan Zhisheng Huilv Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Zhisheng Huilv Technology Co ltd filed Critical Sichuan Zhisheng Huilv Technology Co ltd
Priority to CN202111239654.6A
Publication of CN114005056A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scenic spot feeding behavior identification method, device and system. The method comprises the following steps: acquiring a tourist area video and an animal area video of a scenic spot; performing video segmentation on the tourist area video to obtain a tourist area video clip set; segmenting the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set; obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set; for each video clip group, extracting temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, and identifying the tourist feeding behavior and the animal eating behavior; and fusing the features of the tourist feeding behavior and the animal eating behavior to obtain the feeding behavior result. The method improves the accuracy of feeding behavior recognition in scenic spots and is suitable for wide deployment in tourist attractions.

Description

Scenic spot feeding behavior identification method, device and system
Technical Field
The invention relates to the technical field of video processing, and in particular to a scenic spot feeding behavior identification method, device and system.
Background
With the growth of the tourism industry and the internet, tourists at attractions sometimes feed animals food they should not eat, or feed them too much, causing the animals serious harm. Existing monitoring of uncivilized feeding relies mainly on manual surveillance, which consumes substantial manpower and material resources and has low recognition accuracy. In addition, current deep-learning approaches to recognizing uncivilized feeding consider only the tourist's side and identify feeding through body movements alone; simple non-feeding movements such as bending an arm or exercising may therefore be misidentified as feeding, so recognition accuracy and efficiency remain low.
Disclosure of Invention
The technical problems the invention aims to solve are: existing monitoring of uncivilized feeding at tourist attractions relies mainly on manual surveillance, which requires substantial manpower and material resources and has low recognition accuracy; conventional deep-learning recognition of uncivilized feeding considers only the tourist's feeding behavior and identifies it through limb movements, so that simple non-feeding movements such as bending an arm or exercising may be misidentified as feeding, leaving recognition accuracy and efficiency low; as a result, animals are harmed by unsuitable food or overeating.
The invention provides a scenic spot feeding behavior identification method, device and system. It treats feeding as an interactive behavior: in scenic spots where feeding is forbidden, it focuses on jointly recognizing the tourist throwing food and the animal eating the thrown food, thereby preventing harm from unsuitable food or overeating. Recognition covers both the tourist's feeding actions, such as throwing and raising the arm, and the animal's eating actions, such as raising the head, chewing and lowering the head; a multi-stream convolutional neural network model is then trained for recognition, which improves the accuracy of feeding behavior recognition and protects animals from overeating injury.
The invention is realized by the following technical scheme:
In a first aspect, the invention provides a scenic spot feeding behavior identification method, which comprises the following steps:
S1: acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video;
S2: performing video segmentation on the tourist area video to obtain a tourist area video clip set; segmenting the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set; and obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
S3: for each video clip group, extracting temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
The working principle is as follows:
At present, monitoring of uncivilized feeding at tourist attractions relies mainly on manual surveillance, which requires substantial manpower and material resources and achieves low recognition accuracy. Conventional deep-learning methods consider only the tourist's feeding behavior and identify it through limb movements, so simple non-feeding movements such as bending an arm or exercising may be misidentified as feeding; recognition accuracy and efficiency are therefore low, and animals are harmed by unsuitable food or overeating.
The invention addresses these problems by treating feeding as an interactive behavior: in scenic spots where feeding is forbidden, it jointly recognizes the tourist throwing food and the animal eating the thrown food. Recognition covers both the tourist's throwing actions and the animal's eating actions such as raising the head, chewing and lowering the head, and a multi-stream convolutional neural network model performs feature extraction and classifier training, which improves recognition accuracy and protects animals. Specifically, dedicated cameras capture the tourists' throwing and arm-raising actions, and separate dedicated cameras capture the animals' eating actions, ensuring accurate capture of each kind of subject. The surveillance video therefore comprises a tourist area video and an animal area video; the two videos are related and must be segmented jointly. Because the animal video mostly shows the same animals and its frames differ little, it is a poor basis for choosing segmentation intervals; the tourist area video is therefore segmented first, and the animal area video is then segmented according to the time intervals of the tourist video segmentation. The resulting video clip groups pair the behaviors of tourists and animals over matching intervals. A multi-stream convolutional neural network model then extracts temporal and spatial features from the tourist area clip and the animal area clip of each group, identifies the tourist feeding behavior and the animal eating behavior, and fuses the two to obtain the feeding behavior result.
The method recognizes feeding behavior in scenic spots with high accuracy and efficiency, and is suitable for wide deployment in tourist attractions.
Further, the tourist area video is shot of a fixed tourist area, and the animal area video is shot of a fixed animal area.
Further, a reporting device is installed in the tourist area; tourists press it when they witness feeding, producing a reporting signal. Video can also be processed in the absence of a reporting signal. The video segmentation of the tourist area video into a tourist area video clip set proceeds as follows:
when there is no reporting signal, sample every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, compute the video frame corresponding to the reporting time and sample every N frames both forward and backward from that frame to form sampled images I1, I2, …, In. Because the tourist video is shot of a fixed area, the background in the image is fixed and the background image of the tourist area is known; this fixed background is an interference factor for segmentation and must be removed. For each sampled image, compute its pixel difference from the tourist area background image, giving D1, D2, …, Dn; from these, compute the differences between consecutive sampled images DD1, DD2, …, DDn, where DDi = |Di − D(i+1)| (for example, DD1 = |D1 − D2| for sampled images I1 and I2). When a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point. Computing the segmentation points over the whole tourist video stream in this way yields the tourist area video clip set P = {P1, P2, …, Pn}, where each clip Pi = <Sn, Sm> spans consecutive segmentation points.
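For concreteness, a minimal Python sketch of this segmentation rule follows, assuming grayscale frames read with OpenCV; the stride N, the threshold T, and the file paths are illustrative parameters, not values fixed by the invention.

```python
import cv2
import numpy as np

def segment_guest_video(video_path, background_path, N=25, T=1.5e6):
    """Sketch of the tourist-area segmentation rule described above.

    Samples every N frames, scores each sample Si by its pixel difference
    Di from the fixed background image, and marks a segmentation point
    wherever DDi = |Di - D(i+1)| exceeds the threshold T. N and T are
    illustrative values; frames and the background image are assumed to
    share one resolution.
    """
    background = cv2.imread(background_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    cap = cv2.VideoCapture(video_path)

    diffs, frame_ids = [], []              # Di values and their frame indices
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % N == 0:                   # sample every N frames from frame 0
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            diffs.append(float(np.abs(gray - background).sum()))  # Di
            frame_ids.append(idx)
        idx += 1
    cap.release()

    # DDi = |Di - D(i+1)|; a segmentation point wherever DDi > T
    cuts = [frame_ids[i + 1] for i in range(len(diffs) - 1)
            if abs(diffs[i] - diffs[i + 1]) > T]

    # Clips Pi = <start, end> between consecutive segmentation points
    bounds = [frame_ids[0]] + cuts + [frame_ids[-1]]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```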
Further, the animal area video is segmented according to the time intervals corresponding to the tourist area video clip set to obtain the animal area video clip set, as follows:
the time interval used for an animal area clip is the time period of the corresponding tourist area clip plus a preset offset;
from the time points of the tourist video segmentation points S1, S2, …, Sn, look up the corresponding frames W1, W2, …, Wn in the animal area video. Since the animal's eating follows the person's feeding, the two actions are separated by a time lag and are not fully synchronous; a preset offset of M frames is therefore added to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video. The nth animal area clip is thus An = <Wn, min(W(n+1) + M, Wmax)>; for example, the first clip is A1 = <W1, min(W2 + M, Wmax)>.
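A short sketch of this clip-boundary rule, assuming the frames W1, …, Wn have already been looked up from the tourist-video cut times; the numbers in the usage line are arbitrary examples.

```python
def animal_clips(W, M, Wmax):
    """Derive animal-area clips from the tourist-video cut points.

    W    : frames W1..Wn in the animal video matching the tourist cut times
    M    : preset offset in frames, covering the lag between the person's
           throwing action and the animal's eating action
    Wmax : last frame number of the animal video

    The nth clip is An = <Wn, min(W(n+1) + M, Wmax)>, as in the text above.
    """
    return [(W[i], min(W[i + 1] + M, Wmax)) for i in range(len(W) - 1)]

# Arbitrary example: three cut points, a 50-frame offset, a 1200-frame video.
print(animal_clips([0, 480, 900], M=50, Wmax=1200))  # [(0, 530), (480, 950)]
```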
Further, a set of video clip groups is obtained from the tourist area video clip set and the animal area video clip set, each group written <Pn, An>, where Pn is a tourist area video clip and An is an animal area video clip.
Further, step S3 comprises the following sub-steps, applied per video clip group:
for the tourist area video clip and the animal area video clip separately, take the RGB images in the clip as the input of a spatial-stream convolutional network;
compute optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extract the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extract the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fuse the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
fuse the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
The spatial-stream and temporal-stream convolutional networks are obtained by iterative training on historical data.
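By way of illustration only, the following PyTorch sketch shows one possible shape of such a two-stream branch per region; the layer sizes, the concatenation fusion, and the number of stacked flow frames are assumptions, since the invention fixes only the overall multi-stream structure.

```python
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """One convolutional stream: 3 input channels for RGB (spatial stream),
    2 x stack channels for stacked optical flow (temporal stream)."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 64)
        )

    def forward(self, x):
        return self.features(x)

class RegionTwoStream(nn.Module):
    """Spatial + temporal streams for one region, fused by concatenation
    into that region's behavior classifier."""
    def __init__(self, num_classes=2, flow_stack=10):
        super().__init__()
        self.spatial = StreamCNN(3)                  # RGB frame
        self.temporal = StreamCNN(2 * flow_stack)    # stacked flow (x, y per pair)
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb, flow):
        fused = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return self.classifier(fused)

# One branch per region; their classification results are fused afterwards.
guest_net = RegionTwoStream(num_classes=2)    # feeding / not feeding
animal_net = RegionTwoStream(num_classes=2)   # eating thrown food / not
```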
In a second aspect, the invention further provides a scenic spot feeding behavior recognition apparatus supporting the above method, the apparatus comprising:
an acquisition unit for acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video;
a video segmentation unit comprising a tourist area video segmentation subunit and an animal area video segmentation subunit; the tourist area video segmentation subunit performs video segmentation on the tourist area video to obtain a tourist area video clip set; the animal area video segmentation subunit segments the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set;
a video clip group unit for obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
a feeding behavior identification unit for extracting, per video clip group, temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
Further, the tourist area video segmentation subunit operates as follows:
when there is no reporting signal, sample every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, compute the video frame corresponding to the reporting time and sample every N frames both forward and backward from that frame to form sampled images I1, I2, …, In; compute each sampled image's pixel difference from the tourist area background image, giving D1, D2, …, Dn; from these, compute the differences between consecutive samples DD1, DD2, …, DDn, where DDi = |Di − D(i+1)|; when a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point. Computing the segmentation points over the whole tourist video stream yields the tourist area video clip set P = {P1, P2, …, Pn}.
Further, the animal area video segmentation subunit operates as follows:
from the time points of the tourist video segmentation points S1, S2, …, Sn, look up the corresponding frames W1, W2, …, Wn in the animal area video; a preset offset of M frames is added to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video. The nth animal area clip is An = <Wn, min(W(n+1) + M, Wmax)>; for example, the first clip is A1 = <W1, min(W2 + M, Wmax)>.
Further, the feeding behavior recognition unit performs the following steps:
for the tourist area video clip and the animal area video clip separately, take the RGB images in the clip as the input of a spatial-stream convolutional network;
compute optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extract the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extract the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fuse the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
fuse the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
In a third aspect, the invention further provides a scenic spot feeding behavior recognition system comprising a video signal acquisition module, a reporting signal acquisition module, a feeding behavior recognition module, an alarm module, a face recognition module, and a penalty ticket and evidence storage module; the feeding behavior recognition module executes the scenic spot feeding behavior identification method described above;
the video signal acquisition module acquires video stream signals from different angles;
the reporting signal acquisition module acquires tourists' reporting signals about feeding behavior;
the feeding behavior recognition module recognizes feeding behavior from the video stream signals and the reporting signals;
the alarm module emits an alarm signal upon a recognized feeding behavior and broadcasts a warning announcement throughout the scenic area;
the face recognition module identifies the feeder's identity information;
the penalty ticket and evidence storage module issues a penalty ticket to an offender who triggers multiple warnings, and stores the offender's information, the penalty record and screenshots of the relevant evidence video frames.
When feeding occurs, the alarm module raises an alarm and the offender is identified by face recognition, penalized, and the evidence preserved. A tourist reporting device is also provided: when other tourists witness uncivilized feeding, they can press the reporting device and the system will recognize the uncivilized feeding behavior.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention treats feeding as an interactive behavior and, for scenic spots where feeding is forbidden, focuses on jointly recognizing the tourist throwing food and the animal eating the thrown food, thereby preventing harm from unsuitable food or overeating. Recognition covers both the tourist's throwing actions and the animal's eating actions such as raising the head, chewing and lowering the head; a multi-stream convolutional neural network model then performs feature extraction and classifier training, improving the accuracy of feeding behavior recognition.
2. Dedicated cameras capture the tourists' throwing actions and separate dedicated cameras capture the animals' eating actions, ensuring accurate capture of each kind of subject. The surveillance video thus comprises a tourist area video and an animal area video, which are related and must be segmented jointly. Because the animal video mostly shows the same animals and its frames differ little, it is a poor basis for choosing segmentation intervals; the tourist area video is therefore segmented first, and the animal area video is segmented according to the time intervals of the tourist video segmentation.
3. The animal's eating follows the person's feeding, so the two actions are separated by a time lag and are not fully synchronous; the corresponding time intervals of the tourist video and animal video are therefore aligned by adding a preset offset of M frames to the end frame of each animal area clip.
4. The method recognizes feeding behavior in scenic spots with high accuracy and efficiency, and is suitable for wide deployment in tourist attractions.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a flow chart of a scenic spot feeding behavior identification method of the present invention.
FIG. 2 is a block diagram of a multi-stream convolutional neural network according to the present invention.
Fig. 3 is a structural diagram of a scenic spot feeding behavior recognition device of the present invention.
Fig. 4 is a frame diagram of a scenic spot feeding behavior recognition system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1
As shown in fig. 1, the scenic spot feeding behavior identification method of the invention comprises the following steps:
S1: acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video; the tourist area video is shot of a fixed tourist area, and the animal area video of a fixed animal area.
S2: performing video segmentation on the tourist area video to obtain a tourist area video clip set; segmenting the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set; and obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
S3: for each video clip group, extracting temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
In this embodiment, a reporting device is installed in the tourist area; tourists press it when they witness feeding, producing a reporting signal. Video can also be processed in the absence of a reporting signal. The video segmentation of the tourist area video into a tourist area video clip set proceeds as follows:
when there is no reporting signal, sample every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, compute the video frame corresponding to the reporting time and sample every N frames both forward and backward from that frame to form sampled images I1, I2, …, In. Because the tourist video is shot of a fixed area, the background in the image is fixed and the background image of the tourist area is known; this fixed background is an interference factor for segmentation and must be removed. For each sampled image, compute its pixel difference from the tourist area background image, giving D1, D2, …, Dn; from these, compute the differences between consecutive sampled images DD1, DD2, …, DDn, where DDi = |Di − D(i+1)| (for example, DD1 = |D1 − D2| for sampled images I1 and I2). When a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point. Computing the segmentation points over the whole tourist video stream in this way yields the tourist area video clip set P = {P1, P2, …, Pn}, where each clip Pi = <Sn, Sm> spans consecutive segmentation points.
In this embodiment, the animal area video is segmented according to the time intervals corresponding to the tourist area video clip set to obtain the animal area video clip set, as follows:
the time interval used for an animal area clip is the time period of the corresponding tourist area clip plus a preset offset;
from the time points of the tourist video segmentation points S1, S2, …, Sn, look up the corresponding frames W1, W2, …, Wn in the animal area video. Since the animal's eating follows the person's feeding, the two actions are separated by a time lag and are not fully synchronous; a preset offset of M frames is therefore added to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video. The nth animal area clip is thus An = <Wn, min(W(n+1) + M, Wmax)>; for example, the first clip is A1 = <W1, min(W2 + M, Wmax)>.
The two videos (tourist area video and animal area video) are divided in the above manner into video clip groups <Pn, An>, where Pn is a tourist area video clip and An is an animal area video clip.
In this embodiment, the multi-stream convolutional neural network model performs feature fusion and recognition; feeding behavior recognition considers the person's feeding movements and the animal's eating movements simultaneously, which improves recognition accuracy. The structure of the multi-stream convolutional neural network is shown in fig. 2: each small box in the animal spatial feature extraction network, animal temporal feature extraction network, tourist spatial feature extraction network and tourist temporal feature extraction network represents a convolutional layer, and each extraction network is implemented with several convolutional layers. Step S3 comprises the following sub-steps:
for the tourist area video clip and the animal area video clip separately, take the RGB images in the clip as the input of a spatial-stream convolutional network;
compute optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extract the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extract the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fuse the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
fuse the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
The spatial-stream and temporal-stream convolutional networks are obtained by iterative training on historical data.
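The description does not name a particular optical-flow algorithm; as an illustration, the temporal-stream input could be built from dense Farneback flow as in the following sketch (the stack length of 10 is an assumed parameter).

```python
import cv2
import numpy as np

def stacked_flow(frames, stack=10):
    """Build the temporal-stream input by stacking dense optical flow from
    consecutive frame pairs. The patent does not fix a flow algorithm;
    Farneback flow and the stack length are illustrative choices."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames[:stack + 1]]
    flows = []
    for prev, nxt in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)                 # (H, W, 2): x and y displacement
    # Channels-first stack of 2 * stack flow channels for the temporal stream.
    return np.concatenate([f.transpose(2, 0, 1) for f in flows], axis=0)
```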
The multi-stream convolutional neural network model uses the following loss function:
Loss = X·Loss1 + Y·Loss2
where Loss1 and Loss2 are the loss functions of the tourist behavior classifier and the animal behavior classifier respectively, and X and Y are their respective weights.
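A minimal sketch of this weighted joint loss, assuming cross-entropy classifiers and treating X and Y as hyperparameters (the invention fixes the weighted form but not their values):

```python
import torch.nn.functional as F

def joint_loss(guest_logits, guest_labels, animal_logits, animal_labels,
               X=1.0, Y=1.0):
    """Loss = X * Loss1 + Y * Loss2; cross-entropy is assumed for both
    classifiers, and the weights X, Y are left as hyperparameters."""
    loss1 = F.cross_entropy(guest_logits, guest_labels)    # tourist classifier
    loss2 = F.cross_entropy(animal_logits, animal_labels)  # animal classifier
    return X * loss1 + Y * loss2
```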
Example 2
As shown in fig. 3, this embodiment differs from embodiment 1 in that it provides a scenic spot feeding behavior recognition apparatus supporting the scenic spot feeding behavior identification method of embodiment 1, the apparatus comprising:
an acquisition unit for acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video;
a video segmentation unit comprising a tourist area video segmentation subunit and an animal area video segmentation subunit; the tourist area video segmentation subunit performs video segmentation on the tourist area video to obtain a tourist area video clip set; the animal area video segmentation subunit segments the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set;
a video clip group unit for obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
a feeding behavior identification unit for extracting, per video clip group, temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
Specifically, the tourist area video segmentation subunit operates as follows:
when there is no reporting signal, sample every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, compute the video frame corresponding to the reporting time and sample every N frames both forward and backward from that frame to form sampled images I1, I2, …, In; compute each sampled image's pixel difference from the tourist area background image, giving D1, D2, …, Dn; from these, compute the differences between consecutive samples DD1, DD2, …, DDn, where DDi = |Di − D(i+1)|; when a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point. Computing the segmentation points over the whole tourist video stream yields the tourist area video clip set P = {P1, P2, …, Pn}.
Specifically, the animal area video segmentation subunit operates as follows:
from the time points of the tourist video segmentation points S1, S2, …, Sn, look up the corresponding frames W1, W2, …, Wn in the animal area video; a preset offset of M frames is added to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video. The nth animal area clip is An = <Wn, min(W(n+1) + M, Wmax)>; for example, the first clip is A1 = <W1, min(W2 + M, Wmax)>.
Specifically, in the video clip group unit, the two videos (tourist area video and animal area video) are divided in the above manner into video clip groups <Pn, An>, where Pn is a tourist area video clip and An is an animal area video clip.
Specifically, the feeding behavior recognition unit performs the following steps:
for the tourist area video clip and the animal area video clip separately, take the RGB images in the clip as the input of a spatial-stream convolutional network;
compute optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extract the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extract the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fuse the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
fuse the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
Example 3
As shown in fig. 4, this embodiment differs from embodiment 1 in that it provides a scenic spot feeding behavior recognition system comprising a video signal acquisition module, a reporting signal acquisition module, a feeding behavior recognition module, an alarm module, a face recognition module, and a penalty ticket and evidence storage module; the feeding behavior recognition module executes the scenic spot feeding behavior identification method described above;
the video signal acquisition module acquires video stream signals from different angles;
the reporting signal acquisition module acquires tourists' reporting signals about feeding behavior;
the feeding behavior recognition module recognizes feeding behavior from the video stream signals and the reporting signals;
the alarm module emits an alarm signal upon a recognized feeding behavior and broadcasts a warning announcement throughout the scenic area;
the face recognition module identifies the feeder's identity information;
the penalty ticket and evidence storage module issues a penalty ticket to an offender who triggers multiple warnings, and stores the offender's information, the penalty record and screenshots of the relevant evidence video frames.
When feeding occurs, the alarm module raises an alarm and the offender is identified by face recognition, penalized, and the evidence preserved. A tourist reporting button is also provided: when other tourists witness uncivilized feeding, they can press the button and the system will recognize the uncivilized feeding behavior.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A scenic spot feeding behavior identification method, characterized by comprising the following steps:
S1: acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video;
S2: performing video segmentation on the tourist area video to obtain a tourist area video clip set; segmenting the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set; and obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
S3: for each video clip group, extracting temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
2. The scenic spot feeding behavior identification method according to claim 1, wherein the tourist area video is shot of a fixed tourist area, and the animal area video is shot of a fixed animal area.
3. The scenic spot feeding behavior identification method according to claim 1, wherein performing video segmentation on the tourist area video to obtain the tourist area video clip set specifically comprises:
when there is no reporting signal, sampling every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, computing the video frame corresponding to the reporting time and sampling every N frames both forward and backward from that frame to form sampled images I1, I2, …, In; computing each sampled image's pixel difference from the tourist area background image, giving D1, D2, …, Dn; computing from these the differences between consecutive samples DD1, DD2, …, DDn, where DDi = |Di − D(i+1)|; when a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point; computing the segmentation points over the whole tourist video stream in this way yields the tourist area video clip set P = {P1, P2, …, Pn}.
4. The scenic spot feeding behavior identification method according to claim 3, wherein segmenting the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain the animal area video clip set specifically comprises:
taking as the time interval for an animal area clip the time period of the corresponding tourist area clip plus a preset offset;
from the time points of the tourist video segmentation points S1, S2, …, Sn, looking up the corresponding frames W1, W2, …, Wn in the animal area video; adding a preset offset of M frames to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video; the nth animal area clip is thus An = <Wn, min(W(n+1) + M, Wmax)>.
5. The scenic spot feeding behavior identification method according to claim 1, wherein step S3 comprises the following sub-steps:
for the tourist area video clip and the animal area video clip separately, taking the RGB images in the clip as the input of a spatial-stream convolutional network;
computing optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extracting the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extracting the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fusing the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
and fusing the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
6. A scenic spot feeding behavior recognition apparatus, characterized in that it supports the scenic spot feeding behavior identification method according to any one of claims 1 to 5, the apparatus comprising:
an acquisition unit for acquiring surveillance video of the scenic spot, the surveillance video comprising a tourist area video and an animal area video;
a video segmentation unit comprising a tourist area video segmentation subunit and an animal area video segmentation subunit; the tourist area video segmentation subunit performs video segmentation on the tourist area video to obtain a tourist area video clip set; the animal area video segmentation subunit segments the animal area video according to the time intervals corresponding to the tourist area video clip set to obtain an animal area video clip set;
a video clip group unit for obtaining a set of video clip groups from the tourist area video clip set and the animal area video clip set, the set comprising a plurality of video clip groups;
a feeding behavior identification unit for extracting, per video clip group, temporal and spatial features from the tourist area video clip and the animal area video clip with a multi-stream convolutional neural network model, identifying the tourist feeding behavior and the animal eating behavior, and fusing the two to obtain the feeding behavior result.
7. The scenic spot feeding behavior recognition apparatus according to claim 6, wherein the tourist area video segmentation subunit operates as follows:
when there is no reporting signal, sampling every N frames starting from frame 0 to form sampled images S1, S2, …, Sn;
when there is a reporting signal, computing the video frame corresponding to the reporting time and sampling every N frames both forward and backward from that frame to form sampled images I1, I2, …, In; computing each sampled image's pixel difference from the tourist area background image, giving D1, D2, …, Dn; computing from these the differences between consecutive samples DD1, DD2, …, DDn, where DDi = |Di − D(i+1)|; when a difference DDi exceeds a set threshold T, the corresponding sampled frame is a segmentation point; computing the segmentation points over the whole tourist video stream in this way yields the tourist area video clip set P = {P1, P2, …, Pn}.
8. The scenic spot feeding behavior recognition apparatus according to claim 7, wherein the animal area video segmentation subunit operates as follows:
from the time points of the tourist video segmentation points S1, S2, …, Sn, looking up the corresponding frames W1, W2, …, Wn in the animal area video; adding a preset offset of M frames to the end frame of each animal area clip, the end frame being min(Wn + M, Wmax), where Wmax is the last frame number of the animal video; the nth animal area clip is thus An = <Wn, min(W(n+1) + M, Wmax)>.
9. The scenic spot feeding behavior recognition apparatus according to claim 6, wherein the feeding behavior recognition unit performs the following steps:
for the tourist area video clip and the animal area video clip separately, taking the RGB images in the clip as the input of a spatial-stream convolutional network;
computing optical flow images from adjacent frame pairs in the video as the input of a temporal-stream convolutional network;
extracting the spatial features of the RGB images of the tourist area video and the animal area video with the spatial-stream convolutional networks;
extracting the temporal features of the multi-frame stacked optical flow images of the tourist area and animal area video sequences with the temporal-stream convolutional networks;
fusing the spatial and temporal features within each region's video, yielding a tourist behavior classification result and an animal behavior classification result;
and fusing the tourist behavior classification result and the animal behavior classification result to obtain the final feeding behavior result.
10. A scenic spot feeding behavior recognition system, characterized by comprising a video signal acquisition module, a reporting signal acquisition module, a feeding behavior recognition module, an alarm module, a face recognition module, and a penalty ticket and evidence storage module; the feeding behavior recognition module executes a scenic spot feeding behavior identification method according to any one of claims 1 to 5;
the video signal acquisition module acquires video stream signals from different angles;
the reporting signal acquisition module acquires tourists' reporting signals about feeding behavior;
the feeding behavior recognition module recognizes feeding behavior from the video stream signals and the reporting signals;
the alarm module emits an alarm signal upon a recognized feeding behavior and broadcasts a warning announcement throughout the scenic area;
the face recognition module identifies the feeder's identity information;
the penalty ticket and evidence storage module issues a penalty ticket to an offender who triggers multiple warnings, and stores the offender's information, the penalty record and screenshots of the relevant evidence video frames.
CN202111239654.6A 2021-10-25 2021-10-25 Scenic spot feeding behavior identification method, device and system Pending CN114005056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111239654.6A CN114005056A (en) 2021-10-25 2021-10-25 Scenic spot feeding behavior identification method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111239654.6A CN114005056A (en) 2021-10-25 2021-10-25 Scenic spot feeding behavior identification method, device and system

Publications (1)

Publication Number Publication Date
CN114005056A (en) 2022-02-01

Family

ID=79923716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111239654.6A Pending CN114005056A (en) 2021-10-25 2021-10-25 Scenic spot feeding behavior identification method, device and system

Country Status (1)

Country Link
CN (1) CN114005056A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117156203B (en) * 2023-09-19 2024-05-07 广西艺术学院 Automatic video display method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination