WO2020151084A1 - Method, Device and System for Monitoring a Target Object - Google Patents

Method, Device and System for Monitoring a Target Object

Info

Publication number
WO2020151084A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
video
image
server
video frame
Prior art date
Application number
PCT/CN2019/080747
Other languages
English (en)
French (fr)
Inventor
臧云波
支建壮
鲁邹尧
吴明辉
Original Assignee
北京明略软件系统有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京明略软件系统有限公司
Priority to JP2019570566A (patent JP7018462B2)
Publication of WO2020151084A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Definitions

  • This application relates to the computer field, and in particular to a method, device and system for monitoring a target object.
  • the current method of monitoring the target object is usually to identify the target object in the captured video, but this method is often inefficient.
  • the embodiments of the present application provide a method, device, and system for monitoring a target object, so as to at least solve the problem of low efficiency in monitoring the target object in related technologies.
  • a method for monitoring a target object, including: a first server receives an image sent by a video surveillance device when a moving object is detected in the target area, where the image is obtained from the target video in which the object appears, within the video captured by the video surveillance device shooting the target area; the first server determines whether the object is the target object according to the image.
  • the method further includes: in a case where the object is determined to be the target object, the first server obtains the target video.
  • the first server acquiring the target video includes: the first server acquiring the target video from the video surveillance device; or, the first server acquiring the target video from a second server, where the target video is sent to the second server by the video surveillance device when a moving object is detected in the target area.
  • the method further includes: in a case where it is determined that the object is not the target object, the first server sends indication information to the second server, where the indication information is used to instruct the second server to delete the target video.
  • the method further includes: the first server determines in the target video a movement track of the target object in the target area.
  • the method further includes: the first server generates prompt information according to the movement track, where the prompt information is used to prompt a way to eliminate the target object.
  • the method further includes: the first server generates alarm information corresponding to the target object, where the alarm information is used to indicate that the target object appears in the target area, and the alarm information includes at least one of the following: the target video, the movement track, and the prompt information; the first server sends the alarm information to the client.
  • the method further includes: when the video surveillance device detects that a moving object appears in the target area, a video image is intercepted at predetermined intervals from the video obtained by the video surveillance device shooting the target area, starting from the moment the object appears in the target area until the object no longer appears in the target area, where the image includes the video image; the video surveillance device sends each intercepted video image to the first server in real time; or, the video surveillance device acquires an image set including all the intercepted video images and sends the image set to the first server.
  • the first server determining whether the object is the target object according to the image includes: the first server recognizes whether the object in each received video image is the target object, obtaining a recognition result corresponding to each video image; the first server merges the recognition results corresponding to all the received video images into a target result; the first server determines whether the object is the target object according to the target result.
  • the first server recognizing whether the object in each received video image is the target object includes: the first server determines whether the object appears in each received video image; the first server recognizes whether the object in each video image in which the object appears is the target object.
  • the first server determining whether the object is a target object according to the image includes:
  • the first server detects the target object in each target video frame image to obtain the image feature of each target video frame image, where the image includes multiple target video frame images obtained from the target video, each target video frame image is used to indicate the object in the target area, and the image feature is used to indicate the target image area in which an object whose similarity to the target object is greater than a first threshold is located;
  • the first server determines the motion feature according to the image feature of each of the target video frame images, where the motion feature is used to indicate the motion speed and the motion direction of the object in the multiple target video frame images;
  • the first server determines whether the target object appears in the multiple target video frame images according to the motion feature and the image feature of each target video frame image.
  • the first server determining the motion feature according to the image feature of each target video frame image includes:
  • determining, for each target video frame image, the moving speed and moving direction of the object when passing through the target image area to form a target vector, and composing the multiple target vectors into a first target vector according to the time sequence of each target video frame image in the video file, where the motion feature includes the first target vector; or
  • generating a two-dimensional optical flow diagram for each target video frame image, where each two-dimensional optical flow diagram includes the moving speed and moving direction of the object in the corresponding target video frame image when passing through the target image area, and composing the multiple two-dimensional optical flow diagrams into a three-dimensional second target vector according to the time sequence of each target video frame image in the video file, where the motion feature includes the three-dimensional second target vector.
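As an illustrative sketch of the first variant, assuming the target image area in each frame is given as a bounding box, the per-frame moving speed and direction can be computed from box centers and concatenated in time order into the first target vector (the function and parameter names are assumptions, not the patent's):

```python
import math

def motion_feature(boxes):
    """Build the first target vector from per-frame target image areas.

    boxes: list of (x, y, w, h) bounding boxes of the candidate region,
    ordered by the frames' time sequence in the video file.
    Returns a flat vector of (speed, direction) pairs between frames.
    """
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    feature = []
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy)      # displacement per frame interval
        direction = math.atan2(dy, dx)  # motion direction in radians
        feature.extend([speed, direction])
    return feature
```

For N frames this yields a vector of 2·(N-1) components, preserving the time sequence of the frames as required above.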
  • the first server determining whether the target object appears in the multiple target video frame images according to the motion characteristic and the image characteristic of each target video frame image includes:
  • the motion feature and the image feature of each target video frame image are input into a pre-trained neural network model to obtain an object recognition result, where the object recognition result is used to represent whether the target object appears in the multiple target video frame images.
  • inputting the motion feature and the image feature of each target video frame image into a pre-trained neural network model to obtain an object recognition result includes:
  • passing each image feature through a neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; fusing the multiple first feature vectors with the motion feature to obtain a second feature vector; inputting the second feature vector into a fully connected layer for classification to obtain a first classification result, where the neural network model includes the neural network layer structure and the fully connected layer, the object recognition result includes the first classification result, and the first classification result is used to indicate whether the target object appears in the multiple target video frame images; or
  • passing each image feature through a first neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; passing the motion feature through a convolution layer, a regularization layer, and an activation function layer to obtain a motion feature vector; fusing the multiple first feature vectors with the motion feature vector and classifying the fused vector through a fully connected layer to obtain a second classification result, where the object recognition result includes the second classification result, and the second classification result is used to indicate whether the target object appears in the multiple target video frame images.
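A toy sketch of the first branch, with simple NumPy stand-ins for the convolution, regularization, and activation layers and an untrained linear classifier in place of the fully connected layer (all shapes, kernels, and thresholds here are illustrative assumptions, not the patent's trained model):

```python
import numpy as np

def conv_bn_relu(x, kernel):
    """Toy stand-in for the conv + regularization + activation layer stack."""
    y = np.convolve(x, kernel, mode="same")   # 1-D convolution layer
    y = (y - y.mean()) / (y.std() + 1e-6)     # normalization as regularization
    return np.maximum(y, 0.0)                 # ReLU activation

def classify(image_features, motion_feature, w, b):
    """Fuse per-frame first feature vectors with the motion feature, then classify."""
    kernel = np.array([0.25, 0.5, 0.25])
    first_vectors = [conv_bn_relu(f, kernel) for f in image_features]
    second_vector = np.concatenate(first_vectors + [motion_feature])  # fusion
    logit = second_vector @ w + b             # fully connected layer
    return logit > 0.0                        # True: target object appears
```

A real implementation would use a deep learning framework with trained 2-D convolution weights; the structure (per-frame feature extraction, fusion with the motion feature, fully connected classification) is what the sketch mirrors.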
  • the receiving, by the first server, of the image sent by the video surveillance device when a moving object is detected in the target area includes:
  • the first server receives the multiple target video frame images sent by the video surveillance device, where the multiple target video frame images are obtained by the video surveillance device sampling the target video to obtain a set of video frame images, and are determined in the set of video frame images according to the pixel values of the pixels in the set of video frame images; or,
  • the first server receives a set of video frame images sent by the video surveillance device, where the set of video frame images is obtained by the video surveillance device sampling the target video; the first server determines the multiple target video frame images in the set of video frame images according to the pixel values of the pixels in the set of video frame images.
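The sampling-and-selection step can be sketched as follows, assuming frames are NumPy arrays and that a mean absolute pixel difference between consecutive sampled frames serves as the pixel-value criterion (the patent does not fix the exact criterion; the threshold is an assumption):

```python
import numpy as np

def sample_frames(video, step):
    """Sample the target video every `step` frames to get a set of video frame images."""
    return video[::step]

def select_target_frames(frames, threshold=10.0):
    """Keep frames whose pixel values changed enough from the previous sampled
    frame, as a simple proxy for 'a moving object appears in this frame'."""
    targets = []
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(float) - prev.astype(float)).mean()
        if diff > threshold:
            targets.append(cur)
    return targets
```

Either side of the link may run the selection: the device sends only the selected target frames (first alternative) or the whole sampled set (second alternative).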
  • the first server includes: a first cloud server.
  • the second server includes: a second cloud server.
  • a method for monitoring a target object includes: when a video monitoring device detects that a moving object appears in the target area, acquiring an image from the target video in which the object appears, within the video obtained by the video monitoring device shooting the target area; the video monitoring device sends the image to the first server, where the image is used to instruct the first server to determine whether the object is the target object according to the image.
  • the method further includes: the video monitoring device sends the target video to a second server, where the second server is configured to, in a case where a first request sent by the first server is received, send the target video to the first server in response to the first request.
  • the method further includes: the video surveillance device receives a second request sent by the first server; the video surveillance device sends the target video to the first server in response to the second request.
  • acquiring an image from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area, includes: when the video surveillance device detects that a moving object appears in the target area, a video image is intercepted at predetermined intervals from the video obtained by the video surveillance device shooting the target area, starting from the moment the object appears in the target area until the object no longer appears in the target area, where the image includes the video image; sending the image to the first server by the video monitoring device includes: the video monitoring device sends each intercepted video image to the first server in real time; or, the video monitoring device obtains an image set including all the intercepted video images and sends the image set to the first server.
  • the method further includes: the video monitoring device acquires, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears in the target area; the video monitoring device acquires a second video of a first target time period before the object appears in the target area and a third video of a second target time period after the object no longer appears in the target area; the video monitoring device determines the second video, the first video, and the third video as the target video.
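The target video assembly described above (pre-roll second video, first video with the object present, post-roll third video) reduces to a simple time-window computation. This sketch assumes timestamps in seconds and a recording that starts at time 0; the names are illustrative:

```python
def target_video_window(t_appear, t_disappear, pre_seconds, post_seconds):
    """Return the (start, end) span of the target video: the second video
    (pre-roll), the first video (object present), and the third video
    (post-roll), clamped to the start of the recording."""
    start = max(0.0, t_appear - pre_seconds)
    end = t_disappear + post_seconds
    return start, end
```

The clip over this span is then cut from the stored recording and sent or retained as the target video.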
  • a monitoring system for a target object, including: a video monitoring device and a first server, where the video monitoring device is connected to the first server; the video monitoring device is configured to, when a moving object is detected in the target area, obtain an image from the target video in which the object appears, within the video obtained by shooting the target area, and send the image to the first server; the first server is configured to determine whether the object is the target object according to the image.
  • the video surveillance device is configured to: in the case of detecting that a moving object appears in the target area, intercept a video image at predetermined intervals from the video obtained by shooting the target area, starting from the moment the object appears in the target area until the object no longer appears in the target area, where the image includes the video image; send each intercepted video image to the first server in real time; or, acquire an image set including all the intercepted video images and send the image set to the first server.
  • the first server is configured to: recognize whether the object in each received video image is the target object, obtaining a recognition result corresponding to each video image; merge the recognition results corresponding to all the received video images into a target result; and determine whether the object is the target object according to the target result.
  • the first server is further configured to: in a case where it is determined that the object is the target object, obtain the target video; determine in the target video the movement trajectory of the target object in the target area; generate prompt information according to the movement trajectory, where the prompt information is used to prompt a way to eliminate the target object; and generate alarm information corresponding to the target object, where the alarm information is used to indicate that the target object appears in the target area, and the alarm information includes at least one of the following: the target video, the movement trajectory, and the prompt information.
  • the system further includes: a client, where the first server is connected to the client; the first server is configured to send the alarm information to the client; the client is configured to display the alarm information on a display interface.
  • the system further includes: a second server, where the second server is connected to the video monitoring device and the first server; the video monitoring device is further configured to send the target video to the second server; the second server is configured to store the target video; the first server is configured to obtain the target video from the second server.
  • the first server is further configured to send indication information to the second server in a case where it is determined that the object is not the target object; the second server is configured to delete the target video in response to the indication information.
  • the video monitoring device is further configured to: acquire, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears in the target area; acquire a second video in a first target time period before the object appears in the target area and a third video in a second target time period after the object no longer appears in the target area; and determine the second video, the first video, and the third video as the target video.
  • a monitoring device for a target object, applied to a first server, including: a receiving module configured to receive an image sent by a video monitoring device when a moving object is detected in the target area, where the image is obtained from the target video in which the object appears, within the video captured by the video monitoring device shooting the target area; and a determining module configured to determine whether the object is the target object according to the image.
  • a monitoring device for a target object, applied to a video monitoring device, including: an acquisition module configured to, when a moving object is detected in the target area, acquire an image from the target video in which the object appears, within the video obtained by the video monitoring device shooting the target area; and a sending module configured to send the image to the first server, where the image is used to instruct the first server to determine whether the object is the target object according to the image.
  • a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the first server receives the image sent by the video surveillance device when a moving object is detected in the target area, where the image is obtained from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area; the first server determines whether the object is the target object according to the image. Because the image is obtained only when the video surveillance device detects a moving object in the target area, the video surveillance device only needs to send images of a possible target object to the first server when a moving object is detected.
  • the first server can then determine whether the object appearing in the target area is the target object based on the received images. Compared with monitoring the target object based on video, this greatly reduces the amount of data transmitted, thereby increasing the transmission speed, reducing the transmission time, and improving the monitoring efficiency. Therefore, the problem of low efficiency in monitoring the target object in related technologies can be solved, and the effect of improving the efficiency of monitoring the target object can be achieved.
  • Fig. 1 is a block diagram of the hardware structure of a mobile terminal running a method for monitoring a target object according to an embodiment of the present application.
  • Fig. 2 is a first flowchart of a method for monitoring a target object according to an embodiment of the present application
  • Fig. 3 is a schematic diagram of a data connection of each module according to an embodiment of the present application.
  • Fig. 4 is a schematic diagram of the principle of a rat infestation detection system according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a Faster-RCNN network model according to an embodiment of the present application.
  • Fig. 6 is a second flowchart of a method for monitoring a target object according to an embodiment of the present application.
  • Fig. 7 is a first structural block diagram of a monitoring device for a target object according to an embodiment of the present application.
  • Fig. 8 is a second structural block diagram of a monitoring device for a target object according to an embodiment of the present application.
  • Fig. 9 is a structural block diagram of a target object monitoring system according to an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a target object monitoring architecture according to an optional embodiment of the present application.
  • FIG. 1 is a hardware structure block diagram of a mobile terminal of a method for monitoring a target object in an embodiment of the present application.
  • the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data.
  • the above-mentioned mobile terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • FIG. 1 is merely illustrative, and does not limit the structure of the above-mentioned mobile terminal.
  • the mobile terminal 10 may also include more or fewer components than those shown in Fig. 1, or have a different configuration from that shown in Fig. 1.
  • the memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the monitoring method of the target object in the embodiment of the present application. The processor 102 runs the computer programs stored in the memory 104, thereby performing various functional applications and data processing, that is, realizing the above-mentioned methods.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the mobile terminal 10 via a network. Examples of the aforementioned networks include but are not limited to the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is configured to receive or transmit data via a network.
  • optional examples of the aforementioned network may include a wireless network provided by a communication provider of the mobile terminal 10.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet in a wireless manner.
  • a method for monitoring a target object is provided.
  • Fig. 2 is a flowchart 1 of the method for monitoring a target object according to an embodiment of the present application. As shown in Fig. 2, the process includes the following steps:
  • step S202: the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is obtained from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area;
  • step S204: the first server determines whether the object is the target object according to the image.
  • the target object may include, but is not limited to: rats, pests and other harmful organisms.
  • the target area may include, but is not limited to, a kitchen, a warehouse, a factory building, and so on.
  • the video monitoring device may include, but is not limited to, a camera, a monitor, and so on.
  • the aforementioned camera may include, but is not limited to, a camera with an infrared lighting function, for example, an infrared low-light night vision camera. Further, the camera may also include, but is not limited to: a motion detection function, a storage function, a networking function (such as Wireless Fidelity (WiFi) networking), and a high-definition (such as greater than 1080p) configuration.
  • the video surveillance device may include, but is not limited to, one or more video surveillance devices.
  • the first server may include, but is not limited to: a first cloud server.
  • For example, the first cloud server may be Ziyouyun.
  • the first server determines whether the object appearing in the target area is the target object according to the image obtained from the video surveillance device.
  • the image is obtained by the video surveillance device, when it detects a moving object in the target area, from the target video in which the object appears, within the video obtained by shooting the target area; therefore, the video surveillance device only needs to send images of a possible target object to the first server when a moving object is detected in the target area.
  • the first server can then determine whether the object appearing in the target area is the target object. Compared with the method of monitoring the target object based on video, the amount of data transmitted can be greatly reduced, thereby increasing the transmission speed, reducing the transmission time, and improving the monitoring efficiency. Therefore, the problem of low efficiency in monitoring the target object in related technologies can be solved, and the effect of improving the efficiency of monitoring the target object can be achieved.
  • the first server may obtain the target video after determining that the object appearing in the target area is the target object. If the object appearing in the target area is not the target object, the target video is not obtained, thereby saving resources. For example, after the above step S204: in a case where the object is determined to be the target object, the first server obtains the target video.
  • the storage location of the target video may include, but is not limited to, a video surveillance device or a second server.
  • the first server may, but is not limited to, obtain the target video in one of the following ways:
  • Method 1: the first server obtains the target video from the video surveillance device.
  • Method 2: the first server obtains the target video from the second server, where the target video is sent to the second server by the video surveillance device when a moving object is detected in the target area.
  • the second server may include but is not limited to: a second cloud server.
  • For example, the second cloud server may be Fluorite Cloud.
  • the video surveillance device may send the target video to the second server. If the first server determines from the image that the object in the target area is not the target object, it may send indication information to the second server to instruct the second server to delete the target video, saving storage space. For example, after the above step S204: in a case where it is determined that the object is not the target object, the first server sends indication information to the second server, where the indication information is used to instruct the second server to delete the target video.
  • the first server may analyze the movement track of the target object in the target area from the target video. For example: after the first server obtains the target video, the first server determines the movement track of the target object in the target area in the target video.
  • the first server may generate a suggestion for eliminating the target object according to the analyzed movement track of the target object, and provide it to the user. For example: after the first server determines the movement track of the target object in the target area in the target video, the first server generates prompt information according to the movement track, where the prompt information is used to prompt a way to eliminate the target object.
  • the first server may send alarm information carrying the target video, the movement trajectory, and the prompt information to the client, so as to provide the user with an alarm about the target object, a suggested way to eliminate the target object based on its movement trajectory, and a playback of the target object's movement for reference.
  • after the first server generates the prompt information according to the movement track, the first server generates alarm information corresponding to the target object, where the alarm information is used to indicate that the target object appears in the target area, and the alarm information includes at least one of the following: the target video, the movement track, and the prompt information; the first server sends the alarm information to the client.
  • the video surveillance device may, but is not limited to, obtain the image sent to the first server in the following manner: when the video surveillance device detects a moving object in the target area, it intercepts a video image at predetermined intervals from the video obtained by shooting the target area, starting from the moment the object appears until the object no longer appears in the target area, where the image includes the video image; the video surveillance device sends each intercepted video image to the first server in real time; or, the video surveillance device obtains an image set including all the intercepted video images and sends the image set to the first server.
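The periodic interception above can be sketched as follows; the frame stream, fps, and `still_present` motion test are illustrative assumptions standing in for the device's motion detector:

```python
def intercept_images(frame_stream, fps, interval_seconds, still_present):
    """Intercept one video image every `interval_seconds` while the object
    remains in the target area; yield images for real-time sending."""
    step = int(fps * interval_seconds)
    for i, frame in enumerate(frame_stream):
        if not still_present(frame):
            break            # object no longer appears in the target area
        if i % step == 0:
            yield frame      # send to the first server in real time
```

For the batched alternative, the device would collect the yielded images into a set and send the whole set once the object disappears.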
  • the images sent by the video surveillance device to the first server may be multiple images, and the first server may recognize each image to obtain recognition results, and then merge these recognition results to obtain the final target result.
  • the first server recognizes whether the object in each received video image is the target object, obtaining a recognition result corresponding to each video image; the first server fuses the recognition results corresponding to all the received video images into a target result; the first server determines whether the object is the target object according to the target result.
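As a minimal sketch of the fusion step, assuming each per-image recognition result is a boolean and that fusion is a simple majority vote (the patent does not fix a particular fusion rule):

```python
from collections import Counter

def fuse_results(results):
    """Merge the per-image recognition results into one target result.
    Each result is True (target object recognized in that image) or False;
    a majority vote serves as the illustrative fusion rule."""
    counts = Counter(results)
    return counts[True] > counts[False]
```

A weighted or confidence-based fusion would slot in the same way, replacing only the vote.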
  • the first server may, but is not limited to, recognize whether the object in the video image is the target object in the following manner:
  • the first server determines whether an object appears in each received video image;
  • the first server recognizes whether the object in the video image where the object appears is the target object.
  • the target object may be recognized in, but not limited to, the following manner:
  • the first server performs target object detection on each target video frame image to obtain the image features of each target video frame image, where the image includes multiple target video frame images obtained from the target video, each target video frame image is used to indicate the object moving in the target area, and the image feature is used to indicate the target image area in which a moving object whose similarity to the target object is greater than the first threshold is located;
  • the first server determines the motion feature according to the image feature of each target video frame image, where the motion feature is used to represent the motion speed and motion direction of the object in the multiple target video frame images;
  • the first server determines whether the target object appears in the multiple target video frame images according to the motion characteristics and the image characteristics of each target video frame image.
  • a method for determining a target object is also provided. Assuming that the video surveillance device is a camera device, the acquired image is an image frame extracted from the target video. The above method includes the following steps:
  • Step S1: Obtain a video file obtained by shooting the target area with the camera device.
  • the camera device may be a surveillance camera, for example, the camera device is an infrared low-light night vision camera for shooting and monitoring the target area to obtain a video file.
  • the target area is the space area detected in the target building, that is, the area used to detect whether there is a target object.
  • the target object can be a large-sized disease vector that needs to be controlled; for example, the target object is a rat.
  • the video file of this embodiment includes original video data obtained by shooting a target area, and may include a surveillance video sequence of the target area, which is also an image video sequence.
  • the original video data of the target area is acquired through the ARM board at the video data collection layer to generate the above-mentioned video file, thereby achieving the purpose of collecting the video of the target area.
  • Step S2: Perform frame sampling on the video file to obtain a group of video frame images.
  • in step S2 of this application, after the video file captured by the camera device in the target area is obtained, the video file is preprocessed: frame sampling may be performed on the video file at the video data processing layer to obtain a group of video frame images.
  • the video file can be sampled at equal intervals to obtain a set of video frame images of the video file.
  • for example, a video file includes a sequence of 100 video frames; after frame sampling, 10 frames are obtained, and these 10 sampled frames are used as the above-mentioned group of video frame images, thereby reducing the computation of the algorithm for determining the target object.
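A minimal sketch of this equal-interval sampling, treating frames simply as items in a list (names are illustrative, not from the source):

```python
def sample_frames(frames, num_samples):
    """Equal-interval frame sampling: keep num_samples frames spread evenly
    across the sequence, reducing downstream computation."""
    if num_samples >= len(frames):
        return list(frames)
    step = len(frames) / num_samples
    return [frames[int(i * step)] for i in range(num_samples)]
```

For a 100-frame sequence and `num_samples=10`, this keeps every tenth frame, matching the example above.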
  • Step S3: Determine multiple target video frame images in the group of video frame images according to the pixel values of pixels in the group of video frame images.
  • in step S3 of this application, after the video file is sampled to obtain a group of video frame images, multiple target video frame images are determined in the group according to the pixel values of the pixels in the group of video frame images, where each target video frame image is used to indicate an object moving in the target area.
  • preprocessing the video file also includes performing dynamic detection on the video file and determining, from the group of video frame images, the target video frame images used to indicate an object moving in the target area, that is, the images in which a moving object appears.
  • the target video frame image may be a video clip of a moving object, where the moving object may or may not be the target object.
  • the target video frame image can be determined by a dynamic detection algorithm, and multiple target video frame images can be determined in a group of video frame images according to the pixel values of pixels in a group of video frame images, and then step S4 is performed.
  • video frame images other than the multiple target video frame images do not indicate a moving object in the target area, and subsequent detection may be skipped for them.
  • Step S4: Perform target object detection on each target video frame image to obtain the image features of each target video frame image.
  • after the multiple target video frame images are determined in the group of video frame images according to the pixel values of the pixels, target object detection is performed on each target video frame image to obtain its image features, where, for each target video frame image, the image features are used to indicate the target image area in which a moving object whose similarity to the target object is greater than the first threshold is located.
  • the target object detection is performed on each target video frame image, that is, the moving object existing in the target video frame image is detected.
  • the target detection system can adopt a dynamic target detection method and a neural-network-based target detection method to detect the moving objects in the target video frame images and obtain the image features of each target video frame image. The dynamic target detection method is fast and has low requirements on machine configuration, while the neural-network-based target detection method has better accuracy and robustness.
  • the image feature can be the visual information within a rectangular frame representing the target image area; the rectangular frame can be a detection box indicating the target image area in which a moving object whose similarity to the target object is greater than the first threshold is located. The above-mentioned image features indicate the candidate locations of the target object confirmed by coarse screening.
  • Step S5: Determine the motion feature according to the image features of each target video frame image.
  • in step S5 of the present application, after target object detection is performed on each target video frame image and the image features of each target video frame image are obtained, the motion feature is determined according to these image features, where the motion feature is used to represent the motion speed and direction of the objects moving in the multiple target video frame images.
  • the image characteristics of each target video frame image can be input to the motion feature extraction module.
  • the motion feature extraction module determines the motion feature according to the image features of each target video frame image; for the multiple target video frame images, the motion feature represents the motion speed and direction of the moving object across those images, and at the same time interference caused by the movement of non-target objects, such as mosquitoes, is further filtered out.
  • the motion feature extraction algorithm of the motion feature extraction module may first compute, based on the image features of each target video frame image, the correlation of image features between the multiple target video frame images; objects corresponding to highly correlated image features are determined to be the same object, and the image features of each target video frame image are matched to obtain a sequence of moving pictures of the object. A three-dimensional (3-Dimension, abbreviated as 3D) feature extraction network can then be used to extract features from the motion sequence to obtain the motion feature.
  • equivalently, using the detection box of each target video frame image, the correlation of detection boxes between the multiple target video frame images is calculated; objects corresponding to highly correlated detection boxes are determined to be the same object, and the detection boxes of each target video frame image are matched to obtain a sequence of moving pictures of the object; finally, a 3D feature extraction network extracts features from the motion sequence to obtain the motion feature, from which the motion speed and direction of the moving objects in the multiple target video frame images are determined.
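Once detection boxes are matched across frames, speed and direction fall out of the displacement of the box centers. A minimal sketch, with boxes assumed to be (x1, y1, x2, y2) tuples and one frame interval between them (all names hypothetical):

```python
import math

def box_center(box):
    """Center point of a detection box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def motion_between(box_a, box_b, dt=1.0):
    """Speed (pixels per time step dt) and direction (degrees, 0 = +x axis)
    of an object matched across two frames, computed from the displacement
    of its detection-box center."""
    (xa, ya), (xb, yb) = box_center(box_a), box_center(box_b)
    dx, dy = xb - xa, yb - ya
    speed = math.hypot(dx, dy) / dt
    direction = math.degrees(math.atan2(dy, dx))
    return speed, direction
```

This per-pair estimate is the kind of quantity a 3D feature extraction network would learn to summarize over the whole motion sequence.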
  • the image features of multiple target video frame images can also be fused before feature extraction, so as to prevent misjudgment by a single-frame target detector, thereby realizing fine screening of the target video frame images to accurately determine whether the target object appears.
  • Step S6: Determine whether the target object appears in the multiple target video frame images according to the motion feature and the image features of each target video frame image.
  • the classification network is a pre-designed classification network model used to determine whether the target object appears in multiple target video frame images; it determines this according to the motion feature and the image features of each target video frame image, for example, determining whether a rat appears in the multiple target video frame images.
  • this embodiment can input the image features in the images with the target object in the multiple target video frame images to the front-end display interface, which can further display the detection frame and movement track of the target object.
  • the classification network model of this embodiment can be used to filter non-target object picture sequences, while retaining the target object picture sequence, thereby reducing the false alarm rate and ensuring the accuracy of the target object prompt information.
  • in summary, the video file of the target area is frame-sampled to obtain a group of video frame images, and the multiple target video frame images used to indicate an object moving in the target area are determined from that group. Target object detection is performed on each target video frame image to obtain its image features, where the image features indicate the target image area in which a moving object whose similarity to the target object is greater than the first threshold is located. The motion feature, which represents the motion speed and direction of the moving objects in the multiple target video frame images, is then determined according to the image features of each target video frame image. Finally, whether the target object appears in the multiple target video frame images is determined according to the motion feature and the image features of each target video frame image. Automatically determining whether the target object appears in this way not only greatly reduces the labor cost of determining the target object, but also improves the accuracy of the determination, solves the problem of low efficiency in determining the target object, and thereby improves the accuracy of rat infestation detection.
  • in step S3, determining multiple target video frame images in the group of video frame images according to the pixel values of pixels includes: acquiring the average pixel value of each pixel across the group of video frame images; obtaining the difference between the pixel value of each pixel in each video frame image and the corresponding average pixel value; and determining the video frame images whose differences meet a predetermined condition as the target video frame images.
  • the pixel value of each pixel in the group of video frame images can be obtained, the average pixel value is calculated from these pixel values, and then the difference between the pixel value of each pixel in each video frame image and the corresponding average pixel value is obtained.
  • this embodiment may also obtain the difference between the pixel value of each pixel in each video frame image in a group of video frame images and the background or the previous frame of each video frame image.
  • the video frame images in the group whose differences meet the predetermined condition are determined as the target video frame images, thereby obtaining the multiple target video frame images in the group.
  • each video frame image is regarded as the current video frame image, and each pixel is regarded as the current pixel. (x, y) denotes the coordinates of the current pixel in the current video frame image, in a coordinate system whose origin is the upper-left corner of the image, whose X axis runs along the width direction, and whose Y axis runs along the height direction. The pixel value of the current pixel is represented by f(x, y), and the average pixel value of the current pixel is represented by b(x, y). The difference between the pixel value of the current pixel and the corresponding average pixel value is D(x, y) = |f(x, y) − b(x, y)|, and the current video frame image's mask is M(x, y) = 1 if D(x, y) > T, and M(x, y) = 0 otherwise, where T represents the first preset threshold.
  • the multiple target video frame images in the group form moving target video frame images, and all moving objects can be obtained as an output result by combining the marked pixels through morphological operations.
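The mean-background differencing above can be sketched in a few lines of numpy (the threshold value and function name are illustrative; morphological cleanup is omitted):

```python
import numpy as np

def moving_mask(frames, T=30.0):
    """Mark moving pixels: b(x, y) is the per-pixel average over the group of
    frames, and a pixel is flagged in a frame f when the difference
    D(x, y) = |f(x, y) - b(x, y)| exceeds the preset threshold T."""
    stack = np.stack(frames).astype(np.float32)
    b = stack.mean(axis=0)                            # average pixel value b(x, y)
    return (np.abs(stack - b) > T).astype(np.uint8)   # mask M(x, y) per frame
```

Frames whose masks contain enough flagged pixels would then be kept as target video frame images.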
  • the detection of moving objects in the target video frame images in this embodiment may also be a neural-network-based target detection: the group of video frame images can be input into a pre-trained network model to obtain all moving objects and their confidence levels, and the image features whose confidence exceeds a certain threshold are used as the output of the network module. The network model used can include, but is not limited to, Single Shot MultiBox Detector (SSD), Faster Region-based Convolutional Neural Network (Faster R-CNN), Feature Pyramid Network (FPN), etc.; there are no restrictions here.
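The confidence-threshold step amounts to filtering the detector's raw outputs; a minimal sketch, with detections assumed to be (box, confidence) pairs (format and names hypothetical):

```python
def filter_detections(detections, conf_threshold=0.5):
    """Keep only detector outputs whose confidence exceeds the threshold;
    the surviving boxes / image features form the output of the network
    module, whatever detector (SSD, Faster R-CNN, FPN, ...) produced them."""
    return [(box, conf) for box, conf in detections if conf > conf_threshold]
```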
  • determining the motion feature according to the image features of each target video frame image includes: obtaining the target vector corresponding to the target image area represented by the image feature of each target video frame image to obtain multiple target vectors, and composing the multiple target vectors into a first target vector according to the time sequence of the target video frame images in the video file, where the motion feature includes the first target vector; or, obtaining the two-dimensional optical flow diagram corresponding to the target image area represented by the image feature of each target video frame image to obtain multiple two-dimensional optical flow diagrams, where each two-dimensional optical flow diagram includes the movement speed and direction of the moving object in the corresponding target video frame image when passing through the target image area, and composing the multiple two-dimensional optical flow diagrams into a three-dimensional second target vector according to the time sequence of the target video frame images in the video file, where the motion feature includes the three-dimensional second target vector.
  • the target vector corresponding to the target image area represented by the image feature of each target video frame image can be obtained, so as to obtain multiple target vectors in one-to-one correspondence with the multiple target video frame images, where each target vector represents the movement speed and direction of the moving object in the corresponding target video frame image when passing through the target image area.
  • the multiple target vectors are composed into the first target vector according to the time sequence of each target video frame image in the video file, where this time sequence can be represented by the time axis; the multiple target vectors can be spliced along the time axis to obtain the first target vector, which is a one-dimensional vector that is output as the motion feature.
  • for the target image area represented by the image feature of each target video frame image, the optical flow (optic flow) can be calculated to obtain the two-dimensional optical flow diagram corresponding to that target image area, thereby obtaining multiple two-dimensional optical flow diagrams in one-to-one correspondence with the multiple target video frame images, where optical flow describes the motion of an observed target, surface, or edge caused by the motion of the observer. Each two-dimensional optical flow diagram of this embodiment includes the movement speed and direction of the moving object in the corresponding target video frame image when passing through the target image area; that is, that speed and direction can be represented by the two-dimensional optical flow diagram.
  • the multiple two-dimensional optical flow diagrams are composed into a three-dimensional second target vector according to the time sequence of each target video frame image in the video file, where this time sequence can be represented by the time axis; the multiple two-dimensional optical flow diagrams can be spliced along the time axis to obtain the second target vector, which is a three-dimensional vector.
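Both compositions are splices along the time axis; a numpy sketch of the two variants, assuming the per-frame target vectors and optical flow maps are already computed (the flow maps carry two channels per pixel, so the time-stacked array here is (T, H, W, 2) even though the source calls the result "three-dimensional"; names are hypothetical):

```python
import numpy as np

def first_target_vector(per_frame_vectors):
    """Splice the per-frame target vectors (each encoding speed/direction in
    the target image area) along the time axis into one 1-D vector."""
    return np.concatenate([np.asarray(v).ravel() for v in per_frame_vectors])

def second_target_vector(flow_maps):
    """Stack per-frame 2-D optical flow maps of shape (H, W, 2) along the
    time axis, adding a leading time dimension."""
    return np.stack([np.asarray(m) for m in flow_maps], axis=0)
```

The 1-D output is what the first variant feeds to the classifier; the stacked flow tensor is what a 3D feature extraction network would consume in the second variant.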
  • this embodiment determines the motion feature either from the target vectors that represent the movement speed and direction of the moving object in each target video frame image when passing through the target image area, or from the two-dimensional optical flow diagrams corresponding to the target image areas represented by the image features; the motion feature can thus be a one-dimensional vector or a three-dimensional vector. This achieves the purpose of determining the motion feature according to the image features of each target video frame image, so that whether the target object appears in the multiple target video frame images can then be determined automatically according to the motion feature and the image features, improving the accuracy of determining the target object.
  • a feature map may also be output by a network that combines the above-mentioned moving-object detection (target detection) and motion feature extraction; the feature map is fused into a four-dimensional vector including visual and motion features, where the four dimensions may include, but are not limited to, the time dimension, the channel dimension, the height dimension, and the width dimension.
  • in step S6, determining whether the target object appears in the multiple target video frame images according to the motion feature and the image features of each target video frame image includes: inputting the motion feature and the image features of each target video frame image into a pre-trained neural network model to obtain an object recognition result, where the object recognition result is used to indicate whether the target object appears in the multiple target video frame images.
  • the motion characteristics and the image characteristics of each target video frame image can be combined.
  • the neural network model is the classification network model, namely a model obtained by training an initial neural network model on image feature samples of the moving target object, motion feature samples, and labels used to indicate the target object, and used to determine whether the target object appears in the video frame images. The object recognition result, that is, the classification or discrimination result, is used to indicate whether the target object appears in the multiple target video frame images.
  • inputting the motion feature and the image features of each target video frame image into the pre-trained neural network model to obtain the object recognition result includes: passing each image feature through a neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; fusing the multiple first feature vectors with the motion feature to obtain a second feature vector; and inputting the second feature vector into the fully connected layer for classification to obtain a first classification result, where the neural network model includes the neural network layer structure and the fully connected layer, the object recognition result includes the first classification result, and the first classification result is used to indicate whether the target object appears in the multiple target video frame images. Alternatively: passing each image feature through a first neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; passing the motion feature through a second neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain a second feature vector; fusing the multiple first feature vectors with the second feature vector to obtain a third feature vector; and inputting the third feature vector into the fully connected layer for classification to obtain a second classification result, where the neural network model includes the first neural network layer structure, the second neural network layer structure, and the fully connected layer, the object recognition result includes the second classification result, and the second classification result is used to indicate whether the target object appears in the multiple target video frame images.
  • the overall structure of the neural network model can be divided into convolutional layers, regularization layers, activation function layers, and fully connected layers. A convolutional layer is composed of several convolutional units whose parameters are optimized through the back-propagation algorithm; the regularization layer can be used to prevent over-fitting during training of the neural network model; the activation function layer introduces nonlinearity into the network; and the fully connected layer plays the role of the classifier in the entire convolutional neural network.
  • each image feature can be passed through the neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors, and the multiple first feature vectors are fused with the aforementioned motion feature, here a one-dimensional motion feature, to obtain the second feature vector; the multiple first feature vectors and the motion feature can be spliced (concatenated) to obtain the second feature vector.
  • the second feature vector is input into the fully connected layer for classification, that is, the second feature vector is classified through the fully connected layer to obtain the first classification result, where the neural network model of this embodiment includes the above-mentioned neural network layer structure and fully connected layer, and the first classification result is used to indicate whether the target object appears in the multiple target video frame images, for example, the classification result of whether a rat appears in the multiple target video frame images.
  • the above method of passing each image feature through the neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors, fusing them with the motion feature to obtain the second feature vector, and inputting the second feature vector into the fully connected layer for classification to obtain the first classification result may be executed after the target vectors corresponding to the target image areas represented by the image features of each target video frame image have been obtained and composed into the first target vector according to the time sequence of the target video frame images in the video file.
  • alternatively, each image feature is passed through a first neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors, and the above motion feature is passed through a second neural network layer structure consisting of a convolutional layer, a regularization layer, and an activation function layer to obtain the second feature vector; the multiple first feature vectors and the second feature vector are then fused, for example spliced (concatenated), to obtain the third feature vector. The neural network model of this embodiment includes the first neural network layer structure, the second neural network layer structure, and the fully connected layer; the object recognition result includes the second classification result, which is used to indicate whether the target object appears in the multiple target video frame images, for example, the classification result of whether a rat appears.
  • this method of passing each image feature through the first neural network layer structure to obtain multiple first feature vectors, passing the motion feature through the second neural network layer structure to obtain the second feature vector, fusing the multiple first feature vectors with the second feature vector to obtain the third feature vector, and inputting the third feature vector into the fully connected layer for classification to obtain the second classification result may be executed after the two-dimensional optical flow diagrams corresponding to the target image areas represented by the image features of each target video frame image have been obtained and composed into the three-dimensional second target vector according to the time sequence of the target video frame images in the video file.
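Assuming the branch outputs (the first feature vectors and the second feature vector) have already been computed, the fuse-then-classify step can be sketched with plain numpy; the weights W, b stand in for a trained fully connected layer, and all names are hypothetical:

```python
import numpy as np

def classify_fused(first_feature_vectors, second_feature_vector, W, b):
    """Fuse (concatenate) the feature vectors into the third feature vector,
    then apply a fully connected layer followed by softmax; index 0 / 1 of
    the output is read here as target object absent / present."""
    third = np.concatenate([np.concatenate(first_feature_vectors),
                            second_feature_vector])
    logits = W @ third + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()
```

In a real model the fully connected layer's weights come from training on labeled target/non-target sequences; here they are free parameters of the sketch.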
  • inputting the motion feature and the image features of each target video frame image into the pre-trained neural network model to obtain the object recognition result may also include: passing each image feature through multiple blocks in turn to obtain multiple first feature vectors, where in each block the input sequentially undergoes the convolution operation on the convolutional layer, the regularization operation on the regularization layer, and the activation operation on the activation function layer; splicing the multiple first feature vectors with the motion feature to obtain the second feature vector; and inputting the second feature vector into the fully connected layer, whose output gives the first classification result, where the neural network model includes the multiple blocks and the fully connected layer, the object recognition result includes the first classification result, and the first classification result is used to indicate whether the target object appears in the multiple target video frame images. Alternatively: each image feature passes through multiple first blocks in turn to obtain multiple first feature vectors, and the motion feature passes through multiple second blocks in turn to obtain the second feature vector, where in each second block the input sequentially undergoes the convolution operation on the convolutional layer, the regularization operation on the regularization layer, and the activation operation on the activation function layer; here the neural network model includes the multiple first blocks, the multiple second blocks, and the fully connected layer, the object recognition result includes the second classification result, and the second classification result is used to indicate whether the target object appears in the multiple target video frame images.
  • each image feature can also be processed block by block: each image feature is passed through multiple blocks in turn to obtain multiple first feature vectors; in each block, the input sequentially undergoes the convolution operation on the convolutional layer, the regularization operation on the regularization layer, and the activation operation on the activation function layer. After the multiple first feature vectors are obtained, they are spliced with the motion feature to obtain the second feature vector, which is then input into the fully connected layer for classification; the first classification result is obtained from the output of the fully connected layer.
  • the neural network model of this embodiment includes multiple blocks and a fully connected layer; the object recognition result includes the first classification result, which is used to indicate whether the target object appears in the multiple target video frame images, for example, whether a rat appears in the multiple target video frame images.
  • alternatively, this embodiment processes each image feature through first blocks: each image feature is passed through multiple first blocks in turn to obtain multiple first feature vectors, where in each first block the input sequentially undergoes the convolution operation on the convolutional layer, the regularization operation on the regularization layer, and the activation operation on the activation function layer. The motion feature can likewise be processed through second blocks: the motion feature is passed through multiple second blocks in turn to obtain the second feature vector, where in each second block the input sequentially undergoes the convolution operation on the convolutional layer, the regularization operation on the regularization layer, and the activation operation on the activation function layer. The neural network model of this embodiment includes the multiple first blocks, the multiple second blocks, and the fully connected layer; the object recognition result includes the second classification result, which is used to indicate whether the target object appears in the multiple target video frame images, for example, the classification result of whether a rat appears.
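Each "block" above is convolution → regularization → activation. A toy one-dimensional numpy version to make the structure concrete (simple standardization stands in for the regularization layer; a real model would use learned 2-D convolutions and trained normalization):

```python
import numpy as np

def block(x, kernel):
    """One block: convolution, then regularization (standardization here as
    a stand-in), then ReLU activation."""
    y = np.convolve(x, kernel, mode="same")       # convolution layer
    y = (y - y.mean()) / (y.std() + 1e-6)         # regularization layer
    return np.maximum(y, 0.0)                     # activation function layer

def run_blocks(x, kernels):
    """Pass the input through multiple blocks in turn, as the text describes
    for both the image-feature and motion-feature branches."""
    for k in kernels:
        x = block(x, k)
    return x
```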
  • performing frame sampling on a video file to obtain a group of video frame images includes: sampling a video sequence in the video file at equal intervals to obtain a group of video frame images.
  • the video file includes a video sequence.
  • The video sequence in the video file is sampled at equal intervals to obtain a group of video frame images, thereby reducing the computation required by the algorithm that determines the target object, so that it can quickly be determined whether the target object appears in the multiple target video frame images, improving the efficiency of determining the target object.
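Equal-interval sampling as described here reduces to taking every N-th frame; a minimal sketch (frame decoding itself is omitted, and the interval of 10 is an arbitrary example):

```python
def sample_equal_intervals(frames, step):
    """Sample a video sequence at equal intervals to cut the computation
    needed by the downstream target-determination algorithm."""
    return frames[::step]

frames = list(range(100))                 # stand-in for 100 decoded frames
sampled = sample_equal_intervals(frames, 10)
# keeps frames 0, 10, 20, ..., 90 — a tenth of the original work
```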
  • Acquiring a video file captured by a camera device shooting a target area includes: acquiring a video file captured by an infrared low-light night vision camera shooting the target area, where the video frame images in the video file are images captured by the infrared low-light night vision camera.
  • the imaging device may be a camera, for example, an infrared low-light night vision camera, and the infrared low-light night vision camera has an infrared illumination function.
  • the target area is photographed by an infrared low-light night vision camera to obtain a video file, and the video frame image in the video file is an image taken by the infrared low-light night vision camera.
  • The camera device of this embodiment may also include, but is not limited to: a motion detection function, a networking function (such as WIFI networking), and a high-definition configuration (such as greater than 1080p).
  • The method further includes: in the case where it is determined that the target object appears in the multiple target video frame images, determining the position of the target object in the multiple target video frame images, and displaying the position in the multiple target video frame images.
  • After determining whether the target object appears in the multiple target video frame images, and in the case where it does, the position of the target object in the multiple target video frame images can further be determined, for example, the position of the mouse in those images; the position is then displayed in the multiple target video frame images, for example by displaying icons, text, or other information indicating the position.
  • This embodiment can also obtain information such as the time when the target object appears, its specific active area within the target area, its frequency of activity in the target area, and its movement track, and output this information to the front end, i.e. the display part.
  • Information such as the appearance time and active area of the target object can then be displayed on the display interface, which avoids the inefficiency caused by manually determining the target object.
  • In addition, an alarm message can be sent to the front end. The alarm information indicates that the target object has appeared in the target area, so that the relevant prevention and control personnel can take preventive measures, improving the efficiency of preventing and controlling the target object.
  • the method for determining the target object is executed by a server set locally.
  • The method for determining the target object in this embodiment can be executed by a locally deployed server, without connecting to a cloud server; the above computation and visualization can be realized internally, which avoids the computing-resource and transmission constraints that arise when the computing end is on a cloud server.
  • This embodiment applies image recognition technology, fusing image features and motion features to automatically detect whether the target object appears in the surveillance video, locate and track the target object, and generate the target object's movement trajectory and its activity frequency in each target area; the whole process is realized by the algorithm, without additional labor cost.
  • This embodiment does not need a capture device placed in the target area to determine the target object, and does not need to spend manpower on observation, which not only greatly reduces the labor cost of monitoring the target object but also improves the efficiency of determining it, further facilitating prevention and control work.
  • The following description takes the case where the target object is a mouse as an example.
  • Another method for determining a target object according to an embodiment of the present application.
  • the method also includes:
  • Step S1: Obtain a video file captured by an infrared low-light night vision camera.
  • Step S2: Determine whether there are moving objects in the video file.
  • Step S3: If there is a moving object, extract the video clips containing the moving object.
  • Step S4: Perform image feature and dynamic feature extraction on the video clips containing the moving object.
  • Step S5: Judge whether the moving object is a mouse based on the extracted image features and dynamic features.
  • Step S6: If the judgment result is yes, send a prompt message.
  • Through the above steps, the video file captured by the infrared low-light night vision camera is acquired; whether there are moving objects in the video file is determined; if so, the video clips containing the moving objects are extracted; image features and dynamic features are extracted from these clips; whether the moving object is a mouse is judged from the extracted features; and if so, a prompt message is issued. This solves the problem of low efficiency in determining the target object and improves rodent detection accuracy.
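Steps S1-S6 above can be sketched as a small pipeline. The motion test and the classifier below are deliberately toy stand-ins (a real system would use the frame-difference formula and the trained neural network described later in this document), and all helper names are illustrative, not from the source:

```python
def extract_moving_clips(video):
    """S2 + S3: keep only clips in which something moves.
    Toy motion test: a clip 'moves' when any frame differs from its
    first frame; each clip is a list of frame values here."""
    return [clip for clip in video if any(f != clip[0] for f in clip)]

def is_mouse(clip):
    """S4 + S5 folded into one stub: pretend 'mouse' means the value 9
    appears in the clip (a real system runs feature extraction and the
    trained network)."""
    return 9 in clip

def monitor(video):
    """Steps S2-S6 over an already-captured video file (step S1)."""
    alerts = []
    for clip in extract_moving_clips(video):
        if is_mouse(clip):
            alerts.append(clip)   # S6: here a prompt message would be sent
    return alerts
```

For example, `monitor([[1, 1, 1], [1, 9, 2]])` flags only the second clip, since the first contains no motion.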
  • The technical solutions of the embodiments of the present application can serve as a rat-infestation video monitoring method that fuses visual features and trajectory features, usable in a variety of scenes to detect whether mice appear in the captured video. An infrared low-light night vision camera records a video file of the current environment; it is then determined whether there is a moving object; if so, the video clip containing the moving object is extracted for feature recognition to further determine whether the moving object is a mouse. If it is determined to be a mouse, a prompt message is issued; the prompt may be text displayed on a screen, a sound, a light, a flashing indicator, or other types of prompt information.
  • the surveillance camera adopts an infrared low-light night vision camera.
  • The judgment, extraction, and other processing are performed on a local server, with no need to send data to a remote server, which reduces the amount of data transmitted and improves monitoring efficiency.
  • the position of the moving object in each frame of the picture in the video file is determined; the preset mark is superimposed on the position corresponding to each frame of picture and displayed on the front-end interface.
  • The preset mark can be a green or red rectangular frame. Marking the position of the mouse in each frame with a rectangular frame lets the user check the mouse's position and its frequently visited areas in time.
  • judging whether there are moving objects in the video file includes: sampling the video sequence in the video file at equal intervals to obtain sampled video frames; judging the sampled video through a dynamic target detection algorithm or a neural network-based target detection algorithm Whether there are moving objects in the frame image.
  • If M(x, y) is 1, there is a moving target; all such pixels of X(x, y) form the moving-target video frame image, and merging neighbouring pixels through morphological operations yields all the moving targets.
  • judging whether the moving object is a mouse based on the extracted image features and dynamic features includes: inputting the extracted image features and dynamic features into a pre-trained neural network model, performing model discrimination, and obtaining model output results; Determine whether the moving object is a mouse according to the output result of the model.
  • the extracted image features and dynamic features can be distinguished by the pre-trained neural network model.
  • The model is trained in advance on a large number of samples; each sample includes a picture and a label indicating whether a mouse appears in it. A label for the number of mice in the picture can also be included, which can make the model more accurate.
  • The technical solutions of the embodiments of this application can be used in kitchens, restaurants, and other application scenarios that need rat-infestation monitoring, and also in hotels, schools, laboratories, hospitals, and other indoor and outdoor places that require environmental hygiene.
  • the image recognition technology of the embodiments of this application is used to detect and track rodents.
  • An independent device is used to monitor rodent infestations locally through a surveillance camera.
  • This facilitates subsequent rodent control work.
  • the embodiments of this application aim to apply image recognition technology, integrate visual and image sequence features, automatically detect whether there is a mouse in the surveillance video, locate and track the mouse, and generate the movement trajectory route of the mouse and the activity frequency of each area.
  • The whole process is implemented by algorithms, without additional labor cost; the system is an independent device that does not connect to a cloud server, and all computation and visualization are realized internally.
  • a mouse disease video monitoring device can include several components: an infrared low-light night vision camera, a data processing module and a front-end display component.
  • The above device works as follows: the infrared low-light night vision camera collects the scene video sequence; the data processing module receives the video sequence and detects whether a mouse appears in the video; if a mouse is detected, a series of information such as the mouse's position is output to the front-end display interface.
  • The front-end display interface displays the mouse's position, appearance time, and activity area, and can immediately raise a rat-infestation alarm.
  • FIG. 3 is a schematic diagram of a data connection of each module according to an embodiment of the present application.
  • As shown in FIG. 3, the video capture module 302 uses an ARM (Advanced RISC Machines) microprocessor based on a reduced instruction set computer (RISC) architecture to collect video data, which is preprocessed by the video preprocessing module 3024; the video processing module 304 reads the trained model into an embedded graphics processing unit (GPU) and processes the video according to the deep learning algorithm.
  • If the deep learning network model detects a mouse in a certain segment of time, that segment and the corresponding detection result are stored in the storage module 306, and the storage module 306 outputs this series of information to the front end.
  • Fig. 4 is a schematic diagram of the principle of a rat infestation detection system according to an embodiment of the present application.
  • the algorithm includes the following modules: preprocessing, target detection, motion feature extraction and classification network.
  • the input of the system is the original video sequence.
  • Preprocessing consists of two steps: frame extraction and dynamic detection.
  • The original video sequence is sampled at equal intervals to reduce the computational complexity of the algorithm, and then the target detection algorithm is used to determine whether there are moving objects in the image. If there is no moving object, no subsequent detection is performed; if there is, the video clips containing the moving objects are input to the subsequent modules.
  • Each frame of the pre-processed video sequence is detected, and image features (such as the visual information in the detection frame at the corresponding location) are acquired at locations where rats may exist; the motion feature extraction module fuses the information between video image frames and performs feature extraction, preventing misjudgment by the single-frame target detector. The extracted motion features and image features are then input into the classification network, which determines whether the object is a mouse; if so, the rectangular detection frame of the mouse in each frame is transmitted to the front-end display interface.
  • The above target detection process allocates one of two algorithms according to the specific machine's computing resources: a dynamic target detection algorithm and a neural-network-based target detection algorithm.
  • The former is fast and has low machine-configuration requirements; the latter is accurate and robust.
  • The dynamic target detection algorithm includes background-difference and frame-difference methods, using the following formula (1) to compute the difference between the current frame and the background (or the previous frame):

    M(x, y) = 1 if |f_k(x, y) - b(x, y)| > T, otherwise 0    (1)

  • Here (x, y) are the coordinates of a pixel in the coordinate system whose origin is the upper-left corner of the image, with the width direction as the X axis and the height direction as the Y axis; k is the index of the current frame; f_k represents the current frame; b represents the background or the previous frame; M(x, y) is the moving image; and T is a threshold. If M(x, y) is 1, there is a moving target; all such pixels of X(x, y) form the moving-target video frame image, and merging the pixels through morphological operations yields all the moving targets as the output of this module.
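A minimal NumPy sketch of formula (1) applied to a synthetic frame pair; the threshold value and image size are arbitrary example choices:

```python
import numpy as np

def moving_mask(f_k, b, T):
    """Formula (1): M(x, y) = 1 where |f_k(x, y) - b(x, y)| > T, else 0."""
    return (np.abs(f_k.astype(int) - b.astype(int)) > T).astype(np.uint8)

background = np.zeros((4, 4), dtype=np.uint8)   # b: background / previous frame
current = background.copy()                     # f_k: current frame
current[1:3, 1:3] = 200                         # a bright 2x2 moving blob
M = moving_mask(current, background, T=50)
# M is 1 exactly over the blob; morphological operations would then merge
# neighbouring pixels so each connected region becomes one moving target
```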
  • Fig. 5 is a schematic diagram of a Faster-RCNN network model according to an embodiment of the present application. As shown in Fig. 5, conv is the convolutional layer; the convolution kernel (a matrix) slides a window over the input, the contents of each window position are multiplied elementwise with the kernel matrix according to formula (3), and the result F is output as the feature of that window position.
  • RPN is a region proposal network, and a series of candidate frames will be proposed.
  • The region-of-interest pooling layer maps the region of the convolutional feature map indicated by the coordinates output by the RPN to a fixed size (w, h).
  • Its output is fed into a classifier composed of fully connected layers and a bounding-box regression; the bounding-box regression outputs the possible coordinate position of the mouse, and the classifier outputs the confidence that a mouse is at that position.
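The window-sliding multiplication attributed to formula (3) can be illustrated with a naive "valid" 2D convolution; real Faster-RCNN layers are of course implemented by a deep learning framework rather than explicit loops:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """At each window position, multiply the window elementwise with the
    kernel and sum, giving the feature F for that position."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
F = conv2d_valid(img, k)   # a 4x4 input with a 3x3 kernel gives a 2x2 map
```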
  • The motion feature extraction algorithm first computes the correlation between the detection frames of successive frames from the detection frame obtained in each frame; detection frames with high correlation are considered the same object. The detection frames of each frame are matched to obtain a sequence of images of the moving object, and finally a 3D feature extraction network extracts the features of the motion sequence.
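Frame-to-frame association of detection boxes by overlap (the "correlation" above) can be sketched with a simple intersection-over-union match; the 0.3 threshold is an illustrative choice, not from the source, which does not specify the correlation measure:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_boxes(prev_boxes, cur_boxes, thresh=0.3):
    """Boxes whose overlap exceeds the threshold are treated as the same
    object across the two frames, yielding (prev_index, cur_index) pairs."""
    return [(i, j)
            for i, p in enumerate(prev_boxes)
            for j, c in enumerate(cur_boxes)
            if iou(p, c) > thresh]
```

Chaining these matches frame after frame produces the sequence of images of each moving object that the 3D feature extraction network then consumes.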
  • The above classification network fuses the visual information and motion characteristics in the target detection frames and feeds them into the designed classification network model, which screens out non-rat image sequences to reduce the false alarm rate; the results are output to the front-end display interface, which shows the mouse's detection frame and track.
  • As for the overall framework, it is also possible, but not limited, to achieve detection and recognition through the target detection and classification networks alone, saving the cost of the framework layout.
  • The embodiment of this application proposes using image recognition algorithms to automatically identify mice in surveillance videos, without placing mouse traps or cages and without spending manpower on observation, turning rodent monitoring into an efficient, fully automated process. This not only greatly reduces the labor cost of monitoring rodents but also achieves a high accuracy rate, facilitating supervision of back-kitchen rodent hygiene.
  • It can also provide the trajectory of the rat's movement, helping personnel choose locations for rodent control tools and facilitating further extermination work.
  • FIG. 6 is a flowchart of the target object monitoring method according to an embodiment of the present application. As shown in FIG. 6, the process includes the following steps:
  • Step S602: When the video surveillance device detects that a moving object appears in the target area, it acquires an image from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area.
  • Step S604: The video surveillance device sends the image to the first server, where the image is used to instruct the first server to determine, according to the image, whether the object is the target object.
  • the target object may include, but is not limited to: rats, pests and other harmful organisms.
  • the target area may include, but is not limited to, a kitchen, a warehouse, a factory building, and so on.
  • the video monitoring device may include, but is not limited to, a camera, a monitor, and so on.
  • the video surveillance device may include, but is not limited to, one or more video surveillance devices.
  • The first server may include, but is not limited to, a first cloud server, for example: Ziyouyun.
  • Through the above steps, the first server determines, according to the image obtained from the video surveillance device, whether the object appearing in the target area is the target object. Since the image is obtained by the video surveillance device, when it detects a moving object in the target area, from the target video in which the object appears within the video captured of the target area, the video surveillance device only needs to send the image of the possible object to the first server when a moving object is detected, and the first server can then determine whether the object appearing in the target area is the target object. Compared with monitoring the target object from full video, the amount of transmitted data is greatly reduced, which increases transmission speed, reduces transmission time, and improves monitoring efficiency. Therefore, the problem of low efficiency in monitoring the target object in the related technologies is solved, and the effect of improving monitoring efficiency is achieved.
  • In an optional embodiment, the video surveillance device sends the target video to the second server, where the second server is configured to receive a first request sent by the first server and send the target video to the first server in response to the first request.
  • the video surveillance device receives the second request sent by the first server, and the video surveillance device sends the target video to the first server in response to the second request.
  • In the case that the video surveillance device detects that a moving object appears in the target area, it intercepts a video image, at every predetermined interval from the time the object appears in the target area until the object no longer appears there, from the video obtained by shooting the target area; the image includes these video images.
  • the video surveillance device sending the image to the first server includes: the video surveillance device sends the intercepted video image to the first server in real time; or the video surveillance device acquires an image set including all the intercepted video images, and sends the image set to The first server.
  • In an optional embodiment, the video surveillance device obtains, from the video captured of the target area, the first video spanning from when the object appears in the target area until it no longer appears; the video surveillance device also acquires the second video of the first target time period before the object appears and the third video of the second target time period after the object no longer appears; the video surveillance device then determines the second video, the first video, and the third video together as the target video.
  • The method according to the above embodiments can be implemented by software plus the necessary general hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions to enable a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods of the various embodiments of the present application.
  • a device for monitoring a target object is also provided, which is applied to the first server.
  • the device is used to implement the above-mentioned embodiments and optional implementation manners, and those that have been explained will not be repeated.
  • the term "module" can implement a combination of software and/or hardware with predetermined functions.
  • the devices described in the following embodiments are preferably implemented by software, hardware or a combination of software and hardware is also possible and conceived.
  • Fig. 7 is a first structural block diagram of a device for monitoring a target object according to an embodiment of the present application. As shown in Fig. 7, the device includes:
  • the receiving module 72 is configured to receive an image sent by the video surveillance device when a moving object is detected in the target area, where the image is the target of the object appearing in the video obtained from the video surveillance device shooting the target area Images captured on the video;
  • the determining module 74 is configured to determine whether the object is the target object according to the image.
  • the target object may include, but is not limited to: rats, pests and other harmful organisms.
  • the target area may include, but is not limited to, a kitchen, a warehouse, a factory building, and so on.
  • the video monitoring device may include, but is not limited to, a camera, a monitor, and so on.
  • the aforementioned camera may include, but is not limited to, a camera with an infrared lighting function, for example, an infrared low-light night vision camera. Further, the camera may also include but is not limited to: motion detection function, storage function, networking function (such as wifi networking) and high-definition (such as greater than 1080p) configuration.
  • the video surveillance device may include, but is not limited to, one or more video surveillance devices.
  • The first server may include, but is not limited to, a first cloud server, for example: Ziyouyun.
  • the above-mentioned apparatus is further configured to obtain the target video in a case where the object is determined to be the target object.
  • the above-mentioned apparatus is further configured to: obtain a target video from a video surveillance device; or obtain a target video from a second server, where the target video is a situation where a moving object is detected by the video surveillance device in the target area Sent to the second server.
  • the above-mentioned apparatus is further configured to send instruction information to the second server when it is determined that the object is not the target object, where the instruction information is used to instruct the second server to delete the target video.
  • the above-mentioned device is further configured to determine the movement track of the target object in the target area in the target video.
  • the above-mentioned device is further configured to generate prompt information according to the movement track, wherein the prompt information is used to prompt a way to eliminate the target object.
  • the above device is further configured to generate alarm information corresponding to the target object, where the alarm information is used to indicate that the target object appears in the target area, and the alarm information includes at least one of the following: target video, movement track, and prompt information ; Send the alarm information to the client.
  • In an optional embodiment, the determining module is configured to: identify whether the object in each received video image is the target object, obtaining a recognition result for each video image; merge the recognition results of all received video images into a target result; and determine whether the object is the target object according to the target result.
  • the determining module is further configured to: determine whether an object appears in each video image received; and identify whether the object in the video image where the object appears is the target object.
  • In an optional embodiment, the determining module is configured to: perform target object detection on each target video frame image to obtain the image feature of each target video frame image, where the image includes multiple target video frame images obtained from the target video, each target video frame image indicates the object in the target area, and the image feature indicates the target image area of an object whose similarity with the target object is greater than a first threshold; determine the motion feature according to the image features of the target video frame images, where the motion feature indicates the motion speed and direction of the objects in the multiple target video frame images; and determine, according to the motion feature and the image features of the target video frame images, whether the target object appears in the multiple target video frame images.
  • In an optional embodiment, the determining module is configured to: obtain a target vector corresponding to the target image area represented by the image feature of each target video frame image, obtaining multiple target vectors, where each target vector represents the motion speed and direction of the object in the corresponding target video frame image as it passes through the target image area; and compose the multiple target vectors into a first target vector according to the time order of each target video frame image in the video file, where the motion feature includes the first target vector. Alternatively: obtain a two-dimensional optical flow diagram corresponding to the target image area represented by the image feature of each target video frame image, obtaining multiple two-dimensional optical flow diagrams, where each diagram includes the motion speed and direction of the object in the corresponding target video frame image as it passes through the target image area; and compose the multiple two-dimensional optical flow diagrams into a three-dimensional second target vector according to the time order of each target video frame image in the video file, where the motion feature includes the three-dimensional second target vector.
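Stacking the per-frame two-dimensional optical flow diagrams in time order to form the three-dimensional second target vector might look like the following sketch; the 8 frames of 32x32 two-channel (dx, dy) flow are arbitrary example dimensions, and random values stand in for real optical flow:

```python
import numpy as np

# one two-channel optical flow map (dx, dy per pixel) per target video frame,
# already ordered by each frame's time position in the video file
rng = np.random.default_rng(0)
flows = [rng.normal(size=(2, 32, 32)) for _ in range(8)]

# stack along a new leading time axis -> the three-dimensional second target
# vector that the motion feature includes
second_target = np.stack(flows, axis=0)   # shape (8, 2, 32, 32)
```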
  • the determining module is configured to: input the motion characteristics and the image characteristics of each target video frame image into a pre-trained neural network model to obtain an object recognition result, where the object recognition result is used to represent multiple target videos Whether the target object appears in the frame image.
  • the determining module is configured to: pass each image feature through a neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; combine the multiple first feature vectors with motion Features are fused to obtain the second feature vector; the second feature vector is input to the fully connected layer for classification, and the first classification result is obtained.
  • Here the neural network model includes the neural network layer structure and the fully connected layer, and the object recognition result includes the first classification result, which indicates whether the target object appears in the multiple target video frame images. Alternatively: each image feature is passed through a first neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; the motion feature is passed through a second neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain a second feature vector; the multiple first feature vectors are fused with the second feature vector to obtain a third feature vector; and the third feature vector is input to the fully connected layer for classification to obtain the second classification result.
  • In this case the neural network model includes the first neural network layer structure, the second neural network layer structure, and the fully connected layer, and the object recognition result includes the second classification result, which indicates whether the target object appears in the multiple target video frame images.
  • the receiving module is configured to receive multiple target video frame images sent by the video surveillance device, where the multiple target video frame images are obtained by sampling the target video by the video surveillance device to obtain a set of video frame images, And determined in a set of video frame images according to the pixel values of pixels in a set of video frame images; or,
  • another target object monitoring device is also provided, applied to a video surveillance device.
  • the device is used to implement the above embodiments and optional implementations; what has already been described will not be repeated.
  • the term "module" may refer to a combination of software and/or hardware that implements predetermined functions.
  • although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
  • Fig. 8 is a second structural block diagram of a device for monitoring a target object according to an embodiment of the present application. As shown in Fig. 8, the device includes:
  • the acquiring module 82 is configured to, when a moving object is detected in the target area, acquire an image from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area;
  • the sending module 84 is configured to send the image to the first server, where the image is used to instruct the first server to determine whether the object is the target object according to the image.
  • the above-mentioned device is further configured to send the target video to a second server when a moving object is detected in the target area, where the second server is configured to receive a first request sent by the first server and to send the target video to the first server in response to the first request.
  • the above device is further configured to: receive a second request sent by the first server; and send the target video to the first server in response to the second request.
  • the acquisition module is configured to: when the video surveillance device detects that a moving object has appeared in the target area, capture video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by the video surveillance device shooting the target area, until the object no longer appears in the target area, the image including these video images;
  • the sending module is configured to: send the captured video images to the first server in real time; or assemble an image set containing all captured video images and send the image set to the first server.
  • the above-mentioned device is further configured to: when a moving object is detected in the target area, acquire, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears there; acquire a second video of a first target time period before the object appears in the target area and a third video of a second target time period after the object no longer appears there; and determine the second video, the first video, and the third video as the target video.
  • each of the above modules can be implemented by software or hardware. For the latter, this may be achieved in the following manner, though without limitation: the above modules are all located in the same processor, or the above modules are distributed among different processors in any combination.
  • FIG. 9 is a structural block diagram of the target object monitoring system according to an embodiment of the present application. As shown in FIG. 9, the system includes a video monitoring device 92 and a first server 94, where,
  • the video monitoring device 92 is connected to the first server 94;
  • the video monitoring device 92 is configured to, when a moving object is detected in the target area, acquire an image from the target video in which the object appears, within the video obtained by shooting the target area, and send the image to the first server 94;
  • the first server 94 is configured to determine whether the object is a target object based on the image.
  • the video surveillance device is configured to: when a moving object is detected in the target area, capture video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by the video surveillance device shooting the target area, until the object no longer appears in the target area, the image including these video images; send the captured video images to the first server in real time; or assemble an image set containing all captured video images and send the image set to the first server.
  • the first server is configured to: identify whether the object in each received video image is the target object, obtaining a recognition result for each video image; fuse the recognition results of all received video images into a target result; and determine according to the target result whether the object is the target object.
  • the first server is further configured to: when the object is determined to be the target object, acquire the target video; determine from the target video the movement track of the target object in the target area; generate prompt information according to the movement track, where the prompt information suggests a way to eliminate the target object; and generate alarm information corresponding to the target object, where the alarm information indicates that the target object has appeared in the target area and includes at least one of the following: the target video, the movement track, and the prompt information.
  • the above system further includes a client, where the first server is connected to the client; the first server is configured to send the alarm information to the client; and the client is configured to display the alarm information on a display interface.
  • the above system further includes a second server, where the second server is connected to the video monitoring device and the first server; the video monitoring device is further configured to send the video to the second server; the second server is configured to store the target video; and the first server is configured to acquire the target video from the second server.
  • the first server is further configured to send indication information to the second server when it determines that the object is not the target object; the second server is configured to delete the target video in response to the indication information.
  • the video monitoring device is further configured to: acquire, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears there; acquire a second video of a first target time period before the object appears in the target area and a third video of a second target time period after the object no longer appears there; and determine the second video, the first video, and the third video as the target video.
  • FIG. 10 is a schematic diagram of a monitoring architecture of a target object according to an optional embodiment of the present application.
  • a system architecture is proposed that collects information on the external environment and pest activity. The system can be deployed rapidly: no server needs to be deployed on the customer site; only video surveillance equipment for data collection and a wireless network environment for data upload are required. All subsequent computation and analysis are completed in the cloud, which greatly reduces the system's hardware cost and deployment complexity while still delivering functions such as real-time pest warnings, video playback, path analysis, and rodent- and pest-control recommendations. The system also combines pest monitoring with pest control, forming a virtuous closed loop that supports practical pest-control work as a whole.
  • the system includes the following parts: a data collection part, a data analysis part, an instant alarm part, a video playback part, a path analysis part, and an application (APP) display part.
  • the data collection part is used to collect video and picture collections.
  • multiple sets of monitoring equipment can be deployed in a single indoor environment. Given that rats mainly appear at night, the video surveillance equipment needs an infrared night-vision function.
  • the video surveillance equipment uses motion detection. When the captured picture changes in any way (for example, when a mouse or cockroach appears, or a foreign object flies in), the video for that period is written to the SD card (the device usually pre-records and extends the video by 5 seconds so that a complete action is recorded), and the video data is uploaded to the video cloud server (i.e., EZVIZ ("fluorite") cloud, or another public cloud).
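The 5-second pre-record described above is typically implemented with a ring buffer of recent frames that is flushed to storage the moment motion is detected, so the saved clip starts before the trigger. The frame rate, buffer class, and trigger method below are illustrative assumptions, not the device's actual firmware API:

```python
from collections import deque

FPS = 10          # assumed camera frame rate
PRE_SECONDS = 5   # pre-record window from the description above

class PreRecordBuffer:
    """Keep the last PRE_SECONDS of frames so a triggered clip starts before the motion."""

    def __init__(self):
        # deque with maxlen silently discards the oldest frame on overflow.
        self.buffer = deque(maxlen=FPS * PRE_SECONDS)

    def push(self, frame):
        self.buffer.append(frame)

    def on_motion(self):
        # Frames written to the SD card start PRE_SECONDS before the trigger.
        return list(self.buffer)

rec = PreRecordBuffer()
for i in range(120):          # 12 seconds of incoming frames
    rec.push(f"frame-{i}")
clip_start = rec.on_motion()
print(len(clip_start), clip_start[0])  # 50 frame-70
```

After the trigger, the device would keep appending live frames to the same clip until the motion ends.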
  • the video surveillance equipment supports resuming uploads when the network environment is unstable, ensuring that the video is still uploaded to the video cloud server later.
  • the video cloud server is configured to temporarily save the video data; after image recognition and analysis of the pictures confirms the presence of pests or rodents, the video is kept for retrieval, playback, and further analysis.
  • while saving and uploading the video, the video surveillance equipment also saves a picture every 500 milliseconds (ms) and uploads it in real time to the self-owned cloud server for image recognition.
  • after receiving a picture, the self-owned cloud server immediately performs image recognition on it, using artificial intelligence (AI) technology to determine whether target pests such as mice or cockroaches appear in the image, or whether it is merely a non-infestation scene such as a foreign object flying in, and then enters the data analysis part.
  • the data analysis part performs image recognition on the self-owned cloud, applying image recognition algorithms to the images returned by the video surveillance equipment to recognize rats, cockroaches, and other pests.
  • when image recognition is positive, rodents or pests are considered to have been found at that moment, and a request is sent to the video cloud server to retrieve and download the video data for that time period for further analysis (after the server receives the continuous picture set and judges that there is a pest intrusion, it requests the video of the entire time period in real time); when the recognition is negative, the motion detected at that moment is considered unrelated to pests and no further processing is performed.
  • the instant alarm part can be used for emergency rodent control.
  • the cloud server sends an alarm message to the user terminal, instructing restaurant operators and pest-control personnel to take measures. It also provides image playback that marks the identified pests, such as rats and cockroaches, so that the operator can make a preliminary judgment on the animals' location and the hazards they pose, and take timely control measures.
  • the emergency deratization scenario is suitable for monitoring staffed places where rodent infestation cannot be tolerated, such as computer rooms and hospitals. Relevant personnel are instructed to take measures immediately after a rodent situation is discovered, and the system is responsible for providing pictures and video playback in time as a reference for rodent control.
  • the alarm information can also be sent via SMS, push notifications, and the like.
  • in the video playback part, once the video cloud server returns the requested video data and it has been downloaded to the self-owned cloud, the user terminal can access the video playback data.
  • the speed of video downloading depends on network conditions and is slightly slower than the real-time picture display; generally, the video playback data can be obtained within a few minutes after the rat situation occurs.
  • the path analysis part extracts the movement paths of pests such as mice and cockroaches through further analysis of the video data, and marks information such as the intrusion point, hiding point, travel route, activity duration, and coat color of an infestation. A further rodent- and insect-control program formulated from this information is displayed on the user terminal.
  • the mouse path display can be indicated with marked points, with a string of numbers from small to large along the line segment indicating the direction of travel of the mouse or cockroach.
  • the APP display part can display rodent- and insect-control recommendations, which are used for routine pest control: it summarizes the pest information collected at each contact point and visualizes the historical paths of pests and rodents. Based on these locations, it suggests where to deploy equipment such as sticky boards and cockroach traps, and gives placement suggestions.
  • the data dimensions used for display can also include the activity duration of pests and rodents on the previous day/night, the types of pests, and the number of catches.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any of the foregoing method embodiments when running.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • S1: the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is acquired from the target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears;
  • S2: the first server determines according to the image whether the object is the target object.
  • the foregoing storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, or any other medium that can store a computer program.
  • An embodiment of the present application also provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
  • the aforementioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the aforementioned processor, and the input-output device is connected to the aforementioned processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1: the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is acquired from the target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears;
  • S2: the first server determines according to the image whether the object is the target object.
  • the modules or steps of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices.
  • they can be implemented with program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown may be executed in a different order than described here.
  • through this application, the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is acquired from the target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears, and the first server determines according to the image whether the object is the target object. Because the image is acquired from the target video only when the video surveillance device detects a moving object in the target area, the device only needs to send an image possibly containing an object to the first server in that case, and the first server can then determine from the received image whether the object appearing in the target area is the target object. Compared with monitoring the target object from the full video, this greatly reduces the amount of transmitted data, thereby increasing transmission speed, reducing transmission time, and improving monitoring efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Alarm Systems (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application provides a method, device, and system for monitoring a target object. The method includes: a first server receives an image sent by a video surveillance device when the device detects that a moving object has appeared in a target area, where the image is acquired from a target video, within the video captured of the target area by the video surveillance device, in which the object appears; and the first server determines according to the image whether the object is a target object. This application solves the problem in the related art that monitoring a target object is inefficient, thereby achieving the effect of improving the efficiency of monitoring a target object.

Description

Method, Device, and System for Monitoring a Target Object

Technical Field

This application relates to the field of computers, and in particular to a method, device, and system for monitoring a target object.

Background

At present, a target object is usually monitored by identifying the target object in a captured video, but this approach is often inefficient.

No effective solution to the above problem has yet been proposed.

Summary

The embodiments of the present application provide a method, device, and system for monitoring a target object, so as to at least solve the problem in the related art that monitoring a target object is inefficient.
According to an embodiment of the present application, a method for monitoring a target object is provided, including: a first server receives an image sent by a video surveillance device when the device detects that a moving object has appeared in a target area, where the image is acquired from a target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears; and the first server determines according to the image whether the object is a target object.

Optionally, after the first server determines according to the image whether the object is a target object, the method further includes: when it is determined that the object is the target object, the first server acquires the target video.

Optionally, acquiring the target video by the first server includes: the first server acquires the target video from the video surveillance device; or the first server acquires the target video from a second server, where the target video was sent to the second server by the video surveillance device when the device detected that a moving object had appeared in the target area.

Optionally, after the first server determines according to the image whether the object is a target object, the method further includes: when it is determined that the object is not the target object, the first server sends indication information to the second server, where the indication information instructs the second server to delete the target video.

Optionally, after the first server acquires the target video, the method further includes: the first server determines, from the target video, the movement track of the target object in the target area.

Optionally, after the first server determines the movement track of the target object in the target area from the target video, the method further includes: the first server generates prompt information according to the movement track, where the prompt information suggests a way to eliminate the target object.

Optionally, after the first server generates the prompt information according to the movement track, the method further includes: the first server generates alarm information corresponding to the target object, where the alarm information indicates that the target object has appeared in the target area and includes at least one of the following: the target video, the movement track, and the prompt information; the first server sends the alarm information to a client.

Optionally, before the first server receives the image sent by the video surveillance device upon detecting a moving object in the target area, the method further includes: upon detecting that a moving object has appeared in the target area, the video surveillance device captures video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by the video surveillance device shooting the target area, until the object no longer appears in the target area, the image including these video images; the video surveillance device sends the captured video images to the first server in real time; or the video surveillance device assembles an image set containing all captured video images and sends the image set to the first server.

Optionally, determining by the first server according to the image whether the object is a target object includes: the first server identifies whether the object in each received video image is the target object, obtaining a recognition result for each video image; the first server fuses the recognition results of all received video images into a target result; and the first server determines according to the target result whether the object is the target object.

Optionally, identifying by the first server whether the object in each received video image is the target object includes: the first server determines whether the object appears in each received video image; and the first server identifies whether the object in the video images in which the object appears is the target object.
Optionally, determining by the first server according to the image whether the object is a target object includes:

The first server performs target object detection on each target video frame image to obtain an image feature of each target video frame image, where the image includes multiple target video frame images acquired from the target video, each target video frame image indicates the object in the target area, and the image feature represents the target image region occupied by an object, among the objects, whose similarity to the target object is greater than a first threshold;

The first server determines a motion feature from the image feature of each target video frame image, where the motion feature represents the motion speed and motion direction of the object in the multiple target video frame images;

The first server determines, according to the motion feature and the image feature of each target video frame image, whether the target object appears in the multiple target video frame images.

Optionally, determining the motion feature by the first server from the image feature of each target video frame image includes:

Acquiring a target vector corresponding to the target image region represented by the image feature of each target video frame image, to obtain multiple target vectors, where each target vector represents the motion speed and motion direction of the object in one corresponding target video frame image as it passes through the target image region; and composing the multiple target vectors into a first target vector in the temporal order of the target video frame images in the video file, where the motion feature includes the first target vector; or

Acquiring a two-dimensional optical flow map corresponding to the target image region represented by the image feature of each target video frame image, to obtain multiple two-dimensional optical flow maps, where each two-dimensional optical flow map contains the motion speed and motion direction of the object in one corresponding target video frame image as it passes through the target image region; and stacking the multiple two-dimensional optical flow maps into a three-dimensional second target vector in the temporal order of the target video frame images in the video file, where the motion feature includes the three-dimensional second target vector.

Optionally, determining by the first server, according to the motion feature and the image feature of each target video frame image, whether the target object appears in the multiple target video frame images includes:

Inputting the motion feature and the image feature of each target video frame image into a pre-trained neural network model to obtain an object recognition result, where the object recognition result indicates whether the target object appears in the multiple target video frame images.

Optionally, inputting the motion feature and the image feature of each target video frame image into the pre-trained neural network model to obtain the object recognition result includes:

Passing each image feature through a neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; fusing the multiple first feature vectors with the motion feature to obtain a second feature vector; and inputting the second feature vector into a fully connected layer for classification to obtain a first classification result, where the neural network model includes the neural network layer structure and the fully connected layer, the object recognition result includes the first classification result, and the first classification result indicates whether the target object appears in the multiple target video frame images; or

Passing each image feature through a first neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain multiple first feature vectors; passing the motion feature through a second neural network layer structure including a convolution layer, a regularization layer, and an activation function layer to obtain a second feature vector; fusing the multiple first feature vectors with the second feature vector to obtain a third feature vector; and inputting the third feature vector into a fully connected layer for classification to obtain a second classification result, where the neural network model includes the first neural network layer structure, the second neural network layer structure, and the fully connected layer, the object recognition result includes the second classification result, and the second classification result indicates whether the target object appears in the multiple target video frame images.
Optionally, receiving by the first server the image sent by the video surveillance device upon detecting a moving object in the target area includes:

The first server receives the multiple target video frame images sent by the video surveillance device, where the multiple target video frame images were obtained by the video surveillance device sampling frames from the target video to get a set of video frame images, and were then determined within that set according to the pixel values of the pixels in the set; or,

The first server receives a set of video frame images sent by the video surveillance device, where the set was obtained by the video surveillance device sampling frames from the target video; the first server then determines the multiple target video frame images within the set according to the pixel values of the pixels in the set.

Optionally, the first server includes a first cloud server.

Optionally, the second server includes a second cloud server.
According to another embodiment of the present application, a method for monitoring a target object is provided, including: when a video surveillance device detects that a moving object has appeared in a target area, the device acquires an image from a target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears; the video surveillance device sends the image to a first server, where the image is used to instruct the first server to determine according to the image whether the object is a target object.

Optionally, when a moving object is detected in the target area, the method further includes: the video surveillance device sends the target video to a second server, where the second server is configured to, upon receiving a first request sent by the first server, send the target video to the first server in response to the first request.

Optionally, after the video surveillance device sends the image to the first server, the method further includes: the video surveillance device receives a second request sent by the first server; and the video surveillance device sends the target video to the first server in response to the second request.

Optionally, acquiring the image from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area, includes: upon detecting that a moving object has appeared in the target area, the video surveillance device captures video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by shooting the target area, until the object no longer appears in the target area, the image including these video images; sending the image to the first server by the video surveillance device includes: the video surveillance device sends the captured video images to the first server in real time; or the video surveillance device assembles an image set containing all captured video images and sends the image set to the first server.

Optionally, when a moving object is detected in the target area, the method further includes: the video surveillance device acquires, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears there; the video surveillance device acquires a second video of a first target time period before the object appears in the target area and a third video of a second target time period after the object no longer appears there; and the video surveillance device determines the second video, the first video, and the third video as the target video.
According to another embodiment of the present application, a system for monitoring a target object is provided, including a video monitoring device and a first server, where the video monitoring device is connected to the first server; the video monitoring device is configured to, when a moving object is detected in the target area, acquire an image from the target video in which the object appears, within the video obtained by shooting the target area, and send the image to the first server; and the first server is configured to determine according to the image whether the object is a target object.

Optionally, the video monitoring device is configured to: when a moving object is detected in the target area, capture video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by the video monitoring device shooting the target area, until the object no longer appears in the target area, the image including these video images; send the captured video images to the first server in real time; or assemble an image set containing all captured video images and send the image set to the first server.

Optionally, the first server is configured to: identify whether the object in each received video image is the target object, obtaining a recognition result for each video image; fuse the recognition results of all received video images into a target result; and determine according to the target result whether the object is the target object.

Optionally, the first server is further configured to: when the object is determined to be the target object, acquire the target video; determine from the target video the movement track of the target object in the target area; generate prompt information according to the movement track, where the prompt information suggests a way to eliminate the target object; and generate alarm information corresponding to the target object, where the alarm information indicates that the target object has appeared in the target area and includes at least one of the following: the target video, the movement track, and the prompt information.

Optionally, the system further includes a client, where the first server is connected to the client; the first server is configured to send the alarm information to the client; and the client is configured to display the alarm information on a display interface.

Optionally, the system further includes a second server, where the second server is connected to the video monitoring device and the first server; the video monitoring device is further configured to send the video to the second server; the second server is configured to store the target video; and the first server is configured to acquire the target video from the second server.

Optionally, the first server is further configured to send indication information to the second server when it determines that the object is not the target object; the second server is configured to delete the target video in response to the indication information.

Optionally, the video monitoring device is further configured to: acquire, from the video obtained by shooting the target area, a first video spanning from the moment the object appears in the target area until the object no longer appears there; acquire a second video of a first target time period before the object appears in the target area and a third video of a second target time period after the object no longer appears there; and determine the second video, the first video, and the third video as the target video.
According to another embodiment of the present application, a device for monitoring a target object is provided, applied to a first server, including: a receiving module configured to receive an image sent by a video surveillance device when a moving object is detected in a target area, where the image is acquired from a target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears; and a determining module configured to determine according to the image whether the object is a target object.

According to another embodiment of the present application, a device for monitoring a target object is provided, applied to a video surveillance device, including: an acquiring module configured to, when a moving object is detected in a target area, acquire an image from the target video in which the object appears, within the video obtained by the video surveillance device shooting the target area; and a sending module configured to send the image to a first server, where the image is used to instruct the first server to determine according to the image whether the object is a target object.

According to yet another embodiment of the present application, a storage medium is also provided, in which a computer program is stored, where the computer program is configured to execute the steps in any of the above method embodiments when run.

According to yet another embodiment of the present application, an electronic device is also provided, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.

Through this application, the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is acquired from the target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears, and the first server determines according to the image whether the object is the target object. In this way, the first server determines whether the object appearing in the target area is the target object from the image acquired from the video surveillance device, and that image is acquired from the target video only when the video surveillance device detects a moving object in the target area. The video surveillance device therefore only needs to send an image possibly containing an object to the first server in that case, and the first server can determine from the received image whether the object appearing in the target area is the target object. Compared with monitoring the target object from the full video, this greatly reduces the amount of transmitted data, thereby increasing transmission speed, reducing transmission time, and improving monitoring efficiency. Therefore, the problem in the related art that monitoring a target object is inefficient can be solved, achieving the effect of improving the efficiency of monitoring the target object.
Brief Description of the Drawings

The drawings described here are provided for a further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a hardware structure block diagram of a mobile terminal for a target object monitoring method according to an embodiment of the present application;

Fig. 2 is a first flowchart of a target object monitoring method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the data connections between modules according to an embodiment of the present application;

Fig. 4 is a schematic diagram of the principle of a rodent infestation detection system according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a Faster-RCNN network model according to an embodiment of the present application;

Fig. 6 is a second flowchart of a target object monitoring method according to an embodiment of the present application;

Fig. 7 is a first structural block diagram of a target object monitoring device according to an embodiment of the present application;

Fig. 8 is a second structural block diagram of a target object monitoring device according to an embodiment of the present application;

Fig. 9 is a structural block diagram of a target object monitoring system according to an embodiment of the present application;

Fig. 10 is a schematic diagram of a target object monitoring architecture according to an optional embodiment of the present application.
Detailed Description

The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments can be combined with each other.

It should be noted that the terms "first", "second", etc. in the specification, claims, and drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

The method embodiment provided in Embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware structure block diagram of a mobile terminal for a target object monitoring method according to an embodiment of the present application. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.

The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the target object monitoring method in the embodiments of the present application. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, i.e., implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and such remote memory may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is configured to receive or send data via a network. Optional examples of such a network may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
This embodiment provides a target object monitoring method. Fig. 2 is a first flowchart of a target object monitoring method according to an embodiment of the present application. As shown in Fig. 2, the flow includes the following steps:

Step S202: the first server receives an image sent by the video surveillance device when a moving object is detected in the target area, where the image is acquired from the target video, within the video obtained by the video surveillance device shooting the target area, in which the object appears;

Step S204: the first server determines according to the image whether the object is the target object.

Optionally, in this embodiment, the target object may include, but is not limited to, vermin such as rats and other pests.

Optionally, in this embodiment, the target area may include, but is not limited to, kitchens, warehouses, factory buildings, and the like.

Optionally, in this embodiment, the video surveillance device may include, but is not limited to, cameras, monitors, and the like.

Optionally, the camera may include, but is not limited to, a camera with an infrared illumination function, for example, an infrared low-light night-vision camera. Further, the camera may also include, but is not limited to, a motion detection function, a storage function, a networking function (such as Wireless Fidelity (WiFi) networking), and a high-definition (e.g., greater than 1080p) configuration.

Optionally, in this embodiment, the video surveillance device may include, but is not limited to, one or more video surveillance devices.

Optionally, in this embodiment, the first server may include, but is not limited to, a first cloud server, for example, a self-owned cloud.

Through the above steps, the first server determines, according to the image acquired from the video surveillance device, whether the object appearing in the target area is the target object. The image is acquired, when the video surveillance device detects that a moving object has appeared in the target area, from the target video in which the object appears within the video obtained by the device shooting the target area. Thus, the video surveillance device only needs to send an image possibly containing an object to the first server when it detects a moving object in the target area, and the first server can then determine from the received image whether the object appearing in the target area is the target object. Compared with monitoring the target object from the full video, this greatly reduces the amount of transmitted data, thereby increasing transmission speed, reducing transmission time, and improving monitoring efficiency. Therefore, the problem in the related art that monitoring a target object is inefficient can be solved, achieving the effect of improving monitoring efficiency.
Optionally, the first server may acquire the target video only after determining that the object appearing in the target area is the target object; if the object is not the target object, the target video is not acquired, thereby saving resources. For example, after the above step S204, when it is determined that the object is the target object, the first server acquires the target video.

Optionally, the storage location of the target video may include, but is not limited to, the video surveillance device or the second server. For example, the first server may acquire the target video in, but not limited to, one of the following ways:

Way 1: the first server acquires the target video from the video surveillance device.

Way 2: the first server acquires the target video from the second server, where the target video was sent to the second server by the video surveillance device when the device detected that a moving object had appeared in the target area.

Optionally, in this embodiment, the second server may include, but is not limited to, a second cloud server, for example, EZVIZ ("fluorite") cloud.

Optionally, the video surveillance device may send the target video to the second server; if the first server determines from the image that the object appearing in the target area is not the target object, it may send indication information to the second server instructing it to delete the target video, thereby saving storage space. For example, after the above step S204, when it is determined that the object is not the target object, the first server sends indication information to the second server, where the indication information instructs the second server to delete the target video.

Optionally, after acquiring the target video, the first server may analyze from it the movement track of the target object in the target area. For example, after the first server acquires the target video, the first server determines from the target video the movement track of the target object in the target area.

Optionally, the first server may generate suggestions for eliminating the target object from the analyzed movement track and provide them to the user. For example, after the first server determines the movement track of the target object in the target area from the target video, the first server generates prompt information according to the movement track, where the prompt information suggests a way to eliminate the target object.

Optionally, the first server may send to the client alarm information carrying the target video, the movement track, and the prompt information, to alert the user to the target object and provide, for reference, the target object's movement track, suggestions on how to eliminate it, and a playback of its movement. For example, after generating the prompt information according to the movement track, the first server generates alarm information corresponding to the target object, where the alarm information indicates that the target object has appeared in the target area and includes at least one of the following: the target video, the movement track, and the prompt information; the first server then sends the alarm information to the client.

Optionally, before the above step S202, the video surveillance device may acquire the images to be sent to the first server in, but not limited to, the following way: upon detecting that a moving object has appeared in the target area, the video surveillance device captures video images at predetermined intervals, starting from the moment the object appears in the target area, from the video obtained by shooting the target area, until the object no longer appears in the target area, the image including these video images; the video surveillance device sends the captured video images to the first server in real time; or the video surveillance device assembles an image set containing all captured video images and sends the image set to the first server.

Optionally, the images sent to the first server by the video surveillance device may be multiple images; the first server may recognize each image to obtain a recognition result, and then fuse these recognition results into a final target result. For example, in the above step S204, the first server identifies whether the object in each received video image is the target object, obtaining a recognition result for each video image; the first server fuses the recognition results of all received video images into a target result; and the first server determines according to the target result whether the object is the target object.

Optionally, the first server may identify whether the object in a video image is the target object in, but not limited to, the following way:

The first server determines whether the object appears in each received video image;

The first server identifies whether the object in the video images in which the object appears is the target object.
Optionally, in the above step S204, the target object may be identified in, but not limited to, the following way:

The first server performs target object detection on each target video frame image to obtain an image feature of each target video frame image, where the image includes multiple target video frame images acquired from the target video, each target video frame image indicates the object in the target area, and the image feature represents the target image region occupied by an object, among the moving objects, whose similarity to the target object is greater than a first threshold;

The first server determines a motion feature from the image feature of each target video frame image, where the motion feature represents the motion speed and motion direction of the object in the multiple target video frame images;

The first server determines, according to the motion feature and the image feature of each target video frame image, whether the target object appears in the multiple target video frame images.

Optionally, this embodiment also provides a method for determining a target object. Suppose the video surveillance device is a camera device, and the acquired images are image frames extracted from the target video. The method includes the following steps:

Step S1: acquire the video file obtained by the camera device shooting the target area.

In the technical solution provided by the above step S1 of the present application, the camera device may be a surveillance camera, for example, an infrared low-light night-vision camera, used to shoot and monitor the target area to obtain a video file. The target area is a monitored spatial region within a target building, i.e., a region used to detect whether a target object appears; the target object may be a relatively large vector organism that needs to be controlled, for example, a rat.

The video file of this embodiment includes the raw video data obtained by shooting the target area, and may include a surveillance video sequence of the target area, i.e., an image video sequence.

Optionally, in this embodiment, the raw video data of the target area is acquired at the video data collection layer through an ARM board to generate the video file, thereby achieving the purpose of collecting video of the target area.
Step S2: sample frames from the video file to obtain a set of video frame images.

In the technical solution provided by the above step S2, after acquiring the video file obtained by the camera device shooting the target area, the video file is preprocessed; frames can be sampled from the video file at the video data processing layer to obtain a set of video frame images.

In this embodiment, frames can be sampled from the video file at equal intervals to obtain a set of video frame images. For example, if the video file includes a sequence of 100 video frames, 10 video frames are obtained after sampling, and these 10 frames are used as the set of video frame images, thereby reducing the computational load of the algorithm for determining the target object.
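The equal-interval frame sampling described above (e.g., reducing a 100-frame sequence to 10 frames) can be sketched as follows; representing the video as a plain list of frames is an illustrative simplification:

```python
def sample_frames(frames, keep):
    """Sample `keep` frames from `frames` at (approximately) equal intervals."""
    if keep <= 0 or not frames:
        return []
    step = len(frames) / keep
    return [frames[int(i * step)] for i in range(keep)]

video = list(range(100))          # a 100-frame video sequence (frame indices)
sampled = sample_frames(video, 10)
print(sampled)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Only the sampled set is passed to the motion-detection and recognition stages, which is where the reduction in computational load comes from.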
Step S3: determine multiple target video frame images within the set of video frame images according to the pixel values of the pixels in the set.

In the technical solution provided by the above step S3, after sampling frames from the video file to obtain a set of video frame images, multiple target video frame images are determined within the set according to the pixel values of the pixels in the set, where each target video frame image indicates an object moving in the corresponding target area.

In this embodiment, preprocessing the video file also includes motion detection on the video file: determining, from the set of video frame images, the target video frame images that indicate an object moving in the target area, i.e., the frames in which an object moves. A target video frame image may belong to a video clip of a moving object, where the moving object may or may not be the target object. This embodiment can determine the target video frame images through a motion detection algorithm, determining multiple target video frame images within the set of video frame images according to the pixel values of the pixels in the set, and then executing step S4.

Optionally, within the set of video frame images, the video frame images other than the multiple target video frame images do not indicate motion in the corresponding target area, and may be excluded from subsequent detection.
Step S4: perform target object detection on each target video frame image to obtain an image feature of each target video frame image.

In the technical solution provided by the above step S4, after determining the multiple target video frame images within the set of video frame images according to the pixel values of the pixels in the set, target object detection is performed on each target video frame image to obtain an image feature of each target video frame image, where, for each target video frame image, the image feature represents the target image region occupied by an object, among the moving objects, whose similarity to the target object is greater than the first threshold.

In this embodiment, performing target object detection on each target video frame image, i.e., detecting the moving objects present in the target video frame images, can be done by a target detection system using a dynamic target detection method and a neural-network-based target detection method. The dynamic target detection method is fast and has low machine configuration requirements, while the neural-network-based target detection method offers better accuracy and robustness. The image feature may be the visual information within a rectangular box representing the target image region; this rectangular box may be a detection box representing the target image region occupied by an object, among the moving objects, whose similarity to the target object to be identified is greater than the first threshold. In other words, the image feature indicates the possible positions of the target object confirmed by coarse screening.

Step S5: determine a motion feature from the image feature of each target video frame image.

In the technical solution provided by the above step S5, after performing target object detection on each target video frame image to obtain its image feature, a motion feature is determined from the image feature of each target video frame image, where the motion feature represents the motion speed and motion direction of the moving object in the multiple target video frame images.

In this embodiment, after performing target object detection on each target video frame image to obtain its image feature, the image feature of each target video frame image can be input into a motion feature extraction module, which determines a motion feature from the image features. For the multiple target video frame images as a whole, this motion feature represents the motion speed and motion direction of the moving object, while further filtering out interference caused by the movement of non-target objects, for example, removing interference information such as the movement of mosquitoes.

Optionally, in this embodiment, since the motion of the moving object across the target video frame images is continuous, the motion feature extraction algorithm of the motion feature extraction module can first detect the correlation between the image features across the multiple target video frame images based on each frame's image feature, determine objects corresponding to highly correlated image features to be the same object, match the image features of each target video frame image to obtain a series of motion pictures of the object, and finally use a three-dimensional (3D) feature extraction network to extract features of the motion sequence, thereby obtaining the motion feature. For example, based on the detection box of each target video frame image, the correlation between detection boxes across the multiple target video frame images is computed; objects corresponding to highly correlated detection boxes are determined to be the same object; the detection boxes of each target video frame image are matched to obtain a series of motion pictures of the object; finally, a 3D feature extraction network extracts features of the motion sequence to obtain the motion feature, from which the motion speed and motion direction of the moving object in the multiple target video frame images are determined.

Optionally, this embodiment can also fuse the image features of the multiple target video frame images and perform feature extraction on them, thereby preventing misjudgments by a single-frame target detector and achieving fine screening of the target video frame images to accurately determine whether the target object appears.

Step S6: determine, according to the motion feature and the image feature of each target video frame image, whether the target object appears in the multiple target video frame images.

In the technical solution provided by the above step S6, after determining the motion feature from the image feature of each target video frame image, the motion feature and the image feature of each target video frame image can be fused and input into a pre-trained classification network. This classification network is a pre-designed classification network model for determining whether the target object appears in the multiple target video frame images; according to the motion feature and the image feature of each target video frame image, it determines whether the target object, for example a rat, appears in the multiple target video frame images.

Optionally, this embodiment can input the image features of the images containing the target object among the multiple target video frame images into a front-end display interface, which can then display the detection box and movement track of the target object.

Optionally, the classification network model of this embodiment can be used to filter out picture sequences of non-target objects while retaining picture sequences of the target object, thereby reducing the false alarm rate and ensuring the accuracy of the target object prompt information.

Through the above steps S1 to S6: the video file obtained by the camera device shooting the target area is acquired; frames are sampled from the video file to obtain a set of video frame images; multiple target video frame images are determined within the set according to the pixel values of the pixels in the set, where each target video frame image indicates an object moving in the target area; target object detection is performed on each target video frame image to obtain its image feature, where the image feature represents the target image region occupied by a moving object whose similarity to the target object is greater than the first threshold; a motion feature is determined from the image feature of each target video frame image, where the motion feature represents the motion speed and motion direction of the moving object in the multiple target video frame images; and whether the target object appears in the multiple target video frame images is determined according to the motion feature and each frame's image feature. In other words, whether the target object appears is determined automatically, which not only greatly reduces the labor cost of determining the target object, but also improves the accuracy of determining the target object, solves the problem of low efficiency in determining the target object, and thereby achieves the effect of improving the accuracy of rodent infestation detection.
作为一种可选的实施方式,步骤S3,根据一组视频帧图像中的像素点的像素值在一组视频帧图像中确定出多个目标视频帧图像包括:获取一组视频帧图像中的每个像素点的平均像素值;获取一组视频帧图像中的每 个视频帧图像中的每个像素点的像素值与对应的平均像素值之间的差值;将一组视频帧图像中差值满足预定条件的视频帧图像确定为目标视频帧图像。
在该实施例中,在根据一组视频帧图像中的像素点的像素值在一组视频帧图像中确定出多个目标视频帧图像时,可以获取一组视频帧图像中的每个像素点的像素值,根据每个像素点的像素值计算出平均像素值,再获取一组视频帧图像中的每个像素点的像素值与对应的平均像素值之间的差值。
可选地,该实施例还可以获取一组视频帧图像中的每个视频帧图像中的每个像素点的像素值与背景或者每个视频帧图像的前一帧之间的差值。
在获取上述差值之后,判断差值是否满足预定条件,将一组视频帧图像中差值满足预定条件的视频帧图像确定为目标视频帧图像,从而得到一组视频帧图像中的多个目标视频帧图像。
作为一种可选的实施方式,获取一组视频帧图像中的每个视频帧图像中的每个像素点的像素值与对应的平均像素值之间的差值包括:对于一组视频帧图像中的每个视频帧图像中的每个像素点执行以下操作,其中,在执行以下操作时每个视频帧图像被视为当前视频帧图像,每个像素点被视为当前像素点:D(x,y)=|f(x,y)-b(x,y)|,其中,(x,y)为当前像素点在当前视频帧图像中的坐标,f(x,y)表示当前像素点的像素值,b(x,y)表示当前像素点的平均像素值,D(x,y)表示当前像素点的像素值与对应的平均像素值之间的差值。
在该实施例中,在获取一组视频帧图像中的每个视频帧图像中的每个像素点的像素值与对应的平均像素值之间的差值时,每个视频帧图像被视为当前视频帧图像,每个像素点被视为当前像素点,可以通过(x,y)表示当前像素点在当前视频帧图像中的坐标,比如,为以当前视频帧图像左上角为原点,宽方向为X轴,高方向为Y轴建立的坐标系中像素点的坐标,通过f(x,y)表示当前像素点的像素值,通过b(x,y)表示当前像素点的平均像素值,通过D(x,y)表示当前像素点的像素值与对应的平均像素值之间的差 值,按照公式D(x,y)=|f(x,y)-b(x,y)|计算出当前像素点的像素值与对应的平均像素值之间的差值,从而通过上述方法达到获取一组视频帧图像中的每个视频帧图像中的每个像素点的像素值与对应的平均像素值之间的差值的目的。
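上述逐像素差值的计算过程可以用如下代码示意(仅为基于NumPy的示意性实现,非本申请限定的实现方式,函数名为说明而设):

```python
import numpy as np

def pixel_difference(frames):
    """计算每帧与平均像素值(背景)之间的逐像素差值 D(x,y)=|f(x,y)-b(x,y)|。

    frames: 形状为 (帧数, 高, 宽) 的灰度视频帧数组;
    返回与 frames 同形状的差值数组,每帧对应一张差值图。
    """
    frames = np.asarray(frames, dtype=np.float64)
    b = frames.mean(axis=0)       # b(x,y):各像素点在一组视频帧上的平均像素值
    return np.abs(frames - b)     # D(x,y)=|f(x,y)-b(x,y)|
```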
作为一种可选的实施方式,将一组视频帧图像中差值满足预定条件的视频帧图像确定为目标视频帧图像包括:对于一组视频帧图像中的每个视频帧图像中的每个像素点执行以下操作,其中,在执行以下操作时每个视频帧图像被视为当前视频帧图像,每个像素点被视为当前像素点:
M(x,y)=1,当D(x,y)≥T时;M(x,y)=0,当D(x,y)<T时
其中,D(x,y)表示为当前像素点的像素值与对应的平均像素值之间的差值,T为第一预设阈值;其中,预定条件包括:目标视频帧图像中M(x,y)=1的像素点的个数超过第二预设阈值。
在该实施例中,在将一组视频帧图像中差值满足预定条件的视频帧图像确定为目标视频帧图像时,每个视频帧图像被视为当前视频帧图像,每个像素点被视为当前像素点,通过M(x,y)表示当前视频帧图像的二值运动图像,D(x,y)表示当前像素点的像素值与对应的平均像素值之间的差值,通过T表示第一预设阈值,如果当前视频帧中M(x,y)=1的像素点的个数超过第二预设阈值,则将当前视频帧图像确定为目标视频帧图像,也即,当前视频帧图像中存在移动的对象;否则,当前视频帧图像中不存在移动的对象,不将其确定为目标视频帧图像。
该实施例的一组视频帧图像中多个目标视频帧图像组成了运动目标视频帧图像,经过形态学运算合并像素点可得出所有运动的对象,作为输出结果。
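上述“M(x,y)=1的像素点个数超过第二预设阈值则确定为目标视频帧图像”的判定逻辑,可示意如下(阈值T、T2的取值仅为假设示例):

```python
import numpy as np

def is_target_frame(D, T=30, T2=50):
    """根据差值图 D 判断当前帧是否为目标视频帧图像。

    M(x,y)=1 当 D(x,y)>=T,否则为 0;
    若 M(x,y)=1 的像素点个数超过第二预设阈值 T2,则该帧视为包含移动对象。
    """
    M = (np.asarray(D) >= T).astype(np.uint8)   # 二值运动图像 M(x,y)
    return int(M.sum()) > T2

def select_target_frames(D_list, T=30, T2=50):
    """在一组差值图中筛选出目标视频帧图像的索引。"""
    return [i for i, D in enumerate(D_list) if is_target_frame(D, T, T2)]
```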
可选地,该实施例对目标视频帧图像中移动的对象的检测为基于神经网络的目标检测,可以将一组视频帧图像输入预先训练好的网络模型,得出所有移动的对象和其置信度,将大于某个置信度阈值的图像特征作为该网络模块的输出。使用的网络模型可以包含但不限于单次多目标检测器(Single Shot MultiBox Detector,简称为SSD)、区域卷积网络(Faster Region-CNN,简称为Faster-RCNN)、特征金字塔网络(Feature Pyramid  Network,简称为FPN)等,此处不做任何限制。
作为一种可选的实施方式,步骤S5,根据每个目标视频帧图像的图像特征确定出运动特征包括:获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的目标矢量,得到多个目标矢量,其中,每个目标矢量用于表示对应的一个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向;将多个目标矢量按照每个目标视频帧图像在视频文件中的时间顺序组成第一目标向量,其中,运动特征包括第一目标向量;或者获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的二维光流图,得到多个二维光流图,其中,每个二维光流图包括对应的一个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向;将多个二维光流图按照每个目标视频帧图像在视频文件中的时间顺序组成三维第二目标向量,其中,运动特征包括三维第二目标向量。
在该实施例中,每个目标视频帧图像的图像特征可以用于表示与目标图像区域对应的目标矢量,从而得到与多个目标视频帧一一对应的多个目标矢量,其中的每个目标矢量用于表示对应的一个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向,也即,可以将每个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向,作为每个目标视频帧图像的图像特征。在得到多个目标矢量之后,将多个目标矢量按照每个目标视频帧图像在视频文件中的时间顺序组成第一目标向量,其中,每个目标视频帧图像在视频文件中的时间顺序可以通过时间轴表示,进而可以将多个目标矢量沿着时间轴做拼接,得到第一目标向量,该第一目标向量为一维向量,将该一维向量作为运动特征进行输出。
可选地,每个目标视频帧图像的图像特征用于表示目标图像区域,可以计算每个目标图像区域的光流(Optical flow or optic flow),得到与该目 标图像区域对应的二维光流图,进而得到与多个目标视频帧图像一一对应的多个二维光流图,其中,光流用于描述相对于观察者的运动所造成的观测目标、表面或边缘的运动。该实施例的每个二维光流图包括对应的一个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向,也即,目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向可以通过二维光流图来表示。在得到多个二维光流图之后,将多个二维光流图按照每个目标视频帧图像在视频文件中的时间顺序组成三维第二目标向量,其中,每个目标视频帧图像在视频文件中的时间顺序可以通过时间轴表示,可以将多个二维光流图沿着时间轴做拼接,得到第二目标向量,该第二目标向量为三维向量,将该三维向量作为运动特征进行输出。
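将逐帧运动矢量沿时间轴拼接为一维第一目标向量、或将逐帧二维光流图沿时间轴堆叠为三维第二目标向量的过程,可示意如下(NumPy示意实现,光流本身的计算不在此展开):

```python
import numpy as np

def first_target_vector(vectors):
    """将每个目标视频帧图像对应的运动矢量按时间顺序拼接成一维第一目标向量。"""
    return np.concatenate([np.ravel(v) for v in vectors])

def second_target_vector(flow_maps):
    """将每帧的二维光流图按时间顺序沿时间轴堆叠成三维第二目标向量。"""
    return np.stack(flow_maps, axis=0)   # 形状:(时间, 高, 宽[, 分量])
```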
该实施例通过用于表示对应的一个目标视频帧图像中移动的对象在经过目标图像区域时的运动速度和运动方向的目标矢量,或者与每个目标视频帧图像的图像特征所表示的目标图像区域对应的二维光流图来确定出运动特征,该运动特征可以为一维向量或者为三维向量,从而实现了根据每个目标视频帧图像的图像特征确定出运动特征的目的,进而根据运动特征和每个目标视频帧图像的图像特征,确定多个目标视频帧图像中是否出现有目标对象,达到自动确定多个目标视频帧图像中是否出现有目标对象的目的,提高了确定目标对象的准确率。
作为一种可选的示例,可以通过融合了对上述移动的对象的检测(目标检测)和运动特征提取的网络输出特征图,该特征图融合了包括视觉和运动特征的四维向量,其中,该四维向量可以包括但不限于时间维度、通道维度、长维度、高维度。
作为一种可选的实施方式,步骤S6,根据运动特征和每个目标视频帧图像的图像特征,确定多个目标视频帧图像中是否出现有目标对象包括:将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经 网络模型中,得到对象识别结果,其中,对象识别结果用于表示多个目标视频帧图像中是否出现有目标对象。
在该实施例中,在根据运动特征和每个目标视频帧图像的图像特征,确定多个目标视频帧图像中是否出现有目标对象时,可以将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果,该神经网络模型也即分类网络模型,可以根据存在有运动的目标对象的图像特征样本、运动特征样本和用于指示目标对象的数据对初始神经网络模型进行训练,且用于确定视频帧图像中是否出现有目标对象的模型。对象识别结果也即分类结果、判别结果,用于表示多个目标视频帧图像中是否出现有目标对象。
作为一种可选的实施方式,将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果包括:将每个图像特征经过包括卷积层、正则化层和激活函数层的神经网络层结构,得到多个第一特征向量;将多个第一特征向量与运动特征进行融合,得到第二特征向量;将第二特征向量输入到全连接层进行分类,得到第一分类结果,其中,神经网络模型包括神经网络层结构和全连接层,对象识别结果包括第一分类结果,第一分类结果用于表示多个目标视频帧图像中是否出现有目标对象;或者将每个图像特征经过包括卷积层、正则化层和激活函数层的第一神经网络层结构,得到多个第一特征向量;将运动特征经过包括卷积层、正则化层、激活函数层的第二神经网络层结构,得到第二特征向量;将多个第一特征向量与第二特征向量进行融合,得到第三特征向量;将第三特征向量输入到全连接层进行分类,得到第二分类结果,其中,神经网络模型包括第一神经网络层结构、第二神经网络层结构和全连接层,对象识别结果包括第二分类结果,第二分类结果用于表示多个目标视频帧图像中是否出现有目标对象。
在该实施例中,神经网络模型的总体结构可以分为卷积层、正则化层、 激活函数层、全连接层,其中,卷积层由若干卷积单元组成,每个卷积单元的参数都是通过反向传播算法最佳化得到的;正则化层可以用于防止神经网络模型训练的过拟合,激活函数层可以将非线性引入网络,全连接层在整个卷积神经网络中起到分类器的作用。
在该实施例中,在将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果时,可以将每个图像特征经过包括卷积层、正则化层和激活函数层的神经网络层结构,得到多个第一特征向量,将该多个第一特征向量与上述运动特征进行融合,从而得到第二特征向量,其中,运动特征为一维运动特征。
作为一种可选的融合方式,可以将多个第一特征向量与运动特征进行拼接(或称为组合),得到第二特征向量。
在得到第二特征向量之后,将第二特征向量输入到全连接层进行分类,也即,通过全连接层对第二特征向量进行分类,从而得到第一分类结果,其中,该实施例的神经网络模型包括上述神经网络层结构和上述全连接层,第一分类结果用于表示多个目标视频帧图像中是否出现有目标对象的对象识别结果,比如,为多个目标视频帧图像中是否出现有老鼠的分类结果。
可选地,上述将每个图像特征经过包括卷积层、正则化层和激活函数层的神经网络层结构,得到多个第一特征向量,将多个第一特征向量与运动特征进行融合,得到第二特征向量,将第二特征向量输入到全连接层进行分类,得到第一分类结果的方法,可以在获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的目标矢量,得到多个目标矢量,将多个目标矢量按照每个目标视频帧图像在视频文件中的时间顺序组成第一目标向量之后执行。
可选地,在将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果时,将每个图像特征经过包括卷积层、正则化层和激活函数层的第一神经网络层结构,得到多个第一 特征向量;将上述运动特征经过包括卷积层、正则化层、激活函数层的第二神经网络层结构,得到第二特征向量。在得到第一特征向量和得到第二特征向量之后,将多个第一特征向量与第二特征向量进行融合,得到第三特征向量。
作为一种可选的融合方式,可以将多个第一特征向量与第二特征向量进行拼接(或称为组合),得到第三特征向量。
在得到第三特征向量之后,将第三特征向量输入到全连接层进行分类,从而得到第二分类结果,其中,该实施例的神经网络模型包括第一神经网络层结构、第二神经网络层结构和全连接层,对象识别结果包括第二分类结果,该第二分类结果用于表示多个目标视频帧图像中是否出现有目标对象,比如,为多个目标视频帧图像中是否出现有老鼠的分类结果。
可选地,上述将每个图像特征经过包括卷积层、正则化层和激活函数层的第一神经网络层结构,得到多个第一特征向量,将运动特征经过包括卷积层、正则化层、激活函数层的第二神经网络层结构,得到第二特征向量,将多个第一特征向量与第二特征向量进行融合,得到第三特征向量,将第三特征向量输入到全连接层进行分类,得到第二分类结果的方法,可以在获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的二维光流图,得到多个二维光流图,将多个二维光流图按照每个目标视频帧图像在视频文件中的时间顺序组成三维第二目标向量之后执行。
作为另一种可选的示例,将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果包括:将每个图像特征依次经过多个块,得到多个第一特征向量,其中,在每个块中会对块的输入依次执行卷积层上的卷积操作、正则化层上的正则化操作、激活函数层上的激活操作;将多个第一特征向量与运动特征进行拼接,得到第二特征向量;将第二特征向量输入到全连接层,通过全连接层输出得到第一分类结果,其中,神经网络模型包括多个块和全连接层,对象识别结 果包括第一分类结果,第一分类结果用于表示多个目标视频帧图像中是否出现有目标对象;或者将每个图像特征依次经过多个第一块,得到多个第一特征向量,其中,在每个第一块中会对第一块的输入依次执行卷积层上的卷积操作、正则化层上的正则化操作、激活函数层上的激活操作;将运动特征依次经过多个第二块,得到第二特征向量,其中,在每个第二块中会对第二块的输入依次执行卷积层上的卷积操作、正则化层上的正则化操作、激活函数层上的激活操作;将多个第一特征向量与第二特征向量进行拼接,得到第三特征向量;将第三特征向量输入到全连接层,通过全连接层输出得到第二分类结果,其中,神经网络模型包括多个第一块、多个第二块和全连接层,对象识别结果包括第二分类结果,第二分类结果用于表示多个目标视频帧图像中是否出现有目标对象。
在该实施例中,还可以通过块对每个图像特征进行处理。可以将每个图像特征依次经过多个块,得到多个第一特征向量,在每个块中会对块的输入依次执行在卷积层上的卷积操作、在正则化层上的正则化操作以及在激活函数层上的激活操作。在得到多个第一特征向量之后,将多个第一特征向量与运动特征进行拼接,从而得到第二特征向量。在得到第二特征向量之后,将第二特征向量输入到全连接层进行分类,通过全连接层输出得到第一分类结果,其中,该实施例的神经网络模型包括多个块和全连接层,对象识别结果包括第一分类结果,该第一分类结果用于表示多个目标视频帧图像中是否出现有目标对象,比如,为多个目标视频帧图像中是否出现有老鼠的分类结果。
可选地,该实施例通过第一块对每个图像特征进行处理,将每个图像特征依次经过多个第一块,得到多个第一特征向量,在每个第一块中会对第一块的输入依次执行在卷积层上的卷积操作、在正则化层上的正则化操作以及在激活函数层上的激活操作。该实施例还可以通过第二块对运动特征进行处理,将运动特征依次经过多个第二块,得到第二特征向量,在每 个第二块中会对第二块的输入依次执行在卷积层上的卷积操作、在正则化层上的正则化操作以及在激活函数层上的激活操作。在得到多个第一特征向量和第二特征向量之后,将多个第一特征向量与第二特征向量进行拼接,得到第三特征向量,最后将第三特征向量输入到全连接层进行分类,通过全连接层输出得到第二分类结果,其中,该实施例的神经网络模型包括多个第一块、多个第二块和全连接层,对象识别结果包括第二分类结果,该第二分类结果用于表示多个目标视频帧图像中是否出现有目标对象,比如,为多个目标视频帧图像中是否出现有老鼠的分类结果。
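下述代码以给定的全连接层参数,示意“将图像特征与运动特征拼接后经全连接层分类”这一步骤(仅为说明性草图,省略了卷积块、正则化层与激活函数层的细节,softmax为常见的分类输出方式,非本申请限定):

```python
import numpy as np

def fuse_and_classify(image_features, motion_feature, W, bias):
    """将多帧图像特征向量与运动特征拼接(融合)后,经全连接层输出二分类结果。

    image_features: 若干一维特征向量的列表(对应各目标视频帧图像);
    motion_feature: 一维运动特征向量;
    W, bias: 全连接层参数,W 形状为 (2, 拼接后维度)。
    返回 (是否出现目标对象, 两类的 softmax 概率)。
    """
    fused = np.concatenate(list(image_features) + [motion_feature])  # 特征融合(拼接)
    logits = W @ fused + bias                                        # 全连接层
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                                          # softmax 归一化
    return bool(probs[1] > probs[0]), probs
```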
作为一种可选的实施方式,对视频文件进行抽帧采样,得到一组视频帧图像包括:对视频文件中的视频序列进行等间隔的抽帧采样,得到一组视频帧图像。
在该实施例中,视频文件包括视频序列,可以在对视频文件进行抽帧采样,得到一组视频帧图像时,对视频文件中的视频序列进行等间隔的抽帧采样,得到一组视频帧图像,从而减少对目标对象进行确定的算法的运算量,进而快速确定多个目标视频帧中是否出现有目标对象,提高了对目标对象进行确定的效率。
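等间隔抽帧采样可以简单示意如下(采样间隔interval为假设参数):

```python
def sample_frames(frames, interval=5):
    """对视频序列做等间隔的抽帧采样,减少后续检测算法的运算量。

    frames: 按时间顺序排列的视频帧序列;
    interval: 采样间隔,每 interval 帧保留一帧。
    """
    return frames[::interval]
```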
作为一种可选的实施方式,获取摄像设备对目标区域拍摄得到的视频文件包括:获取红外微光夜视摄像头对目标区域拍摄得到的视频文件,其中,视频文件中的视频帧图像为通过红外微光夜视摄像头拍摄到的图像。
在该实施例中,摄像设备可以为摄像头,比如,为红外微光夜视摄像头,该红外微光夜视摄像头带有红外照明功能。通过红外微光夜视摄像头对目标区域进行拍摄,得到视频文件,该视频文件中的视频帧图像为通过红外微光夜视摄像头拍摄到的图像。
可选地,该实施例的摄像设备还包括但不限于:移动侦测功能、联网功能(如WIFI联网)及高清晰度(如大于1080p)配置。
作为一种可选的实施方式,在确定多个目标视频帧图像中是否出现有目标对象之后,该方法还包括:在确定出多个目标视频帧图像中出现有目标对象的情况下,确定目标对象在多个目标视频帧图像中的位置;将位置显示在多个目标视频帧图像中。
在该实施例中,在确定多个目标视频帧图像中是否出现有目标对象之后,在确定出多个目标视频帧图像中出现有目标对象的情况下,可以进一步确定目标对象在多个目标视频帧图像中的位置,比如,确定老鼠在多个目标视频帧图像中的位置,进而将位置显示在多个目标视频帧图像中,比如,将用于指示位置的图标、文本等信息显示在多个目标视频帧图像中。
可选地,该实施例还可以获取目标对象出现的时间、在目标区域中的活动区域等信息,将目标对象的位置、时间、在目标区域中的具体活动区域、在目标区域的活动频率、移动轨迹等信息输出至前端,该前端也即显示部件,目标对象出现的时间、活动区域等信息可以在显示界面中进行显示,从而避免了人工确定目标对象导致对目标对象进行确定的效率低下的问题。
可选地,在确定出多个目标视频帧图像中出现有目标对象的情况下,可以发送报警信息至前端,该报警信息用于指示目标区域中出现有目标对象,以使相关防治人员采取防治措施,从而提高对目标对象进行防治的效率。
作为一种可选的实施方式,目标对象的确定方法由设置在本地的服务器执行。
该实施例的目标对象的确定方法可以由设置在本地的服务器执行,无需连接云服务器,内部即可实现上述的运算和可视化,避免了运算端在云服务器上,会有计算资源上、传输上的问题,导致整个框架效率较为低下的问题,从而提高了对目标对象进行确定的效率。
该实施例旨在应用图像识别的技术,融合图像特征和运动特征,自动检测监控视频中是否有目标对象,对目标对象做定位和跟踪,可以生成目标对象的移动轨迹和在各目标区域的活动频率,整个过程全为算法实现,无需额外的人力成本;另外,该实施例无需通过放置目标捕捉装置来确定目标区域中的目标对象,也无需花费人力进行观测,不仅大大减少了监测目标对象的人力成本,提高了对目标对象进行确定的效率,进而方便了进一步对目标对象进行防治的工作。
进一步,下面结合可选的实施例对本申请实施例的技术方案进行举例说明。具体以目标对象为老鼠进行举例说明。
根据本申请实施例的另一种目标对象的确定方法。该方法还包括:
步骤S1,获取红外微光夜视摄像头拍摄到的视频文件。
步骤S2,判断视频文件中是否存在运动物体。
步骤S3,如果存在运动物体,则提取存在运动物体的视频片段。
步骤S4,对存在运动物体的视频片段进行图像特征和动态特征提取。
步骤S5,根据提取到的图像特征和动态特征判断运动物体是否为老鼠。
步骤S6,如果判断结果为是,则发出提示信息。
该实施例采用获取红外微光夜视摄像头拍摄到的视频文件;判断视频文件中是否存在运动物体;如果存在运动物体,则提取存在运动物体的视频片段;对存在运动物体的视频片段进行图像特征和动态特征提取;根据提取到的图像特征和动态特征判断运动物体是否为老鼠;如果判断结果为是,则发出提示信息,从而解决了对目标对象进行确定的效率低的问题,进而达到了提高鼠患检测准确度的效果。
本申请实施例的技术方案可以作为一种融合视觉特征和轨迹特征的 鼠患视频监测方法,可以应用在多种场景中用于检测拍摄到的视频中是否存在老鼠,通过红外微光夜视摄像头拍摄当前环境的视频文件,然后判断是否存在运动物体,如果存在运动物体,则通过提取运动物体的视频片段进行特征识别,进一步判断提取运动物体是否为老鼠,如果判断出是老鼠,则发出提示信息,提示信息可以是在屏幕上显示文字,也可以是发出声音提示信息,也可以是亮灯或闪烁等多种类型的提示信息。
需要说明的是,本申请实施例的技术方案中,监控摄像头采用的是红外微光夜视摄像头,另外,其判断、提取等处理过程是在本地服务器中进行的,无需将数据发送到远程服务器来处理,可以减少数据传输量,提高监测效率。
可选地,在发出提示信息之后,确定运动物体在视频文件中每帧图片中的位置;将预设标记叠加在每帧图片对应的位置处显示在前端界面上。
在发出有老鼠的提示后,确定老鼠在视频文件中每帧图片中的位置,然后将预设的标记叠加在每帧图片对应的位置处显示,预设标记可以是绿色或者红色的矩形框,把每帧图片中老鼠的位置用矩形框标记出,以方便用户可以及时查看到老鼠的位置和经常出没区域。
可选地,判断视频文件中是否存在运动物体包括:对视频文件中的视频序列进行等间隔的抽帧采样,得到采样视频帧;通过动态目标检测算法或者基于神经网络的目标检测算法判断采样视频帧图像中是否有运动物体。
在判断视频文件中是否存在运动物体时,可以对视频序列进行等间隔的抽帧采样,以减少算法的运算量,然后判断采样视频帧中是否有运动物体,判断时可以采用动态目标检测算法或者基于神经网络的目标检测算法中的任意一种,在一些情况下,也可以两者混合使用。
可选地,通过动态目标检测算法判断采样视频帧图像中是否有运动物体包括:通过D_k(x,y)=|f_k(x,y)-b_k(x,y)|计算当前帧和背景或前一帧的差值;通过
M(x,y)=1,当D_k(x,y)≥T时;M(x,y)=0,当D_k(x,y)<T时
判断是否存在运动物体,其中,(x,y)为以图像左上角为原点,宽方向为X轴,高方向为Y轴建立的坐标系中像素点的坐标,k为当前帧的索引,f表示当前帧,b表示背景或者上一帧,M(x,y)为运动图像,T为阈值。
若M(x,y)为1表示有运动目标,所有M(x,y)=1的像素组成了运动目标视频帧图像,经过形态学运算合并像素点可得出所有运动的目标。
可选地,根据提取到的图像特征和动态特征判断运动物体是否为老鼠包括:将提取到的图像特征和动态特征输入到预先训练好的神经网络模型中,进行模型判别,得到模型输出结果;根据模型输出结果判断运动物体是否为老鼠。
可以通过预先训练好的神经网络模型对提取到的图像特征和动态特征进行模型判别,模型是预先根据大量的样本训练得到的,大量的样本包括图片和该图片中是否有老鼠的标签,在一些情况下,还可以包括该图片中的老鼠数量的标签,这样可以使模型更加精确。
本申请实施例的技术方案可以应用在厨房、餐厅等需要监测是否有鼠害的应用场景中,也可以用于酒店、学校、实验室、医院等室内外对于环境卫生有要求的场所。在鼠害防治工作中,应用本申请实施例的图像识别技术进行老鼠检测和跟踪,使用独立的一个装置,通过监控摄像头在本地完成鼠患的监控,无需放置鼠夹鼠笼,也无需花费人力进行观测,将监测鼠害变为高效全自动的流程工作,不仅大大减少了监测鼠害的人力成本,同时准确率高,方便对鼠害卫生的监管,并且提供了轨迹信息,方便了进一步的灭鼠工作。
本申请实施例的技术方案还提供了一种可选实施方式,下面结合该可选实施方式对本申请实施例的技术方案进行说明。
本申请实施例旨在应用图像识别的技术,融合视觉和图像序列特征,自动检测监控视频中是否有老鼠,对老鼠做定位和跟踪,并且生成老鼠的 运动轨迹路线和各区域的活动频率,整个过程全为算法实现,无需额外的人力成本,并且是一个独立的装置,无需连接云服务器,内部可实现所有的运算和可视化。
根据本申请实施例的一种鼠患视频监测装置可以分为几个部件:红外微光夜视摄像头、数据处理模块和前端显示部件,上述装置工作时原理如下:红外微光夜视摄像头负责采集场景视频序列,数据处理模块接收视频序列并且检测视频中有无老鼠,若检测到老鼠,将老鼠的位置等一系列信息输出至前端显示界面,前端显示界面显示老鼠的位置、出现时间、活动区域并且可以即时进行鼠患的报警。
上述数据处理模块可以分为视频采集模块302、视频处理模块304和存储模块306。图3是根据本申请实施例的一种各模块数据连接的示意图,如图3所示,视频采集模块302通过精简指令集计算机(Reduced Instruction Set Computer,简称为RISC)微处理器(Advanced RISC Machines,简称为ARM)板3022采集视频数据,并通过视频预处理模块3024进行预处理,视频处理模块304读入已训练好的模型在嵌入式图形处理器(Graphics Processing Unit,简称为GPU)处理器3042中根据深度学习算法进行视频处理,若深度学习网络模型检测到某一个片段时间有老鼠,则将该片段以及相应的检测结果存储至存储模块306,存储模块306将这一系列信息输出至前端。
图4是根据本申请实施例的一种鼠患检测系统的原理示意图。如图4所示,该算法包括以下几个模块:预处理、目标检测,运动特征提取和分类网络,系统的输入为原始视频序列,预处理包含两个步骤:抽帧和动态检测,先是对原始视频序列进行等间隔的抽帧采样,减少算法的运算量,然后利用目标检测算法进行目标检测,判断图像中是否有运动物体,若无运动物体,则不进行后续的检测,若有运动物体,则将有运动物体的视频片段输入后续模块。在目标检测过程中,对预处理后的视频序列的每一帧 进行检测,在可能存在老鼠的位置获取图像特征(如该位置对应的检测框内的视觉信息),并通过运动特征提取模块,将各个视频图像帧之间的信息进行融合和特征提取,防止单帧的目标检测器出现误判的情况,随后将提取的运动特征和与图像特征输入分类网络,由分类网络判别是否是老鼠,若是老鼠,则将老鼠在每一帧所在位置的矩形检测框传给前端显示界面。
需要说明的是,在本实施例中,上述目标检测过程是根据具体的机器计算资源分配了两种算法:动态目标检测算法和基于神经网络的目标检测算法,前者运算速度快、对机器配置要求低,后者准确性和鲁棒性更高。
1)动态目标检测算法包含背景差和帧差法,利用下述公式(1),计算当前帧和背景或者前一帧的差值:
D_k(x,y)=|f_k(x,y)-b_k(x,y)|     (1)
上式中,(x,y)为以图像左上角为原点,宽方向为X轴,高方向为Y轴建立的坐标系中像素点的坐标,k为当前帧的索引,f代表当前帧,b代表背景或者上一帧。利用公式(2)判断是否存在运动目标:
M(x,y)=1,当D_k(x,y)≥T时;M(x,y)=0,当D_k(x,y)<T时     (2)
M(x,y)为运动图像,T为阈值,若M(x,y)为1表示有运动目标,所有M(x,y)=1的像素组成了运动目标视频帧图像,经过形态学运算合并像素点可得出所有运动的目标,作为该模块的输出。
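该处“经过形态学运算合并像素点可得出所有运动的目标”,可以用连通域合并来近似示意(实际实现也可采用开运算、闭运算等形态学操作;下述为纯Python的4邻域连通域示例,仅为示意):

```python
def moving_objects(M):
    """对二值运动图像 M 合并相邻(4邻域)的运动像素点,得出各个运动目标的像素集合。

    M: 二维列表,元素为 0/1,1 表示该像素属于运动区域;
    返回列表,每个元素是一个运动目标包含的 (y, x) 像素坐标列表。
    """
    H, W = len(M), len(M[0])
    seen = [[False] * W for _ in range(H)]
    objects = []
    for y in range(H):
        for x in range(W):
            if M[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], []
                seen[y][x] = True
                while stack:                      # 深度优先遍历合并相邻运动像素
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < H and 0 <= nx < W and M[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                objects.append(comp)
    return objects
```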
2)基于神经网络的目标检测算法将图片输入预先训练好的网络模型,得出所有可能的目标和其置信度,大于某个置信度阈值的检测框作为该模块的输出。使用的网络模型包含但不限于SSD、Faster-RCNN、FPN等。图5是本申请实施例的一种Faster-RCNN网络模型的示意图。如图5所示,其中conv是卷积层,由卷积核(是一个矩阵)在输入上进行划窗,对每个输入的划窗位置都和矩阵根据公式(3)相点乘,结果F作为该划窗位置的特征输出。
F=Σ_{0≤i,j≤n}k(i,j)*I(i,j)      (3)
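公式(3)所述的划窗点乘运算可示意如下(仅演示单通道、步长为1、无填充的情形,非Faster-RCNN中conv层的完整实现):

```python
import numpy as np

def conv2d(I, k):
    """按公式(3)对输入 I 滑动卷积核 k:每个划窗位置内逐元素点乘并求和,得到特征输出 F。"""
    H, W = I.shape
    n, m = k.shape
    out = np.empty((H - n + 1, W - m + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(k * I[y:y + n, x:x + m])  # 当前划窗位置的特征输出 F
    return out
```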
RPN为区域提出网络,会提出一系列的候选框,感兴趣区域池化层(ROI pooling)将卷积层提到的特征图在RPN输出的坐标下的区域映射成大小(w,h)固定的矩形框,输入由全连接层构成的分类器和边框回归器,边框回归输出老鼠的可能坐标位置,分类器输出是该位置老鼠的置信度。
上述运动特征提取:因为物体的运动是连续的,运动特征提取算法先根据每一帧得到的检测框,计算帧与帧之间检测框的相关性,相关性大的检测框认为是同一物体,对每一帧的检测框进行匹配,得到物体的一系列运动图片,最后使用3D的特征提取网络提取运动序列的特征。
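上述“计算帧与帧之间检测框的相关性,相关性大的检测框认为是同一物体”这一步,常见的一种做法是以检测框的交并比(IoU)度量相关性并做逐帧贪心匹配;以下为该思路的示意实现(IoU阈值0.3为假设值,检测框格式假设为(x1, y1, x2, y2)):

```python
def iou(a, b):
    """计算两个检测框 (x1, y1, x2, y2) 的交并比,作为帧间相关性的度量。"""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_boxes(prev_boxes, cur_boxes, thresh=0.3):
    """按 IoU 将前一帧与当前帧的检测框做贪心匹配,相关性大者视为同一物体。

    返回 (前帧检测框索引, 当前帧检测框索引) 的匹配对列表。
    """
    pairs, used = [], set()
    for i, p in enumerate(prev_boxes):
        best, best_iou = None, thresh
        for j, c in enumerate(cur_boxes):
            if j in used:
                continue
            v = iou(p, c)
            if v > best_iou:
                best, best_iou = j, v
        if best is not None:
            pairs.append((i, best))
            used.add(best)
    return pairs
```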
上述分类网络:将目标检测框中的视觉信息和运动特征融合,输入设计好的分类的网络模型,用于筛除非老鼠的图片序列,降低虚警率,将结果输入前端显示界面,显示老鼠的检测框和轨迹。
在本申请实施例中,对于整体的框架,还可以但不限于通过目标检测和分类网络来达到检测识别的目的,以节省框架布局成本。
本申请实施例提出了利用图像识别算法,自动识别监控视频中的老鼠,无需放置鼠夹鼠笼,也无需花费人力进行观测,将监测鼠害变为高效全自动的流程工作,不仅大大减少了监测鼠害的人力成本,同时准确率高,方便对后厨鼠害卫生的监管,同时,还可以提供老鼠活动的轨迹,便于人员选择灭鼠工具放置位置,方便了进一步的除害工作。
在本实施例中还提供了另一种目标对象的监控方法,图6是根据本申请实施例的目标对象的监控方法的流程图二,如图6所示,该流程包括如下步骤:
步骤S602,视频监控设备在检测到目标区域中出现了移动的对象的 情况下,从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取图像;
步骤S604,视频监控设备将图像发送至第一服务器,其中,图像用于指示第一服务器根据图像确定对象是否为目标对象。
可选地,在本实施例中,目标对象可以但不限于包括:老鼠,害虫等等有害生物。
可选地,在本实施例中,目标区域可以但不限于包括:厨房、仓库、厂房等等。
可选地,在本实施例中,视频监控设备可以但不限于包括:摄像头、监控器等等。
可选地,在本实施例中,视频监控设备可以但不限于包括一个或者多个视频监控设备。
可选地,在本实施例中,第一服务器可以但不限于包括:第一云服务器。例如:自有云。
通过上述步骤,第一服务器根据从视频监控设备获取的图像确定目标区域中出现的对象是否为目标对象,该图像是视频监控设备在检测到目标区域中出现了移动的对象的情况下,从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的,由此视频监控设备只需在检测到目标区域中出现了移动的对象的情况下向第一服务器发送可能存在对象的图像,第一服务器即可根据接收到的图像确定目标区域出现的对象是否为目标对象,可见相对于根据视频监控目标对象的方式,能够大大减少传输数据的数据量,从而提高传输速度,减少传输时间,提高监控效率。因此,可以解决相关技术中对目标对象进行监控的效率较低的问题,达到提高对目标对象进行监控的效率的效果。
可选地,在检测到目标区域中出现了移动的对象的情况下,视频监控设备将目标视频发送至第二服务器,其中,第二服务器用于在接收到第一服务器发送的第一请求的情况下,响应第一请求将目标视频发送至第一服 务器。
可选地,在上述步骤S604之后,视频监控设备接收第一服务器发送的第二请求,视频监控设备响应第二请求将目标视频发送至第一服务器。
可选地,在上述步骤S602中,视频监控设备在检测到目标区域中出现了移动的对象的情况下,从目标区域中出现了对象开始每隔预定时间从视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至对象不再出现在目标区域中,图像包括视频图像。视频监控设备将图像发送至第一服务器包括:视频监控设备将截取的视频图像实时发送至第一服务器;或者,视频监控设备获取包括截取到的全部视频图像的图像集,并将图像集发送至第一服务器。
可选地,在检测到目标区域中出现了移动的对象的情况下,视频监控设备从对目标区域进行拍摄得到的视频中获取从目标区域中出现对象开始直至目标区域中不再出现对象为止的第一视频;视频监控设备获取目标区域中出现对象之前的第一目标时间段的第二视频以及目标区域中不再出现对象之后的第二目标时间段的第三视频;视频监控设备将第二视频,第一视频和第三视频确定为目标视频。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例的方法。
在本实施例中还提供了一种目标对象的监控装置,应用于第一服务器,该装置用于实现上述实施例及可选实施方式,已经进行过说明的不再赘述。如以下所使用的,术语“模块”可以是实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。
图7是根据本申请实施例的目标对象的监控装置的结构框图一,如图7所示,该装置包括:
接收模块72,设置为接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,图像是从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的图像;
确定模块74,设置为根据图像确定对象是否为目标对象。
可选地,在本实施例中,目标对象可以但不限于包括:老鼠,害虫等等有害生物。
可选地,在本实施例中,目标区域可以但不限于包括:厨房、仓库、厂房等等。
可选地,在本实施例中,视频监控设备可以但不限于包括:摄像头、监控器等等。
可选地,上述摄像头可以包括但不限于:带有红外照明功能的摄像头,例如,红外微光夜视摄像头。进一步,该摄像头还可以包括但不限于:移动侦测功能、存储功能、联网功能(如wifi联网)及高清晰度(如大于1080p)配置。
可选地,在本实施例中,视频监控设备可以但不限于包括一个或者多个视频监控设备。
可选地,在本实施例中,第一服务器可以但不限于包括:第一云服务器。例如:自有云。
可选地,上述装置还设置为:在确定出对象为目标对象的情况下,获取目标视频。
可选地,上述装置还设置为:从视频监控设备获取目标视频;或者,从第二服务器获取目标视频,其中,目标视频是由视频监控设备在检测到 目标区域中出现了移动的对象的情况下发送至第二服务器的。
可选地,上述装置还设置为:在确定出对象不为目标对象的情况下,向第二服务器发送指示信息,其中,指示信息用于指示第二服务器删除目标视频。
可选地,上述装置还设置为:在目标视频中确定出目标对象在目标区域中的移动轨迹。
可选地,上述装置还设置为:根据移动轨迹生成提示信息,其中,提示信息用于提示消除目标对象的方式。
可选地,上述装置还设置为:生成目标对象对应的告警信息,其中,告警信息用于指示在目标区域出现了目标对象,告警信息中包括以下至少之一:目标视频、移动轨迹、提示信息;将告警信息发送至客户端。
可选地,确定模块设置为:识别接收到的每一张视频图像中的对象是否为目标对象,得到每一张视频图像对应的识别结果;将接收到的全部视频图像对应的识别结果融合为目标结果;根据目标结果确定对象是否为目标对象。
可选地,确定模块还设置为:确定接收到的每一张视频图像中是否出现了对象;识别出现了对象的视频图像中的对象是否为目标对象。
可选地,确定模块设置为:对每个目标视频帧图像进行目标对象的检测,得到每个目标视频帧图像的图像特征,其中,图像包括从目标视频上获取的多个目标视频帧图像,每个目标视频帧图像用于指示在目标区域中的对象,图像特征用于表示在对象中,与目标对象之间的相似度大于第一阈值的对象所在的目标图像区域;根据每个目标视频帧图像的图像特征确定出运动特征,其中,运动特征用于表示多个目标视频帧图像中对象的运动速度和运动方向;根据运动特征和每个目标视频帧图像的图像特征,确定多个目标视频帧图像中是否出现有目标对象。
可选地,确定模块设置为:获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的目标矢量,得到多个目标矢量,其中,每个目 标矢量用于表示对应的一个目标视频帧图像中对象在经过目标图像区域时的运动速度和运动方向;将多个目标矢量按照每个目标视频帧图像在视频文件中的时间顺序组成第一目标向量,其中,运动特征包括第一目标向量;或者,获取与每个目标视频帧图像的图像特征所表示的目标图像区域对应的二维光流图,得到多个二维光流图,其中,每个二维光流图包括对应的一个目标视频帧图像中对象在经过目标图像区域时的运动速度和运动方向;将多个二维光流图按照每个目标视频帧图像在视频文件中的时间顺序组成三维第二目标向量,其中,运动特征包括三维第二目标向量。
可选地,确定模块设置为:将运动特征和每个目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果,其中,对象识别结果用于表示多个目标视频帧图像中是否出现有目标对象。
可选地,确定模块设置为:将每个图像特征经过包括卷积层、正则化层和激活函数层的神经网络层结构,得到多个第一特征向量;将多个第一特征向量与运动特征进行融合,得到第二特征向量;将第二特征向量输入到全连接层进行分类,得到第一分类结果,其中,神经网络模型包括神经网络层结构和全连接层,对象识别结果包括第一分类结果,第一分类结果用于表示多个目标视频帧图像中是否出现有目标对象;或者,将每个图像特征经过包括卷积层、正则化层和激活函数层的第一神经网络层结构,得到多个第一特征向量;将运动特征经过包括卷积层、正则化层、激活函数层的第二神经网络层结构,得到第二特征向量;将多个第一特征向量与第二特征向量进行融合,得到第三特征向量;将第三特征向量输入到全连接层进行分类,得到第二分类结果,其中,神经网络模型包括第一神经网络层结构、第二神经网络层结构和全连接层,对象识别结果包括第二分类结果,第二分类结果用于表示多个目标视频帧图像中是否出现有目标对象。
可选地,接收模块设置为:接收视频监控设备发送的多个目标视频帧图像,其中,多个目标视频帧图像是通过视频监控设备对目标视频进行抽帧采样,得到一组视频帧图像,并根据一组视频帧图像中的像素点的像素值在一组视频帧图像中确定的;或者,
接收视频监控设备发送的一组视频帧图像,其中,一组视频帧图像是通过视频监控设备对目标视频进行抽帧采样得到的;根据一组视频帧图像中的像素点的像素值在一组视频帧图像中确定出多个目标视频帧图像。
在本实施例中还提供了另一种目标对象的监控装置,应用于视频监控设备,该装置用于实现上述实施例及可选实施方式,已经进行过说明的不再赘述。如以下所使用的,术语“模块”可以是实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。
图8是根据本申请实施例的目标对象的监控装置的结构框图二,如图8所示,该装置包括:
获取模块82,设置为在检测到目标区域中出现了移动的对象的情况下,从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取图像;
发送模块84,设置为将图像发送至第一服务器,其中,图像用于指示第一服务器根据图像确定对象是否为目标对象。
可选地,上述装置还设置为:在检测到目标区域中出现了移动的对象的情况下,将目标视频发送至第二服务器,其中,第二服务器设置为在接收到第一服务器发送的第一请求的情况下,响应第一请求将目标视频发送至第一服务器。
可选地,上述装置还设置为:接收第一服务器发送的第二请求;响应第二请求将目标视频发送至第一服务器。
可选地,获取模块设置为:视频监控设备在检测到目标区域中出现了移动的对象的情况下,从目标区域中出现了对象开始每隔预定时间从视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至对象不再出现在目标区域中,图像包括视频图像;
发送模块设置为:视频监控设备将截取的视频图像实时发送至第一服务器;或者,视频监控设备获取包括截取到的全部视频图像的图像集,并 将图像集发送至第一服务器。
可选地,上述装置还设置为:在检测到目标区域中出现了移动的对象的情况下,从对目标区域进行拍摄得到的视频中获取从目标区域中出现对象开始直至目标区域中不再出现对象为止的第一视频;获取目标区域中出现对象之前的第一目标时间段的第二视频以及目标区域中不再出现对象之后的第二目标时间段的第三视频;将第二视频,第一视频和第三视频确定为目标视频。
需要说明的是,上述各个模块是可以通过软件或硬件来实现的,对于后者,可以通过以下方式实现,但不限于此:上述模块均位于同一处理器中;或者,上述各个模块以任意组合的形式分别位于不同的处理器中。
在本实施例中还提供了一种目标对象的监控系统,图9是根据本申请实施例的目标对象的监控系统的结构框图,如图9所示,该系统包括:视频监控设备92和第一服务器94,其中,
视频监控设备92与第一服务器94连接;
视频监控设备92设置为在检测到目标区域中出现了移动的对象的情况下,从对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取图像,并将图像发送至第一服务器94;
第一服务器94设置为根据图像确定对象是否为目标对象。
可选地,视频监控设备设置为:在检测到目标区域中出现了移动的对象的情况下,从目标区域中出现了对象开始每隔预定时间从视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至对象不再出现在目标区域中,图像包括视频图像;将截取的视频图像实时发送至第一服务器;或者,获取包括截取到的全部视频图像的图像集,并将图像集发送至第一服务器。
可选地,第一服务器设置为:识别接收到的每一张视频图像中的对象是否为目标对象,得到每一张视频图像对应的识别结果;将接收到的全部 视频图像对应的识别结果融合为目标结果;根据目标结果确定对象是否为目标对象。
可选地,第一服务器还设置为:在确定出对象为目标对象的情况下,获取目标视频;在目标视频中确定出目标对象在目标区域中的移动轨迹;根据移动轨迹生成提示信息,其中,提示信息用于提示消除目标对象的方式;生成目标对象对应的告警信息,其中,告警信息用于指示在目标区域出现了目标对象,告警信息中包括以下至少之一:目标视频、移动轨迹、提示信息。
可选地,上述系统还包括:客户端,其中,第一服务器与客户端连接;第一服务器设置为将告警信息发送至客户端;客户端设置为在显示界面上显示告警信息。
可选地,上述系统还包括:第二服务器,其中,第二服务器与视频监控设备和第一服务器连接;视频监控设备还设置为将视频发送至第二服务器;第二服务器设置为存储目标视频;第一服务器设置为从第二服务器获取目标视频。
可选地,第一服务器还设置为:在确定对象不为目标对象的情况下,向第二服务器发送指示信息;第二服务器设置为:响应指示信息删除目标视频。
可选地,视频监控设备还设置为:从对目标区域进行拍摄得到的视频中获取从目标区域中出现对象开始直至目标区域中不再出现对象为止的第一视频;获取目标区域中出现对象之前的第一目标时间段的第二视频以及目标区域中不再出现对象之后的第二目标时间段的第三视频;将第二视频,第一视频和第三视频确定为目标视频。
下面结合本申请可选实施例进行详细说明。
本申请可选实施例提供了一种目标对象的监控架构,图10是根据本申请可选实施例的目标对象的监控架构的示意图,如图10所示,提出了 一种系统架构,监控内外部环境及有害生物活动信息。该系统具有可快速部署的特征,无需在客户现场部署服务器,只需要视频监控设备采集数据,以及部署无线网络环境用于数据上传,所有后续的计算分析都在云端完成,大幅节省了系统的硬件成本、系统部署的复杂度,同时也能出色地完成虫鼠害的实时报警、视频回放、路径分析、灭鼠控虫建议等功能。本系统还结合了虫鼠害监测与虫鼠害防治,形成良性的闭环,为实际的虫鼠害防治工作起到全局性地协助作用。
该系统包括以下部分:数据采集部分,数据分析部分,即时告警部分,视频回放部分,路径分析部分和应用(Application,简称为APP)显示部分。
数据采集部分用于采集视频和图片集,在后厨等场所,选择合适的视野较好的位置,部署视频监控设备,获得后厨关键设施的视频数据,用以观察虫类、鼠类出没情况。一个室内环境可视实际情况,部署多组监控设备。考虑到老鼠在夜间出没的特点,视频监控设备需有红外夜视功能。
视频监控设备使用移动侦测的方式,当所摄制的画面内容发生任何的变化时(比如有老鼠出现、蟑螂出现,或是异物飞入时),将该周期内的视频写入SD卡(一般会对视频预录和延时5秒钟,使得视频能够录制完整的一段动作),将视频数据即时上传至视频云服务器(即萤石云,也可以是其他公有云)。视频监控设备拥有断线续传功能,在网络环境不稳定时,也能够保证视频稍后完整上传至视频云服务器。视频云服务器设置为暂时地保存视频数据,后期在经过对图片的图像识别分析,确认确有虫鼠害存在的情况下,供调取回放,以及进一步地分析。
当所摄制的画面内容发生任何的变化,视频监控设备保存并上传视频的同时,每隔500毫秒(ms)保存一张图片,将图片实时的上传至自有的云服务器,用于图像识别。
自有云服务器在收到图片后,即时地完成对图片的图像识别,使用人工智能(Artificial Intelligence,简称为AI)技术,判断图像中是否有目标 有害生物,例如老鼠、蟑螂等,或是只是异物飞入等非虫害侵袭场景。即进入数据分析部分。
数据分析部分通过自有云进行图像识别,对视频监控设备所回传的图像应用图像识别算法,进行老鼠、蟑螂等虫鼠害的识别。当识别为真,则认为该时刻发现了鼠害、虫害,向视频云服务器发送请求,调取并下载该时间段的虫鼠害出没的视频数据以供进一步的分析(当服务器确认连续图片集接收完毕,且判断为有虫害入侵时,实时请求整个时间段的视频);当识别为假,则认为该时刻的动态识别与虫鼠害无关,不作进一步处理。
可选的,为了提高判别准确率,引入人工复核,以确认每次被检出的都确实是有老鼠、蟑螂等出没,增加对虫鼠害判别的准确率。
即时告警部分可以用于紧急灭鼠,当通过对图片集的识别,检测到老鼠出没时,云服务端向用户终端发送报警信息,指示餐厅运营人员、虫害防治人员采取措施。并提供图像回放,标示出老鼠、蟑螂等被识别出的有害生物,便于操作人员初步判断其出现的位置与危害,并采取及时的控制措施。
紧急灭鼠场景适合机房、医院等不容许有鼠患发生的场所的监控,有人值守。在发现鼠情后立即指示相关人员采取措施,系统负责及时提供图片以及视频回放,供灭鼠参考。
可选的,报警信息也可以通过短信、推送信息等方式发送。
视频回放部分当视频云服务器返回了所请求的视频数据,并下载到自有云后,用户终端可以访问视频回放数据。视频下载的速度视网络通畅与否来确定,比实时的图片展示稍慢,一般能在鼠情发生后的几分钟内获取到视频回放数据。
路径分析部分通过对视频数据的进一步分析,提取出老鼠、蟑螂等有害生物的移动路径,标记出老鼠出没时的入侵点、藏匿点、行进路线、活动时长、皮肤颜色等信息,供制定控鼠、控虫的进一步的方案,在用户终端予以显示。
老鼠路径显示可采用标点表示,以一串从小到大的数字标注在线段上,以表示老鼠或蟑螂的行进方向。
APP显示部分可以显示灭鼠、灭虫建议,用于常规虫鼠害防治,汇总各个接触点收集的虫鼠害信息,根据虫鼠害出没的历史路径,结合所在场所适合部署粘鼠板、蟑螂屋等器械的位置,给出放置的位置建议。
通过APP呈现给餐厅运营人员以及虫害防治人员,自动按天出具报告,通过微信公众号、短信等可选的方式,推送给餐厅运营或相关人员。
用以展示的数据维度还可以包括前一天/当天晚上的虫鼠害活跃时长、虫害种类、捕获数量等。
本申请的实施例还提供了一种存储介质,该存储介质中存储有计算机程序,其中,该计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。
可选地,在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的计算机程序:
S1,第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,图像是从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的图像;
S2,第一服务器根据图像确定对象是否为目标对象。
可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(Read-Only Memory,简称为ROM)、随机存取存储器(Random Access Memory,简称为RAM)、移动硬盘、磁碟或者光盘等各种可以存储计算机程序的介质。
本申请的实施例还提供了一种电子装置,包括存储器和处理器,该存储器中存储有计算机程序,该处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。
可选地,上述电子装置还可以包括传输设备以及输入输出设备,其中,该传输设备和上述处理器连接,该输入输出设备和上述处理器连接。
可选地,在本实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:
S1,第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,图像是从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的图像;
S2,第一服务器根据图像确定对象是否为目标对象。
可选地,本实施例中的具体示例可以参考上述实施例及可选实施方式中所描述的示例,本实施例在此不再赘述。
显然,本领域的技术人员应该明白,上述的本申请的各模块或各步骤可以用通用的计算装置来实现,它们可以集中在单个的计算装置上,或者分布在多个计算装置所组成的网络上,可选地,它们可以用计算装置可执行的程序代码来实现,从而,可以将它们存储在存储装置中由计算装置来执行,并且在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤,或者将它们分别制作成各个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本申请不限制于任何特定的硬件和软件结合。
以上所述仅为本申请的可选实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。
工业实用性:通过上述描述可知,本申请通过第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,图像是从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的图像;第一服务器根据图像确定对象是否为目标对象的 方式,第一服务器根据从视频监控设备获取的图像确定目标区域中出现的对象是否为目标对象,该图像是视频监控设备在检测到目标区域中出现了移动的对象的情况下,从视频监控设备对目标区域进行拍摄得到的视频中出现了对象的目标视频上获取的,由此视频监控设备只需在检测到目标区域中出现了移动的对象的情况下向第一服务器发送可能存在对象的图像,第一服务器即可根据接收到的图像确定目标区域出现的对象是否为目标对象,可见相对于根据视频监控目标对象的方式,能够大大减少传输数据的数据量,从而提高传输速度,减少传输时间,提高监控效率。因此,可以解决相关技术中对目标对象进行监控的效率较低的问题,达到提高对目标对象进行监控的效率的效果。

Claims (34)

  1. 一种目标对象的监控方法,包括:
    第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,所述图像是从所述视频监控设备对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取的图像;
    所述第一服务器根据所述图像确定所述对象是否为目标对象。
  2. 根据权利要求1所述的方法,其中,在所述第一服务器根据所述图像确定所述对象是否为目标对象之后,所述方法还包括:
    在确定出所述对象为所述目标对象的情况下,所述第一服务器获取所述目标视频。
  3. 根据权利要求2所述的方法,其中,所述第一服务器获取所述目标视频包括:
    所述第一服务器从所述视频监控设备获取所述目标视频;或者,
    所述第一服务器从第二服务器获取所述目标视频,其中,所述目标视频是由所述视频监控设备在检测到目标区域中出现了移动的对象的情况下发送至所述第二服务器的。
  4. 根据权利要求3所述的方法,其中,在所述第一服务器根据所述图像确定所述对象是否为目标对象之后,所述方法还包括:
    在确定出所述对象不为所述目标对象的情况下,所述第一服务器向所述第二服务器发送指示信息,其中,所述指示信息用于指示所述第二服务器删除所述目标视频。
  5. 根据权利要求2所述的方法,其中,在所述第一服务器获取 所述目标视频之后,所述方法还包括:
    所述第一服务器在所述目标视频中确定出所述目标对象在所述目标区域中的移动轨迹。
  6. 根据权利要求5所述的方法,其中,在所述第一服务器在所述目标视频中确定出所述目标对象在所述目标区域中的移动轨迹之后,所述方法还包括:
    所述第一服务器根据所述移动轨迹生成提示信息,其中,所述提示信息用于提示消除所述目标对象的方式。
  7. 根据权利要求6所述的方法,其中,在所述第一服务器根据所述移动轨迹生成提示信息之后,所述方法还包括:
    所述第一服务器生成所述目标对象对应的告警信息,其中,所述告警信息用于指示在所述目标区域出现了所述目标对象,所述告警信息中包括以下至少之一:所述目标视频、所述移动轨迹、所述提示信息;
    所述第一服务器将所述告警信息发送至客户端。
  8. 根据权利要求1所述的方法,其中,在第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像之前,所述方法还包括:
    所述视频监控设备在检测到目标区域中出现了移动的对象的情况下,从所述目标区域中出现了所述对象开始每隔预定时间从所述视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至所述对象不再出现在所述目标区域中,所述图像包括所述视频图像;
    所述视频监控设备将截取的所述视频图像实时发送至所述第一 服务器;或者,所述视频监控设备获取包括截取到的全部视频图像的图像集,并将所述图像集发送至所述第一服务器。
  9. 根据权利要求8所述的方法,其中,所述第一服务器根据所述图像确定所述对象是否为目标对象包括:
    所述第一服务器识别接收到的每一张所述视频图像中的所述对象是否为所述目标对象,得到每一张所述视频图像对应的识别结果;
    所述第一服务器将接收到的全部所述视频图像对应的识别结果融合为目标结果;
    所述第一服务器根据所述目标结果确定所述对象是否为目标对象。
  10. 根据权利要求9所述的方法,其中,所述第一服务器识别接收到的每一张所述视频图像中的所述对象是否为所述目标对象包括:
    所述第一服务器确定接收到的每一张所述视频图像中是否出现了所述对象;
    所述第一服务器识别出现了所述对象的所述视频图像中的所述对象是否为所述目标对象。
  11. 根据权利要求1所述的方法,其中,所述第一服务器根据所述图像确定所述对象是否为目标对象包括:
    所述第一服务器对每个目标视频帧图像进行目标对象的检测,得到每个所述目标视频帧图像的图像特征,其中,所述图像包括从所述目标视频上获取的多个目标视频帧图像,每个所述目标视频帧图像用于指示在所述目标区域中的所述对象,所述图像特征用于表示在所述对象中,与所述目标对象之间的相似度大于第一阈值的对象所在的目 标图像区域;
    所述第一服务器根据每个所述目标视频帧图像的图像特征确定出运动特征,其中,所述运动特征用于表示所述多个目标视频帧图像中所述对象的运动速度和运动方向;
    所述第一服务器根据所述运动特征和每个所述目标视频帧图像的图像特征,确定所述多个目标视频帧图像中是否出现有所述目标对象。
  12. 根据权利要求11所述的方法,其中,所述第一服务器根据每个所述目标视频帧图像的图像特征确定出运动特征包括:
    获取与每个所述目标视频帧图像的图像特征所表示的目标图像区域对应的目标矢量,得到多个目标矢量,其中,每个所述目标矢量用于表示对应的一个所述目标视频帧图像中所述对象在经过所述目标图像区域时的运动速度和运动方向;将所述多个目标矢量按照每个所述目标视频帧图像在所述视频文件中的时间顺序组成第一目标向量,其中,所述运动特征包括所述第一目标向量;或者
    获取与每个所述目标视频帧图像的图像特征所表示的目标图像区域对应的二维光流图,得到多个二维光流图,其中,每个所述二维光流图包括对应的一个所述目标视频帧图像中所述对象在经过所述目标图像区域时的运动速度和运动方向;将所述多个二维光流图按照每个所述目标视频帧图像在所述视频文件中的时间顺序组成三维第二目标向量,其中,所述运动特征包括所述三维第二目标向量。
  13. 根据权利要求11所述的方法,其中,所述第一服务器根据所述运动特征和每个所述目标视频帧图像的图像特征,确定所述多个目标视频帧图像中是否出现有所述目标对象包括:
    将所述运动特征和每个所述目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果,其中,所述对象识别结果用于表示所述多个目标视频帧图像中是否出现有所述目标对象。
  14. 根据权利要求13所述的方法,其中,将所述运动特征和每个所述目标视频帧图像的图像特征输入到预先训练好的神经网络模型中,得到对象识别结果包括:
    将每个所述图像特征经过包括卷积层、正则化层和激活函数层的神经网络层结构,得到多个第一特征向量;将所述多个第一特征向量与所述运动特征进行融合,得到第二特征向量;将所述第二特征向量输入到全连接层进行分类,得到第一分类结果,其中,所述神经网络模型包括所述神经网络层结构和所述全连接层,所述对象识别结果包括所述第一分类结果,所述第一分类结果用于表示所述多个目标视频帧图像中是否出现有所述目标对象;或者
    将每个所述图像特征经过包括卷积层、正则化层和激活函数层的第一神经网络层结构,得到多个第一特征向量;将所述运动特征经过包括卷积层、正则化层、激活函数层的第二神经网络层结构,得到第二特征向量;将所述多个第一特征向量与所述第二特征向量进行融合,得到第三特征向量;将所述第三特征向量输入到全连接层进行分类,得到第二分类结果,其中,所述神经网络模型包括所述第一神经网络层结构、所述第二神经网络层结构和所述全连接层,所述对象识别结果包括所述第二分类结果,所述第二分类结果用于表示所述多个目标视频帧图像中是否出现有所述目标对象。
  15. 根据权利要求11所述的方法,其中,所述第一服务器接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像包括:
    所述第一服务器接收视频监控设备发送的所述多个目标视频帧图像,其中,所述多个目标视频帧图像是通过所述视频监控设备对所述目标视频进行抽帧采样,得到一组视频帧图像,并根据所述一组视频帧图像中的像素点的像素值在所述一组视频帧图像中确定的;或者,
    所述第一服务器接收视频监控设备发送的一组视频帧图像,其中,所述一组视频帧图像是通过所述视频监控设备对所述目标视频进行抽帧采样得到的;所述第一服务器根据所述一组视频帧图像中的像素点的像素值在所述一组视频帧图像中确定出所述多个目标视频帧图像。
  16. 根据权利要求1至15中任一项所述的方法,其中,所述第一服务器包括:第一云服务器。
  17. 根据权利要求3所述的方法,其中,所述第二服务器包括:第二云服务器。
  18. 一种目标对象的监控方法,包括:
    视频监控设备在检测到目标区域中出现了移动的对象的情况下,从所述视频监控设备对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取图像;
    所述视频监控设备将所述图像发送至第一服务器,其中,所述图像用于指示所述第一服务器根据所述图像确定所述对象是否为目标对象。
  19. 根据权利要求18所述的方法,其中,在检测到目标区域中出现了移动的对象的情况下,所述方法还包括:
    所述视频监控设备将所述目标视频发送至第二服务器,其中,所述第二服务器用于在接收到所述第一服务器发送的第一请求的情况 下,响应所述第一请求将所述目标视频发送至所述第一服务器。
  20. 根据权利要求18所述的方法,其中,在所述视频监控设备将所述图像发送至第一服务器之后,所述方法还包括:
    所述视频监控设备接收所述第一服务器发送的第二请求;
    所述视频监控设备响应所述第二请求将所述目标视频发送至所述第一服务器。
  21. 根据权利要求18所述的方法,其中,
    从所述视频监控设备对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取图像包括:所述视频监控设备在检测到目标区域中出现了移动的对象的情况下,从所述目标区域中出现了所述对象开始每隔预定时间从所述视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至所述对象不再出现在所述目标区域中,所述图像包括所述视频图像;
    所述视频监控设备将所述图像发送至第一服务器包括:所述视频监控设备将截取的所述视频图像实时发送至所述第一服务器;或者,所述视频监控设备获取包括截取到的全部视频图像的图像集,并将所述图像集发送至所述第一服务器。
  22. 根据权利要求18所述的方法,其中,在检测到目标区域中出现了移动的对象的情况下,所述方法还包括:
    所述视频监控设备从对所述目标区域进行拍摄得到的视频中获取从所述目标区域中出现所述对象开始直至所述目标区域中不再出现所述对象为止的第一视频;
    所述视频监控设备获取所述目标区域中出现所述对象之前的第 一目标时间段的第二视频以及所述目标区域中不再出现所述对象之后的第二目标时间段的第三视频;
    所述视频监控设备将所述第二视频,所述第一视频和所述第三视频确定为所述目标视频。
  23. 一种目标对象的监控系统,包括:视频监控设备和第一服务器,其中,
    所述视频监控设备与所述第一服务器连接;
    所述视频监控设备设置为在检测到目标区域中出现了移动的对象的情况下,从对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取图像,并将所述图像发送至所述第一服务器;
    所述第一服务器设置为根据所述图像确定所述对象是否为目标对象。
  24. 根据权利要求23所述的系统,其中,所述视频监控设备设置为:
    在检测到目标区域中出现了移动的对象的情况下,从所述目标区域中出现了所述对象开始每隔预定时间从所述视频监控设备对目标区域进行拍摄得到的视频中截取视频图像,直至所述对象不再出现在所述目标区域中,所述图像包括所述视频图像;
    将截取的所述视频图像实时发送至所述第一服务器;或者,获取包括截取到的全部视频图像的图像集,并将所述图像集发送至所述第一服务器。
  25. 根据权利要求24所述的系统,其中,所述第一服务器设置为:
    识别接收到的每一张所述视频图像中的所述对象是否为所述目标对象,得到每一张所述视频图像对应的识别结果;
    将接收到的全部所述视频图像对应的识别结果融合为目标结果;
    根据所述目标结果确定所述对象是否为目标对象。
  26. 根据权利要求23所述的系统,其中,所述第一服务器还设置为:
    在确定出所述对象为所述目标对象的情况下,获取所述目标视频;
    在所述目标视频中确定出所述目标对象在所述目标区域中的移动轨迹;
    根据所述移动轨迹生成提示信息,其中,所述提示信息用于提示消除所述目标对象的方式;
    生成所述目标对象对应的告警信息,其中,所述告警信息用于指示在所述目标区域出现了所述目标对象,所述告警信息中包括以下至少之一:所述目标视频、所述移动轨迹、所述提示信息。
  27. 根据权利要求26所述的系统,其中,所述系统还包括:客户端,其中,
    所述第一服务器与所述客户端连接;
    所述第一服务器设置为将所述告警信息发送至所述客户端;
    所述客户端设置为在显示界面上显示所述告警信息。
  28. 根据权利要求26所述的系统,其中,所述系统还包括:第二服务器,其中,
    所述第二服务器与所述视频监控设备和所述第一服务器连接;
    所述视频监控设备还设置为将所述视频发送至所述第二服务器;
    所述第二服务器设置为存储所述目标视频;
    所述第一服务器设置为从所述第二服务器获取所述目标视频。
  29. 根据权利要求28所述的系统,其中,
    所述第一服务器还设置为:在确定所述对象不为所述目标对象的情况下,向所述第二服务器发送指示信息;
    所述第二服务器设置为:响应所述指示信息删除所述目标视频。
  30. 根据权利要求26所述的系统,其中,所述视频监控设备还设置为:
    从对所述目标区域进行拍摄得到的视频中获取从所述目标区域中出现所述对象开始直至所述目标区域中不再出现所述对象为止的第一视频;
    获取所述目标区域中出现所述对象之前的第一目标时间段的第二视频以及所述目标区域中不再出现所述对象之后的第二目标时间段的第三视频;
    将所述第二视频,所述第一视频和所述第三视频确定为所述目标视频。
  31. 一种目标对象的监控装置,应用于第一服务器,包括:
    接收模块,设置为接收视频监控设备在检测到目标区域中出现了移动的对象的情况下发送的图像,其中,所述图像是从所述视频监控设备对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取的图像;
    确定模块,设置为根据所述图像确定所述对象是否为目标对象。
  32. 一种目标对象的监控装置,应用于视频监控设备,包括:
    获取模块,设置为在检测到目标区域中出现了移动的对象的情况下,从所述视频监控设备对目标区域进行拍摄得到的视频中出现了所述对象的目标视频上获取图像;
    发送模块,设置为将所述图像发送至第一服务器,其中,所述图像用于指示所述第一服务器根据所述图像确定所述对象是否为目标对象。
  33. 一种存储介质,所述存储介质中存储有计算机程序,其中,所述计算机程序被设置为运行时执行所述权利要求1至22任一项中所述的方法。
  34. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为运行所述计算机程序以执行所述权利要求1至22任一项中所述的方法。
PCT/CN2019/080747 2019-01-24 2019-04-01 目标对象的监控方法、装置及系统 WO2020151084A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019570566A JP7018462B2 (ja) 2019-01-24 2019-04-01 目標対象物の監視方法、装置及びシステム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910068774.0 2019-01-24
CN201910068774.0A CN109919009A (zh) 2019-01-24 2019-01-24 目标对象的监控方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2020151084A1 true WO2020151084A1 (zh) 2020-07-30

Family

ID=66960691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/080747 WO2020151084A1 (zh) 2019-01-24 2019-04-01 目标对象的监控方法、装置及系统

Country Status (3)

Country Link
JP (1) JP7018462B2 (zh)
CN (1) CN109919009A (zh)
WO (1) WO2020151084A1 (zh)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101344A (zh) * 2020-08-25 2020-12-18 腾讯科技(深圳)有限公司 一种视频文本跟踪方法及装置
CN112199993A (zh) * 2020-09-01 2021-01-08 广西大学 基于人工智能识别任意方向变电站绝缘子红外图像检测模型的方法
CN112437274A (zh) * 2020-11-17 2021-03-02 浙江大华技术股份有限公司 一种抓拍图片的传输方法及抓拍机
CN112565863A (zh) * 2020-11-26 2021-03-26 深圳Tcl新技术有限公司 视频播放方法、装置、终端设备及计算机可读存储介质
CN112633131A (zh) * 2020-12-18 2021-04-09 宁波长壁流体动力科技有限公司 一种基于深度学习视频识别的井下自动跟机方法
CN112784738A (zh) * 2021-01-21 2021-05-11 上海云从汇临人工智能科技有限公司 运动目标检测告警方法、装置以及计算机可读存储介质
CN112836089A (zh) * 2021-01-28 2021-05-25 浙江大华技术股份有限公司 运动轨迹的确认方法及装置、存储介质、电子装置
CN113055654A (zh) * 2021-03-26 2021-06-29 太原师范学院 边缘设备中的视频流有损压缩方法
CN113221800A (zh) * 2021-05-24 2021-08-06 珠海大横琴科技发展有限公司 一种待检测目标的监控判断方法及系统
CN113435368A (zh) * 2021-06-30 2021-09-24 青岛海尔科技有限公司 监控数据的识别方法和装置、存储介质及电子装置
CN113609317A (zh) * 2021-09-16 2021-11-05 杭州海康威视数字技术股份有限公司 一种图像库构建方法、装置及电子设备
CN114241420A (zh) * 2021-12-20 2022-03-25 国能(泉州)热电有限公司 一种动火作业检测方法及装置
CN114403047A (zh) * 2022-02-09 2022-04-29 上海依蕴宠物用品有限公司 一种基于图像分析技术的老龄动物健康干预方法及系统
CN115150371A (zh) * 2022-08-31 2022-10-04 深圳市万佳安物联科技股份有限公司 基于云平台的业务处理方法、系统及储存介质
CN115187916A (zh) * 2022-09-13 2022-10-14 太极计算机股份有限公司 基于时空关联的建筑内疫情防控方法、装置、设备和介质
CN115457447A (zh) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 运动物体识别的方法、装置、系统及电子设备、存储介质
CN116684626A (zh) * 2023-08-04 2023-09-01 广东星云开物科技股份有限公司 视频压缩方法和共享售卖柜
CN116890668A (zh) * 2023-09-07 2023-10-17 国网浙江省电力有限公司台州供电公司 信息同步互联的安全充电方法及充电装置
CN117392596A (zh) * 2023-09-07 2024-01-12 中关村科学城城市大脑股份有限公司 数据处理方法、装置、电子设备和计算机可读介质
CN117671597A (zh) * 2023-12-25 2024-03-08 北京大学长沙计算与数字经济研究院 一种老鼠检测模型的构建方法和老鼠检测方法及装置

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472492A (zh) * 2019-07-05 2019-11-19 平安国际智慧城市科技股份有限公司 目标生物检测方法、装置、计算机设备和存储介质
CN110516535A (zh) * 2019-07-12 2019-11-29 杭州电子科技大学 一种基于深度学习的老鼠活跃度检测方法和系统、及卫生评估方法
CN111753609B (zh) * 2019-08-02 2023-12-26 杭州海康威视数字技术股份有限公司 一种目标识别的方法、装置及摄像机
CN110674793A (zh) * 2019-10-22 2020-01-10 上海秒针网络科技有限公司 调味品容器加盖监测方法及系统
CN111126317B (zh) * 2019-12-26 2023-06-23 腾讯科技(深圳)有限公司 一种图像处理方法、装置、服务器及存储介质
CN111553238A (zh) * 2020-04-23 2020-08-18 北京大学深圳研究生院 一种用于动作的时间轴定位的回归分类模块和方法
CN111611938B (zh) * 2020-05-22 2023-08-29 浙江大华技术股份有限公司 一种逆行方向确定方法及装置
EP3929801A1 (en) * 2020-06-25 2021-12-29 Axis AB Training of an object recognition neural network
CN112001457A (zh) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 图像预处理方法、装置、系统和计算机可读存储介质
CN111898581B (zh) * 2020-08-12 2024-05-17 成都佳华物链云科技有限公司 动物检测方法、装置、电子设备及可读存储介质
CN112311966A (zh) * 2020-11-13 2021-02-02 深圳市前海手绘科技文化有限公司 一种短视频中动态镜头制作的方法和装置
CN112861826B (zh) * 2021-04-08 2021-12-14 重庆工程职业技术学院 基于视频图像的煤矿监管方法、系统、设备及存储介质
CN113487821A (zh) * 2021-07-30 2021-10-08 重庆予胜远升网络科技有限公司 基于机器视觉的电力设备异物入侵识别系统及方法
CN114051124B (zh) * 2022-01-17 2022-05-20 深圳市华付信息技术有限公司 支持多区域监控的视频监控方法、装置、设备及存储介质
CN115091472B (zh) * 2022-08-26 2022-11-22 珠海市南特金属科技股份有限公司 基于人工智能的目标定位方法及装夹机械手控制系统
TWI826129B (zh) * 2022-11-18 2023-12-11 英業達股份有限公司 週期時間偵測及修正系統與方法
CN117221391B (zh) * 2023-11-09 2024-02-23 天津华来科技股份有限公司 基于视觉语义大模型的智能摄像机推送方法、装置及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160366346A1 (en) * 2015-06-12 2016-12-15 Google Inc. Using infrared images of a monitored scene to identify windows
CN106559645A (zh) * 2015-09-25 2017-04-05 杭州海康威视数字技术股份有限公司 基于摄像机的监控方法、系统和装置
CN106878666A (zh) * 2015-12-10 2017-06-20 杭州海康威视数字技术股份有限公司 基于监控摄像机来查找目标对象的方法、装置和系统
CN107358160A (zh) * 2017-06-08 2017-11-17 小草数语(北京)科技有限公司 终端监控视频处理方法、监控终端以及服务器
CN108259830A (zh) * 2018-01-25 2018-07-06 深圳冠思大数据服务有限公司 基于云服务器的鼠患智能监控系统和方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004266735A (ja) 2003-03-04 2004-09-24 Ecore Kk Rat monitoring system
US7746378B2 (en) * 2004-10-12 2010-06-29 International Business Machines Corporation Video analysis, archiving and alerting methods and apparatus for a distributed, modular and extensible video surveillance system
CN101854516B (zh) * 2009-04-02 2014-03-05 Beijing Vimicro Corporation Video surveillance system, video surveillance server, and video surveillance method
JP2011197365A (ja) 2010-03-19 2011-10-06 Panasonic Corp Video display device and video display method
WO2017208356A1 (ja) 2016-05-31 2017-12-07 Optim Corporation IoT control system, IoT control method, and program
WO2019043855A1 (ja) 2017-08-31 2019-03-07 Mitsubishi Electric Corporation Data transmission device, data processing system, and data transmission method

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101344A (zh) * 2020-08-25 2020-12-18 Tencent Technology (Shenzhen) Co., Ltd. Video text tracking method and apparatus
CN112199993B (zh) * 2020-09-01 2022-08-09 Guangxi University Artificial-intelligence-based method for an infrared image detection model recognizing substation insulators in arbitrary orientations
CN112199993A (zh) * 2020-09-01 2021-01-08 Guangxi University Artificial-intelligence-based method for an infrared image detection model recognizing substation insulators in arbitrary orientations
CN112437274A (zh) * 2020-11-17 2021-03-02 Zhejiang Dahua Technology Co., Ltd. Snapshot image transmission method and capture camera
CN112565863A (zh) * 2020-11-26 2021-03-26 Shenzhen TCL New Technology Co., Ltd. Video playback method, apparatus, terminal device, and computer-readable storage medium
CN112633131A (zh) * 2020-12-18 2021-04-09 Ningbo Changbi Fluid Power Technology Co., Ltd. Underground automatic machine-following method based on deep-learning video recognition
CN112633131B (zh) * 2020-12-18 2022-09-13 Ningbo Changbi Fluid Power Technology Co., Ltd. Underground automatic machine-following method based on deep-learning video recognition
CN112784738B (zh) * 2021-01-21 2023-09-19 Shanghai CloudWalk Huilin Artificial Intelligence Technology Co., Ltd. Moving target detection and alarm method, apparatus, and computer-readable storage medium
CN112784738A (zh) * 2021-01-21 2021-05-11 Shanghai CloudWalk Huilin Artificial Intelligence Technology Co., Ltd. Moving target detection and alarm method, apparatus, and computer-readable storage medium
CN112836089A (zh) * 2021-01-28 2021-05-25 Zhejiang Dahua Technology Co., Ltd. Motion trajectory confirmation method and apparatus, storage medium, and electronic device
CN112836089B (zh) * 2021-01-28 2023-08-22 Zhejiang Dahua Technology Co., Ltd. Motion trajectory confirmation method and apparatus, storage medium, and electronic device
CN113055654A (zh) * 2021-03-26 2021-06-29 Taiyuan Normal University Lossy compression method for video streams on edge devices
CN113221800A (zh) * 2021-05-24 2021-08-06 Zhuhai Da Hengqin Technology Development Co., Ltd. Monitoring and judgment method and system for targets to be detected
CN113435368A (zh) * 2021-06-30 2021-09-24 Qingdao Haier Technology Co., Ltd. Surveillance data recognition method and apparatus, storage medium, and electronic device
CN113435368B (zh) * 2021-06-30 2024-03-22 Qingdao Haier Technology Co., Ltd. Surveillance data recognition method and apparatus, storage medium, and electronic device
CN113609317A (zh) * 2021-09-16 2021-11-05 Hangzhou Hikvision Digital Technology Co., Ltd. Image library construction method, apparatus, and electronic device
CN113609317B (zh) * 2021-09-16 2024-04-02 Hangzhou Hikvision Digital Technology Co., Ltd. Image library construction method, apparatus, and electronic device
CN114241420A (zh) * 2021-12-20 2022-03-25 Guoneng (Quanzhou) Thermal Power Co., Ltd. Hot-work operation detection method and apparatus
CN114403047B (zh) * 2022-02-09 2023-01-06 Shanghai Yiyun Pet Products Co., Ltd. Image-analysis-based health intervention method and system for aging animals
CN114403047A (zh) * 2022-02-09 2022-04-29 Shanghai Yiyun Pet Products Co., Ltd. Image-analysis-based health intervention method and system for aging animals
CN115150371A (zh) * 2022-08-31 2022-10-04 Shenzhen Wanjiaan Internet of Things Technology Co., Ltd. Cloud-platform-based service processing method, system, and storage medium
CN115187916A (zh) * 2022-09-13 2022-10-14 Taiji Computer Co., Ltd. Spatiotemporal-correlation-based in-building epidemic prevention and control method, apparatus, device, and medium
CN115457447A (zh) * 2022-11-07 2022-12-09 Zhejiang Lianhe Technology Co., Ltd. Moving object recognition method, apparatus, and system, electronic device, and storage medium
CN116684626A (zh) * 2023-08-04 2023-09-01 Guangdong Xingyun Kaiwu Technology Co., Ltd. Video compression method and shared vending cabinet
CN116684626B (zh) * 2023-08-04 2023-11-24 Guangdong Xingyun Kaiwu Technology Co., Ltd. Video compression method and shared vending cabinet
CN116890668A (zh) * 2023-09-07 2023-10-17 State Grid Zhejiang Electric Power Co., Ltd. Taizhou Power Supply Company Safe charging method and charging device with synchronized information interconnection
CN116890668B (zh) * 2023-09-07 2023-11-28 State Grid Zhejiang Electric Power Co., Ltd. Hangzhou Power Supply Company Safe charging method and charging device with synchronized information interconnection
CN117392596A (zh) * 2023-09-07 2024-01-12 Zhongguancun Science City City Brain Co., Ltd. Data processing method, apparatus, electronic device, and computer-readable medium
CN117392596B (zh) * 2023-09-07 2024-04-30 Zhongguancun Science City City Brain Co., Ltd. Data processing method, electronic device, and computer-readable medium
CN117671597A (zh) * 2023-12-25 2024-03-08 Peking University Changsha Institute for Computing and Digital Economy Rat detection model construction method, and rat detection method and apparatus

Also Published As

Publication number Publication date
JP2021514548A (ja) 2021-06-10
JP7018462B2 (ja) 2022-02-10
CN109919009A (zh) 2019-06-21

Similar Documents

Publication Publication Date Title
WO2020151084A1 (zh) Target object monitoring method, apparatus, and system
CN109922310B (zh) Target object monitoring method, apparatus, and system
CN109886130B (zh) Target object determination method, apparatus, storage medium, and processor
WO2020151083A1 (zh) Region determination method, apparatus, storage medium, and processor
US20220301300A1 (en) Processing method for augmented reality scene, terminal device, system, and computer storage medium
CN109886999B (zh) Position determination method, apparatus, storage medium, and processor
JP7229662B2 (ja) Method for issuing alerts in a video surveillance system
CN101918989B (zh) Video surveillance system with object tracking and retrieval
KR102296088B1 (ko) Pedestrian tracking method and electronic device
CN109886129B (zh) Prompt information generation method and apparatus, storage medium, and electronic device
CN104303193B (zh) Cluster-based object classification
WO2021139049A1 (zh) Detection method, detection apparatus, monitoring device, and computer-readable storage medium
CN106559645B (zh) Camera-based monitoring method, system, and apparatus
AU2012340862A1 (en) Geographic map based control
CN112733690A (zh) High-altitude falling object detection method, apparatus, and electronic device
US11134221B1 (en) Automated system and method for detecting, identifying and tracking wildlife
WO2021063046A1 (zh) Distributed target monitoring system and method
JP6787831B2 (ja) Object detection device capable of learning from search results, detection model generation device, program, and method
CN108288017A (zh) Method and apparatus for obtaining object density
CN109831634A (zh) Method and apparatus for determining density information of target objects
CN111291646A (zh) Pedestrian flow statistics method, apparatus, device, and storage medium
KR101944374B1 (ko) Abnormal object detection apparatus and method, and imaging apparatus including the same
KR102424098B1 (ko) Drone detection apparatus and method using deep learning
KR102171384B1 (ko) Object recognition system and method using an image correction filter
CN111681269B (zh) Spatial-consistency-based multi-camera collaborative person tracking system and training method

Legal Events

Date Code Title Description
ENP Entry into the national phase
  Ref document number: 2019570566
  Country of ref document: JP
  Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application
  Ref document number: 19911023
  Country of ref document: EP
  Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
  Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 15.11.2021)
122 Ep: pct application non-entry in european phase
  Ref document number: 19911023
  Country of ref document: EP
  Kind code of ref document: A1