WO2021020866A1

WO2021020866A1 - Image analysis system and method for remote monitoring

Info

Publication number: WO2021020866A1
Application number: PCT/KR2020/009954
Authority: WO
Inventors: 심재술
Original assignee: (주)유디피
Priority date: 2019-07-31
Filing date: 2020-07-28
Publication date: 2021-02-04
Also published as: KR102247359B1; WO2021020866A9; KR20210014988A

Abstract

The present invention relates to an image analysis system and method for remote monitoring, whereby objects identified in images received periodically rather than in real time from a camera unit positioned at a remote location can be analyzed on the basis of deep learning to provide an event when an object to be monitored is detected, and thus the time required to analyze the object to be monitored can be reduced and costs can be lowered.

Description

Video analysis system and method for remote monitoring

The present invention relates to an image analysis system and method for remote monitoring, and more particularly, when detecting an object to be monitored by analyzing an object identified in an image periodically received from a camera unit located in a remote location instead of in real time, based on deep learning. The present invention relates to an image analysis system and method for remote monitoring capable of reducing time and cost for analyzing a monitored object by providing an event.

With the recent development of video transmission and video analysis related technologies and various video related devices, including cameras that support them, monitoring that receives high-quality images from cameras located in remote locations and analyzes the high-quality images to detect objects to be monitored. The development of the system is continuing.

However, video analysis methods for tracking and detection of objects to be monitored applied to existing monitoring systems presuppose images received in real time, so even if cameras located in remote locations over several kilometers are high performance, tens of frames per second are required. Not only is it difficult to transmit high-definition real-time images, but also a considerable cost is required to configure cameras to support them, so in configuring a monitoring system, a remotely located camera is operated to transmit images to a monitoring server that monitors the images. Users have difficulty in adopting such a high-performance camera.

Accordingly, users use low-cost cameras as cameras connected to the monitoring server, and these low-cost cameras transmit images at a very small frame rate per second. During analysis, false alarms are frequently output, which causes a problem of lowering the reliability of the monitoring system.

In order to solve the above-described problem, the present invention applies a deep learning-based image analysis method to an image received from a camera located at a remote location and has a very small number of transmission frames per second to easily detect an object to be monitored in the image. The purpose of this is to increase the efficiency and reliability of the system for video monitoring by providing support to provide events and shortening the time required for video analysis during deep learning-based video analysis.

In addition, an object of the present invention is to support to reduce system configuration cost by supporting reliable detection of an object to be monitored even when a low-cost camera with a very small number of transmission frames per second is used.

An image analysis system for remote monitoring according to an embodiment of the present invention includes an image collection unit that receives a plurality of images from a camera unit, and an object in which movement occurs by analyzing each of the plurality of images according to a preset image analysis algorithm is detected. An object extraction unit for extracting an object region for the object from the plurality of images; an image synthesis unit for generating a composite image obtained by combining at least one object region extracted from the object extraction unit into a single image; and the composite image A deep learning unit that identifies the object to be monitored by analyzing through a deep learning algorithm in which a pattern for the object to be monitored is set in advance, and object information on the identified object from the deep learning unit is received, and the object information is It may include an event determination unit that determines that an event occurs when a set event occurrence condition is satisfied.

As an example related to the present invention, the object information includes a type of an object and a degree of similarity with the object to be monitored, and the event determination unit includes a type of the object related to the object to be monitored in advance, and the type of the object according to the object information. If they coincide and the degree of similarity is equal to or greater than a preset reference value, it may be determined that an event occurs.

As an example related to the present invention, the event determination unit may further include an event notification unit for generating and outputting event information when an event occurs as a result of the determination of the event determination unit or transmitting it to a preset external device.

As an example related to the present invention, the object extraction unit generates a median image obtained by synthesizing the plurality of images, and the movement occurs among the plurality of images through a difference image from the median image for each of the plurality of images. It may be characterized in that the object region of the object is extracted for each image in which the object is detected.

As an example related to the present invention, the object extracting unit extracts the object region from the specific image along an outline of the region determined as the object from the specific image in which the object is detected, and the image combining unit It may be characterized in that one or more object regions extracted by the object extracting unit are synthesized into one composite image through a preset box filling problem.

As an example related to the present invention, the plurality of images may be images corresponding to an object detected by detection of a sensor configured in the camera unit or an image analysis by the camera unit.

An image analysis method for remote monitoring of a monitoring server that communicates with a camera unit through a communication network according to an embodiment of the present invention includes receiving a plurality of images from the camera unit, and analyzing each of the plurality of images in advance. Extracting an object region for the object from the plurality of images when an object in which movement has occurred by analyzing according to an algorithm is detected, and generating a composite image by combining the extracted one or more object regions into one image; Identifying an object to be monitored by analyzing the composite image through a deep learning algorithm in which a pattern for a predetermined object to be monitored is learned, receiving object information on the identified object, and generating an event in which the object information is preset When the condition is satisfied, it may include determining that an event has occurred.

The present invention provides an image analysis algorithm based on a real-time image at a number of frames per second due to a large data transmission distance and a low performance of the camera unit when transmitting a plurality of images to a monitoring server according to object detection from an image in a remote location. Even if an image is transmitted in the form of a snapshot that is insufficient to identify the object to be monitored through the camera unit, only the object region is separated and extracted from the plurality of images transmitted from the camera unit and synthesized into a single image. By supporting the ability to accurately detect whether the object detected by the device is an object to be monitored related to an event, the camera unit located in a remote location can be configured as a low-cost camera, thereby reducing system configuration costs and improving the reliability of the object analysis results. There is a guarantee effect.

In addition, the present invention does not analyze the entire area of each of the plurality of images received from the camera unit in the monitoring server through a deep learning algorithm, but separates only the object area where movement has occurred in each of the plurality of images, and then converts it into one image. The analysis time required for object identification of the deep learning algorithm can be greatly shortened by analyzing a single composite image synthesized by using a deep learning algorithm.Through this, even if the number of camera units communicating with the monitoring server is large, In addition to supporting rapid event determination based on identification, even if the number of camera units increases, the number of monitoring servers for accommodating them and hardware performance can be reduced, thereby reducing system configuration cost.

1 is a block diagram of an image analysis system for remote monitoring according to an embodiment of the present invention.

2 is a detailed configuration diagram of a monitoring server configuring an image analysis system for remote monitoring according to an embodiment of the present invention.

3 to 5 are diagrams illustrating operations for a process of extracting an object area and generating a composite image of a monitoring server according to an embodiment of the present invention.

6 and 7 are diagrams illustrating operations of a monitoring server for identification of an object to be monitored and an event generation process according to an embodiment of the present invention.

8 is a flowchart of an image analysis method for remote monitoring according to an embodiment of the present invention.

Hereinafter, detailed embodiments of the present invention will be described with reference to the drawings.

1 is a configuration diagram of an image analysis system for remote monitoring according to an embodiment of the present invention, including a camera unit 10 located at a remote location and a monitoring server 100 communicating through a communication network as shown. Can be.

In this case, the camera unit 10 may be configured as an IP (Internet Protocol) camera.

In addition, the camera unit 10 may include a sensor unit such as a passive infrared sensor (PIR) or perform self-image analysis, and when detecting an object that satisfies a preset condition through the sensing signal or image analysis of the sensor unit A plurality of images in which the object is detected may be transmitted to the monitoring server 100 based on the object detection time point.

In this case, each of the plurality of images may be composed of a frame that is a snapshot.

That is, the camera unit 10 has difficulty in transmitting a large amount of data related to a real-time high-definition image due to the distance from the monitoring server 100, so that the data capacity of the image, the data transmission speed of the camera unit 10, and the network environment Considering that, an image composed of a very small number of the snapshots per second may be generated and transmitted to the monitoring server 100 periodically during the time when an object is detected.

As an example, the camera unit 10 may generate and transmit an image composed of a snapshot of 2 frames per second.

In addition, the monitoring server 100 may receive a plurality of images according to the detection of one or more objects from the camera unit 10.

The monitoring server 100 detects and extracts an object moving in the image targeting a plurality of images in the form of a snapshot that is periodically transmitted instead of a real-time image, and selects an object to be monitored among one or more objects displayed in the plurality of images. By identifying based on deep learning, an event can be generated when a monitored object that satisfies a preset event condition is identified.

In this case, as an example of the object to be monitored described in the present invention, a moving person or vehicle may be set as the object to be monitored in the monitoring server 100.

That is, the monitoring server 100 can easily and accurately identify the object to be monitored for 10 frames or less, not a real-time image of several tens of frames per second, and provide an event for this, and through this, a camera unit located at a remote location ( Even if 10) is configured as a low-cost, low-performance camera, it is possible to reduce system configuration cost and increase system reliability by supporting the monitoring server 100 to easily monitor the object to be monitored.

Based on the above-described configuration, a detailed configuration of the monitoring server 100 constituting an image analysis system for remote monitoring according to an embodiment of the present invention will be described below with reference to the drawings.

2 is a configuration diagram of the monitoring server 100, as shown, an image collection unit 110, an object extraction unit 120, an image synthesis unit 130, a deep learning unit 140, and , An event determination unit 150 and an event notification unit 160 may be included.

At this time, at least one of the image collection unit 110, the object extraction unit 120, the image synthesis unit 130, the deep learning unit 140, the event determination unit 150, and the event notification unit 160 May be configured as a control unit that controls the monitoring server 100, and a component part of the monitoring server 100 other than the control unit may be controlled by the control unit.

In addition, the control unit executes an overall control function of the monitoring server 100 using programs and data stored in the monitoring server 100. The control unit may include RAM, ROM, CPU, GPU, and bus, and RAM, ROM, CPU, GPU, and the like may be connected to each other through a bus.

Detailed operation configurations of each component of the monitoring server 100 will be described with reference to FIGS. 3 to 7.

3 to 5 are diagrams illustrating operations of a process of extracting an object area and generating a composite image of the monitoring server 100 according to an embodiment of the present invention.

First, as shown in FIG. 3, the image collection unit 110 may receive a plurality of images from a camera unit 10 located at a remote location through a communication network.

In this case, the image collection unit 110 may store the plurality of images in the DB 101 included in the monitoring server 100.

In addition, the object extraction unit 120 receives a plurality of images from the image collection unit 110, detects an object in which movement has occurred in the plurality of images, and determines the object region of the detected object from the plurality of images. Can be extracted.

In this case, the object extraction unit 120 may analyze each of the plurality of images received from the image collection unit 110 through a preset image analysis algorithm to detect an object in which movement has occurred.

In addition, the object extracting unit 120 may apply a differential image method, a Model of Gaussian (MOG) algorithm using Gaussian Mixture Models (GMM), a codebook algorithm, and the like as the image analysis algorithm.

In addition, when a specific object in which movement has occurred is detected, the object extracting unit 120 may extract an object region corresponding to the specific object from each of at least one or more images in which the specific object is detected from among the plurality of images.

At this time, since the plurality of images transmitted from the camera unit 10 has a very small number of frames per second of about 2 frames, the object extraction unit 120 detects an object that has moved when the image analysis algorithm is applied to the plurality of images. Errors may occur in detection.

In order to prevent such an error, the object extraction unit 120 generates a median image obtained by synthesizing the plurality of images, and detects an object through a difference image from the median image for each of the plurality of images. The object area of the object may be extracted for each image in which the moving object is detected among the plurality of images.

Through this, the object extracting unit 120 may prevent an error when analyzing an image targeting a plurality of images having a very small number of frames per second.

In addition, the object extraction unit 120 may prevent an error by generating an analysis target image obtained by obtaining a horizontal edge and a vertical edge for each of a plurality of images, and calculating the difference between the analysis target images to detect the object. have.

Also, the object extraction unit 120 may detect one or more objects from each of the plurality of images.

Meanwhile, the image synthesizing unit 130 may interwork with the object extracting unit 120 to collect an object area for each object extracted from the plurality of images, and combine the object area for each object into a single image. have.

At this time, the image synthesis unit 130 synthesizes one or more object regions for each object extracted by the object extraction unit 120 from at least one of a plurality of images through a preset bin packing problem. Can be combined into images.

When a detailed example of the above-described operation content is described with reference to FIGS. 4 and 5, the object extracting unit 120 for a plurality of images collected by the image collection unit 110 as shown in FIG. 4 When an object related to a moving vehicle is detected through a predetermined image analysis algorithm for a plurality of images, an object region for each object may be extracted from the plurality of images.

In this case, the object extracting unit 120 may also detect nonsensical objects such as a shadow of a vehicle to be monitored or a light reflected by the vehicle as a moving object, together with the object to be monitored, and detect an object area according to such a meaningless object. To minimize, the object region may be extracted from a specific image in which a moving object is detected along an outline of an area determined as the moving object.

That is, the object extraction unit 120 is the object of the meaningless object, which is regarded as noise such as a change in a shadow or light attached to the object to be monitored when extracting the object area to a bounding box in the process of extracting the object-related object area. Due to the problem that the size of the bounding box of the object to be monitored increases more than necessary depending on the region, only the region related to the object in which movement has occurred can be extracted roughly along the outline without using the bounding box.

In addition, as shown in FIG. 5, the image synthesis unit 130 may collect one or more object regions extracted from the plurality of images by the object extraction unit 120 and combine them into one image.

In this case, the image synthesizing unit 130 may apply one or more object regions to a preset box filling problem-related algorithm as shown in Fig. 5(a) and synthesize it into one image. As shown in ), a composite image in which the one or more object regions are combined may be generated.

In the above-described configuration, the object extraction unit 120 may provide the position of the object in the image for each image in which the object is detected among a plurality of images to the image synthesis unit 130, and the image synthesis unit 130 ) May also record the position of the object in the image in the composite image.

On the other hand, FIGS. 6 and 7 are exemplary diagrams of the operation of the monitoring server 100 for identification of the object to be monitored and the event generation process according to the embodiment of the present invention. As shown in FIG. 6, the deep learning unit ( 140) receives a composite image from the image synthesizing unit 130, analyzes the composite image through a deep learning algorithm in which a preset pattern for the object to be monitored is learned, and identifies the object to be monitored from the composite image. have.

In this case, the deep learning unit 140 may continuously learn the composite images generated by the deep learning algorithm in response to images provided whenever an object is detected by the camera unit 10, and through the learning It is possible to learn the pattern of the object to be monitored in the learning algorithm.

As an example, the deep learning unit 140 may output object information for each object identified in the composite image through the deep learning algorithm, connected to the monitoring server 100 or through a separately configured output unit. The deep learning algorithm receives feedback information selected by a user as an object to be monitored from among object information output from the deep learning unit 140 through the user interface unit 170 to be received, and the deep learning algorithm based on the feedback information By modifying, the identification error of the object to be monitored can be reduced so that a pattern for the object to be monitored can be learned.

In this case, the deep learning unit 140 may learn a pattern of an object to be monitored, such as a person or vehicle, to the deep learning algorithm.

In addition, the deep learning algorithm is preferably Regions with Convolutional Neural Network (R-CNN), but is not limited thereto, and various neural network models may be applied.

In addition, the user interface unit 170 may be included in the monitoring server 100 and configured.

According to the above configuration, as shown in FIG. 6, the deep learning unit 140 analyzes one or more object regions included in the composite image through a deep learning algorithm, and among objects corresponding to the object region, the object to be monitored An object is identified through a deep learning algorithm, and object information including the object type of the object to be monitored and the similarity between the object type is generated in correspondence to the object area identified as the object to be monitored, and then the event determination unit 150 Can provide.

As an example, as shown in FIG. 6, the deep learning unit 140 corresponds to a specific object area among one or more object areas included in the composite image, and when a specific object identified is a person-related monitoring object, the object type is The object information including a degree of similarity to a person may be generated in relation to the specific object set as a person and identified as a monitoring target object.

As another example, as shown in FIG. 7, when the object identified in correspondence with the specific object area is a vehicle-related monitoring object, the object type is set as a vehicle, and the object type is set as a monitoring object. In relation to the identified object, the object information including a degree of similarity to a vehicle may be generated.

In this case, the composite image may include location information on a location (or location of an object) corresponding to the object area in the image corresponding to the object area for each object area, and the deep learning unit 140 Among the location information for each object area included in the image, location information of an object area corresponding to the specific object identified as the object to be monitored may be included in the object information corresponding to the specific object.

In addition, the deep learning unit 140 matches the object information generated for each object area identified as the object to be monitored among the one or more object areas with the object area corresponding to the object information in the composite image and adds it to the composite image. Alternatively, a composite image including the object information may be provided to the event determination unit 150.

On the other hand, the event determination unit 150 receives object information on the identified object from the deep learning unit 140, and when the object information satisfies a preset event occurrence condition, a plurality of images transmitted from the camera unit 10 It can be determined as the occurrence of an event corresponding to the image of

For example, when the type of the object according to the object information matches the type of the object related to the object to be monitored previously set, and the similarity is greater than or equal to a preset reference value, the event determination unit 150 may determine that an event has occurred.

That is, as shown in FIGS. 6 and 7, the event determination unit 150 sets an object type related to a person or vehicle as a monitoring target object among one or more object information provided from the deep learning unit 140, and If there is object information having a similarity with the person or vehicle equal to or greater than a preset reference value, it is determined that a preset event condition is satisfied, and the event is determined to have occurred in response to the camera unit 10 that transmitted the plurality of images. I can.

In addition, the event notification unit 160 may interwork with the event determination unit 150 to generate event information when the event determination unit 150 determines that an event has occurred and output it through the output unit.

In this case, when an event occurs, the event notification unit 160 may include a plurality of images corresponding to the event in the event information and output through the output unit.

In addition, the event notification unit 160 may transmit the event information to a preset external device through a communication network.

In addition, the event notification unit 160 may identify object information that satisfies the event occurrence condition in conjunction with the event determination unit 150, and monitor among a plurality of images based on the location information included in the object information. For each image in which the target object exists, a plurality of images in which the location of the object to be monitored is marked with a preset mark may be generated and then included in the event information and transmitted.

Meanwhile, in the above-described configuration, the monitoring server 100 may communicate with a plurality of different camera units 10 through a communication network.

In this case, the image collection unit 110 configured in the monitoring server 100 may allocate different channels to each of the plurality of camera units 10 and receive a plurality of images for each channel.

In addition, the monitoring server 100 may classify the plurality of camera units 10 through a channel, and individually determine whether an event occurs for each of the plurality of camera units 10 as described above.

As described above, the present invention provides an event by determining whether the object is a monitoring target object in a monitoring server for an object detected as an event by the camera unit, according to object detection in an image by a camera unit located at a remote location. When transmitting multiple images to the monitoring server, the data transmission distance is significant and the number of frames per second is transmitted in the form of a snapshot that is insufficient to identify the object to be monitored through an image analysis algorithm based on real-time images due to the low performance of the camera unit. Even in the case of such a case, as described above, whether the object detected by the camera is easily detected by the camera unit through a deep learning algorithm after separating and extracting only the object area from the plurality of images transmitted from the camera unit and combining it into one image. By supporting to accurately detect the object, it is possible to configure the camera unit located in a remote location as a low-cost camera, thereby reducing the cost of system configuration and guaranteeing the reliability of the object analysis result.

In addition, the present invention does not analyze the entire area of each of the plurality of images received from the camera unit in the monitoring server through a deep learning algorithm, but separates only the object area where movement has occurred in each of the plurality of images, and then converts it into one image. The analysis time required for object identification of the deep learning algorithm can be greatly shortened by analyzing a single composite image synthesized by using a deep learning algorithm.Through this, even if the number of camera units communicating with the monitoring server is large, In addition to supporting rapid event determination based on identification, even if the number of camera units increases, the number of monitoring servers and hardware performance for accommodating them can be reduced, thereby reducing system configuration cost.

8 is a flowchart illustrating an image analysis method for remote monitoring of a monitoring server communicating with a camera unit through a communication network according to an embodiment of the present invention.

First, the monitoring server 100 may receive a plurality of images from the camera unit 10 (S1).

In addition, the monitoring server 100 analyzes each of the plurality of images according to a preset image analysis algorithm (S2) and extracts an object region for the object from the plurality of images when an object in which movement has occurred (S3) is detected. I can (S4).

Next, the monitoring server 100 may generate a composite image obtained by combining the extracted one or more object regions into one image (S5).

In addition, the monitoring server 100 may identify the object to be monitored by analyzing the composite image through a deep learning algorithm in which a pattern for the object to be monitored is set in advance (S6).

Thereafter, the monitoring server 100 may receive object information on the identified object, determine that an event has occurred when the object information satisfies a preset event occurrence condition, and output event information according to the event occurrence or The event information may be transmitted to a set external device (S7).

The various devices and components described herein may be implemented by hardware circuitry (eg, CMOS-based logic circuitry), firmware, software, or a combination thereof. For example, it may be implemented using transistors, logic gates, and electronic circuits in the form of various electrical structures.

The above contents may be modified and modified without departing from the essential characteristics of the present invention by those of ordinary skill in the technical field to which the present invention pertains. Accordingly, the embodiments disclosed in the present invention are not intended to limit the technical idea of the present invention, but to explain the technical idea, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be interpreted as being included in the scope of the present invention.

Claims

An image collection unit receiving a plurality of images from the camera unit;

An object extracting unit that analyzes each of the plurality of images according to a preset image analysis algorithm and extracts an object region for the object from the plurality of images when a moving object is detected;

An image synthesis unit for generating a composite image obtained by combining one or more object regions extracted by the object extraction unit into one image;

A deep learning unit that identifies the object to be monitored by analyzing the composite image through a deep learning algorithm in which a pattern for the object to be monitored is learned in advance; And

An event determination unit that receives object information on the identified object from the deep learning unit and determines that an event occurs when the object information satisfies a preset event occurrence condition

Video analysis system for remote monitoring comprising a.
The method according to claim 1,

The object information includes a type of an object and a degree of similarity with the object to be monitored, and the event determination unit matches the type of the object according to the object information to a type of the object related to the object to be monitored, and the degree of similarity is preset. An image analysis system for remote monitoring, characterized in that it determines that an event has occurred when the value exceeds the reference value.
The method according to claim 1,

An image analysis system for remote monitoring, further comprising an event notification unit generating and outputting event information when an event occurs as a result of the determination of the event determination unit or transmitting it to a preset external device.
The method according to claim 1,

The object extraction unit generates a median image obtained by synthesizing the plurality of images, and for each of the plurality of images, the object for each image in which the moving object is detected through a difference image from the median image An image analysis system for remote monitoring, characterized in that extracting the object area of.
The method according to claim 1,

The object extracting unit extracts the object region from the specific image along an outline of the region determined to be the object from the specific image in which the object is detected,

The image synthesizing unit synthesizes one or more object regions extracted by the object extracting unit from at least one of a plurality of images into one composite image through a preset box filling problem.
The method according to claim 1,

The image analysis system for remote monitoring, wherein the plurality of images are images corresponding to an object detected by detection of a sensor configured in the camera unit or an image analysis by the camera unit.
In the image analysis method for remote monitoring of a monitoring server communicating through a camera unit and a communication network,

Receiving a plurality of images from the camera unit;

Analyzing each of the plurality of images according to a preset image analysis algorithm and extracting an object region for the object from the plurality of images when an object in which movement has occurred is detected;

Generating a composite image obtained by combining the extracted one or more object regions into one image;

Analyzing the composite image through a deep learning algorithm in which a predetermined pattern for the object to be monitored is learned to identify the object to be monitored; And

Receiving object information on the identified object, and determining that an event occurs when the object information satisfies a preset event occurrence condition

Image analysis method for remote monitoring comprising a.