KR102110502B1 - transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression - Google Patents

transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression

Info

Publication number
KR102110502B1
Authority
KR
South Korea
Prior art keywords
image
cctv
transcoding
camera
ethernet
Prior art date
Application number
KR1020190042771A
Other languages
Korean (ko)
Inventor
이현우
정승훈
이성진
Original Assignee
이노뎁 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 이노뎁 주식회사
Priority to KR1020190042771A
Application granted
Publication of KR102110502B1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/10 Current supply arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/30 Peripheral units, e.g. input or output ports
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/70 Methods or arrangements characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/18 Closed circuit television systems, i.e. systems in which the signal is not broadcast
    • H04N 7/181 Closed circuit television systems for receiving images from a plurality of remote sources

Abstract

The present invention generally relates to a camera-linked transcoder device that transcodes the video generated by the CCTV cameras of each zone in a video control system according to the available network bandwidth and transmits it to a video control server. In particular, the present invention provides a PoE camera-linked transcoder device that is equipped with a plurality of PoE ports to which CCTV cameras can be connected and powered, improving convenience at the CCTV installation site, and that parses the CCTV camera video (e.g., H.264 AVC, H.265 HEVC) to obtain syntax information (e.g., motion vectors, coding types), extracts the regions of the captured image containing significant motion, treats them as object ROI regions, and applies the image compression rate differentially during transcoding, so that CCTV video of high utility is obtained under the network environment given to each device.

Description

Transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression

The present invention generally relates to a camera-linked transcoder device that transcodes the video generated by the CCTV cameras of each zone in a video control system according to the network bandwidth and transmits it to a video control server.

In particular, the present invention relates to a PoE camera-linked transcoder device that is equipped with a plurality of PoE ports to which CCTV cameras can be connected and powered, improving convenience at the CCTV installation site, and that parses the CCTV camera video (e.g., H.264 AVC, H.265 HEVC) to obtain syntax information (e.g., motion vectors, coding types), extracts the regions of the captured image containing significant motion, treats them as object ROI regions, and applies the image compression rate differentially so that CCTV video of high utility is obtained in the network environment given to each device.

Recently, it has become common to establish a video control system using CCTV to prevent crime and to secure evidence after the fact. A large number of CCTV cameras are deployed, and the video captured by these CCTV cameras is displayed on a control monitor and stored in a storage device. When a control agent finds a scene in which a crime or accident occurs, he responds to it immediately and, if necessary, searches the images stored in the storage device to secure evidence afterwards.

In order to operate a CCTV video control system, a large number of CCTV cameras must be distributed over a wide area, and the video generated by these CCTV cameras must be delivered to the video control server through the network. In this specification, the term video control server collectively refers to the video processing portion of a video control system consisting of a master server, a recording server, a video analysis server, and a video storage server. Since it is unrealistic to connect multiple CCTV cameras directly to the video control server, it is common to configure a camera network as shown in FIG. 1.

FIG. 1 is a diagram showing the basic concept of a camera network generally adopted in a video control system. In a video control system, hundreds to thousands of CCTV cameras 11 are distributed over a wide area (for example, Gangnam-gu, Seoul), and CCTV cameras 11 located close to one another, for example because they cover the same district, are grouped into small groups. A captured-image collection terminal 12 is arranged for each small group, and the captured images generated by the CCTV cameras 11 are transmitted to the video control server 13 through the captured-image collection terminal 12 of the corresponding small group.

In a video control system, the CCTV cameras 11 are distributed over a wide area, so the network environment given to them is not uniform. Referring to FIG. 1, a plurality of captured-image collection terminals 12 are connected to the video control server 13 through broadband networks (WAN 1 to WAN 3), respectively, and the communication environments of these broadband networks vary, some of them being quite poor.

In response to this, the captured-image collection terminal 12 transcodes the captured images according to the network environment in which it is placed. That is, after collecting the captured image data from the plurality of CCTV cameras 11 connected to it, it reduces the captured image data through image recompression so as to fit the bandwidth of the broadband network connecting itself and the video control server 13. In general, the captured-image collection terminal 12 operates by decoding the captured image provided by the CCTV camera 11 and then re-encoding it under bitrate control. As a result, the captured image generated by a CCTV camera 11 installed where the broadband network environment is poor is delivered to the video control server 13 with significantly degraded image quality, which reduces the effectiveness of video control.
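To make this conventional behavior concrete, the following sketch (not part of the original disclosure) drives ffmpeg from Python to decode a captured H.264 stream and re-encode it under a uniform target bitrate; the file paths, bitrate, and function name are hypothetical.

```python
import subprocess

def recompress_for_wan(src_path: str, dst_path: str, target_bitrate: str = "1M") -> None:
    """Decode an H.264 capture and re-encode it under simple bitrate control.

    This mirrors the conventional (non-ROI) approach described above: the whole
    frame is recompressed uniformly, so quality drops everywhere when the WAN
    bandwidth is low. Paths and bitrate are illustrative assumptions.
    """
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src_path,            # decode the camera's captured stream
            "-c:v", "libx264",         # re-encode with H.264
            "-b:v", target_bitrate,    # uniform target bitrate for the whole frame
            "-maxrate", target_bitrate,
            "-bufsize", "2M",
            dst_path,
        ],
        check=True,
    )
```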

In addition, the CCTV installation environment has traditionally been quite poor. CCTV cameras are generally installed by mounting them on top of high poles. Running both a power line and a communication line up the pole and connecting them to the camera is quite inconvenient. The inconvenience is even worse when a camera is added or replaced later rather than during the initial installation, and because both power and communication must be dealt with, the installer may have to climb up and down the pole repeatedly to locate and fix the cause of a failure.

An object of the present invention is to provide a camera-linked transcoder device that transcodes the video generated by the CCTV cameras of each zone in a video control system according to the network bandwidth and transmits it to a video control server.

In particular, an object of the present invention is to provide a PoE camera-linked transcoder device that increases convenience at the CCTV installation site by providing a plurality of PoE ports to which CCTV cameras can be connected and powered, and that obtains syntax information (e.g., motion vectors, coding types) by parsing the CCTV camera video (e.g., H.264 AVC, H.265 HEVC), extracts the regions of the captured image containing significant motion, treats them as object ROI regions, and applies the image compression rate differentially so that CCTV captured images of high utility are obtained in the given network environment.

In order to achieve the above object, a PoE camera-linked transcoder device using syntax-based object ROI compression according to the present invention may be configured to include: an Ethernet hub unit 120 that provides a captured-image interface and camera operating power for a plurality of CCTV cameras 11 and provides a broadband access interface to an external video control server 13; and a camera image transcoding unit 110 that receives the captured images of the plurality of CCTV cameras 11 through the Ethernet hub unit 120, transcodes the captured images, and transmits the transcoded images to the external video control server 13 through the Ethernet hub unit 120.

At this time, the Ethernet hub unit 120 includes an operating power input terminal that receives DC operating power of a predetermined voltage; a plurality of downstream PoE ports to which the CCTV cameras 11 are respectively connected, which provide a reception interface for the captured images and supply operating power derived from the DC operating power to each connected CCTV camera 11; a downstream Ethernet port that is connected to the camera image transcoding unit 110 and provides a transmission interface for the captured images and a reception interface for the transcoded images; and an upstream Ethernet port that provides a broadband access interface for transmitting the transcoded images to the video control server 13.

In addition, the camera image transcoding unit 110 is equipped with an Ethernet port connected to the downstream Ethernet port of the Ethernet hub unit 120. Based on syntax information of one or more types among motion vectors and coding types obtained by parsing the captured image provided from the downstream Ethernet port, it extracts from the captured image an object ROI region, which is a chunk of image blocks containing motion, divides the captured image into a first image region that includes the object ROI region and a second image region that does not, generates a transcoded image by applying differential transcoding to the captured image so that the image quality of the first image region is better preserved than that of the second image region, and transmits the transcoded image to the video control server 13 through the broadband access interface provided by the Ethernet hub unit 120.

Specifically, the camera image transcoding unit 110 may be configured to include: a captured-image local collection unit 111 that receives and collects in real time the captured images generated by the plurality of CCTV cameras 11 through an Ethernet connection with the downstream Ethernet port of the Ethernet hub unit 120; a syntax-based ROI extraction unit 112 that parses the bitstream of each collected captured image to obtain syntax information that is at least one of a motion vector and a coding type, and extracts from the captured image an object ROI region, a chunk of image blocks containing motion, based on one or more of the motion vector accumulation values and coding type information calculated in units of preset image blocks; a captured-image differential compression unit 113 that divides the captured image into a first image region including the object ROI region and a second image region not including it, and generates a transcoded image by applying differential transcoding to the captured image with the compression rate for the first image region set lower than that of the second image region; and a captured-image server transmission unit 114 that transmits the transcoded image to the video control server 13 over the Ethernet connection via the broadband access interface provided by the Ethernet hub unit 120.

Alternatively, the camera image transcoding unit 110 may be configured to include: a captured-image local collection unit 111 that receives and collects in real time the captured images generated by the plurality of CCTV cameras 11 through an Ethernet connection with the downstream Ethernet port of the Ethernet hub unit 120; a syntax-based ROI extraction unit 112 that parses the bitstream of each collected captured image to obtain syntax information that is at least one of a motion vector and a coding type, and extracts from the captured image an object ROI region, a chunk of image blocks containing motion, based on one or more of the motion vector accumulation values and coding type information calculated in units of preset image blocks; a captured-image differential compression unit 113 that divides the captured image into a first image region including the object ROI region and a second image region not including it, and generates a transcoded image by applying differential transcoding to the captured image with the second image region skipped and only the first image region set as an encoding target; and a captured-image server transmission unit 114 that transmits the transcoded image to the video control server 13 over the Ethernet connection via the broadband access interface provided by the Ethernet hub unit 120.

In addition, the Ethernet hub unit 120 may be configured to include: a PoE Ethernet hub 122 having a plurality of downstream PoE ports 122-1 to 122-4, to which the plurality of CCTV cameras 11 are respectively connected, which provide a reception interface for the captured images and supply operating power derived from the DC operating power to each connected CCTV camera 11, and an upstream Ethernet port 122-0 that provides a transmission interface for the captured images; and a passive Ethernet hub 121 having a first downstream Ethernet port 121-2 that is connected to the upstream Ethernet port 122-0 of the PoE Ethernet hub 122 and provides a reception interface for the captured images, a second downstream Ethernet port 121-1 that is connected to the camera image transcoding unit 110 and provides a transmission interface for the captured images and a reception interface for the transcoded images, and an upstream Ethernet port 121-0 that provides a broadband access interface for transmitting the transcoded images to the video control server 13.

In addition, the syntax-based ROI extraction unit 112 may be configured to parse the bitstream of the captured image to obtain a motion vector and a coding type for each coding unit; to obtain a motion vector accumulation value over a preset time for each of the plurality of image blocks constituting the captured image; to compare the motion vector accumulation value of each image block with a preset first threshold and mark the image block as a moving object area when the first threshold is exceeded; to identify, around each moving object area, a plurality of adjacent image blocks (hereinafter, 'neighboring blocks') and, for each neighboring block, compare the motion vector value obtained in bitstream parsing or the motion vector accumulation value over the preset time with a preset second threshold and additionally mark the neighboring block as a moving object area when the second threshold is exceeded; to additionally mark as a moving object area any neighboring block whose coding type is intra picture; to perform interpolation on the moving object areas so that up to a preset number of unmarked image blocks surrounded by the moving object areas are additionally marked as moving object areas; and to set a chunk of the image blocks marked as moving object areas as an object ROI region.

According to the present invention, simply connecting the CCTV cameras deployed in each zone to the PoE ports of the transcoder device with an Ethernet cable is enough both to power the cameras and to transmit the captured images, which improves convenience at the CCTV installation site.

In addition, according to the present invention, by extracting the object ROI region from the CCTV video and transcoding with a differentially applied compression rate, the important parts of the CCTV video can be kept clear even in a low network bandwidth environment, which increases the effectiveness of the video control system.

In addition, according to the present invention, the object ROI region is extracted based on syntax information (e.g., motion vector, coding type) obtained by parsing the compressed video, rather than by performing complicated processing such as downscale resizing, difference-image acquisition, and image analysis as in the prior art, so a smart edge can be implemented even on a transcoder device with low-specification hardware.

FIG. 1 is a diagram showing the basic concept of a camera network generally adopted in a video control system.
FIG. 2 is a diagram showing the basic configuration of a PoE camera-linked transcoder device according to the present invention.
FIG. 3 is a block diagram showing the internal configuration of a camera image transcoding unit according to the present invention.
FIG. 4 is a flowchart illustrating the process by which the syntax-based ROI extraction unit detects the object ROI region in a CCTV captured image in the present invention.
FIG. 5 is a diagram showing an example of the result of detecting effective motion areas in a camera image in the present invention.
FIG. 6 is a diagram showing an example of the result of detecting boundary areas for the image of FIG. 5 in the present invention.
FIG. 7 is a diagram showing an example of the result of tidying up the moving object areas by interpolation for the image of FIG. 6 in the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

FIG. 2 is a diagram showing the basic configuration of a PoE camera-linked transcoder device 100 according to the present invention.

Referring to FIG. 2, the PoE camera-linked transcoder device 100 is generally connected to a small group of CCTV cameras 11 arranged in close proximity, receives the captured images generated by these cameras in real time, transcodes them according to the network environment in which the device itself is placed, and transmits the result to an external video control server 13. At this time, for ease of installation of the CCTV cameras 11, which are usually mounted on high poles, the transcoder device 100 is provided with Power over Ethernet (PoE) ports so that transmission of the captured images and supply of camera operating power are both achieved through a single Ethernet connection.

Accordingly, the PoE camera-linked transcoder device 100 includes a camera image transcoding unit 110 that performs transcoding on each captured image generated by the CCTV cameras 11, and an Ethernet hub unit 120 that provides a connection interface with the external devices, that is, the CCTV cameras 11 and the video control server 13. The camera image transcoding unit 110 receives the captured images of the CCTV cameras 11 through the Ethernet hub unit 120, performs transcoding on them, and then transmits the result to the external video control server 13 through the Ethernet hub unit 120. The Ethernet hub unit 120 provides an interface for transmitting and receiving captured images to the plurality of CCTV cameras 11, supplies camera operating power through PoE, and also provides a broadband access interface to the external video control server 13.

In general, a communication hub includes a single upstream port and a plurality of downstream ports. Referring to FIG. 2, in the present invention the Ethernet hub unit 120 is likewise provided with one upstream port (A) and a plurality of downstream ports (B1 to B5), and some of the downstream ports (B2 to B5) are configured as PoE ports capable of supplying operating power.

The Ethernet hub unit 120 is provided with a plurality of downstream PoE ports (B2 to B5), which are respectively connected to the plurality of CCTV cameras 11. Through these ports it provides a reception interface, in the form of Ethernet packets, for the captured images generated by the CCTV cameras 11, and supplies camera operating power to each connected CCTV camera 11 through PoE. To this end, the Ethernet hub unit 120 is provided with an operating power input terminal (P), to which DC operating power of a preset voltage (e.g., DC 48 V) is provided from the power supply (e.g., SMPS) of the PoE camera-linked transcoder device 100.

The Ethernet hub unit 120 has a downstream Ethernet port B1, which is connected to the camera image transcoding unit 110. Through this port it provides a transmission interface for delivering the captured images of the CCTV cameras 11 to the camera image transcoding unit 110, and a reception interface for receiving the transcoded images from the camera image transcoding unit 110. That is, the data of the captured images generated by the CCTV cameras 11 is provided to the camera image transcoding unit 110 via the Ethernet hub unit 120 and, after the transcoding operation in the camera image transcoding unit 110, is returned to the Ethernet hub unit 120.

The Ethernet hub unit 120 has an upstream Ethernet port A, which is connected to a broadband network. Through this port it provides a broadband access interface for transmitting the transcoded image delivered from the camera image transcoding unit 110 to the video control server 13.

In addition, the camera image transcoding unit 110 is provided with an Ethernet port connected to the downstream Ethernet port (B1) of the Ethernet hub unit 120, through which it receives the captured images from the Ethernet hub unit 120. For example, the camera image transcoding unit 110 may be implemented in the form of compressed-image transcoding software installed on microcomputer (MICOM) hardware.

In the present invention, the camera image transcoding unit 110 extracts an object ROI region from the captured image based on syntax information and, for each image frame constituting the captured image, applies differential transcoding by distinguishing the portion corresponding to the object ROI region from the portion that does not. That is, one part of the image frame is encoded so that it can be seen clearly, while the other part is encoded so blurrily that it is difficult to recognize, or is skipped altogether. In this way, the total amount of data to be transmitted over the broadband network is kept small, while the important parts of the CCTV captured image (for example, the faces or belongings of moving people) are preserved so that they remain clearly visible.

To this end, the camera image transcoding unit 110 extracts the object ROI region, which is a chunk of image blocks containing motion, based on syntax information that is at least one of a motion vector and a coding type obtained by parsing the captured image provided from the Ethernet hub unit 120, divides the captured image into a first image region that includes the object ROI region and a second image region that does not, and generates a transcoded image by applying differential transcoding to the captured image so that the image quality of the first image region is better preserved than that of the second image region. The transcoded image is then transmitted to the video control server 13 through the broadband access interface provided by the Ethernet hub unit 120.
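As an illustration of this differential policy, a minimal sketch is given below; it assumes the frame has already been divided into a macroblock grid and that the ROI extraction result is available as a set of block coordinates. The function name, QP values, and grid representation are assumptions for illustration, not the actual firmware of the device.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

def build_qp_map(
    roi_blocks: Set[Block],
    rows: int,
    cols: int,
    roi_qp: int = 24,         # low QP: low compression, quality preserved (illustrative)
    background_qp: int = 40,  # high QP: high compression, blurry background (illustrative)
) -> List[List[int]]:
    """Build a per-macroblock quantization-parameter map for differential transcoding."""
    return [
        [roi_qp if (r, c) in roi_blocks else background_qp for c in range(cols)]
        for r in range(rows)
    ]

# Example: a 1920x1080 frame corresponds to a 68 x 120 grid of 16x16 macroblocks.
# qp_map = build_qp_map({(10, 20), (10, 21)}, rows=68, cols=120)
```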

In the present invention, the Ethernet hub unit 120 is preferably configured in the form of a switching hub that supports PoE. This is because it allows the video control server 13 to connect directly to the CCTV cameras 11 over multiple UDP and TCP sockets, for example for media recording (e.g., in a fail-over situation) or for controlling a PTZ camera. If the Ethernet hub unit 120 were instead configured as a router that supports PoE, NAT traversal or a proxy (STUN/ICE) would have to be supported to enable this capability, which has the disadvantage of increasing the unit cost of the PoE camera-linked transcoder device 100.

At this time, the Ethernet hub unit 120 may preferably be configured as a two-chip solution, that is, a combination of a passive Ethernet hub 121 and a PoE Ethernet hub 122. The passive Ethernet hub 121 is in charge of the connections that do not require an operating power supply, that is, the camera image transcoding unit 110 and the broadband network, while the PoE Ethernet hub 122 is in charge of the connections that do require an operating power supply, that is, the CCTV cameras 11.

To this end, the PoE Ethernet hub 122 may be provided with a plurality of downstream PoE ports (122-1 to 122-4), to which the plurality of CCTV cameras 11 are respectively connected, which provide a reception interface for the captured images and supply operating power derived from the DC operating power to each connected CCTV camera 11, and an upstream Ethernet port (122-0) that provides a transmission interface for the captured images.

In addition, the passive Ethernet hub 121 may be provided with a first downstream Ethernet port (121-2) that is connected to the upstream Ethernet port (122-0) of the PoE Ethernet hub 122 and provides a reception interface for the captured images, a second downstream Ethernet port (121-1) that is connected to the camera image transcoding unit 110 and provides a transmission interface for the captured images and a reception interface for the transcoded images, and an upstream Ethernet port (121-0) that provides a broadband access interface for transmitting the transcoded images to the video control server 13.

FIG. 3 is a block diagram showing the internal configuration of the camera image transcoding unit 110 according to the present invention.

Referring to FIG. 3, the camera image transcoding unit 110 according to the present invention comprises a captured-image local collection unit 111, a syntax-based ROI extraction unit 112, a captured-image differential compression unit 113, and a captured-image server transmission unit 114.

First, the captured-image local collection unit 111 is the component that collects the captured images generated by the CCTV cameras 11 via the Ethernet hub unit 120. To this end, the captured-image local collection unit 111 establishes an Ethernet connection with the downstream Ethernet port of the Ethernet hub unit 120 and, through that Ethernet connection, receives and collects in real time the captured images generated by the plurality of CCTV cameras 11.

Referring to FIG. 2, the captured image generated by each CCTV camera 11 is preferably delivered in real time to the captured-image local collection unit 111 via, in sequence, the Ethernet connection between the CCTV camera 11 and the downstream PoE ports 122-1 to 122-4 of the PoE Ethernet hub 122, the Ethernet connection between the upstream Ethernet port 122-0 of the PoE Ethernet hub 122 and the first downstream Ethernet port 121-2 of the passive Ethernet hub 121, and the Ethernet connection between the second downstream Ethernet port 121-1 of the passive Ethernet hub 121 and the Ethernet port of the camera image transcoding unit 110.

The syntax-based ROI extraction unit 112 parses the bitstream of each captured image collected by the captured-image local collection unit 111 to obtain syntax information, that is, motion vectors and coding types, calculates the motion vector accumulation value (e.g., the motion vector magnitude accumulated over 500 msec) in units of preset image blocks such as macroblocks and sub-blocks, and extracts from the captured image an object ROI region, a chunk of image blocks containing motion, based on one or more of the per-block motion vector accumulation values and the coding type information.

The process by which the syntax-based ROI extraction unit 112 quickly extracts the object ROI region from the CCTV captured image based on syntax information will be described later in detail with reference to FIG. 4.

The captured-image differential compression unit 113 is the component that transcodes the captured images collected by the captured-image local collection unit 111 into a state suitable for transmission over the broadband network (WAN). In doing so, a low compression rate is applied to the object ROI region extracted by the syntax-based ROI extraction unit 112 so that its image quality is preserved, while the other parts, which may be allowed to become blurry, are transcoded with a high compression rate or skipped altogether. As described above, performing differential transcoding with respect to the object ROI region within each image frame has the advantage of making effective use of low network bandwidth. That is, even under low-bandwidth conditions and unlike the conventional technology, important details such as a person's face or belongings can be clearly identified because those regions are encoded at a quality close to the original.

To this end, the captured-image differential compression unit 113 divides the image of the captured image into a first image region that includes the object ROI region and a second image region that does not. In this case, the first image region may be set to be identical to the object ROI region, or may be set as a rectangular area containing the object ROI region. The second image region may be defined as the part of the corresponding image frame other than the first image region, but is not limited thereto.
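When the first image region is chosen as a rectangular area containing the object ROI region, it can be obtained, for example, as the bounding rectangle of the ROI blocks. The sketch below uses the same hypothetical block-grid representation as the other sketches in this description.

```python
from typing import Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

def bounding_rectangle(roi_blocks: Set[Block]) -> Tuple[int, int, int, int]:
    """Return (row_min, col_min, row_max, col_max) of a rectangular first image region
    enclosing the object ROI blocks. An empty ROI yields a degenerate rectangle."""
    if not roi_blocks:
        return (0, 0, 0, 0)
    rows = [r for r, _ in roi_blocks]
    cols = [c for _, c in roi_blocks]
    return (min(rows), min(cols), max(rows), max(cols))
```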

In addition, when transcoding the image frames of the captured image, the captured-image differential compression unit 113 applies a differential transcoding policy to the first and second image regions. In one embodiment, transcoding may be performed by setting the compression rate for the first image region lower than the compression rate for the second image region. In this case, the first image region containing the object ROI region is encoded with clearer image quality, and the second image region without the object ROI region is encoded with blurrier image quality.

In another embodiment, the second image region may be skipped and only the first image region set as an encoding target. Skip processing means treating a block as having no change from the previous image, so that there is no data to compress in the transcoding operation. In reality there is a difference from the previous image, but it is simply ignored and processed as if nothing had changed, which is equivalent to discarding the image information generated in the second image region. The first image region can then be encoded at higher image quality using the data budget saved in the second image region.
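The skip variant can be sketched as a per-block mode decision in which background blocks are forced to skip and only ROI blocks are actually re-encoded; this is a simplified assumption about how an encoder would be driven, not the patented encoder itself.

```python
from typing import Dict, Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

def decide_block_modes(rows: int, cols: int, roi_blocks: Set[Block]) -> Dict[Block, str]:
    """Decide the transcoding mode of every macroblock under the skip policy.

    Background blocks are forced to 'skip' (treated as unchanged from the previous
    frame, so their new image information is effectively discarded), and only the
    ROI blocks are marked 'encode' so the saved bits can be spent on them.
    """
    return {
        (r, c): ("encode" if (r, c) in roi_blocks else "skip")
        for r in range(rows)
        for c in range(cols)
    }
```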

The captured-image server transmission unit 114 transmits the transcoded image generated by the captured-image differential compression unit 113 to the video control server 13 over the Ethernet connection, through the broadband access interface provided by the Ethernet hub unit 120.

FIG. 4 is a flowchart illustrating the process by which the syntax-based ROI extraction unit 112 detects the object ROI region in a CCTV captured image in the present invention.

Due to the nature of CCTV video, the object ROI region corresponds to the part of the image containing a moving object. In the prior art, in order to identify a moving object in a compressed video, the compressed video is first decoded according to its compression format (e.g., H.264 AVC, H.265 HEVC), the decoded frame images are downscaled to small images (e.g., 320x240), difference images are obtained from the resized frame images, and image analysis is then performed. That is, moving objects are identified by analyzing the image content of the series of frame images constituting the compressed video. However, compressed-video decoding, downscale resizing, and image analysis are all computationally heavy processes, and this approach is therefore difficult to apply on low-specification hardware.

The present invention instead adopts a method of parsing the bitstream of the captured image to obtain syntax information for each image block and extracting the moving object region from it. At this time, any one of macroblocks and sub-blocks, or a combination of the two, may be adopted as the image block, and the motion vector and coding type are preferably used as the syntax information. This process is very simple and requires little computation, so it can be processed quickly even on low-end hardware.
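A minimal representation of the per-block syntax information used here might look as follows; the class and field names are assumptions for illustration, and an actual implementation would populate them from an H.264/H.265 header parser.

```python
from dataclasses import dataclass

@dataclass
class BlockSyntax:
    """Syntax information for one coding unit (macroblock or sub-block)."""
    row: int                    # block row index within the frame grid
    col: int                    # block column index within the frame grid
    mv_x: int = 0               # horizontal motion vector component
    mv_y: int = 0               # vertical motion vector component
    coding_type: str = "inter"  # "intra" or "inter"

    @property
    def mv_magnitude(self) -> float:
        """Magnitude of the motion vector, used for accumulation and thresholding."""
        return (self.mv_x ** 2 + self.mv_y ** 2) ** 0.5
```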

This approach differs conceptually from the prior art in that it does not extract the moving object itself, since it knows nothing about the image content; instead, it extracts a chunk of image blocks that, based on the syntax information, is presumed to contain some moving object. The moving object region obtained in this way does not accurately follow the boundary line of the moving object, as shown in FIGS. 5 to 7, but the processing speed is fast and the reliability of detecting moving objects is high.

Meanwhile, according to the present invention, the moving object region can be identified and the object ROI region detected from it without decoding the captured image. This does not mean, however, that a device or software to which the present invention is applied is prohibited from decoding the captured image; the scope of the present invention is not limited to implementations that perform no decoding.

Hereinafter, the process by which the syntax-based ROI extraction unit 112 detects the object ROI region in the CCTV captured image will be described in detail.

Steps (S100 to S130): Effective motion (actual movement) that is practically recognizable is detected in the CCTV captured image, and the image blocks in which effective motion is detected are set as moving object areas.

First, the coding units of the captured image are parsed to obtain motion vectors and coding types (S100). That is, header parsing and motion vector operations are performed on the bitstream of the captured video according to a video compression standard such as H.264 AVC or H.265 HEVC, and through this the motion vector and coding type of each coding unit of the captured video are obtained. Such a coding unit is generally an image block such as a macroblock or sub-block; its size ranges from about 64x64 pixels down to 4x4 pixels and may be set variously according to the designer's choice.

Then, a motion vector accumulation value over a preset time (for example, 500 msec) is acquired for each of the plurality of image blocks constituting the captured image (S110). This step is intended to detect effective motion that is practically recognizable in the captured image, such as a moving car, a running person, or a crowd fighting. Swaying leaves, ghosting that appears briefly, shadows that shift slightly with reflected light, and the like should not be detected, because even though there is motion it is practically meaningless. To this end, motion vectors are acquired in units of image blocks and accumulated over the predetermined time (e.g., 500 msec).

Then, the motion vector accumulation value of each of the plurality of image blocks is compared with a preset first threshold (for example, 20), and an image block whose motion vector accumulation value exceeds the first threshold is marked as a moving object area (S120, S130). If an image block with a motion vector accumulation value above a certain level is found, it is marked as a moving object area on the grounds that significant, that is, effective, motion has been found in that image block. Conversely, even if motion vectors occur, when the accumulated value over the given time is small enough not to exceed the first threshold, the change in the image is regarded as negligible and ignored at this stage.
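Steps S110 to S130 can be sketched as follows, assuming the per-block motion vector magnitudes of successive frames have already been parsed (for example, from records like the BlockSyntax sketch above). The 500 msec window and the first threshold of 20 follow the examples in the text; everything else is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

def accumulate_motion(
    frames: Iterable[Dict[Block, float]],  # per-frame map: block -> motion vector magnitude
) -> Dict[Block, float]:
    """Accumulate motion vector magnitudes per block over a preset window (e.g. 500 msec of frames)."""
    accum: Dict[Block, float] = defaultdict(float)
    for frame in frames:
        for block, magnitude in frame.items():
            accum[block] += magnitude
    return dict(accum)

def mark_effective_motion(
    mv_accum: Dict[Block, float],
    first_threshold: float = 20.0,  # strict threshold from the example above
) -> Set[Block]:
    """Mark blocks whose accumulated motion exceeds the first threshold as moving object areas."""
    return {block for block, total in mv_accum.items() if total > first_threshold}
```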

FIG. 5 is an example visually showing the result of detecting effective motion areas in a camera image through the process of (S100 to S130) in the present invention. Image blocks whose motion vector accumulation value over 500 msec was 20 or more were marked as moving object areas and displayed in red. In FIG. 5, sidewalk blocks, roads, and shadows are not marked as moving object areas, whereas walking people and moving cars are marked as moving object areas.

Steps (S140 to S170): Next, in the present invention, a process of detecting the boundary area of each moving object area is performed. Looking at FIG. 5, it can be seen that the moving objects are not completely marked and only some blocks of each moving object are marked. That is, for a walking person or a moving car, not the entire object but only some of its image blocks are marked. Moreover, several separate moving object areas may be marked for a single moving object. This means that the criterion for determining the moving object area in the preceding steps (S100 to S130) is very useful for filtering out the general area (i.e., the background) but is too strict for delineating the moving object area. Therefore, a process of detecting the boundary of the moving object by examining the surroundings of each moving object area is necessary.

The boundary of these moving object areas is expanded by examining the areas surrounding the moving object areas detected in (S100 to S130) based on the motion vector and the coding type. Through this process, the moving object areas that were detected in the form of fragmented image blocks in (S100 to S130) are connected to each other into a meaningful lump.

In the preceding steps (S100 to S130), image blocks that appear to correspond to moving objects in the captured image were selected according to a strict criterion and marked as moving object areas. In steps (S140 to S170), the other image blocks located around the image blocks marked as moving object areas in (S100 to S130), called 'neighboring blocks', are examined. For each neighboring block, it is determined whether it corresponds to the moving object area according to a criterion that is relatively relaxed compared with (S100 to S130).

Macroblocks and sub-blocks occupy a very small part of the captured image. Therefore, in video that captures people, cars, animals, and the like, such as CCTV video, a moving object rarely appears in only one image block; it is expected to appear across multiple image blocks. That is, an image block located near an image block in which a moving object appears is relatively more likely to also contain part of that moving object than an image block that is not. Reflecting this technical idea, in (S140 to S170) it is determined whether each neighboring block around a moving object area corresponds to the moving object area according to a relatively relaxed criterion.

Preferably, each neighboring block is inspected, and if the motion vector value detected in the current frame is greater than or equal to a preset second threshold (e.g., 0) or its coding type is intra picture, the corresponding image block is also marked as a moving object area. In another embodiment, if the motion vector accumulation value previously calculated in (S100 to S130) for the neighboring block is greater than or equal to a second threshold (for example, 5), or its coding type is intra picture, the corresponding image block may also be marked as a moving object area. In this case, it is logically reasonable for the second threshold to be set to a smaller value than the first threshold.

Conceptually, an image block in the vicinity of a moving object area in which a certain amount of motion is found is marked as a moving object area because it is likely to form a single chunk with the preceding moving object area. In addition, an intra picture has no motion vector, so it is impossible to determine whether the block is a moving object area based on the motion vector; accordingly, if an intra-coded block is located adjacent to an image block that has already been detected as a moving object area, it is estimated to form a single chunk together with the previously extracted moving object area. This is because the loss incurred when an image block outside the moving object is included in the moving object region is not large, whereas the loss incurred when the moving object region becomes fragmented is large.

Hereinafter, each step will be described.

First, a plurality of adjacent image blocks are identified around the image blocks marked as moving object areas by the preceding steps (S100 to S130) (S140). These are referred to herein as 'neighboring blocks'. These neighboring blocks were not marked as moving object areas by (S100 to S130); by examining them further in (S140 to S170), it is determined whether any of them should be included in the boundary of the moving object area.

For each of the plurality of neighboring blocks, the motion vector value is compared with a preset second threshold, and neighboring blocks whose motion vector value exceeds the second threshold are marked as moving object areas (S150, S160). If a block is located adjacent to a moving object area in which practically meaningful effective motion has been recognized, and a certain amount of motion is also found in the block itself, then given the nature of the captured video (e.g., CCTV video) that block is most likely part of the same lump as the preceding moving object area. Therefore, such a neighboring block is also marked as a moving object area.

As a first embodiment implementing this, each neighboring block is inspected, and when the motion vector value detected in the current frame is greater than or equal to a preset second threshold (e.g., 0), the corresponding image block is also marked as a moving object area.

Alternatively, as a second embodiment, when the motion vector accumulation value previously calculated in (S100 to S130) for a neighboring block is greater than or equal to a preset second threshold (e.g., 5), the corresponding image block may also be marked as a moving object area. In this case, it is reasonable for the second threshold to be set to a smaller value than the first threshold.

In addition, among the plurality of neighboring blocks, those whose coding type is intra picture are marked as moving object areas (S170). In the case of an intra picture, no motion vector exists, so it is impossible to determine whether the corresponding neighboring block is a moving object area based on the motion vector. If such a block is located adjacent to an image block that has already been detected as a moving object area, it is preferable to estimate that it forms a lump together with the extracted moving object area. This is because the loss incurred when an image block outside the moving object is included in the moving object region is not large, whereas the loss incurred when the moving object region becomes fragmented is large.
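The boundary expansion of steps S140 to S170 can be sketched as follows, under the same hypothetical block-grid representation. Passing the current-frame motion vector magnitude or the accumulated value as mv_value corresponds to the first and second embodiments above; blocks absent from the dictionaries are treated as having no motion.

```python
from typing import Dict, Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def expand_boundary(
    marked: Set[Block],
    mv_value: Dict[Block, float],    # current-frame MV magnitude (or accumulated value)
    coding_type: Dict[Block, str],   # "intra" or "inter" per block
    second_threshold: float = 5.0,   # relaxed threshold, smaller than the first threshold
) -> Set[Block]:
    """Additionally mark neighboring blocks of moving object areas (steps S140 to S170)."""
    expanded = set(marked)
    # S140: identify the neighboring blocks around every already-marked block
    neighbors = {
        (r + dr, c + dc)
        for (r, c) in marked
        for dr, dc in NEIGHBOR_OFFSETS
    } - marked
    for block in neighbors:
        # S150/S160: relaxed motion criterion for neighbors
        if mv_value.get(block, 0.0) >= second_threshold:
            expanded.add(block)
        # S170: intra-coded neighbors have no motion vector; mark them as well
        elif coding_type.get(block) == "intra":
            expanded.add(block)
    return expanded
```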

FIG. 6 is a diagram visually showing the result of applying the boundary detection process to a camera image in the present invention; the additional image blocks marked as moving object areas through the above process are displayed in blue. Looking at FIG. 6, the blue moving object areas extend around the red moving object areas of FIG. 5, and comparison with the actual image shows that they are now sufficient to cover the entire moving objects.

Step (S180): The fragmentation of the moving object areas detected through the above process is tidied up by applying interpolation. Since the preceding process determines whether each image block is a moving object area on a per-block basis, a single actual moving object (e.g., a person) may be split into several moving object areas because an unmarked image block remains in the middle. Accordingly, if one or a small number of unmarked image blocks are surrounded by image blocks marked as a moving object area, they are additionally marked as a moving object area. This makes it possible to merge a moving object area that has been split into several pieces into one, and the effect of this interpolation is clearly seen when comparing FIG. 6 and FIG. 7.

Referring to FIG. 6, unmarked image blocks can be found between the moving object areas indicated in blue. In the present invention, if one or a few unmarked image blocks are surrounded by image blocks marked as a moving object area, they are marked as a moving object area; this is called interpolation. Comparing FIG. 7 with FIG. 6, all the unmarked image blocks that existed between the moving object areas have been marked as moving object areas. In this way, all the areas that move together as a lump are bundled and treated as one moving object.
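The interpolation of step S180 can be sketched as filling unmarked blocks that are surrounded by marked blocks. The criterion used here, at least a given number of the eight neighbors being marked, is an illustrative stand-in for "surrounded by moving object areas"; the text itself does not fix an exact rule.

```python
from typing import Set, Tuple

Block = Tuple[int, int]  # (row, col) of a macroblock in the frame grid

NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def interpolate_holes(marked: Set[Block], min_marked_neighbors: int = 6) -> Set[Block]:
    """Mark unmarked blocks that are (almost) surrounded by moving object areas (step S180)."""
    # Candidate holes are unmarked blocks adjacent to at least one marked block
    candidates = {
        (r + dr, c + dc)
        for (r, c) in marked
        for dr, dc in NEIGHBOR_OFFSETS
    } - marked
    filled = set(marked)
    for (r, c) in candidates:
        marked_neighbors = sum((r + dr, c + dc) in marked for dr, dc in NEIGHBOR_OFFSETS)
        if marked_neighbors >= min_marked_neighbors:
            filled.add((r, c))
    return filled
```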

Comparing FIG. 5, FIG. 6, and FIG. 7, it can be seen that the moving object regions come to reflect the situation of the real image properly through the boundary detection process and the interpolation process. If the red-marked chunks of FIG. 5 were taken as the result, the video would be treated as if many very small objects were moving on the screen, which does not correspond to reality. On the other hand, if the blue-marked lumps of FIG. 7 are taken as the result, the scene is treated as containing several moving objects of a certain size, which reflects the actual scene closely.

Meanwhile, the present invention can be implemented in the form of computer-readable code on a computer-readable non-volatile recording medium. Various types of storage devices may be used, such as hard disks, SSDs, CD-ROMs, NAS, magnetic tapes, web disks, and cloud disks, and the invention may also be implemented in a form in which the code is distributed across, stored in, and executed from multiple networked storage devices. In addition, the present invention may be implemented in the form of a computer program stored in a medium in order to execute a specific procedure in combination with hardware.

11: CCTV camera
12: captured-image collection terminal
13: video control server
100: PoE camera-linked transcoder device
110: camera image transcoding unit
111: captured-image local collection unit
112: syntax-based ROI extraction unit
113: captured-image differential compression unit
114: captured-image server transmission unit
120: Ethernet hub unit
121: passive Ethernet hub
122: PoE Ethernet hub

Claims (8)

  1. delete
  2. delete
  3. delete
  4. delete
  5. delete
  6. A PoE camera-linked image processing method performed by a transcoder device 100, which is connected through an internal PoE Ethernet hub 122 to a plurality of CCTV cameras 11 arranged in close proximity, in order to transcode the captured images generated by these CCTV cameras 11 and deliver them to an external video control server 13, the method comprising:
    a 1A step in which the internal PoE Ethernet hub 122 is connected to each of the plurality of CCTV cameras 11 through a plurality of downstream PoE ports (122-1 to 122-4; B2 to B5) and supplies operating power to them;
    a 1B step in which the PoE Ethernet hub 122 receives, through the plurality of downstream PoE ports (122-1 to 122-4; B2 to B5), the captured images generated by each of the plurality of CCTV cameras 11 in the form of Ethernet packets;
    a 2A step in which an internal passive Ethernet hub 121 is connected to the upstream Ethernet port 122-0 of the PoE Ethernet hub 122 through a first downstream Ethernet port 121-2 and receives the captured images generated by the plurality of CCTV cameras 11 (hereinafter, 'CCTV captured images');
    a 2B step in which the passive Ethernet hub 121 provides the CCTV captured images received through the first downstream Ethernet port 121-2 to an internal camera image transcoding unit 110 through a second downstream Ethernet port (121-1; B1);
    a third step in which the camera image transcoding unit 110 collects the CCTV captured images in real time through an Ethernet connection with the second downstream Ethernet port 121-1 of the passive Ethernet hub 121;
    a fourth step in which the camera image transcoding unit 110 parses the bitstream of each of the collected CCTV captured images to obtain syntax information that is at least one of a motion vector and a coding type, and extracts, for each image frame constituting the CCTV captured image, a chunk of image blocks estimated to contain a moving object (hereinafter, 'object ROI region') based on one or more of the motion vector accumulation values and coding type information calculated in units of preset image blocks;
    a 5A step in which the camera image transcoding unit 110 divides each image frame constituting the CCTV captured image into a first image region that includes the object ROI region extracted in the fourth step and a second image region that does not include the object ROI region;
    a 5B step in which the camera image transcoding unit 110 applies differential transcoding to the CCTV captured image on an image-frame basis by setting, for each image frame constituting the CCTV captured image, the compression rate for the first image region lower than that of the second image region, so that the first image region is encoded to appear relatively clear and the second image region is encoded to appear relatively blurry;
    a 5C step in which the camera image transcoding unit 110 generates a transcoded image for each image frame constituting the CCTV captured image through the image-frame-level differential transcoding;
    a sixth step in which the camera image transcoding unit 110 provides the transcoded image to the passive Ethernet hub 121 through the Ethernet connection with the second downstream Ethernet port 121-1 of the passive Ethernet hub 121; and
    a seventh step in which the passive Ethernet hub 121 receives the transcoded image from the camera image transcoding unit 110 through the second downstream Ethernet port (121-1; B1), connects to the broadband network via an upstream Ethernet port (121-0; A), and transmits the transcoded image to the video control server 13.
  7. A PoE camera-linked image processing method performed by a transcoder device 100, which is connected through an internal PoE Ethernet hub 122 to a plurality of CCTV cameras 11 arranged in close proximity, in order to transcode the captured images generated by these CCTV cameras 11 and deliver them to an external video control server 13, the method comprising:
    a 1A step in which the internal PoE Ethernet hub 122 is connected to each of the plurality of CCTV cameras 11 through a plurality of downstream PoE ports (122-1 to 122-4; B2 to B5) and supplies operating power to them;
    a 1B step in which the PoE Ethernet hub 122 receives, through the plurality of downstream PoE ports (122-1 to 122-4; B2 to B5), the captured images generated by each of the plurality of CCTV cameras 11 in the form of Ethernet packets;
    a 2A step in which an internal passive Ethernet hub 121 is connected to the upstream Ethernet port 122-0 of the PoE Ethernet hub 122 through a first downstream Ethernet port 121-2 and receives the captured images generated by the plurality of CCTV cameras 11 (hereinafter, 'CCTV captured images');
    a 2B step in which the passive Ethernet hub 121 provides the CCTV captured images received through the first downstream Ethernet port 121-2 to an internal camera image transcoding unit 110 through a second downstream Ethernet port (121-1; B1);
    a third step in which the camera image transcoding unit 110 collects the CCTV captured images in real time through an Ethernet connection with the second downstream Ethernet port 121-1 of the passive Ethernet hub 121;
    a fourth step in which the camera image transcoding unit 110 parses the bitstream of each of the collected CCTV captured images to obtain syntax information that is at least one of a motion vector and a coding type, and extracts, for each image frame of the CCTV captured image, an object ROI region that is a chunk of image blocks containing motion, based on one or more of the motion vector accumulation values and coding type information calculated in units of preset image blocks;
    a 5A step in which the camera image transcoding unit 110 divides each image frame constituting the CCTV captured image into a first image region that includes the object ROI region extracted in the fourth step and a second image region that does not include the object ROI region;
    a 5B step in which the camera image transcoding unit 110 applies differential transcoding to the CCTV captured image on an image-frame basis by skipping the second image region and setting only the first image region as an encoding target for each image frame constituting the CCTV captured image;
    a 5C step in which the camera image transcoding unit 110 generates a transcoded image for each image frame constituting the CCTV captured image through the image-frame-level differential transcoding;
    a sixth step in which the camera image transcoding unit 110 provides the transcoded image to the passive Ethernet hub 121 through the Ethernet connection with the second downstream Ethernet port 121-1 of the passive Ethernet hub 121; and
    a seventh step in which the passive Ethernet hub 121 receives the transcoded image from the camera image transcoding unit 110 through the second downstream Ethernet port (121-1; B1), connects to the broadband network via an upstream Ethernet port (121-0; A), and transmits the transcoded image to the video control server 13.
  8. The method according to claim 6 or 7, wherein the fourth step comprises:
    parsing the bitstream of the CCTV captured image to obtain a motion vector and a coding type for each coding unit;
    acquiring a motion vector accumulation value over a preset time for each of the plurality of image blocks constituting the CCTV captured image;
    comparing the motion vector accumulation value of each of the plurality of image blocks with a preset first threshold and marking the corresponding image block as a moving object area when the first threshold is exceeded;
    identifying a plurality of adjacent image blocks (hereinafter, 'neighboring blocks') around the moving object areas;
    comparing, for each of the plurality of neighboring blocks, the motion vector value obtained in the bitstream parsing or the motion vector accumulation value over the preset time with a preset second threshold, and additionally marking the corresponding neighboring block as a moving object area when the second threshold is exceeded;
    additionally marking, among the plurality of neighboring blocks, any neighboring block whose coding type is intra picture as a moving object area;
    performing interpolation on the plurality of moving object areas so that up to a preset number of unmarked image blocks surrounded by the moving object areas are additionally marked as moving object areas; and
    setting a chunk of the image blocks marked as moving object areas as the object ROI region.
KR1020190042771A 2019-04-11 2019-04-11 transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression KR102110502B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020190042771A KR102110502B1 (en) 2019-04-11 2019-04-11 transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020190042771A KR102110502B1 (en) 2019-04-11 2019-04-11 transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression

Publications (1)

Publication Number Publication Date
KR102110502B1 true KR102110502B1 (en) 2020-05-13

Family

ID=70730087

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020190042771A KR102110502B1 (en) 2019-04-11 2019-04-11 transcoder device for PoE cameras by use of syntax-based object Region-Of-Interest compression

Country Status (1)

Country Link
KR (1) KR102110502B1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110092143A (en) * 2010-02-08 2011-08-17 삼성전자주식회사 Client terminal, server, cloud computing system and cloud computing method
KR20170033075A (en) * 2015-09-16 2017-03-24 김용묵 parking guidance image camera, system for parking guidance using the same and method for parking guidance by parking guidance system
KR101949676B1 (en) * 2017-12-20 2019-02-19 이노뎁 주식회사 syntax-based method of providing intrusion detection in compressed video

Similar Documents

Publication Publication Date Title
US10186297B2 (en) Reference and non-reference video quality evaluation
US20190156490A1 (en) Adaptive video streaming
US10567765B2 (en) Streaming multiple encodings with virtual stream identifiers
JP6121518B2 (en) Method and apparatus for encoding a selected spatial portion of a video stream
JP5479504B2 (en) Region-of-interest video processing on the decoder side
KR101443070B1 (en) Method and system for low-latency transfer protocol
KR100988753B1 (en) Methods and device for data alignment with time domain boundary
RU2369039C1 (en) Image encoding device, imade decoding device, image encoding method and image decoding method
RU2518435C2 (en) Encoder optimisation in stereoscopic video delivery systems
US9154799B2 (en) Encoding and decoding motion via image segmentation
US8902971B2 (en) Video compression repository and model reuse
JP3719933B2 (en) Hierarchical digital video summary and browsing method and apparatus
US7672378B2 (en) Spatio-temporal graph-segmentation encoding for multiple video streams
US8681866B1 (en) Method and apparatus for encoding video by downsampling frame resolution
US6100940A (en) Apparatus and method for using side information to improve a coding system
CN105900425B (en) Indicate the motion vector in encoded bit stream
US9402034B2 (en) Adaptive auto exposure adjustment
JP5192393B2 (en) Multi-view video processing
JP5596801B2 (en) Monitoring system
CN1242623C (en) Video coding
KR101823537B1 (en) Method of identifying relevant areas in digital images, method of encoding digital images, and encoder system
KR101244911B1 (en) Apparatus for encoding and decoding muti-view image by using camera parameter, and method thereof, a recording medium having a program to implement thereof
AU2010241260B2 (en) Foreground background separation in a scene with unstable textures
Gualdi et al. Video streaming for mobile video surveillance
US7606391B2 (en) Video content scene change determination

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant