CN112241982B - Image processing method, device and machine-readable storage medium - Google Patents
Info
- Publication number
- CN112241982B (application number CN201910651713.7A)
- Authority
- CN
- China
- Prior art keywords
- monitoring video
- image
- roi
- frame
- video frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N17/002—Diagnosis, testing or measuring for television systems or their details for television cameras
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The application provides an image processing method, an image processing apparatus, and a machine-readable storage medium. The method includes: performing target detection and tracking on monitoring video frames to determine the position information of the same target in each monitoring video frame; determining the region of interest (ROI) corresponding to the target in each monitoring video frame according to that position information; performing enhancement processing on the image sequence of the ROI area based on the quality scores of the ROI area in each monitoring video frame; and encoding and transmitting the enhanced image sequence. The method can improve the effect of image quality optimization.
Description
Technical Field
The present application relates to the field of video monitoring, and in particular, to an image processing method, an image processing device, and a machine-readable storage medium.
Background
In recent years, with the continuous development of technology, more and more intelligent cameras have been deployed in monitoring scenes to complete specific tasks. For example, a face snapshot camera can detect faces in the monitoring picture in real time and transmit the detected face images to a server for face recognition; a traffic snapshot camera can help traffic management departments monitor offending vehicles quickly and effectively and capture photographic evidence of them.
However, due to factors such as illumination, camera mounting height, and target motion, the quality of the region of interest (ROI) in the finally obtained target image varies: some images are of good quality, but in many cases the quality is poor, with problems such as blurring, insufficient brightness, and insufficient contrast.
For this situation, the current solution is to optimize the decoded image data at the back end: improve the brightness and contour details of the ROI area in the monitoring image and control its color, so as to improve the definition of the ROI area image.
Practice shows that the data this solution processes is image data decoded after network transmission; by that point, encoding compression has already caused serious information loss, so the subsequent processing brings only limited improvement. In addition, the current solution processes only traffic vehicle information, which limits its applicable scenes.
Disclosure of Invention
In view of the above, the present application provides an image processing method and apparatus thereof.
Specifically, the application is realized by the following technical scheme:
According to a first aspect of an embodiment of the present application, there is provided an image processing method applied to a video monitoring front-end device, the method including:
performing target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame;
Determining a corresponding ROI (region of interest) of the same target in each monitoring video frame according to the position information of the target in each monitoring video frame;
Intercepting the ROI area from the cached monitoring video frames in the original format according to the quality scores of the ROI area in each monitoring video frame, and carrying out enhancement processing on the image sequence of the intercepted ROI area in the original format;
And carrying out coding transmission on the image sequence after the enhancement processing.
According to a second aspect of an embodiment of the present application, there is provided an image processing apparatus applied to a video monitoring front-end device, the apparatus including:
The target detection unit is used for detecting and tracking the targets of the monitoring video frames so as to determine the position information of the same target in each monitoring video frame;
the determining unit is used for determining a region of interest (ROI) of the target corresponding to each monitoring video frame according to the position information of the same target in each monitoring video frame;
The intercepting unit is used for intercepting the ROI area from the cached monitoring video frames in the original format according to the quality scores of the ROI areas in the monitoring video frames;
an enhancement processing unit, configured to perform enhancement processing on the intercepted image sequence of the ROI area in the original format;
And the transmission unit is used for carrying out coding transmission on the image sequence after the enhancement processing.
According to a third aspect of embodiments of the present application, there is provided an image processing apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to perform the above-described image processing method.
According to a fourth aspect of embodiments of the present application there is provided a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the above-described image processing method.
According to the image processing method provided by the application, the video monitoring front-end equipment performs target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame, and determines the region of interest (ROI) corresponding to the target in each monitoring video frame according to that position information. Further, according to the quality scores of the ROI area in each monitoring video frame, the ROI area is intercepted from the cached monitoring video frames in the original format, the intercepted image sequence of the ROI area in the original format is enhanced, and the enhanced image sequence is encoded and transmitted. This avoids the influence of information loss in the compression and transmission process on image quality optimization and improves the effect of image quality optimization; in addition, image quality optimization is no longer limited to vehicles, which expands the applicable scenes of the scheme.
Drawings
FIG. 1 is a flow chart of an image processing method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a specific application scenario according to an exemplary embodiment of the present application;
- FIG. 3A is a schematic diagram of a first processing module according to an exemplary embodiment of the present application;
- FIG. 3B is a schematic diagram of a second sub-processing unit according to an exemplary embodiment of the present application;
- FIG. 3C is a schematic diagram of a second processing module according to an exemplary embodiment of the present application;
- FIG. 4 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present application;
- FIG. 5 is a schematic diagram of the hardware structure of an image processing apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the following briefly explains the original format described in the present application.
The original (raw) format refers to the raw data format in which the image sensor converts the captured light source signals into digital signals; the raw data are sensed data containing signals from one or more spectral bands.
By way of example, the raw data may include sensed data obtained by sampling optical signals in spectral bands with wavelengths in the range of 380 nm to 780 nm and/or 780 nm to 2500 nm.
For example, an RGB sensor senses a RAW (unprocessed) image signal.
The imaging device collects light source signals, converts them into analog signals, and then converts the analog signals into digital signals. The digital signals are input into a processing chip for processing (which may include bit-width clipping, image processing, encoding and decoding, etc.) to obtain data in a second data format (the original format may accordingly be referred to as a first data format); the data in the second data format are then transmitted to a display device for display or to other devices for processing.
It can be seen that an image in the original format is the image obtained when the sensor converts the collected light source information into a digital signal: it has not been processed by the processing chip and has a high bit width. Compared with an image in the second data format, which has undergone bit-width clipping, image processing, and encoding/decoding, it contains richer image information.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solution of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of an image processing method provided by an embodiment of the present application is shown. The image processing method may be applied to a video monitoring front-end device, such as an IPC (Internet Protocol Camera, i.e., a network camera). As shown in fig. 1, the image processing method may include the following steps:
For convenience of description and understanding, the following description takes an IPC as the execution subject of steps S100 to S130.
And step S100, performing target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame.
In the embodiment of the application, when the IPC acquires the monitoring video, the target detection can be carried out on the monitoring video frames, and the target is tracked after the target is detected, so that the position information of the same target in each monitoring video frame can be determined.
By way of example, targets may include, but are not limited to, pedestrians, vehicles, animals, license plates, and the like.
In one example, the above-mentioned object detection and tracking on the surveillance video frames to determine the location information of the same object in each surveillance video frame may include:
And performing target detection and tracking on the monitoring video frames based on the neural network so as to determine the position information of the same target in each monitoring video frame.
For example, in order to realize target detection and tracking of the monitoring video frames, a neural network for target detection and tracking can be trained on training samples with target annotations, and the trained neural network can then be used to perform target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame.
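As an illustration of this step, below is a minimal sketch of detection followed by tracking; detect(frame) is a placeholder for the trained neural-network detector, and since the patent does not prescribe a particular tracker, a simple greedy IoU association is assumed here to link the same target across frames:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def track_targets(frames, detect, iou_thresh=0.5):
    """Link detections of the same target across frames.
    Returns {track_id: [(frame_index, box), ...]}."""
    tracks, last_box, next_id = {}, {}, 0
    for idx, frame in enumerate(frames):
        for box in detect(frame):  # placeholder neural-network detector
            # greedily match against the most recent box of each known track
            best = max(last_box, key=lambda t: iou(last_box[t], box), default=None)
            if best is not None and iou(last_box[best], box) >= iou_thresh:
                tid = best
            else:
                tid, next_id = next_id, next_id + 1
                tracks[tid] = []
            tracks[tid].append((idx, box))
            last_box[tid] = box
    return tracks
```

Each resulting track holds the position information of one target in each monitoring video frame, which is what the following steps consume.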
Step S110, determining the ROI area of the target in each monitoring video frame according to the position information of the same target in each monitoring video frame.
In the embodiment of the application, for any target, the IPC can determine the ROI area corresponding to the target in each monitoring video frame according to the position information of the target in each monitoring video frame.
For example, for any monitoring video frame, the area framed by the target's bounding box in that frame may be determined as the ROI area corresponding to the target in that frame.
And step S120, according to the quality scores of the ROI areas in the monitoring video frames, the ROI areas are intercepted from the cached monitoring video frames in the original format, and the image sequences of the ROI areas in the original format are enhanced.
In the embodiment of the application, the video monitoring front-end equipment can also cache the monitoring video frame in the original format when performing target detection and tracking on the monitoring video frame.
For any target, when the IPC has determined the ROI area of the target in each monitoring video frame, it can intercept the ROI area from the cached monitoring video frames in the original format according to the quality scores of the ROI area in each monitoring video frame, and perform enhancement processing on the intercepted image sequence of the ROI area in the original format.
In one possible implementation manner, the capturing the ROI area from the buffered monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame may include:
Determining a quality score of the ROI area in each monitoring video frame;
and intercepting, based on the quality scores of the ROI area in each monitoring video frame, the ROI areas whose quality scores are higher than a preset score threshold from the cached monitoring video frames in the original format.
For example, for any target, the IPC may score the quality of the ROI area in each monitoring video frame once the ROI area of the target in each frame has been determined.
For example, the ROI area may be quality-scored according to the target imaging quality (such as face definition, face angle, etc.) in the ROI area, or the ROI area may be quality-scored according to the quality (such as image definition, brightness, contrast, etc.) of the image corresponding to the ROI area, which is not described in detail in the embodiments of the present application.
In one example, the quality score of the ROI area in each monitoring video frame may be determined based on a neural network.
When the IPC has determined the quality score of the ROI area in each monitoring video frame, it may, based on those quality scores, remove from the cached original-format monitoring video frames the frames whose ROI quality score is not higher than a preset score threshold (which may be set according to the actual application scenario), intercept the ROI area from the frames whose ROI quality score is higher than the threshold to obtain an image sequence of the ROI area in the original format, and perform enhancement processing on the intercepted image sequence.
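To make the interception step concrete, the following sketch filters the cached raw-format frames by the quality score of the target's ROI and crops the ROI from the frames that pass; score_roi and the threshold value are placeholders for illustration (a luminance-based scorer matching the formula given later in this description appears further below):

```python
def intercept_roi_sequence(raw_frames, rois, score_roi, score_thresh=60.0):
    """raw_frames: cached original-format frames (e.g. NumPy arrays).
    rois: one (x1, y1, x2, y2) box per frame for the same target.
    Returns the image sequence of the ROI from frames whose quality
    score is higher than the preset score threshold."""
    roi_sequence = []
    for frame, (x1, y1, x2, y2) in zip(raw_frames, rois):
        patch = frame[y1:y2, x1:x2]
        if score_roi(patch) > score_thresh:  # keep only high-quality ROIs
            roi_sequence.append(patch)
    return roi_sequence
```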
In one example, the enhancement processing of the intercepted image sequence of the ROI area in the original format may include:
selecting a reference frame from a plurality of frame images in an input image sequence of the ROI area in the original format;
aligning the multiple frames of images based on the reference frame;
And carrying out enhancement processing on the multi-frame images based on the complementary information among the aligned multi-frame images.
For example, when enhancement processing is required for an image sequence of an ROI region in the original format, multiple frames of images (the specific number may be set according to the actual scene) may be selected at a time from the image sequence as the input of the enhancement processing, and one frame may be selected from the input frames as the reference frame.
For example, the intermediate frame or the last frame of the multiple frames may be selected as the reference frame.
For example, assuming the input consists of 3 frames, the 2nd or the 3rd frame may be selected as the reference frame.
After the reference frame is selected, the multiple frames can be aligned based on the reference frame, and enhancement processing can be performed on them based on the complementary information between the aligned frames.
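The patent leaves the fusion step abstract ("complementary information between the aligned frames"). The sketch below fills it in under two stated assumptions: the patches are the same size, and a temporal average stands in for the multi-frame fusion; align_to(src, ref) is the alignment helper sketched later in this description:

```python
import numpy as np

def enhance_roi_sequence(roi_frames, align_to):
    """roi_frames: same-sized patches of one target; returns one enhanced frame."""
    ref = roi_frames[len(roi_frames) // 2]  # intermediate frame as reference
    aligned = [ref] + [align_to(f, ref) for f in roi_frames if f is not ref]
    # Averaging the aligned stack is the simplest use of complementary
    # information across frames: it suppresses per-frame sensor noise.
    stack = np.stack([a.astype(np.float32) for a in aligned])
    return stack.mean(axis=0).astype(roi_frames[0].dtype)
```

A learned fusion network could replace the mean here without changing the surrounding pipeline.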
And step S130, coding and transmitting the image sequence after the enhancement processing.
In the embodiment of the application, after the IPC performs enhancement processing on the image sequence of the corresponding ROI area in the manner described in the above steps, the image sequence after enhancement processing may be encoded and compressed, and the encoded and compressed code stream may be transmitted to the back-end device through the network.
For example, after the IPC performs enhancement processing on the image sequence of the corresponding ROI area in the manner described above, part or all of the image frames may be selected from the enhanced image sequence, compression-encoded, and transmitted to the back-end device through the network.
In one example, the above-mentioned encoding and transmitting the image sequence after the enhancement processing may include:
and for the same target, selecting a frame of image from the image sequence after the enhancement processing for coding transmission.
For example, for any object, after the IPC performs enhancement processing on the image sequence of the ROI area corresponding to the object in the manner described in the above steps, a frame of image may be selected from the image sequence after the enhancement processing for encoding and transmission.
For example, the IPC may determine a quality score for each enhanced image frame, and select the image frame with the highest quality score for encoding transmission.
In the flow of the method shown in fig. 1, the video monitoring front-end equipment optimizes the image quality of the monitoring video frame aiming at the target in the monitoring video, so that the influence of information loss in the compression transmission process on the image quality optimization is avoided, and the effect of the image quality optimization is improved; in addition, the image quality optimization is not limited to vehicles any more, and the applicable scene of the scheme is expanded.
In order to enable those skilled in the art to better understand the technical scheme provided by the embodiment of the present application, the technical scheme provided by the embodiment of the present application is described below in connection with a specific application scenario.
Referring to fig. 2, a schematic structural diagram of a specific application scenario provided by an embodiment of the present application is shown. As shown in fig. 2, in this application scenario, the video monitoring front-end device may include a first processing module, an ROI region clipping module, a second processing module, and a third processing module.
The image processing flow of the present application will be described below in connection with the functions of the respective modules.
1. First processing module
The first processing module is used for detecting the target and tracking the target after the target is detected.
The first processing module can detect targets such as pedestrians, vehicles, animals, or license plates; after detecting a target, it accurately locates the target and outputs the located target position information; it then tracks the target according to that position information and determines the position information of the target in each frame of image.
The first processing module may also buffer the image sequence during target detection, for example in the first data format (e.g., RGB (red, green, blue) format or YUV (luminance and chrominance) format).
In one example, referring to fig. 3A, a first processing module may include: the system comprises a first sub-processing unit, a second sub-processing unit and a third sub-processing unit.
The first sub-processing unit is used for carrying out target detection so as to determine target position information in each frame of image;
the second sub-processing unit is used for tracking the target to determine the position information of the same target in each frame of image;
The third sub-processing unit is used for evaluating the image quality of the target area and outputting an evaluation score (namely the quality score).
For example, the first sub-processing unit may be implemented by a neural network that directly outputs the target coordinates. As shown in fig. 3B, the neural network used to implement the first sub-processing unit may include convolutional layers (Conv), pooling layers (Pool), fully connected layers (FC), and bounding-box regression (BBR).
By way of example, the operation of the convolutional layer may be expressed by the following formula:
YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)
where YC_i(I) is the output of the current convolutional layer, YC_{i-1}(I) is the input of the current convolutional layer, W_i and B_i are respectively the weight coefficients and offset coefficients of the convolution filter of the current layer, and g() denotes the activation function; when the activation function is ReLU, g(x) = max(0, x).
The pooling layer is a special downsampling layer: it reduces the feature map obtained by convolution. The reduction window has size N×N; with max pooling, the maximum value over each N×N window is taken as the value of the corresponding point of the new image. The specific formula is as follows:
YP_j(I) = maxpool(YP_{j-1}(I))
where YP_{j-1}(I) is the input of the j-th pooling layer and YP_j(I) is the output of the j-th pooling layer.
The fully connected layer (FC) can be regarded as a convolutional layer with a 1×1 filter window, and its implementation is similar to convolution filtering; the expression is as follows:
YF_k(I) = g( Σ_{i=1..R} Σ_{j=1..C} ( W_ij · F_kI(I)(i, j) + B_ij ) )
where F_kI(I) is the input of the k-th fully connected layer, YF_k(I) is the output of the k-th fully connected layer, R and C are the width and height of F_kI(I), W_ij and B_ij are respectively the connection weights and biases of the fully connected layer, and g() denotes the activation function.
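To make the three layer formulas concrete, here is a minimal NumPy sketch assuming a single-channel input, stride-1 "valid" convolution, and ReLU activation; a per-output-neuron bias stands in for the per-connection B_ij of the FC formula:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # g(x) = max(0, x)

def conv_layer(prev, w, b):
    """YC_i(I) = g(W_i * YC_{i-1}(I) + B_i): 'valid' 2-D convolution, stride 1."""
    kh, kw = w.shape
    out_h, out_w = prev.shape[0] - kh + 1, prev.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(prev[r:r + kh, c:c + kw] * w) + b
    return relu(out)

def max_pool(prev, n):
    """YP_j(I) = maxpool(YP_{j-1}(I)) over non-overlapping N x N windows."""
    h, w = prev.shape[0] // n, prev.shape[1] // n
    return prev[:h * n, :w * n].reshape(h, n, w, n).max(axis=(1, 3))

def fc_layer(prev, w, b):
    """YF_k(I) = g(sum over i,j of W_ij * F_kI(I)(i,j) + bias)."""
    return relu(w @ prev.ravel() + b)  # w: (num_out, R*C), b: (num_out,)
```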
Bounding-box regression (BBR) is used to find a mapping such that the window P output by the fully connected layer is mapped to a window G' that is closer to the real window G. The regression is typically implemented by translating or scaling the window P.
Assume the coordinates of the window P output by the fully connected layer are (x1, x2, y1, y2) and the coordinates after the transformation are (x3, x4, y3, y4). If the transformation is a translation with offsets (Δx, Δy), the coordinate relationship before and after the translation is:
x3 = x1 + Δx
x4 = x2 + Δx
y3 = y1 + Δy
y4 = y2 + Δy
If the transformation is a scaling with scale factors dx and dy in the X and Y directions respectively, the coordinate relationship before and after the transformation is:
x4 − x3 = (x2 − x1) · dx
y4 − y3 = (y2 − y1) · dy
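A small sketch of the two window transforms under the coordinate convention above, where (x1, x2) are the horizontal edges and (y1, y2) the vertical edges; the scaling formulas fix only the new width and height, so scaling about the window center is an added assumption:

```python
def translate_window(win, dx_off, dy_off):
    """Translation by (delta-x, delta-y)."""
    x1, x2, y1, y2 = win
    return (x1 + dx_off, x2 + dx_off, y1 + dy_off, y2 + dy_off)

def scale_window(win, dx, dy):
    """Scaling: new width = old width * dx, new height = old height * dy,
    applied about the window center (an assumption; the formulas do not
    fix the anchor point)."""
    x1, x2, y1, y2 = win
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * dx / 2.0, (y2 - y1) * dy / 2.0
    return (cx - half_w, cx + half_w, cy - half_h, cy + half_h)
```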
The second sub-processing unit locates the target in the current frame according to the target position information detected by the first sub-processing unit for the current image frame, and compares the similarity between the target in the current frame and the target in the previous frame; if the similarity is higher than a set threshold, the two are considered the same target and the target position information is updated, thereby realizing target tracking.
The third sub-processing unit is used for evaluating the image quality of the target area of the current image frame to obtain a quality score.
Taking quality evaluation of the target area by its average luminance as an example, the average luminance of the target area can be calculated as:
L_m = (1/n) · Σ_{(x,y)∈R} I(x, y)
where L_m denotes the average luminance of the target area, I(x, y) denotes the luminance of the pixel at (x, y), n denotes the total number of pixels in the target area, and R denotes the target area.
The quality score of the target area is then calculated as:
S = 100 − |80 − L_m|
The higher the quality score, the better the image quality.
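A direct transcription of this scorer, assuming an 8-bit luminance patch (raw or RGB data would first need a luminance conversion); a function like this could serve as the score_roi placeholder in the interception sketch above:

```python
import numpy as np

def luminance_quality_score(patch):
    """S = 100 - |80 - L_m|, with L_m the mean luminance of the target area."""
    l_m = float(np.mean(patch))
    return 100.0 - abs(80.0 - l_m)
```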
2. ROI region clipping module
The ROI area clipping module is configured to clip an image of an ROI area (i.e., the above-mentioned target area) in a multi-frame image (original image) based on the position information of the target.
For example, the ROI area clipping module may clip the ROI area image from the cached image sequence in the first data format based on the location information of the object.
When intercepting ROI area images from the original images, the original images whose target-area quality score is not higher than the preset score threshold are removed; the original images whose target-area quality score is higher than the threshold are retained, and the ROI area images are intercepted from them.
3. Second processing module
The second processing module is used for aligning the multiple frames of ROI area images and carrying out enhancement processing on the multiple frames of ROI area images according to complementary information among the multiple frames of ROI area images.
In one example, as shown in fig. 3C, the second processing module includes: a fourth sub-processing unit and a fifth sub-processing unit.
The fourth sub-processing unit is used for selecting one frame from the input multi-frame ROI region image as a reference frame, and carrying out position transformation on other frames based on the reference frame so as to align the other frames with the reference frame;
the fifth sub-processing unit is used for performing enhancement processing according to the aligned multi-frame ROI area images and by utilizing complementary information among the multi-frame ROI area images.
In one example, after enhancement processing is performed on the input multi-frame ROI area images in the above-described manner, one frame of enhanced ROI area image may be output (for example, the reference frame enhanced with the complementary information from the other frames).
For example, the fourth sub-processing unit may be implemented either with or without a convolutional neural network.
For example, assuming the fourth sub-processing unit is implemented without a convolutional neural network, feature points of the target image (e.g., any frame of the ROI area images other than the reference frame) and of the reference image (i.e., the reference frame) may be extracted by feature-point detection methods such as SIFT (Scale-Invariant Feature Transform) or Harris corner detection; image transformation parameters are then calculated from the positional relationships between the matched feature points; finally, the frames are transformed to obtain aligned multi-frame image data.
Methods of image transformation may include, but are not limited to, projective transformation, affine transformation, rotational translational transformation, and the like.
Taking a rotation-translation transformation as an example, the image transformation formula is:
x = w · cos(α) − z · sin(α) + d_hor
y = w · sin(α) + z · cos(α) + d_ver
where (x, y) are the coordinates of each pixel of the image after the transformation, (w, z) are the coordinates of each pixel of the original image, and α, d_hor, and d_ver are the image transformation parameters obtained from the transformation-parameter calculation.
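A hedged sketch of this non-CNN alignment path using OpenCV: ORB features stand in for SIFT/Harris (SIFT requires opencv-contrib in some builds, and ORB expects 8-bit images), and estimateAffinePartial2D fits exactly a rotation-translation (plus uniform scale) model from the matched points:

```python
import cv2
import numpy as np

def align_to(src, ref):
    """Warp src onto ref with a rotation/translation model estimated
    from matched feature points."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(src, None)
    kp2, des2 = orb.detectAndCompute(ref, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]
    src_pts = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches])
    m_rt, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)  # 2x3 matrix
    h, w = ref.shape[:2]
    return cv2.warpAffine(src, m_rt, (w, h))
```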
The fifth sub-processing unit is used to enhance the multi-frame ROI area images; it may be implemented by a convolutional neural network or by a non-CNN method, which is not described in detail herein.
4. Third processing module
The third processing module is used for carrying out compression coding on the ROI area image after the enhancement processing and transmitting the ROI area image to the back-end equipment through a network.
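A minimal sketch of this module's role, assuming a plain TCP link with length-prefix framing and JPEG as a stand-in codec (the patent names neither the transport nor the codec):

```python
import socket
import cv2

def encode_and_send(enhanced_roi, host, port, quality=90):
    """Compress the enhanced ROI image and push it to the back-end device."""
    ok, buf = cv2.imencode(".jpg", enhanced_roi,
                           [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("encoding failed")
    payload = buf.tobytes()
    with socket.create_connection((host, port)) as conn:
        conn.sendall(len(payload).to_bytes(4, "big"))  # 4-byte length prefix
        conn.sendall(payload)
```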
In the embodiment of the application, the video monitoring front-end device performs target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame, and determines the region of interest (ROI) corresponding to the target in each monitoring video frame according to that position information. Further, according to the quality scores of the ROI area in each monitoring video frame, the ROI area is intercepted from the cached monitoring video frames in the original format, the intercepted image sequence of the ROI area in the original format is enhanced, and the enhanced image sequence is encoded and transmitted. This avoids the influence of information loss in the compression and transmission process on image quality optimization and improves the image quality optimization effect; in addition, image quality optimization is no longer limited to vehicles, which expands the applicable scenes of the scheme.
The method provided by the application is described above. The device provided by the application is described below:
referring to fig. 4, a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, as shown in fig. 4, the image processing apparatus may include:
The target detection unit 410 is configured to perform target detection and tracking on the monitoring video frames to determine position information of the same target in each monitoring video frame;
a determining unit 420, configured to determine, according to the position information of the same target in each monitoring video frame, a region of interest ROI region corresponding to the target in each monitoring video frame;
a capturing unit 430, configured to capture the ROI area from the buffered original format monitoring video frame according to the quality score of the ROI area in each monitoring video frame;
an enhancement processing unit 440, configured to perform enhancement processing on the intercepted image sequence of the ROI area in the original format;
and the transmission unit 450 is used for coding and transmitting the image sequence after the enhancement processing.
In an alternative embodiment, the target detection unit 410 is specifically configured to perform target detection and tracking on the surveillance video frames based on a neural network, so as to determine location information of the same target in each surveillance video frame.
In an alternative embodiment, the determining unit 420 is further configured to determine a quality score of the ROI area in each of the surveillance video frames;
The intercepting unit 430 is specifically configured to intercept, from the cached original format surveillance video frames, the ROI areas with quality scores higher than a preset score threshold, based on the quality scores of the ROI areas in the surveillance video frames.
In an alternative embodiment, the enhancement processing unit 440 is specifically configured to: select a reference frame from multiple frames of images in the input image sequence of the ROI area in the original format; align the multiple frames of images based on the reference frame; and perform enhancement processing on the multiple frames of images based on the complementary information between the aligned frames.
In an alternative embodiment, the transmission unit 450 is specifically configured to select, for the same object, a part or all of the image frames from the image sequence after the enhancement processing for encoding and transmission.
In an alternative embodiment, the transmission unit 450 is specifically configured to select, from the image sequence after the enhancement processing, a frame image with the highest quality score for encoding transmission.
Fig. 5 is a schematic hardware structure diagram of an image processing apparatus according to an embodiment of the application. The image processing apparatus may include a processor 501, a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and machine-readable storage medium 502 may communicate via a system bus 503. Also, the processor 501 may perform the image processing methods described above by reading and executing machine-executable instructions corresponding to image processing logic in the machine-readable storage medium 502.
The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, and so on. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state disk, any type of storage disk (e.g., an optical disk or DVD), or a similar storage medium, or a combination thereof.
Embodiments of the present application also provide a machine-readable storage medium, such as machine-readable storage medium 502 in fig. 5, comprising machine-executable instructions executable by processor 501 in an image processing apparatus to implement the image processing method described above.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within its scope of protection.
Claims (12)
1. An image processing method, applied to a video monitoring front-end device, comprising:
performing target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame;
Determining a corresponding region of interest (ROI) region of the target in each monitoring video frame according to the position information of the same target in each monitoring video frame;
Intercepting the ROI area from the cached monitoring video frames in the original format according to the quality scores of the ROI area in each monitoring video frame, and carrying out enhancement processing on the image sequence of the intercepted ROI area in the original format; the original format refers to an original data format of converting the captured light source signals into digital signals by the image sensor, wherein the original data comprises sensing data from one or more spectrum bands;
Coding and transmitting the image sequence after the enhancement processing;
Wherein, the enhancing processing of the image sequence of the intercepted original format ROI region comprises the following steps:
Selecting a reference frame from a plurality of frame images in the input image sequence of the ROI area in the original format;
aligning the multiple frames of images based on the reference frame;
Performing enhancement processing on the multi-frame images based on the complementary information among the aligned multi-frame images;
the capturing the ROI area from the cached original format monitoring video frame according to the quality score of the ROI area in each monitoring video frame comprises:
determining a quality score of the ROI area in each monitoring video frame;
and based on the quality scores of the ROI areas in the monitoring video frames, intercepting the ROI areas with the quality scores higher than a preset score threshold from the cached monitoring video frames in the original format.
2. The method of claim 1, wherein the performing object detection and tracking on the surveillance video frames to determine the location information of the same object in each surveillance video frame comprises:
And performing target detection and tracking on the monitoring video frames based on the neural network so as to determine the position information of the same target in each monitoring video frame.
3. The method of claim 1, wherein the capturing the ROI from the buffered original format surveillance video frames based on the quality score of the ROI in each of the surveillance video frames comprises:
determining a quality score of the ROI area in each monitoring video frame;
and based on the quality scores of the ROI areas in the monitoring video frames, intercepting the ROI areas with the quality scores higher than a preset score threshold from the cached monitoring video frames in the original format.
4. The method according to claim 1, wherein said encoding the sequence of images after enhancement processing comprises:
and for the same object, selecting part or all of image frames from the image sequence after the enhancement processing for coding transmission.
5. The method of claim 4, wherein selecting a portion of the image frames from the enhanced image sequence for encoding transmission comprises:
And selecting a frame of image with the highest quality score from the image sequence after the enhancement processing for coding transmission.
6. An image processing apparatus for use with a video surveillance head-end, the apparatus comprising:
The target detection unit is used for detecting and tracking the targets of the monitoring video frames so as to determine the position information of the same target in each monitoring video frame;
the determining unit is used for determining a region of interest (ROI) of the target corresponding to each monitoring video frame according to the position information of the same target in each monitoring video frame;
The intercepting unit is used for intercepting the ROI area from the cached monitoring video frames in the original format according to the quality scores of the ROI areas in the monitoring video frames; the original format refers to an original data format of converting the captured light source signals into digital signals by the image sensor, wherein the original data comprises sensing data from one or more spectrum bands;
an enhancement processing unit, configured to perform enhancement processing on the truncated image sequence of the ROI area in the original format;
A transmission unit, configured to perform coding transmission on the image sequence after enhancement processing;
the capturing unit captures the ROI area from the buffered original format monitoring video frame according to the quality score of the ROI area in each monitoring video frame, including:
determining a quality score of the ROI area in each monitoring video frame;
and based on the quality scores of the ROI areas in the monitoring video frames, intercepting the ROI areas with the quality scores higher than a preset score threshold from the cached monitoring video frames in the original format.
7. The apparatus of claim 6, wherein:
The target detection unit is specifically configured to perform target detection and tracking on the monitoring video frames based on a neural network, so as to determine position information of the same target in each monitoring video frame.
8. The apparatus of claim 6, wherein:
The determining unit is further configured to determine a quality score of the ROI area in each monitoring video frame;
The intercepting unit is specifically configured to intercept, from the cached original format surveillance video frames, the ROI areas with quality scores higher than a preset score threshold, based on the quality scores of the ROI areas in the surveillance video frames.
9. The apparatus of claim 6, wherein:
The transmission unit is specifically configured to select, for the same target, a part or all of the image frames from the image sequence after the enhancement processing to perform encoding transmission.
10. The apparatus of claim 9, wherein:
The transmission unit is specifically configured to select, from the image sequence after enhancement processing, a frame of image with the highest quality score for encoding transmission.
11. An image processing apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to cause the method of any one of claims 1-5 to be performed.
12. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910651713.7A CN112241982B (en) | 2019-07-18 | 2019-07-18 | Image processing method, device and machine-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910651713.7A CN112241982B (en) | 2019-07-18 | 2019-07-18 | Image processing method, device and machine-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112241982A CN112241982A (en) | 2021-01-19 |
CN112241982B true CN112241982B (en) | 2024-08-27 |
Family ID: 74167920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910651713.7A Active CN112241982B (en) | 2019-07-18 | 2019-07-18 | Image processing method, device and machine-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112241982B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114827567B (en) * | 2022-03-23 | 2024-05-28 | 阿里巴巴(中国)有限公司 | Video quality analysis method, apparatus and readable medium |
CN116862946A (en) * | 2022-03-25 | 2023-10-10 | 影石创新科技股份有限公司 | Motion video generation method, device, terminal equipment and storage medium |
CN116112645B (en) * | 2023-04-11 | 2023-11-21 | 重庆华悦生态环境工程研究院有限公司深圳分公司 | Multi-image transmission method and device for reservoir environment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104125470A (en) * | 2014-08-07 | 2014-10-29 | 成都瑞博慧窗信息技术有限公司 | Video data transmission method |
CN106203497A (en) * | 2016-07-01 | 2016-12-07 | 浙江工业大学 | A kind of finger vena area-of-interest method for screening images based on image quality evaluation |
CN109218695A (en) * | 2017-06-30 | 2019-01-15 | 中国电信股份有限公司 | Video image enhancing method, device, analysis system and storage medium |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1739965A1 (en) * | 2005-06-27 | 2007-01-03 | Matsuhita Electric Industrial Co., Ltd. | Method and system for processing video data |
US8019170B2 (en) * | 2005-10-05 | 2011-09-13 | Qualcomm, Incorporated | Video frame motion-based automatic region-of-interest detection |
KR101626004B1 (en) * | 2009-12-07 | 2016-05-31 | 삼성전자주식회사 | Method and apparatus for selective support of the RAW format in digital imaging processor |
CN107209933A (en) * | 2014-08-25 | 2017-09-26 | 新加坡科技研究局 | For assessing retinal images and the method and system of information being obtained from retinal images |
CN106557767B (en) * | 2016-11-15 | 2019-04-09 | 北京唯迈医疗设备有限公司 | A kind of method of ROI region in determining interventional imaging |
CN107124612B (en) * | 2017-04-26 | 2019-06-14 | 东北大学 | Method for compressing high spectrum image based on distributed compression perception |
US10818019B2 (en) * | 2017-08-14 | 2020-10-27 | Siemens Healthcare Gmbh | Dilated fully convolutional network for multi-agent 2D/3D medical image registration |
CN108122231B (en) * | 2018-01-10 | 2021-09-24 | 山东华软金盾软件股份有限公司 | Image quality evaluation method based on ROI Laplacian algorithm under monitoring video |
CN108810538B (en) * | 2018-06-08 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Video coding method, device, terminal and storage medium |
CN109242811B (en) * | 2018-08-16 | 2021-09-17 | 广州视源电子科技股份有限公司 | Image alignment method and device, computer readable storage medium and computer equipment |
- 2019-07-18: application CN201910651713.7A filed in China; granted as patent CN112241982B (status: Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104125470A (en) * | 2014-08-07 | 2014-10-29 | 成都瑞博慧窗信息技术有限公司 | Video data transmission method |
CN106203497A (en) * | 2016-07-01 | 2016-12-07 | 浙江工业大学 | A kind of finger vena area-of-interest method for screening images based on image quality evaluation |
CN109218695A (en) * | 2017-06-30 | 2019-01-15 | 中国电信股份有限公司 | Video image enhancing method, device, analysis system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112241982A (en) | 2021-01-19 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |