CN112819706A - Method for determining identification frame displayed in superposition mode, readable storage medium and electronic device - Google Patents


Info

Publication number
CN112819706A
CN112819706A (application CN202110046727.3A)
Authority
CN
China
Prior art keywords
frame
pixel
value
recognition
pixels
Prior art date
Legal status
Granted
Application number
CN202110046727.3A
Other languages
Chinese (zh)
Other versions
CN112819706B (en)
Inventor
叶炎钟
李文斌
Current Assignee
Hangzhou Ruiying Technology Co ltd
Original Assignee
Hangzhou Ruiying Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Ruiying Technology Co ltd
Priority to CN202110046727.3A
Publication of CN112819706A
Application granted
Publication of CN112819706B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/80: Geometric correction
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; image sequence

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method for determining an identification frame for overlay display, a readable storage medium, and an electronic device. The method comprises the following steps: acquiring position information of a first identification frame in an original frame; converting that position information into position information of the first identification frame in an output frame; and, according to the position information of the first identification frame in the output frame, if the position of the starting point of the first identification frame in the output frame is determined to fall between two pixels of the output frame along the motion direction of the target object, setting a second identification frame to be overlaid on the output frame as follows: taking the two pixels as a reference, extending a first preset number of pixels and a second preset number of pixels to the two sides along the motion direction of the target object (the horizontal or the vertical direction), thereby obtaining the border of the second identification frame in that direction. The invention reduces jitter of the identification frame during overlay display.

Description

Method for determining identification frame displayed in superposition mode, readable storage medium and electronic device
Technical Field
The invention relates to the technical field of video display, and in particular to a method for determining an identification frame for overlay display, a readable storage medium, and an electronic device.
Background
In video processing, after an object and its position are identified in a video frame, an identification frame (usually a rectangle) is drawn along the boundary of the object's outline to mark the object. If the object is moving, the displacement of the identification frame relative to the object may vary from frame to frame after the identification frame is superimposed, visually creating a jitter effect.
In video pictures, image formats such as YUV422, YVU422, YUV420, and YVU420 require some color components to be reconstructed by interpolation. Taking the YUV420 image format as an example, the conventional process of overlaying an identification frame is as follows:
1) When an object and its position are identified in the current frame of a captured video stream, determine the position at which an identification frame is to be superimposed at the contour boundary of the object in the current frame.
The position of the identification frame in the current frame may be represented by the coordinates of its starting point, the whole-frame width (i.e., the width of the entire identification frame), and the whole-frame height (i.e., the height of the entire identification frame). The width of the border of the identification frame on each of the four sides (hereinafter, the border width) is usually two pixels.
Fig. 1 gives a schematic view of the identification frame. Since an image in YUV420 format shares one U component and one V component among every four Y components, the abscissa and ordinate of the starting point of the identification frame in the current frame must be even (the unit of both coordinates is a pixel; the upper-left vertex of the current frame is the origin (0, 0), horizontal-right is the positive horizontal direction, and vertical-down is the positive vertical direction).
2) The position information of the identification frame in the current frame (the starting-point coordinates, the whole-frame width, and the whole-frame height) is normalized, and the normalized position information (the normalized starting-point coordinates, the normalized whole-frame width, and the normalized whole-frame height) is stored in a cache.
The abscissa of the starting point of the identification frame is divided by the width of the current frame to obtain the normalized abscissa, and the ordinate is divided by the height of the current frame to obtain the normalized ordinate; the whole-frame width of the identification frame is divided by the width of the current frame to obtain the normalized whole-frame width; and the whole-frame height is divided by the height of the current frame to obtain the normalized whole-frame height. All four normalized values are therefore less than 1.
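The normalization in step 2) can be sketched as follows (a minimal illustration; the function name and the example numbers are hypothetical, not from the patent):

```python
def normalize_box(x, y, box_w, box_h, frame_w, frame_h):
    """Normalize an identification frame's position to values below 1.

    (x, y) is the starting point in pixels; box_w and box_h are the
    whole-frame width and height; frame_w and frame_h give the
    resolution of the current frame."""
    return (x / frame_w, y / frame_h, box_w / frame_w, box_h / frame_h)

# A 100x60 identification frame starting at (1000, 500) in a 1920x1080 frame:
nx, ny, nw, nh = normalize_box(1000, 500, 100, 60, 1920, 1080)
```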
3) Then, when the current frame is to be displayed, the stored normalized position information of each identification frame in the current frame is read, and for each identification frame the normalized position information is converted into display position information as follows:
the normalized abscissa and ordinate of the starting point of the identification frame are multiplied by the width and the height, respectively, of a preset output frame to obtain the abscissa and ordinate of the display starting point of the identification frame;
the normalized whole-frame width of the identification frame is multiplied by the width of the preset output frame to obtain the display width of the identification frame;
the normalized whole-frame height of the identification frame is multiplied by the height of the preset output frame to obtain the display height of the identification frame.
4) If the abscissa or the ordinate of the display starting point obtained in step 3) is not an integer, the non-integer coordinate(s) are rounded.
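Steps 3) and 4) can be sketched as follows (an illustrative reading, not the patented method; for YUV420 the rounded coordinate must additionally be forced even, as the next paragraphs describe):

```python
def denormalize_box(nx, ny, nw, nh, out_w, out_h):
    """Step 3): map normalized box values onto a preset output frame."""
    return (nx * out_w, ny * out_h, nw * out_w, nh * out_h)

def round_even(coord):
    """Step 4) for YUV420: round the display coordinate to the nearest
    integer, then drop an odd result to the even value below it, since
    every four Y samples share one U and one V sample."""
    r = int(coord + 0.5)  # round half up
    return r if r % 2 == 0 else r - 1
```

For example, round_even(1001.41) gives 1000 while round_even(1001.51) gives 1002, which is exactly the 2-pixel jump described next.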
As shown in fig. 2: if the abscissa of the display starting point obtained in 3) is 1001.41, then after rounding the abscissa becomes 1000 (since a YUV420 image shares one U component and one V component among every four Y components, the abscissa and ordinate of the display starting point must be even), and the left border of the identification frame is the left border of identification frame 1 shown in fig. 2;
as shown in fig. 2: if the abscissa of the display starting point obtained in 3) is 1001.51, then after rounding the abscissa becomes 1002, and the left border of the identification frame is the left border of identification frame 2 shown in fig. 2.
It can be seen that although the abscissas of the display starting points calculated in 3) differ by only 0.1 pixel, a 2-pixel deviation may occur when the identification frame is displayed, causing the identification frame to jitter when the video is displayed in real time.
A large number of experiments show that the current algorithm for overlaying identification frames during video display produces a deviation of 0.5 pixel on average.
If another rounding method is used instead of round-to-nearest, there are still cases where display positions calculated in 3) that differ by only 0.1 pixel produce a 2-pixel deviation when the identification frame is shown on a display.
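The jitter mechanism can be reproduced numerically; the sketch below (hypothetical helper name) models the round-then-force-even behavior described above and shows how two computed abscissas 0.1 pixel apart land 2 pixels apart on screen:

```python
def display_abscissa(a):
    """Round a computed abscissa to the nearest integer, then force it
    even, as required when every four Y samples share one U and one V."""
    r = int(a + 0.5)
    return r if r % 2 == 0 else r - 1

# Computed abscissas differing by 0.1 pixel produce a 2-pixel jump:
jump = display_abscissa(1001.51) - display_abscissa(1001.41)  # 1002 - 1000
```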
In addition, even if the image format used for overlaying the identification frame carries every component in full, i.e., is a format in which no component of a pixel needs to be reconstructed by interpolation (such as YUV444, YVU444, or RGB), jitter is much smaller when the identification frame is overlaid, but it still exists. The main reason is that the normalized position information of the identification frame is fractional, and may remain fractional after being converted into display position information; the fractional part cannot be represented when the identification frame is actually displayed, so the display position deviates from the true position and jitter results.
Disclosure of Invention
The embodiments of the invention provide a method for determining an identification frame for overlay display, a readable storage medium, and an electronic device, so as to reduce jitter of the identification frame during overlay display.
The technical solution of the embodiments of the invention is implemented as follows:
a method of determining an identification box for overlay display, the method comprising:
acquiring position information of a first identification frame in an original frame, wherein the first identification frame is used for marking a target object in the original frame;
converting the position information of the first identification frame in the original frame into the position information of the first identification frame in the output frame; the output frame has the same content as the original frame but a different resolution;
according to the position information of the first recognition frame in the output frame, if the position of the starting point of the first recognition frame in the output frame is determined to be between two pixels pointing to the output frame in the motion direction of the target object, setting a second recognition frame which is displayed on the output frame in an overlapping mode according to the following mode:
and respectively expanding a first preset number of pixels and a second preset number of pixels to two sides in the moving direction of the target object by taking the two pixels as a reference to obtain a frame of the second recognition frame in the moving direction of the target object, wherein the moving direction of the target object is a horizontal direction or a vertical direction.
When the position of the starting point of the first identification frame in the output frame points exactly at one pixel of the output frame, taking the two pixels as a reference comprises:
taking the pixel of the output frame at which the starting point of the first identification frame points as the first pixel; and
when the motion direction of the target object is horizontal, taking the pixel of the output frame in the same row as, and immediately to the right of, the first pixel as the second pixel; when the motion direction of the target object is vertical, taking the pixel of the output frame in the same column as, and immediately below, the first pixel as the second pixel;
then taking the first pixel and the second pixel as the reference.
After the first preset number of pixels and the second preset number of pixels are respectively extended to the two sides along the motion direction of the target object, the method further includes:
if the motion direction of the target object is horizontal, taking the leftmost pixel after the extension as the starting point of the first row of pixels of the left border of the second identification frame, and the rightmost pixel after the extension as the end point of that row;
if the motion direction of the target object is vertical, taking the topmost pixel after the extension as the starting point of the first column of pixels of the upper border of the second identification frame, and the bottommost pixel after the extension as the end point of that column.
The first preset number is 1 + 2 × m and the second preset number is 1 + 2 × n, where m and n are non-negative integers and are not necessarily equal.
After determining that the position of the starting point of the first identification frame in the output frame falls between two pixels of the output frame along the motion direction of the target object, the method further includes:
determining that the first of the two pixels is an even-numbered pixel in its row of the output frame, the first pixel being the left one, or the upper one, of the two pixels;
after obtaining the border of the second identification frame in the motion direction of the target object, the method further includes:
for each non-boundary pixel on the second identification frame, setting the Y, U, and V values of that pixel to the predefined Y, U, and V values of the second identification frame;
for each boundary pixel on the second identification frame, setting the U and V values of that pixel to the predefined U and V values of the second identification frame, and setting the Y value of that pixel to the fusion of the predefined Y value of the second identification frame and the Y value of the output-frame pixel corresponding to that pixel.
Setting the Y value of the pixel to the fusion of the predefined Y value of the second identification frame and the Y value of the corresponding output-frame pixel comprises:
if the pixel is located at the left or upper boundary of the border of the second identification frame, the Y value of the pixel = α × [1 - (A - A0)] × Y1 + β × (A - A0) × Y0;
if the pixel is located at the right or lower boundary of the border of the second identification frame, the Y value of the pixel = α × (A - A0) × Y1 + β × [1 - (A - A0)] × Y0;
where Y1 is the predefined Y value of the second identification frame, Y0 is the Y value of the output-frame pixel corresponding to the pixel, A is the abscissa of the starting point of the first identification frame in the output frame (after converting the position information of the first identification frame in the original frame into position information in the output frame), A0 is the integer part of A, and α and β are preset constants with 0 < α < 1 and 0 < β < 1.
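The fusion formulas can be read as the sketch below (illustrative; α and β are the preset constants, and the weights come from the fractional part A - A0):

```python
def fused_y(y1, y0, a, alpha, beta, left_or_upper):
    """Blend the predefined border luma Y1 with the underlying
    output-frame luma Y0 according to the fractional part of the
    converted abscissa A."""
    frac = a - int(a)  # A - A0, the sub-pixel offset of the starting point
    if left_or_upper:  # left or upper boundary of the border
        return alpha * (1 - frac) * y1 + beta * frac * y0
    return alpha * frac * y1 + beta * (1 - frac) * y0  # right or lower
```

For a left-boundary pixel with A = 1001.41, the border luma is weighted by roughly α × 0.59 and the underlying luma by β × 0.41, so the border visually sits at the sub-pixel position instead of snapping by two whole pixels.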
After determining that the position of the starting point of the first identification frame in the output frame falls between two pixels of the output frame along the motion direction of the target object, the method further includes:
determining that the first of the two pixels is an odd-numbered pixel in its row of the output frame, the first pixel being the left one, or the upper one, of the two pixels;
and after obtaining the border of the second identification frame in the motion direction of the target object, the method further includes:
for each non-boundary pixel on the second identification frame, setting the Y, U, and V values of that pixel to the predefined Y, U, and V values of the second identification frame;
for each boundary pixel on the second identification frame, setting the U and V values of that pixel to the U and V values of the output-frame pixel corresponding to that pixel, and setting the Y value of that pixel to the fusion of the predefined Y value of the second identification frame and the Y value of the corresponding output-frame pixel.
Setting the Y value of the pixel to the fusion of the predefined Y value of the second identification frame and the Y value of the corresponding output-frame pixel comprises:
if the pixel is located at the left or upper boundary of the border of the second identification frame, the Y value of the pixel = α × [1 - (A - A0)] × Y1 + β × (A - A0) × Y0;
if the pixel is located at the right or lower boundary of the border of the second identification frame, the Y value of the pixel = α × (A - A0) × Y1 + β × [1 - (A - A0)] × Y0;
where Y1 is the predefined Y value of the second identification frame, Y0 is the Y value of the output-frame pixel corresponding to the pixel, A is the abscissa of the starting point of the first identification frame in the output frame (after converting the position information of the first identification frame in the original frame into position information in the output frame), A0 is the integer part of A, and α and β are preset constants with 0 < α < 1 and 0 < β < 1.
The format of the output frame is YUV420, YVU420, YUV422, YVU422, YUV400, YUV444, or RGB.
A non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of any of the methods of determining an identification frame for overlay display described above.
An electronic device comprising a non-transitory computer-readable storage medium as described above, and a processor having access to the non-transitory computer-readable storage medium.
In the embodiments of the invention, when the position of the starting point of the first identification frame in the output frame points between two pixels of the output frame along the motion direction of the target object, a first preset number of pixels and a second preset number of pixels are extended to the two sides along the motion direction of the target object, taking the two pixels as a reference, to obtain the border of the second identification frame in that direction (the motion direction being horizontal or vertical). As a result, the border of the second identification frame stays attached to the contour of the target object whenever the output frame is displayed, and jitter of the identification frame during display is avoided.
Drawings
FIG. 1 is a schematic view of an identification frame;
FIG. 2 is a schematic diagram of the difference between superimposed identification frames in the conventional scheme when the abscissas of the converted display starting points differ by 0.1 pixel;
FIG. 3 is a flowchart of a method for determining an identification frame for overlay display according to an embodiment of the present invention;
FIG. 4 is a first application example of determining the second identification frame according to the present invention;
FIG. 5 is a second application example of determining the second identification frame according to the present invention;
FIG. 6 is a flowchart of a method for determining an identification frame for overlay display according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 3 is a flowchart of a method for determining an identification frame for overlay display according to an embodiment of the present invention, which includes the following specific steps:
step 301: position information of the first recognition frame in the original frame is acquired. The first identification frame is used for marking a target object in the original frame.
The position information of the first recognition frame in the original frame includes: the position information of the starting point of the first recognition frame in the original frame, and the whole frame width and the whole frame height.
Step 302: and converting the position information of the first identification frame in the original frame into the position information of the first identification frame in the output frame. The output frame has the same content as the original frame but different resolution.
Step 303: according to the position information of the first identification frame in the output frame, if the position of the starting point of the first identification frame in the output frame is determined to fall between two pixels of the output frame along the motion direction of the target object, set a second identification frame to be overlaid on the output frame as follows:
taking the two pixels as a reference, extend a first preset number of pixels and a second preset number of pixels to the two sides along the motion direction of the target object to obtain the border of the second identification frame in the motion direction of the target object, the motion direction of the target object being horizontal or vertical.
The motion direction of the target object is either horizontal (the width direction of the output frame) or vertical (the height direction of the output frame). Then:
when the motion direction of the target object is horizontal, "the position of the starting point of the first identification frame in the output frame points between two pixels of the output frame along the motion direction" means that it points between two pixels along the width direction of the output frame; that is, the two pixels are horizontally adjacent on the output frame and the starting point of the first identification frame lies between them;
when the motion direction of the target object is vertical, it means that the position points between two pixels along the height direction of the output frame; that is, the two pixels are vertically adjacent on the output frame and the starting point of the first identification frame lies between them.
When the motion direction of the target object is horizontal, extending the first preset number of pixels and the second preset number of pixels to the two sides, taking the two pixels as a reference, means extending the first preset number of pixels to the left and the second preset number of pixels to the right along the row containing the two pixels, thereby obtaining the first row of the left border of the second identification frame;
when the motion direction of the target object is vertical, it means extending the first preset number of pixels upward and the second preset number of pixels downward along the column containing the two pixels, thereby obtaining the first column of the upper border of the second identification frame.
Since the whole-frame width and whole-frame height of the second identification frame are known (they can be calculated from the whole-frame width and height of the first identification frame in the original frame together with the resolutions of the original and output frames), and the border widths of the four borders of the second identification frame are also preset, once the first row of the left border or the first column of the upper border is obtained (that is, once the position of the starting point of the second identification frame in the output frame is in effect known), all four borders of the second identification frame can be obtained. Finally, the second identification frame is overlaid on the output frame.
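The construction of the extended border row in step 303 can be sketched as follows (a minimal illustration; the function name and defaults are hypothetical):

```python
def border_row(a, m=0, n=0):
    """Given the converted abscissa A of the starting point, which falls
    between the pixels with abscissas A0 = int(A) and A0 + 1, extend
    (1 + 2*m) pixels to the left and (1 + 2*n) pixels to the right of
    that pair, and return the abscissas of the first row of pixels of
    the left border of the second identification frame."""
    a0 = int(a)                      # left pixel of the reference pair
    left = a0 - (1 + 2 * m)          # leftmost pixel after extension
    right = (a0 + 1) + (1 + 2 * n)   # rightmost pixel after extension
    return list(range(left, right + 1))

# With A = 1001.41 and m = n = 0, the border row is 4 pixels wide:
row = border_row(1001.41)
```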
In the above embodiment, when the starting point of the first identification frame in the output frame lies between two pixels of the output frame along the motion direction of the target object, a first preset number of pixels and a second preset number of pixels are extended to the two sides along that direction, taking the two pixels as a reference, to obtain the border of the second identification frame in that direction (the motion direction being horizontal or vertical). The border of the second identification frame can therefore stay attached to the contour of the target object when the output frame is displayed, avoiding jitter of the identification frame during display.
In an alternative embodiment, when the position of the starting point of the first identification frame in the output frame points exactly at one pixel of the output frame, taking the two pixels as a reference in step 303 comprises:
taking the pixel of the output frame at which the starting point of the first identification frame points as the first pixel; and
when the motion direction of the target object is horizontal, taking the pixel of the output frame in the same row as, and immediately to the right of, the first pixel as the second pixel; when the motion direction of the target object is vertical, taking the pixel of the output frame in the same column as, and immediately below, the first pixel as the second pixel;
then taking the first pixel and the second pixel as the reference.
In an optional embodiment, after the first preset number of pixels and the second preset number of pixels are extended to the two sides along the motion direction of the target object in step 303, the method further comprises:
if the motion direction of the target object is horizontal, taking the leftmost pixel after the extension as the starting point of the first row of pixels of the left border of the second identification frame, and the rightmost pixel after the extension as the end point of that row;
if the motion direction of the target object is vertical, taking the topmost pixel after the extension as the starting point of the first column of pixels of the upper border of the second identification frame, and the bottommost pixel after the extension as the end point of that column.
In an alternative embodiment, the first preset number is 1 + 2 × m and the second preset number is 1 + 2 × n, where m and n are non-negative integers and are not necessarily equal.
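A small sketch of the resulting border width (a hypothetical helper, not from the patent): the border built along the motion direction spans the two reference pixels plus (1 + 2 × m) pixels on one side and (1 + 2 × n) pixels on the other, i.e. 4 + 2 × (m + n) pixels in total, which is always even; plausibly this is why the extension counts are kept odd, since an even border width preserves the even alignment that subsampled formats such as YUV420 require.

```python
def border_width(m, n):
    """Width of the second identification frame's border along the motion
    direction: the two reference pixels plus (1 + 2*m) pixels on one side
    and (1 + 2*n) pixels on the other. Always even."""
    return (1 + 2 * m) + 2 + (1 + 2 * n)

# m = n = 0 reproduces the 4-pixel border of the application example:
w = border_width(0, 0)
```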
Fig. 4 is a first application example of determining the second identification frame provided by the invention. As shown in fig. 4, the upper-left vertex of the output frame is the origin (0, 0), horizontal-right is the positive horizontal direction, vertical-down is the positive vertical direction, the coordinate unit is a pixel, and the motion direction of the target object is horizontal. If the abscissa of the starting point of the first identification frame in the output frame calculated in step 302 is 1001.41, the starting point lies between the two pixels with abscissas 1001 and 1002. Taking these two pixels as a reference and extending one pixel to each side, the four pixels (abscissas 1000, 1001, 1002, and 1003) form the first row of pixels of the left border of the second identification frame, and the four borders of the second identification frame all have a width of 4 pixels.
In an alternative embodiment, after it is determined in step 303 that the position of the starting point of the first identification frame in the output frame falls between two pixels of the output frame along the motion direction of the target object, the method further includes: determining that the first of the two pixels is an even-numbered pixel in its row of the output frame, the first pixel being the left one, or the upper one, of the two pixels.
In step 303, after the border of the second identification frame in the motion direction of the target object is obtained, the method further includes:
for each non-boundary pixel on the second identification frame, setting the Y, U, and V values of that pixel to the predefined Y, U, and V values of the second identification frame;
for each boundary pixel on the second identification frame, setting the U and V values of that pixel to the predefined U and V values of the second identification frame, and setting the Y value of that pixel to the fusion of the predefined Y value of the second identification frame and the Y value of the output-frame pixel corresponding to that pixel.
Still taking Fig. 4 as an example: the top-left vertex of the output frame is the origin (0, 0), horizontal right is the positive horizontal direction, vertical down is the positive vertical direction, the coordinate unit is one pixel, and the moving direction of the target object is horizontal. The abscissa of the starting point of the first recognition frame in the output frame, calculated in step 302, is 1001.41, so the starting point lies between the two pixels with abscissas 1001 and 1002; the first of the two pixels, the one with abscissa 1001, is the 1002nd (i.e., even-numbered) pixel in its row of the output frame. Then, taking the two pixels with abscissas 1001 and 1002 as the reference and extending by 1 pixel to the left and to the right, the leftmost of the four pixels has abscissa 1000 and the rightmost has abscissa 1003; that is, the abscissas of the pixels on the left border of the second recognition frame are, from left to right: 1000, 1001, 1002, 1003.
Consider that when the output frame format is YUV420, YVU420, YUV422, or YVU422, four Y components (or two Y components) share one U component and one V component, so the U, V components of each pair of horizontally adjacent pixels whose abscissas are even and odd in sequence must be set to the same values; that is, the U, V values of the pixels with abscissas 1000 and 1001 must be the same, and the U, V values of the pixels with abscissas 1002 and 1003 must be the same. Since the pixels with abscissas 1000 to 1003 are all located on the second recognition frame, their U, V values are all set to the U, V values of the predefined second recognition frame.
Meanwhile, to reduce as much as possible the difference in displacement between the second recognition frame and the target object across consecutive frames, the Y value of each boundary pixel of the second recognition frame is fused with the Y value of the output-frame pixel at that position.
As can be seen from the above example, when the abscissa of the starting point of the first recognition frame in the output frame calculated in step 302 is 1001.41 versus 1001.51 (a difference of 0.1 pixel), the displacement between the second recognition frame and the target object is substantially the same in the two frames, and the 2-pixel jump of the prior art does not occur.
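The even-numbered case described above can be sketched as follows (a hedged illustration with hypothetical names: pixels are modeled as [Y, U, V] lists, and the luma fusion is passed in as a function, since the exact fusion formula is given in a separate embodiment):

```python
def paint_border_row_even(row, cols, box_yuv, fuse_y):
    """Overlay one row of a border of the second recognition frame when
    the first reference pixel is an even-numbered pixel of the row.

    row     : output-frame row as a list of [Y, U, V] values (mutated)
    cols    : abscissas of the border pixels; the outermost two are the
              boundary pixels
    box_yuv : (Y1, U1, V1) of the predefined second recognition frame
    fuse_y  : function fusing the box luma with the underlying frame luma
    """
    y1, u1, v1 = box_yuv
    for x in cols:
        if x in (cols[0], cols[-1]):
            # boundary pixel: keep the box chroma, fuse the luma
            row[x] = [fuse_y(y1, row[x][0]), u1, v1]
        else:
            # non-boundary pixel: take the box colour outright
            row[x] = [y1, u1, v1]

row = [[100, 50, 60] for _ in range(8)]
paint_border_row_even(row, [2, 3, 4, 5], (200, 128, 128),
                      lambda yb, yf: (yb + yf) // 2)
print(row[2], row[3])   # [150, 128, 128] [200, 128, 128]
```

Because the four columns here cover two whole chroma pairs (like 1000/1001 and 1002/1003 in the example), setting all four U, V values to the box chroma is compatible with the shared-chroma constraint of YUV420/422.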
In an alternative embodiment, after determining in step 303 that the position of the starting point of the first recognition frame in the output frame lies, in the moving direction of the target object, between two pixels of the output frame, the method further includes:
determining that the first pixel of the two pixels is an odd-numbered pixel in its row of the output frame, the first pixel being the pixel on the left side, or the pixel on the upper side, of the two pixels;
in step 303, after obtaining the frame of the second recognition frame in the moving direction of the target object, the method further includes:
for each non-boundary pixel on the second recognition frame, setting the Y, U, V values of the pixel to be the same as the Y, U, V values of the predefined second recognition frame;
for each boundary pixel on the second recognition frame, the U, V value of the pixel is set to be the same as the U, V value of the pixel of the output frame corresponding to the pixel, and the Y value of the pixel is set to be: and fusing the Y value of the predefined second identification frame with the Y value of the pixel of the output frame corresponding to the pixel.
Fig. 5 shows a second application example of determining the second recognition frame provided by the present invention. As shown in Fig. 5, assume the top-left vertex of the output frame is the origin (0, 0), horizontal right is the positive horizontal direction, vertical down is the positive vertical direction, the coordinate unit is one pixel, and the moving direction of the target object is horizontal. The abscissa of the starting point of the first recognition frame in the output frame, calculated in step 302, is 1002.41, so the starting point lies between the two pixels with abscissas 1002 and 1003; the first of the two pixels, the one with abscissa 1002, is the 1003rd (i.e., odd-numbered) pixel in its row of the output frame. Then, taking the two pixels with abscissas 1002 and 1003 as the reference and extending by 1 pixel to the left and to the right, the leftmost of the four pixels has abscissa 1001 and the rightmost has abscissa 1004; that is, the abscissas of the pixels on the left border of the second recognition frame are, from left to right: 1001, 1002, 1003, 1004.
Consider that when the output frame format is YUV420, YVU420, YUV422, or YVU422, four Y components (or two Y components) share one U component and one V component, so the U, V components of each pair of horizontally adjacent pixels whose abscissas are even and odd in sequence must be set to the same values. That is, the U, V values of the pixels with abscissas 1002 and 1003 must be the same; since both of these pixels are located on the second recognition frame, their U, V values are set to the U, V values of the predefined second recognition frame. Meanwhile, the U, V values of the pixel with abscissa 1001 must be the same as those of the pixel with abscissa 1000 in the same row of the output frame, and the U, V values of the pixel with abscissa 1004 must be the same as those of the pixel with abscissa 1005 in the same row of the output frame.
Meanwhile, for a smooth transition between the second recognition frame and the output frame, the Y value of each boundary pixel of the second recognition frame is fused with the Y value of the output-frame pixel at that position.
As can be seen from the above example, when the abscissa of the starting point of the first recognition frame in the output frame calculated in step 302 is 1002.41 versus 1002.51 (a difference of 0.1 pixel), the second recognition frame is identical in the two frames, and the 2-pixel jump of the prior art does not occur.
In an alternative embodiment, setting the Y value of the pixel to a value obtained by fusing the Y value of the predefined second recognition frame with the Y value of the pixel of the output frame corresponding to the pixel comprises:
if the pixel is located at the left or upper boundary of a border of the second recognition frame, the Y value of the pixel is α × [1 − (A − A0)] × Y1 + β × (A − A0) × Y0; the left boundary applies to the left and right borders, and the upper boundary applies to the upper and lower borders;
if the pixel is located at the right or lower boundary of a border of the second recognition frame, the Y value of the pixel is α × (A − A0) × Y1 + β × [1 − (A − A0)] × Y0; the right boundary applies to the left and right borders, and the lower boundary applies to the upper and lower borders;
wherein Y1 is the Y value of the predefined second recognition frame, and Y0 is the Y value of the pixel of the output frame corresponding to the pixel; A is the abscissa of the starting point of the first recognition frame in the output frame after the position information of the first recognition frame in the original frame has been converted into position information in the output frame; A0 is the integer part of A. For example, when the abscissa of the starting point of the first recognition frame in the output frame calculated in step 302 is 1002.41, A is 1002.41 and A0 is 1002. α and β are predetermined constants with 0 < α ≤ 1 and 0 < β ≤ 1; typically α = 1 and β = 1.
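The two fusion formulas can be written directly in code (a sketch; the function and variable names are mine, with the typical values α = β = 1 as defaults):

```python
def fused_boundary_y(y1, y0, a, side, alpha=1.0, beta=1.0):
    """Fuse the predefined box luma Y1 with the output-frame luma Y0 at a
    boundary pixel, weighted by the fractional part of the starting
    abscissa A.

    side : 'near' for a left/upper boundary, 'far' for a right/lower one.
    """
    frac = a - int(a)                 # A - A0, with A0 the integer part of A
    if side == 'near':
        # Y = alpha * [1 - (A - A0)] * Y1 + beta * (A - A0) * Y0
        return alpha * (1 - frac) * y1 + beta * frac * y0
    # Y = alpha * (A - A0) * Y1 + beta * [1 - (A - A0)] * Y0
    return alpha * frac * y1 + beta * (1 - frac) * y0

# A = 1002.41: the near boundary weights the box luma by 0.59 and the frame by 0.41
print(round(fused_boundary_y(200, 100, 1002.41, 'near'), 2))   # 159.0
```

The weights at the two opposite boundaries are mirror images of each other, so as the fractional part of A grows the fused border appears to slide smoothly across the pixel grid instead of snapping by whole pixels.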
In an alternative embodiment, the format of the output frame in the embodiment of the present invention may be YUV420, or YVU420, or YUV422, or YVU422, or YUV400, or YUV444, or RGB.
Fig. 6 is a flowchart of a method for determining an identification frame displayed in an overlapping manner according to another embodiment of the present invention, which includes the following specific steps:
step 601: the method comprises the steps of collecting original video streams, identifying a target object for each original frame in the original video streams, normalizing position information of an identification frame for marking the target object, putting the obtained normalized position information of the identification frame into a first cache queue, and putting the original frame into a second cache queue.
Step 602: when the number of frames in the second buffer queue reaches a preset threshold, searching the normalized position information of the identification frame of the frame from the first buffer queue according to the frame identifier (such as a frame number or a time stamp) of the frame to be displayed, wherein the normalized position information comprises the following steps: the normalized start point coordinates, the normalized whole frame width and the normalized whole frame height of the recognition frame are processed as follows for each recognition frame in steps 603 and 607:
if the normalized position information of any identification frame of the frame is not found in the first cache queue according to the frame identifier of the frame to be displayed, predicting according to the normalized position information of each identification frame of the latest frame and the inter-frame offset to obtain the normalized position information of each identification frame of the frame to be displayed.
Step 603: setting the current identification frame as a first identification frame, and converting the normalized position information of the first identification frame into the position information of the first identification frame in an output frame:
multiplying the horizontal coordinate and the vertical coordinate of the normalized starting point of the first identification frame by the width and the height of the output frame respectively to obtain the horizontal coordinate and the vertical coordinate of the starting point of the first identification frame in the output frame;
multiplying the normalized whole frame width of the first identification frame by the width of the output frame to obtain the whole frame width of the first identification frame on the output frame;
and multiplying the normalized whole frame height of the first identification frame by the height of the output frame to obtain the whole frame height of the first identification frame on the output frame.
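The conversion in step 603 amounts to three multiplications; a sketch, keeping the result fractional so the sub-pixel start position survives for the later parity and fusion steps:

```python
def denormalize_box(norm_box, out_width, out_height):
    """Map a normalized box (start_x, start_y, frame_w, frame_h), each
    component in [0, 1], to output-frame pixel coordinates."""
    nx, ny, nw, nh = norm_box
    return (nx * out_width, ny * out_height, nw * out_width, nh * out_height)

# a box whose start point sits 52.15% across a 1920x1080 output frame
box = denormalize_box((0.5215, 0.25, 0.10, 0.20), 1920, 1080)
```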
Here, the moving direction of the target object is set to the horizontal direction.
Before outputting and displaying the frame, the resolution of the frame is converted into the preset resolution of the output frame.
Step 604: and determining that the abscissa of the starting point of the first recognition frame is positioned between two pixels of the output frame, and respectively expanding one pixel leftwards and rightwards by taking the two pixels as a reference to obtain the 1 st line of pixels of the left frame of the second recognition frame.
Here, the frame width of the second recognition frame is set to 4 pixels.
Step 605: judge whether the abscissa, in the output frame, of the left pixel of the two reference pixels (i.e., the first pixel of the two) is odd or even; if it is odd, perform step 606; if it is even, perform step 607.
The top left vertex of the output frame is the origin (0, 0), the horizontal right direction is the horizontal forward direction, the vertical downward direction is the vertical forward direction, and the coordinate unit is a pixel.
Step 606: for each non-boundary pixel on the second recognition frame, set the Y, U, V values of the pixel to be the same as the Y, U, V values of the predefined second recognition frame; for each boundary pixel on the second recognition frame, set the U, V values of the pixel to be the same as the U, V values of the predefined second recognition frame, and set the Y value of the pixel to a value obtained by fusing the Y value of the predefined second recognition frame with the Y value of the pixel of the output frame corresponding to that pixel; the flow then ends.
Here, the boundary pixels are the pixels of the second recognition frame that, after the second recognition frame is superimposed on the output frame, are adjacent to the original pixels of the output frame, such as: the leftmost and rightmost pixels of the left and right borders of the second recognition frame, and the uppermost and lowermost pixels of the upper and lower borders; all other pixels of the second recognition frame are non-boundary pixels.
Step 607: for each non-boundary pixel on the second recognition frame, set the Y, U, V values of the pixel to be the same as the Y, U, V values of the predefined second recognition frame; for each boundary pixel on the second recognition frame, set the U, V values of the pixel to be the same as the U, V values of the pixel of the output frame corresponding to that pixel, and set the Y value of the pixel to a value obtained by fusing the Y value of the predefined second recognition frame with the Y value of the pixel of the output frame corresponding to that pixel.
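Steps 604 to 607 can be combined into one parity-dispatched routine (a sketch under the same modeling assumptions as above: [Y, U, V] pixel lists, α = β = 1, a 4-pixel border, and hypothetical names; not the patented implementation itself):

```python
def paint_left_border(row, start_x, box_yuv):
    """Overlay one row of the 4-pixel-wide left border of the second
    recognition frame onto an output-frame row, choosing the
    boundary-pixel chroma rule by the parity test of step 605."""
    y1, u1, v1 = box_yuv
    first = int(start_x)              # left pixel of the two reference pixels
    cols = [first - 1, first, first + 1, first + 2]
    frac = start_x - first            # fractional part A - A0
    for x in cols:
        if x in (cols[0], cols[-1]):  # boundary pixel
            # abscissa odd  -> box chroma on the boundary (step 606);
            # abscissa even -> underlying frame chroma (step 607)
            u, v = (u1, v1) if first % 2 == 1 else (row[x][1], row[x][2])
            if x == cols[0]:          # left boundary: (1 - frac)*Y1 + frac*Y0
                y = (1 - frac) * y1 + frac * row[x][0]
            else:                     # right boundary: frac*Y1 + (1 - frac)*Y0
                y = frac * y1 + (1 - frac) * row[x][0]
            row[x] = [y, u, v]
        else:                         # interior border pixel: box colour
            row[x] = [y1, u1, v1]

row = [[100, 50, 60] for _ in range(8)]
paint_left_border(row, 3.5, (200, 128, 128))   # first = 3, odd abscissa
print(row[2], row[5])   # [150.0, 128, 128] [150.0, 128, 128]
```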
Embodiments of the present invention also provide a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the method as described in steps 301-303, or steps 601-607.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device includes the non-transitory computer-readable storage medium 71 as described above, and the processor 72 can access the non-transitory computer-readable storage medium 71.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (11)

1. A method of determining an identification box for overlay display, the method comprising:
acquiring position information of a first identification frame in an original frame, wherein the first identification frame is used for marking a target object in the original frame;
converting the position information of the first identification frame in the original frame into the position information of the first identification frame in the output frame; the output frame has the same content as the original frame but a different resolution;
according to the position information of the first recognition frame in the output frame, if it is determined that the position of the starting point of the first recognition frame in the output frame lies, in the motion direction of the target object, between two pixels of the output frame, setting a second recognition frame to be displayed superimposed on the output frame in the following manner:
and respectively expanding a first preset number of pixels and a second preset number of pixels to two sides in the moving direction of the target object by taking the two pixels as a reference to obtain a frame of the second recognition frame in the moving direction of the target object, wherein the moving direction of the target object is a horizontal direction or a vertical direction.
2. The method of claim 1, wherein when the position of the starting point of the first recognition frame in the output frame points to exactly one pixel of the output frame, the taking the two pixels as the reference comprises:
the pixel of the output frame to which the starting point of the first recognition frame is positioned in the output frame is taken as the first pixel, and,
when the motion direction of the target object is the horizontal direction, taking a pixel which is positioned on the same line with the first pixel and is positioned on the right side of the first pixel on the output frame as a second pixel; when the motion direction of the target object is the vertical direction, taking a pixel which is positioned in the same column with the first pixel and is positioned below the first pixel on the output frame as a second pixel;
the first pixel and the second pixel are taken as a reference.
3. The method according to claim 1 or 2, wherein after the expanding the first preset number of pixels and the second preset number of pixels to both sides respectively in the moving direction of the object, further comprises:
if the motion direction of the target object is the horizontal direction, taking the pixel which is positioned at the leftmost side after the expansion as the starting point of the first line of pixels of the left frame of the second recognition frame, and taking the pixel which is positioned at the rightmost side after the expansion as the end point of the first line of pixels of the left frame of the second recognition frame;
and if the motion direction of the target object is the vertical direction, taking the pixel which is positioned at the top after the expansion as the starting point of the first row of pixels of the upper frame of the second recognition frame, and taking the pixel which is positioned at the bottom after the expansion as the end point of the first row of pixels of the upper frame of the second recognition frame.
4. The method according to claim 3, wherein the first preset number is 1 + 2m, the second preset number is 1 + 2n, m and n are non-negative integers, and m and n are not necessarily equal.
5. The method of claim 4, wherein after determining that the position of the starting point of the first recognition frame in the output frame lies, in the motion direction of the target object, between two pixels of the output frame, the method further comprises:
determining that a first pixel of the two pixels is an even-numbered pixel in a row of an output frame, the first pixel being a left-located pixel or a top-located pixel of the two pixels;
after the obtaining of the frame of the second recognition frame in the moving direction of the target object, the method further includes:
for each non-boundary pixel on the second recognition frame, setting the Y, U, V values of the pixel to be the same as the Y, U, V values of the predefined second recognition frame;
for each boundary pixel on the second recognition box, the U, V value of the pixel is set to be the same as the U, V value of the predefined second recognition box, and the Y value of the pixel is set to be: and fusing the Y value of the predefined second identification frame with the Y value of the pixel of the output frame corresponding to the pixel.
6. The method of claim 5, wherein setting the Y value of the pixel to a value obtained by fusing the Y value of the predefined second recognition frame with the Y value of the pixel of the output frame corresponding to the pixel comprises:
if the pixel is located at the left or upper boundary of a border of the second recognition frame, the Y value of the pixel is α × [1 − (A − A0)] × Y1 + β × (A − A0) × Y0;
if the pixel is located at the right or lower boundary of a border of the second recognition frame, the Y value of the pixel is α × (A − A0) × Y1 + β × [1 − (A − A0)] × Y0;
wherein Y1 is the Y value of the predefined second recognition frame, and Y0 is the Y value of the pixel of the output frame corresponding to the pixel; A is the abscissa of the starting point of the first recognition frame in the output frame after the position information of the first recognition frame in the original frame has been converted into position information in the output frame; A0 is the integer part of A; α and β are preset constants with 0 < α ≤ 1 and 0 < β ≤ 1.
7. The method of claim 4, wherein after determining that the position of the starting point of the first recognition frame in the output frame lies, in the motion direction of the target object, between two pixels of the output frame, the method further comprises:
determining that the first pixel of the two pixels is an odd-numbered pixel in its row of the output frame, the first pixel being the pixel on the left side, or the pixel on the upper side, of the two pixels;
after the obtaining of the frame of the second recognition frame in the moving direction of the target object, the method further includes:
for each non-boundary pixel on the second recognition frame, setting the Y, U, V values of the pixel to be the same as the Y, U, V values of the predefined second recognition frame;
for each boundary pixel on the second recognition frame, setting the U, V value of the pixel to be the same as the U, V value of the pixel of the output frame corresponding to the pixel, and setting the Y value of the pixel to be: and fusing the Y value of the predefined second identification frame with the Y value of the pixel of the output frame corresponding to the pixel.
8. The method of claim 7, wherein setting the Y value of the pixel to a value obtained by fusing the Y value of the predefined second recognition frame with the Y value of the pixel of the output frame corresponding to the pixel comprises:
if the pixel is located at the left or upper boundary of a border of the second recognition frame, the Y value of the pixel is α × [1 − (A − A0)] × Y1 + β × (A − A0) × Y0;
if the pixel is located at the right or lower boundary of a border of the second recognition frame, the Y value of the pixel is α × (A − A0) × Y1 + β × [1 − (A − A0)] × Y0;
wherein Y1 is the Y value of the predefined second recognition frame, and Y0 is the Y value of the pixel of the output frame corresponding to the pixel; A is the abscissa of the starting point of the first recognition frame in the output frame after the position information of the first recognition frame in the original frame has been converted into position information in the output frame; A0 is the integer part of A; α and β are preset constants with 0 < α ≤ 1 and 0 < β ≤ 1.
9. The method according to any of claims 1 to 8, wherein the format of the output frame is YUV420, or YVU420, or YUV422, or YVU422, or YUV400, or YUV444, or RGB.
10. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the method of determining an identification frame for an overlay display of any of claims 1 to 9.
11. An electronic device comprising the non-transitory computer readable storage medium of claim 10, and the processor having access to the non-transitory computer readable storage medium.
CN202110046727.3A 2021-01-14 2021-01-14 Method for determining identification frame of superimposed display, readable storage medium and electronic device Active CN112819706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110046727.3A CN112819706B (en) 2021-01-14 2021-01-14 Method for determining identification frame of superimposed display, readable storage medium and electronic device


Publications (2)

Publication Number Publication Date
CN112819706A true CN112819706A (en) 2021-05-18
CN112819706B CN112819706B (en) 2024-05-14

Family

ID=75869425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110046727.3A Active CN112819706B (en) 2021-01-14 2021-01-14 Method for determining identification frame of superimposed display, readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112819706B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715249A (en) * 2013-12-16 2015-06-17 株式会社理光 Object tracking method and device
US20150178943A1 (en) * 2013-12-21 2015-06-25 Qualcomm Incorporated System and method to stabilize display of an object tracking box
CN105830430A (en) * 2013-12-21 2016-08-03 高通股份有限公司 System and method to stabilize display of an object tracking box
CN108983306A (en) * 2018-06-06 2018-12-11 浙江大华技术股份有限公司 A kind of method and rays safety detection apparatus of article frame flow display
CN108875683A (en) * 2018-06-30 2018-11-23 北京宙心科技有限公司 Robot vision tracking method and system
CN110675425A (en) * 2019-08-22 2020-01-10 腾讯科技(深圳)有限公司 Video frame identification method, device, equipment and medium
CN111914653A (en) * 2020-07-02 2020-11-10 泰康保险集团股份有限公司 Personnel marking method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
费光彦 (Fei Guangyan): "Research on automatic search and filtering methods for anti-shake regions in a video vehicle monitoring system", CNKI Master's Electronic Journals, no. 11 *

Also Published As

Publication number Publication date
CN112819706B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
US20200258196A1 (en) Image processing apparatus, image processing method, and storage medium
US11589023B2 (en) Image processing apparatus, image processing method, and storage medium
KR20070093995A (en) Motion vector calculation method, hand-movement correction device using the method, imaging device, and motion picture generation device
JP2013225740A (en) Image formation device, image display device, and image formation method and image formation program
US8629918B2 (en) Image processing apparatus, image processing method and program
CN104573675B (en) The methods of exhibiting and device of operation image
US10009507B2 (en) Image processing device
CN101510304B (en) Method, device and pick-up head for dividing and obtaining foreground image
CN112132836A (en) Video image clipping method and device, electronic equipment and storage medium
CN108307245B (en) Subtitle font color obtaining method based on background perception technology and display
JP3731952B2 (en) Information generation apparatus for moving image search
JP4786506B2 (en) Television receiver
CN112819706B (en) Method for determining identification frame of superimposed display, readable storage medium and electronic device
Zhang et al. A real-time time-consistent 2D-to-3D video conversion system using color histogram
JP2010002991A (en) Image processor, image processing method, and computer program
JP7145440B2 (en) LEARNING DATA GENERATION METHOD, LEARNING DATA GENERATION DEVICE, AND PROGRAM
CN102426693B (en) Method for converting 2D into 3D based on gradient edge detection algorithm
CN106851167A (en) Display method and system for virtual reality equipment and virtual reality equipment
CN116258994A (en) Video generation method and device and electronic equipment
JP2014072809A (en) Image generation apparatus, image generation method, and program for the image generation apparatus
US20210056723A1 (en) Image processing device, image processing method, and monitoring system
KR20120035360A (en) Apparatus for recognizing character and method thereof
JP4956464B2 (en) Image high resolution device, learning device and method
CN110059681B (en) Information processing apparatus, information processing method, and storage medium
US9159118B2 (en) Image processing apparatus, image processing system, and non-transitory computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant