CN111787389B - Transposed video identification method, device, equipment and storage medium
- Publication number: CN111787389B (application CN202010740341.8A)
- Authority: CN (China)
- Prior art keywords: video, transposed, processed, feature, area
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/4312—Generation of visual interfaces for content selection or interaction, involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/440272—Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display, by altering the spatial resolution, e.g. for performing aspect ratio conversion
Abstract
The application discloses a transposed video identification method, device, equipment and storage medium, and relates to the technical field of image processing. The specific implementation scheme is as follows: identifying content attribute features of video frames in a video to be processed, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature; and determining whether the video to be processed is a transposed video according to the content attribute features. The method and the device realize automatic identification of transposed videos and improve the identification efficiency of transposed videos.
Description
Technical Field
The present application relates to the field of multimedia technologies, in particular to image processing technology, and more particularly to a method, an apparatus, a device, and a storage medium for identifying a transposed video.
Background
With the explosive growth of short video traffic, short videos have become an important carrier for netizens to obtain information. Because video display scenes differ, a large number of transposed videos exist on video playing platforms, that is, portrait videos obtained by transposing landscape videos, or landscape videos obtained by transposing portrait videos.

Constrained by the playing interface of the user terminal, the video picture in a transposed video cannot completely fill the playing interface, so the non-video area of the playing interface is filled with padding content. This degrades the user's viewing experience and is not conducive to the secondary use of the video material in other scenes.
Disclosure of Invention
The application provides a more efficient transposed video identification method, device, equipment and storage medium, so as to realize automatic identification of transposed videos.
According to an aspect of the present application, there is provided a transposed video identification method, including:
identifying content attribute features of video frames in a video to be processed, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature; and

determining whether the video to be processed is a transposed video according to the content attribute features.
According to another aspect of the present application, there is provided a transposed video identification device, comprising:
the content attribute feature identification module is used for identifying content attribute features of video frames in a video to be processed, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature;

and the transposed video determining module is used for determining whether the video to be processed is a transposed video according to the content attribute features.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the transposed video identification methods provided by embodiments of the present application.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the transposed video identification methods provided by the embodiments of the present application.
According to the technology of the application, automatic identification of transposed videos is realized, and the identification efficiency of transposed videos is improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1A is a flowchart of a transposed video identification method according to an embodiment of the present application;
fig. 1B is a schematic diagram of a landscape-to-portrait transposed video provided in an embodiment of the present application;

fig. 1C is a schematic diagram of a portrait-to-landscape transposed video provided in an embodiment of the present application;
fig. 2 is a flowchart of another transposed video identification method provided in an embodiment of the present application;
fig. 3A is a flowchart of another transposed video identification method provided in an embodiment of the present application;
FIG. 3B is a schematic diagram of a target image according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a transposed video identification device according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing a transposed video identification method of an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

The transposed video identification method and apparatus provided by the embodiments of the present application are suitable for automatically identifying transposed videos, that is, portrait videos obtained by transposing landscape videos, or landscape videos obtained by transposing portrait videos. The transposed video identification method may be performed by a transposed video identification apparatus implemented in software and/or hardware and configured in an electronic device.
Fig. 1A is a flowchart of a transposed video identification method provided in an embodiment of the present application, where the method includes:
s101, identifying content attribute characteristics of video frames in a video to be processed; wherein the content attribute feature includes at least one of a straight line feature, a text feature, and a region feature.
Illustratively, the frame extraction processing may be performed on the video to be processed according to a set frequency, so as to obtain at least one video frame. Wherein the set frequency can be set by a technician as desired or empirically.
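As an illustrative sketch only, and not the patent's exact procedure, frame extraction at a set frequency could be performed with OpenCV as follows; `sample_interval` is an assumed stand-in for the set frequency chosen by the technician.

```python
import cv2

def extract_frames(video_path, sample_interval=30):
    """Sample one frame every `sample_interval` frames; `sample_interval`
    stands in for the 'set frequency' chosen by the technician."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_interval == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```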
A video frame comprises a video area used for displaying the picture information contained in the video; it may also comprise a non-video area used as background filling for the part of the display interface outside the picture information contained in the video. The non-video area can be filled by means of Gaussian blur, repeated video pictures, static pictures, solid-color backgrounds and the like.
It should be noted that, in a portrait video transposed from a landscape video, or a landscape video transposed from a portrait video, the non-video area may be filled in order to meet the aspect ratio requirements of different display scenes.

The content attribute features characterize basic features of the content contained in the video picture and/or non-video picture of a video frame. For example, a straight line feature characterizes the straight line segment information contained in the video area and/or non-video area of a video frame; a text feature characterizes the text information contained in the video area and/or non-video area of a video frame; and a region feature characterizes the smoothness of the video area and/or non-video area.
S102, determining whether the video to be processed is a transposed video according to the content attribute features.
In an optional implementation of this embodiment, if the content attribute features include a straight line feature, whether the video to be processed is a transposed video may be determined according to whether the straight line feature matches the video frame length or the video frame width.

For example, if the straight line feature matches the video frame length, the video frame is more likely a landscape picture transposed into a portrait picture; that is, the video to be processed may be a portrait video obtained by transposing a landscape video. If the number of video frames matching the video frame length in the video to be processed is greater than a first set number threshold, or the proportion of such video frames is greater than a first set proportion threshold, the video to be processed is determined to be a landscape-to-portrait transposed video. Fig. 1B shows a schematic diagram of a landscape-to-portrait transposed video: the transposed video 10 includes a video area 11 and a non-video area 12, and the video boundary line 13 of the video area 11 is a straight line feature matching the video frame length. The first set number threshold or first set proportion threshold may be determined by a technician as needed or from empirical values, or determined through repeated experiments.

Similarly, if the straight line feature matches the video frame width, the video frame is more likely a portrait picture transposed into a landscape picture; that is, the video to be processed may be a landscape video obtained by transposing a portrait video. If the number of video frames matching the video frame width in the video to be processed is greater than a second set number threshold, or the proportion of such video frames is greater than a second set proportion threshold, the video to be processed is determined to be a portrait-to-landscape transposed video. Fig. 1C shows a schematic diagram of a portrait-to-landscape transposed video: the transposed video 20 includes a video area 21 and a non-video area 22, and the video boundary line 23 of the video area 21 is a straight line feature matching the video frame width. The second set number threshold or second set proportion threshold may likewise be determined as needed, from empirical values, or through repeated experiments; the first and second set number thresholds may be the same or different, as may the first and second set proportion thresholds.
In another optional implementation of this embodiment, if the content attribute features include a text feature, whether the video to be processed is a transposed video may be determined according to whether the text feature matches the video picture content of the video frame.

For example, if the text feature is located in a non-video area of the video frame and matches the picture content of the video frame, the video frame can be considered a non-transposed image. If the number of non-transposed images in the video to be processed is greater than a third set number threshold, or the proportion of non-transposed images among the extracted video frames is greater than a third set proportion threshold, the video to be processed is determined to be a non-transposed video.

To improve the match between the text feature and the video content, and thereby enhance the user's viewing experience, the text feature of a non-video area in a video frame may be matched against the video title, or against text information or picture information in the video area. If the match succeeds, the text feature in the non-video area is determined to be related to the video content, and the video frame is determined to be a non-transposed image; if the number of non-transposed images in the video to be processed is greater than the third set number threshold, or the proportion of non-transposed images among the extracted video frames is greater than the third set proportion threshold, the video to be processed is determined to be a non-transposed video. The third set number threshold or third set proportion threshold may be determined by a technician as needed or from empirical values, or determined through repeated experiments.
In another optional implementation of this embodiment, if the content attribute features include a region feature, whether the video to be processed is a transposed video may be determined by comparing the region features of different areas of the video frame against the corresponding region standards.

Illustratively, the middle smoothness of the picture middle area of a video frame in the video to be processed is identified, and the edge smoothness of the picture edge area is identified; the middle smoothness and the edge smoothness are taken as the region features. If the middle smoothness is less than a middle smoothness threshold and the edge smoothness is greater than an edge smoothness threshold, the video to be processed is determined to be a transposed video. The values of the middle smoothness threshold and the edge smoothness threshold may be determined by a technician as needed or from empirical values, or determined through repeated experiments.

The smoothness quantifies the degree of blurring or smoothing of the identified area. Optionally, an image smoothing algorithm may be used to determine the smoothness of the identified area; for example, the smoothness of different picture areas can be calculated using the Laplacian operator.
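The text only names the Laplacian operator; one common realization, sketched here as an assumption rather than the patent's exact computation, scores a region by the variance of its Laplacian response, where a low score indicates a smooth or blurred area such as a filler background.

```python
import cv2

def region_smoothness(region_bgr):
    """Score a region by the variance of its Laplacian response (an assumed
    convention: a low score indicates a smooth or blurred region, such as a
    Gaussian-blurred or solid-color filler background)."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```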
It should be noted that, when identifying a landscape-to-portrait transposed video, the picture middle area may be understood as follows: the video frame picture is divided into at least three areas from top to bottom, and the area in the middle, that is, the area near the horizontal central axis of the video frame picture, is taken as the middle area; at least one area near the upper and lower boundaries of the video frame picture is taken as an edge area. Alternatively, when the video boundary line of the video area is identified, the non-video area associated with the video boundary line is taken as the picture edge area and the video area is taken as the picture middle area.

When identifying a portrait-to-landscape transposed video, the picture middle area may be understood as follows: the video frame picture is divided into at least three areas from left to right, and the area in the middle, that is, the area near the vertical central axis of the video frame picture, is taken as the middle area; at least one area near the left and right boundaries of the video frame picture is taken as an edge area. Alternatively, when the video boundary line of the video area is identified, the non-video area associated with the video boundary line is taken as the picture edge area and the video area is taken as the picture middle area.
Alternatively, after the region features are determined, whether the video to be processed is a transposed video may also be determined by comparing the region features of different areas of the video frame against each other.

For example, if the smoothness ratio of the middle smoothness to the edge smoothness is greater than a set smoothness ratio threshold, the video to be processed is determined to be a transposed video. The smoothness ratio threshold may be determined by a technician as needed or from empirical values, or determined through repeated experiments.
It can be understood that, by introducing the region feature, transposed videos whose non-video areas are filled with a solid-color background, Gaussian blur or the like can be effectively identified, so that transposed videos can still be effectively identified when no straight line feature is identified or the identified straight line feature has low accuracy, improving the accuracy of the transposed video identification result.

On the basis of the above technical solutions, after the video to be processed is identified as a transposed video, it can be cropped and/or secondarily processed according to the position of the video boundary line, so as to obtain video material in different aspect ratios; video material in the corresponding aspect ratio can then be delivered to a platform according to the delivery scene requirements and the video platform requirements.

In the embodiment of the application, content attribute features of video frames in a video to be processed are identified, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature; and whether the video to be processed is a transposed video is determined according to the content attribute features. By introducing the content attribute features of video frames, this technical solution realizes automatic identification of whether the video to be processed is a transposed video, improves the identification efficiency of transposed videos, enables batch and real-time processing of transposed video identification, and lays a foundation for automatic, large-scale, real-time and customized secondary processing of transposed videos. In addition, because the video quality of transposed videos is generally low, automatic identification of transposed videos enables effective control of video quality.
Fig. 2 is a flowchart of another transposed video identification method provided in the embodiment of the present application, which is improved based on the foregoing technical solutions.
Further, if the content attribute features include a straight line feature, the operation of identifying the content attribute features of video frames in the video to be processed is refined into identifying a video boundary line of a video frame in the video to be processed and taking the video boundary line as the straight line feature. Correspondingly, the operation of determining whether the video to be processed is a transposed video according to the content attribute features is refined into determining that the video to be processed is a non-transposed video if no video boundary line is identified, so as to perfect the mechanism for identifying transposed videos based on the straight line feature.
A transposed video identification method as shown in fig. 2, includes:
s201, identifying a video boundary line of a video frame in a video to be processed, and taking the video boundary line as a straight line feature.
Here, the video boundary line may be understood as an intersection line of a video region and a non-video region in a video frame picture.
For example, identifying the video boundary line of the video frame in the video to be processed may be: identifying video reference lines of video frames in a video to be processed; and determining a video boundary line according to the video reference line.
Optionally, the video frame in the video to be processed is input into a pre-trained straight line recognition model, and the straight line segment output by the model is used as the video reference line. The straight line recognition model is obtained by taking a sample video frame with a pre-marked straight line segment as a training sample and training a pre-constructed machine learning model.
Optionally, the video frame may be further processed according to a straight line extraction algorithm to extract a straight line segment in the video frame, and the extracted straight line segment is used as a video reference line.
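The patent does not fix a particular straight line extraction algorithm. As one hypothetical realization only, Canny edge detection followed by the probabilistic Hough transform could serve as the extraction step; the parameter values below are assumed placeholders.

```python
import math
import cv2

def detect_reference_lines(frame_bgr):
    """One possible extraction step (an assumption: Canny edges followed by
    the probabilistic Hough transform); returns segments as (x0, y0, x1, y1)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=math.pi / 180, threshold=80,
                               minLineLength=frame_bgr.shape[1] // 4,
                               maxLineGap=10)
    if segments is None:
        return []
    return [tuple(seg[0]) for seg in segments]
```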
Optionally, determining the video boundary line according to the video reference lines may be performed as follows: screening the video boundary lines from the video reference lines directly according to the boundary line attributes. The number of screened video boundary lines is not limited; in general, there may be two video boundary lines.
Wherein the boundary line attribute comprises at least one of a length of the boundary line, an area size of the non-video region associated with the boundary line, and a symmetry of the non-video region associated with the boundary line.
Generally, in a landscape-to-portrait transposed video, the boundary line length is the same as the video picture length; in a portrait-to-landscape transposed video, the boundary line length is the same as the video picture width. Therefore, the video boundary lines can be constrained by the boundary line length, eliminating interference lines among the video reference lines whose segment lengths do not meet the corresponding requirement.

Whether the transposed video is landscape-to-portrait or portrait-to-landscape, to ensure a good viewing experience the video area is usually located in the middle of the video frame, and correspondingly the non-video areas are symmetrically distributed on both sides of the video area. Therefore, the video boundary lines can be constrained by the symmetry of the non-video areas associated with them, eliminating interference lines among the video reference lines whose symmetry does not meet the corresponding requirement.

To ensure the viewing experience, a transposed video generally maximizes the video area and minimizes the non-video area. Therefore, the position of the video boundary line can be constrained by the area of the non-video area (or of the video area) associated with the boundary line, eliminating interference lines among the video reference lines whose positions do not meet the corresponding requirement.
When identifying video reference lines, a longer straight line segment may be recognized as a plurality of sub-segments due to limited recognition precision or blurred video boundary lines, which affects the recognition precision of the video boundary lines and the accuracy of the subsequent transposed video identification result. To avoid this, optionally, determining the video boundary line according to the video reference lines may also be performed as follows: merging all video reference lines that meet a line segment merging condition to obtain candidate boundary lines, where the line segment merging condition is that the distance between the straight lines on which the video reference lines lie is smaller than a set distance threshold; and screening the video boundary lines from the candidate boundary lines according to the boundary line attributes. The set distance threshold may be determined by a technician as needed or from empirical values.

Exemplarily, when performing landscape-to-portrait transposed video identification, horizontal straight line segments whose width coordinates fall within a set width range are merged to obtain a candidate boundary line; when performing portrait-to-landscape transposed video identification, vertical straight line segments whose length coordinates fall within a set length range are merged to obtain a candidate boundary line.

To avoid introducing straight line segments that were not originally present in the video frame during merging, a distance interval between two adjacent segments can additionally be imposed. That is, when performing landscape-to-portrait transposed video identification, horizontal straight line segments whose width coordinates fall within the set width range are taken as candidate horizontal straight line segments; if the horizontal interval between two adjacent segments among the candidate horizontal straight line segments is smaller than a set horizontal distance interval, the two adjacent candidate horizontal straight line segments are merged. The set horizontal distance interval is determined by a technician as needed or from empirical values, or determined through repeated experiments. When performing portrait-to-landscape transposed video identification, vertical straight line segments whose length coordinates fall within the set length range are taken as candidate vertical straight line segments; if the vertical interval between two adjacent segments among the candidate vertical straight line segments is smaller than a set vertical distance interval, the two adjacent candidate vertical straight line segments are merged. The set vertical distance interval is likewise determined as needed, from empirical values, or through repeated experiments.
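A minimal sketch of this merging rule for the landscape-to-portrait case, assuming the horizontal segments have already been grouped by approximately equal height and reduced to (x0, x1) spans; `max_gap` is an assumed stand-in for the set horizontal distance interval, and the same logic applies to vertical segments with y-spans.

```python
def merge_segments_1d(spans, max_gap=20):
    """Merge 1-D spans (x0, x1) of horizontal sub-segments lying at
    (approximately) the same height, joining neighbors whose gap is
    below `max_gap`."""
    merged = []
    for x0, x1 in sorted(spans):
        if merged and x0 - merged[-1][1] < max_gap:
            # Close enough to the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], x1))
        else:
            merged.append((x0, x1))
    return merged
```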
It can be understood that, the determination of the video boundary line is performed in a manner of first identifying and then determining, so that the determination mechanism of the video boundary line is perfected, and the accuracy of the determination result of the video boundary line is improved.
S202, if the video boundary line is not identified, determining that the video to be processed is a non-transposed video.
Exemplarily, if no video boundary line is identified, the video to be processed is determined to be a non-transposed video; if a video boundary line matching the video frame length is identified, the video to be processed is determined to be a landscape-to-portrait transposed video; and if a video boundary line matching the video picture width is identified, the video to be processed is determined to be a portrait-to-landscape transposed video.
Part of the transposed videos may have been elaborately designed by the video producer; for example, to improve the viewing experience, static text information associated with the video is added in the non-video area of the video picture, achieving information promotion and content reminding. Therefore, to meet the needs of such video producers, a text feature can be introduced as a content attribute feature during transposed video identification: when identifying the content attribute features of video frames in the video to be processed, the text area of the non-video area associated with the video boundary line is identified and taken as the text feature; and if the area ratio between the text area and the video area satisfies a first set proportion threshold, the video to be processed is determined to be a non-transposed video.

Exemplarily, text information in the non-video area is recognized through a text recognition technology, the text area to which the text information belongs is determined, and the text area is taken as the text feature of the video frame. If the area ratio between the text area and the video area is greater than the first set proportion threshold, the video producer intentionally added text information in the non-video area, so the video can be regarded as a transposed video that meets the requirement; accordingly, a video to be processed whose text-to-video area ratio satisfies the first set proportion threshold is determined to be a non-transposed video. The first set proportion threshold is determined by a technician as needed or from empirical values, or determined through repeated experiments.

It can be understood that, to avoid text information added by the video producer being unrelated to the video content and thus degrading the viewing experience, the text information may also be checked for relevance. If the text information is unrelated to the video title and also unrelated to the text or video pictures contained in the video to be processed, it is determined to be invalid text information, and a video to be processed with invalid text information added in the non-video area may be determined to be a transposed video. If the text information is related to the video title, or to the text or video pictures contained in the video to be processed, it is determined to be valid text information, and a video to be processed with valid text information added in the non-video area is removed from the transposed video determination result, that is, determined to be a non-transposed video.

Similarly, part of the transposed videos may have been carefully designed by the video producer, with the video layout and video area proportion adjusted, for example, to improve the viewing experience. Identifying transposed and non-transposed videos directly through the straight line feature would therefore poorly serve such producers. To meet their needs, non-transposed videos can be identified according to the area ratio between the video area and the non-video area associated with the video boundary line, so that carefully proportioned transposed videos are removed from the transposed video identification result.
Exemplarily, if the area ratio between the non-video area and the video area associated with the video boundary line satisfies a second set proportion threshold, the video to be processed is determined to be a non-transposed video. The second set proportion threshold value may be determined by a technician according to needs or empirical values, or may be determined repeatedly through a large number of experiments.
In the embodiment of the application, when the content attribute features include a straight line feature, the video boundary line of a video frame in the video to be processed is identified and taken as the straight line feature; if no video boundary line is identified, the video to be processed is determined to be a non-transposed video. By determining non-transposed videos through video boundary line identification, this technical solution realizes automatic identification of transposed videos, improves identification efficiency, and enables effective control of video quality. Moreover, the computation load is small, so a large number of videos to be processed can be handled in parallel, laying a foundation for subsequent batch processing of transposed videos.
Fig. 3A is a flowchart of another transposed video identification method provided in an embodiment of the present application. The method provides a preferred implementation based on the above technical solutions, taking the identification of a landscape-to-portrait transposed video as an example.
A transposed video identification method as shown in fig. 3A, includes:
s310, a linear feature extraction stage;
s320, character feature extraction;
s330, a regional feature extraction stage; and the number of the first and second groups,
s340, transpose video identification.
Illustratively, the straight line feature extraction stage includes:
s311, performing frame extraction processing on the video to be processed to obtain a target image.
And S312, identifying the candidate line segments in the target image.
And S313, screening out horizontal line segments from the candidate line segments as candidate horizontal line segments.
For example, the following formula may be used to screen out the horizontal line segments among the candidate line segments:

$$L_1 = \{(x_{i0}, y_{i0}, x_{i1}, y_{i1}) \mid |(y_{i1}-y_{i0})/(x_{i1}-x_{i0})| \le \theta_1,\ i=1,2,\dots,n_1\}$$

where $L_1$ is the set of screened candidate horizontal line segments; $(x_{i0}, y_{i0})$ and $(x_{i1}, y_{i1})$ are two points on the $i$-th candidate line segment; $\theta_1$ is a threshold constant that can be adjusted as needed; and $n_1$ is the total number of candidate line segments. The origin of coordinates is the top-left vertex of the target image.
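Written out in code, the screening of $L_1$ is a simple slope test; a sketch under the assumption that segments are (x0, y0, x1, y1) tuples, with $\theta_1 = 0.05$ as an assumed placeholder value:

```python
def filter_horizontal_segments(segments, theta1=0.05):
    """Keep candidate segments whose absolute slope
    |(y1 - y0) / (x1 - x0)| <= theta1 (the set L1 above);
    exactly vertical segments are skipped."""
    return [(x0, y0, x1, y1) for (x0, y0, x1, y1) in segments
            if x1 != x0 and abs((y1 - y0) / (x1 - x0)) <= theta1]
```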
And S314, sorting the candidate horizontal line segments according to the height values of the candidate horizontal line segments in the target image.
And S315, merging the candidate horizontal line segments at the same height to obtain a candidate boundary line.
And S316, screening two video boundary lines from the candidate boundary lines according to the boundary line attributes, and taking the video boundary lines as straight line characteristics.
The boundary line attributes include the boundary line length, and the symmetry and area size of the non-video area associated with the boundary line.
For example, the following formula may be adopted to screen candidate boundary lines according to the boundary line length:
$$L_2 = \{(x_{j0}, y_{j0}, x_{j1}, y_{j1}) \mid |(x_{j1}-x_{j0})/w| \le \theta_2,\ j=1,2,\dots,n_2\}$$

where $L_2$ is the set of screened candidate boundary lines; $(x_{j0}, y_{j0})$ and $(x_{j1}, y_{j1})$ are two points on the $j$-th candidate boundary line; $w$ is the length of the video picture in the target image; $\theta_2$ is a threshold constant that can be adjusted as needed; and $n_2$ is the total number of candidate boundary lines. The origin of coordinates is the top-left vertex of the target image.
For example, candidate boundary lines may be screened according to the symmetry and the area size of the non-video area associated with the boundary line, so that the selected boundary lines are symmetrical about the horizontal central axis and the video area is as large as possible, that is, the non-video area is as small as possible. In the screening condition, $L_t$ is the set of candidate boundary lines in the area above the horizontal central axis of the video picture in the target image; $L_b$ is the set of candidate boundary lines in the area below the horizontal central axis; $(x_{00}, y_{00}, x_{01}, y_{01})$ are points on one screened video boundary line; $(x_{10}, y_{10}, x_{11}, y_{11})$ are points on the other screened video boundary line; $n_3$ is the total number of candidate boundary lines in the set $L_t$; $n_4$ is the total number of candidate boundary lines in the set $L_b$; $h$ is the height of the video picture of the target image; and $\theta_3$ is a threshold constant that can be adjusted as needed. The origin of coordinates is the top-left vertex of the video picture.
And S317, determining whether the target image is a transposed image according to the linear characteristics to obtain a linear recognition result.
Exemplarily, if the straight line feature is recognized, obtaining a straight line recognition result as a transposed image; otherwise, the straight line recognition result is obtained as a non-transposed image.
To facilitate subsequent calculations, the identifier of a transposed image may be set to 1 and that of a non-transposed image to 0. That is, when the straight line recognition result is a transposed image, $f_{line} = 1$; when the straight line recognition result is a non-transposed image, $f_{line} = 0$.
Illustratively, the text feature extraction stage includes:
s321, character information in a non-video area related to the video boundary line in the target image is identified.
Illustratively, the text information in the non-video region in the target image may be recognized by an OCR (Optical Character Recognition) technique.
S322, determining the text area ratio between the text area to which the identified text information belongs and the video area.

S323, determining whether the target image is a transposed image according to the text area ratio, or according to the area ratio of the video area associated with the video boundary line, so as to obtain a text recognition result.

Exemplarily, if the text area ratio is greater than a first set proportion threshold, or the area ratio of the video area to the target image is greater than a second set proportion threshold, the text recognition result is a non-transposed image; otherwise, the text recognition result is a transposed image. The first and second set proportion thresholds are set by a technician as needed or from empirical values, or determined through repeated experiments; their values may be the same or different.
To facilitate subsequent calculations, the identifier of a transposed image may be set to 1 and that of a non-transposed image to 0. That is, when the text recognition result is a transposed image, $f_{text} = 1$; when the text recognition result is a non-transposed image, $f_{text} = 0$.
Referring to the schematic diagram of a target image shown in fig. 3B, the video boundary lines 33 in the target image 30 divide the target image 30 into a video area 31 and non-video areas 32. In a non-video area, the recognized text information belongs to the text area 34.
Specifically, the text recognition result is determined to be a non-transposed image when the corresponding screening conditions are met. In those conditions, $(x_{00}, y_{00}, x_{01}, y_{01})$ are points on one screened video boundary line; $(x_{10}, y_{10}, x_{11}, y_{11})$ are points on the other screened video boundary line; $w$ is the length of the video picture of the target image; $h$ is the height of the video picture of the target image; $A_{text}$ is the area of the text in the text recognition area; $\theta_4$ and $\theta_5$ are threshold constants that can be adjusted as needed; and $\|$ denotes the OR operation.
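Since the exact inequality is not reproduced in the source text, the sketch below follows the prose of S323 instead; $\theta_4 = 0.05$ and $\theta_5 = 0.8$ are assumed placeholder values.

```python
def text_feature_flag(a_text, a_video, a_image, theta4=0.05, theta5=0.8):
    """f_text per the prose of S323: 0 (non-transposed image) when the
    text/video area ratio exceeds theta4, or the video/image area ratio
    exceeds theta5; otherwise 1 (transposed image)."""
    non_transposed = (a_text / a_video > theta4) or (a_video / a_image > theta5)
    return 0 if non_transposed else 1
```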
Illustratively, the region feature extraction stage includes:
s331, respectively identifying smoothness in the video region, the top non-video region and the bottom non-video region associated with the video boundary line to obtain a middle smoothness, a top smoothness and a bottom smoothness.
Illustratively, the smoothness of the video area, the top non-video area and the bottom non-video area is computed by the Laplacian operator, respectively.
S332, determining whether the target image is a transposed image according to whether the smoothness of each area meets its corresponding smoothness threshold, or according to a comparison between the middle smoothness and the top and/or bottom smoothness, so as to obtain a region recognition result.
Exemplarily, the region recognition result is determined to be a transposed image if the following condition is satisfied:

$$[(s_{top} < s_{thr1}) \,\&\, (s_{bot} < s_{thr2}) \,\&\, (s_{mid} < s_{thr3})] \;\|\; [s_{mid}/(s_{top}+s_{bot}+\varepsilon) > \theta_6]$$

where $s_{top}$ is the top smoothness of the top non-video area; $s_{bot}$ is the bottom smoothness of the bottom non-video area; $s_{mid}$ is the middle smoothness of the video area; $s_{thr1}$, $s_{thr2}$ and $s_{thr3}$ are the smoothness thresholds of the top non-video area, the bottom non-video area and the video area, respectively; $\varepsilon$ is a small non-negative value that prevents division-by-zero errors; $\theta_6$ is a threshold constant that can be adjusted as needed; $\&$ denotes the AND operation; and $\|$ denotes the OR operation.
Wherein the smoothness thresholds for the different regions may be determined by a skilled person, as desired or empirically, or iteratively determined by a number of experiments.
To facilitate subsequent calculations, the identifier of a transposed image may be set to 1 and that of a non-transposed image to 0. That is, when the region recognition result is a transposed image, $f_{region} = 1$; when the region recognition result is a non-transposed image, $f_{region} = 0$.
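Putting the regional smoothness values and the condition above together, a sketch of the per-frame region decision; all threshold values are assumed placeholders, not values from the patent.

```python
def region_feature_flag(s_top, s_bot, s_mid,
                        s_thr1=50.0, s_thr2=50.0, s_thr3=500.0,
                        theta6=5.0, eps=1e-6):
    """f_region: 1 (transposed image) when all three smoothness values fall
    below their thresholds, or when the middle-to-edge ratio exceeds theta6;
    otherwise 0 (non-transposed image)."""
    cond_abs = (s_top < s_thr1) and (s_bot < s_thr2) and (s_mid < s_thr3)
    cond_ratio = s_mid / (s_top + s_bot + eps) > theta6
    return 1 if (cond_abs or cond_ratio) else 0
```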
Illustratively, the transposed video identification stage includes:
s341, determining the transposed identification result of the target image according to the straight line identification result, the character identification result and the area identification result.
For example, the determination of the transposed recognition result may be performed using the following formula:
$$f_{img} = (f_{line} \,\&\, (1-f_{text})) \mid f_{region}$$

where $f_{img}$ is the transposed recognition result: if $f_{img} = 1$, the transposed recognition result is a transposed image; if $f_{img} = 0$, the transposed recognition result is a non-transposed image; $\&$ denotes the AND operation; and $\mid$ denotes the OR operation.
And S342, if the number of transposed images among the video frames extracted from the video to be processed exceeds a set number threshold, determining that the video to be processed is a transposed video.
Wherein the set number threshold may be determined by a skilled person as desired or empirically, or determined iteratively through a number of experiments.
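Combining S341 and S342, a minimal sketch of the final decision; `count_threshold` stands in for the set number threshold, an assumed tunable parameter.

```python
def fuse_frame_flags(f_line, f_text, f_region):
    """Per-frame fusion of S341: f_img = (f_line & (1 - f_text)) | f_region,
    with all flags in {0, 1}."""
    return (f_line & (1 - f_text)) | f_region

def is_transposed_video(frame_flags, count_threshold):
    """Video-level decision of S342: transposed when the number of frames
    flagged as transposed exceeds the set number threshold."""
    return sum(frame_flags) > count_threshold
```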
The embodiment of the application can realize automatic, large-scale and real-time identification of transposed videos, improves the identification efficiency of transposed videos, and makes effective control of video quality possible. Furthermore, automatic processing of transposed videos provides abundant material resources for subsequent cropping and secondary processing, laying a foundation for automatic, large-scale, real-time and customized secondary processing of transposed videos.
Fig. 4 is a block diagram of a transposed video identification apparatus according to an embodiment of the present application, where the transposed video identification apparatus 400 includes: a content attribute feature identification module 401 and a transposed video determination module 402. Wherein,
a content attribute feature identification module 401, configured to identify content attribute features of video frames in a video to be processed, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature; and

a transposed video determining module 402, configured to determine whether the video to be processed is a transposed video according to the content attribute features.
In this device, the content attribute feature identification module identifies content attribute features of video frames in a video to be processed, wherein the content attribute features comprise at least one of a straight line feature, a text feature and a region feature; and the transposed video determining module determines whether the video to be processed is a transposed video according to the content attribute features. By introducing the content attribute features of video frames, this technical solution realizes automatic identification of whether the video to be processed is a transposed video, improves the identification efficiency of transposed videos, enables batch and real-time processing of transposed video identification, and lays a foundation for automatic, large-scale, real-time and customized secondary processing of transposed videos. In addition, because the video quality of transposed videos is generally low, automatic identification of transposed videos enables effective control of video quality.
Further, if the content attribute feature includes a straight line feature, the content attribute feature identification module 401 includes:
the linear characteristic identification unit is used for identifying a video boundary line of a video frame in a video to be processed and taking the video boundary line as a linear characteristic;
a transposed video determination module 402, comprising:
and the first transposed video determining unit is used for determining that the video to be processed is a non-transposed video if the video boundary line is not identified.
Further, the straight line feature recognition unit includes:
the video reference line identification subunit is used for identifying video reference lines of video frames in the video to be processed;
and the video boundary line determining subunit is used for determining the video boundary line according to the video reference line.
Further, the video boundary line determining subunit includes:
the candidate boundary line obtaining subunit is used for merging all video reference lines meeting the line segment merging condition to obtain candidate boundary lines; the line segment merging condition is that the distance between the straight lines on which the video reference lines lie is smaller than a set distance threshold;

and the video boundary line screening subunit is used for screening the video boundary lines from the candidate boundary lines according to the boundary line attributes.
Further, the boundary line attribute includes at least one of a boundary line length, an area size of the non-video region associated with the boundary line, and symmetry of the non-video region associated with the boundary line.
Further, if the content attribute feature further includes a text feature, the content attribute feature identification module 401 includes:
the text feature identification unit is used for identifying the text area of the non-video area associated with the video boundary line if a video boundary line is identified, and taking the text area as the text feature;
a transposed video determination module 402, comprising:
and the second transposed video determining unit is used for determining that the video to be processed is a non-transposed video if the area ratio between the text area and the video area meets a first set proportion threshold.
Further, the transposed video determining module 402 further comprises:
and the third transposed video determining unit is used for determining the video to be processed as the non-transposed video if the area ratio between the non-video area and the video area associated with the video boundary line meets a second set proportion threshold.
Further, if the content attribute feature includes a region feature, the content attribute feature identification module 401 includes:
the smoothness identification unit is used for identifying the middle smoothness of the picture middle area of a video frame in the video to be processed and identifying the edge smoothness of the picture edge area of the video frame in the video to be processed;
the region feature determination unit is used for taking the middle smoothness and the edge smoothness as the region feature;
a transposed video determination module 402, comprising:
the fourth transposed video determining unit is used for determining that the video to be processed is a transposed video if the middle smoothness is less than a middle smoothness threshold and the edge smoothness is greater than an edge smoothness threshold;
and the fifth transposed video determining unit is used for determining that the video to be processed is a transposed video if the smoothness ratio of the middle smoothness to the edge smoothness is greater than a set smoothness ratio threshold.
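One possible reading of the smoothness-based units is sketched below. The embodiment does not fix a smoothness measure; here the inverse of the Laplacian variance serves as a proxy (blurred or solid-color padding scores high), the frame is split into three equal vertical bands, and all thresholds are illustrative. The ratio branch is interpreted as the edge-to-middle smoothness ratio so that both branches point in the same direction; that interpretation is also an assumption.

```python
import cv2

def region_smoothness(gray):
    """Smoothness proxy: inverse Laplacian variance, so blurred or
    solid-color areas score high. The embodiment does not prescribe a
    particular smoothness measure; this proxy is an assumption."""
    return 1.0 / (1.0 + cv2.Laplacian(gray, cv2.CV_64F).var())

def is_transposed_by_smoothness(frame_bgr, mid_thresh=0.05,
                                edge_thresh=0.2, ratio_thresh=4.0):
    """Fourth and fifth determining units as threshold tests.

    The middle band carries the embedded video (detailed, low smoothness);
    the side bands of a transposed frame are typically blurred padding
    (high smoothness). All three thresholds are illustrative.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    w = gray.shape[1]
    mid = region_smoothness(gray[:, w // 3: 2 * w // 3])
    edge = (region_smoothness(gray[:, : w // 3]) +
            region_smoothness(gray[:, 2 * w // 3:])) / 2.0
    # Fourth unit: detailed middle band with smooth edge bands.
    if mid < mid_thresh and edge > edge_thresh:
        return True
    # Fifth unit: edge-to-middle smoothness ratio above a set threshold.
    return edge / mid > ratio_thresh
```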
The transposed video identification device can execute the transposed video identification method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the execution of that method.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device implementing the transposed video identification method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple types of memory, if desired. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as an array of servers, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
The memory 502, which may be a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the transposed video identification method in embodiments of the present application (e.g., the content attribute feature identification module 401 and the transposed video determination module 402 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the transposed video identification method in the above-described method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of an electronic device implementing the transposed video recognition method, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected over a network to an electronic device implementing the transposed video identification method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the transposed video identification method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus implementing the transposed video identification method; examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS services.
According to the technical solutions of the embodiments of the present application, introducing the content attribute features of video frames enables automatic identification of whether the video to be processed is a transposed video. This improves the efficiency of transposed video identification, makes batch and real-time processing of transposed video identification possible, and lays a foundation for automatic, large-scale, real-time, and customized secondary processing of transposed videos. In addition, because the video quality of transposed videos is generally low, automatic identification of transposed videos also enables effective control of video quality.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (16)
1. A transposed video identification method, comprising:
identifying a content attribute feature of a video frame in a video to be processed; wherein the content attribute feature comprises at least one of a straight line feature, a text feature and a region feature;
determining whether the video to be processed is a transposed video or not according to the content attribute characteristics;
wherein, if the content attribute feature includes a region feature, the identifying the content attribute feature of the video frame in the video to be processed includes:
identifying a middle smoothness of a picture middle area of the video frame in the video to be processed, and identifying an edge smoothness of a picture edge area of the video frame in the video to be processed;
taking the middle smoothness and the edge smoothness as the region feature;
the determining whether the video to be processed is a transposed video according to the content attribute feature includes:
if the middle smoothness is less than a middle smoothness threshold and the edge smoothness is greater than an edge smoothness threshold, determining that the video to be processed is a transposed video; or,
if the smoothness ratio of the middle smoothness to the edge smoothness is greater than a set smoothness ratio threshold, determining that the video to be processed is a transposed video.
2. The method of claim 1, wherein if the content attribute feature comprises the straight line feature, the identifying the content attribute feature of the video frame in the video to be processed comprises:
identifying a video boundary line of a video frame in a video to be processed, and taking the video boundary line as the linear characteristic;
the determining whether the video to be processed is a transposed video according to the content attribute features includes:
and if the video boundary line is not identified, determining that the video to be processed is a non-transposed video.
3. The method of claim 2, wherein the identifying a video boundary line of a video frame in the video to be processed comprises:
identifying a video reference line of the video frame in the video to be processed;
and determining the video boundary line according to the video reference line.
4. The method of claim 3, wherein said determining the video boundary line from the video reference line comprises:
merging all the video reference lines meeting the line segment merging condition to obtain candidate boundary lines; the line segment merging condition is that the distance between straight lines where the video reference lines are located is smaller than a set distance threshold;
and screening the video boundary lines from the candidate boundary lines according to the boundary line attributes.
5. The method of claim 4, wherein the border line attribute comprises at least one of a border line length, an area size of the non-video region associated with the border line, and a symmetry of the non-video region associated with the border line.
6. The method of claim 2, wherein the content attribute feature further comprises the text feature, and the identifying the content attribute feature of the video frame in the video to be processed comprises:
if the video boundary line is identified, identifying a text area of the non-video area associated with the video boundary line, and taking the text area as the text feature;
the determining whether the video to be processed is a transposed video according to the content attribute feature comprises:
and if the area ratio between the text area and the video area meets a first set proportion threshold, determining that the video to be processed is a non-transposed video.
7. The method of claim 2, further comprising:
and if the area ratio between the non-video area and the video area associated with the video boundary line meets a second set proportion threshold, determining that the video to be processed is a non-transposed video.
8. A transposed video identification device, comprising:
the content attribute feature identification module is used for identifying a content attribute feature of a video frame in a video to be processed, wherein the content attribute feature comprises at least one of a straight line feature, a text feature and a region feature; and a transposed video determining module, configured to determine whether the video to be processed is a transposed video according to the content attribute feature;
wherein, if the content attribute feature includes a region feature, the content attribute feature identification module includes:
a smoothness identification unit, configured to identify a middle smoothness of a picture middle area of a video frame in the video to be processed, and identify an edge smoothness of a picture edge area of the video frame in the video to be processed;
a region feature determination unit, configured to take the middle smoothness and the edge smoothness as the region feature;
the transposed video determination module, comprising:
a fourth transposed video determining unit, configured to determine that the video to be processed is a transposed video if the middle smoothness is less than a middle smoothness threshold and the edge smoothness is greater than an edge smoothness threshold;
a fifth transposed video determining unit, configured to determine that the video to be processed is a transposed video if the smoothness ratio of the middle smoothness to the edge smoothness is greater than a set smoothness ratio threshold.
9. The apparatus of claim 8, wherein if the content attribute feature comprises the straight line feature, the content attribute feature identification module comprises:
the straight line feature identification unit is used for identifying a video boundary line of a video frame in the video to be processed and taking the video boundary line as the straight line feature;
the transposed video determination module, comprising:
and the first transposed video determining unit is used for determining that the video to be processed is a non-transposed video if the video boundary line is not identified.
10. The apparatus of claim 9, wherein the straight line feature identification unit comprises:
a video reference line identification subunit, configured to identify a video reference line of the video frame in the to-be-processed video;
and the video boundary line determining subunit is used for determining the video boundary line according to the video reference line.
11. The apparatus of claim 10, wherein the video border line determining subunit comprises:
the candidate boundary line obtaining slave unit is used for merging the video reference lines meeting the line segment merging condition to obtain a candidate boundary line; the line segment merging condition is that the distance between straight lines where the video reference lines are located is smaller than a set distance threshold;
and the video boundary line screening slave unit is used for screening the video boundary line from the candidate boundary lines according to the boundary line attribute.
12. The apparatus according to claim 11, wherein the border line attribute comprises at least one of a border line length, an area size of the non-video region associated with the border line, and a symmetry of the non-video region associated with the border line.
13. The apparatus of claim 9, wherein the content attribute feature further includes the text feature, and the content attribute feature identification module includes:
the text feature identification unit is used for identifying a text area of the non-video area associated with the video boundary line if the video boundary line is identified, and taking the text area as the text feature;
the transposed video determination module, comprising:
and the second transposed video determining unit is used for determining that the video to be processed is a non-transposed video if the area ratio between the text area and the video area meets a first set proportion threshold.
14. The apparatus of claim 9, wherein the transposed video determination module further comprises:
a third transposed video determining unit, configured to determine that the video to be processed is a non-transposed video if an area ratio between a non-video area and a video area associated with the video boundary line satisfies a second set proportion threshold.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the transposed video identification method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the transposed video identification method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010740341.8A | 2020-07-28 | 2020-07-28 | Transposed video identification method, device, equipment and storage medium
Publications (2)
Publication Number | Publication Date |
---|---|
CN111787389A CN111787389A (en) | 2020-10-16 |
CN111787389B (en) | 2022-11-08
Family
ID=72766086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010740341.8A | Transposed video identification method, device, equipment and storage medium | 2020-07-28 | 2020-07-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111787389B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113034536A (en) * | 2021-02-26 | 2021-06-25 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video processing method and device, electronic equipment and readable storage medium
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901342A (en) * | 2009-05-27 | 2010-12-01 | Shenzhen Mindray Bio-Medical Electronics Co., Ltd. | Method and device for extracting image target region
CN107707954A (en) * | 2017-10-27 | 2018-02-16 | Beijing Xiaomi Mobile Software Co., Ltd. | Video broadcasting method and device
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4993086B2 (en) * | 2007-03-27 | 2012-08-08 | NEC Corporation | Information communication terminal
CN106028095A (en) * | 2016-06-07 | 2016-10-12 | Beijing Qihoo Technology Co., Ltd. | Method and device for controlling video playing
CN110769300B (en) * | 2019-10-16 | 2022-02-01 | Nanjing Shangwang Network Technology Co., Ltd. | Method and equipment for presenting horizontal screen video in information stream
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |