CN111695540A - Video frame identification method, video frame cutting device, electronic equipment and medium - Google Patents

Video frame identification method, video frame cropping method and device, electronic equipment and medium

Info

Publication number
CN111695540A
Authority
CN
China
Prior art keywords
frame
candidate
target video
real
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010554591.2A
Other languages
Chinese (zh)
Other versions
CN111695540B (en)
Inventor
周杰
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010554591.2A priority Critical patent/CN111695540B/en
Publication of CN111695540A publication Critical patent/CN111695540A/en
Application granted granted Critical
Publication of CN111695540B publication Critical patent/CN111695540B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V 20/46 — Scenes; scene-specific elements in video content: extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06T 7/13 — Image analysis; segmentation: edge detection
    • G06T 7/60 — Image analysis: analysis of geometric attributes
    • G06V 10/44 — Extraction of image or video features: local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06T 2207/10016 — Indexing scheme for image analysis or enhancement; image acquisition modality: video; image sequence

Abstract

The embodiment of the present disclosure provides a video frame identification method, a video frame cropping method, corresponding devices, an electronic device, and a computer-readable medium. The identification method comprises the following steps: acquiring a target video with a frame; performing frame extraction on the target video to obtain a plurality of frame images; identifying a candidate frame set of the target video according to the plurality of frame images; and selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the positional relationship of each candidate frame on the corresponding frame image. The video frame identification method provided by the embodiment of the present disclosure can identify the real frame of the video from a plurality of candidate frames according to their positional relationships on the corresponding frame images and crop it, offering high identification accuracy, high efficiency, and low time consumption.

Description

Video frame identification method, video frame cutting device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video frame recognition method, a video frame cropping method, a video frame recognition device, a video frame cropping device, an electronic device, and a computer-readable storage medium.
Background
Generally, when uploading a video, a user sometimes needs to adapt it to the requirements of a playing window; for example, an original video intended for horizontal-screen viewing is converted into a new video intended for vertical-screen viewing. In this case, the user or the platform adds frames to the original video to obtain the new video, and the aspect ratio of the new video changes accordingly. Conventionally, the frames added to the original video include Gaussian-blurred frames, solid-color frames, or still-picture frames.
However, in some application scenarios (such as deduplication processing), the frames added to a video need to be found and removed. In the related art, frames are mostly removed manually with video editing software. This consumes considerable labor and time, and also suffers from non-uniform frame removal standards and unreliable removal quality.
Disclosure of Invention
The present disclosure aims to solve at least one of the technical problems in the prior art, and provides a video frame recognition method, a video frame cropping method, a video frame recognition device, a video frame cropping device, an electronic device, and a computer-readable storage medium.
In one aspect of the present disclosure, a method for identifying a video frame is provided, including:
acquiring a target video with a frame;
performing frame extraction on the target video to obtain a plurality of frame images;
identifying a candidate frame set of the target video according to the plurality of frame images;
and selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
In some optional embodiments, the selecting, according to a position relationship of each candidate frame on the corresponding frame image, at least one candidate frame from the candidate frame set as a real frame of the target video includes:
determining a first vertical distance between one side of each candidate frame facing the frame image edge and the corresponding frame image edge;
if the first vertical distance of at least two candidate frames is smaller than a preset first threshold, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
if the first vertical distance of one candidate frame in the at least two candidate frames is smaller than the first vertical distances of the other candidate frames, taking the one candidate frame as a real frame of the target video.
In some optional embodiments, the selecting, according to a position relationship of each candidate frame on the corresponding frame image, at least one candidate frame from the candidate frame set as a real frame of the target video includes:
determining the degree of coincidence between each candidate frame and the remaining candidate frames on the frame image according to the positional relationship of each candidate frame on the corresponding frame image;
if the degree of coincidence of at least two candidate frames is greater than a preset second threshold, at least one candidate frame is selected from the at least two candidate frames to serve as a real frame of the target video.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
determining a second vertical distance between a side of each of the at least two candidate borders, which faces away from the frame image edge, and the frame image edge;
and if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the rest of the candidate frames, taking the one of the candidate frames as a real frame of the target video.
In some optional embodiments, the identifying the set of candidate borders of the target video according to the plurality of frame images includes:
identifying a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frames comprise Gaussian-blurred frames and/or solid-color frames;
identifying a second set of candidate borders of the target video from the plurality of frame images, the second candidate borders comprising static borders;
and combining the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some optional embodiments, the identifying the first set of candidate borders of the target video from the plurality of frame images includes:
and identifying a first candidate frame set of the target video from the plurality of frame images by adopting a pre-trained frame detection model.
In some optional embodiments, the identifying the second set of candidate borders of the target video from the plurality of frame images includes:
and identifying a second candidate frame set of the target video from the plurality of frame images by adopting a frame difference method.
In some optional embodiments, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video further includes:
determining whether text information exists in the at least one candidate frame;
and adjusting the candidate frame in which the text information exists according to a text information operation request of a user to obtain the adjusted real frame of the target video, wherein the text information operation request includes retaining the text information in the candidate frame and/or discarding the text information from the candidate frame.
In another aspect of the present disclosure, a method for clipping a video frame is further provided, including:
identifying a real frame of a target video according to the video frame identification method described above;
and cutting the real frame.
In another aspect of the present disclosure, there is also provided a video frame recognition apparatus, including:
the acquisition module is used for acquiring a target video with a frame;
the frame extracting module is used for extracting frames of the target video to obtain a plurality of frame images;
the frame identification module is used for identifying a candidate frame set of the target video according to the plurality of frame images;
and the selecting module is used for selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
In some optional embodiments, the selecting module includes a determining sub-module and a selecting sub-module, and the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to a position relationship of each candidate frame on the corresponding frame image includes:
the determining submodule is used for determining a first vertical distance between one side of each candidate frame towards the frame image edge and the corresponding frame image edge;
the selecting submodule is configured to select at least one candidate frame from the at least two candidate frames as a real frame of the target video if the first vertical distance between the at least two candidate frames is smaller than a preset first threshold.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the selecting sub-module is configured to, if the first vertical distance of one of the at least two candidate borders is smaller than the first vertical distances of the remaining candidate borders, use the one of the candidate borders as a real border of the target video.
In some optional embodiments, the selecting, according to a position relationship of each candidate frame on the corresponding frame image, at least one candidate frame from the candidate frame set as a real frame of the target video includes:
the determining submodule is further configured to determine, according to a positional relationship of each candidate frame on the corresponding frame image, a coincidence degree between each candidate frame and the rest of the candidate frames on the frame image;
the selecting submodule is configured to select at least one candidate border from the at least two candidate borders as a real border of the target video if the overlap ratio of the at least two candidate borders is greater than a preset second threshold.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the determining submodule is used for determining a second vertical distance between one side of each candidate frame in the at least two candidate frames, which deviates from the frame image edge, and the frame image edge;
the selecting submodule is configured to, if the second vertical distance of one of the candidate borders is smaller than the second vertical distances of the remaining candidate borders, use the one of the candidate borders as a real border of the target video.
In some optional embodiments, the frame recognition module further includes a first recognition submodule, a second recognition submodule, and a merging submodule, and the recognizing the set of candidate frames of the target video according to the plurality of frame images includes:
the first identification submodule is used for identifying a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frames comprise Gaussian-blurred frames and/or solid-color frames;
the second identification submodule is used for identifying a second candidate frame set of the target video from the plurality of frame images, wherein the second candidate frame comprises a static frame;
and the merging submodule is used for merging the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some optional embodiments, the first recognition sub-module recognizes a first candidate frame set of the target video from the plurality of frame images using a pre-trained frame detection model.
In some optional embodiments, the second identifying sub-module identifies a second candidate frame set of the target video from the plurality of frame images using a frame difference method.
In some optional embodiments, the selecting module includes an adjusting sub-module, and the selecting at least one candidate frame from the candidate frame set as the real frame of the target video further includes:
the determining submodule is used for determining whether text information exists in the at least one candidate frame;
the adjusting submodule is used for adjusting the candidate frame in which the text information exists according to a text information operation request of a user to obtain the adjusted real frame of the target video, wherein the text information operation request includes retaining the text information in the candidate frame and/or discarding the text information from the candidate frame.
In another aspect of the present disclosure, there is also provided a video frame cropping device, including:
the identification module is used for identifying the real frame of the video according to the video frame identification method described above;
and the cutting module is used for cutting the real frame.
In another aspect of the present disclosure, there is also provided an electronic device including:
one or more processors;
a storage unit configured to store one or more programs, which when executed by the one or more processors, enable the one or more processors to implement the video border recognition method or the video border cropping method described above.
In another aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program can implement the video frame recognition method or the video frame cropping method described above when executed by a processor.
According to the video frame identification and cropping methods and devices, the electronic device, and the medium of the present disclosure, the real frame of a video can be identified from a plurality of candidate frames according to their positional relationships on the corresponding frame images and then cropped, with high identification accuracy, high efficiency, and low time consumption.
Drawings
FIG. 1 is a schematic block diagram of an example electronic device for implementing a video border recognition method and a clipping method and apparatus according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a video frame recognition method according to another embodiment of the disclosure;
FIG. 3 is a schematic diagram of candidate frames in a frame image according to another embodiment of the disclosure;
FIG. 4 is a schematic diagram of candidate frames in a frame image according to another embodiment of the disclosure;
FIG. 5 is a schematic diagram of candidate frames in a frame image according to another embodiment of the disclosure;
FIG. 6 is a schematic diagram of candidate frames in a frame image according to another embodiment of the disclosure;
fig. 7 is a schematic flowchart of step S140 according to another embodiment of the disclosure;
fig. 8 is a schematic flowchart of step S140 according to another embodiment of the disclosure;
fig. 9 is a schematic flowchart of step S130 according to another embodiment of the disclosure;
fig. 10 is a schematic flowchart of step S140 according to another embodiment of the disclosure;
fig. 11 is a schematic diagram of candidate frames with text information in a frame image according to another embodiment of the disclosure;
fig. 12 is a schematic diagram of a real frame obtained after text information is retained in a frame image according to another embodiment of the present disclosure;
fig. 13 is a schematic flowchart of a video frame cropping method according to another embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of a video border recognition apparatus according to another embodiment of the present disclosure;
fig. 15 is a schematic structural diagram of a video frame cropping device according to another embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical aspects of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
First, an example electronic device for implementing the video border recognition method and the clipping method and apparatus according to an embodiment of the present disclosure is described with reference to fig. 1.
As shown in FIG. 1, the electronic device 300 includes one or more processors 310, one or more memory devices 320, an input device 330, an output device 340, and the like, interconnected by a bus system and/or other form of connection mechanism 350. It should be noted that the components and structure of the electronic device shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 310 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The storage 320 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 310 may execute the program instructions to implement the client functionality (implemented by the processor) of the embodiments of the disclosure described below and/or other desired functionality. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 330 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 340 may output various information (e.g., images or sounds) to an outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Next, a video bezel recognition method according to another embodiment of the present disclosure will be described with reference to fig. 2.
As shown in fig. 2, a method S100 for identifying a video border includes:
s110: and acquiring the target video with the frame.
Specifically, in this step, a target video with a frame is selected from a number of videos to be recognized according to actual requirements. Illustratively, one or more target videos may be selected from candidate videos whose frames need to be identified, for example according to a user's frame identification request. Alternatively, one or more target videos may be selected from candidate videos according to system instructions. This may be determined according to actual needs, and the embodiments of the present disclosure do not limit it.
S120: and performing frame extraction on the target video to obtain a plurality of frame images.
Specifically, in this step, the target video may be frame-decimated using an equal-interval or unequal-interval frame extraction method. With equal-interval extraction, one frame image is extracted from the target video at equal time intervals or equal frame counts, for example every 15 s or every 15 frames. With unequal-interval extraction, frame images are extracted at unequal time intervals or unequal frame counts; for example, the number of frames between two adjacent extractions may increase sequentially, decrease sequentially, or be random. Of course, those skilled in the art may adopt other frame extraction manners to extract the plurality of frame images from the target video according to actual needs, which is not limited by the embodiments of the present disclosure.
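By way of non-limiting illustration, the following minimal sketch shows equal-interval frame extraction, assuming the OpenCV (cv2) library is available; the function name and the 15-frame interval are illustrative choices, not prescribed by this disclosure.

```python
# Illustrative sketch only: equal-interval frame extraction with OpenCV.
import cv2

def extract_frames(video_path, frame_interval=15):
    """Collect every `frame_interval`-th frame of the video as an image."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, image = cap.read()
        if not ok:
            break  # end of video reached
        if index % frame_interval == 0:
            frames.append(image)
        index += 1
    cap.release()
    return frames
```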
S130: and identifying a candidate frame set of the target video according to the plurality of frame images.
Specifically, in this step, candidate frames of each frame image may be identified by an image identification method, so as to form a candidate frame set of the target video. The image recognition method comprises a pre-trained frame detection model, a color difference method, Laplace transform, a frame difference method and the like. Of course, besides this, a person skilled in the art may also use other methods to identify the candidate frame set from the plurality of frame images according to actual needs, and the embodiment of the disclosure is not limited to this.
In addition, the candidate frames identified in a frame image may be located in different regions of the frame image. Illustratively, as shown in fig. 3, three candidate frames and one image area exist on the frame image: candidate frame B is located at the upper edge region, candidate frames A and C are located at the lower edge region, and image area S1 is located at the central region. As another example, as shown in fig. 4, three candidate frames exist on the frame image: candidate frames D and E are located at the lower edge region, candidate frame F is located at the upper edge region, and image area S2 is located at the central region. As another example, as shown in fig. 5, two candidate frames and one image area exist on the frame image: candidate frames I and J are located at the left and right edge regions, respectively, and image area S3 is located at the central region. As yet another example, as shown in fig. 6, four candidate frames K, L, M, and N exist on the frame image, located at its four edge regions, with image area S4 at the central region. Of course, the positions of candidate frames on a frame image are not limited to these cases; for example, candidate frames may be located at only one edge of the frame image, such as the upper edge region, or at the upper and left edge regions, and so on.
S140: and selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
Specifically, in this step, for example, if a candidate frame is detected at the same position in a plurality of the frame images (for example, 80% of all frame images), it may be determined that a real frame exists at that position of the target video. As another example, with reference to fig. 3, candidate frames A and C are both detected in the lower edge region of a frame image; candidate frame C may not be a real frame of the target video, and it needs to be eliminated according to the positional relationship between candidate frames C and A on the frame image, so that candidate frame A is selected as the real frame of the target video. Of course, those skilled in the art may determine the real frame of the target video from the positional relationships of the candidate frames on the corresponding frame images in other ways, which is not limited by the embodiments of the present disclosure.
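A minimal sketch of this majority-vote heuristic follows, assuming candidate frames are given as (x0, y0, x1, y1) pixel tuples per frame image; the 80% ratio mirrors the example above, while the 5-pixel position tolerance is an illustrative assumption.

```python
# Illustrative sketch only: keep a candidate frame as real if it appears at
# (almost) the same position in at least `ratio` of the sampled frame images.
def vote_real_borders(per_frame_candidates, ratio=0.8, tol=5):
    if not per_frame_candidates:
        return []

    def same(a, b):
        return all(abs(p - q) <= tol for p, q in zip(a, b))

    total = len(per_frame_candidates)
    all_boxes = {box for frame in per_frame_candidates for box in frame}
    real = []
    for box in all_boxes:
        hits = sum(any(same(box, other) for other in frame)
                   for frame in per_frame_candidates)
        if hits / total >= ratio:
            real.append(box)
    return real
```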
In the video frame identification method of this embodiment, the real frame of the target video is selected from multiple candidate frames by comparing their positional relationships on the corresponding frame images, so the real frame can be selected accurately and identification precision is improved. Moreover, since positional relationships can be obtained from image coordinates and comparing them is computationally simple, the efficiency of video frame identification is effectively improved.
As described above and shown in figs. 3 to 6, candidate frames may exist at different edge regions of a frame image; each candidate frame is a rectangular box of a certain size, and a certain distance may exist between a candidate frame and the edge of the frame image. Moreover, a candidate frame detected on a frame image is not necessarily a real frame of the target video. For example, as shown in fig. 3, two candidate frames, A and C, are identified at the lower edge region of the frame image; candidate frame C is evidently a falsely detected frame (and falsely detected candidate frames are not limited to this position). As another example, as shown in fig. 4, two candidate frames, D and E, are identified at the lower edge region of the frame image; since D and E have a relatively high degree of coincidence, candidate frame E should be excluded, that is, E is also a falsely detected candidate frame.
The following describes how to eliminate the candidate bezel false detection to obtain the real bezel by using fig. 3 and fig. 4 as a specific example, but the embodiments of the present disclosure are not limited thereto.
First, a frame selection process of eliminating false detection of a candidate frame to obtain a real frame is described with fig. 3 as a specific example.
Exemplarily, as shown in fig. 7, step S140 specifically includes:
s141: determining a first vertical distance between one side of each candidate frame facing the frame image edge and the corresponding frame image edge.
Specifically, in this step, with reference to fig. 3, the side of candidate frame A facing the frame image edge is at a first vertical distance L1 from that edge, and the side of candidate frame C facing the frame image edge is at a first vertical distance L2 from it. It should be understood that, in practice, the first vertical distance L1 between candidate frame A and the frame image edge is zero, that is, one side of candidate frame A coincides with the frame image edge.
In addition, in this step, there is no limitation on how to obtain the first vertical distance between the candidate frame and the edge of the frame image. For example, the first vertical distance may be obtained according to coordinates of a border edge pixel on a side of the candidate border facing the edge of the frame image and coordinates of the frame image edge pixel. Of course, besides, those skilled in the art may select other ways to obtain the first vertical distance according to actual needs, which is not limited by the embodiments of the disclosure.
It should be understood that, since the frame image has multiple edges, the frame image edge in this step is the one closest to the candidate frame. That is, as shown in fig. 3, for candidate frame C the first vertical distance is the distance between the lower edge of the candidate frame and the lower edge of the frame image, not the distance between the upper edge of candidate frame C and the upper edge of the frame image.
S142: if the first vertical distance of at least two candidate frames is smaller than a preset first threshold, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
Specifically, in this step, the preset first threshold is set according to actual conditions. It may be a specific value, such as 0.5 cm, 1 cm, or 1.5 cm; or it may be a fixed ratio of the frame image size, for example 0.5%, 1%, or 2%. If the candidate frames are upper or lower frames, the first threshold may be a fixed ratio of the frame image height; if they are left or right frames, a fixed ratio of the frame image width. For example, for a 640 × 480-pixel frame image and a fixed ratio of 1%, the first threshold is 4.8 pixels for upper and lower candidate frames and 6.4 pixels for left and right candidate frames.
Exemplarily, step S142 specifically includes:
if the first vertical distance of one candidate frame in the at least two candidate frames is smaller than the first vertical distances of the other candidate frames, taking the one candidate frame as a real frame of the target video.
Specifically, as shown in fig. 3, candidate frame A is at a first vertical distance L1 from the frame image edge, and candidate frame C is at a first vertical distance L2 from it; since the first vertical distance L1 is clearly smaller than L2, candidate frame A is taken as the real frame of the target video.
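The first-threshold test and the closest-to-edge selection of steps S141 and S142 might be sketched as follows for candidate frames along the lower edge, assuming (x0, y0, x1, y1) pixel boxes with y increasing downward and the 1%-of-height threshold from the example above.

```python
# Illustrative sketch only: among lower-edge candidates whose first vertical
# distance is under the threshold, keep the one closest to the image edge.
def pick_by_first_distance(candidates, image_height, ratio=0.01):
    threshold = image_height * ratio
    near = []
    for box in candidates:
        x0, y0, x1, y1 = box
        first_distance = image_height - y1  # side facing the lower edge
        if first_distance < threshold:
            near.append((first_distance, box))
    if not near:
        return None  # no candidate close enough to the edge
    return min(near)[1]  # smallest first vertical distance wins
```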
According to the video frame identification method of this embodiment, when the first vertical distances of multiple candidate frames satisfy the preset first-threshold condition, the positional relationship between each candidate frame and the frame image edge is further judged by comparing their first vertical distances, and the candidate frame closest to the frame image edge is selected as the real frame of the target video. This effectively identifies falsely detected candidate frames that lie far from the frame image edge and improves the accuracy of video frame identification.
Next, fig. 4 is taken as another specific example to explain how to eliminate the false detection of the candidate bounding box to obtain the bounding box selection process of the real bounding box.
Exemplarily, as shown in fig. 8, step S140 specifically includes:
s143: and determining the contact ratio between each candidate frame and the rest of candidate frames on the frame image according to the position relation of each candidate frame on the corresponding frame image.
Specifically, in this step, the positional relationship may be obtained from the pixel coordinates of each candidate frame on the corresponding frame image, and the degree of coincidence may be determined by comparing the proportion of coinciding pixel coordinates between candidate frames. Of course, those skilled in the art may calculate the degree of coincidence between candidate frames in other ways according to actual needs, which is not limited by the embodiments of the present disclosure.
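One plausible way to compute the degree of coincidence of step S143 is sketched below; the disclosure compares coinciding pixel coordinates, and the area-based intersection-over-smaller-box form used here is an illustrative assumption.

```python
# Illustrative sketch only: degree of coincidence of two candidate frames as
# intersection area divided by the smaller frame's area.
def overlap_ratio(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))   # intersection width
    ih = max(0, min(ay1, by1) - max(ay0, by0))   # intersection height
    inter = iw * ih
    smaller = min((ax1 - ax0) * (ay1 - ay0),
                  (bx1 - bx0) * (by1 - by0))
    return inter / smaller if smaller else 0.0
```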
S144: if the coincidence degree of at least two candidate frames is larger than a preset second threshold value, at least one candidate frame is selected from the at least two candidate frames to serve as a real frame of the target video.
Specifically, in this step, it is determined whether the proportion of coinciding pixel coordinates is greater than a preset second threshold, and at least one candidate frame is selected from the at least two candidate frames as a real frame of the target video. The preset second threshold is set according to the actual situation and may be, for example, 90%, 80%, or 60%; illustratively, it takes the value 80% in this embodiment.
For example, as shown in fig. 4, the degree of overlap between the candidate frame D and the candidate frame E is high, and the degree of overlap between the candidate frame D and the candidate frame E already exceeds the second threshold, at this time, at least one candidate frame needs to be selected from the candidate frame D and the candidate frame E as the real frame of the target video.
Specifically, if the overlap ratio of at least two candidate frames is greater than a preset second threshold, a second vertical distance between one side of each of the at least two candidate frames, which deviates from the frame image edge, and the frame image edge is determined.
For example, in this step, the second vertical distance may be obtained according to coordinates of a border edge pixel on a side of the candidate border away from the frame image edge and coordinates of the frame image edge pixel. Of course, besides this, the skilled person can select other ways to obtain the second vertical distance according to the actual requirement, which is not limited by the embodiments of the present disclosure.
Illustratively, as shown in fig. 4, the side of candidate frame D facing away from the frame image edge is at a second vertical distance L8 from that edge, and the side of candidate frame E facing away from the frame image edge is at a second vertical distance L7 from it.
It should also be understood that, since the frame image has multiple edges, the frame image edge in this step is the one closest to the candidate frame; that is, as shown in fig. 4, for candidate frame D the second vertical distance is the distance between the upper edge of the candidate frame and the lower edge of the frame image.
And if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the rest of the candidate frames, taking the one of the candidate frames as a real frame of the target video.
Specifically, as shown in fig. 4, if the candidate frame D has a second vertical distance L8 from the frame image edge, the candidate frame E has a second vertical distance L7 from the frame image edge, and the second vertical distance L8 is smaller than the second vertical distance L7, the candidate frame D is taken as the real frame of the target video.
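Combining the second-threshold test with the second-vertical-distance tie-break, a hedged sketch (reusing overlap_ratio() from the previous sketch, with lower-edge boxes assumed) might read:

```python
# Illustrative sketch only: if two candidates coincide beyond the second
# threshold, keep the one whose inner side (the side facing away from the
# image edge) is closer to that edge.
def pick_by_second_distance(box_a, box_b, image_height, second_threshold=0.8):
    if overlap_ratio(box_a, box_b) > second_threshold:
        dist_a = image_height - box_a[1]  # inner side y0 vs. lower edge
        dist_b = image_height - box_b[1]
        return box_a if dist_a < dist_b else box_b
    return None  # coincidence below threshold: no conflict to resolve
```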
In the video frame identification method of this embodiment, when the degrees of coincidence of candidate frames exceed the preset second threshold, the real frame is further selected by comparing the second vertical distances of the candidate frames; that is, among candidate frames with a high degree of coincidence, the one closer to the frame image edge is selected as the real frame. This maintains frame identification efficiency while improving identification accuracy.
Exemplarily, as shown in fig. 9, step S130 specifically includes:
s131: a first set of candidate borders of the target video is identified from the plurality of frame images, the first candidate borders comprising Gaussian-blurred borders and/or solid-colored borders.
Specifically, in this step, a pre-trained frame detection model may be used, for example, to identify the first candidate frame set of the target video from the plurality of frame images; the frame detection model can accurately identify Gaussian-blurred frames and solid-color frames on the frame images. There is no limitation on how the frame detection model is trained; for example, machine learning may be performed on a large number of training videos with frames to obtain an accurate frame detection model. Of course, those skilled in the art may obtain the frame detection model in other ways according to actual needs, which is not limited by the embodiments of the present disclosure.
S132: a second set of candidate borders of the target video is identified from the plurality of frame images, the second candidate borders comprising static borders.
Specifically, in this step, a frame difference method may be used, for example, to identify the second candidate frame set of the target video from the plurality of frame images. The frame difference method obtains the identical region of two adjacent frame images by comparing their pixel gray values, and takes that region as a static frame region. Besides, those skilled in the art may select other methods to identify the second candidate frames according to actual needs, which is not limited by the embodiments of the present disclosure.
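The frame difference idea might be sketched as follows for a lower-edge static frame, assuming grayscale comparison of adjacent sampled frames; the row-wise scan and the difference threshold are illustrative assumptions, not the disclosure's prescribed procedure.

```python
# Illustrative sketch only: rows near the lower edge whose gray values barely
# change across adjacent sampled frames are treated as a static frame band.
import cv2
import numpy as np

def static_bottom_border_height(frames, diff_thresh=2.0):
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).astype(np.float32)
             for f in frames]  # assumes at least two sampled frames
    # mean absolute per-row difference, averaged over adjacent frame pairs
    row_change = np.mean(
        [np.abs(a - b).mean(axis=1) for a, b in zip(grays, grays[1:])],
        axis=0)
    height = 0
    for change in row_change[::-1]:   # scan upward from the lower edge
        if change < diff_thresh:
            height += 1               # static row: part of the border band
        else:
            break
    return height                     # border height in pixels (0 if none)
```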
S133: and combining the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
Specifically, in this step, a first candidate frame set and a second candidate frame set identified by different methods are merged and combined to obtain the candidate frame set.
According to the video frame identification method of this embodiment, frames in the frame images are identified using different methods, so that frames can be identified accurately. This avoids incomplete identification of candidate frames, and incorrect selection of the real frame from them, caused by a poor match between an identification method and the images being identified, thereby improving the accuracy of video frame identification.
It should be understood that, to reduce the workload of identifying candidate frames from multiple frame images, the frame images may first be input to the frame detection model in sequence to detect Gaussian-blurred frames and solid-color frames. The frame difference method is then used to identify static frames from the multiple frame images. It is easy to see that, during static frame identification, positions where a Gaussian-blurred or solid-color frame already exists in a frame image need not be examined by the frame difference method; this greatly improves the efficiency of detecting static frames with the frame difference method and greatly reduces its workload.
In some possible embodiments, besides the cases shown in figs. 3 to 6, the candidate frames identified in step S130 may also contain text information, as in the case shown in fig. 11: the candidate frame identified at the lower edge region of the frame image contains text information G, such as "feel the splendid operation of the high-end player".
The following describes in detail how to identify whether text information exists in the candidate frame, and what processing is performed on the frame in which the text information exists, so as to obtain a real frame of the target video that meets the user's expectations.
Exemplarily, as shown in fig. 10, the step S140 further includes:
s145: and determining whether text information exists in the at least one candidate frame.
Specifically, in this embodiment, the text information may be subtitle information; for example, in a movie video, subtitles are displayed in real time in an edge area of the frame image. For another example, the text information may also be bullet-screen comments; for example, during a section of a movie, comments input by users may be displayed in a certain area of the frame image, such as "666, too wonderful" or "I like to eat fermented bean curd". For text information existing in the frame image, a well-established text recognition method, such as OCR, may be used to determine whether text information exists in the candidate frame, which is not limited by the embodiments of the present disclosure.
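As a hedged illustration of step S145, the sketch below uses the pytesseract OCR binding (which requires a Tesseract installation) as a stand-in for the unspecified text recognition method; both the library choice and the box format are assumptions.

```python
# Illustrative sketch only: OCR the candidate frame region and report whether
# any text is recognized there.
import pytesseract

def has_text(frame_image, box):
    x0, y0, x1, y1 = box
    region = frame_image[y0:y1, x0:x1]   # crop the candidate frame region
    text = pytesseract.image_to_string(region).strip()
    return bool(text)
```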
S146: and adjusting the candidate frame with the text information according to a text information operation request of a user to obtain the adjusted real frame of the target video, wherein the text information operation request comprises candidate frame reserved text information and/or candidate frame discarded text information.
Specifically, in this step, for a candidate frame containing text information, the user may wish the detected real frame to contain no text information, as shown in fig. 12; or, of course, the user may wish the detected real frame to retain the text information, as shown in fig. 11. A corresponding text information operation request can therefore be generated from these two choices: the request to retain the text information in the candidate frame corresponds to the situation shown in fig. 11, and the request to discard the text information corresponds to the situation shown in fig. 12. There is no limitation on how the user's text information operation request is received; for example, it may be received through a keyboard, mouse, touch display screen, or voice device, and the embodiments of the present disclosure are not limited thereto. In addition, besides generating the request from the user's selection, a default rule may be set to generate it, such as discarding the text information or retaining it by default.
Illustratively, as shown in fig. 11, two candidate frames and one image area S5 exist in the frame image: text information G exists in candidate frame F, while no text information exists in candidate frame O. Since candidate frame O has no text information, it is not adjusted in this step. When the user's text information operation request is to retain the text information in the candidate frame, candidate frame F is not adjusted either, and candidate frames F and O can be used directly as the real frames of the target video.
On the contrary, when the user's text information operation request is to discard the text information, candidate frame O is still not adjusted, but candidate frame F must be adjusted: it may be narrowed until text information G no longer falls inside it, as shown in fig. 12, so as to obtain the adjusted real frame H of the target video.
According to the video frame identification method of this embodiment, the size of a candidate frame is selectively adjusted according to whether the text information is retained or discarded, so that the real frame of the target video adapts to the operation request on the text information, improving the practicability and user friendliness of frame identification.
Next, a video frame cropping method S200 according to another embodiment of the present disclosure is described with reference to fig. 13, where the method includes:
s210: and identifying the real frame of the target video by adopting a video frame identification method.
Specifically, in this step, the video frame recognition method described above may be used to recognize the real frame existing in the target video, and reference may be specifically made to the related description above, which is not described herein again.
S220: and cutting the real frame.
Specifically, in this step, there is no limitation on how to crop the real frame. For example, after the real frame of the target video is detected, the size of the target video may be automatically adjusted according to the size of the real frame, thereby generating a new video without a frame.
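A minimal sketch of this cropping step follows, assuming the real frames have already been reduced to top/bottom/left/right pixel margins; the OpenCV writer settings (codec, container) are illustrative assumptions.

```python
# Illustrative sketch only: crop every frame by the detected margins and
# write a new frame-free video.
import cv2

def crop_video(src_path, dst_path, top, bottom, left, right):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) - left - right
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) - top - bottom
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cropped = frame[top:frame.shape[0] - bottom,
                        left:frame.shape[1] - right]
        writer.write(cropped)
    cap.release()
    writer.release()
```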
In the video frame cropping method of the present embodiment, the video frame identification method described above is adopted, and the real frame of the target video can be accurately determined from the plurality of candidate frames by comparing the position relationships of the plurality of candidate frames on the corresponding frame images. Therefore, after the real frame is cut, an accurate new video without the frame can be obtained, and the cutting precision of the frame of the target video is improved.
Next, a video frame recognition apparatus 100 according to another embodiment of the present disclosure is described with reference to fig. 14, which can be applied to the video frame recognition methods described above, and specific contents thereof can refer to the related descriptions above, which are not described herein again. The device comprises an obtaining module 110, a frame extracting module 120, a frame identifying module 130 and a selecting module 140, specifically:
the obtaining module 110 is configured to obtain a target video with a frame.
The frame extracting module 120 is configured to perform frame extraction on the target video to obtain a plurality of frame images.
The frame identification module 130 is configured to identify a candidate frame set of the target video according to the plurality of frame images.
The selecting module 140 is configured to select at least one candidate frame from the candidate frame set as a real frame of the target video according to a position relationship of each candidate frame on the corresponding frame image.
The video frame recognition device provided by the embodiment of the present disclosure selects the real frame of the target video from multiple candidate frames by comparing their positional relationships on the corresponding frame images, so the real frame can be selected accurately and recognition precision is improved. Moreover, since positional relationships can be obtained from image coordinates and comparing them is computationally simple, the efficiency of video frame recognition is effectively improved.
Illustratively, as shown in fig. 14, the selecting module 140 includes a determining sub-module 141 and a selecting sub-module 142, and the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the position relationship of each candidate frame on the corresponding frame image includes:
the determining submodule 141 is configured to determine a first vertical distance between the side of each candidate frame facing the frame image edge and the corresponding frame image edge;
the selecting submodule 142 is configured to select at least one candidate frame from the at least two candidate frames as a real frame of the target video if the first vertical distance between the at least two candidate frames is smaller than a preset first threshold.
Illustratively, as shown in fig. 14, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the selecting submodule 142 is configured to, if the first vertical distance of one of the at least two candidate frames is smaller than the first vertical distances of the remaining candidate frames, use the one of the candidate frames as a real frame of the target video.
Illustratively, as shown in fig. 14, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the position relationship of each candidate frame on the corresponding frame image includes:
the determining submodule 141 is further configured to determine, according to a positional relationship of each candidate frame on the corresponding frame image, a coincidence degree between each candidate frame and the rest of the candidate frames on the frame image;
the selecting submodule 142 is configured to select at least one candidate frame from the at least two candidate frames as a real frame of the target video if the overlap ratio of the at least two candidate frames is greater than a preset second threshold.
Illustratively, as shown in fig. 14, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the determining submodule 141 is configured to determine a second vertical distance between a side of each of the at least two candidate borders, which faces away from the frame image edge, and the frame image edge;
the selecting submodule 142 is configured to, if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the remaining candidate frames, use the one of the candidate frames as a real frame of the target video.
Illustratively, as shown in fig. 14, the frame identifying module 130 further includes a first identifying submodule 131, a second identifying submodule 132 and a merging submodule 133, and the identifying the candidate frame set of the target video according to the plurality of frame images includes:
the first identifying submodule 131 is configured to identify a first candidate frame set of the target video from the plurality of frame images, where the first candidate frame set includes a gaussian-blurred frame and/or a solid-color frame;
the second identifying sub-module 132 is configured to identify a second candidate frame set of the target video from the plurality of frame images, where the second candidate frame set includes a static frame;
the merging submodule 133 is configured to merge the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some optional embodiments, the first recognition sub-module recognizes a first candidate frame set of the target video from the plurality of frame images using a pre-trained frame detection model.
In some optional embodiments, the second identifying sub-module identifies a second candidate frame set of the target video from the plurality of frame images using a frame difference method.
Illustratively, as shown in fig. 14, the selecting module 140 includes an adjusting sub-module 143, and the selecting at least one candidate frame from the candidate frame set as the real frame of the target video includes:
the determining submodule 141 is configured to determine whether text information exists in the at least one candidate frame;
the adjusting submodule 143 is configured to adjust the candidate frame with the text information according to a text information operation request of a user, to obtain the adjusted real frame of the target video, where the text information operation request includes candidate frame reserved text information and/or candidate frame discarded text information.
The video frame recognition device provided in this embodiment selects the real frame of the target video from multiple candidate frames by comparing their positional relationships on the corresponding frame images. It effectively identifies falsely detected candidate frames far from the frame image edge, selects from candidate frames with a high degree of coincidence the one closest to the frame image edge as the real frame, and adaptively adjusts the size of the real frame according to the operation request on the text information, thereby achieving video frame recognition with high accuracy, high efficiency, strong practicability, and good user friendliness.
Next, a video frame cropping device 200 according to another embodiment of the present disclosure is described with reference to fig. 15, where the device includes an identification module 210 and a cropping module 220, specifically:
the identifying module 210 is configured to identify a real border of the video according to the video border identifying method described above.
The cropping module 220 is configured to crop the real border.
The video frame cropping device of this embodiment adopts the video frame recognition device described above and can accurately determine the real frame of the target video from a plurality of candidate frames by comparing their positional relationships on the corresponding frame images. Therefore, after the real frame is cropped, an accurate new frame-free video is obtained, improving the cropping precision of the target video's frame.
Further, this embodiment also discloses an electronic device, including:
one or more processors;
a storage unit configured to store one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the video frame identification method or the video frame cropping method described above.
Further, this embodiment also discloses a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the video frame identification method or the video frame cropping method described above.
The computer-readable medium may be included in the apparatus, device, or system described above, or may exist separately without being assembled into it.
The computer-readable storage medium may be any tangible medium that can contain or store a program, and may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, an optical fiber, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
The computer-readable medium may also include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof.
It is to be understood that the above embodiments are merely exemplary embodiments employed to illustrate the principles of the present disclosure, and that the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the disclosure, and such changes and modifications are also considered to be within the scope of the disclosure.

Claims (14)

1. A video frame identification method is characterized by comprising the following steps:
acquiring a target video with a frame;
performing frame extraction on the target video to obtain a plurality of frame images;
identifying a candidate frame set of the target video according to the plurality of frame images;
and selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
2. The method according to claim 1, wherein said selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the position relationship of each candidate frame on the corresponding frame image comprises:
determining a first vertical distance between one side of each candidate frame facing the frame image edge and the corresponding frame image edge;
if the first vertical distances of at least two candidate frames are smaller than a preset first threshold, selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video.
3. The method according to claim 2, wherein the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video comprises:
if the first vertical distance of one candidate frame in the at least two candidate frames is smaller than the first vertical distances of the other candidate frames, taking the one candidate frame as a real frame of the target video.
4. The method according to claim 1, wherein said selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the position relationship of each candidate frame on the corresponding frame image comprises:
determining the coincidence degree between each candidate frame and the rest of the candidate frames on the frame image according to the position relation of each candidate frame on the corresponding frame image;
if the coincidence degree of at least two candidate frames is larger than a preset second threshold, selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video.
5. The method according to claim 4, wherein the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video comprises:
determining a second vertical distance between a side of each of the at least two candidate frames, which faces away from the frame image edge, and the frame image edge;
and if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the rest of the candidate frames, taking that candidate frame as the real frame of the target video.
6. The method according to any one of claims 1 to 5, wherein the identifying the candidate frame set of the target video according to the plurality of frame images comprises:
identifying a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frame set comprises a Gaussian-blurred frame and/or a solid-color frame;
identifying a second candidate frame set of the target video from the plurality of frame images, wherein the second candidate frame set comprises a static frame;
and combining the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
7. The method of claim 6, wherein the identifying the first candidate frame set of the target video from the plurality of frame images comprises:
and identifying a first candidate frame set of the target video from the plurality of frame images by adopting a pre-trained frame detection model.
8. The method of claim 6, wherein the identifying the second candidate frame set of the target video from the plurality of frame images comprises:
and identifying a second candidate frame set of the target video from the plurality of frame images by adopting a frame difference method.
9. The method according to any one of claims 1 to 5, wherein the selecting at least one candidate frame from the candidate frame set as the real frame of the target video further comprises:
determining whether text information exists in the at least one candidate frame;
and adjusting the candidate frame with the text information according to a text information operation request of a user to obtain the adjusted real frame of the target video, wherein the text information operation request comprises a request that the candidate frame retain the text information and/or a request that the candidate frame discard the text information.
10. A video frame cropping method, characterized by comprising the following steps:
identifying a real frame of a target video according to the method of any one of claims 1 to 9;
and cropping the real frame.
11. A video frame identification device, characterized by comprising:
the acquisition module is used for acquiring a target video with a frame;
the frame extracting module is used for extracting frames of the target video to obtain a plurality of frame images;
the frame identification module is used for identifying a candidate frame set of the target video according to the plurality of frame images;
and the selecting module is used for selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
12. A video frame cropping device, characterized by comprising:
an identification module for identifying a real frame of a target video according to the method of any one of claims 1 to 9;
and a cropping module for cropping the real frame.
13. An electronic device, comprising:
one or more processors;
a storage unit for storing one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the identification method of any one of claims 1 to 9 or the cropping method of claim 10.
14. A computer-readable storage medium having a computer program stored thereon, characterized in that,
the computer program, when executed by a processor, implements the identification method according to any one of claims 1 to 9 or the cropping method according to claim 10.
CN202010554591.2A 2020-06-17 2020-06-17 Video frame identification method, video frame clipping method, video frame identification device, electronic equipment and medium Active CN111695540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010554591.2A CN111695540B (en) 2020-06-17 2020-06-17 Video frame identification method, video frame clipping method, video frame identification device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111695540A true CN111695540A (en) 2020-09-22
CN111695540B CN111695540B (en) 2023-05-30

Family

ID=72481784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010554591.2A Active CN111695540B (en) 2020-06-17 2020-06-17 Video frame identification method, video frame clipping method, video frame identification device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111695540B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000132636A (en) * 1998-10-29 2000-05-12 Nec Corp Character recognition method and device therefor
WO2012120587A1 (en) * 2011-03-04 2012-09-13 グローリー株式会社 Text string cut-out method and text string cut-out device
EP2709039A1 (en) * 2012-09-17 2014-03-19 Thomson Licensing Device and method for detecting the presence of a logo in a picture
US10109092B1 (en) * 2015-03-24 2018-10-23 Imagical LLC Automated text layout, color and other stylization on an image or video, and the tracking and application of user color preferences
WO2018107854A1 (en) * 2016-12-15 2018-06-21 广州视源电子科技股份有限公司 Method and device for displaying window shadow
JP2019016027A (en) * 2017-07-04 2019-01-31 中国計器工業株式会社 Optical numeric character reading method and device
US20190073775A1 (en) * 2017-09-07 2019-03-07 Symbol Technologies, Llc Multi-sensor object recognition system and method
US20190163971A1 (en) * 2017-11-30 2019-05-30 Konica Minolta Laboratory U.S.A., Inc. Text line segmentation method
CN108769803A (en) * 2018-06-29 2018-11-06 北京字节跳动网络技术有限公司 Recognition methods, method of cutting out, system, equipment with frame video and medium
CN110569699A (en) * 2018-09-07 2019-12-13 阿里巴巴集团控股有限公司 Method and device for carrying out target sampling on picture
GB202001283D0 (en) * 2018-09-19 2020-03-18 Imagical LLC Automated text layout, color and other stylization on an image or video, and the tracking and application of user color preferences
CN109377508A (en) * 2018-09-26 2019-02-22 北京字节跳动网络技术有限公司 Image processing method and device
CN109389640A (en) * 2018-09-29 2019-02-26 北京字节跳动网络技术有限公司 Image processing method and device
CN110032978A (en) * 2019-04-18 2019-07-19 北京字节跳动网络技术有限公司 Method and apparatus for handling video
CN110677585A (en) * 2019-09-30 2020-01-10 Oppo广东移动通信有限公司 Target detection frame output method and device, terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Y ET AL: "Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
ZHANG Pengcheng et al.: "Automatic detection and processing of spatial data overlap based on MapBasic", Geotechnical Investigation & Surveying *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068551A1 (en) * 2020-09-30 2022-04-07 北京字节跳动网络技术有限公司 Video cropping method and apparatus, and device and storage medium
US11881007B2 (en) 2020-09-30 2024-01-23 Beijing Bytedance Network Technology Co., Ltd. Video cropping method and apparatus, device, and storage medium
CN114527948A (en) * 2020-11-23 2022-05-24 深圳Tcl新技术有限公司 Calculation method and device of cut region, intelligent equipment and storage medium
CN114527948B (en) * 2020-11-23 2024-03-12 深圳Tcl新技术有限公司 Method and device for calculating clipping region, intelligent device and storage medium
CN112561839A (en) * 2020-12-02 2021-03-26 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment
CN112995535A (en) * 2021-02-05 2021-06-18 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing video
CN113034536A (en) * 2021-02-26 2021-06-25 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and readable storage medium
CN113255812A (en) * 2021-06-04 2021-08-13 北京有竹居网络技术有限公司 Video frame detection method and device and electronic equipment
CN113255812B (en) * 2021-06-04 2024-04-23 北京有竹居网络技术有限公司 Video frame detection method and device and electronic equipment
CN113223041A (en) * 2021-06-25 2021-08-06 上海添音生物科技有限公司 Method, system and storage medium for automatically extracting target area in image
CN113223041B (en) * 2021-06-25 2024-01-12 上海添音生物科技有限公司 Method, system and storage medium for automatically extracting target area in image

Also Published As

Publication number Publication date
CN111695540B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN111695540B (en) Video frame identification method, video frame clipping method, video frame identification device, electronic equipment and medium
US10417773B2 (en) Method and apparatus for detecting object in moving image and storage medium storing program thereof
EP3163504B1 (en) Method, device and computer-readable medium for region extraction
US9619708B2 (en) Method of detecting a main subject in an image
JP6240199B2 (en) Method and apparatus for identifying object in image
US10438086B2 (en) Image information recognition processing method and device, and computer storage medium
US9613266B2 (en) Complex background-oriented optical character recognition method and device
KR102087882B1 (en) Device and method for media stream recognition based on visual image matching
EP2660753B1 (en) Image processing method and apparatus
EP3767520A1 (en) Method, device, equipment and medium for locating center of target object region
CN111144366A (en) Strange face clustering method based on joint face quality assessment
CN107871319B (en) Method and device for detecting beam limiter area, X-ray system and storage medium
CN110399842B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN110460838B (en) Lens switching detection method and device and computer equipment
CN109255802B (en) Pedestrian tracking method, device, computer equipment and storage medium
US10296539B2 (en) Image extraction system, image extraction method, image extraction program, and recording medium storing program
CN105678301B (en) method, system and device for automatically identifying and segmenting text image
CN112257595A (en) Video matching method, device, equipment and storage medium
CN108960247B (en) Image significance detection method and device and electronic equipment
CN111652140A (en) Method, device, equipment and medium for accurately segmenting questions based on deep learning
CN114267029A (en) Lane line detection method, device, equipment and storage medium
KR20120138642A (en) Image processing method and image processing apparatus
JP2016053763A (en) Image processor, image processing method and program
EP2887261B1 (en) Information processing device, information processing method, and program
CN108629786B (en) Image edge detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.