CN115811585A - Scene switching identification method, device, equipment, medium and computer product - Google Patents

Scene switching identification method, device, equipment, medium and computer product

Info

Publication number
CN115811585A
CN115811585A (Application CN202111074656.4A)
Authority
CN
China
Prior art keywords: frame, image frame, image, timestamp, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111074656.4A
Other languages
Chinese (zh)
Inventor
李德东
周成成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111074656.4A priority Critical patent/CN115811585A/en
Publication of CN115811585A publication Critical patent/CN115811585A/en
Pending legal-status Critical Current

Abstract

The application discloses a scene switching identification method, apparatus, device, medium, and computer product, relating to the field of video processing. The method comprises the following steps: acquiring a first image frame and a second image frame, the first image frame corresponding to a first timestamp and the second image frame corresponding to a second timestamp; performing pixel position conversion on the first image frame with a target timestamp as a reference to obtain a first intermediate frame, the target timestamp being a timestamp between the first timestamp and the second timestamp; performing pixel position conversion on the second image frame with the target timestamp as a reference to obtain a second intermediate frame; and determining a scene-cut recognition result between the first image frame and the second image frame based on the difference between the first intermediate frame and the second intermediate frame. Because the scene-cut recognition result is determined from the first intermediate frame and the second intermediate frame, it is not misjudged when the image content of the first and second image frames undergoes a large displacement, which improves the accuracy of scene-cut recognition in video pictures.

Description

Scene switching identification method, device, equipment, medium and computer product
Technical Field
The present application relates to the field of video processing, and in particular, to a method, an apparatus, a device, a medium, and a computer product for identifying scene switching.
Background
A scene cut is a visual break point in a continuous video, usually arising from camera-shot interruptions and switches during shooting, or from edit points introduced during post-production. A scene cut can essentially be viewed as a segmentation of a video sequence along the time dimension; a video can therefore be split into a set of distinct scenes, where each scene is the smallest retrievable unit of a video segment. Scene cut detection is a basic processing step in many video analysis and processing methods, such as video compression, video super-resolution, and video frame interpolation.
In the related art, scene switching is typically detected by a pixel difference method or a histogram matching method: the inter-frame difference between two adjacent images is computed from pixel gray levels or color statistics; if the inter-frame difference reaches a preset threshold, a scene cut is deemed to exist between the two images, and otherwise the two images are deemed to belong to the same scene.
However, detection methods implemented in this manner are very sensitive to large motion displacement: if the content displayed in the video undergoes a large displacement between two frames, the corresponding histogram also changes substantially, which can cause a scene cut to be misjudged. That is, the detection system reports a scene cut between two frames even though none occurred, thereby reducing the accuracy of scene cut detection.
Disclosure of Invention
The embodiments of the present application provide a scene switching identification method, apparatus, device, medium, and computer product, which can improve the accuracy of scene switching identification in a video picture. The technical scheme is as follows:
in one aspect, a method for identifying a scene change is provided, where the method includes:
acquiring a first image frame and a second image frame, wherein the first image frame and the second image frame are two adjacent images in a video to be identified, the first image frame corresponds to a first timestamp, and the second image frame corresponds to a second timestamp;
performing pixel position conversion on the first image frame by taking a target timestamp as a reference to obtain a first intermediate frame, wherein the target timestamp is a timestamp between the first timestamp and the second timestamp;
performing pixel position conversion on the second image frame by taking the target timestamp as a reference to obtain a second intermediate frame;
determining a scene-cut recognition result between the first image frame and the second image frame based on a difference between the first intermediate frame and the second intermediate frame.
In another aspect, an apparatus for identifying a scene change is provided, the apparatus including:
an acquisition module, configured to acquire a first image frame and a second image frame, where the first image frame and the second image frame are two adjacent images in a video to be identified, the first image frame corresponds to a first timestamp, and the second image frame corresponds to a second timestamp;
a generating module, configured to perform pixel position transformation on the first image frame by using a target timestamp as a reference to obtain a first intermediate frame, where the target timestamp is a timestamp between the first timestamp and the second timestamp;
the generating module is further configured to perform pixel position conversion on the second image frame by using the target timestamp as a reference to obtain a second intermediate frame;
a determination module to determine a scene cut identification result between the first image frame and the second image frame based on a difference between the first intermediate frame and the second intermediate frame.
In another aspect, a computer device is provided, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, which is loaded and executed by the processor to implement the scene cut recognition method in any of the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the program code is loaded and executed by a processor to implement the scene-switching identification method described in any of the embodiments of the present application.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the scene change identification method in any one of the above embodiments.
The technical scheme provided by the application at least comprises the following beneficial effects:
When identifying scene switching in a video to be recognized, pixel position conversion is performed on a first image frame and a second image frame of the video, each with a target timestamp as a reference, to obtain a first intermediate frame corresponding to the first image frame and a second intermediate frame corresponding to the second image frame. The first intermediate frame represents how the content of the first image frame would be displayed at the target timestamp after motion estimation, and the second intermediate frame represents the same for the second image frame. A scene switching recognition result between the first image frame and the second image frame is then determined from the difference between the first intermediate frame and the second intermediate frame. This result is not misjudged when the image content of the two frames undergoes a large displacement, which improves the accuracy of scene switching recognition in video pictures.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below cover only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying a scene cut provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying a scene cut provided by another exemplary embodiment of the present application;
FIG. 4 is a schematic illustration of detection data provided by an exemplary embodiment of the present application;
FIG. 5 is a flowchart of a video frame insertion method provided in an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a frame insertion network according to an exemplary embodiment of the present application;
FIG. 7 is a block diagram illustrating an apparatus for identifying scene cuts according to an exemplary embodiment of the present application;
fig. 8 is a block diagram of a scene change recognition apparatus according to another exemplary embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are briefly described:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technology. Basic artificial intelligence infrastructure includes technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
Computer Vision (CV) technology is the science of how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and performing further image processing so that the result becomes an image better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
Optical flow refers to the instantaneous velocity of pixel motion of spatially moving objects on the observation imaging plane (e.g., images and video frames). In a video sequence, because the time interval between adjacent frames is small, the displacement of the same pixel between adjacent frames in the time domain is generally approximated as the instantaneous velocity, i.e., the optical flow, which represents the motion of the pixel.
In the embodiments of the present application, scene switching in a video is identified through computer vision technology: optical flow information corresponding to two adjacent images in the video to be identified is obtained based on machine learning, position alignment is performed based on the optical flow information to obtain predicted intermediate frames corresponding to the two adjacent images, and whether a scene cut exists in the picture content of the two adjacent images is determined from the difference between the two predicted intermediate frames, improving the accuracy of scene switching identification.
In combination with the above explanations of terms, application scenarios of the scene switching identification method provided in the present application are illustrated below. The method can be applied to scenarios including, but not limited to, the following:
the scene switching identification method can be applied to Video Frame Interpolation (Video Frame Interpolation) technology.
Video frame interpolation technology, also called Frame Rate Conversion technology, generates a high frame rate video (N = 1, 2, …) from a low frame rate video: by adding one or more frames between every two frames displayed on the original screen, the display time between frames is shortened, improving the fluency of the video and achieving a better visual effect. Video frame interpolation is currently widely applied to restoration of old films, live video streaming, video playback clients, and so on.
When frame interpolation needs to be performed on a target video, the two image frames before and after the target timestamp of the frame to be interpolated, namely a first image frame and a second image frame, are obtained; the first image frame and the second image frame are fused to generate an interpolated image, and the interpolated image is inserted at the target timestamp. Because a scene cut may occur in the picture content of the target video, if the first image frame and the second image frame show picture content of different scenes and are fused directly, problems can arise: the produced interpolated image may be unrelated to the video content, or may be a scrambled image. Therefore, scene cuts in the target video need to be detected before interpolation. In the present application, it is determined whether a scene cut occurs between the image frames before and after the target timestamp of the frame to be interpolated. If a scene cut occurs, the interpolated image at the target timestamp needs special handling, for example copying the previous frame to replace the inaccurate interpolated image; if no scene cut occurs, the interpolated image obtained by fusing the two image frames is used, as sketched below. Because the false alarm rate of scene cut detection is reduced, the interpolation quality of the resulting video is improved.
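As a minimal sketch of this decision logic (in Python; the helper names detect_scene_cut and fuse_frames are assumptions for illustration, not from the present application):

import numpy as np

def interpolated_frame(frame0: np.ndarray, frame1: np.ndarray,
                       detect_scene_cut, fuse_frames) -> np.ndarray:
    # Returns the image to insert at the target timestamp between frame0 and frame1.
    if detect_scene_cut(frame0, frame1):
        # Scene cut: fusing unrelated content would give a scrambled image,
        # so copy the previous frame instead of an inaccurate interpolation.
        return frame0.copy()
    # Same scene: fuse the two neighbouring image frames.
    return fuse_frames(frame0, frame1)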
(II) The scene switching identification method can be applied to video segmentation technology.
Video segmentation refers to dividing a video sequence into regions according to a certain standard, with the aim of separating meaningful entities from the video sequence; these meaningful entities are called video objects in digital video. Video segmentation plays an important role in many fields, such as video coding, video object extraction in web technology, pattern recognition, computer vision, and video retrieval.
Video segmentation generally needs to use information of video images along the spatial and temporal axes, and the same or similar object segmentation methods can be used for video pictures of the same scene, so the image sequence of a target video needs to be grouped by scene. Before the image sequence of a target video is grouped by scene, it must be determined which image sequences in the target video belong to the same scene, that is, the boundary points between scenes must be detected. The scene switching identification method described above can determine the nodes of the target video where scene cuts occur, so that the image sequence can be grouped by scene.
(III) The scene switching identification method can be applied to video compression technology.
Video compression, also known as Video Encoding, refers to converting a file in an original video format into a file in another video format by compression techniques. A video is a continuous image sequence composed of successive frames, where a frame is one image. Owing to the persistence-of-vision effect of the human eye, a frame sequence played at a certain rate is seen as continuous motion. Because the similarity between consecutive frames is extremely high, the original video can be encoded and compressed to remove spatial and temporal redundancy for storage and transmission. Image frames belonging to the same scene are highly similar, so scene switching identification is required to facilitate video compression.
That is, when the target video needs to be compressed, the nodes at which scene cuts occur in the target video are detected, the image frames of the target video are grouped accordingly (a grouping sketch is given below), and each image group is encoded to obtain the compressed data corresponding to the target video; this also blocks the spread of errors during encoding prediction. During decoding, an inter-frame prediction mode can be adopted: within the same image group, later frames can reference earlier decoded frames, and since the image content of one group belongs to the same scene, the current coded frame can obtain effective reference images (blocks) from earlier decoded frames, ensuring the accuracy of the decoded video.
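A minimal sketch of such grouping, assuming the scene cut nodes have already been identified (cut_indices marks positions i where a cut occurs between frame i-1 and frame i; the function name is illustrative):

from typing import List
import numpy as np

def group_by_scene(frames: List[np.ndarray], cut_indices: List[int]) -> List[List[np.ndarray]]:
    # Split the frame sequence into image groups at each scene cut node;
    # each group can then be encoded independently, blocking error spread.
    groups, start = [], 0
    for i in sorted(cut_indices):
        groups.append(frames[start:i])
        start = i
    groups.append(frames[start:])
    return [g for g in groups if g]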
Referring to fig. 1, a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application is shown. The implementation environment includes: a terminal 110, a server 120, and a communication network 130.
The terminal 110 includes various types of terminal devices such as a mobile phone, a tablet computer, a desktop computer, and a laptop computer. The terminal 110 is configured to provide the target video or a sequence of images corresponding to the target video to the server 120.
The server 120 includes a scene switching identification module. The server 120 inputs the target video, as the video to be identified, into the identification module to obtain scene switching identification results between image frames of the target video. Optionally, the server 120 may provide functions such as video frame interpolation, video compression, and video segmentation to the terminal 110, in which the identification module participates. Taking video frame interpolation as an example: the terminal 110 requests the server 120 to perform frame interpolation on a target video; after receiving the request, the server 120 determines, through the identification module, the scene switching identification results between image frames of the target video uploaded by the terminal 110, performs frame interpolation based on these results, generates an interpolated video with a frame rate higher than that of the target video, and returns the interpolated video to the terminal 110.
It should be noted that the server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data, and artificial intelligence platforms.
Cloud Technology is a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management-platform, and application technologies applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: background services of technical network systems, such as video websites, image websites, and other web portals, require large amounts of computing and storage resources. With the development of the internet industry, each item may come to have its own identification mark that must be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backend system support, which can only be realized through cloud computing.
In some embodiments, the server 120 may also be implemented as a node in a blockchain system. Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated using cryptography, where each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform can include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity management of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between users' real identities and blockchain addresses (authority management); under authorization, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control (risk-control audit). The basic services module is deployed on all blockchain node devices to verify the validity of service requests and, after consensus is completed, record valid requests to storage; for a new service request, the basic service first performs interface adaptation analysis and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic through a programming language, publish it to the blockchain (contract registration), and, according to the logic of the contract terms, invoke keys or other triggering events to execute and complete the contract logic; the module also provides contract upgrade and cancellation. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, as well as visual output of the real-time state of product operation.
Illustratively, the terminal 110 and the server 120 are connected via a communication network 130.
Referring to fig. 2, a method for identifying a scene cut according to an embodiment of the present application is shown. In this embodiment, the method is applied to the server shown in fig. 1. The method includes:
step 201, a first image frame and a second image frame are acquired.
The first image frame and the second image frame are two adjacent images in the video to be identified, the first image frame corresponds to a first time stamp, and the second image frame corresponds to a second time stamp.
Optionally, the video to be identified may be a video uploaded by the terminal, or a video obtained from a database.
Schematically, the video to be identified is acquired and subjected to framing processing to obtain an image frame sequence, and a first image frame and a second image frame that are adjacent in the time domain are acquired from the image frame sequence. In some embodiments, the framing processing is performed according to the frame rate and the video duration of the video to be recognized; for example, if the frame rate is 24 FPS and the duration is 10 s, framing yields an image frame sequence of 240 image frames.
The first image frame and the second image frame are obtained from the image frame sequence. Optionally, if all nodes where scene cuts occur in the video to be identified are to be determined, all image frames in the image frame sequence are grouped, each group being an image pair of <first image frame, second image frame>; for example, if the image frame sequence includes n image frames, the resulting image pairs are <image frame 1, image frame 2>, <image frame 2, image frame 3>, <image frame 3, image frame 4>, …, <image frame n-1, image frame n> (see the sketch below). Optionally, the first image frame and the second image frame may also be determined according to a preset requirement. For example, the terminal requests frame supplementing (frame interpolation) for a period in which a target video object in the video to be recognized moves quickly; the server recognizes the target video object, determines the image frames containing it, and judges whether the displacement of the target video object in those frames meets the frame supplementing requirement; if so, it forms <first image frame, second image frame> pairs from those frames for scene cut detection.
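A minimal sketch of the framing and pairing described above, assuming OpenCV for decoding (the function name is illustrative):

import cv2

def adjacent_image_pairs(path: str):
    # Frame the video to be identified and yield
    # (first image frame, first timestamp, second image frame, second timestamp).
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev, t = None, 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if prev is not None:
            yield prev, t - 1.0 / fps, frame, t
        prev = frame
        t += 1.0 / fps
    cap.release()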
Step 202, performing pixel position conversion on the first image frame by taking the target timestamp as a reference to obtain a first intermediate frame.
The target timestamp is a timestamp between the first timestamp and the second timestamp. In some embodiments, the target timestamp may be the midpoint timestamp between the first timestamp and the second timestamp, i.e., equidistant in time from both. The target timestamp may also be any other timestamp between the first timestamp and the second timestamp, which is not limited here.
In some embodiments, motion estimation is performed on the first image frame and the second image frame with the target timestamp as a reference, to determine the variation of the first image frame from the first timestamp to the target timestamp and the variation of the second image frame from the second timestamp to the target timestamp, respectively. Motion estimation methods include, but are not limited to, full search, adaptive fast search over a predicted motion vector field, Enhanced Predictive Zonal Search (EPZS), asymmetric cross multi-layer hexagon search, and optical flow estimation.
In the embodiment of the present application, the above motion estimation process is implemented by an optical flow estimation method based on pixel motion. Illustratively, first optical flow information of the first image frame between the first timestamp and the target timestamp is acquired, the first optical flow information indicating the motion of pixels in the first image frame between the first timestamp and the target timestamp; and second optical flow information of the second image frame between the second timestamp and the target timestamp is acquired, the second optical flow information indicating the motion of pixels in the second image frame between the second timestamp and the target timestamp. Illustratively, the optical flow information may include the instantaneous velocity (optical flow) of pixel motion between the timestamps, the displacement of pixel motion between the timestamps, or other data that can represent the motion of pixels.
In some embodiments, an optical flow network U_flow estimates the first optical flow information Flow_0→t of the first image frame Frame_0 between the first timestamp and the target timestamp, and the second optical flow information Flow_1→t of the second image frame Frame_1 between the second timestamp and the target timestamp, as shown in Equation 1.
Equation 1: Flow_0→t, Flow_1→t = U_flow(Frame_0, Frame_1)
Optionally, methods for acquiring the optical flow information include, but are not limited to: gradient-based optical flow estimation, which computes the velocity vector of a pixel using the spatio-temporal differential (i.e., the instantaneous spatial gradient) of the time-varying image gray level (or a filtered version of it), such as the Horn-Schunck algorithm and the Lucas-Kanade (LK) algorithm; feature-based optical flow estimation, which continuously locates and tracks the main features of a target object; region-based optical flow estimation, which first locates similar regions in the image and then computes the optical flow from the displacement of the similar regions; and phase-based optical flow estimation, which obtains the optical flow field of the image using phase information.
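As a minimal sketch of the gradient-based branch above, OpenCV's Farneback estimator can stand in for the optical flow network U_flow of Equation 1; the linear-motion scaling to the target timestamp is an assumption for illustration:

import cv2
import numpy as np

def estimate_flow_to_t(frame0: np.ndarray, frame1: np.ndarray, t: float = 0.5):
    # Approximate Flow_0->t and Flow_1->t by scaling the frame-to-frame flow.
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    flow_01 = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                           pyr_scale=0.5, levels=3, winsize=15,
                                           iterations=3, poly_n=5, poly_sigma=1.2,
                                           flags=0)
    flow_10 = cv2.calcOpticalFlowFarneback(g1, g0, None,
                                           pyr_scale=0.5, levels=3, winsize=15,
                                           iterations=3, poly_n=5, poly_sigma=1.2,
                                           flags=0)
    # Linear motion assumption: pixels cover a fraction t (resp. 1 - t)
    # of their displacement by the target timestamp.
    return t * flow_01, (1.0 - t) * flow_10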
In some embodiments, after the motion of pixels between the first timestamp and the target timestamp is determined, pixel position alignment can be performed according to that motion to obtain the corresponding intermediate frame. Taking optical flow information as the representation of motion, the first intermediate frame is obtained by performing pixel position conversion on the first image frame based on the first optical flow information.
Illustratively, the pixel position alignment operation may be an image warping (warp) operation. The optical flow field of the pixels is a two-dimensional vector field containing the displacement of each pixel in the two-dimensional plane (horizontal direction x and vertical direction y). The warp alignment operation transforms the coordinates of the pixel points of an image according to the optical flow, translating each pixel to its corresponding place at another time (adding or subtracting coordinates along x translates left and right in the horizontal direction; adding or subtracting coordinates along y translates up and down in the vertical direction). That is, with the obtained first optical flow information Flow_0→t, the first image frame Frame_0 is warp-aligned to the first intermediate frame Frame_0→t, as shown in Equation 2.
Equation 2: Frame_0→t = warp(Frame_0, Flow_0→t)
The pixel position alignment operation may also be implemented by other coordinate transformation or position alignment methods, which are not limited herein.
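A minimal sketch of the warp operation of Equation 2, assuming backward warping with bilinear sampling (cv2.remap); the actual warp operator used in an embodiment may differ:

import cv2
import numpy as np

def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    # Translate every pixel according to its optical flow vector:
    # output pixel (x, y) is sampled from (x + flow_x, y + flow_y).
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

With this helper, Frame_0→t = warp(Frame_0, Flow_0→t) follows Equation 2 directly, and the same function yields Frame_1→t as in Equation 3 below.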
Step 203, performing pixel position conversion on the second image frame by taking the target timestamp as a reference to obtain a second intermediate frame.
Schematically, the motion estimation process for the second image frame is the same as that for the first image frame and is not repeated here. In the embodiment of the present application, motion estimation is performed by the pixel-motion-based optical flow estimation method, obtaining the second optical flow information between the second timestamp and the target timestamp; after the second optical flow information is acquired, the second intermediate frame is obtained by performing pixel position conversion on the second image frame based on the second optical flow information.
Similarly, taking the image warping (warp) operation as the pixel position alignment operation, the second image frame Frame_1 is warp-aligned with the obtained second optical flow information Flow_1→t to obtain the second intermediate frame Frame_1→t, as shown in Equation 3.
Equation 3: Frame_1→t = warp(Frame_1, Flow_1→t)
The first intermediate frame and the second intermediate frame are the results of moving the first image frame and the second image frame to the same timestamp, which eliminates the deviation caused by motion occurring at different times.
Step 204, determining a scene change recognition result between the first image frame and the second image frame based on the difference between the first intermediate frame and the second intermediate frame.
In the embodiment of the present application, whether a scene cut occurs between the first image frame and the second image frame is determined from the difference between the first intermediate frame and the second intermediate frame: if a scene cut occurs between the two image frames, there is a large difference between the first intermediate frame and the second intermediate frame; if no scene cut occurs, the difference between them is small.
The first intermediate frame is compared with the second intermediate frame to determine difference information between them. In response to the difference information reaching a preset difference threshold, the scene switching identification result between the first image frame and the second image frame is determined to be a first result, the first result indicating that the scene corresponding to the first image frame differs from the scene corresponding to the second image frame, that is, a scene cut occurs. Alternatively, in response to the difference information failing to reach the preset difference threshold, the scene switching identification result between the first image frame and the second image frame is determined to be a second result, the second result indicating that the scene corresponding to the first image frame is the same as that of the second image frame, that is, no scene cut occurs.
Optionally, the difference information may be determined by a pixel difference method: the pixel difference between the first intermediate frame and the second intermediate frame is determined at each pixel point, and the mean of the pixel differences over the image is taken as the difference information. In one example, the width and height of the first intermediate frame and the second intermediate frame are W × H; the difference between the first intermediate frame Frame_0→t and the second intermediate frame Frame_1→t is computed at each pixel point (i, j) and averaged to obtain the difference information diff, as shown in Equation 4.
Equation 4: diff = (1 / (W × H)) × Σ_{i=1..W} Σ_{j=1..H} | Frame_0→t(i, j) − Frame_1→t(i, j) |
Optionally, the difference information may be determined through histogram statistics: a first histogram corresponding to the first intermediate frame and a second histogram corresponding to the second intermediate frame are obtained, and the difference between the first histogram and the second histogram is determined as the difference information.
The difference information may also be determined by feature point matching, a deep-learning neural network, or the like, which is not limited here.
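A minimal sketch combining the pixel difference method of Equation 4 with the threshold test of this step (the threshold value is an illustrative assumption, not a value from the present application):

import cv2
import numpy as np

def scene_cut_between(inter0: np.ndarray, inter1: np.ndarray,
                      threshold: float = 30.0) -> bool:
    # Equation 4: mean absolute pixel difference between the two intermediate frames.
    g0 = cv2.cvtColor(inter0, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g1 = cv2.cvtColor(inter1, cv2.COLOR_BGR2GRAY).astype(np.float32)
    diff = float(np.mean(np.abs(g0 - g1)))
    # First result (scene cut) if the difference reaches the preset threshold,
    # second result (same scene) otherwise.
    return diff >= threshold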
In summary, in the scene switching identification method provided in this embodiment, when identifying scene switching in a video to be identified, pixel position conversion is performed on a first image frame and a second image frame of the video, each with a target timestamp as a reference, to obtain a first intermediate frame corresponding to the first image frame and a second intermediate frame corresponding to the second image frame. The first intermediate frame represents how the content of the first image frame would be displayed at the target timestamp after motion estimation, and the second intermediate frame represents the same for the second image frame. The scene switching identification result between the first image frame and the second image frame is determined from the difference between the two intermediate frames; it is not misjudged when the image content of the two frames undergoes a large displacement, which improves the accuracy of scene switching identification in video pictures.
Referring to fig. 3, a method for identifying a scene change according to an embodiment of the present application is shown; in this embodiment, motion estimation of pixels through optical flow estimation is taken as an example. The method includes:
step 301, a first image frame and a second image frame are acquired.
The first image frame and the second image frame are two adjacent images in the video to be identified, the first image frame corresponds to a first time stamp, and the second image frame corresponds to a second time stamp.
Schematically, the video to be identified is acquired and subjected to framing processing to obtain an image frame sequence, and a first image frame and a second image frame that are adjacent in the time domain are acquired from the image frame sequence. The video to be recognized corresponds to a frame rate (FPS); in some embodiments, the framing processing is performed according to the frame rate and the video duration of the video to be recognized. The first image frame and the second image frame are then acquired from the image frame sequence.
Step 302, first optical flow information of a first image frame between a first timestamp and a target timestamp is acquired.
Optionally, methods for acquiring the optical flow information include, but are not limited to: gradient-based optical flow estimation, which computes the velocity vector of a pixel using the spatio-temporal differential (instantaneous spatial gradient) of the time-varying image gray level (or a filtered version of it), such as the Horn-Schunck algorithm and the Lucas-Kanade (LK) algorithm; feature-based optical flow estimation, which continuously locates and tracks the main features of a target object; region-based optical flow estimation, which first locates similar regions in the image and then computes the optical flow from their displacement; and phase-based optical flow estimation, which obtains the optical flow field of the image using phase information.
Step 303, second optical flow information of the second image frame between the second timestamp and the target timestamp is acquired.
In the embodiment of the present application, an optical flow network U_flow estimates the first optical flow information Flow_0→t of the first image frame Frame_0 between the first timestamp and the target timestamp, and the second optical flow information Flow_1→t of the second image frame Frame_1 between the second timestamp and the target timestamp.
Step 304, performing pixel position transformation on the first image frame based on the first optical flow information to obtain a first intermediate frame.
In some embodiments, first coordinate change data of pixels in the first image frame between the first timestamp and the target timestamp is acquired based on the first optical flow information, and the pixels in the first image frame are displaced based on the first coordinate change data to obtain the first intermediate frame. Illustratively, the first coordinate change data is determined from the pixel movement velocity indicated by the first optical flow information: the first optical flow information is converted into a first movement velocity of the pixels in the first image frame, a first time interval between the first timestamp and the target timestamp is determined, and the first coordinate change data is determined based on the first movement velocity and the first time interval.
In some embodiments, pixel alignment based on the first optical flow information is achieved by image warping resulting in a first intermediate frame, i.e., pixels in the first image frame are mapped to new locations based on the first optical flow information, generating the first intermediate frame.
Step 305, performing pixel position transformation on the second image frame based on the second optical flow information to obtain a second intermediate frame.
In some embodiments, second coordinate change data of pixels in the second image frame between the second timestamp and the target timestamp is acquired based on the second optical flow information, and the pixels in the second image frame are displaced based on the second coordinate change data to obtain the second intermediate frame. Illustratively, the second coordinate change data is determined from the pixel movement velocity indicated by the second optical flow information: the second optical flow information is converted into a second movement velocity of the pixels in the second image frame, a second time interval between the second timestamp and the target timestamp is determined, and the second coordinate change data is determined based on the second movement velocity and the second time interval.
In some embodiments, pixel alignment based on the second optical flow information is achieved by image warping, resulting in the second intermediate frame: pixels in the second image frame are mapped to new locations based on the second optical flow information, generating the second intermediate frame (a displacement sketch follows below).
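A minimal sketch of the coordinate change computation in steps 304 and 305, assuming the optical flow information is given as a per-pixel velocity field (pixels per second); names are illustrative:

import numpy as np

def coordinate_change(velocity: np.ndarray, t_src: float, t_target: float) -> np.ndarray:
    # velocity: H x W x 2 per-pixel movement velocity; the time interval between
    # the source timestamp and the target timestamp scales it to a displacement.
    interval = abs(t_target - t_src)
    return velocity * interval  # per-pixel (dx, dy) coordinate change data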
Step 306, comparing the first intermediate frame and the second intermediate frame, and determining difference information between the first intermediate frame and the second intermediate frame.
Optionally, the method for determining the difference information includes, but is not limited to, at least one of a pixel difference method, a histogram statistical method, a feature point matching method, a deep learning neural network method, and the like, and is not limited herein.
In the embodiment of the present application, whether a scene cut occurs between the first image frame and the second image frame is determined from the difference between the first intermediate frame and the second intermediate frame: if a scene cut occurs between the two image frames, there is a large difference between the first intermediate frame and the second intermediate frame; if no scene cut occurs, the difference between them is small.
Step 307, in response to the difference information reaching the preset difference threshold, determining a scene change recognition result between the first image frame and the second image frame as a first result.
The first result is used for indicating that the scene corresponding to the first image frame is different from the scene corresponding to the second image frame.
Illustratively, the difference threshold corresponds to the manner in which the difference information is obtained; for example, when the difference information is obtained by the pixel difference method, the difference threshold is a preset gray-level threshold.
Optionally, the difference threshold may be preset by the system, or may be indicated by the terminal, which is not limited herein.
In the embodiment of the present application, when the difference information reaches the preset threshold, a scene cut exists between the first image frame and the second image frame, and the scene switching identification result between them is output as the first result.
Step 308, in response to the difference information failing to reach the preset difference threshold, determining the scene switching identification result between the first image frame and the second image frame to be a second result.
And the second result is used for indicating that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame.
In the embodiment of the present application, when the difference information does not reach the preset threshold, no scene cut exists between the first image frame and the second image frame, that is, the two frames belong to pictures of the same scene, and the scene switching identification result between them is output as the second result.
In the scene switching identification method provided in this embodiment, because the first intermediate frame and the second intermediate frame generated from the optical flow information are the results of moving the original image frames to the same target timestamp, the influence of motion on the scene cut determination between image frames can be eliminated, where the motion includes, but is not limited to: object changes, object motion, camera movement, and the like.
Illustratively, the scene switching identification method provided in this embodiment performs scene cut detection using the intermediate frames obtained after optical-flow warp alignment of the video image frames, without any additional auxiliary module; the aligned front and rear frames of the video eliminate the interference caused by motion displacement, greatly reducing the false alarm rate of scene cut detection. That is, the method effectively reduces the false detections that occur in existing methods, which use the original input frame images for scene cut detection and misfire when there is large displacement in the picture. Existing methods detect on the original input frames, with a very high error rate between adjacent frames, whereas the present method detects on aligned images, which greatly improves the stability of the detection system and reduces errors caused by motion. For example, in a certain motion video there are actually 5 scene cuts, and the detection data is shown in fig. 4, where detection result A 401 is the result of an existing method and detection result B 402 is the result of the present application; the vertical axis is the difference between the current frame and the previous frame, and a scene cut is reported when the percentage exceeds a certain threshold. It can be seen that detection result A 401 cannot identify the true scene cuts in some regions (its values are high throughout), while in detection result B 402 a clear peak appears at each scene cut, so the corresponding scene cut detection is more accurate.
In summary, in the scene switching identification method provided in this embodiment, when identifying scene switching in a video to be identified, pixel position transformation is performed on a first image frame and a second image frame of the video using optical flow information, each with a target timestamp as a reference, to obtain a first intermediate frame corresponding to the first image frame and a second intermediate frame corresponding to the second image frame, and the scene switching identification result between the two image frames is determined from the difference between the two intermediate frames. The first intermediate frame represents how the content of the first image frame would be displayed at the target timestamp after motion estimation, and the second intermediate frame represents the same for the second image frame; the scene switching identification result determined from the two intermediate frames is not misjudged when the image content of the two frames undergoes a large displacement, which improves the accuracy of scene switching identification in video pictures.
Referring to fig. 5, a video frame interpolation method according to an embodiment of the present application is shown; in this embodiment, the scene switching identification method is applied to a video frame interpolation scenario. The method includes:
step 501, a video to be identified is obtained.
Optionally, the video to be identified may be a video uploaded by the terminal, or a video acquired from a database.
Step 502, segmenting the video to be recognized according to the original frame number corresponding to the video to be recognized to obtain an image frame sequence.
Illustratively, the video to be identified corresponds to a frame rate, and its original frame number can be determined from the frame rate and the video duration. After the original frame number is determined, the video to be recognized is segmented accordingly to obtain the corresponding image frame sequence, as sketched below.
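A minimal sketch of this computation (names are illustrative):

def original_frame_count(fps: float, duration_s: float) -> int:
    # e.g. a 24 FPS video lasting 10 s segments into 240 image frames.
    return int(round(fps * duration_s))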
Step 503, a first image frame and a second image frame adjacent in the time domain are determined from the image frame sequence.
In the embodiment of the present application, at least one frame period to be inserted is determined according to the target frame number requirement of the video frame interpolation task; a frame period to be inserted is a period in the video to be identified in which interpolation must be performed, bounded by two image frames, and the first image frame and the second image frame of the frame to be interpolated are determined from it. Illustratively, the target frame number requirement is obtained, indicating the frame number of the interpolated video to be generated from the video to be identified; the frame periods to be inserted are determined in the image frame sequence based on the target frame number requirement; the picture images bounding a frame period to be inserted are determined as the first image frame and the second image frame; the start time of the display duration of the first image frame in the frame period to be inserted is determined as the first timestamp; and the start time of the display duration of the second image frame in the video to be identified is determined as the second timestamp.
In an example, the frame rate of the video to be identified is 24fps and the video duration is 10s, and the current video frame interpolation task indicates that an interpolated video with a frame rate of 44fps is to be generated. It is then determined that 20 frames need to be inserted per second of the video to be identified. Each second contains 23 insertable frame periods, and, taking the case where one image frame is inserted into each image pair as an example, 20 frame periods to be inserted need to be determined from the 23 insertable frame periods. Optionally, the 20 periods may be randomly extracted, or may be determined according to a preset requirement. Each frame period to be inserted corresponds to one image pair <first image frame, second image frame>, and the corresponding first image frame, first timestamp, second image frame and second timestamp can be determined from the determined frame period to be inserted.
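The arithmetic of this example can be written out as follows (a sketch with hypothetical names; it assumes, as in the example, that exactly one frame is inserted per selected period):

import random

def plan_frame_periods(src_fps: int, dst_fps: int):
    """Per second of video: frames to insert and insertable periods.

    With src_fps frames in one second there are src_fps - 1 gaps
    between consecutive frames inside that second, each an insertable
    frame period.
    """
    frames_to_insert = dst_fps - src_fps
    insertable_periods = src_fps - 1
    # Randomly extract the required number of periods (a preset rule
    # could be used instead, as the embodiment notes).
    chosen = sorted(random.sample(range(insertable_periods), frames_to_insert))
    return frames_to_insert, insertable_periods, chosen

print(plan_frame_periods(24, 44))  # 20 frames chosen among 23 periods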
Step 504, performing pixel position conversion on the first image frame by taking the target timestamp as a reference to obtain a first intermediate frame.
Wherein the target timestamp is a timestamp between the first timestamp and the second timestamp. In some embodiments, the target timestamp may be a midpoint timestamp between the first timestamp and the second timestamp, i.e., the lengths of time from the target timestamp to the first timestamp and from the target timestamp to the second timestamp are the same. The target timestamp may be another timestamp between the first timestamp and the second timestamp, and is not limited herein.
In the embodiment of the application, the motion estimation of the pixels is realized through the optical flow information, that is, first optical flow information Flow_{0→t} of the first image frame Frame_0 between the first timestamp and the target timestamp is estimated through the optical flow network, and then warp alignment is performed on the first image frame according to the first optical flow information to obtain the first intermediate frame.
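A minimal sketch of this estimation-and-warp step follows. Dense Farneback optical flow stands in for the optical flow network (the embodiment does not name a specific estimator), and backward warping with the time-scaled flow is a simple approximation of the warp alignment; all names here are illustrative:

import cv2
import numpy as np

def warp_to_timestamp(frame, flow, t):
    """Warp `frame` toward the target timestamp.

    `flow` is the dense flow (H x W x 2) from this frame to the
    adjacent frame; `t` in (0, 1) is the fraction of the inter-frame
    interval covered by the target timestamp, so the applied
    displacement is the flow scaled by t.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + t * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + t * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)

# Synthetic demo frames: frame1 is frame0 shifted a few pixels.
frame0 = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
frame1 = np.roll(frame0, 4, axis=1)
gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
flow_0_to_1 = cv2.calcOpticalFlowFarneback(
    gray0, gray1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
first_intermediate = warp_to_timestamp(frame0, flow_0_to_1, 0.5)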
Step 505, performing pixel position conversion on the second image frame by taking the target timestamp as a reference to obtain a second intermediate frame.
In the embodiment of the present application, the obtaining of the second intermediate frame is the same as the obtaining of the first intermediate frame, and is not described herein again.
Step 506, comparing the first intermediate frame and the second intermediate frame, and determining difference information between the first intermediate frame and the second intermediate frame.
Optionally, the method for determining the difference information includes, but is not limited to, at least one of a pixel difference method, a histogram statistical method, a feature point matching method, a deep learning neural network method, and the like, and is not limited herein.
In the embodiment of the application, whether scene switching occurs between the first image frame and the second image frame is determined according to the difference between the first intermediate frame and the second intermediate frame. If scene switching occurs between the first image frame and the second image frame, a large difference exists between the first intermediate frame and the second intermediate frame; if scene switching does not occur, the difference between the first intermediate frame and the second intermediate frame is small.
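The pixel difference method and the histogram statistical method mentioned above can be sketched as follows; the gray-level metrics and OpenCV calls are illustrative choices, not prescribed by the embodiment:

import cv2
import numpy as np

def pixel_difference(inter0, inter1):
    """Mean absolute gray-level difference between the two intermediate
    frames; a large value suggests a scene cut."""
    g0 = cv2.cvtColor(inter0, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g1 = cv2.cvtColor(inter1, cv2.COLOR_BGR2GRAY).astype(np.float32)
    return float(np.mean(np.abs(g0 - g1)))

def histogram_difference(inter0, inter1):
    """Correlation-based distance between gray histograms
    (0 means identical distributions)."""
    g0 = cv2.cvtColor(inter0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(inter1, cv2.COLOR_BGR2GRAY)
    h0 = cv2.calcHist([g0], [0], None, [256], [0, 256])
    h1 = cv2.calcHist([g1], [0], None, [256], [0, 256])
    return 1.0 - cv2.compareHist(h0, h1, cv2.HISTCMP_CORREL)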
Step 507, in response to the difference information reaching the preset difference threshold, determining that the scene switching identification result between the first image frame and the second image frame is a first result.
The first result is used for indicating that the scene corresponding to the first image frame is different from the scene corresponding to the second image frame.
Illustratively, the difference threshold corresponds to the difference information, for example, when the difference information is obtained by a pixel difference method, the difference threshold corresponds to a preset gray threshold.
Optionally, the difference threshold may be preset by the system, or may be indicated by the terminal, which is not limited herein.
Step 508, in response to the difference information failing to reach the preset difference threshold, determining that the scene switching identification result between the first image frame and the second image frame is a second result.
The second result is used to indicate that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame.
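The two-way decision of steps 507 and 508 then reduces to a threshold comparison; the threshold value below is an illustrative placeholder, not one disclosed by the embodiment:

def classify_scene_cut(difference_info: float, difference_threshold: float = 30.0) -> str:
    """Return the first result (scenes differ) when the difference
    information reaches the threshold, the second result otherwise."""
    if difference_info >= difference_threshold:
        return "first_result"
    return "second_result"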
In the embodiment of the application, after the scene switching recognition result between the first image frame and the second image frame is determined according to the preset difference threshold, the corresponding target frame interpolation image is acquired based on the scene switching recognition result.
Step 509, in response to the scene switching identification result being the first result, determining the first image frame or the second image frame as the target frame interpolation image.
In the embodiment of the present application, when there is a scene change between the first image frame and the second image frame, the first image frame or the second image frame is taken as a target frame interpolation image to be inserted into the target timestamp.
And 510, in response to the scene switching identification result being the second result, fusing the first intermediate frame and the second intermediate frame to obtain a target frame-inserted image.
Optionally, when there is no scene change between the first image frame and the second image frame, the first intermediate frame or the second intermediate frame may be used as the target frame interpolation image inserted into the target timestamp, or the first intermediate frame and the second intermediate frame are fused through a convolutional network to obtain the target frame interpolation image.
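As a hedged stand-in for the convolutional fusion network, the fusion step can be illustrated with a plain pixel average; the embodiment itself uses a learned network, so this is only a sketch:

import numpy as np

def fuse_intermediate_frames(inter0, inter1):
    """Average the two warped intermediate frames into the target
    frame interpolation image (simple stand-in for the conv network)."""
    blended = (inter0.astype(np.float32) + inter1.astype(np.float32)) / 2.0
    return blended.astype(np.uint8)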
Step 511, inserting the target frame interpolation image into the target timestamp in the video to be identified to generate a frame interpolation video.
After the target frame interpolation image is determined, it is inserted into the target timestamp in the video to be identified to generate the frame interpolation video.
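Assembling the frame interpolation video can be sketched as keeping the frame list sorted by timestamp; the helper name and representation are hypothetical:

import bisect

def insert_at_timestamp(frames, timestamps, target_image, target_ts):
    """Insert the target frame interpolation image at its target
    timestamp, preserving the temporal order of the frame sequence."""
    i = bisect.bisect_left(timestamps, target_ts)
    timestamps.insert(i, target_ts)
    frames.insert(i, target_image)
    return frames, timestamps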
In one example, please refer to fig. 6, which illustrates a schematic structural diagram of a frame interpolation network 600 according to an exemplary embodiment of the present application. The frame interpolation network 600 comprises an optical flow network 610, a warp processing module 620, a judging module 630, a convolution network 640 and a video synthesizing module 650. A first image frame 601 and a second image frame 602 are input into the optical flow network 610 to obtain first optical flow information 611 and second optical flow information 612 respectively, and a first intermediate frame 621 and a second intermediate frame 622 are obtained after the warp processing module 620. The first intermediate frame 621 and the second intermediate frame 622 are input into the judging module 630 to determine whether scene switching exists between the first image frame 601 and the second image frame 602. If so, the first image frame 601 is input into the video synthesizing module 650 as the target frame interpolation image; if not, the first intermediate frame 621 and the second intermediate frame 622 are input into the convolution network 640 to be fused, and the convolution network 640 inputs the fused target frame interpolation image into the video synthesizing module 650. The video synthesizing module 650 synthesizes the frame interpolation video.
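Tying the modules of fig. 6 together, an end-to-end sketch might look like the following; it reuses the illustrative helpers from the sketches above, with Farneback flow standing in for the optical flow network 610 and pixel averaging for the convolution network 640:

import cv2

def interpolate_pair(frame0, frame1, t=0.5, difference_threshold=30.0):
    """Produce the target frame interpolation image for one image pair:
    flow estimation, warp alignment, scene-cut judgment, then either
    source-frame reuse (cut) or fusion (no cut)."""
    gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    flow01 = cv2.calcOpticalFlowFarneback(gray0, gray1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    flow10 = cv2.calcOpticalFlowFarneback(gray1, gray0, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    inter0 = warp_to_timestamp(frame0, flow01, t)
    inter1 = warp_to_timestamp(frame1, flow10, 1.0 - t)
    if classify_scene_cut(pixel_difference(inter0, inter1), difference_threshold) == "first_result":
        return frame0  # scene cut: reuse a source frame at the target timestamp
    return fuse_intermediate_frames(inter0, inter1)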
To sum up, in the video frame interpolation method provided in the embodiment of the present application, when frame interpolation is performed on the video to be identified, pixel position transformation is performed on the first image frame and the second image frame respectively by taking the target timestamp as a reference, the first intermediate frame corresponding to the first image frame and the second intermediate frame corresponding to the second image frame are obtained, the scene switching identification result between the first image frame and the second image frame is determined according to the difference between the first intermediate frame and the second intermediate frame, and different target frame interpolation images are produced according to different scene switching identification results. That is, the scene switching identification result determined from the first intermediate frame and the second intermediate frame is not misjudged when the image contents of the first image frame and the second image frame exhibit large displacement, which improves the accuracy of scene switching identification in a video image and, in turn, the accuracy of the generated frame interpolation video.
Referring to fig. 7, a block diagram of a scene change recognition apparatus according to an exemplary embodiment of the present application is shown, where the apparatus includes the following modules:
an obtaining module 710, configured to obtain a first image frame and a second image frame, where the first image frame and the second image frame are two adjacent images in a video to be identified, the first image frame corresponds to a first timestamp, and the second image frame corresponds to a second timestamp;
a generating module 720, configured to perform pixel position transformation on the first image frame by using a target timestamp as a reference to obtain a first intermediate frame, where the target timestamp is a timestamp between the first timestamp and the second timestamp;
the generating module 720 is further configured to perform pixel position transformation on the second image frame by using the target timestamp as a reference to obtain a second intermediate frame;
a determining module 730, configured to determine a scene-cut recognition result between the first image frame and the second image frame based on a difference between the first intermediate frame and the second intermediate frame.
In some optional embodiments, as shown in fig. 8, the generating module 720 further includes:
a first obtaining unit 721 configured to obtain first optical flow information of the first image frame between the first timestamp and the target timestamp, where the first optical flow information is used to indicate motion of pixels in the first image frame between the first timestamp and the target timestamp;
a generating unit 722, configured to perform pixel position transformation on the first image frame based on the first optical flow information, so as to obtain the first intermediate frame;
the first obtaining unit 721 is further configured to acquire second optical flow information of the second image frame between the second timestamp and the target timestamp, where the second optical flow information is used to indicate motion of pixels in the second image frame between the second timestamp and the target timestamp;
the generating unit 722 is further configured to perform pixel position transformation on the second image frame based on the second optical flow information to obtain the second intermediate frame.
In some optional embodiments, the first obtaining unit 721 is further configured to obtain first coordinate change data of pixels in the first image frame between the first timestamp and the target timestamp based on the first optical flow information;
the generating unit 722 is further configured to shift pixels in the first image frame based on the first coordinate change data, so as to obtain the first intermediate frame;
the first obtaining unit 721 is further configured to acquire second coordinate change data of pixels in the second image frame between the second timestamp and the target timestamp based on the second optical flow information;
the generating unit 722 is further configured to shift pixels in the second image frame based on the second coordinate change data, so as to obtain the second intermediate frame.
In some optional embodiments, the first obtaining unit 721 is further configured to convert the first optical flow information into a first motion speed of a pixel in the first image frame;
the first obtaining unit 721 is further configured to determine a first time interval between the first timestamp and the target timestamp;
the first obtaining unit 721 is further configured to determine the first coordinate change data based on the first motion speed and the first time interval;
the first obtaining unit 721 is further configured to convert the second optical flow information into a second motion speed of the pixel in the second image frame;
the first obtaining unit 721 is further configured to determine a second time interval between the second timestamp and the target timestamp;
the first obtaining unit 721 is further configured to determine the second coordinate change data based on the second motion speed and the second time interval.
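The computation these units describe reduces to scaling the flow by the ratio of time intervals; a one-function sketch (with the inter-frame gap expressed in seconds, names illustrative):

def coordinate_change(flow, interval_to_target: float, frame_gap: float):
    """Coordinate change data = motion speed x time interval.

    `flow` holds per-pixel displacement over the whole inter-frame gap;
    dividing by `frame_gap` gives a speed in pixels per second, and
    multiplying by the interval to the target timestamp gives the
    displacement to apply to each pixel position.
    """
    speed = flow / frame_gap
    return speed * interval_to_target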
In some optional embodiments, the determining module 730 further includes:
a comparing unit 731, configured to compare the first intermediate frame and the second intermediate frame, and determine difference information between the first intermediate frame and the second intermediate frame;
a determining unit 732, configured to determine, in response to the difference information reaching a preset difference threshold, that the scene change recognition result between the first image frame and the second image frame is a first result, where the first result is used to indicate that a scene corresponding to the first image frame is different from a scene corresponding to the second image frame; or, in response to the difference information failing to reach the preset difference threshold, determine that the scene change recognition result between the first image frame and the second image frame is a second result, where the second result is used to indicate that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame.
In some optional embodiments, the comparing unit 731 is further configured to determine a pixel difference between each pixel point in the first intermediate frame and the second intermediate frame;
the comparing unit 731 is further configured to determine the mean value of the pixel differences over the image as the difference information.
In some optional embodiments, the comparing unit 731 is further configured to obtain a first histogram corresponding to the first intermediate frame and a second histogram corresponding to the second intermediate frame;
the comparing unit 731 is further configured to determine a difference value between the first histogram and the second histogram as the difference information.
In some optional embodiments, the obtaining module 710 is further configured to obtain a corresponding target frame interpolation image based on the scene cut recognition result;
the generating module 720 is further configured to insert the target frame interpolation image into the target timestamp in the video to be identified, so as to generate a frame interpolation video.
In some optional embodiments, the scene-cut recognition result includes a first result and a second result, the first result is used to indicate that the scene corresponding to the first image frame is different from the scene corresponding to the second image frame, and the second result is used to indicate that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame;
the obtaining module 710 is further configured to determine the first image frame or the second image frame as the target frame interpolation image in response to the scene change recognition result being the first result;
the obtaining module 710 is further configured to fuse the first intermediate frame and the second intermediate frame in response to the scene change recognition result being the second result, so as to obtain the target frame interpolation image.
In some optional embodiments, the obtaining module 710 further includes:
a second obtaining unit 711, configured to obtain the video to be identified;
a segmenting unit 712, configured to segment the video to be identified according to an original frame number corresponding to the video to be identified, so as to obtain an image frame sequence;
the second obtaining unit 711 is further configured to determine the first image frame and the second image frame that are adjacent in a time domain from the image frame sequence.
In some optional embodiments, the second obtaining unit 711 is further configured to obtain a target frame number requirement, where the target frame number requirement is used to indicate a frame number of an inter-frame video generated according to the video to be identified;
the second obtaining unit 711 is further configured to determine, based on the target frame number requirement, a frame period to be inserted in the image frame sequence, where the frame period to be inserted is a period in which frame insertion needs to be performed in the video to be identified, and the frame period to be inserted includes two frame images;
the second obtaining unit 711 is further configured to determine the corresponding image frames in the frame period to be inserted as the first image frame and the second image frame;
the second obtaining unit 711 is further configured to determine, as the first timestamp, a starting time corresponding to a display duration of the first image frame in the frame period to be inserted;
the second obtaining unit 711 is further configured to determine a starting time corresponding to a display duration of the second image frame in the video to be identified as the second timestamp.
In summary, when recognizing scene switching in a video to be identified, the scene switching identification device provided in the embodiment of the present application performs pixel position transformation on a first image frame and a second image frame in the video to be identified respectively with a target timestamp as a reference, obtains a first intermediate frame corresponding to the first image frame and a second intermediate frame corresponding to the second image frame, and determines a scene switching identification result between the first image frame and the second image frame according to the difference between the first intermediate frame and the second intermediate frame. The first intermediate frame represents the display condition, after motion estimation, of the content of the first image frame at the target timestamp, and the second intermediate frame represents the display condition, after motion estimation, of the content of the second image frame at the target timestamp. The scene switching identification result determined from the first intermediate frame and the second intermediate frame is therefore not misjudged when the image contents of the first image frame and the second image frame exhibit large displacement, which improves the accuracy of scene switching identification in a video image.
It should be noted that: the scene switching recognition apparatus provided in the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the scene switching identification apparatus and the scene switching identification method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, and are not described herein again.
Fig. 9 shows a schematic structural diagram of a server according to an exemplary embodiment of the present application. Specifically, the server includes the following structure.
The server 900 includes a Central Processing Unit (CPU) 901, a system memory 904 including a Random Access Memory (RAM) 902 and a Read-Only Memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the CPU 901. The server 900 also includes a mass storage device 906 for storing an operating system 913, application programs 914, and other program modules 915.
The mass storage device 906 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 906 and its associated computer-readable media provide non-volatile storage for the server 900. That is, the mass storage device 906 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
Without loss of generality, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 904 and the mass storage device 906 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 900 may also be operated through a remote computer connected to a network such as the Internet. That is, the server 900 may be connected to the network 912 through the network interface unit 911 connected to the system bus 905, or the network interface unit 911 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
Embodiments of the present application further provide a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the scene switching identification method provided by the above-mentioned method embodiments. Alternatively, the computer device may be a terminal or a server.
Embodiments of the present application further provide a computer-readable storage medium having at least one instruction, at least one program, a code set, or an instruction set stored thereon, loaded and executed by a processor, to implement the scene switching identification method provided by the above method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the scene switching identification method described in any of the above embodiments.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method for identifying a scene change, the method comprising:
acquiring a first image frame and a second image frame, wherein the first image frame and the second image frame are two adjacent images in a video to be identified, the first image frame corresponds to a first timestamp, and the second image frame corresponds to a second timestamp;
performing pixel position conversion on the first image frame by taking a target timestamp as a reference to obtain a first intermediate frame, wherein the target timestamp is a timestamp between the first timestamp and the second timestamp;
performing pixel position conversion on the second image frame by taking the target timestamp as a reference to obtain a second intermediate frame;
determining a scene-cut recognition result between the first image frame and the second image frame based on a difference between the first intermediate frame and the second intermediate frame.
2. The method of claim 1, wherein the pixel-location transforming the first image frame with reference to the target timestamp to obtain a first intermediate frame comprises:
acquiring first optical flow information of the first image frame between the first timestamp and the target timestamp, the first optical flow information indicating motion of pixels in the first image frame between the first timestamp and the target timestamp;
performing pixel position transformation on the first image frame based on the first optical flow information to obtain a first intermediate frame;
the pixel position conversion of the second image frame by taking the target timestamp as a reference to obtain a second intermediate frame comprises the following steps:
acquiring second optical flow information of the second image frame between the second timestamp and the target timestamp, wherein the second optical flow information is used for indicating motion conditions of pixels in the second image frame between the second timestamp and the target timestamp;
and performing pixel position transformation on the second image frame based on the second optical flow information to obtain the second intermediate frame.
3. The method of claim 2, wherein said performing pixel position transformation on the first image frame based on the first optical flow information to obtain the first intermediate frame comprises:
acquiring first coordinate change data of pixels in the first image frame between the first timestamp and the target timestamp based on the first optical flow information;
displacing pixels in the first image frame based on the first coordinate change data to obtain a first intermediate frame;
the pixel position transformation of the second image frame based on the second optical flow information to obtain the second intermediate frame comprises:
acquiring second coordinate change data of pixels in the second image frame between the second time stamp and the target time stamp based on the second optical flow information;
and displacing the pixels in the second image frame based on the second coordinate change data to obtain the second intermediate frame.
4. The method of claim 3, wherein said acquiring first coordinate change data of pixels in the first image frame between the first timestamp and the target timestamp based on the first optical flow information comprises:
converting the first optical flow information into a first motion speed of pixels in the first image frame;
determining a first time interval between the first timestamp and the target timestamp;
determining the first coordinate change data based on the first motion speed and the first time interval;
the acquiring second coordinate change data of pixels in the second image frame between the second timestamp and the target timestamp based on the second optical flow information includes:
converting the second optical flow information into a second motion speed of pixels in the second image frame;
determining a second time interval between the second timestamp and the target timestamp;
determining the second coordinate change data based on the second motion speed and the second time interval.
5. The method of any of claims 1 to 4, wherein determining the scene cut recognition result between the first image frame and the second image frame based on the difference between the first intermediate frame and the second intermediate frame comprises:
comparing the first intermediate frame with the second intermediate frame, and determining difference information between the first intermediate frame and the second intermediate frame;
in response to the difference information reaching a preset difference threshold, determining that the scene change recognition result between the first image frame and the second image frame is a first result, wherein the first result is used for indicating that a scene corresponding to the first image frame is different from a scene corresponding to the second image frame; or, in response to the difference information failing to reach the preset difference threshold, determining that the scene change recognition result between the first image frame and the second image frame is a second result, where the second result is used to indicate that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame.
6. The method of claim 5, wherein comparing the first intermediate frame and the second intermediate frame to determine difference information between the first intermediate frame and the second intermediate frame comprises:
determining the pixel difference of each pixel point between the first intermediate frame and the second intermediate frame;
determining the mean value of the pixel differences as the difference information.
7. The method of claim 5, wherein comparing the first intermediate frame to the second intermediate frame to determine difference information between the first intermediate frame and the second intermediate frame comprises:
acquiring a first histogram corresponding to the first intermediate frame and a second histogram corresponding to the second intermediate frame;
determining a difference between the first histogram and the second histogram as the difference information.
8. The method of any of claims 1 to 4, further comprising:
acquiring a corresponding target frame interpolation image based on the scene switching identification result;
and inserting the target frame interpolation image into the target timestamp in the video to be identified to generate a frame interpolation video.
9. The method according to claim 8, wherein the scene change recognition result comprises a first result and a second result, the first result is used for indicating that the scene corresponding to the first image frame is different from the scene corresponding to the second image frame, and the second result is used for indicating that the scene corresponding to the first image frame is the same as the scene corresponding to the second image frame;
the acquiring of the corresponding target frame interpolation image based on the scene switching recognition result includes:
determining the first image frame or the second image frame as the target frame interpolation image in response to the scene change recognition result being the first result;
and in response to the scene switching identification result being the second result, fusing the first intermediate frame and the second intermediate frame to obtain the target frame interpolation image.
10. The method of any of claims 1 to 4, wherein said obtaining a first image frame and a second image frame comprises:
acquiring the video to be identified;
segmenting the video to be identified according to the original frame number corresponding to the video to be identified to obtain an image frame sequence;
determining the first image frame and the second image frame that are temporally adjacent from the sequence of image frames.
11. The method of claim 10, wherein said determining the temporally adjacent first image frame and the second image frame from the sequence of image frames comprises:
acquiring a target frame number requirement, wherein the target frame number requirement is used for indicating the frame number of an interpolation video generated according to the video to be identified;
determining a frame period to be inserted in the image frame sequence based on the target frame number requirement, wherein the frame period to be inserted is a period in which frame insertion needs to be performed in the video to be identified, and the frame period to be inserted comprises two image frames;
determining corresponding image frames in the frame period to be inserted as the first image frame and the second image frame;
determining a starting time corresponding to the display duration of the first image frame in the frame period to be inserted as the first timestamp;
and determining the starting time corresponding to the display duration of the second image frame in the video to be identified as the second timestamp.
12. An apparatus for identifying scene cuts, the apparatus comprising:
an obtaining module, configured to acquire a first image frame and a second image frame, wherein the first image frame and the second image frame are two adjacent images in a video to be identified, the first image frame corresponds to a first timestamp, and the second image frame corresponds to a second timestamp;
a generating module, configured to perform pixel position transformation on the first image frame by using a target timestamp as a reference to obtain a first intermediate frame, where the target timestamp is a timestamp between the first timestamp and the second timestamp;
the generating module is further configured to perform pixel position conversion on the second image frame by using the target timestamp as a reference to obtain a second intermediate frame;
a determining module, configured to determine a scene-cut identification result between the first image frame and the second image frame based on a difference between the first intermediate frame and the second intermediate frame.
13. A computer device, characterized in that it comprises a processor and a memory, in which at least one instruction, at least one program, set of codes or set of instructions is stored, which is loaded and executed by the processor to implement the scene cut recognition method according to any one of claims 1 to 11.
14. A computer-readable storage medium, wherein at least one program code is stored in the computer-readable storage medium, and the program code is loaded and executed by a processor to implement the scene-cut recognition method according to any one of claims 1 to 11.
15. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions are stored in a computer-readable storage medium, and a processor of a computer device reads the computer program/instructions from the computer-readable storage medium and executes the computer program/instructions, so that the computer device implements the scene-cut recognition method according to any one of claims 1 to 11.