WO2021184754A1 - Video comparison method, device, computer equipment and storage medium - Google Patents

Video comparison method, device, computer equipment and storage medium

Info

Publication number
WO2021184754A1
WO2021184754A1 (PCT/CN2020/122626)
Authority
WO
WIPO (PCT)
Prior art keywords
video
feature vector
image
image sequence
sharpness
Prior art date
Application number
PCT/CN2020/122626
Other languages
English (en)
French (fr)
Inventor
邬继阳
刘楚蓉
郭锐
罗家润
徐逸扬
谢宗兴
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021184754A1
Priority to US17/722,300 (published as US20220237917A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/48Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Definitions

  • This application relates to the field of image processing technology, and in particular to a video comparison method, device, computer equipment, and storage medium.
  • Existing methods for evaluating video sharpness are generally applied to a single video. If two videos are compared on the basis of such a single-video evaluation method, the accuracy of the resulting sharpness difference between the two videos cannot be guaranteed.
  • a video comparison method, device, computer equipment, and storage medium are provided.
  • a video comparison method executed by a computer device, the method including:
  • the definition difference between the first video and the second video is determined based on the first definition feature vector and the second definition feature vector.
  • a video comparison device which includes:
  • a sequence extraction unit configured to obtain a first image sequence from the first video, and a second image sequence from the second video
  • the first feature extraction unit is configured to extract the first sharpness feature vector of the first image sequence through the first feature extraction module of the video comparison model;
  • the second feature extraction unit is configured to extract the second sharpness feature vector of the second image sequence through the second feature extraction module of the video comparison model, wherein the first feature extraction module and the second feature extraction module are the same;
  • the sharpness difference analysis unit is configured to determine, through the sharpness difference analysis module of the video comparison model, the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector.
  • a non-volatile storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the video comparison method.
  • a computer device includes a memory and a processor.
  • the memory stores computer readable instructions.
  • the processor executes the steps of the video comparison method.
  • FIG. 1 is a schematic diagram of a scene of a video comparison method provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a video comparison method provided by an embodiment of the present application.
  • Fig. 3a is a flowchart of a method for training a video comparison model in an embodiment of the present application
  • Figure 3b is a technical framework diagram of a video comparison solution provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a video comparison device provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 6 is an optional structural schematic diagram of the distributed system 100 provided by an embodiment of the present application applied to a blockchain system.
  • FIG. 7 is an optional schematic diagram of a block structure provided by an embodiment of the present application.
  • Artificial Intelligence (AI) refers to using digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer vision is the science of studying how to make machines "see"; it refers to using cameras and computers, instead of human eyes, to identify, track, and measure targets, and to further process the captured images so that they are better suited for human observation or for transmission to instruments for detection.
  • Computer vision studies related theories and technologies trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. Specializing in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and reorganize the existing knowledge structure to continuously improve its own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robotics, intelligent medical care, and intelligent customer service. It is expected that, as technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • the embodiments of the present application provide a video comparison method, device, computer equipment, and storage medium. Specifically, this embodiment provides a video comparison method suitable for a video comparison device, and the video comparison device can be integrated in a computer device.
  • the computer equipment may be a terminal or other equipment, for example, a mobile phone, a tablet computer, a notebook computer, a desktop computer, etc.
  • the computer equipment can also be a server and other equipment.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms, but is not limited to these.
  • the video comparison method of this embodiment can be implemented by a terminal or a server, or can be implemented jointly by the terminal and the server.
  • the video comparison method is described below by taking the terminal and the server jointly implementing the video comparison method as an example.
  • The video comparison system provided by the embodiment of the present application includes a terminal 10, a server 20, and so on; the terminal 10 and the server 20 are connected through a network, for example, a wired or wireless network connection, and the video comparison device on the terminal side can be integrated in the terminal in the form of a client.
  • the terminal 10 may be used to obtain the first video and the second video, send the first video and the second video to the server, and a comparison instruction instructing the server to compare the videos.
  • The server 20 can be used to receive the first video, the second video, and the comparison instruction, obtain the first image sequence from the first video, and obtain the second image sequence from the second video;
  • extract the first sharpness feature vector of the first image sequence through the first feature extraction module of the video comparison model;
  • extract the second sharpness feature vector of the second image sequence through the second feature extraction module of the video comparison model, where the first feature extraction module and the second feature extraction module are the same;
  • and, through the sharpness difference analysis module of the video comparison model, determine the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector, and send the sharpness difference to the terminal 10.
  • the embodiments of the present application will be described from the perspective of a video comparison device, and the video comparison device may be specifically integrated in a terminal.
  • the embodiment of the application provides a video comparison method, which can be executed by the processor of the terminal. As shown in Fig. 2, the flow of the video comparison method may be as follows:
  • the video transcoding methods used for the first video and the second video in this embodiment may be the same or different, which is not limited in this embodiment.
  • the video formats of the first video and the second video may be the same or different.
  • the video formats include but are not limited to rmvb, mpeg1-4, and mov.
  • the duration of the first video and the second video, the number of image frames included, etc., can be different; the first video and the second video can be any one of a horizontal video and a vertical video, which is not limited in this embodiment .
  • the first video and the second video may be videos shot through a video client.
  • The video client in this embodiment can be understood as a client that provides users with a video shooting portal, including but not limited to instant messaging clients, short video clients, and the like.
  • the first video and the second video may be derived from the same original video.
  • the step of "obtaining the first video and the second video” may include: obtaining the original video; converting the original video according to the first video transcoding method to obtain the first video; and obtaining the first video according to the second video for the original video.
  • the video transcoding mode is converted to obtain the second video.
  • the first video transcoding method and the second video transcoding method may be transcoding methods provided by different video clients.
  • the original video in this embodiment may be obtained by real-time shooting of the terminal through a camera, or obtained from a local video library of the terminal.
  • the step of "obtaining the original video” may include: taking a video as the original video through the camera of the terminal.
  • the step of "obtaining the original video” may also include: selecting a video from the videos stored locally in the terminal as the original video.
  • the video definition of two video clients can be compared with the solution of this embodiment.
  • the first video and the second video may be downloaded from the network through the video client.
  • the reference video client may be a competing video client of the video client to be evaluated.
  • the step of "obtaining the first video and the second video” may include: after logging in to the video client to be evaluated, downloading a video on the video client to be evaluated as the first video; and After the reference video client of the video client to be evaluated, download a video on the reference video client as the second video.
  • the original video can be converted on two different video clients first, and then the converted video can be downloaded from the video client to perform the video comparison in this embodiment.
  • the step of "converting the original video according to the second video transcoding mode to obtain the second video” may include: performing the original video based on the first video transcoding mode provided by the video client to be evaluated Convert to obtain the first converted video; publish the first converted video through the video client to be evaluated; and download the first converted video from the video client to be evaluated, where the downloaded first converted video is used as The first video.
  • the step of "converting the original video according to the first video transcoding method to obtain the first video” may include: a second video transcoding method provided by a competing video client based on the video client to be evaluated , Convert the original video to obtain the second converted video; publish the second converted video through the competing video client; and download the second converted video from the competing video client, where the downloaded second converted video The post video serves as the second video.
  • a video client to be evaluated and a competing product client can be installed on the terminal.
  • the method of this embodiment can be implemented by a video comparison device, which can be integrated on the terminal in the form of a client.
  • the device can call these video clients through the application program interfaces of the video clients to be evaluated and the competing video clients.
  • a comparative analysis trigger page can be displayed, and the comparative analysis trigger page can include a selection list of video clients to be evaluated and a selection list of competing video clients;
  • The video client to be evaluated can be called through its application program interface to convert the original video according to the first video transcoding method, so as to obtain the first video; and the competing video client can be called through its application program interface to convert the original video according to the second video transcoding method, so as to obtain the second video.
  • the first video and the second video may also be obtained by manually inputting the original video to the video client to be evaluated and the competing video client.
  • The numbers of image frames in the first video and the second video may be the same or different.
  • The number of image frames in the first image sequence and the number of image frames in the second image sequence are equal.
  • Extracting the image frames of the first video and the second video, respectively, can obtain the first image sequence and the second image sequence.
  • the step of "obtaining a first image sequence from the first video and obtaining a second image sequence from the second video” may include: extracting a preset number of first images from the first video to form the first image An image sequence; and from the second video, extract a second image with the same position as the first image in the first video to form a second image sequence.
  • the preset number can be set according to needs, for example, it can be 20, 30, and so on.
  • the manner of extracting images from the first video and the second video is not limited. For example, it may be randomly selected, or may be extracted by a preset frame number interval, for example, a preset number of images are extracted at a preset frame number interval from the first frame in the video.
  • the positions of the corresponding images in the first image sequence and the second image sequence in the first video and the second video are the same.
  • For example, the first images with frame numbers 1, 3, 7, 9, and 11 are extracted from the first video to form the first image sequence, and the second images with frame numbers 1, 3, 7, 9, and 11 are extracted from the second video to form the second image sequence.
  • The images extracted from the first video and the second video can also be extracted by means of key frames, that is, the images extracted from the first video and the second video are all key frame images.
  • an adaptive unsupervised clustering method can be used to extract the video key frames.
  • the images can be processed, such as zooming, etc., to process the first image and the second image into the same size.
  • the first image and the second image are all scaled to a size of 224 ⁇ 224.
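  • As a rough illustration of this preprocessing step, the following sketch extracts a fixed number of frames from a video file and scales each frame to 224 × 224. The use of OpenCV, the evenly spaced sampling (instead of key-frame extraction), and all names are assumptions made for illustration, not requirements of this embodiment.

```python
# Minimal sketch of image-sequence extraction, assuming OpenCV and numpy are
# available and that evenly spaced sampling stands in for key-frame extraction.
import cv2
import numpy as np

def extract_image_sequence(video_path: str, num_frames: int = 20, size: int = 224) -> np.ndarray:
    """Return an array of shape (num_frames, size, size, 3) sampled from the video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Frame indices to keep; here simply evenly spaced over the whole video.
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # Scale every frame to the same 224 x 224 size mentioned above.
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    if not frames:
        raise ValueError(f"could not read frames from {video_path}")
    return np.stack(frames)

# The two sequences are taken from the same frame positions in both videos:
# first_seq = extract_image_sequence("first_video.mp4")
# second_seq = extract_image_sequence("second_video.mp4")
```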
  • the first feature extraction module and the second feature extraction module in this embodiment are the same, including the same structure and the same parameters.
  • the same network layer structure is adopted, and the parameters (including weights) in the network layer are completely the same.
  • The video comparison model in this embodiment includes a first feature extraction module, a second feature extraction module, and a sharpness difference analysis module.
  • the video comparison model can also be trained by the method shown in FIG. 3a.
  • The training process of the video comparison model includes:
  • the training sample pair includes a first image sequence sample and a second image sequence sample
  • the first image sequence sample in the same training sample pair comes from the same first video sample
  • the second image sequence sample in the same training sample pair comes from the same second video sample
  • the sample label of the training sample pair includes the expected sharpness difference between the first video sample and the second video sample.
  • the method for obtaining the first image sequence sample and the second image sequence sample in the training sample pair may refer to the foregoing description of the first image sequence and the second image sequence obtaining process.
  • A preset number of images can be extracted, by extracting key frames, from the first video sample to form the first image sequence sample, and a preset number of images can likewise be extracted, by extracting key frames, from the second video sample to form the second image sequence sample. The preset number can be determined according to the actual situation, for example, 20.
  • a preset number of images can be extracted multiple times to form multiple (first or second) image sequence samples.
  • the video sources of the first image sequence sample and the second image sequence sample in a training sample pair can be the same.
  • The first video sample and the second video sample can be obtained by transcoding a video captured by the same terminal using different video transcoding methods.
  • some preprocessing may be performed on these images, such as scaling, and scaling the extracted images to a preset size, such as scaling to a size of 224 ⁇ 224.
  • the size of the images in the first image sequence sample and the second image sequence sample are made consistent, which is convenient for subsequent feature extraction, comparison, and so on.
  • the video comparison model includes a first feature extraction module, a second feature extraction module, and a definition difference analysis module.
  • a video comparison model to be trained may be established based on the training sample pair.
  • the first feature extraction module may include a first feature extraction layer and a second feature extraction layer, and the first feature extraction layer and the second feature extraction layer can learn features of different dimensions.
  • the first feature extraction layer extracts image features
  • the second feature extraction layer extracts temporal features between image features.
  • the first feature extraction module is not limited to the structure of the first feature extraction layer and the second feature extraction layer, and may also have other feasible compositions.
  • the structure of the first feature extraction layer can be set and adjusted according to actual needs, and the extracted image features can be multi-dimensional, which is not limited in this embodiment.
  • the first feature extraction layer extracts multi-dimensional features for each frame of image
  • It can perform feature fusion on each frame of the image separately to obtain the image features of each frame, and then input the fused image features into the second feature extraction layer to learn the temporal relationships.
  • When the first feature extraction module includes the first feature extraction layer and the second feature extraction layer, the second feature extraction module also includes the first feature extraction layer and the second feature extraction layer.
  • the first feature extraction module and the second feature extraction module in this embodiment can be implemented based on a neural network.
  • the step of "extracting the first sharpness feature vector of the first image sequence sample through the first feature extraction module” may include: using the first feature extraction module to extract the image in the first image sequence sample from The pixel space is mapped to the target embedding space to obtain the first image feature vector of the first image sequence sample; and through the first feature extraction module, based on the timing relationship between the images corresponding to the first image feature vector, the first image feature vector Perform analysis to obtain the first sharpness feature vector of the first image sequence sample.
  • the step of "extracting the second sharpness feature vector of the second image sequence sample through the second feature extraction module” may include: using the second feature extraction module to extract the image in the second image sequence sample from The pixel space is mapped to the target embedding space to obtain the second image feature vector of the second image sequence sample; and through the second feature extraction module, based on the timing relationship between the images corresponding to the second image feature vector, the second image feature vector The analysis is performed to obtain the second sharpness feature vector of the second image sequence sample.
  • The process of obtaining the first image feature vector through the first feature extraction module may specifically include: using the first feature extraction module to extract multi-dimensional features from the images in the first image sequence sample to obtain image feature vectors of multiple dimensions for each image, and performing feature fusion on the multi-dimensional image feature vectors of each image to obtain the fused image feature of each image in the first image sequence sample as the first image feature vector, where the target embedding space to which the vector belongs is a combined space formed by multiple feature spaces (the image feature vector of each dimension corresponds to one feature space).
  • The process of obtaining the second image feature vector through the second feature extraction module may specifically include: using the second feature extraction module to extract multi-dimensional features from the images in the second image sequence sample to obtain image feature vectors of multiple dimensions for each image, and performing feature fusion on the multi-dimensional image feature vectors of each image to obtain the fused image feature of each image in the second image sequence sample as the second image feature vector, where the target embedding space to which the vector belongs (that is, the target feature space, generally a high-dimensional space) is a combined space formed by multiple feature spaces (the image feature vector of each dimension corresponds to one feature space).
  • a pixel is a physical point in a bitmap (also called a raster image), expressed as the smallest element in the image representation.
  • an image can be understood as being composed of pixels in a grid. Each pixel has its own color value and spatial position. The color and spatial position of all pixels in the image determine the appearance of the image.
  • An image can be expressed in the format [h, w, c], where h represents the image height, w represents the image width, and c represents the number of image channels. The pixel space in this embodiment can be understood as the three-dimensional space formed by h, w, and c.
  • the image in this embodiment can adopt any image mode.
  • the image mode can be understood as decomposing colors into partial color components. Different classifications of color components form different color modes, and different color modes define different color ranges.
  • the number of image channels for different color modes can also be different. For example, the number of image channels corresponding to an image in RGB mode is 3, and the number of image channels corresponding to an image in Alpha mode may be 4.
  • the first image feature vector and the second image feature vector can be extracted through the first feature extraction layer, and the first sharpness feature vector and the second sharpness feature vector can be extracted through the second feature extraction layer.
  • the image is mapped from the pixel space to the target embedding space, which can be understood as feature extraction of the image, and the target embedding space can be understood as the feature space where the second image feature vector is located after feature extraction is performed on the image.
  • the feature space varies according to different feature extraction methods.
  • The first feature extraction layer can be any network layer with an image feature extraction function, and it can be implemented based on any available network structure, for example, based on a CNN (Convolutional Neural Network).
  • the second feature extraction layer can be any network with a time-series feature extraction function, and it can be implemented based on any available network structure, for example, it can be implemented based on a recurrent neural network structure.
  • the first feature extraction layer can be implemented based on CNN, and the second feature extraction layer can be implemented based on a recurrent neural network, such as LSTM (Long Short-Term Memory).
  • the first feature extraction layer may use ResNet50 (in other examples, other CNN networks may be used) as the backbone structure for fine-tuning, and at the same time, use data batch processing for training.
  • N: the number of samples in a batch of data
  • C: the number of image channels
  • H: the image height
  • W: the image width
  • This embodiment can pass this set of data as input to the first feature extraction layer and extract the output of the last fully connected layer of ResNet50 as the high-dimensional spatial feature of the current video frame sequence; the feature dimension of the last fully connected layer is set to 2048 (the 2048 dimensions are only an example, and the total number of dimensions can also be set to other values, which is not limited in this embodiment), namely:
  • F_t ∈ R^(N×2048) is the high-dimensional sequence feature vector (the first or second image feature vector) of the current video frame sequence, and N represents the number of files currently processed in one batch (which can be understood as the number of training sample pairs). The two first feature extraction layers in Figure 3b share parameters during the training process.
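  • A minimal sketch of this backbone step, assuming PyTorch and a recent torchvision: the classification layer of a ResNet50 is replaced so that each 224 × 224 frame is mapped to a 2048-dimensional feature vector (a simplification of the fine-tuned 2048-dimensional fully connected output described above). Sharing one instance between the two branches keeps their parameters identical, as described for Figure 3b. All names are illustrative.

```python
# Sketch of the first feature extraction layer: a ResNet50 backbone producing
# 2048-d per-frame features (assumes torch and torchvision are installed).
import torch
import torch.nn as nn
from torchvision import models

class FrameFeatureExtractor(nn.Module):
    """Maps a batch of frames (B, 3, 224, 224) to 2048-d frame features (B, 2048)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)  # pretrained weights are a training-time choice
        backbone.fc = nn.Identity()               # expose the 2048-d pooled features
        self.backbone = backbone

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.backbone(frames)

# One shared instance serves both branches, so the two feature extraction
# layers have exactly the same structure and weights.
frame_encoder = FrameFeatureExtractor()
frame_features = frame_encoder(torch.randn(20, 3, 224, 224))  # shape (20, 2048)
```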
  • The high-dimensional feature vector is then sent to the second feature extraction layer, such as the LSTM module, for temporal feature learning. Through the sequential combination of the forget gate, input gate, and output gate, the LSTM module automatically retains the useful information in the video frame sequence and outputs the final video feature vector, which is the first or second sharpness feature vector.
  • The parameter settings of the LSTM structure are not limited. For example: the number of LSTM cells is 20, corresponding to the 20 frames extracted from each video; the number of neurons in the hidden layer is 512, corresponding to a final output video feature dimension of 512; and the activation function is the tanh activation function.
  • A reshape operation is performed on the high-dimensional features to obtain F_r (N is the number of image files processed in one batch), and F_r is then input into the LSTM module to calculate the temporal feature F_s; this temporal feature is the first or second sharpness feature vector.
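  • Under the same assumptions, the temporal part could be sketched as follows: the per-frame features are reshaped to (batch, 20, 2048) and fed to an LSTM with 20 time steps and a hidden size of 512. The exact reshape target and the use of the last hidden state as the video-level feature F_s are illustrative guesses at the omitted formula, not details stated in the text.

```python
# Sketch of the second feature extraction layer: an LSTM over 20 frame features.
import torch
import torch.nn as nn

class TemporalFeatureExtractor(nn.Module):
    """Turns flattened frame features into a 512-d sharpness feature per video."""
    def __init__(self, in_dim: int = 2048, hidden: int = 512, steps: int = 20):
        super().__init__()
        self.steps = steps
        # nn.LSTM uses tanh internally for its cell and hidden activations.
        self.lstm = nn.LSTM(input_size=in_dim, hidden_size=hidden, batch_first=True)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # Assumed reshape F_r: (N * steps, 2048) -> (N, steps, 2048).
        f_r = frame_features.view(-1, self.steps, frame_features.size(-1))
        _, (h_n, _) = self.lstm(f_r)
        # Use the hidden state of the last cell as the video feature F_s (512-d).
        return h_n[-1]

temporal_encoder = TemporalFeatureExtractor()
video_features = temporal_encoder(torch.randn(2 * 20, 2048))  # shape (2, 512)
```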
  • The sharpness difference analysis module analyzes the first sharpness feature vector and the second sharpness feature vector corresponding to the same training sample pair, and determines the predicted sharpness difference between the first video sample and the second video sample corresponding to that training sample pair.
  • The first feature extraction module and the second feature extraction module are identical (including the same structure and parameters, such as weights), so that for the two image sequence samples in the same training sample pair, the sharpness feature vectors used for the sharpness comparison analysis lie in the same vector space. This ensures that the two image sequence samples can be compared and analyzed based on their sharpness feature vectors. Because the label of a training sample pair is the expected sharpness difference, the parameters of the model, such as the feature weights, are continuously adjusted during training based on the predicted sharpness difference and the expected sharpness difference.
  • As a result, the sharpness feature vectors extracted by the model reflect the sharpness difference between videos more and more accurately, and the accuracy of the model's comparative analysis of video sharpness is improved.
  • similarity can also be calculated for two definition feature vectors, and the difference in definition between the first video and the second video can be measured by the similarity.
  • the similarity can be expressed by Euclidean distance or the like.
  • the difference in sharpness can also be analyzed by the vector difference of the two sharpness feature vectors.
  • the "predicted definition difference with the second video sample” may include: through the definition difference analysis module, calculating the vector difference between the first definition feature vector and the second definition feature vector corresponding to the same training sample pair; And based on the vector difference value of the training sample pair, determine the prediction definition difference of the first video sample and the second video sample corresponding to the training sample pair.
  • the step of "determining the difference in prediction definition between the first video sample and the second video sample corresponding to the training sample pair based on the vector difference of the training sample pair" may include: pairing the training sample with a fully connected layer Process the vector difference value to obtain a one-dimensional vector difference value; and normalize the one-dimensional vector difference value to obtain the prediction sharpness difference of the training sample to the corresponding first video sample and the second video sample.
  • the vector difference can be classified through the fully connected layer in the definition difference analysis module.
  • the fully connected layer includes a first fully connected layer and a second fully connected layer, and the dimension of the first fully connected layer is the same as the dimension of the first definition feature vector, for example, 512 dimensions.
  • the dimension of the second fully connected layer is 1.
  • The fully connected layer may include a 512-dimensional fully connected layer FC1 and a 1-dimensional fully connected layer FC2.
  • An activation layer is connected between the two fully connected layers, and the activation function of the activation layer may be a non-linear activation function, such as ReLU (Rectified Linear Unit).
  • the sharpness difference in this embodiment can be any value between -1 and 1.
  • A regression operation, that is, a normalization, is performed on the one-dimensional vector difference. Through this transformation operation, a predicted sharpness difference between -1 and 1 is obtained.
  • The function used in the normalization operation can be selected according to actual needs. For example, the tanh function is selected, the one-dimensional vector difference is subjected to the tanh normalization activation operation, and the final sharpness difference result is output.
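  • A minimal sketch of this sharpness difference analysis head, assuming PyTorch: the element-wise difference of the two 512-dimensional sharpness feature vectors passes through a 512-dimensional fully connected layer, a ReLU activation, a 1-dimensional fully connected layer, and finally tanh, yielding a predicted sharpness difference in (-1, 1). Class and variable names are illustrative.

```python
# Sketch of the sharpness difference analysis module: FC1 -> ReLU -> FC2 -> tanh.
import torch
import torch.nn as nn

class SharpnessDifferenceHead(nn.Module):
    """Maps two 512-d sharpness feature vectors to a difference in (-1, 1)."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)  # FC1: same dimension as the feature vector
        self.act = nn.ReLU()            # non-linear activation between the layers
        self.fc2 = nn.Linear(dim, 1)    # FC2: one-dimensional output

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        diff = f1 - f2                              # element-wise vector difference
        score = self.fc2(self.act(self.fc1(diff)))  # one-dimensional vector difference
        return torch.tanh(score).squeeze(-1)        # normalize to (-1, 1) with tanh

head = SharpnessDifferenceHead()
predicted_diff = head(torch.randn(4, 512), torch.randn(4, 512))  # shape (4,)
```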
  • the expected definition difference in this embodiment may be obtained by subjective evaluation of the first video sample and the second video sample.
  • For example, the expected sharpness difference may be the mean opinion score (MOS) of the subjective video evaluation.
  • a preset loss function may be used to calculate the loss value between the predicted sharpness difference and the corresponding expected sharpness difference, and the parameters of the video comparison model can be adjusted based on the loss value.
  • the preset loss function may be a mean square error loss function.
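  • Given the modules above and labels expressed as expected sharpness differences (for example, MOS-based values in [-1, 1]), one training step could then look like the sketch below. Only the mean square error loss comes from the text; the optimizer and the bundled `model` object are assumptions for illustration.

```python
# Sketch of one training step with a mean square error loss between the
# predicted and the expected sharpness difference of a batch of sample pairs.
import torch
import torch.nn as nn

criterion = nn.MSELoss()

def training_step(model, optimizer, first_seqs, second_seqs, expected_diff):
    # `model` is assumed to bundle the shared feature extractor and the
    # difference head, mapping two image sequences to a value in (-1, 1).
    predicted_diff = model(first_seqs, second_seqs)
    loss = criterion(predicted_diff, expected_diff)  # compare with the label
    optimizer.zero_grad()
    loss.backward()      # gradients flow through the shared parameters once
    optimizer.step()
    return loss.item()
```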
  • The step of extracting the first sharpness feature vector of the first image sequence through the first feature extraction module of the video comparison model may include: mapping the first images in the first image sequence from the pixel space to the target embedding space through the first feature extraction module to obtain the first image feature vector of the first image sequence; and, through the first feature extraction module, analyzing the first image feature vector based on the time sequence relationship between the first images corresponding to the first image feature vector to obtain the first sharpness feature vector corresponding to the first image sequence.
  • the step of extracting the second sharpness feature vector of the second image sequence through the second feature extraction module of the video comparison model may include: using the second feature extraction module to extract the second image in the second image sequence from The pixel space is mapped to the target embedding space to obtain the second image feature vector of the second image sequence; and through the second feature extraction module, based on the timing relationship between the second images corresponding to the second image feature vector, the second image feature The vector is analyzed to obtain the second sharpness feature vector corresponding to the second image sequence.
  • the specific steps of extracting the first sharpness vector and the second sharpness vector can refer to the description in the above-mentioned model training process.
  • the video comparison model of this embodiment may be an end-to-end network structure model, the input is an image sequence, and the output is a sharpness difference. This can not only quantify the difference in definition between videos, but also effectively solve the problem of high difficulty in training and cumbersome deployment of models with non-end-to-end network structures.
  • the definition difference in this embodiment can be a positive or negative value or zero.
  • A value of zero can indicate that the sharpness of the first video is the same as that of the second video, a positive value can indicate that the sharpness of the first video is higher than that of the second video, and a negative value can indicate that the sharpness of the first video is lower than that of the second video.
  • the step of “determining the difference in definition between the first video and the second video based on the first definition feature vector and the second definition feature vector through the definition difference analysis module of the video comparison model” may include : Calculate the similarity between the first sharpness feature vector and the second sharpness feature vector through the sharpness difference analysis module of the video comparison model; and determine the sharpness difference between the first video and the second video based on the similarity.
  • the similarity can be expressed by the vector distance between vectors, such as Euclidean distance.
  • the step of “determining the difference in definition between the first video and the second video based on the first definition feature vector and the second definition feature vector through the definition difference analysis module of the video comparison model” may include : Calculate the vector difference between the first sharpness feature vector and the second sharpness feature vector through the sharpness difference analysis module of the video comparison model; and determine the sharpness difference between the first video and the second video based on the vector difference.
  • the specific calculation process of poor clarity can refer to the relevant description in the model training program.
  • the definition difference analysis module of this embodiment includes a fully connected layer, and the fully connected layer may be composed of at least one fully connected layer.
  • the step of "determining the difference in definition between the first video and the second video based on the vector difference" may include: processing the vector difference through a fully connected layer to obtain a one-dimensional vector difference; and determining the one-dimensional vector The difference is normalized to obtain the difference in definition between the first video and the second video.
  • For example, the first sharpness feature vector and the second sharpness feature vector are F1 and F2, respectively, and an element-wise subtraction operation is performed on F1 and F2 to obtain the vector difference.
  • the difference in definition can be obtained based on the processing of the vector difference by the fully connected layer.
  • the number of fully connected layers included in the fully connected layer of this embodiment is not limited.
  • For example, it may include the first fully connected layer FC1 and the second fully connected layer FC2 shown in FIG. 3b, with an activation layer connected between the first fully connected layer and the second fully connected layer; the activation function of the activation layer can be a non-linear activation function, such as ReLU (Rectified Linear Unit).
  • the second video transcoding manner may be used as the preset reference video transcoding manner.
  • After the sharpness difference analysis module of the video comparison model determines the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector, it can also analyze, based on the sharpness difference, the transcoding performance of the first video transcoding method relative to the preset reference video transcoding method.
  • If the sharpness difference is in the range from -1 to 0 (excluding 0), the first video transcoding method is inferior to the second video transcoding method; if the sharpness difference is in the range from 0 to 1 (excluding 0), the first video transcoding method is better than the second video transcoding method.
  • The range from -1 to 0 (excluding 0) can be subdivided into several different sub-ranges, with a different inferiority level set for each sub-range, and the range from 0 to 1 (excluding 0) can be subdivided in the same way.
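  • As a small, purely illustrative sketch, the sign and magnitude of the sharpness difference could be mapped to a transcoding performance verdict as follows; the sub-range boundary of 0.5 is an arbitrary example, not a value from this embodiment.

```python
# Illustrative mapping from a sharpness difference in [-1, 1] to a verdict.
def transcoding_verdict(diff: float) -> str:
    if diff == 0:
        return "the two transcoding methods perform comparably"
    better = "first method better" if diff > 0 else "first method worse"
    degree = "clearly" if abs(diff) > 0.5 else "slightly"  # arbitrary sub-range
    return f"{better} ({degree})"

print(transcoding_verdict(0.7))   # first method better (clearly)
print(transcoding_verdict(-0.2))  # first method worse (slightly)
```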
  • The optimization solution for the video client to be analyzed can be determined based on the degree of transcoding performance (especially for videos from the same source), for example, to optimize or replace the first video transcoding method provided by the video client.
  • the solution of this embodiment can accurately evaluate the performance difference with competing video clients, which is beneficial to optimize the client, improve the video quality of the product, and further enhance the user experience and attract more users to use.
  • In this embodiment, the camera of the terminal can have multiple shooting modes (different shooting modes have different shooting parameters). The terminal uses the camera to shoot multiple original videos, and the total number of shooting modes used for these original videos is not less than two.
  • Based on the sharpness differences, the target shooting mode corresponding to the first video transcoding method is determined; in the target shooting mode, the first video is clearest relative to the second video.
  • the sharpness difference between the first video and the second video can be a positive value (the first video is clearer) or a negative value (the second video is clearer). If there is a positive value in the sharpness difference, in the target shooting mode , The sharpness difference between the first video and the second video obtained through the second video conversion method (such as a competing client) is the largest (and is a positive value). If there is no positive value in the sharpness difference, In the target shooting mode, the sharpness difference between the first video and the second video obtained through the second video conversion method (such as a competing client) is the smallest (and is a negative value).
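  • The selection of the target shooting mode described above can be sketched as picking, from the per-mode sharpness differences, the mode with the largest value; the mode names and scores below are made up for illustration.

```python
# Sketch: choose the shooting mode in which the first video compares best
# against the second video, i.e. the mode with the largest sharpness difference.
mode_differences = {   # hypothetical per-mode sharpness differences
    "portrait": 0.12,
    "night": -0.05,
    "hdr": 0.31,
}
target_mode = max(mode_differences, key=mode_differences.get)
print(target_mode)  # "hdr" in this made-up example
```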
  • This embodiment provides a video comparison method, which can obtain the first video and the second video; then obtain the first image sequence from the first video and the second image sequence from the second video; then extract the first sharpness feature vector of the first image sequence through the first feature extraction module of the video comparison model, and extract the second sharpness feature vector of the second image sequence through the second feature extraction module of the video comparison model.
  • Because the first feature extraction module is the same as the second feature extraction module, the sharpness feature vectors extracted from the two image sequences can more accurately reflect the relative sharpness of the two videos.
  • After the feature vectors are extracted, the sharpness difference analysis module of the video comparison model determines the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector, which realizes the quantification of the sharpness difference between the two videos. This analysis based on sharpness feature vectors helps to improve the accuracy of analyzing the difference in video sharpness.
  • the video comparison model of this embodiment inputs two videos and then outputs the difference in definition between the videos. This end-to-end solution is very convenient for the deployment of the model.
  • the solution of this embodiment can be applied to the automated analysis of competing product data on the UGC video recommendation terminal, which can accurately evaluate the performance difference with competing products and improve the video quality of the product; at the same time, the solution of this embodiment can also be applied to The evaluation of video transcoding technology accurately estimates the performance pros and cons of different transcoding technologies, so that the transcoding technology can make effective optimization strategies for video clarity.
  • an embodiment of the present invention also provides a video comparison device, and the video comparison device may be specifically integrated in a terminal.
  • a video comparison device includes: an acquisition unit 401, a sequence extraction unit 402, a first feature extraction unit 403, a second feature extraction unit 404, and a sharpness difference analysis unit 405.
  • the various modules included in the video comparison device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the acquiring unit 401 is configured to acquire the first video and the second video.
  • the sequence extraction unit 402 is configured to obtain a first image sequence from a first video and a second image sequence from a second video.
  • the first feature extraction unit 403 is configured to extract the first sharpness feature vector of the first image sequence through the first feature extraction module of the video comparison model.
  • the second feature extraction unit 404 is configured to extract the second sharpness feature vector of the second image sequence through the second feature extraction module of the video comparison model, wherein the first feature extraction module and the second feature extraction module are the same.
  • the sharpness difference analysis unit 405 is configured to determine the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector through the sharpness difference analysis module of the video comparison model.
  • The obtaining unit 401 is also used to obtain the original video; convert the original video according to the first video transcoding method to obtain the first video; and convert the original video according to the second video transcoding method to obtain the second video.
  • The acquiring unit 401 is further configured to convert the original video based on the first video transcoding method provided by the video client to be evaluated to obtain the first video; and to convert the original video based on the second video transcoding method provided by the reference video client of the video client to be evaluated to obtain the second video.
  • The sequence extraction unit 402 is further configured to extract a preset number of first images from the first video to form the first image sequence; and to extract, from the second video, second images at the same positions as the first images in the first video to form the second image sequence.
  • the first feature extraction unit 403 is further configured to map the first image in the first image sequence from the pixel space to the target embedding space through the first feature extraction module to obtain the first image of the first image sequence Feature vector; and through the first feature extraction module, based on the time sequence relationship between the first image corresponding to the first image feature vector, analyze the first image feature vector to obtain the first sharpness feature vector corresponding to the first image sequence
  • the second feature extraction unit 404 is also used to map the second image in the second image sequence from the pixel space to the target embedding space through the second feature extraction module to obtain the second image feature vector of the second image sequence; and
  • the second feature extraction module analyzes the second image feature vector based on the time sequence relationship between the second images corresponding to the second image feature vector to obtain the second sharpness feature vector corresponding to the second image sequence.
  • The sharpness difference analysis unit 405 is further configured to calculate the vector difference between the first sharpness feature vector and the second sharpness feature vector through the sharpness difference analysis module of the video comparison model; and to determine the sharpness difference between the first video and the second video based on the vector difference.
  • the second video transcoding method is a preset reference video transcoding method;
  • The video comparison device of this embodiment further includes: a transcoding performance analysis unit, configured to, after the sharpness difference analysis module of the video comparison model determines the sharpness difference between the first video and the second video based on the first sharpness feature vector and the second sharpness feature vector, analyze, based on the sharpness difference, the transcoding performance of the first video transcoding method relative to the preset reference video transcoding method.
  • The video comparison device of this embodiment further includes: a training unit, configured to, before the first sharpness feature vector of the first image sequence is extracted by the first feature extraction module of the video comparison model, obtain training sample pairs, where a training sample pair includes a first image sequence sample and a second image sequence sample, the first image sequence sample in the same training sample pair comes from the same first video sample, the second image sequence sample in the same training sample pair comes from the same second video sample, and the sample label of the training sample pair includes the expected sharpness difference between the corresponding first video sample and second video sample;
  • obtain the video comparison model to be trained, where the video comparison model includes the first feature extraction module, the second feature extraction module, and the sharpness difference analysis module; extract the first sharpness feature vector from the first image sequence sample through the first feature extraction module; extract the second sharpness feature vector from the second image sequence sample through the second feature extraction module, where the network structure and network parameters of the first feature extraction module and the second feature extraction module are the same; determine, through the sharpness difference analysis module, the predicted sharpness difference of the same training sample pair based on the first sharpness feature vector and the second sharpness feature vector; and adjust the parameters of the video comparison model based on the predicted sharpness difference and the expected sharpness difference.
  • the training unit is further used to map the images in the first image sequence sample from the pixel space to the target embedding space through the first feature extraction module, to obtain the first image feature vectors of the first image sequence sample; to analyze, through the first feature extraction module, the first image feature vectors based on the time sequence relationship between the images corresponding to the first image feature vectors, to obtain the first sharpness feature vector of the first image sequence sample; to map, through the second feature extraction module, the images in the second image sequence sample from the pixel space to the target embedding space, to obtain the second image feature vectors of the second image sequence sample; and to analyze, through the second feature extraction module, the second image feature vectors based on the time sequence relationship between the images corresponding to the second image feature vectors, to obtain the second sharpness feature vector of the second image sequence sample.
  • the training unit is also used to calculate, through the sharpness difference analysis module, the vector difference between the first sharpness feature vector and the second sharpness feature vector corresponding to the same training sample pair; and to determine, based on the vector difference of the training sample pair, the predicted sharpness difference between the first video sample and the second video sample corresponding to the training sample pair.
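As a rough illustration of how such training sample pairs might be assembled, the sketch below draws several 20-frame index subsets from a pair of video samples and attaches an expected sharpness-difference label (for example a MOS-based value); the function and field names and the random sampling strategy are assumptions for illustration only, not requirements of this application.

```python
# Illustrative sketch: building training sample pairs, each holding a first
# image-sequence sample, a second image-sequence sample, and an expected
# sharpness difference label.
import random
from dataclasses import dataclass
from typing import List

SEQ_LEN = 20  # assumed per-sequence frame count, following the embodiment described below


def sample_frame_indices(total_frames: int, seq_len: int = SEQ_LEN) -> List[int]:
    """Randomly pick `seq_len` distinct frame indices (total_frames >= seq_len),
    kept in temporal order; repeated calls give sequences that differ in at least one frame."""
    return sorted(random.sample(range(total_frames), seq_len))


@dataclass
class TrainingPair:
    first_sequence: List[int]   # frame indices taken from the first video sample
    second_sequence: List[int]  # frame indices taken from the second video sample
    expected_diff: float        # expected sharpness difference label, e.g. a MOS-based value in (-1, 1)


def build_pairs(first_video_frames: int, second_video_frames: int,
                expected_diff: float, num_pairs: int) -> List[TrainingPair]:
    """Draw several (first, second) sequence samples from the same pair of video samples."""
    return [
        TrainingPair(
            first_sequence=sample_frame_indices(first_video_frames),
            second_sequence=sample_frame_indices(second_video_frames),
            expected_diff=expected_diff,
        )
        for _ in range(num_pairs)
    ]
```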
  • the embodiment of the present invention also provides a computer device, which may be a terminal or a server, as shown in FIG. 5, which shows a schematic structural diagram of the computer device involved in the embodiment of the present invention, specifically:
  • the computer device may include a processor 501 having one or more processing cores, a memory 502 having one or more computer-readable storage media, a power supply 503, an input unit 504, and other components.
  • Those skilled in the art can understand that the computer device structure shown in FIG. 5 does not constitute a limitation on the computer device, which may include more or fewer components than those shown in the figure, or combine certain components, or use a different arrangement of components. Specifically:
  • the processor 501 is the control center of the computer device. It uses various interfaces and lines to connect the various parts of the entire computer device, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, so as to monitor the computer device as a whole.
  • the processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 501.
  • the memory 502 may be used to store software programs and modules.
  • the processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502.
  • the memory 502 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created according to the use of the computer device, and the like.
  • the memory 502 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
  • the computer device also includes a power supply 503 for supplying power to various components.
  • the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the power supply 503 may also include any components such as one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.
  • the computer device may further include an input unit 504, which can be used to receive inputted digital or character information and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the computer device may also include a display unit, etc., which will not be repeated here.
  • the system involved in the embodiment of the present invention may be a distributed system formed by clients and multiple nodes (computer devices of any form in the access network, such as servers and terminals) connected through network communication.
  • FIG. 6 is an optional structural diagram of the distributed system 100 provided by an embodiment of the present invention applied to the blockchain system.
  • The distributed system is formed by multiple nodes (computing devices of any form in the access network, such as servers and user terminals) and clients, and the nodes form a peer-to-peer (P2P, Peer To Peer) network between them. The P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP).
  • any machine such as a server or terminal can join and become a node.
  • the node includes the hardware layer, the middle layer, the operating system layer, and the application layer.
  • the original video, the first video, the second video, the training sample pairs, the first video samples, the second video samples, the sharpness difference, and the like can all be stored in the shared ledger of the blockchain system through the nodes, and a computer device (such as a terminal or a server) can obtain the sharpness difference based on the record data stored in the shared ledger.
  • Routing: a basic function of a node, used to support communication between nodes.
  • In addition to the routing function, a node may also have the following functions:
  • Application: used to be deployed in the blockchain to implement specific services according to actual business requirements, record data related to the implemented functions to form record data, carry a digital signature in the record data to indicate the source of the data, and send the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully. For example, the services implemented by the application include:
  • Wallet: used to provide the function of conducting electronic currency transactions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, as a response acknowledging that the transaction is valid, the record data of the transaction is stored in a temporary block of the blockchain); the wallet also supports querying the electronic currency remaining at an electronic currency address.
  • Shared ledger: used to provide functions such as storage, query, and modification of account data, and to send the record data of an operation on the account data to other nodes in the blockchain system; after the other nodes verify that the operation is valid, as a response acknowledging that the account data is valid, the record data is stored in a temporary block, and a confirmation may also be sent to the node that initiated the operation.
  • Smart contract: a computerized protocol that can execute the terms of a certain contract, implemented by code deployed on the shared ledger and executed when certain conditions are met; the code is used to complete automated transactions according to actual business requirements, for example, querying the logistics status of the goods purchased by a buyer and transferring the buyer's electronic currency to the merchant's address after the buyer signs for the goods. Of course, a smart contract is not limited to executing contracts used for transactions, and may also execute contracts that process received information.
  • Blockchain: includes a series of blocks (Block) that succeed one another in the chronological order of their generation; once a new block is added to the blockchain, it will not be removed again, and the blocks record the record data submitted by the nodes in the blockchain system.
  • FIG. 7 is an optional schematic diagram of a block structure (Block Structure) provided by an embodiment of the present invention.
  • Each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by hash values to form a blockchain.
  • the block may also include information such as the time stamp when the block was generated.
  • A blockchain (Blockchain) is essentially a decentralized database; it is a series of data blocks generated in association with each other by using cryptographic methods, and each data block contains relevant information that is used to verify the validity (anti-counterfeiting) of its information and to generate the next block.
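The block structure just described (a block carrying the hash of its stored record data, the hash of the previous block, and a generation timestamp) can be sketched as follows; this is a hypothetical illustration, and the field names, the SHA-256 choice, and the JSON serialization are assumptions not taken from this application.

```python
# Hypothetical sketch of the block structure described above: each block keeps
# the hash of its own record data, the hash of the previous block, and a timestamp.
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    record_data: dict          # e.g. the sharpness difference and related record data
    prev_hash: str             # hash value of the previous block
    timestamp: float = field(default_factory=time.time)

    @property
    def block_hash(self) -> str:
        payload = json.dumps(
            {"record_data": self.record_data,
             "prev_hash": self.prev_hash,
             "timestamp": self.timestamp},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Blocks are chained by carrying the previous block's hash value.
genesis = Block(record_data={"note": "genesis"}, prev_hash="0" * 64)
next_block = Block(
    record_data={"sharpness_difference": 0.12, "first_video": "v1", "second_video": "v2"},
    prev_hash=genesis.block_hash,
)
```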
  • In an embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to perform the steps in the foregoing method embodiments.
  • In an embodiment, a computer-readable storage medium is provided, which stores a computer program, and the computer program, when executed by a processor, causes the processor to perform the steps in the foregoing method embodiments.
  • In an embodiment, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the steps in the foregoing method embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

一种视频对比方法、装置、计算机设备和存储介质,所述方法包括:获取第一视频和第二视频;从所述第一视频中获取第一图像序列,并从所述第二视频中获取第二图像序列;通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量;通过所述视频对比模型的第二特征提取模块,提取所述第二图像序列的第二清晰度特征向量,其中,所述第一特征提取模块和所述第二特征提取模块相同;及通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差。

Description

视频对比方法、装置、计算机设备和存储介质
本申请要求于2020年3月17日提交中国专利局,申请号为202010187813.1,申请名称为“一种视频对比方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,具体涉及一种视频对比方法、装置、计算机设备和存储介质。
背景技术
相关技术中,视频清晰度的评估方式基本上都是针对单个视频进行的,若是基于这种清晰度的评估方式来对两个视频进行比较,则两个视频的清晰度差的准确度得不到保证。
发明内容
根据本申请提供的各种实施例,提供一种视频对比方法、装置、计算机设备和存储介质。
一种视频对比方法,由计算机设备执行,所述方法包括:
获取第一视频和第二视频;
从所述第一视频中获取第一图像序列,并从所述第二视频中获取第二图像序列;
通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量;
通过所述视频对比模型的第二特征提取模块，提取所述第二图像序列的第二清晰度特征向量，其中，所述第一特征提取模块和所述第二特征提取模块相同；及
通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差。
一种视频对比装置,该装置包括:
获取单元,用于获取第一视频和第二视频;
序列提取单元,用于从所述第一视频中获取第一图像序列,并从所述第二视频中获取第二图像序列;
第一特征提取单元,用于通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量;
第二特征提取单元,用于通过所述视频对比模型的第二特征提取模块,提取所述第二图像序列的第二清晰度特征向量,其中,所述第一特征提取模块和第二特征提取模块相同;及
清晰度差分析单元,用于通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差。
一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行视频对比方法的步骤。
一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行视频对比方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本 申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的视频对比方法的场景示意图;
图2是本申请实施例提供的一种视频对比方法的流程图;
图3a是本申请实施例中视频对比模型的训练方法的流程图;
图3b是本申请实施例提供的视频对比方案的技术框架图;
图4是本申请实施例提供的一种视频对比装置的结构示意图;
图5是本申请实施例提供的一种计算机设备的结构示意图;
图6是本申请实施例提供的分布式系统100应用于区块链系统的一个可选的结构示意图;及
图7是本申请实施例提供的区块结构的一个可选的示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语 言处理技术以及机器学习/深度学习等几大方向。
其中,计算机视觉(Computer Vision,CV)是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
本申请实施例提供的方案涉及人工智能的计算机视觉、机器学习/深度学习等技术,具体通过如下实施例进行说明。
本申请实施例提供一种视频对比方法、装置、计算机设备和存储介质。具体地,本实施例提供适用于视频对比装置的视频对比方法,该视频对比装置可以集成在计算机设备中。
该计算机设备可以为终端等设备,例如可以为手机、平板电脑、笔记本电脑、台式电脑等。
该计算机设备还可以为服务器等设备,该服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器,但并不局限于此。
本实施例的视频对比方法,可以由终端或服务器实现,也可以由终端和服务器共同实现。
下面以终端和服务器共同实现该视频对比方法为例,对视频对比方法进行说明。
参考图1,本申请实施例提供的视频对比系统包括终端10和服务器20等;终端10与服务器20之间通过网络连接,比如,通过有线或无线网络连接等,其中,终端侧的视频对比装置可以以客户端的形式集成在终端中。
其中,终端10,可以用于获取第一视频和第二视频,向服务器发送第一视频和第二视频,以及指示服务器进行视频对比的对比指令。
其中,服务器20可以用于接收第一视频和第二视频,以及对比指令,从第一视频中获取第一图像序列,并从第二视频中获取第二图像序列;通过视频对比模型的第一特征提取模块,提取第一图像序列的第一清晰度特征向量;通过视频对比模型的第二特征提取模块提取第二图像序列的第二清晰度特征向量,其中,第一特征提取模块和第二特征提取模块相同;通过视频对比模型的清晰度差分析模块,基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差,将清晰度差发送给终端10。
以下分别进行详细说明。需说明的是,以下实施例的描述顺序不作为对实施例优选顺序的限定。
本申请实施例将从视频对比装置的角度进行描述,该视频对比装置具体可以集成在终端中。本申请实施例提供了一种视频对比方法,该方法可以由 终端的处理器执行,如图2所示,该视频对比方法的流程可以如下:
201、获取第一视频和第二视频。
本实施例中的第一视频和第二视频采用的视频转码方式可以相同或不同,本实施例对此没有限制。第一视频和第二视频的视频格式可以相同或不同,例如视频格式包括但不限于rmvb、mpeg1-4、以及mov等等。第一视频和第二视频的时长,包含的图像帧的数量等等,可以不同;第一视频和第二视频可以为横向视频和竖向视频中的任意一种,本实施例对此没有限制。
在一个实施例中,第一视频和第二视频可以是通过视频客户端拍摄的视频,本实施例的视频客户端可以理解为向用户提供视频拍摄入口的客户端,包括但不限于即时通讯客户端和短视频客户端等等。
本实施例中,第一视频和第二视频可以来源于相同的原始视频。
在一个实施例中,步骤“获取第一视频和第二视频”,可以包括:获取原始视频;对原始视频按照第一视频转码方式进行转换,得到第一视频;及对原始视频按照第二视频转码方式进行转换,得到第二视频。其中,第一视频转码方式和第二视频转码方式可以是不同视频客户端提供的转码方式。
本实施例的原始视频可以是终端通过摄像头实时拍摄得到的,或者是从终端本地的视频库中获取得到的。
在一个实施例中,步骤“获取原始视频”,可以包括:通过终端的摄像头拍摄一段视频作为原始视频。
在一个实施例中,步骤“获取原始视频”,也可以包括:从终端本地存储的视频中选择一段视频作为原始视频。
本实施例中,可以通过本实施例的方案对两个视频客户端进行视频清晰度的比较。
在一个实施例中,步骤“对原始视频按照第一视频转码方式进行转换,得到第一视频”,可以包括:基于待评估的视频客户端提供的第一视频转码方式,对原始视频进行转换,得到第一视频;“对原始视频按照第二视频转码方式进行转换,得到第二视频”,可以包括:基于视频客户端的参考视频客户 端提供的第二视频转码方式,对原始视频进行转换,得到第二视频。
本实施例中,考虑到网络传输对视频清晰度影响,第一视频和第二视频可以是通过视频客户端从网络上下载的。
其中,参考视频客户端可以是待评估的视频客户端的竞品视频客户端。
在一个实施例中,步骤“获取第一视频和第二视频”,可以包括:在登录待评估的视频客户端后,在待评估的视频客户端上下载一段视频作为第一视频;及在登录待评估的视频客户端的参考视频客户端后,在参考视频客户端上下载一段视频作为第二视频。
在一个实施例中,可以先将原始视频在两个不同的视频客户端上进行转换,然后再从视频客户端上下载转换后的视频,进行本实施例的视频对比。
在一个实施例中,步骤“对原始视频按照第二视频转码方式进行转换,得到第二视频”,可以包括:基于待评估的视频客户端提供的第一视频转码方式,对原始视频进行转换,得到第一转换后视频;通过待评估的视频客户端发布第一转换后视频;及从待评估的视频客户端上下载该第一转换后视频,其中下载的该第一转换后视频作为第一视频。
在一个实施例中,步骤“对原始视频按照第一视频转码方式进行转换,得到第一视频”,可以包括:基于待评估的视频客户端的竞品视频客户端提供的第二视频转码方式,对原始视频进行转换,得到第二转换后视频;通过竞品视频客户端发布第二转换后视频;及从竞品视频客户端上下载该第二转换后视频,其中下载的该第二转换后视频作为第二视频。
本实施例中,终端上可以安装有待评估的视频客户端以及竞品客户端,本实施例的方法可以由视频对比装置实现,该视频对比装置可以以客户端的形式集成在终端上,该视频对比装置可以通过待评估的视频客户端和竞品视频客户端的应用程序接口,调用这些视频客户端。
获取原始视频后,可以显示对比分析触发页面,该对比分析触发页面可以包括待评估的视频客户端的选择列表,以及竞品视频客户端的选择列表;
在用户选择待评估的视频客户端和竞品视频客户端后,可以通过待评估 的视频客户端的应用程序接口,调用待评估的视频客户端按照第一视频转码方式,对原始视频进行转换,得到第一视频,通过竞品视频客户端的应用程序接口,调用竞品视频客户端提供的第二视频转码方式,对原始视频进行转换,得到第二视频。
由此,可以实现对第一视频和第二视频的自动获取。
在一个实施例中,还可以通过向待评估的视频客户端和竞品视频客户端人工输入原始视频的方式,获取第一视频和第二视频。
202、从第一视频中获取第一图像序列,并从第二视频中获取第二图像序列。
本实施例中,第一图像序列和第二图像序列中,图像的帧数可以相同或不同。可选的,第一图像序列和第二图像序列的图像帧数相等。
对第一视频和第二视频分别进行图像帧的提取,可以得到第一图像序列和第二图像序列。
在一个实施例中,步骤“从第一视频中获取第一图像序列,从第二视频中获取第二图像序列”,可以包括:从第一视频中提取预设数量的第一图像,组成第一图像序列;及从第二视频中,提取与第一视频中的第一图像位置相同的第二图像,组成第二图像序列。
其中,预设数量可以根据需要设置,例如可以为20、30等等数值。
进一步的,从第一视频和第二视频中抽取图像的方式不限。例如可以是随机抽取,可以通过预设帧数间隔的方式提取,例如,从视频中的第一帧开始按照预设帧数间隔提取预设数量的图像。
本实施例中,在第一视频和第二视频同源时,可选的,第一图像序列和第二图像序列中对应的图像在第一视频和第二视频中的位置相同。
如假设第一视频提取序号为1、3、7、9以及11的第一图像,组成第一图像序列。在第二视频中也提取序号为1、3、7、9以及11的第二图像,组成第二图像序列。
在另一个实施例中,对于同源和非同源的第一视频和第二视频,从第一 视频和第二视频中提取图像,可以通过关键帧的方式提取,即第一视频和第二视频中提取的图像均为关键帧图像。可选的,可以采用自适应的非监督聚类方法进行视频关键帧的提取。
在第一图像和第二图像提取后,可以对图像进行处理,如缩放处理等,将第一图像和第二图像处理成相同尺寸。例如第一图像和第二图像全部缩放至224×224大小。
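A minimal sketch of this frame sampling and resizing step is given below; it assumes OpenCV is available, and the fixed frame interval, the 20-frame count and the function name are illustrative choices rather than values required by this application (the 224×224 size follows the example above).

```python
# Illustrative sketch (assumes OpenCV): extract a preset number of frames from a
# video at a fixed frame-index interval and resize each frame to 224x224, so that
# the first and second image sequences are directly comparable.
import cv2
import numpy as np


def extract_image_sequence(video_path: str, num_frames: int = 20,
                           frame_interval: int = 5, size: int = 224) -> np.ndarray:
    cap = cv2.VideoCapture(video_path)
    frames = []
    for i in range(num_frames):
        # Jump to the i-th sampled position (frame 0, frame_interval, 2*frame_interval, ...).
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * frame_interval)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    if not frames:
        raise ValueError(f"no frames could be read from {video_path}")
    return np.stack(frames)  # shape: (num_frames, 224, 224, 3)
```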
203、通过视频对比模型的第一特征提取模块,提取第一图像序列的第一清晰度特征向量。
204、通过视频对比模型的第二特征提取模块,提取第二图像序列的第二清晰度特征向量,其中,第一特征提取模块和第二特征提取模块相同。
本实施例中的第一特征提取模块和第二特征提取模块的相同,包括结构相同以及参数相同,如采用完全相同的网络层结构,网络层中的参数(包括权重)等完全相同。
本实施例中的视频对比模型包括第一特征提取模块、第二特征提取模块以及清晰度差值计算模块。
在此,先结合图3a和图3b,对视频对比模型的训练过程进行描述。
本实施例的步骤201之前,还可以通过图3a所示的方法,对视频对比模型进行训练。
参考图3a,视频训练的过程包括:
301、获取训练样本对,其中,训练样本对中包括第一图像序列样本和第二图像序列样本,同一训练样本对中的第一图像序列样本来自同一个第一视频样本,同一训练样本对中的第二图像序列样本来自同一个第二视频样本,训练样本对的样本标签包括第一视频样本和第二视频样本的期望清晰度差。
本实施例中,训练样本对中的第一图像序列样本和第二图像序列样本的获取方式可以参考前述描述的第一图像序列和第二图像序列的获取过程。例如,可以针对第一视频样本采用提取关键帧的方式提取预设数量的图像,组成第一图像序列样本,对第二视频样本采用提取关键帧的方式也提取预设数 量的图像,组成第二图像序列样本。其中,预设数量可以根据实际情况确定,例如20张。
可以理解的是,在本实施例中,针对同一视频样本,可以多次提取预设数量的图像,组成多个(第一或第二)图像序列样本。当然可以理解的是,同一视频提取得到的两个图像序列样本,至少存在一帧不同的图像。一个训练样本对中的第一图像序列样本和第二图像序列样本的视频来源可以相同,例如在一个训练样本对中,第一视频样本和第二视频样本,可以是针对同一个终端拍摄视频,采用不同的视频转码方法进行转码得到的视频。
本实施例中,在从视频样本中提取图像后,可以针对这些图像进行一些预处理,例如进行缩放处理,将提取的图像缩放至预设尺寸,如缩放至224×224大小。使得第一图像序列样本和第二图像序列样本中的图像的尺寸一致,便于后续提取特征,以及进行比较等等。
302、获取待训练的视频对比模型,视频对比模型包括第一特征提取模块、第二特征提取模块和清晰度差分析模块。
本实施例的步骤302中,可以基于训练样本对建立待训练的视频对比模型。
其中,第一特征提取模块可以包括第一特征提取层和第二特征提取层,第一特征提取层和第二特征提取层可以学习不同维度的特征。例如,第一特征提取层提取图像特征,第二特征提取层提取图像特征之间的时序特征。当然,第一特征提取模块不局限于第一特征提取层和第二特征提取层的结构,还可以有其他可行的组成。
其中,第一特征提取层的结构可以根据实际需要设置和调整,其提取的图像特征可以是多维度的,本实施例对此没有限制。
例如,第一特征提取层对每帧图像提取多维度的特征之后,可以分别对每一帧图像进行特征融合,得到每帧图像的图像特征,然后将融合后的图像特征输入第二特征提取层进行时序关系的学习。
鉴于第一特征提取模块和第二特征提取模块相同,第一特征提取模块包括第一特征提取层和第二特征提取层时,第二特征提取模块也包括第一特征 提取层和第二特征提取层。
303、通过第一特征提取模块,提取第一图像序列样本的第一清晰度特征向量。
304、通过第二特征提取模块,提取第二图像序列样本的第二清晰度特征向量,其中,第一特征提取模块和第二特征提取模块的网络结构和网络参数相同。
本实施例中的第一特征提取模块和第二特征提取模块可以基于神经网络实现。
在一个实施例中,步骤“通过第一特征提取模块,提取第一图像序列样本的第一清晰度特征向量”,可以包括:通过第一特征提取模块,将第一图像序列样本中的图像从像素空间映射到目标嵌入空间,得到第一图像序列样本的第一图像特征向量;及通过第一特征提取模块,基于第一图像特征向量对应的图像之间的时序关系,对第一图像特征向量进行分析,得到第一图像序列样本的第一清晰度特征向量。
在一个实施例中,步骤“通过第二特征提取模块,提取第二图像序列样本的第二清晰度特征向量”,可以包括:通过第二特征提取模块,将第二图像序列样本中的图像从像素空间映射到目标嵌入空间,得到第二图像序列样本的第二图像特征向量;及通过第二特征提取模块,基于第二图像特征向量对应的图像之间的时序关系,对第二图像特征向量进行分析,得到第二图像序列样本的第二清晰度特征向量。
本实施例中,通过第一特征提取模块,得到第一图像特征向量的过程,具体可以包括:通过第一特征提取模块,对第一图像序列样本中的图像进行多维度特征的提取,得到多个维度的图像特征向量,对各图像的多个维度的图像特征向量进行特征融合,得到第一图序列样本中各图像的融合后图像特征,作为第一图像特征向量,其中,第一图像特征向量所属的目标嵌入空间(即目标特征空间,一般为高纬度空间)为由多个特征空间(每个维度的图像特征向量对应一个特征空间)组合而成的组合空间。
对应的,通过第二特征提取模块,得到第二图像特征向量的过程,具体可以包括:通过第二特征提取模块,对第二图像序列样本中的图像进行多维度特征的提取,得到多个维度的图像特征向量,对各图像的多个维度的图像特征向量进行特征融合,得到第二图序列样本中各图像的融合后图像特征,作为第二图像特征向量,其中,第二图像特征向量所属的目标嵌入空间(即目标特征空间,一般为高纬度空间)为由多个特征空间(每个维度的图像特征向量对应一个特征空间)组合而成的组合空间。
在影像中,像素是位图(又称栅格图)中的一个物理点,表示为图像表示中最小的元素。也即,图像可以理解为由一格一格的像素构成的。每个像素有各自的颜色值和空间位置。图像中所有像素的颜色和空间位置决定了图像所呈现出来的样子。在神经网络中,图像可以被表示成[h,w,c]的格式,其中,h表示图像高度,W表示图像宽度,c表示图像通道数,本实施例的像素空间可以理解为h,w,c构成的三维空间。
本实施例中的图像可以采用任意的图像模式,图像模式可以理解为把色彩分解成部分颜色组件,对颜色组件不同的分类就形成了不同的色彩模式,不同的色彩模式所定义的颜色范围不同,不同的色彩模式应的图像通道数目也可以不同。例如,RGB模式的图像对应的图像通道数为3,Alpha模式的图像对应的图像通道数可以为4。
本实施例中,第一图像特征向量和第二图像特征向量,可以通过第一特征提取层提取,第一清晰度特征向量和第二清晰度特征向量可以通过第二特征提取层提取的。
本实施例中将图像从像素空间映射到目标嵌入空间,可以理解为对图像进行特征提取,目标嵌入空间可以理解为对图像进行特征提取后,得到的第二图像特征向量所处的特征空间,该特征空间根据特征提取方式的不同而不同。
可选的,本实施例中,第一特征提取层可以是任意具有图像特征提取功能的网络层,其可以基于任意可用的网络结构实现,例如可以基于CNN (Convolutional Neural Network,卷积神经网络)实现,本实施例中,第二特征提取层可以是任意具有时序特征提取功能的网络,其可以基于任意可用的网络结构实现,例如,可以基于循环神经网络结构实现。
参考图3b,第一特征提取层可以基于CNN实现,第二特征提取层可以基于循环神经网络实现,例如LSTM(Long Short-Term Memory,长短期记忆网络)。
本实施例中,第一特征提取层可以采用ResNet50(在其他示例中,可以采用其他CNN网络)作为主干结构进行微调,同时采用数据批处理的方式进行训练。
对于视频的一组序列帧数据 I_t ∈ R^{N×C×H×W}（可以理解为N个第一图像序列样本或N个第二图像序列样本），其中，N为批处理数据的样本个数、C为图片的通道数、H为图片高度、W为图片宽度。
本实施例可以把这组数据作为输入传输到第一特征提取层中,同时提取ResNet50的最后一个全连接层的输出,作为当前视频帧序列的高维空间特征,最后一个全连接层的特征维度设置为2048(该2048维度仅为示例,维度总数还可以设置为其他数值,本实施例对此没有限制),即:
F_t = CNN(I_t)，F_t ∈ R^{N×2048}　　　　(1)
如公式(1)所示，F_t ∈ R^{N×2048}为当前视频序列帧的高维序列特征向量（第一或第二图像特征向量），N代表当前一次批处理的文件数量（可以理解为训练样本对数量）。对于图3b中的两个第一特征提取层，它们在训练的过程中共享参数。
当第一特征提取层对视频序列帧的特征抽取完毕以后,会将该高维特征向量送入第二特征提取层如LSTM模块中,进行时序特征的学习,该LSTM模块可以通过遗忘门、输入门和输出门的序列组合,自动保留视频序列帧中有用的信息并输出最终的视频特征向量,该视频特征向量即为第一或第二清晰度特征向量。
本实施例中,LSTM结构的参数设置不限,可以为:LSTM细胞个数20, 对应每个视频抽取的20帧图像;隐藏层的神经元个数512,对应最终输出的视频特征维度为512,激活函数选择tanh激活函数。
本实施例中，对于第一特征提取层输出的特征 F_t ∈ R^{N×2048}，将其进行 reshape 变形操作得到 F_r（N为批处理的图片文件个数），然后将 F_r 输入LSTM模块计算时序特征 F_s，即 F_s = LSTM(F_r)。
该时序特征即为第一或第二清晰度特征向量。
对于图3b中的两个第二特征提取层如LSTM层,它们在训练的过程中也共享参数。
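The extraction pipeline described above (a ResNet-50 backbone whose last fully connected layer outputs 2048-dimensional frame features F_t, reshaped into 20-frame sequences F_r and fed to an LSTM with 512 hidden units to obtain the sharpness feature vector F_s) can be sketched roughly as follows in PyTorch; this is a simplified illustration under those stated settings, not the exact implementation, and the class name and the uninitialized weights are assumptions.

```python
# Simplified sketch of the shared feature extraction module: a ResNet-50 backbone
# producing a 2048-d vector per frame, followed by an LSTM that turns each
# 20-frame sequence into a 512-d sharpness feature vector.
import torch
import torch.nn as nn
from torchvision import models


class SharpnessFeatureExtractor(nn.Module):
    def __init__(self, seq_len: int = 20, feat_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        backbone = models.resnet50(weights=None)  # or pretrained weights, depending on setup
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)  # last FC set to 2048-d as described
        self.backbone = backbone
        self.seq_len = seq_len
        # The LSTM uses tanh internally, matching the activation mentioned above.
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden_dim, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, seq_len, 3, 224, 224) -> per-frame features F_t: (batch*seq_len, 2048)
        b = frames.shape[0]
        f_t = self.backbone(frames.flatten(0, 1))
        # Reshape to sequences F_r: (batch, seq_len, 2048), then LSTM temporal features F_s.
        f_r = f_t.view(b, self.seq_len, -1)
        _, (h_n, _) = self.lstm(f_r)
        return h_n[-1]  # (batch, 512) sharpness feature vector


# The two branches share one instance, so their structure and parameters stay identical.
extractor = SharpnessFeatureExtractor()
```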
305、通过清晰度差分析模块,对同一训练样本对所对应的第一清晰度特征向量和第二清晰度特征向量进行分析,确定训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差。
本实施例中,第一特征提取模块和第二特征提取模块的相同(包括结构和参数如权重等完全相同),使得对于同一训练样本对中的两个图像序列样本而言,用于进行清晰度对比分析的清晰度特征向量在同一向量空间中,保证了两个图像序列样本可以基于清晰度特征向量进行对比分析,并且,训练样本对的标签为清晰度差,所以在视频对比模型的不断训练过程中,模型的参数例如特征的权重等参数,基于预测清晰度差和期望清晰度差被不断调整,模型提取的清晰度特征向量可以越来越准确地反映视频之间的清晰度差,最终,在一定程度上提升了模型对视频的清晰度对比分析的准确性。
在一个实施例中,还可以对两个清晰度特征向量计算相似度,通过相似度衡量第一视频和第二视频之间的清晰度差。其中,相似度可以通过欧式距离等表示。
在另一个实施例中,也可以通过两个清晰度特征向量的向量差值分析清晰度差。
在一个实施例中,步骤“通过清晰度差分析模块,对同一训练样本对所对应的第一清晰度特征向量和第二清晰度特征向量进行分析,确定训练样本 对所对应的第一视频样本和第二视频样本的预测清晰度差”,可以包括:通过清晰度差分析模块,计算同一训练样本对所对应的第一清晰度特征向量和第二清晰度特征向量之间的向量差值;及基于训练样本对的向量差值,确定训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差。
在一个实施例中,步骤“基于训练样本对的向量差值,确定训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差”,可以包括:通过全连接层对训练样本对的向量差值进行处理,得到一维的向量差值;及对一维的向量差值进行归一化,得到训练样本对对应的第一视频样本和第二视频样本的预测清晰度差。
例如，假设第一清晰度特征向量和第二清晰度特征向量分别为 F_1 和 F_2，对 F_1 和 F_2 进行按位相减的操作得到向量差值 F_final，即 F_final = F_1 - F_2。
得到向量差值后,可以通过清晰度差分析模块中的全连接层对向量差值进行分类。其中,全连接层包括第一全连接层和第二全连接层,第一全连接层的维度和第一清晰度特征向量的维度相同,例如为512维。第二全连接层的维度为1。
例如图3b所示的技术框架图中,全连接层可以包括512维度的全连接层FC 1和一个1维的全连接层FC 2,本实施例中,第一全连接层和第二全连接层之间连接有激活层,激活层的激活函数可以是非线性激活函数层,如ReLU(Rectified Linear Unit,线性整流函数)。
一维的向量差值为 F_score = FC_2(ReLU(FC_1(F_final)))。
本实施例的清晰度差值可以是-1至1之间的任意数值，参考图3b，在计算出一维的向量差值后，对该一维的向量差值进行回归操作即归一化操作，得到取值在-1至1之间的预测清晰度差。可选的，归一化操作采用的函数可以根据实际需要选择，例如选择Tanh函数，将1维的向量差值进行Tanh归一化激活操作，输出最终的清晰度差Result。
Result = Tanh(F_score)，Result ∈ (-1, 1)　　　　(5)
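A small PyTorch-style sketch of this sharpness difference analysis head (element-wise subtraction, a 512-dimensional fully connected layer FC_1 with ReLU, a 1-dimensional fully connected layer FC_2, and a Tanh output in (-1, 1)) is shown below; the class and variable names are illustrative.

```python
# Illustrative sketch of the sharpness difference analysis head: the two 512-d
# sharpness feature vectors are subtracted element-wise, passed through
# FC_1 (512-d) -> ReLU -> FC_2 (1-d), and squashed to (-1, 1) with Tanh.
import torch
import torch.nn as nn


class SharpnessDifferenceHead(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, 1)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        f_final = f1 - f2                                   # element-wise vector difference
        f_score = self.fc2(torch.relu(self.fc1(f_final)))   # one-dimensional difference value
        return torch.tanh(f_score)                          # Result in (-1, 1)


head = SharpnessDifferenceHead()
result = head(torch.randn(4, 512), torch.randn(4, 512))  # (4, 1) predicted sharpness differences
```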
306、基于训练样本对的预测清晰度差和对应的期望清晰度差,调整视频对比模型的参数,直到视频对比模型训练完成,其中,在每一次参数调整后,第一特征提取模块和第二特征提取模块的参数相同。
如图3b中的CNN和LSTM的参数相同。
本实施例中的期望清晰度差可以是针对第一视频样本和第二视频样本进行主观评测得到的,例如该期望清晰度差可以是视频主观评测的MOS(Mean Opinion Score,平均意见得分)均值。
本实施例中,可以采用预设损失函数计算预测清晰度差和对应的期望清晰度差之间的损失值,基于该损失值调整视频对比模型的参数。
在一个实施例中,预设损失函数可以是均方差损失函数。
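Assuming the extractor and head sketches given earlier in this description are in scope, one training step under the scheme above (shared parameters between the two branches, mean squared error between the predicted and expected sharpness difference) might look roughly like this; the Adam optimizer and the learning rate are arbitrary illustrative choices, not values specified in this application.

```python
# Rough sketch of a single training step: both image-sequence samples go through
# the same extractor instance (so the two feature extraction modules always keep
# identical parameters), and the predicted sharpness difference is regressed
# against the expected (e.g. MOS-based) difference with an MSE loss.
import torch
import torch.nn as nn

extractor = SharpnessFeatureExtractor()   # from the earlier sketch
head = SharpnessDifferenceHead()          # from the earlier sketch
optimizer = torch.optim.Adam(list(extractor.parameters()) + list(head.parameters()), lr=1e-4)
criterion = nn.MSELoss()


def train_step(first_seq: torch.Tensor, second_seq: torch.Tensor,
               expected_diff: torch.Tensor) -> float:
    # first_seq / second_seq: (batch, 20, 3, 224, 224); expected_diff: (batch, 1)
    f1 = extractor(first_seq)
    f2 = extractor(second_seq)
    predicted_diff = head(f1, f2)
    loss = criterion(predicted_diff, expected_diff)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```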
在一个实施例中,在基于第一图像序列和第二图像序列进行视频清晰度对比分析时,通过视频对比模型的第一特征提取模块,提取第一图像序列的第一清晰度特征向量的步骤,可以包括:通过第一特征提取模块,将第一图像序列中的第一图像从像素空间映射到目标嵌入空间,得到第一图像序列的第一图像特征向量;及通过第一特征提取模块,基于第一图像特征向量对应的第一图像之间的时序关系,对第一图像特征向量进行分析,得到第一图像序列对应的第一清晰度特征向量。
对应的,通过视频对比模型的第二特征提取模块,提取第二图像序列的第二清晰度特征向量的步骤,可以包括:通过第二特征提取模块,将第二图像序列中的第二图像从像素空间映射到目标嵌入空间,得到第二图像序列的第二图像特征向量;及通过第二特征提取模块,基于第二图像特征向量对应的第二图像之间的时序关系,对第二图像特征向量进行分析,得到第二图像序列对应的第二清晰度特征向量。
其中,提取第一清晰度向量和第二清晰度向量的具体步骤,可以参考上述模型训练过程中的描述。
205、通过视频对比模型的清晰度差分析模块,基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差。
可选的,本实施例的视频对比模型可以是一个端到端网络结构的模型,输入是图像序列,输出是清晰度差值。这样不但可以实现对视频之间的清晰度差异的量化,还有效解决了非端到端网络结构的模型在训练难度高,部署繁琐的问题。
本实施例的清晰度差值,取值可以为正负值或零,取值为零可以表示第一视频的清晰度与第二视频的清晰度相同,正值可以表示第一视频的清晰度比第二视频的清晰度高,负值可以表示第一视频的清晰度值比第二视频的清晰度值低。
在一个实施例中,步骤“通过视频对比模型的清晰度差分析模块,基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差”,可以包括:通过视频对比模型的清晰度差分析模块,计算第一清晰度特征向量和第二清晰度特征向量的相似度;及基于相似度,确定第一视频和第二视频的清晰度差。
其中,相似度可以用向量之间的向量距离表示,例如欧式距离。
在一个实施例中,步骤“通过视频对比模型的清晰度差分析模块,基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差”,可以包括:通过视频对比模型的清晰度差分析模块,计算第一清晰度特征向量和第二清晰度特征向量的向量差值;及基于向量差值确定第一视频和第二视频的清晰度差。
其中,清晰度差的具体计算过程,可以参考模型训练方案中的相关描述。
本实施例的清晰度差分析模块包括全连接层,该全连接层可以由至少一个全连接层构成。
其中,步骤“基于向量差值确定第一视频和第二视频的清晰度差”,可以包括:通过全连接层对向量差值进行处理,得到一维的向量差值;及对一维的向量差值进行归一化,得到第一视频和第二视频的清晰度差。
例如，类似于模型训练方案中的示例，还是假设第一清晰度特征向量和第二清晰度特征向量分别为 F_1 和 F_2，对 F_1 和 F_2 进行按位相减的操作得到向量差值 F_final = F_1 - F_2。
清晰度差可以基于全连接层对向量差值的处理得到，本实施例的全连接层中包含的全连接层的数量不限，类似的可以包含图3b示出的第一全连接层 FC_1 和第二全连接层 FC_2，第一全连接层和第二全连接层之间连接有激活层，激活层的激活函数可以是非线性激活函数层，如ReLU(Rectified Linear Unit,线性整流函数)。其中，一维的向量差值为 F_score = FC_2(ReLU(FC_1(F_final)))。
当然,在其他实施例中,还可以采用其他可用的激活函数替换上述的ReLU函数。
本实施例中,可以将第二视频转码方式作为预设参考视频转码方式。
在通过视频对比模型的清晰度差分析模块基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差后,还可以基于清晰度差,分析第一视频转码方式相较于预设参考视频转码方式的转码性能优劣程度。
例如，设置清晰度差值与转码性能优劣等级的对应关系：若清晰度差值在-1至0（不包括0）范围内，第一视频转码方式劣于第二视频转码方式；若清晰度差值在0至1（不包括0）范围内，第一视频转码方式优于第二视频转码方式。
对于-1至0（不包括0）的范围，还可以细分为不同的若干范围，然后对每个范围设置不同的劣势等级，越接近-1，劣势等级越高；对于0至1（不包括0）的范围，也可以细分为不同的若干范围，然后对每个范围设置不同的优势等级，越接近1，优势等级越高。
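A trivial sketch of such a mapping from the sharpness difference in (-1, 1) to coarse performance grades is shown below; the thresholds and grade labels are purely illustrative assumptions and are not specified in this application.

```python
# Purely illustrative mapping from the sharpness difference in (-1, 1) to coarse
# transcoding-performance grades; the thresholds and labels are assumptions.
def transcoding_grade(sharpness_diff: float) -> str:
    if sharpness_diff > 0.5:
        return "first transcoding method much better than the reference"
    if sharpness_diff > 0.0:
        return "first transcoding method slightly better than the reference"
    if sharpness_diff == 0.0:
        return "both transcoding methods comparable"
    if sharpness_diff > -0.5:
        return "first transcoding method slightly worse than the reference"
    return "first transcoding method much worse than the reference"
```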
本实施例中,可以基于(尤其是同源视频的)转码性能优劣程度,确定对上述待分析的视频客户端的优化方案,例如对视频客户端提供的第一视频转码方式进行优化或替换。
对于UGC类视频,采用本实施例的方案,可以准确评估与竞品视频客户端的性能差异,有利于优化客户端,提高产品的视频质量,进而提升用户体验,吸引更多用户使用。
本实施例中,终端的摄像头的拍摄模式(不同拍摄模式的拍摄参数不同) 可以有多种,本实施例中,可以通过上述方案获取多个原始视频对应的清晰度差,其中,原始视频视为本终端通过摄像头拍摄,并且多个原始视频的总的拍摄模式不低于两种。
在获取清晰度差后,还可以分析拍摄模式对清晰度差值的影响;
基于分析结果,确定第一视频转换方式对应的目标拍摄模式,其中,在目标拍摄模式下,第一视频与第二视频相比最为清晰。
其中,第一视频和第二视频的清晰度差可以为正值(第一视频更清晰)或为负值(第二视频更清晰),若清晰度差中存在正值,在目标拍摄模式下,第一视频,与通过(如竞品客户端)的第二视频转换方式得到的第二视频之间的清晰度差值最大(且为正值),若清晰度差中不存在正值,在目标拍摄模式下,第一视频,与通过(如竞品客户端)的第二视频转换方式得到的第二视频之间的清晰度差值最小(且为负值)。
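The selection of the target shooting mode described above can be sketched as picking the mode whose measured sharpness differences are largest; the data layout below (a mapping from shooting mode to a list of measured differences) is an assumption for illustration.

```python
# Illustrative sketch: choose the target shooting mode as the one whose videos,
# after the first transcoding method, compare most favourably (largest sharpness
# difference) against the reference second videos.
from statistics import mean
from typing import Dict, List


def pick_target_shooting_mode(diffs_by_mode: Dict[str, List[float]]) -> str:
    # diffs_by_mode maps a shooting mode to the sharpness differences measured
    # for original videos shot in that mode.
    return max(diffs_by_mode, key=lambda mode: mean(diffs_by_mode[mode]))


# Example: mode "hdr" wins because its average difference is the largest; if all
# differences were negative, the least negative mode would win instead.
target_mode = pick_target_shooting_mode({"normal": [-0.1, 0.05], "hdr": [0.2, 0.3]})
```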
本实施例提供了一种视频对比方法,可以获取第一视频和第二视频;然后从第一视频中获取第一图像序列,从第二视频中获取第二图像序列;再通过视频对比模型的第一特征提取模块对第一图像序列进行第一清晰度特征向量的提取;通过视频对比模型的第二特征提取模块对第二图像序列进行第二清晰度特征向量的提取,本实施例的第一特征提取模块和第二特征提取模块相同,其提取的两个图像序列的清晰度特征向量可以更加准确反映出两个视频的清晰度相对情况,提取特征向量后,可通过视频对比模型的清晰度差分析模块基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差,实现对两个视频的清晰度差异的量化,本实施例基于清晰度特征向量的分析,有利于提升对视频的清晰度差异的分析准确度。
进一步的,本实施例的视频对比模型输入两个视频,即可输出视频之间的清晰度差值,这种端对端的解决方案,十分便于模型的部署。
进一步的,可以将本实施例的方案应用于UGC视频推荐端的竞品数据自动化分析,可以准确评估与竞品的性能差异,提高产品的视频质量;同时,本实施例的方案也可以应用于对视频转码技术的评测,准确估计不同转码技术 的性能优劣,使得转码技术能够针对视频清晰度做出有效的优化策略。
应该理解的是,虽然上述各实施例的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述各实施例中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
为了更好地实施以上方法,相应的,本发明实施例还提供一种视频对比装置,该视频对比装置具体可以集成在终端中。
如图4所示，在一个实施例中，提供了一种视频对比装置。参照图4，该视频对比装置包括：获取单元401、序列提取单元402、第一特征提取单元403、第二特征提取单元404和清晰度差分析单元405。视频对比装置中包括的各个模块可全部或部分通过软件、硬件或其组合来实现。
获取单元401,用于获取第一视频和第二视频。
序列提取单元402,用于从第一视频中获取第一图像序列,并从第二视频中获取第二图像序列。
第一特征提取单元403,用于通过视频对比模型的第一特征提取模块,提取第一图像序列的第一清晰度特征向量。
第二特征提取单元404,用于通过视频对比模型的第二特征提取模块,提取第二图像序列的第二清晰度特征向量,其中,第一特征提取模块和第二特征提取模块相同。
清晰度差分析单元405,用于通过视频对比模型的清晰度差分析模块,基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差。
在一个实施例中,获取单元401还用于获取原始视频;对原始视频按照第 一视频转码方式进行转换,得到第一视频;及对原始视频按照第二视频转码方式进行转换,得到第二视频。
在一个实施例中,获取单元401还用于基于待评估的视频客户端提供的第一视频转码方式,对原始视频进行转换,得到第一视频;及基于视频客户端的参考视频客户端提供的第二视频转码方式,对原始视频进行转换,得到第二视频。
在一个实施例中,序列提取单元402还用于从第一视频中提取预设数量的第一图像,组成第一图像序列;及从第二视频中,提取与第一视频中的第一图像位置相同的第二图像,组成第二图像序列。
在一个实施例中,第一特征提取单元403还用于通过第一特征提取模块,将第一图像序列中的第一图像从像素空间映射到目标嵌入空间,得到第一图像序列的第一图像特征向量;及通过第一特征提取模块,基于第一图像特征向量对应的第一图像之间的时序关系,对第一图像特征向量进行分析,得到第一图像序列对应的第一清晰度特征向量;第二特征提取单元404还用于通过第二特征提取模块,将第二图像序列中的第二图像从像素空间映射到目标嵌入空间,得到第二图像序列的第二图像特征向量;及通过第二特征提取模块,基于第二图像特征向量对应的第二图像之间的时序关系,对第二图像特征向量进行分析,得到第二图像序列对应的第二清晰度特征向量。
在一个实施例中，清晰度差分析单元405还用于通过视频对比模型的清晰度差分析模块计算第一清晰度特征向量和第二清晰度特征向量的向量差值；及基于向量差值确定第一视频和第二视频的清晰度差。
在一个实施例中,第二视频转码方式为预设参考视频转码方式;本实施例的视频对比装置还包括:转码性能分析单元,用于在清晰度差分析单元,通过视频对比模型的清晰度差分析模块基于第一清晰度特征向量和第二清晰度特征向量,确定第一视频和第二视频的清晰度差后,基于清晰度差,分析第一视频转码方式相较于预设参考视频转码方式的转码性能优劣程度。
在一个实施例中,本实施例的视频对比装置还包括:训练单元,用于在 通过视频对比模型的第一特征提取模块对第一图像序列进行第一清晰度特征向量的提取前,获取训练样本对,其中,训练样本对中包括第一图像序列样本和第二图像序列样本,同一训练样本对中的第一图像序列样本来自同一个第一视频样本,同一训练样本对中的第二图像序列样本来自同一个第二视频样本,训练样本对的样本标签包括对应的第一视频样本和第二视频样本的期望清晰度差;获取待训练的视频对比模型,视频对比模型包括第一特征提取模块,第二特征提取模块,和清晰度差分析模块;通过第一特征提取模块,对第一图像序列样本进行第一清晰度特征向量的提取;通过第二特征提取模块,对第二图像序列样本进行第二清晰度特征向量的提取,其中,第一特征提取模块和第二特征提取模块的网络结构和网络参数相同;通过清晰度差分析模块,对同一训练样本对对应的第一清晰度特征向量和第二清晰度特征向量进行分析,确定训练样本对对应的第一视频样本和第二视频样本的预测清晰度差;基于训练样本对的预测清晰度差和对应的期望清晰度差,调整视频对比模型的参数,直到视频对比模型训练完成,其中,在每一次参数调整后,第一特征提取模块和第二特征提取模块的参数相同。
在一个实施例中,训练单元还用于通过第一特征提取模块,将第一图像序列样本中的图像从像素空间映射到目标嵌入空间,得到第一图像序列样本的第一图像特征向量;通过第一特征提取模块,基于第一图像特征向量对应的图像之间的时序关系,对第一图像特征向量进行分析,得到第一图像序列样本的第一清晰度特征向量;通过第二特征提取模块,将第二图像序列样本中的图像从像素空间映射到目标嵌入空间,得到第二图像序列样本的第二图像特征向量;通过第二特征提取模块,基于第二图像特征向量对应的图像之间的时序关系,对第二图像特征向量进行分析,得到第二图像序列样本的第二清晰度特征向量。
在一个实施例中,训练单元还用于通过清晰度差分析模块,计算同一训练样本对所对应的第一清晰度特征向量和第二清晰度特征向量之间的向量差值;基于训练样本对的向量差值,确定训练样本对对应的第一视频样本和第 二视频样本的预测清晰度差。
采用本实施例的方案,能够对视频进行准确有效的清晰度对比分析,一定程度上提升无参考视频清晰度分析的准确性,同时端到端的解决方案也方便了模型的部署。
此外,本发明实施例还提供一种计算机设备,该计算机设备可以为终端或者服务器,如图5所示,其示出了本发明实施例所涉及的计算机设备的结构示意图,具体来讲:
该计算机设备可以包括一个或者一个以上处理核心的处理器501、一个或一个以上计算机可读存储介质的存储器502、电源503和输入单元504等部件。本领域技术人员可以理解,图5中示出的计算机设备结构并不构成对计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:
处理器501是该计算机设备的控制中心,利用各种接口和线路连接整个计算机设备的各个部分,通过运行或执行存储在存储器502内的软件程序和/或模块,以及调用存储在存储器502内的数据,执行计算机设备的各种功能和处理数据,从而对计算机设备进行整体监控。可选的,处理器501可包括一个或多个处理核心;优选的,处理器501可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器501中。
存储器502可用于存储软件程序以及模块,处理器501通过运行存储在存储器502的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器502可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机设备的使用所创建的数据等。此外,存储器502可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器502还 可以包括存储器控制器,以提供处理器501对存储器502的访问。
计算机设备还包括给各个部件供电的电源503,优选的,电源503可以通过电源管理系统与处理器501逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源503还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。
该计算机设备还可包括输入单元504,该输入单元504可用于接收输入的数字或字符信息,以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。
尽管未示出,计算机设备还可以包括显示单元等,在此不再赘述。
本发明实施例涉及的系统可以是由客户端、多个节点(接入网络中的任意形式的计算机设备,如服务器、终端)通过网络通信的形式连接形成的分布式系统。
以分布式系统为区块链系统为例,参见图6,图6是本发明实施例提供的分布式系统100应用于区块链系统的一个可选的结构示意图,由多个节点(接入网络中的任意形式的计算设备,如服务器、用户终端)和客户端形成,节点之间形成组成的点对点(P2P,Peer To Peer)网络,P2P协议是一个运行在传输控制协议(TCP,Transmission Control Protocol)协议之上的应用层协议。在分布式系统中,任何机器如服务器、终端都可以加入而成为节点,节点包括硬件层、中间层、操作系统层和应用层。本实施例中,原始视频,第一视频、第二视频,训练样本对,第一视频样本、第二视频样本以及清晰度差等等,都可以通过节点被存储在区域链系统的共享账本中,计算机设备(例如终端或服务器)可以基于共享账本存储的记录数据获取清晰度差。
参见图6示出的区块链系统中各节点的功能,涉及的功能包括:
1)路由,节点具有的基本功能,用于支持节点之间的通信。
节点除具有路由功能外,还可以具有以下功能:
2)应用,用于部署在区块链中,根据实际业务需求而实现特定业务,记 录实现功能相关的数据形成记录数据,在记录数据中携带数字签名以表示任务数据的来源,将记录数据发送到区块链系统中的其他节点,供其他节点在验证记录数据来源以及完整性成功时,将记录数据添加到临时区块中。
例如,应用实现的业务包括:
2.1)钱包,用于提供进行电子货币的交易的功能,包括发起交易(即,将当前交易的交易记录发送给区块链系统中的其他节点,其他节点验证成功后,作为承认交易有效的响应,将交易的记录数据存入区块链的临时区块中;当然,钱包还支持查询电子货币地址中剩余的电子货币;
2.2)共享账本,用于提供账目数据的存储、查询和修改等操作的功能,将对账目数据的操作的记录数据发送到区块链系统中的其他节点,其他节点验证有效后,作为承认账目数据有效的响应,将记录数据存入临时区块中,还可以向发起操作的节点发送确认。
2.3)智能合约,计算机化的协议,可以执行某个合约的条款,通过部署在共享账本上的用于在满足一定条件时而执行的代码实现,根据实际的业务需求代码用于完成自动化的交易,例如查询买家所购买商品的物流状态,在买家签收货物后将买家的电子货币转移到商户的地址;当然,智能合约不仅限于执行用于交易的合约,还可以执行对接收的信息进行处理的合约。
3)区块链,包括一系列按照产生的先后时间顺序相互接续的区块(Block),新区块一旦加入到区块链中就不会再被移除,区块中记录了区块链系统中节点提交的记录数据。
参见图7,图7是本发明实施例提供的区块结构(Block Structure)一个可选的示意图,每个区块中包括本区块存储交易记录的哈希值(本区块的哈希值)、以及前一区块的哈希值,各区块通过哈希值连接形成区块链。另外,区块中还可以包括有区块生成时的时间戳等信息。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了相关的信息,用于验证其信息的有效性(防伪)和生成下一个区块。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述各方法实施例中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述各方法实施例中的步骤。
在一个实施例中,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述各方法实施例中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详 细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (22)

  1. 一种视频对比方法,其特征在于,由计算机设备执行,所述方法包括:
    获取第一视频和第二视频;
    从所述第一视频中获取第一图像序列,并从所述第二视频中获取第二图像序列;
    通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量;
    通过所述视频对比模型的第二特征提取模块,提取所述第二图像序列的第二清晰度特征向量,其中,所述第一特征提取模块和所述第二特征提取模块相同;及
    通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差。
  2. 根据权利要求1所述的视频对比方法,其特征在于,所述获取第一视频和第二视频,包括:
    获取原始视频;
    对所述原始视频按照第一视频转码方式进行转换,得到第一视频;及
    对所述原始视频按照第二视频转码方式进行转换,得到第二视频。
  3. 根据权利要求2所述的视频对比方法,其特征在于,所述对所述原始视频按照第一视频转码方式进行转换,得到第一视频,包括:
    基于待评估的视频客户端提供的第一视频转码方式,对所述原始视频进行转换,得到第一视频;
    所述对所述原始视频按照第二视频转码方式进行转换,得到第二视频,包括:
    基于所述视频客户端的参考视频客户端提供的第二视频转码方式,对所述原始视频进行转换,得到第二视频。
  4. 根据权利要求1所述的视频对比方法,其特征在于,所述从所述第一 视频中获取第一图像序列,并从所述第二视频中获取第二图像序列,包括:
    从所述第一视频中提取预设数量的第一图像,组成第一图像序列;及
    从所述第二视频中,提取与所述第一视频中的第一图像位置相同的第二图像,组成第二图像序列。
  5. 根据权利要求1所述的视频对比方法,其特征在于,所述通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量,包括:
    通过所述第一特征提取模块,将所述第一图像序列中的第一图像从像素空间映射到目标嵌入空间,得到所述第一图像序列的第一图像特征向量;及通过所述第一特征提取模块,基于所述第一图像特征向量对应的第一图像之间的时序关系,对所述第一图像特征向量进行分析,得到所述第一图像序列对应的第一清晰度特征向量;
    所述通过所述视频对比模型的第二特征提取模块,提取所述第二图像序列的第二清晰度特征向量,包括:
    通过所述第二特征提取模块,将所述第二图像序列中的第二图像从像素空间映射到所述目标嵌入空间,得到所述第二图像序列的第二图像特征向量;及
    通过所述第二特征提取模块,基于所述第二图像特征向量对应的第二图像之间的时序关系,对所述第二图像特征向量进行分析,得到所述第二图像序列对应的第二清晰度特征向量。
  6. 根据权利要求1所述的视频对比方法,其特征在于,所述通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差,包括:
    通过所述视频对比模型的清晰度差分析模块,计算所述第一清晰度特征向量和所述第二清晰度特征向量的向量差值;及
    基于所述向量差值确定所述第一视频和第二视频的清晰度差。
  7. 根据权利要求2所述的视频对比方法,其特征在于,所述第二视频转 码方式为预设参考视频转码方式;
    所述通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差之后,还包括:
    基于所述清晰度差,确定所述第一视频转码方式相较于所述预设参考视频转码方式的转码性能优劣程度。
  8. 根据权利要求1-7任一项所述的视频对比方法,其特征在于,在所述通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量之前,还包括:
    获取训练样本对,其中,训练样本对中包括第一图像序列样本和第二图像序列样本,同一训练样本对中的第一图像序列样本来自同一个第一视频样本,同一训练样本对中的第二图像序列样本来自同一个第二视频样本,所述训练样本对的样本标签包括所述训练样本对所对应的第一视频样本和第二视频样本的期望清晰度差;
    获取待训练的视频对比模型,所述视频对比模型包括第一特征提取模块、第二特征提取模块和清晰度差分析模块;
    通过所述第一特征提取模块,提取所述第一图像序列样本的第一清晰度特征向量;
    通过所述第二特征提取模块,提取所述第二图像序列样本的第二清晰度特征向量,其中,所述第一特征提取模块和所述第二特征提取模块的网络结构和网络参数相同;
    通过所述清晰度差分析模块,对同一训练样本对所对应的所述第一清晰度特征向量和所述第二清晰度特征向量进行分析,确定所述训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差;及
    基于训练样本对的预测清晰度差和期望清晰度差,调整所述视频对比模型的参数,直到所述视频对比模型训练完成,其中,在每一次参数调整后,所述第一特征提取模块和第二特征提取模块的参数相同。
  9. 根据权利要求8所述的视频对比方法,其特征在于,所述通过所述第一特征提取模块,提取所述第一图像序列样本的第一清晰度特征向量,包括:
    通过所述第一特征提取模块,将所述第一图像序列样本中的图像从像素空间映射到目标嵌入空间,得到所述第一图像序列样本的第一图像特征向量;及
    通过所述第一特征提取模块,基于所述第一图像特征向量对应的图像之间的时序关系,对所述第一图像特征向量进行分析,得到所述第一图像序列样本的第一清晰度特征向量;
    所述通过所述第二特征提取模块,提取所述第二图像序列样本的第二清晰度特征向量,包括:
    通过所述第二特征提取模块,将所述第二图像序列样本中的图像从像素空间映射到所述目标嵌入空间,得到所述第二图像序列样本的第二图像特征向量;及
    通过所述第二特征提取模块,基于所述第二图像特征向量对应的图像之间的时序关系,对所述第二图像特征向量进行分析,得到所述第二图像序列样本的第二清晰度特征向量。
  10. 根据权利要求8所述的视频对比方法,其特征在于,所述通过所述清晰度差分析模块,对同一训练样本对所对应的所述第一清晰度特征向量和所述第二清晰度特征向量进行分析,确定所述训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差,包括:
    通过所述清晰度差分析模块,计算同一训练样本对所对应的所述第一清晰度特征向量和所述第二清晰度特征向量之间的向量差值;及
    基于所述训练样本对的向量差值,确定所述训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差。
  11. 一种视频对比装置,其特征在于,所述装置包括:
    获取单元,用于获取第一视频和第二视频;
    序列提取单元,用于从所述第一视频中获取第一图像序列,并从所述第 二视频中获取第二图像序列;
    第一特征提取单元,用于通过视频对比模型的第一特征提取模块,提取所述第一图像序列的第一清晰度特征向量;
    第二特征提取单元,用于通过所述视频对比模型的第二特征提取模块,提取所述第二图像序列的第二清晰度特征向量,其中,所述第一特征提取模块和第二特征提取模块相同;及
    清晰度差分析单元,用于通过所述视频对比模型的清晰度差分析模块,基于所述第一清晰度特征向量和所述第二清晰度特征向量,确定所述第一视频和所述第二视频的清晰度差。
  12. 据根据权利要求11所述的装置,其特征在于,所述获取单元还用于获取原始视频;对所述原始视频按照第一视频转码方式进行转换,得到第一视频;及对所述原始视频按照第二视频转码方式进行转换,得到第二视频。
  13. 根据权利要求12所述的装置,其特征在于,获取单元还用于基于待评估的视频客户端提供的第一视频转码方式,对所述原始视频进行转换,得到第一视频;及基于所述视频客户端的参考视频客户端提供的第二视频转码方式,对所述原始视频进行转换,得到第二视频。
  14. 根据权利要求11所述的装置,其特征在于,所述序列提取单元还用于从所述第一视频中提取预设数量的第一图像,组成第一图像序列;及从所述第二视频中,提取与所述第一视频中的第一图像位置相同的第二图像,组成第二图像序列。
  15. 根据权利要求11所述的视频对比方法,其特征在于,所述第一特征提取单元还用于通过所述第一特征提取模块,将所述第一图像序列中的第一图像从像素空间映射到目标嵌入空间,得到所述第一图像序列的第一图像特征向量;及通过所述第一特征提取模块,基于所述第一图像特征向量对应的第一图像之间的时序关系,对所述第一图像特征向量进行分析,得到所述第一图像序列对应的第一清晰度特征向量;
    所述第二特征提取单元还用于通过所述第二特征提取模块,将所述第二 图像序列中的第二图像从像素空间映射到所述目标嵌入空间,得到所述第二图像序列的第二图像特征向量;及通过所述第二特征提取模块,基于所述第二图像特征向量对应的第二图像之间的时序关系,对所述第二图像特征向量进行分析,得到所述第二图像序列对应的第二清晰度特征向量。
  16. 根据权利要求11所述的装置,其特征在于,所述清晰度差分析单元还用于通过所述视频对比模型的清晰度差分析模块,计算所述第一清晰度特征向量和所述第二清晰度特征向量的向量差值;及基于所述向量差值确定所述第一视频和第二视频的清晰度差。
  17. 根据权利要求12所述的装置,其特征在于,所述第二视频转码方式为预设参考视频转码方式;所述清晰度差分析单元还用于基于所述清晰度差,确定所述第一视频转码方式相较于所述预设参考视频转码方式的转码性能优劣程度。
  18. 根据权利要求11-17任一项所述的装置,其特征在于,所述装置还包括:训练单元,用于获取训练样本对,其中,训练样本对中包括第一图像序列样本和第二图像序列样本,同一训练样本对中的第一图像序列样本来自同一个第一视频样本,同一训练样本对中的第二图像序列样本来自同一个第二视频样本,所述训练样本对的样本标签包括所述训练样本对所对应的第一视频样本和第二视频样本的期望清晰度差;获取待训练的视频对比模型,所述视频对比模型包括第一特征提取模块、第二特征提取模块和清晰度差分析模块;通过所述第一特征提取模块,提取所述第一图像序列样本的第一清晰度特征向量;通过所述第二特征提取模块,提取所述第二图像序列样本的第二清晰度特征向量,其中,所述第一特征提取模块和所述第二特征提取模块的网络结构和网络参数相同;通过所述清晰度差分析模块,对同一训练样本对所对应的所述第一清晰度特征向量和所述第二清晰度特征向量进行分析,确定所述训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差;及基于训练样本对的预测清晰度差和期望清晰度差,调整所述视频对比模型的参数,直到所述视频对比模型训练完成,其中,在每一次参数调整后,所述第一 特征提取模块和第二特征提取模块的参数相同。
  19. 根据权利要求18所述的装置,其特征在于,所述训练单元还用于通过所述第一特征提取模块,将所述第一图像序列样本中的图像从像素空间映射到目标嵌入空间,得到所述第一图像序列样本的第一图像特征向量;通过所述第一特征提取模块,基于所述第一图像特征向量对应的图像之间的时序关系,对所述第一图像特征向量进行分析,得到所述第一图像序列样本的第一清晰度特征向量;通过所述第二特征提取模块,将所述第二图像序列样本中的图像从像素空间映射到所述目标嵌入空间,得到所述第二图像序列样本的第二图像特征向量;及通过所述第二特征提取模块,基于所述第二图像特征向量对应的图像之间的时序关系,对所述第二图像特征向量进行分析,得到所述第二图像序列样本的第二清晰度特征向量。
  20. 根据权利要求18所述的装置,其特征在于,所述训练单元还用于通过所述清晰度差分析模块,计算同一训练样本对所对应的所述第一清晰度特征向量和所述第二清晰度特征向量之间的向量差值;及基于所述训练样本对的向量差值,确定所述训练样本对所对应的第一视频样本和第二视频样本的预测清晰度差。
  21. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行如权利要求1至10中任一项所述的方法的步骤。
  22. 一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行如权利要求1至10中任一项所述的方法的步骤。
PCT/CN2020/122626 2020-03-17 2020-10-22 视频对比方法、装置、计算机设备和存储介质 WO2021184754A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/722,300 US20220237917A1 (en) 2020-03-17 2022-04-15 Video comparison method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010187813.1A CN111414842B (zh) 2020-03-17 2020-03-17 一种视频对比方法、装置、计算机设备和存储介质
CN202010187813.1 2020-03-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/722,300 Continuation US20220237917A1 (en) 2020-03-17 2022-04-15 Video comparison method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021184754A1 true WO2021184754A1 (zh) 2021-09-23

Family

ID=71491293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122626 WO2021184754A1 (zh) 2020-03-17 2020-10-22 视频对比方法、装置、计算机设备和存储介质

Country Status (3)

Country Link
US (1) US20220237917A1 (zh)
CN (1) CN111414842B (zh)
WO (1) WO2021184754A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414842B (zh) * 2020-03-17 2021-04-13 腾讯科技(深圳)有限公司 一种视频对比方法、装置、计算机设备和存储介质
CN112580577B (zh) * 2020-12-28 2023-06-30 出门问问(苏州)信息科技有限公司 一种基于面部关键点生成说话人图像的训练方法及装置
CN115529457B (zh) * 2022-09-05 2024-05-14 清华大学 基于深度学习的视频压缩方法和装置
CN115689819B (zh) * 2022-09-23 2023-06-30 河北东来工程技术服务有限公司 一种船舶应急训练方法、系统、装置及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050036702A1 (en) * 2003-08-12 2005-02-17 Xiaoli Yang System and method to enhance depth of field of digital image from consecutive image taken at different focus
US20070216787A1 (en) * 2006-03-16 2007-09-20 Lin Peng W Image unsharpness test method for a camera device
CN105469411A (zh) * 2015-11-27 2016-04-06 小米科技有限责任公司 用于检测图像清晰度的方法、装置及终端
CN108540807A (zh) * 2018-02-28 2018-09-14 北京达佳互联信息技术有限公司 视频文件转码的方法、装置及终端
CN108682024A (zh) * 2018-04-20 2018-10-19 长沙学院 一种图像清晰度匹配方法和系统
CN110689524A (zh) * 2019-09-04 2020-01-14 华南理工大学 一种无参考在线图像清晰度评价方法与系统
CN111414842A (zh) * 2020-03-17 2020-07-14 腾讯科技(深圳)有限公司 一种视频对比方法、装置、计算机设备和存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8451384B2 (en) * 2010-07-08 2013-05-28 Spinella Ip Holdings, Inc. System and method for shot change detection in a video sequence
CN107103331B (zh) * 2017-04-01 2020-06-16 中北大学 一种基于深度学习的图像融合方法
US10574890B2 (en) * 2018-01-12 2020-02-25 Movidius Ltd. Methods and apparatus to operate a mobile camera for low-power usage

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050036702A1 (en) * 2003-08-12 2005-02-17 Xiaoli Yang System and method to enhance depth of field of digital image from consecutive image taken at different focus
US20070216787A1 (en) * 2006-03-16 2007-09-20 Lin Peng W Image unsharpness test method for a camera device
CN105469411A (zh) * 2015-11-27 2016-04-06 小米科技有限责任公司 用于检测图像清晰度的方法、装置及终端
CN108540807A (zh) * 2018-02-28 2018-09-14 北京达佳互联信息技术有限公司 视频文件转码的方法、装置及终端
CN108682024A (zh) * 2018-04-20 2018-10-19 长沙学院 一种图像清晰度匹配方法和系统
CN110689524A (zh) * 2019-09-04 2020-01-14 华南理工大学 一种无参考在线图像清晰度评价方法与系统
CN111414842A (zh) * 2020-03-17 2020-07-14 腾讯科技(深圳)有限公司 一种视频对比方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN111414842A (zh) 2020-07-14
CN111414842B (zh) 2021-04-13
US20220237917A1 (en) 2022-07-28

Similar Documents

Publication Publication Date Title
US11652956B2 (en) Emotion recognition in video conferencing
WO2021184754A1 (zh) 视频对比方法、装置、计算机设备和存储介质
WO2021093468A1 (zh) 视频分类方法、模型训练方法、装置、设备及存储介质
CN109145759B (zh) 车辆属性识别方法、装置、服务器及存储介质
CN111814620B (zh) 人脸图像质量评价模型建立方法、优选方法、介质及装置
CN108197618B (zh) 用于生成人脸检测模型的方法和装置
CN110728209A (zh) 一种姿态识别方法、装置、电子设备及存储介质
KR20160083900A (ko) 얼굴 표현을 위한 시스템 및 방법
CN110889672A (zh) 一种基于深度学习的学生打卡及上课状态的检测系统
CN112395979A (zh) 基于图像的健康状态识别方法、装置、设备及存储介质
CN111126347B (zh) 人眼状态识别方法、装置、终端及可读存储介质
CN112820071B (zh) 一种行为识别方法和装置
CN113254491A (zh) 一种信息推荐的方法、装置、计算机设备及存储介质
WO2022052782A1 (zh) 图像的处理方法及相关设备
CN115131218A (zh) 图像处理方法、装置、计算机可读介质及电子设备
Yu Emotion monitoring for preschool children based on face recognition and emotion recognition algorithms
Parde et al. Deep convolutional neural network features and the original image
CN114330514A (zh) 一种基于深度特征与梯度信息的数据重建方法及系统
CN112669244A (zh) 人脸图像增强方法、装置、计算机设备以及可读存储介质
WO2023217138A1 (zh) 一种参数配置方法、装置、设备、存储介质及产品
CN117218398A (zh) 一种数据处理的方法以及相关装置
CN112749686B (zh) 图像检测方法、装置、计算机设备及存储介质
CN112101306B (zh) 基于rgb图像的精细化人脸表情捕获方法及装置
WO2024099026A1 (zh) 图像处理方法、装置、设备、存储介质及程序产品
CN115797723B (zh) 滤镜推荐方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20926225

Country of ref document: EP

Kind code of ref document: A1