WO2022037343A1 - Video information processing method and apparatus, electronic device, and storage medium - Google Patents

Video information processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022037343A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
image frame
image
information processing
feature vector
Prior art date
Application number
PCT/CN2021/106552
Other languages
English (en)
French (fr)
Inventor
束俊辉
叶豪
范湉湉
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP21857438.2A (patent EP4120122A4)
Publication of WO2022037343A1
Priority to US17/973,418 (patent US20230045726A1)

Classifications

    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/48: Matching video sequences
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/761: Proximity, similarity or dissimilarity measures

Definitions

  • The present application relates to video information processing technology, and in particular, to a video information processing method, apparatus, electronic device, and storage medium.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
  • Video recognition can be achieved by using a digital computer or a machine controlled by a digital computer.
  • Research in this direction has received extensive attention and can be effectively applied in the fields of security, home, medical care, and teaching.
  • An embodiment of the present application provides a video information processing method, which is executed by an electronic device. The method includes:
  • determining a set of video image frames corresponding to a video, and determining a static splicing area corresponding to the image frames in the set;
  • cropping the image frames in the video image frame set according to the static splicing area, and determining an image feature vector based on the corresponding cropping result;
  • determining the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
  • An embodiment of the present application also provides a video information processing apparatus, including:
  • an information transmission module, configured to determine a set of video image frames corresponding to a video; and
  • an information processing module, configured to:
  • crop the image frames in the video image frame set according to the corresponding static splicing area, and determine an image feature vector based on the corresponding cropping result; and
  • determine the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
  • An embodiment of the present application also provides an electronic device, which includes:
  • a memory configured to store executable instructions; and
  • a processor configured to implement the aforementioned video information processing method when executing the executable instructions stored in the memory.
  • Embodiments of the present application further provide a computer-readable storage medium storing executable instructions, and when the executable instructions are executed by a processor, the foregoing video information processing method is implemented.
  • FIG. 1 is a schematic diagram of a usage environment of a video information processing method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the composition and structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a video information processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a dynamic splicing area and a static splicing area of a video image frame in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a dynamic splicing area and a static splicing area of a video image frame in an embodiment of the present application
  • FIG. 6 is a schematic diagram of a process of determining a static splicing area in an embodiment of the present application
  • FIG. 7A is a schematic diagram of cropping an image frame in an embodiment of the present application.
  • FIG. 7B is a schematic diagram of a processing process of a video information processing model in an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a training video information processing model provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a process of video similarity judgment in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a video information processing apparatus 100 provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a blockchain in a blockchain network 200 provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the functional architecture of the blockchain network 200 provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of a usage scenario of the video information processing method provided by the embodiment of the present application.
  • FIG. 14 is a schematic diagram of a usage process of a video information processing method in an embodiment of the present application.
  • FIG. 15 is a schematic diagram of image cropping performed by a video information processing method in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of image cropping performed by a video information processing method according to an embodiment of the present application.
  • One or more of the operations may be executed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which multiple operations are executed.
  • Video: generally refers to various forms of video information available on the Internet. The names "first video" and "second video" below differ only for convenience of distinction.
  • APP (application): the carrier that implements specific functions in a terminal. For example, a mobile client is the carrier of specific functions in a mobile terminal, such as performing online live broadcast (video streaming) or playing online video.
  • Dynamic splicing area (video dynamic area): the picture area that changes as the time axis advances during video playback; this part is the main content of the video.
  • Static splicing area (video static area): a picture area that does not change as the time axis advances during video playback.
  • Information flow: a form of content organization arranged vertically according to a specific specification style. For display ordering, time ordering, popularity ordering, or algorithmic ordering can be applied.
  • Image feature vector: the image 0-1 vector, a binarized feature vector generated from the image.
  • Dynamic and static spliced video: a video in which some picture areas do not change as the time axis advances during playback; that is, the image frames of the video include both a dynamic splicing area and a static splicing area.
  • FIG. 1 is a schematic diagram of a usage scenario of a video information processing method provided by an embodiment of the present application.
  • Terminals (e.g., terminal 10-1 and terminal 10-2) can use the business process in the client to obtain different videos from the corresponding server 200 through the network 300 for browsing. The terminal is connected to the server 200 through the network 300, and the network 300 can be a wide area network, a local area network, or a combination of the two.
  • The types of video that the terminals (such as terminal 10-1 and terminal 10-2) acquire from the corresponding server 200 through the network 300 are not limited. For example, the terminals can obtain videos (that is, videos carrying video information or corresponding video links) from the corresponding server 200 through the network 300, or obtain corresponding content containing only text or images from the corresponding server 200 through the network 300 for browsing.
  • The server 200 may store different types of videos; this application does not distinguish the compilation environments of different types of videos. In this process, the videos pushed to the user's client should be copyright-compliant videos. Therefore, for a large number of videos, it is necessary to determine which videos are similar, and to further perform compliance detection on the copyright information of similar videos.
  • the embodiments of the present application can be applied to short video playback.
  • The video to be recommended is presented to the corresponding user in a user interface (UI); if the recommended video is a pirated video with illegal copyright, it will have a bad impact on the user experience.
  • The background database used for video playback receives a large amount of video data from different sources every day, and the different videos obtained for video recommendation to target users can also be called by other applications (for example, the recommendation results of the short video recommendation process can be migrated to a long video recommendation process or a news recommendation process). Of course, the video information processing model that matches the corresponding target user can also be migrated to different video recommendation processes (such as a webpage video recommendation process, an applet video recommendation process, or a long video client video recommendation process).
  • the video information processing method provided by the embodiments of the present application may be implemented by a terminal.
  • the terminals such as the terminal 10-1 and the terminal 10-2 can locally implement the video information processing solution.
  • the video information processing method provided by the embodiments of the present application may be implemented by a server.
  • the server 200 may implement a video information processing scheme.
  • In some embodiments, the video information processing method provided by the embodiments of the present application may be implemented by a terminal and a server collaboratively. For example, the terminals (such as terminal 10-1 and terminal 10-2) upload videos, and the server 200 may send the finally obtained video to be recommended to the terminal for video recommendation.
  • FIG. 2 is a schematic diagram of the composition and structure of an electronic device provided by an embodiment of the present application. It can be understood that FIG. 2 only shows an exemplary structure of the electronic device rather than the entire structure, and part or all of the structure shown in FIG. 2 can be implemented as needed.
  • the electronic device includes: at least one processor 201 , a memory 202 , a user interface 203 , and at least one network interface 204 .
  • the various components in electronic device 20 are coupled together by bus system 205 .
  • the bus system 205 is used to implement the connection communication between these components.
  • the bus system 205 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as bus system 205 in FIG. 2 .
  • the user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad or a touch screen, and the like.
  • The memory 202 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the memory 202 in this embodiment of the present application can store data to support the operations of the terminals (eg, the terminal 10-1 and the terminal 10-2). Examples of such data include: any computer programs, such as operating systems and applications, used to operate on terminals such as terminal 10-1 and terminal 10-2.
  • the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks.
  • Applications can contain various applications.
  • the video information processing apparatus provided by the embodiments of the present application may be implemented by a combination of software and hardware.
  • the video information processing apparatus provided by the embodiments of the present application may be processors in the form of hardware decoding processors , which is programmed to execute the video information processing method provided by the embodiments of the present application.
  • The processor in the form of a hardware decoding processor may adopt one or more Application-Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
  • As an example of implementation by a combination of software and hardware, the video information processing apparatus provided by the embodiment of the present application may be directly embodied as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads the executable instructions included in the software modules in the memory 202 and, in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205), completes the video information processing method of the embodiments of the present application.
  • The processor 201 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
  • As an example of hardware implementation, the apparatus provided by this embodiment of the present application may be directly executed by a processor 201 in the form of a hardware decoding processor, for example, by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, or other electronic components, to implement the video information processing method provided by the embodiments of the present application.
  • the memory 202 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device 20 .
  • Examples of such data include any executable instructions for operating on the electronic device 20; the program implementing the video information processing method of the embodiments of the present application may be included in the executable instructions.
  • the video information processing apparatus provided by the embodiments of the present application may be implemented in software.
  • FIG. 2 shows the video information processing apparatus 2020 stored in the memory 202, which may be software in the form of programs and plug-ins, and includes a series of modules.
  • As an example of the program stored in the memory 202, it may include the video information processing apparatus 2020, and the video information processing apparatus 2020 includes the following software modules: an information transmission module 2081 and an information processing module 2082.
  • When the software modules in the video information processing apparatus 2020 are read into RAM by the processor 201 and executed, the video information processing method provided by the embodiment of the present application is implemented.
  • Embodiments of the present application also provide a computer program product or computer program, where the computer program product or computer program includes computer instructions (executable instructions), and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the electronic device executes the video information processing method provided by the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of the video information processing method provided by the embodiment of the present application.
  • It can be understood that the steps shown can be executed by various electronic devices running the video information processing apparatus, such as a dedicated terminal with a video information processing function, a server, or a server cluster. The steps shown in FIG. 3 are described below.
  • Step 301 The video information processing apparatus determines a video image frame set corresponding to the video.
  • a video is acquired, and a set of video image frames corresponding to the video is determined.
  • the set of video image frames may include some image frames in the video, or may include all image frames in the video.
  • the image frames in the video image frame set include dynamic splicing areas and static splicing areas.
  • Fig. 4 shows different image frames in one video, namely image frame 41 and image frame 42;
  • Fig. 5 shows different image frames in another video, namely image frame 51 and image frame 52.
  • The dynamic splicing area is the main content of the video and can be used to determine whether videos are similar.
  • The image area that does not change with the time axis of the video during playback is the static splicing area.
  • When the static splicing area is too large or occluded, it often causes misjudgment of whether videos are similar.
  • In FIG. 4 and FIG. 5, the upper frame area and the lower frame area are static splicing areas, and the middle frame area is a dynamic splicing area. From these examples, it can be seen that the static splicing area can account for more than 30% of the area of the entire image frame, so a useless static splicing area can greatly interfere with video similarity comparison.
  • In the embodiments of the present application, the static splicing area can be identified and cropped away, and the real main content (the dynamic splicing area) of the video can be extracted, making similarity identification more accurate, as described later.
  • In some embodiments, determining the set of video image frames corresponding to the video may be implemented in the following manner: image frames are extracted from the video in chronological order to form the video image frame set.
  • Step 302 The video information processing apparatus determines the static splicing area corresponding to the image frame in the video image frame set.
  • For an image frame in the video image frame set, a static splicing area corresponding to the image frame is determined.
  • determining the static splicing area corresponding to the image frame in the video image frame set may be implemented in the following manner:
  • FIG. 6 is a schematic diagram of a process of determining a static splicing area in an embodiment of the present application. The identification of the dynamic and static splicing areas (including the dynamic splicing area and the static splicing area) in the video image frame set can be implemented by a dynamic and static splicing identification and positioning module (which encapsulates a static and dynamic identification algorithm).
  • the dynamic and static splicing identification and positioning module can identify image frames based on the dynamic and static identification algorithm.
  • The input of this module is the image frames extracted from the video in chronological order, and the output is the vertical size parameters (e.g., including the width of the upper area and the width of the lower area) and the horizontal size parameters (e.g., including the width of the left area and the width of the right area) of the static splicing area of the image.
  • The process of processing image frames by the motion and stillness recognition algorithm is shown in FIG. 6. It is assumed that the number of image frames in the video image frame set is n, and the size parameters of all image frames are the same (the width is w pixels in the horizontal direction, and the height is h pixels in the vertical direction).
  • The example process is as follows (a code sketch follows below): 1) Grayscale processing: convert each image frame into a grayscale image; for convenience of representation, the grayscale images corresponding to all image frames in the video image frame set are denoted as M_{n×w×h}. 2) Sampling: for all grayscale images, sample k columns of pixels at equal intervals in the horizontal direction, and denote the sampling result as W_{n×k×h}; likewise, sample k rows of pixels at equal intervals in the vertical direction, and denote the sampling result as H_{n×w×k}. 3) Compute the time-dimension variance sum: for W_{n×k×h}, compute the variance along the 0th dimension (i.e., the n dimension, time) and then sum along the k dimension; the resulting vector is denoted var_h. Similarly, for H_{n×w×k}, compute the variance along the n dimension and sum along the k dimension; the resulting vector is denoted var_w. 4) Margin detection: for var_h, traverse from front to back and from back to front respectively, find the first positions at which the time-dimension variance sum is greater than the threshold T, and use them as the top and bottom margins of the static splicing area of the video, where the top margin is the width of the upper area and the bottom margin is the width of the lower area; similarly, for var_w, traverse from front to back and from back to front, find the first positions at which the time-dimension variance sum is greater than the threshold T, and use them as the left and right margins of the video's static splicing area.
  • Step 303 The video information processing apparatus performs cropping processing on the image frames in the video image frame set according to the static splicing area, and determines the image feature vector based on the corresponding cropping result.
  • the image frame is cropped according to the static splicing area of the image frame identified in step 302 .
  • the cropping process refers to cropping out the static splicing area in the image frame.
  • the static splicing area and the dynamic splicing area can be accurately separated, that is, only the dynamic splicing area is retained in the cropping result (ie, the cropped image frame).
  • the image frame in the video image frame set is cropped, which can be implemented in the following ways:
  • the dynamic splicing area is cropped from the image frame according to the size parameter of the dynamic splicing area, as the cropped image frame.
  • FIG. 7A is a schematic diagram of cropping an image frame in an embodiment of the present application, wherein the cropping process may be implemented by a cropping module running a cropping algorithm.
  • An example process is as follows. The input of the cropping module is the image frame and the size parameters of the identified static splicing area (including the left margin left, the top margin top, the right margin right, and the bottom margin bottom), and the output of the cropping module is the cropped image frame (with height heightC and width widthC); the cropped image frame only includes the dynamic splicing area.
  • The width of the image frame before cropping may be defined as width, and the height may be defined as height.
  • The upper left corner of the image frame before cropping is set as the origin (0, 0) of the coordinate axes; the coordinates of the upper left corner of the dynamic splicing area in the image frame before cropping are (x0, y0), and the coordinates of the lower right corner are (x1, y1), that is, (x0, y0) = (left, top) and (x1, y1) = (width - right, height - bottom).
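  • A minimal sketch of the cropping step under this coordinate convention (illustrative, not the patent's reference code; it pairs with the detect_static_margins sketch above):

```python
def crop_dynamic_area(frame, margins):
    """Crop away the static splicing area, keeping only the dynamic area.

    frame:   image array of shape (height, width) or (height, width, channels).
    margins: (top, bottom, left, right) static-area widths, e.g. as returned
             by the detect_static_margins sketch above.
    """
    top, bottom, left, right = margins
    height, width = frame.shape[0], frame.shape[1]
    # Upper-left corner of the dynamic area: (x0, y0) = (left, top);
    # lower-right corner: (x1, y1) = (width - right, height - bottom).
    x0, y0 = left, top
    x1, y1 = width - right, height - bottom
    return frame[y0:y1, x0:x1]   # widthC = x1 - x0, heightC = y1 - y0
```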
  • In some embodiments, an image feature vector is determined based on a corresponding cropping result in the following way: a new video image frame set is constructed from the multiple cropped image frames, where the image frames in the new video image frame set include only the dynamic splicing area, which is distinct from the static splicing area; then the image feature vectors corresponding to the image frames in the new video image frame set are determined.
  • In actual implementation, since each cropped image frame only includes the dynamic splicing area (distinct from the static splicing area), a new video image frame set can be constructed from all the cropped image frames, where the order of the image frames in the new video image frame set may be consistent with the order of the image frames in the original video image frame set.
  • For each image frame in the new video image frame set, a corresponding image feature vector can be determined.
  • determining the image feature vector corresponding to the image frame in the new video image frame set may be implemented in the following manner:
  • the convolution layer and the max pooling layer of the video information processing model are used to process the different image frames in the new video image frame set, obtaining down-sampling results for the different image frames;
  • the down-sampling results are normalized to obtain normalized results;
  • the normalized results are deeply decomposed by the video information processing model to obtain the image feature vectors matching the different image frames.
  • FIG. 7B is a schematic diagram of the processing process of the video information processing model in the embodiment of the present application. The image frames in the new video image frame set can be 0-1 vectorized by the vectorization model; the example process is as follows:
  • the image frames in the new video image frame set are input to the video information processing model to convert the image frames into image feature vectors represented as 0-1 vectors.
  • The input of the vectorization module is an image frame in the new video image frame set, and the output is a 1024-dimensional 0-1 feature vector (x0, x1, ..., x1023), xi ∈ {0, 1}; this 1024-dimensional feature vector is the image feature vector corresponding to (matching) the image frame.
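  • The following PyTorch sketch illustrates one way such a vectorization module can be organized: convolution and max pooling down-sample the frame, the result is normalized, and a sign-approaching tanh activation followed by thresholding yields the 1024-dimensional 0-1 vector. The layer sizes and the fully connected projection are assumptions for illustration; the patent does not fix an exact network definition here.

```python
import torch
import torch.nn as nn

class VectorizationModel(nn.Module):
    """Illustrative vectorization module: convolution and max pooling
    down-sample the frame, the result is normalized, and a 1024-d code
    is produced. Layer sizes are assumptions, not the patent's network."""

    def __init__(self, dim=1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # down-sampling step
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool2d((4, 4)),         # fixed-size pooled result
        )
        self.fc = nn.Linear(64 * 4 * 4, dim)

    def forward(self, x, beta=1.0):
        h = self.features(x).flatten(1)
        h = nn.functional.normalize(h, dim=1)     # normalization step
        return torch.tanh(beta * self.fc(h))      # -> sign(.) as beta grows

def to_01_vector(model, frame_batch):
    """Threshold the continuous codes into 1024-d 0-1 feature vectors."""
    with torch.no_grad():
        return (model(frame_batch, beta=10.0) > 0).to(torch.uint8)
```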
  • FIG. 8 is a schematic flowchart of training a video information processing model provided by an embodiment of the present application. It can be understood that the steps shown in FIG. 8 may be executed by various electronic devices running the video information processing apparatus, for example, a dedicated terminal, server, or server cluster with a video information processing function. The trained video information processing model can be deployed in the server to detect the similarity of videos uploaded by the terminal (client), so as to determine whether to perform compliance detection on the copyright information of the videos.
  • Of course, the trained video information processing model can also be deployed in the terminal. The description below is made in conjunction with the steps shown in FIG. 8.
  • Step 801 Obtain a first training sample set, wherein the first training sample set is a video processing sample with noise obtained from historical data.
  • Step 802 Perform denoising processing on the first training sample set to obtain a corresponding second training sample set.
  • In some embodiments, negative example processing may also be performed on the second training sample set to obtain a negative example sample set corresponding to the second training sample set, where the negative example sample set is used to adjust the corresponding network parameters of the video information processing model, and the value of the margin loss function is used as a supervision parameter to evaluate the processing results of the video information processing model.
  • In some embodiments, the video information processing model of the embodiments of the present application can be constructed based on the Siamese framework, so as to avoid problems such as the parameters of the linear transformation matrix in a classification network growing linearly with the number of categories, and the learned features having insufficient discriminative ability for open-set problems.
  • During training, the positive sample pairs used may include at least one of: real repeated images from the video information processing process, frames extracted from repeated videos, and image pairs obtained by different data augmentation processes.
  • Randomly matched image pairs of the same classification can be used as negative sample pairs, and a margin-based loss function can be used for training.
  • The margin-based loss function has the flexibility of the triplet loss while preserving a usable judgment threshold.
  • HashNet can be used for reference.
  • The network adopts an activation function that gradually approaches the sign function as training progresses; in this way, it avoids neurons sitting in the saturation region in the early stage of training, where learning cannot proceed by back-propagation. A sketch of both ideas follows below.
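  • A minimal sketch of these two ingredients, with an assumed loss form and schedule (the patent names the margin-based loss and the sign-approaching activation but does not give their exact formulas):

```python
import torch

def margin_pair_loss(code_a, code_b, is_positive, margin=0.5):
    """One plausible margin-based pair loss over codes in [-1, 1]: positive
    pairs are pulled within `margin` of each other, negative pairs pushed
    beyond it, so a fixed threshold stays usable at judgment time."""
    d = 1.0 - torch.nn.functional.cosine_similarity(code_a, code_b)  # pair distance
    pos = torch.clamp(d - margin, min=0.0)        # penalize distant positives
    neg = torch.clamp(2.0 * margin - d, min=0.0)  # penalize close negatives
    return torch.where(is_positive, pos, neg).mean()

def beta_schedule(step, total_steps, beta_max=10.0):
    """Grow beta so tanh(beta * x) gradually approaches the sign function,
    keeping neurons out of the saturation region early in training."""
    return 1.0 + (beta_max - 1.0) * step / total_steps
```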
  • In some embodiments, performing negative example processing on the second training sample set can be implemented in at least one of the following ways (a sketch follows this list):
  • randomly combining the image feature vectors in the video information processing model to obtain a negative sample set corresponding to the second training sample set;
  • randomly deleting parts of the image feature vectors of the video information processing model to obtain a negative sample set corresponding to the second training sample set; or
  • performing replacement processing on the image feature vectors of the video information processing model to obtain a negative sample set corresponding to the second training sample set.
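  • The following NumPy sketch illustrates the three constructions on a batch of feature vectors; the deletion rate and the width of the replaced slice are illustrative choices, not values from the patent:

```python
import numpy as np

def make_negative_examples(codes, rng=None):
    """Sketch of the three negative-example constructions listed above,
    applied to a batch of image feature vectors (one per row of `codes`)."""
    rng = rng or np.random.default_rng(0)
    n, d = codes.shape
    # 1) Random combination: pair each vector with another, randomly chosen one.
    combined = codes[rng.permutation(n)]
    # 2) Random deletion: zero out a random ~10% of the dimensions.
    deleted = codes * (rng.random(codes.shape) > 0.1)
    # 3) Replacement: overwrite part of each vector with values from another.
    replaced = codes.copy()
    replaced[:, : d // 4] = codes[rng.permutation(n), : d // 4]
    return combined, deleted, replaced
```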
  • Step 803 Process the second training sample set by the video information processing model to determine the initial parameters of the video information processing model.
  • Step 804 In response to the initial parameters of the video information processing model, the second training sample set is processed by the video information processing model to determine the update parameters of the video information processing model.
  • Step 805 According to the update parameters of the video information processing model, iteratively update the network parameters of the video information processing model through the second training sample set; wherein, the updated video information processing model is used to determine the image feature vector.
  • the trained video information processing model can be deployed in an electronic device (for example, a video server or a cloud server for video similarity judgment), so as to continue to perform the subsequent step 304 to realize the video similarity judgment.
  • Step 304 The video information processing apparatus determines the similarity between the first video and the second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
  • In actual implementation, the similarity between the first video and the second video can be determined based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
  • the image feature vector corresponding to the video refers to the image feature vector corresponding to the image frame in the video.
  • The image feature vector corresponding to a video can be pre-calculated and stored in the corresponding video server (for example, for videos whose copyright information has been verified); of course, the image feature vector corresponding to a video can also be calculated in real time, for example for a received video uploaded by a user.
  • In some embodiments, determining the similarity between the first video and the second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video can be achieved in the following way:
  • according to the similarity between the image feature vector of each image frame in the video image frame set corresponding to the first video and the image feature vector of each image frame in the video image frame set corresponding to the second video, determine a corresponding inter-frame similarity parameter set;
  • determine the number of image frames that reach the similarity threshold according to the inter-frame similarity parameter set;
  • determine the similarity between the first video and the second video based on the number of image frames that reach the similarity threshold.
  • For each image frame in the video image frame set corresponding to the first video, the similarity between its image feature vector and that of each image frame in the video image frame set corresponding to the second video may be determined respectively; then an inter-frame similarity parameter set is constructed from all the obtained similarities.
  • the similarity between the first video and the second video can be determined by a similarity calculation module.
  • the input of the similarity calculation module is the image feature vector to be compared, and the output is the similarity simscore between the two videos.
  • Taking the Hamming-distance parameter set {sim_00, sim_01, sim_02, ..., sim_(m-1)(n-1)} as an example, sim_ij represents the Hamming distance between the image feature vector of the (i+1)-th image frame of the first video and that of the (j+1)-th image frame of the second video; for example, sim_00 corresponds to the first image frame of the first video and the first image frame of the second video.
  • the number of image frames whose similarity reaches a similarity threshold is determined according to the set of inter-frame similarity parameters (the number of image frames here may refer to the number of image frame pairs). Based on the number of image frames reaching the similarity threshold, the similarity between the first video and the second video may be determined, wherein the greater the number of image frames reaching the similarity threshold, the greater the similarity between the first video and the second video.
  • In some embodiments, determining the similarity between the first video and the second video based on the number of image frames that reach the similarity threshold may be implemented in the following manner: according to the number of image frames that reach the similarity threshold, the number of image frames in the video image frame set corresponding to the first video, and the number of image frames in the video image frame set corresponding to the second video, determine the similarity between the first video and the second video.
  • product processing may be performed on the number of image frames in the set of video image frames corresponding to the first video and the number of image frames in the set of video image frames corresponding to the second video to obtain a product result. Then, the number of image frames reaching the similarity threshold is divided by the product result to obtain the similarity between the first video and the second video.
  • In actual implementation, the parameter set can be traversed, and for each traversed Hamming distance it is determined whether it is less than the specified threshold (less than the specified threshold means similar; the specified threshold is determined according to the actual business scenario), thereby obtaining the final number of similar image pairs C (that is, the number of image frames that reach the similarity threshold).
  • Assuming the video image frame set corresponding to the first video contains m image frames and that corresponding to the second video contains n image frames, the final similarity simscore between the first video and the second video can be calculated by the formula: simscore = C / (m × n).
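  • A compact NumPy sketch of this similarity computation (the Hamming threshold value is a placeholder to be set per business scenario):

```python
import numpy as np

def simscore(xa, xb, hamming_threshold=64):
    """Compute the video-level similarity from the 0-1 feature vectors.

    xa: (m, 1024) 0/1 array, one row per frame of the first video.
    xb: (n, 1024) 0/1 array, one row per frame of the second video.
    """
    # sim_ij: Hamming distance between frame i of video A and frame j of B.
    dist = (xa[:, None, :] != xb[None, :, :]).sum(axis=2)     # shape (m, n)
    c = int((dist < hamming_threshold).sum())  # frame pairs judged similar
    m, n = xa.shape[0], xb.shape[0]
    return c / (m * n)                         # simscore = C / (m x n)
```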
  • FIG. 9 is a schematic diagram of a process of video similarity judgment in an embodiment of the present application, which specifically includes:
  • Step 901 Denote the input video image frame set corresponding to the first video as A, and the input video image frame set corresponding to the second video as B.
  • Step 902 Perform static splicing region identification on both A and B.
  • Step 903: Perform cropping processing on A according to the static splicing area corresponding to A, and denote the cropped video image frame set as A'; perform cropping processing on B according to the static splicing area corresponding to B, and denote the cropped video image frame set as B'.
  • Step 904: Perform 0-1 vectorization on all image frames in A' to obtain the set XA (that is, the set formed by the image feature vectors corresponding to all image frames in A'; the same applies below); perform 0-1 vectorization on all image frames in B' to obtain the set XB.
  • Step 905 Calculate the simscore according to the XA set and the XB set, and judge whether the first video and the second video are similar according to the simscore, if they are similar, execute step 906 , otherwise, execute step 907 .
  • Step 906 Outputting that the first video is similar to the second video.
  • Step 907 Outputting that the first video is not similar to the second video.
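  • Put together, steps 901 to 907 compose the sketches above into a single judgment routine (illustrative; `vectorize` maps cropped frames to an (m, 1024) 0-1 array, e.g. via the vectorization sketch earlier, and the final simscore threshold is an assumed placeholder):

```python
import numpy as np

def preprocess_and_vectorize(frames, vectorize):
    """Steps 902-904 for one video: find margins, crop, 0-1 vectorize."""
    margins = detect_static_margins(np.asarray(frames))          # step 902
    cropped = [crop_dynamic_area(f, margins) for f in frames]    # step 903
    return vectorize(cropped)                                    # step 904

def videos_similar(frames_a, frames_b, vectorize, sim_threshold=0.1):
    """Steps 905-907: compare the XA and XB sets and output the judgment."""
    xa = preprocess_and_vectorize(frames_a, vectorize)
    xb = preprocess_and_vectorize(frames_b, vectorize)
    return simscore(xa, xb) >= sim_threshold
```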
  • In some embodiments, when it is determined that the first video and the second video are similar, the copyright information of the first video and the copyright information of the second video are obtained, and the legitimacy of the first video is determined from the two; when the copyright information of the first video is inconsistent with the copyright information of the second video, a warning message is issued.
  • In actual implementation, when the first video and the second video are similar, the copyright information of the first video and the copyright information of the second video can be obtained, and the legitimacy of the first video can be determined through the copyright information of the first video and the copyright information of the second video.
  • When the copyright information of the first video is consistent with the copyright information of the second video, it is determined that the first video is legal; when the copyright information of the first video is inconsistent with the copyright information of the second video, it is determined that the first video is illegal.
  • When the first video is determined to be illegal, warning information may also be issued.
  • the legality of the second video may also be determined on the premise that the first video is legal by default.
  • In some embodiments, when it is determined that the first video and the second video are dissimilar, the first video is added to the video source; the recall order of all videos to be recommended in the video source is sorted, and videos are recommended to the target user based on the ranking result of the recall order of the videos to be recommended.
  • In actual implementation, the first video can be added to the video source as a video to be recommended; of course, the second video can also be added to the video source here.
  • When video recommendation needs to be performed, the recall order of all videos to be recommended in the video source can be sorted, and video recommendation is performed for the target user based on the sorting result.
  • In some embodiments, the identification, copyright information, and image feature vector of a video can be stored in the blockchain network or a cloud server, so as to facilitate the judgment of video similarity and legality.
  • In some embodiments, the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video may also be sent to the blockchain network, so that the nodes of the blockchain network fill the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video into a new block, and when a consensus is reached on the new block, append the new block to the end of the blockchain.
  • the method further includes:
  • receiving data synchronization requests from other nodes in the blockchain network, and verifying the permissions of those nodes in response to the data synchronization requests; when the permissions of the other nodes pass verification, controlling data synchronization between the current node and the other nodes, so that the other nodes can obtain the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video.
  • In some embodiments, the method further includes: in response to a query request, parsing the query request to obtain a corresponding object identifier (such as a user identifier); obtaining authority information in the target block in the blockchain network according to the object identifier; verifying whether the authority information matches the object identifier; when the authority information matches the object identifier, obtaining the identification of the corresponding video, the image feature vector corresponding to the video, and the copyright information of the video from the blockchain network; and pushing the obtained identification of the corresponding video, the image feature vector corresponding to the video, and the copyright information of the video to the corresponding client, so that the client can obtain the identification of the corresponding video, the image feature vector corresponding to the video, and the copyright information of the video saved in the blockchain network.
  • FIG. 10 is a schematic diagram of the architecture of the video information processing apparatus 100 provided by the embodiment of the present application, including a blockchain network 200 (including a plurality of consensus nodes; consensus node 210 is shown in FIG. 10 as an example), an authentication center 300, a business entity 400, and a business entity 500, which will be described separately below.
  • the type of the blockchain network 200 is flexible and diverse, for example, it can be any one of a public chain, a private chain or a consortium chain.
  • Taking a public chain as an example, the electronic equipment (such as user terminals and servers) of any business entity can access the blockchain network 200 without authorization; taking a consortium chain as an example, after a business entity obtains authorization, the electronic equipment (for example, a terminal/server) under its jurisdiction can access the blockchain network 200, at which point it becomes a client node in the blockchain network 200.
  • In some embodiments, the client node may serve only as an observer of the blockchain network 200, that is, provide the function of supporting business entities in initiating transactions (for example, storing data on the chain or querying data on the chain); as for the functions of the consensus node 210 of the blockchain network 200, the client node can implement them by default or selectively (for example, depending on the specific business needs of the business entity). Therefore, the data and business processing logic of the business entity can be migrated into the blockchain network 200 to the greatest extent, and the trustworthiness and traceability of the data and business processing process can be realized through the blockchain network 200.
  • The consensus nodes in the blockchain network 200 receive transactions submitted by the client nodes of different business entities (for example, the client node 410 of the business entity 400 and the client node 510 of the business entity 500 shown above), execute the transactions to update the ledger or query the ledger, and various intermediate or final results of executing the transactions can be returned to the business entity's client node for display.
  • client nodes 410/510 can subscribe to events of interest in the blockchain network 200, such as transactions occurring in a specific organization/channel in the blockchain network 200, and the consensus node 210 pushes corresponding transaction notifications to the client node 410/510, thereby triggering the corresponding business logic in the client node 410/510.
  • the following describes an exemplary application of the blockchain network by taking multiple business entities accessing the blockchain network to realize the management of video-related information as an example.
  • Among the plurality of business entities involved in the management process, the business entity 400 may be a video information processing apparatus, and the business entity 500 may be a display system with a video information processing function.
  • Each business entity obtains a digital certificate from the certification center 300; the digital certificate includes the public key of the business entity and the digital signature made by the certification center 300 on the public key and identity information of the business entity. The digital certificate is attached to a transaction together with the business entity's digital signature for that transaction and sent to the blockchain network, so that the blockchain network can extract the digital certificate and signature from the transaction, verify the reliability of the message (that is, whether it has been tampered with) and the identity information of the business entity sending the message, and verify the business entity's permissions, such as the permission to initiate transactions.
  • Clients running on electronic devices (such as terminals or servers) under the jurisdiction of the business entity can request access to the blockchain network 200 to become client nodes.
  • The client node 410 of the business entity 400 is used to send the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video to the blockchain network, so that the nodes of the blockchain network fill the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video into a new block, and, when a consensus is reached on the new block, append the new block to the end of the blockchain.
  • When the identification of the corresponding video, the image feature vector corresponding to the video, and the copyright information of the video are sent to the blockchain network 200, business logic can be set in the client node 410 in advance. For example, taking the first video as an example, when it is determined that the first video is not similar to the second video, the client node 410 automatically sends the identification of the first video to be processed, the image feature vector corresponding to the first video, and the copyright information of the first video to the blockchain network 200. Alternatively, business personnel of the business entity 400 can log in to the client node 410, manually package the identification of the first video, the image feature vector corresponding to the first video, and the copyright information of the first video, and send them to the blockchain network 200.
  • When sending, the client node 410 generates a transaction corresponding to the update operation according to the identification of the first video, the image feature vector corresponding to the first video, and the copyright information of the first video; specifies in the transaction the smart contract that needs to be called to implement the update operation, as well as the parameters passed to the smart contract; the transaction also carries the digital certificate of the client node 410 and a signed digital signature (for example, obtained by encrypting the transaction digest with the private key in the digital certificate of the client node 410); and broadcasts the transaction to the consensus nodes 210 in the blockchain network 200.
  • When the consensus node 210 in the blockchain network 200 receives the transaction, it verifies the digital certificate and digital signature carried by the transaction; after that verification succeeds, it confirms, according to the identity of the business entity 400 carried in the transaction, whether the business entity 400 has transaction authority. Failure of either the digital signature verification or the authority verification causes the transaction to fail. After verification succeeds, the consensus node 210 appends its own digital signature (for example, obtained by encrypting the transaction digest with the private key of the consensus node 210) and continues to broadcast the transaction in the blockchain network 200.
  • After receiving a successfully verified transaction, the consensus node 210 in the blockchain network 200 fills the transaction into a new block and broadcasts it. When broadcasting a new block, the consensus nodes 210 in the blockchain network 200 perform a consensus process on it; if consensus succeeds, the new block is appended to the end of the blockchain stored by each node, the state database is updated according to the result of the transaction, and the transaction in the new block is executed: for a transaction that submits an update of the identification of the first video to be processed, the image feature vector corresponding to the first video, and the copyright information of the first video, a key-value pair containing the identification of the first video, the image feature vector corresponding to the first video, and the copyright information of the first video is added to the state database.
  • Business personnel of the business entity 500 log in to the client node 510 and input a query request for the identification of the first video, the image feature vector corresponding to the first video, and the copyright information of the first video; the client node 510 generates a transaction corresponding to the query operation according to the query request.
  • The smart contract that needs to be called to implement the update operation/query operation and the parameters passed to the smart contract are specified in the transaction; the transaction also carries the digital certificate of the client node 510 and a signed digital signature (for example, the digest of the transaction is encrypted using the private key in the digital certificate of the client node 510), and the transaction is broadcast to the consensus nodes 210 in the blockchain network 200.
  • The consensus node 210 in the blockchain network 200 receives the transaction, verifies it, fills it into a block, reaches consensus on it, appends the filled new block to the end of the blockchain it stores, updates the state database according to the result of the transaction, and executes the transaction in the new block: for a submitted transaction that updates the copyright information of a certain video, the key-value pair corresponding to that video's copyright information in the state database is updated; for a submitted transaction that queries the copyright information of a certain video, the key-value pairs corresponding to the identification of the first video, the image feature vector corresponding to the first video, and the copyright information of the first video are queried from the state database, and the transaction result is returned.
  • FIG. 10 exemplarily shows the process of directly uploading the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video onto the chain. In other embodiments, the client node 410 may instead pair the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video with their hashes, upload the hashes to the chain, and store the original data off-chain.
  • the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video are stored in a distributed file system or database.
  • After the client node 510 obtains the identification of the video, the image feature vector corresponding to the video, and the copyright information of the video from the distributed file system or database, it can verify them in combination with the corresponding hashes in the blockchain network 200, thereby reducing the workload of on-chain operations.
  • FIG. 11 is a schematic structural diagram of a blockchain in the blockchain network 200 provided by an embodiment of the present application.
  • The header of each block may include the hash of all transactions in the block, and also contains the hash of all transactions in the previous block.
  • Records of newly generated transactions are filled into a block and, after the consensus of the nodes in the blockchain network, appended to the end of the blockchain; chain growth is thus formed, and the chain structure based on hash values between blocks ensures that transactions in the blocks are tamper-proof and forgery-proof.
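  • A minimal sketch of this hash chaining (illustrative; the serialization and the SHA-256 choice are assumptions, not the patent's specification):

```python
import hashlib
import json

def block_hash(transactions, prev_hash):
    """The header hash covers all transactions in the block plus the previous
    block's hash, so altering any historical transaction breaks every later
    link in the chain."""
    payload = json.dumps({"tx": transactions, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Chain growth: each appended block commits to its predecessor.
genesis_hash = block_hash(["tx-0"], "0" * 64)
block_1_hash = block_hash(["video id", "image feature vector", "copyright info"],
                          genesis_hash)
```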
  • FIG. 12 is a schematic diagram of the functional architecture of the blockchain network 200 provided by the embodiments of the present application, including an application layer 201, a consensus layer 202, a network layer 203, a data layer 204, and a resource layer 205, which will be described separately below.
  • the resource layer 205 encapsulates the computing, storage, and communication resources that implement each consensus node 210 in the blockchain network 200.
  • the data layer 204 encapsulates the various data structures that implement the ledger, including a blockchain implemented as files in a file system, a key-value state database, and proofs of existence (e.g., hash trees of the transactions in blocks).
  • the network layer 203 encapsulates the point-to-point (P2P) network protocol, the data propagation and data verification mechanisms, the access authentication mechanism, and business-entity identity management.
  • the P2P network protocol implements communication between the consensus nodes 210 in the blockchain network 200; the data propagation mechanism ensures that transactions are propagated in the blockchain network 200; and the data verification mechanism uses cryptographic methods (such as digital certificates, digital signatures, and public/private key pairs) to ensure the reliability of data transmitted between consensus nodes 210.
  • the access authentication mechanism authenticates the identity of a business entity joining the blockchain network 200 according to the actual business scenario and, when authentication passes, grants the business entity permission to access the blockchain network 200; business-entity identity management stores the identities and permissions of the business entities allowed to access the blockchain network 200 (for example, the types of transactions they can initiate).
  • the consensus layer 202 encapsulates the mechanism by which the consensus nodes 210 in the blockchain network 200 reach agreement on blocks (i.e., the consensus mechanism), as well as transaction management and ledger management functions.
  • the consensus mechanism includes consensus algorithms such as POS, POW, and DPOS, and supports the pluggability of consensus algorithms.
  • transaction management is used to verify the digital signature carried in a transaction received by the consensus node 210, verify the identity information of the business entity, and determine from that identity information (read from business-entity identity management) whether the entity has permission to conduct the transaction; every business entity authorized to access the blockchain network 200 holds a digital certificate issued by the certification center and uses the private key in its certificate to sign submitted transactions, thereby declaring its legal identity.
  • Ledger management is used to maintain the blockchain and state database.
  • a block on which consensus has been reached is appended to the end of the blockchain; the transactions in the block are then executed: when a transaction includes an update operation, the key-value pairs in the state database are updated, and when a transaction includes a query operation, the state database is queried and the query result is returned to the client node of the business entity.
  • query operations on the state database are supported in multiple dimensions, including: querying a block by block vector number (for example, a transaction hash value); querying a block by block hash value; querying a block by transaction vector number; querying a transaction by transaction vector number; querying the account data of a business entity by its account (vector) number; and querying the blockchain in a channel by channel name.
  • the application layer 201 encapsulates the various services that the blockchain network can implement, including transaction traceability, evidence storage, and verification.
  • thus, the copyright information of the first video that has undergone similarity identification can be stored in the blockchain network; when a new user uploads a video to the video server, the video server can invoke the copyright information in the blockchain network (the video uploaded by the user can then serve as the second video) to verify the copyright compliance of the video.
  • FIG. 13 is a schematic diagram of a usage scenario of the video information processing method provided by the embodiment of the application.
  • the terminals (such as the terminal 10-1 and the terminal 10-2) are provided with software clients capable of displaying short videos, such as a short-video playback client or plug-in, through which the user can obtain and display videos; the terminals connect to the short video server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two.
  • users can also upload videos through the terminal for other users in the network to watch. In this process, the operator's video server needs to detect the videos uploaded by users, compare and analyze the video information, determine whether the copyright of each uploaded video is compliant, and recommend compliant videos to different users to prevent users' short videos from being pirated.
  • Case a: the dynamic splicing areas of two videos are similar while their static splicing areas are dissimilar. Such videos need to be judged as similar in the information-flow scenario. Owing to interference from the static splicing areas, the related art does not specially identify such videos and may determine that the pair is dissimilar. If a large number of such duplicate videos appear in information-flow recommendations, user experience degrades and user complaints follow. Moreover, to evade copyright, many re-uploaders apply different patches to the static splicing area of a video so as to evade similarity identification; under the related art this damages the copyright holder's rights and causes serious negative impact.
  • Case b: the dynamic splicing areas of two videos are dissimilar while their static splicing areas are similar. Such videos need to be judged as dissimilar in the information-flow scenario. Affected by the static splicing areas, the related art may determine that the videos are similar; if an account owner in the information-flow business publishes special-effects videos of the same type (with identical static splicing areas) and they are misidentified as similar, the videos will not be recommended by the information flow, harming the owner's revenue and followers, triggering complaints from the owner, and negatively affecting the business.
  • FIG. 14 is a schematic diagram of the use process of the video information processing method in an embodiment of the present application, which includes the following steps:
  • Step 1401: Acquire two different video image frame sets, and identify the static splicing areas in the different video image frame sets.
  • Step 1402: Crop the video image frame sets according to the identified static splicing areas to obtain new video image frame sets.
  • Step 1403: Perform 0/1 vector conversion on all image frames in the new video image frame sets to obtain corresponding feature vector sets (i.e., sets of image feature vectors), and perform similarity judgment based on the feature vector sets to obtain the similarity result between the different videos.
  • Step 1404: Perform video recommendation or warning processing based on the similarity result between the different videos.
  • FIG. 15 is a schematic diagram of image cropping performed by the video information processing method in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of image cropping performed by the video information processing method in an embodiment of the present application.
  • with the video information processing method provided by this application, when the dynamic splicing areas of two videos are similar and their static splicing areas are dissimilar, the two videos can be determined to be similar in the information-flow scenario (as shown in FIG. 16); copyright infringement may therefore have occurred in one of the videos, and the corresponding copyright certificate provided by the uploading user of that video is required for subsequent processing.
  • conversely, when the dynamic splicing areas of two videos are dissimilar and their static splicing areas are similar, the two videos can be judged to be dissimilar in the information-flow scenario (as shown in FIG. 15); both can then be regarded as videos to be recommended and recommended to different users in the network.
  • the software modules stored in the video information processing apparatus 2020 of the memory 202 may include: an information transmission module 2081, configured to determine a video image frame set corresponding to a video; and an information processing module 2082, configured to determine the static splicing area corresponding to the image frames in the video image frame set, crop the image frames in the video image frame set according to the static splicing area, determine image feature vectors based on the corresponding cropping results, and determine the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
  • the information transmission module 2081 is further configured to: parse the video to obtain its timing information; parse the video parameters corresponding to the video according to the timing information to obtain the playback-duration parameter and picture-area parameter corresponding to the video; and extract frames from the video based on these parameters to obtain the video image frame set corresponding to the video.
  • the information processing module 2082 is further configured to: perform grayscale processing on the image frames in the video image frame set; perform horizontal equal-interval sampling and vertical equal-interval sampling on the grayscale image frames; determine the time-dimension variance sum of the horizontal sampling results and the time-dimension variance sum of the vertical sampling results; determine the vertical size parameters of the static splicing area of the image frames based on the time-dimension variance sum of the horizontal sampling results; and determine the horizontal size parameters of the static splicing area based on the time-dimension variance sum of the vertical sampling results.
  • the information processing module 2082 is further configured to: construct a new video image frame set from the cropped image frames, where the image frames in the new set contain the dynamic splicing area as distinct from the static splicing area; and determine the image feature vectors corresponding to the image frames in the new video image frame set.
  • the information processing module 2082 is further configured to: alternately process the different image frames in the new video image frame set through the convolution layers and max-pooling layers of the video information processing model to obtain down-sampling results for the different image frames; normalize the down-sampling results through the fully connected layer of the model to obtain normalized results; and perform deep decomposition on the normalized results through the model to obtain image feature vectors matching the different image frames.
  • the information processing module 2082 is further configured to: determine the size parameters of the dynamic splicing area corresponding to an image frame according to the size parameters of the image frame in the video image frame set and the size parameters of the static splicing area corresponding to the image frame, where the size parameters include vertical and horizontal size parameters; and crop the dynamic splicing area out of the image frame according to the size parameters of the dynamic splicing area, to serve as the cropped image frame.
  • the information processing module 2082 is further configured to: obtain a first training sample set, which consists of noisy video processing samples obtained from historical data; denoise the first training sample set to obtain a corresponding second training sample set; process the second training sample set through the video information processing model to determine the model's initial parameters; in response to the initial parameters, process the second training sample set through the model to determine the model's update parameters; and iteratively update the network parameters of the model through the second training sample set according to the update parameters, where the updated video information processing model is used to determine image feature vectors.
  • the information processing module 2082 is further configured to: perform negative-example processing on the second training sample set to obtain a corresponding negative-example sample set; and determine the corresponding margin loss function value according to the negative-example sample set, where the margin loss function value serves as a supervision parameter for evaluating the processing results of the video information processing model.
  • the information processing module 2082 is further configured to perform at least one of the following: randomly combine the image feature vectors in the video information processing model to obtain a negative-example sample set corresponding to the second training sample set; randomly delete image feature vectors of the model to obtain such a negative-example sample set; or replace image feature vectors of the model to obtain such a negative-example sample set.
  • the information processing module 2082 is further configured to: determine a corresponding inter-frame similarity parameter set according to the similarity between the image feature vector of each image frame in the video image frame set corresponding to the first video and that of each image frame in the video image frame set corresponding to the second video; determine, from the inter-frame similarity parameter set, the number of image frames reaching a similarity threshold; and determine the similarity between the first video and the second video based on that number.
  • the information processing module 2082 is further configured to: determine the similarity between the first video and the second video based on the number of image frames reaching the similarity threshold, the number of image frames in the video image frame set corresponding to the first video, and the number of image frames in the video image frame set corresponding to the second video.
  • the information processing module 2082 is further configured to: when the first video is determined, according to the similarity, to be similar to the second video, obtain the copyright information of the first video and the copyright information of the second video; determine the legality of the first video from the two pieces of copyright information; and issue a warning message when the copyright information of the first video is inconsistent with that of the second video.
  • the information processing module 2082 is further configured to: when the first video is determined, according to the similarity, to be dissimilar to the second video, add the first video to the video source; sort the recall order of all videos to be recommended in the video source; and recommend videos to the target user based on the sorted recall order.
  • the information processing module 2082 is further configured to: send the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video to the blockchain network, so that the nodes of the blockchain network fill the identifier, the image feature vector, and the copyright information into a new block and, when consensus on the new block is reached, append the new block to the end of the blockchain.
  • the information processing module 2082 is further configured to: receive data synchronization requests from other nodes in the blockchain network; verify the permissions of the other nodes in response to the requests; and, when the permissions pass verification, control data synchronization between the current node and the other nodes so that the other nodes obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
  • the information processing module 2082 is further configured to: in response to a query request, parse the request to obtain the corresponding object identifier; obtain the permission information in the target block of the blockchain network according to the object identifier; check whether the permission information matches the object identifier; when they match, obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video from the blockchain network; and send them to the corresponding client so that the client obtains the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
  • in summary, this application distinguishes the dynamic splicing area of a video from its static splicing area, crops accordingly, and uses the resulting image feature vectors to determine video similarity, which improves the accuracy of video-similarity judgment and reduces misjudgments caused by occlusion and coverage from the static splicing area.


Abstract

This application provides a video information processing method, apparatus, electronic device, and storage medium. The method includes: determining a video image frame set corresponding to a video; determining the static splicing area corresponding to the image frames in the video image frame set; cropping the image frames in the video image frame set according to the static splicing area, and determining image feature vectors based on the corresponding cropping results; and determining the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.

Description

Video information processing method and apparatus, electronic device, and storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 202010847943.3, filed on August 21, 2020, the entire content of which is incorporated herein by reference.
Technical field
This application relates to video information processing technology, and in particular to a video information processing method and apparatus, an electronic device, and a storage medium.
Background
Under traditional technical conditions, demand for video information has grown explosively, and traditional information processing techniques can no longer meet the need to judge the similarity of such large volumes of video information. Describing the similarity between videos is very simple for humans but a very difficult task for machines: it requires a machine to cross the semantic gap of image understanding and to correctly fuse the two technologies of computer vision and natural language processing.
To this end, artificial intelligence (AI) technology has been introduced. AI is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In the field of video recognition processing, videos can be recognized by digital computers or machines controlled by digital computers. Research in this direction has attracted wide attention and enables effective applications in security, smart home, healthcare, teaching, and other fields, but in this process the accuracy of video-similarity recognition remains low.
Summary
The technical solutions of the embodiments of this application are implemented as follows.
An embodiment of this application provides a video information processing method, executed by an electronic device, the method including:
determining a video image frame set corresponding to a video;
determining a static splicing area corresponding to the image frames in the video image frame set;
cropping the image frames in the video image frame set according to the static splicing area, and determining image feature vectors based on the corresponding cropping results;
determining the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
An embodiment of this application further provides a video information processing apparatus, including:
an information transmission module, configured to determine a video image frame set corresponding to a video;
an information processing module, configured to:
determine a static splicing area corresponding to the image frames in the video image frame set;
crop the image frames in the video image frame set according to the static splicing area, and determine image feature vectors based on the corresponding cropping results;
determine the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
An embodiment of this application further provides an electronic device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the foregoing video information processing method when running the executable instructions stored in the memory.
An embodiment of this application further provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the foregoing video information processing method.
Brief description of the drawings
FIG. 1 is a schematic diagram of a usage environment of a video information processing method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the composition and structure of an electronic device provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a video information processing method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the dynamic splicing area and the static splicing area of video image frames in an embodiment of this application;
FIG. 5 is a schematic diagram of the dynamic splicing area and the static splicing area of video image frames in an embodiment of this application;
FIG. 6 is a schematic diagram of the process of determining a static splicing area in an embodiment of this application;
FIG. 7A is a schematic diagram of image frame cropping in an embodiment of this application;
FIG. 7B is a schematic diagram of the processing procedure of a video information processing model in an embodiment of this application;
FIG. 8 is a schematic flowchart of training a video information processing model provided by an embodiment of this application;
FIG. 9 is a schematic diagram of the video-similarity judgment process in an embodiment of this application;
FIG. 10 is a schematic architecture diagram of a video information processing apparatus 100 provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a blockchain in the blockchain network 200 provided by an embodiment of this application;
FIG. 12 is a schematic diagram of the functional architecture of the blockchain network 200 provided by an embodiment of this application;
FIG. 13 is a schematic diagram of a usage scenario of the video information processing method provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the use process of the video information processing method in an embodiment of this application;
FIG. 15 is a schematic diagram of image cropping by the video information processing method in an embodiment of this application;
FIG. 16 is a schematic diagram of image cropping by the video information processing method in an embodiment of this application.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting this application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
In the following description, "some embodiments" describes a subset of all possible embodiments; it may denote the same subset or different subsets of all possible embodiments, and these may be combined with one another where no conflict arises.
Before the embodiments of this application are described in further detail, the nouns and terms involved in the embodiments are explained; the following interpretations apply.
1) In response to: indicates the condition or state on which an executed operation depends. When the condition or state is satisfied, the one or more executed operations may be performed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which multiple operations are executed.
2) Video: broadly refers to video information of various forms obtainable on the Internet. The first video and the second video below are named differently only for ease of distinction.
3) Client: the carrier of a specific function in a terminal; for example, a mobile client (APP) is the carrier of specific functions in a mobile terminal, such as live streaming (video push) or online video playback.
4) Dynamic splicing area (dynamic video area): the picture area that changes as the timeline advances during video playback; this part is the main content of the video.
5) Static splicing area (static video area): the picture area that does not change as the timeline advances during video playback.
6) Information flow: a form of content organization arranged from top to bottom in a particular layout; in terms of display ordering, time ordering, popularity ordering, algorithmic ordering, and the like may be applied.
7) Image feature vector: the 0/1 vector of an image, i.e., a binarized feature vector generated from the image.
8) Dynamic-static spliced video: a video in which part of the picture does not change as the timeline advances during playback, i.e., the image frames of the video contain both a dynamic splicing area and a static splicing area.
FIG. 1 is a schematic diagram of a usage scenario of the video information processing method provided by an embodiment of this application. Referring to FIG. 1, the terminals (such as the terminal 10-1 and the terminal 10-2) are provided with clients capable of performing different functions. Using the business processes in the clients, the terminals obtain different videos for browsing from the corresponding server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two. The types of video the terminals obtain from the server 200 through the network 300 are not limited: the terminals may obtain videos carrying video information or corresponding video links, or obtain corresponding videos containing only text or images for browsing. The server 200 may store videos of different types; this application does not distinguish the compilation environments of different video types. In this process, the videos pushed to a user's client should be copyright-compliant; for the large number of videos involved, it is therefore necessary to judge which videos are similar, and further to run compliance checks on the copyright information of similar videos.
Taking short videos as an example, the embodiments of this application can be applied to short-video playback. During playback, short videos from different data sources are typically processed, and the videos to be recommended to the corresponding user are finally presented on the user interface (UI). If a recommended video is a pirated, non-compliant one, user experience suffers. The backend database for video playback receives large volumes of video data from different sources every day. The resulting videos recommended to target users can also be called by other applications (for example, the recommendation results of a short-video recommendation process can be migrated to a long-video or news recommendation process), and the video information processing model matched to a target user can likewise be migrated to different video recommendation processes (for example, a web video recommendation process, a mini-program video recommendation process, or the video recommendation process of a long-video client).
In some embodiments, the video information processing method provided by the embodiments of this application may be implemented by a terminal. For example, the terminals (such as the terminal 10-1 and the terminal 10-2) may implement the video information processing solution locally.
In some embodiments, the method may be implemented by a server. For example, the server 200 may implement the video information processing solution.
In some embodiments, the method may be implemented by a terminal and a server cooperatively. For example, the terminals (such as the terminal 10-1 and the terminal 10-2) may send requests to the server 200 to request that it implement the video information processing solution; the server 200 may send the resulting videos to be recommended to the terminals for video recommendation.
The structure of the electronic device of the embodiments of this application is described in detail below. The electronic device may be implemented in various forms, such as a dedicated terminal with a video information processing function (for example, a gateway), or a server with a video information processing function, such as the server 200 in FIG. 1. FIG. 2 is a schematic diagram of the composition and structure of the electronic device provided by an embodiment of this application. It should be understood that FIG. 2 shows only an exemplary, not exhaustive, structure; part or all of the structure shown in FIG. 2 may be implemented as needed.
The electronic device provided by the embodiments of this application includes at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The components of the electronic device 20 are coupled together by a bus system 205, which implements connection and communication between them. In addition to a data bus, the bus system 205 includes a power bus, a control bus, and a status signal bus; for clarity, all buses are labeled as the bus system 205 in FIG. 2.
The user interface 203 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, touch screen, or the like.
The memory 202 may be a volatile memory or a non-volatile memory, or may include both. The memory 202 in the embodiments of this application can store data to support the operation of the terminals (such as the terminal 10-1 and the terminal 10-2). Examples of such data include any computer program for operating on the terminals, such as an operating system and application programs. The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing basic services and processing hardware-based tasks; the application programs may include various applications.
In some embodiments, the video information processing apparatus provided by the embodiments of this application may be implemented by a combination of software and hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to execute the video information processing method provided by the embodiments of this application. For example, a processor in the form of a hardware decoding processor may adopt one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic elements.
As an example of implementation by a combination of software and hardware, the video information processing apparatus provided by the embodiments of this application may be embodied directly as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium within the memory 202; the processor 201 reads the executable instructions of the software modules in the memory 202 and, in combination with the necessary hardware (for example, the processor 201 and other components connected to the bus 205), completes the video information processing method provided by the embodiments of this application.
As an example, the processor 201 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
As an example of hardware implementation, the apparatus provided by the embodiments of this application may be executed directly by the processor 201 in the form of a hardware decoding processor, for example by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, or other electronic elements executing the video information processing method provided by the embodiments of this application.
The memory 202 in the embodiments of this application stores various types of data to support the operation of the electronic device 20. Examples of such data include any executable instructions for operation on the electronic device 20; a program implementing the video information processing method of the embodiments of this application may be contained in these executable instructions.
In other embodiments, the video information processing apparatus provided by the embodiments of this application may be implemented in software. FIG. 2 shows the video information processing apparatus 2020 stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, and includes a series of modules. As an example of a program stored in the memory 202, the video information processing apparatus 2020 includes the following software modules: an information transmission module 2081 and an information processing module 2082. When these software modules are read into RAM by the processor 201 and executed, the video information processing method provided by the embodiments of this application is implemented.
An embodiment of this application further provides a computer program product or computer program including computer instructions (executable instructions) stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to execute the video information processing method provided by the embodiments of this application.
The video information processing method provided by the embodiments of this application is described with reference to the electronic device 20 shown in FIG. 2. See FIG. 3, a schematic flowchart of the video information processing method provided by an embodiment of this application. It should be understood that the steps shown in FIG. 3 may be executed by various electronic devices running the video information processing apparatus, for example a dedicated terminal, server, or server cluster with a video information processing function. The steps shown in FIG. 3 are described below.
Step 301: The video information processing apparatus determines a video image frame set corresponding to a video.
Here, the video is acquired and the video image frame set corresponding to it is determined. The video image frame set may include some of the image frames in the video or all of them. The image frames in the video image frame set contain a dynamic splicing area and a static splicing area.
For example, referring to FIG. 4 and FIG. 5, FIG. 4 shows different image frames of one video, namely the image frame 41 and the image frame 42; FIG. 5 shows different image frames of another video, namely the image frame 51 and the image frame 52. During playback, the picture area that changes as the timeline advances is the dynamic splicing area; it is the main content of the video and can be used to judge whether videos are similar. The picture area that does not change as the timeline advances is the static splicing area; when it is too large or occludes content, it often causes misjudgments of whether videos are similar. In FIG. 4 and FIG. 5, the upper and lower outlined areas are static splicing areas and the middle outlined area is the dynamic splicing area. As these examples show, the static splicing area can occupy more than 30% of the area of the whole frame, so the useless static splicing area in a video inevitably interferes strongly with video-similarity comparison. The embodiments of this application can identify and crop the static splicing area so as to strip out the true main content of the video (the dynamic splicing area), making similarity identification more accurate, as elaborated below.
In some embodiments of this application, determining the video image frame set corresponding to the video may be implemented as follows:
parse the video to obtain its timing information; parse the video parameters corresponding to the video according to the timing information to obtain the playback-duration parameter and picture-area parameter corresponding to the video; and extract frames from the video based on the playback-duration parameter and picture-area parameter to obtain the video image frame set corresponding to the video.
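For ease of understanding, a minimal Python sketch of such timed frame extraction follows; it assumes OpenCV is available, and the uniform sampling policy, the default of 30 frames, and the name extract_frames are illustrative choices rather than anything specified by this application.

    import cv2

    def extract_frames(video_path: str, num_frames: int = 30):
        """Uniformly sample num_frames frames across the playback duration.

        The playback-duration and picture-area parameters are read from the
        container via OpenCV; the uniform-sampling policy is an assumption.
        """
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # frames over the whole duration
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))   # picture-area parameters
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        frames = []
        for i in range(num_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames, (width, height)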
Step 302: The video information processing apparatus determines the static splicing area corresponding to the image frames in the video image frame set.
For example, for each image frame in the video image frame set, the static splicing area corresponding to that frame is determined.
In some embodiments of this application, determining the static splicing area corresponding to the image frames in the video image frame set may be implemented as follows:
perform grayscale processing on the image frames in the video image frame set; perform horizontal equal-interval sampling and vertical equal-interval sampling on the grayscale image frames; determine the time-dimension variance sum of the horizontal sampling results and the time-dimension variance sum of the vertical sampling results; determine the vertical size parameters of the static splicing area of the image frames based on the time-dimension variance sum of the horizontal sampling results; and determine the horizontal size parameters of the static splicing area based on the time-dimension variance sum of the vertical sampling results.
For example, referring to FIG. 6, a schematic diagram of the process of determining the static splicing area, the identification of dynamic-static splicing areas (including the dynamic splicing area and the static splicing area) in the video image frame set can be implemented by a dynamic-static splicing identification and positioning module (encapsulating a dynamic-static identification algorithm). This module identifies image frames based on the algorithm; its input is the image frames extracted from the video in chronological order, and its output is the vertical size parameters of the static splicing area of the frames (e.g., the widths of the upper and lower areas) and the horizontal size parameters (e.g., the widths of the left and right areas). The algorithm processes the frames as shown in FIG. 6. Assume the video image frame set contains n image frames, all with the same size parameters (width, i.e., horizontal size, of w pixels; height, i.e., vertical size, of h pixels). The example process is as follows: 1) Grayscale: convert each frame to a grayscale image; for ease of notation, denote the grayscale images of all frames in the set as M_{n×w×h}. 2) Sampling: for all grayscale images, sample k columns of pixels at equal horizontal intervals, denoting the result W_{n×k×h}; likewise sample k rows of pixels at equal vertical intervals, denoting the result H_{n×w×k}. 3) Compute the time-dimension variance sums: compute the variance of W_{n×k×h} along dimension 0 (the n dimension) and sum along dimension 1 (the k dimension), denoting the result vector var_h; likewise compute the variance of H_{n×w×k} along dimension 0 and sum along dimension 2, denoting the result vector var_w. 4) Detect margins: for var_h, traverse from front to back and from back to front to find the first positions top and bottom whose time-dimension variance sum exceeds a threshold T; these serve as the upper and lower margins of the video's static splicing area (the upper margin is the width of the upper area, the lower margin the width of the lower area). Likewise, for var_w, traverse from front to back and from back to front to find the first positions left and right exceeding T, serving as the left and right margins of the static splicing area (the widths of the left and right areas).
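A minimal NumPy sketch of steps 1) to 4) follows; the sampling count k and the threshold T are illustrative defaults, not values fixed by this application.

    import numpy as np

    def detect_static_margins(gray_frames: np.ndarray, k: int = 32, T: float = 10.0):
        """Locate the static splicing area of a clip.

        gray_frames: array of shape (n, h, w) holding the grayscale frames.
        Returns (top, bottom, left, right) margins in pixels.
        """
        n, h, w = gray_frames.shape
        cols = gray_frames[:, :, ::max(w // k, 1)]   # sample ~k columns -> (n, h, k')
        rows = gray_frames[:, ::max(h // k, 1), :]   # sample ~k rows    -> (n, k', w)
        var_h = cols.var(axis=0).sum(axis=1)         # variance over time, summed over columns -> length h
        var_w = rows.var(axis=0).sum(axis=0)         # variance over time, summed over rows    -> length w

        def first_exceeding(v):
            # front-to-back and back-to-front traversal for the first value above T
            idx = np.flatnonzero(v > T)
            return (int(idx[0]), int(len(v) - 1 - idx[-1])) if idx.size else (0, 0)

        top, bottom = first_exceeding(var_h)         # upper/lower margins of the static area
        left, right = first_exceeding(var_w)         # left/right margins of the static area
        return top, bottom, left, right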
Step 303: The video information processing apparatus crops the image frames in the video image frame set according to the static splicing area, and determines image feature vectors based on the corresponding cropping results.
For example, for each image frame in the video image frame set, the frame is cropped according to the static splicing area identified in step 302. Cropping means cutting away the static splicing area of the frame; in this way the static splicing area and the dynamic splicing area are accurately separated, and only the dynamic splicing area remains in the cropping result (the cropped frame).
In some embodiments of this application, cropping the image frames in the video image frame set according to the static splicing area may be implemented as follows:
determine the size parameters of the dynamic splicing area corresponding to an image frame according to the size parameters of the image frame in the video image frame set and the size parameters of the static splicing area corresponding to the frame, where the size parameters include vertical and horizontal size parameters; and crop the dynamic splicing area out of the frame according to the size parameters of the dynamic splicing area, to serve as the cropped frame.
For example, referring to FIG. 7A, a schematic diagram of image frame cropping in an embodiment of this application, the cropping may be performed by a cropping module running a cropping algorithm. An example process: the cropping module crops the frame according to the identified static splicing area. Its inputs are the image frame and the size parameters of the identified static splicing area (the left margin left, upper margin top, right margin right, and lower margin bottom); its output is the cropped frame (height heightC, width widthC), which contains only the dynamic splicing area.
Here, define the width of the frame before cropping as width and its height as height. Take the top-left corner of the uncropped frame as the origin (0, 0) of the coordinate axes; the top-left corner of the dynamic splicing area in the uncropped frame is (x_0, y_0) and its bottom-right corner is (x_1, y_1). Then, from the coordinates of the static splicing area in the uncropped frame (the corresponding coordinates can be obtained from the static splicing area's size parameters), the coordinates of the dynamic splicing area can be computed as: x_0 = left, y_0 = top, x_1 = width − right, y_1 = height − bottom.
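A minimal sketch of this coordinate computation, reusing the margins from the detection sketch above (detect_static_margins is this document's illustrative name, not one given by this application):

    import numpy as np

    def crop_dynamic_area(frame: np.ndarray, top: int, bottom: int,
                          left: int, right: int) -> np.ndarray:
        """Cut the static margins away, keeping only the dynamic splicing area.

        frame: array of shape (h, w) or (h, w, c).
        """
        h, w = frame.shape[:2]
        x0, y0 = left, top                  # top-left corner of the dynamic area
        x1, y1 = w - right, h - bottom      # bottom-right corner of the dynamic area
        return frame[y0:y1, x0:x1]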
In some embodiments of this application, determining image feature vectors based on the corresponding cropping results includes: constructing a new video image frame set from the cropped image frames, where the image frames in the new set contain the dynamic splicing area as distinct from the static splicing area; and determining the image feature vectors corresponding to the image frames in the new video image frame set.
For example, the cropped frames contain only the dynamic splicing area as distinct from the static splicing area; a new video image frame set can be constructed from all cropped frames, and the order of the frames in the new set may match the order of the frames in the original set.
For each image frame in the new video image frame set, the image feature vector corresponding to the frame can be determined.
In some embodiments of this application, determining the image feature vectors corresponding to the image frames in the new video image frame set may be implemented as follows:
alternately process the different image frames in the new video image frame set through the convolution layers and max-pooling layers of the video information processing model to obtain down-sampling results for the different frames; normalize the down-sampling results through the fully connected layer of the model to obtain normalized results; and perform deep decomposition on the normalized results through the model to obtain image feature vectors matching the different frames.
For example, referring to FIG. 7B, a schematic diagram of the processing procedure of the video information processing model in an embodiment of this application, the image frames in the new video image frame set can be 0/1-vectorized by a vectorization model. An example process: the frames of the new set are input into the video information processing model, which converts each frame into an image feature vector represented as a 0/1 vector. The input of the vectorization module is an image frame of the new set, and the output is a 1024-dimensional 0/1 feature vector (x_0, x_1, …, x_1023), x_i ∈ {0, 1}; this 1024-dimensional feature vector is the image feature vector corresponding to (matching) the frame.
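A minimal sketch of such a vectorization model follows, assuming PyTorch. The layer counts and channel widths are assumptions; this application only specifies alternating convolution and max-pooling layers, a fully connected layer for normalization, and a 1024-dimensional 0/1 output.

    import torch
    import torch.nn as nn

    class BinaryEmbedder(nn.Module):
        """Conv/max-pool stack ending in a 1024-dimensional 0/1 vector."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveMaxPool2d(4),
            )
            self.fc = nn.Linear(128 * 4 * 4, 1024)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            z = self.features(x).flatten(1)     # down-sampling results
            z = torch.tanh(self.fc(z))          # normalized to (-1, 1)
            return (z > 0).float()              # binarize to {0, 1} at inference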
Before this, the video information processing model can be trained. As an example, see FIG. 8, a schematic flowchart of training the video information processing model provided by an embodiment of this application. It should be understood that the steps shown in FIG. 8 may be executed by various electronic devices running the video information processing apparatus, for example a dedicated terminal, server, or server cluster with a video information processing function. The trained model may be deployed in a server to detect the similarity of videos uploaded by terminals (clients) and thereby decide whether to run compliance checks on the videos' copyright information; of course, the trained model may also be deployed in a terminal. The steps shown in FIG. 8 are described below.
Step 801: Obtain a first training sample set, which consists of noisy video processing samples obtained from historical data.
Step 802: Denoise the first training sample set to obtain a corresponding second training sample set.
In some embodiments of this application, negative-example processing may also be performed on the second training sample set to obtain a negative-example sample set corresponding to it, where the negative-example sample set is used to adjust the network parameters of the video information processing model; a corresponding margin loss function value is determined from the negative-example sample set and serves as a supervision parameter for evaluating the model's processing results. The video information processing model of the embodiments of this application can be built on a Siamese framework, thereby avoiding problems of classification networks such as the parameter count of the linear transformation matrix growing linearly with the number of classes, and the learned features being insufficiently discriminative for open-set problems.
In the training phase of the video information processing model, the positive sample pairs used may include at least one of genuinely duplicated images from video information processing, frames extracted from duplicated videos, and image pairs obtained by applying different data augmentations to an image; image pairs of the same class and randomly matched pairs can serve as negative pairs. Training uses a margin-based loss, which retains threshold-style judgment while offering the flexibility of a triplet loss. Further, to meet the business requirements of massive storage and fast matching, the HashNet idea can be borrowed: the network adopts an activation that gradually approaches the sign function as training proceeds. This avoids both the dilemma of neurons sitting in the saturation region early in training and failing to learn by backpropagation, and the large error caused by directly binarizing the image feature vectors. With storage overhead reduced 32-fold and distance computation accelerated 300-fold, accuracy drops by only 1.8%. In some embodiments of this application, multi-task learning and multi-domain learning may also be adopted, and optimizations such as a GAN loss introduced, to further improve model accuracy and recall.
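A minimal sketch of a margin-based pair loss and a sign-approaching activation in the spirit of HashNet follows; the margin values and the scaling schedule are assumptions, since the text names the techniques but not their constants.

    import torch

    def margin_pair_loss(d: torch.Tensor, y: torch.Tensor,
                         pos_margin: float = 0.4, neg_margin: float = 0.6) -> torch.Tensor:
        """Margin-based loss on pair distances d (y=1 positive pair, y=0 negative).

        Positive pairs are pushed below pos_margin and negatives above
        neg_margin; the margins here are illustrative assumptions.
        """
        pos = y * torch.clamp(d - pos_margin, min=0.0)
        neg = (1 - y) * torch.clamp(neg_margin - d, min=0.0)
        return (pos + neg).mean()

    def sign_approx(z: torch.Tensor, beta: float) -> torch.Tensor:
        """tanh(beta * z): as beta grows during training it approaches sign(z),
        in the spirit of HashNet's continuation schedule."""
        return torch.tanh(beta * z)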
Negative-example processing of the second training sample set may be implemented in at least one of the following ways:
randomly combine the image feature vectors in the video information processing model to obtain a negative-example sample set corresponding to the second training sample set; randomly delete image feature vectors of the video information processing model to obtain such a negative-example sample set; or replace image feature vectors of the video information processing model to obtain such a negative-example sample set.
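A minimal sketch of the three options follows; how the negatives are paired is an assumption, since this application does not fix the pairing details.

    import random

    def make_negatives(vectors: list, mode: str = "shuffle"):
        """Build negative examples from a list of image feature vectors.

        The three modes mirror the options above: random combination,
        random deletion, and replacement.
        """
        if mode == "shuffle":                       # random combination
            shuffled = vectors[:]
            random.shuffle(shuffled)
            return list(zip(vectors, shuffled))
        if mode == "delete":                        # random deletion
            kept = [v for v in vectors if random.random() > 0.5]
            return list(zip(vectors, kept or vectors))
        if mode == "replace":                       # replacement
            return [(v, random.choice(vectors)) for v in vectors]
        raise ValueError(mode)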
Step 803: Process the second training sample set through the video information processing model to determine the initial parameters of the video information processing model.
Step 804: In response to the initial parameters of the video information processing model, process the second training sample set through the model to determine the update parameters of the video information processing model.
Step 805: Iteratively update the network parameters of the video information processing model through the second training sample set according to the update parameters of the model; the updated video information processing model is used to determine image feature vectors.
For example, the trained video information processing model may be deployed in an electronic device (for example, a video server or a cloud server for video-similarity judgment) so as to continue with the subsequent step 304 and judge video similarity.
Step 304: The video information processing apparatus determines the similarity between the first video and the second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
Here, for any two videos (named the first video and the second video respectively), the similarity between them can be determined based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video, where the image feature vectors corresponding to a video are the image feature vectors corresponding to the image frames in the video.
The image feature vectors corresponding to a video may be computed in advance and saved in the corresponding video server, for example for videos whose copyright information has been verified; of course, they may also be computed in real time, for example for a newly received user-uploaded video.
In some embodiments of this application, determining the similarity between the first video and the second video based on their corresponding image feature vectors may be implemented as follows:
determine a corresponding inter-frame similarity parameter set according to the similarity, in terms of image feature vectors, between each image frame in the video image frame set corresponding to the first video and each image frame in the video image frame set corresponding to the second video; determine, according to the inter-frame similarity parameter set, the number of image frames reaching a similarity threshold; and determine the similarity between the first video and the second video based on that number.
Here, for each image frame in the first video's frame set, the similarity in image feature vectors between that frame and each image frame in the second video's frame set can be determined; the inter-frame similarity parameter set is then built from all the resulting similarities.
For example, the similarity between the first video and the second video can be determined by a similarity computation module whose input is the image feature vectors to be compared and whose output is the similarity simscore between the two videos. For a pair of image feature vectors to be compared, X = (x_0, x_1, …, x_1023), x_i ∈ {0, 1} and Y = (y_0, y_1, …, y_1023), y_i ∈ {0, 1}, the Hamming distance sim_{x,y} between the two vectors can be determined, i.e., sim_{x,y} = Hamming(X, Y); the smaller the Hamming distance, the greater the similarity.
Let the number of image frames in the first video's frame set be m and the number in the second video's frame set be n. Computing the similarity between frames then yields an inter-frame similarity parameter set; for ease of understanding, take the Hamming-distance parameter set {sim_00, sim_01, sim_02, …, sim_{m−1,n−1}} as an example, where sim_00 denotes the Hamming distance between the first frame of the first video's set and the first frame of the second video's set, and so on.
From the inter-frame similarity parameter set, the number of image frames whose similarity reaches the similarity threshold is determined (this count may refer to the number of frame pairs). Based on this number, the similarity between the first video and the second video can be determined: the more frames reach the threshold, the more similar the two videos.
In some embodiments of this application, determining the similarity between the first video and the second video based on the number of frames reaching the similarity threshold may be implemented as follows:
determine the similarity between the first video and the second video based on the number of frames reaching the similarity threshold, the number of frames in the first video's frame set, and the number of frames in the second video's frame set.
For example, the number of frames in the first video's frame set can be multiplied by the number of frames in the second video's frame set to obtain a product, and the number of frames reaching the similarity threshold divided by that product to obtain the similarity between the first video and the second video.
Taking the Hamming-distance parameter set {sim_00, sim_01, sim_02, …, sim_{m−1,n−1}} above as an example, the set can be traversed and each traversed Hamming distance checked against a specified threshold (below the threshold counts as similar; the threshold depends on the actual business scenario), yielding the final number of similar images C (i.e., the number of frames reaching the similarity threshold). The similarity simscore between the first video and the second video is then computed by this formula:
simscore = C / (m × n)
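A minimal NumPy sketch of this computation follows; the Hamming threshold of 100 bits is an illustrative stand-in for the business-dependent threshold.

    import numpy as np

    def video_simscore(XA: np.ndarray, XB: np.ndarray, dist_threshold: int = 100) -> float:
        """simscore = C / (m * n) over 0/1 frame vectors.

        XA: (m, 1024) binary vectors of the first video; XB: (n, 1024) of the
        second video.
        """
        # Hamming distance of every frame pair: the count of differing bits.
        hamming = (XA[:, None, :] != XB[None, :, :]).sum(axis=2)   # shape (m, n)
        C = int((hamming < dist_threshold).sum())                  # similar frame pairs
        m, n = XA.shape[0], XB.shape[0]
        return C / (m * n)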
Continuing with FIG. 9, a schematic diagram of the video-similarity judgment process in an embodiment of this application, the process includes:
Step 901: Denote the input video image frame set corresponding to the first video as A, and that corresponding to the second video as B.
Step 902: Identify the static splicing areas of both A and B.
Step 903: Crop A according to its static splicing area and denote the cropped video image frame set A'; crop B according to its static splicing area and denote the cropped video image frame set B'.
Step 904: Perform 0/1 vector conversion on all image frames in A' to obtain the set XA (i.e., the set of image feature vectors corresponding to all frames in A'; likewise below); perform 0/1 vector conversion on all image frames in B' to obtain the set XB.
Step 905: Compute simscore from XA and XB, and judge from simscore whether the first video and the second video are similar; if similar, go to step 906, otherwise go to step 907.
Step 906: Output that the first video is similar to the second video.
Step 907: Output that the first video is dissimilar to the second video.
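A sketch tying steps 901 to 907 together follows, reusing the illustrative helpers sketched above (extract_frames, detect_static_margins, crop_dynamic_area, BinaryEmbedder, and video_simscore are this document's assumed names); the final cutoff sim_threshold is likewise an assumed business-dependent value.

    import numpy as np
    import cv2
    import torch

    def videos_similar(path_a: str, path_b: str, sim_threshold: float = 0.3) -> bool:
        embedder = BinaryEmbedder().eval()
        vector_sets = []
        for path in (path_a, path_b):
            frames, _ = extract_frames(path)                         # step 901
            gray = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames])
            top, bottom, left, right = detect_static_margins(gray)   # step 902
            feats = []
            for f in frames:
                dyn = crop_dynamic_area(f, top, bottom, left, right)  # step 903
                x = cv2.resize(dyn, (224, 224)).transpose(2, 0, 1)[None] / 255.0
                with torch.no_grad():                                 # step 904
                    feats.append(embedder(torch.from_numpy(x).float())[0].numpy())
            vector_sets.append(np.stack(feats))
        # step 905: the similarity judgment
        return video_simscore(vector_sets[0], vector_sets[1]) >= sim_threshold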
In some embodiments of this application, when the first video and the second video are determined to be similar, the copyright information of the first video and the copyright information of the second video are obtained; the legality of the first video is determined through the two pieces of copyright information; and when the copyright information of the first video and that of the second video are inconsistent, warning information is issued.
Here, when the first video and the second video are determined to be similar, there may be a risk of copyright infringement. Therefore, the copyright information of the first video and of the second video can be obtained, and the legality of the first video determined from them. Taking the second video as legal by default: when the two pieces of copyright information are consistent, the first video is determined to be legal; when they are inconsistent, the first video is determined to be illegal. In addition, when they are inconsistent, warning information may also be issued.
Of course, the embodiments of this application may also determine the legality of the second video on the premise that the first video is legal by default.
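A minimal sketch of this branch follows, assuming the second video's copyright record is the authoritative one (for example, a record already verified on the chain):

    def check_copyright(similar: bool, copyright_a: str, copyright_b: str) -> str:
        if not similar:
            return "recommendable"      # dissimilar: the video can join the video source
        if copyright_a == copyright_b:
            return "legal"              # consistent copyright information
        return "warning"                # inconsistent: issue warning information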
In some embodiments of this application, when the first video and the second video are determined to be dissimilar, the first video is added to the video source; the recall order of all videos to be recommended in the video source is sorted; and videos are recommended to the target user based on the sorting result of the recall order of the videos to be recommended.
Here, when the first video and the second video are determined to be dissimilar, the first video can be added to the video source as a video to be recommended; of course, the second video may also be added to the video source here. When video recommendation is needed, the recall order of all videos to be recommended in the video source can be sorted, and videos recommended to the target user based on the sorting result.
In the embodiments of this application, considering the large scale of videos, the identifier, copyright information, and image feature vectors of a video can be saved in a blockchain network or a cloud server, facilitating judgments of video similarity and legality.
In some embodiments of this application, the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video may also be sent to the blockchain network, so that the nodes of the blockchain network fill them into a new block and, when consensus on the new block is reached, append the new block to the end of the blockchain.
In some embodiments of this application, the method further includes:
receiving data synchronization requests from other nodes in the blockchain network; verifying the permissions of the other nodes in response to the requests; and, when the permissions of the other nodes pass verification, controlling data synchronization between the current node and the other nodes so that the other nodes obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
In some embodiments of this application, the method further includes: in response to a query request, parsing the query request to obtain the corresponding object identifier (such as a user identifier); obtaining the permission information in the target block of the blockchain network according to the object identifier; checking whether the permission information matches the object identifier; when they match, obtaining the corresponding video's identifier, the image feature vector corresponding to the video, and the copyright information of the video from the blockchain network; and pushing them to the corresponding client so that the client obtains the video's identifier, image feature vector, and copyright information saved in the blockchain network.
Continuing with FIG. 10, a schematic architecture diagram of the video information processing apparatus 100 provided by an embodiment of this application, the architecture includes the blockchain network 200 (comprising multiple consensus nodes; the consensus node 210 is shown as an example in FIG. 10), the certification center 300, the business entity 400, and the business entity 500, each described below.
The type of the blockchain network 200 is flexible, for example any of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, the electronic devices of any business entity, such as user terminals and servers, can access the blockchain network 200 without authorization; taking a consortium chain as an example, after a business entity obtains authorization, the electronic devices under it (e.g., terminals/servers) can access the blockchain network 200, at which point they become client nodes in the blockchain network 200.
In some embodiments, a client node may act only as an observer of the blockchain network 200, i.e., provide support for the business entity to initiate transactions (e.g., for storing data on chain or querying on-chain data), while the functions of the consensus nodes 210 of the blockchain network 200, such as ordering, consensus service, and ledger functions, may be implemented by the client node by default or selectively (e.g., depending on the business entity's specific business needs). The business entity's data and business processing logic can thus be migrated into the blockchain network 200 to the greatest extent, achieving trustworthiness and traceability of data and business processes through the blockchain network 200.
The consensus nodes in the blockchain network 200 receive transactions submitted by the client nodes of different business entities (e.g., the client node 410 belonging to the business entity 400 and the client node 510 belonging to the database operator system shown in the preceding embodiments), and execute the transactions to update or query the ledger; various intermediate or final results of executing the transactions can be returned to the business entity's client nodes for display.
For example, the client node 410/510 can subscribe to events of interest in the blockchain network 200, such as transactions occurring in a particular organization/channel of the blockchain network 200; the consensus node 210 pushes the corresponding transaction notifications to the client node 410/510, thereby triggering the corresponding business logic in it.
An exemplary application of the blockchain network is described below, taking access by multiple business entities to implement management of video-related information as an example.
Referring to FIG. 10, multiple business entities are involved in the management: the business entity 400 may be the video information processing apparatus, and the business entity 500 may be a display system with a video information processing function. Each registers with the certification center 300 to obtain its own digital certificate, which includes the business entity's public key and the certification center 300's digital signature over that public key and the entity's identity information. The certificate, together with the business entity's digital signature of a transaction, is attached to the transaction and sent to the blockchain network, so that the blockchain network can take the certificate and signature from the transaction, verify the reliability of the message (i.e., that it has not been tampered with) and the identity information of the sending business entity, and check the identity, for example whether the entity has permission to initiate the transaction. Clients running on electronic devices (e.g., terminals or servers) under a business entity can request access to the blockchain network 200 to become client nodes.
The client node 410 of the business entity 400 is used to send the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video to the blockchain network, so that the nodes of the blockchain network fill them into a new block and, when consensus on the new block is reached, append the new block to the end of the blockchain.
To send the video's identifier, image feature vector, and copyright information to the blockchain network 200, business logic can be set on the client node 410 in advance. For example, taking the first video, when the first video is determined to be dissimilar to the second video, the client node 410 automatically sends the identifier of the first video to be processed, the image feature vector corresponding to the first video, and the copyright information of the first video to the blockchain network 200; alternatively, business personnel of the business entity 400 can log in at the client node 410 and manually package and send them to the blockchain network 200. When sending, the client node 410 generates a transaction corresponding to the update operation from the first video's identifier, image feature vector, and copyright information; the transaction specifies the smart contract to be called to implement the update operation and the parameters passed to the smart contract, and also carries the client node 410's digital certificate and signed digital signature (e.g., obtained by encrypting the digest of the transaction with the private key in the client node 410's digital certificate), and the transaction is broadcast to the consensus nodes 210 in the blockchain network 200.
When a consensus node 210 in the blockchain network 200 receives the transaction, it verifies the digital certificate and digital signature carried by the transaction; after verification succeeds, it confirms from the identity of the business entity 400 carried in the transaction whether the entity has transaction permission; failure of either the digital-signature or the permission verification causes the transaction to fail. After successful verification, the consensus node 210 signs its own digital signature (e.g., obtained by encrypting the digest of the transaction with the consensus node 210's private key) and continues broadcasting in the blockchain network 200.
After a consensus node 210 in the blockchain network 200 receives a successfully verified transaction, it fills the transaction into a new block and broadcasts it. When a new block is broadcast by the consensus nodes 210 in the blockchain network 200, the consensus process is run on it; if consensus succeeds, the new block is appended to the end of the blockchain stored by the node, the state database is updated according to the transaction results, and the transactions in the new block are executed: for a submitted transaction updating the identifier of the first video to be processed, the image feature vector corresponding to the first video, and the copyright information of the first video, a key-value pair containing these items is added to the state database.
Business personnel of the business entity 500 log in at the client node 510 and enter a query request for the first video's identifier, the image feature vector corresponding to the first video, and the first video's copyright information. The client node 510 generates a transaction corresponding to the update/query operation from the query request, specifying in the transaction the smart contract to be called to implement the update/query operation and the parameters passed to it; the transaction also carries the client node 510's digital certificate and signed digital signature (e.g., obtained by encrypting the digest of the transaction with the private key in the client node 510's digital certificate), and is broadcast to the consensus nodes 210 in the blockchain network 200.
After a consensus node 210 in the blockchain network 200 receives the transaction, verifies it, fills a block, and reaches consensus, it appends the filled new block to the end of the blockchain it stores, updates the state database according to the transaction results, and executes the transactions in the new block: for a submitted transaction updating a video's copyright information, the key-value pair corresponding to that video's copyright information in the state database is updated; for a submitted transaction querying a video's copyright information, the key-value pairs corresponding to the first video's identifier, the image feature vector corresponding to the first video, and the first video's copyright information are queried from the state database, and the transaction result is returned.
Notably, FIG. 10 exemplarily shows the process of putting the video's identifier, image feature vector, and copyright information directly on chain. In other embodiments, when the data volume of the video's identifier, image feature vector, and copyright information is large, the client node 410 may put the hashes of the video's identifier, image feature vector, and copyright information on chain in pairs, and store the identifier, image feature vector, and copyright information themselves in a distributed file system or database. After the client node 510 obtains the video's identifier, image feature vector, and copyright information from the distributed file system or database, it can verify them against the corresponding hashes in the blockchain network 200, thereby reducing the workload of on-chain operations.
As an example of the blockchain, see FIG. 11, a schematic structural diagram of the blockchain in the blockchain network 200 provided by an embodiment of this application. The header of each block may include both the hash of all transactions in that block and the hash of all transactions in the previous block. Records of newly generated transactions are filled into a block and, after consensus among the nodes in the blockchain network, appended to the end of the blockchain, forming chained growth; the hash-based chain structure between blocks ensures that the transactions in the blocks are tamper-proof and forgery-proof.
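A minimal sketch of this hash-chained structure follows, using SHA-256 as an illustrative hash and a list of dicts as an illustrative in-memory representation:

    import hashlib
    import json

    def block_hash(transactions: list, prev_hash: str) -> str:
        """Hash a block header that commits to this block's transactions and
        to the previous block's transaction hash, as in FIG. 11."""
        payload = json.dumps({"tx": transactions, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append_block(chain: list, transactions: list) -> None:
        """Append a new block; tampering with any earlier block would change
        every downstream hash, which is what makes the chain tamper-evident."""
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        chain.append({"tx": transactions, "prev": prev_hash,
                      "hash": block_hash(transactions, prev_hash)})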
The exemplary functional architecture of the blockchain network provided by the embodiments of this application is described below. See FIG. 12, a schematic diagram of the functional architecture of the blockchain network 200 provided by an embodiment of this application, including the application layer 201, the consensus layer 202, the network layer 203, the data layer 204, and the resource layer 205, each described below.
The resource layer 205 encapsulates the computing, storage, and communication resources that implement each consensus node 210 in the blockchain network 200.
The data layer 204 encapsulates the various data structures implementing the ledger, including a blockchain implemented as files in a file system, a key-value state database, and proofs of existence (e.g., hash trees of the transactions in blocks).
The network layer 203 encapsulates the point-to-point (P2P) network protocol, the data propagation and data verification mechanisms, the access authentication mechanism, and business-entity identity management.
The P2P network protocol implements communication between the consensus nodes 210 in the blockchain network 200; the data propagation mechanism ensures the propagation of transactions in the blockchain network 200; the data verification mechanism uses cryptographic methods (e.g., digital certificates, digital signatures, and public/private key pairs) to ensure the reliability of data transmitted between consensus nodes 210; the access authentication mechanism authenticates the identities of business entities joining the blockchain network 200 according to the actual business scenario and, when authentication passes, grants the business entity permission to access the blockchain network 200; and business-entity identity management stores the identities and permissions (e.g., the types of transactions that can be initiated) of the business entities allowed to access the blockchain network 200.
The consensus layer 202 encapsulates the mechanism by which the consensus nodes 210 in the blockchain network 200 reach agreement on blocks (i.e., the consensus mechanism), as well as transaction management and ledger management. The consensus mechanism includes consensus algorithms such as POS, POW, and DPOS, and supports pluggable consensus algorithms.
Transaction management verifies the digital signature carried in a transaction received by the consensus node 210, verifies the business entity's identity information, and determines from that information whether the entity has permission to transact (reading relevant information from business-entity identity management); every business entity authorized to access the blockchain network 200 holds a digital certificate issued by the certification center and signs submitted transactions with the private key in its certificate, thereby declaring its legal identity.
Ledger management maintains the blockchain and the state database. A block on which consensus is reached is appended to the end of the blockchain; the transactions in it are executed; key-value pairs in the state database are updated when a transaction includes an update operation; and the state database is queried when a transaction includes a query operation, with the query result returned to the business entity's client node. Query operations on the state database are supported in multiple dimensions, including: querying a block by block vector number (e.g., a transaction hash value); querying a block by block hash value; querying a block by transaction vector number; querying a transaction by transaction vector number; querying a business entity's account data by its account (vector) number; and querying the blockchain in a channel by channel name.
The application layer 201 encapsulates the various services the blockchain network can implement, including transaction traceability, evidence storage, and verification.
Thus, the copyright information of the first video that has undergone similarity identification can be saved in the blockchain network; when a new user uploads a video to the video server, the video server can invoke the copyright information in the blockchain network (the first video uploaded by the user can then serve as the second video) to verify the copyright compliance of the video.
FIG. 13 is a schematic diagram of a usage scenario of the video information processing method provided by an embodiment of this application. Referring to FIG. 13, the terminals (such as the terminal 10-1 and the terminal 10-2) are provided with software clients capable of displaying short videos, such as a short-video playback client or plug-in, through which the user can obtain and display videos; the terminals connect to the short video server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two. Of course, users can also upload videos through the terminal for other users in the network to watch. In this process, the operator's video server needs to detect the videos uploaded by users, compare and analyze the different video information, determine whether the copyright of an uploaded video is compliant, and recommend compliant videos to different users, preventing users' short videos from being pirated.
In this process, the solutions provided by the related art have at least the following problems:
a. The dynamic splicing areas of two videos are similar while the static splicing areas are dissimilar; such videos need to be judged as similar in the information-flow scenario. Interfered with by the static splicing areas of the videos, the related art does not specially identify such videos and may judge the pair dissimilar. If a large number of such duplicate videos appear in information-flow recommendations, user experience suffers and users complain. Moreover, to evade copyright, many re-uploaders apply different patches to the static splicing area of a video so as to evade similarity identification; similarity identification under the related art would then damage the copyright holder's rights and cause serious negative impact.
b. The dynamic splicing areas of two videos are dissimilar while the static splicing areas are similar; such videos need to be judged as dissimilar in the information-flow scenario. Interfered with by the static splicing areas of the videos, the related art may judge the videos similar; when an account owner publishing videos in the information-flow business releases special-effects videos of the same type (with identical static splicing areas) that are misidentified as similar, the videos are not recommended by the information flow, the owner's revenue and followers are affected, complaints from the owner follow, and the business is negatively impacted.
To overcome the above defects, this application provides a video information processing method. The use process of the method is described below. Referring to FIG. 14, a schematic diagram of the use process of the video information processing method in an embodiment of this application, the process includes the following steps:
Step 1401: Acquire two different video image frame sets and identify the static splicing areas in the different video image frame sets.
Step 1402: Crop the video image frame sets according to the identified static splicing areas to obtain new video image frame sets.
Step 1403: Perform 0/1 vector conversion on all image frames in the new video image frame sets to obtain corresponding feature vector sets (i.e., sets of image feature vectors), and perform similarity judgment based on the feature vector sets to obtain the similarity result between the different videos.
Step 1404: Perform video recommendation or warning processing based on the similarity result between the different videos.
Referring to FIG. 15 and FIG. 16, FIG. 15 is a schematic diagram of image cropping by the video information processing method in an embodiment of this application, and FIG. 16 is a schematic diagram of image cropping by the video information processing method in an embodiment of this application.
With the video information processing method provided by this application, when the dynamic splicing areas of two videos are similar and the static splicing areas are dissimilar, the two videos can be judged similar in the information-flow scenario (as shown in FIG. 16); one of the videos may therefore involve copyright infringement, and the corresponding copyright certificate provided by the uploading user of the video is required for subsequent processing. Further, when the dynamic splicing areas of two videos are dissimilar and the static splicing areas are similar, the two videos can be judged dissimilar in the information-flow scenario (as shown in FIG. 15); both can then serve as videos to be recommended and be recommended to different users in the network.
An exemplary structure of the video information processing apparatus 2020 of the embodiments of this application implemented as software modules is described below. In some embodiments, as shown in FIG. 2, the software modules of the video information processing apparatus 2020 stored in the memory 202 may include: an information transmission module 2081, configured to determine a video image frame set corresponding to a video; and an information processing module 2082, configured to determine the static splicing area corresponding to the image frames in the video image frame set, crop the image frames in the set according to the static splicing area, determine image feature vectors based on the corresponding cropping results, and determine the similarity between a first video and a second video based on the image feature vector corresponding to the first video and the image feature vector corresponding to the second video.
In some embodiments, the information transmission module 2081 is further configured to: parse the video to obtain its timing information; parse the video parameters corresponding to the video according to the timing information to obtain the playback-duration parameter and picture-area parameter corresponding to the video; and extract frames from the video based on these parameters to obtain the video image frame set corresponding to the video.
In some embodiments, the information processing module 2082 is further configured to: perform grayscale processing on the image frames in the video image frame set; perform horizontal equal-interval sampling and vertical equal-interval sampling on the grayscale image frames; determine the time-dimension variance sum of the horizontal sampling results and the time-dimension variance sum of the vertical sampling results; determine the vertical size parameters of the static splicing area of the image frames based on the time-dimension variance sum of the horizontal sampling results; and determine the horizontal size parameters of the static splicing area based on the time-dimension variance sum of the vertical sampling results.
In some embodiments, the information processing module 2082 is further configured to: construct a new video image frame set from the cropped image frames, where the image frames in the new set contain the dynamic splicing area as distinct from the static splicing area; and determine the image feature vectors corresponding to the image frames in the new video image frame set.
In some embodiments, the information processing module 2082 is further configured to: alternately process the different image frames in the new video image frame set through the convolution layers and max-pooling layers of the video information processing model to obtain down-sampling results for the different frames; normalize the down-sampling results through the fully connected layer of the model to obtain normalized results; and perform deep decomposition on the normalized results through the model to obtain image feature vectors matching the different frames.
In some embodiments, the information processing module 2082 is further configured to: determine the size parameters of the dynamic splicing area corresponding to an image frame according to the size parameters of the frame in the video image frame set and the size parameters of its static splicing area, where the size parameters include vertical and horizontal size parameters; and crop the dynamic splicing area out of the frame according to the dynamic splicing area's size parameters, to serve as the cropped frame.
In some embodiments, the information processing module 2082 is further configured to: obtain a first training sample set, which consists of noisy video processing samples obtained from historical data; denoise the first training sample set to obtain a corresponding second training sample set; process the second training sample set through the video information processing model to determine the model's initial parameters; in response to the initial parameters, process the second training sample set through the model to determine the model's update parameters; and iteratively update the network parameters of the model through the second training sample set according to the update parameters, where the updated video information processing model is used to determine image feature vectors.
In some embodiments, the information processing module 2082 is further configured to: perform negative-example processing on the second training sample set to obtain a corresponding negative-example sample set; and determine the corresponding margin loss function value according to the negative-example sample set, where the margin loss function value serves as a supervision parameter for evaluating the model's processing results.
In some embodiments, the information processing module 2082 is further configured to perform at least one of the following: randomly combine the image feature vectors in the video information processing model to obtain a negative-example sample set corresponding to the second training sample set; randomly delete image feature vectors of the model to obtain such a negative-example sample set; or replace image feature vectors of the model to obtain such a negative-example sample set.
In some embodiments, the information processing module 2082 is further configured to: determine a corresponding inter-frame similarity parameter set according to the similarity, in terms of image feature vectors, between each frame in the first video's frame set and each frame in the second video's frame set; determine, from the inter-frame similarity parameter set, the number of frames reaching a similarity threshold; and determine the similarity between the first video and the second video based on that number.
In some embodiments, the information processing module 2082 is further configured to: determine the similarity between the first video and the second video based on the number of frames reaching the similarity threshold, the number of frames in the first video's frame set, and the number of frames in the second video's frame set.
In some embodiments, the information processing module 2082 is further configured to: when the first video is determined, according to the similarity, to be similar to the second video, obtain the copyright information of the first video and of the second video; determine the legality of the first video from the two pieces of copyright information; and issue warning information when they are inconsistent.
In some embodiments, the information processing module 2082 is further configured to: when the first video is determined, according to the similarity, to be dissimilar to the second video, add the first video to the video source; sort the recall order of all videos to be recommended in the video source; and recommend videos to the target user based on the sorted recall order.
In some embodiments, the information processing module 2082 is further configured to: send the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video to the blockchain network, so that the nodes of the blockchain network fill them into a new block and, when consensus on the new block is reached, append the new block to the end of the blockchain.
In some embodiments, the information processing module 2082 is further configured to: receive data synchronization requests from other nodes in the blockchain network; verify the permissions of the other nodes in response to the requests; and, when the permissions pass verification, control data synchronization between the current node and the other nodes so that the other nodes obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
In some embodiments, the information processing module 2082 is further configured to: in response to a query request, parse the request to obtain the corresponding object identifier; obtain the permission information in the target block of the blockchain network according to the object identifier; check whether the permission information matches the object identifier; when they match, obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video from the blockchain network; and send them to the corresponding client so that the client obtains the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
The embodiments of this application have at least the following technical effect: by distinguishing the dynamic splicing area of a video from its static splicing area, cropping accordingly, and using the resulting image feature vectors to judge video similarity, this application improves the accuracy of video-similarity judgment and reduces misjudgments of video similarity caused by occlusion and coverage from the static splicing area.
The foregoing is merely embodiments of this application and is not intended to limit its protection scope; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (19)

  1. A video information processing method, executed by an electronic device, the method comprising:
    determining a video image frame set corresponding to a video;
    determining a static splicing area corresponding to image frames in the video image frame set;
    cropping the image frames in the video image frame set according to the static splicing area, and determining image feature vectors based on corresponding cropping results; and
    determining a similarity between a first video and a second video based on an image feature vector corresponding to the first video and an image feature vector corresponding to the second video.
  2. The method according to claim 1, wherein the determining a video image frame set corresponding to a video comprises:
    parsing the video to obtain timing information of the video;
    parsing video parameters corresponding to the video according to the timing information of the video, to obtain a playback-duration parameter and a picture-area parameter corresponding to the video; and
    extracting frames from the video based on the playback-duration parameter and the picture-area parameter corresponding to the video, to obtain the video image frame set corresponding to the video.
  3. The method according to claim 1, wherein the determining a static splicing area corresponding to image frames in the video image frame set comprises:
    performing grayscale processing on the image frames in the video image frame set;
    performing horizontal equal-interval sampling and vertical equal-interval sampling on the grayscale-processed image frames;
    determining a time-dimension variance sum of horizontal equal-interval sampling results of the image frames, and a time-dimension variance sum of vertical equal-interval sampling results of the image frames;
    determining a vertical size parameter of the static splicing area of the image frames based on the time-dimension variance sum of the horizontal equal-interval sampling results; and
    determining a horizontal size parameter of the static splicing area of the image frames based on the time-dimension variance sum of the vertical equal-interval sampling results.
  4. The method according to claim 1, wherein the determining image feature vectors based on corresponding cropping results comprises:
    constructing a new video image frame set from the plurality of cropped image frames, wherein the image frames in the new video image frame set comprise a dynamic splicing area distinct from the static splicing area; and
    determining image feature vectors corresponding to the image frames in the new video image frame set.
  5. The method according to claim 4, wherein the determining image feature vectors corresponding to the image frames in the new video image frame set comprises:
    alternately processing different image frames in the new video image frame set through convolution layers and max-pooling layers of a video information processing model, to obtain down-sampling results of the different image frames;
    normalizing the down-sampling results through a fully connected layer of the video information processing model, to obtain normalized results; and
    performing deep decomposition on the normalized results through the video information processing model, to obtain image feature vectors matching the different image frames.
  6. The method according to claim 1, wherein the cropping the image frames in the video image frame set according to the static splicing area comprises:
    determining size parameters of a dynamic splicing area corresponding to an image frame according to size parameters of the image frame in the video image frame set and size parameters of the static splicing area corresponding to the image frame, wherein the size parameters comprise a vertical size parameter and a horizontal size parameter; and
    cropping the dynamic splicing area out of the image frame according to the size parameters of the dynamic splicing area, to serve as the cropped image frame.
  7. The method according to any one of claims 1 to 6, further comprising:
    obtaining a first training sample set, the first training sample set being noisy video processing samples obtained through historical data;
    denoising the first training sample set to obtain a corresponding second training sample set;
    processing the second training sample set through a video information processing model to determine initial parameters of the video information processing model;
    in response to the initial parameters of the video information processing model, processing the second training sample set through the video information processing model to determine update parameters of the video information processing model; and
    iteratively updating network parameters of the video information processing model through the second training sample set according to the update parameters of the video information processing model,
    wherein the updated video information processing model is used to determine image feature vectors.
  8. The method according to claim 7, further comprising:
    performing negative-example processing on the second training sample set to obtain a negative-example sample set corresponding to the second training sample set; and
    determining a corresponding margin loss function value according to the negative-example sample set, wherein the margin loss function value serves as a supervision parameter for evaluating processing results of the video information processing model.
  9. The method according to claim 8, wherein the performing negative-example processing on the second training sample set comprises:
    performing at least one of the following:
    randomly combining image feature vectors in the video information processing model to obtain the negative-example sample set corresponding to the second training sample set;
    randomly deleting image feature vectors of the video information processing model to obtain the negative-example sample set corresponding to the second training sample set; and
    replacing image feature vectors of the video information processing model to obtain the negative-example sample set corresponding to the second training sample set.
  10. The method according to any one of claims 1 to 6, wherein the determining a similarity between a first video and a second video based on an image feature vector corresponding to the first video and an image feature vector corresponding to the second video comprises:
    determining a corresponding inter-frame similarity parameter set according to a similarity, in terms of the image feature vectors, between each image frame in a video image frame set corresponding to the first video and each image frame in a video image frame set corresponding to the second video;
    determining, according to the inter-frame similarity parameter set, a number of image frames reaching a similarity threshold; and
    determining the similarity between the first video and the second video based on the number of image frames reaching the similarity threshold.
  11. The method according to claim 10, wherein the determining the similarity between the first video and the second video based on the number of image frames reaching the similarity threshold comprises:
    determining the similarity between the first video and the second video based on the number of image frames reaching the similarity threshold, a number of image frames in the video image frame set corresponding to the first video, and a number of image frames in the video image frame set corresponding to the second video.
  12. The method according to any one of claims 1 to 6, further comprising:
    when the first video is determined, according to the similarity, to be similar to the second video, obtaining copyright information of the first video and copyright information of the second video;
    determining legality of the first video through the copyright information of the first video and the copyright information of the second video; and
    issuing warning information when the copyright information of the first video is inconsistent with the copyright information of the second video.
  13. The method according to any one of claims 1 to 6, further comprising:
    when the first video is determined, according to the similarity, to be dissimilar to the second video, adding the first video to a video source;
    sorting a recall order of all videos to be recommended in the video source; and
    recommending videos to a target user based on a sorting result of the recall order of the videos to be recommended.
  14. The method according to any one of claims 1 to 6, further comprising:
    sending an identifier of the video, the image feature vector corresponding to the video, and copyright information of the video to a blockchain network, so that
    nodes of the blockchain network fill the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video into a new block and, when consensus on the new block is reached, append the new block to the end of a blockchain.
  15. The method according to claim 14, further comprising:
    receiving data synchronization requests of other nodes in the blockchain network;
    verifying permissions of the other nodes in response to the data synchronization requests; and
    when the permissions of the other nodes pass verification, controlling data synchronization between a current node and the other nodes, so that the other nodes obtain the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
  16. The method according to claim 14, further comprising:
    in response to a query request, parsing the query request to obtain a corresponding object identifier;
    obtaining permission information in a target block in the blockchain network according to the object identifier;
    checking whether the permission information matches the object identifier;
    when the permission information matches the object identifier, obtaining the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video from the blockchain network; and
    sending the obtained identifier of the video, image feature vector corresponding to the video, and copyright information of the video to a corresponding client, so that the client obtains the identifier of the video, the image feature vector corresponding to the video, and the copyright information of the video.
  17. A video information processing apparatus, comprising:
    an information transmission module, configured to determine a video image frame set corresponding to a video; and
    an information processing module, configured to:
    determine a static splicing area corresponding to image frames in the video image frame set;
    crop the image frames in the video image frame set according to the static splicing area, and determine image feature vectors based on corresponding cropping results; and
    determine a similarity between a first video and a second video based on an image feature vector corresponding to the first video and an image feature vector corresponding to the second video.
  18. An electronic device, comprising:
    a memory, configured to store executable instructions; and
    a processor, configured to implement the video information processing method according to any one of claims 1 to 16 when running the executable instructions stored in the memory.
  19. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the video information processing method according to any one of claims 1 to 16.
PCT/CN2021/106552 2020-08-21 2021-07-15 Video information processing method and apparatus, electronic device, and storage medium WO2022037343A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21857438.2A EP4120122A4 (en) 2020-08-21 2021-07-15 METHOD AND DEVICE FOR PROCESSING VIDEO INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIA
US17/973,418 US20230045726A1 (en) 2020-08-21 2022-10-25 Video information processing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010847943.3 2020-08-21
CN202010847943.3A CN111931678B (zh) 2020-08-21 2020-08-21 Video information processing method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/973,418 Continuation US20230045726A1 (en) 2020-08-21 2022-10-25 Video information processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022037343A1 true WO2022037343A1 (zh) 2022-02-24

Family

ID=73306064

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106552 WO2022037343A1 (zh) 2020-08-21 2021-07-15 一种视频信息处理方法、装置、电子设备及存储介质

Country Status (4)

Country Link
US (1) US20230045726A1 (zh)
EP (1) EP4120122A4 (zh)
CN (1) CN111931678B (zh)
WO (1) WO2022037343A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979716A (zh) * 2022-05-19 2022-08-30 浙江传媒学院 Parent-child-chain-based video copyright transaction method and system
CN116188821A (zh) * 2023-04-25 2023-05-30 青岛尘元科技信息有限公司 Copyright detection method and system, electronic device, and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931678B (zh) * 2020-08-21 2021-09-07 腾讯科技(深圳)有限公司 Video information processing method and apparatus, electronic device, and storage medium
CN112559863A (zh) * 2020-12-14 2021-03-26 杭州趣链科技有限公司 Blockchain-based information pushing method, apparatus, device, and storage medium
CN112786163B (zh) * 2020-12-31 2023-10-24 北京小白世纪网络科技有限公司 Ultrasound image processing and display method and system, and storage medium
CN113037496B (zh) * 2021-03-15 2022-11-04 承德石油高等专科学校 Video acquisition apparatus and method based on blockchain technology
CN114394100B (zh) * 2022-01-12 2024-04-05 深圳力维智联技术有限公司 Unmanned patrol vehicle control system and unmanned vehicle
CN116304179B (zh) * 2023-05-19 2023-08-11 北京大学 Data processing system for obtaining a target video

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120257831A1 (en) * 2007-09-27 2012-10-11 Behavioral Recognition Systems, Inc. Context processor for video analysis system
CN105975939A (zh) * 2016-05-06 2016-09-28 百度在线网络技术(北京)有限公司 Video detection method and device
CN107801093A (zh) * 2017-10-26 2018-03-13 深圳市量子视觉科技有限公司 Video rendering method and apparatus, computer device, and readable storage medium
CN110162664A (zh) * 2018-12-17 2019-08-23 腾讯科技(深圳)有限公司 Video recommendation method and apparatus, computer device, and storage medium
CN111931678A (zh) * 2020-08-21 2020-11-13 腾讯科技(深圳)有限公司 Video information processing method and apparatus, electronic device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4157567B2 (ja) * 2006-04-11 2008-10-01 株式会社東芝 Method and apparatus for increasing the resolution of moving images
CN101566994B (zh) * 2008-04-22 2011-02-16 王磊 Image and video retrieval method
CN104053023B (zh) * 2014-06-13 2018-11-30 海信集团有限公司 Method and device for determining video similarity
CN104504717B (zh) * 2014-12-31 2017-10-27 北京奇艺世纪科技有限公司 Image information detection method and device
CN106686460B (zh) * 2016-12-22 2020-03-13 优地网络有限公司 Video program recommendation method and video program recommendation device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120257831A1 (en) * 2007-09-27 2012-10-11 Behavioral Recognition Systems, Inc. Context processor for video analysis system
CN105975939A (zh) * 2016-05-06 2016-09-28 百度在线网络技术(北京)有限公司 Video detection method and device
CN107801093A (zh) * 2017-10-26 2018-03-13 深圳市量子视觉科技有限公司 Video rendering method and apparatus, computer device, and readable storage medium
CN110162664A (zh) * 2018-12-17 2019-08-23 腾讯科技(深圳)有限公司 Video recommendation method and apparatus, computer device, and storage medium
CN111931678A (zh) * 2020-08-21 2020-11-13 腾讯科技(深圳)有限公司 Video information processing method and apparatus, electronic device, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979716A (zh) * 2022-05-19 2022-08-30 浙江传媒学院 Parent-child-chain-based video copyright transaction method and system
CN114979716B (zh) * 2022-05-19 2023-06-13 浙江传媒学院 Parent-child-chain-based video copyright transaction method and system
CN116188821A (zh) * 2023-04-25 2023-05-30 青岛尘元科技信息有限公司 Copyright detection method and system, electronic device, and storage medium

Also Published As

Publication number Publication date
EP4120122A4 (en) 2023-08-16
EP4120122A1 (en) 2023-01-18
CN111931678B (zh) 2021-09-07
CN111931678A (zh) 2020-11-13
US20230045726A1 (en) 2023-02-09

Similar Documents

Publication Publication Date Title
WO2022037343A1 (zh) Video information processing method and apparatus, electronic device, and storage medium
CN111816252B (zh) Drug screening method and apparatus, and electronic device
CN113711536B (zh) Extracting data from a blockchain network
WO2022052630A1 (zh) Multimedia information processing method and apparatus, electronic device, and storage medium
CA2930415C (en) Document management and collaboration system
US11323407B2 (en) Methods, systems, apparatuses, and devices for facilitating managing digital content captured using multiple content capturing devices
CN110210883B (zh) Group-control account identification method and apparatus, server, and storage medium
CN111368901A (zh) Federated-learning-based multi-party joint modeling method, device, and medium
CN110705683A (zh) Random forest model construction method and apparatus, electronic device, and storage medium
CN112989186B (zh) Information recommendation model training method and apparatus, electronic device, and storage medium
CN111339183A (zh) Data processing method, edge node, data center, and storage medium
CN113539299A (zh) Multimedia information processing method and apparatus, electronic device, and storage medium
CN110555482A (zh) Artificial-intelligence-based vulgar picture identification method and apparatus, and electronic device
CN112200064B (zh) Image processing method and apparatus, electronic device, and storage medium
US10341420B1 (en) Approaches for preparing and delivering bulk data to clients
US20230267430A1 (en) Data processing method and device, and computer-readable storage medium
CN111860597B (zh) Video information processing method and apparatus, electronic device, and storage medium
CN112702623A (zh) Video processing method, apparatus, device, and storage medium
CN113051418A (zh) Image source tracing method and apparatus, storage medium, and electronic device
CN113741774A (zh) Multimedia information processing method and apparatus, electronic device, and storage medium
CN106547626B (zh) Method for peer-to-peer architecture server balancing, and server
Niu et al. A Trustworthy Content Moderation Scheme Based on Permissioned Blockchain
WO2024029675A1 (ko) System for providing a private video management service using QR codes
CN113762040B (zh) Video identification method and apparatus, storage medium, and computer device
US20230206634A1 (en) Blockchain recordation and validation of video data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857438

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021857438

Country of ref document: EP

Effective date: 20221014

NENP Non-entry into the national phase

Ref country code: DE