CN113254703A - Video matching method, video processing device, electronic equipment and medium - Google Patents

Video matching method, video processing device, electronic equipment and medium Download PDF

Info

Publication number
CN113254703A
Authority
CN
China
Prior art keywords
video
data
sub
candidate
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110520030.5A
Other languages
Chinese (zh)
Inventor
刘俊启
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110520030.5A priority Critical patent/CN113254703A/en
Publication of CN113254703A publication Critical patent/CN113254703A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G06F 16/732 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a video matching method, a video processing method, an apparatus, an electronic device, a medium, and a program product, and relates to fields such as image processing and intelligent search. The video matching method comprises the following steps: receiving first feature data for a reference video; comparing the first feature data with respective second feature data of at least one candidate video to obtain a comparison result, wherein the second feature data is obtained by identifying scene change information in the candidate video; and determining, from the at least one candidate video based on the comparison result, a target video matching the reference video, wherein the second feature data of the target video matches the first feature data.

Description

Video matching method, video processing device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to the fields of image processing, intelligent search, and the like, and more particularly, to a video matching method, a video processing method, an apparatus, an electronic device, a medium, and a program product.
Background
With the popularization of the Internet, more and more users search for videos on the Internet. In the process of searching for videos, related videos are matched based on search terms input by the user, and the matched videos are recommended to the user. However, matching videos by search terms suffers from low matching accuracy, and the matched videos often fail to meet the user's needs.
Disclosure of Invention
The present disclosure provides a video matching method, a video processing method, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a video matching method, including: receiving first feature data for a reference video; comparing the first feature data with respective second feature data of at least one candidate video to obtain a comparison result, wherein the second feature data is obtained by identifying scene change information in the candidate video; and determining a target video matched with the reference video from the at least one candidate video based on the comparison result, wherein the second characteristic data of the target video is matched with the first characteristic data.
According to another aspect of the present disclosure, there is provided a video processing method including: identifying scene change information in a reference video; extracting first feature data from the reference video in response to identifying scene change information; and sending the first characteristic data.
According to another aspect of the present disclosure, there is provided a video matching apparatus including: the device comprises a receiving module, a comparing module and a determining module. The receiving module is used for receiving first characteristic data aiming at a reference video. And the comparison module is used for comparing the first characteristic data with second characteristic data of at least one candidate video to obtain a comparison result, wherein the second characteristic data is obtained by identifying scene change information in the candidate video. A determining module, configured to determine, based on the comparison result, a target video matching the reference video from the at least one candidate video, where second feature data of the target video matches the first feature data.
According to another aspect of the present disclosure, there is provided a video processing apparatus including: the device comprises a second identification module, a second extraction module and a sending module. And the second identification module is used for identifying scene change information in the reference video. A second extraction module to extract first feature data from the reference video in response to identifying scene change information. And the sending module is used for sending the first characteristic data.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video matching method as described above.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the video matching method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the video processing method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the video matching method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the video processing method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically shows an application scenario of a video matching method according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of a video matching method according to an embodiment of the present disclosure;
fig. 3 schematically shows a schematic diagram of a video matching method according to an embodiment of the present disclosure;
FIG. 4 schematically shows a flow diagram of a video processing method according to an embodiment of the present disclosure;
fig. 5 schematically shows a schematic diagram of a video matching method and a video processing method according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of a video matching apparatus according to an embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure; and
fig. 8 is a block diagram of an electronic device for implementing a video matching method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include, but not be limited to, systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Fig. 1 schematically shows an application scenario of a video matching method according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 of an embodiment of the present disclosure includes, for example, a candidate video and a reference video.
For example, a plurality of candidate videos are stored in a server and a reference video is stored in a client. When the user needs to search for the video matching with the reference video, the server may receive the reference video from the client, and match the reference video with each candidate video, so as to determine a target video matching with the reference video from the plurality of candidate videos, where the target video is the video needed by the user.
The embodiment of the present disclosure takes one candidate video 110 as an example to illustrate the matching situation between the candidate video and the reference video.
Illustratively, the reference videos 121, 122 are a subset of the candidate videos 110. When the candidate video 110 is matched based on the reference videos 121, 122, at least part of the content of the candidate video 110 matches the entire content of the reference videos 121, 122.
Illustratively, the partial content of the reference videos 123, 124 is a subset of the candidate videos 110. When the candidate video 110 is matched based on the reference videos 123, 124, the partial content of the candidate video 110 matches the partial content of the reference videos 123, 124.
Illustratively, the candidate videos 110 are a subset of the reference videos 125, 126. When the candidate video 110 is matched based on the reference videos 125, 126, the entire content of the candidate video 110 matches the partial content of the reference videos 125, 126.
Illustratively, the candidate video 110 is disjoint from the reference videos 127, 128. When the candidate video 110 is matched based on the reference videos 127, 128, the content of the candidate video 110 does not match the content of the reference videos 127, 128.
For example, matching of videos may be performed using a picture similarity recognition algorithm or a feature extraction and matching algorithm. Specifically, a plurality of images may be extracted from the reference video and from each candidate video, and the images of the reference video may be matched against the images of the candidate video to determine, from the candidate videos, the target video matching the reference video. Alternatively, image features may be extracted from each of the plurality of images of the reference video and of the candidate video, and the image features may be matched to determine the target video matching the reference video from the candidate videos. It can be seen that, since a video is continuous content, the process of matching the reference video and a candidate video through multiple images is computationally expensive.
Illustratively, 25 images (frames) are extracted from each second of content in a video; if the duration of the video is 16 seconds, the number of images to be extracted is 16 × 25 = 400. Because the number of extracted images is large, the calculation amount of matching with the extracted images is also large. Moreover, which images are extracted from the video must be considered carefully, because inconsistent rules for extracting images may cause two identical videos to fail to match.
For example, the duration of the reference video is 16 seconds and the duration of the candidate video is 32 seconds. In one case, 25 images are extracted from each second of the reference video, for a total of 16 × 25 = 400 images, and 25 images are extracted from each second of the candidate video, for a total of 32 × 25 = 800 images. When matching the reference video against the candidate video by comparing one image of the reference video with one image of the candidate video at a time, at most (16 × 25) × (32 × 25) = 320,000 comparisons are required; the calculation amount of such video matching is evidently large.
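A back-of-the-envelope calculation, written as a short Python sketch with the illustrative frame rate and durations from the example above, makes the growth of this comparison count explicit:

```python
# Illustrative cost of frame-by-frame matching, using the figures above.
FPS = 25                      # frames extracted per second of content

reference_frames = 16 * FPS   # 16-second reference video  -> 400 frames
candidate_frames = 32 * FPS   # 32-second candidate video  -> 800 frames

# Worst case: every reference frame is compared with every candidate frame.
comparisons = reference_frames * candidate_frames
print(comparisons)            # 320000 comparisons for a single candidate video
```

The count grows with the product of the two video lengths and is incurred again for every candidate video in the library, which is what motivates the scene-change-based feature data described below.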
In view of this, an embodiment of the present disclosure provides a video matching method, including: first feature data for a reference video is received. Then, the first feature data and second feature data of at least one candidate video are compared to obtain a comparison result, wherein the second feature data are obtained by identifying scene change information in the candidate video. Next, based on the comparison result, a target video matching the reference video is determined from the at least one candidate video, wherein the second feature data of the target video matches the first feature data.
An embodiment of the present disclosure further provides a video processing method, including: scene change information in a reference video is identified. First feature data is then extracted from the reference video in response to identifying the scene change information. Next, the first characteristic data is transmitted.
A video matching method and a video processing method according to an exemplary embodiment of the present disclosure are described below with reference to fig. 2 to 5 in conjunction with an application scenario of fig. 1.
Fig. 2 schematically shows a flow chart of a video matching method according to an embodiment of the present disclosure.
As shown in fig. 2, the video matching method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S230. The method of the disclosed embodiments may be performed by a server, for example.
In operation S210, first feature data for a reference video is received.
In operation S220, the first feature data and second feature data of each of the at least one candidate video are compared to obtain a comparison result, where the second feature data is obtained by identifying scene change information in the candidate video.
In operation S230, a target video matching the reference video is determined from the at least one candidate video based on the comparison result.
According to an embodiment of the present disclosure, the second feature data is obtained by, for example, identifying scene change information in the candidate video. When the scene change in the candidate video is identified, second feature data is extracted from the candidate video. The second feature data of the target video is matched with the first feature data.
Illustratively, when a scene change in the candidate video is identified, an image is extracted from the candidate video, and the second feature data may include the extracted image. Alternatively, after the image is extracted, the image may be processed to obtain image features of the image, and the image features may be used as the second feature data.
Illustratively, the first feature data is obtained by identifying scene change information in the reference video, for example, and the first feature data is extracted in a similar manner to the second feature data.
For example, the first characteristic data is sent by the client to the server. The server stores a plurality of candidate videos, each having second feature data. After the server receives the first feature data, the first feature data and the second feature data of each candidate video are compared to obtain a comparison result, and then a candidate video matched with the reference video is determined from the candidate videos as a target video based on the comparison result.
In the embodiment of the disclosure, the feature data are extracted from the video and the video matching is performed through the comparison of the feature data, so that the calculation amount of the video matching is greatly reduced, and the efficiency of the video matching is improved. In addition, the embodiment of the disclosure extracts the feature data by identifying scene change information in the video, so that the extracted first feature data is for scene change, and the second feature data is for scene change in the candidate video, thereby improving the matching probability between the first feature data and the second feature data, and thus improving the success rate of video matching.
Fig. 3 schematically shows a schematic diagram of a video matching method according to an embodiment of the present disclosure.
As shown in fig. 3, each candidate video is composed of a plurality of (frame) images. Taking the candidate video 300 as an example, scene change information in the candidate video 300 is identified, and when the scene change information is identified, second feature data is extracted from the candidate video 300.
For example, the candidate video 300 includes video data 310 for a first scene and video data 320 for a second scene. The first scene is, for example, two users (characters) talking in the video, and the second scene is, for example, a landscape. When a switch from the first scene to the second scene in the candidate video 300 is identified, a first video clip 330 is determined from the candidate video 300.
Illustratively, the first video clip 330 includes video data for a first scene and/or video data for a second scene. For example, the first video segment 330 includes a portion of the video data in the video data 310. Alternatively, the first video segment 330 includes a portion of the video data in the video data 320. Alternatively, the first video segment 330 includes a portion of video data in the video data 310 and a portion of video data in the video data 320. The embodiment of the present disclosure takes the example that the first video segment 330 includes a part of the video data in the video data 310 and a part of the video data in the video data 320.
After the first video segment 330 is determined, second feature data for the scene change is extracted from the first video segment 330.
In one example, a first image 340 may be extracted from the first video segment 330 and the first image 340 may be used as the second feature data. Alternatively, the first image 340 may be subjected to preprocessing such as compression, cropping, changing the image size, or graying the image, and the preprocessed first image may be used as the second feature data.
In another example, a first image 340 may be extracted from the first video segment 330, the first image 340 may then be pre-processed, and the pre-processed first image 340 may then be further processed to extract image features 350 of the first image 340 and treat the image features 350 as second feature data. The image features 350 may be feature vectors.
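A minimal sketch of this extraction step is given below. The scene-switch test (a large mean inter-frame difference), the threshold value, and the flattened resized grayscale frame used as the feature vector are illustrative assumptions, not the specific algorithms of the disclosure:

```python
import cv2
import numpy as np

def extract_scene_change_features(video_path, diff_threshold=40.0):
    """Return one feature vector per detected scene switch (illustrative sketch)."""
    capture = cv2.VideoCapture(video_path)
    features, prev_gray = [], None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            # A large mean absolute difference between consecutive frames is
            # treated as a scene switch.
            diff = np.abs(gray.astype(np.float32) - prev_gray.astype(np.float32)).mean()
            if diff > diff_threshold:
                # Preprocess the frame at the switch (resize + grayscale) and
                # flatten it into a coarse feature vector (the second feature data).
                small = cv2.resize(gray, (32, 32)).astype(np.float32) / 255.0
                features.append(small.flatten())
        prev_gray = gray
    capture.release()
    return features
```

The same routine could be applied to the reference video on the client side to obtain the first feature data, since the disclosure extracts both in a similar manner.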
According to the embodiment of the disclosure, when extracting the second feature data of the candidate video, a first video segment for a scene cut is determined by identifying the scene cut, and then the second feature data is extracted from the first video segment. Compared with the method of comparing each image (frame) in the video, the method has the advantage that the cost of video matching is greatly reduced by extracting the second characteristic data to match the video. In addition, the video content corresponding to the second characteristic data in the candidate video is changed according to the scene, and video matching is performed based on the second characteristic data, so that the success rate of video matching is improved.
In an embodiment of the present disclosure, the first feature data includes, for example, a first data sequence including a plurality of sub data. The second characteristic data includes, for example, a second data sequence including a plurality of sub data.
For example, the first data sequence is [a1, a2, a3]. Sub-data a1 represents the feature data extracted when the reference video is recognized to switch from scene A1 to scene A2; at that point scene A1 is the third scene and scene A2 is the fourth scene. Sub-data a2 represents the feature data extracted when the reference video is recognized to switch from scene A2 to scene A3; at that point scene A2 is the third scene and scene A3 is the fourth scene. Sub-data a3 represents the feature data extracted when the reference video is recognized to switch from scene A3 to scene A4; at that point scene A3 is the third scene and scene A4 is the fourth scene. Scenes A1, A2, A3, and A4 appear in the reference video in sequence. The first data sequence [a1, a2, a3] is stored in association with the reference video.
For example, the second data sequence is [b1, b2, b3, b4]. Sub-data b1 represents the feature data extracted when the candidate video is recognized to switch from scene B1 to scene B2; at that point scene B1 is the first scene and scene B2 is the second scene. Sub-data b2 represents the feature data extracted when the candidate video is recognized to switch from scene B2 to scene B3; at that point scene B2 is the first scene and scene B3 is the second scene. Sub-data b3 represents the feature data extracted when the candidate video is recognized to switch from scene B3 to scene B4; at that point scene B3 is the first scene and scene B4 is the second scene. Sub-data b4 represents the feature data extracted when the candidate video is recognized to switch from scene B4 to scene B5; at that point scene B4 is the first scene and scene B5 is the second scene. Scenes B1, B2, B3, B4, and B5 appear in the candidate video in sequence. The second data sequence [b1, b2, b3, b4] is stored in association with the candidate video.
Then, a plurality of adjacent sub-data are determined from the first data sequence [a1, a2, a3] as a first sub-sequence, and a plurality of adjacent sub-data are determined from the second data sequence [b1, b2, b3, b4] as a second sub-sequence. The number of sub-data in the second sub-sequence is the same as the number of sub-data in the first sub-sequence, and this number may be set according to the actual application, for example 2. That is, the first sub-sequence is, for example, [a1, a2] or [a2, a3], and the second sub-sequence is [b1, b2], [b2, b3], or [b3, b4]. Any first sub-sequence is compared with any second sub-sequence, and if a first sub-sequence matches a second sub-sequence, the candidate video corresponding to that second sub-sequence is determined as the target video. Taking the first sub-sequence [a1, a2] and the second sub-sequence [b2, b3] as an example, when sub-data a1 matches sub-data b2 and sub-data a2 matches sub-data b3, the first sub-sequence [a1, a2] is determined to match the second sub-sequence [b2, b3]. Two sub-data match when their similarity is sufficiently high; a sub-data item may be an image or a feature vector of an image.
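A minimal sketch of this sub-sequence comparison is shown below; the cosine-similarity measure, the threshold, and the function names are illustrative assumptions, with each sub-data item treated as a feature vector:

```python
import numpy as np

def subdata_match(a, b, threshold=0.9):
    """Two sub-data items match when their similarity is high enough (assumed measure)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return similarity >= threshold

def sequences_match(first_sequence, second_sequence, length=2):
    """Return True if some adjacent sub-sequence of `first_sequence` of the given
    length matches some adjacent sub-sequence of `second_sequence`."""
    for i in range(len(first_sequence) - length + 1):
        first_sub = first_sequence[i:i + length]
        for j in range(len(second_sequence) - length + 1):
            second_sub = second_sequence[j:j + length]
            if all(subdata_match(x, y) for x, y in zip(first_sub, second_sub)):
                return True
    return False
```

A candidate video whose second data sequence satisfies `sequences_match(first_data_sequence, second_data_sequence)` would then be determined as the target video.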
In one embodiment, the first data sequence [a1, a2, a3] is sent, for example, by the client to the server. After receiving the first data sequence [a1, a2, a3], the server determines a first sub-sequence from it, determines a second sub-sequence from the second data sequence [b1, b2, b3, b4], and matches the first sub-sequence with the second sub-sequence.
In another example, the client may send one sub-data item for the reference video to the server at a time. After the server receives a sub-data item of the reference video, it compares the received sub-data with the sub-data of the candidate videos until it determines that a preset number of adjacent sub-data for the reference video match a preset number of adjacent sub-data for a candidate video. The preset number is, for example, 2. Once such a match is determined, the matching process may be ended, and the matched candidate video is recommended to the client as the target video.
For example, the client sends sub-data a1 to the server, and the server compares a1 with each sub-data item of the second data sequence [b1, b2, b3, b4]. When it is determined that sub-data a1 matches sub-data b2, the candidate video corresponding to the second data sequence [b1, b2, b3, b4] is stored in a queue. The client then sends sub-data a2 to the server, and the server compares a2 with the sub-data of the second data sequence [b1, b2, b3, b4]. When it is determined that sub-data a2 matches sub-data b3, the first data sequence and the second data sequence have 2 adjacent sub-data that match, and the candidate video corresponding to the second data sequence [b1, b2, b3, b4] in the queue is taken as the target video. The server may recommend the target video to the client. If sub-data a2 does not match sub-data b3, the candidate video corresponding to the second data sequence [b1, b2, b3, b4] may be removed from the queue.
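This incremental variant can be sketched as follows; the bookkeeping structure and names are illustrative, and `subdata_match` is the assumed comparison from the previous sketch:

```python
class IncrementalMatcher:
    """Track candidate videos whose second data sequences keep matching the
    sub-data streamed from the client, and report a target video once a preset
    number of adjacent sub-data have matched (illustrative sketch)."""

    def __init__(self, candidates, preset_count=2):
        # candidates: mapping of video id -> second data sequence [b1, b2, ...]
        self.candidates = candidates
        self.preset_count = preset_count
        # queue: video id -> (index of next sub-data to match, matches so far)
        self.queue = {}

    def feed(self, sub_data):
        """Consume one sub-data item of the reference video; return a target
        video id once enough adjacent sub-data have matched, else None."""
        # Advance candidates already in the queue, or drop them if the run of
        # adjacent matches is broken.
        for video_id in list(self.queue):
            next_index, matched = self.queue[video_id]
            sequence = self.candidates[video_id]
            if next_index < len(sequence) and subdata_match(sub_data, sequence[next_index]):
                if matched + 1 >= self.preset_count:
                    return video_id                     # target video found
                self.queue[video_id] = (next_index + 1, matched + 1)
            else:
                del self.queue[video_id]
        # Start tracking any candidate whose sequence matches this sub-data.
        for video_id, sequence in self.candidates.items():
            if video_id in self.queue:
                continue
            for index, candidate_sub in enumerate(sequence):
                if subdata_match(sub_data, candidate_sub):
                    self.queue[video_id] = (index + 1, 1)
                    break
        return None
```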
Fig. 4 schematically shows a flow chart of a video processing method according to an embodiment of the present disclosure.
As shown in fig. 4, the video processing method 400 of the embodiment of the present disclosure may include, for example, operations S410 to S430. The method of the disclosed embodiments may be performed by a client, for example.
In operation S410, scene change information in a reference video is identified.
In operation S420, first feature data is extracted from the reference video in response to identifying the scene change information.
In operation S430, first feature data is transmitted.
In the embodiment of the present disclosure, a process in which the client extracts the first feature data by performing scene change information identification on the reference video is the same as or similar to a process in which the server extracts the second feature data by performing scene change information identification on the candidate video, and details are not repeated here. After the client extracts the first feature data, the client may send the first feature data to the server, so that the server can match the first feature data with the second feature data conveniently.
In the embodiment of the disclosure, the first feature data is extracted from the reference video and the video matching is performed through the first feature data, so that the calculation amount of the video matching is greatly reduced, and the efficiency of the video matching is improved. In addition, the embodiment of the disclosure extracts the feature data by identifying the scene change information in the video, so that the extracted first feature data is specific to the scene change, the matching probability between the first feature data and the second feature data is improved, and the success rate of video matching is improved.
In the embodiment of the disclosure, when the client identifies that the reference video is switched from the third scene to the fourth scene, a second video segment is determined from the reference video, and then the first feature data is extracted from the second video segment, wherein the second video segment comprises video data for the third scene and/or video data for the fourth scene. The process of determining the second video segment is similar to the process of determining the first video segment, and is not repeated herein.
Illustratively, when the client identifies that the reference video is switched from the third scene to the fourth scene, a second video segment is determined from the reference video, and then a second image is extracted from the second video segment and taken as the first characteristic data. Alternatively, the second image may be subjected to preprocessing such as compression, cropping, changing the image size, or graying the image, and the preprocessed second image may be used as the first feature data.
In another example, a second image may be extracted from a second video segment, then the second image may be pre-processed, the pre-processed second image may be further processed to extract image features of the second image, and the image features may be taken as the first feature data. The image feature may be a feature vector.
According to the embodiment of the present disclosure, when extracting the first feature data of the reference video, a second video segment for a scene cut is determined by recognizing the scene cut, and then the first feature data is extracted from the second video segment. Compared with the mode of comparing each image in the video, the video matching is carried out by extracting the first characteristic data, so that the video matching cost is greatly reduced. In addition, the video content corresponding to the first characteristic data in the reference video is changed according to the scene, and video matching is performed based on the first characteristic data, so that the success rate of video matching is improved.
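On the client side, a minimal sketch of extracting and uploading the first feature data might look as follows. It reuses the `extract_scene_change_features` sketch given earlier, and the server URL and JSON payload format are purely illustrative assumptions rather than an interface defined by the disclosure:

```python
import json
import urllib.request

def upload_first_feature_data(video_path, server_url="http://example.com/video-match"):
    """Extract first feature data from the reference video and send it to the
    server one sub-data item at a time (illustrative sketch)."""
    for sub_data in extract_scene_change_features(video_path):
        payload = json.dumps({"sub_data": sub_data.tolist()}).encode("utf-8")
        request = urllib.request.Request(
            server_url,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            result = json.loads(response.read().decode("utf-8"))
        if result.get("target_video"):      # the server reports a match
            return result["target_video"]
    return None
```

Sending one sub-data item at a time keeps the upload small and lets the server match in real time, as described for the queue-based variant above.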
Fig. 5 schematically shows a schematic diagram of a video matching method and a video processing method according to an embodiment of the present disclosure.
As shown in fig. 5, the video matching method performed by the server 510 includes operations S510A through S560A, and the video processing method performed by the client 520 includes operations S510B through S540B.
In operation S510A, the server 510 identifies scene change information in each of the stored candidate videos.
In operation S520A, the server 510 extracts second feature data from the candidate video after recognizing the scene change information.
In operation S510B, the client 520 identifies scene change information in the reference video.
In operation S520B, the client 520 extracts first feature data from the reference video after recognizing the scene change information.
In operation S530B, the client 520 transmits the first feature data to the server 510.
In operation S530A, the server 510 receives first feature data.
In operation S540A, the server 510 compares the first feature data with the second feature data of each candidate video to obtain a comparison result.
In operation S550A, the server 510 determines a target video matching the reference video from among the plurality of candidate videos based on the comparison result.
In operation S560A, the server 510 recommends the target video to the client 520.
In operation S540B, the client 520 presents the target video to the user.
In the embodiment of the present disclosure, if video matching were performed by comparing multiple frames of images in the videos, the cost of the matching calculation would be too high. Instead, the feature data are extracted based on scene switching in the video, and the client and the server (cloud) cooperate to realize video search at a lower cost.
For the client, a complete reference video does not need to be uploaded to the server, a second image is extracted by identifying scene switching in the reference video, the second image is preprocessed and processed to obtain first characteristic data, and the first characteristic data is uploaded to the server for video matching.
And for the server, comparing the first characteristic data uploaded by the client with the second characteristic data extracted in advance. The server does not limit the number of the first feature data uploaded by the client, the client does not need to upload a complete reference video, the client can extract features in real time, and the server performs video matching in real time. Therefore, according to the technical scheme of the embodiment of the disclosure, when the feature extraction is performed on videos with different video lengths or different video starting points, the extracted first feature data and the extracted second feature data are both for scene switching, that is, the extracted first feature data and the extracted second feature data have higher similarity probability, and the matching of the videos and the searching of the videos can be realized through lower calculation amount.
According to the embodiment of the disclosure, the image extraction, the image preprocessing and the feature data extraction are completed on the client, the data transmission quantity of the client and the server is reduced, and the matching speed of the server is greatly improved.
Fig. 6 schematically shows a block diagram of a video matching apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the video matching apparatus 600 of the embodiment of the present disclosure includes, for example, a receiving module 610, a comparing module 620, and a determining module 630.
The receiving module 610 may be configured to receive first feature data for a reference video. According to the embodiment of the present disclosure, the receiving module 610 may perform, for example, the operation S210 described above with reference to fig. 2, which is not described herein again.
The comparing module 620 may be configured to compare the first feature data with second feature data of each of the at least one candidate video to obtain a comparison result, where the second feature data is obtained by identifying scene change information in the candidate video. According to the embodiment of the present disclosure, the comparing module 620 may perform, for example, the operation S220 described above with reference to fig. 2, which is not described herein again.
The determining module 630 may be configured to determine a target video matching the reference video from the at least one candidate video based on the comparison result, wherein the second feature data of the target video matches the first feature data. According to the embodiment of the present disclosure, the determining module 630 may, for example, perform operation S230 described above with reference to fig. 2, which is not described herein again.
According to an embodiment of the present disclosure, the apparatus 600 may further include: the device comprises a first identification module and a first extraction module. The first identification module is used for identifying scene change information in the candidate videos aiming at each candidate video in the at least one candidate video. And the first extraction module is used for responding to the identification of the scene change information and extracting second characteristic data from the candidate video.
According to an embodiment of the present disclosure, the first extraction module includes: a first determination submodule and a first extraction submodule. The first determining sub-module is used for responding to the fact that the candidate videos are switched from the first scene to the second scene, and determining a first video segment from the candidate videos, wherein the first video segment comprises video data aiming at the first scene and/or video data aiming at the second scene. And the first extraction submodule is used for extracting second characteristic data from the first video segment.
According to an embodiment of the present disclosure, the first extraction module includes: a second extraction sub-module and a first processing sub-module. And the second extraction sub-module is used for extracting the first image from the candidate video. And the first processing submodule is used for processing the first image to obtain second characteristic data aiming at the candidate video.
According to an embodiment of the present disclosure, the first feature data includes a first data sequence including a plurality of sub data; the second characteristic data includes a second data sequence including a plurality of sub data. The comparison module 620 includes: a second determination submodule, a third determination submodule, and a comparison submodule. And the second determining sub-module is used for determining a plurality of adjacent sub-data from the first data sequence as the first sub-sequence. And the third determining sub-module is used for determining a plurality of adjacent sub-data from the second data sequence as a second sub-sequence, wherein the number of the sub-data in the second sub-sequence is the same as that of the sub-data in the first sub-sequence. And the comparison sub-module is used for comparing the first subsequence with the second subsequence.
According to an embodiment of the present disclosure, the determining module 630 is further configured to determine, in response to the first sub-sequence and the second sub-sequence matching, a candidate video corresponding to the second sub-sequence as the target video.
Fig. 7 schematically shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the video processing apparatus 700 of the embodiment of the present disclosure includes, for example, a second identifying module 710, a second extracting module 720, and a sending module 730.
The second identification module 710 may be used to identify scene change information in the reference video. According to the embodiment of the present disclosure, the second identifying module 710 may, for example, perform operation S410 described above with reference to fig. 4, which is not described herein again.
The second extraction module 720 may be configured to extract first feature data from the reference video in response to identifying the scene change information. According to the embodiment of the present disclosure, the second extracting module 720 may, for example, perform operation S420 described above with reference to fig. 4, which is not described herein again.
The sending module 730 may be configured to send the first characteristic data. According to the embodiment of the present disclosure, the sending module 730 may, for example, perform the operation S430 described above with reference to fig. 4, which is not described herein again.
According to an embodiment of the present disclosure, the second extraction module 720 includes: a fourth determination submodule and a third extraction submodule. And a fourth determining sub-module, configured to determine a second video segment from the reference video in response to identifying that the reference video is switched from the third scene to a fourth scene, where the second video segment includes video data for the third scene and/or video data for the fourth scene. And the third extraction submodule is used for extracting the first characteristic data from the second video clip.
According to an embodiment of the present disclosure, the second extraction module 720 includes: a fourth extraction sub-module and a second processing sub-module. And the fourth extraction sub-module is used for extracting the second image from the reference video. And the second processing sub-module is used for processing the second image to obtain first characteristic data aiming at the reference video.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order or good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 8 is a block diagram of an electronic device for implementing a video matching method of an embodiment of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 performs the respective methods and processes described above, such as the video matching method. For example, in some embodiments, the video matching method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the video matching method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the video matching method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
The electronic device may be configured to perform a video processing method. The electronic device may comprise, for example, a computing unit, a ROM, a RAM, an I/O interface, an input unit, an output unit, a storage unit and a communication unit. The computing unit, the ROM, the RAM, the I/O interface, the input unit, the output unit, the storage unit, and the communication unit in the electronic device have the same or similar functions as the computing unit, the ROM, the RAM, the I/O interface, the input unit, the output unit, the storage unit, and the communication unit of the electronic device shown in fig. 8, for example, and are not described again here.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (24)

1. A video matching method, comprising:
receiving first feature data for a reference video;
comparing the first feature data with respective second feature data of at least one candidate video to obtain a comparison result, wherein the second feature data is obtained by identifying scene change information in the candidate video; and
and determining a target video matched with the reference video from the at least one candidate video based on the comparison result, wherein the second characteristic data of the target video is matched with the first characteristic data.
2. The method of claim 1, further comprising:
for each candidate video of the at least one candidate video, identifying scene change information in the candidate video; and
in response to identifying scene change information, second feature data is extracted from the candidate video.
3. The method of claim 2, wherein said extracting second feature data from the candidate video in response to identifying scene change information comprises:
in response to identifying a switch from a first scene to a second scene in the candidate video, determining a first video segment from the candidate video, wherein the first video segment includes video data for the first scene and/or video data for the second scene; and
extracting the second feature data from the first video segment.
4. The method of claim 2 or 3, wherein said extracting second feature data from the candidate video comprises:
extracting a first image from the candidate video; and
and processing the first image to obtain second characteristic data aiming at the candidate video.
5. The method of claim 1, wherein the first characteristic data comprises a first data sequence comprising a plurality of sub-data; the second characteristic data comprises a second data sequence comprising a plurality of subdata;
wherein the comparing the first feature data with respective second feature data of at least one candidate video comprises:
determining a plurality of adjacent sub data from the first data sequence as a first sub sequence;
determining a plurality of adjacent sub-data from the second data sequence as a second sub-sequence, wherein the number of the sub-data in the second sub-sequence is the same as the number of the sub-data in the first sub-sequence; and
comparing the first subsequence and the second subsequence.
6. The method of claim 5, wherein the determining, from the at least one candidate video, a target video that matches the reference video based on the comparison comprises:
in response to the first sub-sequence and the second sub-sequence matching, determining a candidate video corresponding to the second sub-sequence as the target video.
7. A video processing method, comprising:
identifying scene change information in a reference video;
extracting first feature data from the reference video in response to identifying scene change information; and
sending the first feature data.
8. The method of claim 7, wherein said extracting first feature data from the reference video in response to identifying scene change information comprises:
in response to identifying a switch from a third scene to a fourth scene in the reference video, determining a second video segment from the reference video, wherein the second video segment includes video data for the third scene and/or video data for the fourth scene; and
extracting the first feature data from the second video segment.
9. The method of claim 7 or 8, wherein said extracting first feature data from the reference video comprises:
extracting a second image from the reference video; and
processing the second image to obtain the first feature data for the reference video.
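On the reference-video side (claims 7 to 9), the scene change identification is likewise left open. The sketch below assumes a simple mean-absolute-difference threshold between consecutive grayscale frames as the scene-change test; the threshold value is illustrative, and the frames found this way would then be processed into first feature data (for example, with the average-hash helper sketched earlier) before being sent to the matching side.

    import cv2
    import numpy as np

    def detect_scene_changes(video_path: str, threshold: float = 30.0) -> list:
        """Return the indices of frames at which the scene is assumed to
        switch, e.g. from a third scene to a fourth scene."""
        capture = cv2.VideoCapture(video_path)
        change_indices, previous, index = [], None, 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if previous is not None:
                diff = np.abs(gray.astype(np.int16) - previous.astype(np.int16)).mean()
                if float(diff) > threshold:
                    change_indices.append(index)
            previous, index = gray, index + 1
        capture.release()
        return change_indices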
10. A video matching apparatus, comprising:
a receiving module configured to receive first feature data for a reference video;
a comparison module configured to compare the first feature data with respective second feature data of at least one candidate video to obtain a comparison result, wherein the second feature data is obtained by identifying scene change information in the candidate video; and
a determining module configured to determine, based on the comparison result, a target video matching the reference video from the at least one candidate video, wherein the second feature data of the target video matches the first feature data.
11. The apparatus of claim 10, further comprising:
a first identification module, configured to identify, for each candidate video of the at least one candidate video, scene change information in the candidate video; and
a first extraction module configured to extract second feature data from the candidate video in response to identifying the scene change information.
12. The apparatus of claim 11, wherein the first extraction module comprises:
a first determining sub-module configured to determine a first video segment from the candidate video in response to identifying a switch from a first scene to a second scene in the candidate video, wherein the first video segment includes video data for the first scene and/or video data for the second scene; and
a first extraction sub-module for extracting the second feature data from the first video segment.
13. The apparatus of claim 11 or 12, wherein the first extraction module comprises:
a second extraction sub-module, configured to extract a first image from the candidate video; and
a first processing sub-module configured to process the first image to obtain the second feature data for the candidate video.
14. The apparatus of claim 10, wherein the first feature data comprises a first data sequence comprising a plurality of sub-data, and the second feature data comprises a second data sequence comprising a plurality of sub-data;
wherein the comparison module comprises:
a second determining sub-module configured to determine, from the first data sequence, a plurality of adjacent sub-data as a first sub-sequence;
a third determining sub-module, configured to determine, from the second data sequence, a plurality of adjacent sub-data as a second sub-sequence, where a number of the sub-data in the second sub-sequence is the same as a number of the sub-data in the first sub-sequence; and
a comparison sub-module for comparing the first sub-sequence with the second sub-sequence.
15. The apparatus of claim 14, wherein the determining module is further configured to:
in response to the first sub-sequence and the second sub-sequence matching, determining a candidate video corresponding to the second sub-sequence as the target video.
16. A video processing apparatus comprising:
a second identification module configured to identify scene change information in a reference video;
a second extraction module configured to extract first feature data from the reference video in response to identifying the scene change information; and
a sending module configured to send the first feature data.
17. The apparatus of claim 16, wherein the second extraction module comprises:
a fourth determining sub-module, configured to determine a second video segment from the reference video in response to identifying a switch from a third scene to a fourth scene in the reference video, wherein the second video segment includes video data for the third scene and/or video data for the fourth scene; and
a third extraction sub-module for extracting the first feature data from the second video segment.
18. The apparatus of claim 16 or 17, wherein the second extraction module comprises:
a fourth extraction sub-module, configured to extract a second image from the reference video; and
a second processing sub-module configured to process the second image to obtain the first feature data for the reference video.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 7-9.
21. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
22. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 7-9.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
24. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 7-9.
CN202110520030.5A 2021-05-12 2021-05-12 Video matching method, video processing device, electronic equipment and medium Pending CN113254703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110520030.5A CN113254703A (en) 2021-05-12 2021-05-12 Video matching method, video processing device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110520030.5A CN113254703A (en) 2021-05-12 2021-05-12 Video matching method, video processing device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113254703A true CN113254703A (en) 2021-08-13

Family

ID=77181640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110520030.5A Pending CN113254703A (en) 2021-05-12 2021-05-12 Video matching method, video processing device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113254703A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101516040A (en) * 2008-02-20 2009-08-26 深圳华为通信技术有限公司 Video matching method, device and system
KR20100047110A (en) * 2008-10-27 2010-05-07 (주)필링크 Method and system for similarity contents search
US8494234B1 (en) * 2007-03-07 2013-07-23 MotionDSP, Inc. Video hashing system and method
CN105930402A (en) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 Convolutional neural network based video retrieval method and system
CN108416013A (en) * 2018-03-02 2018-08-17 北京奇艺世纪科技有限公司 Video matching, retrieval, classification and recommendation method, apparatus and electronic equipment
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110910419A (en) * 2018-09-18 2020-03-24 深圳市鸿合创新信息技术有限责任公司 Automatic tracking method and device and electronic equipment
CN111107392A (en) * 2019-12-31 2020-05-05 北京百度网讯科技有限公司 Video processing method and device and electronic equipment
CN111651636A (en) * 2020-03-31 2020-09-11 易视腾科技股份有限公司 Video similar segment searching method and device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8494234B1 (en) * 2007-03-07 2013-07-23 MotionDSP, Inc. Video hashing system and method
CN101516040A (en) * 2008-02-20 2009-08-26 深圳华为通信技术有限公司 Video matching method, device and system
KR20100047110A (en) * 2008-10-27 2010-05-07 (주)필링크 Method and system for similarity contents search
CN105930402A (en) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 Convolutional neural network based video retrieval method and system
CN108416013A (en) * 2018-03-02 2018-08-17 北京奇艺世纪科技有限公司 Video matching, retrieval, classification and recommendation method, apparatus and electronic equipment
CN110910419A (en) * 2018-09-18 2020-03-24 深圳市鸿合创新信息技术有限责任公司 Automatic tracking method and device and electronic equipment
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
WO2020156361A1 (en) * 2019-02-02 2020-08-06 杭州睿琪软件有限公司 Training sample obtaining method and apparatus, electronic device and storage medium
CN111107392A (en) * 2019-12-31 2020-05-05 北京百度网讯科技有限公司 Video processing method and device and electronic equipment
CN111651636A (en) * 2020-03-31 2020-09-11 易视腾科技股份有限公司 Video similar segment searching method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李枫; 赵岩; 王世刚; 陈贺新: "Video Scene Abrupt Change Detection Combined with the SIFT Algorithm", Chinese Optics (中国光学), no. 01, 15 February 2016 (2016-02-15) *

Similar Documents

Publication Publication Date Title
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
CN113378784B (en) Training method of video label recommendation model and method for determining video label
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN113360700B (en) Training of image-text retrieval model, image-text retrieval method, device, equipment and medium
CN113254712B (en) Video matching method, video processing device, electronic equipment and medium
CN115345968B (en) Virtual object driving method, deep learning network training method and device
CN113657395B (en) Text recognition method, training method and device for visual feature extraction model
CN114724168A (en) Training method of deep learning model, text recognition method, text recognition device and text recognition equipment
CN114549904B (en) Visual processing and model training method, device, storage medium and program product
US20220207286A1 (en) Logo picture processing method, apparatus, device and medium
CN112651449B (en) Method, device, electronic equipment and storage medium for determining content characteristics of video
CN113255484B (en) Video matching method, video processing device, electronic equipment and medium
CN113360683A (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
CN113033373A (en) Method and related device for training face recognition model and recognizing face
CN116343233A (en) Text recognition method and training method and device of text recognition model
CN113254703A (en) Video matching method, video processing device, electronic equipment and medium
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN113254706B (en) Video matching method, video processing device, electronic equipment and medium
CN113920306A (en) Target re-identification method and device and electronic equipment
CN113901302A (en) Data processing method, device, electronic equipment and medium
CN113947195A (en) Model determination method and device, electronic equipment and memory
CN116361658B (en) Model training method, task processing method, device, electronic equipment and medium
CN113408530B (en) Image identification method, device, equipment and storage medium
CN113343047B (en) Data processing method, data retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination