CN110826461A

CN110826461A - Video content identification method and device, electronic equipment and storage medium

Info

Publication number: CN110826461A
Application number: CN201911052014.7A
Authority: CN
Inventors: 周小涛; 刘浏
Original assignee: Shenzhen Onething Technology Co Ltd
Current assignee: Shenzhen Onething Technology Co Ltd
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2020-02-21

Abstract

A video content identification method, comprising: receiving a video network request for a target video; extracting video fingerprint features of the target video; calculating a hash value of the video fingerprint features; querying a cache, according to the hash value, for an index matching the hash value; if an index matching the hash value exists, obtaining the base fingerprint features corresponding to the index; determining, by using a Manhattan distance, whether the video fingerprint features are similar to the base fingerprint features; and if they are similar, determining that the target video and the video corresponding to the base fingerprint features have the same content. The invention also provides a video content identification apparatus, an electronic device and a storage medium. The method and apparatus can accurately identify whether two videos have the same content.

Description

Video content identification method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for identifying video content, an electronic device, and a storage medium.

Background

With the rapid development of internet technology, watching online videos has become an indispensable way for people to obtain information and entertainment. There are a great many videos on the internet, and the same content often exists in many different versions whose formats, bit rates, frame rates and durations may differ. In addition, some video content is re-dubbed or pirated. This causes many problems for online video, such as copyright management, video storage, video content retrieval and the management of harmful information.

Therefore, how to accurately identify whether two videos have the same content is a technical problem that urgently needs to be solved.

Disclosure of Invention

In view of the above, it is desirable to provide a video content identification method and apparatus, an electronic device and a storage medium that can accurately identify whether two videos have the same content.

A first aspect of the present invention provides a video content identification method, including:

receiving a video network request for a target video;

extracting video fingerprint features of the target video;

calculating a hash value of the video fingerprint feature;

querying a cache, according to the hash value, for an index matching the hash value;

if an index matching the hash value exists, obtaining the base fingerprint features corresponding to the index;

determining, by using a Manhattan distance, whether the video fingerprint features are similar to the base fingerprint features;

and if the video fingerprint features are similar to the base fingerprint features, determining that the target video and the video corresponding to the base fingerprint features have the same content.

In a possible implementation, the video network request is a video query request, and after the video fingerprint features of the target video are extracted, the method further includes:

identifying video salient features among the video fingerprint features by using a Hamming distance;

and calculating the hash value of the video fingerprint features comprises:

calculating the hash value of the video salient features.

In one possible implementation, the method further includes:

if the video fingerprint features are similar to the base fingerprint features, outputting the positions at which the target video and the video corresponding to the base fingerprint features are similar.

In a possible implementation, querying a cache, according to the hash value, for an index matching the hash value includes:

determining whether the hash values of the indexes cached in Redis include one identical to the hash value;

if the hash values of the indexes cached in Redis include one identical to the hash value, determining that an index matching the hash value exists; or

if the hash values of the indexes cached in Redis do not include one identical to the hash value, determining whether the hash values of the indexes cached in Hbase include one identical to the hash value;

and if the hash values of the indexes cached in Hbase include one identical to the hash value, determining that an index matching the hash value exists.

In one possible implementation, the video network request is a video write request, and the method further includes:

if the video fingerprint features are similar to the base fingerprint features, determining whether the target video contains the video corresponding to the base fingerprint features;

and if the target video contains the video corresponding to the base fingerprint features, replacing the video corresponding to the base fingerprint features in the database with the target video.

In one possible implementation, the method further includes:

if the target video partially overlaps the video corresponding to the base fingerprint features, recording the positions at which the target video and the video corresponding to the base fingerprint features are similar, and storing the target video in the database.

In one possible implementation, the method further includes:

if no index matching the hash value exists, creating a target index for the target video by using a locality sensitive hashing algorithm;

storing the target video in the database.

In a possible implementation, extracting the video fingerprint features of the target video includes:

extracting frame fingerprint features of the target video;

and converting the frame fingerprint features to a uniform frame rate to obtain the video fingerprint features of the target video.

A second aspect of the present invention provides a video content recognition apparatus, the apparatus comprising:

a receiving module, configured to receive a video network request for a target video;

an extraction module, configured to extract video fingerprint features of the target video;

a calculating module, configured to calculate a hash value of the video fingerprint features;

a query module, configured to query a cache, according to the hash value, for an index matching the hash value;

an obtaining module, configured to obtain the base fingerprint features corresponding to the index if an index matching the hash value exists;

a judging module, configured to determine, by using a Manhattan distance, whether the video fingerprint features are similar to the base fingerprint features; and

a determining module, configured to determine that the target video and the video corresponding to the base fingerprint features have the same content if the video fingerprint features are similar to the base fingerprint features.

A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the video content identification method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video content identification method.

According to the above technical solution, when a video network request for a target video is received, the video fingerprint features of the target video are extracted first; a hash value of the video fingerprint features is calculated; a cache is queried, according to the hash value, for a matching index; if a matching index exists, the base fingerprint features corresponding to the index are obtained; the Manhattan distance is then used to determine whether the video fingerprint features are similar to the base fingerprint features; and if they are similar, the target video and the video corresponding to the base fingerprint features are determined to have the same content.

Therefore, according to the invention, after a video network request for the target video is received, the video fingerprint features of the target video are extracted and compared for similarity with the base fingerprint features corresponding to the index found in the cache, so that whether two videos have the same content can be identified accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flow chart of a video content recognition method according to a preferred embodiment of the present invention.

Fig. 2 is a functional block diagram of a video content recognition apparatus according to a preferred embodiment of the present disclosure.

Fig. 3 is a schematic structural diagram of an electronic device implementing a video content recognition method according to a preferred embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, nor should they be construed to indicate or imply the relative importance thereof or the number of technical features indicated. It will be appreciated that the data so used are interchangeable under appropriate circumstances such that the embodiments described herein are capable of operation in sequences other than those illustrated or otherwise described herein, and that the features defined as "first" and "second" may explicitly or implicitly include at least one such feature.

Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In addition, the technical solutions of the various embodiments may be combined with each other, but only where a person skilled in the art can realize the combination. When combined technical solutions contradict each other or cannot be realized, the combination should be considered not to exist and is not within the protection scope of the present invention.

The video content identification method of the embodiments of the invention is applied to an electronic device. It can also be applied to a hardware environment formed by the electronic device and a server connected to it through a network, in which case the method is executed jointly by the server and the electronic device. Networks include, but are not limited to, wide area networks, metropolitan area networks and local area networks.

A server may refer to a computer system that provides services to other devices (e.g., electronic devices) in a network.

An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing that consists of a large number of hosts or network servers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touch pad or a voice control device, for example a personal computer, a tablet computer, a smartphone or a personal digital assistant (PDA).

Referring to fig. 1, fig. 1 is a flowchart illustrating a video content recognition method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.

S11, the electronic device receives a video network request for the target video.

The target video may be any video, such as a licensed video, a dubbed video, a pirated video, or one of several versions of the same content. The video network request may be a video query request or a video write request.

S12, the electronic device extracts the video fingerprint features of the target video.

Specifically, extracting the video fingerprint features of the target video includes:

extracting the frame fingerprint features of the target video;

In the invention, because a video may exist in various reproduced versions, the electronic device needs to preprocess the target video before extracting its video fingerprint features, filtering out meaningless black-border areas, subtitle areas, and station-logo and advertisement areas; it can then extract the frame fingerprint feature of each frame of the target video.

The specific steps are as follows:

1) converting each frame image of the target video into a grey-scale image;

2) dividing the grey-scale image into equal-sized blocks, for example dividing each picture into a 6 × 4 grid of sub-areas;

3) removing possible station-logo and subtitle areas;

4) calculating the average grey level Avg of the whole picture;

5) comparing the mean grey level of the pixels in each block with Avg, recording 1 if the block mean is greater than Avg and 0 otherwise; combining these bits in a fixed order gives the frame fingerprint feature of the frame as a 24-bit integer (one bit per block of the 6 × 4 grid).
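The five steps above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the BT.601 luma weights used for grey-scale conversion, the row-by-row bit order, and the assumption that logo and subtitle areas (step 3) were already cropped away are all choices made here for the example. Frames are passed as nested lists so that no image library is needed.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def rgb_to_gray(frame: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Step 1: grey-scale conversion (ITU-R BT.601 luma weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def frame_fingerprint(gray: Sequence[Sequence[float]], cols: int = 6, rows: int = 4) -> int:
    """Steps 2, 4 and 5: one bit per block, packed row by row into a cols*rows-bit integer.

    Step 3 (logo/subtitle removal) is assumed to have been done beforehand,
    for example by cropping those regions out of `gray`.
    """
    h, w = len(gray), len(gray[0])
    if h < rows or w < cols:
        raise ValueError("frame is smaller than the block grid")
    # Step 4: average grey level Avg of the whole picture.
    avg = sum(sum(row) for row in gray) / (h * w)
    fp = 0
    for by in range(rows):
        y0, y1 = by * h // rows, (by + 1) * h // rows
        for bx in range(cols):
            x0, x1 = bx * w // cols, (bx + 1) * w // cols
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Step 5: the bit is 1 if the block mean exceeds Avg, otherwise 0.
            fp = (fp << 1) | (1 if sum(block) / len(block) > avg else 0)
    return fp
```

With a 6 × 4 grid the result always fits in 24 bits, so it can later be held in a 32-bit integer.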

In addition, because different versions of a video may have different frame rates, the extracted frame fingerprint features must be converted to a uniform frame rate so that similarity can be calculated between videos with different frame rates.

Specifically, nearest-neighbour resampling is applied to videos whose frame rate is higher than the reference frame rate (for example, 29 frames/second), nearest-neighbour interpolation is applied to videos whose frame rate is lower than it (for example, 17 frames/second), and the frame rates of all videos are finally unified to the reference frame rate (for example, 24 frames/second).

The specific steps are as follows:

Let the frames of the original fingerprint sequence be F[i], i = 1, …, p, with timestamps T[1], T[2], …, T[p], and let NT[1], NT[2], …, NT[q] be the timestamps of the fingerprint sequence to be generated. The frame fingerprint NF[k] at time NT[k] is the original frame fingerprint F[s] whose timestamp T[s] is closest in time to NT[k]:

$$NF[k] = F[s^{*}], \qquad s^{*} = \arg\min_{1 \le s \le p} \lvert NT[k] - T[s] \rvert, \qquad k = 1, \dots, q$$
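A minimal sketch of the nearest-neighbour rule above, assuming frame fingerprints are integers and timestamps are in seconds. The helper names `resample_nearest` and `unify_frame_rate` are hypothetical, and breaking ties toward the earlier frame is an arbitrary choice the text does not specify. The same function covers both down-sampling (e.g. 29 to 24 frames/second) and up-sampling (e.g. 17 to 24 frames/second), because in both cases each output frame simply copies the closest original frame.

```python
from bisect import bisect_left
from typing import List, Sequence


def resample_nearest(fps: Sequence[int], ts: Sequence[float], new_ts: Sequence[float]) -> List[int]:
    """NF[k] = F[s*] with s* = argmin_s |NT[k] - T[s]|; `ts` must be sorted ascending."""
    out = []
    for t in new_ts:
        j = bisect_left(ts, t)  # index of the first original timestamp >= t
        if j == 0:
            s = 0
        elif j == len(ts):
            s = len(ts) - 1
        else:
            # Pick the closer neighbour; ties go to the earlier frame.
            s = j if ts[j] - t < t - ts[j - 1] else j - 1
        out.append(fps[s])
    return out


def unify_frame_rate(fps: Sequence[int], src_rate: float, ref_rate: float = 24.0) -> List[int]:
    """Map a fingerprint sequence sampled at src_rate onto the reference frame rate."""
    ts = [i / src_rate for i in range(len(fps))]
    q = round(len(fps) / src_rate * ref_rate)  # number of frames at the reference rate
    new_ts = [k / ref_rate for k in range(q)]
    return resample_nearest(fps, ts, new_ts)
```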

S13, the electronic device calculates the hash value of the video fingerprint features.

The hash value of the video fingerprint features can be calculated with a hash algorithm.

S14, the electronic device queries the cache, according to the hash value, for an index matching the hash value; if such an index exists, step S15 is executed; otherwise, the process ends.

In the invention, indexes for all video fingerprint features can be created in advance by combining a locality sensitive hashing (LSH) algorithm with a frame-skipping method. Specifically, assume that fingerprint extraction on video a yields its video fingerprint features, consisting of Fn frame fingerprints. Every continuous sequence of K frame fingerprints, where K may be 120, is used as an independent unit for generating a fingerprint index. With frame skipping, if adjacent K-frame sequences start 8 frames apart, indexes need to be created for about (Fn - K)/8 consecutive sequences. Each frame fingerprint is stored as a 4-byte (32-bit) integer, so it can be regarded as a 32-bit 0/1 vector, and a K-frame sequence is therefore a 0/1 vector L of length 32K. Next, several groups of random numbers are generated (for example, 20 groups), each group consisting of several numbers (for example, 28) with values in the range 1 to 32 × K. For each group, the bits of L at the positions given by its random numbers are taken out, yielding a 28-bit 0/1 sequence that can be represented by a 32-bit integer. Repeating this for all 20 groups yields 20 32-bit integers, which are used to generate 20 groups of indexes for the K frames of data.
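The indexing step can be sketched as bit-sampling LSH over the 32K-bit vector L. The sketch assumes frame fingerprints are Python integers, uses 0-based bit positions, and draws each group's positions without replacement from a fixed-seed generator; these choices and all function names are hypothetical. With a step of 8 frames, `window_starts` produces ⌊(Fn - K)/8⌋ + 1 start positions.

```python
import random
from typing import List, Sequence


def window_starts(fn: int, k: int = 120, step: int = 8) -> List[int]:
    """Start positions of the K-frame sequences, taken every `step` frames (frame skipping)."""
    return list(range(0, fn - k + 1, step))


def window_bits(frame_fps: Sequence[int]) -> List[int]:
    """Concatenate K 32-bit frame fingerprints into the 0/1 vector L of length 32*K."""
    return [(fp >> (31 - b)) & 1 for fp in frame_fps for b in range(32)]


def make_projections(k: int, groups: int = 20, bits: int = 28, seed: int = 0) -> List[List[int]]:
    """Random bit positions per group (0-based, i.e. the text's 1..32K shifted by one).

    Sampling without replacement and the fixed seed are assumptions; what matters
    is that indexing and querying use the same projections.
    """
    rng = random.Random(seed)
    return [rng.sample(range(32 * k), bits) for _ in range(groups)]


def window_hashes(frame_fps: Sequence[int], projections: List[List[int]]) -> List[int]:
    """One hash per group: the sampled bits of L packed into an integer."""
    vec = window_bits(frame_fps)
    hashes = []
    for group in projections:
        h = 0
        for pos in group:
            h = (h << 1) | vec[pos]
        hashes.append(h)
    return hashes
```

Windows whose bit vectors differ in only a few positions are likely to agree on at least one of the sampled groups, so similar content tends to share a bucket; this is why the same projections must be reused at query time.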

Each group of indexes consists of a 32-bit integer (the hash value), a video identifier, and the start positions of the K-frame sequences, in the form bucketId: { fpid1: { start1, start2, ... }, fpid2: { start1, start2, ... }, ..., fpidn: { start1, start2, ... } }, where bucketId is the hash value, fpid1, fpid2, ..., fpidn are video identifiers, and start1, start2, ... are start positions.
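The bucketId, fpid, start-position layout can be modelled with nested dictionaries. This in-memory sketch is for illustration only (the text stores index data in Hbase and Redis), and `add_window` and `candidate_windows` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

# bucketId -> {fpid -> [start positions]}
BucketIndex = Dict[int, Dict[str, List[int]]]


def add_window(index: BucketIndex, fpid: str, start: int, hashes: Iterable[int]) -> None:
    """Register one K-frame window of video `fpid` under each of its hash values."""
    for bucket_id in hashes:
        index.setdefault(bucket_id, {}).setdefault(fpid, []).append(start)


def candidate_windows(index: BucketIndex, hashes: Iterable[int]) -> Dict[str, Set[int]]:
    """Collect every (video, start) pair that shares at least one bucket with the query."""
    hits: Dict[str, Set[int]] = {}
    for bucket_id in hashes:
        for fpid, starts in index.get(bucket_id, {}).items():
            hits.setdefault(fpid, set()).update(starts)
    return hits
```

Bucket hits are only candidates; they still have to pass the Manhattan-distance check of step S16. Because all groups share one key space in this layout, prefixing each bucketId with its group number would be one way to avoid collisions between groups; the text does not say whether this is done.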

In the invention, the index data can be stored in the Hbase cache, so that it can be queried quickly and new index data can be added to the index system quickly, avoiding the cumbersome backup required when data is stored in traditional disk files.

As an optional implementation, querying the cache, according to the hash value, for an index matching the hash value includes the following:

In this embodiment, the cache may include a Redis cache and an Hbase cache. The Redis cache may serve as a first-level cache that mainly stores frequently retrieved (hot) index data, and the Hbase cache may serve as a second-level cache that stores the full set of index data.

After the hash value of the target video is calculated, the Redis cache is searched first to determine whether the hash values of the indexes cached in Redis include one identical to the calculated hash value. If so, it is determined that an index matching the hash value exists, and the corresponding index data is read from the Redis cache. If not, the Hbase cache is searched in the same way. If the hash values of the indexes cached in Hbase include one identical to the calculated hash value, it is determined that a matching index exists, the corresponding index data is read from the Hbase cache, and that index data is also written into the Redis cache so that later lookups can find it in Redis.

It should be noted that index queries for both video queries and video writes follow this procedure: the Redis cache is queried first, and the Hbase cache is queried only if the index is not found in Redis.
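The lookup order described above (first-level Redis, then second-level Hbase, promoting Hbase hits into Redis) is a read-through cache. In the sketch below, plain Python dictionaries stand in for both stores; no Redis or HBase client API is used or implied, and the class name is hypothetical.

```python
from typing import Dict, List, MutableMapping, Optional

Entry = Dict[str, List[int]]  # {fpid: [start positions]} for one bucketId


class TwoLevelIndexCache:
    """Read-through lookup: first level (hot data), then second level (full data)."""

    def __init__(self, hot: MutableMapping[int, Entry], full: MutableMapping[int, Entry]):
        self.hot = hot    # stand-in for the Redis (first-level) cache
        self.full = full  # stand-in for the Hbase (second-level) cache

    def get(self, bucket_id: int) -> Optional[Entry]:
        entry = self.hot.get(bucket_id)
        if entry is not None:
            return entry
        entry = self.full.get(bucket_id)
        if entry is not None:
            # Promote the hit so that later lookups are served by the first level.
            self.hot[bucket_id] = entry
        return entry
```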

S15, the electronic device obtains the base fingerprint features corresponding to the index.

In the invention, after the index is found, the electronic device can use the index to obtain the corresponding base fingerprint features from the database.

The base fingerprint features are maintained according to several storage rules, which make it easier to recommend higher-quality videos. For a base fingerprint A, every video whose content is contained in A is an element of A's fingerprint closure set. Between any two videos there are only four possible relations: containing, contained, overlapping, and disjoint (neither containing the other).

Rule one: direct storage principle

If no video with the same content as the query video can be found in the base fingerprint library, the fingerprint is stored directly in the library.

Rule two: integrity storage principle

For each base fingerprint, every video (identified by its md5) determined to have the same content as it must be contained by the video corresponding to that fingerprint. Otherwise, the two videos can only overlap or be disjoint.

Rule three: replacement principle

For a query video B, if it contains the video corresponding to some base fingerprint A, the replacement principle applies: fingerprint B is stored as a base fingerprint, and the fingerprint closure set of A becomes a subset of the fingerprint closure set of B. Base fingerprint A and its closure set are then cleared.

Rule four: overlap principle

For a query video B, if it is not contained by any base fingerprint but overlaps some base fingerprint A, B is stored as a base fingerprint. At the same time, the position of B must be compared with that of every video md5 in the fingerprint closure set of A to construct a new fingerprint closure set for B.

Rule five: overlap between base fingerprints

Base fingerprints are allowed to overlap one another, and the shortest resolvable overlap is the shortest video frame length in the fingerprint library. The relationships between base fingerprints, including the overlapping positions, must be recorded separately.

Rule six: containment principle

If a query fingerprint is contained by some base fingerprint, the only operation performed is merging the query fingerprint into that base fingerprint's closure set.
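The relations and rules above can be expressed as a small classification step. The sketch assumes each video's content is represented by a half-open [start, end) span on a timeline shared with the base video (a hypothetical representation derived from the matched positions), and it only reports which rule applies rather than performing any storage. Treating identical spans as "contained" (rule six) instead of "contains" (rule three) is an assumption. Rule two and rule five are invariants of the library and are not modelled.

```python
from enum import Enum
from typing import Optional, Tuple

# Hypothetical representation: [start, end) in seconds on a timeline shared with the base video.
Span = Tuple[float, float]


class Relation(Enum):
    CONTAINS = "contains"     # the query contains the base video
    CONTAINED = "contained"   # the query is contained by the base video
    OVERLAP = "overlap"
    DISJOINT = "disjoint"


def relation(query: Span, base: Span) -> Relation:
    qs, qe = query
    bs, be = base
    if bs <= qs and qe <= be:   # checked first, so identical spans count as "contained"
        return Relation.CONTAINED
    if qs <= bs and be <= qe:
        return Relation.CONTAINS
    if qs < be and bs < qe:
        return Relation.OVERLAP
    return Relation.DISJOINT


def storage_action(rel: Optional[Relation]) -> str:
    """Map a relation (None = no similar base fingerprint found) to a storage rule."""
    if rel is None or rel is Relation.DISJOINT:
        return "rule one: store the fingerprint directly as a new base fingerprint"
    if rel is Relation.CONTAINED:
        return "rule six: merge the query into the base fingerprint's closure set"
    if rel is Relation.CONTAINS:
        return "rule three: replace the base fingerprint and absorb its closure set"
    return "rule four: store as a base fingerprint and record the overlap positions"
```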

S16, the electronic device uses the Manhattan distance to determine whether the video fingerprint features are similar to the base fingerprint features; if so, step S17 is executed; otherwise, the process ends.

In the invention, the target video may be a video clip or a complete video.

If the target video is a video clip, the distance between its video fingerprint features and the base fingerprint features is calculated directly with the Manhattan distance:

$$D = \sum_{i}\sum_{k} \lvert f(i,k) - g(i,k) \rvert$$

where f(i, k) is the video fingerprint feature of the target video and g(i, k) is the base fingerprint feature. If the calculated distance is greater than a preset distance threshold, the target video is determined to be dissimilar to the video corresponding to the base fingerprint features; if it is less than or equal to the threshold, the two are determined to be similar.

If the target video is a complete video, it needs to be divided into multiple segments at intervals, and the distance formula above is used to judge whether the video fingerprint features of each segment are similar to the base fingerprint features. After all segments have been judged, if the number of similar segments as a proportion of the total number of segments of the target video is greater than a percentage threshold (for example, 0.5), the target video is judged to be similar to the video corresponding to the base fingerprint features. For example, if 3 segments are judged similar and the target video has 5 segments in total, the proportion is 3/5 = 0.6, which exceeds 0.5.
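A sketch of the clip-level and complete-video checks, assuming that i indexes frames, k indexes the 32 bits of each frame fingerprint, and that the caller has already aligned each target segment with the corresponding base segment (for example, via the start positions stored in the index). For 0/1 values, |f(i,k) - g(i,k)| is 1 exactly when the bits differ, so the Manhattan distance reduces to counting differing bits. The function names and threshold parameters are hypothetical.

```python
from typing import Sequence, Tuple

Fingerprints = Sequence[int]


def manhattan_distance(f: Fingerprints, g: Fingerprints, bits: int = 32) -> int:
    """sum_i sum_k |f(i,k) - g(i,k)|, with f(i,k) = bit k of frame fingerprint i."""
    if len(f) != len(g):
        raise ValueError("fingerprint sequences must have the same length")
    mask = (1 << bits) - 1
    # Each frame contributes the number of set bits in the XOR of the two fingerprints.
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(f, g))


def clip_is_similar(f: Fingerprints, g: Fingerprints, dist_threshold: int) -> bool:
    """Similar when the distance does not exceed the preset threshold."""
    return manhattan_distance(f, g) <= dist_threshold


def video_is_similar(pairs: Sequence[Tuple[Fingerprints, Fingerprints]],
                     dist_threshold: int, ratio_threshold: float = 0.5) -> bool:
    """Complete video: similar when the share of similar segments exceeds ratio_threshold."""
    if not pairs:
        return False
    similar = sum(clip_is_similar(f, g, dist_threshold) for f, g in pairs)
    return similar / len(pairs) > ratio_threshold
```

With 3 of 5 segments similar, `video_is_similar` returns True because 3/5 = 0.6 > 0.5, matching the example in the text.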

S17, the electronic device determines that the target video and the video corresponding to the base fingerprint features have the same content.

As an optional implementation, when the video network request is a video query request, after the video fingerprint features of the target video are extracted in step S12, the method further includes:

identifying video salient features among the video fingerprint features by using the Hamming distance;

and calculating the hash value of the video fingerprint features comprises:

calculating the hash value of the video salient features.

In this alternative embodiment, the video network request is a video query request.

Generally, the opening and closing parts of a film and long still segments either simply repeat within the same film or differ very little between adjacent segments (for example, the screens at the end of a film), so these scene segments lack significant distinctiveness. In addition, in scenes that change rapidly and sharply, faces switch too quickly, which affects recognition accuracy; such rapidly changing segments also lack significant distinctiveness.

When a video query is performed, fingerprints need to be extracted from video segments with significant distinctiveness. Specifically, the Hamming distance can be used to identify video salient features among the video fingerprint features, where the Hamming distance between two adjacent frames is:

$$d(i) = \sum_{k} \big( x_k(i) \oplus x_k(i+1) \big)$$

where x(i) and x(i+1) are the frame fingerprints of two adjacent frames and x_k(i) denotes bit k of x(i).

After the video salient features are identified among the video fingerprint features, hash values need to be calculated only for the salient features rather than for all video fingerprint features, and the query is then performed with the salient features, which reduces invalid queries and improves query efficiency.
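The text does not state how the adjacent-frame Hamming distance is turned into a salient/non-salient decision. One plausible reading, sketched below under that assumption, is a band filter: very small distances indicate still or repeated footage, very large ones indicate rapid, sharp changes, and only transitions in between are kept. The thresholds `low` and `high` and the function names are hypothetical.

```python
from typing import List, Sequence


def adjacent_hamming(fps: Sequence[int]) -> List[int]:
    """d(i) = number of differing bits between frame fingerprints x(i) and x(i+1)."""
    return [bin(a ^ b).count("1") for a, b in zip(fps, fps[1:])]


def salient_frames(fps: Sequence[int], low: int = 1, high: int = 12) -> List[int]:
    """Indices i whose transition to frame i+1 is neither static nor too abrupt.

    d(i) < low suggests a still or repeated segment; d(i) > high suggests a rapid,
    sharp scene change. Both are skipped; the thresholds are illustrative only.
    """
    return [i for i, d in enumerate(adjacent_hamming(fps)) if low <= d <= high]
```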

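As an optional implementation, the method further comprises:

if the video fingerprint features are similar to the base fingerprint features, outputting the positions at which the target video and the video corresponding to the base fingerprint features are similar.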

In this optional embodiment, the video network request is a video query request. After the video fingerprint features are determined to be similar to the base fingerprint features, the positions at which the target video and the video corresponding to the base fingerprint features are similar can be output; for example, the content from the 3rd to the 10th second of the target video is similar to the content from the 5th to the 12th second of the video corresponding to the base fingerprint features.
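Outputting similar positions requires turning matched K-frame windows into time ranges. The sketch below assumes the matches are (target start, base start) frame pairs found via the index, that all fingerprints are at the 24 frames/second reference rate, and that consecutive windows with the same alignment offset belong to one similar span. The function name is hypothetical, and it only compares each window with the most recent span, which keeps it simple but would split interleaved alignments.

```python
from typing import List, Sequence, Tuple

Match = Tuple[int, int]  # (start frame in target video, start frame in base video)


def similar_spans(matches: Sequence[Match], window: int, fps: float = 24.0):
    """Merge matched windows that share an offset into ((target s, e), (base s, e)) seconds."""
    spans: List[Tuple[int, int, int, int]] = []
    for qs, bs in sorted(matches):
        qe, be = qs + window, bs + window
        if spans:
            pqs, pqe, pbs, pbe = spans[-1]
            # Same alignment offset and touching/overlapping windows: extend the span.
            if qs - bs == pqs - pbs and qs <= pqe:
                spans[-1] = (pqs, max(pqe, qe), pbs, max(pbe, be))
                continue
        spans.append((qs, qe, bs, be))
    return [((a / fps, b / fps), (c / fps, d / fps)) for a, b, c, d in spans]
```

For example, two matched 144-frame windows starting at target frames 72 and 96 (base frames 120 and 144) merge into one span covering seconds 3 to 10 of the target video and seconds 5 to 12 of the base video.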

As an optional implementation, the video network request is a video write request, and the method further includes:

if the video fingerprint features are similar to the base fingerprint features, determining whether the target video contains the video corresponding to the base fingerprint features; and

if the target video contains the video corresponding to the base fingerprint features, replacing the video corresponding to the base fingerprint features in the database with the target video.

In this optional embodiment, the video network request is a video write request, and the video to be written, that is, the target video, is typically a complete video. For a complete video, similarity between the video fingerprint features and the base fingerprint features covers two cases. In the first case, the target video completely contains the video corresponding to the base fingerprint features; for example, the target video contains both parts one and two of a program while the video corresponding to the base fingerprint features contains only part one, so the content of the target video is more complete. In the second case, the target video and the video corresponding to the base fingerprint features partially overlap, that is, the overlapping portions are similar.

When the video fingerprint features are similar to the base fingerprint features, if the target video contains the video corresponding to the base fingerprint features, the video corresponding to the base fingerprint features in the database can be directly replaced with the target video.

As an optional implementation, the method further comprises:

if the target video partially overlaps the video corresponding to the base fingerprint features, recording the positions at which the two videos are similar, and storing the target video in the database.

When the video fingerprint features are similar to the base fingerprint features, if the target video partially overlaps the video corresponding to the base fingerprint features, the overlapping content is similar and the rest is not. In this case, the positions at which the two videos are similar need to be recorded, for example that the content from the 3rd to the 10th second of the target video is similar to the content from the 5th to the 12th second of the video corresponding to the base fingerprint features, and the target video also needs to be stored in the database.

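As an optional implementation, the method further comprises:

if no index matching the hash value exists, creating a target index for the target video by using a locality sensitive hashing algorithm; and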

storing the target video in the database.

In this optional implementation, the query of the cache may find no index matching the hash value, which means the target video is an entirely new complete video. In this case, a target index needs to be created for the target video with the locality sensitive hashing algorithm, and the target video is stored in the database to enrich the videos it holds.

Optionally, the data of the target index also needs to be written into the Redis cache and the Hbase cache.

In the method flow described in Fig. 1, when a video network request for a target video is received, the video fingerprint features of the target video are extracted and hashed, the cache is queried with the hash value for a matching index, and the base fingerprint features corresponding to that index are compared with the video fingerprint features using the Manhattan distance. If the features are similar, the target video and the video corresponding to the base fingerprint features are determined to have the same content, so whether two videos have the same content can be identified accurately.

The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.

Referring to fig. 2, fig. 2 is a functional block diagram of a video content recognition apparatus according to a preferred embodiment of the present invention.

In some embodiments, the video content identification apparatus runs in an electronic device. The apparatus may comprise a plurality of functional modules consisting of program code segments. The program code of each segment in the video content identification apparatus may be stored in the memory and executed by at least one processor to perform some or all of the steps of the video content identification method described in fig. 1; for details, refer to the relevant description of fig. 1, which is not repeated here.

In this embodiment, the video content identification apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include a receiving module 201, an extraction module 202, a calculating module 203, a query module 204, an obtaining module 205, a judging module 206, and a determining module 207. A module as referred to herein is a series of computer program segments that is stored in memory and can be executed by at least one processor to perform a fixed function.

A receiving module 201, configured to receive a video network request for a target video;

an extraction module 202, configured to extract video fingerprint features of the target video;

a calculating module 203, configured to calculate a hash value of the video fingerprint feature;

a query module 204, configured to query, according to the hash value, whether an index matching the hash value exists in a cache;

an obtaining module 205, configured to obtain, if an index matching the hash value exists, a basic fingerprint feature corresponding to the index;

a judging module 206, configured to judge, by using the Manhattan distance, whether the video fingerprint feature is similar to the basic fingerprint feature;

a determining module 207, configured to determine that the video corresponding to the target video and the video corresponding to the basic fingerprint feature are videos with the same content if the video fingerprint feature is similar to the basic fingerprint feature.
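The judging module 206 is described only as using the Manhattan distance. A minimal Python sketch, assuming that a video fingerprint is a sequence of equal-length per-frame feature vectors and that similarity means the mean per-frame Manhattan distance does not exceed a threshold, could look as follows. The aggregation rule, the threshold, and the function names are assumptions.

```python
from typing import Sequence


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # L1 distance between two per-frame feature vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))


def is_similar(video_fp: Sequence[Sequence[float]],
               base_fp: Sequence[Sequence[float]],
               threshold: float) -> bool:
    """Hypothetical rule: mean per-frame Manhattan distance must not exceed threshold."""
    if not video_fp or len(video_fp) != len(base_fp):
        return False
    total = sum(manhattan_distance(f, g) for f, g in zip(video_fp, base_fp))
    return total / len(video_fp) <= threshold
```

The Manhattan distance needs no squaring, and a single frame with a large deviation raises it less sharply than the Euclidean distance would, which can make it tolerant of small re-encoding noise spread across many components.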

Optionally, the video network request is a video query request, and the video content identification apparatus further includes:

the distinguishing module is used for using the Hamming distance to distinguish video salient features from the video fingerprint features after the extraction module 202 extracts the video fingerprint features of the target video;

the calculating module 203 is specifically configured to calculate a hash value of the video salient feature.

and the output module is used for outputting the similar position relation between the target video and the video corresponding to the basic fingerprint feature if the video fingerprint feature is similar to the basic fingerprint feature.
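The patent does not define how the Hamming distance separates salient features from the other fingerprint features. One plausible reading, assumed here only for illustration, treats per-frame fingerprints as 64-bit integers and keeps a frame as salient only when it differs from the last kept frame by more than `min_bits` bits, so runs of near-identical frames collapse to one representative. The hash value would then be computed over the returned list. Function names and the threshold are hypothetical.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two integer fingerprints.
    return bin(a ^ b).count("1")


def salient_features(frame_fps: Sequence[int], min_bits: int = 10) -> List[int]:
    """Keep a frame fingerprint only if it differs from the last kept one by > min_bits bits."""
    kept: List[int] = []
    for fp in frame_fps:
        if not kept or hamming(fp, kept[-1]) > min_bits:
            kept.append(fp)
    return kept
```

Hashing only these salient features makes the hash value less sensitive to frame-rate differences, since duplicated or near-duplicate frames are dropped before hashing.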

Optionally, the manner of querying, by the querying module 204, whether an index matching the hash value exists in the cache according to the hash value is specifically:

Optionally, the video network request is a video write request, and the judging module 206 is further configured to judge, if the video fingerprint feature is similar to the basic fingerprint feature, whether the target video includes the video corresponding to the basic fingerprint feature;

the video content recognition apparatus further includes:

and the replacing module is used for replacing the video corresponding to the basic fingerprint feature in the database with the target video if the target video includes the video corresponding to the basic fingerprint feature.
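The containment check and replacement performed for a video write request can be sketched in Python under stated assumptions: both videos are represented as sequences of 64-bit per-frame fingerprints, the target "includes" the basic video when the basic sequence matches some contiguous window of the target frame by frame within `max_bits` differing bits, and a dictionary stands in for the database. `contains` and `replace_if_contained` are hypothetical names.

```python
from typing import Dict, List, Sequence


def contains(target: Sequence[int], base: Sequence[int], max_bits: int = 5) -> bool:
    """True if `base` matches some contiguous window of `target` frame by frame."""
    n, m = len(target), len(base)
    if m == 0 or m > n:
        return False
    for start in range(n - m + 1):
        if all(bin(target[start + k] ^ base[k]).count("1") <= max_bits for k in range(m)):
            return True
    return False


def replace_if_contained(db: Dict[str, List[int]], base_id: str,
                         target_id: str, target: List[int],
                         max_bits: int = 5) -> bool:
    """Replace the basic video with the target video when the target includes it."""
    if not contains(target, db[base_id], max_bits):
        return False  # other cases (e.g. partial overlap) are handled elsewhere
    del db[base_id]
    db[target_id] = target
    return True
```

Keeping the longer video that contains the shorter one avoids storing the same content twice, which matches the storage motivation given in the background section.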

Optionally, the video content recognition apparatus further includes:

the recording module is used for recording the similar position relation of the target video and the video corresponding to the basic fingerprint feature if the target video and the video corresponding to the basic fingerprint feature are partially overlapped;

and the storage module is used for storing the target video into the database.

Optionally, the video content recognition apparatus further includes:

the creating module is used for creating a target index for the target video by using a locality-sensitive hashing algorithm if no index matching the hash value exists;

the storage module is further used for storing the target video into the database.

Optionally, the manner of extracting the video fingerprint feature of the target video by the extraction module 202 is specifically as follows:

extracting frame fingerprint characteristics of the target video;
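The text does not say how frame fingerprint features are computed. As a stand-in technique, not the patent's method, the sketch below computes a 64-bit average hash (aHash) of one grayscale frame: the frame is reduced to an 8x8 grid of block means, and each bit records whether a block is brighter than the overall mean. The frame is assumed to be given as rows of pixel intensities; `average_hash` is a hypothetical name.

```python
from typing import List, Sequence


def average_hash(gray: Sequence[Sequence[float]], size: int = 8) -> int:
    """64-bit average hash of a grayscale frame given as rows of intensities."""
    h, w = len(gray), len(gray[0])
    if h < size or w < size:
        raise ValueError("frame smaller than hash grid")
    cells: List[float] = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            # Mean intensity of one block of the size x size grid.
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        # Row-major order; the first cell becomes the most significant bit.
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits
```

Because each bit compares a block with the frame's own mean, a uniform brightness shift leaves the hash unchanged, and re-encoding at another resolution or bit rate usually flips only a few bits. Sampling such hashes at a fixed rate would yield a per-frame fingerprint sequence of the kind assumed in the other sketches.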

In the video content identification apparatus described in fig. 2, after a video network request for a target video is received, the video fingerprint features of the target video can be extracted, and their similarity to the basic fingerprint features corresponding to the index found in the cache can be judged. Regardless of differences in version, format, bit rate, or frame rate, as long as the features are judged to be similar, the target video and the video corresponding to the basic fingerprint features in the database can be determined to be the same video, so that the contents of the two videos can be accurately identified.

As shown in fig. 3, fig. 3 is a schematic structural diagram of an electronic device implementing a video content recognition method according to a preferred embodiment of the invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.

Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3. The electronic device 3 may include more or fewer components than shown, combine some components, or use different components; for example, the electronic device 3 may further include an input/output device, a network access device, and the like.

The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 32 may be a microprocessor or any conventional processor. The processor 32 is the control center of the electronic device 3 and connects the various parts of the entire electronic device 3 through various interfaces and lines.

The memory 31 may be used to store the computer program 33 and/or the modules/units, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the modules/units stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data) created through use of the electronic device 3, and the like. Further, the memory 31 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement a video content recognition method, and the processor 32 executes the plurality of instructions to implement:

receiving a video network request for a target video;

extracting video fingerprint features of the target video;

calculating a hash value of the video fingerprint feature;

In an alternative embodiment, the video network request is a video query request, and after extracting the video fingerprint feature of the target video, the processor 32 may execute the plurality of instructions to implement:

the calculating the hash value of the video fingerprint feature comprises:

and calculating the hash value of the video salient feature.

In an alternative embodiment, the processor 32 may execute the plurality of instructions to implement:

In an optional embodiment, the querying, according to the hash value, whether there is an index matching the hash value from the cache includes:

In an alternative embodiment, where the video network request is a video write request, the processor 32 may execute the plurality of instructions to:

storing the target video in the database.

In an optional implementation, the extracting the video fingerprint feature of the target video includes:

extracting frame fingerprint characteristics of the target video;

Specifically, the processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.

In the electronic device 3 described in fig. 3, after a video network request for a target video is received, the video fingerprint features of the target video can be extracted, and their similarity to the basic fingerprint features corresponding to the index found in the cache can be judged. Regardless of differences in version, format, bit rate, or frame rate, as long as the features are judged to be similar, the target video and the video corresponding to the basic fingerprint features in the database can be determined to be videos of the same content, so that the contents of the two videos can be accurately identified.

If the integrated modules/units of the electronic device 3 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, and read-only memory (ROM).

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for identifying video content, the method comprising:

receiving a video network request for a target video;

extracting video fingerprint features of the target video;

calculating a hash value of the video fingerprint feature;

2. The method of claim 1, wherein the video network request is a video query request, and wherein after extracting the video fingerprint feature of the target video, the method further comprises:

the calculating the hash value of the video fingerprint feature comprises:

and calculating the hash value of the video salient feature.

3. The method of claim 2, further comprising:

4. The method of claim 2, wherein the querying from the cache whether there is an index matching the hash value according to the hash value comprises:

5. The method of claim 1, wherein the video network request is a video write request, the method further comprising:

6. The method of claim 5, further comprising:

7. The method of claim 5, further comprising:

storing the target video in the database.

8. The method according to any one of claims 1 to 7, wherein the extracting the video fingerprint feature of the target video comprises:

extracting frame fingerprint characteristics of the target video;

9. An apparatus for video content recognition, the apparatus comprising:

10. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the video content identification method according to any one of claims 1 to 8.

11. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements a video content identification method as claimed in any one of claims 1 to 8.