CN112911239B - Video processing method and device, electronic equipment and storage medium


Info

Publication number
CN112911239B
CN112911239B (application CN202110120299.4A)
Authority
CN
China
Prior art keywords
video
target object
target
feature
features
Prior art date
Legal status
Active
Application number
CN202110120299.4A
Other languages
Chinese (zh)
Other versions
CN112911239A (en)
Inventor
宋述铕
侯超
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110120299.4A
Publication of CN112911239A
Priority to PCT/CN2021/129187 (published as WO2022160849A1)
Application granted
Publication of CN112911239B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a video processing method and apparatus, an electronic device, and a storage medium. The method includes: segmenting a received target video into a plurality of video segments, wherein the plurality of video segments contain at least one target object; performing a deduplication operation in parallel on at least N consecutive video frames in the plurality of video segments in which the same target object exists, to obtain a first deduplication result; performing the deduplication operation on at least N consecutive video frames at the junction of adjacent video segments, to obtain a second deduplication result; and merging the first deduplication result and the second deduplication result to obtain a deduplication result for each target object in the target video. The embodiments of the present disclosure can improve deduplication efficiency and achieve accurate deduplication of the target video.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
In many application scenarios there is a need to deduplicate the video frames of a video. For example, in the field of surveillance, the video frames of a surveillance video are deduplicated to save storage space. In the related art, however, deduplication of video frames is inefficient.
Disclosure of Invention
The present disclosure provides a video processing technical solution.
According to an aspect of the present disclosure, there is provided a video processing method including:
segmenting a received target video into a plurality of video segments, wherein the plurality of video segments comprise at least one target object;
performing a deduplication operation in parallel on at least N consecutive video frames in the plurality of video segments in which the same target object exists, to obtain a first deduplication result, where the first deduplication result comprises a set formed by features of the target object obtained after the deduplication operation is performed on the plurality of video segments;
performing the deduplication operation on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, where the second deduplication result comprises a set formed by features of the target object obtained after the deduplication operation is performed on the at least N consecutive video frames at the junction of the adjacent video segments;
and merging the first deduplication result and the second deduplication result to obtain a deduplication result for each target object in the target video.
In one possible implementation, the deduplication operation includes:
acquiring a first video frame in the target video according to the time sequence of the video frames;
detecting whether there are, in the features of the target object contained in the first video frame and a comparison set, first features belonging to the same target object, and detecting whether there are second features by which the target object differs from other target objects, where the comparison set contains the target features of the target objects contained in the N-1 video frames preceding the first video frame;
and in a case that the second feature exists in the comparison set and the second feature is not detected in the N-1 consecutive video frames following the video frame in which the second feature is located, taking the video frame in which the second feature is located as a deduplication result of the first video frame and the preceding N-1 video frames, and removing the second feature from the comparison set.
In a possible implementation manner, the detecting whether there are, in the first video frame and the comparison set, first features belonging to the same target object and second features belonging to different target objects includes:
determining characteristics of at least one target object contained in the first video frame;
respectively determining the similarity between the characteristics of the at least one target object and each target characteristic in the comparison set;
and obtaining a detection result according to the similarity and a preset similarity threshold.
In a possible implementation manner, after detecting whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object and second features by which the target object differs from other target objects, the method further includes:
in a case that the first features are detected, adding the first feature with the higher quality score among the first features belonging to the same target object into the comparison set;
and in a case that second features are detected in the first video frame, adding the second features contained in the first video frame into the comparison set.
In a possible implementation manner, the target feature meets a preset feature condition, where the preset feature condition includes:
the quality score of the target feature is higher than a preset quality score;
the target features are features with the highest quality scores in a plurality of images containing the same target object, the plurality of images are a plurality of images in at least the first N-1 video frames adjacent to the first video frame, and the first video frame is a video frame containing the target object.
In a possible implementation manner, the segmenting the received target video into a plurality of video segments specifically includes:
according to the time sequence of the video frames, marking the video frames in the target video to obtain a marking result;
and determining at least N continuous video frames at the joint of the adjacent video segments according to the time sequence order of the video frames represented by the marking result.
In one possible implementation, the at least N consecutive video frames at the connection of the adjacent video segments include:
in the temporally adjacent video segments, N-1 video frames at the end of the previous video segment and N-1 video frames at the head of the next video segment.
In one possible implementation, before performing the deduplication operation in parallel on at least N consecutive video frames in the plurality of video segments in which the same target object exists, the method further includes:
determining a video frame containing the target object in the target video;
determining a quality score of a feature of a target object contained in a video frame, the quality score being determined based on at least one of:
the definition of a target object in a video frame and the angle between the target object and a lens in the video frame.
According to an aspect of the present disclosure, there is provided a video processing apparatus including:
the segmentation unit is used for segmenting the received target video into a plurality of video segments, wherein the video segments comprise at least one target object;
a first deduplication unit, configured to perform deduplication operations on at least N consecutive video frames in the multiple video segments where the same target object exists in parallel to obtain a first deduplication result, where the first deduplication result includes a set formed by features of target objects obtained after the deduplication operations are performed on the multiple video segments;
a second deduplication unit, configured to perform the deduplication operation on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, where the second deduplication result comprises a set formed by features of the target object obtained after the deduplication operation is performed on the at least N consecutive video frames at the junction of the adjacent video segments;
and a merging unit, configured to merge the first deduplication result and the second deduplication result to obtain a deduplication result for each target object in the target video.
In a possible implementation manner, the deduplication operation is performed by a deduplication subunit, and the deduplication subunit is configured to: obtain a first video frame in the target video according to the time-sequence order of the video frames; detect whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object, and second features by which the target object differs from other target objects, where the comparison set contains the target features of the target objects contained in the N-1 video frames preceding the first video frame; and in a case that the second feature exists in the comparison set and the second feature is not detected in the N-1 consecutive video frames following the video frame in which the second feature is located, take the video frame in which the second feature is located as a deduplication result of the first video frame and the preceding N-1 video frames, and remove the second feature from the comparison set.
In a possible implementation, the deduplication subunit is configured to determine features of at least one target object contained in the first video frame; respectively determine the similarity between the features of the at least one target object and each target feature in the comparison set; and obtain a detection result according to the similarity and a preset similarity threshold.
In a possible implementation manner, the deduplication subunit is configured to, in a case that the first features are detected, add the first feature with the higher quality score among the first features belonging to the same target object into the comparison set; and in a case that second features are detected in the first video frame, add the second features contained in the first video frame into the comparison set.
In a possible implementation manner, the target feature meets a preset feature condition, where the preset feature condition includes:
the quality score of the target feature is higher than a preset quality score;
the target features are features with the highest quality scores in a plurality of images containing the same target object, the plurality of images are a plurality of images in at least the first N-1 video frames adjacent to the first video frame, and the first video frame is a video frame containing the target object.
In a possible implementation manner, the segmentation unit is configured to label video frames in the target video according to a time sequence order of the video frames to obtain a labeling result; and determining at least N continuous video frames at the joint of the adjacent video segments according to the time sequence order of the video frames represented by the marking result.
In one possible implementation, the at least N consecutive video frames at the connection of the adjacent video segments includes:
in the temporally adjacent video segments, N-1 video frames at the end of the previous video segment and N-1 video frames at the head of the next video segment.
In one possible implementation, the apparatus further includes:
a video frame determining unit, configured to determine a video frame containing the target object in the target video;
a quality score determination unit for determining a quality score of a feature of a target object contained in a video frame, the quality score being determined according to at least one of the following information:
the definition of a target object in a video frame and the angle between the target object and a lens in the video frame.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiments of the present disclosure, after a deduplication request for a target video is received, a deduplication operation is performed in parallel on at least N consecutive video frames in which the same target object exists in each video segment of the target video to obtain a first deduplication result, and the deduplication operation is performed on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, so that deduplication of the entire target video is achieved. Performing the deduplication operation on the video segments of the target video in parallel improves the efficiency of deduplicating video frames; in addition, because video frames that need to be deduplicated may also exist at the junctions where adjacent video segments meet, performing deduplication at the junctions of adjacent video segments achieves accurate deduplication of the entire target video while improving deduplication efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a video processing method according to an embodiment of the present disclosure.
Fig. 2 shows an application scenario diagram of a video processing method according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Fig. 5 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.
In the related art, the efficiency of deduplicating the video frames in a video is low. The embodiments of the present disclosure provide a video processing method in which, after a deduplication request for a target video is received, a deduplication operation is performed in parallel on at least N consecutive video frames in which the same target object exists in each video segment of the target video to obtain a first deduplication result, and the deduplication operation is performed on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, thereby achieving deduplication of the whole target video. Performing the deduplication operation on the video segments of the target video in parallel improves the efficiency of deduplicating video frames; in addition, because video frames that need to be deduplicated may also exist at the junctions where adjacent video segments meet, performing deduplication at the junctions of adjacent video segments achieves accurate deduplication of the whole target video while improving deduplication efficiency.
In one possible implementation, the video processing method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer readable instruction stored in a memory. Alternatively, the method may be performed by a server.
Fig. 1 shows a flow chart of a video processing method according to an embodiment of the present disclosure, as shown in fig. 1, the video processing method includes:
in step S11, the received target video is segmented into a plurality of video segments, wherein the plurality of video segments include at least one target object.
The video segments are segments of the target video, and the target video can be reconstructed by combining the video segments. In one possible implementation, after the target video is received, it may be segmented into a plurality of video segments. That is, the video segments may be obtained by temporally slicing the target video; for example, a target video containing 1000 video frames may be temporally divided into 5 video segments, each containing 200 video frames.
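As an illustrative sketch only (the function name and the slicing policy are assumptions, not specified by the patent), the temporal slicing above might look like:

```python
# Minimal sketch of temporal segmentation, assuming the frames are already
# decoded into a list; names here are illustrative, not from the patent.

def split_into_segments(frames, num_segments):
    """Split frames into temporally contiguous, (near-)equal segments."""
    segment_len = -(-len(frames) // num_segments)  # ceiling division
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]

# The example above: 1000 frames divided into 5 segments of 200 frames each.
segments = split_into_segments(list(range(1000)), 5)
assert [len(s) for s in segments] == [200] * 5
```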
In the embodiment of the present disclosure, at least N consecutive video frames in the target video where the same target object exists are deduplicated, where N is an integer greater than 1. In the embodiment of the present disclosure, the value of N may be set by a user, if the value of N is 2, the deduplication will be performed only if the same target object is included between two adjacent frames, and if the value of N is larger, the range of the video frame involved in the deduplication operation will be larger.
The target video may be any video, for example a surveillance video of a certain location; the present disclosure does not limit the specific type of the target video. The target video may contain M video frames, where M is an integer greater than N.
The target object may be at least one of a person, a vehicle, and a non-motor vehicle, and different target objects will be exemplarily described later in connection with possible implementations of the present disclosure, and will not be described herein.
The same target object here may be the same thing, for example, may be the same person, or the same vehicle, and so on.
In the embodiment of the present disclosure, deduplicating at least N consecutive video frames in which the same target object exists specifically means: for at least 2 video frames containing the same target object within the at least N consecutive video frames, only one video frame, or only the feature of the target object in one video frame, is retained; if the N-1 consecutive video frames before and after a video frame containing a certain object contain no other video frame of that object, then no video frame repeatedly containing that object exists within the N consecutive frames, which indicates that those N video frames do not need to be deduplicated.
For example, the target video comprises 1000 video frames in total and N takes the value 6, i.e., the deduplication operation is performed on at least 6 consecutive video frames in which the same target object exists. If the repetition interval of the same object is less than 5 consecutive frames, only 1 video frame of the object is retained and the other video frames containing the object within the 6 frames are removed (in this example, two frames containing the object fall within the 6 consecutive frames, so the repetition interval is less than 5 consecutive frames); if the 5 consecutive frames before and after a video frame containing a certain object contain no video frame of that object, no deduplication is needed (in this example the judgment still covers 6 consecutive frames: the video frame containing the object occupies 1 frame, and the 5 consecutive frames before and after it are examined).
In step S12, a deduplication operation is performed on at least N consecutive video frames in the plurality of video segments in parallel, where the same target object exists, to obtain a first deduplication result.
The first deduplication result comprises a set formed by features of the target object obtained after the deduplication operation is performed on the plurality of video segments.
The deduplication operations on multiple video segments may be executed in parallel, where parallel execution means performing the deduplication operations on the video segments simultaneously; for example, the deduplication operations may be performed by multiple parallel threads, or simultaneously by multiple devices.
For example, for the 5 video segments obtained in the foregoing example, the deduplication operations may be performed on all 5 segments simultaneously by 5 threads, with each thread performing the deduplication operation on 1 of the video segments.
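A minimal sketch of this parallel execution, assuming one thread per segment (dedup_segment is a placeholder name standing in for the deduplication operation detailed later, not the patent's API):

```python
# Illustrative sketch: run the per-segment deduplication with one thread
# per segment and collect the first deduplication result.

from concurrent.futures import ThreadPoolExecutor

def dedup_segment(segment):
    """Placeholder: returns the retained target-object features of one segment."""
    return []

def parallel_dedup(segments):
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        per_segment = list(pool.map(dedup_segment, segments))
    # The first deduplication result gathers the features from all segments.
    return [feat for feats in per_segment for feat in feats]
```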
For example, video frames containing the target object may first be obtained through target object detection; the similarity between features of the target objects in those video frames is then determined; which video frames contain features of the same target object is determined using the similarity and a preset similarity threshold; and the video frames that need to be deduplicated are then determined according to the number of video frames separating the video frames containing the same target object.
In a case where the feature of the target object is a face feature, the similarity threshold may be 0.96; two face features with a similarity higher than 0.96 then correspond to the same person.
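A hedged sketch of such a comparison: the 0.96 threshold comes from the text above, but the use of cosine similarity is an assumption; the patent does not name a specific similarity measure.

```python
import numpy as np

def same_person(feat_a, feat_b, threshold=0.96):
    """True if two face feature vectors likely belong to the same person."""
    cos_sim = float(np.dot(feat_a, feat_b) /
                    (np.linalg.norm(feat_a) * np.linalg.norm(feat_b)))
    return cos_sim > threshold
```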
The deduplication operation provided by the embodiments of the present disclosure may take various forms; for details, reference may be made to one or more implementation manners provided by the present disclosure, which are not described herein again.
The deduplication result obtained by the deduplication operation may be a video frame of the target video, where the video frame may include the picture content of the image, the time information corresponding to the video frame, and possibly other information, which is not described herein again. In addition, in some optional implementations, the deduplication result may be a cropped image obtained by cropping the video frame around the target object, where the cropped image contains the image of the target object and removes, as far as possible, the image areas of the video frame unrelated to the target object. In some alternative implementations, the deduplication result may also be a feature of a target object in the video frame; in a computer, the feature of the target object may be represented as a numeric matrix for storage and processing.
In step S13, the deduplication operation is performed on at least N consecutive video frames at the connection of adjacent video segments, so as to obtain a second deduplication result.
The second deduplication result comprises a set formed by the features of the target object obtained by performing the deduplication operation on the at least N consecutive video frames at the junction of the adjacent video segments.
The deduplication operations have already been performed in parallel on each video segment as described above. For adjacent video segments, since the video frames at the tail of the preceding segment and the video frames at the head of the following segment are consecutive in the target video, the deduplication operation is also performed on at least N consecutive video frames at the junction of the adjacent video segments.
The at least N consecutive video frames at the junction of adjacent video segments specifically include video frames at the tail of the preceding video segment and video frames at the head of the following video segment; for example, they may be the N-1 video frames at the tail of the preceding segment and the N-1 video frames at the head of the following segment in temporally adjacent video segments. Because the last frame of the preceding segment and the N-1 head frames of the following segment form N consecutive video frames, the at least N consecutive video frames at the junction include the N-1 head frames of the following segment; likewise, because the first frame of the following segment and the N-1 tail frames of the preceding segment form N consecutive video frames, they also include the N-1 tail frames of the preceding segment.
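A short sketch of collecting these junction windows (the function name is illustrative, under the assumption that segments are lists of frames):

```python
# For each pair of temporally adjacent segments, take the last N-1 frames of
# the earlier segment plus the first N-1 frames of the later one (2N-2 total).

def junction_windows(segments, n):
    return [prev_seg[-(n - 1):] + next_seg[:n - 1]
            for prev_seg, next_seg in zip(segments, segments[1:])]

# With 5 segments and N = 3, this yields 4 windows of 4 frames each.
```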
The deduplication operation here may be the same as the deduplication operation in step S12, so as to maintain consistent deduplication standards for the target video, resulting in accurate deduplication results. For specific deduplication operations, reference may be made to possible implementation manners hereinafter, and details are not described here.
In step S14, the first deduplication result and the second deduplication result are merged to obtain a deduplication result of each target object of the target video.
Here, merging the first deduplication result and the second deduplication result may mean, for example, that when both results are images, the images contained in the two results are combined, and the combined images are taken as the deduplication result of the target video.
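A minimal sketch of the merge step, under the assumption that each deduplication result maps a target-object identity to its retained feature; keeping the higher-quality feature when both results cover the same object matches the quality-score rule described later, but the exact policy is not specified by the patent.

```python
def merge_results(first_result, second_result, quality):
    """Merge two dedup results (dicts of object id -> feature)."""
    merged = dict(first_result)
    for obj_id, feat in second_result.items():
        if obj_id not in merged or quality(feat) > quality(merged[obj_id]):
            merged[obj_id] = feat  # keep the higher-quality feature
    return merged
```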
In the embodiments of the present disclosure, after a deduplication request for a target video is received, a deduplication operation is performed in parallel on at least N consecutive video frames in which the same target object exists in each video segment of the target video to obtain a first deduplication result, and the deduplication operation is performed on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, so that deduplication of the entire target video is achieved. Performing the deduplication operation on the video segments of the target video in parallel improves the efficiency of deduplicating video frames; in addition, because video frames that need to be deduplicated may also exist at the junctions where adjacent video segments meet, performing deduplication at the junctions of adjacent video segments achieves accurate deduplication of the entire target video while improving deduplication efficiency.
The video processing methods provided by the present disclosure may take various forms. In one possible implementation, the deduplication operation includes: acquiring a first video frame in the target video according to the time-sequence order of the video frames; detecting whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object, and second features by which the target object differs from other target objects, where the comparison set contains the target features of the target objects contained in the N-1 video frames preceding the first video frame; and, in a case that the second feature exists in the comparison set and is not detected in the N-1 consecutive video frames following the video frame in which it is located, taking the video frame in which the second feature is located as a deduplication result of the first video frame and the preceding N-1 video frames, and removing the second feature from the comparison set.
The time-sequence order of the video frames here is the order of the video frames in time, which may be from the first frame to the last frame, or from the last frame to the first frame. First video frames in the target video are acquired sequentially in this order, and the following operations are then performed for any first video frame:
The comparison set contains the target features of the target objects contained in the N-1 video frames preceding the first video frame; adding the first video frame itself yields N consecutive video frames, so the deduplication operation over at least N consecutive video frames in which the same target object exists can be implemented. That is, it is detected whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object, and second features by which the target object differs from other target objects.
The first video frame may include one or more features of the target object, and the comparison set includes the target features of the target objects included in the first N-1 video frames adjacent to the first video frame, so that the features of the target objects included in the first video frame may be compared with the features in the comparison set one by one to detect the first feature and the second feature.
For example, if the first video frame is the 1st frame in the detection order and there is no video frame before the 1st frame, the comparison set may contain no target feature of any target object; if the first video frame is the 2nd frame in the detection order, the comparison set may contain the target features of the target objects in the 1st frame; if the first video frame is the 3rd frame in the detection order, then in the case of N=2 the comparison set may contain the target features of the target objects in the 2nd frame, and in the case of N=3 it may contain the target features of the target objects in the 1st and 2nd frames; and so on.
In a possible implementation manner, the target features contained in the comparison set are pairwise distinct; that is, for each target object in the N-1 video frames preceding the first video frame, only one corresponding feature of that target object is stored in the comparison set. For example, if the first video frame is the 3rd frame in the detection order, the 2nd frame contains target objects A and B, and the 1st frame contains target objects A, C, and D, then the comparison set corresponding to the 3rd frame contains one feature for each of the target objects A, B, C, and D, i.e., 4 features in total.
First features belonging to the same target object may be, for example, two face features of the same person. For example, if the first video frame contains features of target objects D and E, and the comparison set contains the features of target objects A, B, C, and D, it can be determined from the detection result that 2 features belonging to D (first features) exist.
A second feature, by which a target object differs from other target objects, may be a feature for which no duplicate is detected; for example, in the above example, the features of the target objects A, B, C, and E may be determined to be second features.
After the above detection, it can be determined which features in the first video frame and the preceding N-1 video frames are first features belonging to the same target object and which are second features belonging to different target objects, and the first video frame and the preceding N-1 video frames can then be deduplicated according to the detection result; for example, for first features belonging to the same target object, only one first feature may be retained while the others are discarded.
In the embodiment of the present disclosure, a first video frame in the target video is acquired in time-sequence order and then compared, via the comparison set, with the target features of the target objects contained in the N-1 video frames preceding it, so as to detect first features belonging to the same target object and second features by which a target object differs from other target objects, thereby deduplicating the first video frame and the preceding N-1 video frames. This process detects the first video frames of the target video one by one in time-sequence order: each first video frame is compared with the preceding N-1 video frames, and in subsequent detection the features of the target objects in the current first video frame are, through the comparison set, compared against the following N-1 video frames. When the number of video frames is large, each first video frame therefore only needs to be compared with its adjacent N-1 video frames rather than with video frames beyond them, so the deduplication operation is more efficient. In addition, the target features in the comparison set may be pairwise distinct, so repeated features in the preceding N-1 video frames do not need to be compared, which reduces the number of comparisons and further improves the efficiency of the deduplication operation.
As described above, a second feature is a feature by which a target object differs from other target objects. Since the video frames are detected in time-sequence order, if the second feature is not detected in the N-1 consecutive video frames following the video frame in which it is located, the second feature is not repeated within those N consecutive video frames; obviously those video frames do not need to be deduplicated, and the video frame corresponding to the second feature can be used as a post-deduplication result.
When the above detection is performed on the video frames in time-sequence order, the number of consecutive times a second feature goes undetected after being detected may be counted; if the count reaches N-1, the second feature has not been detected in the N-1 consecutive video frames following the video frame in which it is located. For example, if N is 3, consider video frames 1, 2, 3, 4, 5, 6, and 7: after the feature of target object A is detected in frame 1 and then not detected in frames 2 and 3, the count reaches 2, and the feature of target object A in frame 1 can be used as the deduplication result; if target object A is detected again in frame 5 and its feature is not detected in frame 6, the miss count restarts at 1, and so on for subsequent frames.
The comparison set comprises target characteristics of target objects contained in the first N-1 video frames adjacent to the first video frame, so that after the detection of the current first video frame is finished, the comparison set can be updated, the next first video frame can be detected by using the comparison set, and the accuracy of the duplicate removal result is improved.
The comparison set is updated according to the detection result; the updating process is described in detail below with reference to several possible implementation manners provided by the present disclosure.
In a possible implementation manner, when the second feature is detected to exist in the comparison set and the second feature is not detected in N-1 consecutive video frames after the video frame where the second feature exists, the second feature is removed from the comparison set.
As described above, when the second feature is detected to exist in the comparison set and the second feature is not detected in N-1 consecutive video frames after the video frame where the second feature exists, the second feature is removed from the comparison set since the second feature is already used as the deduplication result.
In a possible implementation manner, after detecting whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object and second features by which the target object differs from other target objects, the method further includes: in a case that the first features are detected, adding the first feature with the higher quality score among the first features belonging to the same target object into the comparison set; and in a case that second features are detected in the first video frame, adding the second features contained in the first video frame into the comparison set.
When a first feature is detected, a "repeated" video frame containing the same target object has been detected within the N consecutive video frames, so deduplication may be applied: for example, only one of the video frames may be retained, or only one of the first features belonging to the same target object may be retained. Of course, when only one first feature is retained, that feature has already appeared twice within the current N video frames, so it clearly does not yet satisfy the condition of appearing only once within at least N video frames; that is, it may appear again in the next first video frame. Therefore, the retained first feature may be added to the comparison set so that the next first video frame can be detected using the comparison set.
In a possible implementation manner, the first feature with the higher quality score among the first features belonging to the same target object may be added to the comparison set. The quality score represents the quality of the first feature: the higher the quality score, the better the first feature, e.g., the higher the resolution of the corresponding image. Adding the first feature with the higher quality score to the comparison set and discarding the one with the lower quality score improves the quality of the deduplication result.
When second features are detected in the first video frame, those second features did not appear in the N-1 video frames preceding the first video frame, so whether they appear in the following N-1 video frames must still be judged; therefore, the second features contained in the first video frame may be added to the comparison set so that they can be detected in the N-1 video frames following the first video frame.
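Putting the above rules together, the following is a condensed, illustrative sketch of the deduplication operation; the match predicate, the quality function, and the entry layout are assumptions for illustration, not the patent's implementation.

```python
# Condensed sketch of the deduplication operation described above.
# Assumptions: match(a, b) decides whether two features belong to the same
# target object (e.g., similarity above a threshold), and quality(f) returns
# a feature's quality score. Each comparison-set entry is [feature, misses].

def deduplicate(frames_features, n, match, quality):
    comparison_set = []                        # target features under observation
    results = []                               # emitted deduplication results
    for frame_feats in frames_features:        # frames in time-sequence order
        matched = set()
        for feat in frame_feats:               # features of objects in this frame
            for idx, entry in enumerate(comparison_set):
                if match(feat, entry[0]):      # first feature: same target object
                    if quality(feat) > quality(entry[0]):
                        entry[0] = feat        # keep the higher-quality duplicate
                    entry[1] = 0               # reset the miss counter
                    matched.add(idx)
                    break
            else:                              # second feature: new to the set
                comparison_set.append([feat, 0])
                matched.add(len(comparison_set) - 1)
        survivors = []
        for idx, entry in enumerate(comparison_set):
            if idx not in matched:
                entry[1] += 1                  # one more consecutive miss
            if entry[1] >= n - 1:              # unseen for N-1 consecutive frames
                results.append(entry[0])       # emit as a deduplication result
            else:
                survivors.append(entry)
        comparison_set = survivors
    results.extend(entry[0] for entry in comparison_set)  # flush remaining
    return results
```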
In a possible implementation manner, the target feature meets a preset feature condition, where the preset feature condition includes: the quality score of the target feature is higher than a preset quality score; the target features are features with the highest quality scores in a plurality of images containing the same target object, the plurality of images are a plurality of images in at least the first N-1 video frames adjacent to the first video frame, and the first video frame is a video frame containing the target object.
In this implementation, a second feature in the comparison set is used as a deduplication result if it goes undetected N-1 consecutive times; therefore, to improve the quality of the deduplication result, the target features in the comparison set may be features whose quality scores are higher than a preset quality score. The higher the quality score of a feature, the better its quality; the lower the quality score, the worse its quality. Requiring the quality scores of the target features in the comparison set to be higher than the preset quality score can thus improve the quality of the deduplication result.
In some implementations of the present disclosure, since the first feature with the higher quality score among the first features belonging to the same target object may be added to the comparison set, a target feature in the comparison set is the feature with the highest quality score among a plurality of images containing the same target object, where the plurality of images are images in at least the N-1 video frames preceding the first video frame. Therefore, the features remaining in the comparison set are those with the highest quality scores among the features belonging to the same target object: for N consecutive video frames of the target video in which the same target object exists, the feature of the target object with the highest quality score is retained, which improves the quality of the deduplication result.
In an optional implementation manner, the detecting whether there are first features belonging to the same target object and second features not belonging to the same target object in the first video frame and the comparison set includes: determining characteristics of at least one target object contained in the first video frame; respectively determining the similarity between the characteristics of the at least one target object and each target characteristic in the comparison set; and obtaining a detection result according to the similarity and a preset similarity threshold.
For example, if the target object is a person and the feature of the target object is a face feature, the face included in the first video frame may be determined by performing face detection on the first video frame, and a specific implementation manner of the face detection may refer to related technologies, which is not described herein again.
After the target object is detected, the features of the target object can be extracted, and if the first video frame contains the features of a plurality of target objects, the features of the target object can be extracted respectively. After the features of the target object are extracted, the features of the target object can be compared with the target features in the comparison set, and the similarity between the features of the target object and each target feature in the comparison set is determined.
For example, if the first video frame includes features d and e of the target object and the comparison set includes target features a, b, c, and d, the similarity between the feature d in the first video frame and the target features a, b, c, and d in the comparison set can be calculated, and then the similarity between the feature e in the first video frame and the target features a, b, c, and d in the comparison set can be calculated.
And comparing the calculated similarity with a preset similarity threshold value to obtain a detection result. The similarity threshold is a preset value used for measuring whether two features belong to the same target object, if the similarity of the two features is higher than the similarity threshold, the two features are indicated to belong to the same target object, and if the similarity of the two features is not higher than the similarity threshold, the two features are indicated not to belong to the same target object.
In the embodiment of the present disclosure, the detection result is obtained by determining the similarity between the features of at least one target object in the first video frame and each target feature in the comparison set. Because the detection process is performed in time-sequence order, the target features in the comparison set have already been screened against one another; therefore only the first video frame needs to be compared with the target features in the comparison set, and the similarities among the target features themselves do not need to be computed, which improves the efficiency of the deduplication operation.
In some implementations of the present disclosure, video frames in a target video may be marked in a time sequence order of the video frames to obtain a marking result; and determining at least N continuous video frames at the joints of the adjacent video segments according to the time sequence order of the video frames represented by the marking result.
The video frames in the target video are in time sequence order, so the video frames in the target video can be marked with a flag in the time sequence order, and the flag can be used for indicating the time sequence order of the video frames in the target video.
Specifically, each video segment of the target video may be numbered, and each video frame in the video segment may be numbered; the number of a video frame includes the number of the video segment to which it belongs and the time-sequence number of the video frame within that segment. For example, for a target video containing 1000 video frames divided temporally into 5 video segments of 200 video frames each, the video segments are numbered 1, 2, 3, 4, 5; the frames of the first segment are numbered 1-001, 1-002, 1-003, ..., 1-200; the second segment 2-001, 2-002, 2-003, ..., 2-200; the third segment 3-001, 3-002, 3-003, ..., 3-200; the fourth segment 4-001, 4-002, 4-003, ..., 4-200; and the fifth segment 5-001, 5-002, 5-003, ..., 5-200.
At least N consecutive video frames at the junction of adjacent video segments are determined according to the time-sequence order of the video frames indicated by the marking result. Specifically, the N-1 video frames at the tail of the preceding video segment and the N-1 video frames at the head of the following video segment may be determined as the at least N consecutive video frames at the junction of the adjacent video segments, for a total of 2N-2 video frames.
For example, in the above example, if N takes the value 3, each junction of adjacent video segments yields 2N-2 = 4 consecutive video frames, and the finally obtained sets of video frames are {1-199, 1-200, 2-001, 2-002}, {2-199, 2-200, 3-001, 3-002}, {3-199, 3-200, 4-001, 4-002}, and {4-199, 4-200, 5-001, 5-002}.
Furthermore, the video frames in the target video may instead be numbered globally from front to back; for example, for a target video containing 1000 video frames, the frames may be numbered 0001, 0002, 0003, 0004, ..., 1000. If N takes the value 3, the resulting sets of video frames at the junctions of adjacent video segments are {0199, 0200, 0201, 0202}, {0399, 0400, 0401, 0402}, {0599, 0600, 0601, 0602}, and {0799, 0800, 0801, 0802}.
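As a quick illustrative check of the global-numbering example (the snippet below is an assumption-laden aid, not part of the patent):

```python
# 1000 globally numbered frames, 5 segments of 200, N = 3: each junction
# window is the 2 tail frames of one segment plus the 2 head frames of the next.

n, seg_len, total = 3, 200, 1000
junctions = [[f"{f:04d}" for f in range(b - n + 2, b + n)]
             for b in range(seg_len, total, seg_len)]
assert junctions[0] == ["0199", "0200", "0201", "0202"]
assert junctions[-1] == ["0799", "0800", "0801", "0802"]
```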
In the embodiment of the present disclosure, by marking the video frames in the target video, at least N consecutive video frames at the connection of adjacent video segments can be accurately determined according to the time sequence order of the video frames indicated by the marking result, so that an accurate deduplication result can still be obtained under the condition that deduplication operations are performed on video segments in parallel.
In one possible implementation, after receiving a deduplication request for a target video, the method further includes: determining a video frame containing the target object in the target video; determining a quality score of a feature of a target object contained in a video frame, the quality score being determined based on at least one of: the definition of a target object in a video frame and the angle between the target object and a lens in the video frame.
In this implementation, considering that not every frame of the target video necessarily contains the target object, the video frames of the target video containing the target object may first be identified, for example by a target detection method such as a neural network. For example, if the target object is a person, whether a video frame contains the person may be detected by face detection or human-body detection; for the specific implementation of face detection, reference may be made to the related art, which is not described herein again.
After the video frames containing the target object are determined, the quality score corresponding to the target object in each such video frame can be determined. The quality score may be determined according to the sharpness of the target object in the video frame, the sharpness being proportional to the quality score of the target object's feature: the higher the sharpness, the higher the quality score. The quality score may also be determined according to the angle between the target object and the camera lens: the more directly the target object faces the lens, the higher the quality score of its feature.
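A hedged sketch of a quality score combining the two cues named above; the Laplacian-variance sharpness measure, the normalization constant, and the equal weights are assumptions for illustration, not specified by the patent.

```python
import cv2

def quality_score(crop_bgr, yaw_degrees, w_sharp=0.5, w_angle=0.5):
    """Score a target-object crop by sharpness and how directly it faces the lens."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    sharp_term = min(sharpness / 1000.0, 1.0)             # clamp to [0, 1]
    angle_term = max(0.0, 1.0 - abs(yaw_degrees) / 90.0)  # frontal scores highest
    return w_sharp * sharp_term + w_angle * angle_term
```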
In the embodiment of the disclosure, the quality score of the feature of the target object in the video frame is determined, so that the subsequent deduplication based on the quality score of the feature of the target object is facilitated, and the quality of a deduplication result is favorably improved.
In a possible implementation manner, after the video frames of the target video containing the target object are determined, the video frames not containing the target object may further be deleted from the target video, yielding a target video from which such frames have been removed. Acquiring the first video frame in the target video according to the time-sequence order of the video frames may then specifically include: acquiring, in time-sequence order, a first video frame in the target video from which the video frames not containing the target object have been deleted, and then detecting whether there are, in the features of the target object contained in the first video frame and the comparison set, first features belonging to the same target object and second features by which the target object differs from other target objects, so as to obtain a detection result.
In the case that video frames not containing the target object have been deleted from the target video, it may be judged, according to the numbers indicating the time-sequence order of the video frames, whether a detected second feature satisfies the condition of not being detected in the N-1 consecutive video frames following the video frame in which it is located, and the deduplication result is determined accordingly.
In the embodiment of the present disclosure, before the deduplication operation is performed, the video frames of the target video containing the target object may be determined, the video frames not containing the target object may then be deleted from the target video, and the deduplication operation may be performed on the target video from which those frames have been deleted, which improves the efficiency of the deduplication operation.
In one possible implementation, the method further includes: storing the duplicate removal result; the deduplication result comprises: characteristic information of the target object after the duplication removal; the time information of the appearance of the target object after the duplication removal; and the position information of the target object after the duplication removal.
In the embodiment of the disclosure, the duplicate removal result is stored without storing the target video, so that the storage space is saved.
The stored result may include the feature information of the de-duplicated target object, that is, the feature information of the target object obtained after deduplication by one or more of the possible implementations of the present disclosure.
The stored result may further include the time information of the appearance of the de-duplicated target object, for which the shooting time of the video frame in which the feature information of the target object is located may be used. For example, when the target video is a surveillance video, the shooting time of each video frame is usually recorded, often in the form of a watermark on the surveillance picture, so that the source of the picture can be traced; the shooting time recorded in that watermark can therefore be used as the time information of the appearance of the target object.
The stored result may further include the location information of the appearance of the de-duplicated target object, where the location information may be the preset geographical location at which the target video was shot, or an image from any frame of the target video that records the scene in which the target object appears.
In the embodiment of the disclosure, the deduplication result is stored, and the stored result includes the de-duplicated features of the target object together with the time and place information of the target object, so that the target object in the target video can be tracked and traced later. The time and place of the target object can therefore be traced without storing the entire target video, which saves storage space.
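For illustration, a stored record consistent with this description might look like the sketch below; the field names and the JSON serialization are assumptions, not a storage format prescribed by the disclosure.

```python
from dataclasses import dataclass, asdict
from typing import List
import json

@dataclass
class DedupRecord:
    feature: List[float]  # de-duplicated feature vector of the target object
    time: str             # e.g. the capture time read from the frame watermark
    place: str            # preset shooting location, or a scene-image path

def store_records(records: List[DedupRecord], path: str) -> None:
    """Persist only the de-duplicated records, not the whole target video."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in records], f, ensure_ascii=False)
```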
In one possible implementation, the target object includes at least one of a person, a vehicle, and a non-motor vehicle; in the case that the target object comprises a person, the features of the target object comprise at least one of human face features and human body features; in the case where the target object includes a vehicle, the characteristic of the target object includes at least one of a vehicle type characteristic and a vehicle color characteristic.
In the security industry, information about people, vehicles, non-motor vehicles and the like is often tracked through video surveillance; in this setting, the target video may be a surveillance video shot in real time, or a surveillance video stored offline.
Because the security industry needs to store a large amount of video, de-duplicating the videos with the video processing method provided by the embodiment of the present disclosure and storing the deduplication results removes low-quality pictures and pictures of the same target object, thereby saving storage and improving data-query efficiency.
In the following, the video processing method provided by the present disclosure is exemplarily described with the target object being a person, the feature of the target object being a face, and the target video being a surveillance video of 1000 frames in total. For content not described in detail in this section, refer to the foregoing related description; likewise, the content of this section may also serve to illustrate the foregoing.
In this implementation, the value of N is set to 2.
In a possible application scenario provided by the present disclosure, the video processing method provided by the present disclosure includes:
step 201, a target video is divided into a plurality of video segments.
For a target video of 1000 video frames, the target video may be temporally divided into 5 video segments, each video segment containing 200 video frames.
Step 202, performing frame extraction on each video segment to obtain video frames, and numbering the video frames.
Specifically, the five video segments are numbered 1, 2, 3, 4 and 5. The frames of the first video segment are numbered 1-001, 1-002, 1-003, …, 1-200; those of the second 2-001, 2-002, 2-003, …, 2-200; the third 3-001, 3-002, 3-003, …, 3-200; the fourth 4-001, 4-002, 4-003, …, 4-200; and the fifth 5-001, 5-002, 5-003, …, 5-200.
The video frames of the 5 video segments may each be stored in a sub-directory.
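Steps 201 and 202 might be sketched as follows for the 1000-frame example; returning a dictionary keyed by the "<segment>-<index>" labels is an illustrative choice, not part of the disclosure.

```python
def split_and_number(frames, num_segments=5):
    """Split the frames into equal segments and label each frame
    "<segment>-<index>", e.g. "1-001" through "5-200" for 1000 frames."""
    seg_len = len(frames) // num_segments            # 1000 // 5 = 200
    numbered = {}
    for s in range(num_segments):
        segment = frames[s * seg_len:(s + 1) * seg_len]
        for i, frame in enumerate(segment, start=1):
            numbered[f"{s + 1}-{i:03d}"] = frame
    return numbered
```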
Step 203, performing face detection on the video frame to obtain a first video frame containing face features.
Specifically, face detection can be performed on the video frames through a neural network, so that the video frames containing faces are obtained.
Step 204, calculating the quality scores of the face features in the video frames containing faces, and determining the first video frames to which the face features with quality scores greater than the quality score threshold belong.
The quality score of a face feature is determined according to the definition of the face and the angle between the face and the lens. The definition of the face is in direct proportion to the quality score of the face feature: the higher the definition, the higher the quality score. Likewise, the quality score of the face feature may be determined according to the angle between the face and the lens: the more frontal the face is to the lens, the higher the quality score of the face feature.
For the face features, the quality score threshold may be, for example, 0.80; video frames whose face features score below this threshold may be discarded directly.
Step 205, performing the deduplication operation on each video segment in parallel: comparing the first video frame with the target features in the comparison set, and obtaining the first deduplication result according to the comparison result.
Fig. 2 shows an exemplary diagram of the deduplication operation provided by the present disclosure, covering the first 4 frames of the target video; the faces are marked with reference numerals in order to distinguish the faces in each frame of image.
When the 1st frame image is detected, the comparison set is empty, so the 4 faces in the 1st frame image are all second features, and the 4 face features of the 1st frame image are added to the comparison set;
when the 2nd frame image is detected, the comparison set contains the 4 face features of the 1st frame image. The detection result is that face features (1), (2) and (4) are first features and face feature (3) is a second feature; since face feature (3) satisfies the condition of not being detected in the next 1 frame, face feature (3) is taken as a deduplication result. In addition, the comparison set is updated: face feature (3) is removed, and face feature (5) of the 2nd frame is added;
when the 3rd frame image is detected, the 3rd frame image is compared with the updated comparison set, which contains face features (1), (2), (4) and (5). The detection result is that face features (1), (4), (5), (6), (7) and (8) are second features and face feature (2) is a first feature; since second features (1), (4) and (5) satisfy the condition of not being detected in the next 1 frame, face features (1), (4) and (5) are taken as deduplication results. In addition, the comparison set is updated: face features (1), (4) and (5) are removed, and second features (6), (7) and (8) are added;
when the 4th frame image is detected, the 4th frame image is compared with the updated comparison set, which contains face features (2), (6), (7) and (8). The detection result is that face features (2), (6), (7), (8) and (9) are second features and no first feature exists; since second features (2), (6), (7) and (8) satisfy the condition of not being detected in the next 1 frame, face features (2), (6), (7) and (8) are taken as deduplication results. In addition, the comparison set is updated: face features (2), (6), (7) and (8) are removed, and second feature (9) is added.
The detection proceeds in this way over the first video frames, in time-sequence order, until the detection is finished.
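The walkthrough above can be condensed into the following sketch of the deduplication loop; the is_same_object predicate stands in for feature comparison, and end-of-video handling, which the example does not reach, is deliberately left open. With n = 2 the sketch reproduces the frame-by-frame results described above: it emits face feature (3) at the 2nd frame, (1), (4) and (5) at the 3rd, and (2), (6), (7) and (8) at the 4th.

```python
def deduplicate(per_frame_features, n, is_same_object):
    """per_frame_features: one list of features per video frame, in
    time-sequence order. A feature matching nothing in the comparison set
    is a second feature and is added to the set; a set entry left unmatched
    for n-1 consecutive frames is emitted as a deduplication result and
    removed from the set."""
    comparison = []   # entries: [feature, frames_since_last_match]
    results = []
    for feats in per_frame_features:
        matched = [False] * len(comparison)
        for f in feats:
            hit = next((i for i, (g, _) in enumerate(comparison)
                        if is_same_object(f, g)), None)
            if hit is None:                  # second feature: add to the set
                comparison.append([f, 0])
                matched.append(True)
            else:                            # first feature: object seen again
                matched[hit] = True
        survivors = []
        for (g, age), seen in zip(comparison, matched):
            if seen:
                survivors.append([g, 0])
            elif age + 1 >= n - 1:           # unmatched for n-1 frames: emit
                results.append(g)
            else:
                survivors.append([g, age + 1])
        comparison = survivors
    # Entries still in the set when the video ends are not emitted here;
    # the disclosure does not specify end-of-video handling.
    return results
```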
Step 206, determining the 2 consecutive video frames at the junction of each pair of adjacent video segments;
that is, with N = 2, the 2 consecutive video frames at each junction of adjacent video segments are acquired, and the resulting sets of video frames are {1-200, 2-001}, {2-200, 3-001}, {3-200, 4-001} and {4-200, 5-001}.
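These junction sets can be produced mechanically; a short sketch under the example's numbering scheme:

```python
def junction_groups(num_segments=5, frames_per_segment=200, n=2):
    """For each pair of temporally adjacent segments, collect the last n-1
    frame labels of the former and the first n-1 labels of the latter."""
    groups = []
    for s in range(1, num_segments):
        tail = [f"{s}-{i:03d}"
                for i in range(frames_per_segment - n + 2, frames_per_segment + 1)]
        head = [f"{s + 1}-{i:03d}" for i in range(1, n)]
        groups.append(tail + head)
    return groups

print(junction_groups())
# [['1-200', '2-001'], ['2-200', '3-001'], ['3-200', '4-001'], ['4-200', '5-001']]
```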
Step 207, performing the deduplication operation on the 2 consecutive video frames at each junction of adjacent video segments to obtain a second deduplication result.
For a detailed process of the deduplication operation, please refer to the related description above, and details are not repeated herein.
Step 208, fusing the first deduplication result and the second deduplication result to obtain the deduplication result of the target video.
Specifically, the first deduplication result and the second deduplication result may be combined.
Step 209, storing the deduplication result.
The stored result contains the de-duplicated features of the target object, together with the time and place information of the target object, so that the target object in the target video can be tracked and traced later.
In the embodiment of the present disclosure, after a deduplication request for a target video is received, the deduplication operation is performed in parallel on at least N consecutive video frames in which the same target object exists in each video segment of the target video to obtain a first deduplication result, and the deduplication operation is performed on at least N consecutive video frames at the junction of adjacent video segments to obtain a second deduplication result, thereby de-duplicating the entire target video. Performing the deduplication operation on each video segment of the target video in parallel improves the efficiency of de-duplicating the video frames; moreover, because video frames requiring deduplication may exist at the junctions where adjacent video segments meet, performing the deduplication operation at those junctions achieves accurate deduplication of the whole target video on the basis of the improved deduplication efficiency.
It is understood that, without departing from the principle and logic, the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments; for reasons of space, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure also provides a video processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any video processing method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the method section, and they are not repeated here.
Fig. 3 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 3, the apparatus 30 includes:
a segmentation unit 301, configured to segment a received target video into a plurality of video segments, where the plurality of video segments include at least one target object;
a first deduplication unit 302, configured to perform deduplication operations on at least N consecutive video frames in the multiple video segments where the same target object exists in parallel to obtain a first deduplication result, where the first deduplication result includes a set formed by features of the target object obtained after deduplication operations are performed on the multiple video segments;
a second duplicate removal unit 303, configured to perform the duplicate removal operation on at least N consecutive video frames at a connection of adjacent video segments to obtain a second duplicate removal result, where the second duplicate removal result includes a set of features of a target object obtained after performing the duplicate removal operation on at least N consecutive video frames at the connection of adjacent video segments;
a merging unit 304, configured to merge the first deduplication result and the second deduplication result to obtain a deduplication result for each target object in the target video.
In a possible implementation manner, the deduplication operation is performed by a deduplication subunit, and the deduplication subunit is configured to obtain a first video frame in the target video according to a time sequence order of the video frames; detecting whether first features belonging to the same target object exist in features of a target object contained in the first video frame and a comparison set, and detecting whether second features, which are different between the target object and other target objects, exist in the comparison set, wherein the comparison set contains the target features of the target object contained in the first N-1 video frames adjacent to the first video frame; and under the condition that the second feature exists in the comparison set and the second feature is not detected in N-1 continuous video frames after the video frame where the second feature exists, taking the video frame where the second feature exists as a duplicate removal result of the first video frame and the first N-1 video frames, and removing the second feature from the comparison set.
In a possible implementation, the deduplication subunit is configured to determine a feature of at least one target object included in the first video frame; respectively determine the similarity between the feature of the at least one target object and each target feature in the comparison set; and obtain a detection result according to the similarity and a preset similarity threshold.
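As one possible instantiation of this similarity test, the sketch below uses cosine similarity against an assumed threshold of 0.8; the disclosure fixes neither the metric nor the threshold value.

```python
import numpy as np

def is_same_object(f, g, threshold=0.8):
    """Treat two features as the same target object when their cosine
    similarity reaches the preset similarity threshold."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    cos = float(f @ g) / (float(np.linalg.norm(f) * np.linalg.norm(g)) + 1e-12)
    return cos >= threshold
```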
In a possible implementation manner, the deduplication subunit is configured to, in a case that the first features are detected, add the first feature with the higher quality score among the first features belonging to the same target object into the comparison set; and add the second features contained in the first video frame into the comparison set in a case that second features are detected in the first video frame.
In a possible implementation manner, the target feature meets a preset feature condition, where the preset feature condition includes:
the quality score of the target feature is higher than a preset quality score;
the target features are features with the highest quality scores in a plurality of images containing the same target object, the plurality of images are a plurality of images in at least the first N-1 video frames adjacent to the first video frame, and the first video frame is a video frame containing the target object.
In a possible implementation manner, the segmentation unit is configured to label video frames in the target video according to a time sequence order of the video frames to obtain a labeling result; and determining at least N continuous video frames at the joint of the adjacent video segments according to the time sequence order of the video frames represented by the marking result.
In one possible implementation, the at least N consecutive video frames at the connection of the adjacent video segments include:
in the temporally adjacent video segments, N-1 video frames at the end of the previous video segment and N-1 video frames at the head of the next video segment.
In one possible implementation, the apparatus further includes:
a video frame determination unit, configured to determine a video frame containing the target object in the target video;
a quality score determination unit for determining a quality score of a feature of a target object contained in a video frame, the quality score being determined according to at least one of the following information:
the definition of a target object in a video frame and the angle between the target object and a lens in the video frame.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code, which when run on a device, a processor in the device executes instructions for implementing the video processing method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the video processing method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 4 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or other such terminal.
Referring to fig. 4, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 5 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 5, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the Apple graphical-user-interface operating system (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as a punch card or an in-groove protruding structure with instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the disclosure are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK), or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A video processing method, comprising:
segmenting a received target video into a plurality of video segments, wherein the plurality of video segments comprise at least one target object;
performing deduplication operation on at least N consecutive video frames with the same target object in the multiple video clips in parallel to obtain a first deduplication result, wherein the first deduplication result comprises a set formed by features of the target object obtained after deduplication operation is performed on the multiple video clips;
performing the deduplication operation on at least N consecutive video frames at the joint of adjacent video segments to obtain a second deduplication result, wherein the second deduplication result comprises a set formed by the features of the target object obtained after the deduplication operation is performed on the at least N consecutive video frames at the joint of the adjacent video segments;
merging the first duplicate removal result and the second duplicate removal result to obtain a duplicate removal result of each target object in the target video;
wherein the deduplication operation comprises:
acquiring a first video frame in the target video according to the time sequence of the video frames;
detecting whether first features belonging to the same target object exist in features of a target object contained in the first video frame and a comparison set, and detecting whether second features, which are different between the target object and other target objects, exist in the comparison set, wherein the comparison set contains the target features of the target object contained in the first N-1 video frames adjacent to the first video frame;
and under the condition that the second feature exists in the comparison set and the second feature is not detected in continuous N-1 video frames behind the video frame where the second feature exists, taking the video frame where the second feature exists as a duplicate removal result of the first video frame and the first N-1 video frames, and removing the second feature from the comparison set.
2. The method of claim 1, wherein the detecting whether there is a first feature belonging to the same target object and a second feature not belonging to the same target object in the first video frame and the comparison set comprises:
determining characteristics of at least one target object contained in the first video frame;
respectively determining the similarity between the characteristics of the at least one target object and each target characteristic in the comparison set;
and obtaining a detection result according to the similarity and a preset similarity threshold.
3. The method according to claim 1, wherein after detecting whether there are first features belonging to the same target object and second features, which are different from other target objects, in the feature and comparison set of the target object included in the first video frame, the method further comprises:
under the condition that the first features are detected, adding the first features with high quality scores in the first features belonging to the same target object into the comparison set;
and adding the second features contained in the first video frame into the comparison set under the condition that the second features are detected to exist in the first video frame.
4. The method according to any one of claims 1 to 3, wherein the target feature meets a preset feature condition, and the preset feature condition comprises:
the quality score of the target feature is higher than a preset quality score;
the target features are features with the highest quality scores in a plurality of images containing the same target object, the images are a plurality of images in at least the first N-1 video frames adjacent to the first video frame, and the first video frame is a video frame containing the target object.
5. The method according to any one of claims 1-4, wherein said slicing the received target video into a plurality of video segments comprises:
according to the time sequence of the video frames, marking the video frames in the target video to obtain a marking result;
and determining at least N continuous video frames at the joint of the adjacent video segments according to the time sequence order of the video frames represented by the marking result.
6. The method according to any of claims 1-5, wherein at least N consecutive video frames at the junction of the adjacent video segments comprise:
in the temporally adjacent video segments, N-1 video frames at the end of the previous video segment and N-1 video frames at the head of the next video segment.
7. The method according to any of claims 1-6, wherein before performing the deduplication operation in parallel on at least N consecutive video frames of the plurality of video segments in which the same target object exists, the method further comprises:
determining a video frame containing the target object in the target video;
determining a quality score of a feature of a target object contained in a video frame, the quality score being determined based on at least one of:
the definition of a target object in a video frame and the angle between the target object and a lens in the video frame.
8. A video processing apparatus, comprising:
the segmentation unit is used for segmenting the received target video into a plurality of video segments, wherein the plurality of video segments comprise at least one target object;
a first duplicate removal unit, configured to perform a duplicate removal operation on at least N consecutive video frames in the multiple video segments in which the same target object exists, in parallel, to obtain a first duplicate removal result, where the first duplicate removal result includes a set formed by features of the target object obtained after performing the duplicate removal operation on the multiple video segments;
the second duplicate removal unit is used for executing the duplicate removal operation on at least N continuous video frames at the joint of the adjacent video segments to obtain a second duplicate removal result, and the second duplicate removal result comprises a set formed by the characteristics of the target object obtained after the duplicate removal operation is carried out on the at least N continuous video frames at the joint of the adjacent video segments;
a merging unit, configured to merge the first deduplication result and the second deduplication result to obtain a deduplication result for each target object in the target video;
the duplication removing operation is executed by a duplication removing subunit, and the duplication removing subunit is used for acquiring a first video frame in the target video according to the time sequence order of the video frames; detecting whether a first feature belonging to the same target object exists in a feature and a comparison set of the target object contained in the first video frame and a second feature which is different from the target object and other target objects, wherein the comparison set contains target features of the target object contained in the first N-1 video frames adjacent to the first video frame; and under the condition that the second feature exists in the comparison set and the second feature is not detected in N-1 continuous video frames after the video frame where the second feature exists, taking the video frame where the second feature exists as a duplicate removal result of the first video frame and the first N-1 video frames, and removing the second feature from the comparison set.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.
CN202110120299.4A 2021-01-28 2021-01-28 Video processing method and device, electronic equipment and storage medium Active CN112911239B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110120299.4A CN112911239B (en) 2021-01-28 2021-01-28 Video processing method and device, electronic equipment and storage medium
PCT/CN2021/129187 WO2022160849A1 (en) 2021-01-28 2021-11-08 Video processing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110120299.4A CN112911239B (en) 2021-01-28 2021-01-28 Video processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112911239A CN112911239A (en) 2021-06-04
CN112911239B true CN112911239B (en) 2022-11-11

Family

ID=76119817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110120299.4A Active CN112911239B (en) 2021-01-28 2021-01-28 Video processing method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112911239B (en)
WO (1) WO2022160849A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112911239B (en) * 2021-01-28 2022-11-11 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN113507630B (en) * 2021-07-08 2023-06-20 北京百度网讯科技有限公司 Method and device for stripping game video
CN117278763A (en) * 2022-06-14 2023-12-22 中兴通讯股份有限公司 Interactive-based encoding method, encoding device and readable storage medium
CN117372933B (en) * 2023-12-06 2024-02-20 南京智绘星图信息科技有限公司 Image redundancy removing method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN105898583A (en) * 2015-01-26 2016-08-24 北京搜狗科技发展有限公司 Image recommendation method and electronic equipment
CN110996183A (en) * 2019-07-12 2020-04-10 北京达佳互联信息技术有限公司 Video abstract generation method, device, terminal and storage medium
CN112085097A (en) * 2020-09-09 2020-12-15 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090204636A1 (en) * 2008-02-11 2009-08-13 Microsoft Corporation Multimodal object de-duplication
WO2013187901A2 (en) * 2012-06-14 2013-12-19 Empire Technology Development Llc Data deduplication management
CN109543641B (en) * 2018-11-30 2021-01-26 厦门市美亚柏科信息股份有限公司 Multi-target duplicate removal method for real-time video, terminal equipment and storage medium
CN111476105A (en) * 2020-03-17 2020-07-31 深圳力维智联技术有限公司 Face data cleaning method, device and equipment
CN112231514B (en) * 2020-10-19 2024-01-05 腾讯科技(深圳)有限公司 Data deduplication method and device, storage medium and server
CN112911239B (en) * 2021-01-28 2022-11-11 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN105898583A (en) * 2015-01-26 2016-08-24 北京搜狗科技发展有限公司 Image recommendation method and electronic equipment
CN110996183A (en) * 2019-07-12 2020-04-10 北京达佳互联信息技术有限公司 Video abstract generation method, device, terminal and storage medium
CN112085097A (en) * 2020-09-09 2020-12-15 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022160849A1 (en) 2022-08-04
CN112911239A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40049964

Country of ref document: HK

GR01 Patent grant