CN107493441B - Abstract video generation method and device - Google Patents


Info

Publication number
CN107493441B
CN107493441B (application CN201610409337.7A)
Authority
CN
China
Prior art keywords
moving
video
original
image
videos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610409337.7A
Other languages
Chinese (zh)
Other versions
CN107493441A (en)
Inventor
彭剑峰
王鹏
叶挺群
郭斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201610409337.7A priority Critical patent/CN107493441B/en
Publication of CN107493441A publication Critical patent/CN107493441A/en
Application granted granted Critical
Publication of CN107493441B publication Critical patent/CN107493441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment ; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • G06F16/739Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Abstract

Embodiments of the present invention provide a summary video generation method and device. The method includes: for at least two original videos, merging the captured scene images of the at least two original videos to obtain a target scene image; for each original video, extracting first video information of each first moving object included in that original video; identifying identical second moving objects included in the at least two original videos, and merging the first video information of each second moving object across the at least two original videos to obtain second video information of that second moving object; determining the positions of the first moving objects and the second moving objects in the target scene image at each moment; and sequentially displaying each first moving object and each second moving object in the target scene image to generate a summary video. Embodiments of the invention can thus generate a single summary video from videos captured by multiple video capture devices, improving user experience.

Description

Abstract video generation method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a device for generating abstract videos.
Background
With the development of video technology, summary video plays an increasingly important role in video analysis and content-based video retrieval. For example, in the field of public security, video surveillance systems have become an important component of maintaining social order and strengthening social management. However, surveillance recordings involve large data volumes and long retention periods, and the traditional approach of manually searching recordings for clues and evidence consumes large amounts of manpower, material resources and time; it is extremely inefficient, and the best opportunity to solve a case may be missed.
In existing summary video generation methods, all moving objects can be extracted from the video captured by a single video capture device through moving object analysis; the motion trajectories of the moving objects are then analyzed, and the different moving objects are composited into a common background scene to generate a short video, which is called a summary video.
However, existing summary video generation methods can only generate a summary video for the video captured by a single video capture device. In practice, one video capture device covers a limited field of view of the captured scene, so its video contains limited moving-object information and the user experience is poor. How to generate a summary video for videos captured by multiple video capture devices has therefore become an urgent problem.
Disclosure of Invention
Embodiments of the present invention aim to provide a summary video generation method and device, so as to generate a summary video for videos captured by multiple video capture devices and improve user experience. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for generating a summarized video, where the method includes:
determining, for at least two original videos, the geographical position relationship of the captured scene of each original video, and merging the captured scene images of the at least two original videos according to the geographical position relationship of the captured scene of each original video to obtain a target scene image;
for each original video, analyzing each video frame of the original video, and extracting first video information of each first moving object included in the original video, wherein the first video information of each first moving object includes: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image corresponding to each time and containing the first moving object;
identifying identical second moving objects included in the at least two original videos according to first video information of each first moving object included in each original video, and merging the first video information of each second moving object in the at least two original videos to obtain second video information of each second moving object;
determining the positions of the first moving objects and the second moving objects in the target scene image at each moment according to the positions of the first moving objects in the original video image at each moment, the positions of the second moving objects in the original video at each moment and the relation between the acquired scene image of each original video and the target scene image;
and sequentially displaying each first moving object and each second moving object in the target scene image according to each moment when each first moving object appears in the original video, an image which corresponds to each moment and contains each first moving object, the position of each first moving object in the target scene image, each moment when each second moving object appears in each original video, the image which corresponds to each moment and contains each second moving object, the position of each second moving object in the target scene image and a preset display rule, and generating the abstract video corresponding to the at least two original videos.
Optionally, the merging of the captured scene images of the at least two original videos according to the geographical position relationship of the captured scene of each original video includes:
when the geographic positions of the collected scenes of any two original videos partially coincide, splicing the collected scene images of the two original videos;
when the geographic positions of the collected scenes of any two original videos are not coincident, the collected scene images of the two original videos are subjected to joint processing.
Optionally, the identifying of identical second moving objects included in the at least two original videos according to the first video information of each first moving object included in each original video includes:
for any two original videos, respectively obtaining an image containing a first moving object from each of the two original videos, and calculating the matching degree of the two images;
and when the matching degree is greater than a first preset threshold value, determining that the two first moving targets are the same second moving target.
Optionally, after obtaining, for any two original videos, an image including any one first moving object in the original videos respectively and calculating a matching degree of the two images, the method further includes:
determining whether the two first moving objects meet preset conditions or not according to the moments when the two first moving objects appear in the two original videos and the positions corresponding to the moments;
if yes, continuing to execute the step of determining that the two first moving objects are the same second moving object when the matching degree is larger than a first preset threshold.
Optionally, the preset display rule includes displaying a predetermined number of first moving objects and/or second moving objects in the target scene image.
Optionally, the preset display rule includes at least one of: the time when the first moving target and/or the second moving target appear in the original video is within a preset time range, the position where the first moving target and/or the second moving target appear in the original video is within a preset position range, and the matching degree of the image containing the first moving target or the second moving target and a preset contrast image is larger than a second preset threshold value.
Optionally, the method further comprises:
and acquiring a three-dimensional scene model corresponding to the target scene image, and overlaying the abstract video into the three-dimensional scene model.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a summarized video, where the apparatus includes:
the merging module is used for determining, for at least two original videos, the geographical position relationship of the captured scene of each original video, and merging the captured scene images of the at least two original videos according to the geographical position relationship of the captured scene of each original video to obtain a target scene image;
an extraction module, configured to analyze each video frame of each original video, and extract first video information of each first moving object included in the original video, where the first video information of each first moving object includes: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image corresponding to each time and containing the first moving object;
the identification module is used for identifying the same second moving object in the at least two original videos according to the first video information of each first moving object in each original video, and merging the first video information of each second moving object in the at least two original videos to obtain the second video information of each second moving object;
the determining module is used for determining the positions of the first moving objects and the second moving objects in the target scene image at various moments according to the positions of the first moving objects in the original video image at various moments, the positions of the second moving objects in the original video at various moments and the relation between the acquired scene image of each original video and the target scene image;
and the generating module is used for sequentially displaying each first moving object and each second moving object in the target scene image according to each moment when each first moving object appears in the original video, an image which corresponds to each moment and contains each first moving object, the position of each first moving object in the target scene image, each moment when each second moving object appears in each original video, an image which corresponds to each moment and contains each second moving object, the position of each second moving object in the target scene image, and a preset display rule, so as to generate the abstract videos corresponding to the at least two original videos.
Optionally, the merging module includes:
the splicing submodule is used for splicing the acquired scene images of any two original videos when the geographic positions of the acquired scenes of the two original videos are partially overlapped;
and the linking submodule is used for linking the acquired scene images of any two original videos when the geographic positions of the acquired scenes of the two original videos are not coincident.
Optionally, the identification module comprises:
the calculation submodule is used for obtaining, for any two original videos, an image containing a first moving object from each of the two original videos, and calculating the matching degree of the two images;
and the first determining submodule is used for determining that the two first moving targets are the same second moving target when the matching degree is greater than a first preset threshold.
Optionally, the apparatus further comprises:
the second determining submodule is used for determining whether the two first moving targets meet the preset condition or not according to the moments when the two first moving targets appear in the two original videos and the positions corresponding to the moments; if so, triggering the first determination submodule.
Optionally, the preset display rule includes displaying a predetermined number of first moving objects and/or second moving objects in the target scene image.
Optionally, the preset display rule includes at least one of: the time when the first moving target and/or the second moving target appear in the original video is within a preset time range, the position where the first moving target and/or the second moving target appear in the original video is within a preset position range, and the matching degree of the image containing the first moving target or the second moving target and a preset contrast image is larger than a second preset threshold value.
Optionally, the apparatus further comprises:
and the processing module is used for acquiring a three-dimensional scene model corresponding to the target scene image and overlaying the abstract video into the three-dimensional scene model.
Embodiments of the present invention provide a summary video generation method and device, which merge the captured scene images of at least two original videos according to the geographical position relationship of their captured scenes to obtain a target scene image, and then display the moving objects of each original video in the target scene image to generate a summary video corresponding to the at least two original videos, thereby improving user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for generating a summary video according to an embodiment of the present invention;
fig. 2(a) and 2(b) are schematic diagrams of the stitching and joining of pictures;
FIG. 3 is a schematic diagram of determining the position of a moving object in an image of a target scene;
fig. 4 is a schematic structural diagram of an apparatus for generating a summary video according to an embodiment of the present invention.
Detailed Description
In order to generate a summary video for videos captured by multiple video capture devices and improve user experience, embodiments of the present invention provide a summary video generation method and device.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to generate an abstract video for videos collected by multiple video collecting devices and improve user experience, an embodiment of the present invention provides a process of an abstract video generating method, which, as shown in fig. 1, may include the following steps:
s101, determining the geographical position relationship of the acquisition scene of each original video aiming at least two original videos, and combining the acquisition scene images of the at least two original videos according to the geographical position relationship of the acquisition scene of each original video to obtain a target scene image.
The method provided by the embodiment of the invention can be applied to electronic equipment. Specifically, the electronic device may be a desktop computer, a portable computer, an intelligent mobile terminal, and the like.
In the embodiment of the invention, in order to generate a summary video for videos acquired by a plurality of video acquisition devices and improve user experience, the electronic device may determine the geographical position relationship of the acquisition scene of each original video for at least two original videos, and merge the acquisition scene images of the at least two original videos according to the geographical position relationship of the acquisition scene of each original video to obtain the target scene image.
For example, when the geographic positions of the captured scenes of any two original videos partially coincide, the electronic device may perform stitching processing on the captured scene images of the two original videos; when the geographic positions of the captured scenes of any two original videos do not coincide, the electronic device can perform connection processing on the captured scene images of the two original videos. According to the mode, the target scene image containing the acquisition scene of each original video can be obtained after the acquisition scene images of the at least two original videos are combined.
Stitching refers to combining two pictures that have an overlapping region into one larger picture through image stitching technology. Joining refers to placing two pictures that have no overlapping region side by side according to their geographical position information, with a gradual pixel transition between them or with the intermediate area filled with a fixed ground color, so as to form one larger picture.
As shown in fig. 2(a), when the picture 210 and the picture 220 have an overlapping region, a larger picture 230 can be assembled by image stitching.
As shown in fig. 2(b), when there is no overlapping area between the pictures 240 and 250, a larger picture 260 can be assembled through a gradual pixel transition.
In fig. 2(a) and 2(b), the symbols "+" and "=" indicate that the two pictures above them are merged to obtain the picture that follows.
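The merging in S101 can be sketched as follows. This is a minimal Python/OpenCV illustration, not the patented implementation: the patent leaves the concrete stitching algorithm to existing image stitching technology, and the gap width and gray ground color used for the joining case are assumptions.

```python
import cv2
import numpy as np

def merge_scene_images(img_a, img_b, overlap):
    """Merge two captured scene images into one target scene image:
    stitch when the captured scenes partially coincide, otherwise join
    them side by side with a fixed-ground-color gap."""
    if overlap:
        stitcher = cv2.Stitcher_create()              # OpenCV 4.x API
        status, pano = stitcher.stitch([img_a, img_b])
        if status != cv2.Stitcher_OK:
            raise RuntimeError("stitching failed with status %d" % status)
        return pano
    # Joining: pad both images to equal height, then insert a gap filled
    # with a fixed ground color (gray; the 40 px width is arbitrary).
    h = max(img_a.shape[0], img_b.shape[0])
    def pad(img):
        return cv2.copyMakeBorder(img, 0, h - img.shape[0], 0, 0,
                                  cv2.BORDER_CONSTANT, value=(128, 128, 128))
    gap = np.full((h, 40, 3), 128, dtype=np.uint8)
    return np.hstack([pad(img_a), gap, pad(img_b)])
```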
S102, analyzing each video frame of each original video, and extracting first video information of each first moving object included in the original video, where the first video information of each first moving object includes: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image containing the first moving object corresponding to each time.
In the embodiment of the invention, the electronic device may analyze, for each original video, video frames of the original video, and extract first video information of first moving objects included in the original video.
The first video information of each first moving object may include: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image containing the first moving object corresponding to each time.
For example, the electronic device may identify, for each original video, a respective first moving object in the original video. Furthermore, each video frame of the original video can be analyzed, and the video information of each first moving object in each video frame can be extracted. Further, video information of the same first moving object in each video frame may be integrated to obtain the first video information of the first moving object in the original video.
It should be noted that, in the embodiment of the present invention, for each original video, the process of analyzing each video frame of the original video and extracting the first video information of each first moving object included in the original video by the electronic device may be implemented by using any method in the prior art, and this process is not described in detail in the embodiment of the present invention.
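As an illustration of the kind of prior-art extraction referred to above, the following Python/OpenCV sketch records, for each detected moving region, the moment, the position and the image patch — the three components of the first video information. It detects per-frame foreground blobs with a background subtractor; associating the detections into per-object tracks, so that each first moving object has its own record, is omitted here, as the patent leaves that step to the prior art.

```python
import cv2

def extract_first_video_info(video_path, min_area=500):
    """Collect (time, position, patch) detections of moving regions in one
    original video via background subtraction (a stand-in for the prior-art
    moving object analysis; min_area is an arbitrary noise filter)."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2()
    records = []                      # one entry per detection
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_MSEC)     # moment of this frame
        mask = subtractor.apply(frame)         # foreground mask (with shadows)
        mask = cv2.medianBlur(mask, 5)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            x, y, w, h = cv2.boundingRect(c)
            records.append((t, (x + w // 2, y + h // 2),
                            frame[y:y + h, x:x + w].copy()))
    cap.release()
    return records
```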
S103, identifying identical second moving objects included in the at least two original videos according to the first video information of the first moving objects included in the original videos, and merging the first video information of the second moving objects in the at least two original videos to obtain the second video information of the second moving objects.
It will be appreciated that at least two original videos may contain the same first moving object.
Therefore, in the embodiment of the present invention, after the electronic device extracts the first video information of each first moving object included in each original video, it may identify the same second moving object included in at least two original videos according to the first video information of each first moving object included in each original video.
For example, when the electronic device generates a corresponding summary video for two original videos, video A and video B, the first video information of each first moving object included in each original video may be: video A includes first moving objects 1, 2, 3 and 4 and their respective first video information; video B includes first moving objects 5, 6 and 7 and their respective first video information.
In this case, the electronic device may determine whether any first moving object 1, 2, 3, 4 in the video a is the same as any first moving object 5, 6, 7 in the video B according to the first video information of each first moving object 1, 2, 3, 4 in the video a and the first video information of each first moving object 5, 6, 7 in the video B.
Specifically, the electronic device may compare the first video information of each first moving object 1, 2, 3, 4 in the video a with the first video information of each first moving object 5, 6, 7 in the video B in sequence, and then determine whether any first moving object 1, 2, 3, 4 in the video a is the same as any first moving object 5, 6, 7 in the video B.
For example, the electronic device may compare the first video information of the first moving object 1 in the video a with the first video information of each first moving object 5, 6, and 7 in the video B, and determine whether the first video information of the first moving object 1 matches the first video information of the first moving object 5, 6, and 7 according to each time when each first moving object appears in the original video, a position corresponding to each time, and an image including the first moving object corresponding to each time.
If the first video information of the first moving object 1 is not matched with the first video information of the first moving objects 5, 6 and 7, determining that the first moving object 1 in the video A and each first moving object in the video B are different moving objects; if the first video information of the first moving object 1 matches with any first moving object in the video B, for example, the first video information of the first moving object 5, it is determined that the first moving object 1 in the video a and the first moving object 5 in the video B are the same second moving object.
After recognizing that the first moving object 1 in video A and the first moving object 5 in video B are the same second moving object, the electronic device may use the identifier of that moving object in video A or in video B as the identifier of the second moving object; for example, it may be identified as second moving object 1.
With the above method, the electronic device may determine whether the first moving object 2, 3, 4 in video a is the same second moving object as any of the first moving objects 5, 6, 7 in video B.
After the same second moving objects included in each original video are identified, the electronic device may further merge the first video information of each second moving object in each original video to obtain the second video information of each second moving object.
As in the above example, when the electronic device determines that the first moving object 1 in the video a and the first moving object 5 in the video B are the same second moving object 1, it may acquire and combine the first video information of the first moving object 1 extracted from the video a and the first video information of the first moving object 5 extracted from the video B as the second video information of the second moving object 1.
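Continuing the example, the merging of first video information into second video information can be sketched as below. This is a minimal sketch; the record layout — time, position, image patch, source video — is an assumption, since the patent does not fix a data structure.

```python
def merge_video_info(info_a, info_b):
    """Merge the first video information of the same moving object from two
    original videos into its second video information: one time-ordered
    list of (time, position, patch, source_video) entries."""
    merged = ([(t, pos, patch, "A") for t, pos, patch in info_a] +
              [(t, pos, patch, "B") for t, pos, patch in info_b])
    merged.sort(key=lambda entry: entry[0])   # order by moment of appearance
    return merged
```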
S104, determining the positions of the first moving objects and the second moving objects in the target scene image at each moment according to the positions of the first moving objects in the original video images at each moment, the positions of the second moving objects in the original videos at each moment, and the relationship between the captured scene image of each original video and the target scene image.
It is understood that the first video information of each first moving object records the position of each first moving object in the original video image at each moment, and the second video information of each second moving object records the position of each second moving object in each original video image at each moment.
However, after the electronic device combines the captured scene images of the original videos to generate the target scene image, the positions of the first moving objects in the original video image at various times and the positions of the second moving objects in the original video image at various times cannot be used to accurately display the first moving objects and/or the second moving objects at the correct positions in the target scene image.
Therefore, in the embodiment of the present invention, the electronic device may determine the positions of the respective moments of the respective first moving objects and the respective second moving objects in the object scene image according to the positions of the respective moments of the respective first moving objects in the original video image, the positions of the respective moments of the respective second moving objects in the respective original videos, and the relationship between the captured scene image and the object scene image of the respective original videos.
For example, as shown in fig. 3, suppose the captured scene images A and B of two original videos are rectangular areas of the same size, and the target scene image is formed by splicing them side by side, so that the target scene image is twice the size of each captured scene image, with one captured scene image occupying the left half of the target scene image and the other occupying the right half.
In this case, if the first moving object 1 is a moving object in captured scene image A, which occupies the right half, and is located at the center position 310 of image A at a certain moment, the electronic device may determine that at that moment the first moving object 1 should be located at position 320 of the target scene image, that is, vertically centered and at 3/4 of the width of the target scene image.
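Under the pure-translation merge of fig. 3 (no warping), the coordinate mapping reduces to adding the offset at which each captured scene image was placed in the target scene image. A minimal sketch, with the fig. 3 numbers worked as an example and a hypothetical image size:

```python
def to_target_position(pos, scene_offset):
    """Map a position in a captured scene image to the target scene image
    by adding the offset at which that scene image was placed when the
    images were merged (valid for translation-only merging; stitching with
    warping would need the full homography instead)."""
    x, y = pos
    dx, dy = scene_offset
    return (x + dx, y + dy)

# Fig. 3 example: scene image A (w x h) occupies the right half of the
# 2w x h target image, so its offset is (w, 0); the centre of A, (w/2, h/2),
# maps to (3w/2, h/2) - i.e. 3/4 of the target width, vertically centred.
w, h = 1920, 1080                      # hypothetical scene image size
print(to_target_position((w // 2, h // 2), (w, 0)))   # -> (2880, 540)
```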
And S105, sequentially displaying each first moving object and each second moving object in the target scene image according to each moment when each first moving object appears in the original video, an image which corresponds to each moment and contains each first moving object, the position of each first moving object in the target scene image, each moment when each second moving object appears in each original video, an image which corresponds to each moment and contains each second moving object, the position of each second moving object in the target scene image, and a preset display rule, so as to generate abstract videos corresponding to the at least two original videos.
After the electronic device determines the positions of the first moving objects and the second moving objects in the target scene image at the moments, the first moving objects and the second moving objects can be sequentially displayed in the target scene image according to the moments of the first moving objects appearing in the original video, the images corresponding to the moments and containing the first moving objects, the positions of the first moving objects in the target scene image, the moments of the second moving objects appearing in the original video, the images corresponding to the moments and containing the second moving objects, the positions of the second moving objects in the target scene image, and a preset display rule, so that the abstract videos corresponding to the at least two original videos are generated.
For example, the electronic device may extract, in turn, the first video information of one first moving object in each original video and/or the second video information of one second moving object, according to the original videos in which the first moving objects and the second moving objects appear. Then, according to each moment at which the extracted first moving object and/or second moving object appears in the original video and its position in the target scene image corresponding to each moment, the image containing that moving object at each moment is displayed at the corresponding position in the target scene image.
After the trajectories of these first moving objects and/or second moving objects have been fully displayed in the target scene image according to the extracted first video information and/or second video information, the electronic device may extract the first video information of another first moving object and/or the second video information of another second moving object in each original video, and display its trajectory in the target scene image, until the trajectories of all first moving objects and second moving objects have been displayed in the target scene image, thereby generating the summary video corresponding to the at least two original videos.
Or, the electronic device may ensure that only one moving object is displayed in the target scene image at any moment, sequentially extract the first video information of each first moving object and the second video information of each second moving object in the order in which they are stored locally, and sequentially display each first moving object and each second moving object in the target scene image to generate the summary video corresponding to the at least two original videos.
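The second strategy — one object on screen at a time — can be sketched as follows. This is an illustrative Python/OpenCV sketch: the codec, frame rate and one-entry-per-summary-frame pacing are assumptions, and the track format follows the earlier sketches, with positions assumed already mapped into target-scene coordinates via to_target_position().

```python
import cv2

def render_summary(background, tracks, out_path="summary.avi", fps=25):
    """Render a summary video by pasting each object's image patches onto
    the target scene image, one object at a time.  Each track is a list of
    (time, target_position, patch, source) entries."""
    h, w = background.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"XVID"),
                             fps, (w, h))
    for track in tracks:                       # objects displayed in turn
        for _, (cx, cy), patch, _ in track:    # one summary frame per entry
            frame = background.copy()
            ph, pw = patch.shape[:2]
            # Clamp so the patch stays inside the frame (assumes the patch
            # is smaller than the target scene image).
            x0 = min(max(cx - pw // 2, 0), w - pw)
            y0 = min(max(cy - ph // 2, 0), h - ph)
            frame[y0:y0 + ph, x0:x0 + pw] = patch
            writer.write(frame)
    writer.release()
```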
Embodiments of the present invention thus provide a summary video generation method which merges the captured scene images of at least two original videos according to the geographical position relationship of their captured scenes to obtain a target scene image, and then displays the moving objects of each original video in the target scene image to generate the summary video corresponding to the at least two original videos, improving user experience.
As an implementation manner of the present invention, when the electronic device identifies the same second moving object included in at least two original videos according to the first video information of each first moving object included in each original video, an image including any one first moving object in each original video may be obtained for any two original videos, and a matching degree of the two images is calculated; when the matching degree is greater than a first preset threshold, such as 80%, 85%, 90%, etc., it may be determined that the two first moving objects are the same second moving object.
As another implementation manner of the present invention, in order to improve the accuracy of determining that two first moving objects are the same second moving object, after the electronic device obtains an image including any one of the first moving objects in any two original videos and calculates the matching degree of the two images, it may further determine whether the two first moving objects meet a preset condition according to the time when the two first moving objects appear in the two original videos and the position corresponding to each time.
For example, the electronic device may determine, according to the moments at which the two first moving objects appear in the two original videos and the positions corresponding to those moments, whether the time and position at which one of the two first moving objects disappears from its original video match the time and position at which the other first moving object appears in its original video; if so, the two first moving objects are determined to meet the preset condition.
When the two first moving objects meet the preset condition, the electronic device may further determine whether the matching degree of the two images is greater than the first preset threshold, and if so, determine that the two first moving objects are the same second moving object.
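The following Python/OpenCV sketch puts the two checks together. The HSV-histogram correlation is only one plausible matching-degree measure (the patent does not mandate one), and the 5-second hand-off window and 0.85 threshold are illustrative values; the spatial part of the precondition (that the appearance position is consistent with the disappearance position) is noted in a comment but omitted.

```python
import cv2

def match_degree(img_a, img_b, size=(64, 128)):
    """Matching degree of two object images via HSV-histogram correlation
    (one plausible measure; the patent does not mandate a method)."""
    def hist(img):
        hsv = cv2.cvtColor(cv2.resize(img, size), cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
        return cv2.normalize(h, h).flatten()
    return cv2.compareHist(hist(img_a), hist(img_b), cv2.HISTCMP_CORREL)

def same_second_object(track_a, track_b, img_a, img_b,
                       max_gap_ms=5000, first_threshold=0.85):
    """Decide whether two first moving objects are the same second moving
    object: first the hand-off precondition (one disappears at about the
    time the other appears; the matching position check is omitted here),
    then the matching degree against the first preset threshold."""
    disappear_t = track_a[-1][0]   # last moment the object appears in video A
    appear_t = track_b[0][0]       # first moment the object appears in video B
    if not (0 <= appear_t - disappear_t <= max_gap_ms):
        return False               # preset condition not met
    return match_degree(img_a, img_b) > first_threshold
```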
In the embodiment of the invention, when the electronic device generates the abstract videos corresponding to the at least two original videos, the first moving object and/or the second moving object can be displayed in the target scene image according to a display rule preset by a user.
As an implementation manner of the present invention, the preset display rule may be that a predetermined number (e.g. 4, 5, 8, etc.) of first moving objects and/or second moving objects are displayed in the target scene image.
That is, the number of first moving objects and/or second moving objects displayed in the target scene image is the predetermined number at any one time.
Specifically, the electronic device may simultaneously extract a predetermined number of pieces of first video information of first moving objects and/or second video information of second moving objects, in the order in which the first video information of each first moving object and the second video information of each second moving object are stored locally, and simultaneously display the trajectories of those first moving objects and/or second moving objects in the target scene image according to the extracted video information.
After any first moving object or second moving object disappears from the target scene image, the electronic device may extract the first video information of another first moving object or the second video information of another second moving object and display its trajectory in the target scene image, until the trajectories of all first moving objects and second moving objects have been displayed, thereby generating the summary video corresponding to the at least two original videos.
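A greedy schedule that keeps at most the predetermined number of objects on screen at once can be sketched as follows. This is a minimal sketch; it assumes each track occupies one summary frame per recorded entry and that objects are taken in their stored order.

```python
import heapq

def schedule_starts(tracks, predetermined_number=4):
    """Assign each object's track a start frame in the summary video so
    that at most `predetermined_number` objects are on screen at any
    moment: a new object starts as soon as a display slot frees up."""
    ends = []                        # min-heap of frames where slots free up
    starts = {}
    for i, track in enumerate(tracks):
        s = 0
        if len(ends) >= predetermined_number:
            s = heapq.heappop(ends)  # wait for the earliest object to leave
        starts[i] = s
        heapq.heappush(ends, s + len(track))
    return starts                    # object index -> summary start frame
```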
It will be appreciated that in some cases, a user may want to obtain information about a certain time period, a certain range of positions, or a certain moving object. In this case, the user may preset a display rule as needed, so that the electronic device generates a corresponding summary video according to the set display rule.
As another implementation manner of the present invention, the preset display rule may include at least one of the following: the time when the first moving object and/or the second moving object appear in the original video is within a preset time range, the position where the first moving object and/or the second moving object appear in the original video is within a preset position range, and the matching degree of the image containing the first moving object or the second moving object and the preset contrast image is greater than a second preset threshold, such as 80%, 85%, 90% and the like.
After the user sets the display rule, the electronic device may screen out the first moving object and/or the second moving object that meet the display rule and the corresponding partial or all video information from the first video information of each first moving object and the second video information of each second moving object that are locally stored according to the display rule, and further display the track of the first moving object and/or the second moving object that corresponds to the partial or all video information in the target scene image, and generate the corresponding abstract video.
For example, when the preset display rule specifies that the moments at which first moving objects and/or second moving objects appear in the original videos must fall within a preset time range, the electronic device may screen out, from the first video information of each first moving object and the second video information of each second moving object, the video information of those first and second moving objects whose appearance moments fall within the preset time range, according to the moments at which each first moving object appears in its original video and each second moving object appears in the original videos.
Furthermore, the electronic device may display, in the target scene image, the tracks of the first moving objects and the second moving objects corresponding to the video information according to the video information corresponding to the first moving objects and the second moving objects that are screened out, and generate corresponding abstract videos.
When the electronic device displays the trajectories of the first moving objects and the second moving objects corresponding to the video information in the target scene image, the electronic device may simultaneously display the trajectories of all the screened first moving objects and second moving objects in the target scene image.
Or, a number threshold may be preset. When the number of screened first moving objects and second moving objects is less than or equal to the number threshold, the trajectories of all of them may be displayed in the target scene image at the same time; when their number is greater than the number threshold, a batch of trajectories equal in number to the number threshold may be displayed in the target scene image each time, so that the trajectories of all screened first moving objects and second moving objects are displayed over multiple passes.
When the preset display rule specifies that the positions at which first moving objects and/or second moving objects appear in the original videos must fall within a preset position range, the electronic device may screen out, from the first video information of each first moving object and the second video information of each second moving object, the video information of those first and second moving objects whose positions at the moments they appear in the original videos fall within the preset position range.
Furthermore, the electronic device may display, in the target scene image, the tracks of the first moving objects and the second moving objects corresponding to the video information according to the video information corresponding to the first moving objects and the second moving objects that are screened out, and generate corresponding abstract videos.
When the electronic device displays the trajectories of the first moving objects and the second moving objects corresponding to the video information in the target scene image, the electronic device may simultaneously display the trajectories of all the screened first moving objects and second moving objects in the target scene image.
Or, a number threshold may be preset. When the number of screened first moving objects and second moving objects is less than or equal to the number threshold, the trajectories of all of them may be displayed in the target scene image at the same time; when their number is greater than the number threshold, a batch of trajectories equal in number to the number threshold may be displayed in the target scene image each time, so that the trajectories of all screened first moving objects and second moving objects are displayed over multiple passes.
When the preset display rule specifies that the matching degree between the image containing a first moving object or second moving object and a preset contrast image must be greater than the second preset threshold, the electronic device may calculate, according to the first video information of each first moving object and the second video information of each second moving object, the matching degree between each image containing a first or second moving object and the contrast image, and screen out the first video information and second video information corresponding to the images whose matching degree is greater than the second preset threshold.
Furthermore, the electronic device may display the tracks of each first moving object and each second moving object in the object scene image according to the first video information of each first moving object and the second video information of each second moving object, which are screened out, and generate a corresponding abstract video.
When the electronic device displays the tracks of the first moving objects and the second moving objects screened out in the target scene image, the tracks of all the first moving objects and the second moving objects screened out can be displayed in the target scene image at the same time.
Or, a number threshold may be preset. When the number of screened first moving objects and second moving objects is less than or equal to the number threshold, the trajectories of all of them may be displayed in the target scene image at the same time; when their number is greater than the number threshold, a batch of trajectories equal in number to the number threshold may be displayed in the target scene image each time, so that the trajectories of all screened first moving objects and second moving objects are displayed over multiple passes.
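The three screening rules above can be sketched as a single filter. This is a minimal sketch; the rule dictionary, the per-track "entries"/"patch" layout, and the reuse of match_degree() from the earlier sketch are all assumptions.

```python
def filter_by_display_rule(tracks, rule):
    """Screen moving objects against a preset display rule.  `rule` is a
    hypothetical dict with optional keys: 'time_range' -> (t0, t1),
    'position_range' -> (x0, y0, x1, y1) in original-video coordinates,
    and 'contrast' -> (contrast_image, second_threshold).  Each track is
    assumed to carry its (time, position, patch, ...) entries and one
    representative patch under 'patch'."""
    kept = []
    for track in tracks:
        entries = track["entries"]
        if "time_range" in rule:
            t0, t1 = rule["time_range"]
            entries = [e for e in entries if t0 <= e[0] <= t1]
        if "position_range" in rule:
            x0, y0, x1, y1 = rule["position_range"]
            entries = [e for e in entries
                       if x0 <= e[1][0] <= x1 and y0 <= e[1][1] <= y1]
        if not entries:
            continue              # nothing left within the time/position rule
        if "contrast" in rule:
            contrast_image, second_threshold = rule["contrast"]
            # match_degree() as sketched above for object matching
            if match_degree(track["patch"], contrast_image) <= second_threshold:
                continue
        kept.append(dict(track, entries=entries))
    return kept
```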
As another implementation manner of the present invention, after the electronic device generates the digest videos corresponding to at least two original videos, the digest videos may be further displayed.
Optionally, in order to improve the display effect of the summary video, the electronic device may construct in advance a three-dimensional scene model corresponding to the target scene image. After generating the summary video, the electronic device may obtain the three-dimensional scene model corresponding to the target scene image and overlay the summary video onto the three-dimensional scene model, so that the summary video is displayed in three dimensions, improving user experience.
It should be noted that, in the embodiment of the present invention, a process of generating a three-dimensional scene model by an electronic device may adopt the prior art, and this process is not described in detail in the embodiment of the present invention.
Corresponding to the above method embodiment, the embodiment of the present invention also provides a corresponding device embodiment.
Fig. 4 shows a summary video generation apparatus according to an embodiment of the present invention, where the apparatus includes:
a merging module 410, configured to determine, for at least two original videos, a geographical position relationship of an acquisition scene of each original video, and merge, according to the geographical position relationship of the acquisition scene of each original video, acquisition scene images of the at least two original videos to obtain a target scene image;
an extracting module 420, configured to analyze, for each original video, video frames of the original video, and extract first video information of first moving objects included in the original video, where the first video information of the first moving objects includes: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image corresponding to each time and containing the first moving object;
the identifying module 430 is configured to identify, according to first video information of each first moving object included in each original video, a same second moving object included in the at least two original videos, and merge the first video information of each second moving object in the at least two original videos to obtain second video information of each second moving object;
a determining module 440, configured to determine, according to a position of each first moving object in the original video image at each time, a position of each second moving object in each original video at each time, and a relationship between a captured scene image of each original video and a target scene image, positions of each first moving object and each second moving object in the target scene image at each time;
a generating module 450, configured to sequentially display each first moving object and each second moving object in the target scene image according to each time when each first moving object appears in the original video, an image including each first moving object corresponding to each time, and a position of each first moving object in the target scene image, each time when each second moving object appears in each original video, an image including each second moving object corresponding to each time, and a position of each second moving object in the target scene image, and a preset display rule, so as to generate a digest video corresponding to the at least two original videos.
Embodiments of the present invention provide a summary video generation device, which merges the captured scene images of at least two original videos according to the geographical position relationship of their captured scenes to obtain a target scene image, and then displays the moving objects of each original video in the target scene image to generate the summary videos corresponding to the at least two original videos, improving user experience.
Optionally, the merging module 410 includes:
a splicing submodule (not shown in the figure) for splicing the acquired scene images of any two original videos when the geographic positions of the acquired scenes of the two original videos partially coincide;
and the linking submodule (not shown in the figure) is used for linking the acquired scene images of any two original videos when the geographic positions of the acquired scenes of the two original videos do not coincide.
Optionally, the identifying module 430 includes:
a calculating submodule (not shown in the figure) for obtaining an image containing any one first moving object in each original video respectively for any two original videos, and calculating a matching degree of the two images;
and a first determining sub-module (not shown in the figure) for determining that the two first moving objects are the same second moving object when the matching degree is greater than a first preset threshold.
Optionally, the apparatus further comprises:
a second determining submodule (not shown in the figure) configured to determine whether the two first moving objects meet a preset condition according to the times when the two first moving objects appear in the two original videos and the positions corresponding to the times; if so, triggering the first determination submodule.
Optionally, the preset display rule includes displaying a predetermined number of first moving objects and/or second moving objects in the target scene image.
Optionally, the preset display rule includes at least one of: the time when the first moving target and/or the second moving target appear in the original video is within a preset time range, the position where the first moving target and/or the second moving target appear in the original video is within a preset position range, and the matching degree of the image containing the first moving target or the second moving target and a preset contrast image is larger than a second preset threshold value.
Optionally, the apparatus further comprises:
and a processing module (not shown in the figure) for acquiring a three-dimensional scene model corresponding to the target scene image and overlaying the abstract video to the three-dimensional scene model.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for generating a summary video, the method comprising:
determining, for at least two original videos, the geographical position relationship of the captured scene of each original video, and merging the captured scene images of the at least two original videos according to the geographical position relationship of the captured scene of each original video to obtain a target scene image;
for each original video, analyzing each video frame of the original video, and extracting first video information of each first moving object included in the original video, wherein the first video information of each first moving object includes: each time of the first moving object appearing in the original video, the position corresponding to each time, and the image corresponding to each time and containing the first moving object;
identifying identical second moving objects included in the at least two original videos according to first video information of each first moving object included in each original video, and merging the first video information of each second moving object in the at least two original videos to obtain second video information of each second moving object;
determining the positions of the first moving objects and the second moving objects in the target scene image at each moment according to the positions of the first moving objects in the original video image at each moment, the positions of the second moving objects in the original video at each moment and the relation between the acquired scene image of each original video and the target scene image;
sequentially displaying each first moving object and each second moving object in the target scene image according to each moment when each first moving object appears in the original video, an image which corresponds to each moment and contains each first moving object, the position of each first moving object in the target scene image, each moment when each second moving object appears in each original video, an image which corresponds to each moment and contains each second moving object, the position of each second moving object in the target scene image and a preset display rule, and generating abstract videos corresponding to at least two original videos;
the identifying, according to the first video information of each first moving object included in each original video, a same second moving object included in the at least two original videos includes:
aiming at any two original videos, respectively obtaining an image containing any one first moving target in each original video, and calculating the matching degree of the two images;
determining whether the two first moving objects meet preset conditions or not according to the moments when the two first moving objects appear in the two original videos and the positions corresponding to the moments;
if yes, when the matching degree is larger than a first preset threshold value, the two first moving objects are determined to be the same second moving object.
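For illustration only, below is a minimal Python sketch of this cross-video matching step, assuming OpenCV histogram correlation as the matching measure and a simple time-gap precondition. The claim fixes neither the matching algorithm nor the preset condition, so the Track type, field names, and threshold values here are all hypothetical:

import cv2
import numpy as np
from dataclasses import dataclass

@dataclass
class Track:
    # Hypothetical container for the "first video information" of one moving object.
    first_seen: float          # first moment (seconds) the object appears in its video
    last_seen: float           # last moment it appears
    first_position: tuple      # (x, y) position at first appearance
    best_image: np.ndarray     # representative BGR crop containing the object

FIRST_PRESET_THRESHOLD = 0.8   # assumed value of the "first preset threshold"
MAX_TIME_GAP = 30.0            # assumed plausible travel time between scenes, in seconds

def matching_degree(img_a, img_b):
    # Matching degree of two object images, here via HSV histogram correlation.
    hists = []
    for img in (img_a, img_b):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(h, h)
        hists.append(h)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

def meets_preset_condition(a, b):
    # Example precondition: the two tracks must not overlap in time, and the gap
    # between leaving one scene and entering the other must be plausible.
    gap = b.first_seen - a.last_seen
    return 0.0 < gap < MAX_TIME_GAP

def same_second_moving_object(a, b):
    # Per the claim's decision structure: check the precondition first, then
    # test the matching degree against the threshold.
    return meets_preset_condition(a, b) and \
        matching_degree(a.best_image, b.best_image) > FIRST_PRESET_THRESHOLD

A production system would more likely use appearance re-identification features than raw histograms; the structure of the decision, however, follows the claim: precondition first, then the threshold test.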
2. The method according to claim 1, wherein the merging the captured scene images of the at least two original videos according to the geographical position relationship of the capture scenes comprises:
when the geographical positions of the capture scenes of any two original videos partially overlap, stitching the captured scene images of the two original videos;
when the geographical positions of the capture scenes of any two original videos do not overlap, joining the captured scene images of the two original videos.
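As a rough sketch of these two branches (an illustration under assumptions, not the patented implementation), OpenCV's high-level stitcher can cover the overlapping case, while non-overlapping scenes can simply be joined edge to edge in their geographical order:

import cv2

def merge_scene_images(img_left, img_right, scenes_overlap):
    # Merge two captured scene images into one target scene image.
    if scenes_overlap:
        # Partially overlapping geographic positions: stitch into a panorama.
        stitcher = cv2.Stitcher_create()
        status, target = stitcher.stitch([img_left, img_right])
        if status != cv2.Stitcher_OK:
            raise RuntimeError("stitching failed with status %d" % status)
        return target
    # Non-overlapping positions: join edge to edge (assumes equal image heights).
    return cv2.hconcat([img_left, img_right])

In the joined case, mapping a moving object's position into the target scene image (the determining step of claim 1) reduces to adding the left image's width to the x coordinate of objects from the right-hand video; in the stitched case, the stitcher's estimated homography would be applied instead.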
3. The method according to claim 1, wherein the preset display rule comprises displaying a predetermined number of first moving objects and/or second moving objects in the target scene image.
4. The method of claim 1, wherein the preset display rule comprises at least one of: the moment at which a first moving object and/or a second moving object appears in the original video being within a preset time range, the position at which a first moving object and/or a second moving object appears in the original video being within a preset position range, and the matching degree between the image containing a first moving object or a second moving object and a preset reference image being greater than a second preset threshold.
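A hedged sketch of how the display rules of claims 3 and 4 might be combined as simple filters over extracted tracks, reusing the hypothetical Track type and matching_degree function from the earlier sketch (every parameter name and default value below is an assumption; the claims only constrain what the rules may contain):

def select_tracks(tracks, max_count=10,
                  time_range=(0.0, 3600.0),
                  position_range=((0, 0), (1920, 1080)),
                  reference_image=None, second_threshold=0.7):
    # Apply preset display rules: a count cap (claim 3) plus time-range,
    # position-range, and reference-image filters (claim 4).
    (x0, y0), (x1, y1) = position_range
    selected = []
    for t in tracks:
        if not (time_range[0] <= t.first_seen <= time_range[1]):
            continue
        x, y = t.first_position
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        if reference_image is not None and \
                matching_degree(t.best_image, reference_image) <= second_threshold:
            continue
        selected.append(t)
        if len(selected) == max_count:   # predetermined number of objects
            break
    return selected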
5. The method according to any one of claims 1-4, further comprising:
acquiring a three-dimensional scene model corresponding to the target scene image, and overlaying the summary video onto the three-dimensional scene model.
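The overlay step of claim 5 is left open by the claim language. As one possibility (a sketch only, with the homography assumed to come from prior calibration between the camera view and the rendered model view), each summary frame could be warped into a rendered view of the three-dimensional scene model:

import cv2

def overlay_summary_frame(model_view, summary_frame, homography):
    # Warp one summary-video frame into the rendered view of the 3D scene
    # model and paste it over the corresponding region.
    h, w = model_view.shape[:2]
    warped = cv2.warpPerspective(summary_frame, homography, (w, h))
    mask = warped.sum(axis=2) > 0    # non-black pixels of the warped frame
    out = model_view.copy()
    out[mask] = warped[mask]         # simple replacement; a real system might alpha-blend
    return out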
6. An apparatus for generating a summary video, the apparatus comprising:
a merging module, configured to determine, for at least two original videos, the geographical position relationship between the capture scenes of the original videos, and merge the captured scene images of the at least two original videos according to the geographical position relationship of the capture scenes to obtain a target scene image;
an extraction module, configured to analyze, for each original video, each video frame of the original video and extract first video information of each first moving object included in the original video, wherein the first video information of a first moving object includes: each moment at which the first moving object appears in the original video, the position corresponding to each moment, and the image containing the first moving object corresponding to each moment;
an identification module, configured to identify identical second moving objects included in the at least two original videos according to the first video information of each first moving object included in each original video, and merge the first video information of each second moving object across the at least two original videos to obtain second video information of each second moving object;
a determining module, configured to determine the position of each first moving object and each second moving object in the target scene image at each moment according to the position of each first moving object in its original video at each moment, the position of each second moving object in the original videos at each moment, and the relationship between the captured scene image of each original video and the target scene image;
a generating module, configured to sequentially display each first moving object and each second moving object in the target scene image according to each moment at which each first moving object appears in its original video, the image containing the first moving object corresponding to each moment, the position of each first moving object in the target scene image, each moment at which each second moving object appears in the original videos, the image containing the second moving object corresponding to each moment, the position of each second moving object in the target scene image, and a preset display rule, so as to generate a summary video corresponding to the at least two original videos;
wherein the identification module comprises:
a calculation submodule, configured to obtain, for any two of the original videos, an image containing a first moving object from each of the two original videos, and calculate the matching degree of the two images;
a first determining submodule, configured to determine that the two first moving objects are the same second moving object when the matching degree is greater than a first preset threshold;
a second determining submodule, configured to determine, according to the moments at which the two first moving objects appear in the two original videos and the positions corresponding to those moments, whether the two first moving objects meet a preset condition, and if so, to trigger the first determining submodule.
7. The apparatus of claim 6, wherein the merging module comprises:
a stitching submodule, configured to stitch the captured scene images of any two original videos when the geographical positions of the capture scenes of the two original videos partially overlap;
a joining submodule, configured to join the captured scene images of any two original videos when the geographical positions of the capture scenes of the two original videos do not overlap.
8. The apparatus of claim 6, wherein the preset display rule comprises displaying a predetermined number of first moving objects and/or second moving objects in the target scene image.
9. The apparatus of claim 6, wherein the preset display rule comprises at least one of: the moment at which a first moving object and/or a second moving object appears in the original video being within a preset time range, the position at which a first moving object and/or a second moving object appears in the original video being within a preset position range, and the matching degree between the image containing a first moving object or a second moving object and a preset reference image being greater than a second preset threshold.
10. The apparatus according to any one of claims 6-9, further comprising:
a processing module, configured to acquire a three-dimensional scene model corresponding to the target scene image and overlay the summary video onto the three-dimensional scene model.
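Tying the modules of claims 6 through 10 together, the following is a hedged end-to-end sketch of the generation step itself. The rendering loop, the position_at and image_at accessors (assumed to interpolate the stored track data and return integer target-image coordinates), and the paste-by-crop policy are all assumptions standing in for the claimed "sequentially displaying" under a preset display rule:

import cv2

def generate_summary_video(target_scene, tracks, out_path, fps=25.0):
    # Render the selected moving-object tracks onto the target scene image in
    # time order and write the frames out as the summary video.
    h, w = target_scene.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    duration = max(t.last_seen for t in tracks)
    t_sec = 0.0
    while t_sec <= duration:
        frame = target_scene.copy()
        for trk in tracks:
            if trk.first_seen <= t_sec <= trk.last_seen:
                x, y = trk.position_at(t_sec)   # hypothetical: position in the target image
                crop = trk.image_at(t_sec)      # hypothetical: object crop for this moment
                ch = min(crop.shape[0], h - y)  # clip the crop to the frame bounds
                cw = min(crop.shape[1], w - x)
                if ch > 0 and cw > 0:
                    frame[y:y + ch, x:x + cw] = crop[:ch, :cw]  # naive paste, no blending
        writer.write(frame)
        t_sec += 1.0 / fps
    writer.release()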
CN201610409337.7A 2016-06-12 2016-06-12 Abstract video generation method and device Active CN107493441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610409337.7A CN107493441B (en) 2016-06-12 2016-06-12 Abstract video generation method and device

Publications (2)

Publication Number Publication Date
CN107493441A CN107493441A (en) 2017-12-19
CN107493441B true CN107493441B (en) 2020-03-06

Family

ID=60642731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610409337.7A Active CN107493441B (en) 2016-06-12 2016-06-12 Abstract video generation method and device

Country Status (1)

Country Link
CN (1) CN107493441B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101262568A (en) * 2008-04-21 2008-09-10 中国科学院计算技术研究所 A method and system for generating video outline
CN101621634A (en) * 2009-07-24 2010-01-06 北京工业大学 Method for splicing large-scale video with separated dynamic foreground
CN101751677A (en) * 2008-12-17 2010-06-23 中国科学院自动化研究所 Target continuous tracking method based on multi-camera
CN102256065A (en) * 2011-07-25 2011-11-23 中国科学院自动化研究所 Automatic video condensing method based on video monitoring network
CN102495907A (en) * 2011-12-23 2012-06-13 香港应用科技研究院有限公司 Video summary with depth information
CN102984601A (en) * 2012-12-11 2013-03-20 常州环视高科电子科技有限公司 Generation system for video abstract of camera
US8872979B2 (en) * 2002-05-21 2014-10-28 Avaya Inc. Combined-media scene tracking for audio-video summarization
CN104199841A (en) * 2014-08-06 2014-12-10 武汉图歌信息技术有限责任公司 Video editing method for generating animation through pictures and splicing and composing animation and video clips
CN104581437A (en) * 2014-12-26 2015-04-29 中通服公众信息产业股份有限公司 Video abstract generation and video backtracking method and system
CN104702917A (en) * 2015-03-25 2015-06-10 成都市灵奇空间软件有限公司 Video concentrating method based on micro map
CN105100688A (en) * 2014-05-12 2015-11-25 索尼公司 Image processing method, image processing device and monitoring system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2929456A4 (en) * 2012-12-05 2016-10-12 Vyclone Inc Method and apparatus for automatic editing

Also Published As

Publication number Publication date
CN107493441A (en) 2017-12-19

Similar Documents

Publication Publication Date Title
US9208226B2 (en) Apparatus and method for generating evidence video
US9934453B2 (en) Multi-source multi-modal activity recognition in aerial video surveillance
CN102547141B (en) Method and device for screening video data based on sports event video
Yang et al. Lecture video indexing and analysis using video OCR technology
CN108012202B (en) Video concentration method, device, computer readable storage medium and computer device
JPWO2016162963A1 (en) Image search apparatus, system and method
TWI601425B (en) A method for tracing an object by linking video sequences
CN106327531A (en) Panorama video identification method and device, and video playing method and device
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
CN108563651B (en) Multi-video target searching method, device and equipment
CN103544467A (en) Method and device for detecting and recognizing station captions
KR101645959B1 (en) The Apparatus and Method for Tracking Objects Based on Multiple Overhead Cameras and a Site Map
CN107493441B (en) Abstract video generation method and device
US10958854B2 (en) Computer-implemented method for generating an output video from multiple video sources
JP4728795B2 (en) Person object determination apparatus and person object determination program
CN103903269A (en) Structural description method and system of dome camera monitor video
CN110458895B (en) Image coordinate system conversion method, device, equipment and storage medium
CN111177449A (en) Multi-dimensional information integration method based on picture and related equipment
CN102984601A (en) Generation system for video abstract of camera
TWI616763B (en) Method for video indexing and device using the same
CN110781797B (en) Labeling method and device and electronic equipment
Martín et al. Automatic Players Detection and Tracking in Multi-camera Tennis Videos
Solus et al. Inventory system of vertical traffic signs
Salehin et al. An efficient method for video summarization using moving object information
ES2684690B1 (en) Method for capturing images from a portable device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant