CN114710713B - Automatic video abstract generation method based on deep learning - Google Patents

Automatic video abstract generation method based on deep learning

Info

Publication number
CN114710713B
Authority
CN
China
Prior art keywords
video
sub
occasion
environment
videos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210337196.8A
Other languages
Chinese (zh)
Other versions
CN114710713A (en)
Inventor
兰雨晴
唐霆岳
余丹
邢智涣
王丹星
黄永琢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Standard Intelligent Security Technology Co Ltd
Original Assignee
China Standard Intelligent Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Standard Intelligent Security Technology Co Ltd
Priority to CN202210337196.8A
Publication of CN114710713A
Application granted
Publication of CN114710713B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/665 Control of cameras or camera modules involving internal camera communication with the image sensor, e.g. synchronising or multiplexing SSIS control signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90 Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention provides an automatic video abstract generation method based on deep learning. Areas in different azimuths of the same environment occasion are shot synchronously to obtain a plurality of environment occasion sub-videos; the environment occasion sub-videos are identified to obtain semantic tags of the different objects appearing in them, and video content abstracts are then formed in preset pictures of the environment occasion sub-videos; finally, all the environment occasion sub-videos are picture-spliced according to the shooting azimuth of each environment occasion sub-video to obtain the corresponding environment panorama occasion video. The method can synchronously identify and analyze the environment occasion sub-videos shot by different cameras, calibrate the objects in them and generate matched video content abstracts, so that the videos are screened and identified comprehensively and accurately and the degree of automation and intelligence of video identification processing is improved.

Description

Automatic video abstract generation method based on deep learning
Technical Field
The invention relates to the technical field of video data processing, in particular to an automatic video abstract generating method based on deep learning.
Background
At present, camera monitoring devices are usually arranged in public places to collect real-time images of the place, and the collected monitoring images are identified and analyzed so that abnormal personnel or situations in the monitoring images can be screened out. In the prior art the monitoring images are basically screened and identified manually. This approach relies on a large number of personnel to screen and identify the monitoring images frame by frame, and it cannot summarize and integrate the identification results of the monitoring images, so the monitoring images cannot be screened and identified comprehensively and accurately, nor can they be processed in depth, which lowers the degree of automation and intelligence of monitoring image identification.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an automatic video abstract generation method based on deep learning. Areas in different azimuths of the same environment occasion are shot synchronously to obtain a plurality of environment occasion sub-videos; the environment occasion sub-videos are identified to obtain semantic tags of the different objects appearing in them, and video content abstracts are then formed in preset pictures of the environment occasion sub-videos; finally, all the environment occasion sub-videos are picture-spliced according to the shooting azimuth of each environment occasion sub-video to obtain the corresponding environment panorama occasion video. In this way the environment occasion sub-videos shot by different cameras can be synchronously identified and analyzed, the objects in them can be calibrated, and matched video content abstracts can be generated, so that the videos are screened and identified comprehensively and accurately and the degree of automation and intelligence of video identification processing is improved.
The invention provides an automatic video abstract generating method based on deep learning, which comprises the following steps:
step S1, synchronously shooting areas with different orientations in the same environment occasion through a plurality of cameras respectively, so as to acquire a plurality of environment occasion sub-videos; according to the shooting directions of the environment occasion sub-videos, all the environment occasion sub-videos are stored in a block chain in a grouping mode;
step S2, extracting corresponding environment occasion sub-videos from the block chain according to a video acquisition request from a video processing terminal, and transmitting the environment occasion sub-videos to the video processing terminal; identifying the environment occasion sub-video, thereby obtaining semantic tags of different objects appearing on the environment occasion sub-video;
step S3, forming a video content abstract in a preset picture of the environment scene sub-video according to the semantic tag; then carrying out data compression processing on the environment occasion sub-video;
and S4, performing picture stitching on all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video, so as to obtain corresponding environment panoramic occasion videos.
Further, in the step S1, the plurality of cameras are used to synchronously shoot the different azimuth areas of the same environmental occasion, so as to acquire a plurality of environmental occasion sub-videos specifically includes:
the method comprises the steps of respectively aligning the shooting directions of a plurality of cameras to different azimuth areas of the same environment occasion along the circumferential direction, and simultaneously adjusting the shooting view angle of each camera, so that the whole shooting view angle of all cameras can completely cover the whole circumferential azimuth area of the environment occasion;
and then, indicating all cameras to synchronously shoot at the same focal length, so as to acquire a plurality of environment occasion sub-videos.
Further, in the step S1, storing all the environmental case sub-videos into the blockchain according to the shooting directions of the environmental case sub-videos specifically includes:
acquiring shooting azimuth information of each camera, and adding the shooting azimuth information serving as video index information into corresponding environment occasion sub-videos; and storing all the environment case sub-video groups into the blockchain.
Further, in the step S2, according to a video acquisition request from a video processing terminal, extracting a corresponding environment occasion sub-video from the blockchain, and transmitting the environment occasion sub-video to the video processing terminal specifically includes:
extracting corresponding video shooting time range conditions from a video acquisition request from a video processing terminal, and extracting environment occasion sub-videos matched with the video shooting time range from the blockchain; and synchronously transmitting all the extracted environment occasion sub-videos to the video processing terminal.
Further, in the step S2, the identifying process is performed on the environmental occasion sub-video, so as to obtain semantic tags of different objects appearing in the environmental occasion sub-video, which specifically includes:
decomposing the environment occasion sub-video into a plurality of environment occasion picture frames according to the video stream time axis sequence of the environment occasion sub-video;
carrying out identification processing on each environmental occasion picture frame so as to obtain identity attribute information and action attribute information of different objects initially selected by the environmental occasion picture frames;
and generating an identity attribute semantic tag and an action attribute semantic tag related to the object according to the identity attribute information and the action attribute information.
Further, in the step S3, forming a video content abstract in a preset picture of the environmental case sub-video according to the semantic tag specifically includes:
generating a text abstract about the identity state and the action state of the object according to the identity attribute semantic tag and the action attribute semantic tag;
selecting a predetermined abstract addition picture area in an environmental occasion picture frame in which the object appears, wherein the abstract addition picture area is not overlapped with an appearance picture area of the object in the environmental occasion picture frame;
and adding the text abstract into the abstract adding picture area, and then displaying the text abstract with an adaptively enlarged font.
Further, in the step S3, the data compression processing for the environmental case sub-video specifically includes:
and sequentially recombining all the environmental occasion picture frames according to the video stream time axis sequence of the environmental occasion sub-videos to obtain the environmental occasion sub-videos, and then carrying out fidelity compression processing on the environmental occasion sub-videos.
Further, in the step S3, performing a fidelity compression process on the environmental scene sub-video specifically includes:
step S301, using the following formula (1), selecting the fidelity compressed pixel value of the video according to the environment occasion sub-video,
in the above formula (1), l represents a fidelity compressed pixel value of the environmental scene sub-video; l (L) a (i, j) represents the pixel value of the ith row and jth column pixel point of the ith frame image of the sub-video of the environment occasion; m represents the number of pixel points in each row of each frame image of the environment scene sub-video; n represents the number of pixel points in each column of each frame of image of the environment scene sub-video;the value of i is taken from 1 to n, and the value of j is taken from 1 to m to obtain the minimum value in brackets; g represents the total frame number of the environmental scene sub-video; />Indicating that the value of a is taken from 1 to G to obtain the minimum value in brackets; the method comprises the steps of carrying out a first treatment on the surface of the
Step S302, performing the fidelity compression processing on the environment occasion sub-video according to the fidelity compression pixel value by using the following formula (2),
in the above-mentioned formula (2),pixel data (data is in a pixel matrix form) representing an a-frame image after performing fidelity compression on the environment scene sub-video; />Substituting the value of i from 1 to n, substituting the value of j from 1 to m into brackets for all calculation;
step S303, judging whether the compression is effective compression according to the compressed environment case sub-video data by using the following formula (3), and controlling whether the compressed data needs to be restored,
in the above formula (3), Y represents a restoration control value of data; h () represents the data amount of the data in the brackets;
if y=1, it indicates that the environment scene sub-video after performing the fidelity compression needs to be restored;
if y=0, it means that the environment scene sub-video after being subjected to the fidelity compression does not need to be restored.
Further, in the step S4, according to the shooting azimuth of each environmental occasion sub-video, performing picture stitching on all the environmental occasion sub-videos, so as to obtain corresponding environmental panoramic occasion videos specifically including:
and according to the shooting azimuth of each environment occasion sub-video and the shooting time axis of each environment occasion sub-video, performing picture seamless splicing on all the environment occasion sub-videos, thereby obtaining the corresponding environment panoramic occasion video.
Compared with the prior art, the automatic video abstract generation method based on deep learning synchronously shoots areas in different azimuths of the same environment occasion to obtain a plurality of environment occasion sub-videos; identifies the environment occasion sub-videos to obtain semantic tags of the different objects appearing in them, and then forms video content abstracts in preset pictures of the environment occasion sub-videos; and finally picture-splices all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video to obtain the corresponding environment panorama occasion video. The method can therefore synchronously identify and analyze the environment occasion sub-videos shot by different cameras, calibrate the objects in them and generate matched video content abstracts, so that the videos are screened and identified comprehensively and accurately and the degree of automation and intelligence of video identification processing is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of the automatic video abstract generation method based on deep learning provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flow chart of an automatic video abstract generating method based on deep learning according to an embodiment of the invention is shown. The automatic video abstract generation method based on deep learning comprises the following steps:
step S1, synchronously shooting areas with different orientations in the same environment occasion through a plurality of cameras respectively, so as to acquire a plurality of environment occasion sub-videos; according to the shooting directions of the environment occasion sub-videos, all the environment occasion sub-videos are stored in a block chain in a grouping mode;
step S2, extracting corresponding environment occasion sub-videos from the block chain according to a video acquisition request from a video processing terminal, and transmitting the sub-videos to the video processing terminal; then, the sub-videos of the environment occasion are identified, so that semantic tags of different objects appearing on the sub-videos of the environment occasion are obtained;
step S3, forming a video content abstract in a preset picture of the environment scene sub-video according to the semantic tag; then, carrying out data compression processing on the environment occasion sub-video;
and S4, performing picture stitching on all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video, so as to obtain corresponding environment panoramic occasion videos.
The beneficial effects of the above technical scheme are as follows: the automatic video abstract generation method based on deep learning synchronously shoots areas in different azimuths of the same environment occasion to obtain a plurality of environment occasion sub-videos; identifies the environment occasion sub-videos to obtain semantic tags of the different objects appearing in them, and then forms video content abstracts in preset pictures of the environment occasion sub-videos; and finally picture-splices all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video to obtain the corresponding environment panorama occasion video. The environment occasion sub-videos shot by different cameras can thus be synchronously identified and analyzed, the objects in them can be calibrated, and matched video content abstracts can be generated, so that the videos are screened and identified comprehensively and accurately and the degree of automation and intelligence of video identification processing is improved.
Preferably, in the step S1, different azimuth areas of the same environmental occasion are synchronously shot by a plurality of cameras, so as to acquire a plurality of environmental occasion sub-videos specifically including:
the shooting directions of a plurality of cameras are respectively aligned to different azimuth areas of the same environment occasion along the circumferential direction, and meanwhile, the shooting view angle of each camera is adjusted, so that the whole shooting view angle of all cameras can completely cover the whole circumferential azimuth area of the environment occasion;
and then, indicating all cameras to synchronously shoot at the same focal length, so as to acquire a plurality of environment occasion sub-videos.
The beneficial effects of the technical scheme are as follows: the cameras are arranged to be respectively aligned with different azimuth areas of the same environment occasion along the circumferential direction, so that each camera can independently shoot videos of the corresponding azimuth area, panoramic shooting without dead angles is carried out on the environment occasion, and video shooting instantaneity of the environment occasion is improved. In addition, all cameras are instructed to shoot synchronously with the same focal length, so that the environment occasion sub-videos shot by each camera can be guaranteed to have the same focal depth range, and the follow-up rapid splicing and integration of the sub-videos in different environment occasions can be facilitated.
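As an illustration only (the patent itself gives no code), the following Python sketch checks the coverage condition described above under the assumption that the cameras are evenly spaced along the circumference; the 5-degree overlap margin and the function name are hypothetical.

# Illustrative sketch (not part of the patent): coverage check for N cameras
# evenly spaced along the circumference of the environment occasion.
def covers_full_circle(num_cameras: int, horizontal_fov_deg: float,
                       overlap_deg: float = 5.0) -> bool:
    """True when the combined horizontal field of view of all cameras covers the
    whole 360-degree circumferential area, with `overlap_deg` of overlap between
    neighbouring cameras reserved for the later picture splicing (assumed margin)."""
    required_per_camera = 360.0 / num_cameras + overlap_deg
    return horizontal_fov_deg >= required_per_camera

# Example: six cameras with 70-degree lenses leave no blind angle; 55-degree lenses do not.
print(covers_full_circle(6, 70.0))  # True
print(covers_full_circle(6, 55.0))  # False: widen the shooting view angle or add cameras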
Preferably, in the step S1, storing all the environmental case sub-videos into the blockchain according to the shooting directions of the environmental case sub-videos specifically includes:
acquiring shooting azimuth information of each camera, and adding the shooting azimuth information serving as video index information into corresponding environment occasion sub-videos; and storing all the environment case sub-video groups into the blockchain.
The beneficial effects of the technical scheme are as follows: the shooting directions of video shooting by different cameras are different, and the shooting direction information of each camera is used as video index information to be added into the corresponding environment occasion sub-video, so that the required environment occasion sub-video can be quickly and accurately found in the blockchain.
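By way of illustration, a minimal Python sketch of this grouped storage is given below; the patent does not specify a blockchain interface, so a plain in-memory dictionary stands in for the chain, and the record fields are assumptions introduced here.

# Illustrative sketch: shooting azimuth attached to each sub-video as index
# metadata, with sub-videos stored in groups keyed by that azimuth.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SubVideo:
    camera_id: str
    azimuth_deg: float    # shooting azimuth information, used as the video index
    start_time: float     # capture start time (seconds)
    end_time: float       # capture end time (seconds)
    payload: bytes = b""  # encoded video data

class AzimuthGroupedStore:
    """Stand-in for the blockchain: sub-videos grouped by shooting azimuth."""
    def __init__(self) -> None:
        self._groups: Dict[float, List[SubVideo]] = {}

    def put(self, video: SubVideo) -> None:
        self._groups.setdefault(video.azimuth_deg, []).append(video)

    def get_group(self, azimuth_deg: float) -> List[SubVideo]:
        return self._groups.get(azimuth_deg, [])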
Preferably, in the step S2, according to a video acquisition request from a video processing terminal, extracting a corresponding environment occasion sub-video from the blockchain, and transmitting the extracted environment occasion sub-video to the video processing terminal specifically includes:
extracting corresponding video shooting time range conditions from a video acquisition request from a video processing terminal, and extracting an environment scene sub-video matched with the video shooting time range from the blockchain; and synchronously transmitting all the extracted environment occasion sub-videos to the video processing terminal.
The beneficial effects of the technical scheme are as follows: in practical applications, the video processing terminal may be, but is not limited to, a computer having an image processing function. The video processing terminal sends a video acquisition request to the block chain, and then the block chain obtains an environment occasion sub-video shot in a corresponding time range according to the video shooting time range condition in the video acquisition request, so that the environment occasion sub-video can be conveniently identified and processed in a time-division mode.
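A minimal sketch of this time-range retrieval is shown below; the record layout and function name are assumptions, and a plain list again stands in for the blockchain.

# Illustrative sketch: keep only the sub-videos whose capture interval overlaps
# the shooting-time range carried by the video acquisition request.
from typing import List, Tuple

# (camera_id, azimuth_deg, start_s, end_s) records standing in for chain entries
Record = Tuple[str, float, float, float]

def select_by_time_range(records: List[Record],
                         req_start: float, req_end: float) -> List[Record]:
    """Return every record whose [start_s, end_s] interval overlaps the request."""
    return [r for r in records if r[2] <= req_end and r[3] >= req_start]

records = [("cam0", 0.0, 100.0, 160.0),
           ("cam1", 60.0, 100.0, 160.0),
           ("cam2", 120.0, 200.0, 260.0)]
print(select_by_time_range(records, 150.0, 210.0))  # all three overlap the range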
Preferably, in the step S2, the identifying process is performed on the environmental occasion sub-video, so as to obtain semantic tags about different objects appearing in the environmental occasion sub-video specifically includes:
decomposing the environment occasion sub-video into a plurality of environment occasion picture frames according to the video stream time axis sequence of the environment occasion sub-video;
carrying out identification processing on each environmental occasion picture frame so as to obtain identity attribute information and action attribute information of different objects initially selected by the environmental occasion picture frame;
based on the identity attribute information and the action attribute information, an identity attribute semantic tag and an action attribute semantic tag are generated for the object.
The beneficial effects of the technical scheme are as follows: according to the video stream time axis sequence of the environment occasion sub-video, the environment occasion sub-video is decomposed into a plurality of environment occasion picture frames, so that the environment occasion sub-video can be subjected to refinement identification processing. Specifically, face recognition and limb motion recognition are performed on the person object existing in each environmental picture frame, so that identity attribute information and motion attribute information of the person object are obtained. And then generating a semantic text label according to the identity attribute information and the action attribute information, so that the real-time dynamic condition of the character object in the environment can be characterized in a text mode.
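The sketch below illustrates this frame-by-frame tagging flow in Python with OpenCV; the patent does not name a specific deep-learning model, so the recogniser here is a placeholder that only shows the assumed input and output shapes.

# Illustrative sketch: decompose a sub-video into picture frames along its
# timeline and turn per-frame recognition results into semantic tags.
import cv2

def decompose(video_path: str):
    """Yield (frame_index, frame) pairs in video-stream timeline order."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield idx, frame
        idx += 1
    cap.release()

def recognise(frame):
    """Placeholder for the face / limb-action recognisers (hypothetical output)."""
    return [{"identity": "unknown_person", "action": "walking"}]

def semantic_tags(video_path: str):
    tags = []
    for idx, frame in decompose(video_path):
        for obj in recognise(frame):
            tags.append({"frame": idx,
                         "identity_tag": "identity:" + obj["identity"],
                         "action_tag": "action:" + obj["action"]})
    return tags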
Preferably, in the step S3, forming the video content abstract in the preset frame of the environmental case sub-video according to the semantic tag specifically includes:
generating a text abstract about the identity state and the action state of the object according to the identity attribute semantic tag and the action attribute semantic tag;
selecting a predetermined abstract addition picture area in an environmental occasion picture frame in which the object appears, wherein the abstract addition picture area is not overlapped with an appearance picture area of the object in the environmental occasion picture frame;
and adding the text abstract into the abstract adding picture area, and then displaying the text abstract with an adaptively enlarged font.
The beneficial effects of the technical scheme are as follows: and carrying out adaptive text combination on the identity attribute semantic tag and the action attribute semantic tag to obtain a text abstract of the identity state and the action state of the object, so that the real-time dynamic situation of the character object can be accurately and timely known by reading the text abstract. And then selecting a preset abstract adding picture area in the picture frame of the environmental occasion where the character object appears, and adding the character abstract into the abstract adding picture area, so that the real-time dynamic situation of the character object can be obtained simultaneously in the process of watching the environmental occasion sub-video, and the visual viewability of the environmental occasion sub-video is improved.
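As an illustration of the overlay step, the sketch below places the text abstract in a picture region that does not overlap the object's bounding box and draws it with an enlarged font; trying the four frame corners is an assumption made here, since the patent only requires a preset non-overlapping region.

# Illustrative sketch: write the text abstract into an abstract-adding picture
# area that does not overlap the object's appearance area, with an enlarged font.
import cv2

def overlaps(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def add_text_abstract(frame, object_box, text, scale=1.2):
    h, w = frame.shape[:2]
    (tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, scale, 2)
    # Candidate regions: the four corners of the picture frame (assumed strategy).
    candidates = [(10, 10), (w - tw - 10, 10),
                  (10, h - th - 10), (w - tw - 10, h - th - 10)]
    for x, y in candidates:
        region = (x, y, x + tw, y + th)
        if not overlaps(region, object_box):
            cv2.putText(frame, text, (x, y + th), cv2.FONT_HERSHEY_SIMPLEX,
                        scale, (255, 255, 255), 2)
            return frame
    return frame  # no clear corner found; leave the frame unchanged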
Preferably, in the step S3, the data compression processing for the environmental case sub-video specifically includes:
and sequentially recombining all the environmental occasion picture frames according to the video stream time axis sequence of the environmental occasion sub-videos to obtain the environmental occasion sub-videos, and then carrying out fidelity compression processing on the environmental occasion sub-videos.
The beneficial effects of the technical scheme are as follows: and sequentially recombining all the picture frames of the environment occasion according to the video stream time axis sequence of the environment occasion sub-video to obtain the environment occasion sub-video, so that each picture frame contained in the environment occasion sub-video can display the real-time dynamic condition of the character object.
Preferably, in the step S3, the performing a fidelity compression process on the environmental scene sub-video specifically includes:
step S301, using the following formula (1), selecting the fidelity compressed pixel value of the video according to the environment scene sub-video,
in the above formula (1), l represents a fidelity compressed pixel value of the environmental scene sub-video; l (L) a (i, j) representing pixel values of pixel points of an ith row and a jth column of an ith frame image of the environmental scene sub-video; m represents the number of pixel points in each row of each frame image of the environment scene sub-video; n represents the number of pixel points in each column of each frame of image of the environment scene sub-video;the value of i is taken from 1 to n, and the value of j is taken from 1 to m to obtain the minimum value in brackets; g represents the total frame number of the environmental scene sub-video; />Indicating that the value of a is taken from 1 to G to obtain the minimum value in brackets; the method comprises the steps of carrying out a first treatment on the surface of the
Step S302, performing the fidelity compression processing on the environment scene sub-video according to the fidelity compressed pixel value by using the following formula (2),
in the above-mentioned formula (2),pixel data (data is in a pixel matrix form) representing an a-frame image after performing fidelity compression on the environment scene sub-video; />Substituting the value of i from 1 to n, substituting the value of j from 1 to m into brackets for all calculation;
step S303, judging whether the compression is effective compression according to the compressed environment case sub-video data by using the following formula (3), and controlling whether the compressed data needs to be restored,
in the above formula (3), Y represents a restoration control value of data; h () represents the data amount of the data in the brackets;
if y=1, it indicates that the environment scene sub-video after performing the fidelity compression needs to be restored;
if y=0, it means that the environment scene sub-video after being subjected to the fidelity compression does not need to be restored.
The beneficial effects of the above technical scheme are as follows: formula (1) screens out the fidelity compressed pixel value of the video from the environment occasion sub-video, and this value is stored together with the video after the fidelity compression so that subsequent decompression is convenient; formula (2) then performs the fidelity compression processing on the environment occasion sub-video according to the fidelity compressed pixel value, so that the compression is carried out quickly and efficiently and the operating efficiency of the system is improved; finally, formula (3) judges from the compressed environment occasion sub-video data whether the compression is effective and controls whether the compressed data needs to be restored, so that ineffective compression can be restored and the reliability of video compression is ensured.
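A minimal Python sketch of this fidelity compression, following formulas (1) to (3) as reconstructed above, is given below; the patent does not say how the data amount H(.) is measured, so a generic lossless coder (zlib) stands in for it here as an assumption.

# Illustrative sketch: formula (1) finds the global minimum pixel value l,
# formula (2) shifts every pixel down by l (l is kept for exact restoration),
# and formula (3) flags the compression as ineffective when the data amount
# does not shrink.
import zlib
import numpy as np

def fidelity_compress(frames):
    """frames: list of uint8 H x W arrays, one per frame of the sub-video."""
    stack = np.stack(frames)
    l = int(stack.min())                    # formula (1): fidelity compressed pixel value
    shifted = (stack - l).astype(np.uint8)  # formula (2): shift every pixel by l
    return shifted, l

def data_amount(arr) -> int:
    """Stand-in for H(.): byte size after a generic lossless coder (assumption)."""
    return len(zlib.compress(arr.tobytes()))

def needs_restore(original_frames, shifted) -> int:
    """Formula (3): Y = 1 (restore) when compression did not reduce the data amount."""
    return 1 if data_amount(shifted) >= data_amount(np.stack(original_frames)) else 0

def restore(shifted, l):
    return shifted + np.uint8(l)            # exact restoration using the stored l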
Preferably, in the step S4, according to the shooting azimuth of each environmental occasion sub-video, performing picture stitching on all the environmental occasion sub-videos, so as to obtain corresponding environmental panoramic occasion videos specifically including:
and according to the shooting azimuth of each environment occasion sub-video and the shooting time axis of each environment occasion sub-video, performing picture seamless splicing on all the environment occasion sub-videos, thereby obtaining the corresponding environment panoramic occasion video.
The beneficial effects of the technical scheme are as follows: and according to the shooting azimuth of each environmental occasion sub-video and the shooting time axis of each environmental occasion sub-video, performing picture seamless splicing on all the environmental occasion sub-videos, so that the obtained environmental panorama occasion video can comprehensively and truly reflect the real-time dynamic situation of the character object in the environmental occasion global.
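The sketch below illustrates the splicing step for one instant on the shooting time axis: the sub-frames are ordered by shooting azimuth and joined side by side. Real seamless splicing would also blend the overlapping regions, which is omitted in this simplified example.

# Illustrative sketch: join the sub-frames captured at the same instant into one
# panoramic frame, ordered by shooting azimuth.
import numpy as np

def stitch_panorama_frame(frames_by_azimuth):
    """frames_by_azimuth: list of (azimuth_deg, H x W x 3 uint8 array) pairs."""
    ordered = [frame for _, frame in sorted(frames_by_azimuth, key=lambda p: p[0])]
    return np.hstack(ordered)  # simple side-by-side join in azimuth order

# Example with three synthetic 4 x 6 sub-frames taken at the same instant.
subs = [(120.0, np.full((4, 6, 3), 2, np.uint8)),
        (0.0,   np.full((4, 6, 3), 0, np.uint8)),
        (60.0,  np.full((4, 6, 3), 1, np.uint8))]
print(stitch_panorama_frame(subs).shape)  # (4, 18, 3)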
As can be seen from the above embodiments, the automatic video abstract generation method based on deep learning synchronously shoots areas in different azimuths of the same environment occasion to obtain a plurality of environment occasion sub-videos; identifies the environment occasion sub-videos to obtain semantic tags of the different objects appearing in them, and then forms video content abstracts in preset pictures of the environment occasion sub-videos; and finally picture-splices all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video to obtain the corresponding environment panorama occasion video. The environment occasion sub-videos shot by different cameras can thus be synchronously identified and analyzed, the objects in them can be calibrated, and matched video content abstracts can be generated, so that the videos are screened and identified comprehensively and accurately and the degree of automation and intelligence of video identification processing is improved.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. The automatic video abstract generation method based on deep learning is characterized by comprising the following steps of:
step S1, synchronously shooting areas with different orientations in the same environment occasion through a plurality of cameras respectively, so as to acquire a plurality of environment occasion sub-videos; according to the shooting directions of the environment occasion sub-videos, all the environment occasion sub-videos are stored in a block chain in a grouping mode;
step S2, extracting corresponding environment occasion sub-videos from the block chain according to a video acquisition request from a video processing terminal, and transmitting the environment occasion sub-videos to the video processing terminal; identifying the environment occasion sub-video, thereby obtaining semantic tags of different objects appearing on the environment occasion sub-video;
step S3, forming a video content abstract in a preset picture of the environment scene sub-video according to the semantic tag; then carrying out data compression processing on the environment occasion sub-video;
and S4, performing picture stitching on all the environment occasion sub-videos according to the shooting azimuth of each environment occasion sub-video, so as to obtain corresponding environment panoramic occasion videos.
2. The automated video summary generation method based on deep learning of claim 1, wherein:
in the step S1, synchronous shooting is performed on different azimuth areas of the same environmental occasion through a plurality of cameras, so that a plurality of environmental occasion sub-videos are acquired specifically, the method comprises the following steps:
the method comprises the steps of respectively aligning the shooting directions of a plurality of cameras to different azimuth areas of the same environment occasion along the circumferential direction, and simultaneously adjusting the shooting view angle of each camera, so that the whole shooting view angle of all cameras can completely cover the whole circumferential azimuth area of the environment occasion;
and then, indicating all cameras to synchronously shoot at the same focal length, so as to acquire a plurality of environment occasion sub-videos.
3. The automated video summary generation method based on deep learning of claim 2, wherein:
in the step S1, storing all the environmental case sub-videos into a blockchain according to the shooting directions of the environmental case sub-videos specifically includes:
acquiring shooting azimuth information of each camera, and adding the shooting azimuth information serving as video index information into corresponding environment occasion sub-videos; and storing all the environment case sub-video groups into the blockchain.
4. The automated video summary generation method based on deep learning of claim 3, wherein:
in the step S2, according to a video acquisition request from a video processing terminal, extracting a corresponding environment occasion sub-video from the blockchain, and transmitting the sub-video to the video processing terminal specifically includes:
extracting corresponding video shooting time range conditions from a video acquisition request from a video processing terminal, and extracting environment occasion sub-videos matched with the video shooting time range from the blockchain; and synchronously transmitting all the extracted environment occasion sub-videos to the video processing terminal.
5. The automated video summary generation method based on deep learning of claim 4, wherein:
in the step S2, the identifying process is performed on the environmental occasion sub-video, so as to obtain semantic tags of different objects appearing in the environmental occasion sub-video, which specifically includes:
decomposing the environment occasion sub-video into a plurality of environment occasion picture frames according to the video stream time axis sequence of the environment occasion sub-video;
carrying out identification processing on each environmental occasion picture frame so as to obtain identity attribute information and action attribute information of different objects initially selected by the environmental occasion picture frames;
and generating an identity attribute semantic tag and an action attribute semantic tag related to the object according to the identity attribute information and the action attribute information.
6. The automated video summary generation method based on deep learning of claim 5, wherein:
in the step S3, forming a video content abstract in a preset picture of the environmental scene sub-video according to the semantic tag specifically includes:
generating a text abstract about the identity state and the action state of the object according to the identity attribute semantic tag and the action attribute semantic tag;
selecting a predetermined abstract addition picture area in an environmental occasion picture frame in which the object appears, wherein the abstract addition picture area is not overlapped with an appearance picture area of the object in the environmental occasion picture frame;
and adding the text abstract into the abstract adding picture area, and then carrying out font amplification display on the text abstract.
7. The automated video summary generation method based on deep learning of claim 6, wherein:
in the step S3, the data compression processing for the environmental case sub-video specifically includes:
and sequentially recombining all the environmental occasion picture frames according to the video stream time axis sequence of the environmental occasion sub-videos to obtain the environmental occasion sub-videos, and then carrying out fidelity compression processing on the environmental occasion sub-videos.
8. The automated video summary generation method based on deep learning of claim 7, wherein:
in the step S3, performing a fidelity compression process on the environmental scene sub-video specifically includes:
step S301, using the following formula (1), selecting the fidelity compressed pixel value of the video according to the environment occasion sub-video,
in the above formula (1), l represents a fidelity compressed image of the environmental scene sub-videoA prime value; l (L) a (i, j) represents the pixel value of the ith row and jth column pixel point of the ith frame image of the sub-video of the environment occasion; m represents the number of pixel points in each row of each frame image of the environment scene sub-video; n represents the number of pixel points in each column of each frame of image of the environment scene sub-video;the value of i is taken from 1 to n, and the value of j is taken from 1 to m to obtain the minimum value in brackets; g represents the total frame number of the environmental scene sub-video; />Indicating that the value of a is taken from 1 to G to obtain the minimum value in brackets; the method comprises the steps of carrying out a first treatment on the surface of the
Step S302, performing the fidelity compression processing on the environment occasion sub-video according to the fidelity compression pixel value by using the following formula (2),
in the above-mentioned formula (2),pixel data of an a-frame image after the environment scene sub-video is subjected to fidelity compression is represented, wherein the pixel data is in a pixel matrix form; />Substituting the value of i from 1 to n, substituting the value of j from 1 to m into brackets for all calculation;
step S303, judging whether the compression is effective compression according to the compressed environment case sub-video data by using the following formula (3), and controlling whether the compressed data needs to be restored,
in the above formula (3), Y represents a restoration control value of data; h () represents the data amount of the data in the brackets;
if y=1, it indicates that the environment scene sub-video after performing the fidelity compression needs to be restored;
if y=0, it means that the environment scene sub-video after being subjected to the fidelity compression does not need to be restored.
9. The automated video summary generation method based on deep learning of claim 7, wherein:
in the step S4, according to the shooting azimuth of each environmental occasion sub-video, performing picture stitching on all the environmental occasion sub-videos, so as to obtain corresponding environmental panoramic occasion videos specifically including:
and according to the shooting azimuth of each environment occasion sub-video and the shooting time axis of each environment occasion sub-video, performing picture seamless splicing on all the environment occasion sub-videos, thereby obtaining the corresponding environment panoramic occasion video.
CN202210337196.8A 2022-03-31 2022-03-31 Automatic video abstract generation method based on deep learning Active CN114710713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210337196.8A CN114710713B (en) 2022-03-31 2022-03-31 Automatic video abstract generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210337196.8A CN114710713B (en) 2022-03-31 2022-03-31 Automatic video abstract generation method based on deep learning

Publications (2)

Publication Number Publication Date
CN114710713A (en) 2022-07-05
CN114710713B (en) 2023-08-01

Family

ID=82170441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210337196.8A Active CN114710713B (en) 2022-03-31 2022-03-31 Automatic video abstract generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN114710713B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052841A (en) * 2020-10-12 2020-12-08 腾讯科技(深圳)有限公司 Video abstract generation method and related device
WO2022033252A1 (en) * 2020-08-14 2022-02-17 支付宝(杭州)信息技术有限公司 Video matching method and apparatus, and blockchain-based infringement evidence storage method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100688B (en) * 2014-05-12 2019-08-20 索尼公司 Image processing method, image processing apparatus and monitoring system
US10230866B1 (en) * 2015-09-30 2019-03-12 Amazon Technologies, Inc. Video ingestion and clip creation
WO2019185170A1 (en) * 2018-03-30 2019-10-03 Toyota Motor Europe Electronic device, robotic system and method for localizing a robotic system
CN113052753B (en) * 2019-12-26 2024-06-07 百度在线网络技术(北京)有限公司 Panoramic topological structure generation method, device and equipment and readable storage medium
CN112015231B (en) * 2020-07-31 2021-06-22 中标慧安信息技术股份有限公司 Method and system for processing surveillance video partition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022033252A1 (en) * 2020-08-14 2022-02-17 支付宝(杭州)信息技术有限公司 Video matching method and apparatus, and blockchain-based infringement evidence storage method and apparatus
CN112052841A (en) * 2020-10-12 2020-12-08 腾讯科技(深圳)有限公司 Video abstract generation method and related device

Also Published As

Publication number Publication date
CN114710713A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN108038422B (en) Camera device, face recognition method and computer-readable storage medium
US8248474B2 (en) Surveillance system and surveilling method
WO2021042682A1 (en) Method, apparatus and system for recognizing transformer substation foreign mattter, and electronic device and storage medium
EP3704864B1 (en) Methods and systems for generating video synopsis
KR101781358B1 (en) Personal Identification System And Method By Face Recognition In Digital Image
CN109981943A (en) Picture pick-up device, image processing equipment, control method and storage medium
EP4116462A3 (en) Method and apparatus of processing image, electronic device, storage medium and program product
CN112422909B (en) Video behavior analysis management system based on artificial intelligence
CN113228626A (en) Video monitoring system and method
CN115965889A (en) Video quality assessment data processing method, device and equipment
CN112949439A (en) Method and system for monitoring invasion of personnel in key area of oil tank truck
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN112470189B (en) Occlusion cancellation for light field systems
CN111669547B (en) Panoramic video structuring method
CN116962598B (en) Monitoring video information fusion method and system
CN114710713B (en) Automatic video abstract generation method based on deep learning
CN111160123B (en) Aircraft target identification method, device and storage medium
US11044399B2 (en) Video surveillance system
CN113014876A (en) Video monitoring method and device, electronic equipment and readable storage medium
CN112637564A (en) Indoor security method and system based on multi-picture monitoring
CN113243015B (en) Video monitoring system
CN112733809B (en) Intelligent image identification method and system for natural protection area monitoring system
CN112991175B (en) Panoramic picture generation method and device based on single PTZ camera
CN112102481A (en) Method and device for constructing interactive simulation scene, computer equipment and storage medium
CN112004054A (en) Multi-azimuth monitoring method, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant