CN110191352A - Comprehensive display system for intelligent processing of video content - Google Patents
Comprehensive display system for intelligent processing of video content
- Publication number
- CN110191352A CN110191352A CN201910456376.6A CN201910456376A CN110191352A CN 110191352 A CN110191352 A CN 110191352A CN 201910456376 A CN201910456376 A CN 201910456376A CN 110191352 A CN110191352 A CN 110191352A
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- module
- display module
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N21/2187—Live feed
- H04N21/23412—Generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
- H04N21/23418—Analysing video streams, e.g. detecting features or characteristics
- H04N21/234309—Transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4312—Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors
- H04N21/4363—Adapting the video stream to a specific local network, e.g. a Bluetooth® network
- H04N21/440218—Reformatting operations for household redistribution, storage or real-time display by transcoding between formats or standards
- H04N21/47202—End-user interface for requesting content on demand, e.g. video on demand
- H04N21/8543—Content authoring using a description language, e.g. XML
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a comprehensive display system for intelligent processing of video content, comprising: a synthesis display module, a feature display module, and a recognition result display module. The synthesis display module handles video playback, displays basic video information, and gives a simple presentation of the video content recognition results. The feature display module presents the transmitted audio and video features as visual images, positioned in real time as the video plays, so that researchers can track feature changes in the audio and video as they occur. The recognition result display module displays the details of the video content recognition results. Advantages of the invention: the workflow of intelligent video information processing is visualized, improving researchers' working efficiency; researchers are supported in saving various kinds of information, information exchange between different developers is facilitated, and unnecessary duplicated work is avoided.
Description
Technical field
The present invention relates to the technical field of multimedia intelligent information processing, and in particular to a comprehensive display system for intelligent processing of video content.
Background technique
In recent years, with the continuous improvement of computers and of embedded hardware and software, intelligent video surveillance systems for a variety of complex application scenarios have steadily entered the market, and video products with intelligent video processing capability are becoming the mainstream. Intelligent video analysis belongs to image/video processing and computer vision (CV) technology, within the field of artificial intelligence (AI) research; through digital image processing and video signal analysis, this technology extracts and understands the content of video pictures. Intelligent processing of video content is an important research topic in the multimedia field and is widely applied in areas such as public safety, the judicial domain, and traffic. Intelligent information processing of video content covers multiple sub-tasks, including video encoding and decoding, video scene cutting, violence scene detection, subtitle recognition, and audio-video separation.
Although the intelligent video analysis market is booming, laboratory researchers still face many problems. Because intelligent video analysis involves many sub-tasks, researchers often divide the work among themselves: each person studies only one or two problems, and the results of several people are finally integrated into a complete product. Efficiency is improved in this way, but the working arrangement also has drawbacks. (1) The sub-tasks of intelligent video analysis are often interrelated, so work is repeated and human resources are wasted. (2) Before the final product exists, the visibility of each person's work is poor, and it is difficult for the project lead to supervise everyone's work intuitively and effectively. (3) After the audio and video features are extracted, the features and the video are hard to unify in time, which troubles the research staff.
Summary of the invention
In view of the drawbacks of the prior art, the present invention provides a comprehensive display system for intelligent processing of video content, which effectively solves problems researchers encounter in the course of research, such as poor visualization of audio and video features and difficulty in turning research results into an actual product, thereby improving work efficiency. The invention can smoothly play the video formats required by the technical specifications, can cut the video to be processed into acts and scenes as required, and can separately extract the audio file in a video while providing a visualized audio waveform; its feature display program provides a clean UI that displays recognition results such as speech, emotion, scene, and subtitles in real time.
In order to achieve the above objects, the technical solution adopted by the present invention is as follows:
A comprehensive display system for intelligent processing of video content, comprising: a synthesis display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module;
The functions realized by each module are as follows:
(1) Synthesis display module: a. provides a video playback window capable of smoothly playing the current mainstream video formats without stuttering, with support for frame-by-frame viewing; b. shows the acts and scenes of the video and supports real-time jumping between scenes; c. shows the time-frequency information of the audio in the video, following the playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for showing recognition results such as speech, subtitles, events, and emotion, with the requirement that the associated frames can be located in real time according to a recognition result.
(2) Feature display module: a. shows the acts and scenes of the video and supports real-time jumping between scenes; b. shows the waveform information of the audio in the video, following the playback in real time; c. provides sub-interfaces for showing the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be added freely according to the parameters.
(3) Recognition result display module: a. shows recognition results such as speech, subtitles, events, and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. recognition result blocks can be saved to a local directory.
(4) Video decoding module: an internal module that parses video files; it can extract every frame in a video file and save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and transmits the data to different modules according to the parsing results.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific way, and sends the result to the feature display module.
Further, the synthesis display module, feature display module, and recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with one another by video playback time.
Further, the comprehensive display system adopts a feature display technology based on multi-granularity fusion: when a computer is used to process video data, multi-granularity feature extraction is also performed on the audio and video features.
Further, the comprehensive display system supports a user-defined organization of parameters, which must nevertheless satisfy the XML specification.
Further, the comprehensive display system provides extensible programming: the synthesis display module, feature display module, and recognition result display module each expose a programming interface for secondary development, allowing users to define the display format of features or change the temporal resolution of features; the configuration files of these interfaces likewise follow the XML specification.
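The patent transmits all parameters through XML configuration files but does not reproduce a concrete example. As an illustration only, a parameter file satisfying the XML requirement might look like the following; every element and attribute name here is hypothetical, not taken from the filing:

```xml
<display-config>
  <!-- scene partitioning parameters read by the XML parsing module -->
  <scene-partition threshold="0.35" min-scene-frames="24"/>
  <!-- features handed to the feature display module, one element per granularity -->
  <feature name="mfcc"   granularity="frame"   dim="13" file="features/mfcc.bin"/>
  <feature name="energy" granularity="segment" dim="1"  file="features/energy.bin"/>
  <!-- recognition results shown by the recognition result display module -->
  <recognition type="subtitle" file="results/subtitles.xml"/>
  <recognition type="emotion"  file="results/emotion.xml"/>
</display-config>
```

Any well-formed organization of these parameters would satisfy the stated requirement, since the system explicitly permits user-defined parameter layouts as long as they remain valid XML.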
The invention also discloses a working method of the comprehensive display system, comprising the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding. The digital video file is decoded while the audio stream is extracted from it, completing the audio-video separation; the decoded audio stream and video stream are finally output.
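Step 1's demultiplexing is the kind of operation usually delegated to a standard tool; the patent does not name one, so as a hedged sketch, the commands below use FFmpeg's stream-copy options to separate a container into an audio-only and a video-only stream without re-encoding (file names are illustrative):

```python
def build_demux_commands(video_path, audio_out, video_out):
    """Build FFmpeg command lines that separate a media file into an
    audio-only stream (-vn drops video) and a video-only stream
    (-an drops audio), copying the codec data without re-encoding."""
    audio_cmd = ["ffmpeg", "-i", video_path, "-vn", "-acodec", "copy", audio_out]
    video_cmd = ["ffmpeg", "-i", video_path, "-an", "-vcodec", "copy", video_out]
    return audio_cmd, video_cmd

a, v = build_demux_commands("input.mp4", "audio.aac", "video.mp4")
print(a)
print(v)
```

The resulting lists can be passed to `subprocess.run` on a machine with FFmpeg installed; stream copy keeps this step fast because no decoding beyond the container parsing is needed for the separation itself.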
Step 2: the video stream and audio stream extracted in step 1 are fed into the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a thumbnail of each scene's key frame; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
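A minimal sketch of the cutting in step 2, under the assumption that the XML parameters ultimately yield a list of scene boundary frames (the patent leaves the boundary-detection criterion to the external parameters):

```python
def partition_scenes(num_frames, boundaries):
    """Split frame indices [0, num_frames) into scenes at the given
    boundary frames, taking each scene's first frame as its key frame
    (the thumbnail candidate mentioned in step 2)."""
    cuts = [0] + sorted(boundaries) + [num_frames]
    scenes = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        if end > start:  # skip degenerate cuts
            scenes.append({"frames": range(start, end), "keyframe": start})
    return scenes

scenes = partition_scenes(100, [30, 70])
print([s["keyframe"] for s in scenes])  # [0, 30, 70]
```

In the real system each scene's frame range would then be written to a cache file on local disk so the display modules can seek into a scene without re-decoding the whole video.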
Step 3: the synthesis display module reads the cut video stream and audio stream and provides video playback; the control buttons of the playback bar support two playback modes, normal play and frame-by-frame play. The synthesis display module calls the XML parsing module to read external XML files, reading and displaying the tag files and recognition result files.
Step 4: the feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters are read, the system calls the feature fusion module to process and display the multi-granularity features.
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the synthesis display module, feature display module, and recognition result display module are kept synchronized with the video playback.
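Step 5's synchronization by playback time can be sketched as a shared clock that notifies registered displays; the class names below are illustrative, not from the patent, which only specifies that the three interfaces share the video playback time:

```python
class PlaybackClock:
    """Shared playback time source; the three display modules register
    as listeners and are all driven by the same timestamp (step 5)."""
    def __init__(self):
        self.listeners = []
        self.time = 0.0

    def register(self, listener):
        self.listeners.append(listener)

    def seek(self, t):
        self.time = t
        for listener in self.listeners:
            listener.on_time(t)  # every display repositions to the same time


class DisplayModule:
    """Stand-in for the synthesis / feature / recognition displays."""
    def __init__(self, name):
        self.name = name
        self.last = None

    def on_time(self, t):
        self.last = t  # e.g. reposition waveform, highlight current subtitle


clock = PlaybackClock()
modules = [DisplayModule(n) for n in ("synthesis", "feature", "recognition")]
for m in modules:
    clock.register(m)
clock.seek(12.5)
print([m.last for m in modules])  # [12.5, 12.5, 12.5]
```

Driving all three interfaces from one clock is what makes the recognition highlight, the feature cursor, and the playback position agree even across three physical displays.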
Further, the multi-granularity feature processing and display method is as follows:
Each granularity is first aligned on the time dimension: alignment factor matrices A_x, A_s, A_g are set for the frame feature, segment feature, and global feature matrices respectively. The alignment factor matrices take the following form.
Here I is an identity matrix, P is the dimension of the frame granularity feature, M is the dimension of the segment granularity feature, and P_g is the dimension of the global feature. By multiplying each alignment factor matrix with the feature matrix of its granularity, the aligned feature matrices of each granularity are obtained.
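The equation images from the original filing are not reproduced in this text. Under the stated definitions (I an identity block; P, M, P_g the frame, segment, and global feature dimensions; T the time granularity), one reconstruction consistent with the surrounding description, offered as an assumption rather than the patent's exact figure, is:

```latex
A_x = \begin{bmatrix} I_P \\ 0 \\ 0 \end{bmatrix},\quad
A_s = \begin{bmatrix} 0 \\ I_M \\ 0 \end{bmatrix},\quad
A_g = \begin{bmatrix} 0 \\ 0 \\ I_{P_g} \end{bmatrix},\qquad
F = A_x X_{P\times T} + A_s S_{M\times T} + A_g G_{P_g\times T}
```

With this choice each alignment factor embeds its granularity into a common (P + M + P_g)-row space, so the summation fusion is well defined and each column of F stacks a frame part, a segment part, and a global part, matching the three-part description of the fused columns below.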
After the feature matrices are aligned, the fusion of features is carried out by summation, so the fused feature matrix F can be calculated by the following formula:
Here A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices, X_{P×T}, S_{M×T}, G_{Pg×T} are the corresponding feature matrices, and T is the time granularity. Each column of F is thus a fused frame feature containing three parts: the smallest frame-granularity feature obtained by the initial calculation, the segment-granularity feature obtained by convolving the frame features with a Gaussian function, and the global feature of the whole sequential signal. After the fusion matrix is obtained, the feature matrix is mapped into an RGB image, which is sent back to the feature display module to complete the display of the multi-granularity features.
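A small numerical sketch of the fusion and the RGB mapping, under the assumption (an interpretation, since the patent's formula images are not reproduced here) that the alignment matrices stack the three granularities into a common row space before summation:

```python
import numpy as np

def fuse_features(X, S, G):
    """Fuse frame (P x T), segment (M x T) and global (Pg x T) feature
    matrices by alignment plus summation: F = A_x X + A_s S + A_g G,
    with block alignment matrices that stack the granularities."""
    P, T = X.shape
    M, Pg = S.shape[0], G.shape[0]
    A_x = np.vstack([np.eye(P), np.zeros((M + Pg, P))])
    A_s = np.vstack([np.zeros((P, M)), np.eye(M), np.zeros((Pg, M))])
    A_g = np.vstack([np.zeros((P + M, Pg)), np.eye(Pg)])
    return A_x @ X + A_s @ S + A_g @ G  # shape ((P+M+Pg) x T)

def to_rgb(F):
    """Min-max normalize the fused matrix and map it to an 8-bit image
    (grayscale replicated across three channels) for the feature display."""
    norm = (F - F.min()) / (F.max() - F.min() + 1e-12)
    gray = (norm * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

T = 8
F = fuse_features(np.random.rand(4, T), np.random.rand(2, T), np.random.rand(1, T))
print(F.shape, to_rgb(F).shape)  # (7, 8) (7, 8, 3)
```

With these block alignment factors the summation reduces to stacking, so every column of the fused matrix carries its frame, segment, and global parts in fixed row positions, which is what makes a column-per-timestep image a readable visualization.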
Compared with the prior art, the present invention has the following advantages:
(1) The workflow of intelligent video information processing is visualized, improving researchers' working efficiency;
(2) Researchers are supported in saving various kinds of information, facilitating information exchange between different developers and avoiding unnecessary duplicated work;
(3) A multi-granularity feature visualization method is provided, displaying features of multiple granularities simultaneously;
(4) Video playback, feature display, and result display are synchronized through time control, helping researchers find problems in time and providing clear logic;
(5) The invention provides a secondary-development programming interface, facilitating personalized adjustment by different research teams.
Brief description of the drawings
Fig. 1 is a block diagram of the overall system architecture;
Fig. 2 is a schematic diagram of multi-granularity feature fusion.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings and enumerated embodiments.
As shown in Fig. 1, a comprehensive display system for intelligent processing of video content comprises: a synthesis display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module.
The synthesis display module, feature display module, and recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with one another by video playback time.
The modules realize the following functions:
(1) Synthesis display module: a. provides a video playback window capable of smoothly playing the current mainstream video formats without stuttering, with support for frame-by-frame viewing; b. shows the acts and scenes of the video and supports real-time jumping between scenes; c. shows the time-frequency information of the audio in the video, following the playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for showing recognition results such as speech, subtitles, events, and emotion, with the requirement that the associated frames can be located in real time according to a recognition result.
(2) Feature display module: a. shows the acts and scenes of the video and supports real-time jumping between scenes; b. shows the waveform information of the audio in the video, following the playback in real time; c. provides sub-interfaces for showing the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be added freely according to the parameters.
(3) Recognition result display module: a. shows recognition results such as speech, subtitles, events, and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. recognition result blocks can be saved to a local directory.
(4) Video decoding module: an internal module that parses video files; it can extract every frame in a video file and save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and transmits the data to different modules according to the parsing results.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific way, and sends the result to the feature display module.
The present invention proposes a feature display technology based on multi-granularity fusion. Processing of temporal information is a multi-level, progressively deeper refinement. In human vision, the cells of the visual cortex pre-process the pixel blocks of an image, and the resulting preliminary small-granularity information is sent onward in the cerebral cortex for higher-level information extraction, forming the concept of each object in the picture. Likewise, when a computer is used to process video data, multi-granularity feature extraction should be performed on the audio and video features. The proposed feature display technology based on multiple granularities builds on this idea.
The information exchange between the present invention and external programs is mainly completed through configuration files. In the present invention, the parameters needed by functions such as scene partitioning, speech waveform, feature set, and recognition results are read uniformly from XML files. The invention supports a user-defined organization of parameters, which must nevertheless satisfy the XML specification.
The present invention provides extensible programming: the synthesis display module, feature display module, and recognition result display module each expose a programming interface for secondary development, allowing users to define the display format of features or change the temporal resolution of features; the configuration files of these interfaces likewise follow the XML specification.
The overall workflow of the invention is as follows:
After the video to be processed enters the system, it first passes through the video decoding module, which performs audio-video separation and video decoding. Since video is currently compressed with video compression coding to reduce storage occupancy, a video file must first be decoded before subsequent playback. In this step the digital video file is decoded, the audio stream is extracted from it at the same time, audio-video separation is completed, and the decoded audio stream and video stream are finally output.
The video stream and audio stream extracted in the previous step are fed into the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a thumbnail of each scene's key frame; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
The synthesis display module reads the cut video stream and audio stream and provides video playback; the control buttons of the playback bar support two playback modes, normal play and frame-by-frame play. The synthesis display module calls the XML parsing module to read external XML files, reading and displaying the tag files and recognition result files.
The feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters are read, the system calls the feature fusion module to process the multi-granularity features. The multi-granularity features and their display method are described in detail below.
The multi-granularity features are shown in Figure 2. Time-series signals are mostly nonlinear, non-stationary random signals and cannot be processed directly with digital signal processing techniques; the signal must therefore be split into windows, which gives rise to the concept of a frame. In this way a sequence of frame features can be obtained over a stretch of the signal, characterizing the signal at the finest granularity. In practice, however, since most classifiers cannot handle variable-length features, all frames obtained from a segment of the signal are analyzed statistically, and the resulting statistics are taken as the feature of that segment. Although the final feature in this procedure is computed from smaller frame units, it reflects the whole segment as a single entity: it cannot capture how the signal varies at different moments, or the different trends across different time intervals. Such a method therefore cannot extract deeper, higher-level information from the signal.
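The windowing idea described above can be sketched briefly: a non-stationary signal is sliced into short overlapping frames, and per-frame statistics give a fixed-length description of each frame. The frame length, hop size, and choice of statistics here are illustrative, not values from the patent.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames (discarding the tail)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]                      # shape: (n_frames, frame_len)

sig = np.sin(np.linspace(0, 40 * np.pi, 4096))
frames = frame_signal(sig)
# Per-frame statistics -> one small feature vector per frame (mean, std here).
frame_feats = np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)
```

Collapsing `frame_feats` further into a single statistic over all frames would yield the segment-level feature the text criticizes: a whole-segment summary that loses the moment-to-moment variation.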
Each dimension must first be aligned along the time axis. We arrange alignment factor matrices A_x, A_s, A_g for the frame feature, segment feature, and global feature matrices respectively. Here, the alignment factor matrices are set to the following block form:

A_x = [I_P; 0; 0],  A_s = [0; I_M; 0],  A_g = [0; 0; I_{P_g}]

where I is the identity matrix, P is the dimension of the frame-granularity features, M is the dimension of the segment-granularity features, and P_g is the dimension of the global features. By multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, we obtain the aligned feature matrix of each granularity.
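The original equation images are not reproduced in this text; one plausible block form, assumed here purely for illustration, stacks an identity block for each granularity into a tall matrix so that each feature type occupies its own rows after alignment:

```python
import numpy as np

# Illustrative dimensions (P: frame, M: segment, Pg: global feature dims).
P, M, Pg = 4, 3, 2
D = P + M + Pg

def alignment_matrix(total_rows, offset, size):
    """A total_rows x size matrix with an identity block at the row offset."""
    A = np.zeros((total_rows, size))
    A[offset:offset + size, :] = np.eye(size)
    return A

Ax = alignment_matrix(D, 0, P)        # frame features land in rows 0..P-1
As = alignment_matrix(D, P, M)        # segment features in the next M rows
Ag = alignment_matrix(D, P + M, Pg)   # global features in the last Pg rows
```

Under this assumption, left-multiplying a feature matrix by its alignment factor simply places it in a disjoint band of rows, so the three granularities can later be combined without overlap.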
After the feature matrices are aligned, the features are fused by summation, so the fused feature matrix F can be computed as:

F = A_x X_{P×T} + A_s S_{M×T} + A_g G_{P_g×T}

where A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices; X_{P×T}, S_{M×T}, G_{P_g×T} are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity. Each column of F is then a fused frame feature containing three parts: the smallest frame-granularity feature obtained in the initial computation, the segment-granularity feature obtained by applying a Gaussian convolution to the frame features, and the global feature of the signal. After the fusion matrix is obtained, we map the feature matrix into an RGB image and send the RGB image to the feature display module, completing the display of the multi-granularity features.
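A minimal numerical sketch of the fusion-by-summation and image-mapping steps, under the same assumed block form for the alignment factors; the dimensions, random features, and min-max normalization are illustrative choices, not taken from the patent:

```python
import numpy as np

P, M, Pg, T = 4, 3, 2, 16
D = P + M + Pg
rng = np.random.default_rng(0)
X = rng.normal(size=(P, T))                    # frame-granularity features
S = rng.normal(size=(M, T))                    # segment features (e.g. smoothed frames)
G = np.tile(rng.normal(size=(Pg, 1)), (1, T))  # global features, repeated over time

# Assumed block-form alignment factors: each granularity gets its own rows.
Ax = np.vstack([np.eye(P), np.zeros((M + Pg, P))])
As = np.vstack([np.zeros((P, M)), np.eye(M), np.zeros((Pg, M))])
Ag = np.vstack([np.zeros((P + M, Pg)), np.eye(Pg)])

F = Ax @ X + As @ S + Ag @ G   # (D, T): each column is one fused frame feature

# Map the fused matrix to an 8-bit image (grayscale replicated to 3 channels).
norm = (F - F.min()) / (F.max() - F.min())
rgb = np.repeat((norm * 255).astype(np.uint8)[:, :, None], 3, axis=2)
```

Because the identity blocks occupy disjoint rows, the summation interleaves rather than mixes the three granularities: rows 0..P-1 of F reproduce X exactly, with S and G below.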
The recognition result display module calls the XML parsing module to read and display the recognition results. Since current research on intelligent video content recognition concentrates mainly on subtitle recognition, speech recognition, video scene recognition, and speech emotion recognition, we display the recognized content in blocks divided by video scene.
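The patent does not give the XML schema for the recognition result file; the sketch below assumes a hypothetical layout with one `<scene>` element per scene, whose children hold the per-type recognition results that the display module groups into blocks.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition-result file: schema invented for illustration.
SAMPLE = """<results>
  <scene id="1" start="0.0" end="12.5">
    <speech>hello there</speech>
    <subtitle>Hello!</subtitle>
    <emotion>neutral</emotion>
  </scene>
</results>"""

def parse_results(xml_text):
    """Group recognition results into one display block per scene."""
    blocks = []
    for scene in ET.fromstring(xml_text).iter("scene"):
        blocks.append({
            "id": scene.get("id"),
            "start": float(scene.get("start")),
            "end": float(scene.get("end")),
            "results": {child.tag: child.text for child in scene},
        })
    return blocks

blocks = parse_results(SAMPLE)
```

Each block carries its scene's time span, which is what would let the display highlight the result corresponding to the current playback position.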
The synthetic display module, feature display module, and recognition result display module are kept synchronized with video playback, which is convenient for researchers.
Those of ordinary skill in the art will understand that the embodiments described here are merely intended to help the reader understand how the invention is implemented, and that the protection scope of the invention is not limited to these specific statements and embodiments. Based on the technical content disclosed by the invention, those of ordinary skill in the art can make various specific variations and combinations without departing from the essence of the invention; such variations and combinations remain within the protection scope of the invention.
Claims (7)
1. A comprehensive display system for intelligent processing of video content, characterized by comprising: a synthetic display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module;
The functions realized by the modules are as follows:
(1) Synthetic display module: a. provides a video playback window, can smoothly play the mainstream video formats currently on the market without stuttering, and supports frame-by-frame review of the video; b. displays the acts and scenes of the video and supports real-time jumping between scenes; c. displays the time-frequency information of the audio in the video, follows the video playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for displaying recognition results such as speech, subtitles, events, and emotions, and can navigate to the associated frame in real time according to a recognition result;
(2) Feature display module: a. displays the acts and scenes of the video and supports real-time jumping between scenes; b. displays the waveform information of the audio in the video and follows the video playback in real time; c. provides sub-interfaces for displaying the features extracted from the audio or video file; the features are passed in through a configuration file, and sub-interfaces can be freely added according to the parameters;
(3) Recognition result display module: a. displays recognition results such as speech, subtitles, events, and emotions, arranged in blocks in scene order; b. provides an interactive interface for manually correcting recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. can save the recognition results, block by block, to a local directory;
(4) Video decoding module: parses the video file, and can extract each frame in the video file and save specified video clips;
(5) Scene partitioning module: receives the data from the XML parsing module and the video decoding module, and partitions the video into scenes;
(6) XML parsing module: parses different types of XML files and transfers the data to different modules according to the parsing results;
(7) Feature fusion module: receives the data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific manner, and feeds the result into the feature display module.
2. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the synthetic display module, feature display module, and recognition result display module are implemented as visualization interfaces, and the three interfaces can be presented simultaneously on three displays; the three interfaces are synchronized by video playback time.
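The synchronization in claim 2 can be sketched as a simple observer pattern: the three display modules subscribe to a single playback clock and refresh whenever it advances. The class and method names below are illustrative, not from the patent.

```python
# Minimal sketch of time-based synchronization across the three views.

class PlaybackClock:
    """One shared clock; every attached view follows the same timestamp."""
    def __init__(self):
        self.time = 0.0
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def seek(self, t):
        self.time = t
        for view in self._views:
            view.on_time(t)     # notify each display module of the new time

class View:
    def __init__(self, name):
        self.name, self.shown_at = name, None
    def on_time(self, t):
        self.shown_at = t       # a real view would redraw for timestamp t

clock = PlaybackClock()
views = [View(n) for n in ("synthesis", "feature", "recognition")]
for v in views:
    clock.attach(v)
clock.seek(42.0)
```

Driving scene jumps and recognition-result highlighting through the same `seek` call would keep all three panels consistent with the playback position.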
3. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system is based on a multi-granularity fusion feature display technology; when the video data is processed by computer, multi-granularity feature extraction is also performed on the audio and video features.
4. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system supports user-defined organization of parameters, which must however conform to the XML specification.
5. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system provides extensible programming; programming interfaces for secondary development are provided for the synthetic display module, feature display module, and recognition result display module, allowing users to define their own feature display formats or change the temporal resolution of features; the interface configuration files likewise follow the XML specification.
6. A working method of the comprehensive display system for intelligent processing of video content according to any one of claims 1 to 5, characterized by comprising the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding; the digital video file is decoded while the audio stream in the video file is extracted, completing the audio-video separation and finally outputting the decoded audio stream and video stream;
Step 2: the video stream and audio stream extracted in step 1 are fed into the scene partitioning module; after receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to those parameters, extracts a key-frame thumbnail for each scene, and stores the cut video and audio streams on the local disk as cache files for subsequent modules to use;
Step 3: the synthetic display module reads the cut video and audio streams and provides video playback; through the control buttons of the playback bar, the video can be played in either of two modes, normal playback or frame-by-frame playback; the synthetic display module calls the XML parsing module to read external XML files, loading the tag file and the recognition result file for display;
Step 4: the feature display module reads the parameters of the video file by parsing the XML file; after the feature parameters are read, the system calls the feature fusion module to process and display the multi-granularity features;
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the synthetic display module, feature display module, and recognition result display module are kept synchronized with video playback.
7. The method according to claim 6, characterized in that the multi-granularity feature processing and display method is as follows:

Each dimension is first aligned along the time axis: alignment factor matrices A_x, A_s, A_g are arranged for the frame feature, segment feature, and global feature matrices respectively, in the following block form:

A_x = [I_P; 0; 0],  A_s = [0; I_M; 0],  A_g = [0; 0; I_{P_g}]

where I is the identity matrix, P is the dimension of the frame-granularity features, M is the dimension of the segment-granularity features, and P_g is the dimension of the global features; by multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, the aligned feature matrix of each granularity is obtained;

after the feature matrices are aligned, the features are fused by summation, so the fused feature matrix F can be computed as:

F = A_x X_{P×T} + A_s S_{M×T} + A_g G_{P_g×T}

where A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices; X_{P×T}, S_{M×T}, G_{P_g×T} are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity; each column of F is then a fused frame feature containing three parts: the smallest frame-granularity feature obtained in the initial computation, the segment-granularity feature obtained by applying a Gaussian convolution to the frame features, and the global feature of the signal; after the fusion matrix is obtained, the feature matrix is mapped into an RGB image and the RGB image is sent to the feature display module, completing the display of the multi-granularity features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910456376.6A CN110191352A (en) | 2019-05-29 | 2019-05-29 | A kind of comprehensive display system towards video content Intelligent treatment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110191352A true CN110191352A (en) | 2019-08-30 |
Family
ID=67718513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910456376.6A Pending CN110191352A (en) | 2019-05-29 | 2019-05-29 | A kind of comprehensive display system towards video content Intelligent treatment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110191352A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112383837A (en) * | 2019-09-29 | 2021-02-19 | 北京城建设计发展集团股份有限公司 | Intelligent integrated broadcast control equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010084739A1 (en) * | 2009-01-23 | 2010-07-29 | 日本電気株式会社 | Video identifier extracting device |
CN102523536A (en) * | 2011-12-15 | 2012-06-27 | 清华大学 | Video semantic visualization method |
CN105205091A (en) * | 2015-06-04 | 2015-12-30 | 浙江大学 | Method for visualizing soundscape information |
Non-Patent Citations (2)
Title |
---|
ZHANG Tian et al.: "Audio-based digital media content analysis and its visualization", Journal of Yanshan University * 
XU Cong: "Research on multi-granularity analysis and processing of time-series signals based on convolutional and long short-term memory neural networks", China Masters' Theses Full-text Database, Medicine and Health Sciences series * 
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021114881A1 (en) | Intelligent commentary generation method, apparatus and device, intelligent commentary playback method, apparatus and device, and computer storage medium | |
CN109729426B (en) | Method and device for generating video cover image | |
CN109257622A (en) | A kind of audio/video processing method, device, equipment and medium | |
CN106789991A (en) | A kind of multi-person interactive method and system based on virtual scene | |
CN104782121A (en) | Multiple region video conference encoding | |
CN107436921B (en) | Video data processing method, device, equipment and storage medium | |
CN104540275B (en) | A kind of method for adjusting live lighting device, equipment and system | |
CN109063506A (en) | Privacy processing method for medical operating teaching system | |
CN106028004A (en) | Multi-signal input and multi-image composition device and method | |
CN207399423U (en) | A kind of distributed network video process apparatus | |
CN107454346B (en) | Movie data analysis method, video production template recommendation method, device and equipment | |
CN104469089A (en) | Multimedia interaction teaching system and teaching method | |
CN110969572A (en) | Face changing model training method, face exchanging device and electronic equipment | |
CN110191352A (en) | A kind of comprehensive display system towards video content Intelligent treatment | |
CN103929640A (en) | Techniques For Managing Video Streaming | |
CN201414197Y (en) | Miscs intelligent monitoring system | |
CN105407364B (en) | Based on channel synthesized competitiveness implementation method under smart television audience ratings system | |
CN205946014U (en) | Synthetic device of many pictures of many signal input | |
CN112995748A (en) | Multi-mode-based automatic bullet screen generation method and system, storage medium and equipment | |
CN102883213B (en) | Subtitle extraction method and device | |
CN106340307A (en) | Method and device used for displaying audio information | |
Li et al. | 3d human skeleton data compression for action recognition | |
CN202172447U (en) | Expandable multi-path input/output matrix server | |
CN108769548A (en) | A kind of decoding video output system and method | |
CN201107858Y (en) | Video matrix mainframe |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190830 ||