CN110191352A - A comprehensive display system for intelligent video content processing - Google Patents

A comprehensive display system for intelligent video content processing Download PDF

Info

Publication number
CN110191352A
CN110191352A (application CN201910456376.6A)
Authority
CN
China
Prior art keywords
video
feature
module
display module
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910456376.6A
Other languages
Chinese (zh)
Inventor
李海峰
马琳
李洪伟
薄洪建
丰上
徐聪
陈婧
房春英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201910456376.6A priority Critical patent/CN110191352A/en
Publication of CN110191352A publication Critical patent/CN110191352A/en
Pending legal-status Critical Current


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23412 Generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H04N 21/23418 Operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/2343 Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234309 Transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 Rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/436 Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N 21/4363 Adapting the video stream to a specific local network, e.g. a Bluetooth® network
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440218 Reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content
    • H04N 21/47202 End-user interface for requesting content on demand, e.g. video on demand
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8543 Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a comprehensive display system for intelligent video content processing, comprising: a synthetic display module, a feature display module, and a recognition result display module. The synthetic display module handles video playback, displays basic video information, and gives a concise view of the video content recognition results. The feature display module renders the transmitted audio and video features as visual images and locates them in real time as the video plays, so that researchers can track feature changes as they happen. The recognition result display module displays the details of the video content recognition results. Advantages of the invention: it visualizes the whole pipeline of intelligent video information processing, improving researchers' working efficiency; it lets researchers save various kinds of information and eases the exchange of information between different developers, avoiding unnecessary duplicated work.

Description

A comprehensive display system for intelligent video content processing
Technical field
The present invention relates to the field of multimedia intelligent information processing, and in particular to a comprehensive display system for intelligent video content processing.
Background technique
In recent years, with the continuous improvement of the hardware and software performance of computers and embedded systems, intelligent video surveillance systems for all kinds of complex application scenarios have steadily entered the market, and video products with intelligent video processing capability are becoming the mainstream. Intelligent video analysis draws on image/video processing and computer vision (CV) technology and belongs to the field of artificial intelligence (AI) research; through digital image processing and video signal analysis, it can extract and understand the content of video frames. As an important research topic in the multimedia field, intelligent processing of video content is widely applied in public safety, the judicial domain, traffic and many other areas. Intelligent information processing of video content covers multiple sub-tasks such as video encoding and decoding, scene cutting, violent scene detection, subtitle recognition, and audio-video separation.
Although the market for intelligent video analysis is booming, laboratory researchers currently still face many problems. Because intelligent video analysis involves many sub-tasks, researchers often divide the work, each person studying only one or two problems, and the research results of several people are finally integrated into a complete product. Organized in this way, working efficiency improves, but this form of work also has certain drawbacks: (1) the sub-tasks of intelligent video analysis are often interrelated, so work is duplicated and human resources are wasted; (2) before the final product is born, the visibility of each person's work is poor, and the project leader finds it hard to supervise everyone's work intuitively and effectively; (3) after audio and video features are extracted, the features and the video are difficult to unify in time, which troubles the research staff.
Summary of the invention
In view of the drawbacks of the prior art, the present invention provides a comprehensive display system for intelligent video content processing, which can effectively solve problems researchers encounter in the course of research, such as poor visualization of audio and video features and the difficulty of turning research results into actual products, thereby improving working efficiency. The present invention can smoothly play the video formats required by the technical indicators, can cut the video to be processed into acts and scenes as required, and can separately extract the audio file from the video while providing a visualized audio waveform; its feature display program provides a clean UI that can display recognition results such as speech, emotion, scene and subtitles in real time.
To achieve the above objective, the technical solution adopted by the present invention is as follows:
A comprehensive display system for intelligent video content processing, comprising: a synthetic display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module and a feature fusion module.
The functions realized by each module are as follows:
(1) Synthetic display module: a. provides a video playback window able to smoothly play the mainstream video formats currently on the market, without stuttering, and supports frame-by-frame viewing; b. displays the acts and scenes of the video and supports real-time jumping between scenes; c. displays the time-frequency information of the audio in the video, following playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for displaying recognition results such as speech, subtitles, events and emotion, and can navigate to the associated frames in real time according to the recognition results.
(2) Feature display module: a. displays the acts and scenes of the video and supports real-time jumping between scenes; b. displays the waveform information of the audio in the video, following playback in real time; c. provides sub-interfaces for displaying the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be freely added according to the parameters.
(3) Recognition result display module: a. displays recognition results such as speech, subtitles, events and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. can save recognition results block by block to a local directory.
(4) Video decoding module: an internal module that parses the video file, can extract each frame of the video file and can save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and, according to the parsing results, transmits the data to the different modules.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features and global features of the audio or video in a specific way, and sends them to the feature display module.
Further, the synthetic display module, the feature display module and the recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with each other by the video playback time.
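The synchronization of the three interfaces by a shared video playback time could be sketched roughly as follows. This is only an illustration of the idea; the class and method names are hypothetical and do not appear in the patent.

```python
# Minimal sketch of keeping three display panels synchronized by a
# shared playback clock. All names here are illustrative assumptions.

class PlaybackClock:
    """Single source of truth for the current playback time (seconds)."""
    def __init__(self):
        self.time = 0.0
        self.listeners = []

    def subscribe(self, panel):
        self.listeners.append(panel)

    def seek(self, t):
        self.time = t
        for panel in self.listeners:   # every panel follows the same time
            panel.on_time(t)

class Panel:
    def __init__(self, name):
        self.name = name
        self.current = None

    def on_time(self, t):
        self.current = t               # e.g. redraw features/results at t

clock = PlaybackClock()
panels = [Panel(n) for n in ("synthetic", "feature", "recognition")]
for p in panels:
    clock.subscribe(p)

clock.seek(12.5)
print([p.current for p in panels])     # all three panels show 12.5
```

Broadcasting a single clock value, rather than letting each panel keep its own timer, is what keeps playback, feature display and recognition results from drifting apart.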
Further, the comprehensive display system is based on a multi-granularity feature display technology: when the computer processes video data, multi-granularity feature extraction is also performed on the audio and video features.
Further, the comprehensive display system supports user-defined organization of parameters, which must however conform to the XML specification.
Further, the comprehensive display system provides extensible programming: programming interfaces for secondary development are provided for the synthetic display module, the feature display module and the recognition result display module, allowing users to define the display format of features or change their temporal resolution; the interface configuration files likewise follow the XML specification.
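The patent states that parameters are organized in XML files but does not give their schema. A hypothetical configuration and its parsing with the Python standard library might look like the following; the element and attribute names are assumptions made for this sketch only.

```python
# Hypothetical example of reading a feature-display configuration from
# XML, in the spirit of the XML parsing module. The schema shown is an
# assumption: the patent only requires parameters to conform to XML.
import xml.etree.ElementTree as ET

CONFIG = """
<config>
  <feature name="mfcc" granularity="frame" dim="13"/>
  <feature name="energy" granularity="segment" dim="1"/>
  <scene-partition threshold="0.35"/>
</config>
"""

root = ET.fromstring(CONFIG)
features = [(f.get("name"), f.get("granularity"), int(f.get("dim")))
            for f in root.findall("feature")]
threshold = float(root.find("scene-partition").get("threshold"))

print(features)   # [('mfcc', 'frame', 13), ('energy', 'segment', 1)]
print(threshold)  # 0.35
```

Reading all parameters through one parsing module, as the patent describes, keeps the display modules independent of where the configuration comes from.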
The invention also discloses the working method of the comprehensive display system, which includes the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding. The digital video file is decoded while the audio stream in the video file is extracted, completing the audio-video separation; the decoded audio stream and video stream are finally output.
Step 2: the video stream and audio stream extracted in step 1 are sent to the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a key-frame thumbnail for each act; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
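The patent does not specify the scene partitioning algorithm itself. One common minimal approach, shown here purely as an illustration, is to cut wherever the difference between consecutive frames exceeds a threshold read from the configuration; frames are simplified to flat lists of pixel intensities.

```python
# Illustrative scene cutting by frame difference. The actual
# partitioning algorithm is not specified in the patent; this is an
# assumed stand-in to show where the threshold parameter would act.

def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cut_points(frames, threshold):
    """Return indices where a new scene starts (frame 0 always does)."""
    cuts = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts

# Two nearly static shots with an abrupt change between frames 2 and 3.
frames = [[10, 10, 10], [11, 10, 10], [10, 11, 10],
          [200, 200, 200], [201, 200, 200]]
print(cut_points(frames, threshold=50))  # [0, 3]
```

The first frame of each detected scene would then serve as the key-frame thumbnail the step above mentions.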
Step 3: the synthetic display module reads the cut video stream and audio stream and provides the video playback function; through the control buttons of the playback bar, two playback modes are available: normal playback and frame-by-frame playback. The synthetic display module calls the XML parsing module to read external XML files, reading the tag files and recognition result files for display.
Step 4: the feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters have been read, the system calls the feature fusion module to process and display the multi-granularity features.
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the synthetic display module, the feature display module and the recognition result display module stay synchronized with the video playback.
Further, the multi-granularity feature processing and display method is as follows:
Each granularity is first aligned on the time dimension: alignment factor matrices $A_x$, $A_s$, $A_g$ are set for the frame feature, segment feature and global feature matrices respectively. The alignment factor matrices are set to the following form:
$$A_x=\begin{pmatrix}I_P\\0_{M\times P}\\0_{P_g\times P}\end{pmatrix},\quad A_s=\begin{pmatrix}0_{P\times M}\\I_M\\0_{P_g\times M}\end{pmatrix},\quad A_g=\begin{pmatrix}0_{P\times P_g}\\0_{M\times P_g}\\I_{P_g}\end{pmatrix}$$
where $I$ is the identity matrix, $P$ is the dimension of the frame-granularity features, $M$ is the dimension of the segment-granularity features, and $P_g$ is the dimension of the global features. By multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, the aligned feature matrix of each granularity is obtained.
After the feature matrices are aligned, feature fusion is performed by summation, so the fused feature matrix $F$ can be calculated as
$$F=A_xX_{P\times T}+A_sS_{M\times T}+A_gG_{P_g\times T}$$
where $A_x$, $A_s$, $A_g$ are the alignment factors of the frame, segment and global feature matrices, $X_{P\times T}$, $S_{M\times T}$ and $G_{P_g\times T}$ are the corresponding feature matrices, and $T$ is the time granularity. In this way each column of $F$ is a fused frame feature containing three parts: the smallest frame-granularity features obtained by the initial computation, the segment-granularity features obtained from the frame features by a Gaussian convolution operation, and the global features of the whole sequential signal. After the fusion matrix is obtained, the feature matrix is mapped into an RGB image, which is sent back to the feature display module to complete the display of the multi-granularity features.
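The alignment-plus-summation fusion described above can be sketched in pure Python. The block-matrix form of the alignment factors and all dimensions below are assumptions chosen for illustration; the patent's original formula images are not reproduced in this text.

```python
# Sketch of multi-granularity fusion by alignment + summation.
# P=2 frame dims, M=1 segment dim, Pg=1 global dim, T=3 time steps.
# Alignment matrices are block matrices of identity and zero blocks,
# one plausible interpretation of the patent's description.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def madd(*Ms):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

P, M, Pg, T = 2, 1, 1, 3
X = [[1, 2, 3], [4, 5, 6]]   # frame features, P x T
S = [[7, 8, 9]]              # segment features, M x T
G = [[5, 5, 5]]              # global feature replicated over time, Pg x T

def block_align(dim, offset, total):
    """(total x dim) matrix with an identity block starting at `offset`."""
    return [[1 if r - offset == c and 0 <= r - offset < dim else 0
             for c in range(dim)] for r in range(total)]

total = P + M + Pg
Ax = block_align(P, 0, total)        # places frame features on top
As = block_align(M, P, total)        # segment features in the middle
Ag = block_align(Pg, P + M, total)   # global features at the bottom

F = madd(matmul(Ax, X), matmul(As, S), matmul(Ag, G))
print(F)   # each column holds frame, segment, and global parts
```

With this reading, the summation simply stacks the three granularities so that every column of F carries all three parts, matching the description that each fused frame feature contains three components.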
Compared with the prior art, the present invention has the following advantages:
(1) it visualizes the whole pipeline of intelligent video information processing, improving researchers' working efficiency;
(2) it supports researchers in saving various kinds of information, facilitates information exchange between different developers, and avoids unnecessary duplicated work;
(3) it provides a multi-granularity feature visualization method and displays features of several granularities simultaneously;
(4) video playback, feature display and result display are synchronized through time control, providing clear logic and helping researchers find problems in time;
(5) it provides a secondary programming interface, making it convenient for different research teams to make personalized adjustments.
Description of the drawings
Fig. 1 is the overall system architecture block diagram;
Fig. 2 is a schematic diagram of multi-granularity feature fusion.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and enumerated embodiments.
As shown in Figure 1, a comprehensive display system for intelligent video content processing comprises: a synthetic display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module and a feature fusion module.
The synthetic display module, the feature display module and the recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with each other by the video playback time.
The three modules realize the following functions:
(1) Synthetic display module: a. provides a video playback window able to smoothly play the mainstream video formats currently on the market, without stuttering, and supports frame-by-frame viewing; b. displays the acts and scenes of the video and supports real-time jumping between scenes; c. displays the time-frequency information of the audio in the video, following playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for displaying recognition results such as speech, subtitles, events and emotion, and can navigate to the associated frames in real time according to the recognition results.
(2) Feature display module: a. displays the acts and scenes of the video and supports real-time jumping between scenes; b. displays the waveform information of the audio in the video, following playback in real time; c. provides sub-interfaces for displaying the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be freely added according to the parameters.
(3) Recognition result display module: a. displays recognition results such as speech, subtitles, events and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. can save recognition results block by block to a local directory.
(4) Video decoding module: an internal module that parses the video file, can extract each frame of the video file and can save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and, according to the parsing results, transmits the data to the different modules.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features and global features of the audio or video in a specific way, and sends them to the feature display module.
The present invention proposes a feature display technology based on multi-granularity fusion. Processing timing information is a multi-level, deep refinement. In human vision, cells of the visual cortex pre-process the pixel blocks of an image, and the resulting preliminary fine-grained information is sent to the cerebral cortex for higher-level information extraction, forming the concept of each object in the picture. Likewise, when using a computer to process video data, we also perform multi-granularity feature extraction on the audio and video features. The multi-granularity feature display technology proposed here is based on this idea.
Information exchange between the present invention and external programs is mainly completed through configuration files. In the present invention, the parameters needed by functions such as scene partitioning, speech waveform, feature set and recognition results are read uniformly from XML files. The present invention supports user-defined organization of parameters, which must however conform to the XML specification.
The present invention provides extensible programming: programming interfaces for secondary development are provided for the synthetic display module, the feature display module and the recognition result display module, allowing users to define the display format of features or change their temporal resolution; the interface configuration files likewise follow the XML specification.
The overall workflow of the invention is as follows:
After the video to be processed enters the system, it first passes through the video decoding module, which performs audio-video separation and video decoding. Since video is generally compressed with video compression coding technology to reduce the occupied storage space, the video file must first be decoded before it can be played. In this step the digital video file is decoded while the audio stream in the video file is extracted, completing the audio-video separation; the decoded audio stream and video stream are finally output.
The video stream and audio stream extracted in the previous step are sent to the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a key-frame thumbnail for each act; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
The synthetic display module reads the cut video stream and audio stream and provides the video playback function; through the control buttons of the playback bar, two playback modes are available: normal playback and frame-by-frame playback. The synthetic display module calls the XML parsing module to read external XML files, reading the tag files and recognition result files for display.
The feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters have been read, the system calls the feature fusion module to process the multi-granularity features. The multi-granularity features and their display method are described in detail below.
The multi-granularity features are shown in Figure 2. Timing signals are mostly nonlinear, non-stationary random signals to which digital signal processing cannot be applied directly, so a windowing operation is performed on the signal, which yields the concept of a frame. In this way a sequence of frame features is obtained on a section of the timing signal, characterizing the signal at a small granularity. In actual processing, since most classifiers cannot classify variable-length features, all the frames obtained from a section of the signal are analyzed statistically, and the resulting statistics serve as the feature of that section of the signal. In this processing, although the final feature is computed from the smaller frame units, it reflects the whole section of the signal as one entity and cannot capture the variations at different moments or the differing trends within different time intervals of the signal; such a method therefore cannot extract deeper, higher-level information from the timing signal.
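The windowing that produces frame features, and the per-section statistic that produces a segment feature, can be illustrated with a short sketch; the frame length, hop size and choice of statistic below are arbitrary illustrative values, not taken from the patent.

```python
# Illustration of splitting a timing signal into overlapping frames,
# the first step of the multi-granularity processing described above.

def frame_signal(signal, frame_len, hop):
    """Split `signal` into frames of `frame_len` samples, advancing by `hop`."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

signal = list(range(10))            # stand-in for an audio signal
frames = frame_signal(signal, frame_len=4, hop=2)
print(frames)
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]

# A per-frame statistic (here the mean) gives frame-granularity features;
# a statistic over all frames of a section would give its segment feature.
frame_features = [sum(f) / len(f) for f in frames]
print(frame_features)               # [1.5, 3.5, 5.5, 7.5]
```

Collapsing all frames of a section into one statistic is exactly the step the paragraph above criticizes: it loses the moment-to-moment variation that multi-granularity fusion is designed to preserve.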
It is aligned on time dimension firstly the need of by each dimension, we are respectively to frame feature, Duan Tezheng, global spy Levy arranged in matrix alignment factor matrix Ax、As、Ag.Herein, alignment factor matrix we be set as following form.
Wherein, I is unit battle array.P is the dimension of frame grain size characteristic, the dimension that M is section grain size characteristic, PgFor global characteristics Dimension.In this way, we are by the way that alignment factor matrix is multiplied with the eigenmatrix of each granularity, after being just aligned Each grain size characteristic matrix.
After the feature matrices are aligned, the features are fused by summation; the fused feature matrix F can then be computed as:

F = Ax·XP×T + As·SM×T + Ag·GPg×T

where Ax, As, Ag are the alignment factors of the frame-feature, segment-feature, and global-feature matrices; XP×T, SM×T, GPg×T are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity. Each column of F is thus a fused frame feature containing three parts: the smallest-granularity frame features obtained by the initial computation, the segment-granularity features obtained from the frame features by a Gaussian convolution operation, and the global features of the time-series signal. After the fusion matrix is obtained, we map the feature matrix into an RGB image and send the RGB image to the feature display module, completing the display of the multi-granularity features.
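One plausible reading of this summation fusion is sketched below, under the assumption that the alignment factor matrices are block-identity matrices that lift each granularity into a common row layout, so that each column of F stacks the frame, segment, and global parts; the dimensions are illustrative, not values from the patent.

```python
# Sketch of summation fusion with assumed block-identity alignment
# matrices, followed by mapping the fused matrix into an RGB image.
import numpy as np

P, M, Pg, T = 6, 3, 2, 8            # frame/segment/global dims, time steps
X = np.random.rand(P, T)            # frame-granularity feature matrix
S = np.random.rand(M, T)            # segment-granularity feature matrix
g = np.random.rand(Pg)              # global feature vector

R = P + M + Pg                      # rows of the fused matrix

def alignment(offset, dim):
    """Alignment factor: an identity block at row `offset`, zeros elsewhere."""
    A = np.zeros((R, dim))
    A[offset:offset + dim, :] = np.eye(dim)
    return A

Ax, As, Ag = alignment(0, P), alignment(P, M), alignment(P + M, Pg)
G = np.repeat(g[:, None], T, axis=1)   # align the global features over time

F = Ax @ X + As @ S + Ag @ G           # summation fusion: each column now
                                       # stacks frame, segment, global parts

# Scale the fused matrix into an 8-bit RGB image for the feature display.
norm = (F - F.min()) / (F.max() - F.min() + 1e-12)
img = (np.stack([norm] * 3, axis=-1) * 255).astype(np.uint8)
print(F.shape, img.shape)
```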
The recognition result display module calls the XML parsing module to read and display the recognition results. Current research on intelligent video content recognition focuses mainly on subtitle recognition, speech recognition, video scene recognition, and speech emotion recognition; we therefore display the recognized content in blocks divided by video scene.
The comprehensive display module, feature display module, and recognition result display module are synchronized with video playback, which facilitates the work of researchers.
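This playback-time synchronization of the three display modules can be sketched with a simple subscriber pattern; the class and method names below are illustrative, not taken from the patent.

```python
# Sketch of playback synchronization: the three display modules
# subscribe to a shared playback clock, so seeking the video updates
# every view at once.

class PlaybackClock:
    def __init__(self):
        self.time = 0.0
        self.subscribers = []

    def subscribe(self, module):
        self.subscribers.append(module)

    def seek(self, t):
        """Move playback to time t and notify every subscribed view."""
        self.time = t
        for m in self.subscribers:
            m.on_time(t)

class DisplayModule:
    def __init__(self, name):
        self.name = name
        self.current = None

    def on_time(self, t):
        self.current = t   # refresh this view to playback time t

clock = PlaybackClock()
views = [DisplayModule(n) for n in
         ("comprehensive", "feature", "recognition")]
for v in views:
    clock.subscribe(v)

clock.seek(12.5)
print([v.current for v in views])   # [12.5, 12.5, 12.5]
```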
Those of ordinary skill in the art will understand that the embodiments described herein are intended to help the reader understand the implementation of the present invention, and that the protection scope of the present invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can, based on the technical teachings disclosed herein, make various specific variations and combinations that do not depart from the essence of the invention; such variations and combinations remain within the protection scope of the present invention.

Claims (7)

1. A comprehensive display system for intelligent video content processing, characterized by comprising: a comprehensive display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module;
The functions realized by the modules are as follows:
(1) Comprehensive display module: a. provides a video playback window that can smoothly play mainstream video formats currently on the market without stuttering and supports frame-by-frame viewing of the video; b. displays the acts and scenes of the video and supports real-time jumping between scenes; c. displays the time-frequency information of the audio in the video, follows video playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for displaying recognition results such as speech, subtitles, events, and emotions, and is required to navigate to the associated frame in real time according to the recognition results;
(2) Feature display module: a. displays the acts and scenes of the video and supports real-time jumping between scenes; b. displays the waveform information of the audio in the video and follows video playback in real time; c. provides sub-interfaces for displaying the features of the extracted audio or video files; the features are transmitted via configuration files, and sub-interfaces can be freely added according to the parameters;
(3) Recognition result display module: a. displays recognition results such as speech, subtitles, events, and emotions, arranged in blocks in scene order; b. provides an interactive interface for manually correcting recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. can save recognition result blocks to a local directory;
(4) Video decoding module: parses video files, extracts each frame in a video file, and saves specified video clips;
(5) Scene partitioning module: receives data from the XML parsing module and the video decoding module and partitions the video into scenes;
(6) XML parsing module: parses different types of XML files and transmits the data to different modules according to the parsing results;
(7) Feature fusion module: receives data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific manner, and sends them to the feature display module.
2. The comprehensive display system for intelligent video content processing according to claim 1, characterized in that: the comprehensive display module, feature display module, and recognition result display module are implemented as visualization interfaces, and the three interfaces can be presented simultaneously on a three-panel display; the three interfaces are synchronized by the video playback time.
3. The comprehensive display system for intelligent video content processing according to claim 1, characterized in that: the comprehensive display system is based on a multi-granularity fusion feature display technique; when video data is processed by computer, multi-granularity feature extraction is also performed on the audio and video features.
4. The comprehensive display system for intelligent video content processing according to claim 1, characterized in that: the comprehensive display system supports user-defined parameter organization, which must conform to the XML specification.
5. The comprehensive display system for intelligent video content processing according to claim 1, characterized in that: the comprehensive display system provides extensible programming; programming interfaces for secondary development are provided for the comprehensive display module, feature display module, and recognition result display module, allowing users to define feature display formats or change the temporal resolution of features; the interface configuration files likewise follow the XML specification.
6. A working method of the comprehensive display system for intelligent video content processing according to any one of claims 1 to 5, characterized by comprising the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding on the video; the digital video file is decoded while the audio stream in the video file is extracted, completing the audio-video separation, and the decoded audio stream and video stream are finally output;
Step 2: the video stream and audio stream extracted in step 1 are fed into the scene partitioning module; after receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to the parameters, extracts a thumbnail of the key frame of each act, and stores the cut video stream and audio stream as cache files on the local disk for subsequent modules to call;
Step 3: the cut video stream and audio stream are read by the comprehensive display module, which provides video playback; the control buttons on the playback bar support both normal playback and frame-by-frame playback; the comprehensive display module calls the XML parsing module to read external XML files, reading and displaying the annotation files and recognition result files;
Step 4: the feature display module reads the parameters of the video file by parsing the XML file; after the feature parameters are read, the system calls the feature fusion module to process and display the multi-granularity features;
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the comprehensive display module, feature display module, and recognition result display module are synchronized with video playback.
7. The method according to claim 6, characterized in that the multi-granularity feature processing and display method is as follows:
First, the features of each granularity are aligned on the time dimension by constructing alignment factor matrices Ax, As, Ag for the frame-feature, segment-feature, and global-feature matrices, respectively; the alignment factor matrices are set to the following block form:

Ax = [I(P); 0; 0],  As = [0; I(M); 0],  Ag = [0; 0; I(Pg)]

where I is the identity matrix, P is the dimension of the frame-granularity features, M is the dimension of the segment-granularity features, and Pg is the dimension of the global features; by multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, the aligned feature matrix of each granularity is obtained;
After the feature matrices are aligned, the features are fused by summation, and the fused feature matrix F can be computed as:

F = Ax·XP×T + As·SM×T + Ag·GPg×T

where Ax, As, Ag are the alignment factors of the frame-feature, segment-feature, and global-feature matrices; XP×T, SM×T, GPg×T are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity; each column of F is thus a fused frame feature containing three parts: the smallest-granularity frame features obtained by the initial computation, the segment-granularity features obtained from the frame features by a Gaussian convolution operation, and the global features of the time-series signal; after the fusion matrix is obtained, the feature matrix is mapped into an RGB image, and the RGB image is sent to the feature display module, completing the display of the multi-granularity features.
CN201910456376.6A 2019-05-29 2019-05-29 A kind of comprehensive display system towards video content Intelligent treatment Pending CN110191352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910456376.6A CN110191352A (en) 2019-05-29 2019-05-29 A kind of comprehensive display system towards video content Intelligent treatment


Publications (1)

Publication Number Publication Date
CN110191352A true CN110191352A (en) 2019-08-30

Family

ID=67718513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910456376.6A Pending CN110191352A (en) 2019-05-29 2019-05-29 A kind of comprehensive display system towards video content Intelligent treatment

Country Status (1)

Country Link
CN (1) CN110191352A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383837A (en) * 2019-09-29 2021-02-19 北京城建设计发展集团股份有限公司 Intelligent integrated broadcast control equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010084739A1 (en) * 2009-01-23 2010-07-29 日本電気株式会社 Video identifier extracting device
CN102523536A (en) * 2011-12-15 2012-06-27 清华大学 Video semantic visualization method
CN105205091A (en) * 2015-06-04 2015-12-30 浙江大学 Method for visualizing soundscape information


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Tian et al., "Audio-based digital media content analysis and visualization", Journal of Yanshan University *
XU Cong, "Research on multi-granularity analysis and processing methods for time-series signals based on convolutional long short-term memory neural networks", China Master's Theses Full-text Database, Medicine & Health Sciences *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190830