CN110191352A - Comprehensive display system for intelligent processing of video content - Google Patents
Comprehensive display system for intelligent processing of video content
- Publication number
- CN110191352A CN110191352A CN201910456376.6A CN201910456376A CN110191352A CN 110191352 A CN110191352 A CN 110191352A CN 201910456376 A CN201910456376 A CN 201910456376A CN 110191352 A CN110191352 A CN 110191352A
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- module
- display module
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N21/2187—Live feed
- H04N21/23412—Generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
- H04N21/23418—Analysing video streams, e.g. detecting features or characteristics
- H04N21/234309—Transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4312—Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors
- H04N21/4363—Adapting the video stream to a specific local network, e.g. a Bluetooth® network
- H04N21/440218—Reformatting operations for household redistribution, storage or real-time display by transcoding between formats or standards
- H04N21/47202—End-user interface for requesting content on demand, e.g. video on demand
- H04N21/8543—Content authoring using a description language, e.g. XML
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a comprehensive display system for intelligent processing of video content, comprising: a synthesis display module, a feature display module, and a recognition result display module. The synthesis display module handles video playback, displays basic video information, and gives a simple presentation of the video content recognition results. The feature display module presents the transmitted audio and video features as visual images, positioned in real time as the video plays, so that researchers can track feature changes in the audio and video as they occur. The recognition result display module displays the details of the video content recognition results. Advantages of the invention: the workflow of intelligent video information processing is visualized, improving researchers' working efficiency; researchers are supported in saving various kinds of information, information exchange between different developers is facilitated, and unnecessary duplicated work is avoided.
Description
Technical field
The present invention relates to the technical field of multimedia intelligent information processing, and in particular to a comprehensive display system for intelligent processing of video content.
Background technique
In recent years, with the continuous improvement of computers and of embedded hardware and software, intelligent video surveillance systems for a variety of complex application scenarios have steadily entered the market, and video products with intelligent video processing capability are becoming the mainstream. Intelligent video analysis belongs to image/video processing and computer vision (CV) technology, within the field of artificial intelligence (AI) research; through digital image processing and video signal analysis, this technology extracts and understands the content of video pictures. Intelligent processing of video content is an important research topic in the multimedia field and is widely applied in areas such as public safety, the judicial domain, and traffic. Intelligent information processing of video content covers multiple sub-tasks, including video encoding and decoding, video scene cutting, violence scene detection, subtitle recognition, and audio-video separation.
Although the intelligent video analysis market is booming, laboratory researchers still face many problems. Because intelligent video analysis involves many sub-tasks, researchers often divide the work among themselves: each person studies only one or two problems, and the results of several people are finally integrated into a complete product. Efficiency is improved in this way, but the working arrangement also has drawbacks. (1) The sub-tasks of intelligent video analysis are often interrelated, so work is repeated and human resources are wasted. (2) Before the final product exists, the visibility of each person's work is poor, and it is difficult for the project lead to supervise everyone's work intuitively and effectively. (3) After the audio and video features are extracted, the features and the video are hard to unify in time, which troubles the research staff.
Summary of the invention
In view of the drawbacks of the prior art, the present invention provides a comprehensive display system for intelligent processing of video content, which effectively solves problems researchers encounter in the course of research, such as poor visualization of audio and video features and difficulty in turning research results into an actual product, thereby improving work efficiency. The invention can smoothly play the video formats required by the technical specifications, can cut the video to be processed into acts and scenes as required, and can separately extract the audio file in a video while providing a visualized audio waveform; its feature display program provides a clean UI that displays recognition results such as speech, emotion, scene, and subtitles in real time.
In order to achieve the above objects, the technical solution adopted by the present invention is as follows:
A comprehensive display system for intelligent processing of video content, comprising: a synthesis display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module;
The functions realized by each module are as follows:
(1) Synthesis display module: a. provides a video playback window capable of smoothly playing the current mainstream video formats without stuttering, with support for frame-by-frame viewing; b. shows the acts and scenes of the video and supports real-time jumping between scenes; c. shows the time-frequency information of the audio in the video, following the playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for showing recognition results such as speech, subtitles, events, and emotion, with the requirement that the associated frames can be located in real time according to a recognition result.
(2) Feature display module: a. shows the acts and scenes of the video and supports real-time jumping between scenes; b. shows the waveform information of the audio in the video, following the playback in real time; c. provides sub-interfaces for showing the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be added freely according to the parameters.
(3) Recognition result display module: a. shows recognition results such as speech, subtitles, events, and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. recognition result blocks can be saved to a local directory.
(4) Video decoding module: an internal module that parses video files; it can extract every frame in a video file and save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and transmits the data to different modules according to the parsing results.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific way, and sends the result to the feature display module.
Further, the synthesis display module, feature display module, and recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with one another by video playback time.
Further, the comprehensive display system adopts a feature display technology based on multi-granularity fusion: when a computer is used to process video data, multi-granularity feature extraction is also performed on the audio and video features.
Further, the comprehensive display system supports a user-defined organization of parameters, which must nevertheless satisfy the XML specification.
Further, the comprehensive display system provides extensible programming: the synthesis display module, feature display module, and recognition result display module each expose a programming interface for secondary development, allowing users to define the display format of features or change the temporal resolution of features; the configuration files of these interfaces likewise follow the XML specification.
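The patent transmits all parameters through XML configuration files but does not reproduce a concrete example. As an illustration only, a parameter file satisfying the XML requirement might look like the following; every element and attribute name here is hypothetical, not taken from the filing:

```xml
<display-config>
  <!-- scene partitioning parameters read by the XML parsing module -->
  <scene-partition threshold="0.35" min-scene-frames="24"/>
  <!-- features handed to the feature display module, one element per granularity -->
  <feature name="mfcc"   granularity="frame"   dim="13" file="features/mfcc.bin"/>
  <feature name="energy" granularity="segment" dim="1"  file="features/energy.bin"/>
  <!-- recognition results shown by the recognition result display module -->
  <recognition type="subtitle" file="results/subtitles.xml"/>
  <recognition type="emotion"  file="results/emotion.xml"/>
</display-config>
```

Any well-formed organization of these parameters would satisfy the stated requirement, since the system explicitly permits user-defined parameter layouts as long as they remain valid XML.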
The invention also discloses a working method of the comprehensive display system, comprising the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding. The digital video file is decoded while the audio stream is extracted from it, completing the audio-video separation; the decoded audio stream and video stream are finally output.
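Step 1's demultiplexing is the kind of operation usually delegated to a standard tool; the patent does not name one, so as a hedged sketch, the commands below use FFmpeg's stream-copy options to separate a container into an audio-only and a video-only stream without re-encoding (file names are illustrative):

```python
def build_demux_commands(video_path, audio_out, video_out):
    """Build FFmpeg command lines that separate a media file into an
    audio-only stream (-vn drops video) and a video-only stream
    (-an drops audio), copying the codec data without re-encoding."""
    audio_cmd = ["ffmpeg", "-i", video_path, "-vn", "-acodec", "copy", audio_out]
    video_cmd = ["ffmpeg", "-i", video_path, "-an", "-vcodec", "copy", video_out]
    return audio_cmd, video_cmd

a, v = build_demux_commands("input.mp4", "audio.aac", "video.mp4")
print(a)
print(v)
```

The resulting lists can be passed to `subprocess.run` on a machine with FFmpeg installed; stream copy keeps this step fast because no decoding beyond the container parsing is needed for the separation itself.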
Step 2: the video stream and audio stream extracted in step 1 are fed into the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a thumbnail of each scene's key frame; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
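A minimal sketch of the cutting in step 2, under the assumption that the XML parameters ultimately yield a list of scene boundary frames (the patent leaves the boundary-detection criterion to the external parameters):

```python
def partition_scenes(num_frames, boundaries):
    """Split frame indices [0, num_frames) into scenes at the given
    boundary frames, taking each scene's first frame as its key frame
    (the thumbnail candidate mentioned in step 2)."""
    cuts = [0] + sorted(boundaries) + [num_frames]
    scenes = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        if end > start:  # skip degenerate cuts
            scenes.append({"frames": range(start, end), "keyframe": start})
    return scenes

scenes = partition_scenes(100, [30, 70])
print([s["keyframe"] for s in scenes])  # [0, 30, 70]
```

In the real system each scene's frame range would then be written to a cache file on local disk so the display modules can seek into a scene without re-decoding the whole video.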
Step 3: the synthesis display module reads the cut video stream and audio stream and provides video playback; the control buttons of the playback bar support two playback modes, normal play and frame-by-frame play. The synthesis display module calls the XML parsing module to read external XML files, reading and displaying the tag files and recognition result files.
Step 4: the feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters are read, the system calls the feature fusion module to process and display the multi-granularity features.
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the synthesis display module, feature display module, and recognition result display module are kept synchronized with the video playback.
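Step 5's synchronization by playback time can be sketched as a shared clock that notifies registered displays; the class names below are illustrative, not from the patent, which only specifies that the three interfaces share the video playback time:

```python
class PlaybackClock:
    """Shared playback time source; the three display modules register
    as listeners and are all driven by the same timestamp (step 5)."""
    def __init__(self):
        self.listeners = []
        self.time = 0.0

    def register(self, listener):
        self.listeners.append(listener)

    def seek(self, t):
        self.time = t
        for listener in self.listeners:
            listener.on_time(t)  # every display repositions to the same time


class DisplayModule:
    """Stand-in for the synthesis / feature / recognition displays."""
    def __init__(self, name):
        self.name = name
        self.last = None

    def on_time(self, t):
        self.last = t  # e.g. reposition waveform, highlight current subtitle


clock = PlaybackClock()
modules = [DisplayModule(n) for n in ("synthesis", "feature", "recognition")]
for m in modules:
    clock.register(m)
clock.seek(12.5)
print([m.last for m in modules])  # [12.5, 12.5, 12.5]
```

Driving all three interfaces from one clock is what makes the recognition highlight, the feature cursor, and the playback position agree even across three physical displays.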
Further, the multi-granularity feature processing and display method is as follows:
Each granularity is first aligned on the time dimension: alignment factor matrices A_x, A_s, A_g are set for the frame feature, segment feature, and global feature matrices respectively. The alignment factor matrices take the following form.
Here I is an identity matrix, P is the dimension of the frame granularity feature, M is the dimension of the segment granularity feature, and P_g is the dimension of the global feature. By multiplying each alignment factor matrix with the feature matrix of its granularity, the aligned feature matrices of each granularity are obtained.
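The equation images from the original filing are not reproduced in this text. Under the stated definitions (I an identity block; P, M, P_g the frame, segment, and global feature dimensions; T the time granularity), one reconstruction consistent with the surrounding description, offered as an assumption rather than the patent's exact figure, is:

```latex
A_x = \begin{bmatrix} I_P \\ 0 \\ 0 \end{bmatrix},\quad
A_s = \begin{bmatrix} 0 \\ I_M \\ 0 \end{bmatrix},\quad
A_g = \begin{bmatrix} 0 \\ 0 \\ I_{P_g} \end{bmatrix},\qquad
F = A_x X_{P\times T} + A_s S_{M\times T} + A_g G_{P_g\times T}
```

With this choice each alignment factor embeds its granularity into a common (P + M + P_g)-row space, so the summation fusion is well defined and each column of F stacks a frame part, a segment part, and a global part, matching the three-part description of the fused columns below.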
After the feature matrices are aligned, the fusion of features is carried out by summation, so the fused feature matrix F can be calculated by the following formula:
Here A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices, X_{P×T}, S_{M×T}, G_{Pg×T} are the corresponding feature matrices, and T is the time granularity. Each column of F is thus a fused frame feature containing three parts: the smallest frame-granularity feature obtained by the initial calculation, the segment-granularity feature obtained by convolving the frame features with a Gaussian function, and the global feature of the whole sequential signal. After the fusion matrix is obtained, the feature matrix is mapped into an RGB image, which is sent back to the feature display module to complete the display of the multi-granularity features.
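A small numerical sketch of the fusion and the RGB mapping, under the assumption (an interpretation, since the patent's formula images are not reproduced here) that the alignment matrices stack the three granularities into a common row space before summation:

```python
import numpy as np

def fuse_features(X, S, G):
    """Fuse frame (P x T), segment (M x T) and global (Pg x T) feature
    matrices by alignment plus summation: F = A_x X + A_s S + A_g G,
    with block alignment matrices that stack the granularities."""
    P, T = X.shape
    M, Pg = S.shape[0], G.shape[0]
    A_x = np.vstack([np.eye(P), np.zeros((M + Pg, P))])
    A_s = np.vstack([np.zeros((P, M)), np.eye(M), np.zeros((Pg, M))])
    A_g = np.vstack([np.zeros((P + M, Pg)), np.eye(Pg)])
    return A_x @ X + A_s @ S + A_g @ G  # shape ((P+M+Pg) x T)

def to_rgb(F):
    """Min-max normalize the fused matrix and map it to an 8-bit image
    (grayscale replicated across three channels) for the feature display."""
    norm = (F - F.min()) / (F.max() - F.min() + 1e-12)
    gray = (norm * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

T = 8
F = fuse_features(np.random.rand(4, T), np.random.rand(2, T), np.random.rand(1, T))
print(F.shape, to_rgb(F).shape)  # (7, 8) (7, 8, 3)
```

With these block alignment factors the summation reduces to stacking, so every column of the fused matrix carries its frame, segment, and global parts in fixed row positions, which is what makes a column-per-timestep image a readable visualization.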
Compared with the prior art, the present invention has the following advantages:
(1) The workflow of intelligent video information processing is visualized, improving researchers' working efficiency;
(2) Researchers are supported in saving various kinds of information, facilitating information exchange between different developers and avoiding unnecessary duplicated work;
(3) A multi-granularity feature visualization method is provided, displaying features of multiple granularities simultaneously;
(4) Video playback, feature display, and result display are synchronized through time control, helping researchers find problems in time and providing clear logic;
(5) The invention provides a secondary-development programming interface, facilitating personalized adjustment by different research teams.
Brief description of the drawings
Fig. 1 is a block diagram of the overall system architecture;
Fig. 2 is a schematic diagram of multi-granularity feature fusion.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings and enumerated embodiments.
As shown in Fig. 1, a comprehensive display system for intelligent processing of video content comprises: a synthesis display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module.
The synthesis display module, feature display module, and recognition result display module are implemented as visualization interfaces; the three interfaces can be presented simultaneously on three flat-panel displays and are synchronized with one another by video playback time.
The modules realize the following functions:
(1) Synthesis display module: a. provides a video playback window capable of smoothly playing the current mainstream video formats without stuttering, with support for frame-by-frame viewing; b. shows the acts and scenes of the video and supports real-time jumping between scenes; c. shows the time-frequency information of the audio in the video, following the playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for showing recognition results such as speech, subtitles, events, and emotion, with the requirement that the associated frames can be located in real time according to a recognition result.
(2) Feature display module: a. shows the acts and scenes of the video and supports real-time jumping between scenes; b. shows the waveform information of the audio in the video, following the playback in real time; c. provides sub-interfaces for showing the features of the extracted audio or video files; features are transmitted through configuration files, and sub-interfaces can be added freely according to the parameters.
(3) Recognition result display module: a. shows recognition results such as speech, subtitles, events, and emotion, arranged in blocks in scene order; b. provides an interactive interface for manually amending recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. recognition result blocks can be saved to a local directory.
(4) Video decoding module: an internal module that parses video files; it can extract every frame in a video file and save specified video clips.
(5) Scene partitioning module: an internal module that receives data from the XML parsing module and the video decoding module and partitions the video into scenes.
(6) XML parsing module: an internal module that parses different types of XML files and transmits the data to different modules according to the parsing results.
(7) Feature fusion module: an internal module that receives data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific way, and sends the result to the feature display module.
The present invention proposes a feature display technology based on multi-granularity fusion. Processing of temporal information is a multi-level, progressively deeper refinement. In human vision, the cells of the visual cortex pre-process the pixel blocks of an image, and the resulting preliminary small-granularity information is sent onward in the cerebral cortex for higher-level information extraction, forming the concept of each object in the picture. Likewise, when a computer is used to process video data, multi-granularity feature extraction should be performed on the audio and video features. The proposed feature display technology based on multiple granularities builds on this idea.
The information exchange between the present invention and external programs is mainly completed through configuration files. In the present invention, the parameters needed by functions such as scene partitioning, speech waveform, feature set, and recognition results are read uniformly from XML files. The invention supports a user-defined organization of parameters, which must nevertheless satisfy the XML specification.
The present invention provides extensible programming: the synthesis display module, feature display module, and recognition result display module each expose a programming interface for secondary development, allowing users to define the display format of features or change the temporal resolution of features; the configuration files of these interfaces likewise follow the XML specification.
The overall workflow of the invention is as follows:
After the video to be processed enters the system, it first passes through the video decoding module, which performs audio-video separation and video decoding. Since video is currently compressed with video compression coding to reduce storage occupancy, a video file must first be decoded before subsequent playback. In this step the digital video file is decoded, the audio stream is extracted from it at the same time, audio-video separation is completed, and the decoded audio stream and video stream are finally output.
The video stream and audio stream extracted in the previous step are fed into the scene partitioning module. After receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to these parameters, and extracts a thumbnail of each scene's key frame; the cut video stream and audio stream are stored on the local disk as cache files for subsequent modules to call.
The synthesis display module reads the cut video stream and audio stream and provides video playback; the control buttons of the playback bar support two playback modes, normal play and frame-by-frame play. The synthesis display module calls the XML parsing module to read external XML files, reading and displaying the tag files and recognition result files.
The feature display module reads the parameters of the video file by parsing the XML file. After the feature parameters are read, the system calls the feature fusion module to process the multi-granularity features. The multi-granularity features and their display method are described in detail below.
The multi-granularity features are shown in Figure 2. Time-series signals are mostly nonlinear, non-stationary random signals and cannot be processed directly with digital signal processing techniques; the signal must therefore be split into windows, which gives rise to the concept of a frame. In this way a sequence of frame features can be obtained over a stretch of the signal, characterizing the signal at the finest granularity. In practice, however, since most classifiers cannot handle variable-length features, all frames obtained from a segment of the signal are analyzed statistically, and the resulting statistics are taken as the feature of that segment. Although the final feature in this procedure is computed from smaller frame units, it reflects the whole segment as a single entity: it cannot capture how the signal varies at different moments, or the different trends across different time intervals. Such a method therefore cannot extract deeper, higher-level information from the signal.
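The windowing idea described above can be sketched briefly: a non-stationary signal is sliced into short overlapping frames, and per-frame statistics give a fixed-length description of each frame. The frame length, hop size, and choice of statistics here are illustrative, not values from the patent.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames (discarding the tail)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]                      # shape: (n_frames, frame_len)

sig = np.sin(np.linspace(0, 40 * np.pi, 4096))
frames = frame_signal(sig)
# Per-frame statistics -> one small feature vector per frame (mean, std here).
frame_feats = np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)
```

Collapsing `frame_feats` further into a single statistic over all frames would yield the segment-level feature the text criticizes: a whole-segment summary that loses the moment-to-moment variation.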
Each dimension must first be aligned along the time axis. We arrange alignment factor matrices A_x, A_s, A_g for the frame feature, segment feature, and global feature matrices respectively. Here, the alignment factor matrices are set to the following block form:

A_x = [I_P; 0; 0],  A_s = [0; I_M; 0],  A_g = [0; 0; I_{P_g}]

where I is the identity matrix, P is the dimension of the frame-granularity features, M is the dimension of the segment-granularity features, and P_g is the dimension of the global features. By multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, we obtain the aligned feature matrix of each granularity.
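The original equation images are not reproduced in this text; one plausible block form, assumed here purely for illustration, stacks an identity block for each granularity into a tall matrix so that each feature type occupies its own rows after alignment:

```python
import numpy as np

# Illustrative dimensions (P: frame, M: segment, Pg: global feature dims).
P, M, Pg = 4, 3, 2
D = P + M + Pg

def alignment_matrix(total_rows, offset, size):
    """A total_rows x size matrix with an identity block at the row offset."""
    A = np.zeros((total_rows, size))
    A[offset:offset + size, :] = np.eye(size)
    return A

Ax = alignment_matrix(D, 0, P)        # frame features land in rows 0..P-1
As = alignment_matrix(D, P, M)        # segment features in the next M rows
Ag = alignment_matrix(D, P + M, Pg)   # global features in the last Pg rows
```

Under this assumption, left-multiplying a feature matrix by its alignment factor simply places it in a disjoint band of rows, so the three granularities can later be combined without overlap.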
After the feature matrices are aligned, the features are fused by summation, so the fused feature matrix F can be computed as:

F = A_x X_{P×T} + A_s S_{M×T} + A_g G_{P_g×T}

where A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices; X_{P×T}, S_{M×T}, G_{P_g×T} are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity. Each column of F is then a fused frame feature containing three parts: the smallest frame-granularity feature obtained in the initial computation, the segment-granularity feature obtained by applying a Gaussian convolution to the frame features, and the global feature of the signal. After the fusion matrix is obtained, we map the feature matrix into an RGB image and send the RGB image to the feature display module, completing the display of the multi-granularity features.
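A minimal numerical sketch of the fusion-by-summation and image-mapping steps, under the same assumed block form for the alignment factors; the dimensions, random features, and min-max normalization are illustrative choices, not taken from the patent:

```python
import numpy as np

P, M, Pg, T = 4, 3, 2, 16
D = P + M + Pg
rng = np.random.default_rng(0)
X = rng.normal(size=(P, T))                    # frame-granularity features
S = rng.normal(size=(M, T))                    # segment features (e.g. smoothed frames)
G = np.tile(rng.normal(size=(Pg, 1)), (1, T))  # global features, repeated over time

# Assumed block-form alignment factors: each granularity gets its own rows.
Ax = np.vstack([np.eye(P), np.zeros((M + Pg, P))])
As = np.vstack([np.zeros((P, M)), np.eye(M), np.zeros((Pg, M))])
Ag = np.vstack([np.zeros((P + M, Pg)), np.eye(Pg)])

F = Ax @ X + As @ S + Ag @ G   # (D, T): each column is one fused frame feature

# Map the fused matrix to an 8-bit image (grayscale replicated to 3 channels).
norm = (F - F.min()) / (F.max() - F.min())
rgb = np.repeat((norm * 255).astype(np.uint8)[:, :, None], 3, axis=2)
```

Because the identity blocks occupy disjoint rows, the summation interleaves rather than mixes the three granularities: rows 0..P-1 of F reproduce X exactly, with S and G below.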
The recognition result display module calls the XML parsing module to read and display the recognition results. Since current research on intelligent video content recognition concentrates mainly on subtitle recognition, speech recognition, video scene recognition, and speech emotion recognition, we display the recognized content in blocks divided by video scene.
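The patent does not give the XML schema for the recognition result file; the sketch below assumes a hypothetical layout with one `<scene>` element per scene, whose children hold the per-type recognition results that the display module groups into blocks.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition-result file: schema invented for illustration.
SAMPLE = """<results>
  <scene id="1" start="0.0" end="12.5">
    <speech>hello there</speech>
    <subtitle>Hello!</subtitle>
    <emotion>neutral</emotion>
  </scene>
</results>"""

def parse_results(xml_text):
    """Group recognition results into one display block per scene."""
    blocks = []
    for scene in ET.fromstring(xml_text).iter("scene"):
        blocks.append({
            "id": scene.get("id"),
            "start": float(scene.get("start")),
            "end": float(scene.get("end")),
            "results": {child.tag: child.text for child in scene},
        })
    return blocks

blocks = parse_results(SAMPLE)
```

Each block carries its scene's time span, which is what would let the display highlight the result corresponding to the current playback position.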
The synthetic display module, feature display module, and recognition result display module are kept synchronized with video playback, which is convenient for researchers.
Those of ordinary skill in the art will understand that the embodiments described here are merely intended to help the reader understand how the invention is implemented, and that the protection scope of the invention is not limited to these specific statements and embodiments. Based on the technical content disclosed by the invention, those of ordinary skill in the art can make various specific variations and combinations without departing from the essence of the invention; such variations and combinations remain within the protection scope of the invention.
Claims (7)
1. A comprehensive display system for intelligent processing of video content, characterized by comprising: a synthetic display module, a feature display module, a recognition result display module, a video decoding module, a scene partitioning module, an XML parsing module, and a feature fusion module;
The functions realized by the modules are as follows:
(1) Synthetic display module: a. provides a video playback window, can smoothly play the mainstream video formats currently on the market without stuttering, and supports frame-by-frame review of the video; b. displays the acts and scenes of the video and supports real-time jumping between scenes; c. displays the time-frequency information of the audio in the video, follows the video playback in real time, and provides an extensible programming interface for secondary development of this sub-interface; d. provides sub-interfaces for displaying recognition results such as speech, subtitles, events, and emotions, and can navigate to the associated frame in real time according to a recognition result;
(2) Feature display module: a. displays the acts and scenes of the video and supports real-time jumping between scenes; b. displays the waveform information of the audio in the video and follows the video playback in real time; c. provides sub-interfaces for displaying the features extracted from the audio or video file; the features are passed in through a configuration file, and sub-interfaces can be freely added according to the parameters;
(3) Recognition result display module: a. displays recognition results such as speech, subtitles, events, and emotions, arranged in blocks in scene order; b. provides an interactive interface for manually correcting recognition results; c. highlights the recognition results corresponding to the audio and subtitles currently being played; d. can save the recognition results, block by block, to a local directory;
(4) Video decoding module: parses the video file, and can extract each frame in the video file and save specified video clips;
(5) Scene partitioning module: receives the data from the XML parsing module and the video decoding module, and partitions the video into scenes;
(6) XML parsing module: parses different types of XML files and transfers the data to different modules according to the parsing results;
(7) Feature fusion module: receives the data from the XML parsing module, fuses the frame features, segment features, and global features of the audio or video in a specific manner, and feeds the result into the feature display module.
2. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the synthetic display module, feature display module, and recognition result display module are implemented as visualization interfaces, and the three interfaces can be presented simultaneously on three displays; the three interfaces are synchronized by video playback time.
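The synchronization in claim 2 can be sketched as a simple observer pattern: the three display modules subscribe to a single playback clock and refresh whenever it advances. The class and method names below are illustrative, not from the patent.

```python
# Minimal sketch of time-based synchronization across the three views.

class PlaybackClock:
    """One shared clock; every attached view follows the same timestamp."""
    def __init__(self):
        self.time = 0.0
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def seek(self, t):
        self.time = t
        for view in self._views:
            view.on_time(t)     # notify each display module of the new time

class View:
    def __init__(self, name):
        self.name, self.shown_at = name, None
    def on_time(self, t):
        self.shown_at = t       # a real view would redraw for timestamp t

clock = PlaybackClock()
views = [View(n) for n in ("synthesis", "feature", "recognition")]
for v in views:
    clock.attach(v)
clock.seek(42.0)
```

Driving scene jumps and recognition-result highlighting through the same `seek` call would keep all three panels consistent with the playback position.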
3. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system is based on a multi-granularity fusion feature display technology; when the video data is processed by computer, multi-granularity feature extraction is also performed on the audio and video features.
4. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system supports user-defined organization of parameters, which must however conform to the XML specification.
5. The comprehensive display system for intelligent processing of video content according to claim 1, characterized in that: the comprehensive display system provides extensible programming; programming interfaces for secondary development are provided for the synthetic display module, feature display module, and recognition result display module, allowing users to define their own feature display formats or change the temporal resolution of features; the interface configuration files likewise follow the XML specification.
6. A working method of the comprehensive display system for intelligent processing of video content according to any one of claims 1 to 5, characterized by comprising the following steps:
Step 1: after the video to be processed enters the system, the video decoding module first performs audio-video separation and video decoding; the digital video file is decoded while the audio stream in the video file is extracted, completing the audio-video separation and finally outputting the decoded audio stream and video stream;
Step 2: the video stream and audio stream extracted in step 1 are fed into the scene partitioning module; after receiving the video stream and audio stream, the scene partitioning module calls the XML parsing module to read the scene partitioning parameters from an external file, cuts the video stream and audio stream according to those parameters, extracts a key-frame thumbnail for each scene, and stores the cut video and audio streams on the local disk as cache files for subsequent modules to use;
Step 3: the synthetic display module reads the cut video and audio streams and provides video playback; through the control buttons of the playback bar, the video can be played in either of two modes, normal playback or frame-by-frame playback; the synthetic display module calls the XML parsing module to read external XML files, loading the tag file and the recognition result file for display;
Step 4: the feature display module reads the parameters of the video file by parsing the XML file; after the feature parameters are read, the system calls the feature fusion module to process and display the multi-granularity features;
Step 5: the recognition result display module calls the XML parsing module to read and display the recognition results; the synthetic display module, feature display module, and recognition result display module are kept synchronized with video playback.
7. The method according to claim 6, characterized in that the multi-granularity feature processing and display method is as follows:

Each dimension is first aligned along the time axis: alignment factor matrices A_x, A_s, A_g are arranged for the frame feature, segment feature, and global feature matrices respectively, in the following block form:

A_x = [I_P; 0; 0],  A_s = [0; I_M; 0],  A_g = [0; 0; I_{P_g}]

where I is the identity matrix, P is the dimension of the frame-granularity features, M is the dimension of the segment-granularity features, and P_g is the dimension of the global features; by multiplying each alignment factor matrix with the feature matrix of the corresponding granularity, the aligned feature matrix of each granularity is obtained;

after the feature matrices are aligned, the features are fused by summation, so the fused feature matrix F can be computed as:

F = A_x X_{P×T} + A_s S_{M×T} + A_g G_{P_g×T}

where A_x, A_s, A_g are the alignment factors of the frame, segment, and global feature matrices; X_{P×T}, S_{M×T}, G_{P_g×T} are the feature matrices corresponding to the frame, segment, and global features; and T is the time granularity; each column of F is then a fused frame feature containing three parts: the smallest frame-granularity feature obtained in the initial computation, the segment-granularity feature obtained by applying a Gaussian convolution to the frame features, and the global feature of the signal; after the fusion matrix is obtained, the feature matrix is mapped into an RGB image and the RGB image is sent to the feature display module, completing the display of the multi-granularity features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910456376.6A CN110191352A (en) | 2019-05-29 | 2019-05-29 | A kind of comprehensive display system towards video content Intelligent treatment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110191352A true CN110191352A (en) | 2019-08-30 |
Family
ID=67718513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910456376.6A Pending CN110191352A (en) | 2019-05-29 | 2019-05-29 | A kind of comprehensive display system towards video content Intelligent treatment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110191352A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112383837A (en) * | 2019-09-29 | 2021-02-19 | 北京城建设计发展集团股份有限公司 | Intelligent integrated broadcast control equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010084739A1 (en) * | 2009-01-23 | 2010-07-29 | 日本電気株式会社 | Video identifier extracting device |
CN102523536A (en) * | 2011-12-15 | 2012-06-27 | 清华大学 | Video semantic visualization method |
CN105205091A (en) * | 2015-06-04 | 2015-12-30 | 浙江大学 | Method for visualizing soundscape information |
Non-Patent Citations (2)
Title |
---|
ZHANG Tian et al.: "Audio-based digital media content analysis and its visualization", Journal of Yanshan University * 
XU Cong: "Research on multi-granularity analysis and processing of time-series signals based on convolutional and long short-term memory neural networks", China Masters' Theses Full-text Database, Medicine and Health Sciences series * 
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021114881A1 (en) | Intelligent commentary generation method, apparatus and device, intelligent commentary playback method, apparatus and device, and computer storage medium | |
CN109729426B (en) | Method and device for generating video cover image | |
CN109257622A (en) | A kind of audio/video processing method, device, equipment and medium | |
CN106789991A (en) | A kind of multi-person interactive method and system based on virtual scene | |
CN104782121A (en) | Multiple region video conference encoding | |
CN107436921B (en) | Video data processing method, device, equipment and storage medium | |
CN104540275B (en) | A kind of method for adjusting live lighting device, equipment and system | |
CN109063506A (en) | Privacy processing method for medical operating teaching system | |
CN106028004A (en) | Multi-signal input and multi-image composition device and method | |
CN207399423U (en) | A kind of distributed network video process apparatus | |
CN107454346B (en) | Movie data analysis method, video production template recommendation method, device and equipment | |
CN104469089A (en) | Multimedia interaction teaching system and teaching method | |
CN110969572A (en) | Face changing model training method, face exchanging device and electronic equipment | |
CN110191352A (en) | A kind of comprehensive display system towards video content Intelligent treatment | |
CN103929640A (en) | Techniques For Managing Video Streaming | |
CN201414197Y (en) | Miscs intelligent monitoring system | |
CN105407364B (en) | Based on channel synthesized competitiveness implementation method under smart television audience ratings system | |
CN205946014U (en) | Synthetic device of many pictures of many signal input | |
CN112995748A (en) | Multi-mode-based automatic bullet screen generation method and system, storage medium and equipment | |
CN102883213B (en) | Subtitle extraction method and device | |
CN106340307A (en) | Method and device used for displaying audio information | |
Li et al. | 3d human skeleton data compression for action recognition | |
CN202172447U (en) | Expandable multi-path input/output matrix server | |
CN108769548A (en) | A kind of decoding video output system and method | |
CN201107858Y (en) | Video matrix mainframe |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190830 ||