CN110493640A - System and method for converting video into PPT based on video processing - Google Patents

System and method for converting video into PPT based on video processing

Info

Publication number
CN110493640A
CN110493640A (application CN201910706271.1A)
Authority
CN
China
Prior art keywords
processing
video
audio
picture
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910706271.1A
Other languages
Chinese (zh)
Inventor
敖欣 (Ao Xin)
朱泓谕 (Zhu Hongtou)
吴永满 (Wu Yongman)
黄鑫杰 (Huang Xinjie)
陈钿 (Chen Dian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan University of Technology
Original Assignee
Dongguan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan University of Technology filed Critical Dongguan University of Technology
Priority to CN201910706271.1A priority Critical patent/CN110493640A/en
Publication of CN110493640A publication Critical patent/CN110493640A/en
Pending legal-status Critical Current

Links

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a system and method for converting video into PPT based on video processing. The system comprises a data storage server, an application processing server and a WEB server, wherein the application processing server has an image processing module, an audio processing module and a document integration module. The data storage server separates the input into an audio stream and a video stream and transfers them to the image processing module and the audio processing module of the application processing server, respectively. The image processing module processes the large volume of image data; the audio processing module processes the audio and converts it into text. The image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.

Description

System and method for converting video into PPT based on video processing
Technical field
The present invention relates to the fields of information technology and educational technology, and in particular to a system and method for converting video into PPT based on video processing.
Background technique
With the growing value of knowledge in modern society, useful information is often fleeting: when a leader holds a meeting, a teacher gives a lecture, or a speaker gives a public talk, the speaker may speak quickly and listeners do not have time to note down so much information. The methods commonly used for this problem are to preserve the information through handwritten notes, by filming the explanation, and so on. Both of the above solutions bring problems. For handwritten notes: the notes are incomplete, so the useful information cannot be preserved accurately, and there is no way to reproduce the scene of the explanation when reviewing the notes. For filming the explanation: the video may be very long, from tens of minutes to several hours, which is no small challenge when reviewing the information, and the recorded speech may be indistinct and exhausting to listen to.
Patent application 201710179528.3 discloses a method and system for precisely matching video, PPT handouts and voice content. In that method, a camera records the teacher's video while screen recording software installed on the computer playing the PPT records the computer video; the teacher video and the computer video are merged and indexed by course name. The video is segmented into several segments according to image changes, segments whose on-screen text is identical are merged, and the time value of each segmentation point is recorded. The voice information in the teacher video is extracted and converted into text, and the time value of each sentence is recorded. With the course name and time values as indexes, data associations are established between video, images, voice and content. Applied in the field of educational technology, this allows a user playing teaching resources online to locate PPT handout content or classroom voice content of interest through a search service, to jump at any time to the instructional video of the relevant period, and to play the PPT handout content of the related pages.
However, the above method only makes a recording against an existing PPT; it cannot convert a video into a PPT. In summary, existing solutions all lead to a huge loss of information during transmission and increase the difficulty of understanding.
Summary of the invention
In view of the shortcomings of the above technology, the present invention provides a system for converting video into PPT based on video processing. Given a recorded video stream containing a PPT explanation, this system can compress the huge amount of video information into a complete set of PPT documents; users can quickly extract the parts they need from the mass of information, and the speaker's speech attached as notes can deepen the user's understanding of the subject and improve the efficiency of study and review.
A further object of the present invention is to provide a system for converting video into PPT based on video processing that is convenient to realize: only a device capable of shooting video is required and operation is simple, since the user only needs to pass the video to the system to obtain a complete PPT document, so the system is suitable for many occasions.
To achieve the above objects, the invention is realized as follows.
A system for converting video into PPT based on video processing, characterized in that the system comprises a data storage server, an application processing server and a WEB server, wherein the application processing server has an image processing module, an audio processing module and a document integration module. The data storage server separates the input into an audio stream and a video stream and transfers them to the image processing module and the audio processing module of the application processing server, respectively. The image processing module processes the large volume of image data; the audio processing module processes the audio and converts it into text. The image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.
A method for converting video into PPT based on video processing: the data storage server separates the input into an audio stream and a video stream, which are then transferred to the image processing module and the audio processing module of the application processing server, respectively. The image processing module processes the image data and converts the video into pictures; the audio processing module processes the audio and converts it into text. The image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.
Further, the method first uses the data storage server to receive the upload request sent by the client. The video stream is first passed to the data storage server for data backup; data processing is performed in the data storage server, which separates the audio stream from the video stream and segments the video into N frame pictures (N is a positive integer greater than or equal to 1). The image data and voice data are then passed to the application processing server, where image processing and speech processing are carried out.
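A minimal sketch of this stream-separation and frame-segmentation step is given below. It assumes ffmpeg and OpenCV are available on the data storage server; the file names (audio.wav, video_only.mp4) and the sampling step are illustrative placeholders, not values taken from the patent.

    import subprocess
    import cv2

    def split_streams(video_path, audio_out="audio.wav", video_out="video_only.mp4"):
        # Extract the audio stream as 16 kHz, 16-bit mono PCM for later speech processing.
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le",
                        "-ar", "16000", "-ac", "1", audio_out], check=True)
        # Keep the video stream without audio.
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-an", "-c:v", "copy", video_out],
                       check=True)
        return audio_out, video_out

    def video_to_frames(video_path, step=30):
        # Segment the video into N frame pictures, keeping one frame every `step` frames.
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                frames.append(frame)
            idx += 1
        cap.release()
        return frames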
Further, in the application processing server: the image processing module is given a method for processing the large volume of image data. Considering that the PPT projection region may be tilted along the x, y and z axes, it needs to be rectified to a fronto-parallel position by a perspective transform. Specifically: each frame image is first converted to a grayscale image and filtered for noise reduction, then converted to a binary picture. The PPT projection region is a rectangular area and is a highlighted region compared to the surrounding environment, so edge detection is performed with the Canny algorithm and contours are extracted; the target contour is chosen by taking the contour with the largest area, the contour is then enclosed by polygon fitting, and the coordinates of the four rectangle corners are obtained by finding the convex hull of the contour. The four corner coordinates are then sorted into top-left, top-right, bottom-right and bottom-left, and finally a transformation matrix is used to transform the coordinates to obtain the final desired image, that is, the projected region, which is cropped out and used as the input to the image processing model.
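A minimal OpenCV sketch of this rectification step follows, under the assumption that the largest bright rectangular contour in the frame is the projection region; the blur kernel, Canny thresholds and output size are illustrative values, not taken from the patent.

    import cv2
    import numpy as np

    def rectify_slide(frame, out_size=(1280, 720)):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)                  # noise reduction
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        edges = cv2.Canny(binary, 50, 150)                        # Canny edge detection
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        target = max(contours, key=cv2.contourArea)               # largest-area contour
        hull = cv2.convexHull(target)
        approx = cv2.approxPolyDP(hull, 0.02 * cv2.arcLength(hull, True), True)
        if len(approx) != 4:
            return None                                           # no clean quadrilateral found
        pts = approx.reshape(4, 2).astype(np.float32)
        # Order the corners: top-left, top-right, bottom-right, bottom-left.
        s, d = pts.sum(axis=1), np.diff(pts, axis=1).ravel()
        ordered = np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                            pts[np.argmax(s)], pts[np.argmax(d)]], dtype=np.float32)
        w, h = out_size
        dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
        M = cv2.getPerspectiveTransform(ordered, dst)
        return cv2.warpPerspective(frame, M, out_size)            # cropped, rectified slide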
Further, in the image processing model, Gaussian filtering and down-sampling (using a Gaussian pyramid) are applied first, then feature points are detected and feature vectors are extracted. For two pictures, the more similar their feature vectors are, the more similar the two pictures are; of a group of similar pictures only one is temporarily retained and the remaining similar pictures are stored in a set, which reduces the subsequent amount of computation. Before recognition, the pictures are finally restored by an up-sampling operation (using a Laplacian pyramid). A trained convolutional neural network model is then used to recognize the text and images in each remaining frame picture, the coordinates of the words and images within the picture are recorded, and finally the elements are recombined by their coordinates onto a new PPT page.
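The patent does not name a specific feature descriptor, so the sketch below illustrates the de-duplication idea with Gaussian-pyramid down-sampling and ORB feature matching; the pyramid depth and match threshold are illustrative assumptions.

    import cv2

    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def downsample(img, levels=2):
        # Gaussian-pyramid down-sampling to reduce the cost of feature extraction.
        for _ in range(levels):
            img = cv2.pyrDown(img)
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    def similar(img_a, img_b, min_matches=40):
        # Detect feature points, extract descriptors and count cross-checked matches.
        _, des_a = orb.detectAndCompute(downsample(img_a), None)
        _, des_b = orb.detectAndCompute(downsample(img_b), None)
        if des_a is None or des_b is None:
            return False
        return len(matcher.match(des_a, des_b)) >= min_matches

    def deduplicate(frames):
        kept, duplicates = [], []
        for frame in frames:
            if any(similar(frame, k) for k in kept):
                duplicates.append(frame)    # stored in the set of similar pictures
            else:
                kept.append(frame)          # only one picture of each similar group is retained
        return kept, duplicates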
Further, the audio processing module is given a procedure for processing the audio, namely: VAD (voice activity detection) is performed first, a GMM model is used to classify speech and environmental noise, and the audio is denoised; the audio is then recognized by a hybrid algorithm based on an artificial neural network and converted into text.
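As one illustration of the VAD step, the sketch below segments the extracted 16 kHz mono audio into speech regions with the webrtcvad library; the patent does not specify a particular VAD implementation, and the frame length and aggressiveness level are illustrative.

    import wave
    import webrtcvad

    def speech_segments(wav_path, frame_ms=30, aggressiveness=2):
        vad = webrtcvad.Vad(aggressiveness)
        with wave.open(wav_path, "rb") as wf:
            rate = wf.getframerate()             # expects 16-bit mono PCM, e.g. 16000 Hz
            frame_bytes = int(rate * frame_ms / 1000) * 2
            pcm = wf.readframes(wf.getnframes())
        segments, start, t = [], None, 0.0
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
            if vad.is_speech(pcm[i:i + frame_bytes], rate):
                if start is None:
                    start = t                    # a speech region begins
            elif start is not None:
                segments.append((start, t))      # (start, end) in seconds
                start = None
            t += frame_ms / 1000
        if start is not None:
            segments.append((start, t))
        return segments

The resulting speech segments would then be passed to the neural-network-based recognizer and converted into timestamped text.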
Further, the document integration module is given an algorithm for matching voice and video, as follows: for each image set obtained in the image processing module, the time interval covered by that image set is calculated, and the text converted from the audio within that time interval is then looked up. In this way the speaker's speech can be matched accurately into the notes section of the corresponding PPT page, and a complete PPT document is exported to the WEB server.
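A minimal sketch of this matching and export step, assuming python-pptx for the output document; here `slides` is a list of (slide image path, time interval) pairs from the image processing module and `transcript` is a list of (start, end, text) triples from the audio module, all of which are illustrative names rather than identifiers from the patent.

    from pptx import Presentation
    from pptx.util import Inches

    def build_ppt(slides, transcript, out_path="output.pptx"):
        prs = Presentation()
        blank_layout = prs.slide_layouts[6]                    # blank slide layout
        for image_path, (t_start, t_end) in slides:
            slide = prs.slides.add_slide(blank_layout)
            slide.shapes.add_picture(image_path, Inches(0), Inches(0),
                                     width=prs.slide_width, height=prs.slide_height)
            # Gather the speech text whose time labels fall inside this slide's interval.
            notes = " ".join(text for start, end, text in transcript
                             if t_start <= start and end <= t_end)
            slide.notes_slide.notes_text_frame.text = notes    # speaker's speech as slide notes
        prs.save(out_path)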
Further, in the image processing module, a time label is added to each converted picture; likewise, a time label is attached to the text converted from the audio, so that the pictures can be matched with the text converted from the audio.
The video stream processing method realized by the present invention: on the image side, the projection information in the classroom or meeting can be captured effectively by target localization; on the premise of denoising the video, a continuous frame difference method is used to capture the frame images in which obvious frame changes occur, which avoids the motion blur, loss of detail and ghosting produced between frames and improves the image quality of the video; this greatly improves the conversion efficiency and clarity of the video.
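A simple sketch of the continuous frame difference idea: after light denoising, a frame is kept as a candidate slide image whenever its pixel-level difference from the previous frame exceeds a threshold (the threshold value below is illustrative).

    import cv2

    def keyframes_by_frame_difference(video_path, diff_threshold=12.0):
        cap = cv2.VideoCapture(video_path)
        keyframes, prev = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            gray = cv2.GaussianBlur(gray, (5, 5), 0)           # denoise before differencing
            if prev is None or cv2.absdiff(gray, prev).mean() > diff_threshold:
                keyframes.append(frame)                        # an obvious frame change occurred
            prev = gray
        cap.release()
        return keyframes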
Compared with the prior art, the beneficial effects of the present invention are as follows:
At present, information in meetings and classrooms is mostly recorded by handwritten notes or by directly recording video. These methods are not only inconvenient but also lead to the loss of precious information. The present invention proposes an effective solution: the user only needs to pass a video containing a PPT explanation to the system, and the system uses the processing of each technical module to extract the PPT from the video and attach to it the text converted from the speaker's speech.
In structure, products on the market mostly accept input from a mobile phone terminal, whereas the present invention supports multi-platform input: any platform that can connect to the Internet can be used. The core technology is also more diversified: it is divided into different logical processing servers, realizing multi-module processing and avoiding entanglement in the logic.
Detailed description of the invention
Fig. 1 is a structural schematic diagram of the system realized by the present invention.
Specific embodiment
In order to describe the present invention more clearly, it is further described below with reference to the accompanying drawings.
Please refer to Fig. 1, a structural schematic diagram of the system realized by the present invention. The system for converting video into PPT based on video processing realized by the present invention comprises a data storage server, an application processing server and a WEB server, wherein the application processing server has an image processing module, an audio processing module and a document integration module. The data storage server separates the input into an audio stream and a video stream and transfers them to the image processing module and the audio processing module of the application processing server, respectively. The image processing module processes the large volume of image data; the audio processing module processes the audio and converts it into text. The image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.
Accordingly, the method for converting video into PPT realized by the present invention is as follows:
The S1 server (the data storage server) receives the upload request sent by the client. The video stream is first passed to S1 for data backup, and data processing is performed in S1: the audio stream and video stream are separated, and the video is segmented into N frame pictures (N is a positive integer greater than or equal to 1). The image data and voice data are then passed to the S2 application processing server, where image processing and speech processing are carried out.
The processing in S2 is as follows. S2.1: the image processing module is given a method for processing the large volume of image data. Considering that the PPT projection region may be tilted along the x, y and z axes, it needs to be rectified to a fronto-parallel position by a perspective transform. Each frame image is first converted to a grayscale image and filtered for noise reduction, then converted to a binary picture. The PPT projection region is a rectangular area and is a highlighted region compared to the surrounding environment, so edge detection is performed with the Canny algorithm and contours are extracted; the target contour is chosen by taking the contour with the largest area, the contour is then enclosed by polygon fitting, and the coordinates of the four rectangle corners are obtained by finding the convex hull of the contour. The four corner coordinates are sorted into top-left, top-right, bottom-right and bottom-left, and finally a transformation matrix is used to transform the coordinates to obtain the final desired image, that is, the projected region, which is cropped out and used as the input to the image processing model.
In the image processing model, Gaussian filtering and down-sampling (using a Gaussian pyramid) are applied first, then feature points are detected and feature vectors are extracted; for two pictures, the more similar their feature vectors are, the more similar the two pictures are; of a group of similar pictures only one is temporarily retained and the remaining similar pictures are stored in a set, which reduces the subsequent amount of computation; before recognition, the pictures are restored by an up-sampling operation (using a Laplacian pyramid). A trained convolutional neural network model is then used to recognize the text and images in each remaining frame picture, the coordinates of the words and images within the picture are recorded, and finally the elements are recombined by their coordinates onto a new PPT page. S2.2: the audio processing module is given a procedure for processing the audio: VAD detection is performed first, a GMM model is used to classify speech and environmental noise, the audio is denoised, and the audio is then recognized by a hybrid algorithm based on an artificial neural network and converted into text. S2.3: the document integration module is given an algorithm for matching voice and video, as follows: for each image set obtained in S2.1, the time interval covered by that image set is calculated, and the text converted from the audio within that time interval is looked up; in this way the speaker's speech can be matched accurately into the notes section of the corresponding PPT page, and a complete PPT document is exported to the WEB server S3.
In short, the invention has the following advantages:
It provides a solution to the difficulty of taking notes and reviewing material in modern society: from a recorded video stream containing a PPT explanation, this system can compress the huge amount of video information into a complete set of PPT documents; users can quickly extract the parts they need from the mass of information, and the speaker's speech attached as notes can deepen the user's understanding of the subject and improve the efficiency of study and review.
The present invention is simple to operate: the video only needs to be passed to the system to be converted into a complete PPT document.
The present invention has strong applicability and is suitable for many occasions, with no restriction on location; only a device capable of shooting video is needed.
The present invention can preliminarily solve the problem of recognizing a single-page PPT template.
Disclosed above are only several specific embodiments of the present invention, but the present invention is not limited thereto; any variation conceivable by a person skilled in the art shall fall within the protection scope of the present invention.

Claims (8)

1. A system for converting video into PPT based on video processing, characterized in that the system comprises a data storage server, an application processing server and a WEB server, wherein the application processing server has an image processing module, an audio processing module and a document integration module; the data storage server separates the input into an audio stream and a video stream and transfers them to the image processing module and the audio processing module of the application processing server, respectively; the image processing module processes the large volume of image data, and the audio processing module processes the audio and converts it into text; the image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.
2. A method for converting video into PPT based on video processing, wherein the data storage server separates the input into an audio stream and a video stream, which are transferred to the image processing module and the audio processing module of the application processing server, respectively; the image processing module processes the image data and converts the video into pictures, and the audio processing module processes the audio and converts it into text; the image processing module and the audio processing module then output the picture and text data to the document integration module, which matches pictures with text and exports a complete PPT document to the WEB server.
3. The method according to claim 2, characterized in that the method first uses the data storage server to receive the upload request sent by the client; the video stream is first passed to the data storage server for data backup; data processing is performed in the data storage server, which separates the audio stream from the video stream and segments the video into N frame pictures, N being a positive integer greater than or equal to 1; the image data and voice data are then passed to the application processing server, where image processing and speech processing are carried out.
4. The method according to claim 3, characterized in that, in the application processing server: each frame image is first converted to a grayscale image and filtered for noise reduction, then converted to a binary picture; the PPT projection region is a rectangular area and is a highlighted region compared to the surrounding environment; edge detection is then performed with the Canny algorithm and contours are extracted; the target contour is chosen by taking the contour with the largest area; the contour is then enclosed by polygon fitting, and the coordinates of the four rectangle corners are obtained by finding the convex hull of the contour; the four corner coordinates are sorted into top-left, top-right, bottom-right and bottom-left; finally a transformation matrix is used to transform the coordinates to obtain the final desired image, that is, the projected region, which is cropped out and used as the input to the image processing model.
5. The method according to claim 4, characterized in that, in the image processing model, Gaussian filtering and down-sampling are applied first, then feature points are detected and feature vectors are extracted; for two pictures, the more similar their feature vectors are, the more similar the two pictures are; of a group of similar pictures only one is temporarily retained and the remaining similar pictures are stored in a set, reducing the subsequent amount of computation; before recognition, the pictures are restored using a Laplacian pyramid; a trained convolutional neural network model is then used to recognize the text and images in each remaining frame picture, the coordinates of the words and images within the picture are recorded, and finally the elements are recombined by their coordinates onto a new PPT page.
6. The method according to claim 2, characterized in that, in the audio processing module: VAD detection is performed first, a GMM model is used to classify speech and environmental noise, the audio is denoised, and the audio is then recognized by a hybrid algorithm based on an artificial neural network and converted into text.
7. The method according to claim 2, characterized in that, in the document integration module: for each image set obtained in the image processing module, the time interval covered by that image set is calculated, and the text converted from the audio within that time interval is then looked up; in this way the speaker's speech is matched accurately into the notes section of the corresponding PPT page, and a complete PPT document is exported to the WEB server.
8. The method according to claim 2, characterized in that, in the image processing module, a time label is added to each converted picture; likewise, a time label is attached to the text converted from the audio, so that the pictures can be matched with the text converted from the audio.
CN201910706271.1A 2019-08-01 2019-08-01 System and method for converting video into PPT based on video processing Pending CN110493640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910706271.1A CN110493640A (en) 2019-08-01 2019-08-01 System and method for converting video into PPT based on video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706271.1A CN110493640A (en) 2019-08-01 2019-08-01 System and method for converting video into PPT based on video processing

Publications (1)

Publication Number Publication Date
CN110493640A true CN110493640A (en) 2019-11-22

Family

ID=68549096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706271.1A Pending CN110493640A (en) 2019-08-01 2019-08-01 System and method for converting video into PPT based on video processing

Country Status (1)

Country Link
CN (1) CN110493640A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741359A (en) * 2020-05-28 2020-10-02 杨伟 Method and system for converting video into PPTX
CN112203036A (en) * 2020-09-14 2021-01-08 北京神州泰岳智能数据技术有限公司 Method and device for generating text document based on video content
WO2021114824A1 (en) * 2020-06-28 2021-06-17 平安科技(深圳)有限公司 Presentation generation method, apparatus, and device, and medium
CN113779345A (en) * 2021-09-06 2021-12-10 北京量子之歌科技有限公司 Teaching material generation method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030088546A (en) * 2002-05-11 2003-11-20 김대성 Multi-media system of making VOD (Video On Demand) using Word-processor(Powerpoint and etc.) with Voice and Music, which can be shown through internet
CN102799859A (en) * 2012-06-20 2012-11-28 北京交通大学 Method for identifying traffic sign
CN109309790A (en) * 2018-11-02 2019-02-05 长春市长光芯忆科技有限公司 A kind of meeting lantern slide intelligent recording method and system
CN109492206A (en) * 2018-10-10 2019-03-19 深圳市容会科技有限公司 PPT presentation file method for recording, device, computer equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030088546A (en) * 2002-05-11 2003-11-20 김대성 Multi-media system of making VOD (Video On Demand) using Word-processor(Powerpoint and etc.) with Voice and Music, which can be shown through internet
CN102799859A (en) * 2012-06-20 2012-11-28 北京交通大学 Method for identifying traffic sign
CN109492206A (en) * 2018-10-10 2019-03-19 深圳市容会科技有限公司 PPT presentation file method for recording, device, computer equipment and storage medium
CN109309790A (en) * 2018-11-02 2019-02-05 长春市长光芯忆科技有限公司 A kind of meeting lantern slide intelligent recording method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741359A (en) * 2020-05-28 2020-10-02 杨伟 Method and system for converting video into PPTX
WO2021114824A1 (en) * 2020-06-28 2021-06-17 平安科技(深圳)有限公司 Presentation generation method, apparatus, and device, and medium
CN112203036A (en) * 2020-09-14 2021-01-08 北京神州泰岳智能数据技术有限公司 Method and device for generating text document based on video content
CN112203036B (en) * 2020-09-14 2023-05-26 北京神州泰岳智能数据技术有限公司 Method and device for generating text document based on video content
CN113779345A (en) * 2021-09-06 2021-12-10 北京量子之歌科技有限公司 Teaching material generation method and device, computer equipment and storage medium
CN113779345B (en) * 2021-09-06 2024-04-16 北京量子之歌科技有限公司 Teaching material generation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110493640A (en) System and method for converting video into PPT based on video processing
CN101271525B (en) Fast image sequence characteristic remarkable picture capturing method
CN112465008B (en) Voice and visual relevance enhancement method based on self-supervision course learning
CN111539370A (en) Image pedestrian re-identification method and system based on multi-attention joint learning
CN109117777A (en) The method and apparatus for generating information
EP2246807A1 (en) Information processing apparatus and method, and program
CN112261477B (en) Video processing method and device, training method and storage medium
CN112183468A (en) Pedestrian re-identification method based on multi-attention combined multi-level features
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN109165573A (en) Method and apparatus for extracting video feature vector
CN108922559A (en) Recording terminal clustering method based on voice time-frequency conversion feature and integral linear programming
CN108921038A (en) A kind of classroom based on deep learning face recognition technology is quickly called the roll method of registering
CN109977832A (en) A kind of image processing method, device and storage medium
CN115564993A (en) Lip print image classification algorithm based on multi-scale feature fusion and attention mechanism
CN111950487A (en) Intelligent teaching analysis management system
CN108363771B (en) Image retrieval method for public security investigation application
CN114519880A (en) Active speaker identification method based on cross-modal self-supervision learning
CN116229319A (en) Multi-scale feature fusion class behavior detection method and system
CN112183450A (en) Multi-target tracking method
Cheng et al. The dku audio-visual wake word spotting system for the 2021 misp challenge
WO2022205329A1 (en) Object detection method, object detection apparatus, and object detection system
Huang et al. DS-UNet: A dual streams UNet for refined image forgery localization
CN114266952A (en) Real-time semantic segmentation method based on deep supervision
CN109522865A (en) A kind of characteristic weighing fusion face identification method based on deep neural network
CN110689066B (en) Training method combining face recognition data equalization and enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Ao Xin

Inventor after: Zhu Hongtou

Inventor after: Wu Yongman

Inventor after: Huang Xinjie

Inventor after: Chen Dian

Inventor after: Ye Yongfu

Inventor before: Ao Xin

Inventor before: Zhu Hongtou

Inventor before: Wu Yongman

Inventor before: Huang Xinjie

Inventor before: Chen Dian

CB03 Change of inventor or designer information
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122

RJ01 Rejection of invention patent application after publication