CN116910302A - Multi-modal video content effectiveness feedback visual analysis method and system - Google Patents

Multi-modal video content effectiveness feedback visual analysis method and system

Info

Publication number
CN116910302A
Authority
CN
China
Prior art keywords: video, validity, effectiveness, feedback, analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310976858.0A
Other languages
Chinese (zh)
Inventor
马翠霞
黄泽远
贺强
邓小明
王宏安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202310976858.0A
Publication of CN116910302A
Legal status: Pending

Classifications

    • G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing
    • G06F16/00 Information retrieval; database structures therefor; file system structures therefor; G06F16/70 of video data
    • G06F16/735 Querying; filtering based on additional data, e.g. user or group profiles
    • G06F16/7834 Retrieval characterised by using metadata automatically derived from the content, using audio features
    • G06F16/784 Retrieval using objects detected or recognised in the video content, the detected or recognised objects being people
    • G06F16/7844 Retrieval using original textual content or text extracted from visual content or transcript of audio data

Abstract

The invention relates to a multi-modal video content effectiveness feedback visual analysis method and system. The method comprises the following steps: collecting a specific type of video together with labels of its objective effectiveness indexes; quantitatively extracting multi-modal data features of the content of interest in the video, determining effectiveness factors in combination with the actual demands of the application domain, and calculating the effectiveness factor values of different contents; analyzing the correlation between the effectiveness factors and the objective effectiveness indexes, and extracting the effectiveness feedback result of the video to be analyzed; generating reference video recommendations from the data of the video to be analyzed; and displaying the effectiveness feedback result of the video to be analyzed and its multi-modal data context in different visual forms, so that the user can explore the effectiveness feedback result hierarchically. The invention provides a full-pipeline solution for video content effectiveness feedback visual analysis, which better supports users in understanding effectiveness feedback results and exploring them in a targeted manner.

Description

Multi-modal video content effectiveness feedback visual analysis method and system
Technical Field
The invention belongs to the technical field of information technology and visualization, and particularly relates to a multi-modal video content effectiveness feedback visual analysis method and system.
Background
In recent years, the volume of multimedia video has grown rapidly. Video carries a large amount of information in multiple modalities such as images, audio, and text; it embodies the information-transmission and thought-expression intent of video authors, and the way different modalities express information in the video content is closely related to the expression effect. Public speaking is a common form of expression in everyday life: a speaker's facial expressions, gestures, intonation, and other delivery techniques play an important role in conveying the content, producing different experiences of understanding and resonance in the audience and thereby achieving a better presentation effect. Presentations are usually recorded as video, covering occasions such as rehearsals and formal deliveries, which supports subsequent analysis and dissemination. Like other video authors, a presenter needs feedback on the effectiveness of a presentation: targeted feedback on a specific presentation, suggestions for improvement, reference examples to learn from, and so on.
Currently, feedback on the effectiveness of video content is usually produced by human coaches, which depends on individual experience and incurs high labor cost. In recent years, related work has attempted automated analysis. Some presentation-training software supports the extraction and analysis of data such as voice, but most of these analyses are relatively basic and cannot comprehensively account for the various presentation techniques involved. The method and system for video content effectiveness visual analysis based on multi-modal emotion disclosed in Chinese patent application CN113743271A mainly relies on multi-modal emotion information and does not cover other aspects of video content; it mainly searches an existing video database and analyzes effectiveness patterns, so it can neither produce effectiveness feedback for a specific video nor provide reference objects for potential adjustment.
Disclosure of Invention
The invention aims to provide a multi-modal video content effectiveness feedback visual analysis method and system.
Video content effectiveness in the invention refers to the association between the multi-modal content in a video and its expression effect; the effectiveness evaluation scheme is determined in combination with the actual application domain, including but not limited to the relation between presentation techniques and presentation performance in lecture videos, the relation between teaching style and course effect in teaching videos, and the relation between the way entertainment content is presented and audience experience in entertainment videos. Taking lecture videos as an example, the invention quantifies the presentation techniques used in a lecture, introduces effectiveness analysis of lecture videos, helps experts, beginners, commentators, and others obtain effectiveness feedback and presentation context for a specific lecture video, and recommends other lecture segments according to configurable rules for the user's reference and analysis.
The technical scheme adopted by the invention is as follows:
A multi-modal video content effectiveness feedback visual analysis method comprises the following steps:
collecting a specific type of video and labels of its objective effectiveness indexes;
quantitatively extracting multi-modal data features of the content of interest in the video;
on the basis of the extracted multi-modal data features, determining effectiveness factors according to the actual demands of the application domain, and calculating the effectiveness factor values of different contents;
analyzing the correlation between the effectiveness factors and the objective effectiveness indexes to obtain correlation results for the effectiveness factors;
extracting the effectiveness feedback result of the video to be analyzed by using the correlation between the effectiveness factors and the objective effectiveness indexes;
generating recommended video results from the data of the video to be analyzed, for the user's reference;
and displaying the effectiveness feedback result of the video to be analyzed and its multi-modal data context in different visual forms, so that the user can explore the effectiveness feedback result hierarchically.
Further, the specific type of video includes lecture videos, teaching videos, sales videos, entertainment videos, and the like, and the labels of the objective effectiveness indexes include play count, ranking, score, transaction volume, and the like.
Further, the multi-modal data sources include video, images, sound, text, and the like, and the multi-modal data features include the facial expressions, body movements, eye gaze, position, voice intonation, and rhythmic pauses of the person in the video, as well as the background, hue, and background sound of the video picture.
Further, determining the effectiveness factors according to the actual demands of the application domain includes: according to the theory and demands of the domain corresponding to the specific type of video, establishing factors that influence effectiveness in that domain; these factors correspond to domain-specific skills and methods and influence the performance effect in that domain.
Further, the effectiveness factors include at least one of the following: emotion proportion, average emotion level, degree of emotion change, emotion diversity, movement amplitude, movement diversity, gaze range, gaze change speed, position change amplitude, position change speed, pitch change amplitude, speech rate, amount of pauses, background type, and hue brightness.
Further, analyzing the correlation between the effectiveness factors and the objective effectiveness indexes includes: establishing the association between the effectiveness factors and the objective effectiveness indexes, such as analyzing the sign and degree of the correlation between them.
Further, extracting the effectiveness feedback result of the video to be analyzed by using the correlation between the effectiveness factors and the objective effectiveness indexes includes: extracting the multi-modal data features of the video to be analyzed, calculating the effectiveness factor values, and predicting the effectiveness feedback result of the video to be analyzed according to the correlation between the effectiveness factors and the objective indexes.
Further, recommended video results are generated from the data of the video to be analyzed, where the data of the video to be analyzed includes, but is not limited to, multi-modal data features, effectiveness factor values, and effectiveness feedback results; the recommendation method includes, but is not limited to, similarity retrieval from a video database and recommendation based on whole-video and partial features; and the granularity of the recommended objects includes, but is not limited to, whole videos and video segments.
Further, the hierarchical exploration of effectiveness feedback results supports the following whole-to-part joint analysis and expression functions: an effectiveness factor feedback function, a video context understanding function, a time-interval distribution understanding function, and a data summary and similarity recommendation function.
A multi-modal video content effectiveness feedback visual analysis system comprises:
a data collection module, responsible for collecting a specific type of video and labels of its objective effectiveness indexes;
a data feature extraction module, responsible for quantitatively extracting multi-modal data features of the content of interest in the video;
an effectiveness factor calculation module, responsible for determining effectiveness factors based on the multi-modal data features in combination with the actual demands of the application domain, and calculating the values of the different effectiveness factors;
an effectiveness analysis and prediction module, responsible for establishing the association between the effectiveness factors and the objective effectiveness indexes, and using the association results obtained by analysis to predict the effectiveness of the video to be analyzed;
a reference video recommendation module, responsible for recommending videos available for reference from a database according to the video to be analyzed, using specified video content and related parameters (the recommended videos may be similar to or different from the video to be analyzed, as determined by different demands);
a visual analysis module, responsible for integrating the functions and data of the above modules, displaying the data and results generated by each module in different visual forms, and presenting them in a complete interface, so that the user can understand the effectiveness feedback result of the video to be analyzed through the interface, with further exploration supported.
Through the visual analysis method and system provided by the invention, a user can understand the effectiveness feedback results for specific video content and find the specific aspects that can be improved; understand the temporal distribution of factor effectiveness and locate positions in the video where improvement is possible; understand the effectiveness factors in combination with the multi-modal contexts in the video for a deeper understanding of the performance; obtain reference video instances for adjustment and improvement; and summarize the multi-modal effectiveness factors in the video for quick understanding and comparison.
Compared with the prior art, the invention has the following advantages and positive effects:
1. The invention provides a processing and analysis pipeline for effectiveness feedback on multi-modal content in video, offering a full-pipeline solution for video content effectiveness feedback visual analysis. Compared with the prior art, it better supports users in understanding effectiveness feedback results and exploring them in a targeted manner.
2. The invention provides an interactive visual analysis system for displaying, recommending, analyzing, and exploring the multi-modal contents of a user's video. It allows the user to quickly grasp the feedback for different effectiveness factors in the video, supports detailed analysis in the context of the video, helps the user quickly find reference video samples through recommendation, and supports targeted fine-grained exploration of video samples of interest, so as to understand possible improvements to the video being analyzed.
3. Based on feedback about video content effectiveness, the invention provides a multi-modal video content effectiveness feedback visual analysis method and system that can be used to analyze effectiveness feedback and possible improvements of content expression in a video. Video content effectiveness feedback is analyzed by means of visualization: the effectiveness feedback, the highlighted video content, the effectiveness time slices, and the recommended videos are displayed through the visualization system. This has the advantage of forming video content effectiveness feedback and supporting the user in forming insights about improvement, assisting the user in quick understanding and deep insight through intuitive and effective visualization and interaction. Multi-modal video content effectiveness feedback visual analysis is therefore treated in the invention as the primary form of video analysis and is not limited to a particular domain or a particular visualization method.
Drawings
FIG. 1 is a layout diagram of the overall flow of the method of the present invention and of the multi-modal video content effectiveness feedback visual analysis system.
FIG. 2 is a diagram of a multi-modal video content effectiveness feedback visual analysis system interface in accordance with one embodiment of the invention.
Detailed Description
In order to better understand the present invention, the multi-modal video content effectiveness feedback visual analysis method and system provided by the present invention are described in further detail below with reference to the accompanying drawings, but not by way of limitation.
The invention mainly comprises the following (the description below is directed to the lecture domain; the invention can also be applied to other video types such as teaching videos and entertainment videos):
1. Multi-modal data acquisition and processing pipeline
The multi-modal data acquisition and processing pipeline, carried out for a specific domain, mainly comprises the following steps: 1) data collection, 2) data feature extraction, 3) effectiveness factor calculation, 4) effectiveness analysis and prediction, 5) reference video recommendation, and 6) visual result generation. The multi-modal data covers modalities such as images, sound, and text. As shown in Fig. 1, a lecture video is taken as the running example below.
1) Data collection: videos of the world public speaking competition and their related descriptive information (i.e., the labels of the objective effectiveness indexes) published on YouTube and other public platforms are crawled by a web crawler. The competition is divided into levels such as final, semifinal, district, division, area, and club, which serve as the measure of lecture effectiveness: the higher the competition level reached, the higher the level of the speaker and the more effective the lecture. To ensure the effect of the correlation analysis, the number of lecture videos at each level should be approximately equal. In addition to the level information, information such as the speaker's name, region, topic, and duration is collected and displayed in the visual system as needed.
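For illustration only, the collection step can be sketched in Python as follows. The patent specifies only a generic web crawler, so the use of the yt-dlp library, the output paths, and the per-list level label are assumptions of this sketch, not part of the claimed method.

import yt_dlp  # assumed downloader; the patent only requires a generic web crawler

def collect_videos(urls, level_label):
    """Download competition videos and keep their descriptive metadata."""
    opts = {
        "format": "mp4",
        "outtmpl": "videos/%(id)s.%(ext)s",
        "writeinfojson": True,  # also saves title, uploader, duration, etc.
    }
    records = []
    with yt_dlp.YoutubeDL(opts) as ydl:
        for url in urls:
            info = ydl.extract_info(url, download=True)
            records.append({
                "id": info["id"],
                "title": info["title"],
                "duration": info.get("duration"),
                "level": level_label,  # hypothetical coding: 6=final ... 1=club
            })
    return records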
2) Data feature extraction: in order to obtain multi-modal emotion data from a video, image frames, speech audio, and speech text need to be extracted from the video, and all modalities are aligned with the text timestamps. The feature extraction algorithms and tools used in the invention are described below per modality:
a. Facial expression: face localization and face recognition are performed on the image frames, and the face images of all speakers appearing in the video are grouped using DBSCAN (ref: M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pp. 226-231. AAAI Press, 1996). The continuous arousal and valence data of the face are then extracted using AffectNet (ref: A. Mollahosseini, B. Hasani, and M. H. Mahoor. AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput., 10(1):18-31, Jan. 2019. doi: 10.1109/TAFFC.2017.2740923) via an open-source implementation (ref: O. Arriaga, M. Valdenegro-Toro, and P. Plöger. Real-time convolutional neural networks for emotion and gender classification. arXiv preprint arXiv:1710.07557, 2017).
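A minimal sketch of the DBSCAN grouping step, assuming face embeddings have already been computed by some face-recognition encoder; the embedding source, the eps and min_samples values, and the rule of taking the largest cluster as the main speaker are assumptions of this sketch:

import numpy as np
from sklearn.cluster import DBSCAN

def main_speaker_frames(face_embeddings: np.ndarray, eps: float = 0.5) -> np.ndarray:
    """Cluster per-frame face embeddings with DBSCAN and return the frame
    indices of the largest cluster, taken here as the main speaker."""
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(face_embeddings)
    valid = labels[labels >= 0]          # -1 marks DBSCAN noise points
    if valid.size == 0:
        return np.array([], dtype=int)
    speaker = np.bincount(valid).argmax()
    return np.flatnonzero(labels == speaker)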
b. Eye gaze: the gaze directions of both eyes are estimated using the OpenFace toolkit (refs: T. Baltrusaitis, A. Zadeh, Y. C. Lim, and L.-P. Morency. OpenFace 2.0: Facial behavior analysis toolkit. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 59-66, 2018; E. Wood, T. Baltrusaitis, X. Zhang, Y. Sugano, P. Robinson, and A. Bulling. Rendering of eyes for eye-shape registration and gaze estimation. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3756-3764, 2015). The angle at which the speaker looks at the camera is defined as the angle between the direction from the eyes' coordinate position toward the camera and the gaze direction of the eyes.
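The angle definition above can be computed directly; a sketch assuming OpenFace supplies the eye position in the camera coordinate frame (camera at the origin) and a unit gaze vector:

import numpy as np

def gaze_to_camera_angle(eye_pos: np.ndarray, gaze_dir: np.ndarray) -> float:
    """Angle (degrees) between the eye-to-camera direction and the gaze direction."""
    to_camera = -eye_pos / np.linalg.norm(eye_pos)  # from the eyes toward the camera
    gaze = gaze_dir / np.linalg.norm(gaze_dir)
    cos_angle = np.clip(np.dot(to_camera, gaze), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))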
c. Body posture: the human pose skeleton is estimated with the MMPose toolkit (ref: MMPose Contributors. OpenMMLab pose estimation toolbox and benchmark. https://github.com/open-mmlab/mmpose, 2022), and the speaker's skeleton is filtered out by rules. The skeleton energy of the speaker over a lecture interval (ref: R. Niewiadomski, M. Mancini, and S. Piana. Human and virtual agent expressive gesture quality analysis and synthesis. Coverbal Synchrony in Human-Machine Interaction, pp. 269-292, 2013) and the pose diversity of the interval are further calculated. The latter is obtained by computing the cosine distances between all aligned and normalized pose skeletons and the pose of the first frame of the interval, and then taking the standard deviation of the distance matrix.
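The pose-diversity computation described above (cosine distances to the first frame's pose, then the standard deviation) can be sketched as follows; the flattened (T, joints*2) skeleton layout is an assumption:

import numpy as np
from scipy.spatial.distance import cosine

def pose_diversity(skeletons: np.ndarray) -> float:
    """skeletons: (T, J*2) aligned, normalized pose skeletons of one interval.
    Returns the standard deviation of the cosine distances to the first frame."""
    reference = skeletons[0]
    distances = np.array([cosine(reference, s) for s in skeletons])
    return float(np.std(distances))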
d. Stage use: the position of the speaker's head is estimated by the OpenFace toolkit to obtain the distance from the camera. For an online lecture, the speaker's position is defined by the center of the bounding box on the screen; for an offline lecture, the speaker's actual position is computed relative to the camera.
e. Volume and pitch: the loudness of the speaker's speech is computed as the volume value and the fundamental frequency as the pitch value via the Praat toolbox (ref: P. Boersma and D. Weenink. Praat: doing phonetics by computer [Computer program]. Version 6.1.38, retrieved 2 January 2021 from http://www.praat.org/, 2021).
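The patent names only the Praat toolbox; calling it through the parselmouth Python binding, as below, is an assumption of this sketch:

import parselmouth  # Python binding to Praat (assumed; the patent cites Praat itself)

def volume_and_pitch(wav_path: str):
    """Loudness contour (dB) as the volume value and F0 contour (Hz) as the pitch value."""
    sound = parselmouth.Sound(wav_path)
    intensity = sound.to_intensity()              # loudness over time
    pitch = sound.to_pitch()                      # fundamental frequency over time
    f0 = pitch.selected_array["frequency"]        # 0 where the frame is unvoiced
    return intensity.values.flatten(), f0[f0 > 0]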
f. Speech rate and pauses: pauses comprise inter-word and inter-sentence interval times, and the speech-rate value is obtained by computing the duration of each word's syllables. Word syllable estimation can be done with the NLTK language toolkit (ref: S. Bird, E. Klein, and E. Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009).
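A sketch of the syllable, speech-rate, and pause computations using NLTK's CMU pronouncing dictionary; word-level timestamps are assumed to come from the speech-to-text step, and the vowel-count fallback for out-of-dictionary words is an assumption:

import nltk
from nltk.corpus import cmudict

nltk.download("cmudict", quiet=True)
PRONUNCIATIONS = cmudict.dict()

def syllable_count(word: str) -> int:
    """Count syllables as the stress-marked phones in the CMU dictionary entry."""
    prons = PRONUNCIATIONS.get(word.lower())
    if not prons:
        return max(1, sum(ch in "aeiouy" for ch in word.lower()))  # crude fallback
    return sum(phone[-1].isdigit() for phone in prons[0])

def speech_rate(words, starts, ends):
    """Syllables per second for each word, from word-level timestamps."""
    return [syllable_count(w) / max(e - s, 1e-3) for w, s, e in zip(words, starts, ends)]

def pauses(starts, ends):
    """Inter-word pause durations: gap between one word's end and the next word's start."""
    return [max(s2 - e1, 0.0) for e1, s2 in zip(ends[:-1], starts[1:])]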
g. Text content: the audio portion of the video is converted to text using the speech-to-text service provided by Microsoft Azure (ref: https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/), which yields the text, its sentences, and the corresponding timestamps.
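A minimal sketch of the transcription call using the Azure Speech SDK for Python; the key and region handling is an assumption, and recognize_once() covers only a short utterance, so a full lecture would use the SDK's continuous-recognition mode instead:

import azure.cognitiveservices.speech as speechsdk

def transcribe_clip(wav_path: str, key: str, region: str) -> str:
    """Convert a short audio clip to text with word-level timestamps enabled."""
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.request_word_level_timestamps()  # timestamps for aligning the modalities
    audio = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)
    result = recognizer.recognize_once()    # long videos: start_continuous_recognition()
    return result.text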
3) Effectiveness factor calculation: the features extracted in the previous step vary over time and cannot by themselves intuitively reveal the trend of the data or the factors that directly influence lecture effectiveness. The following statistics are therefore computed over the features to obtain the effectiveness factors (a computational sketch is given after the list):
Diversity: for emotion-category features, represents the emotion categories contained in a specific lecture video and their relative proportions, calculated as -∑_{i=1..e} r_i ln r_i, where e denotes the number of emotion categories and r_i the proportion of the i-th emotion category.
Average: represents the mean value of a multi-modal feature, computed as the mean of its time series. It applies to features such as the continuous facial emotion data (arousal and valence), skeleton energy of the body posture, volume, pitch, speech rate, and pauses.
Volatility: represents the degree of temporal variation of a multi-modal feature, computed as the complexity of the time series via the CID method (ref: G. E. Batista, E. J. Keogh, O. M. Tataw, and V. de Souza. CID: an efficient complexity-invariant distance for time series. Data Mining and Knowledge Discovery, 28(3):634-669, 2014). It applies to features such as the continuous facial emotion data (arousal and valence), eye gaze direction, distance from the camera, speaker position, skeleton energy of the body posture, volume, pitch, speech rate, and pauses.
Dispersion: represents the variation amplitude of a multi-modal feature, computed as the coefficient of variation, i.e., the standard deviation of the time series divided by its mean. It applies to features such as eye gaze direction, distance from the camera, and speaker position.
Ratio: represents the proportion of a certain state, computed as the ratio of that state to all states. It applies to features such as the discrete facial emotion categories and whether the eyes look at the camera lens.
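A minimal sketch of the five statistics in Python; the entropy form of the diversity index follows the reconstruction above, and the CID-based volatility uses the complexity estimate from Batista et al. (2014):

import numpy as np

def diversity(proportions):
    """Entropy-style diversity over the e emotion-category proportions r_i."""
    r = np.asarray(proportions, dtype=float)
    r = r[r > 0]
    return float(-(r * np.log(r)).sum())

def average(series):
    """Mean of a time-varying multi-modal feature."""
    return float(np.mean(series))

def volatility(series):
    """CID complexity estimate: sqrt of the summed squared first differences."""
    q = np.asarray(series, dtype=float)
    return float(np.sqrt(np.sum(np.diff(q) ** 2)))

def dispersion(series):
    """Coefficient of variation: standard deviation divided by the mean."""
    q = np.asarray(series, dtype=float)
    return float(np.std(q) / np.mean(q))

def ratio(states, target):
    """Proportion of time spent in a given discrete state (e.g., looking at the camera)."""
    return float(np.mean(np.asarray(states) == target))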
4) Effectiveness analysis and prediction: to compute the correlation between the effectiveness factors and lecture effectiveness, the competition level of each collected video (final, semifinal, district, division, area, club) is taken as its label, coded as 6, 5, 4, 3, 2, 1 respectively. These labels can be regarded as an ordinal variable, i.e., the discrete labels have an inherent order. For such problems, the invention uses multi-class ordinal regression (ref: P. A. Gutiérrez, M. Pérez-Ortiz, J. Sánchez-Monedero, F. Fernández-Navarro, and C. Hervás-Martínez. Ordinal regression methods: survey and experimental study. IEEE Transactions on Knowledge and Data Engineering, 28(1):127-146, 2015), which yields a p-value between each effectiveness factor and the class label, where p is the p-value of the corresponding hypothesis test: p < 0.05 is significant and p < 0.01 is highly significant, and this serves as the significance of the effectiveness factor. For a video to be analyzed, its effectiveness feedback result can be predicted based on the effectiveness associations already computed on the existing data set.
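The cited survey covers a family of ordinal-regression methods without fixing an implementation; as one concrete possibility, a proportional-odds (ordered logit) model from statsmodels yields per-factor p-values as follows (the use of statsmodels and the logit link are assumptions of this sketch):

import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

def factor_significance(factors: pd.DataFrame, level: pd.Series) -> pd.Series:
    """factors: one row of effectiveness-factor values per video;
    level: ordinal competition level coded 1 (club) .. 6 (final).
    Returns the p-value of each factor's coefficient in an ordered-logit fit."""
    model = OrderedModel(level.astype(int), factors, distr="logit")
    result = model.fit(method="bfgs", disp=False)
    return result.pvalues[factors.columns]  # p < 0.05 significant, p < 0.01 highly so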
5) Reference video recommendation: recommended-video results are generated according to the lecture video to be analyzed, the data in the lecture video database, and related parameters.
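The recommendation rule is left open in the text; one straightforward realization is similarity retrieval over effectiveness-factor vectors, sketched below (cosine similarity and the similar/contrasting switch are assumptions of this sketch):

import numpy as np

def recommend(query_vec: np.ndarray, db_vecs: np.ndarray, k: int = 5, similar: bool = True):
    """Rank database videos or segments by cosine similarity of their
    effectiveness-factor vectors to the video under analysis; similar=False
    returns the most contrasting examples instead."""
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims) if similar else np.argsort(sims)
    return order[:k], sims[order[:k]]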
6) Visual result generation: combining the data and analysis results produced by the pipeline, an appropriate form is selected to generate the visual results according to the characteristics of the data and the actual demands.
Through this pipeline, multi-modal data can be acquired automatically from the video input, the relation between multi-modal effectiveness factors and lecture effectiveness can be mined, and reference videos can be recommended, providing data support for the visual analysis method and system.
2. Multi-modal video content effectiveness feedback visual analysis system
As shown in the visual analysis module on the right side of Fig. 1, the system interface is divided into four functions, arranged by reading habit from left to right and top to bottom: A. effectiveness factor feedback (lecture factor panel); B. video context understanding (lecturer panel); C. time-interval distribution understanding (time-slice panel); D. data summary and similarity recommendation (reference panel). These four main functions cooperate to help the user explore the effectiveness feedback of the video to be analyzed and find possibilities for improvement.
A. The effectiveness factor feedback function graphically displays the effectiveness feedback of the video to be analyzed, together with an effectiveness-rule diagram based on the data set and the distribution of the video to be analyzed within the data set. The function also allows selecting one or more effectiveness factors; the selection affects the results of the other functions, enabling exploration and analysis of specific effectiveness factors.
B. The video context understanding function presents the multi-modal data within the video context on top of a video player, so that understanding of the multi-modal effectiveness feedback is reinforced while watching the video; it also provides further data views interactively to support the user's deep exploration.
C. The time-interval distribution understanding function provides a display of the video effectiveness feedback distribution and a video-interval selection function, and visually displays the multi-modal effectiveness factor distribution, the multi-modal data, and the text content within the selected interval in temporal order, supporting finer exploration and analysis of the selected effectiveness factors and the corresponding time interval.
D. The data summary and similarity recommendation function provides a multi-modal data summary of the segment selected in C; according to the options chosen by the user, a recommendation result is produced by the reference video recommendation module and displayed on the system interface, helping the user learn about video objects available for reference.
In this section, the invention emphasizes the arrangement of the functions and the capabilities they should provide, without limiting the specific form of visualization; any visualization form that helps the user analyze lecture effectiveness may be included in the system.
3. Multi-level video exploration method centered on effectiveness feedback
Merely presenting the data is far from sufficient; on top of the system proposed in Section 2, the invention provides a multi-level video exploration method centered on effectiveness feedback. Fig. 2 shows the system interface of an embodiment of the invention, in which the regions labeled A, B, C, D correspond to functions A-D described below.
Function A provides the effectiveness factor feedback results of the video to be analyzed, together with effectiveness rules and a data distribution display. In function A the user can intuitively view the feedback results of the different effectiveness factors (through the color bars A1, which map the effectiveness feedback results) and their distribution (through the panel A2, which shows further results). Clicking different effectiveness factors in A makes functions C and D change accordingly.
Function B, in the form of a video player, builds an understanding of the video context: the user can observe the effectiveness factors within the video content while watching, reinforcing the understanding of the factors and their associations with the video context. Key content in the video can be highlighted (B2-B5), realized by overlaying visual forms or interactive functions on the video; further data views are triggered by mouse hover and similar events (B1), helping the user understand the multi-modal data and its effectiveness more deeply.
Function C displays how the video content effectiveness feedback is distributed over the video timeline, and shows the multi-modal data and effectiveness factors within a selected interval. The function maps the effectiveness feedback results onto the timeline (C1) and lets the user select a lecture interval for detailed exploration. The timeline is divided into equal slices, showing how the effectiveness factors change within each slice together with the corresponding multi-modal data (C2). The text beneath the feedback map (C3) lets the user intuitively see the text content of each interval and the corresponding multi-modal effectiveness feedback. Function C thus assists the user in understanding and analyzing lecture effectiveness and the multi-modal data.
Function D presents a multi-modal summary of the selected video interval and displays the reference-video recommendation results configured by the user, so that the user can quickly understand and compare lecture situations and find cases worth learning from. The multi-modal video summary presents the important data features for quickly grasping the video content (D1). After configuring the reference-video recommendation options (D3), the user obtains the recommendation results, which are displayed in the same multi-modal video summary form, making it easy to find a suitable reference source (D2). An effectiveness factor comparison panel can be triggered (D4) to see the differences between a reference video and the video to be analyzed. Clicking a recommendation focuses on that video and presents its data in detail in functions B and C.
Based on the same inventive concept, another embodiment of the invention provides a multi-modal video content effectiveness feedback visual analysis system, comprising:
a data collection module, responsible for collecting a specific type of video and labels of its objective effectiveness indexes;
a feature extraction module, responsible for collecting emotion data from multiple modalities in the video, such as images, text, and sound, and quantitatively extracting multi-modal data features of the content of interest, including the speaker's facial expressions, body movements, eye gaze, position, background, voice intonation, speech-rate pauses, background sound, and text content;
an effectiveness factor calculation module, responsible for determining effectiveness factors based on the multi-modal data features in combination with the actual demands of the application domain, and calculating the values of the different effectiveness factors;
an effectiveness analysis and prediction module, responsible for establishing the association between the effectiveness factors and the objective effectiveness indexes, and using the association results obtained by analysis to predict the effectiveness of the video to be analyzed;
a reference video recommendation module, responsible for recommending, from the database, videos similar to the video to be analyzed that are available for reference, according to the specified video content and related parameters;
a visual analysis module, responsible for integrating the functions and data of the above modules, displaying the data and results generated by each module in different visual forms, and presenting them in a complete interface, so that the user can understand the effectiveness feedback of the specific video to be analyzed through the interface, with further exploration supported.
The specific implementation of each module is described above with reference to the method of the invention.
Based on the same inventive concept, another embodiment of the invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disc) storing a computer program which, when executed by a computer, implements the steps of the method of the invention.
The multi-modal video content effectiveness feedback visual analysis method and system of the invention have been described in detail above, but it should be apparent that the implementation of the invention is not limited thereto. Various obvious modifications made by those skilled in the art without departing from the spirit of the method of the invention and the scope of the claims shall fall within the protection scope of the invention.

Claims (10)

1. A multi-modal video content effectiveness feedback visual analysis method, comprising the steps of:
collecting a specific type of video and labels of its objective effectiveness indexes;
quantitatively extracting multi-modal data features of the content of interest in the video;
on the basis of the extracted multi-modal data features, determining effectiveness factors according to the actual demands of the application domain, and calculating the effectiveness factor values of different contents;
analyzing the correlation between the effectiveness factors and the objective effectiveness indexes to obtain correlation results for the effectiveness factors;
extracting the effectiveness feedback result of the video to be analyzed by using the correlation between the effectiveness factors and the objective effectiveness indexes;
generating recommended video results from the data of the video to be analyzed, for the user's reference;
and displaying the effectiveness feedback result of the video to be analyzed and its multi-modal data context in different visual forms, so that the user can explore the effectiveness feedback result hierarchically.
2. The method of claim 1, wherein the specific type of video comprises one of a lecture video, a teaching video, a sales video, and an entertainment video, and the labels of the objective effectiveness indexes comprise play count, ranking, score, and transaction volume.
3. The method of claim 1, wherein the multi-modal data comprises video, images, sound, and text, and the multi-modal data features comprise the facial expressions, body movements, eye gaze, position, voice intonation, and rhythmic pauses of a person in the video, as well as the background, hue, and background sound of the video picture.
4. The method of claim 1, wherein determining the effectiveness factors according to the actual demands of the application domain comprises:
establishing factors that influence effectiveness in a specific domain according to the theory and demands of the domain corresponding to the specific type of video, wherein the factors correspond to the skills and methods of the specific domain and influence the performance effect in the specific domain; the effectiveness factors comprise at least one of the following: emotion proportion, average emotion level, degree of emotion change, emotion diversity, movement amplitude, movement diversity, gaze range, gaze change speed, position change amplitude, position change speed, pitch change amplitude, speech rate, amount of pauses, background type, and hue brightness.
5. The method of claim 1, wherein analyzing the correlation between the effectiveness factors and the objective effectiveness indexes comprises establishing the association between the effectiveness factors and the objective effectiveness indexes, including analyzing the sign and degree of the correlation between them; and extracting the effectiveness feedback result of the video to be analyzed by using the correlation between the effectiveness factors and the objective effectiveness indexes comprises: extracting the multi-modal data features of the video to be analyzed, calculating the effectiveness factor values, and predicting the effectiveness feedback result of the video to be analyzed according to the correlation between the effectiveness factors and the objective indexes.
6. The method of claim 1, wherein the recommended video results are generated from the data of the video to be analyzed, the data of the video to be analyzed comprising multi-modal data features, effectiveness factor values, and effectiveness feedback results; the recommendation method comprises similarity retrieval from a video database and recommendation based on whole-video and partial features; and the granularity of the recommended objects comprises both whole videos and video segments.
7. The method of claim 1, wherein the hierarchical exploration of effectiveness feedback results supports the following whole-to-part joint analysis and expression functions: an effectiveness factor feedback function, a video context understanding function, a time-interval distribution understanding function, and a data summary and similarity recommendation function.
8. A multi-modal video content effectiveness feedback visual analysis system, comprising:
a data collection module, responsible for collecting a specific type of video and labels of its objective effectiveness indexes;
a data feature extraction module, responsible for quantitatively extracting multi-modal data features of the content of interest in the video;
an effectiveness factor calculation module, responsible for determining effectiveness factors based on the multi-modal data features in combination with the actual demands of the application domain, and calculating the values of the different effectiveness factors;
an effectiveness analysis and prediction module, responsible for establishing the association between the effectiveness factors and the objective effectiveness indexes, and using the association results obtained by analysis to predict the effectiveness of the video to be analyzed;
a reference video recommendation module, responsible for recommending videos available for reference from a database according to the video to be analyzed, using specified video content and related parameters;
a visual analysis module, responsible for integrating the functions and data of the above modules, displaying the data and results generated by each module in different visual forms, and presenting them in a complete interface, so that the user can understand the effectiveness feedback result of the video to be analyzed through the interface, with further exploration supported.
9. A computer device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1-7.
CN202310976858.0A 2023-08-04 2023-08-04 Multi-mode video content effectiveness feedback visual analysis method and system Pending CN116910302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310976858.0A CN116910302A (en) 2023-08-04 2023-08-04 Multi-mode video content effectiveness feedback visual analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310976858.0A CN116910302A (en) 2023-08-04 2023-08-04 Multi-mode video content effectiveness feedback visual analysis method and system

Publications (1)

Publication Number Publication Date
CN116910302A true CN116910302A (en) 2023-10-20

Family

ID=88350971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310976858.0A Pending CN116910302A (en) 2023-08-04 2023-08-04 Multi-mode video content effectiveness feedback visual analysis method and system

Country Status (1)

Country Link
CN (1) CN116910302A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591058A (en) * 2024-01-18 2024-02-23 浙江华创视讯科技有限公司 Display method, device and storage medium for multi-person speech


Similar Documents

Publication Publication Date Title
Ginosar et al. Learning individual styles of conversational gesture
KR102018295B1 (en) Apparatus, method and computer-readable medium for searching and providing sectional video
Stappen et al. The multimodal sentiment analysis in car reviews (muse-car) dataset: Collection, insights and improvements
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
Somandepalli et al. Computational media intelligence: Human-centered machine analysis of media
CN113395578A (en) Method, device and equipment for extracting video theme text and storage medium
US10592733B1 (en) Computer-implemented systems and methods for evaluating speech dialog system engagement via video
CN105979366A (en) Smart television and content recommending method and content recommending device thereof
CN116910302A (en) Multi-mode video content effectiveness feedback visual analysis method and system
US20220405489A1 (en) Formulating natural language descriptions based on temporal sequences of images
Maragos et al. Cross-modal integration for performance improving in multimedia: A review
CN110245253B (en) Semantic interaction method and system based on environmental information
Ponce-López et al. Non-verbal communication analysis in victim–offender mediations
US10915819B2 (en) Automatic real-time identification and presentation of analogies to clarify a concept
Zeng et al. Gesturelens: Visual analysis of gestures in presentation videos
Xiao et al. An introduction to audio and visual research and applications in marketing
Sun et al. In your eyes: Modality disentangling for personality analysis in short video
CN113068077B (en) Subtitle file processing method and device
Sümer et al. Automated anonymisation of visual and audio data in classroom studies
CN116980665A (en) Video processing method, device, computer equipment, medium and product
Dudzik et al. A blast from the past: Personalizing predictions of video-induced emotions using personal memories as context
US11915614B2 (en) Tracking concepts and presenting content in a learning system
WO2022168185A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
CN113743271B (en) Video content effectiveness visual analysis method and system based on multi-modal emotion
Bustos-López et al. Emotion Detection in Learning Environments Using Facial Expressions: A Brief Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination