WO2020215722A1 - Method and device for video processing, electronic device, and computer-readable storage medium - Google Patents

Method and device for video processing, electronic device, and computer-readable storage medium Download PDF

Info

Publication number
WO2020215722A1
WO2020215722A1 (PCT/CN2019/121228; published as WO 2020/215722 A1)
Authority
WO
WIPO (PCT)
Prior art keywords
video
vector
time period
unit time
user
Prior art date
Application number
PCT/CN2019/121228
Other languages
French (fr)
Chinese (zh)
Inventor
赵红亮 (Zhao Hongliang)
李凯 (Li Kai)
Original Assignee
北京谦仁科技有限公司 (Beijing Qianren Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京谦仁科技有限公司 (Beijing Qianren Technology Co., Ltd.)
Publication of WO2020215722A1 publication Critical patent/WO2020215722A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing

Definitions

  • The present invention relates to the technical field of video processing, and in particular to a video processing method and device, an electronic device, and a computer-readable storage medium.
  • Embodiments of the present invention provide a video processing method and device, an electronic device, and a computer-readable storage medium, which can extract high-quality or highlight video clips based on the interaction between a first user and a second user, thereby providing a richer user experience.
  • a video processing method including:
  • the first video candidate set includes a plurality of video clip pairs, and each video clip pair includes a first video clip and a corresponding second video clip with the same window duration and the same time axis position;
  • obtaining the first vector corresponding to each unit time period according to the first video file includes:
  • the first vector is determined according to the face state parameter and the voice distribution parameter.
  • obtaining the second vector corresponding to each unit time period according to the second video file includes:
  • the face state parameter includes a first value that characterizes the appearance of the face and a second value that characterizes the expression state of the face.
  • the obtaining a third vector according to the first vector and the second vector corresponding to each unit time period includes:
  • the first vector and the second vector corresponding to the same unit time period are combined into a third vector corresponding to the unit time period.
  • determining the fourth vector according to the time axis position corresponding to each video segment pair and the third vector includes:
  • the fourth vector of the target video segment pair is determined according to the element random distribution function corresponding to each element and the sum vector, wherein each element of the fourth vector is the quantile value of the corresponding element of the sum vector in the corresponding element random distribution function.
  • selecting a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector includes:
  • the filtered video segment pair is removed from the first video candidate set to obtain a second video candidate set.
  • the determining to filter the video clip according to the fourth vector includes:
  • the corresponding video segment pair is determined as the filtered video segment.
  • the element random distribution function is a binomial distribution function with a corresponding element in the average vector as a mean value and a length matching the window duration.
  • selecting a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector further includes:
  • a video processing device including:
  • the first obtaining unit is configured to obtain at least one first video file of the first user and at least one second video file of the second user;
  • the interception unit is configured to traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set, where the first video candidate set includes a plurality of video clip pairs, and each video clip pair includes a first video clip and a corresponding second video clip with the same window duration and the same time axis position;
  • the second obtaining unit is configured to obtain a first vector corresponding to each unit time period according to the first video file, and obtain a second vector corresponding to each unit time period according to the second video file.
  • the first vector is used to characterize the first user state in the corresponding unit time period, and the second vector is used to characterize the second user state in the corresponding unit time period;
  • a third obtaining unit configured to obtain a third vector according to the first vector and the second vector corresponding to each unit time period
  • a fourth acquiring unit configured to determine a fourth vector according to the time axis position corresponding to each video segment pair and the third vector
  • a selecting unit configured to select a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector
  • the synthesis unit obtains the target video segment according to the selected first video segment and the second video segment.
  • an electronic device including a memory and a processor, where the memory is used to store one or more computer program instructions, where one or more computer program instructions are executed by the processor to Implement the method as described in the first aspect.
  • a computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions implement the method as described in the first aspect when executed by a processor.
  • The embodiment of the application obtains at least one first video file of a first user and a second video file of a second user, and traverses and intercepts the first video file and the second video file according to at least one window duration to obtain the first video candidate set. According to the first vector and the second vector representing the state of the user in the video in each unit time period, the fourth vector representing the probability distribution value is obtained, and multiple first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into the target video segment.
  • In this way, the target video segment (such as a high-quality or highlight video segment) can be extracted to fully reflect the interaction between the first user and the second user, thereby providing a richer user experience.
  • FIG. 1 is a flowchart of a video processing method according to an embodiment of the present invention
  • Figure 2 is a data flow diagram of a video processing method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a video processing device according to an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.
  • Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention.
  • the execution subject of this method is a server, and the video processing method of this embodiment includes:
  • Step S110 Obtain at least one first video file of the first user and at least one second video file of the second user.
  • the server obtains at least one first video file of the first user and the second video file of the second user.
  • the first user may be a student, and the number of the first user may be one, two, four or more, which is not limited in the present invention.
  • the second user may be a teacher, and the number of second users may be one.
  • the number of first users is four, and the number of second users is one, that is, the online teaching mode of the embodiment of the present invention is "one to four".
  • the first video file may be a multimedia file when the first user performs online learning, and it may include real-time audio and video information of the first user.
  • the second video file may be a multimedia file during online teaching by the second user, and it may include real-time audio and video information of the second user.
  • the formats of the first video file and the second video file may include, but are not limited to, .AVI, .MOV, .RM, .MPEG, .ASF, etc.
  • Step S120 Traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set.
  • The first video candidate set includes multiple video clip pairs, each consisting of a first video clip and a corresponding second video clip with the same window duration and the same time axis position.
  • the window duration is represented by t
  • the first video candidate set is represented by R.
  • the server uses t as the window duration to traverse and intercept the first video file and the second video file to obtain the first video candidate set.
  • the second video segment and the first video segment corresponding to each other have the same window duration and time axis position.
  • the window duration may be, for example, 10 seconds, 13 seconds, 16 seconds, 19 seconds, 22 seconds, 25 seconds, etc.
  • For example, the set of available window durations t is {10, 13, 16, 19, 22, 25} (in seconds).
  • Using, for example, 13 seconds as the window duration, video clips are captured by sliding the window in accordance with a sliding step (e.g., 1 second).
  • The finally obtained first video candidate set R can be expressed as {0-10s, 1-11s, ..., 0-13s, 1-14s, ...}. That is, the first video candidate set may include multiple video clips with a window duration of 10 seconds and multiple video clips with a window duration of 13 seconds, and may also include multiple video clips with window durations of 16, 19, 22, and 25 seconds.
  • the window time length and the sliding step length may be the default time length of the system, or the time length preset by the administrator according to needs, and the present invention does not limit this.
  • The first video candidate set is not limited to including multiple video clips with different window durations as described above; it may include only video clips with the same window duration. For example, the first video candidate set may include only multiple video clips with a window duration of 10 seconds, or only multiple video clips with a window duration of 13 seconds. A sketch of the traversal follows.
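  • As a non-authoritative illustration of step S120, the sliding-window traversal might look like the following sketch, where each clip pair is represented as a (start, end) pair in seconds (a representation assumed here for clarity, not prescribed by the application):

```python
# A minimal sketch of step S120: traverse two synchronized video files with
# sliding windows of several durations to build the first video candidate set R.

def build_candidate_set(total_duration, window_durations=(10, 13, 16, 19, 22, 25), step=1):
    """Return a list of (start, end) spans; each span is intercepted at the
    same time axis position from both the first and the second video file,
    so one span stands for one video clip pair."""
    candidates = []
    for t in window_durations:
        start = 0
        while start + t <= total_duration:
            candidates.append((start, start + t))
            start += step  # sliding step, e.g. 1 second
    return candidates

# e.g. build_candidate_set(60) yields (0, 10), (1, 11), ..., (0, 13), (1, 14), ...
```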
  • Step S130 Obtain a first vector corresponding to each unit time period according to the first video file, and obtain a second vector corresponding to each unit time period according to the second video file, where the first vector is used to characterize Corresponding to the first user state in the unit time period, the second vector is used to represent the second user state in the corresponding unit time period.
  • There is no dependency relationship between step S120 and step S130.
  • The two can be executed simultaneously or in a predetermined order: for example, step S120 is executed first and then step S130, or step S130 is executed first and then step S120.
  • The server analyzes the video files according to a predetermined unit time period (e.g., in a second-by-second manner) and/or a set number of frames (e.g., in a frame-by-frame manner) to obtain the vector corresponding to each unit time period, which is used to characterize the state of the first user or the second user in each of the multiple video segments.
  • The state of the first user or the second user is represented based on three dimensions of information: voice, face appearance, and facial expression.
  • The state of the first user or the second user includes: whether, in the video clip, the first user or the second user is talking in each set time period (for example, analyzing second by second whether the user is talking in each second), whether the face of the first user or the second user appears in each frame, and whether the expression of the first user or the second user is happy.
  • obtaining the first vector corresponding to each unit time period according to the first video file in step S130 includes:
  • Step S131 Determine video data and audio data corresponding to the target unit time period according to the first video file.
  • Step S132 Perform face recognition on multiple image frames of the video data respectively, and obtain face state parameters corresponding to each image frame.
  • the face state parameter includes a first value that characterizes the appearance of the face and a second value that characterizes the expression state of the face.
  • Step S133 Perform voice recognition on the audio data to obtain voice distribution parameters.
  • Step S134 Determine the first vector according to the face state parameter and the voice distribution parameter.
  • Face recognition is a biometric recognition technology based on human facial feature information. A camera collects images or video streams containing human faces, the faces are automatically detected and tracked in the images, and recognition is then performed on the detected faces. Face recognition algorithms may include, but are not limited to, feature-based recognition algorithms based on facial feature points, appearance-based recognition algorithms based on the entire face image, template-based recognition algorithms, recognition algorithms using neural networks, algorithms based on light estimation model theory, and so on.
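  • The application does not mandate a particular algorithm; as one hedged possibility, the per-frame analysis of steps S132/S136 could be sketched with OpenCV Haar cascades, with smile detection standing in for the "happy expression" judgment (both the cascade files and that heuristic are illustrative assumptions):

```python
# A hedged sketch of per-frame face analysis: returns [Ff, Ef] for one frame.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def frame_face_state(frame_bgr):
    """Ff = 1 if a face appears in the frame; Ef = 1 if the expression is
    judged happy (approximated here by smile detection within the face box)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return [0, 0]
    x, y, w, h = faces[0]
    smiles = smile_cascade.detectMultiScale(gray[y:y + h, x:x + w],
                                            scaleFactor=1.7, minNeighbors=20)
    return [1, 1 if len(smiles) > 0 else 0]
```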
  • Here, the face state parameter indicates whether a human face appears in each second and whether the facial expression is happy.
  • Speech recognition converts a speech signal into corresponding text information.
  • A speech recognition system mainly includes feature extraction, an acoustic model, a language model, and a dictionary with decoding.
  • The sound signal first undergoes audio preprocessing such as filtering and framing, so that the signal to be analyzed is appropriately extracted from the original signal.
  • Feature extraction then converts the sound signal from the time domain to the frequency domain to provide suitable feature vectors for the acoustic model.
  • The acoustic model calculates the score of each feature vector according to acoustic characteristics, while the language model calculates the probability that the sound signal corresponds to each possible phrase sequence according to linguistic theory; finally, the phrase sequence is decoded according to an existing dictionary to obtain the most likely text representation.
  • Speech recognition algorithms may include, but are not limited to, Gaussian Mixed Model (GMM) algorithms, Dynamic Time Warping (DTW) algorithms, Connectionist temporal classification (CTC) algorithms, etc.
  • Take obtaining the first vector as an example.
  • For a predetermined unit time period (for example, a period of 1 second), the attribute information of the corresponding video portion (including audio and video) is obtained along the three dimensions of voice, face appearance, and facial expression, denoted as [Ss, Fs, Es],
  • where Ss represents the voice state, Fs represents the face appearance, and Es represents the facial expression in the corresponding unit time period.
  • For the voice dimension, for each unit time period (for example, every second), voice analysis determines whether the first user is continuously speaking in the audio data of the video segment, and the result is represented by Ss. For example, for one second in a video clip, if voice information is continuously detected during this time period, the first user is speaking during this second, so Ss is 1; otherwise the value is 0. A minimal sketch follows.
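  • As a hedged sketch of this per-second judgment, the webrtcvad package is one possible voice activity detector (the application does not mandate a method); 16 kHz, 16-bit mono PCM audio and a simple majority-of-frames rule are assumptions made for this illustration:

```python
# Compute Ss for one second of audio using voice activity detection.
import webrtcvad

def second_is_speech(pcm_second, sample_rate=16000, frame_ms=30):
    """Return 1 if voice is detected through most of this one second of
    audio, else 0 (a majority rule stands in for 'continuously detected')."""
    vad = webrtcvad.Vad(2)  # aggressiveness 0-3
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    frames = [pcm_second[i:i + frame_bytes]
              for i in range(0, len(pcm_second) - frame_bytes + 1, frame_bytes)]
    voiced = sum(vad.is_speech(f, sample_rate) for f in frames)
    return 1 if frames and voiced > len(frames) // 2 else 0
```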
  • Similarly, second-by-second analysis is performed to obtain the information characterizing the face appearance and facial expression dimensions.
  • First, all image frames, or some of the image frames, are extracted from the video data of each second of the video file. For each extracted image frame, image recognition determines whether the face of the first user appears in the frame and whether the facial expression is happy.
  • Ff indicates whether a human face appears in an image frame, and Ef indicates whether the facial expression in an image frame is happy; both Ff and Ef take the value 0 or 1.
  • For example, for video data with a frame rate of 24 frames per second, the images of all 24 frames in each second can be extracted and face recognition performed on each, obtaining a sequence of 24 [Ff, Ef] components.
  • Alternatively, only some of the image frames (such as 8 frames) may be used: 8 frames can be extracted at intervals from each second of a video clip and face recognition performed on each, obtaining a sequence of 8 [Ff, Ef] components.
  • The sequence of [Ff, Ef] values corresponding to each second of the video data is then merged over the predetermined time period (second by second when the predetermined period is 1 second): if Ff (respectively Ef) contains 2 or more values of 1 within a given second, the face value Fs (respectively the expression value Es) for that second is 1; otherwise it is 0. A sketch of this merge rule follows.
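  • A minimal sketch of the merge rule just described (list representation assumed for illustration):

```python
# Collapse the frame-level [Ff, Ef] sequence of one second into (Fs, Es).

def merge_frame_states(ff_ef_sequence):
    """ff_ef_sequence: list of [Ff, Ef] pairs for the frames of one second.
    Fs (or Es) is 1 when at least 2 frames in that second have value 1."""
    fs = 1 if sum(p[0] for p in ff_ef_sequence) >= 2 else 0
    es = 1 if sum(p[1] for p in ff_ef_sequence) >= 2 else 0
    return fs, es

# e.g. merge_frame_states([[1, 0], [1, 1], [0, 1]]) -> (1, 1)
```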
  • In step S130, obtaining the second vector corresponding to each unit time period according to the second video file specifically includes:
  • Step S135 Determine the video data and audio data corresponding to the target unit time period according to the second video file.
  • Step S136 Perform face recognition on multiple image frames of the video data respectively, and obtain face state parameters corresponding to each image frame.
  • Step S137 Perform voice recognition on the audio data to obtain voice distribution parameters.
  • Step S138 Obtain the second vector according to the face state parameter and the voice distribution parameter.
  • The execution order of steps S131 to S134 and steps S135 to S138 is not restricted; they can be executed in parallel or in a set sequence.
  • In this way, for each unit time period of the first video file, a corresponding first vector Vss can be obtained, and for each unit time period of the second video file, a corresponding second vector Vts can be obtained.
  • Step S140 Obtain a third vector for each unit time period according to the first vector and the second vector corresponding to each unit time period.
  • the third vector can be obtained by combining Vss and Vts.
  • The third vector is a 6-dimensional vector that characterizes the states of the first user (such as a student) and the second user (such as a teacher) in the same unit time period of the first video file and the second video file.
  • a third vector with a dimension of 6 can be obtained by combining the first vector of the first video file and the second vector of the second video file with the same time axis coordinate.
  • the third vector includes voice data, face data and expression data of the first user and voice data, face data and expression data of the second user.
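  • As a small illustration of this merge in step S140 (the Python list representation is an assumption made here, not prescribed by the application):

```python
# The first vector Vss = [Ss, Fs, Es] of the first user and the second vector
# Vts of the second user for the same second are concatenated into the
# six-dimensional third vector Vs.

def merge_vectors(vss, vts):
    """Combine the two per-second state vectors into the third vector."""
    return vss + vts

# e.g. merge_vectors([1, 0, 0], [1, 1, 0]) -> [1, 0, 0, 1, 1, 0]
```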
  • Step S150 Determine a fourth vector according to the time axis position corresponding to each video segment pair and the third vector.
  • this step includes the following sub-steps:
  • Step S151 Determine the target video segment pair.
  • Step S152 Determine corresponding multiple target unit time periods according to the time axis position of the target video clip pair.
  • Step S153 Calculate the sum vector of the third vector of the multiple target unit time periods.
  • Step S154 Determine the element random distribution function corresponding to each element according to the average vector and the window duration, where the average vector is obtained by averaging the third vectors of each unit time period over multiple video files.
  • Step S155 Determine a fourth vector of the target video segment pair according to the element random distribution function corresponding to each element and the sum vector, where each element of the fourth vector is the corresponding element of the sum vector The quantile value in the random distribution function of the elements.
  • In step S153, a corresponding third vector is available for each unit time period (that is, each second) covered by each video segment pair. By summing the third vectors of the multiple unit time periods within the time axis covered by a video clip pair (that is, summing element-wise), the sum vector corresponding to that video clip pair can be obtained.
  • For example, a large number of (e.g., 10,000) video files similar to the first video file and the second video file may be extracted in advance to determine the average vector,
  • for example, teaching videos including student video files and teacher video files.
  • These video files are analyzed and merged according to the unit time period to obtain the third vector of each unit time period, and the third vectors are then averaged to obtain the average vector.
  • In step S154, the element random distribution function of each element for different window durations can be obtained according to the averages in the average vector obtained above and the corresponding length of the video segment.
  • A binomial distribution arises from n repeated independent Bernoulli trials. Each trial has only two possible outcomes, which are mutually exclusive and independent of each other, and the probability of the event remains unchanged in each independent trial; such a series of trials is collectively called an n-fold Bernoulli experiment. When the number of trials is 1, the binomial distribution reduces to the 0-1 distribution. A binomial distribution is determined by its mean and the number (length) of trials.
  • each element conforms to the binomial distribution B(t, avg), where t is the window duration of the video segment.
  • the value interval of the independent variable of B is [0, t].
  • This gives the element random distribution vector B = [Bsst, Bsft, Bset, Btst, Btft, Btet].
  • In step S155, for each element random distribution function, the quantile value of the corresponding element of the sum vector in that function can be determined, thereby determining the fourth vector. That is, each element of the fourth vector is the quantile value of the corresponding element of the sum vector in the corresponding element random distribution function. A sketch covering steps S153 to S155 follows.
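  • The following hedged sketch of steps S153 to S155 evaluates the quantile values with scipy's binomial CDF (the application requires a binomial distribution, not a particular library); the average vector would come from the pre-computed statistics of step S154:

```python
# Compute the fourth vector of one clip pair from its per-second third vectors.
import numpy as np
from scipy.stats import binom

def fourth_vector(third_vectors, avg_vector):
    """third_vectors: array of shape (t, 6), one third vector per second of
    the clip pair; avg_vector: per-second element means estimated in advance
    from many similar videos."""
    t = len(third_vectors)
    sum_vector = np.sum(third_vectors, axis=0)  # step S153: element-wise sum
    # Step S154/S155: element k obeys B(t, avg_k); its quantile is P(X <= sum_k).
    return np.array([binom.cdf(sum_vector[k], t, avg_vector[k]) for k in range(6)])
```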
  • Step S160 Select multiple first video clips and second video clips from the first video candidate set according to the fourth vector.
  • The server screens and sorts the multiple video clip pairs in the first video candidate set according to the fourth vector, and selects multiple first video clips according to the screening and ranking results (for example, the top three first video segments) as the first target video segments.
  • step S160 includes the following sub-steps:
  • Step S161 Determine a filtered video segment pair according to the fourth vector.
  • In response to any element of the fourth vector being less than the corresponding quantile threshold, the corresponding video segment pair is determined as a filtered video segment pair.
  • For example, the filtered video segment pairs are determined according to the screening conditions Bsst below 0.4, Bsft below 0.4, Bset below 0.2, Btst below 0.4, Btft below 0.4, and Btet below 0.2; a sketch follows.
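  • A minimal sketch of steps S161 and S162 under the example thresholds quoted above:

```python
# Drop every clip pair whose fourth vector has any element below its threshold.
THRESHOLDS = [0.4, 0.4, 0.2, 0.4, 0.4, 0.2]  # Bsst, Bsft, Bset, Btst, Btft, Btet

def second_candidate_set(candidates, fourth_vectors):
    """Return the second video candidate set: pairs that survive screening."""
    return [clip for clip, b in zip(candidates, fourth_vectors)
            if all(bi >= th for bi, th in zip(b, THRESHOLDS))]
```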
  • Step S162 removes the filtered video segment pair from the first video candidate set to obtain a second video candidate set.
  • step S160 further includes:
  • Step S163 Calculate the score value of each video segment pair in the second video candidate set according to the fourth vector.
  • For example, for each video segment pair in the second video candidate set, the elements of the corresponding fourth vector are summed to obtain the score value.
  • Step S164 Sort and filter multiple first video clips in the second video candidate set according to the score value until the number of remaining first video clips in the second video candidate set meets a predetermined condition.
  • For example, the top N first video clips and the corresponding second video clips can be directly selected as the basis for the next step, where N is an integer and N ≥ 1.
  • Alternatively, in each iteration the video clip pair with the highest score is selected, and all video segment pairs whose time axes overlap that of the highest-scoring clip pair are removed from the second video candidate set.
  • The second video candidate set is then updated and the next iteration begins, until the number of remaining first video segments in the second video candidate set meets a predetermined condition, as the sketch after this paragraph shows.
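  • A minimal sketch of this iterative selection (steps S163 and S164), with clips again represented as (start, end) spans:

```python
# Greedy selection: score each pair by summing its fourth vector, repeatedly
# take the highest-scoring pair, and drop everything overlapping it on the
# time axis, until n pairs have been chosen.

def overlaps(a, b):
    """True when the (start, end) intervals a and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def select_clips(candidates, fourth_vectors, n=3):
    scored = sorted(zip(candidates, (sum(b) for b in fourth_vectors)),
                    key=lambda cb: cb[1], reverse=True)
    chosen = []
    for clip, _ in scored:
        if len(chosen) == n:
            break
        if all(not overlaps(clip, c) for c in chosen):
            chosen.append(clip)
    return chosen
```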
  • In summary, the embodiment of the present application obtains at least one first video file of a first user and a second video file of a second user, and traverses and intercepts the first video file and the second video file according to at least one window duration to obtain the first video candidate set.
  • According to the first vector and the second vector representing the state of the user in the video in each unit time period, the fourth vector representing the probability distribution value is obtained, and multiple first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector.
  • the video processing method of FIG. 1 further includes:
  • Step S170 Obtain a target video segment according to the selected first video segment and second video segment.
  • The target video segment is obtained by splicing the multiple first video segments and second video segments selected from the first video candidate set. For example, if three first video clips, 0-10s, 15-33s, and 35-57s, are selected from the first video candidate set as the first target video clips, the corresponding second target video clips also comprise the three second video clips 0-10s, 15-33s, and 35-57s.
  • Fig. 2 is a data flow diagram of a method according to an embodiment of the present invention.
  • The data processing of the embodiment of the present invention will be exemplified by taking a student video in a network classroom and a synchronously recorded teacher teaching video as an example.
  • In step S110, the first video file S of the first user (the student's video file in this example) and the second video file T of the second user (the teacher's video file in this example) are acquired.
  • In step S120, video clips are intercepted by sliding windows of multiple different durations to obtain the first video candidate set.
  • two window durations of 10s and 13s are used for sliding interception.
  • the duration of the video clip is the same as the duration of the window used for sliding capture.
  • the first video segment and the second video segment with the same time axis constitute a video segment pair.
  • In step S130, the data of the first video file and the second video file are analyzed second by second (that is, the unit time period is 1 second), acquiring the first vector Vss corresponding to each second of the first video file and the second vector Vts corresponding to each second of the second video file.
  • In step S140, the first vector Vss and the second vector Vts of each second are merged into a third vector Vs.
  • For example, the third vector may be Vs = [1, 0, 0, 1, 1, 0],
  • or Vs = [1, 1, 1, 1, 1, 1].
  • In step S150, a fourth vector is determined according to the time axis position corresponding to each video segment pair and the third vector.
  • Here Bsst is the element random distribution function for the student's voice, Bsft for the student's face, Bset for the student's facial expression, Btst for the teacher's voice, Btft for the teacher's face, and Btet for the teacher's facial expression.
  • The above element random distribution functions are determined according to the pre-calculated mean values and the corresponding window duration. Taking Bsst as an example, Bsst obeys the binomial distribution B(t, savg), and the range of the independent variable of B is [0, t].
  • Each feasible solution Rt among the multiple video segment pairs (that is, each video segment pair) is summed in six dimensions to obtain the sum vector corresponding to the video segment pair Rt.
  • For example, for a 10-second window, the third vectors Vs1 to Vs10 corresponding to each second are added to obtain a six-dimensional sum vector.
  • Then the quantile vector (that is, the fourth vector) of each feasible solution under B can be calculated. Specifically, the quantile value of each element of the sum vector in the random distribution function of the corresponding element is calculated to obtain a fourth vector composed of six quantile values. For example, if the first element in the sum vector is 4, the quantile value of 4 in the Bsst distribution is calculated.
  • In step S160, a plurality of first video clips and second video clips are selected from the first video candidate set according to the fourth vector.
  • Specifically, the six elements in the fourth vector of each video segment pair (that is, each solution) are summed. The pairs are then sorted by the summation result to extract the maximum; all candidates in R' that overlap the time period represented by the maximum are removed to obtain a new R', and this step is repeated until three video clips have been taken out.
  • For example, suppose six candidate video segment pairs r1 to r6 have corresponding fourth vectors b1 to b6.
  • The 6 elements in each vector are summed to obtain the score values s1 to s6.
  • For instance, b1 = {0.5, 0.5, 0.3, 0.5, 0.5, 0.4} gives s1 = 2.7.
  • The values s1 to s6 are sorted; if s2 is the largest, the other video segment pairs whose time axes overlap the video segment pair r2 corresponding to s2 are removed, and the second video candidate set R' is updated. This is executed iteratively until the number of remaining video segment pairs in the updated candidate set R' meets the requirement (for example, 3 remain).
  • For example, suppose the binomial distribution function satisfied by the voice data of the first user is B(10, 0.2). The corresponding probability and quantile values include:
  • i = 2: prb(2) = 0.3020, quantile = 0.6778
  • i = 4: prb(4) = 0.0881, quantile = 0.9672
  • i = 8: prb(8) = 0.0001, quantile = 1.0000
  • Here, for each parameter i, the first value is the probability that the duration of the voice lies between i-1 and i seconds, and the second value is the probability that the duration of the speech is less than or equal to i seconds.
  • The second value can therefore be used as the quantile value of the corresponding parameter i.
  • Once the binomial distribution function is determined, the above values can be calculated according to the probability calculation formula of the binomial distribution.
  • Accordingly, when the first user's speech duration is less than or equal to 0 seconds, the quantile value is 0.1074; when the speech duration is less than or equal to 1 second, the quantile value is 0.3758; when the speech duration is less than or equal to 2 seconds, the quantile value is 0.6778; and so on. The quantile value is then compared with the preset quantile threshold (for example, 0.4): if the quantile value is less than or equal to the preset value, the video clip corresponding to that quantile value is filtered out; if it is greater than the preset value and the requirements of the other elements are met at the same time, the corresponding video clip is retained. A quick numerical check follows.
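  • The table above can be reproduced from the binomial probability formula; this snippet is a sanity check (scipy is used here purely for illustration):

```python
# Verify the B(10, 0.2) values quoted above: pmf(i) is the probability that
# the speech duration is i seconds, cdf(i) the quantile value P(X <= i).
from scipy.stats import binom

for i in (0, 1, 2, 4, 8):
    print(i, round(binom.pmf(i, 10, 0.2), 4), round(binom.cdf(i, 10, 0.2), 4))
# 0 0.1074 0.1074
# 1 0.2684 0.3758
# 2 0.302 0.6778
# 4 0.0881 0.9672
# 8 0.0001 1.0
```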
  • In step S170, a target video segment is obtained according to the selected first video segment and second video segment.
  • Specifically, the first video segments and second video segments in the video segment pairs finally remaining in the second video candidate set are synthesized to form a highlight video.
  • For example, splicing can be performed so that the selected first video segment and the corresponding second video segment are displayed side by side on the same screen at the same time; one possible invocation is sketched below.
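  • As one hedged possibility (the application does not name a tool), the side-by-side composition could use ffmpeg's hstack filter invoked from Python; the file names are illustrative, and audio handling is omitted for brevity:

```python
# Place a student clip and the corresponding teacher clip side by side.
import subprocess

def compose_side_by_side(student_clip, teacher_clip, output):
    subprocess.run([
        "ffmpeg", "-i", student_clip, "-i", teacher_clip,
        "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
        "-map", "[v]", output,
    ], check=True)
```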
  • The embodiment of the application obtains at least one first video file of a first user and a second video file of a second user, and traverses and intercepts the first video file and the second video file according to at least one window duration to obtain the first video candidate set. According to the first vector and the second vector representing the state of the user in the video in each unit time period, the fourth vector representing the probability distribution value is obtained, and multiple first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into the target video segment.
  • In this way, high-quality or highlight video clips can be extracted that fully embody the interaction between the first user and the second user, thereby providing a richer user experience.
  • Fig. 3 is a schematic diagram of a video processing device according to an embodiment of the present invention.
  • The video processing device 3 of this embodiment includes a first acquisition unit 31, an interception unit 32, a second acquisition unit 33, a third acquisition unit 34, a fourth acquisition unit 35, a selection unit 36, and a synthesis unit 37.
  • the first obtaining unit 31 is configured to obtain at least one first video file of the first user and at least one second video file of the second user.
  • the interception unit 32 is configured to traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set.
  • The first video candidate set includes a plurality of video clip pairs; each video clip pair includes a first video clip and a corresponding second video clip with the same window duration and the same time axis position.
  • the second obtaining unit 33 is configured to obtain a first vector corresponding to each unit time period according to the first video file, and obtain a second vector corresponding to each unit time period according to the second video file.
  • The first vector is used to characterize the first user state in the corresponding unit time period,
  • the second vector is used to characterize the second user state in the corresponding unit time period.
  • the third obtaining unit 34 is configured to obtain a third vector according to the first vector and the second vector corresponding to each unit time period.
  • the fourth acquiring unit 35 is configured to determine a fourth vector according to the time axis position corresponding to each video segment pair and the third vector.
  • the selecting unit 36 is configured to select multiple first video clips and second video clips from the first video candidate set according to the fourth vector.
  • the synthesis unit 37 is configured to obtain a target video segment according to the selected first video segment and second video segment.
  • the embodiment of the present application obtains the first video file of at least one first user and the second video file of the second user, and traverses and intercepts the first video file and the second video file according to the at least one window duration to obtain the first video file.
  • Video candidate set according to the first vector and second vector representing the state of the user in the video in the unit time period of the first video file, obtain the fourth vector representing the probability distribution value of the first video file, and obtain the fourth vector representing the probability distribution value from the first video candidate according to the fourth vector
  • Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.
  • the electronic device 4 shown in FIG. 4 is a general data processing device, which includes a general computer hardware structure, and at least includes a processor 41 and a memory 42.
  • the processor 41 and the memory 42 are connected by a bus 43.
  • the memory 42 is suitable for storing instructions or programs executable by the processor 41.
  • The processor 41 may be an independent microprocessor or a collection of one or more microprocessors. The processor 41 executes the instructions stored in the memory 42 to carry out the method procedures of the embodiments of the present invention described above, realizing data processing and control of other devices.
  • the bus 43 connects the above-mentioned multiple components together, and at the same time connects the above-mentioned components to the display controller 44 and the display device and the input/output (I/O) device 45.
  • the input/output (I/O) device 45 may be a mouse, a keyboard, a modem, a network interface, a touch input device, a motion sensing input device, a printer, and other devices known in the art.
  • an input/output (I/O) device 45 is connected to the system through an input/output (I/O) controller 46.
  • the memory 42 may store software components, such as an operating system, a communication module, an interaction module, and an application program.
  • Each module and application program described above corresponds to a set of executable program instructions that complete one or more functions and methods described in the embodiments of the invention.
  • Aspects of the embodiments of the present invention may be implemented as a system, a method, or a computer program product. Therefore, various aspects of the embodiments of the present invention can take the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, resident software, microcode, etc.), or an implementation combining software and hardware that may be generally referred to herein as a "circuit," "module," or "system."
  • aspects of the present invention may take the following form: a computer program product implemented in one or more computer-readable media, the computer-readable medium having computer-readable program codes implemented thereon.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any appropriate combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that can contain or store a program used by an instruction execution system, device, or device or a program used in conjunction with an instruction execution system, device, or device.
  • The computer-readable signal medium may include a propagated data signal having computer-readable program code implemented therein, for example in baseband or as part of a carrier wave. Such a propagated signal can take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof.
  • The computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, device, or apparatus.
  • Any suitable medium including but not limited to wireless, wired, fiber optic cable, RF, etc. or any appropriate combination of the foregoing can be used to transmit the program code implemented on the computer-readable medium.
  • The computer program code used to perform operations directed to various aspects of the present invention can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, PHP, and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
  • The program code can be executed entirely on the user's computer as an independent software package, partly on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet by means of an Internet service provider).
  • These computer program instructions can also be stored in a computer-readable medium that can direct a computer, other programmable data processing equipment, or other devices to operate in a specific manner, so that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/actions specified in the flowchart and/or block diagram block or blocks.
  • Computer program instructions can also be loaded onto a computer, other programmable data processing equipment, or other devices, so that a series of operational steps are executed on the computer, other programmable equipment, or other devices to produce a computer-implemented process, so that the instructions executed on the computer or other programmable devices provide a process for implementing the functions/actions specified in the flowchart and/or block diagram block or blocks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed are a method and device for video processing, an electronic device, and a computer-readable storage medium. With the embodiments of the present application, a first video file of at least one first user and a second video file of a second user are acquired; the first video file and the second video file are traversed and captured on the basis of at least one window duration to acquire a first video candidate set; a fourth vector characterizing probability distribution values is acquired on the basis of a first vector and a second vector that express the state of the user in the video in each unit time period of the first video file; multiple first video clips and corresponding second video clips are selected from the first video candidate set on the basis of the fourth vector; and a target video clip is thus synthesized. As such, high-quality or highlight video clips can be extracted to fully embody the exchanges between the first user and the second user, thus providing an enriched user experience.

Description

视频处理方法和装置、电子设备及计算机可读存储介质Video processing method and device, electronic equipment and computer readable storage medium
本申请要求了2019年4月26日提交的、申请号为2019103456254、发明名称为“视频处理方法和装置、电子设备及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed on April 26, 2019, with the application number 2019103456254, and the title of the invention "video processing method and device, electronic equipment, and computer-readable storage medium", the entire content of which is incorporated by reference Incorporated in this application.
技术领域Technical field
本发明涉及视频处理技术领域,具体涉及一种视频处理方法和装置、电子设备及计算机可读存储介质。The present invention relates to the technical field of video processing, in particular to a video processing method and device, electronic equipment and computer-readable storage media.
背景技术Background technique
随着互联网技术和教育信息化的高速发展,移动多媒体教学平台的应用也越来越广泛。现有技术中,精彩视频的提取主要是通过对样本进行训练生成模型,再使用该模型对教学视频进行处理,由此,无法保证很好地捕捉到老师与学生之间的互动的精彩瞬间。如果人工方式提取,则工作量巨大,几乎不可能完成。由此,导致无法精确地提取老师与学生之间的互动的精彩视频片段,并进一步导致用户体验差。With the rapid development of Internet technology and education informatization, the application of mobile multimedia teaching platforms has become more and more extensive. In the prior art, the extraction of wonderful videos is mainly to generate a model by training samples, and then use the model to process the teaching videos. Therefore, it is impossible to ensure that the wonderful moments of the interaction between the teacher and the students are well captured. If it is extracted manually, the workload is huge and it is almost impossible to complete. As a result, it is impossible to accurately extract the wonderful video clips of the interaction between the teacher and the student, and further leads to a poor user experience.
发明内容Summary of the invention
有鉴于此,本发明实施例提供一种视频处理方法和装置、电子设备及计算机可读存储介质,能够基于第一用户与第二用户之间的交互提取优质或精彩视频片段,并进一步提供更加丰富的用户体验。In view of this, embodiments of the present invention provide a video processing method and device, electronic equipment, and computer-readable storage medium, which can extract high-quality or wonderful video clips based on the interaction between a first user and a second user, and further provide more Rich user experience.
根据本发明实施例的第一方面,提供一种视频处理方法,包括:According to a first aspect of the embodiments of the present invention, a video processing method is provided, including:
获取第一用户的至少一个第一视频文件和第二用户的至少一个第二视频文件;Acquiring at least one first video file of the first user and at least one second video file of the second user;
根据至少一个窗口时长对所述第一视频文件和所述第二视频文件进行遍历截取以获得第一视频候选集,所述第一视频候选集包括多个视频片段对,各所述视频片段对包括窗口时长相同且时间轴位置相同的第一视频片段和对应的第二视频片段;Traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set. The first video candidate set includes a plurality of video clip pairs, and each video clip pair Including a first video clip and a corresponding second video clip with the same window duration and the same time axis position;
根据所述的第一视频文件获取每个单位时间周期对应的第一向量,根据所述第二视频文件获取每个单位时间周期对应的第二向量,所述第一向量用于表征对应的单位时间周期内的第一用户状态,所述第二向量用于表征对应的单位时间周期内的第二用户状态;Obtain a first vector corresponding to each unit time period according to the first video file, and obtain a second vector corresponding to each unit time period according to the second video file, and the first vector is used to characterize the corresponding unit A first user state in a time period, where the second vector is used to represent a second user state in a corresponding unit time period;
根据每个单位时间周期对应的所述第一向量和所述第二向量获取每个单位时间周期的第三向量;Obtaining a third vector for each unit time period according to the first vector and the second vector corresponding to each unit time period;
根据每个视频片段对对应的时间轴位置和所述第三向量,确定第四向量;Determine the fourth vector according to the time axis position corresponding to each video segment pair and the third vector;
根据所述第四向量从所述第一视频候选集中选取多个第一视频片段和第二视频片段;Selecting multiple first video segments and second video segments from the first video candidate set according to the fourth vector;
根据选取的第一视频片段和第二视频片段获取目标视频片段。Obtain the target video segment according to the selected first video segment and the second video segment.
优选地,根据所述的第一视频文件获取每个单位时间周期对应的第一向量包括:Preferably, obtaining the first vector corresponding to each unit time period according to the first video file includes:
根据第一视频文件确定目标单位时间周期对应的视频数据和音频数据;Determining the video data and audio data corresponding to the target unit time period according to the first video file;
分别对所述视频数据的多个图像帧进行人脸识别,获取每个图像帧对应的人脸状态参数;Performing face recognition on a plurality of image frames of the video data respectively, and obtaining face state parameters corresponding to each image frame;
对所述音频数据进行语音识别,获取语音分布参数;Perform voice recognition on the audio data to obtain voice distribution parameters;
根据所述人脸状态参数和所述语音分布参数确定所述第一向量。The first vector is determined according to the face state parameter and the voice distribution parameter.
优选地,根据所述第二视频文件获取每个单位时间周期对应的第二向量包括:Preferably, obtaining the second vector corresponding to each unit time period according to the second video file includes:
根据第二视频文件确定目标单位时间周期对应的视频数据和音频数据;Determine the video data and audio data corresponding to the target unit time period according to the second video file;
分别对所述视频数据的多个图像帧进行人脸识别,得到每个图像帧对应的人脸状态参数;Performing face recognition on multiple image frames of the video data, respectively, to obtain face state parameters corresponding to each image frame;
对所述音频数据进行语音识别,获取语音分布参数;Perform voice recognition on the audio data to obtain voice distribution parameters;
根据所述人脸状态参数和所述语音分布参数获取所述第二向量。Obtaining the second vector according to the face state parameter and the voice distribution parameter.
优选地,所述人脸状态参数包括表征人脸出现情况的第一值和表征人脸表情状态的第二值。Preferably, the face state parameter includes a first value that characterizes the appearance of the face and a second value that characterizes the expression state of the face.
优选地,所述根据每个单位时间周期对应的所述第一向量和所述第二向量获取第三向量包括:Preferably, the obtaining a third vector according to the first vector and the second vector corresponding to each unit time period includes:
将相同单位时间周期对应的第一向量和第二向量合并为所述单位时间周期对应的第三向量。The first vector and the second vector corresponding to the same unit time period are combined into a third vector corresponding to the unit time period.
优选地,根据每个视频片段对对应的时间轴位置和所述第三向量,确定第四向量包括:Preferably, determining the fourth vector according to the time axis position corresponding to each video segment pair and the third vector includes:
确定目标视频片段对;Determine the target video clip pair;
根据所述目标视频片段对的时间轴位置确定对应的多个目标单位时间周期;Determining corresponding multiple target unit time periods according to the time axis position of the target video clip pair;
计算所述多个目标单位时间周期的第三向量的和向量;Calculating the sum vector of the third vector of the multiple target unit time periods;
根据平均向量和窗口时长确定每个元素对应的元素随机分布函数,所述平均向 量根据多个视频文件中的各单位时间周期的第三向量平均计算获得;Determine the element random distribution function corresponding to each element according to the average vector and the window duration, and the average vector is calculated by averaging the third vector in each unit time period in the multiple video files;
根据每个元素对应的元素随机分布函数和所述和向量确定所述目标视频片段对的第四向量,其中,所述第四向量的各元素为所述和向量的对应元素在对应的元素随机分布函数中的分位值。The fourth vector of the target video segment pair is determined according to the element random distribution function corresponding to each element and the sum vector, wherein each element of the fourth vector is the random element of the sum vector in the corresponding element. The quantile value in the distribution function.
Preferably, selecting a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector includes:
determining filtered video segment pairs according to the fourth vector; and
removing the filtered video segment pairs from the first video candidate set to obtain a second video candidate set.
Preferably, determining the filtered video segment pairs according to the fourth vector includes:
in response to any element of the fourth vector being less than a corresponding quantile threshold, determining the corresponding video segment pair as a filtered video segment pair.
Preferably, the element random distribution function is a binomial distribution function whose mean is the corresponding element of the average vector and whose length matches the window duration.
Preferably, selecting a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector further includes:
calculating a score value of each video segment pair in the second video candidate set according to the fourth vector; and
sorting and filtering the plurality of first video segments in the second video candidate set according to the score values until the number of first video segments remaining in the second video candidate set meets a predetermined condition.
According to a second aspect of the embodiments of the present invention, a video processing device is provided, including:
a first obtaining unit, configured to obtain at least one first video file of a first user and at least one second video file of a second user;
an interception unit, configured to traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set, the first video candidate set including a plurality of video segment pairs, each video segment pair including a first video segment and a corresponding second video segment with the same window duration and the same time axis position;
a second obtaining unit, configured to obtain a first vector corresponding to each unit time period according to the first video file and a second vector corresponding to each unit time period according to the second video file, the first vector characterizing the state of the first user in the corresponding unit time period and the second vector characterizing the state of the second user in the corresponding unit time period;
a third obtaining unit, configured to obtain a third vector according to the first vector and the second vector corresponding to each unit time period;
a fourth obtaining unit, configured to determine a fourth vector according to the time axis position corresponding to each video segment pair and the third vector;
a selecting unit, configured to select a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector; and
a synthesis unit, configured to obtain a target video segment according to the selected first video segments and second video segments.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, including a memory and a processor, wherein the memory is configured to store one or more computer program instructions which, when executed by the processor, implement the method according to the first aspect.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to the first aspect.
In the embodiments of the present application, at least one first video file of a first user and a second video file of a second user are obtained; the first and second video files are traversed and intercepted according to at least one window duration to obtain a first video candidate set; a fourth vector characterizing probability distribution values is obtained from the first vector and the second vector that characterize the users' in-video states per unit time period of the video files; and a plurality of first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into a target video segment. In this way, target video segments (e.g., high-quality or highlight segments) that fully reflect the interaction between the first user and the second user can be extracted, providing a richer user experience.
Description of the drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention;
Fig. 2 is a data flow diagram of a video processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a video processing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed description
The present invention is described below based on embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are set forth; the invention can be fully understood by those skilled in the art without these details. To avoid obscuring the essence of the invention, well-known methods, procedures, flows, elements, and circuits are not described in detail.
In addition, those of ordinary skill in the art should understand that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the specification and claims, words such as "comprise" and "include" are to be construed in an inclusive rather than an exclusive or exhaustive sense, that is, in the sense of "including, but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise specified, "a plurality of" means two or more.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention. As shown in Fig. 1, the method is executed by a server, and the video processing method of this embodiment includes the following steps.
Step S110: obtaining at least one first video file of a first user and at least one second video file of a second user.
In this embodiment, the server obtains at least one first video file of the first user and a second video file of the second user. Here, the first users may be students, and there may be one, two, four, or more of them; the invention places no limit on this. The second user may be a teacher, and there may be one second user. Preferably, in this embodiment there are four first users and one second user, that is, the online teaching mode of this embodiment is "one-to-four".
Further, the first video file may be a multimedia file recorded while the first user studies online and may include the first user's real-time audio and video information. The second video file may be a multimedia file recorded while the second user teaches online and may include the second user's real-time audio and video information. The formats of the first and second video files may include, but are not limited to, .AVI, .MOV, .RM, .MPEG, and .ASF.
Step S120: traversing and intercepting the first video file and the second video file according to at least one window duration to obtain a first video candidate set, the first video candidate set including a plurality of video segments, which comprise a plurality of first video segments and a corresponding plurality of second video segments with the same window duration and the same time axis position.
In this embodiment, the window duration is denoted by t and the first video candidate set by R. The server traverses and intercepts the first and second video files with window duration t to obtain the first video candidate set R = {Rt}, where Rt denotes the video segments in the set. These segments comprise a plurality of first video segments and a plurality of second video segments in one-to-one correspondence; corresponding first and second video segments share the same window duration and time axis position.
Here, the window duration may be, for example, 10, 13, 16, 19, 22, or 25 seconds. Assuming the window duration t takes values in [10, 13, 16, 19, 22, 25], segments are first intercepted by sliding a 10-second window at a predetermined sliding step (e.g., 1 second), then by sliding a 13-second window at the same step, and so on. The resulting first video candidate set R can be expressed as {0-10s, 1-11s, ..., 0-13s, 1-14s, ...}. That is, the first video candidate set may include multiple segments with a 10-second window, multiple segments with a 13-second window, and likewise segments with 16-, 19-, 22-, and 25-second windows.
It should be noted that the window duration and sliding step may be system defaults or lengths preset by an administrator as needed; the invention places no limit on this. Also, the first video candidate set is not limited to segments of different window durations as described above; it may contain only segments of a single window duration, for example only segments with a 10-second window, or only segments with a 13-second window.
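For illustration, a minimal sketch of this traversal interception is given below, assuming a 1-second sliding step and representing each clip pair simply by its (start, end) position on the shared time axis; it is a sketch of the enumeration, not part of the claimed method itself.

```python
# A minimal sketch of the traversal interception in step S120, assuming a
# 1-second sliding step; each clip pair is represented only by its
# (start, end) position on the shared time axis.
def build_candidate_set(duration_s, window_lengths=(10, 13, 16, 19, 22, 25), step=1):
    """Enumerate all clip-pair positions for every window duration."""
    candidates = []
    for t in window_lengths:
        for start in range(0, duration_s - t + 1, step):
            # One entry stands for a pair: the first-user clip and the
            # second-user clip share this window duration and axis position.
            candidates.append((start, start + t))
    return candidates

# e.g. for a 60-second recording: (0, 10), (1, 11), ..., (0, 13), (1, 14), ...
R = build_candidate_set(60)
```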
Step S130: obtaining a first vector corresponding to each unit time period according to the first video file, and obtaining a second vector corresponding to each unit time period according to the second video file, the first vector characterizing the state of the first user in the corresponding unit time period and the second vector characterizing the state of the second user in the corresponding unit time period.
It should be noted that step S130 does not depend on step S120; the two may be executed simultaneously or in a predetermined order, for example S120 before S130, or S130 before S120.
In this embodiment, the server analyzes the video segments per predetermined unit time period (e.g., second by second) and/or per set number of frames (e.g., frame by frame) to obtain the first vector corresponding to each time period. The first vector, denoted Vs, characterizes the state of the first user or the second user in each of the video segments.
In an optional implementation of this embodiment, the state of the first or second user is characterized along three dimensions: speech, face appearance, and facial expression. Here, the state includes whether the user is speaking in each set time period of the segment (e.g., judged second by second), whether the user's face appears in each frame, and whether the user's expression is happy.
Specifically, obtaining the first vector corresponding to each unit time period according to the first video file in step S130 includes:
Step S131: determining the video data and audio data corresponding to a target unit time period according to the first video file.
Step S132: performing face recognition on a plurality of image frames of the video data, respectively, to obtain the face state parameter corresponding to each image frame.
Specifically, the face state parameter includes a first value characterizing whether a face appears and a second value characterizing the facial expression state.
Step S133: performing speech recognition on the audio data to obtain a speech distribution parameter.
Step S134: determining the first vector according to the face state parameter and the speech distribution parameter.
Face recognition is a biometric technology that identifies a person based on facial feature information: a camera captures images or video streams containing faces, automatically detects and tracks the faces in the images, and then recognizes the detected faces. Face recognition algorithms may include, but are not limited to, feature-based recognition algorithms, appearance-based recognition algorithms, template-based recognition algorithms, recognition algorithms using neural networks, and algorithms based on illumination estimation models. The face state parameter indicates whether a face appears in each second and whether its expression is happy.
Speech recognition converts a speech signal into the corresponding text. A recognition system mainly comprises four parts: feature extraction, an acoustic model, a language model, and a dictionary with decoding. To extract features effectively, the captured sound signal is first preprocessed (filtering, framing, etc.) so that the audio to be analyzed is properly extracted from the raw signal; feature extraction then converts the signal from the time domain to the frequency domain to provide suitable feature vectors for the acoustic model; the acoustic model scores each feature vector against acoustic characteristics; the language model computes the probability of the candidate word sequences according to linguistic theory; and finally the word sequence is decoded against the dictionary to obtain the most likely text. Speech recognition algorithms may include, but are not limited to, Gaussian mixture model (GMM) algorithms, dynamic time warping (DTW) algorithms, and connectionist temporal classification (CTC) algorithms. The speech distribution parameter indicates whether someone is speaking in each second.
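The patent does not prescribe a particular speech detector. Purely as an illustration of how the per-second speech flag could be derived, the sketch below uses frame-level voice activity detection with the webrtcvad package (an assumed choice, not named in the source):

```python
# Illustrative only: one way to obtain the per-second speech flag Ss is
# frame-level voice activity detection with the `webrtcvad` package
# (an assumed choice, not prescribed by the source).
import webrtcvad

def speech_flags(pcm16_mono: bytes, sample_rate=16000, frame_ms=30):
    """Return one 0/1 flag per second: 1 if speech is detected throughout."""
    vad = webrtcvad.Vad(2)                               # aggressiveness 0-3
    frame_bytes = sample_rate * frame_ms // 1000 * 2     # 16-bit mono samples
    per_frame = [
        vad.is_speech(pcm16_mono[i:i + frame_bytes], sample_rate)
        for i in range(0, len(pcm16_mono) - frame_bytes + 1, frame_bytes)
    ]
    frames_per_sec = 1000 // frame_ms                    # ~33 frames of 30 ms
    return [
        1 if all(per_frame[s * frames_per_sec:(s + 1) * frames_per_sec]) else 0
        for s in range(len(per_frame) // frames_per_sec)
    ]
```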
Taking the first vector as an example: in this step, the audio and video information is extracted from the first video file and analyzed per predetermined unit time period (e.g., per second) to obtain, along the three dimensions of speech, face appearance, and facial expression, the attribute information of the video portion (both audio and video) corresponding to that period, denoted [Ss, Fs, Es], where Ss characterizes the speech state, Fs the face appearance, and Es the facial expression in the corresponding unit time period.
Specifically, in the speech dimension, for each unit time period (e.g., each second), speech analysis determines whether the first user is speaking continuously in the audio data of the segment, and the result is denoted Ss. For example, for one second of a video segment, if speech is continuously detected throughout that second, the first user is speaking during it, so Ss is 1; otherwise Ss is 0.
In the face-appearance and facial-expression dimensions, a second-by-second analysis is likewise performed. First, all image frames, or a subset of them, are extracted from each second of video data. For each extracted frame, image recognition determines whether the first user's face appears and whether its expression is happy. Ff denotes whether a face appears in a frame, and Ef whether the expression in a frame is happy; both take the value 0 or 1. Thus, for each frame extracted from each second of video, a corresponding [Ff, Ef] is obtained. For example, at 24 frames per second, all 24 frames of each second can be extracted and face recognition performed on each, yielding a sequence of 24 [Ff, Ef] pairs. Alternatively, extracting a subset (e.g., 8 of the 24 frames per second) at intervals and performing face recognition on each yields a sequence of 8 [Ff, Ef] pairs.
Then, the sequence of [Ff, Ef] values corresponding to each second of video data is merged per predetermined time period (second by second when that period is 1 second): if Ff contains 2 or more values of 1 within a second, the face value Fs (or, likewise, the expression value Es) for that second is 1; otherwise it is 0. Finally, the results are combined into the first vector for each unit time period, denoted Vs = [Ss, Fs, Es].
For example, for the 2nd second of a first video segment: if speech is continuously detected, Ss = 1; if a face is detected in all 24 of that second's frames, Fs = 1; and if a smiling expression is detected in 12 of those frames, Es = 1. Hence the first vector of that second is Vs = [1, 1, 1].
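The per-second analysis of steps S131-S134 can be sketched as follows. The per-frame recognizers are passed in as callables standing in for whatever face/expression backend is used (an assumption of the sketch), and the 2-or-more-frames merging rule above is applied as described:

```python
# A sketch of steps S131-S134 for one video file. detect_face / detect_smile
# are caller-supplied per-frame recognizers; speech_flags holds the
# per-second speech result from the audio track.
def per_second_vectors(frames_by_second, speech_flags, detect_face, detect_smile):
    """Build Vs = [Ss, Fs, Es] for each second.

    frames_by_second: list of lists of image frames (e.g. 24 or 8 per second).
    """
    vectors = []
    for sec, frames in enumerate(frames_by_second):
        face_hits = sum(1 for f in frames if detect_face(f))    # Ff per frame
        smile_hits = sum(1 for f in frames if detect_smile(f))  # Ef per frame
        # Merging rule from above: 2 or more 1-valued frames in a second -> 1.
        fs = 1 if face_hits >= 2 else 0
        es = 1 if smile_hits >= 2 else 0
        vectors.append([speech_flags[sec], fs, es])
    return vectors
```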
In step S130, obtaining the second vector corresponding to each unit time period according to the second video file specifically includes:
Step S135: determining the video data and audio data corresponding to a target unit time period according to the second video file.
Step S136: performing face recognition on a plurality of image frames of the video data, respectively, to obtain the face state parameter corresponding to each image frame.
Step S137: performing speech recognition on the audio data to obtain a speech distribution parameter.
Step S138: obtaining the second vector according to the face state parameter and the speech distribution parameter.
It should be understood that steps S131-S134 and steps S135-S138 are not ordered with respect to each other; they may be executed in parallel or sequentially in a set order.
Thus, for each second of the first video file a corresponding first vector Vss is obtained, and for each second of the second video file a corresponding second vector Vts is obtained.
Step S140: obtaining the third vector of each unit time period according to the first vector and the second vector corresponding to that unit time period.
In this step, the third vector is obtained by combining Vss and Vts. The third vector is a 6-dimensional vector that characterizes the states of the first user (e.g., a student) and the second user (e.g., a teacher) within the same unit time period of the first and second video files.
In this embodiment, combining the first vector of the first video file and the second vector of the second video file with the same time axis coordinate yields a third vector of dimension 6, comprising the first user's speech, face, and expression data and the second user's speech, face, and expression data.
Step S150: determining a fourth vector according to the time axis position corresponding to each video segment pair and the third vector.
Specifically, this step includes the following sub-steps:
Step S151: determining a target video segment pair.
Step S152: determining the corresponding plurality of target unit time periods according to the time axis position of the target video segment pair.
Step S153: calculating the sum vector of the third vectors of the plurality of target unit time periods.
Step S154: determining the element random distribution function corresponding to each element according to an average vector and the window duration, the average vector being obtained by averaging the third vectors of the unit time periods of a plurality of video files.
Step S155: determining the fourth vector of the target video segment pair according to the element random distribution function corresponding to each element and the sum vector, each element of the fourth vector being the quantile value of the corresponding element of the sum vector in the corresponding element random distribution function.
Specifically, in step S153, a corresponding third vector is obtained for every unit time period (i.e., every second) within each video segment pair. Summing the third vectors of the unit time periods covered by the pair's time axis (i.e., summing the third vectors element-wise) then yields the sum vector corresponding to that video segment pair.
Specifically, in step S154, a large number (e.g., 10,000) of video files similar to the first and second video files may be collected in advance to determine the average vector. For an online-classroom application, a large volume of historical teaching videos, comprising student video files and teacher video files, can be obtained; these are analyzed and merged per unit time period as described above to obtain the third vector of each unit time period. Averaging these third vectors then yields the mean.
The average vector may be denoted Vs,avg = [Ss,savg, Fs,savg, Es,savg, Ss,tavg, Fs,tavg, Es,tavg], where Ss,savg is the mean of the first user's speech data, Fs,savg the mean of the first user's face data, Es,savg the mean of the first user's expression data, Ss,tavg the mean of the second user's speech data, Fs,tavg the mean of the second user's face data, and Es,tavg the mean of the second user's expression data.
In this embodiment, the speech and expression characteristics are assumed to follow a binomial distribution. Therefore, in step S154, the element random distribution function of each element for the different time window lengths can be obtained from the means in the average vector obtained above and the length of the video segment.
A binomial distribution arises from n repeated independent Bernoulli trials. Each trial has only two possible, mutually exclusive outcomes, the trials are independent of one another, and the probability of the event remains the same in every trial; such a series of trials is collectively called an n-fold Bernoulli experiment. When the number of trials is 1, the binomial distribution reduces to the 0-1 distribution. A binomial distribution is determined by its mean and the number (or length) of trials.
It should be understood that those skilled in the art may also adopt other types of element random distribution functions that fit the characteristics of in-video speech and expression.
In this embodiment, each element is assumed to follow the binomial distribution B(t, avg), where t is the window duration of the video segment and the independent variable of B takes values in [0, t].
Thus, from the mean of each element of the third vector and the length of the predetermined time window, six mutually independent binomial distributions are determined, giving the element random distribution vector B = [Bsst, Bsft, Bset, Btst, Btft, Btet].
In step S155, for each element random distribution function, the quantile value of the corresponding element of the sum vector in that function is determined, thereby determining the fourth vector. That is, each element of the fourth vector is the quantile value of the corresponding element of the sum vector in the corresponding element random distribution function.
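A compact sketch of steps S153-S155 follows, using scipy's binomial CDF as the quantile function; treating the historical per-second average as the per-trial success probability of B(t, avg) is an assumption made for the example:

```python
# A sketch of steps S153-S155. The quantile of each summed element is taken
# as the binomial CDF value; using the historical per-second average as the
# per-trial probability of B(t, avg) is an assumption of this example.
import numpy as np
from scipy.stats import binom

def fourth_vector(third_vectors, start, t, avg):
    """third_vectors: per-second 6-dim vectors along the shared timeline.
    start, t: time axis position and window duration of the clip pair.
    avg: the 6-dim historical average vector Vs,avg.
    """
    window = np.asarray(third_vectors[start:start + t])
    sum_vec = window.sum(axis=0)                      # step S153: sum vector
    # Steps S154/S155: quantile of each element under B(t, avg_k).
    return np.array([binom.cdf(sum_vec[k], t, avg[k]) for k in range(6)])
```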
Step S160: selecting a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector.
In this embodiment, the server screens and sorts the first video segments in the first video candidate set according to the fourth vector, and selects a plurality of first video segments (e.g., the top three) as the first target video segments according to the screening and sorting results.
Specifically, step S160 includes the following sub-steps:
Step S161: determining filtered video segment pairs according to the fourth vector.
In an optional implementation, in response to any element of the fourth vector being less than the corresponding quantile threshold, the corresponding video segment pair is determined as a filtered video segment pair. Preferably, the filtering conditions are Bsst < 0.4, Bsft < 0.4, Bset < 0.2, Btst < 0.4, Btft < 0.4, and Btet < 0.2.
Step S162: removing the filtered video segment pairs from the first video candidate set to obtain a second video candidate set.
Preferably, step S160 further includes:
Step S163: calculating the score value of each video segment pair in the second video candidate set according to the fourth vector.
In an optional implementation, the elements of the fourth vector corresponding to each video segment pair in the second video candidate set are added together to obtain the score value.
Step S164: sorting and filtering the plurality of first video segments in the second video candidate set according to the score values until the number of first video segments remaining in the second video candidate set meets a predetermined condition.
In one optional implementation, the top-ranked N first video segments and their corresponding second video segments may be selected directly as the basis for the next step, where N is an integer and N ≥ 1.
In another optional implementation, the selection may proceed iteratively: in each cycle, the video segment pair with the highest score is selected, all segments whose time axes overlap with that highest-scoring segment are removed from the second video candidate set, and the set is updated; the next iteration then begins, until the number of first video segments remaining in the second video candidate set meets the predetermined condition.
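The sub-steps S161-S164 can be sketched together as below, with the preferred thresholds given above and a simple interval-overlap test; the concrete data layout (a list of ((start, end), fourth_vector) tuples) is an assumption of the sketch:

```python
# A combined sketch of steps S161-S164: threshold filtering, scoring by the
# element sum, then greedy selection that drops overlapping pairs.
THRESHOLDS = [0.4, 0.4, 0.2, 0.4, 0.4, 0.2]

def overlaps(a, b):
    """True if the (start, end) intervals a and b intersect on the time axis."""
    return a[0] < b[1] and b[0] < a[1]

def select_pairs(candidates, n_wanted=3):
    """candidates: list of ((start, end), fourth_vector) tuples."""
    # Steps S161/S162: drop any pair with an element below its threshold.
    kept = [c for c in candidates
            if all(v >= thr for v, thr in zip(c[1], THRESHOLDS))]
    # Step S163: score each remaining pair by summing its fourth vector.
    kept.sort(key=lambda c: sum(c[1]), reverse=True)
    # Step S164: repeatedly take the best pair and remove overlapping ones.
    chosen = []
    while kept and len(chosen) < n_wanted:
        best = kept.pop(0)
        chosen.append(best)
        kept = [c for c in kept if not overlaps(c[0], best[0])]
    return chosen
```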
Thus, in the embodiments of the present application, at least one first video file of a first user and a second video file of a second user are obtained; the first and second video files are traversed and intercepted according to at least one window duration to obtain a first video candidate set; a fourth vector characterizing probability distribution values is obtained from the first vector and the second vector that characterize the users' in-video states per unit time period; and a plurality of first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into a target video segment. In this way, high-quality or highlight video segments that fully reflect the interaction between the first user and the second user can be extracted, providing a richer user experience.
In another embodiment of the present invention, the video processing method of Fig. 1 further includes:
Step S170: obtaining a target video segment according to the selected first video segments and second video segments.
Specifically, the target video segment is obtained by splicing the plurality of first video segments and second video segments selected from the first video candidate set. For example, if three first video segments, say 0-10s, 15-33s, and 35-57s, are selected from the first video candidate set as the first target video segments, the corresponding second target video segments also comprise the three second video segments 0-10s, 15-33s, and 35-57s.
Fig. 2 is a data flow diagram of the method of an embodiment of the present invention. The data processing procedure of the embodiment is illustrated below with reference to Fig. 2, taking as an example a student video from an online classroom and the synchronously recorded teacher video.
In step S110, the first video file S of the first user (here, the student's video file) and the second video file T of the second user (here, the teacher's video file) are obtained.
Thus, a pair of video files is in fact obtained.
In step S120, video segments are intercepted by sliding windows of several different durations to obtain the first video candidate set. In this example, two window durations, 10s and 13s, are used. First, sliding a 10s window over the first video file S and the second video file T yields first and second video segments with time axes {0-10s, 1-11s, 2-12s, ...}. Then, sliding a 13s window over the two files yields first and second video segments with time axes {0-13s, 1-14s, 2-15s, ...}. The duration of each segment equals the window duration used for the sliding interception, and the first and second video segments with the same time axis form a video segment pair.
In step S130, the data of the first and second video files are analyzed second by second (i.e., the unit time period is 1 second) to obtain the first vector Vss for each second of the first video file and the second vector Vts for each second of the second video file.
For example, for the 5th second of the time axis, the first vector of the first video file is Vss = [1, 0, 0]: speech is continuously detected in that second, but the student's face does not appear, so no smile can be detected. Meanwhile, the second vector of the second video file is Vts = [1, 1, 0]: speech is continuously detected and the teacher's face is detected, but no smile is detected.
In step S140, the first vector Vss and second vector Vts of each second are merged into the third vector Vs.
For example, for the 5th second of the time axis, the third vector is Vs = [1, 0, 0, 1, 1, 0]; similarly, for the 6th second, Vs = [1, 1, 1, 1, 1, 1].
In step S150, the fourth vector is determined according to the time axis position corresponding to each video segment pair and the third vector.
First, the window duration t described above is combined with each of the six dimension-wise means Vs,avg of the third vectors of a large number of other, pre-analyzed video files to generate six binomial distributions, denoted B = [Bsst, Bsft, Bset, Btst, Btft, Btet]. In this example, Bsst is the element random distribution function of the student's speech, Bsft that of the student's face appearance, Bset that of the student's facial expression, Btst that of the teacher's speech, Btft that of the teacher's face appearance, and Btet that of the teacher's facial expression. Each element random distribution function is determined from the pre-computed mean and the corresponding window duration. Taking Bsst as an example, Bsst follows the binomial distribution B(t, savg), whose independent variable takes values in [0, t].
Next, each solution (i.e., each video segment pair) among the video segments Rt is summed along the six dimensions to obtain the sum vector corresponding to that pair. For example, for the first and second video segments of the 0-10s interval, the third vectors Vs1-Vs10 of those ten seconds are added to obtain a six-dimensional sum vector.
Based on this sum vector, the quantile-value vector (i.e., the fourth vector) of each feasible solution in B can be computed: the quantile value of each element of the sum vector in its corresponding element random distribution function is calculated, giving a fourth vector of six quantile values. For example, if the first element of the sum vector is 4, the quantile value of 4 in the Bsst distribution is computed.
In step S160, a plurality of first video segments and second video segments are selected from the first video candidate set according to the fourth vector.
In this example, the feasible solutions are screened by predetermined filtering conditions. If any of Bsst < 0.4, Bsft < 0.4, Bset < 0.2, Btst < 0.4, Btft < 0.4, or Btet < 0.2 is satisfied, the video segment pair is filtered out and the remaining segments are retained. The first video candidate set thus becomes the second video candidate set, denoted R' = {Rt'}, where Rt' denotes the video segments in the remaining candidate set R'. It should be noted that 0.4 and 0.2 are preset quantile thresholds, which may be system defaults or preset by an administrator as needed; the invention places no limit on this.
Further, for the second video candidate set R', the six elements of the fourth vector of each video segment pair (i.e., each feasible solution) are summed to obtain a sum value. The pairs are then sorted by this sum and the maximum is taken; all candidates in R' whose time spans overlap that of the maximum are removed to obtain a new R'; and this step is repeated until three video segments have been taken.
For example, suppose the second video candidate set R' contains 6 video segment pairs r1-r6, whose fourth vectors are b1-b6, respectively. In this step, the 6 elements of each fourth vector b1-b6 are summed to obtain the sum values s1-s6. For instance, if b1 = {0.5, 0.5, 0.3, 0.5, 0.5, 0.4}, then s1 = 0.5 + 0.5 + 0.3 + 0.5 + 0.5 + 0.4 = 2.7. The values s1-s6 are then sorted; if s2 is the largest, the other video segment pairs whose time axes overlap with pair r2 (the pair corresponding to s2) are removed and the second video candidate set R' is updated. This is repeated iteratively until the number of video segment pairs remaining in the updated candidate set R' meets the requirement (e.g., 3 remain).
Further, taking the first user's speech data as an example and assuming a window duration of 10s, the binomial distribution function satisfied by the first user's speech data is B(10, 0.2), with the corresponding probabilities and quantile values as follows:
prb(0): 0.1074, 0.1074
prb(1): 0.2684, 0.3758
prb(2): 0.3020, 0.6778
prb(3): 0.2013, 0.8791
prb(4): 0.0881, 0.9672
prb(5): 0.0264, 0.9936
prb(6): 0.0055, 0.9991
prb(7): 0.0008, 0.9999
prb(8): 0.0001, 1.0000
prb(9): 0.0000, 1.0000
prb(10): 0.0000, 1.0000.
Here prb(i), i = 0-10, gives the probability distribution over a 10-second video segment according to the binomial distribution function B(10, 0.2): the first column is the probability that the speech duration lies between i-1 and i seconds, and the second column the probability that the speech duration is less than or equal to i seconds. The second column thus serves as the quantile value of the corresponding parameter i. Once the binomial distribution function is determined, these values can be computed from the binomial probability formula.
As the data above show, when the first user's speaking duration is less than or equal to 0 seconds, the quantile value is 0.1074; when it is less than or equal to 1 second, 0.3758; when it is less than or equal to 2 seconds, 0.6778; and so on. Further, the quantile value is compared with a preset quantile threshold (e.g., 0.4): if the quantile value is less than or equal to the preset threshold, the corresponding video segment is filtered out; if it is greater than the preset threshold and the requirements of the other elements are also met, the segment is retained.
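The prb(i) table above can be checked directly against B(10, 0.2), for example with scipy (the printed values match the table to four decimal places):

```python
# Reproducing the prb(i) table from B(10, 0.2): pmf gives the first column,
# cdf (the quantile value) the second.
from scipy.stats import binom

for i in range(11):
    print(f"prb({i}): {binom.pmf(i, 10, 0.2):.4f}, {binom.cdf(i, 10, 0.2):.4f}")
```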
In step S170, the target video segment is obtained according to the selected first video segments and second video segments.
Optionally, in another embodiment of the present invention, the first video segments and second video segments of the video segment pairs finally remaining in the second video candidate set are synthesized to form a highlight video.
Specifically, splicing can be performed so that the selected first video segment and second video segment are displayed simultaneously in the same picture, that is, the first and second video segments are shown side by side in one frame.
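As an illustration of this splicing, the sketch below uses the moviepy package (an assumed tool, not prescribed by the source) to lay each selected pair out side by side and concatenate the pairs in timeline order:

```python
# Illustrative sketch only: side-by-side composition of the selected pairs
# with `moviepy` (an assumed tool, not prescribed by the source).
from moviepy.editor import VideoFileClip, clips_array, concatenate_videoclips

def compose_highlight(student_path, teacher_path, segments, out_path):
    """segments: selected (start, end) positions, e.g. [(0, 10), (15, 33)]."""
    student = VideoFileClip(student_path)
    teacher = VideoFileClip(teacher_path)
    # Each pair becomes one side-by-side clip; pairs then play in sequence.
    pairs = [clips_array([[student.subclip(s, e), teacher.subclip(s, e)]])
             for s, e in segments]
    concatenate_videoclips(pairs).write_videofile(out_path)
```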
All the optional technical solutions above can be combined arbitrarily to form optional embodiments of the present invention, which will not be described one by one here.
In the embodiments of the present application, at least one first video file of a first user and a second video file of a second user are obtained; the first and second video files are traversed and intercepted according to at least one window duration to obtain a first video candidate set; a fourth vector characterizing probability distribution values is obtained from the first vector and the second vector that characterize the users' in-video states per unit time period; and a plurality of first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into a target video segment. In this way, high-quality or highlight video segments that fully reflect the interaction between the first user and the second user can be extracted, providing a richer user experience.
The following are device embodiments of the present invention, which can be used to execute the method embodiments of the present invention. For details not disclosed in the device embodiments, please refer to the method embodiments of the present invention.
Fig. 3 is a schematic diagram of a video processing device according to an embodiment of the present invention. As shown in Fig. 3, the video processing device 3 of this embodiment includes a first obtaining unit 31, an interception unit 32, a second obtaining unit 33, a third obtaining unit 34, a fourth obtaining unit 35, a selecting unit 36, and a synthesis unit 37.
The first obtaining unit 31 is configured to obtain at least one first video file of a first user and at least one second video file of a second user.
The interception unit 32 is configured to traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set, the first video candidate set including a plurality of video segment pairs, each video segment pair including a first video segment and a corresponding second video segment with the same window duration and the same time axis position.
The second obtaining unit 33 is configured to obtain a first vector corresponding to each unit time period according to the first video file and a second vector corresponding to each unit time period according to the second video file, the first vector characterizing the state of the first user in the corresponding unit time period and the second vector characterizing the state of the second user in the corresponding unit time period.
The third obtaining unit 34 is configured to obtain a third vector according to the first vector and the second vector corresponding to each unit time period.
The fourth obtaining unit 35 is configured to determine a fourth vector according to the time axis position corresponding to each video segment pair and the third vector.
The selecting unit 36 is configured to select a plurality of first video segments and second video segments from the first video candidate set according to the fourth vector.
The synthesis unit 37 is configured to obtain a target video segment according to the selected first video segments and second video segments.
Thus, in the embodiments of the present application, at least one first video file of a first user and a second video file of a second user are obtained; the first and second video files are traversed and intercepted according to at least one window duration to obtain a first video candidate set; a fourth vector characterizing probability distribution values is obtained from the first vector and the second vector that characterize the users' in-video states per unit time period; and a plurality of first video segments and corresponding second video segments are selected from the first video candidate set according to the fourth vector and then synthesized into a target video segment. In this way, high-quality or highlight video segments that fully reflect the interaction between the first user and the second user can be extracted, providing a richer user experience.
图4是本发明实施例的电子设备的示意图。图4所示的电子设备4为通用数据处理装置,其包括通用的计算机硬件结构,其至少包括处理器41和存储器42。处理器41和存储器42通过总线43连接。存储器42适于存储处理器41可执行的指令或程序。处理器41可以是独立的微处理器,也可以是一个或者多个微处理器集合。由此,处理器41通过执行存储器42所存储的命令,从而执行如上所述的本发明实施例的方法流程实现对于数据的处理和对于其他装置的控制。总线43将上述多个组件连接在一起,同时将上述组件连接到显示控制器44和显示装置以及输入/输出(I/O)装置45。输入/输出(I/O)装置45可以是鼠标、键盘、调制解调器、网络接口、触控输入装置、体感输入装置、打印机以及本领域公知的其他装置。典型地,输入/输出(I/O)装置45通过输入/输出(I/O)控制器46与系统相连。Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention. The electronic device 4 shown in FIG. 4 is a general data processing device, which includes a general computer hardware structure, and at least includes a processor 41 and a memory 42. The processor 41 and the memory 42 are connected by a bus 43. The memory 42 is suitable for storing instructions or programs executable by the processor 41. The processor 41 may be an independent microprocessor, or a collection of one or more microprocessors. In this way, the processor 41 executes the command stored in the memory 42 to execute the method procedure of the embodiment of the present invention described above to realize data processing and control of other devices. The bus 43 connects the above-mentioned multiple components together, and at the same time connects the above-mentioned components to the display controller 44 and the display device and the input/output (I/O) device 45. The input/output (I/O) device 45 may be a mouse, a keyboard, a modem, a network interface, a touch input device, a motion sensing input device, a printer, and other devices known in the art. Typically, an input/output (I/O) device 45 is connected to the system through an input/output (I/O) controller 46.
The memory 42 may store software components such as an operating system, a communication module, an interaction module, and application programs. Each module and application program described above corresponds to a set of executable program instructions that accomplish one or more functions and the methods described in the embodiments of the invention.
The flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present invention describe various aspects of the present invention. It should be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing device to produce a machine, such that the instructions (executed via the processor of the computer or other programmable data processing device) create means for implementing the functions/actions specified in the flowchart and/or block diagram block or blocks.
Meanwhile, as those skilled in the art will appreciate, aspects of the embodiments of the present invention may be implemented as a system, a method, or a computer program product. Accordingly, aspects of the embodiments of the present invention may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, resident software, microcode, etc.), or an implementation combining software and hardware aspects that may generally be referred to herein as a "circuit", "module", or "system". Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example (but not limited to), an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, PHP, and Python, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer as a stand-alone software package, partly on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing device, or other apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/actions specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing device, or other apparatus to cause a series of operational steps to be performed on the computer, other programmable device, or other apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide processes for implementing the functions/actions specified in the flowchart and/or block diagram block or blocks.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (13)

1. A video processing method, comprising:
    acquiring at least one first video file of a first user and at least one second video file of a second user;
    traversing and intercepting the first video file and the second video file according to at least one window duration to obtain a first video candidate set, the first video candidate set comprising a plurality of video clip pairs, each video clip pair comprising a first video clip and a corresponding second video clip having the same window duration and the same time-axis position;
    acquiring a first vector corresponding to each unit time period from the first video file and a second vector corresponding to each unit time period from the second video file, the first vector characterizing a first user state in the corresponding unit time period and the second vector characterizing a second user state in the corresponding unit time period;
    acquiring a third vector for each unit time period from the first vector and the second vector corresponding to that unit time period;
    determining a fourth vector from the time-axis position corresponding to each video clip pair and the third vectors;
    selecting a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector; and
    obtaining a target video clip from the selected first video clips and second video clips.
2. The method according to claim 1, wherein acquiring the first vector corresponding to each unit time period from the first video file comprises:
    determining video data and audio data corresponding to a target unit time period from the first video file;
    performing face recognition on a plurality of image frames of the video data respectively, and acquiring a face state parameter corresponding to each image frame;
    performing speech recognition on the audio data to acquire a voice distribution parameter; and
    determining the first vector from the face state parameters and the voice distribution parameter.
3. The method according to claim 1, wherein acquiring the second vector corresponding to each unit time period from the second video file comprises:
    determining video data and audio data corresponding to a target unit time period from the second video file;
    performing face recognition on a plurality of image frames of the video data respectively, and acquiring a face state parameter corresponding to each image frame;
    performing speech recognition on the audio data to acquire a voice distribution parameter; and
    acquiring the second vector from the face state parameters and the voice distribution parameter.
4. The method according to claim 2 or 3, wherein the face state parameter comprises a first value characterizing the appearance of a face and a second value characterizing the facial expression state.
5. The method according to claim 1, wherein acquiring the third vector from the first vector and the second vector corresponding to each unit time period comprises:
    merging the first vector and the second vector corresponding to the same unit time period into the third vector corresponding to that unit time period.
6. The method according to claim 5, wherein determining the fourth vector from the time-axis position corresponding to each video clip pair and the third vectors comprises:
    determining a target video clip pair;
    determining a plurality of corresponding target unit time periods from the time-axis position of the target video clip pair;
    computing the sum vector of the third vectors of the plurality of target unit time periods;
    determining an element random distribution function corresponding to each element from an average vector and the window duration, the average vector being obtained by averaging the third vectors of the unit time periods of a plurality of video files; and
    determining the fourth vector of the target video clip pair from the element random distribution function corresponding to each element and the sum vector, wherein each element of the fourth vector is the quantile value of the corresponding element of the sum vector in the corresponding element random distribution function.
7. The method according to claim 6, wherein selecting a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector comprises:
    determining filtered video clip pairs according to the fourth vector; and
    removing the filtered video clip pairs from the first video candidate set to obtain a second video candidate set.
8. The method according to claim 7, wherein determining the filtered video clip pairs according to the fourth vector comprises:
    in response to any element of the fourth vector being less than the corresponding quantile threshold, determining the corresponding video clip pair as a filtered video clip pair.
9. The method according to claim 5, wherein the element random distribution function is a binomial distribution function whose mean is the corresponding element of the average vector and whose length matches the window duration.
10. The method according to claim 7, wherein selecting a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector further comprises:
    computing a score value for each video clip pair in the second video candidate set according to the fourth vector; and
    sorting and filtering the plurality of first video clips in the second video candidate set according to the score values until the number of first video clips remaining in the second video candidate set satisfies a predetermined condition.
11. A video processing device, comprising:
    a first acquiring unit, configured to acquire at least one first video file of a first user and at least one second video file of a second user;
    an intercepting unit, configured to traverse and intercept the first video file and the second video file according to at least one window duration to obtain a first video candidate set, the first video candidate set comprising a plurality of video clip pairs, each video clip pair comprising a first video clip and a corresponding second video clip having the same window duration and the same time-axis position;
    a second acquiring unit, configured to acquire a first vector corresponding to each unit time period from the first video file and a second vector corresponding to each unit time period from the second video file, the first vector characterizing a first user state in the corresponding unit time period and the second vector characterizing a second user state in the corresponding unit time period;
    a third acquiring unit, configured to acquire a third vector from the first vector and the second vector corresponding to each unit time period;
    a fourth acquiring unit, configured to determine a fourth vector from the time-axis position corresponding to each video clip pair and the third vectors;
    a selecting unit, configured to select a plurality of first video clips and second video clips from the first video candidate set according to the fourth vector; and
    a synthesizing unit, configured to obtain a target video clip from the selected first video clips and second video clips.
12. An electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1 to 10.
13. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
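As an illustration of the feature extraction recited in claims 2 to 4, the vector for one unit time period might be assembled as below. The helpers detect_face and voiced_ratio are hypothetical stand-ins for whatever face-recognition and speech-recognition backends an implementation chooses; the publication itself does not name any particular one.

    # Sketch only: building one unit time period's vector (claims 2-4, assumed form).
    import numpy as np

    def period_vector(frames, audio, detect_face, voiced_ratio):
        # detect_face(frame) -> (face_present in {0, 1}, expression_score in [0, 1]);
        # voiced_ratio(audio) -> fraction of the period in which speech is detected.
        results = [detect_face(f) for f in frames]
        face_presence = float(np.mean([r[0] for r in results]))  # first value
        expression = float(np.mean([r[1] for r in results]))     # second value
        speech = float(voiced_ratio(audio))                      # voice distribution parameter
        return np.array([face_presence, expression, speech])

Concatenating the first user's and second user's period vectors then yields the third vector consumed by the sketches in the description above.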
PCT/CN2019/121228 2019-04-26 2019-11-27 Method and device for video processing, electronic device, and computer-readable storage medium WO2020215722A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910345625.4 2019-04-26
CN201910345625.4A CN110087143B (en) 2019-04-26 2019-04-26 Video processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2020215722A1 true WO2020215722A1 (en) 2020-10-29

Family

ID=67417083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121228 WO2020215722A1 (en) 2019-04-26 2019-11-27 Method and device for video processing, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110087143B (en)
WO (1) WO2020215722A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022170837A1 (en) * 2021-02-09 2022-08-18 华为技术有限公司 Video processing method and apparatus

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110087143B (en) * 2019-04-26 2020-06-09 北京谦仁科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN110650368B (en) * 2019-09-25 2022-04-26 新东方教育科技集团有限公司 Video processing method and device and electronic equipment
CN110650369B (en) * 2019-09-29 2021-09-17 北京谦仁科技有限公司 Video processing method and device, storage medium and electronic equipment
CN111107442B (en) * 2019-11-25 2022-07-12 北京大米科技有限公司 Method and device for acquiring audio and video files, server and storage medium
CN112887801A (en) * 2019-11-29 2021-06-01 阿里巴巴集团控股有限公司 Multimedia playing method, terminal and storage medium
CN112565914B (en) * 2021-02-18 2021-06-04 北京世纪好未来教育科技有限公司 Video display method, device and system for online classroom and storage medium
CN113709560B (en) * 2021-03-31 2024-01-02 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120237126A1 (en) * 2011-03-16 2012-09-20 Electronics & Telecommunications Research Institute Apparatus and method for determining characteristic of motion picture
CN108028054A (en) * 2015-09-30 2018-05-11 苹果公司 The Voice & Video component of audio /video show to automatically generating synchronizes
CN108989691A (en) * 2018-10-19 2018-12-11 北京微播视界科技有限公司 Video capture method, apparatus, electronic equipment and computer readable storage medium
CN110087143A (en) * 2019-04-26 2019-08-02 北京谦仁科技有限公司 Method for processing video frequency and device, electronic equipment and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105872584A (en) * 2015-11-25 2016-08-17 乐视网信息技术(北京)股份有限公司 Intercepted video sharing method and device
CN109089059A (en) * 2018-10-19 2018-12-25 北京微播视界科技有限公司 Method, apparatus, electronic equipment and the computer storage medium that video generates

Also Published As

Publication number Publication date
CN110087143B (en) 2020-06-09
CN110087143A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
WO2020215722A1 (en) Method and device for video processing, electronic device, and computer-readable storage medium
Tao et al. Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection
WO2021082941A1 (en) Video figure recognition method and apparatus, and storage medium and electronic device
WO2021159775A1 (en) Training method and device for audio separation network, audio separation method and device, and medium
WO2019196205A1 (en) Foreign language teaching evaluation information generating method and apparatus
CN109614934B (en) Online teaching quality assessment parameter generation method and device
Mariooryad et al. Exploring cross-modality affective reactions for audiovisual emotion recognition
US9477304B2 (en) Information processing apparatus, information processing method, and program
WO2024000867A1 (en) Emotion recognition method and apparatus, device, and storage medium
WO2020010883A1 (en) Method for synchronising video data and audio data, storage medium, and electronic device
Ringeval et al. Emotion recognition in the wild: Incorporating voice and lip activity in multimodal decision-level fusion
WO2020052062A1 (en) Detection method and device
WO2021179719A1 (en) Face detection method, apparatus, medium, and electronic device
Wei et al. Real-time head nod and shake detection for continuous human affect recognition
CN113298015A (en) Video character social relationship graph generation method based on graph convolution network
CN114140885A (en) Emotion analysis model generation method and device, electronic equipment and storage medium
CN113920534A (en) Method, system and storage medium for extracting video highlight
US10930169B2 (en) Computationally derived assessment in childhood education systems
Song et al. Emotional listener portrait: Realistic listener motion simulation in conversation
TWI769520B (en) Multi-language speech recognition and translation method and system
CN113379874B (en) Face animation generation method, intelligent terminal and storage medium
Song et al. Emotional listener portrait: Neural listener head generation with emotion
CN111008579A (en) Concentration degree identification method and device and electronic equipment
Gervasi et al. A method for predicting words by interpreting labial movements
JP2022539634A (en) USER INTERACTION METHODS, APPARATUS, APPARATUS AND MEDIUM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925823

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925823

Country of ref document: EP

Kind code of ref document: A1