CN110381367A - Video processing method, video processing equipment and computer readable storage medium


Info

Publication number
CN110381367A
Authority
CN
China
Prior art keywords
video
information
emotion
segment
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910619956.2A
Other languages
Chinese (zh)
Other versions
CN110381367B (en)
Inventor
张进
莫东松
钟宜峰
马丹
张健
赵璐
马晓琳
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN201910619956.2A priority Critical patent/CN110381367B/en
Publication of CN110381367A publication Critical patent/CN110381367A/en
Application granted granted Critical
Publication of CN110381367B publication Critical patent/CN110381367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508 Management of client data or end-user data
    • H04N21/4532 Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Child & Adolescent Psychology (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video processing method, video processing equipment and a computer readable storage medium, relates to the technical field of video processing, and aims to solve the problem that existing video editing methods cannot obtain video clips that meet the personalized requirements of users. The method comprises the following steps: obtaining a first video clip from a video to be processed; acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user; acquiring a second video clip from the video to be processed based on the personalized feature information; and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment. The embodiment of the invention enables the obtained target video clip to better meet the personalized requirements of the user.

Description

Video processing method, video processing equipment and computer readable storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video processing method, a video processing device, and a computer-readable storage medium.
Background
Generally, video editing is performed manually by professional editors using video editing tools. However, manual editing is inefficient and cannot meet the demand for rapidly bringing Internet live-broadcast content into service.
With the rise of artificial intelligence, and in particular the development of deep learning in computer vision, techniques for video editing using deep learning have emerged. Compared with manual editing, AI-based editing can greatly improve the editing speed for specific scenes.
However, in the video editing method based on artificial intelligence, the definition of the highlight video is preset by the operator. Therefore, the clipped video segment cannot meet the personalized requirements of the user.
Disclosure of Invention
Embodiments of the present invention provide a video processing method, a video processing device, and a computer-readable storage medium, so as to solve a problem that a video clip meeting personalized requirements of a user cannot be obtained by an existing video clipping method.
In a first aspect, an embodiment of the present invention provides a video processing method, including:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
Wherein, in a case that the personalized feature information includes the first emotion information, acquiring the first emotion information includes:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
Wherein, in a case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
Wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
In a second aspect, an embodiment of the present invention further provides a video processing apparatus, including: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement the steps in the video processing method.
In a third aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps in the video processing method described above.
In the embodiment of the invention, the acquired first video segment and the second video segment acquired based on the personalized feature information of the user are combined to acquire the target video segment to be edited. Therefore, by using the scheme of the embodiment of the invention, the obtained target video clip can better meet the personalized requirements of the user.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flow chart of a video processing method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a video processing system provided by an embodiment of the invention;
FIG. 3 is a block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 4 is one of the structural diagrams of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 5 is one of the structural diagrams of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 6 is a second block diagram of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 7 is a second block diagram of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 8 is one of the structural diagrams of a processing module in the video processing apparatus according to the embodiment of the present invention;
fig. 9 is a third block diagram of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 10 is a third block diagram of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 11 is a second block diagram of a processing module in the video processing apparatus according to the second embodiment of the present invention;
fig. 12 is a second block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 13 is a structural diagram of a video processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
step 101, obtaining a first video clip from a video to be processed.
In the embodiment of the present invention, the first video segment herein may be obtained in any manner.
For example, when the user chooses to watch a live video broadcast, the system obtains the ID number of the video watched by the user and uses an AI (Artificial Intelligence) video clipping device to perform highlight clipping on the video content according to predefined rules.
Fig. 2 is a schematic diagram of a video processing system according to an embodiment of the invention. In fig. 2, the system comprises: an AI video clipping device, a video information acquisition device and a text information acquisition device. The AI video clipping device is used for clipping the input live stream in an AI-based clipping mode. The video information acquisition device is used for collecting image information of the user and analyzing the user's emotion information. The text information acquisition device is used for collecting text information input by the user and analyzing the user's emotion information. Each of the three devices obtains its own clipped video segments. The video processing module then processes the video clips obtained by the three devices to form the video clip transmitted to the client.
In fig. 2, the AI video clipping device includes: a 3D module, a face recognition module and an OCR (Optical Character Recognition) module. The 3D module is used for processing and recognizing actions in the video, the face recognition module is used for recognizing persons in the video, and the OCR module is used for recognizing text in the video. With the AI video clipping device described above, a first video clip may be generated, for example from goals, fouls, shots, score information and the like in a football match.
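As an illustration only, the following is a minimal sketch (not the patented implementation) of how such predefined rules might turn hypothetical detector outputs from the 3D, face recognition and OCR modules into a first video clip; the labels, the 125-frame window and the data model are assumptions.

```python
# Minimal sketch: hypothetical detections are filtered by a predefined rule and
# the marked frames are merged into first-video-clip segments.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    frame_index: int
    label: str                     # e.g. "goal", "foul", "shot", "score_change"

HIGHLIGHT_LABELS = {"goal", "foul", "shot", "score_change"}   # assumed predefined rule

def first_video_clip(detections: List[Detection], window: int = 125) -> List[range]:
    """Mark frames around rule-matching detections and merge overlaps into segments."""
    marked = sorted(d.frame_index for d in detections if d.label in HIGHLIGHT_LABELS)
    segments: List[range] = []
    for f in marked:
        start, end = max(0, f - window), f + window
        if segments and start <= segments[-1].stop:            # overlapping -> merge
            segments[-1] = range(segments[-1].start, end)
        else:
            segments.append(range(start, end))
    return segments
```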
Step 102, acquiring personalized feature information of the user.
In the embodiment of the invention, the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user.
Wherein, in the case that the personalized feature information includes the first emotion information, acquiring the first emotion information may include:
and acquiring image information of the user when watching the video to be processed. And then, inputting the image information into a first emotion analysis model, and taking the output of the first emotion analysis model as the first emotion information. The first emotion analysis model may be any emotion analysis model, such as a VGG19 preprocessing model. In this way, the obtained emotional information can be made more accurate.
In the embodiment of the present invention, the obtained emotion information includes, but is not limited to: happiness, anger, fear, sadness, disgust and surprise.
On this basis, in order to improve the processing speed, in the embodiment of the present invention, after the image information is collected, the image information may be sampled to obtain sampled image information. Then, the sampled image information is input into the first emotion analysis model. Sampling refers to selecting, according to a preset rule, a part of the collected image information and inputting that part into the emotion analysis model. For example, one picture may be taken from every 8 captured pictures, for a total of 8 pictures.
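A minimal sketch of this sampling rule (one frame kept out of every 8 captured frames, 8 frames in total) is shown below; the helper name and the list-based frame representation are assumptions.

```python
from typing import List, Sequence

def sample_frames(frames: Sequence, stride: int = 8, count: int = 8) -> List:
    """Keep every `stride`-th captured frame until `count` frames are collected."""
    return list(frames[::stride][:count])
```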
As shown in connection with fig. 2, the system may further include: video information acquisition equipment. The apparatus comprises: camera module and video processing module.
The camera module is used for collecting images of the user in real time, such as the user's emotions and actions while watching the video. The video processing module has two functions. The first is to align the user image frames with the live video stream frames; in this way, the live stream frames corresponding to the user's expression and action frames can be determined, so that changes in the user's expressions and actions with respect to that part of the video can be confirmed. The second is to perform video preprocessing on the user images. In the embodiment of the present invention, the collected user images are sampled in an 8 × 8 manner (one picture is sampled from every 8 pictures, and 8 pictures are sampled in total) to obtain video segments. The video segments are then input into a trained emotion analysis model.
In practical application, with the trained emotion analysis model, a video clip can be taken from the live stream in a sliding-window manner using the same sampling strategy as the preprocessing, and the confidence of the clip belonging to each emotion category is output. Specifically, a trained VGG19 model is used to acquire the user's emotion, including: happiness, anger, fear, sadness, disgust and surprise, and the live stream frame corresponding to the emotion is acquired at the same time.
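The following is a hedged sketch of this inference loop: camera frames of the user are taken in sliding windows, sampled 8 × 8 as described above, scored by an emotion classifier (for example a VGG19-based model), and aligned to the live-stream frame by timestamp. The classifier interface, the window length and the alignment rule are assumptions, not the patented implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# The six emotion categories named in the text.
EMOTIONS = ["happiness", "anger", "fear", "sadness", "disgust", "surprise"]

def classify_windows(
    user_frames: Sequence[Tuple[float, object]],        # (timestamp, camera image)
    stream_timestamps: Sequence[float],                  # timestamps of live-stream frames
    emotion_model: Callable[[List[object]], Dict[str, float]],
    window_size: int = 64,                               # 8 x 8 sampling covers 64 frames
) -> List[Tuple[int, str, float]]:
    """Return (aligned live-stream frame index, emotion, confidence) for each window."""
    results: List[Tuple[int, str, float]] = []
    for start in range(0, len(user_frames) - window_size + 1, window_size):
        window = list(user_frames[start:start + window_size])
        clip = [img for _, img in window[::8][:8]]       # 8 x 8 sampling, as described above
        scores = emotion_model(clip)                     # confidence per emotion label
        emotion = max(scores, key=scores.get)
        t = window[0][0]                                 # timestamp at the window start
        aligned = min(range(len(stream_timestamps)),
                      key=lambda i: abs(stream_timestamps[i] - t))
        results.append((aligned, emotion, scores[emotion]))
    return results
```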
Wherein, in the case that the personalized feature information includes the second emotion information, acquiring the second emotion information may include:
and collecting the text information input by the user. Wherein the text information comprises comments input by the user, a barrage and the like. And then, preprocessing the text information to obtain a text preprocessing result. Then, the text preprocessing result is input into a second emotion analysis model, and the output of the second emotion analysis model is used as the second emotion information.
Wherein the preprocessing comprises: performing word segmentation, feature extraction, text classification and the like on the text.
As shown in connection with fig. 2, the system may further include: text information collection equipment. The apparatus comprises: the device comprises a text acquisition module and a text processing module. The text collection module can obtain the text of the user in the bullet screen or comment in real time. The text processing module has two functions: firstly, aligning a user text with a video live streaming frame, so that the video live streaming frame corresponding to the text input by a user can be confirmed; secondly, emotion recognition is carried out on the text.
When performing emotion recognition on the text, a word segmentation tool is first used to segment the text into words, features are then extracted from the text, and finally the text is classified. The text classification may adopt the naive Bayes method, with the following formula:
$$c_{NB} = \underset{c_j \in C}{\arg\max}\; P(c_j)\prod_{i} P(w_i \mid c_j)$$

wherein c_NB is the emotion class for which the right-hand part of the formula takes its maximum value, P(c_j) is the probability that emotion c_j occurs, and P(w_i | c_j) is the probability of each word of the text message appearing under that emotion.

The word probability is estimated with add-one smoothing:

$$P(w \mid c) = \frac{\mathrm{Count}(w,c) + 1}{\mathrm{Count}(c) + \lvert V \rvert}$$

wherein Count(c) is the total number of words counted under emotion c, Count(w, c) is the number of times a certain word appears under that emotion, P(w | c) is the probability of a word occurring under that emotion, and V is the vocabulary of the current text.
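The following is a small, hedged sketch of such a naive Bayes emotion classifier with add-one smoothing; whitespace splitting stands in for the word-segmentation tool, and the class and method names are illustrative, not part of the disclosure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

class NaiveBayesEmotion:
    """Naive Bayes emotion classifier with add-one smoothing, as in the formula above."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()                        # samples per emotion
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)   # Count(w, c)
        self.vocab: set = set()                                       # V

    def train(self, samples: Iterable[Tuple[str, str]]) -> None:
        """samples: (already word-segmented text, emotion label) pairs."""
        for text, emotion in samples:
            words = text.split()
            self.class_counts[emotion] += 1
            self.word_counts[emotion].update(words)
            self.vocab.update(words)

    def classify(self, text: str) -> str:
        """Return the emotion c_NB that maximises P(c) times the product of P(w_i | c)."""
        words, total = text.split(), sum(self.class_counts.values())

        def log_score(c: str) -> float:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)             # log P(c)
            return score + sum(
                math.log((self.word_counts[c][w] + 1) / denom)         # log P(w | c)
                for w in words)

        return max(self.class_counts, key=log_score)
```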
In the case that the personalized feature information includes both the first emotion information and the second emotion information, the two manners are combined in this step. Specifically, image information of the user while watching the video to be processed is collected, the image information is input into the first emotion analysis model, and the output of the first emotion analysis model is taken as the first emotion information. Text information input by the user is collected and preprocessed to obtain a text preprocessing result. Then, the text preprocessing result is input into the second emotion analysis model, and the output of the second emotion analysis model is taken as the second emotion information. There is no strict order between acquiring the first emotion information and acquiring the second emotion information.
Step 103, acquiring a second video clip from the video to be processed based on the personalized feature information.
The manner in which the second video clip is obtained is different for different personalized feature information.
In this step, when the personalized feature information includes the first emotion information, a first video frame is marked in the video to be processed, and the first video frame is used to form the second video segment. The first emotion information is emotion information reflected when the user watches the first video frame.
In this step, when the second emotion information is acquired, a second video frame is marked in the video to be processed, and the second video segment is formed by using the second video frame. Wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
In this way, the obtained second video segment can be made to accurately correspond to the emotional change exhibited by the user.
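As an illustration of this marking-and-merging step, the sketch below assumes the inputs are aligned live-stream frame indices with confidences (for example from the emotion analysis above); the confidence threshold and the merging gap are assumptions.

```python
from typing import Iterable, List, Tuple

def second_video_clip(
    emotion_hits: Iterable[Tuple[int, float]],    # (aligned live-stream frame index, confidence)
    min_confidence: float = 0.6,                  # assumed threshold for marking a frame
    max_gap: int = 50,                            # assumed gap (in frames) within one segment
) -> List[Tuple[int, int]]:
    """Merge marked frames that lie close together into (start, end) segments."""
    marked = sorted(i for i, conf in emotion_hits if conf >= min_confidence)
    segments: List[Tuple[int, int]] = []
    for f in marked:
        if segments and f - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], f)
        else:
            segments.append((f, f))
    return segments
```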
In the case where the personalized feature information includes the first emotion information and the second emotion information, in this step, the second video clip includes a third video clip and a fourth video clip. Specifically, when the first emotion information is acquired, a third video frame is marked in the video to be processed, and the third video frame is used for forming the third video segment, and when the second emotion information is acquired, a fourth video frame is marked in the video to be processed, and the fourth video frame is used for forming the fourth video segment.
The first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
There is no strict order between acquiring the third video clip and acquiring the fourth video clip.
Step 104, obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
In this step, when the personalized feature information includes only the first emotion information, or only the second emotion information, a first target video segment is selected from the first video segment and a second target video segment is selected from the second video segment. Then, the target video segment is obtained by using the first target video segment and the second target video segment, wherein the first target video segment and the second target video segment have the same attribute information.
Here, the attribute information may be that the contents are the same, the start and end times of the video clips in the video to be processed are the same, and the like. Then the target video segment is the result of the intersection of the first target video segment and the second target video segment.
Wherein, in the case that the personalized feature information includes the first emotion information and the second emotion information, in this step, a set of video clips including an emotion feature is formed using the third video clip and the fourth video clip. And then, selecting a first target video segment from the first video segments, and selecting a second target video segment from the video segment set containing the emotional features. Then, obtaining the target video clip by utilizing the first target video clip and the second target video clip; wherein the first target video segment and the second target video segment have the same attribute information.
The video segment set containing the emotional features is the result of taking the union of the third video segment and the fourth video segment, with duplicates removed. Here, the attribute information may be that the contents are the same, that the start and end times of the video clips in the video to be processed are the same, and the like. The target video segment is then the result of the intersection of the first target video segment and the second target video segment.
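These set operations can be illustrated with the following hedged sketch, where segments are represented as (start, end) frame ranges and overlapping frame ranges stand in for the shared attribute information; this representation is an assumption for illustration, not the patented implementation.

```python
from typing import List, Tuple

Segment = Tuple[int, int]   # (start frame, end frame) in the video to be processed

def union(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Union of two segment lists, with overlapping segments merged (duplicates removed)."""
    merged: List[Segment] = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersection(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Parts of the timeline covered by both segment lists."""
    out: List[Segment] = []
    for s1, e1 in a:
        for s2, e2 in b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)

def target_video_clip(first: List[Segment],
                      third: List[Segment],
                      fourth: List[Segment]) -> List[Segment]:
    """Target clip: the first video clip intersected with the union of the third and fourth clips."""
    return intersection(first, union(third, fourth))
```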
Through the method, the obtained target video clip not only meets the requirements of a common video clip, but also takes the personalized features of the user into account, so that the obtained target video clip better meets the requirements of the user.
In the embodiment of the invention, the acquired first video segment and the second video segment acquired based on the personalized feature information of the user are combined to acquire the target video segment to be edited. Therefore, by using the scheme of the embodiment of the invention, the obtained target video clip can better meet the personalized requirements of the user.
After the target video segment is obtained, the target video segment can be injected into a background video content storage module to generate a corresponding ID so as to facilitate subsequent searching or use and the like.
On the basis of the above embodiment, in order to subsequently provide video clips that better meet the user's requirements, identification information of the user may also be acquired, and the target video clip is then associated with the identification. The identification information may be, for example, a user name, an ID, or the like. After the target video clip is obtained, a video playing address can be configured for the target video clip, and the target video clip can be pushed to a client program for the user to click and watch.
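A small sketch of how such an association could be represented is given below; the data model, the ID generation and the address scheme are assumptions and are not part of the disclosure.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StoredClip:
    user_id: str                                   # identification information of the user
    segments: List[Tuple[int, int]]                # the target video segments
    clip_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def play_url(self) -> str:
        # Placeholder address scheme; a real service would issue its own playing address.
        return f"https://example.invalid/clips/{self.clip_id}"
```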
The embodiment of the invention also provides a video processing device. Referring to fig. 3, fig. 3 is a structural diagram of a video processing apparatus according to an embodiment of the present invention. Since the principle of the video processing apparatus for solving the problem is similar to the video processing method in the embodiment of the present invention, the implementation of the video processing apparatus can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 3, the video processing apparatus includes: a first obtaining module 301, configured to obtain a first video segment from a video to be processed; a second obtaining module 302, configured to obtain personalized feature information of a user, where the personalized feature information includes at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user; a third obtaining module 303, configured to obtain a second video segment from the video to be processed based on the personalized feature information; and the processing module 304 is configured to obtain a target video segment to be clipped by using the first video segment and the second video segment.
Optionally, in a case that the personalized feature information includes the first emotion information, as shown in fig. 4, the second obtaining module 302 may include:
the first acquisition submodule 3021 is configured to acquire image information of the user when watching the video to be processed; the first processing sub-module 3022 is configured to input the image information into a first emotion analysis model, and output the first emotion analysis model as the first emotion information.
Optionally, the second obtaining module 302 may further include: the sampling submodule is used for sampling the image information to obtain sampled image information; the first processing sub-module is specifically configured to input the sampled image information to the first emotion analysis model.
Optionally, as shown in fig. 5, the third obtaining module 303 may include: a first marking submodule 3031, configured to mark a first video frame in the video to be processed when the first emotion information is acquired; a first obtaining submodule 3032, configured to form the second video segment by using the first video frame; the first emotion information is emotion information reflected when the user watches the first video frame.
Optionally, in a case that the personalized feature information includes the second emotion information, as shown in fig. 6, the second obtaining module 302 may include:
a second collecting submodule 3023, configured to collect text information input by the user; the preprocessing submodule 3024 is configured to preprocess the text information to obtain a text preprocessing result; a second processing sub-module 3025, configured to input the text preprocessing result into a second emotion analysis model, and output the second emotion analysis model as the second emotion information.
Optionally, as shown in fig. 7, the third obtaining module 303 may include: a second labeling submodule 3033, configured to label a second video frame in the video to be processed when the second emotion information is obtained; a second obtaining submodule 3034, configured to form the second video segment by using the second video frame; wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Optionally, as shown in fig. 8, the processing module 304 may include: a first selecting submodule 3041 for selecting a first target video segment from the first video segments; a second selecting submodule 3042, configured to select a second target video segment from the second video segments; a first processing submodule 3043, configured to obtain the target video segment by using the first target video segment and the second target video segment; wherein the first target video segment and the second target video segment have the same attribute information.
Optionally, as shown in fig. 9, the second obtaining module 302 may include:
a third collecting submodule 3026, configured to collect image information of the user when watching the video to be processed;
a third processing sub-module 3027, configured to input the image information into a first emotion analysis model, and to take the output of the first emotion analysis model as the first emotion information;
a fourth collecting submodule 3028, configured to collect text information input by a user;
the fourth processing submodule 3029 is configured to preprocess the text information to obtain a text preprocessing result;
a fifth processing sub-module 3020, configured to input the text preprocessing result into a second emotion analysis model, and to take the output of the second emotion analysis model as the second emotion information.
Optionally, the second video segment includes a third video segment and a fourth video segment. As shown in fig. 10, the third obtaining module 303 may include:
the first obtaining submodule 3035 is configured to mark a third video frame in the video to be processed when the first emotion information is obtained, and form the third video segment by using the third video frame; a second obtaining submodule 3036, configured to mark a fourth video frame in the video to be processed when the second emotion information is obtained, and form the fourth video segment by using the fourth video frame; the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Optionally, as shown in fig. 11, the processing module 304 may include:
a first processing submodule 3044 for forming a set of video segments containing emotional features using the third video segment and the fourth video segment; a third selecting submodule 3045 for selecting a first target video segment from the first video segments; a fourth selecting submodule 3046 for selecting a second target video segment from the set of video segments containing emotional characteristics; a second processing sub-module 3047, configured to obtain the target video segment by using the first target video segment and the second target video segment; wherein the first target video segment and the second target video segment have the same attribute information.
Optionally, as shown in fig. 12, the apparatus may further include:
an obtaining module 305, configured to obtain identification information of the user; an associating module 306, configured to associate the target video segment with the identifier.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
As shown in fig. 13, the video processing apparatus according to the embodiment of the present invention includes: a processor 1300, for reading the program in the memory 1320, for executing the following processes:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
A transceiver 1310 for receiving and transmitting data under the control of the processor 1300.
In fig. 13, among other things, the bus architecture may include any number of interconnected buses and bridges with various circuits being linked together, particularly one or more processors represented by processor 1300 and memory represented by memory 1320. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1310 may be a plurality of elements, including a transmitter and a receiver, that provide a means for communicating with various other apparatus over a transmission medium. The processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1320 may store data used by the processor 1300 in performing operations.
The processor 1300 is further configured to read the computer program and execute the following steps:
under the condition that the personalized feature information comprises the first emotion information, collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
The processor 1300 is further configured to read the computer program and execute the following steps:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
collecting text information input by the user under the condition that the personalized feature information comprises the second emotion information;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
The processor 1300 is further configured to read the computer program and execute the following steps:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
The second video segment comprises a third video segment and a fourth video segment; the processor 1300 is further configured to read the computer program and execute the following steps:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
The device provided by the embodiment of the present invention may implement the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Furthermore, a computer-readable storage medium of an embodiment of the present invention stores a computer program executable by a processor to implement:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
Wherein, in a case that the personalized feature information includes the first emotion information, acquiring the first emotion information includes:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
Wherein, in a case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
Wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the transceiving method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A video processing method, comprising:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
2. The method of claim 1, wherein, in the case that the personalized feature information includes the first emotion information, obtaining the first emotion information comprises:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
3. The method according to claim 2, wherein the obtaining a second video segment from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
4. The method according to claim 1 or 2, wherein, in the case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
5. The method according to claim 4, wherein the obtaining a second video segment from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
6. The method according to claim 1, wherein the obtaining a target video segment to be edited by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
7. The method of claim 4, wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
8. The method according to claim 7, wherein the obtaining a target video segment to be edited by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
9. A video processing apparatus comprising: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor being configured to read the program in the memory to implement the steps in the video processing method according to any one of claims 1 to 8.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the video processing method according to any one of claims 1 to 8.
CN201910619956.2A 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium Active CN110381367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910619956.2A CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910619956.2A CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110381367A true CN110381367A (en) 2019-10-25
CN110381367B CN110381367B (en) 2022-01-25

Family

ID=68250904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910619956.2A Active CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110381367B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048347A1 (en) * 2020-09-02 2022-03-10 华为技术有限公司 Video editing method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20130283301A1 (en) * 2012-04-18 2013-10-24 Scorpcast, Llc System and methods for providing user generated video reviews
CN103856833A (en) * 2012-12-05 2014-06-11 三星电子株式会社 Video processing apparatus and method
CN104123396A (en) * 2014-08-15 2014-10-29 三星电子(中国)研发中心 Soccer video abstract generation method and device based on cloud television
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
US20180014037A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for switching to dynamically assembled video during streaming of live video
CN108391164A (en) * 2018-02-24 2018-08-10 广东欧珀移动通信有限公司 Video analytic method and Related product
CN108595477A (en) * 2018-03-12 2018-09-28 北京奇艺世纪科技有限公司 A kind for the treatment of method and apparatus of video data
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium
US20190005133A1 (en) * 2015-12-21 2019-01-03 Thomson Licensing Method, apparatus and arrangement for summarizing and browsing video content
CN109657100A (en) * 2019-01-25 2019-04-19 深圳市商汤科技有限公司 Video Roundup generation method and device, electronic equipment and storage medium
CN109688463A (en) * 2018-12-27 2019-04-26 北京字节跳动网络技术有限公司 A kind of editing video generation method, device, terminal device and storage medium
CN109842805A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of video watching focus
US20190188479A1 (en) * 2017-12-14 2019-06-20 Google Llc Generating synthesis videos

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20130283301A1 (en) * 2012-04-18 2013-10-24 Scorpcast, Llc System and methods for providing user generated video reviews
CN103856833A (en) * 2012-12-05 2014-06-11 三星电子株式会社 Video processing apparatus and method
CN104123396A (en) * 2014-08-15 2014-10-29 三星电子(中国)研发中心 Soccer video abstract generation method and device based on cloud television
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
US20190005133A1 (en) * 2015-12-21 2019-01-03 Thomson Licensing Method, apparatus and arrangement for summarizing and browsing video content
US20180014037A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for switching to dynamically assembled video during streaming of live video
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
US20190188479A1 (en) * 2017-12-14 2019-06-20 Google Llc Generating synthesis videos
CN108391164A (en) * 2018-02-24 2018-08-10 广东欧珀移动通信有限公司 Video analytic method and Related product
CN108595477A (en) * 2018-03-12 2018-09-28 北京奇艺世纪科技有限公司 A kind for the treatment of method and apparatus of video data
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium
CN109688463A (en) * 2018-12-27 2019-04-26 北京字节跳动网络技术有限公司 A kind of editing video generation method, device, terminal device and storage medium
CN109842805A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of video watching focus
CN109657100A (en) * 2019-01-25 2019-04-19 深圳市商汤科技有限公司 Video Roundup generation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIR H: "Interactive Exploration of Surveillance Video through Action Shot Summarization and Trajectory Visualization", 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》 *
栾悉道: "一种基于层次分析法的视频摘要评价模型", 《计算机应用研究》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048347A1 (en) * 2020-09-02 2022-03-10 华为技术有限公司 Video editing method and device
CN114205534A (en) * 2020-09-02 2022-03-18 华为技术有限公司 Video editing method and device

Also Published As

Publication number Publication date
CN110381367B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
CN110837579B (en) Video classification method, apparatus, computer and readable storage medium
CN110020437B (en) Emotion analysis and visualization method combining video and barrage
CN109117777B (en) Method and device for generating information
CN111967302B (en) Video tag generation method and device and electronic equipment
US8280158B2 (en) Systems and methods for indexing presentation videos
CN109034069B (en) Method and apparatus for generating information
CN109862397B (en) Video analysis method, device, equipment and storage medium
CN110781347A (en) Video processing method, device, equipment and readable storage medium
CN113542777B (en) Live video editing method and device and computer equipment
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
CN104063683A (en) Expression input method and device based on face identification
CN111160134A (en) Human-subject video scene analysis method and device
CN115994230A (en) Intelligent archive construction method integrating artificial intelligence and knowledge graph technology
WO2022062027A1 (en) Wine product positioning method and apparatus, wine product information management method and apparatus, and device and storage medium
CN112328833A (en) Label processing method and device and computer readable storage medium
CN111491209A (en) Video cover determining method and device, electronic equipment and storage medium
CN110381367B (en) Video processing method, video processing equipment and computer readable storage medium
CN114051154A (en) News video strip splitting method and system
CN113949828A (en) Video editing method and device, electronic equipment and storage medium
CN107656760A (en) Data processing method and device, electronic equipment
CN111949820A (en) Video associated interest point processing method and device and electronic equipment
CN116129319A (en) Weak supervision time sequence boundary positioning method and device, electronic equipment and storage medium
CN115665508A (en) Video abstract generation method and device, electronic equipment and storage medium
CN115035453A (en) Video title and tail identification method, device and equipment and readable storage medium
CN113965798A (en) Video information generating and displaying method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant