CN110381367A - Video processing method, video processing equipment and computer readable storage medium


Info

Publication number
CN110381367A
Authority
CN
China
Prior art keywords
video
information
emotion
segment
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910619956.2A
Other languages
Chinese (zh)
Other versions
CN110381367B (en)
Inventor
张进
莫东松
钟宜峰
马丹
张健
赵璐
马晓琳
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN201910619956.2A priority Critical patent/CN110381367B/en
Publication of CN110381367A publication Critical patent/CN110381367A/en
Application granted granted Critical
Publication of CN110381367B publication Critical patent/CN110381367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508 Management of client data or end-user data
    • H04N21/4532 Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Child & Adolescent Psychology (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video processing method, video processing equipment and a computer readable storage medium, relates to the technical field of video processing, and aims to solve the problem that existing video editing methods cannot obtain video clips that meet the personalized requirements of users. The method comprises the following steps: obtaining a first video clip from a video to be processed; acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user; acquiring a second video clip from the video to be processed based on the personalized feature information; and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment. The embodiment of the invention enables the obtained target video clip to better meet the personalized requirements of the user.

Description

Video processing method, video processing equipment and computer readable storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video processing method, a video processing device, and a computer-readable storage medium.
Background
Generally, video editing is performed manually by professional editors using video editing tools. However, manual editing is inefficient and cannot meet the demand for rapidly bringing Internet live-broadcast content into service.
With the rise of artificial intelligence, and in particular the development of deep learning in computer vision, techniques for video editing using deep learning have emerged. Compared with manual editing, AI-based editing can greatly improve the editing speed for specific scenes.
However, in the video editing method based on artificial intelligence, the definition of the highlight video is preset by the operator. Therefore, the clipped video segment cannot meet the personalized requirements of the user.
Disclosure of Invention
Embodiments of the present invention provide a video processing method, a video processing device, and a computer-readable storage medium, so as to solve a problem that a video clip meeting personalized requirements of a user cannot be obtained by an existing video clipping method.
In a first aspect, an embodiment of the present invention provides a video processing method, including:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
Wherein, in a case that the personalized feature information includes the first emotion information, acquiring the first emotion information includes:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
Wherein, in a case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
Wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
In a second aspect, an embodiment of the present invention further provides a video processing apparatus, including: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement the steps in the video processing method.
In a third aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps in the video processing method described above.
In the embodiment of the invention, the acquired first video segment and the second video segment acquired based on the personalized feature information of the user are combined to acquire the target video segment to be edited. Therefore, by using the scheme of the embodiment of the invention, the obtained target video clip can better meet the personalized requirements of the user.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flow chart of a video processing method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a video processing system provided by an embodiment of the invention;
FIG. 3 is a block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 4 is one of the structural diagrams of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 5 is one of the structural diagrams of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 6 is a second block diagram of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 7 is a second block diagram of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 8 is one of the structural diagrams of a processing module in the video processing apparatus according to the embodiment of the present invention;
fig. 9 is a third block diagram of a second obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 10 is a third block diagram of a third obtaining module in the video processing apparatus according to the embodiment of the present invention;
fig. 11 is a second block diagram of a processing module in the video processing apparatus according to the second embodiment of the present invention;
fig. 12 is a second block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 13 is a structural diagram of a video processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
step 101, obtaining a first video clip from a video to be processed.
In the embodiment of the present invention, the first video segment herein may be obtained in any manner.
For example, when the user chooses to watch a live video broadcast, the system obtains the ID number of the video watched by the user and uses an AI (Artificial Intelligence) video clipping device to perform highlight clipping on the video content according to predefined rules.
Fig. 2 is a schematic diagram of a video processing system according to an embodiment of the invention. In fig. 2, the system comprises: an AI video clipping device, a video information acquisition device and a text information acquisition device. The AI video clipping device is used for clipping the input live stream in an AI-based clipping mode. The video information acquisition device is used for collecting image information of the user and analyzing the user's emotion information. The text information acquisition device is used for collecting text information input by the user and analyzing the user's emotion information. Each of the three devices obtains its own clipped video segments. The video processing module then processes the video clips obtained by the three devices to form the video clip transmitted to the client.
In fig. 2, the AI video clipping device includes: a 3D module, a face recognition module and an OCR (Optical Character Recognition) module. The 3D module is used for processing and recognizing actions in the video, the face recognition module is used for recognizing persons in the video, and the OCR module is used for recognizing text in the video. With the AI video clipping device described above, a first video clip may be generated, for example from goals, fouls, shots, score information and the like in a football match.
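As an illustration only, the following is a minimal sketch (not the patented implementation) of how such predefined rules might turn hypothetical detector outputs from the 3D, face recognition and OCR modules into a first video clip; the labels, the 125-frame window and the data model are assumptions.

```python
# Minimal sketch: hypothetical detections are filtered by a predefined rule and
# the marked frames are merged into first-video-clip segments.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    frame_index: int
    label: str                     # e.g. "goal", "foul", "shot", "score_change"

HIGHLIGHT_LABELS = {"goal", "foul", "shot", "score_change"}   # assumed predefined rule

def first_video_clip(detections: List[Detection], window: int = 125) -> List[range]:
    """Mark frames around rule-matching detections and merge overlaps into segments."""
    marked = sorted(d.frame_index for d in detections if d.label in HIGHLIGHT_LABELS)
    segments: List[range] = []
    for f in marked:
        start, end = max(0, f - window), f + window
        if segments and start <= segments[-1].stop:            # overlapping -> merge
            segments[-1] = range(segments[-1].start, end)
        else:
            segments.append(range(start, end))
    return segments
```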
Step 102, acquiring personalized feature information of the user.
In the embodiment of the invention, the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user.
Wherein, in the case that the personalized feature information includes the first emotion information, acquiring the first emotion information may include:
and acquiring image information of the user when watching the video to be processed. And then, inputting the image information into a first emotion analysis model, and taking the output of the first emotion analysis model as the first emotion information. The first emotion analysis model may be any emotion analysis model, such as a VGG19 preprocessing model. In this way, the obtained emotional information can be made more accurate.
In the embodiment of the present invention, the obtained emotion information includes, but is not limited to: happiness, anger, fear, sadness, disgust and surprise.
On this basis, in order to improve the processing speed, in the embodiment of the present invention, after the image information is collected, the image information may be sampled to obtain sampled image information. Then, the sampled image information is input into the first emotion analysis model. Sampling refers to selecting, according to a preset rule, a part of the collected image information and inputting that part into the emotion analysis model. For example, one picture may be taken from every 8 captured pictures, for a total of 8 pictures.
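A minimal sketch of this sampling rule (one frame kept out of every 8 captured frames, 8 frames in total) is shown below; the helper name and the list-based frame representation are assumptions.

```python
from typing import List, Sequence

def sample_frames(frames: Sequence, stride: int = 8, count: int = 8) -> List:
    """Keep every `stride`-th captured frame until `count` frames are collected."""
    return list(frames[::stride][:count])
```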
As shown in connection with fig. 2, the system may further include: video information acquisition equipment. The apparatus comprises: camera module and video processing module.
The camera module is used for collecting images of the user in real time, such as the user's emotions and actions while watching the video. The video processing module has two functions. The first is to align the user image frames with the live video stream frames; in this way, the live stream frames corresponding to the user's expression and action frames can be determined, so that changes in the user's expressions and actions with respect to that part of the video can be confirmed. The second is to perform video preprocessing on the user images. In the embodiment of the present invention, the collected user images are sampled in an 8 × 8 manner (one picture is sampled from every 8 pictures, and 8 pictures are sampled in total) to obtain video segments. The video segments are then input into a trained emotion analysis model.
In practical application, with the trained emotion analysis model, a video clip can be taken from the live stream in a sliding-window manner using the same sampling strategy as the preprocessing, and the confidence of the clip belonging to each emotion category is output. Specifically, a trained VGG19 model is used to acquire the user's emotion, including: happiness, anger, fear, sadness, disgust and surprise, and the live stream frame corresponding to the emotion is acquired at the same time.
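The following is a hedged sketch of this inference loop: camera frames of the user are taken in sliding windows, sampled 8 × 8 as described above, scored by an emotion classifier (for example a VGG19-based model), and aligned to the live-stream frame by timestamp. The classifier interface, the window length and the alignment rule are assumptions, not the patented implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# The six emotion categories named in the text.
EMOTIONS = ["happiness", "anger", "fear", "sadness", "disgust", "surprise"]

def classify_windows(
    user_frames: Sequence[Tuple[float, object]],        # (timestamp, camera image)
    stream_timestamps: Sequence[float],                  # timestamps of live-stream frames
    emotion_model: Callable[[List[object]], Dict[str, float]],
    window_size: int = 64,                               # 8 x 8 sampling covers 64 frames
) -> List[Tuple[int, str, float]]:
    """Return (aligned live-stream frame index, emotion, confidence) for each window."""
    results: List[Tuple[int, str, float]] = []
    for start in range(0, len(user_frames) - window_size + 1, window_size):
        window = list(user_frames[start:start + window_size])
        clip = [img for _, img in window[::8][:8]]       # 8 x 8 sampling, as described above
        scores = emotion_model(clip)                     # confidence per emotion label
        emotion = max(scores, key=scores.get)
        t = window[0][0]                                 # timestamp at the window start
        aligned = min(range(len(stream_timestamps)),
                      key=lambda i: abs(stream_timestamps[i] - t))
        results.append((aligned, emotion, scores[emotion]))
    return results
```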
Wherein, in the case that the personalized feature information includes the second emotion information, acquiring the second emotion information may include:
and collecting the text information input by the user. Wherein the text information comprises comments input by the user, a barrage and the like. And then, preprocessing the text information to obtain a text preprocessing result. Then, the text preprocessing result is input into a second emotion analysis model, and the output of the second emotion analysis model is used as the second emotion information.
Wherein the preprocessing comprises: performing word segmentation, feature extraction, text classification and the like on the text.
As shown in connection with fig. 2, the system may further include: text information collection equipment. The apparatus comprises: the device comprises a text acquisition module and a text processing module. The text collection module can obtain the text of the user in the bullet screen or comment in real time. The text processing module has two functions: firstly, aligning a user text with a video live streaming frame, so that the video live streaming frame corresponding to the text input by a user can be confirmed; secondly, emotion recognition is carried out on the text.
When performing emotion recognition on the text, a word segmentation tool is first used to segment the text into words, features are then extracted from the text, and finally the text is classified. The text classification may adopt the naive Bayes method, with the following formula:
$$c_{NB} = \underset{c_j \in C}{\arg\max}\; P(c_j)\prod_{i} P(w_i \mid c_j)$$

wherein c_NB is the emotion class for which the right-hand part of the formula takes its maximum value, P(c_j) is the probability that emotion c_j occurs, and P(w_i | c_j) is the probability of each word of the text message appearing under that emotion.

The word probability is estimated with add-one smoothing:

$$P(w \mid c) = \frac{\mathrm{Count}(w,c) + 1}{\mathrm{Count}(c) + \lvert V \rvert}$$

wherein Count(c) is the total number of words counted under emotion c, Count(w, c) is the number of times a certain word appears under that emotion, P(w | c) is the probability of a word occurring under that emotion, and V is the vocabulary of the current text.
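The following is a small, hedged sketch of such a naive Bayes emotion classifier with add-one smoothing; whitespace splitting stands in for the word-segmentation tool, and the class and method names are illustrative, not part of the disclosure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

class NaiveBayesEmotion:
    """Naive Bayes emotion classifier with add-one smoothing, as in the formula above."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()                        # samples per emotion
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)   # Count(w, c)
        self.vocab: set = set()                                       # V

    def train(self, samples: Iterable[Tuple[str, str]]) -> None:
        """samples: (already word-segmented text, emotion label) pairs."""
        for text, emotion in samples:
            words = text.split()
            self.class_counts[emotion] += 1
            self.word_counts[emotion].update(words)
            self.vocab.update(words)

    def classify(self, text: str) -> str:
        """Return the emotion c_NB that maximises P(c) times the product of P(w_i | c)."""
        words, total = text.split(), sum(self.class_counts.values())

        def log_score(c: str) -> float:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)             # log P(c)
            return score + sum(
                math.log((self.word_counts[c][w] + 1) / denom)         # log P(w | c)
                for w in words)

        return max(self.class_counts, key=log_score)
```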
In the case that the personalized feature information includes both the first emotion information and the second emotion information, the two manners are combined in this step. Specifically, image information of the user while watching the video to be processed is collected, the image information is input into the first emotion analysis model, and the output of the first emotion analysis model is taken as the first emotion information. Text information input by the user is collected and preprocessed to obtain a text preprocessing result. Then, the text preprocessing result is input into the second emotion analysis model, and the output of the second emotion analysis model is taken as the second emotion information. There is no strict order between acquiring the first emotion information and acquiring the second emotion information.
Step 103, acquiring a second video clip from the video to be processed based on the personalized feature information.
The manner in which the second video clip is obtained is different for different personalized feature information.
In this step, when the personalized feature information includes the first emotion information, a first video frame is marked in the video to be processed, and the first video frame is used to form the second video segment. The first emotion information is emotion information reflected when the user watches the first video frame.
In this step, when the second emotion information is acquired, a second video frame is marked in the video to be processed, and the second video segment is formed by using the second video frame. Wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
In this way, the obtained second video segment can be made to accurately correspond to the emotional change exhibited by the user.
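As an illustration of this marking-and-merging step, the sketch below assumes the inputs are aligned live-stream frame indices with confidences (for example from the emotion analysis above); the confidence threshold and the merging gap are assumptions.

```python
from typing import Iterable, List, Tuple

def second_video_clip(
    emotion_hits: Iterable[Tuple[int, float]],    # (aligned live-stream frame index, confidence)
    min_confidence: float = 0.6,                  # assumed threshold for marking a frame
    max_gap: int = 50,                            # assumed gap (in frames) within one segment
) -> List[Tuple[int, int]]:
    """Merge marked frames that lie close together into (start, end) segments."""
    marked = sorted(i for i, conf in emotion_hits if conf >= min_confidence)
    segments: List[Tuple[int, int]] = []
    for f in marked:
        if segments and f - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], f)
        else:
            segments.append((f, f))
    return segments
```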
In the case where the personalized feature information includes the first emotion information and the second emotion information, in this step, the second video clip includes a third video clip and a fourth video clip. Specifically, when the first emotion information is acquired, a third video frame is marked in the video to be processed, and the third video frame is used for forming the third video segment, and when the second emotion information is acquired, a fourth video frame is marked in the video to be processed, and the fourth video frame is used for forming the fourth video segment.
The first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
There is no strict order between acquiring the third video clip and acquiring the fourth video clip.
Step 104, obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
In this step, when the personalized feature information includes only the first emotion information, or only the second emotion information, a first target video segment is selected from the first video segment and a second target video segment is selected from the second video segment. Then, the target video segment is obtained by using the first target video segment and the second target video segment, wherein the first target video segment and the second target video segment have the same attribute information.
Here, the attribute information may be that the contents are the same, the start and end times of the video clips in the video to be processed are the same, and the like. Then the target video segment is the result of the intersection of the first target video segment and the second target video segment.
Wherein, in the case that the personalized feature information includes the first emotion information and the second emotion information, in this step, a set of video clips including an emotion feature is formed using the third video clip and the fourth video clip. And then, selecting a first target video segment from the first video segments, and selecting a second target video segment from the video segment set containing the emotional features. Then, obtaining the target video clip by utilizing the first target video clip and the second target video clip; wherein the first target video segment and the second target video segment have the same attribute information.
The video segment set containing the emotional features is the result of taking the union of the third video segment and the fourth video segment, with duplicates removed. Here, the attribute information may be that the contents are the same, that the start and end times of the video clips in the video to be processed are the same, and the like. The target video segment is then the result of the intersection of the first target video segment and the second target video segment.
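These set operations can be illustrated with the following hedged sketch, where segments are represented as (start, end) frame ranges and overlapping frame ranges stand in for the shared attribute information; this representation is an assumption for illustration, not the patented implementation.

```python
from typing import List, Tuple

Segment = Tuple[int, int]   # (start frame, end frame) in the video to be processed

def union(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Union of two segment lists, with overlapping segments merged (duplicates removed)."""
    merged: List[Segment] = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersection(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Parts of the timeline covered by both segment lists."""
    out: List[Segment] = []
    for s1, e1 in a:
        for s2, e2 in b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)

def target_video_clip(first: List[Segment],
                      third: List[Segment],
                      fourth: List[Segment]) -> List[Segment]:
    """Target clip: the first video clip intersected with the union of the third and fourth clips."""
    return intersection(first, union(third, fourth))
```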
Through the method, the obtained target video clip not only meets the requirements of a common video clip, but also takes the personalized features of the user into account, so that the obtained target video clip better meets the requirements of the user.
In the embodiment of the invention, the acquired first video segment and the second video segment acquired based on the personalized feature information of the user are combined to acquire the target video segment to be edited. Therefore, by using the scheme of the embodiment of the invention, the obtained target video clip can better meet the personalized requirements of the user.
After the target video segment is obtained, the target video segment can be injected into a background video content storage module to generate a corresponding ID so as to facilitate subsequent searching or use and the like.
On the basis of the above embodiment, in order to subsequently provide video clips that better meet the user's requirements, identification information of the user may also be acquired, and the target video clip is then associated with the identification. The identification information may be, for example, a user name, an ID, or the like. After the target video clip is obtained, a video playing address can be configured for the target video clip, and the target video clip can be pushed to a client program for the user to click and watch.
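A small sketch of how such an association could be represented is given below; the data model, the ID generation and the address scheme are assumptions and are not part of the disclosure.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StoredClip:
    user_id: str                                   # identification information of the user
    segments: List[Tuple[int, int]]                # the target video segments
    clip_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def play_url(self) -> str:
        # Placeholder address scheme; a real service would issue its own playing address.
        return f"https://example.invalid/clips/{self.clip_id}"
```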
The embodiment of the invention also provides a video processing device. Referring to fig. 3, fig. 3 is a structural diagram of a video processing apparatus according to an embodiment of the present invention. Since the principle of the video processing apparatus for solving the problem is similar to the video processing method in the embodiment of the present invention, the implementation of the video processing apparatus can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 3, the video processing apparatus includes: a first obtaining module 301, configured to obtain a first video segment from a video to be processed; a second obtaining module 302, configured to obtain personalized feature information of a user, where the personalized feature information includes at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user; a third obtaining module 303, configured to obtain a second video segment from the video to be processed based on the personalized feature information; and the processing module 304 is configured to obtain a target video segment to be clipped by using the first video segment and the second video segment.
Optionally, in a case that the personalized feature information includes the first emotion information, as shown in fig. 4, the second obtaining module 302 may include:
the first acquisition submodule 3021 is configured to acquire image information of the user when watching the video to be processed; the first processing sub-module 3022 is configured to input the image information into a first emotion analysis model, and output the first emotion analysis model as the first emotion information.
Optionally, the second obtaining module 302 may further include: the sampling submodule is used for sampling the image information to obtain sampled image information; the first processing sub-module is specifically configured to input the sampled image information to the first emotion analysis model.
Optionally, as shown in fig. 5, the third obtaining module 303 may include: a first marking submodule 3031, configured to mark a first video frame in the video to be processed when the first emotion information is acquired; a first obtaining submodule 3032, configured to form the second video segment by using the first video frame; the first emotion information is emotion information reflected when the user watches the first video frame.
Optionally, in a case that the personalized feature information includes the second emotion information, as shown in fig. 6, the second obtaining module 302 may include:
a second collecting submodule 3023, configured to collect text information input by the user; the preprocessing submodule 3024 is configured to preprocess the text information to obtain a text preprocessing result; a second processing sub-module 3025, configured to input the text preprocessing result into a second emotion analysis model, and output the second emotion analysis model as the second emotion information.
Optionally, as shown in fig. 7, the third obtaining module 303 may include: a second labeling submodule 3033, configured to label a second video frame in the video to be processed when the second emotion information is obtained; a second obtaining submodule 3034, configured to form the second video segment by using the second video frame; wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Optionally, as shown in fig. 8, the processing module 304 may include: a first selecting submodule 3041 for selecting a first target video segment from the first video segments; a second selecting submodule 3042, configured to select a second target video segment from the second video segments; a first processing submodule 3043, configured to obtain the target video segment by using the first target video segment and the second target video segment; wherein the first target video segment and the second target video segment have the same attribute information.
Optionally, as shown in fig. 9, the second obtaining module 302 may include:
a third collecting submodule 3026, configured to collect image information of the user when watching the video to be processed;
a third processing sub-module 3027, configured to input the image information into a first emotion analysis model, and to take the output of the first emotion analysis model as the first emotion information;
a fourth collecting submodule 3028, configured to collect text information input by a user;
the fourth processing submodule 3029 is configured to preprocess the text information to obtain a text preprocessing result;
a fifth processing sub-module 3020, configured to input the text preprocessing result into a second emotion analysis model, and to take the output of the second emotion analysis model as the second emotion information.
Optionally, the second video segment includes a third video segment and a fourth video segment. As shown in fig. 10, the third obtaining module 303 may include:
the first obtaining submodule 3035 is configured to mark a third video frame in the video to be processed when the first emotion information is obtained, and form the third video segment by using the third video frame; a second obtaining submodule 3036, configured to mark a fourth video frame in the video to be processed when the second emotion information is obtained, and form the fourth video segment by using the fourth video frame; the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Optionally, as shown in fig. 11, the processing module 304 may include:
a first processing submodule 3044 for forming a set of video segments containing emotional features using the third video segment and the fourth video segment; a third selecting submodule 3045 for selecting a first target video segment from the first video segments; a fourth selecting submodule 3046 for selecting a second target video segment from the set of video segments containing emotional characteristics; a second processing sub-module 3047, configured to obtain the target video segment by using the first target video segment and the second target video segment; wherein the first target video segment and the second target video segment have the same attribute information.
Optionally, as shown in fig. 12, the apparatus may further include:
an obtaining module 305, configured to obtain identification information of the user; an associating module 306, configured to associate the target video segment with the identifier.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
As shown in fig. 13, the video processing apparatus according to the embodiment of the present invention includes: a processor 1300, for reading the program in the memory 1320, for executing the following processes:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
A transceiver 1310 for receiving and transmitting data under the control of the processor 1300.
In fig. 13, among other things, the bus architecture may include any number of interconnected buses and bridges with various circuits being linked together, particularly one or more processors represented by processor 1300 and memory represented by memory 1320. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1310 may be a plurality of elements, including a transmitter and a receiver, that provide a means for communicating with various other apparatus over a transmission medium. The processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1320 may store data used by the processor 1300 in performing operations.
The processor 1300 is further configured to read the computer program and execute the following steps:
under the condition that the personalized feature information comprises the first emotion information, collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
The processor 1300 is further configured to read the computer program and execute the following steps:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
collecting text information input by the user under the condition that the personalized feature information comprises the second emotion information;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
The processor 1300 is further configured to read the computer program and execute the following steps:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
The second video segment comprises a third video segment and a fourth video segment; the processor 1300 is further configured to read the computer program and execute the following steps:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
The processor 1300 is further configured to read the computer program and execute the following steps:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
The device provided by the embodiment of the present invention may implement the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Furthermore, a computer-readable storage medium of an embodiment of the present invention stores a computer program executable by a processor to implement:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
Wherein, in a case that the personalized feature information includes the first emotion information, acquiring the first emotion information includes:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
Wherein, in a case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
Wherein, acquiring a second video clip from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
Wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
Wherein, the obtaining of the target video segment to be clipped by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the transceiving method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A video processing method, comprising:
obtaining a first video clip from a video to be processed;
acquiring personalized feature information of a user, wherein the personalized feature information comprises at least one of first emotion information when the user watches the video to be processed and second emotion information represented by text information input by the user;
acquiring a second video clip from the video to be processed based on the personalized feature information;
and obtaining a target video segment to be clipped by utilizing the first video segment and the second video segment.
2. The method of claim 1, wherein, in the case that the personalized feature information includes the first emotion information, obtaining the first emotion information comprises:
collecting image information of the user when watching the video to be processed;
inputting the image information into a first emotion analysis model;
and taking the output of the first emotion analysis model as the first emotion information.
3. The method according to claim 2, wherein the obtaining a second video segment from the video to be processed based on the personalized feature information comprises:
when the first emotion information is acquired, marking a first video frame in the video to be processed;
forming the second video segment using the first video frame;
the first emotion information is emotion information reflected when the user watches the first video frame.
4. The method according to claim 1 or 2, wherein, in the case that the personalized feature information includes the second emotion information, acquiring the second emotion information includes:
collecting text information input by the user;
preprocessing the text information to obtain a text preprocessing result;
inputting the text preprocessing result into a second emotion analysis model;
and taking the output of the second emotion analysis model as the second emotion information.
5. The method according to claim 4, wherein the obtaining a second video segment from the video to be processed based on the personalized feature information comprises:
when the second emotion information is acquired, marking a second video frame in the video to be processed;
forming the second video segment using the second video frame;
wherein the second emotional information is emotional information embodied by text input by the user when the user watches the second video frame.
6. The method according to claim 1, wherein the obtaining a target video segment to be edited by using the first video segment and the second video segment comprises:
selecting a first target video clip from the first video clips;
selecting a second target video clip from the second video clips;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
7. The method of claim 4, wherein the second video segment comprises a third video segment and a fourth video segment;
the obtaining a second video clip from the video to be processed based on the personalized feature information includes:
when the first emotion information is acquired, marking a third video frame in the video to be processed, and forming a third video segment by using the third video frame;
when the second emotion information is acquired, marking a fourth video frame in the video to be processed, and forming a fourth video segment by using the fourth video frame;
the first emotion information is emotion information reflected when the user watches the third video frame; the second emotion information is emotion information embodied by text input by the user when the user watches the fourth video frame.
8. The method according to claim 7, wherein the obtaining a target video segment to be edited by using the first video segment and the second video segment comprises:
forming a set of video segments including emotional features using the third video segment and the fourth video segment;
selecting a first target video clip from the first video clips;
selecting a second target video clip from the video clip set containing the emotional features;
obtaining the target video clip by using the first target video clip and the second target video clip;
wherein the first target video segment and the second target video segment have the same attribute information.
9. A video processing apparatus comprising: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor being configured to read the program in the memory to implement the steps in the video processing method according to any one of claims 1 to 8.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the video processing method according to any one of claims 1 to 8.
CN201910619956.2A 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium Active CN110381367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910619956.2A CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910619956.2A CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110381367A true CN110381367A (en) 2019-10-25
CN110381367B CN110381367B (en) 2022-01-25

Family

ID=68250904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910619956.2A Active CN110381367B (en) 2019-07-10 2019-07-10 Video processing method, video processing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110381367B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048347A1 (en) * 2020-09-02 2022-03-10 华为技术有限公司 Video editing method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20130283301A1 (en) * 2012-04-18 2013-10-24 Scorpcast, Llc System and methods for providing user generated video reviews
CN103856833A (en) * 2012-12-05 2014-06-11 三星电子株式会社 Video processing apparatus and method
CN104123396A (en) * 2014-08-15 2014-10-29 三星电子(中国)研发中心 Soccer video abstract generation method and device based on cloud television
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
US20180014037A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for switching to dynamically assembled video during streaming of live video
CN108391164A (en) * 2018-02-24 2018-08-10 广东欧珀移动通信有限公司 Video analytic method and Related product
CN108595477A (en) * 2018-03-12 2018-09-28 北京奇艺世纪科技有限公司 A kind for the treatment of method and apparatus of video data
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium
US20190005133A1 (en) * 2015-12-21 2019-01-03 Thomson Licensing Method, apparatus and arrangement for summarizing and browsing video content
CN109657100A (en) * 2019-01-25 2019-04-19 深圳市商汤科技有限公司 Video Roundup generation method and device, electronic equipment and storage medium
CN109688463A (en) * 2018-12-27 2019-04-26 北京字节跳动网络技术有限公司 A kind of editing video generation method, device, terminal device and storage medium
CN109842805A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of video watching focus
US20190188479A1 (en) * 2017-12-14 2019-06-20 Google Llc Generating synthesis videos

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20130283301A1 (en) * 2012-04-18 2013-10-24 Scorpcast, Llc System and methods for providing user generated video reviews
CN103856833A (en) * 2012-12-05 2014-06-11 三星电子株式会社 Video processing apparatus and method
CN104123396A (en) * 2014-08-15 2014-10-29 三星电子(中国)研发中心 Soccer video abstract generation method and device based on cloud television
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
US20190005133A1 (en) * 2015-12-21 2019-01-03 Thomson Licensing Method, apparatus and arrangement for summarizing and browsing video content
US20180014037A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for switching to dynamically assembled video during streaming of live video
US20180014052A1 (en) * 2016-07-09 2018-01-11 N. Dilip Venkatraman Method and system for real time, dynamic, adaptive and non-sequential stitching of clips of videos
US20190188479A1 (en) * 2017-12-14 2019-06-20 Google Llc Generating synthesis videos
CN108391164A (en) * 2018-02-24 2018-08-10 广东欧珀移动通信有限公司 Video analytic method and Related product
CN108595477A (en) * 2018-03-12 2018-09-28 北京奇艺世纪科技有限公司 A kind for the treatment of method and apparatus of video data
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium
CN109688463A (en) * 2018-12-27 2019-04-26 北京字节跳动网络技术有限公司 A kind of editing video generation method, device, terminal device and storage medium
CN109842805A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of video watching focus
CN109657100A (en) * 2019-01-25 2019-04-19 深圳市商汤科技有限公司 Video Roundup generation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIR H: "Interactive Exploration of Surveillance Video through Action Shot Summarization and Trajectory Visualization", 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》 *
栾悉道: "一种基于层次分析法的视频摘要评价模型", 《计算机应用研究》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048347A1 (en) * 2020-09-02 2022-03-10 华为技术有限公司 Video editing method and device
CN114205534A (en) * 2020-09-02 2022-03-18 华为技术有限公司 Video editing method and device

Also Published As

Publication number Publication date
CN110381367B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
CN110837579B (en) Video classification method, apparatus, computer and readable storage medium
CN110020437B (en) Emotion analysis and visualization method combining video and barrage
CN109117777B (en) Method and device for generating information
CN111967302B (en) Video tag generation method and device and electronic equipment
US8280158B2 (en) Systems and methods for indexing presentation videos
CN109034069B (en) Method and apparatus for generating information
CN109862397B (en) Video analysis method, device, equipment and storage medium
CN110781347A (en) Video processing method, device, equipment and readable storage medium
CN113542777B (en) Live video editing method and device and computer equipment
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
CN104063683A (en) Expression input method and device based on face identification
CN111160134A (en) Human-subject video scene analysis method and device
CN115994230A (en) Intelligent archive construction method integrating artificial intelligence and knowledge graph technology
WO2022062027A1 (en) Wine product positioning method and apparatus, wine product information management method and apparatus, and device and storage medium
CN112328833A (en) Label processing method and device and computer readable storage medium
CN111491209A (en) Video cover determining method and device, electronic equipment and storage medium
CN110381367B (en) Video processing method, video processing equipment and computer readable storage medium
CN114051154A (en) News video strip splitting method and system
CN113949828A (en) Video editing method and device, electronic equipment and storage medium
CN107656760A (en) Data processing method and device, electronic equipment
CN111949820A (en) Video associated interest point processing method and device and electronic equipment
CN116129319A (en) Weak supervision time sequence boundary positioning method and device, electronic equipment and storage medium
CN115665508A (en) Video abstract generation method and device, electronic equipment and storage medium
CN115035453A (en) Video title and tail identification method, device and equipment and readable storage medium
CN113965798A (en) Video information generating and displaying method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant