WO2022077977A1

WO2022077977A1 - Video conversion method and video conversion apparatus

Info

Publication number: WO2022077977A1
Application number: PCT/CN2021/106338
Authority: WO
Inventors: 宋玉岩; 徐宁
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2020-10-12
Filing date: 2021-07-14
Publication date: 2022-04-21
Also published as: CN112165635A

Abstract

The present application provides a video conversion method, apparatus, and system, and a storage medium. The video conversion method may comprise the following steps: obtaining a first video in a first orientation and cutting information which is used for converting the first video into a second video in a second orientation; on the basis of the cutting information, generating and displaying a user interface used for adjusting the cutting information; receiving, by means of the user interface, user input used for adjusting the cutting information; and generating the second video according to the adjusted cutting information.

Description

Video conversion method and video conversion device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on the Chinese patent application with the application number of 202011086867.5 and the filing date of 2020.10.12, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the technical field of video processing, and in particular, to a video conversion method, device, system, and storage medium.

BACKGROUND

At present, most video, film, and television works are shot in a wide aspect ratio (i.e., landscape), such as 4:3 or 16:9. Video or similar media recorded in a wide aspect ratio is typically designed to be viewed on a desktop display or in landscape orientation. Therefore, when a user watches a landscape video on a mobile terminal, the terminal is generally rotated to the landscape position to obtain a good viewing experience.

However, more and more users, especially mobile phone users, are accustomed to watching videos in a tall aspect ratio (i.e., portrait). Vertically oriented media has become a popular format for viewing and displaying media in many applications.

SUMMARY OF THE INVENTION

The present disclosure provides a video conversion method, device, system and storage medium.

According to a first aspect of the embodiments of the present disclosure, there is provided a video conversion method, which may include: acquiring a first video in a first orientation and cutting information for converting the first video into a second video in a second orientation; generating and displaying, based on the cutting information, a user interface for adjusting the cutting information; receiving, via the user interface, user input for adjusting the cutting information; and generating the second video according to the adjusted cutting information.

According to a second aspect of the embodiments of the present disclosure, there is provided a video conversion apparatus, which may include: an interface module configured to receive a first video in a first orientation; an analysis module configured to obtain cut information for converting the first video into a second video in a second orientation, and to generate, based on the cut information, a user interface for adjusting the cut information; a display module configured to display the user interface, wherein user input for adjusting the cut information is received via the user interface; and an editing module configured to generate the second video according to the adjusted cut information.

According to a third aspect of the embodiments of the present disclosure, there is provided a video conversion device, which may include: a display; a transceiver for receiving a first video in a first orientation; and a processor configured to: acquire cut information for converting the first video into a second video in a second orientation; generate, based on the cut information, a user interface for adjusting the cut information; control the display to display the user interface; control the transceiver to receive, via the user interface, user input for adjusting the cut information; and generate the second video according to the adjusted cut information.

According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, which may include: a processor; and a memory storing instructions for execution by the processor, wherein execution of the instructions causes the processor to perform the video conversion method described above.

According to a fifth aspect of embodiments of the present disclosure, there is provided a non-volatile computer-readable storage medium having stored thereon instructions for execution by a processor, wherein execution of the instructions causes the processor to perform the video conversion method described above.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description, serve to explain the principles of the present disclosure and do not unduly limit the present disclosure.

FIG. 1 is a diagram of an application environment for converting video from one orientation to another, provided according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a video conversion method according to an embodiment of the present disclosure;

FIG. 3 is a diagram of a user interface for adjusting a clipping window according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of obtaining clipping window information of a single frame according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a marked area according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a user interface for adjusting information weights according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure;

FIG. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure;

FIG. 9 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure;

FIG. 10 is a block diagram of an electronic device according to an embodiment of the present disclosure.

Throughout the drawings, it should be noted that the same reference numerals are used to refer to the same or similar elements, features and structures.

DETAILED DESCRIPTION

The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of embodiments of the present disclosure as defined by the claims and their equivalents. Various specific details are included to aid in that understanding, but are to be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

It should be noted that the terms "first", "second" and the like in the description and claims of the present disclosure and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following examples are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.

It should be noted here that "at least one of several items" in the present disclosure covers three parallel cases: "any one of the several items", "a combination of any of the several items", and "all of the several items". For example, "including at least one of A and B" covers the following three parallel situations: (1) including A; (2) including B; (3) including A and B. Another example is "execute at least one of step 1 and step 2", which covers the following three parallel situations: (1) execute step 1; (2) execute step 2; (3) execute step 1 and step 2.

In the related art, video cutting is fully automatic. The automatically cut video may not achieve the effect the user expects, yet the user cannot make further adjustments to the final cutting result. In addition, in automatic video cutting, the user cannot adjust the importance of each information flow in the video scene. As a result, the cut video scene may not meet user expectations.

The present disclosure can provide users with the functions of parameter adjustment before video cutting processing and adjustment of the cutting area after processing, so that users can obtain video cutting results that they are satisfied with.

Hereinafter, according to various embodiments of the present disclosure, the method, apparatus, and system of the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram of an application environment for converting video from one orientation to another, provided according to an embodiment of the present disclosure. In this disclosure, orientation means landscape or portrait relative to the device.

Referring to FIG. 1, the application environment 100 includes a terminal 110 and a media server system 120.

The terminal 110 is a terminal where the user is located, and the terminal 110 may be at least one of a smart phone, a tablet computer, a portable computer, a desktop computer, and the like. Although this embodiment only shows one terminal 110 for description, those skilled in the art may know that the number of the above-mentioned terminals may be two or more. This embodiment of the present disclosure does not impose any limitation on the number of terminals and device types.

The terminal 110 may be installed with a target application for providing the video to be cut and converted to the media server system 120 , and the target application may be a multimedia application, a social application or an information application or the like. For example, the terminal 110 may be a terminal used by a user, and the user's account is logged in an application running in the terminal 110 .

The terminal 110 can be connected to the media server system 120 through a wireless network or a wired network, so that data interaction can be performed between the terminal 110 and the media server system 120 . For example, a network may include a local area network (LAN), a wide area network (WAN), a telephone network, a wireless link, an intranet, the Internet, combinations thereof, and the like.

The media server system 120 may be a server system for cutting and converting video. For example, the media server system 120 may include one or more processors and memory. The memory may include one or more programs for performing the above video conversion method. The media server system 120 may also include a power supply assembly configured to perform power management of the media server system 120, a wired or wireless network interface configured to connect the media server system 120 to a network, and an input/output (I/O) interface. The media server system 120 may operate based on an operating system stored in memory, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like. However, the devices included in the media server system 120 described above are only exemplary, and the present disclosure is not limited thereto.

The media server system 120 can cut and convert the input video, and then deliver the converted video to the terminal 110 or publish it to the media platform via a wireless network or a wired network.

Further, the media server system 120 may acquire cut information for converting the first video into the second video in the second orientation, generate and display a user interface for adjusting the cut information based on the cut information, receive user input for adjusting the cut information via the user interface, and then adjust the previously cut video again according to the adjusted cut information.

In some embodiments, the terminal 110 may be installed with an application program implementing the video conversion method of the present disclosure, and the terminal 110 itself may perform the cut conversion of the video. For example, the memory of the terminal 110 may store one or more programs for performing the above video conversion method. The processor of the terminal 110 may cut and convert the video by running the related programs/algorithms. The terminal 110 may then upload the cut and converted video to the media server system 120 via a wireless network or a wired network, or may store the converted video in the memory of the terminal 110.

As an example, the terminal 110 may transmit a horizontal video obtained locally or externally to the media server system 120 via a wireless or wired network, and the media server system 120 may cut and convert the horizontal video into a vertical video according to the video conversion method of the present disclosure, and then deliver the converted vertical video to the terminal 110 via a wireless or wired network.

As another example, the terminal 110 may convert a locally or externally acquired horizontal video into a vertical video according to the video conversion method of the present disclosure, and then upload the vertical video to the media server system 120 via a wireless or wired network. The media server system 120 may distribute the vertical video to other electronic devices.

Although the embodiments illustrate converting a landscape video to a portrait video, the method of the present disclosure can similarly be used to cut a portrait video into a landscape video.

FIG. 2 is a flowchart of a video conversion method according to an embodiment of the present disclosure. The video conversion method of the embodiment of the present disclosure may be executed by the media server system 120 or an electronic device having a video cut conversion function.

In step S201, a first video in a first orientation and cut information for converting the first video into a second video in a second orientation are acquired. The cut information may include a cut window for cutting the first video into the second video. Here, the first video of the first orientation may refer to a landscape video.

A video smart crop tool, such as Google Autoflip, can be used to directly obtain crop information for converting a video in one orientation to a video in another. That is to say, the clipping information for clipping the first video can be obtained from the relevant video intelligent clipping tool.

According to an embodiment of the present disclosure, the cut information can be obtained by: analyzing each frame of the first video to determine at least one kind of information of each frame; generating an annotation map of the corresponding frame based on the at least one kind of information; obtaining the focus of the corresponding frame by calculating the moment of the annotation map; taking the focus as the center of the clipping window for clipping the frame; and generating the clipping window according to the focus and the specified aspect ratio.
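As a non-limiting illustration of the last step, the following minimal sketch derives a clipping window from a focus and a specified aspect ratio. The function name `crop_window`, the `(left, top, width, height)` representation, the default 9:16 ratio, and the choice to use the largest window that fits the frame and to clamp it to the frame boundaries are all assumptions made for illustration; the disclosure does not prescribe them.

```python
def crop_window(focus_x, focus_y, frame_w, frame_h, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a window centered near the focus.

    Assumption: the window is the largest one with the requested aspect
    ratio that fits inside the frame.
    """
    # Start from the full frame height and derive the matching width.
    win_h = frame_h
    win_w = win_h * aspect_w // aspect_h
    if win_w > frame_w:
        # Frame is too narrow: use the full width and derive the height.
        win_w = frame_w
        win_h = win_w * aspect_h // aspect_w

    # Center the window on the focus.
    left = int(focus_x) - win_w // 2
    top = int(focus_y) - win_h // 2

    # Clamp so the window never leaves the frame.
    left = max(0, min(left, frame_w - win_w))
    top = max(0, min(top, frame_h - win_h))
    return left, top, win_w, win_h


# A 1920x1080 landscape frame cut to a 9:16 portrait window.
print(crop_window(960, 540, 1920, 1080))
```

Clamping keeps the window inside the frame when the focus lies near an edge, at the cost of the focus no longer being exactly centered.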

According to another embodiment of the present disclosure, the cut information may be obtained by: analyzing each frame of the first video to determine at least one kind of information of each frame; generating and displaying, based on the analysis result, a user interface for adjusting the weight of the at least one kind of information of each frame in the case of video orientation conversion; receiving, through the user interface, user input for adjusting the weight of the at least one kind of information; generating an annotation map of the corresponding frame based on the weighted at least one kind of information; obtaining the focus of the corresponding frame by calculating the moment of the annotation map; taking the focus as the center of the clipping window used to clip the frame; and generating the clipping window according to the focus and the specified aspect ratio.

According to the embodiments of the present disclosure, before the cut information is acquired, a user interface can be provided so that the user can adjust the proportion of each information stream in the converted video result according to their own needs, so that the important information defined by the user is retained in the cut process.

In addition, calculating the focus of each frame makes the distribution of key information in each frame more prominent, and fitting the trajectory of the per-frame focus provides better clipping information and improves the consistency between frames, thereby improving the user experience.

However, the above-mentioned acquisition of the clipping information is only exemplary, and the present disclosure is not limited thereto.

In the present disclosure, the acquired cut information may be result information after the video is cut. In some embodiments, the clipping information may be clipping window information calculated during information analysis for the video frame. That is to say, the acquired clipping information may be information after video clipping processing, or may be pre-analysis information before video clipping processing.

In step S202, a user interface for adjusting the cut information is generated and displayed based on the cut information. In the user interface, for a frame of the first video, a cutting window for cutting the frame into a corresponding frame of the second video may be displayed on the frame. For example, refer to FIG. 3.

At step S203, a user input for adjusting the clipping information is received via the user interface. Here, the user input may be one of a touch input, a key input, a hovering input, and the like. Different types of user input can be implemented depending on the capabilities of the display device.

In step S204, a second video is generated according to the adjusted cut information. The cutting window of the first video may be adaptively adjusted according to the adjusted cutting information, and then the first video may be cut using the adaptively adjusted cutting window to obtain the second video. Through adaptive adjustment, better clipping information can be provided and the fit between frames can be increased.

In a possible implementation manner, at least one key frame of the first video may be determined, and then a user interface for adjusting the cut information of each key frame in the at least one key frame is generated and displayed. After the cutting window of a key frame of the first video is adjusted, the cutting windows of the related frames of the first video can be adjusted adaptively and automatically. After the adjustment of the whole video is completed, the user can export the cut video.
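The disclosure does not specify how the windows of related frames are adapted after a key frame is adjusted. As one possible illustration only, the sketch below assumes linear interpolation of window centers between user-adjusted key frames; the function name `interpolate_centers` and the dictionary layout are hypothetical.

```python
def interpolate_centers(keyframes, num_frames):
    """Spread user-adjusted key-frame centers over every frame.

    keyframes: dict mapping frame index -> (cx, cy) set by the user.
    Assumption: frames between two key frames are linearly interpolated;
    frames before the first or after the last key frame reuse its center.
    """
    if not keyframes:
        raise ValueError("at least one key frame is required")
    indices = sorted(keyframes)
    centers = []
    for f in range(num_frames):
        if f <= indices[0]:
            centers.append(keyframes[indices[0]])
        elif f >= indices[-1]:
            centers.append(keyframes[indices[-1]])
        else:
            # Find the pair of key frames surrounding frame f.
            for a, b in zip(indices, indices[1:]):
                if a <= f <= b:
                    t = (f - a) / (b - a)
                    ax, ay = keyframes[a]
                    bx, by = keyframes[b]
                    centers.append((ax + t * (bx - ax), ay + t * (by - ay)))
                    break
    return centers


# The user moved the window at frames 0 and 4 of a 6-frame clip.
print(interpolate_centers({0: (100, 50), 4: (200, 50)}, 6))
```

Other adaptation strategies, such as spline fitting or re-running the automatic analysis with the key frames as constraints, would fit the description equally well.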

According to the embodiments of the present disclosure, users are allowed to have a more comprehensive grasp of the entire cutting process flow before and after the video cutting process, and finally obtain a cutting result that they are satisfied with.

In addition, the video conversion method according to the present disclosure can better handle video scene switching, user-specified area change, or lost scenes.

FIG. 3 is a diagram of a user interface for adjusting a clipping window according to an embodiment of the present disclosure. The user interface of FIG. 3 may be displayed on a partial area of a display such as a terminal or a server, or on the display in full screen.

According to the embodiments of the present disclosure, after the automatic cutting process, the cutting information of each frame can be provided to the user, and the cutting information can be reflected in the user interface for adjusting the cutting window later.

Referring to FIG. 3, the user can adjust the clipping window for a given frame through the user interface 301. The user can move the clipping window up, down, left, and right to the area of interest. When the user interface 301 is displayed on a touch screen, the user can touch the clipping window and move it accordingly. Alternatively, the user can drag the clipping window to the area of interest using a mouse, keyboard, or the like. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.

The user can selectively adjust the clipping window for some frames. For example, in the user interface 301, the user can select a frame of interest by dragging the slide bar of the video, and then adjust the clipping window of that frame. Alternatively, a “next frame” button (not shown) may be provided on the user interface 301; after adjusting the clipping window of the current frame, the user can switch to the adjustment interface of the next frame by clicking the “next frame” button.
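Moving the clipping window in response to a drag can be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the window is represented as a hypothetical `(left, top, width, height)` tuple, the drag as a pixel offset `(dx, dy)`, and the window is clamped so that it stays inside the frame.

```python
def move_window(window, dx, dy, frame_w, frame_h):
    """Shift a (left, top, width, height) window by a drag offset.

    The result is clamped so that the window stays fully inside the frame.
    """
    left, top, width, height = window
    new_left = max(0, min(left + dx, frame_w - width))
    new_top = max(0, min(top + dy, frame_h - height))
    return new_left, new_top, width, height


# Dragging a portrait window far to the right of a 1920x1080 frame
# stops at the right edge; vertical movement is not possible here
# because the window already spans the full frame height.
print(move_window((657, 0, 607, 1080), 1000, 20, 1920, 1080))
```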

In addition, after the adjustment of the entire video is completed, the user can export the cut video by clicking an "export" button (not shown) on the user interface. The above button examples are only exemplary, and buttons with different functions can be set on the user interface according to actual requirements.

In some embodiments, the cutout window of each key frame may be displayed on the user interface 301, so that the user can adjust the cutout windows of the key frames of the video. After adjusting the cutout windows of the key frames, the user can click the "Export" button on the user interface to export the adjusted video.

The user interface according to the embodiment of the present disclosure is simple, easy for the user to operate, and improves the efficiency of the user to adjust the cut information.

FIG. 4 is a schematic flowchart of acquiring clipping window information of a single frame according to an embodiment of the present disclosure. The method for acquiring the cut window information of a single frame according to the embodiment of the present disclosure may be executed by the media server system 120 or an electronic device having a video cut conversion function.

Referring to FIG. 4, after the image 401 is acquired, the image 401 is analyzed to determine M kinds of information of the image 401, where M is a positive integer. The analysis of each type of information may be implemented by using a corresponding analysis method, that is, the image 401 may be analyzed using M types of analysis methods to determine M types of information. For example, the face information of the image 401 can be analyzed using a face analysis method.

By analyzing the M types of information, M corresponding marked regions can be generated, that is, for each type of information analyzed, an information distribution map corresponding to the image 401 is generated. For example, when analyzing face information, a pixel-based labeling area of the face information of the image 401 is generated, and then the pixel-based labeling area is converted into an information distribution labeling area.

Users can assign weights to the M marked regions according to their own needs to highlight the parts they are concerned about. For example, if you want to focus on protecting the face part from being cut off, you can increase the weighting ratio of the marked area of the face information and reduce the weighted ratio of the marked area of other information.

The overall marked area of the image 401 is calculated from the weighted M marked areas. For example, the overall marked area of the image 401 can be obtained by summing the weighted M marked areas.
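The weighted summation can be illustrated with the following minimal sketch. It assumes, for illustration only, that each marked area is stored as an equally sized two-dimensional list of non-negative values and that the weights are plain numbers; the function name `combine_regions` and the sample maps are hypothetical.

```python
def combine_regions(regions, weights):
    """Sum M equally sized 2-D maps, each scaled by its user-assigned weight."""
    if len(regions) != len(weights):
        raise ValueError("one weight per marked area is required")
    rows, cols = len(regions[0]), len(regions[0][0])
    overall = [[0.0] * cols for _ in range(rows)]
    for region, weight in zip(regions, weights):
        for r in range(rows):
            for c in range(cols):
                overall[r][c] += weight * region[r][c]
    return overall


# Two toy 2x2 maps: face information is weighted higher than motion.
face = [[0, 1], [0, 1]]
motion = [[1, 1], [0, 0]]
print(combine_regions([face, motion], [0.8, 0.2]))
```

With these weights, pixels covered by the face map dominate the overall map, so a focus computed from it is pulled toward the face.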

After the overall labeling area is obtained, the labeling map of the image 401 may be generated based on the overall labeling area. Since the weighting of each annotated area was performed before, the annotation map can show the importance of each annotated area.

The focus of the image 401 is obtained by computing the moments of the annotation map. For example, the focus of the image 401 can be obtained by calculating the geometric center point of the annotation map. A clipping window is then generated using the position of the focus and the specified aspect ratio.
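One common reading of "computing the moments" is the use of raw image moments, where the center point is (M10/M00, M01/M00). The sketch below follows that reading as an assumption; the function name `focus_from_moments` and the fallback to the frame center for an empty map are illustrative choices not stated in the disclosure.

```python
def focus_from_moments(annotation_map):
    """Return the (x, y) centroid of a 2-D map using raw image moments.

    m00 is the total mass; m10 and m01 are the first-order moments in x and y.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(annotation_map):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        # Assumption: with no marked pixels, fall back to the frame center.
        return (len(annotation_map[0]) - 1) / 2, (len(annotation_map) - 1) / 2
    return m10 / m00, m01 / m00


# All of the weight sits in the right-hand column, rows 1 and 2.
toy_map = [
    [0, 0, 0],
    [0, 0, 2],
    [0, 0, 2],
]
print(focus_from_moments(toy_map))
```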

However, the above examples are only exemplary, and the clipping window information for converting a video in one orientation to a video in another orientation can also be obtained from a video intelligent cropping tool (such as Google Autoflip). After obtaining the clipping information from other clipping tools or software, the clipping information such as the center position, size, aspect ratio, etc., of the clipping window of each frame can be obtained in a similar manner as described above.

FIG. 5 is a schematic diagram of a marked area according to an embodiment of the present disclosure.

Referring to FIG. 5, (a) of FIG. 5 is a frame of the first video, and (b) of FIG. 5 shows the marked area of important information (such as motion information) in that frame; the white area in (b) is the marked area. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.

FIG. 6 is a schematic diagram of a user interface for adjusting information weights according to an embodiment of the present disclosure. After analyzing various information of a frame, a user interface associated with the various information may be displayed accordingly.

Referring to FIG. 6 , in the user interface 601, a slider bar may be configured for each type of information (such as the first information, the second information, etc.), and the slider bar may be used to adjust the weight of the corresponding information. For example, the range of the slider can be set to [0, 1]. After setting the corresponding weight for each kind of information, click the "OK" button to complete the setting of the weight of each information flow in a frame. For example, after clicking the "OK" button, the weight information input by the user may be transmitted to the processor of the electronic device for subsequent cut conversion. Alternatively, after clicking the "OK" button, the corresponding cutout window may be presented on the corresponding frame, so as to show the user the cutout position of the cutout window on the frame.

However, the user interface of FIG. 6 is merely exemplary, and elements in the user interface may be presented in other forms.

In some embodiments, a text input box may be configured for each type of information, and the user may assign weights to the corresponding information through the text input box. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.
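Whether a weight arrives from a slider or a text input box, it may be useful to validate it before use. The following sketch is an assumption for illustration: it keeps the weight within the [0, 1] range mentioned above for the slider and falls back to a hypothetical default of 1.0 when the input cannot be parsed.

```python
def parse_weight(raw, default=1.0):
    """Convert raw UI input to a weight in [0, 1].

    Assumption: unparseable or NaN input falls back to `default`.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    if value != value:  # NaN is the only value not equal to itself
        return default
    return max(0.0, min(1.0, value))


# Text-box input is clamped; invalid input uses the default.
print(parse_weight("0.7"), parse_weight("1.5"), parse_weight("abc"))
```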

The user interface can be displayed on a partial area of the display of the electronic device (such as the terminal 110 or the media server system 120 ), or displayed on the display in a full screen, and those skilled in the art can make display settings according to actual needs.

According to an embodiment of the present disclosure, before the video cutting process, the user is allowed to adjust the information flow weight of each frame, so that the important information defined by the user is preserved in the cutting process.

FIG. 7 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure. The video conversion device 700 may be implemented as a terminal 110 or as a media server system 120, or any other device.

Referring to FIG. 7, a video conversion apparatus 700 may include a transceiver 701, a display 702 and a processor 703.

The transceiver 701 can receive the first video in the first orientation.

The processor 703 may use a video smart cropping tool (such as Google Autoflip) to obtain the cropping window information for converting a video in one orientation to a video in another orientation. In some embodiments, the processor 703 may use the algorithm for obtaining cut information of embodiments of the present disclosure (e.g., the method shown in FIG. 4) to obtain the cut information for converting the first video into the second video in the second orientation.

The processor 703 may generate a user interface for adjusting the clipping information based on the clipping information, and control the display 702 to display the user interface. For example, the user interface shown in FIG. 3 may be displayed.

The user interface may include graphics, text, icons, video, and any combination thereof associated with the analysis information. When the display 702 is a touch display screen, the display 702 can also acquire touch signals on or above its surface. The touch signal may be input to the processor 703 as a control signal for processing. In this case, the display 702 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 702, arranged on the front panel of the video conversion device 700; in other embodiments, there may be at least two displays 702, respectively arranged on different surfaces of the video conversion device 700 or in a folded design; in still other embodiments, the display 702 may be a flexible display screen disposed on a curved or folded surface of the video conversion device 700. The display 702 may be an LCD (liquid crystal display), an OLED (organic light-emitting diode) display, or the like. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.

The processor 703 can control the transceiver 701 to receive, via the user interface, a user input for adjusting the clipping information. After the clipping window is adjusted, the processor 703 can automatically adjust the clipping windows of the related frames to ensure consistency between frames.

As an example, the processor 703 may adaptively adjust the cutting window of the first video according to the adjusted cutting information, and then use the adaptively adjusted cutting window to cut the first video to obtain the second video. After the final second video is obtained, the second video can be output to other devices via the transceiver 701.
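Applying the adjusted windows to obtain the second video can be illustrated as follows. The sketch assumes, for simplicity, that each frame is a two-dimensional list of pixel values and that each window is a hypothetical `(left, top, width, height)` tuple; decoding and encoding of actual video files are outside its scope.

```python
def crop_frame(frame, window):
    """Cut one frame (a list of pixel rows) down to the given window."""
    left, top, width, height = window
    return [row[left:left + width] for row in frame[top:top + height]]


def convert_video(frames, windows):
    """Cut every frame of the first video with its own window."""
    if len(frames) != len(windows):
        raise ValueError("one window per frame is required")
    return [crop_frame(f, w) for f, w in zip(frames, windows)]


# A toy 4x6 frame whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
second_video = convert_video([frame], [(2, 0, 2, 4)])
print(second_video[0])
```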

By setting the adjustment options of the cut window of the cut video frame, the user can make further adjustments to the final cut result.

According to the embodiments of the present disclosure, not only the function of adjusting the trimming area after the video trimming process can be provided to the user, but also the parameter adjustment before the video trimming process can be provided to the user, so that the user can obtain the trimming result they are satisfied with.

The processor 703 may analyze each frame of the first video to determine at least one kind of information for each frame, and based on the analysis result, generate a method for adjusting the at least one kind of information for each frame in the case of video orientation conversion. Weight UI. For example, the user interface shown in FIG. 6 may be displayed.

The processor 703 may control the transceiver 701 to receive, through the user interface of FIG. 6, a user input for adjusting the weight of at least one kind of information of each frame, and may generate clipping window information for each frame based on the at least one kind of information whose weight is adjusted. After generating the clipping window information of each frame, the processor 703 may generate a user interface according to the clipping window information to visually display to the user how each frame is clipped.

In a possible implementation manner, the processor 703 may generate, based on the analysis of the at least one type of information, each marked area of the corresponding frame corresponding to the at least one type of information, where a marked area is an area representing the distribution of information, and wherein each marked area of the corresponding frame is given a weight input by the user.

In a possible implementation manner, for each frame of the first video, the processor 703 may calculate the overall labeling area of the corresponding frame according to each labeling area whose weight is adjusted, calculate the focus of the corresponding frame based on the overall labeling area, and generate a clipping window for the corresponding frame based on the focus and the specified aspect ratio. In addition, the size of the cutting window may be preset, or the size of the cutting window may be adaptively adjusted.

In a possible implementation manner, the processor 703 may obtain the fitted focus of the corresponding frame by fitting the focus of each frame, and then generate the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.

In a possible implementation manner, the processor 703 may generate an annotation map for the corresponding frame based on the overall annotation area, and obtain the focus of the corresponding frame by calculating a moment of the annotation map.

In some embodiments, the video conversion apparatus 700 may include a memory that may store the original input video and the converted video. Additionally, the memory may include one or more computer-readable storage media, which may be non-transitory. Memory may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory is used to store at least one instruction for execution by processor 703 .

In some embodiments, the video conversion device 700 further includes: a peripheral device interface and at least one peripheral device. The processor 703 and the peripheral device interface can be connected through a bus or a signal line. Each peripheral device can be connected to the peripheral device interface through bus, signal line or circuit board. In some embodiments, the peripheral devices may include at least one of radio frequency circuits, touch screen displays, cameras, audio circuits, positioning components, power supplies, and the like.

In some embodiments, the video conversion device 700 may also include one or more sensors. The one or more sensors include, but are not limited to, acceleration sensors, gyroscope sensors, pressure sensors, fingerprint sensors, optical sensors, and proximity sensors. For example, the processor 703 may receive an indication of an orientation change from one or more sensors, thereby recommending a video of the corresponding orientation to the user.
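As a non-limiting illustration of the sensor-based recommendation described above, the following Python sketch classifies the device orientation from two acceleration components and selects a video of the matching orientation. The function names `device_orientation` and `recommend_video`, the axis convention, and the simple magnitude comparison are assumptions made for this example only and are not specified by the present disclosure.

```python
def device_orientation(accel_x, accel_y):
    """Classify orientation from gravity components along the device axes.

    Assumed convention: x runs along the short edge of the device and y
    along the long edge, so gravity mostly along y means the device is
    held upright (portrait).
    """
    return "portrait" if abs(accel_y) >= abs(accel_x) else "landscape"


def recommend_video(accel_x, accel_y, videos):
    """Return the video whose orientation matches the device orientation.

    `videos` maps an orientation name to a video identifier; None is
    returned when no video of the matching orientation is available.
    """
    return videos.get(device_orientation(accel_x, accel_y))
```

In such a sketch, the first (landscape) video and the converted second (portrait) video would be stored under the keys "landscape" and "portrait" respectively.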

FIG. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure.

Referring to FIG. 8, in step S801, a first video of a first orientation is acquired. For example, the first video in the first orientation may be a landscape video.

In step S802, at least one kind of information of each frame of the first video is analyzed.

Here, at least one type of information of each frame may include key area information, for example, may include at least one of face information, human body information, main object information, motion scene information, and video boundary information. The face information may include face recognition information and face tracking information, etc., and the main object information may include object identification information and object tracking information. However, the above examples are merely exemplary, and the present disclosure may analyze any amount and kind of information in a frame.

Analysis algorithms for main information, key information or information of interest to the user may be stored in advance to analyze the information contained in the frame. For example, a face recognition algorithm can be used to analyze the face information in a frame, and an optical flow algorithm can be used to analyze the motion scene information in a frame. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.
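As a non-limiting illustration of storing analysis algorithms in advance, the following Python sketch keeps analysis functions in a registry keyed by the kind of information they detect. The names `ANALYZERS`, `register_analyzer`, and `analyze_frame` are hypothetical, and the brightness threshold analyzer is only a placeholder standing in for a real algorithm such as face recognition or optical flow.

```python
ANALYZERS = {}


def register_analyzer(kind):
    """Decorator that stores an analysis function under an information kind."""
    def decorator(func):
        ANALYZERS[kind] = func
        return func
    return decorator


@register_analyzer("bright_region")
def analyze_bright_region(frame, threshold=128):
    """Placeholder analyzer: mark pixels whose intensity reaches the threshold.

    A real system would register, for example, a face recognition or
    optical flow algorithm here instead.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in frame]


def analyze_frame(frame, kinds):
    """Run the stored analyzers for the requested kinds on one frame."""
    return {kind: ANALYZERS[kind](frame) for kind in kinds}
```

Each analyzer in this sketch returns a pixel-based mask of the same size as the frame, given as a list of rows.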

In step S803, each annotated area of the corresponding frame corresponding to the at least one type of information is generated based on the analysis of the at least one type of information. Here, an annotated area may refer to an area representing the distribution of information. A frame may include a variety of information, and each time one kind of information in the frame is analyzed, an information distribution map corresponding to the frame can be generated. Correspondingly, if multiple kinds of information in a frame are analyzed, multiple annotated areas can be generated.

As an example, when analyzing the face information in a frame, a pixel-based annotation area (mask) corresponding to the face information of this frame can be generated, and then the pixel-based annotation area can be converted into an annotation area representing the information distribution.
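The present disclosure does not specify how a pixel-based mask is converted into an information distribution. Purely as an assumed example, the following Python sketch normalizes a binary mask so that its values sum to 1, which is one simple way to treat the mask as a distribution. The function name `mask_to_distribution` is hypothetical.

```python
def mask_to_distribution(mask):
    """Normalize a pixel mask (list of rows) so that its values sum to 1.

    An all-zero mask (nothing detected) is returned as all zeros.
    """
    total = sum(sum(row) for row in mask)
    if total == 0:
        return [[0.0 for _ in row] for row in mask]
    return [[value / total for value in row] for row in mask]
```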

In step S804, based on the analysis result, a user interface for adjusting the weight occupied by each marked region in the video cutting is generated and displayed. The user interface may include a slider bar or text input box for adjusting the weight for each of the at least one information.

After the information contained in a frame is analyzed, a user interface for that frame can be generated, and the user interface can include controls for adjusting the weight of the information contained in the frame. For example, the user interface may include slider bars or text entry boxes for adjusting the weight of each type of information. However, the above-described examples are merely exemplary, and the present disclosure is not limited thereto.

In step S805, a user input for adjusting the weight of each marked region is received through the user interface. Weights input by the user may be assigned to each annotated region of the corresponding frame. Users can set the weight of the information they want to keep through the user interface according to their needs. For example, if the user wants to focus on protecting the face part from being cut off, the user can increase the weighting ratio of the marked area of the face information, and reduce the weighted ratio of the marked area of other information. The user can interactively adjust the weighting parameters. By weighting each labeled area, the information/area that the user pays more attention to can be highlighted.

Here, each kind of information corresponds to one kind of information labeling area, and weighting each kind of information can be interpreted as the weighting of the information labeling area.
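As a non-limiting illustration of the weighting described above, the following Python sketch maps slider values entered by the user to weights per kind of information. The enumeration `InfoKind`, the function `normalize_weights`, and the choice to scale the weights so that they sum to 1 are assumptions made for this example; the present disclosure does not require normalization.

```python
from enum import Enum


class InfoKind(Enum):
    FACE = "face"
    HUMAN_BODY = "human_body"
    MAIN_OBJECT = "main_object"
    MOTION_SCENE = "motion_scene"
    VIDEO_BOUNDARY = "video_boundary"


def normalize_weights(slider_values):
    """Scale non-negative slider values so that the weights sum to 1."""
    if any(value < 0 for value in slider_values.values()):
        raise ValueError("weights must be non-negative")
    total = sum(slider_values.values())
    if total == 0:
        # No preference expressed: fall back to equal weights.
        return {kind: 1.0 / len(slider_values) for kind in slider_values}
    return {kind: value / total for kind, value in slider_values.items()}
```

For example, a user who wants to protect faces from being cut off might set the face slider to 3 and the main object slider to 1, which this sketch turns into weights of 0.75 and 0.25.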

By providing the user interface for each frame, the weight that the user assigns to each kind of information in a frame can be applied in the subsequent cutting and conversion operations.

In step S806, for each frame of the first video, the overall labeling area of the corresponding frame is calculated according to each labeling area whose weight is adjusted. For example, the weighted regions can be summed to obtain the overall annotated region of a frame.
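As a non-limiting illustration of step S806, the following Python sketch sums the weighted annotation areas of one frame pixel by pixel. The function name `combine_annotation_maps` and the representation of each annotation area as a list of rows of equal size are assumptions made for this example.

```python
def combine_annotation_maps(maps, weights):
    """Return the weighted pixel-wise sum of the annotation maps of one frame.

    `maps` maps an information kind to a 2D list; `weights` maps the same
    kinds to user weights. Kinds without a weight contribute nothing.
    """
    if not maps:
        raise ValueError("at least one annotation map is required")
    first = next(iter(maps.values()))
    height, width = len(first), len(first[0])
    overall = [[0.0] * width for _ in range(height)]
    for kind, annotation in maps.items():
        weight = weights.get(kind, 0.0)
        for y in range(height):
            for x in range(width):
                overall[y][x] += weight * annotation[y][x]
    return overall
```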

In step S807, an annotation map for the corresponding frame is generated based on the overall annotation area. Here, the annotation map may be an information distribution image for each annotation area.

In step S808, the focus of the corresponding frame is obtained by calculating the moment of the annotation map. Here, the focus can reflect the distribution of important information in a frame. For example, the geometric center point of the annotation map can be calculated as the focal point of a frame.
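As a non-limiting illustration of step S808, the following Python sketch obtains the focus as the centroid of the annotation map from its zeroth-order and first-order image moments (x = M10 / M00, y = M01 / M00). The function name `focus_from_moments` and the fallback to the frame center for an empty map are assumptions made for this example.

```python
def focus_from_moments(annotation_map):
    """Return the focus (x, y) as the centroid of a 2D annotation map."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(annotation_map):
        for x, value in enumerate(row):
            m00 += value        # total mass
            m10 += x * value    # first-order moment along x
            m01 += y * value    # first-order moment along y
    if m00 == 0:
        # Nothing annotated: assume the center of the frame as the focus.
        height, width = len(annotation_map), len(annotation_map[0])
        return ((width - 1) / 2, (height - 1) / 2)
    return (m10 / m00, m01 / m00)
```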

In step S809, a clipping window of the corresponding frame is generated based on the focus and the specified aspect ratio. For example, after obtaining the focus of a frame, the focus is set as the center of the clipping window, and the layout and size of the clipping window are set according to the specified aspect ratio. Here, the aspect ratio of the second video may be used as the specified aspect ratio, but the present disclosure is not limited thereto.
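As a non-limiting illustration of step S809, the following Python sketch centers a clipping window on the focus and shifts it back inside the frame when it would cross a frame boundary. The function name `clipping_window`, the choice of the largest window of the specified aspect ratio that fits in the frame, and the integer truncation are assumptions made for this example; as noted above, the window size may instead be preset or adaptively adjusted.

```python
def clipping_window(focus, frame_size, aspect):
    """Return (left, top, width, height) of a window centered on the focus.

    `frame_size` is (width, height) in pixels and `aspect` is the specified
    aspect ratio as (width, height), for example (9, 16) for portrait.
    """
    focus_x, focus_y = focus
    frame_w, frame_h = frame_size
    aspect_w, aspect_h = aspect
    if frame_w * aspect_h >= frame_h * aspect_w:
        # Frame is wider than the target: use the full frame height.
        win_h = frame_h
        win_w = frame_h * aspect_w // aspect_h
    else:
        # Frame is narrower than the target: use the full frame width.
        win_w = frame_w
        win_h = frame_w * aspect_h // aspect_w
    left = int(focus_x - win_w / 2)
    top = int(focus_y - win_h / 2)
    # Clamp so that the window never leaves the frame.
    left = max(0, min(left, frame_w - win_w))
    top = max(0, min(top, frame_h - win_h))
    return (left, top, win_w, win_h)
```

For a 1280x720 landscape frame and a 9:16 target, this sketch yields a 405x720 window that slides horizontally with the focus.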

In one possible implementation, a fitted focus of the corresponding frame may be obtained by fitting the focus of each frame, and a clipping window of the corresponding frame may be generated based on the fitted focus and a specified aspect ratio. By fitting the clipping region of the current scene according to the focuses of a series of frames of the current scene, a smoother clipping effect between frames is achieved.
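The present disclosure does not fix a particular fitting method. Purely as an assumed example, the following Python sketch fits a straight line by least squares to one coordinate of the per-frame focus within a scene, which removes frame-to-frame jitter. The function name `fit_focus_linear` is hypothetical, and other fits (for example, a polynomial fit or a moving average) could be used instead.

```python
def fit_focus_linear(values):
    """Least-squares line fit of one focus coordinate over frame indices.

    `values[t]` is the focus coordinate of frame t; the returned list holds
    the fitted coordinate for each frame.
    """
    count = len(values)
    if count == 0:
        return []
    if count == 1:
        return [float(values[0])]
    mean_t = (count - 1) / 2
    mean_v = sum(values) / count
    var_t = sum((t - mean_t) ** 2 for t in range(count))
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return [intercept + slope * t for t in range(count)]
```

The same function would be applied separately to the x and y coordinates of the focus.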

In step S810, cut information for converting the first video into the second video in the second orientation is obtained. For example, after the clipping window information of each frame is obtained according to steps S802 to S809, the clipping window information of all frames is obtained, which is used for further adjustment of the clipping window subsequently.

In step S811, a user interface for adjusting the cut information is generated and displayed based on the cut information. In the user interface, for a frame of the first video, a cutting window for cutting the frame into a corresponding frame of the second video may be displayed on the frame. For example, refer to FIG. 3.

In step S812, a user input for adjusting the cut information is received via the user interface.

In step S813, the cutting window of the first video may be adaptively adjusted according to the adjusted cutting information. For example, after the user further adjusts the video frame, a fitting process may be performed on the further adjusted clipping window, so that the final presented video is smoother.
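The adaptive adjustment in step S813 is not limited to a particular method. Purely as an assumed example, the following Python sketch propagates window positions that the user adjusted on a few frames to all other frames by linear interpolation, holding the nearest adjusted value before the first and after the last adjusted frame. The function name `interpolate_offsets` and the use of linear interpolation are assumptions made for this example.

```python
def interpolate_offsets(keyframe_offsets, frame_count):
    """Interpolate window offsets for every frame from user-adjusted frames.

    `keyframe_offsets` maps a frame index to the window offset chosen by the
    user for that frame.
    """
    if not keyframe_offsets:
        raise ValueError("at least one adjusted frame is required")
    keys = sorted(keyframe_offsets)
    offsets = []
    for index in range(frame_count):
        if index <= keys[0]:
            offsets.append(float(keyframe_offsets[keys[0]]))
        elif index >= keys[-1]:
            offsets.append(float(keyframe_offsets[keys[-1]]))
        else:
            # Find the pair of adjusted frames surrounding this frame.
            for start, end in zip(keys, keys[1:]):
                if start <= index <= end:
                    t = (index - start) / (end - start)
                    a = keyframe_offsets[start]
                    b = keyframe_offsets[end]
                    offsets.append(a + t * (b - a))
                    break
    return offsets
```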

In step S814, the first video is cut using the adaptively adjusted cut window to obtain a further adjusted second video.
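As a non-limiting illustration of step S814, the following Python sketch cuts each frame with its own window. The function names `crop_frame` and `crop_video`, and the representation of a frame as a list of pixel rows with a window given as (left, top, width, height), are assumptions made for this example.

```python
def crop_frame(frame, window):
    """Return the part of one frame (list of rows) covered by the window."""
    left, top, width, height = window
    return [row[left:left + width] for row in frame[top:top + height]]


def crop_video(frames, windows):
    """Cut every frame of the first video with its corresponding window."""
    if len(frames) != len(windows):
        raise ValueError("one window per frame is required")
    return [crop_frame(frame, window) for frame, window in zip(frames, windows)]
```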

The embodiments of the present disclosure can provide the user with both parameter adjustment before the video cutting processing and adjustment of the cutting area after the processing, so that the user has a more comprehensive grasp of the entire cutting flow before and after the video cutting processing and can finally obtain a satisfactory cutting result.

FIG. 9 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure.

Referring to FIG. 9, the video conversion apparatus 900 may include an interface module 901, an analysis module 902, a display module 903 and an editing module 904. Each module in the video conversion apparatus 900 may be implemented by one or more modules, and the name of the corresponding module may vary according to the type of the module. In various embodiments, some modules in the video conversion apparatus 900 may be omitted, or additional modules may also be included. Furthermore, modules/elements according to various embodiments of the present disclosure may be combined to form a single entity, and thus may equivalently perform the functions of the corresponding modules/elements prior to combination.

The interface module 901 may be configured to receive the first video in the first orientation and user input.

The analysis module 902 may be configured to analyze each frame of the first video to determine at least one kind of information for each frame, and to generate, for each frame and based on the analysis result, a user interface for adjusting the weight of the at least one kind of information in the case of video orientation conversion.

In one possible implementation, at least one type of information may include key area information.

In a possible implementation manner, the key area information may include at least one of face information, human body information, significant object information, motion scene information, and video boundary information.

The display module 903 may be configured to display a user interface for adjusting the weight of at least one kind of information.

In one possible implementation, the user interface may include controls for adjusting the weight of each of the at least one kind of information.

The editing module 904 may be configured to generate cut window information to cut the first video based on at least one type of information whose weights are adjusted, and to generate a second video in a second orientation based on the cut first video.

In a possible implementation manner, the analysis module 902 may generate, based on the analysis of the at least one type of information, each marked area of the corresponding frame corresponding to the at least one type of information, where a marked area is an area representing the distribution of information, and wherein each marked area of the corresponding frame is given a weight input by the user.

In a possible implementation manner, for each frame of the first video, the editing module 904 may calculate the overall labeling area of the corresponding frame according to each labeling area whose weight is adjusted, calculate the focus of the corresponding frame based on the overall labeling area, and generate a clipping window for the corresponding frame based on the focus and the specified aspect ratio.

In one possible implementation, the editing module 904 may obtain the fitted focus of the corresponding frame by fitting the focus of each frame, and generate the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.

In a possible implementation manner, the editing module 904 may generate an annotation map for the corresponding frame based on the overall annotation area, and obtain the focus of the corresponding frame by calculating the moment of the annotation map.

In addition, the video conversion apparatus 900 can provide the user with the function of adjusting the cut area after the video cutting process, so that the user can obtain a cutting result that they are satisfied with.

The analysis module 902 may obtain cut information for converting the first video into the second video in the second orientation, and generate and display a user interface for adjusting the cut information based on the cut information. User input for adjusting clipping information may be received via the user interface.

In a possible implementation manner, the cut information may include a cut window for cutting the first video into the second video.

In a possible implementation manner, for a frame of the first video, the analysis module 902 may display a cutout window on the frame for cutting the frame into a corresponding frame of the second video.
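As a non-limiting illustration of cut information that includes a cut window per frame, the following Python sketch stores the windows in a small data structure and applies a user drag to one of them while keeping the window inside the frame. The class names `CutWindow` and `CutInfo` and the method `move_window` are hypothetical and are not defined by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class CutWindow:
    left: int
    top: int
    width: int
    height: int


@dataclass
class CutInfo:
    frame_size: Tuple[int, int]  # (width, height) of the first video
    windows: Dict[int, CutWindow] = field(default_factory=dict)

    def move_window(self, frame_index, dx, dy):
        """Shift one frame's window by a user drag, keeping it in the frame."""
        frame_w, frame_h = self.frame_size
        old = self.windows[frame_index]
        left = max(0, min(old.left + dx, frame_w - old.width))
        top = max(0, min(old.top + dy, frame_h - old.height))
        self.windows[frame_index] = CutWindow(left, top, old.width, old.height)
        return self.windows[frame_index]
```

A user interface could draw each `CutWindow` as a rectangle over its frame and call `move_window` whenever the user drags the rectangle.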

As an example, after the weight of each kind of information in each frame is adjusted, the video can be cut according to the adjusted information, and the cut information used for that cutting can then be presented to the user again, so that the user can adjust the cut window of the cut video again. In some embodiments, after the weight of each kind of information of each frame is adjusted, the video is not cut at that point; instead, the cut information generated according to the adjusted information is presented to the user through the user interface, the user can adjust the clipping window as a whole, and the finally adjusted clipping window is then used for the cutting processing.

In one possible implementation, the analysis module 902 may determine at least one key frame of the first video, and generate and display a user interface for adjusting cut information of each of the at least one key frame.

In a possible implementation manner, the editing module 904 may adaptively adjust the cutting window of the first video according to the adjusted cutting information, and use the adaptively adjusted cutting window to cut the first video to obtain the second video.

In the video conversion apparatus of this embodiment, the implementation principle and technical effect of video conversion by using the above-mentioned modules are the same as those of the above-mentioned related method embodiments.

According to an embodiment of the present disclosure, an electronic device can be provided. FIG. 10 is a block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 1000 includes at least one memory 1002 and at least one processor 1001, the at least one memory 1002 stores a set of computer-executable instructions, and when the set of computer-executable instructions is executed by the at least one processor 1001, the video conversion method according to the embodiment of the present disclosure is executed.

As an example, the electronic device 1000 may be a PC computer, a tablet device, a personal digital assistant, a smart phone, or any other device capable of executing the above set of instructions. Here, the electronic device 1000 is not necessarily a single electronic device, but can also be a collection of any device or circuit capable of executing the above-mentioned instructions (or instruction sets) individually or jointly. Electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (eg, via wireless transmission).

In the electronic device 1000, the processor 1001 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller or a microprocessor. By way of example and not limitation, processor 1001 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

The processor 1001 may execute instructions or code stored in memory, which may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.

The memory 1002 may be integrated with the processor, eg, RAM or flash memory arranged within an integrated circuit microprocessor or the like. In addition, the memory may comprise a separate device such as an external disk drive, a storage array, or any other storage device that may be used by a database system. The memory and the processor may be operatively coupled, or may communicate with each other, eg, through I/O ports, network connections, etc., to enable the processor to read files stored in the memory.

In addition, the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1000 may be connected to each other via a bus and/or a network.

According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the video conversion method according to the present disclosure. Examples of computer-readable storage media herein include: Read Only Memory (ROM), Random Access Programmable Read Only Memory (PROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, Hard Disk Drive (HDD), Solid State Drive (SSD), card memory (such as a multimedia card, Secure Digital (SD) card, or Extreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid state disk, and any other apparatus configured to store, in a non-transitory manner, a computer program and any associated data, data files and data structures, and to provide the computer program and any associated data, data files and data structures to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above-mentioned computer-readable storage medium can run in an environment deployed in a computer device such as a client, a host, an agent device, or a server. In addition, in one example, the computer program and any associated data, data files and data structures are distributed over networked computer systems so that they are stored, accessed and executed in a distributed fashion by one or more processors or computers.

According to an embodiment of the present disclosure, a computer program product can also be provided, and instructions in the computer program product can be executed by a processor of a computer device to complete the above-mentioned video conversion method.

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, all of which are regarded as falling within the scope of protection claimed by the present disclosure.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or techniques in the technical field not disclosed by the present disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

A video conversion method, wherein the video conversion method comprises:

obtaining a first video in a first orientation and clipping information for converting the first video into a second video in a second orientation;

generating and displaying a user interface for adjusting the clipping information based on the clipping information;

receiving, via the user interface, user input for adjusting the clipping information; and

generating the second video according to the adjusted cut information.
The video conversion method of claim 1, wherein the cut information includes a cut window for cutting the first video into the second video.
The video conversion method according to claim 1, wherein the step of generating and displaying a user interface for adjusting the cut information based on the cut information comprises:

for a frame of the first video, displaying, on the frame, a cutout window for cutting the frame into a corresponding frame of the second video.
The video conversion method according to claim 1, wherein the video conversion method further comprises:

determining at least one key frame of the first video,

wherein the step of generating and displaying a user interface for adjusting the clipping information comprises:

generating and displaying a user interface for adjusting cut information for each of the at least one key frame.
The video conversion method according to claim 1, wherein the step of generating the second video according to the adjusted cut information comprises:

adaptively adjusting the cut window of the first video according to the adjusted cut information; and

cutting the first video by using the adaptively adjusted cut window to obtain the second video.
The video conversion method according to claim 1, wherein the step of acquiring cut information for converting the first video into the second video in the second orientation comprises:

analyzing each frame of the first video to determine at least one kind of information for each frame;

generating and displaying, for each frame and based on the analysis result, another user interface for adjusting the weight of the at least one kind of information in the case of a video orientation conversion;

receiving, through the other user interface, user input for adjusting the weight of the at least one kind of information; and

generating the cut information based on the at least one kind of information whose weight is adjusted.
The video conversion method according to claim 6, wherein the step of generating cut information based on the at least one type of information whose weights are adjusted comprises:

generating an annotation map of the corresponding frame based on the at least one kind of information whose weight is adjusted;

obtaining the focus of the corresponding frame by calculating the moment of the annotation map; and

generating a clipping window based on the focus and a specified aspect ratio.
A video conversion device, wherein the video conversion device comprises:

an interface module configured to receive a first video in a first orientation;

an analysis module configured to obtain cut information for converting the first video into a second video in a second orientation, and to generate and display a user interface for adjusting the cut information based on the cut information;

a display module configured to display the user interface, wherein user input for adjusting the cut information is received via the user interface; and

an editing module configured to generate the second video according to the adjusted cut information.
The video conversion apparatus of claim 8, wherein the cut information includes a cut window for cutting the first video into the second video.
10. The video conversion apparatus of claim 8, wherein the analysis module is configured to set, for a frame of the first video, a cutting window on the frame for cutting the frame into a corresponding frame of the second video.
The video conversion device of claim 8, wherein the analysis module is configured to:

determine at least one key frame of the first video; and

generate and display a user interface for adjusting cut information for each of the at least one key frame.
The video conversion device according to claim 8, wherein the editing module is configured to:

adaptively adjust the cut window of the first video according to the adjusted cut information; and

cut the first video by using the adaptively adjusted cut window to obtain the second video.
The video conversion apparatus of claim 8, wherein the analysis module is configured to:

analyze each frame of the first video to determine at least one kind of information for each frame;

generate and display, for each frame and based on the analysis result, another user interface for adjusting the weight of the at least one kind of information in the case of a video orientation conversion;

receive, through the other user interface, user input for adjusting the weight of the at least one kind of information; and

generate the cut information based on the at least one kind of information whose weight is adjusted.
The video conversion apparatus of claim 13, wherein the analysis module is configured to:

generate an annotation map of the corresponding frame based on the at least one kind of information whose weight is adjusted;

obtain the focus of the corresponding frame by calculating the moment of the annotation map; and

generate a clipping window based on the focus and a specified aspect ratio.
A video conversion device, wherein the video conversion device comprises:

a display;

a transceiver for receiving a first video in a first orientation; and

a processor configured to:

obtain cut information for converting the first video into a second video in a second orientation;

generate and display a user interface for adjusting the clipping information based on the clipping information;

control the display to display the user interface;

control the transceiver to receive, via the user interface, user input for adjusting the clipping information; and

generate the second video according to the adjusted cut information.
16. The video conversion apparatus of claim 15, wherein the cut information includes a cut window for cutting the first video into the second video.
The video conversion apparatus of claim 15, wherein the processor is configured to:

for a frame of the first video, set, on the frame, a cutting window for cutting the frame into a corresponding frame of the second video.
The video conversion apparatus of claim 15, wherein the processor is configured to:

determine at least one key frame of the first video; and

generate and display a user interface for adjusting cut information for each of the at least one key frame.
The video conversion apparatus of claim 15, wherein the processor is configured to:

adaptively adjust the cut window of the first video according to the adjusted cut information; and

cut the first video by using the adaptively adjusted cut window to obtain the second video.
The video conversion apparatus of claim 15, wherein the processor is configured to:

analyze each frame of the first video to determine at least one kind of information for each frame;

generate and display, for each frame and based on the analysis result, another user interface for adjusting the weight of the at least one kind of information in the case of a video orientation conversion;

receive, through the other user interface, user input for adjusting the weight of the at least one kind of information; and

generate the cut information based on the at least one kind of information whose weight is adjusted.
The video conversion device of claim 20, wherein the processor is configured to:

generate an annotation map of the corresponding frame based on the at least one kind of information whose weight is adjusted;

obtain the focus of the corresponding frame by calculating the moment of the annotation map; and

generate a clipping window based on the focus and a specified aspect ratio.
An electronic device comprising:

a processor; and

a memory for storing instructions to be executed by the processor,

wherein execution of the instructions causes the processor to perform the following steps:

obtaining a first video in a first orientation and clipping information for converting the first video into a second video in a second orientation;

generating and displaying a user interface for adjusting the clipping information based on the clipping information;

receiving, via the user interface, user input for adjusting the clipping information; and

generating the second video according to the adjusted cut information.
A non-volatile computer-readable storage medium having stored thereon instructions for execution by a processor, wherein execution of the instructions causes the processor to perform the following steps:

obtaining a first video in a first orientation and clipping information for converting the first video into a second video in a second orientation;

generating and displaying a user interface for adjusting the clipping information based on the clipping information;

receiving, via the user interface, user input for adjusting the clipping information; and

generating the second video according to the adjusted cut information.