CN112218160A - Video conversion method and device, video conversion equipment and storage medium


Info

Publication number
CN112218160A
CN112218160A
Authority
CN
China
Prior art keywords
information
video
frame
focus
weight
Legal status
Pending
Application number
CN202011086676.9A
Other languages
Chinese (zh)
Inventor
宋玉岩
徐宁
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011086676.9A
Publication of CN112218160A
Priority to PCT/CN2021/107704 (published as WO2022077995A1)


Classifications

    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/440263 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4438 Window management, e.g. event handling following interaction with the user interface

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a video conversion method and apparatus, a video conversion device, and a storage medium. The video conversion method may include the steps of: acquiring a first video of a first orientation; analyzing each frame of the first video to determine at least one type of information for each frame; based on the analysis result, generating and displaying, for each frame, a user interface for adjusting the weight of the at least one type of information during video orientation conversion; receiving, through the user interface, a user input for adjusting the weight of the at least one type of information; generating cropping window information based on the at least one type of information whose weight is adjusted, so as to crop the first video; and generating a second video of a second orientation based on the cropped first video.

Description

Video conversion method and device, video conversion equipment and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video conversion method and apparatus, a video conversion device, and a storage medium.
Background
Currently, most video and film works are captured at wide aspect ratios (i.e., landscape), such as 4:3 or 16:9. Video or similar media recorded at a wide aspect ratio is typically intended to be viewed on a desktop display or in landscape orientation. Therefore, when a user watches a landscape video on a mobile terminal, the terminal screen is generally rotated to the landscape position to play the video, in order to obtain a good visual experience.
However, more and more users, particularly mobile phone users, are accustomed to viewing tall-aspect-ratio (i.e., portrait) video, and vertically oriented media has become a popular format for viewing and displaying media in many applications. The common solution is to shrink the landscape video, at its original aspect ratio, to fit the portrait screen; this leaves large unused screen areas above and below the video and reduces the picture size, resulting in a poor visual experience.
Disclosure of Invention
The present disclosure provides a video conversion method and apparatus, a video conversion device, and a storage medium, to at least solve the problem that a user cannot adjust the individual information streams in a video scene during video orientation conversion.
According to a first aspect of embodiments of the present disclosure, there is provided a video conversion method, which may include the steps of: acquiring a first video of a first orientation; analyzing each frame of the first video to determine at least one type of information for each frame; based on the analysis result, generating and displaying, for each frame, a user interface for adjusting the weight of the at least one type of information during video orientation conversion; receiving, through the user interface, a user input for adjusting the weight of the at least one type of information; generating cropping window information based on the at least one type of information whose weight is adjusted, so as to crop the first video; and generating a second video of a second orientation based on the cropped first video.
Optionally, the at least one information may include key area information.
Optionally, the key region information may include at least one of face information, human body information, important object information, motion scene information, and video boundary information.
Optionally, the user interface may comprise a control for adjusting the weight of each type of the at least one type of information.
Optionally, the step of generating the cropping window information based on the at least one type of information whose weight is adjusted may include: calculating a focus of the corresponding frame based on the at least one type of information whose weight is adjusted; and generating a cropping window for the corresponding frame based on the focus.
Optionally, the step of calculating the focus of the corresponding frame based on the at least one type of information whose weight is adjusted may include: generating, for the corresponding frame, the labeled regions whose weights are adjusted based on the at least one type of information whose weight is adjusted, a labeled region being a region representing the distribution of information; and calculating the focus of the corresponding frame using the labeled regions whose weights are adjusted.
Optionally, the step of analyzing each frame of the first video may comprise: generating respective labeled regions of the respective frames corresponding to the at least one information based on the analysis of the at least one information, the labeled regions being regions representing distribution of information, wherein the step of adjusting the weight of the at least one information may include giving the respective labeled regions of the respective frames a weight input by a user.
Optionally, the step of generating the cropping window information based on the at least one type of information whose weight is adjusted may include: for each frame of the first video, calculating an overall labeled region of the corresponding frame from the labeled regions whose weights are adjusted; calculating a focus of the corresponding frame based on the overall labeled region; and generating a cropping window for the corresponding frame based on the focus and a specified aspect ratio.
Optionally, the step of generating a cropping window for the respective frame based on the focus and the specified aspect ratio may comprise: obtaining a fitted focus of the corresponding frame by fitting the focuses of the respective frames; and generating a cropping window for the respective frame based on the fitted focus and the specified aspect ratio.
Optionally, the step of calculating the focus of the corresponding frame based on the overall labeled region may include: generating an annotation map for the corresponding frame based on the overall labeled region; and obtaining the focus of the corresponding frame by calculating a moment of the annotation map.
According to a second aspect of the embodiments of the present disclosure, there is provided a video conversion apparatus, which may include the following modules: an interface module configured to receive a first video at a first orientation and a user input; an analysis module configured to analyze each frame of the first video to determine at least one information of each frame, and to generate a user interface for each frame for adjusting a weight of the at least one information at the time of video orientation conversion based on a result of the analysis; a display module configured to display the user interface, wherein a user input for adjusting the weight of the at least one information is received via the user interface; and an editing module configured to generate cropping window information based on the at least one information whose weight is adjusted to crop the first video, and generate a second video of a second orientation based on the cropped first video.
Optionally, the at least one information may include key area information.
Optionally, the key region information may include at least one of face information, human body information, important object information, motion scene information, and video boundary information.
Optionally, the user interface may comprise a control for adjusting the weight of each type of the at least one type of information.
Alternatively, the analysis module may generate, based on the analysis of the at least one information, respective labeled regions of the respective frames corresponding to the at least one information, the labeled regions being regions representing information distribution, wherein the respective labeled regions of the respective frames are given a weight input by the user.
Alternatively, the editing module may calculate a focus of the corresponding frame based on the at least one information whose weight is adjusted, and generate a cropping window of the corresponding frame based on the focus.
Alternatively, the analysis module may generate, based on the at least one kind of information whose weight is adjusted, each of the label regions whose weights are adjusted for the corresponding frame, the label region being a region representing information distribution, and the editing module may calculate the focus of the corresponding frame using the each of the label regions whose weights are adjusted.
Alternatively, for each frame of the first video, the editing module may calculate an overall annotation region for the respective frame according to the respective annotation region whose weight is adjusted, calculate a focus for the respective frame based on the overall annotation region, and generate a cropping window for the respective frame based on the focus and the specified aspect ratio.
Alternatively, the editing module may obtain a fitted focus for each frame by fitting the focuses of the respective frames, and generate a cropping window for the respective frame based on the fitted focus and the specified aspect ratio.
Alternatively, the editing module can generate an annotation graph for the respective frame based on the entire annotation region and obtain the focus of the respective frame by computing a moment of the annotation graph.
According to a third aspect of embodiments of the present disclosure, there is provided a video conversion apparatus, which may include: a display; a transceiver for receiving a first video in a first orientation; and a processor for analyzing each frame of the first video to determine at least one information for each frame; generating a user interface for each frame for adjusting the weight of the at least one information at the time of video orientation conversion based on the analysis result; controlling a display to display the user interface; controlling a transceiver to receive a user input for adjusting a weight of the at least one information through the user interface; generating cropping window information to crop the first video based on the at least one information whose weight is adjusted; generating a second video of a second orientation based on the cropped first video.
Alternatively, the processor may calculate a focus of the respective frame based on the at least one information whose weight is adjusted, and generate a cropping window of the respective frame based on the focus.
Alternatively, the processor may generate, based on the at least one information whose weight is adjusted, each of the label regions whose weights are adjusted for the corresponding frame, the label region being a region representing information distribution, and calculate the focus of the corresponding frame using the each of the label regions whose weights are adjusted.
Alternatively, the processor may generate, based on the analysis of the at least one information, respective labeled regions of the respective frames corresponding to the at least one information, the labeled regions being regions representing information distribution, wherein the respective labeled regions of the respective frames are given a weight input by the user.
Alternatively, for each frame of the first video, the processor may calculate an overall annotation region for the respective frame according to the respective annotation region whose weight is adjusted, calculate a focus for the respective frame based on the overall annotation region, and generate a cropping window for the respective frame based on the focus and the specified aspect ratio.
Alternatively, the processor may obtain a fitted focus for each frame by fitting the focuses of the respective frames, and generate a cropping window for the respective frame based on the fitted focus and the specified aspect ratio.
Alternatively, the processor may generate an annotation map for the respective frame based on the overall annotation region and obtain the focus of the respective frame by computing a moment of the annotation map.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic apparatus, which may include: at least one processor; at least one memory storing computer executable instructions, wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to perform the video conversion method as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a video conversion method as described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer program product, instructions of which are executed by at least one processor in an electronic device to perform the video conversion method as described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
through the provided user interface, the user can adjust the proportion of each information stream in the converted video as required, so that user-defined important information is preserved during cropping and the cropping effect expected by the user is achieved.
In addition, calculating a focus for each frame highlights the distribution of key information in that frame; fitting the trajectory of the per-frame focuses provides better cropping information, increases the coherence between frames, and improves the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram of an application environment for transitioning a video from one orientation to another provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of a video conversion method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of cropping a single frame according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a labeling area according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a user interface according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a single frame transitioning from one orientation to another orientation in accordance with an embodiment of the present disclosure;
fig. 7 is a block diagram of a video conversion device according to an embodiment of the present disclosure;
fig. 8 is a flow chart of a video conversion method according to another embodiment of the present disclosure;
fig. 9 is a block diagram of a video conversion device according to an embodiment of the present disclosure;
fig. 10 is a block diagram of an electronic device according to an embodiment of the disclosure.
Throughout the drawings, it should be noted that the same reference numerals are used to designate the same or similar elements, features and structures.
Detailed Description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of the embodiments of the disclosure as defined by the claims and their equivalents. Various specific details are included to aid understanding, but these are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Video cropping in the related art is a fully automatic operation, and the user cannot adjust the importance of the individual information streams in a video scene. As a result, the cropped video scene may not meet the user's expectations. The present disclosure provides a user interface that enables the user to adjust the weight of each information stream during video conversion as required, so as to achieve a better cropping effect.
Hereinafter, according to various embodiments of the present disclosure, a method, an apparatus, and a system of the present disclosure will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram of an application environment for transitioning a video from one orientation to another provided in accordance with an embodiment of the present disclosure. In the present disclosure, an orientation refers to the landscape or portrait direction relative to the device/apparatus.
Referring to fig. 1, the application environment 100 includes a terminal 110 and a media server system 120.
The terminal 110 is a terminal where a user is located, and the terminal 110 may be at least one of a smart phone, a tablet computer, a portable computer, a desktop computer, and the like. Although the present embodiment shows only one terminal 110 for illustration, those skilled in the art will appreciate that the number of the terminals may be two or more. The number of terminals and the type of the device are not limited in any way in the embodiments of the present disclosure.
The terminal 110 may be installed with a target application for providing the video to be cut and converted to the media server system 120, and the target application may be a multimedia-type application, a social-type application, an information-type application, or the like. For example, the terminal 110 may be a terminal used by a user, and an account of the user is registered in an application running in the terminal 110.
The terminal 110 may be connected to the media server system 120 through a wireless network or a wired network, so that data interaction between the terminal 110 and the media server system 120 is possible. For example, the network may comprise a Local Area Network (LAN), a Wide Area Network (WAN), a telephone network, a wireless link, an intranet, the Internet, a combination thereof, or the like.
The media server system 120 may be a server system for crop-converting video. For example, the media server system 120 may include one or more processors and memory. The memory may include one or more programs for performing the above video conversion method. The media server system 120 may also include a power component configured to perform power management of the media server system 120, a wired or wireless network interface configured to connect the media server system 120 to a network, and an input/output (I/O) interface. The media server system 120 may operate based on an operating system stored in memory, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc. However, the devices included in the above-described media server system 120 are merely exemplary, and the present disclosure is not limited thereto.
The media server system 120 may crop and convert the input video using the video conversion method of the present disclosure, and then send the converted video to the terminal 110 via a wireless or wired network or publish the video to a media platform.
Optionally, the terminal 110 may be installed with an application program implementing the video conversion method of the present disclosure, so that the terminal 110 itself can crop and convert the video. For example, the memory of the terminal 110 may store one or more programs for executing the above video conversion method, and the processor of the terminal 110 may crop and convert the video by running the relevant programs/algorithms. The terminal 110 may then upload the converted video to the media server system 120 via a wireless or wired network, or may store the converted video in its own memory.
As an example, the terminal 110 may transmit a landscape video acquired locally or externally to the media server system 120 via a wireless or wired network; the media server system 120 crops the landscape video into a portrait video, and then transmits the converted portrait video to the terminal 110 via the wireless or wired network.
As another example, the terminal 110 may convert a landscape video acquired locally or externally into a portrait video and then upload the portrait video to the media server system 120 via a wireless or wired network. The media server system 120 may distribute the portrait video to other electronic devices.
Although the embodiment illustrates the conversion of a landscape video into a portrait video, the disclosed method may similarly be employed to convert a portrait video into a landscape video by cropping the portrait video.
Fig. 2 is a flow chart of a video conversion method according to an embodiment of the present disclosure. The video conversion method of the embodiment of the present disclosure may be executed by the media server system 120 or an electronic device having a video clip conversion function including the present disclosure.
In step S201, a first video of a first orientation is acquired. Here, the first video of the first orientation may refer to a landscape video. For example, the first video may be obtained locally or externally.
In step S202, each frame of the first video is analyzed to determine at least one information for each frame. Here, the at least one type of information of each frame may include key region information, and for example, may include at least one of face information, body information, main object information, moving scene information, video boundary information, and the like. The face information may include face recognition information, face tracking information, and the like, and the main object information may include object recognition information, object tracking information, and the like. However, the above examples are merely exemplary, and the present disclosure may analyze any number and kind of information in one frame.
As an example, analysis of information contained within a frame may be accomplished by pre-storing analysis algorithms for primary, key, or user-interested information. For example, face recognition algorithms may be used to analyze face information in a frame, and optical flow algorithms may be used to analyze motion scene information in a frame. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
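Purely as an illustrative sketch, and not part of the patent's disclosure, such a per-frame analysis could be assembled from off-the-shelf detectors. The detector choices (a Haar cascade, Farneback optical flow), thresholds, and function names below are assumptions for illustration:

```python
import cv2
import numpy as np

def analyze_frame(frame_bgr, prev_gray=None):
    """Produce one mask per information type for a single frame.

    Returns (masks, gray): a dict mapping an information name to a float
    mask in [0, 1] with the frame's height/width, plus the grayscale
    frame (to be passed as prev_gray for the next frame).
    """
    h, w = frame_bgr.shape[:2]
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    masks = {}

    # Face information: a bundled Haar cascade stands in for any detector.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    face_mask = np.zeros((h, w), dtype=np.float32)
    for (x, y, fw, fh) in cascade.detectMultiScale(gray, 1.1, 5):
        face_mask[y:y + fh, x:x + fw] = 1.0
    masks["face"] = face_mask

    # Motion scene information: dense optical flow magnitude, thresholded
    # (the 1.0-pixel threshold is an arbitrary illustrative value).
    if prev_gray is not None:
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        masks["motion"] = (mag > 1.0).astype(np.float32)
    else:
        masks["motion"] = np.zeros((h, w), dtype=np.float32)

    return masks, gray
```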
In step S203, a user interface for adjusting the weight of at least one information of each frame at the time of video orientation conversion is generated and displayed based on the analysis result. After analyzing the information contained within a frame, a user interface may be generated for the frame, which may include a user interface for adjusting the weight of the information contained in the frame. For example, the user interface may include a slider bar or a text entry box for adjusting each type of information. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
In step S204, a user input for adjusting the weight of the at least one type of information is received through the user interface. On the user interface, the user can, as required, apply a higher weight to one type of information (such as face information) and lower weights to the other types.
In one possible implementation, each labeled region of a frame corresponding to at least one information may be generated based on an analysis of the at least one information in the frame, and then each labeled region of the frame may be given a weight input by a user. The label area may refer to an area where information is distributed. The labeling area will be described below with reference to fig. 4.
In the present disclosure, each kind of information corresponds to one kind of information labeling area, and weighting each kind of information may be interpreted as weighting the information labeling area.
As an example, when the user wants to focus on the face part, the user may apply a higher weight to the labeled region of the face information via the user interface, and may apply a lower weight to the labeled region of the other information as appropriate. By setting the user interface for each frame of the first video, the user can weight the respective information streams in each frame in the subsequent cut-to-convert operation.
In step S205, the cropping window information is generated based on the at least one information whose weight is adjusted to crop the first video.
In one possible implementation, for each frame of the first video, an overall annotation region for the respective frame can be calculated according to the respective annotation region whose weight is adjusted, a focus for the respective frame can be calculated based on the overall annotation region, and then a cropping window for the respective frame can be generated based on the focus and a specified aspect ratio (e.g., an aspect ratio of the video to be converted). Here, the focus may reflect the distribution of important information in one frame. Further, the size of the cropping window may be set in advance, or may be adaptively adjusted.
In one possible implementation, the fitted focus of each frame may be obtained by fitting the focuses of the respective frames, and the cropping window of the respective frame is generated based on the fitted focus and the specified aspect ratio. By fitting the cropping region of the current scene to the focuses of a series of frames in that scene, the method achieves a smoother cropping effect between frames.
In one possible implementation, an annotation map for the respective frame may be generated based on the overall labeled region, and the focus of the respective frame obtained by computing a moment of the annotation map. For example, the focus of the corresponding frame can be obtained by calculating the geometric center point of the annotation map.
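A minimal sketch of this moment computation, assuming OpenCV-style image moments (the helper name and the frame-center fallback are illustrative assumptions, not the patented method):

```python
import cv2
import numpy as np

def focus_from_annotation_map(annotation_map):
    """Return the (x, y) focus of a frame as the centroid of its
    annotation map: first-order moments divided by the zeroth moment."""
    m = cv2.moments(annotation_map.astype(np.float32))
    if m["m00"] == 0:  # empty map: fall back to the frame center
        h, w = annotation_map.shape[:2]
        return w / 2.0, h / 2.0
    return m["m10"] / m["m00"], m["m01"] / m["m00"]
```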
In step S206, a second video of a second orientation is generated based on the cropped first video. Here, the second video of the second orientation may be a portrait video. A second video of a specified aspect ratio is generated by cropping each frame of the first video.
In this method, the provided user interface lets the user adjust the proportion of each information stream in the converted video as required, so that the cropping effect expected by the user is achieved. Calculating a focus for each frame highlights the distribution of key information in that frame, and fitting the trajectory of the per-frame focuses provides better cropping information, increases the coherence between frames, and improves the user experience.
Optionally, only the key frames of the first video may be analyzed, and a corresponding user interface generated for each key frame. The user may adjust the labeled regions of each key frame via its user interface; the focus of each key frame is then calculated from the adjusted labeled regions, and the focuses of the remaining frames of the first video are adaptively adjusted by fitting over the key frames, so as to obtain the cropping window information used to crop the first video.
Fig. 3 is a schematic diagram of cropping a single frame according to an embodiment of the present disclosure.
Referring to fig. 3, after image 301 is acquired, image 301 is analyzed to determine M types of information for image 301, M being a positive integer. For each information analysis, a corresponding analysis method may be used, that is, the image 301 may be analyzed using M analysis methods to determine M information. For example, the face information of the image 301 may be analyzed using a face analysis method.
By analyzing the M kinds of information, M corresponding labeled regions can be generated; that is, each information analysis generates one information distribution map for image 301. For example, when the face information is analyzed, a pixel-based labeled region of the face information of image 301 is generated, and this pixel-based region is then converted into a labeled region carrying an information distribution.
The user can give weights to the M labeled regions according to their own requirements, so as to highlight the parts they care about. For example, if the emphasis is on protecting the face region from being cropped out, the weight of the labeled region of the face information can be increased and the weights of the labeled regions of other information decreased.
The overall labeled region of image 301 is calculated from the M weighted labeled regions. For example, it can be obtained by summing the M weighted labeled regions.
After the overall labeled region is obtained, an annotation map for image 301 can be generated from it. Since each labeled region has previously been weighted, the annotation map reflects the importance of each labeled region.
The focus of image 301 is obtained by computing the moments of the annotation map; for example, it can be taken as the geometric center point of the annotation map. A cropping window is generated using the position of the focus and the specified aspect ratio, and finally image 301 is cropped using the cropping window.
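Putting the single-frame pipeline together, a hedged sketch might look as follows; the helper name (`focus_from_annotation_map` from the sketch above), the largest-fitting-window rule, and the clamping behavior are assumptions for illustration rather than the patented procedure:

```python
def crop_window_for_frame(masks, weights, target_aspect, frame_h, frame_w):
    """masks: per-information numpy masks; weights: user weights in [0, 1];
    target_aspect: width / height of the second (e.g. portrait) orientation.
    Returns (x0, y0, x1, y1) of the cropping window."""
    # Weighted sum of the M labeled regions -> overall annotation map.
    overall = sum(weights[name] * mask for name, mask in masks.items())

    # Focus = centroid of the annotation map (see focus_from_annotation_map).
    fx, fy = focus_from_annotation_map(overall)

    # Largest window with the target aspect ratio that fits in the frame.
    crop_h = frame_h
    crop_w = int(round(crop_h * target_aspect))
    if crop_w > frame_w:
        crop_w = frame_w
        crop_h = int(round(crop_w / target_aspect))

    # Center the window on the focus, then clamp it inside the frame.
    x0 = int(round(min(max(fx - crop_w / 2, 0), frame_w - crop_w)))
    y0 = int(round(min(max(fy - crop_h / 2, 0), frame_h - crop_h)))
    return x0, y0, x0 + crop_w, y0 + crop_h
```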
FIG. 4 is a schematic diagram of a labeling area according to an embodiment of the present disclosure.
Referring to fig. 4, (a) of fig. 4 shows a frame of the first video, and (b) of fig. 4 shows the labeled region of important information (e.g., motion information) in that frame; the white area in (b) is the labeled region. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
FIG. 5 is a schematic diagram of a user interface according to an embodiment of the present disclosure. After analyzing the various information of a frame, a user interface associated with the various information may be displayed accordingly.
Referring to fig. 5, in the user interface, one slider bar may be configured for each kind of information and used to adjust the weight of the corresponding information. For example, the range of the slider bar may be set to [0, 1]. After setting a weight for each type of information, the user clicks the "OK" button to complete the weight setting for the information streams in one frame. For example, upon clicking the "OK" button, the weight information entered by the user may be transmitted to a processor of the electronic device for the subsequent crop conversion, or the corresponding cropping window may be presented on the corresponding frame to show the user the cropping position of the window on that frame.
However, the user interface of FIG. 5 is merely exemplary, and elements in the user interface may be presented in other forms.
Alternatively, one text input box may be configured for each kind of information, and the user may give a weight to the corresponding information through the text input box. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
The user interface can be displayed on a partial area of a display of the electronic device or displayed full screen; a person skilled in the art can choose the display setting according to actual needs.
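Purely for illustration, a per-information slider panel of the kind shown in fig. 5 could be sketched as below. The patent does not prescribe a widget toolkit; the tkinter choice, the information names, and the 0.5 default weight are all assumptions:

```python
import tkinter as tk

def collect_weights(info_names, on_ok):
    """Show one [0, 1] slider per information type; pass the chosen
    weights to on_ok when the OK button is clicked."""
    root = tk.Tk()
    root.title("Adjust information weights")
    sliders = {}
    for name in info_names:
        tk.Label(root, text=name).pack(anchor="w")
        s = tk.Scale(root, from_=0.0, to=1.0, resolution=0.01,
                     orient="horizontal", length=240)
        s.set(0.5)  # neutral default weight (an assumed convention)
        s.pack(fill="x")
        sliders[name] = s

    def ok():
        on_ok({name: s.get() for name, s in sliders.items()})
        root.destroy()

    tk.Button(root, text="OK", command=ok).pack()
    root.mainloop()

# Example usage: collect_weights(["face", "motion", "boundary"], print)
```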
Fig. 6 is a schematic diagram of a single frame transitioning from one orientation to another according to an embodiment of the present disclosure. The input image 601 is in a horizontal or landscape orientation, as shown in fig. 6 (a). After image 601 is analyzed, the analyzed information is weighted and summed according to the user's requirements, the moment of the annotation map is calculated, and image 601 is cropped with a cropping window; through this series of calculations, image 602 can be generated. Image 602 retains as much of the important-information area as possible and is in a vertical or portrait orientation, as shown in fig. 6 (b). In this embodiment, the region of the person riding a horse in image 601 is retained in image 602 by giving that region a greater weight.
Although this embodiment shows the conversion of the landscape video into the portrait video, those skilled in the art can implement the conversion of the portrait video into the landscape video according to the above-described video conversion method.
Fig. 7 is a block diagram of a video conversion device according to an embodiment of the present disclosure. The video conversion device 700 may be implemented as a terminal 110 or as a media server system 120, or any other device.
Referring to fig. 7, the video conversion apparatus 700 may include a transceiver 701, a display 702, and a processor 703.
The transceiver 701 may receive a first video of a first orientation from the outside. Thereafter, the processor 703 may analyze each frame of the first video to determine at least one information of each frame, and generate a user interface for each frame for adjusting a weight of the at least one information at the time of the video orientation conversion based on a result of the analysis.
The processor 703 may control the display 702 to display a user interface. The user interface may include graphics, text, icons, video, and any combination thereof associated with the analyzed information. When the display 702 is a touch-screen display, the display 702 also has the ability to capture touch signals on or over its surface; the touch signal may be input to the processor 703 as a control signal for processing. In this case, the display 702 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 702, disposed on the front panel of the video conversion device 700; in other embodiments, there may be at least two displays 702, each disposed on a different surface of the video conversion device 700 or in a folded design; in still other embodiments, the display 702 may be a flexible display screen disposed on a curved or folded surface of the video conversion device 700. The display 702 may be an LCD (liquid crystal display), an OLED (organic light-emitting diode) display, or the like. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
Through the provided user interface, the user can adjust the proportion of each information stream in the converted video as required, so that user-defined important information is preserved during cropping and the cropping effect expected by the user is achieved.
The processor 703 may control the transceiver 701 to receive, through the user interface, a user input for adjusting the weight of the at least one type of information of each frame, generate cropping window information for each frame based on the at least one type of information whose weight is adjusted, crop each frame of the first video using that frame's cropping window information, and finally generate a second video of a second orientation based on the cropped first video. The transceiver 701 may then output the generated second video to other devices.
In one possible implementation, the processor 703 may generate, based on the analysis of the at least one information, respective labeled regions of the respective frames corresponding to the at least one information, the labeled regions being regions representing distribution of the information, wherein the respective labeled regions of the respective frames are given a weight input by the user.
In one possible implementation, for each frame of the first video, the processor 703 may calculate an overall labeled region of the corresponding frame according to the respective labeled region whose weight is adjusted, calculate a focus of the corresponding frame based on the overall labeled region, and generate a cropping window of the corresponding frame based on the focus and a specified aspect ratio.
In one possible implementation, the processor 703 may obtain a fitted focus for each frame by fitting the focuses of the respective frames, and then generate a cropping window for the respective frame based on the fitted focus and the specified aspect ratio.
Calculating the focus of each frame highlights the distribution of key information in that frame; fitting the trajectory of the per-frame focuses provides better cropping information, increases the coherence between frames, and improves the user experience.
In one possible implementation, the processor may generate an annotation map for the respective frame based on the entire annotation region and obtain the focus of the respective frame by computing a moment of the annotation map.
In some embodiments, the video conversion device 700 may include a memory that may store the raw input video and the converted video. Further, the memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory is used to store at least one instruction for execution by the processor 703.
In some embodiments, the video conversion apparatus 700 may further include: a peripheral interface and at least one peripheral. The processor 703 and the peripheral interface may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. In particular, the peripheral device may include at least one of a radio frequency circuit, a touch display screen, a camera, an audio circuit, a positioning component, a power supply, and the like.
In some embodiments, the video conversion device 700 may also include one or more sensors. The one or more sensors include, but are not limited to, acceleration sensors, gyroscope sensors, pressure sensors, fingerprint sensors, optical sensors, and proximity sensors. For example, the processor 703 may receive an indication of a change in orientation from one or more sensors and recommend a video of the corresponding orientation to the user.
Fig. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure.
Referring to fig. 8, in step S801, a first video of a first orientation is acquired. For example, the first video of the first orientation may be a landscape video.
At step S802, at least one type of information of each frame of the first video is analyzed. The at least one type of information analyzed may include at least one of face information, body information, main object information, motion scene information, video boundary information, and the like. The face information may include face recognition information, face tracking information, and the like, and the main object information may include object recognition information, object tracking information, and the like. However, the above examples are merely exemplary, and the present disclosure is not limited thereto. For example, face recognition algorithms may be used to analyze face information in a frame, and optical flow algorithms may be used to analyze motion scene information in a frame.
In step S803, each labeled region corresponding to at least one type of information of the corresponding frame is generated based on the analysis of the at least one type of information. Here, the label area may refer to an area indicating information distribution. For a frame, the frame may include various information, and each analysis of one information in the frame may generate an information distribution map corresponding to the frame, and accordingly, if the various information in the frame is analyzed, a plurality of labeled regions may be generated.
As an example, when the face information in a frame is analyzed, a pixel-based labeled region (mask) corresponding to the face information of the frame may be generated, and this pixel-based mask may then be converted into a labeled region carrying an information distribution.
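One plausible way to perform this mask-to-distribution conversion is a Gaussian spread of the binary mask; the blur, the sigma value, and the normalization below are illustrative assumptions, since the patent only states that the pixel mask is converted into an information distribution:

```python
import cv2
import numpy as np

def mask_to_distribution(mask, sigma=15):
    """Blur a binary labeled region into a smooth distribution so that
    pixels near detected information still contribute to the focus.
    sigma is an assumed spread; (0, 0) lets OpenCV derive the kernel size."""
    dist = cv2.GaussianBlur(mask.astype(np.float32), (0, 0), sigma)
    peak = dist.max()
    return dist / peak if peak > 0 else dist
```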
In step S804, a user interface for adjusting the weight that each labeled region occupies in video cropping is generated based on the analysis result and displayed. The user interface may include a slider bar or text entry box for adjusting the weight of each type of the at least one type of information.
After the information contained in a frame is analyzed, a user interface associated with the analyzed information may be generated. The user interface contains controls for adjusting the weight that the labeled region corresponding to each piece of information will occupy during the subsequent cropping.
In step S805, a user input for adjusting the weight of each labeled region is received through the user interface. Each labeled region of the corresponding frame may be given the weight input by the user. The user can set, through the user interface, the weights of the information to be preserved as required. For example, a user who wants to keep the face region from being cropped out may increase the weight of the labeled region of the face information and decrease the weights of the labeled regions of other information. The user can interactively adjust these weighting parameters; by weighting the various labeled regions, the information/regions of greatest interest to the user can be highlighted.
In step S806, for each frame of the first video, the entire annotation region of the corresponding frame is calculated according to the respective annotation region whose weight is adjusted. For example, the weighted regions can be summed to obtain the overall labeled region for a frame.
In step S807, an annotation map for the corresponding frame is generated based on the entire annotation region. Here, the annotation graph may be an information distribution image for each annotation region.
In step S808, the focal point of the corresponding frame is obtained by calculating the moment of the annotation graph. For example, the geometric center point of the annotation graph can be computed as the focus of a frame.
In step S809, a cropping window for the respective frame is generated based on the focus and the specified aspect ratio and the respective frame is cropped with the cropping window. For example, after the focus of one frame is obtained, the focus is taken as the center of the cropping window, and the layout and size of the cropping window are set in accordance with a specified aspect ratio.
Alternatively, after the focus of each frame of the first video is obtained, the focuses may be subjected to a fitting process, and the fitted focus used as the final focus of each frame, so that a smooth frame-to-frame cropping effect can be achieved.
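As a sketch of one possible fitting process (the patent does not fix a model; a low-order polynomial over the frame indices of a scene is an assumed choice):

```python
import numpy as np

def fit_focus_trajectory(focuses, degree=2):
    """Fit the per-frame (x, y) focus points of one scene and return a
    smoothed focus for every frame (assumed polynomial model)."""
    if not focuses:
        return []
    t = np.arange(len(focuses))
    xs = np.array([f[0] for f in focuses])
    ys = np.array([f[1] for f in focuses])
    deg = max(0, min(degree, len(focuses) - 1))  # avoid over-fitting short scenes
    fx = np.polyval(np.polyfit(t, xs, deg), t)
    fy = np.polyval(np.polyfit(t, ys, deg), t)
    return list(zip(fx, fy))
```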
In step S810, a second video of a second orientation is generated based on the clipped frames.
Fig. 9 is a block diagram of a video conversion device according to an embodiment of the present disclosure.
Referring to fig. 9, the video conversion apparatus 900 may include an interface module 901, an analysis module 902, a display module 903, and an editing module 904. Each module in the video conversion apparatus 900 may be implemented by one or more sub-modules, and the names of the corresponding modules may vary according to their types. In various embodiments, some modules of the video conversion apparatus 900 may be omitted, or additional modules may be included. Furthermore, modules/elements according to various embodiments of the present disclosure may be combined into a single entity that equivalently performs the functions of the respective modules/elements as before the combination.
The interface module 901 may be configured to receive a first video in a first orientation and a user input.
The analysis module 902 may be configured to analyze each frame of the first video to determine at least one information for each frame, and generate a user interface for each frame to adjust a weight of the at least one information at the time of the video orientation transition based on a result of the analysis.
In one possible implementation, the at least one information may include key zone information.
In one possible implementation, the key area information may include at least one of face information, human body information, important object information, motion scene information, and video boundary information.
The display module 903 may be configured to display a user interface, wherein a user input for adjusting the weight of at least one information is received via the user interface.
In one possible implementation, the user interface may include a user interface for adjusting the weights for each of the at least one information.
The editing module 904 may be configured to generate cropping window information to crop the first video based on the at least one information with the adjusted weight, and generate a second video in a second orientation based on the cropped first video.
In one possible implementation, the analysis module 902 may generate, based on the analysis of the at least one information, respective labeled regions of the respective frames corresponding to the at least one information, the labeled regions being regions representing a distribution of the information, wherein the respective labeled regions of the respective frames are given a weight input by the user.
In one possible implementation, for each frame of the first video, the editing module 904 may calculate an overall labeled region of the corresponding frame according to the labeled regions whose weights are adjusted, calculate a focus of the respective frame based on the overall labeled region, and generate a cropping window of the respective frame based on the focus and the specified aspect ratio.
In one possible implementation, the editing module 904 may obtain a fitted focus for each frame by fitting the focuses of the respective frames, and generate a cropping window of the respective frame based on the fitted focus and the specified aspect ratio.
In one possible implementation, the editing module 904 can generate a label graph for the corresponding frame based on the entire label region and obtain a focus of the corresponding frame by calculating a moment of the label graph.
In one possible implementation, the editing module 904 may calculate a focus of the corresponding frame based on the at least one information whose weight is adjusted, and generate a cropping window of the corresponding frame based on the focus.
In one possible implementation, the editing module 904 may generate each of the label regions with the adjusted weight of the corresponding frame based on at least one kind of information with the adjusted weight, the label regions being regions representing information distribution, and calculate the focus of the corresponding frame using each of the label regions with the adjusted weight.
For the video conversion apparatus of this embodiment, the implementation principle and technical effect of performing video conversion with the above modules are the same as those of the related method embodiments; refer to the related method embodiments for details, which are not repeated here.
According to an embodiment of the present disclosure, an electronic device may be provided. Fig. 10 is a block diagram of an electronic device according to an embodiment of the disclosure, the electronic device 1000 including at least one memory 1002 and at least one processor 1001, the at least one memory 1002 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 1001, perform a video conversion method according to an embodiment of the disclosure.
By way of example, the electronic device 1000 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device 1000 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or in combination. The electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 1000, the processor 1001 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processor 1001 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
The processor 1001 may execute instructions or code stored in a memory, where the memory may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 1002 may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device that may be used by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1000 may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a video conversion method according to the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or other optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the program. The computer program in the computer-readable storage medium described above can run in an environment deployed on a computer apparatus such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product whose instructions are executable by a processor of a computer device to perform the above-described video conversion method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video conversion method, characterized in that the video conversion method comprises:
acquiring a first video in a first orientation;
analyzing each frame of the first video to determine at least one type of information of each frame;
generating and displaying, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information during video orientation conversion;
receiving, through the user interface, a user input for adjusting the weight of the at least one type of information;
generating cropping window information to crop the first video based on the at least one type of information whose weight has been adjusted; and
generating a second video in a second orientation based on the cropped first video.
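Purely as a non-limiting reading aid for claim 1, the following sketch chains the steps into one frame-by-frame loop using OpenCV I/O. It reuses fuse_label_regions and focus_from_moments from the sketch above and the crop_window helper sketched after claim 6 below; analyze_frame is a hypothetical stand-in for the per-frame analysis, and all other names are illustrative.

```python
import cv2

def convert_video(src_path, dst_path, weights, out_size=(720, 1280)):
    """Landscape-to-portrait conversion: each output frame is a
    focus-centered crop of the input, resized to the target size."""
    out_w, out_h = out_size
    cap = cv2.VideoCapture(src_path)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             cap.get(cv2.CAP_PROP_FPS), (out_w, out_h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        masks = analyze_frame(frame)  # hypothetical per-information detectors
        fx, fy = focus_from_moments(fuse_label_regions(masks, weights))
        x0, y0, w, h = crop_window(frame.shape, fx, fy, out_w / out_h)
        writer.write(cv2.resize(frame[y0:y0 + h, x0:x0 + w], (out_w, out_h)))
    cap.release()
    writer.release()
```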
2. The video conversion method of claim 1, wherein the at least one type of information comprises key region information,
wherein the key region information comprises at least one of face information, human body information, important object information, motion information, and video boundary information.
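By way of example, face information could populate such a label region. The following sketch uses OpenCV's stock Haar cascade as an assumed stand-in for whatever face detector an implementation of the claim might employ; the function name and mask convention are invented for illustration.

```python
import cv2
import numpy as np

def face_label_region(frame):
    """Return an H x W binary mask marking detected face rectangles."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    mask = np.zeros(gray.shape, dtype=np.uint8)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        mask[y:y + h, x:x + w] = 1  # mark each detected face region
    return mask
```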
3. The video conversion method of claim 1, wherein the user interface comprises a user interface for adjusting the weight of each of the at least one type of information.
4. The video conversion method according to claim 1, wherein the step of generating the cropping window information based on the at least one type of information whose weight has been adjusted comprises:
calculating a focus of the corresponding frame based on the at least one type of information whose weight has been adjusted; and
generating a cropping window for the corresponding frame based on the focus,
wherein calculating the focus of the corresponding frame based on the at least one type of information whose weight has been adjusted comprises:
generating each weight-adjusted label region of the corresponding frame based on the at least one type of information whose weight has been adjusted, the label regions being regions representing the distribution of information; and
calculating the focus of the corresponding frame using the weight-adjusted label regions.
5. The video conversion method of claim 1, wherein the step of analyzing each frame of the first video comprises:
generating, based on the analysis of the at least one type of information, the label regions corresponding to the at least one type of information of the corresponding frame, the label regions being regions representing the distribution of information,
wherein the step of adjusting the weight of the at least one type of information comprises:
assigning the weight input by the user to each label region of the corresponding frame.
6. The video conversion method according to claim 4 or 5, wherein the step of generating the cropping window information based on the at least one type of information whose weight has been adjusted comprises:
for each frame of the first video, calculating an overall label region of the corresponding frame from the weight-adjusted label regions;
calculating the focus of the corresponding frame based on the overall label region; and
generating a cropping window for the corresponding frame based on the focus and a specified aspect ratio,
wherein generating the cropping window for the corresponding frame based on the focus and the specified aspect ratio comprises:
obtaining a fitted focus for the corresponding frame by fitting the foci of the respective frames; and
generating the cropping window for the corresponding frame based on the fitted focus and the specified aspect ratio,
wherein the step of calculating the focus of the corresponding frame based on the overall label region comprises:
generating a label map for the corresponding frame based on the overall label region; and
obtaining the focus of the corresponding frame by calculating the moments of the label map.
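To illustrate the fitting and window-generation steps of claim 6, here is a minimal sketch under assumptions of ours: a centered moving average stands in for the fitting (one of many possible fits), the aspect ratio is expressed as width/height, and all names are invented for illustration.

```python
import numpy as np

def fit_foci(foci, win=9):
    """Smooth per-frame (x, y) foci with a centered moving average so the
    cropping window does not jitter between frames (edge-padded)."""
    arr = np.asarray(foci, dtype=np.float64)
    pad = win // 2
    padded = np.pad(arr, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(win) / win
    fx = np.convolve(padded[:, 0], kernel, mode="valid")
    fy = np.convolve(padded[:, 1], kernel, mode="valid")
    return np.stack([fx, fy], axis=1)

def crop_window(frame_shape, fx, fy, aspect):
    """Largest window of the given width/height aspect ratio that fits in
    the frame, centered on the (fitted) focus and clamped to the bounds."""
    H, W = frame_shape[:2]
    w = min(W, int(H * aspect))
    h = min(H, int(w / aspect))
    x0 = int(np.clip(fx - w / 2, 0, W - w))
    y0 = int(np.clip(fy - h / 2, 0, H - h))
    return x0, y0, w, h
```

For a 1920x1080 landscape frame and a 9:16 target (aspect = 0.5625), this yields a 607x1079 window that slides horizontally with the smoothed focus.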
7. A video conversion apparatus, characterized in that the video conversion apparatus comprises:
an interface module configured to receive a first video in a first orientation and a user input;
an analysis module configured to analyze each frame of the first video to determine at least one type of information of each frame, and to generate, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information during video orientation conversion;
a display module configured to display the user interface, wherein a user input for adjusting the weight of the at least one type of information is received through the user interface; and
an editing module configured to generate cropping window information to crop the first video based on the at least one type of information whose weight has been adjusted, and to generate a second video in a second orientation based on the cropped first video.
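Read as software structure, one hypothetical decomposition of claim 7 follows; the class and method names are invented for illustration and do not come from the disclosure.

```python
class VideoConversionApparatus:
    """Sketch of the four claimed modules wired together."""

    def __init__(self, interface, analysis, display, editing):
        self.interface = interface  # receives the first video and user input
        self.analysis = analysis    # per-frame information and UI generation
        self.display = display      # shows the weight-adjustment UI
        self.editing = editing      # cropping-window generation and output

    def run(self):
        video = self.interface.receive_video()
        info, ui = self.analysis.analyze(video)
        self.display.show(ui)
        weights = self.interface.receive_weights()
        return self.editing.convert(video, info, weights)
```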
8. A video conversion apparatus, characterized in that the video conversion apparatus comprises:
a display;
a transceiver configured to receive a first video in a first orientation; and
a processor configured to:
analyze each frame of the first video to determine at least one type of information of each frame;
generate, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information during video orientation conversion;
control the display to display the user interface;
control the transceiver to receive, through the user interface, a user input for adjusting the weight of the at least one type of information;
generate cropping window information to crop the first video based on the at least one type of information whose weight has been adjusted; and
generate a second video in a second orientation based on the cropped first video.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 6.
10. A computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 6.
CN202011086676.9A 2020-10-12 2020-10-12 Video conversion method and device, video conversion equipment and storage medium Pending CN112218160A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011086676.9A CN112218160A (en) 2020-10-12 2020-10-12 Video conversion method and device, video conversion equipment and storage medium
PCT/CN2021/107704 WO2022077995A1 (en) 2020-10-12 2021-07-21 Video conversion method and video conversion device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011086676.9A CN112218160A (en) 2020-10-12 2020-10-12 Video conversion method and device, video conversion equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112218160A (en) 2021-01-12

Family

ID=74053573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011086676.9A Pending CN112218160A (en) 2020-10-12 2020-10-12 Video conversion method and device, video conversion equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112218160A (en)
WO (1) WO2022077995A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4926832B2 (en) * 2007-05-30 2012-05-09 シャープ株式会社 TERMINAL DEVICE WITH IMAGING FUNCTION, CONTROL METHOD FOR TERMINAL DEVICE, CONTROL PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM CONTAINING THE CONTROL PROGRAM
JP5701707B2 (en) * 2011-07-25 2015-04-15 株式会社ソニー・コンピュータエンタテインメント Moving image photographing apparatus, information processing system, information processing apparatus, and image data processing method
CN108921856B (en) * 2018-06-14 2022-02-08 北京微播视界科技有限公司 Image cropping method and device, electronic equipment and computer readable storage medium
CN108986117B (en) * 2018-07-18 2021-06-04 阿里巴巴(中国)有限公司 Video image segmentation method and device
CN112218160A (en) * 2020-10-12 2021-01-12 北京达佳互联信息技术有限公司 Video conversion method and device, video conversion equipment and storage medium
CN112165635A (en) * 2020-10-12 2021-01-01 北京达佳互联信息技术有限公司 Video conversion method, device, system and storage medium

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1253640A (en) * 1997-12-29 2000-05-17 皇家菲利浦电子有限公司 Graphical user interface for weighing input parameters
US20050093893A1 (en) * 2003-10-31 2005-05-05 Jay Senior Aspect ratio conversion of video content
CN1645370A (en) * 2004-01-23 2005-07-27 微软公司 Building and using subwebs for focused search
US20060188173A1 (en) * 2005-02-23 2006-08-24 Microsoft Corporation Systems and methods to adjust a source image aspect ratio to match a different target aspect ratio
CN104756056A (en) * 2012-10-26 2015-07-01 国际商业机器公司 Virtual meetings
CN104041063A (en) * 2012-12-24 2014-09-10 华为技术有限公司 Method, platform, and system for manufacturing associated information library of video and for playing video
CN106796603A (en) * 2014-10-01 2017-05-31 纽昂斯通讯公司 The natural language understanding NLU treatment of the interest specified based on user
CN106055912A (en) * 2016-06-15 2016-10-26 张家港赛提菲克医疗器械有限公司 Computer program for generating treatment couch regulation data according to on-line image
CN110089117A (en) * 2016-07-01 2019-08-02 斯纳普公司 Processing and formatting video are presented for interactive
CN109690471A (en) * 2016-11-17 2019-04-26 谷歌有限责任公司 Use the media hype of orientation metadata
CN109791600A (en) * 2016-12-05 2019-05-21 谷歌有限责任公司 It is the method for the mobile layout of vertical screen by transverse screen Video Quality Metric
CN111373740A (en) * 2017-12-05 2020-07-03 谷歌有限责任公司 Method for converting horizontal video into vertical movement layout by using selection interface
CN108038730A (en) * 2017-12-22 2018-05-15 联想(北京)有限公司 Product similarity determination methods, device and server cluster
US20190370546A1 (en) * 2018-06-01 2019-12-05 Apple Inc. Dynamic Image Analysis and Cropping
CN111010590A (en) * 2018-10-08 2020-04-14 传线网络科技(上海)有限公司 Video clipping method and device
CN110691259A (en) * 2019-11-08 2020-01-14 北京奇艺世纪科技有限公司 Video playing method, system, device, electronic equipment and storage medium
CN111107418A (en) * 2019-12-19 2020-05-05 北京奇艺世纪科技有限公司 Video data processing method, video data processing device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077995A1 (en) * 2020-10-12 2022-04-21 北京达佳互联信息技术有限公司 Video conversion method and video conversion device
CN113032626A (en) * 2021-03-23 2021-06-25 北京字节跳动网络技术有限公司 Search result processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022077995A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
US11481978B2 (en) Redundant tracking system
US11501499B2 (en) Virtual surface modification
US11630861B2 (en) Method and apparatus for video searching, terminal and storage medium
US20180054564A1 (en) Apparatus and method for providing user's emotional information in electronic device
WO2020151547A1 (en) Interaction control method for display page, and device
US20130314441A1 (en) Image-driven view management for annotations
CN112165635A (en) Video conversion method, device, system and storage medium
US20200258248A1 (en) Active image depth prediction
US20130170760A1 (en) Method and System for Video Composition
US20200065875A1 (en) Content Creation Suggestions using Failed Searches and Uploads
US20200007948A1 (en) Video subtitle display method and apparatus
AU2013273829A1 (en) Time constrained augmented reality
US9734599B2 (en) Cross-level image blending
WO2022077995A1 (en) Video conversion method and video conversion device
US20160150986A1 (en) Living body determination devices and methods
US9898451B2 (en) Content adaptation based on selected reviewer comment
CN113852757B (en) Video processing method, device, equipment and storage medium
JP2018508061A (en) Adaptive electronic documents
US8184945B2 (en) Authoring device and authoring method
CN112738629B (en) Video display method and device, electronic equipment and storage medium
JP2011244343A (en) Electronic apparatus and dynamic image generating method
US9633400B2 (en) Display apparatus and method of providing a user interface
CN115243097B (en) Recording method and device and electronic equipment
US20240104611A1 (en) Intelligent content recommendations based on selections of curated review responses
CN118524264A (en) Video processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210112