WO2022077995A1 - Video conversion method and video conversion device

Video conversion method and video conversion device

Info

Publication number
WO2022077995A1
WO2022077995A1 (PCT/CN2021/107704, CN2021107704W)
Authority
WO
WIPO (PCT)
Prior art keywords
information
video
frame
focus
corresponding frame
Prior art date
Application number
PCT/CN2021/107704
Other languages
English (en)
Chinese (zh)
Inventor
宋玉岩
徐宁
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022077995A1

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440263 - Processing of video elementary streams involving reformatting operations by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/443 - OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N 21/4438 - Window management, e.g. event handling following interaction with the user interface

Definitions

  • the present disclosure relates to the technical field of video processing, and in particular, to a video conversion method, a video conversion apparatus, a video conversion device, an electronic device, a computer-readable storage medium, and a computer program product.
  • a video whose aspect ratio (width to height) is greater than 1 is referred to as a landscape video.
  • videos or similar media recorded at large aspect ratios are typically designed to be viewed on a desktop or in landscape orientation. Therefore, when a user watches a landscape video on a mobile terminal, the terminal is generally rotated to a landscape position to play the video in order to obtain a good visual experience.
  • the present disclosure provides a video conversion method, a video conversion apparatus, a video conversion device, an electronic device, a computer-readable storage medium, and a computer program product, so as to improve the video conversion effect.
  • a video conversion method may include the steps of: acquiring a first video in a first orientation; analyzing each frame of the first video to determine at least one kind of information of each frame; generating and displaying, based on the analysis result, a user interface for adjusting the weight of the at least one kind of information of each frame during video orientation conversion; receiving, through the user interface, a user input for adjusting the weight of the at least one kind of information; generating clipping window information based on the at least one kind of information whose weight is adjusted, so as to cut the first video; and generating a second video in a second orientation based on the cut first video.
  • the at least one type of information may include key area information.
  • the key area information may include at least one of face information, human body information, prominent object information, motion scene information, and video boundary information.
  • the user interface may include a user interface for adjusting the weight for each of the at least one type of information.
  • the step of generating the clipping window information based on the at least one kind of information whose weight is adjusted may include: calculating the focus of the corresponding frame based on the at least one kind of information whose weight is adjusted; and generating a clipping window for the corresponding frame based on the focus and the specified aspect ratio.
  • the step of calculating the focus of the corresponding frame based on the at least one kind of information whose weight is adjusted may include: generating, based on the at least one kind of information whose weight is adjusted, the weight-adjusted marked areas of the corresponding frame, where a marked area is an area representing the distribution of information; and calculating the focus of the corresponding frame using the weight-adjusted marked areas.
  • the step of analyzing each frame of the first video may include: generating, based on the analysis of the at least one kind of information, the marked areas of the corresponding frame that correspond to the at least one kind of information, where a marked area is an area representing the distribution of information.
  • the step of calculating the focus of the corresponding frame using the weight-adjusted marked areas may include: for each frame of the first video, calculating the overall marked area of the corresponding frame according to the weight-adjusted marked areas; and calculating the focus of the corresponding frame based on the overall marked area.
  • the step of generating the clipping window of the corresponding frame based on the focus and the specified aspect ratio may include: obtaining the fitted focus of the corresponding frame by fitting the focuses of the frames; and generating the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.
  • the step of calculating the focus of the corresponding frame based on the overall marked area may include: generating an annotation map for the corresponding frame based on the overall marked area; and obtaining the focus of the corresponding frame by calculating a moment of the annotation map.
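  • as a non-limiting illustration of the moment computation described above, the focus can be obtained as the centroid (first moments divided by the zeroth moment) of the annotation map. The sketch below assumes the annotation map is a 2-D numpy array of non-negative weights; the function name `compute_focus` is illustrative and not part of the disclosure.

```python
import numpy as np

def compute_focus(annotation_map: np.ndarray):
    """Return the (x, y) focus of a frame as the centroid of its
    weighted annotation map, via image moments."""
    m00 = annotation_map.sum()            # zeroth moment: total weight
    if m00 == 0:
        h, w = annotation_map.shape
        return (w / 2, h / 2)             # no information: fall back to center
    ys, xs = np.indices(annotation_map.shape)
    m10 = (xs * annotation_map).sum()     # first moment along x
    m01 = (ys * annotation_map).sum()     # first moment along y
    return (float(m10 / m00), float(m01 / m00))

# A 4x4 map with all weight in the top-left 2x2 block:
amap = np.zeros((4, 4))
amap[0:2, 0:2] = 1.0
print(compute_focus(amap))  # -> (0.5, 0.5)
```

  • with uniform weights, the moment-based focus coincides with the geometric center of the marked region, as in the example above; with non-uniform weights it shifts toward the more heavily weighted areas.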
  • a video conversion apparatus may include the following modules: an interface module configured to receive a first video in a first orientation and user input; an analysis module configured to analyze each frame of the first video to determine at least one kind of information of each frame, and to generate, based on the analysis result, a user interface for adjusting the weight of the at least one kind of information of each frame during video orientation conversion; a display module configured to display the user interface, wherein a user input for adjusting the weight of the at least one kind of information is received via the user interface; and an editing module configured to generate clipping window information based on the at least one kind of information whose weight is adjusted so as to cut the first video, and to generate a second video in a second orientation based on the cut first video.
  • the at least one type of information may include key area information.
  • the key area information may include at least one of face information, human body information, prominent object information, motion scene information, and video boundary information.
  • the user interface may include a user interface for adjusting the weight for each of the at least one type of information.
  • the editing module is configured to calculate the focus of the corresponding frame based on the at least one kind of information whose weight is adjusted, and to generate the clipping window of the corresponding frame based on the focus and the specified aspect ratio.
  • the editing module is configured to generate, based on the at least one kind of information whose weight is adjusted, the weight-adjusted marked areas of the corresponding frame, where a marked area is an area representing the distribution of information, and to calculate the focus of the corresponding frame using the weight-adjusted marked areas.
  • the analysis module may generate, based on the analysis of the at least one kind of information, the marked areas of the corresponding frame that correspond to the at least one kind of information, where a marked area is an area representing the distribution of information, and the marked areas of the corresponding frame are given the weights input by the user.
  • the editing module, for each frame of the first video, is configured to calculate the overall marked area of the corresponding frame according to the weight-adjusted marked areas, and to calculate the focus of the corresponding frame based on the overall marked area.
  • the editing module is configured to obtain the fitted focus of the corresponding frame by fitting the focuses of the frames, and to generate the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.
  • the editing module is configured to generate an annotation map for the corresponding frame based on the overall marked area, and to obtain the focus of the corresponding frame by calculating a moment of the annotation map.
  • a video conversion device may include: a display; a transceiver for receiving a first video in a first orientation; and a processor for: analyzing each frame of the first video to determine at least one kind of information of each frame; generating, based on the analysis result, a user interface for adjusting the weight of the at least one kind of information of each frame during video orientation conversion; controlling the display to display the user interface; controlling the transceiver to receive, through the user interface, a user input for adjusting the weight of the at least one kind of information; generating clipping window information based on the at least one kind of information whose weight is adjusted so as to cut the first video; and generating a second video in a second orientation based on the cut first video.
  • the at least one type of information includes key area information.
  • the key area information includes at least one of face information, human body information, prominent object information, motion information, and video boundary information.
  • the user interface includes a user interface for adjusting the weight for each of the at least one type of information.
  • the processor may calculate the focus of the corresponding frame based on the at least one kind of information whose weight is adjusted, and generate a clipping window for the corresponding frame based on the focus and a specified aspect ratio.
  • the processor may generate, based on the at least one kind of information whose weight is adjusted, the weight-adjusted marked areas of the corresponding frame, where a marked area is an area representing the distribution of information, and may calculate the focus of the corresponding frame using the weight-adjusted marked areas.
  • the processor may generate, based on the analysis of the at least one kind of information, the marked areas of the corresponding frame that correspond to the at least one kind of information, where a marked area is an area representing the distribution of information.
  • the processor may calculate the overall marked area of the corresponding frame according to the weight-adjusted marked areas, and calculate the focus of the corresponding frame based on the overall marked area.
  • the processor may obtain the fitted focus of the corresponding frame by fitting the focuses of the frames, and generate the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.
  • the processor may generate an annotation map for the corresponding frame based on the overall marked area, and obtain the focus of the corresponding frame by calculating a moment of the annotation map.
  • an electronic device may include: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the video conversion method described above.
  • a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to execute the video conversion method as described above.
  • a computer program product wherein instructions in the computer program product are executed by at least one processor in an electronic device to execute the video conversion method as described above.
  • the user can adjust the proportion of each information stream in the converted video result according to his or her own needs, so that the important information defined by the user is retained during cutting, thereby achieving the cutting effect expected by the user.
  • by calculating the focus of each frame, the distribution of key information in each frame is made more prominent, and by fitting the trajectory of the focuses of the frames, better clipping information can be provided and the coherence between frames can be increased, improving the user experience.
  • FIG. 1 is a diagram of an application environment for converting video from one orientation to another, provided according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of a video conversion method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of cutting a single frame according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a marked area according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a user interface according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a single frame transitioning from one orientation to another according to an embodiment of the present disclosure.
  • FIG. 7 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of an electronic device according to an embodiment of the present disclosure.
  • video cutting in the related art is a fully automatic operation, and the user cannot adjust the importance of each information stream in the video scene; as a result, the cut video may not meet user expectations.
  • in the present disclosure, by setting a user interface, the user can adjust the weight of each information stream during video conversion according to his or her requirements, so as to achieve a better cutting effect.
  • FIG. 1 is a diagram of an application environment for converting video from one orientation to another, provided according to an embodiment of the present disclosure.
  • here, orientation refers to the landscape or portrait orientation of the device/apparatus.
  • the application environment 100 includes a terminal 110 and a media server system 120 .
  • the terminal 110 is a terminal where the user is located, and the terminal 110 may be at least one of a smart phone, a tablet computer, a portable computer, a desktop computer, and the like. Although this embodiment only shows one terminal 110 for description, those skilled in the art may know that the number of the above-mentioned terminals may be two or more. This embodiment of the present disclosure does not impose any limitation on the number of terminals and device types.
  • the terminal 110 may be installed with a target application for providing the video to be cut and converted to the media server system 120 , and the target application may be a multimedia application, a social application or an information application or the like.
  • the terminal 110 may be a terminal used by a user, and the user's account is logged in an application running in the terminal 110 .
  • the terminal 110 can be connected to the media server system 120 through a wireless network or a wired network, so that data interaction can be performed between the terminal 110 and the media server system 120 .
  • a network may include a local area network (LAN), a wide area network (WAN), a telephone network, a wireless link, an intranet, the Internet, combinations thereof, and the like.
  • the media server system 120 may be a server system for cut-converting video.
  • media server system 120 may include one or more processors and memory.
  • the memory may include one or more programs for performing the above video conversion method.
  • the media server system 120 may also include a power supply component configured to perform power management of the media server system 120; a wired or wireless network interface configured to connect the media server system 120 to a network; and an input/output (I/O) interface.
  • the media server system 120 may operate based on an operating system stored in memory, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and the like.
  • the devices included in the media server system 120 described above are only exemplary, and the present disclosure is not limited thereto.
  • the media server system 120 can use the video conversion method of the present disclosure to cut and convert the input video, and then deliver the converted video to the terminal 110 or to the media platform via a wireless network or a wired network.
  • the terminal 110 may be installed with an application program implementing the video conversion method of the present disclosure, and the terminal 110 may realize the cutting and conversion of the video.
  • the memory of the terminal 110 may store one or more programs for performing the above video conversion method.
  • the processor of the terminal 110 can cut and convert the video by running related programs/algorithms. Then, the terminal 110 may upload the cut and converted video to the media server system 120 via a wireless network or a wired network, or may store the converted video in the memory of the terminal 110 .
  • the terminal 110 may transmit a horizontal video obtained locally or externally to the media server system 120 via a wireless or wired network; the media server system 120 cuts and converts the horizontal video into a vertical video, and then delivers the converted vertical video to the terminal 110 via the wireless or wired network.
  • the terminal 110 may cut and convert the horizontal video obtained locally or externally into a vertical video, and then upload the vertical video to the media server system 120 via a wireless or wired network.
  • the media server system 120 may distribute the vertical video to other electronic devices.
  • a portrait video can also be cut and transformed into a landscape video.
  • FIG. 2 is a flowchart of a video conversion method according to an embodiment of the present disclosure.
  • the video conversion method of the embodiment of the present disclosure may be executed by the media server system 120 or an electronic device having the video cut conversion function of the present disclosure.
  • a first video of a first orientation is acquired.
  • the first video of the first orientation may refer to a landscape video.
  • the first video can be obtained locally or externally.
  • each frame of the first video is analyzed to determine at least one kind of information of each frame.
  • the at least one kind of information of each frame may include key area information, for example, at least one of face information, human body information, prominent object information, motion scene information, and video boundary information.
  • the face information may include face recognition information and face tracking information, etc.
  • the prominent object information may include object identification information and object tracking information.
  • the above examples are merely exemplary, and the present disclosure may analyze any amount and kind of information in a frame.
  • an analysis algorithm for main information, key information, or information of interest to the user may be stored in advance to implement the analysis of the information contained in the frame.
  • for example, a face recognition algorithm can be used to analyze the face information in a frame, and an optical flow algorithm can be used to analyze the motion scene information in a frame.
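  • as a non-limiting illustration of motion analysis, the sketch below uses simple frame differencing (not an optical flow algorithm, which would typically come from a library such as OpenCV) to produce a crude motion marked area; the function name `motion_mask` and the threshold are illustrative assumptions.

```python
import numpy as np

def motion_mask(prev_frame: np.ndarray, cur_frame: np.ndarray,
                threshold: float = 10.0) -> np.ndarray:
    """Crude motion marked area: pixels whose grayscale intensity changed
    by more than `threshold` between consecutive frames are marked 1.0."""
    diff = np.abs(cur_frame.astype(np.float32) - prev_frame.astype(np.float32))
    return (diff > threshold).astype(np.float32)

prev = np.zeros((4, 4), dtype=np.uint8)
cur = prev.copy()
cur[1:3, 1:3] = 200            # a bright patch appears between the frames
mask = motion_mask(prev, cur)
print(int(mask.sum()))         # -> 4 (the 2x2 patch is marked)
```

  • a real implementation would likely prefer dense optical flow, which also captures motion direction and magnitude rather than mere change.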
  • the above-described examples are merely exemplary, and the present disclosure is not limited thereto.
  • a user interface for adjusting the weight of at least one kind of information of each frame when the video orientation is converted is generated and displayed based on the analysis result.
  • the user interface may include a user interface for adjusting the weight of the information contained in the frame.
  • the user interface may include slider bars or text entry boxes for adjusting each type of information.
  • in step S204, a user input for adjusting the weight of at least one kind of information is received through the user interface.
  • on the user interface, the user can apply a higher weight to one kind of information (for example, face information) and lower weights to the other information according to his or her needs.
  • each marked area corresponding to the at least one kind of information in a frame may be generated based on an analysis of the at least one kind of information in the frame, and each marked area of the frame is then assigned a weight input by the user.
  • a marked area may refer to the area where the information is distributed. The marked area will be described below with reference to FIG. 4.
  • each kind of information corresponds to one marked area, and weighting a kind of information may be interpreted as weighting its marked area.
  • the user may apply a higher weight to the marked area of the face information via the user interface, and may apply appropriately lower weights to the marked areas of other information.
  • by setting a user interface for each frame of the first video, the user can weight each piece of information in each frame for the subsequent cutting and converting operations.
  • in step S205, clipping window information is generated based on the at least one kind of information whose weight is adjusted, so as to cut the first video.
  • the overall marked area of the corresponding frame may be calculated according to the weight-adjusted marked areas, the focus of the corresponding frame is calculated based on the overall marked area, and a clipping window for the corresponding frame is then generated based on the focus and a specified aspect ratio (e.g., the aspect ratio of the video to be converted).
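  • the window-generation step above can be sketched as follows (a non-limiting illustration: the function name `clipping_window` and the clamp-to-frame behavior are assumptions, since the disclosure does not specify how windows near the frame boundary are handled).

```python
def clipping_window(focus, frame_w, frame_h, target_aspect):
    """Largest window with the target aspect ratio (width / height) that
    fits in the frame, centered on the focus and clamped to the frame.
    Returns (x, y, w, h): top-left corner and window size."""
    # Fit the window inside the frame at the requested aspect ratio.
    if frame_w / frame_h > target_aspect:
        win_h, win_w = frame_h, int(round(frame_h * target_aspect))
    else:
        win_w, win_h = frame_w, int(round(frame_w / target_aspect))
    # Center on the focus, then clamp so the window stays inside the frame.
    x = min(max(int(round(focus[0] - win_w / 2)), 0), frame_w - win_w)
    y = min(max(int(round(focus[1] - win_h / 2)), 0), frame_h - win_h)
    return x, y, win_w, win_h

# 1920x1080 landscape frame converted to 9:16 portrait, focus near the left:
print(clipping_window((300, 540), 1920, 1080, 9 / 16))  # -> (0, 0, 608, 1080)
```

  • clamping rather than shrinking keeps the output aspect ratio exact even when the focus lies close to a frame edge.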
  • the focus can reflect the distribution of important information in a frame.
  • the size of the clipping window may be preset, or the size of the clipping window may be adaptively adjusted.
  • the fitted focus of the corresponding frame may be obtained by fitting the focuses of the frames, and a clipping window of the corresponding frame may be generated based on the fitted focus and the specified aspect ratio.
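  • one plausible way to realize the trajectory fitting above is a low-degree polynomial fit over the per-frame focuses, which smooths out jitter between frames; this is a non-limiting sketch (the disclosure does not specify the fitting method, and the function name is illustrative).

```python
import numpy as np

def fit_focus_trajectory(focuses, degree=2):
    """Smooth per-frame focus points by fitting a low-degree polynomial
    to the x and y coordinates over time, reducing frame-to-frame jitter."""
    t = np.arange(len(focuses))
    xs = np.array([f[0] for f in focuses], dtype=float)
    ys = np.array([f[1] for f in focuses], dtype=float)
    fit_x = np.polyval(np.polyfit(t, xs, degree), t)
    fit_y = np.polyval(np.polyfit(t, ys, degree), t)
    return list(zip(fit_x, fit_y))

# A noisy outlier in an otherwise steady pan is pulled back toward the trend:
raw = [(100, 50), (110, 50), (200, 50), (130, 50), (140, 50)]
fitted = fit_focus_trajectory(raw)
```

  • the fitted focuses then feed the same window-generation step as the raw focuses, so the clipping window drifts smoothly instead of jumping between frames.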
  • an annotation map may be generated for the corresponding frame based on the overall marked area, and the focus of the corresponding frame may be obtained by calculating a moment of the annotation map.
  • the focus of the corresponding frame can be obtained by calculating the geometric center point of the annotation map.
  • a second video with a second orientation is generated based on the cut first video.
  • the second video in the second orientation may be a portrait video.
  • a second video of the specified aspect ratio is generated by cropping each frame of the first video.
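  • the per-frame cropping that produces the second video can be sketched as follows (a non-limiting illustration assuming `frames` is a list of H x W arrays and `windows` the per-frame clipping windows; names are illustrative).

```python
import numpy as np

def convert_video(frames, windows):
    """Produce the second video by clipping each frame of the first video
    with its (x, y, w, h) clipping window."""
    return [frame[y:y + h, x:x + w]
            for frame, (x, y, w, h) in zip(frames, windows)]

frames = [np.zeros((1080, 1920), dtype=np.uint8) for _ in range(3)]
windows = [(656, 0, 608, 1080)] * 3     # centered 9:16 portrait windows
portrait = convert_video(frames, windows)
print(portrait[0].shape)  # -> (1080, 608)
```

  • in practice each frame may get its own window (derived from the fitted focuses), so the list of windows generally varies from frame to frame.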
  • the user can adjust the proportion of each kind of information in the converted video according to his or her own needs, so as to achieve the desired cutting effect. By calculating the focus of each frame, the distribution of key information in each frame is made more prominent, and by fitting the trajectory of the focuses of the frames, better clipping information can be provided, increasing the coherence between frames and improving the user experience.
  • the key frames included in the first video may be analyzed, and a corresponding user interface may be generated for each key frame. The user adjusts the marked areas of each key frame through the user interface; the focus of each key frame is then calculated according to the adjusted marked areas, and the focuses of the related frames of the first video are adaptively adjusted by fitting the focuses of the key frames, so as to obtain clipping window information for cutting the first video.
  • FIG. 3 is a schematic diagram of cropping a single frame according to an embodiment of the present disclosure.
  • the image 301 is analyzed to determine M kinds of information of the image 301, where M is a positive integer.
  • the analysis of each type of information can be implemented by using a corresponding analysis method, that is, M types of analysis methods can be used to analyze the image 301 to determine M types of information.
  • the face information of the image 301 can be analyzed using a face analysis method.
  • M corresponding marked areas can be generated by analyzing the M kinds of information, that is, an information distribution map corresponding to the image 301 is generated for each kind of information analyzed. For example, when the face information is analyzed, a pixel-based marked area of the face information of the image 301 is generated, and the pixel-based marked area is then converted into a marked area representing the distribution of information.
  • the overall marked area of the image 301 is calculated according to the weighted M marked areas.
  • the overall marked area of the image 301 can be obtained by summing the weighted M marked areas; that is, an overall marked area is generated according to the M marked areas given the weights specified by the user.
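  • the weighted summation above can be sketched as follows (a non-limiting illustration; the arrays and the function name `overall_marked_area` are hypothetical, assuming each marked area is a numpy array of the frame's size).

```python
import numpy as np

def overall_marked_area(marked_areas, weights):
    """Combine the M per-information marked areas of one frame into a
    single overall marked area by a user-weighted sum."""
    assert len(marked_areas) == len(weights)
    total = np.zeros_like(marked_areas[0], dtype=float)
    for area, w in zip(marked_areas, weights):
        total += w * area.astype(float)
    return total

face = np.array([[1.0, 0.0], [0.0, 0.0]])    # hypothetical face marked area
motion = np.array([[0.0, 0.0], [0.0, 1.0]])  # hypothetical motion marked area
combined = overall_marked_area([face, motion], weights=[0.9, 0.2])
print(combined.tolist())  # -> [[0.9, 0.0], [0.0, 0.2]]
```

  • because the sum is linear, setting a weight to 0 removes that information stream from the overall marked area entirely, matching the slider range [0, 1] described below.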
  • the annotation map of the image 301 may be generated based on the overall marked area. Since each marked area has already been weighted, the annotation map can show the importance of each marked area. As one implementation, after the overall marked area is obtained, each of the M marked areas it contains can be identified according to the size of its weight, so that the generated annotation map of the image 301 displays the importance of each marked area, where a greater weight indicates higher importance.
  • the focus of the image 301 is obtained by computing the moments of the annotation map.
  • the focus of the image 301 can be obtained by calculating the geometric center point of the annotation map.
  • a clipping window is generated using the position of the focus and the specified aspect ratio, and finally the image 301 is clipped using the clipping window.
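  • stringing the FIG. 3 steps together, an end-to-end single-frame sketch might look as follows (non-limiting; the function name, array shapes, and boundary clamping are assumptions for illustration only).

```python
import numpy as np

def crop_single_frame(image, marked_areas, weights, target_aspect):
    """Single-frame pipeline: weight the marked areas, build the annotation
    map, take its centroid as the focus, and clip at the target aspect
    ratio (width / height), clamped to the frame boundaries."""
    h, w = image.shape[:2]
    # Weighted sum of the M marked areas gives the overall annotation map.
    amap = sum(wt * area for wt, area in zip(weights, marked_areas))
    # Focus = centroid (first moments over zeroth moment) of the map.
    m00 = amap.sum()
    ys, xs = np.indices(amap.shape)
    fx = (xs * amap).sum() / m00 if m00 else w / 2
    fy = (ys * amap).sum() / m00 if m00 else h / 2
    # Clipping window at the target aspect ratio, clamped to the frame.
    if w / h > target_aspect:
        win_h, win_w = h, int(h * target_aspect)
    else:
        win_w, win_h = w, int(w / target_aspect)
    x = min(max(int(fx - win_w / 2), 0), w - win_w)
    y = min(max(int(fy - win_h / 2), 0), h - win_h)
    return image[y:y + win_h, x:x + win_w]

# A 100x200 frame whose only marked information sits on the right side:
image = np.arange(100 * 200).reshape(100, 200)
area = np.zeros((100, 200))
area[:, 150:] = 1.0
crop = crop_single_frame(image, [area], [1.0], target_aspect=0.5)
print(crop.shape)  # -> (100, 50)
```

  • note how the crop follows the marked information to the right side of the frame rather than simply taking the center.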
  • FIG. 4 is a schematic diagram of a marked area according to an embodiment of the present disclosure.
  • (a) of FIG. 4 shows a certain frame of the first video, and (b) of FIG. 4 shows the marked area of important information (such as motion information) in the frame (the white area in (b) of FIG. 4).
  • FIG. 5 is a schematic diagram of a user interface according to an embodiment of the present disclosure. After analyzing various information of a frame, a user interface associated with the various information may be displayed accordingly.
  • a slider bar may be configured for each type of information, and the slider bar may be used to adjust the weight of the corresponding information.
  • the range of the slider can be set to [0, 1].
  • the user may click the "OK" button to complete the setting of the weight of each information stream in a frame.
  • the weight information input by the user may be transmitted to the processor of the electronic device for subsequent cut conversion.
  • the corresponding clipping window may be presented on the corresponding frame, so as to show the user the position of the clipping window on the frame.
  • the user interface of FIG. 5 is merely exemplary, and elements in the user interface may be presented in other forms.
  • a text input box may be configured for each type of information, and a user may assign weights to corresponding information through the text input box.
  • the user interface can be displayed on a partial area of the display of the electronic device, or displayed on the display in a full screen, and those skilled in the art can make display settings according to actual needs.
  • FIG. 6 is a schematic diagram of a single frame transitioning from one orientation to another according to an embodiment of the present disclosure.
  • the input image 601 is shown in a horizontal or landscape orientation, as shown in (a) of FIG. 6 .
  • an image 602 can be generated.
  • Image 602 preserves areas of important information as much as possible.
  • Image 602 is shown in a vertical or portrait orientation, as shown in (b) of FIG. 6 . In this embodiment, by assigning greater weight to the area in image 601 where the person is riding, this area is retained in image 602 .
  • this embodiment shows converting a horizontal video into a vertical video
  • those skilled in the art can convert a vertical video into a horizontal video according to the above-mentioned video conversion method.
  • the video conversion device 700 may be implemented as a terminal 110 or as a media server system 120, or any other device.
  • a video conversion apparatus 700 may include a transceiver 701 , a display 702 and a processor 703 .
  • the transceiver 701 may externally receive the first video in the first orientation. Afterwards, the processor 703 may analyze each frame of the first video to determine at least one kind of information of each frame, and generate, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one kind of information during video orientation conversion.
  • the processor 703 may control the display 702 to display a user interface.
  • the user interface may include graphics, text, icons, video, and any combination thereof associated with the analysis information.
  • when the display 702 is a touch display screen, the display 702 also has the ability to acquire touch signals on or over the surface of the display 702.
  • the touch signal may be input to the processor 703 as a control signal for processing.
  • the display 702 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • the number of displays 702 may be one, arranged on the front panel of the video conversion device 700; in other embodiments, the number of displays 702 may be at least two, respectively arranged on different surfaces of the video conversion device 700 or in a folded design; in still other embodiments, the display 702 may be a flexible display screen disposed on a curved or folded surface of the video conversion device 700.
  • the display 702 may be prepared using devices such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
  • the user can adjust the proportion of each information in the converted video result according to his own needs, so that the important information defined by the user is retained in the cutting process, so as to achieve the cutting effect expected by the user.
  • the processor 703 may control the transceiver 701 to receive, through the user interface, a user input for adjusting the weight of at least one kind of information of each frame, generate cutting-window information for each frame based on the at least one kind of information whose weight is adjusted, cut each frame of the first video using the cutting-window information of that frame, and finally generate a second video with a second orientation based on the cut first video.
  • the transceiver 701 can output the generated second video to other devices.
  • the processor 703 may generate, based on the analysis of the at least one kind of information, each marked area of the corresponding frame corresponding to the at least one kind of information, where a marked area is an area representing the distribution of information, and each marked area of the corresponding frame is given a weight entered by the user.
  • the processor 703 may calculate the overall labeling area of the corresponding frame according to each labeling area whose weight is adjusted, calculate the focus of the corresponding frame based on the overall labeling area, and generate a clipping window for the corresponding frame based on the focus and the specified aspect ratio.
  • the processor 703 may obtain the fitted focus of the corresponding frame by fitting the focuses of the frames, and then generate the clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.
  • in this way, the distribution of key information in each frame is more prominent, and fitting the trajectory of the focuses of the frames provides better clipping information, increases the continuity between frames, and improves user experience.
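The description does not pin the fitting step to a particular algorithm; as one simple, hypothetical choice, a centered moving average over the per-frame focuses already smooths the crop trajectory:

```python
def smooth_focus(points, window=5):
    """Fit a per-frame focus trajectory with a centered moving average.

    points is a list of (x, y) focus coordinates, one per frame.  A small
    window keeps the crop responsive; a larger one makes it steadier.
    """
    half = window // 2
    fitted = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        fitted.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return fitted

# A jittery horizontal pan becomes a steadier trajectory:
raw = [(100, 50), (130, 50), (90, 50), (140, 50), (110, 50)]
fitted = smooth_focus(raw, window=3)
```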
  • the processor may generate an annotation map for the corresponding frame based on the overall annotation area, and obtain the focus of the corresponding frame by calculating a moment of the annotation map.
  • the video conversion apparatus 700 may include a memory that may store the original input video and the converted video. Additionally, the memory may include one or more computer-readable storage media, which may be non-transitory. Memory may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory is used to store at least one instruction for execution by processor 703 .
  • the video conversion device 700 further includes: a peripheral device interface and at least one peripheral device.
  • the processor 703 and the peripheral device interface can be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface through bus, signal line or circuit board.
  • the peripheral device may include at least one of a radio frequency circuit, a touch display screen, a camera, an audio circuit, a positioning component, a power supply, and the like.
  • video conversion device 700 may also include one or more sensors.
  • the one or more sensors include, but are not limited to, acceleration sensors, gyroscope sensors, pressure sensors, fingerprint sensors, optical sensors, and proximity sensors.
  • the processor 703 may receive an indication of an orientation change from one or more sensors, thereby recommending a video of the corresponding orientation to the user.
  • FIG. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure.
  • a first video of a first orientation is acquired.
  • the first video in the first orientation may be a landscape video.
  • step S802 at least one kind of information of each frame of the first video is analyzed.
  • the at least one type of information analyzed may include at least one of face information, human body information, main object information, motion scene information, video boundary information, and the like.
  • the face information may include face recognition information and face tracking information, etc.
  • the main object information may include object identification information and object tracking information.
  • a face recognition algorithm can be used to analyze the face information in a frame
  • an optical flow algorithm can be used to analyze the motion scene information in a frame.
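A production analyzer would run a true optical flow algorithm (e.g. Farneback); as a hedged stand-in, plain frame differencing already marks where pixels changed between frames, which is enough to sketch the idea of a motion annotation area:

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Rough motion-area mask via absolute frame differencing.

    Frames are 2-D grayscale arrays with values in [0, 255].  This is a
    cheap illustrative substitute for optical flow, not the disclosed
    algorithm itself.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)  # 1 where motion, 0 elsewhere

prev = np.zeros((4, 4), dtype=np.uint8)
cur = prev.copy()
cur[1:3, 1:3] = 200  # a small patch changes between the two frames
mask = motion_mask(prev, cur)
```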
  • each annotated area corresponding to the at least one type of information of the corresponding frame is generated based on the analysis of the at least one type of information.
  • the labeled area may refer to an area representing the distribution of information.
  • since the frame may include a variety of information, each time one kind of information in the frame is analyzed, an information distribution map corresponding to the frame can be generated.
  • if multiple kinds of information in a frame are analyzed, multiple labeled areas can be generated.
  • a pixel-based annotation area corresponding to the face information of this frame can be generated, and then the pixel-based annotation area can be converted into an annotation area representing the distribution of information.
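For instance, detector output often arrives as bounding boxes; a minimal, hypothetical rasterization into a labeled area (the (x, y, w, h) box format is assumed here, not taken from the disclosure) could look like:

```python
import numpy as np

def boxes_to_annotation_area(shape, boxes):
    """Rasterize detections (bounding boxes) into a labeled-area mask.

    shape is (height, width) of the frame; boxes is a list of
    (x, y, w, h) rectangles, e.g. from a face detector.  The box format
    is an illustrative assumption, not the patent's exact scheme.
    """
    mask = np.zeros(shape, dtype=np.float32)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = 1.0  # mark the detected region
    return mask

area = boxes_to_annotation_area((10, 10), [(2, 3, 4, 2)])  # one 4x2 face box
```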
  • a user interface for adjusting the weight occupied by each marked region in the video cutting is generated and displayed.
  • the user interface may include a slider bar or text input box for adjusting the weight for each of the at least one information.
  • a user interface associated with the analyzed information can be generated.
  • controls are arranged in the user interface for adjusting the weight occupied by the labeled area corresponding to each kind of information in the subsequent cutting.
  • a user input for adjusting the weight of each marked region is received through the user interface.
  • Weights input by the user may be assigned to each annotated region of the corresponding frame.
  • Users can set the weight of the information they want to keep through the user interface according to their needs. For example, if the user wants to focus on protecting the face part from being cut off, the user can increase the weight of the marked area of the face information, and reduce the weight of the marked area of other information.
  • the user can interactively adjust the weighting parameters. By weighting each labeled area, the information/area that the user pays more attention to can be highlighted.
  • step S806 for each frame of the first video, the overall labeling area of the corresponding frame is calculated according to each labeling area whose weight is adjusted. For example, the weighted regions can be summed to obtain the overall annotated region of a frame.
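A minimal sketch of that weighted summation (the information-type names are illustrative, and each mask is assumed to share the frame's shape):

```python
import numpy as np

def overall_annotation_area(areas, weights):
    """Combine per-information labeled areas into the overall labeled area.

    areas maps an information type to its mask for one frame; weights
    holds the user-adjusted weight for each type.  The overall area is
    simply the weighted sum of the individual areas.
    """
    total = None
    for name, mask in areas.items():
        contribution = weights.get(name, 0.0) * mask
        total = contribution if total is None else total + contribution
    return total

face = np.array([[1.0, 0.0], [0.0, 0.0]])
motion = np.array([[0.0, 0.0], [0.0, 1.0]])
overall = overall_annotation_area({"face": face, "motion": motion},
                                  {"face": 0.8, "motion": 0.2})
```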
  • an annotation map for the corresponding frame is generated based on the overall annotation area.
  • the annotation map may be an information distribution image for each annotation area.
  • step S808 the focus of the corresponding frame is obtained by calculating the moment of the annotation map.
  • the geometric center point of the annotation map can be calculated as the focal point of a frame.
  • a clipping window of the corresponding frame is generated based on the focus and the specified aspect ratio and the corresponding frame is clipped using the clipping window. For example, after obtaining the focus of a frame, the focus is set as the center of the clipping window, and the layout and size of the clipping window are set according to the specified aspect ratio.
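Putting step S808 and this step together as a hedged sketch: the focus is the centroid given by the first image moments (M10/M00, M01/M00), and the window is centered on it at the specified aspect ratio. The full-height window and the clamping rule below are illustrative assumptions, since the description does not fix the exact layout:

```python
import numpy as np

def focus_from_moments(annotation_map):
    """Focus (centroid) of the annotation map from its first image moments."""
    m00 = annotation_map.sum()
    ys, xs = np.mgrid[0:annotation_map.shape[0], 0:annotation_map.shape[1]]
    return ((xs * annotation_map).sum() / m00,   # x = M10 / M00
            (ys * annotation_map).sum() / m00)   # y = M01 / M00

def clipping_window(focus, frame_w, frame_h, aspect):
    """Center a crop window of the given aspect ratio on the focus.

    Assumes a full-height window clamped inside the frame -- one
    plausible layout rule, not necessarily the disclosed one.
    """
    win_h = frame_h
    win_w = int(round(win_h * aspect))       # e.g. 9/16 for portrait output
    x = int(round(focus[0] - win_w / 2))
    x = max(0, min(x, frame_w - win_w))      # keep the window inside the frame
    return (x, 0, win_w, win_h)

amap = np.zeros((90, 160))
amap[30:60, 100:140] = 1.0  # the important region sits right of center
fx, fy = focus_from_moments(amap)
win = clipping_window((fx, fy), 160, 90, 9 / 16)
```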
  • after the focus of each frame is obtained, fitting processing may be performed on these focuses, and the fitted focus may be used as the final focus of each frame, so that a smooth clipping effect between frames can be achieved.
  • step S810 a second video in a second orientation is generated based on the cropped frames.
  • FIG. 9 is a block diagram of a video conversion apparatus according to an embodiment of the present disclosure.
  • the video conversion apparatus 900 may include an interface module 901 , an analysis module 902 , a display module 903 and an editing module 904 .
  • Each module in the video conversion apparatus 900 may be implemented by one or more modules, and the name of the corresponding module may vary according to the type of the module. In various embodiments, some modules in the video conversion apparatus 900 may be omitted, or additional modules may also be included.
  • modules/elements according to various embodiments of the present disclosure may be combined to form a single entity, and thus may equivalently perform the functions of the corresponding modules/elements prior to combination.
  • the interface module 901 may be configured to receive the first video in the first orientation and user input.
  • the analysis module 902 may be configured to analyze each frame of the first video to determine at least one kind of information for each frame, and to generate, for each frame based on the analysis results, a user interface for adjusting the weights of the at least one kind of information during the video orientation transition.
  • the at least one type of information may include critical area information.
  • the key area information may include at least one of face information, human body information, prominent object information, motion scene information, and video boundary information.
  • the display module 903 may be configured to display a user interface, wherein user input for adjusting the weight of at least one type of information is received via the user interface.
  • the user interface may include controls for adjusting the weight of each of the at least one kind of information.
  • the editing module 904 may be configured to generate cut window information to cut the first video based on at least one type of information whose weights are adjusted, and to generate a second video in a second orientation based on the cut first video.
  • the analysis module 902 is configured to generate, based on the analysis of the at least one kind of information, respective labeled regions of the corresponding frame corresponding to the at least one kind of information, the labeled regions being regions representing the distribution of the information.
  • the editing module 904 may calculate the overall labeling area of the corresponding frame according to each labeling area whose weights are adjusted; and calculate the focus of the corresponding frame based on the overall labeling area.
  • the editing module 904 may obtain the fitted focus of the corresponding frame by fitting the focuses of the frames, and generate a clipping window of the corresponding frame based on the fitted focus and the specified aspect ratio.
  • the editing module 904 may generate an annotation map for the corresponding frame based on the overall annotation area, and obtain the focus of the corresponding frame by calculating moments of the annotation map.
  • the editing module 904 may calculate the focus of the corresponding frame based on the at least one information whose weights are adjusted, and generate a clipping window of the corresponding frame based on the focus and the specified aspect ratio.
  • the editing module 904 may generate, based on the at least one kind of information whose weights are adjusted, each weight-adjusted annotated area of the corresponding frame, where an annotated area is an area representing the distribution of information, and calculate the focus of the corresponding frame using each weight-adjusted annotated area.
  • this embodiment also provides an electronic device, which may include: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to execute the video conversion method described in the above embodiments.
  • this embodiment further provides a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to execute the video conversion method described in the above embodiments.
  • this embodiment also provides a computer program product, wherein instructions in the computer program product are executed by at least one processor in the electronic device to execute the video conversion method described in the above embodiments.
  • FIG. 10 is a block diagram of an electronic device 1000 according to an embodiment of the present disclosure.
  • the electronic device 1000 includes at least one memory 1002 and at least one processor 1001, and the at least one memory 1002 stores a set of computer-executable instructions.
  • when the instruction set is executed by the at least one processor 1001, the video conversion method according to the embodiments of the present disclosure is executed.
  • the electronic device 1000 may be a personal computer (PC), a tablet device, a personal digital assistant, a smart phone, or other device capable of executing the above set of instructions.
  • the electronic device 1000 is not necessarily a single electronic device, but can also be a collection of any device or circuit capable of executing the above-mentioned instructions (or instruction sets) individually or jointly.
  • Electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (eg, via wireless transmission).
  • the processor 1001 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller or a microprocessor.
  • processor 1001 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
  • the processor 1001 may execute instructions or code stored in memory, which may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.
  • the memory 1002 may be integrated with the processor, eg, RAM or flash memory arranged within an integrated circuit microprocessor or the like.
  • the memory may comprise a separate device such as an external disk drive, a storage array, or any other storage device that may be used by a database system.
  • the memory and the processor may be operatively coupled, or may communicate with each other, eg, through I/O ports, network connections, etc., to enable the processor to read files stored in the memory.
  • the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1000 may be connected to each other via a bus and/or a network.
  • a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the video conversion method according to the present disclosure.
  • Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid state drive (SSD), and card memory (such as a multimedia card, a Secure Digital (SD) card, or an Extreme Digital (XD) card).
  • the computer program in the above-mentioned computer-readable storage medium can run in an environment deployed in a computer device such as a client, a host, an agent device, a server, etc.
  • the computer program and any associated data, data files and data structures may be distributed over networked computer systems, so that they are stored, accessed and executed in a distributed fashion by one or more processors or computers.
  • a computer program product can also be provided, and the instructions in the computer program product can be executed by the processor of the computer device to complete the above-mentioned video conversion method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

A video conversion method, a video conversion apparatus, a video conversion device, an electronic device, a computer-readable storage medium, and a computer program product are disclosed. The video conversion method may include the following steps: acquiring a first video in a first orientation (S201); analyzing each frame of the first video so as to determine at least one kind of information of each frame (S202); generating and displaying, on the basis of an analysis result, a user interface for each frame that is used to adjust the weight of the at least one kind of information during video orientation conversion (S203); receiving, by means of the user interface, a user input for adjusting the weight of the at least one kind of information (S204); generating cutting-window information on the basis of the at least one kind of information whose weight has been adjusted, so as to cut the first video (S205); and generating a second video in a second orientation on the basis of the cut first video (S206).
PCT/CN2021/107704 2020-10-12 2021-07-21 Video conversion method and video conversion device WO2022077995A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011086676.9 2020-10-12
CN202011086676.9A CN112218160A (zh) Video conversion method and apparatus, video conversion device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022077995A1 true WO2022077995A1 (fr) 2022-04-21

Family

ID=74053573

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107704 WO2022077995A1 (fr) 2020-10-12 2021-07-21 Video conversion method and video conversion device

Country Status (2)

Country Link
CN (1) CN112218160A (fr)
WO (1) WO2022077995A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112218160A (zh) * 2020-10-12 2021-01-12 Beijing Dajia Internet Information Technology Co., Ltd. Video conversion method and apparatus, video conversion device, and storage medium
CN113032626A (zh) * 2021-03-23 2021-06-25 Beijing ByteDance Network Technology Co., Ltd. Search result processing method and apparatus, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008301123A (ja) * 2007-05-30 2008-12-11 Sharp Corp Terminal device with imaging function, control method for the terminal device, control program, and computer-readable recording medium recording the control program
US20140152773A1 (en) * 2011-07-25 2014-06-05 Akio Ohba Moving image capturing device, information processing system, information processing device, and image data processing method
CN108921856A (zh) * 2018-06-14 2018-11-30 Beijing Microlive Vision Technology Co., Ltd. Image cropping method and apparatus, electronic device, and computer-readable storage medium
CN108986117A (zh) * 2018-07-18 2018-12-11 Beijing Youku Technology Co., Ltd. Video image segmentation method and apparatus
CN111373740A (zh) * 2017-12-05 2020-07-03 Google LLC Method for converting landscape video into portrait mobile layout using a selection interface
CN112165635A (zh) * 2020-10-12 2021-01-01 Beijing Dajia Internet Information Technology Co., Ltd. Video conversion method, apparatus, system, and storage medium
CN112218160A (zh) * 2020-10-12 2021-01-12 Beijing Dajia Internet Information Technology Co., Ltd. Video conversion method and apparatus, video conversion device, and storage medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999034280A1 (fr) * 1997-12-29 1999-07-08 Koninklijke Philips Electronics N.V. Graphical user interface for measuring input parameters
US6933954B2 (en) * 2003-10-31 2005-08-23 Microsoft Corporation Aspect ratio conversion of video content
US7392278B2 (en) * 2004-01-23 2008-06-24 Microsoft Corporation Building and using subwebs for focused search
US7528846B2 (en) * 2005-02-23 2009-05-05 Microsoft Corporation Systems and methods to adjust a source image aspect ratio to match a different target display aspect ratio
US9800422B2 (en) * 2012-10-26 2017-10-24 International Business Machines Corporation Virtual meetings
WO2014100936A1 (fr) * 2012-12-24 2014-07-03 Huawei Technologies Co., Ltd. Method, platform and system for creating a video-associated information library and playing back videos
US10817672B2 (en) * 2014-10-01 2020-10-27 Nuance Communications, Inc. Natural language understanding (NLU) processing based on user-specified interests
CN106055912B (zh) * 2016-06-15 2019-04-16 张家港赛提菲克医疗器械有限公司 Computer system for generating treatment couch adjustment data from online images
CN113079390B (zh) * 2016-07-01 2024-04-05 斯纳普公司 一种用于处理视频源的方法、服务器计算机以及计算机可读介质
EP3482286A1 (fr) * 2016-11-17 2019-05-15 Google LLC Media rendering with orientation metadata
WO2018106213A1 (fr) * 2016-12-05 2018-06-14 Google Llc Method for converting a landscape video into a portrait mobile layout
CN108038730A (zh) * 2017-12-22 2018-05-15 Lenovo (Beijing) Co., Ltd. Product similarity determination method, apparatus, and server cluster
US11462009B2 (en) * 2018-06-01 2022-10-04 Apple Inc. Dynamic image analysis and cropping
CN111010590B (zh) * 2018-10-08 2022-05-17 Alibaba (China) Co., Ltd. Video cropping method and apparatus
CN110691259B (zh) * 2019-11-08 2022-04-22 Beijing QIYI Century Science & Technology Co., Ltd. Video playback method, system, apparatus, electronic device, and storage medium
CN111107418B (zh) * 2019-12-19 2022-07-12 Beijing QIYI Century Science & Technology Co., Ltd. Video data processing method and apparatus, computer device, and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008301123A (ja) * 2007-05-30 2008-12-11 Sharp Corp Terminal device with imaging function, control method for the terminal device, control program, and computer-readable recording medium recording the control program
US20140152773A1 (en) * 2011-07-25 2014-06-05 Akio Ohba Moving image capturing device, information processing system, information processing device, and image data processing method
CN111373740A (zh) * 2017-12-05 2020-07-03 Google LLC Method for converting landscape video into portrait mobile layout using a selection interface
CN108921856A (zh) * 2018-06-14 2018-11-30 Beijing Microlive Vision Technology Co., Ltd. Image cropping method and apparatus, electronic device, and computer-readable storage medium
CN108986117A (zh) * 2018-07-18 2018-12-11 Beijing Youku Technology Co., Ltd. Video image segmentation method and apparatus
CN112165635A (zh) * 2020-10-12 2021-01-01 Beijing Dajia Internet Information Technology Co., Ltd. Video conversion method, apparatus, system, and storage medium
CN112218160A (zh) * 2020-10-12 2021-01-12 Beijing Dajia Internet Information Technology Co., Ltd. Video conversion method and apparatus, video conversion device, and storage medium

Also Published As

Publication number Publication date
CN112218160A (zh) 2021-01-12

Similar Documents

Publication Publication Date Title
US11501499B2 (en) Virtual surface modification
US20200250888A1 (en) Redundant tracking system
AU2017339440B2 (en) Techniques for incorporating a text-containing image into a digital image
  • WO2022077977A1 (fr) Video conversion method and apparatus
US10593118B2 (en) Learning opportunity based display generation and presentation
US20120113141A1 (en) Techniques to visualize products using augmented reality
  • WO2022077995A1 (fr) Video conversion method and video conversion device
US11715223B2 (en) Active image depth prediction
US9729792B2 (en) Dynamic image selection
US10901612B2 (en) Alternate video summarization
US20200007948A1 (en) Video subtitle display method and apparatus
  • WO2019214019A1 (fr) Online teaching method and apparatus based on convolutional neural network
US11392788B2 (en) Object detection and identification
US10841482B1 (en) Recommending camera settings for publishing a photograph based on identified substance
US10841544B2 (en) Systems and methods for media projection surface selection
US20240037944A1 (en) Computer-generated reality recorder
  • CN109923540 (zh) Real-time recording of gestures and/or voices for modifying animation
US20240062490A1 (en) System and method for contextualized selection of objects for placement in mixed reality
  • TW201814433A (zh) Management method and system for selected objects in a virtual reality environment, and related computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21879042

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03-08-2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21879042

Country of ref document: EP

Kind code of ref document: A1