CN113709385A - Video processing method and device, computer equipment and storage medium

Publication number: CN113709385A
Application number: CN202110245053.XA
Authority: CN (China)
Prior art keywords: video, processing, processed, type, determining
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 赵远远, 郑青青, 刘浩, 李琛, 杨博, 吕静
Assignee (current and original): Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority to CN202110245053.XA; published as CN113709385A. Legal status: Pending.

Classifications

    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects (under H04N 5/00 Details of television systems; H04N 5/222 Studio circuitry, devices and equipment)
    • H04N 21/4756: End-user interface for inputting end-user data, e.g. personal identification number [PIN] or preference data, for rating content, e.g. scoring a recommended movie (under H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N 21/47 End-user applications)
    • H04N 21/485: End-user interface for client configuration (under H04N 21/00 Selective content distribution; H04N 21/47 End-user applications)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video processing method and apparatus, a computer device, and a storage medium applied to the field of image processing. The method includes: acquiring a video processing instruction for a video to be processed, and performing video content detection on the video to be processed to obtain a video content detection result; in response to the video processing instruction, acquiring K video type tags corresponding to the video to be processed according to the video content detection result; determining K video processing modes according to the K video type tags, where each video processing mode includes at least two processing sub-modes and each processing sub-mode includes at least one of a picture quality processing mode and a content processing mode; and processing the video to be processed in the K video processing modes to output a target video. In this way, multiple video processing items can be performed on the video to be processed, achieving a one-click beautification and enhancement effect, simplifying video processing operations, and improving video processing efficiency.

Description

Video processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a video processing method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of video sharing platforms, video content has grown explosively, and users' requirements for video quality keep rising. A video is usually edited and beautified before being published, for example, brightness enhancement for an underexposed video, or noise reduction to improve image quality. Video sharing platforms therefore provide various intelligent methods for enhancing video quality so that users can edit videos conveniently.
Currently, video sharing platforms offer a variety of intelligent tools for users to choose from. When a user selects a tool, the video is processed with the quality enhancement method corresponding to that tool; for example, the user can select a filter button, and the platform adds a filter to the video accordingly. Manually selecting these tools one by one costs a great deal of time and effort and severely degrades the user experience, so how to intelligently apply multiple kinds of processing to a video and achieve a one-click beautification and enhancement effect has become an urgent problem.
Disclosure of Invention
The embodiments of the present application provide a video processing method and apparatus, a computer device, and a storage medium. When a user needs to process video data, the server can acquire multiple video type tags of a video to be processed according to a video processing instruction, obtain multiple video processing modes from the video type tags, and apply multiple kinds of processing to the video through these video processing modes, thereby achieving a one-click beautification and enhancement effect, simplifying video processing operations, and improving video processing efficiency.
In view of the above, a first aspect of the present application provides a method for video processing, including:
obtaining a video processing instruction for a video to be processed;
performing video content detection on the video to be processed to obtain a video content detection result;
in response to the video processing instruction, acquiring K video type tags corresponding to the video to be processed according to the video content detection result, where K is an integer greater than or equal to 1;
determining K video processing modes according to the K video type tags, where the video processing modes correspond to the video type tags, each video processing mode includes at least two processing sub-modes, and each processing sub-mode includes at least one of a picture quality processing mode and a content processing mode; and
processing the video to be processed in the K video processing modes to output a target video.
A second aspect of the present application provides a video processing apparatus comprising:
and the acquisition unit is used for acquiring the video processing instruction aiming at the video to be processed.
And the detection unit is used for detecting the video content of the video to be processed to obtain a video content detection result.
And the acquisition unit is also used for responding to the video processing instruction and acquiring K video type labels corresponding to the video to be processed according to the video content detection result, wherein K is an integer greater than or equal to 1.
The device comprises a determining unit, a processing unit and a processing unit, wherein the determining unit is used for determining K video processing modes according to K video type tags, the video processing modes and the video type tags have corresponding relations, each video processing mode comprises at least two processing sub-modes, and each processing sub-mode comprises at least one of a picture quality processing mode and a content processing mode.
And the processing unit is used for processing the video to be processed by adopting K video processing modes so as to output the target video.
In one possible design, the K video type tags include a primary video type tag and at least one secondary video type tag, and the determining unit is specifically configured to determine a primary video processing mode according to the primary video type tag and determine at least one secondary video processing mode according to the at least one secondary video type tag.
The processing unit is specifically configured to process all content objects of the video frames in the video to be processed in the primary video processing mode, and to process some content objects of the video frames in the secondary video processing mode, where the secondary video type tag is obtained from those content objects.
In one possible design, the obtaining unit is specifically configured to obtain at least one key video frame in the video to be processed.
The detection unit is specifically configured to determine all content objects of the at least one key video frame, and to determine a primary content object and at least one secondary content object among them according to the number of pixels each content object occupies.
The determining unit is further configured to determine the video type tag corresponding to the primary content object as the primary video type tag, and determine the video type tag corresponding to the at least one secondary content object as the at least one secondary video type tag.
In one possible design, the obtaining unit is specifically configured to obtain at least one key video frame in the video to be processed.
The detection unit is specifically configured to determine all content objects of the at least one key video frame, determine the content object with the highest priority among them as the primary content object according to the priority levels of the content objects, and determine at least one secondary content object based on those priority levels.
The determining unit is specifically configured to determine the video type tag corresponding to the primary content object as the primary video type tag, and determine the video type tag corresponding to the at least one secondary content object as the at least one secondary video type tag.
In one possible design, K is equal to 1, and the obtaining unit is specifically configured to periodically capture a plurality of video frames from the video to be processed at a preset frequency.
The determining unit is further configured to determine a plurality of video type tags corresponding to the plurality of video frames, with the video frames and the video type tags in one-to-one correspondence, and to determine the video type tag that occurs most frequently among them as the video type tag of the video to be processed.
In one possible design, the determining unit is specifically configured to determine a plurality of content objects in each of the plurality of video frames, determine weight values of the content objects in each video frame, and determine the video type tag corresponding to the content object with the highest weight as the video type tag of that video frame.
In one possible design, the obtaining unit is specifically configured to input the plurality of video frames to an image tag model, and determine the video type tag corresponding to the video to be processed according to the output of the image tag model.
In one possible design, the K video type tags include a person type tag, and the video processing mode corresponding to the person type tag includes at least two of filtering processing, liquify processing, or brightness adjustment. The processing unit is specifically configured to determine a person content object in the video to be processed according to the person type tag, and perform at least two of filtering processing, liquify processing, or brightness adjustment processing on the person content object.
In one possible design, the K video type tags include a gourmet type tag, and the video processing modes corresponding to the gourmet type tag include at least two of color temperature adjustment, saturation adjustment, or adding filters.
The processing unit is specifically used for determining a food content object in the video to be processed according to the gourmet type label; at least two of a color temperature adjustment process, a saturation adjustment process, or a filter addition process are performed on the food content object.
In one possible embodiment, the K video type tags include a night scene type tag, and the video processing mode corresponding to the night scene type tag includes at least two of brightness adjustment, saturation adjustment, or denoising processing.
And the processing unit is specifically used for determining a plurality of night scene video frames in the video to be processed according to the night scene type label, and performing at least two of brightness adjustment processing, saturation adjustment processing or denoising processing on the plurality of night scene video frames.
In one possible implementation, the K video type tags include an indoor type tag, and the video processing mode corresponding to the indoor type tag includes at least two of brightness adjustment, saturation adjustment, or white balance adjustment.
And the processing unit is specifically used for determining a plurality of indoor video frames in the video to be processed according to the indoor type tag, and performing at least two of brightness adjustment processing, saturation adjustment processing or white balance adjustment processing on the plurality of indoor video frames.
In one possible embodiment, the K video type tags include a plant type tag, and the video processing mode corresponding to the plant type tag includes contrast adjustment and filter addition.
And the processing unit is specifically used for determining the plant content object in the video to be processed according to the plant type label, and performing contrast adjustment processing and filter adding processing on the plant content object.
A third aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
In the embodiments of the present application, a video processing method is provided. When a video processing instruction sent by a user is received, the method first responds to the instruction to obtain the video type tag of the video to be processed, then obtains the video processing mode corresponding to that tag, applies multiple kinds of processing to the video according to the processing sub-modes in that mode, and finally outputs the processed target video. In this way, one-click beautification of the video to be processed is achieved: the user only needs to input a single video operation instruction to apply multiple enhancement processes, which simplifies video processing operations and improves video processing efficiency.
Drawings
Fig. 1 is an environment schematic diagram of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a video editing interface according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another video processing method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another video processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a video processing method and apparatus, a computer device, and a storage medium. When a user needs to process video data, the server can acquire multiple video type tags of a video to be processed according to a video processing instruction, obtain multiple video processing modes from the video type tags, and apply multiple kinds of processing to the video through these video processing modes, thereby achieving a one-click beautification and enhancement effect, simplifying video processing operations, and improving video processing efficiency.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With network users' growing demand for sharing short videos, video sharing platforms have sprung up rapidly, offering users services such as video editing, video uploading, and video commenting. Before uploading a self-made video, a user usually edits it a second time to further beautify and enhance it. Existing video editing interfaces provide users with various editing tools and support multiple processing operations, meeting users' pursuit of video quality. However, editing a video requires the user to manually adjust each video enhancement function, which is tedious and time-consuming for users inexperienced in image processing. To address these problems, the present application provides an intelligent method that edits a video to be processed according to its content, so that enhancement and beautification can be performed automatically according to the video's scene characteristics and content roles, greatly improving video processing efficiency. It can be understood that hardware acceleration and lightweight image processing algorithms can reduce the user's waiting time and further improve video processing efficiency.
An application scenario of the embodiments of the present application is described below. The video processing method may be executed by a terminal device or by a server. When the method is deployed on a terminal device, the device can process the video directly according to the user's instruction while offline; no network connection is needed, the privacy of the user's video data is better protected, and the processing flow is more convenient. When the method is deployed on a server, the server can process video data in real time according to the user's instruction and, given its hardware capability, provide richer video effect materials for the video to be processed while also increasing the processing speed, thereby improving video processing efficiency.
The following describes the video processing method provided by the embodiments of the present application with a server as the execution subject. Referring to fig. 1, fig. 1 is an environment schematic diagram of a video processing method in an embodiment of the present application. As shown in fig. 1, the video processing system includes a server and terminal devices: the video to be processed is edited on the server side, while the client presents the user with an operable interface and the picture of the processed new video.
The server in fig. 1 may be a single server, a server cluster composed of multiple servers, a cloud computing center, or the like, which is not limited here. The client is deployed on a terminal device, which may be a tablet computer, a notebook computer, a palmtop computer, a mobile phone, a personal computer (PC), or a voice interaction device, as shown in fig. 1.
The terminal device and the server can communicate through a wireless network, a wired network, or a removable storage medium. The wireless network uses standard communication technologies and/or protocols and is typically the Internet, but it can be any network, including but not limited to Bluetooth, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile network, a private network, or any combination of virtual private networks. In some embodiments, custom or dedicated data communication technologies may be used in place of, or in addition to, those described above. The removable storage medium may be a universal serial bus (USB) flash drive, a removable hard drive, or another removable storage medium.
Although only five terminal devices and one server are shown in fig. 1, it should be understood that the example in fig. 1 is only used for understanding the present solution, and the number of the specific terminal devices and the number of the servers should be flexibly determined according to actual situations.
Because the embodiments of the present application can also be implemented based on the field of artificial intelligence, some basic concepts in that field are introduced before the video processing method is described. Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence infrastructure includes technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of artificial intelligence technology, AI has been studied and developed in many directions. Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers can simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With reference to the above description, the video processing method in the present application is described below with a server as the execution subject. Referring to fig. 2, fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application. As shown in fig. 2, the method includes:
201. The server acquires a video to be processed and receives a video processing instruction for the video to be processed.
For example, a user may upload a to-be-processed video on a video editing interface. As shown in fig. 3, the interface includes a video presentation interface 301 and a toolbar interface 302. The video presentation interface 301 is used to present the to-be-processed video and to show the effect of the processed new video, and the toolbar interface 302 may include a "one-click beautify" button 303. After the user confirms the video to be processed, it is displayed on the video presentation interface 301; the user then clicks the button 303, which sends a video processing instruction to the server. After receiving the instruction, the server starts the video processing flow, automatically processes the video to be processed, and finally presents the processed new video through the video presentation interface 301 so that the user can view the final effect.
202. The server performs video content detection on the video to be processed to obtain a video content detection result.
After the server receives the video processing instruction for the video to be processed, it first needs to detect the video content of the video to be processed. For example, the video scene characteristics may be detected: the scene content of each video frame is examined, and the scene type of the video, such as an indoor scene, an outdoor scene, or a motion scene, is determined.
For example, the content objects of the video to be processed may also be detected: feature point comparison is performed on the video pictures to determine the types of content objects included, such as people, animals, plants, and vehicles.
For example, the roles of the content objects in the video to be processed may be distinguished. If the video includes multiple person objects, each person can be further examined, for example to determine whether the person is female or male, young or old. Based on these detection strategies, the server can classify the content of the entire video to obtain a final video content detection result, so that it can subsequently determine the video type and, in turn, the video processing mode.
203. In response to the video processing instruction, the server acquires the video type tag corresponding to the video to be processed according to the video content detection result.
The server responds to the video processing instruction and starts the video processing flow. First, the server determines the video type tag of the video according to the video content detection result. Specifically, the video sharing platform can collect statistics on and analyze the content of videos shot by a large number of users, and establish a type tag system for videos, i.e., classify videos by content and attach different type tags. Illustratively, the tags may include a person type tag, a pet type tag, a food type tag, a plant type tag, a document type tag, a night scene type tag, an indoor type tag, and the like.
For example, the server may detect face images in the video frames of the video to be processed according to image key points, and when the proportion of frames containing a face exceeds 80% of the total number of frames, the video type tag of the video can be determined to be the person type tag. For another example, the server may measure the light intensity in the video frames, and when a frame is determined to be a night scene image from its light intensity, the video type tag can be determined to be the night scene type tag. It can be understood that one video may correspond to multiple video type tags; for example, when the server detects that a video was shot at night and the subject is a person, the video's type tags can be determined to be both the person type tag and the night scene type tag. That is, the number of video type tags corresponding to a video to be processed is not specifically limited.
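The following minimal sketch (not the patent's own detector) illustrates this rule-based tagging, assuming an OpenCV Haar-cascade face check and a mean-brightness night-scene heuristic; the 80% face ratio comes from the text, while the 50% night-scene majority criterion and the brightness threshold are assumptions:

```python
# Rule-based tagging sketch: Haar-cascade face check + mean-luma night test.
from typing import List
import cv2
import numpy as np

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_face(frame_bgr: np.ndarray) -> bool:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return len(_face_cascade.detectMultiScale(gray, 1.1, 5)) > 0

def is_night_scene(frame_bgr: np.ndarray, threshold: float = 60.0) -> bool:
    # Assumed heuristic: a low mean luma marks a night-scene frame.
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).mean() < threshold

def rule_based_tags(frames: List[np.ndarray]) -> List[str]:
    tags = []
    n = max(len(frames), 1)
    if sum(has_face(f) for f in frames) / n > 0.8:        # the 80% rule above
        tags.append("person")
    if sum(is_night_scene(f) for f in frames) / n > 0.5:  # assumed majority rule
        tags.append("night_scene")
    return tags  # one video may carry several tags, as noted above
```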
For example, an image tag model may be built on the above video type tag system and used to output the video type tag of a video to be identified. Illustratively, the image tag model may be trained as follows: first, training samples (historical uploaded videos) are annotated with their accurate video type tags; the annotated samples are then input to the image tag model, which produces an output type tag from the sample content; finally, the model parameters are iteratively updated according to the loss between the output type tag and the annotated video type tag until the model converges. The video to be processed can then be input to the trained image tag model, and its video type tag obtained from the model's output.
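As a hedged sketch of this training procedure: the patent does not specify a model architecture, loss, or optimizer, so the following assumes a fixed set of tag classes and an off-the-shelf CNN backbone, with those choices purely illustrative:

```python
# Illustrative image-tag-model training loop (assumed architecture and loss).
import torch
import torch.nn as nn
from torchvision import models

NUM_TAGS = 7  # e.g. person, pet, food, plant, document, night scene, indoor

model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_TAGS)  # tag classification head
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_epoch(loader):
    # loader yields (frame_batch, tag_index_batch) from annotated history videos
    model.train()
    for frames, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(frames), labels)  # loss vs. annotated tags
        loss.backward()                          # iterative parameter update
        optimizer.step()
```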
Different video type tags call for different video processing modes: a person type tag focuses more on face reshaping, a food type tag on the saturation and color of the food, and a night scene type tag on adjusting the brightness and contrast of the picture. The server can therefore establish a video processing strategy for each video type tag, formulate the corresponding processing manners, and finally generate a video processing mode specific to each tag.
204. The server determines a video processing mode according to the video type tag.
The server determines the video processing mode corresponding to the video to be processed according to its video type tag, and then processes the video according to the sub-modes of that mode in sequence. The video type tags correspond to the video processing modes one to one, and each video processing mode includes several specific processing manners (i.e., processing sub-modes), each of which may be a picture quality processing mode or a content processing mode. A picture quality processing mode edits and adjusts the video picture as a whole, for example changing the brightness, contrast, white balance, or color temperature of the entire picture, enhancing the video by adjusting its basic picture indexes. A content processing mode adjusts specific picture content, such as whitening, skin smoothing, and face slimming for a face in the picture, or sharpening and sharpness deepening for a pet in the picture. The server can compose a personalized combination of processing manners according to the characteristics of each video type tag, finally obtaining the video processing mode of each video type tag.
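For illustration, two picture-quality sub-modes might look like the following OpenCV sketch; the parameter values are assumptions, as the patent does not prescribe concrete algorithms:

```python
# Two illustrative picture-quality sub-modes applied per frame:
# a global brightness/contrast tweak and a saturation tweak.
import cv2
import numpy as np

def adjust_brightness_contrast(frame_bgr, alpha=1.1, beta=10):
    # alpha scales contrast, beta shifts brightness (OpenCV convention)
    return cv2.convertScaleAbs(frame_bgr, alpha=alpha, beta=beta)

def adjust_saturation(frame_bgr, scale=1.15):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * scale, 0, 255)  # boost the S channel
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```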
Table 1 shows a correspondence between video type tags and video processing modes provided in this embodiment of the application. It can be understood that the table is only an example: the video processing mode corresponding to a video type tag is a combination of multiple processing manners, and the processing sub-modes included in a video processing mode may be combined arbitrarily according to different requirements, without particular limitation.
TABLE 1

Video type tag    Video processing mode
Person            Skin smoothing → face slimming → whitening
Food              Color temperature processing → saturation processing → filter
Plant             Contrast adjustment → filter addition
Night scene       Brightness adjustment → saturation adjustment → denoising
Indoor            Brightness adjustment → saturation adjustment → white balance adjustment
Animal            Brightness adjustment → saturation adjustment → filter
Document          Denoising → sharpening
For example, in the above correspondence table, the video processing mode corresponding to the person type tag may include filtering processing, liquify processing, and brightness adjustment, where the filtering processing corresponds to skin smoothing, the liquify processing corresponds to face slimming, and the brightness adjustment corresponds to whitening. When the video type tag of the video to be processed is the person type tag, the server can automatically perform face slimming, skin smoothing, and whitening on the faces in the video, thereby beautifying and enhancing the whole video.
For example, in the above correspondence table, the video processing mode corresponding to the food type tag may include at least two of color temperature adjustment, saturation adjustment, or filter addition. When the video type tag of the video to be processed is the food type tag, the server first locates the food in the video and can then automatically apply color temperature adjustment, saturation adjustment, and filter addition to it, finally beautifying and enhancing the whole video.
For example, the video processing mode corresponding to the plant type tag may include contrast adjustment and filter addition. When the video type tag is the plant type tag, the server first locates the plants in the video, then automatically adjusts their contrast and adds a filter to the video frames, finally beautifying and enhancing the whole video.
For example, the video processing mode corresponding to the night scene type tag may include at least two of brightness adjustment, saturation adjustment, or denoising. When the video type tag is the night scene type tag, the server can apply at least two of these processes to the whole video frame, finally beautifying and enhancing the whole video.
For example, the video processing mode corresponding to the indoor type tag may include at least two of brightness adjustment, saturation adjustment, or white balance adjustment. When the video type tag is the indoor type tag, the server can apply at least two of these processes to the whole video frame, finally beautifying and enhancing the whole video.
It can be understood that the server may determine multiple processing combinations for each video type tag according to the desired processing style and establish personalized video processing modes; the specific processing manner is not limited. Meanwhile, when a video to be processed corresponds to multiple video type tags, the server can process it according to the video processing mode of each tag in turn, so that the processing effects are superimposed and the beautified, enhanced video is finally generated.
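A sketch of how such a correspondence table could drive processing, reusing the brightness/saturation helpers sketched earlier; the table contents and the sequential superposition follow the description above, while the function wiring and the content-sub-mode stubs are assumptions:

```python
# Table 1 as data: each video type tag maps to an ordered list of processing
# sub-modes; the pipelines for all K tags are applied in sequence so their
# effects superimpose.
def process_video(frames, tags, mode_table):
    for tag in tags:                       # one video processing mode per tag
        for sub_mode in mode_table.get(tag, []):
            frames = [sub_mode(f) for f in frames]
    return frames

mode_table = {
    "night_scene": [lambda f: adjust_brightness_contrast(f, 1.0, 20),
                    lambda f: adjust_saturation(f, 1.1)],
    "person":      [lambda f: f,   # skin-smoothing stub (illustrative only)
                    lambda f: f],  # face-slimming stub (illustrative only)
}

# e.g. a person-dancing-at-night video carries both tags:
# result = process_video(frames, ["night_scene", "person"], mode_table)
```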
205. The server processes the video to be processed according to the video processing mode and outputs the target video.
After determining the video type tag of the video to be processed and its corresponding video processing mode, the server applies the processing manners of that mode to the video in sequence to generate the target video, and sends it to the terminal for the user to preview; when satisfied, the user can upload the processed target video to the video sharing platform.
In this embodiment, when receiving a video processing instruction sent by a user, the server first responds to the instruction to obtain the video type tag of the video to be processed, then obtains the video processing mode corresponding to that tag, applies multiple kinds of processing according to the processing sub-modes in that mode, and finally outputs the processed target video. In this way, one-click beautification of the video to be processed is achieved: the user only needs to input a single video operation instruction to apply multiple enhancement processes, which simplifies video processing operations and improves video processing efficiency. The "one-click beautify" function saves the user the time of re-editing a shot video; by making videos more attractive, it greatly increases users' willingness to upload videos and improves the stickiness and activity of the video sharing platform's users.
Referring to fig. 4, fig. 4 is a schematic flowchart of another video processing method provided in an embodiment of the present application. It can be understood that in this embodiment, one video to be processed corresponds to one video type tag. As shown in fig. 4, the method includes:
401. The server acquires a video to be processed and receives a video processing instruction for the video to be processed.
It is understood that step 401 is similar to step 201 in the embodiment shown in fig. 2, and is not described herein again.
402. The server performs video content detection on the video to be processed to obtain a video content detection result.
It is understood that step 402 is similar to step 202 in the embodiment shown in fig. 2, and is not described herein again.
403. In response to the video processing instruction, the server periodically captures a plurality of video frames from the video to be processed at a preset frequency.
When the server receives a video processing instruction sent by the user, it responds to the instruction and starts the video processing flow for the video to be processed. First, the server can periodically capture a plurality of video frames from the video and judge the main scene of the whole video from their content, so that it can subsequently plan the processing manner reasonably. It can be understood that the server may instead determine a number of key frames in the video according to some policy, for example selecting key frames according to the number of objects in the picture and analyzing their content to infer the application scene; the specific form is not limited.
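A minimal sketch of this periodic sampling, using OpenCV; the sampling frequency `sample_hz` is an assumed parameter, as the text only says "a preset frequency":

```python
# Periodically keep one frame every `step` frames, derived from the video FPS.
import cv2

def sample_frames(video_path: str, sample_hz: float = 1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if FPS unknown
    step = max(int(round(fps / sample_hz)), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```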
404. The server determines a plurality of content objects in each video frame according to the video content detection result.
Once the server has obtained the plurality of video frames, it needs to analyze each of them in turn. Since the content of these frames determines the processing policy for the video, each frame can be classified according to the video content detection result to determine its corresponding video type tag. The multiple video type tags of the multiple frames are then analyzed, and the tag that best represents the video content is selected as the video type tag of the whole video to be processed, so that the processing manner determined by the server handles the video more intelligently and improves the processing effect.
For example, the server may determine the video type tag of each video frame from the frame's scene characteristics; the shooting scene of each frame, such as a night scene, an indoor scene, or an outdoor scene, can be determined from the lighting characteristics in the frame.
For example, the server may also determine the video type tag of each frame from the object features in the frame: objects can be detected by key feature point comparison, and the frame's tag determined by the object type, such as a person type, animal type, or plant type.
For example, when the server determines that the content object of a video frame is of the person type, it may further determine the roles of the persons, such as female and male roles, and determine the video type tag of the frame according to these role characteristics. It can be understood that the server may determine different types of content objects according to different policies, without particular limitation.
405. The server determines the video type tag of each video frame according to the weight values of the plurality of content objects.
The server may formulate a policy that assigns different weight values to different content objects. For example, a face may be given a greater weight than an animal, and an animal a greater weight than a plant; then, when the server detects that a video frame contains a face image, an animal image, and a plant image, it determines that the dominant content object of the frame is the face, and hence that the frame's video type tag is the person type tag.
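The fixed-weight strategy just described might be sketched as follows; the weight values and the object-to-tag mapping are assumptions for illustration:

```python
# Fixed object weights (assumed values): face > animal > plant.
OBJECT_WEIGHTS = {"face": 3, "animal": 2, "plant": 1}
OBJECT_TO_TAG = {"face": "person", "animal": "animal", "plant": "plant"}

def frame_tag(detected_objects):
    """detected_objects: class names found in one frame, e.g. {"face", "plant"}."""
    known = [o for o in detected_objects if o in OBJECT_WEIGHTS]
    if not known:
        return None
    dominant = max(known, key=OBJECT_WEIGHTS.get)  # heaviest class wins
    return OBJECT_TO_TAG[dominant]
```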
For example, when the server detects that a certain video frame picture includes a face image, an animal image and a plant image, the server first determines the proportion of the face image, the animal image and the plant image in the whole picture, and if the largest area of the whole picture is the animal image, the server can determine that the video type tag corresponding to the video frame is the animal type tag.
For example, the server may also give the shooting environment a greater weight than the objects in the picture. If a video frame shows a face in a night scene and the server detects both the night scene mode and a face image, it can determine through the weight values that the frame's video type tag is the night scene type tag.
As the above examples show, by analyzing the content of each video frame the server tries to find the most representative feature to determine that frame's video type tag; different strategies can therefore be formulated according to user requirements, without particular limitation.
406. The server determines the video type tag of the video to be processed according to the video type tags of the plurality of video frames.
After the server determines the video type tag corresponding to each video frame, statistical analysis needs to be performed on the video type tags of a plurality of video frames to obtain the video type tag of the final video to be processed. For example, the video type tag with the highest frequency of occurrence in the multiple video type tags may be determined as the video type tag corresponding to the video to be processed, for example, the server obtains 10 video frames of the video to be processed, where among the 10 video frames, the video type tag corresponding to 5 video frames is a person type tag, the video type tag corresponding to 3 video frames is an animal type tag, and the tag corresponding to 2 video frames is a plant type tag, and then the video type tag of the video to be processed may be determined as the person type tag.
Illustratively, the server may also determine the video type tag of the video to be processed through a voting mechanism. Specifically, the server determines video type tags corresponding to a plurality of video frames respectively, and then selects a video type tag with the highest priority as a video type tag of the whole to-be-processed video.
For example, the server may further calculate a probability score of each video frame under a plurality of video type tags, and finally determine the video type tag with the maximum probability sum as the video type tag of the video to be processed. For example, the server determines that the probability value of the first video frame under the character type tag is 0.8 and the probability value under the animal type tag is 0.2. The probability value of the second video frame under the character type label is 0.5, the probability value under the animal type label is 0.2, and the probability value under the night scene type label is 0.3. The probability value of the third video frame under the character type label is 0.4, and the probability value under the plant type label is 0.6, so that the probability values of the three video frames are combined, the video type label with the maximum probability sum is a character type label, and the video type label corresponding to the video to be processed can be determined to be the character type label. It is understood that the server may determine the video type tag of the video to be processed through various strategies, which are not limited herein.
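The two aggregation strategies above, majority vote and probability-sum with argmax, can be sketched as follows; the numbers reproduce the three-frame example from the text:

```python
# (1) Majority vote over per-frame tags; (2) sum per-frame tag probabilities
# and take the tag with the largest total.
from collections import Counter, defaultdict

def majority_tag(frame_tags):
    return Counter(frame_tags).most_common(1)[0][0]

def max_probability_tag(frame_probs):
    totals = defaultdict(float)
    for probs in frame_probs:          # one dict of tag -> probability per frame
        for tag, p in probs.items():
            totals[tag] += p
    return max(totals, key=totals.get)

# The three-frame example from the text: "person" wins with 0.8 + 0.5 + 0.4 = 1.7.
example = [{"person": 0.8, "animal": 0.2},
           {"person": 0.5, "animal": 0.2, "night_scene": 0.3},
           {"person": 0.4, "plant": 0.6}]
assert max_probability_tag(example) == "person"
```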
407. The server determines a video processing mode according to the video type tag of the video to be processed.
After the server determines the video type tag corresponding to the video to be processed, the video to be processed can be processed according to the video processing mode corresponding to the video type tag. The video processing mode is a combination of multiple video processing modes, and may include a beautification enhancement mode for the overall image quality of a video image, or a beautification editing mode for specific contents of a video. The video processing mode in step 407 is similar to the video processing mode in step 204 in the embodiment shown in fig. 2, and is not described herein again.
408. The server processes the video to be processed according to the video processing mode and outputs the target video.
After determining the video type tag of the video to be processed and its corresponding video processing mode, the server applies the processing manners of that mode to the video in sequence to generate the target video, and sends it to the terminal for the user to preview; when satisfied, the user can upload the processed target video to the video sharing platform.
In this embodiment, the server may periodically capture a plurality of video frames from the video to be processed, determine the video type tag of each frame from its specific content, statistically analyze the tags of the frames, and select the tag that best reflects the video's characteristics as the video type tag of the whole video; it then applies multiple kinds of processing according to the corresponding video processing mode and outputs the processed target video. When the video corresponds to only one video type tag, the processing flow is greatly simplified, the style and color of the video frames stay uniform, the editing time is reduced, and the processing speed is improved.
Referring to fig. 5, fig. 5 is a schematic flowchart of another video processing method according to an embodiment of the present application. It can be understood that in the embodiment shown in fig. 5, one video to be processed corresponds to a plurality of video type tags. As shown in fig. 5, again with a server as the execution subject, the method includes the following steps:
501. The server acquires a video to be processed and receives a video processing instruction for the video to be processed.
It is understood that step 501 is similar to step 201 in the embodiment shown in fig. 2, and is not described herein again.
502. The server performs video content detection on the video to be processed to obtain a video content detection result.
It is understood that step 502 is similar to step 202 in the embodiment shown in fig. 2, and is not described herein again.
503. In response to the video processing instruction, the server acquires at least one key video frame from the video to be processed.
When the server receives a video processing instruction sent by the user, it responds to the instruction and starts the video processing flow for the video to be processed. First, the server determines at least one key video frame of the video according to a relevant policy, for example selecting key frames according to the number of objects in the picture and analyzing their content; the specific form is not limited.
504. The server determines a primary video type tag and at least one secondary video type tag of the video to be processed according to the video content detection results of the key video frames.
After the server determines the key video frames of the video to be processed, it can determine multiple video type tags according to the video content detection results of those frames. For example, the server may determine a video type tag from the scene characteristics of each key frame, and then derive a primary video type tag and a secondary video type tag from the per-frame tags. For instance, analyzing the scene characteristics of the key frames may yield both indoor and outdoor scenes; if more key frames show the outdoor scene than the indoor scene, the outdoor scene tag can be determined as the primary video type tag and the indoor scene tag as the secondary video type tag.
Illustratively, the server can also analyze the object features in the key video frames, intelligently select the object feature most representative of the video, and determine the primary video type tag from it; the other content objects are then analyzed to determine the secondary video type tags. For example, the server may determine the primary and secondary content objects by comparing the number of pixels occupied by all content objects in the key frames, deriving the primary video type tag from the primary content object and the secondary video type tags from the secondary content objects. Suppose the server detects that the key frames of a video contain a person image, an animal image, and a plant image, where the person is a long shot occupying few pixels, the animal occupies the most pixels (i.e., the largest picture area), and the plant comes second; it can then determine that the animal image is the primary content object and that the person and plant images are secondary content objects. Accordingly, the primary video type tag of the video is the animal type tag, and the secondary video type tags are the person type tag and the plant type tag.
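A minimal sketch of this pixel-count rule, assuming a segmentation mask is available for each detected content object (the text does not specify how masks are obtained):

```python
# The object covering the most pixels becomes the primary content object;
# all remaining objects become secondary content objects.
import numpy as np

def split_primary_secondary(object_masks):
    """object_masks: dict mapping object name -> boolean mask (H x W array)."""
    areas = {name: int(np.count_nonzero(m)) for name, m in object_masks.items()}
    primary = max(areas, key=areas.get)            # largest pixel footprint
    secondary = [n for n in areas if n != primary]
    return primary, secondary

# e.g. {"animal": mask_a, "person": mask_p, "plant": mask_g}
# -> ("animal", ["person", "plant"]) for the example in the text
```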
For example, the server may instead determine the primary and secondary content objects by the priority levels of all content objects in the key frames, again deriving the primary video type tag from the primary content object and the secondary tags from the secondary ones. Suppose the server detects that the key frames contain a person image, an animal image, and a plant image; since the person image has a higher priority than the animal image and the plant image has the lowest priority, the person image is determined to be the primary content object, and the animal and plant images the secondary content objects. Accordingly, the primary video type tag of the video is the person type tag, and the secondary video type tags are the animal type tag and the plant type tag.
For example, the server may further analyze the character roles in the key frames. If the main content of the key frames is persons, the roles of those persons are analyzed, for example counting female versus male roles, or young versus elderly roles; if there are more female roles than male roles, the primary video type tag of the video can be determined to be the female role type tag and the secondary video type tag the male role type tag.
It can be understood that the server may formulate different strategies as required to determine the primary video type tag and the secondary video type tags of the video to be processed, and the number of secondary video type tags may be chosen freely; this is not specifically limited here.
505. The server determines a primary video processing mode according to the primary video type tag and determines at least one secondary video processing mode according to the secondary video type tag.
After the server determines the video type tags corresponding to the video to be processed, the video to be processed can be processed according to the video processing modes corresponding to those video type tags. A video processing mode is a combination of multiple processing sub-modes, and may include a beautification enhancement mode for the overall picture quality of the video images, or a beautification editing mode for specific contents of the video. The video processing mode corresponding to the primary video type tag is the primary video processing mode, and the video processing mode corresponding to a secondary video type tag is a secondary video processing mode. The video processing mode in step 505 is similar to the video processing mode in step 204 of the embodiment shown in fig. 2, and is not described again here.
506. The server processes the video to be processed in the primary video processing mode and the secondary video processing mode.
It can be understood that the primary video type tag is the video type tag that best represents the overall style of the video; accordingly, the primary video processing mode corresponding to the primary video type tag can process the entire video picture of the video to be processed. Specifically, all content objects of each video frame may be processed. For example, if a certain video to be processed is a dance video of a person in a night scene, it may be determined that the primary video type tag corresponding to the video is the night scene type tag and the secondary video type tag is the character type tag. When the server processes the video, the brightness and contrast of the whole video frame picture can be adjusted according to the video processing mode corresponding to the night scene type tag, so as to enhance and beautify the video to be processed.
It can be understood that the secondary video processing mode corresponding to a secondary video type tag may be configured to process a local content object of the video. In the above example, where the primary video type tag of the video to be processed is the night scene type tag and the secondary video type tag is the character type tag, brightness and contrast adjustment may be performed on the whole video frame picture according to the video processing mode corresponding to the night scene type tag, and skin smoothing, whitening and other processing may be applied to the person in the video frame according to the video processing mode corresponding to the character type tag, finally generating the target video.
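As a non-authoritative sketch of this two-level processing, assuming OpenCV and a person segmentation mask produced by the detection stage (all parameter values are illustrative placeholders, not values from the embodiment):

```python
import cv2
import numpy as np

def apply_night_scene_mode(frame, alpha=1.3, beta=25):
    """Primary mode (night scene tag): brightness/contrast over the whole frame."""
    return cv2.convertScaleAbs(frame, alpha=alpha, beta=beta)

def apply_person_mode(frame, person_mask):
    """Secondary mode (character tag): smoothing-style filtering applied
    only where the (assumed) person segmentation mask is True."""
    smoothed = cv2.bilateralFilter(frame, 9, 75, 75)
    out = frame.copy()
    out[person_mask] = smoothed[person_mask]
    return out

def process_frame(frame, person_mask):
    frame = apply_night_scene_mode(frame)          # whole-picture processing
    return apply_person_mode(frame, person_mask)   # local content-object processing

# Usage with a dummy frame and mask:
frame = np.full((720, 1280, 3), 40, np.uint8)
mask = np.zeros((720, 1280), bool)
mask[200:500, 400:800] = True
target = process_frame(frame, mask)
```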
Illustratively, when the server determines a plurality of video type tags according to video scene characteristics, the scene characteristics corresponding to each video frame may be analyzed and the video to be processed divided into a plurality of segments, where all frames within a segment share the same scene information; each segment is then processed according to its corresponding video processing mode. For example, if the shooting scene of a certain video changes from indoor to outdoor and then to a night scene, the video frames of the indoor scene can be processed according to the video processing mode corresponding to the indoor scene tag, the video frames of the outdoor scene according to the video processing mode corresponding to the outdoor scene tag, and the video frames of the night scene according to the video processing mode corresponding to the night scene tag.
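One way such scene-based segmentation could be realized, assuming a scene tag has already been assigned to every frame, is sketched below; grouping consecutive identical tags into segments is an assumption of this sketch:

```python
from itertools import groupby

def split_into_scene_segments(frame_scene_tags):
    """Group consecutive frames that share a scene tag into segments.
    Returns a list of (scene_tag, [frame indices]) pairs."""
    segments, idx = [], 0
    for tag, run in groupby(frame_scene_tags):
        run = list(run)
        segments.append((tag, list(range(idx, idx + len(run)))))
        idx += len(run)
    return segments

tags = ["indoor"] * 3 + ["outdoor"] * 2 + ["night"] * 4
for scene, frames in split_into_scene_segments(tags):
    # dispatch each segment to the processing mode for its scene tag
    print(scene, frames)
```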
For example, when the server determines a plurality of video type tags according to object characteristics, the different types of content objects in the video to be processed may be identified. If a certain video includes a person, an animal and a plant, the person object may be processed according to the video processing mode corresponding to the character type tag, the animal object according to the video processing mode corresponding to the animal type tag, and the plant object according to the video processing mode corresponding to the plant type tag.
For example, when the server determines that the video to be processed contains multiple character roles, say a female character and a male character, the female character may be processed according to the video processing mode corresponding to the female character type tag, and the male character according to the video processing mode corresponding to the male character type tag.
In this embodiment, the server may first determine the key video frames in the video to be processed, then determine the primary video type tag and the secondary video type tags of the video to be processed by analyzing all content objects in the key video frames, and finally process the entire picture of the video to be processed according to the video processing mode corresponding to the primary video type tag while processing specific content objects with the secondary video processing modes, outputting the processed target video. In this manner, each video to be processed corresponds to a plurality of tags, where the primary video type tag determines the overall style of the video processing and the secondary video type tags drive personalized processing of specific content objects in the video frames. This enriches the possible combinations of video processing modes, satisfies more demanding video processing requirements, and improves video processing quality.
It will be appreciated that the video processing method may also be performed by a terminal device. When the video processing method is deployed on the terminal device, the terminal device can process the video directly in an offline state according to a user instruction. After the user uploads the video to be edited, a detection unit of the terminal device performs video content detection on the uploaded video to be processed and passes the detection result to a processing unit. The processing unit matches the video content detection result against a pre-stored video tag system to determine the video content tags of the video to be processed, performs multiple processing items on the video to be processed according to the video processing modes corresponding to those tags, and finally outputs the processed target video, which is displayed on the display screen of the terminal device for subsequent user operations.
Fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the video processing apparatus includes:
an obtaining unit 601, configured to obtain a video processing instruction for a video to be processed.
The detecting unit 602 is configured to perform video content detection on a video to be processed to obtain a video content detection result.
The obtaining unit 601 is further configured to, in response to the video processing instruction, obtain K video type tags corresponding to the video to be processed according to the video content detection result, where K is an integer greater than or equal to 1.
A determining unit 603, configured to determine K video processing modes according to the K video type tags, where the video processing modes have a corresponding relationship with the video type tags, and each video processing mode includes at least two processing sub-modes, where each processing sub-mode includes at least one of a picture quality processing mode and a content processing mode.
The processing unit 604 is configured to process the video to be processed in the K video processing modes to output the target video.
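For orientation, the cooperation of these units can be summarized in the following skeleton; it mirrors fig. 6 but leaves every unit's internals abstract, since the embodiment does not prescribe them, and all names are placeholders:

```python
class VideoProcessingApparatus:
    """Sketch mirroring the units of fig. 6; the method bodies are
    deliberately left unimplemented."""

    def detect_content(self, video):          # detecting unit 602
        """Return a video content detection result for the video."""
        raise NotImplementedError

    def get_type_tags(self, detection):       # obtaining unit 601
        """Return the K video type tags for the detection result."""
        raise NotImplementedError

    def get_processing_modes(self, tags):     # determining unit 603
        """Map each video type tag to its video processing mode."""
        raise NotImplementedError

    def process(self, video):                 # processing unit 604
        detection = self.detect_content(video)
        tags = self.get_type_tags(detection)
        modes = self.get_processing_modes(tags)
        for mode in modes:
            video = mode(video)               # apply each processing mode
        return video                          # the target video
```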
In a possible design, the K video type tags include a primary video type tag and at least one secondary video type tag, and the determining unit 603 is specifically configured to determine a primary video processing mode according to the primary video type tag, and to determine at least one secondary video processing mode according to the at least one secondary video type tag.
The processing unit 604 is specifically configured to process all content objects of the video frames in the video to be processed in the primary video processing mode, and to process partial content objects of the video frames in the video to be processed in the secondary video processing mode, where the secondary video type tag is obtained from the partial content objects.
In one possible design, the obtaining unit 601 is specifically configured to obtain at least one key video frame in the video to be processed.
The detecting unit 602 is specifically configured to determine all content objects of at least one key video frame. And determining a main content object and at least one secondary content object in all the content objects according to the number of pixel points occupied by all the content objects.
The determining unit 603 is specifically configured to determine the video type tag corresponding to the primary content object as a primary video type tag, and determine the video type tag corresponding to at least one secondary content object as at least one secondary video type tag.
In one possible design, the obtaining unit 601 is specifically configured to obtain at least one key video frame in the video to be processed.
A detecting unit 602, specifically configured to determine all content objects of at least one key video frame; and determining the content object with the highest priority in all the content objects as a main content object according to the priority levels corresponding to all the content objects. And determining at least one secondary content object based on the priority level.
The determining unit 603 is specifically configured to determine the video type tag corresponding to the primary content object as the primary video type tag, and determine the video type tag corresponding to the at least one secondary content object as the at least one secondary video type tag.
In one possible design, K is equal to 1, and the obtaining unit 601 is specifically configured to periodically capture a plurality of video frames from the video to be processed at a preset frequency.
The determining unit 603 is further configured to determine a plurality of video type tags corresponding to the plurality of video frames, where the video frames and the video type tags correspond one to one, and to determine the video type tag occurring most frequently among the plurality of video type tags as the video type tag corresponding to the video to be processed.
In one possible design, K is equal to 1, and the determining unit 603 is specifically configured to determine a plurality of content objects in each of the plurality of video frames, determine the weight values of the content objects in each video frame, and determine the video type tag corresponding to the highest-weight content object in each frame as the video type tag corresponding to that video frame.
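Combining the two preceding designs, a sketch of the K = 1 flow might look as follows; the tag strings and weight values are hypothetical:

```python
from collections import Counter

def frame_tag(objects):
    """objects: list of (video_type_tag, weight) for the content objects
    detected in one sampled frame; the frame's tag is that of the
    highest-weight object."""
    return max(objects, key=lambda o: o[1])[0]

def video_tag(frames):
    """frames: per-frame object lists for the periodically sampled frames.
    The single (K = 1) video type tag is the most frequent per-frame tag."""
    return Counter(frame_tag(f) for f in frames).most_common(1)[0][0]

frames = [
    [("person", 0.4), ("food", 0.6)],
    [("food", 0.9)],
    [("person", 0.7), ("plant", 0.3)],
]
video_tag(frames)  # -> "food"
```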
In one possible design, the obtaining unit 601 is specifically configured to input a plurality of video frames to the image tag model. And determining a video type label corresponding to the video to be processed according to the output of the image label model.
In one possible design, the K video type tags include a person type tag, and the video processing mode corresponding to the person type tag includes at least two of filtering processing, liquefaction processing, or brightness adjustment.
The processing unit 604 is specifically configured to determine the character content object in the video to be processed according to the character type tag. At least two of the filtering process, the liquefaction process, and the brightness adjustment process are performed on the human content object.
In one possible design, the K video type tags include a gourmet type tag, and the video processing modes corresponding to the gourmet type tag include at least two of color temperature adjustment, saturation adjustment, or adding filters.
The processing unit 604 is specifically configured to determine a food content object in the video to be processed according to the gourmet type tag, and perform at least two of color temperature adjustment processing, saturation adjustment processing, or filter addition processing on the food content object.
In one possible embodiment, the K video type tags include a night scene type tag, and the video processing mode corresponding to the night scene type tag includes at least two of brightness adjustment, saturation adjustment, or denoising processing.
The processing unit 604 is specifically configured to determine a plurality of night scene video frames in the video to be processed according to the night scene type tag. And performing at least two of brightness adjustment processing, saturation adjustment processing or denoising processing on the plurality of night scene video frames.
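For illustration, the night scene processing sub-modes might be combined per frame as in the following OpenCV sketch; the adjustment amounts are arbitrary placeholders, not values taken from the embodiment:

```python
import cv2
import numpy as np

def enhance_night_frame(frame):
    """Night scene mode sketch: brightness, saturation, then denoising."""
    frame = cv2.convertScaleAbs(frame, alpha=1.2, beta=30)         # brightness/contrast
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    sat = np.clip(hsv[:, :, 1].astype(np.int16) + 20, 0, 255)      # saturation boost
    hsv[:, :, 1] = sat.astype(np.uint8)
    frame = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return cv2.fastNlMeansDenoisingColored(frame, None, 7, 7, 7, 21)  # denoising
```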
In one possible implementation, the K video type tags include an indoor type tag, and the video processing mode corresponding to the indoor type tag includes at least two of brightness adjustment, saturation adjustment, or white balance adjustment.
The processing unit 604 is specifically configured to determine a plurality of indoor video frames in the video to be processed according to the indoor type tag. At least two of brightness adjustment processing, saturation adjustment processing, and white balance adjustment processing are performed on the plurality of indoor video frames.
In one possible embodiment, the K video type tags include a plant type tag, and the video processing mode corresponding to the plant type tag includes contrast adjustment and filter addition.
The processing unit 604 is specifically configured to determine a plant content object in the video to be processed according to the plant type tag. And carrying out contrast adjustment processing and filter adding processing on the plant content object.
An embodiment of the present application further provides another video processing apparatus, which may be deployed in a server or a terminal device; deployment in a server is taken as an example here. Referring to fig. 7, which is a schematic diagram of an embodiment of a server in an embodiment of the present application, the server 700 may vary considerably in configuration and performance, and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), a memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. The memory 732 and the storage medium 730 may be transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 7.
An embodiment of the present application further provides a terminal device. Fig. 8 is a schematic structural diagram of the terminal device provided in the embodiment of the present application. For convenience of description, only the parts related to the embodiment of the present application are shown; for specific technical details not disclosed, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like. The terminal being a mobile phone is taken as an example:
fig. 8 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 8, the handset includes: radio Frequency (RF) circuitry 810, memory 820, input unit 830, display unit 840, sensor 850, audio circuitry 860, wireless fidelity (WiFi) module 870, processor 880, and power supply 890. Those skilled in the art will appreciate that the handset configuration shown in fig. 8 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 8:
The RF circuit 810 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, after receiving downlink information from a base station, it forwards the information to the processor 880 for processing, and it transmits uplink data to the base station. In general, the RF circuit 810 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 810 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), etc.
The memory 820 may be used to store software programs and modules, and the processor 880 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. Further, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 830 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 830 may include a touch panel 831 and other input devices 832. The touch panel 831, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 831 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected devices according to a preset program. Optionally, the touch panel 831 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 880, and it can also receive and execute commands from the processor 880. The touch panel 831 may be implemented in various types, such as resistive, capacitive, infrared, or surface acoustic wave. In addition to the touch panel 831, the input unit 830 may include other input devices 832, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power switch key), a trackball, a mouse, a joystick, and the like.
The display unit 840 may be used to display information input by the user or information provided to the user and various menus of the cellular phone. The display unit 840 may include a display panel 841, and the display panel 841 may be optionally configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, touch panel 831 can overlay display panel 841, and when touch panel 831 detects a touch operation thereon or nearby, communicate to processor 880 to determine the type of touch event, and processor 880 can then provide a corresponding visual output on display panel 841 based on the type of touch event. Although in fig. 8, the touch panel 831 and the display panel 841 are two separate components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 831 and the display panel 841 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 850, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which adjusts the brightness of the display panel 841 according to the brightness of ambient light, and a proximity sensor, which turns off the display panel 841 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer and tap detection). Other sensors that can be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuit 860, a speaker 861, and a microphone 862 may provide an audio interface between the user and the mobile phone. The audio circuit 860 can transmit the electrical signal converted from received audio data to the speaker 861, which converts it into a sound signal for output; conversely, the microphone 862 converts a collected sound signal into an electrical signal, which the audio circuit 860 receives and converts into audio data. The audio data is then processed by the processor 880 and transmitted via the RF circuit 810 to, for example, another mobile phone, or output to the memory 820 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 870, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing wireless broadband Internet access. Although fig. 8 shows the WiFi module 870, it is understood that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 880 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 820 and calling data stored in the memory 820, thereby integrally monitoring the mobile phone. Optionally, processor 880 may include one or more processing units; optionally, the processor 880 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 880.
The mobile phone also includes a power supply 890 (e.g., a battery) for supplying power to the various components. Optionally, the power supply may be logically connected to the processor 880 via a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In the embodiment of the present application, the terminal includes a processor 880 having a function of performing the steps of the video processing method as described above.
Also provided in an embodiment of the present application is a computer-readable storage medium, which stores a computer program, and when the computer program runs on a computer, the computer program causes the computer to execute the steps executed by the server in the method described in the foregoing embodiment shown in fig. 2, or causes the computer to execute the steps executed by the server in the method described in the foregoing embodiment shown in fig. 4, or causes the computer to execute the steps executed by the server in the method described in the foregoing embodiment shown in fig. 5.
Also provided in an embodiment of the present application is a computer program product including a program, which, when run on a computer, causes the computer to perform the steps performed by the server in the method described in the foregoing embodiment shown in fig. 2, or causes the computer to perform the steps performed by the server in the method described in the foregoing embodiment shown in fig. 4, or causes the computer to perform the steps performed by the server in the method described in the foregoing embodiment shown in fig. 5.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A video processing method, comprising:
acquiring a video processing instruction for a video to be processed;
performing video content detection on the video to be processed to obtain a video content detection result;
responding to the video processing instruction, and acquiring K video type labels corresponding to the video to be processed according to the video content detection result, wherein K is an integer greater than or equal to 1;
determining K video processing modes according to the K video type tags, wherein the video processing modes have a corresponding relation with the video type tags, each video processing mode comprises at least two processing sub-modes, and each processing sub-mode comprises at least one of a picture quality processing mode and a content processing mode;
and processing the video to be processed by adopting the K video processing modes to output a target video.
2. The method of claim 1, wherein the K video type tags include a primary video type tag and at least one secondary video type tag, and wherein determining K video processing modes based on the K video type tags comprises:
determining a main video processing mode according to the main video type label;
determining at least one secondary video processing mode according to the at least one secondary video type tag;
the processing the video to be processed by adopting the K video processing modes comprises:
processing all content objects of video frames in the video to be processed by adopting the main video processing mode;
processing partial content objects of video frames in the video to be processed by adopting the secondary video processing mode; wherein the secondary video type tag is obtained by the partial content object.
3. The method according to claim 1, wherein the performing video content detection on the video to be processed to obtain a video content detection result comprises:
acquiring at least one key video frame in the video to be processed;
determining all content objects of the at least one key video frame;
determining a main content object and at least one secondary content object in all the content objects according to the number of pixel points occupied by all the content objects;
the acquiring the K video type labels corresponding to the video to be processed according to the video content detection result includes:
and determining the video type label corresponding to the main content object as the main video type label, and determining the video type label corresponding to the at least one secondary content object as the at least one secondary video type label.
4. The method according to claim 1, wherein the performing video content detection on the video to be processed to obtain a video content detection result comprises:
acquiring at least one key video frame in the video to be processed;
determining all content objects of the at least one key video frame;
determining the content object with the highest priority in all the content objects as a main content object according to the priority levels corresponding to all the content objects; determining at least one secondary content object according to the priority level;
the acquiring the K video type labels corresponding to the video to be processed according to the video content detection result includes:
and determining the video type label corresponding to the main content object as the main video type label, and determining the video type label corresponding to the at least one secondary content object as the at least one secondary video type label.
5. The method according to claim 1, wherein K is equal to 1, and the obtaining K video type tags corresponding to the to-be-processed video according to the video content detection result comprises:
periodically intercepting a plurality of video frames in the video to be processed according to a preset frequency;
determining a plurality of video type labels corresponding to the plurality of video frames; the video frames correspond to the video type labels one by one;
and determining the video type label with the highest frequency of occurrence in the plurality of video type labels as the video type label corresponding to the video to be processed.
6. The method of claim 5, wherein determining a plurality of video type tags corresponding to the plurality of video frames comprises:
determining a plurality of content objects in each of the plurality of video frames;
determining the weight values of a plurality of content objects in each video frame, and determining the video type label corresponding to the content object with the highest weight in the plurality of content objects as the video type label corresponding to each video frame.
7. The method according to claim 1, wherein the obtaining K video type tags corresponding to the video to be processed according to the video content detection result comprises:
inputting the plurality of video frames to an image tag model;
and determining the K video type labels corresponding to the video to be processed according to the output of the image label model.
8. The method of any of claims 1 to 7, wherein the K video type tags comprise a people type tag, and the video processing modes corresponding to the people type tag comprise at least two of filtering processing, liquefaction processing, or brightness adjustment;
the processing the video to be processed by adopting the K video processing modes comprises:
determining a character content object in the video to be processed according to the character type label;
and performing at least two of filtering processing, liquefaction processing and brightness adjustment processing on the character content object.
9. The method of any one of claims 1 to 7, wherein the K video type tags comprise a gourmet type tag, and the video processing modes corresponding to the gourmet type tag comprise at least two of color temperature adjustment, saturation adjustment, or adding filters;
the processing the video to be processed by adopting the K video processing modes comprises:
determining a food content object in the video to be processed according to the gourmet type tag;
performing at least two of a color temperature adjustment process, a saturation adjustment process, or a filter addition process on the food content object.
10. The method according to any one of claims 1 to 7, wherein the K video type tags comprise a night scene type tag, and the video processing mode corresponding to the night scene type tag comprises at least two of brightness adjustment, saturation adjustment or denoising processing;
the processing the video to be processed by adopting the K video processing modes comprises:
determining a plurality of night scene video frames in the video to be processed according to the night scene type label;
and performing at least two of brightness adjustment processing, saturation adjustment processing or denoising processing on the plurality of night scene video frames.
11. The method according to any one of claims 1 to 7, wherein the K video type tags comprise an indoor type tag, and the video processing mode corresponding to the indoor type tag comprises at least two of brightness adjustment, saturation adjustment or white balance adjustment;
the processing the video to be processed by adopting the K video processing modes comprises:
determining a plurality of indoor video frames in the video to be processed according to the indoor type labels;
at least two of brightness adjustment processing, saturation adjustment processing, or white balance adjustment processing are performed on the plurality of indoor video frames.
12. The method according to any one of claims 1 to 7, wherein the K video type tags comprise a plant type tag, and the video processing mode corresponding to the plant type tag comprises contrast adjustment and filter addition;
the processing the video to be processed by adopting the K video processing modes comprises:
determining a plant content object in the video to be processed according to the plant type label;
and carrying out contrast adjustment processing and filter adding processing on the plant content object.
13. A video processing apparatus, characterized in that the video processing apparatus comprises:
an acquisition unit configured to acquire a video processing instruction for a video to be processed;
the detection unit is used for detecting the video content of the video to be processed to obtain a video content detection result;
the obtaining unit is further configured to, in response to the video processing instruction, obtain K video type tags corresponding to the video to be processed according to the video content detection result, where K is an integer greater than or equal to 1;
the device comprises a determining unit, a processing unit and a processing unit, wherein the determining unit is used for determining K video processing modes according to the K video type tags, the video processing modes and the video type tags have corresponding relations, each video processing mode comprises at least two processing sub-modes, and each processing sub-mode comprises at least one of a picture quality processing mode and a content processing mode;
and the processing unit is used for processing the video to be processed by adopting the K video processing modes so as to output a target video.
14. A computer device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute a program in the memory to implement the method of any one of claims 1 to 12;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
CN202110245053.XA 2021-03-05 2021-03-05 Video processing method and device, computer equipment and storage medium Pending CN113709385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245053.XA CN113709385A (en) 2021-03-05 2021-03-05 Video processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245053.XA CN113709385A (en) 2021-03-05 2021-03-05 Video processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113709385A true CN113709385A (en) 2021-11-26

Family

ID=78647856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245053.XA Pending CN113709385A (en) 2021-03-05 2021-03-05 Video processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113709385A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101554047A (en) * 2006-08-04 2009-10-07 先进微装置公司 Video display mode control
CN103685965A (en) * 2012-09-18 2014-03-26 浙江大华技术股份有限公司 Method and equipment for processing image information
CN107832397A (en) * 2017-10-30 2018-03-23 努比亚技术有限公司 A kind of image processing method, device and computer-readable recording medium
CN110163050A (en) * 2018-07-23 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency and device, terminal device, server and storage medium
CN111416950A (en) * 2020-03-26 2020-07-14 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218436A (en) * 2021-12-20 2022-03-22 天翼爱音乐文化科技有限公司 Video searching method, system, device and medium based on content ratio
CN115150661A (en) * 2022-06-23 2022-10-04 深圳市大头兄弟科技有限公司 Method and related device for packaging video key fragments
CN115150661B (en) * 2022-06-23 2024-04-09 深圳市闪剪智能科技有限公司 Method and related device for packaging video key fragments
CN117437163A (en) * 2023-10-31 2024-01-23 奕行智能科技(广州)有限公司 Image enhancement method, image processing chip and image enhancement video system

Similar Documents

Publication Publication Date Title
EP3989166A1 (en) Artificial intelligence-based image region recognition method and apparatus, and model training method and apparatus
WO2020177582A1 (en) Video synthesis method, model training method, device and storage medium
CN108304758B (en) Face characteristic point tracking method and device
CN111556278B (en) Video processing method, video display device and storage medium
EP3370204B1 (en) Method for detecting skin region and device for detecting skin region
CN113709385A (en) Video processing method and device, computer equipment and storage medium
CN110704661B (en) Image classification method and device
CN111582116B (en) Video erasing trace detection method, device, equipment and storage medium
CN110544488A (en) Method and device for separating multi-person voice
CN110738211A (en) object detection method, related device and equipment
CN110517339B (en) Animation image driving method and device based on artificial intelligence
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN110995810B (en) Object identification method based on artificial intelligence and related device
CN110166828A (en) A kind of method for processing video frequency and device
CN111009031B (en) Face model generation method, model generation method and device
WO2020108457A1 (en) Control method for target object, apparatus, device, and storage medium
CN110263729A (en) A kind of method of shot boundary detector, model training method and relevant apparatus
CN111292394A (en) Image color matching relationship determination method and device
CN107704514A (en) A kind of photo management method, device and computer-readable recording medium
CN113723159A (en) Scene recognition model training method, scene recognition method and model training device
CN112203115A (en) Video identification method and related device
CN114339375A (en) Video playing method, method for generating video directory and related product
CN110083742B (en) Video query method and device
CN113569889A (en) Image recognition method based on artificial intelligence and related device
CN112449098B (en) Shooting method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination