CN117291929A - Video processing method, device, electronic equipment and storage medium - Google Patents

Video processing method, device, electronic equipment and storage medium

Info

Publication number
CN117291929A
Authority
CN
China
Prior art keywords
video
frame image
detection
key frame
target
Prior art date
Legal status
Pending
Application number
CN202310988395.XA
Other languages
Chinese (zh)
Inventor
南秀
陈妙
张宝玉
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310988395.XA priority Critical patent/CN117291929A/en
Publication of CN117291929A publication Critical patent/CN117291929A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure provides a video processing method and apparatus, an electronic device, and a storage medium, relating to artificial intelligence fields such as computer vision, image processing, deep learning, and cloud computing. The method may include: acquiring a video to be processed and performing shot segmentation on it to obtain at least one video segment, where different video segments correspond to different shots; and, for any video segment, performing the following processing: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video. Applying the disclosed scheme can improve, among other things, the quality of the converted video.

Description

Video processing method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular to a video processing method and apparatus, an electronic device, and a storage medium in fields such as computer vision, image processing, deep learning, and cloud computing.
Background
Users can now produce videos with a variety of techniques, but the resulting videos often differ in size, which affects subsequent use. The produced videos therefore require post-processing, i.e., video processing, to convert them to a predetermined size.
Disclosure of Invention
The disclosure provides a video processing method, a video processing device, electronic equipment and a storage medium.
A video processing method, comprising:
acquiring a video to be processed, and performing shot segmentation on the video to be processed to obtain at least one video segment, wherein different video segments respectively correspond to different shots;
for any video segment, the following processing is performed: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
A video processing apparatus comprising: the device comprises a segmentation module and a conversion module;
the segmentation module is used for acquiring a video to be processed, and performing shot segmentation on the video to be processed to obtain at least one video segment, wherein different video segments respectively correspond to different shots;
the conversion module is configured to perform, for any video segment, the following processing: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described above.
A computer program product comprising computer programs/instructions which when executed by a processor implement a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a flowchart of an embodiment of a video processing method according to the present disclosure;
Fig. 2 is a schematic diagram of the overall implementation of the video processing method of the present disclosure;
Fig. 3 is a schematic structural diagram of a first embodiment 300 of a video processing apparatus according to the present disclosure;
Fig. 4 is a schematic structural diagram of a second embodiment 400 of a video processing apparatus according to the present disclosure;
Fig. 5 is a schematic structural diagram of a third embodiment 500 of a video processing apparatus according to the present disclosure;
Fig. 6 is a schematic block diagram of an electronic device 600 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B both exist, or B exists alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a flowchart of an embodiment of a video processing method according to the present disclosure. As shown in Fig. 1, the method includes the following steps.
In step 101, a video to be processed is obtained, and shot segmentation is performed on the video to be processed to obtain at least one video segment, where different video segments respectively correspond to different shots.
In step 102, for any video segment, the following processing is performed: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
In the conventional approach, size conversion of a video does not analyze the picture content: each frame image is directly cropped to the required size, so much effective information is lost and the quality of the converted video is reduced.
With the scheme of this method embodiment, subject target detection can be performed on the key frames in a video segment, and each frame image in the segment can then be cropped based on the detected subject target and the crop size; that is, images are cropped based on an analysis of the picture content, which reduces the loss of effective information and improves the quality of the converted video. In addition, shot segmentation avoids inaccurate subject target detection caused by subject changes across shot transitions, further improving the converted video quality. Moreover, the whole process can run fully automatically without manual involvement, saving labor and time costs and improving processing efficiency.
The video to be processed in the present disclosure may be a directly obtained video, or a sub-video obtained by splitting a longer video.
For the video to be processed, shot segmentation can first be performed to obtain at least one video segment, with different video segments corresponding to different shots. Preferably, shot boundary detection (SBD) may be performed on the video to be processed, and shot segmentation may be carried out according to the detection result.
A shot is a continuous take by the camera, representing a set of actions that are continuous in time and space; it is a combination of a series of interrelated consecutive frames. With shot boundary detection, a video can be segmented by shot to obtain video segments corresponding to different shots, laying a good foundation for subsequent processing.
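To make this concrete, the sketch below splits a video into shots; the disclosure does not name a specific SBD algorithm, so the HSV-histogram correlation measure and the 0.5 boundary threshold here are illustrative assumptions (Python with OpenCV).

```python
import cv2

def split_shots(video_path, threshold=0.5):
    """Return (start_frame, end_frame) index pairs, one pair per shot."""
    cap = cv2.VideoCapture(video_path)
    shots, start, prev_hist, idx = [], 0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
                cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            shots.append((start, idx - 1))  # low correlation => shot boundary
            start = idx
        prev_hist = hist
        idx += 1
    cap.release()
    if idx:
        shots.append((start, idx - 1))
    return shots
```

A boundary is declared wherever the color histograms of consecutive frames correlate poorly; production SBD methods are typically more robust to flashes and fast motion.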
If the video to be processed contains only one shot, only one video segment is obtained, namely the video to be processed itself; otherwise, multiple video segments are obtained.
Each video segment can then be handled in the same way: the frame images forming the segment are divided into key frame images and non-key frame images, each frame image in the segment is cropped according to the subject target detected from the key frame images and a preset crop size to obtain cropped images, and the video segment formed by the cropped images serves as a required target video. In other words, the video segments are size-converted separately, and each conversion result serves as a required target video.
Preferably, for any video segment, the first frame image in the segment may be determined as a key frame image, and each frame image after the first may be traversed; for each traversed image, the following processing is performed: determine the similarity between the traversed image and the adjacent previous frame image, determine the traversed image as a non-key frame image in response to the similarity being greater than a first threshold, and determine it as a key frame image in response to the similarity being less than or equal to the first threshold.
For example, if the similarity between the second frame image and the first frame image is greater than the first threshold, the second frame image is determined to be a non-key frame image. If the similarity between the third frame image and the second frame image is also greater than the first threshold, the third frame image is likewise a non-key frame image. If the similarity between the fourth frame image and the third frame image is less than or equal to the first threshold, the fourth frame image is determined to be a key frame image.
For any video segment, the number of key frame images may be one or more.
How the similarity between two frame images is obtained is not limited; various mature existing implementations can be used. Likewise, the specific value of the first threshold is not limited and can be determined according to actual needs.
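As an illustration, the key-/non-key-frame split described above might look as follows; the disclosure does not fix a similarity measure or threshold value, so grayscale SSIM and the 0.9 first threshold are assumptions.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

def classify_frames(frames, first_threshold=0.9):
    """frames: list of BGR images of one shot; returns a 'key'/'non-key' label per frame."""
    labels = ["key"]  # the first frame image of a segment is always a key frame
    for prev, cur in zip(frames, frames[1:]):
        sim = ssim(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY),
                   cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY))
        # similarity > first threshold => non-key frame; otherwise a new key frame
        labels.append("non-key" if sim > first_threshold else "key")
    return labels
```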
Subject target detection may then be performed on each key frame image separately. Preferably, the subject target may be detected from each key frame image by salient region detection.
The subject target is the object mainly displayed in the image, such as a person, an animal, or a vehicle; there may be one or more subject targets. Salient region detection can detect the required subject target efficiently and accurately.
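One possible realization is sketched below; the disclosure only calls for "salient region detection", so using the spectral-residual saliency detector from opencv-contrib-python and taking the largest salient blob as the subject target are assumptions.

```python
import cv2
import numpy as np

def detect_subject_bbox(key_frame):
    """Return (x, y, w, h) of the largest salient region, or None."""
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(key_frame)
    if not ok:
        return None
    mask = (saliency_map * 255).astype(np.uint8)
    # Binarize with Otsu's threshold, then keep the biggest salient blob.
    _, mask = cv2.threshold(mask, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```

A dedicated detection model (e.g., for faces or people) could equally serve here, since the subject target may be a person, an animal, a vehicle, and so on.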
It can also be seen that, by distinguishing key frame images from non-key frame images, subject target detection need only be performed on the key frame images; compared with detecting the subject target in every frame image, this improves processing efficiency and reduces resource consumption.
In addition, preferably, when a traversed image is determined to be a non-key frame image, the most recently determined key frame image may be taken as the key frame image corresponding to that non-key frame image; that is, the key frame image nearest to a non-key frame image is the one it corresponds to. Accordingly, cropping each frame image in the video segment according to the subject target detected from the key frame images and the preset crop size may include: for the subject target detected from any key frame image, performing the following processing: according to the subject target and the crop size, cropping the key frame image in which the subject target is located and the non-key frame images corresponding to that key frame image, following the principle of including as much effective information of the subject target as possible while conforming to the crop size.
For example, suppose image a is a key frame image and images b and c are its corresponding non-key frame images. Images a, b, and c are each cropped following the principle that, on the premise of conforming to the crop size, the result should include as much effective information of the subject target detected from image a as possible, yielding three cropped frames that conform to the crop size and contain a large amount of the subject target's effective information. "As much effective information as possible" means the content of the cropped image should contain the subject target as completely, or over as large a range, as possible.
This processing minimizes the loss of effective information caused by cropping and further improves the quality of the converted video.
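A sketch of such subject-centered cropping follows; the exact placement strategy is not specified in the disclosure, so centering the crop window on the subject target and then clamping it to the image bounds is an assumption, as is the premise that the crop size fits within the frame.

```python
def crop_around_subject(frame, bbox, crop_w, crop_h):
    """Crop `frame` to crop_w x crop_h, keeping the subject bbox as centered
    as the image bounds allow (assumes crop_w <= frame width, crop_h <= frame height)."""
    h, w = frame.shape[:2]
    x, y, bw, bh = bbox
    cx, cy = x + bw // 2, y + bh // 2  # subject-target center
    left = max(0, min(cx - crop_w // 2, w - crop_w))
    top = max(0, min(cy - crop_h // 2, h - crop_h))
    return frame[top:top + crop_h, left:left + crop_w]
```

The same window, computed once from the key frame image, would then be applied to that key frame and to all of its corresponding non-key frame images.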
For convenience of description, a converted video is referred to as a target video. Preferably, quality detection can be performed on each obtained target video; target videos that pass detection can be stored in a material library, and those that fail can be discarded.
Quality detection further improves the quality of the target videos, ensuring that those stored in the material library meet the quality requirements, which facilitates subsequent use.
Preferably, for any target video, M detection modes may be used to check its quality; the target video is determined to pass in response to all M detection modes passing, and to fail in response to any detection mode failing, where M is a positive integer greater than one.
Preferably, for any target video, checking its quality with any one detection mode may include: performing quality detection on each cropped image in the target video with that detection mode and counting the number of cropped images that pass; the target video passes that detection mode in response to the count being greater than a second threshold, and fails it in response to the count being less than or equal to the second threshold.
The specific value of the second threshold can be determined according to actual needs, for example 90%.
For instance, suppose a target video contains 20 frame images and a certain detection mode is applied to all 20; if 19 images pass, which is greater than the second threshold, the quality detection of the target video with that mode is determined to pass; otherwise it is determined to fail.
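The per-mode check can be sketched as follows; the detector callables are placeholders for the concrete checks, and interpreting the second threshold as a pass rate (matching the 90% example above) is an assumption.

```python
def mode_passes(cropped_images, detector, second_threshold=0.9):
    """detector(image) -> bool. The detection mode passes for the whole target
    video when the pass rate of its cropped images exceeds the second threshold."""
    passed = sum(1 for img in cropped_images if detector(img))
    return passed / len(cropped_images) > second_threshold
```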
For any target video, if all M detection modes pass, the target video is determined to pass; otherwise it is determined to fail.
In this way, the quality of a target video is checked along M different dimensions, and the video passes only when every dimension passes, further improving the quality of the target videos stored in the material library.
Preferably, when checking any target video with the M detection modes, the modes may be executed sequentially in a predetermined order, stopping execution of the remaining modes in response to the currently executed mode failing; alternatively, the M detection modes may be executed in parallel. The predetermined order itself is not limited.
Since a target video passes only when all M detection modes pass, when the modes are executed sequentially, processing can end as soon as any mode fails and the target video can be directly determined to fail, saving subsequent resource consumption.
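The sequential, early-exit variant might look like this, reusing mode_passes from the earlier sketch; the entries of `detectors` (picture cut-off, text cut-off, and so on) are placeholder callables.

```python
def video_passes(cropped_images, detectors, second_threshold=0.9):
    """Run the M detection modes in their predetermined order; stop at the
    first failure, since one failed mode already fails the target video."""
    for detector in detectors:
        if not mode_passes(cropped_images, detector, second_threshold):
            return False  # skip the remaining modes
    return True  # all M modes passed
```

For independent, similarly expensive checks, running the modes in parallel instead trades extra resource use for lower latency.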
The specific value of M can be determined according to actual needs, as can which detection modes the M modes comprise; for example, they may include picture cut-off detection, text cut-off detection, no-value detection, black-edge detection, frame detection, and sharpness detection.
Picture cut-off detection checks whether the picture is complete; for example, if a face appears in the picture but only the area above the nose is shown, the picture is considered incomplete.
Text cut-off detection checks whether text in the picture is complete; for example, a truncated subtitle means the text is incomplete.
No-value detection checks whether the picture content is valuable, i.e., whether it is useful for business requirements, such as whether it is needed in the material library.
Black-edge detection checks whether large black borders appear in the picture.
Frame detection checks whether a border appears in the picture.
Sharpness detection checks whether the picture is clear, i.e., whether the sharpness meets requirements.
How each of the above detection modes is implemented is not limited. For example, corresponding detection models may be trained in advance, with an image fed to a detection model and the detection result obtained as output; alternatively, the detection modes may be implemented by combining image detection techniques with preset detection rules.
Preferably, in response to obtaining a retrieval request from a user, a target video matching the request can be determined from the material library and returned to the user.
For example, the user can input a keyword to search; target videos matching the keyword are returned to the user as search results, and the user can use them as creation material to produce videos or other content, which is convenient for the user.
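A toy retrieval sketch follows; the metadata schema and keyword-in-tags matching are assumptions, since the disclosure only requires returning material-library videos matched to the request.

```python
def search_material_library(library, keyword):
    """library: iterable of dicts such as {"path": ..., "tags": [...]}.
    Returns the target videos whose tags contain the keyword."""
    return [item for item in library if keyword in item.get("tags", [])]
```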
In connection with the above description, Fig. 2 is a schematic diagram of the overall implementation of the video processing method of the present disclosure. As shown in Fig. 2, for a video to be processed, shot segmentation, key-/non-key-frame division, subject target detection, and image cropping may be performed in sequence to obtain one or more target videos; quality detection is then performed on each target video using the M detection modes, and target videos that pass are added to the material library, on the basis of which services such as search can be provided to users.
In addition, in practical applications, the disclosed scheme supports traffic introduction, can define dependencies among different operations through graph-engine technology, and can schedule the processing flow.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of action combinations, but those skilled in the art will appreciate that the present disclosure is not limited by the described order of actions, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present disclosure.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the present disclosure through examples of apparatus.
Fig. 3 is a schematic structural diagram of a first embodiment 300 of a video processing apparatus according to the present disclosure. As shown in Fig. 3, the apparatus includes a segmentation module 301 and a conversion module 302.
The segmentation module 301 is configured to obtain a video to be processed, and segment the video to be processed by shots to obtain at least one video segment, where different video segments respectively correspond to different shots.
The conversion module 302 is configured to perform the following processing for any video segment: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
With the scheme of this apparatus embodiment, subject target detection can be performed on the key frames in a video segment, and each frame image in the segment can then be cropped based on the detected subject target and the crop size; that is, images are cropped based on an analysis of the picture content, which reduces the loss of effective information and improves the quality of the converted video. In addition, shot segmentation avoids inaccurate subject target detection caused by subject changes across shot transitions, further improving the converted video quality. Moreover, the whole process can run fully automatically without manual involvement, saving labor and time costs and improving processing efficiency.
Preferably, the segmentation module 301 may perform shot boundary detection on the video to be processed and perform shot segmentation according to the detection result; and/or the conversion module 302 may detect the subject target from any key frame image by salient region detection.
Preferably, for any video segment, the conversion module 302 may determine the first frame image in the segment as a key frame image and traverse each frame image after the first; for each traversed image, the following processing is performed: determine the similarity between the traversed image and the adjacent previous frame image, determine the traversed image as a non-key frame image in response to the similarity being greater than a first threshold, and determine it as a key frame image in response to the similarity being less than or equal to the first threshold.
In addition, preferably, when a traversed image is determined to be a non-key frame image, the conversion module 302 may take the most recently determined key frame image as the key frame image corresponding to that non-key frame image, i.e., the key frame image nearest to the non-key frame image. Accordingly, cropping each frame image in the video segment according to the subject target detected from the key frame images and the preset crop size may include: for the subject target detected from any key frame image, performing the following processing: according to the subject target and the crop size, cropping the key frame image in which the subject target is located and the non-key frame images corresponding to that key frame image, following the principle of including as much effective information of the subject target as possible while conforming to the crop size.
Fig. 4 is a schematic structural diagram of a second embodiment 400 of a video processing apparatus according to the present disclosure. As shown in Fig. 4, the apparatus includes a segmentation module 301, a conversion module 302, and a detection module 303.
The segmentation module 301 and the conversion module 302 are the same as in the foregoing embodiment and are not described again.
The detection module 303 is configured to perform quality detection on each target video, store target videos that pass detection into the material library, and discard target videos that fail.
Preferably, the detection module 303 checks the quality of any target video using M detection modes, determines that the target video passes in response to all M detection modes passing, and determines that it fails in response to any detection mode failing, where M is a positive integer greater than one.
Which detection modes the M modes comprise can be determined according to actual needs; for example, they may include picture cut-off detection, text cut-off detection, no-value detection, black-edge detection, frame detection, and sharpness detection.
Preferably, for any target video, the detection module 303 checking its quality with any one detection mode may include: performing quality detection on each cropped image in the target video with that detection mode and counting the number of cropped images that pass; the target video passes that detection mode in response to the count being greater than a second threshold, and fails it in response to the count being less than or equal to the second threshold.
In addition, preferably, when checking any target video with the M detection modes, the detection module 303 may execute the modes sequentially in a predetermined order, stopping execution of the remaining modes in response to the currently executed mode failing; alternatively, the M detection modes may be executed in parallel. The predetermined order itself is not limited.
Fig. 5 is a schematic structural diagram of a third embodiment 500 of a video processing apparatus according to the present disclosure. As shown in Fig. 5, the apparatus includes a segmentation module 301, a conversion module 302, a detection module 303, and a service module 304.
The segmentation module 301, the conversion module 302, and the detection module 303 are the same as in the foregoing embodiments and are not described again.
The service module 304 is configured to, in response to obtaining a retrieval request from a user, determine a target video matching the request from the material library and return it to the user.
For example, the user may input a keyword to search; target videos matching the keyword are returned to the user as search results, and the user may use them as creation material to produce videos or other content.
For the specific workflow of the apparatus embodiments shown in Figs. 3 to 5, refer to the related description in the foregoing method embodiment, which is not repeated here.
In short, the disclosed scheme can improve the quality of the converted video, improve processing efficiency, and convert a video to any required size, making it very flexible and convenient.
The disclosed scheme can be applied in the field of artificial intelligence, particularly in fields such as computer vision, image processing, deep learning, and cloud computing. Artificial intelligence is the discipline of making computers simulate human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), with technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
In addition, the videos in the embodiments of the present disclosure are not specific to any particular user and do not reflect any particular user's personal information. In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 shows a schematic block diagram of an electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in the present disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. One or more steps of the methods described in this disclosure may be performed when a computer program is loaded into RAM 603 and executed by computing unit 601. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the methods described in the present disclosure in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A video processing method, comprising:
acquiring a video to be processed, and performing shot segmentation on the video to be processed to obtain at least one video segment, wherein different video segments respectively correspond to different shots;
for any video segment, the following processing is performed: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
2. The method of claim 1, wherein,
the performing lens segmentation on the video to be processed comprises the following steps: performing shot boundary detection on the video to be processed, and performing shot segmentation on the video to be processed according to a detection result;
and/or, the subject target detected from any key frame image includes: and detecting a main object from the key frame image by a salient region detection mode.
3. The method of claim 1, wherein,
the dividing each frame image constituting the video clip into key frame images and non-key frame images includes:
determining a first frame image in the video clip as the key frame image;
traversing each frame image after the first frame image, and respectively executing the following processing for each traversed image: determining a similarity between the traversed image and an adjacent previous frame image, determining the traversed image as the non-key frame image in response to the similarity being greater than a first threshold, and determining the traversed image as the key frame image in response to the similarity being less than or equal to the first threshold.
4. A method according to claim 3, further comprising:
in response to the traversed image being determined as the non-key frame image, taking the most recently determined key frame image as the key frame image corresponding to the non-key frame image;
wherein the cropping each frame image in the video segment according to the subject target detected from the key frame image and the preset crop size comprises:
for a subject target detected from any one of the key frame images, performing the following processing: according to the subject target and the crop size, cropping the key frame image in which the subject target is located and the non-key frame image corresponding to that key frame image, following the principle of including as much effective information of the subject target as possible on the premise of conforming to the crop size.
5. The method of any one of claims 1-4, further comprising:
and respectively carrying out quality detection on each target video, storing the detected target videos into a material library, and discarding the target videos which are not detected.
6. The method of claim 5, further comprising:
and responding to the search request of the user, determining a target video matched with the search request from the material library, and returning to the user.
7. The method of claim 5, wherein,
the quality detection of each target video comprises the following steps:
and aiming at any target video, respectively carrying out quality detection on the target video by using M detection modes, determining that the target video passes through the detection in response to the detection passing of the M detection modes, and determining that the target video fails to pass through the detection in response to the detection failing of any detection mode, wherein M is a positive integer larger than one.
8. The method of claim 7, wherein,
the quality detection of the target video by any detection mode comprises the following steps: and respectively carrying out quality detection on each clipping image in the target video by using the detection mode, counting the number of clipping images passing detection, and determining that the detection of the target video by using the detection mode passes in response to the number being larger than a second threshold value.
9. The method of claim 7, wherein,
the quality detection of the target video by using M detection modes respectively comprises the following steps: the M detection modes are sequentially executed in a predetermined order, and the detection modes after the execution are stopped or the M detection modes are executed in parallel in response to the detection failure of the currently executed detection mode.
10. A video processing apparatus comprising: the device comprises a segmentation module and a conversion module;
the segmentation module is used for acquiring a video to be processed, and performing shot segmentation on the video to be processed to obtain at least one video segment, wherein different video segments respectively correspond to different shots;
the conversion module is configured to perform, for any video segment, the following processing: dividing the frame images forming the video segment into key frame images and non-key frame images, cropping each frame image in the video segment according to a subject target detected from the key frame images and a preset crop size to obtain cropped images, and taking the video segment formed by the cropped images as a required target video.
11. The apparatus of claim 10, wherein,
the segmentation module carries out shot boundary detection on the video to be processed, and carries out shot segmentation on the video to be processed according to a detection result;
and/or, the subject target detected from any key frame image includes: and detecting a main object from the key frame image by a salient region detection mode.
12. The apparatus of claim 10, wherein,
the conversion module determines a first frame image in the video clip as the key frame image, traverses each frame image after the first frame image, and respectively executes the following processing for each traversed image: determining a similarity between the traversed image and an adjacent previous frame image, determining the traversed image as the non-key frame image in response to the similarity being greater than a first threshold, and determining the traversed image as the key frame image in response to the similarity being less than or equal to the first threshold.
13. The apparatus of claim 12, wherein,
the conversion module is further used for responding to the traversed image to be determined as the non-key frame image, and taking the latest determined key frame image as the key frame image corresponding to the non-key frame image;
the conversion module performs the following processing respectively for a main object detected from any key frame image: and cutting the key frame image where the main body target is located and the non-key frame image corresponding to the key frame image where the main body target is located according to the principle that the main body target comprises as much effective information as possible on the premise of conforming to the cutting size according to the main body target and the cutting size.
14. The apparatus of any one of claims 10-13, further comprising:
the detection module is used for respectively carrying out quality detection on each target video, storing the detected target videos into the material library, and discarding the target videos which are not detected.
15. The apparatus of claim 14, further comprising:
and the service module is used for responding to the acquired search request of the user, determining the target video matched with the search request from the material library and returning the target video to the user.
16. The apparatus of claim 14, wherein,
the detection module is used for detecting the quality of any target video by using M detection modes, determining that the target video passes through the detection in response to the detection passing of the M detection modes, determining that the target video fails through the detection in response to the detection failing of any detection mode, and determining that the target video fails through the detection, wherein M is a positive integer larger than one.
17. The apparatus of claim 16, wherein,
the detection module detects the quality of each cut image in the target video by using any detection mode, counts the number of cut images passing detection, and determines the detection passing of the target video by using the detection mode in response to the number being larger than a second threshold.
18. The apparatus of claim 16, wherein,
the detection module sequentially executes M detection modes according to a preset sequence, and responds to the detection that the currently executed detection mode is not passed, the detection mode after the execution is stopped, or the M detection modes are executed in parallel.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-9.
CN202310988395.XA 2023-08-07 2023-08-07 Video processing method, device, electronic equipment and storage medium Pending CN117291929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310988395.XA CN117291929A (en) 2023-08-07 2023-08-07 Video processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117291929A 2023-12-26

Family

ID=89237900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310988395.XA Pending CN117291929A (en) 2023-08-07 2023-08-07 Video processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117291929A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination