CN111815656A - Video processing method, video processing device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN111815656A
CN111815656A (Application CN202010711781.0A)
Authority
CN
China
Prior art keywords
image
frame
frame image
rectangular
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010711781.0A
Other languages
Chinese (zh)
Other versions
CN111815656B (en)
Inventor
杨松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010711781.0A priority Critical patent/CN111815656B/en
Publication of CN111815656A publication Critical patent/CN111815656A/en
Application granted granted Critical
Publication of CN111815656B publication Critical patent/CN111815656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Abstract

Embodiments of the disclosure disclose a video processing method, a video processing device, an electronic device, and a computer-readable medium. One embodiment of the method comprises: in response to detecting that a first frame image is not a target frame image in a target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set. The embodiment increases the spatial proportion of the target organism within the region to be segmented, so that the target organism can be segmented simply, conveniently, and accurately in the frame images of the target video.

Description

Video processing method, video processing device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a video processing method and apparatus, an electronic device, and a computer-readable medium.
Background
Living-body segmentation (for example, of a human body) extracts a living body from an image, and most current techniques implement it with a Convolutional Neural Network (CNN). Living-body segmentation can be applied to scenarios such as replacing the background on a video terminal. However, current techniques segment poorly when the target organism occupies only a small area of the image. For example, when a video terminal applies a background-replacement special effect to the target organism in real time, segmentation may fail because the target organism is far away from the video terminal (e.g., 2-5 meters).
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose video processing methods, apparatuses, devices and computer readable media to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a video processing method, including: in response to detecting that a first frame image is not a target frame image in a target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
In a second aspect, some embodiments of the present disclosure provide a video processing apparatus, the apparatus comprising: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
One of the various embodiments of the present disclosure described above has the following advantageous effects. First, it is determined whether the first frame image is a target frame image in the target video, so that different segmentation methods can be applied accordingly. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined; this provides the basis on which the first frame image will be segmented. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set representing the position of the target object in the first frame image is generated; together, the connected region set and the first rectangular frame set accurately locate the target object in the first frame image. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, yielding a first segmented image set. Because the method locates the target object through the connected region set and the first rectangular frame set, the region to be segmented shrinks and the proportion of the target object within it increases. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1-2 are schematic diagrams of an application scenario of a video processing method of some embodiments of the present disclosure;
Fig. 3 is a flow diagram of some embodiments of a video processing method according to the present disclosure;
Fig. 4 is a flow diagram of further embodiments of a video processing method according to the present disclosure;
Fig. 5 is a schematic block diagram of some embodiments of a video processing apparatus according to the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of one application scenario of a video processing method according to some embodiments of the present disclosure.
As shown in fig. 1, reference numeral 102 denotes a target frame image in a target video, reference numeral 103 denotes a second frame image, and reference numeral 104 denotes a first frame image.
As shown in fig. 2, as an example, in response to detecting that the first frame image 104 is not the target frame image 102 in the target video, the electronic device 101 may determine the second segmented image 105 of the second frame image 103 associated with the above-described first frame image 104. Optionally, the electronic device 101 may determine the second segmented image 105 of the second frame image 103 through an image segmentation network. Then, the connected region 106 corresponding to the second frame image is determined from the second segmented image 105. Optionally, a binarized image may be obtained by binarizing the second segmented image; the white area in the binarized image is then taken as the connected region 106. Further, a first rectangular frame 107 representing the position of the target object in the first frame image 104 is generated from the connected region 106. Finally, a first segmented image 108 of the first frame image 104 is generated based on the first rectangular frame 107. Optionally, an image segmentation algorithm may be used to segment the image corresponding to the first rectangular frame 107, so as to obtain the first segmented image 108 of the first frame image 104.
It should be noted that the video processing method may be executed by the electronic device 101. The electronic device 101 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device. When it is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to fig. 3, a flow 300 of some embodiments of a video processing method according to the present disclosure is shown. The video processing method comprises the following steps:
step 301, in response to detecting that the first frame image is not the target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image.
In some embodiments, in response to detecting that the first frame image is not a target frame image in the target video, an executing subject of the video processing method (e.g., the electronic device 101 shown in fig. 2) may determine a second segmented image set of a second frame image associated with the first frame image. The first frame image and the second frame image may be obtained from the frame image sequence corresponding to the target video. The first frame image is in a time-series relationship with the second frame image; in the frame image sequence, the second frame image may be, for example, the frame image immediately preceding the first frame image. The target frame image may be an image designated in advance in the frame image sequence. It should be noted that target frame images may be marked in the frame image sequence at a predetermined frame interval.
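As a minimal illustration of the interval-based marking described above, the following Python sketch treats every N-th frame as a target frame; the interval of 5 is a hypothetical value, since the text does not specify one.

    # Hypothetical sketch: mark target (key) frames at a fixed interval.
    TARGET_FRAME_INTERVAL = 5  # assumed value; the text leaves this unspecified

    def is_target_frame(frame_index: int) -> bool:
        """Every TARGET_FRAME_INTERVAL-th frame of the sequence is a target frame."""
        return frame_index % TARGET_FRAME_INTERVAL == 0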
As an example, the first frame image may be input to a pre-trained target detection network to detect whether the first frame image is a target frame image in the target video. In response to detecting that the first frame image is not the target frame image, target object segmentation may be performed on the frame image immediately preceding the first frame image, and the obtained segmentation result is used as the second segmented image set of the second frame image.
Step 302, determining a connected region set corresponding to the second frame image based on the second segmented image set.
In some embodiments, the executing subject may determine the set of connected regions corresponding to the second frame image from the second segmented image set. A connected region (connected component) generally refers to an image area (blob) composed of foreground pixels that have the same pixel value and are adjacent to one another in the image. For example, the second frame image may be marked by locating the inner and outer contours of each connected region with the labeling algorithm of the open-source library cvBlob, so as to obtain the connected region set corresponding to the second frame image.
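The text cites the cvBlob labeling algorithm; as a stand-in, the following hedged Python sketch uses OpenCV's connected-component labeling to derive the connected region set from a segmented image. The single-channel mask format and the threshold of 127 are assumptions for illustration.

    import cv2
    import numpy as np

    def connected_regions(segmented: np.ndarray) -> np.ndarray:
        """Binarize a segmented image and label its connected regions.

        `segmented` is assumed to be a single-channel image in [0, 255]
        whose bright pixels belong to the target object.
        """
        _, binary = cv2.threshold(segmented, 127, 255, cv2.THRESH_BINARY)
        num_labels, _labels, stats, _centroids = cv2.connectedComponentsWithStats(
            binary, connectivity=8)
        # Row 0 of `stats` is the background; each remaining row describes one
        # connected region as (left, top, width, height, area).
        return stats[1:num_labels]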
Step 303, generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object, based on the connected region set.
In some embodiments, the executing subject may generate, based on the set of connected regions, a first set of rectangular frames corresponding to the first frame image and representing the position of the target object. As an example, first, a bounding frame for each connected region in the set of connected regions may be determined. Then, the minimum rectangular frame surrounding the bounding frames of the connected regions is determined as the rectangular frame corresponding to the first frame image.
In some optional implementations of some embodiments, generating a first set of rectangular frames corresponding to the first frame image and representing a corresponding position of the target object may include:
firstly, removing the connected regions with the area smaller than a preset threshold value in the connected region set. As an example, an average of the connected region areas in the set of connected regions may be determined first. And then, removing the communication regions with the area of the communication regions smaller than the average value in the communication region set.
And secondly, determining a surrounding frame of each communication area in the removed communication area set.
And a third step of generating a rectangular frame as a rectangular frame corresponding to the first frame image based on the bounding frame of each connected region. As an example, a minimum rectangular frame of the bounding frame that bounds each of the connected regions described above may be determined as the rectangular frame corresponding to the first frame image described above.
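A hedged sketch of the three steps above, continuing from the `stats` array produced by the earlier labeling sketch; filtering against the mean area follows the example given in the text and is only one possible choice of preset threshold.

    import numpy as np

    def rectangles_for_frame(stats: np.ndarray) -> list:
        """One bounding rectangle per surviving connected region, as
        (left, top, right, bottom)."""
        threshold = stats[:, 4].mean()  # example threshold: mean region area
        rects = []
        for left, top, width, height, area in stats:
            if area < threshold:
                continue  # first step: drop regions smaller than the threshold
            # second/third steps: the region's bounding frame becomes a rectangle
            rects.append((left, top, left + width, top + height))
        return rects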
Step 304, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, so as to obtain a first segmented image set.
In some embodiments, the executing entity may generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set. As an example, first, a total rectangular frame surrounding the first rectangular frame set may be determined from the rectangular frames in the set. Then, image segmentation is performed on the image corresponding to the total rectangular frame to generate the first segmented image of the first frame image, obtaining the first segmented image set. A sketch of this example follows.
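In Python, the example just described might look as follows; `segment_fn` stands in for whatever image segmentation algorithm is chosen and is an assumed callable, as is the (left, top, right, bottom) rectangle format.

    def segment_first_frame(frame, rects, segment_fn):
        """Crop the first frame to the total rectangle enclosing all
        rectangles in the first rectangular frame set, then segment the crop."""
        left = min(r[0] for r in rects)
        top = min(r[1] for r in rects)
        right = max(r[2] for r in rects)
        bottom = max(r[3] for r in rects)
        crop = frame[top:bottom, left:right]   # reduced region to segment
        mask = segment_fn(crop)                # first segmented image of the frame
        return (left, top, right, bottom), mask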
In some optional implementations of some embodiments, the method further comprises: first, the first segmented images in the first segmented image set may be combined to obtain a combined image; then, a target background is added to the combined image to obtain the combined image with the target background. As an example, combining the first segmented images in the first segmented image set may be performed by determining the position of each first segmented image and then combining them according to those positions. It should be noted that adding the target background to the combined first segmented images can achieve the effect of adding a special effect to the target object in the target video.
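A hedged sketch of the combination and background-replacement steps; it assumes each first segmented image is recorded as a ((left, top, right, bottom), mask) pair with a boolean foreground mask, and that `background` has the same size as the original frame.

    import numpy as np

    def composite_on_background(frame, segments, background):
        """Combine the segmented crops at their recorded positions, then keep
        the original pixels inside the combined mask and the target
        background everywhere else."""
        combined = np.zeros(frame.shape[:2], dtype=bool)
        for (left, top, right, bottom), mask in segments:
            combined[top:bottom, left:right] |= mask
        output = background.copy()
        output[combined] = frame[combined]  # target object over the new background
        return output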
In some optional implementations of some embodiments, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain the first segmented image set may be: inputting the image corresponding to each rectangular frame into a pre-trained image segmentation network and outputting a first segmented image, so as to obtain the first segmented image set. The image segmentation network may be one of the following: an FCN (Fully Convolutional Network), SegNet (a semantic segmentation network), the DeepLab semantic segmentation network, PSPNet (Pyramid Scene Parsing Network), or Mask R-CNN (Mask Region-based CNN, an image instance segmentation network).
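The text names the network families only; the following sketch shows how one of them, a DeepLab network with pretrained torchvision weights, could segment the image corresponding to one rectangular frame. The weight choice, the normalization constants, and the use of class index 15 ("person") are all assumptions beyond the text.

    import torch
    import torchvision.transforms.functional as TF
    from torchvision.models.segmentation import deeplabv3_resnet50

    model = deeplabv3_resnet50(weights="DEFAULT").eval()  # assumed weights

    def segment_crop(crop_rgb):
        """crop_rgb: HxWx3 uint8 array for one rectangular frame; returns a
        boolean mask that is True where the network predicts a person."""
        x = TF.to_tensor(crop_rgb).unsqueeze(0)        # 1x3xHxW in [0, 1]
        x = TF.normalize(x, mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])    # ImageNet statistics
        with torch.no_grad():
            logits = model(x)["out"]                   # 1x21xHxW class scores
        return (logits.argmax(1)[0] == 15).numpy()     # class 15: person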
Optionally, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set may include the following steps:
the first step is to crop the first frame image based on the rectangular frame corresponding to the first frame image, and generate a rectangular image set corresponding to the first frame image.
And secondly, performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
It can be seen from the foregoing embodiments that, first, whether the first frame image is the target frame image in the target video is determined, so that different segmentation methods can be applied accordingly. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined; this provides the basis on which the first frame image will be segmented. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set representing the position of the target object in the first frame image is generated; together, these accurately locate the target object in the first frame image. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, yielding a first segmented image set. Because the method locates the target object through the connected region set and the first rectangular frame set, the region to be segmented shrinks and the proportion of the target object within it increases. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
With continued reference to fig. 4, a flow 400 of further embodiments of a video processing method according to the present disclosure is shown. The video processing method comprises the following steps:
step 401, in response to detecting that the first frame image is not the target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image.
Step 402, determining a connected region set corresponding to the second frame image based on the second segmented image set.
Step 403, generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object, based on the connected region set.
Step 404, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, to obtain a first segmented image set.
In some embodiments, for the specific implementation and technical effects of steps 401-404, reference may be made to steps 301-304 in the embodiments corresponding to fig. 3, which are not repeated here.
Step 405, in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object.
In some embodiments, in response to detecting that the first frame image is the target frame image, an executing entity (e.g., the electronic device 101 shown in fig. 2) of the video processing method may perform target detection on the first frame image to obtain a second rectangular frame set representing the position of the target object. As an example, target detection may be performed on the first frame image using a Histogram of Oriented Gradients (HOG) to obtain the second rectangular frame set representing the position of the target object.
In some optional implementations of some embodiments, obtaining the second set of rectangular frames may be: inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the position of the target object. The target detection network may be one of the following: the SSD (Single Shot MultiBox Detector) algorithm, the R-CNN (Region-based Convolutional Neural Networks) algorithm, the Fast R-CNN algorithm, the SPP-Net (Spatial Pyramid Pooling Network) algorithm, the YOLO (You Only Look Once) algorithm, the FPN (Feature Pyramid Network) algorithm, the DCN (Deformable ConvNets) algorithm, or the RetinaNet target detection algorithm.
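As one concrete instance of the HOG option mentioned above, the following sketch uses OpenCV's default HOG people detector to produce the second rectangular frame set from a target frame; the detectMultiScale parameters are illustrative defaults rather than values from the text.

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def detect_people(frame_bgr):
        """Return detected person rectangles as (left, top, width, height)."""
        rects, _weights = hog.detectMultiScale(
            frame_bgr, winStride=(8, 8), padding=(8, 8), scale=1.05)
        return rects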
Step 406, generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set, so as to obtain a third segmented image set.
In some embodiments, the specific implementation of step 406 and the technical effect brought by the implementation may refer to step 304 in those embodiments corresponding to fig. 3, and are not described herein again.
As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 3, the flow 400 of the video processing method in some embodiments corresponding to fig. 4 highlights the specific steps of detecting that the first frame image is the target frame image and obtaining the third segmented image set. These embodiments therefore enable the target organism in the frame images of the target video to be segmented more accurately and effectively.
With continuing reference to fig. 5, as an implementation of the above-described method for the above-described figures, the present disclosure provides some embodiments of a video processing apparatus, which correspond to those of the above-described method embodiments of fig. 3, and which may be particularly applicable to various electronic devices.
As shown in fig. 5, the video processing apparatus 500 of some embodiments includes: a first determining unit 501, a second determining unit 502, a first generating unit 503, and a second generating unit 504. The first determining unit 501 is configured to determine a second segmented image set of a second frame image associated with the first frame image in response to detecting that the first frame image is not a target frame image in the target video. The second determining unit 502 is configured to determine a connected region set corresponding to the second frame image based on the second segmented image set. The first generating unit 503 is configured to generate, based on the set of connected regions, a first rectangular frame set corresponding to the first frame image and representing the position of the target object. The second generating unit 504 is configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
In some optional implementations of some embodiments, the apparatus 500 may further include: a combining unit and an adding unit (not shown in the figure). The combining unit may be configured to combine the first segmented images in the first segmented image set to obtain a combined image. The adding unit may be configured to add the target background to the combined image, resulting in the combined image after the target background is added.
In some optional implementations of some embodiments, the apparatus 500 may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, and obtain a second rectangular frame set representing a corresponding position of the target object. The third generating unit may be configured to generate a third divided image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third set of divided images.
In some optional implementations of some embodiments, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
In some optional implementations of some embodiments, the third generating unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, so as to obtain the third segmented image set.
In some optional implementations of some embodiments, the first generating unit 503 may be further configured to: remove the connected regions whose area is smaller than a preset threshold value from the connected region set; determine a bounding frame for each connected region in the remaining set; and generate a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
In some optional implementations of some embodiments, the second generating unit 504 may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set to generate a first segmented image, obtaining a first segmented image set.
It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 3. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the electronic device of fig. 2) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (hypertext transfer protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the apparatus, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to detecting that a first frame image is not a target frame image in a target video, determine a second segmented image set of a second frame image associated with the first frame image; determine a connected region set corresponding to the second frame image based on the second segmented image set; generate a set of rectangular frames corresponding to the first frame image and representing the position of the target object based on the set of connected regions; and generate a first segmented image of the first frame image based on each rectangular frame in the rectangular frame set to obtain a first segmented image set.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including a first determining unit, a second determining unit, a first generating unit, and a second generating unit. The names of these units do not in some cases limit the units themselves; for example, the first determining unit may also be described as a "unit that determines a second segmented image set of a second frame image associated with a first frame image in response to detecting that the first frame image is not a target frame image in the target video".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
According to one or more embodiments of the present disclosure, there is provided a video processing method including: in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object based on the connected region set; and generating a first segmentation image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmentation image set.
According to one or more embodiments of the present disclosure, the method further includes: combining each first segmentation image in the first segmentation image set to obtain a combined image; and adding a target background to the combined image to obtain the combined image with the target background.
According to one or more embodiments of the present disclosure, the method further includes: in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing the position of the target object; and generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmented image set.
According to one or more embodiments of the present disclosure, the performing, in response to detecting that the first frame image is the target frame image, target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object includes: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set includes: and inputting the first frame image into a pre-trained image segmentation network based on each rectangular frame, and outputting a first segmentation image to obtain a first segmentation image set.
According to one or more embodiments of the present disclosure, the generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object based on the connected region set includes: removing the connected regions whose area is smaller than a preset threshold value from the connected region set; determining a bounding frame for each connected region in the remaining set; and generating a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
According to one or more embodiments of the present disclosure, the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set includes: cropping the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
According to one or more embodiments of the present disclosure, there is provided a video processing apparatus including: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
According to one or more embodiments of the present disclosure, an apparatus may further include: a combining unit and an adding unit (not shown in the figure). The combining unit may be configured to combine the first segmented images in the first segmented image set to obtain a combined image. The adding unit may be configured to add the target background to the combined image, resulting in the combined image after the target background is added.
According to one or more embodiments of the present disclosure, an apparatus may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, and obtain a second rectangular frame set representing a corresponding position of the target object. The third generating unit may be configured to generate a third divided image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third set of divided images.
According to one or more embodiments of the present disclosure, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the third generating unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, so as to obtain the third segmented image set.
According to one or more embodiments of the present disclosure, the first generating unit may be further configured to: remove the connected regions whose area is smaller than a preset threshold value from the connected region set; determine a bounding frame for each connected region in the remaining set; and generate a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
According to one or more embodiments of the present disclosure, the second generating unit may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set to generate a first segmented image, obtaining a first segmented image set.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as described in any of the embodiments above.
The foregoing description presents only preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A video processing method, comprising:
in response to detecting that a first frame image is not a target frame image in a target video, determining a second set of segmented images of a second frame image associated with the first frame image;
determining a set of connected regions corresponding to the second frame image based on the second set of segmented images;
generating a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object based on the connected region set;
and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
2. The method of claim 1, wherein the method further comprises:
combining each first segmented image in the first segmented image set to obtain a combined image;
and adding a target background to the combined image to obtain the combined image with the target background.
3. The method of claim 1, wherein the method further comprises:
in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing the corresponding position of the target object;
and generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmented image set.
4. The method of claim 3, wherein the performing target detection on the first frame image in response to detecting that the first frame image is the target frame image to obtain a second rectangular frame set representing a corresponding position of the target object comprises:
and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
5. The method of claim 1, wherein the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set, comprises:
inputting the first frame image into a pre-trained image segmentation network based on each rectangular frame in the first rectangular frame set, and outputting a first segmented image to obtain a first segmented image set.
6. The method of claim 1, wherein the generating, based on the set of connected regions, a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object comprises:
removing the connected regions with the area smaller than a preset threshold value in the connected region set;
determining a bounding frame of each connected region in the remaining connected region set;
generating a rectangular frame as a first rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
7. The method of claim 6, wherein the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set, comprises:
based on the rectangular frame corresponding to the first frame image, cutting the first frame image to generate a rectangular image set corresponding to the first frame image;
and performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
8. A video processing apparatus comprising:
a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image;
a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set;
a first generating unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object;
a second generating unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any one of claims 1-7.
CN202010711781.0A 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium Active CN111815656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010711781.0A CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010711781.0A CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Publications (2)

Publication Number Publication Date
CN111815656A (en) 2020-10-23
CN111815656B (en) 2023-08-11

Family

ID=72862010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010711781.0A Active CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Country Status (1)

Country Link
CN (1) CN111815656B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107123131A * 2017-04-10 2017-09-01 安徽清新互联信息科技有限公司 A moving target detection method based on deep learning
CN107507221A * 2017-07-28 2017-12-22 天津大学 Moving object detection and tracking method combining the frame difference method and a Gaussian mixture model
CN108447049A * 2018-02-27 2018-08-24 中国海洋大学 A digital physiological organism segmentation method based on generative adversarial networks
CN108960090A * 2018-06-20 2018-12-07 腾讯科技(深圳)有限公司 Video image processing method and device, computer-readable medium and electronic equipment
CN110796664A * 2019-10-14 2020-02-14 北京字节跳动网络技术有限公司 Image processing method and device, electronic equipment and computer-readable storage medium


Also Published As

Publication number Publication date
CN111815656B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN110516678B (en) Image processing method and device
CN109829432B (en) Method and apparatus for generating information
CN109255337B (en) Face key point detection method and device
CN109377508B (en) Image processing method and device
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN108510084B (en) Method and apparatus for generating information
CN111325792B (en) Method, apparatus, device and medium for determining camera pose
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
CN111757100B (en) Method and device for determining camera motion variation, electronic equipment and medium
CN112150490A (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN115272182B (en) Lane line detection method, lane line detection device, electronic equipment and computer readable medium
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN113658196A (en) Method and device for detecting ship in infrared image, electronic equipment and medium
CN110636331B (en) Method and apparatus for processing video
CN110852250B (en) Vehicle weight removing method and device based on maximum area method and storage medium
CN112150491A (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN111783777A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112085733A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113435393B (en) Forest fire smoke root node detection method, device and equipment
CN111784709B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111815656B (en) Video processing method, apparatus, electronic device and computer readable medium
CN111898529B (en) Face detection method and device, electronic equipment and computer readable medium
CN111815654A (en) Method, apparatus, device and computer readable medium for processing image
CN112085035A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111726476A (en) Image processing method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant