CN111815656B - Video processing method, apparatus, electronic device and computer readable medium

Video processing method, apparatus, electronic device and computer readable medium

Info

Publication number
CN111815656B
CN111815656B (application CN202010711781.0A)
Authority
CN
China
Prior art keywords: image, frame image, frame, rectangular, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010711781.0A
Other languages
Chinese (zh)
Other versions
CN111815656A (en)
Inventor
杨松 (Yang Song)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd
Priority to CN202010711781.0A
Publication of CN111815656A
Application granted
Publication of CN111815656B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/187 Segmentation involving region growing, region merging, or connected component labelling
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence


Abstract

Embodiments of the present disclosure disclose video processing methods, apparatuses, electronic devices, and computer readable media. One embodiment of the method comprises: in response to detecting that a first frame image is not a target frame image in a target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set. This embodiment increases the spatial proportion of the target organism within the image region to be segmented, so that the target organism in the frame images of the target video can be segmented simply and accurately.

Description

Video processing method, apparatus, electronic device and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a video processing method, apparatus, electronic device, and computer readable medium.
Background
Organism segmentation (e.g., of the human body) extracts the organism from an image; most current techniques are based on convolutional neural networks (Convolutional Neural Networks, CNN). Organism segmentation can be applied to scenarios such as background replacement on a video terminal. However, current techniques segment poorly when the target organism occupies a relatively small area of the image. For example, when an algorithmic network runs in real time on a video terminal to replace the background behind a target organism, segmentation may fail because the target organism is relatively far from the video terminal (e.g., 2-5 meters).
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose video processing methods, apparatuses, devices, and computer readable media to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a video processing method, the method comprising: in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
In a second aspect, some embodiments of the present disclosure provide a video processing apparatus, the apparatus comprising: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, obtaining a first segmented image set.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as in any of the first aspects.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any of the first aspects.
One of the above embodiments of the present disclosure has the following advantageous effects. First, it is determined whether the first frame image is a target frame image in the target video; depending on the result, different segmentation modes may be used for the first frame image. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined and serves as a basis for segmenting the first frame image. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set corresponding to the first frame image and representing positions of the target object is generated; the position of the target object in the first frame image is thus determined accurately. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, obtaining a first segmented image set. Because the position of the target object in the first frame image is determined through the connected region set and the first rectangular frame set, the region to be segmented is reduced and the proportion of the target object within that region is increased. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIGS. 1-2 are schematic diagrams of one application scenario of a video processing method of some embodiments of the present disclosure;
FIG. 3 is a flow chart of some embodiments of a video processing method according to the present disclosure;
FIG. 4 is a flow chart of further embodiments of a video processing method according to the present disclosure;
FIG. 5 is a schematic structural diagram of some embodiments of a video processing apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they are to be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of one application scenario of a video processing method according to some embodiments of the present disclosure.
As shown in fig. 1, reference numeral 102 may be a target frame image 102 in a target video, reference numeral 103 may be a second frame image 103, and reference numeral 104 may be a first frame image 104.
As shown in fig. 2, as an example, in response to detecting that the first frame image 104 is not the target frame image 102 in the target video, the electronic device 101 may determine a second segmented image 105 of the second frame image 103 associated with the first frame image 104. Alternatively, the electronic device 101 may determine the second segmented image 105 through an image segmentation network. Then, the connected region 106 corresponding to the second frame image is determined from the second segmented image 105. Alternatively, a binarized image may be obtained by binarizing the second segmented image, and the black region in the binarized image is taken as the connected region 106. Further, a first rectangular frame 107 corresponding to the first frame image 104 and representing the position of the target object is generated from the connected region 106. Finally, a first segmented image 108 of the first frame image 104 is generated according to the first rectangular frame 107. Alternatively, an image segmentation algorithm may be used to segment the image corresponding to the first rectangular frame 107, so as to obtain the first segmented image 108 of the first frame image 104. A minimal sketch of this scenario follows.
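As a minimal illustrative sketch of this scenario in Python with OpenCV (the helper segment_crop is a hypothetical stand-in for the segmentation network and is not part of the disclosure; the previous frame's segmented image is assumed to be a single-channel mask):

```python
import cv2
import numpy as np

def segment_crop(crop: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the image segmentation network of the
    # disclosure (e.g. an FCN/DeepLab model); a fixed threshold keeps the
    # sketch self-contained and runnable.
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    return (gray > 127).astype(np.uint8) * 255

def process_non_target_frame(first_frame: np.ndarray,
                             second_segmented: np.ndarray) -> list:
    # Binarize the previous frame's segmented image; THRESH_BINARY_INV makes
    # the black target region described above the labelled foreground.
    _, binary = cv2.threshold(second_segmented, 127, 255,
                              cv2.THRESH_BINARY_INV)
    num, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    first_segmented_set = []
    for i in range(1, num):              # row 0 of stats is the background
        x, y, w, h = stats[i, :4]
        first_segmented_set.append(segment_crop(first_frame[y:y + h, x:x + w]))
    return first_segmented_set
```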
It should be noted that the video processing method may be performed by the electronic device 101. The electronic device 101 may be hardware or software. When the electronic device is hardware, the electronic device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device. When the electronic device 101 is embodied as software, it may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices as desired for an implementation.
With continued reference to fig. 3, a flow 300 of some embodiments of video processing methods according to the present disclosure is shown. The video processing method comprises the following steps:
Step 301, in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of a second frame image associated with the first frame image.
In some embodiments, in response to detecting that the first frame image is not a target frame image in the target video, an executing subject of the video processing method (e.g., the electronic device 101 shown in fig. 2) may determine a second segmented image set of a second frame image associated with the first frame image. The first frame image and the second frame image may be obtained from the frame image sequence corresponding to the target video and stand in a time-series relationship. As an example, the second frame image may be the frame immediately preceding the first frame image in the sequence. The target frame image may be an image calibrated in advance in the frame image sequence, for example by marking the sequence at predetermined frame-number intervals, as in the sketch below.
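One hedged reading of such interval-based marking, with an assumed interval of 30 frames (the value is illustrative, not fixed by the disclosure):

```python
def is_target_frame(frame_index: int, interval: int = 30) -> bool:
    # Every `interval`-th frame is treated as a pre-calibrated target frame;
    # the interval of 30 is an assumed example.
    return frame_index % interval == 0
```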
As an example, the first frame image is input into a pre-trained target detection network, and it may be detected whether the first frame image is a target frame image in the target video. In response to detecting that the first frame image is not the target frame image in the target video, the target object segmentation may be performed on a previous frame image of the first frame image, and the obtained segmentation result may be used as a second segmented image set of the second frame image.
Step 302, determining a connected region set corresponding to the second frame image based on the second segmented image set.
In some embodiments, the execution body may determine the connected region set corresponding to the second frame image from the second segmented image set. A connected region (connected component) is generally an image region (blob) formed by adjacent foreground pixels having the same pixel value. As an example, the second frame image may be labelled by locating the inner and outer contours of each connected region, as in the labelling algorithm used in the open source library cvBlob, to obtain the connected region set corresponding to the second frame image; a contour-based sketch follows.
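cvBlob itself is a C/C++ library; a comparable contour-based labelling in Python with OpenCV might look like the following sketch (the OpenCV 4 return signature of findContours is assumed):

```python
import cv2
import numpy as np

def connected_regions(mask: np.ndarray) -> list:
    # Locate the outer contours of the binary mask; each filled contour
    # becomes one connected-region mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for contour in contours:
        region = np.zeros_like(mask)
        cv2.drawContours(region, [contour], -1, 255, thickness=cv2.FILLED)
        regions.append(region)
    return regions
```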
Step 303, generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object.
In some embodiments, the execution body may generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object. As an example, first, a bounding box for each connected region in the connected region set may be determined. Then, the minimum rectangular frame enclosing the bounding boxes of all the connected regions is determined as the rectangular frame corresponding to the first frame image.
In some optional implementations of some embodiments, generating the first rectangular frame set corresponding to the first frame image and representing positions of the target object may include the following steps (a combined sketch is given after the list):
In a first step, the connected regions whose area is smaller than a preset threshold are removed from the connected region set. As an example, the average area of the connected regions in the set may first be determined, and the connected regions whose area is smaller than that average are then removed.
In a second step, the bounding box of each connected region in the remaining set is determined.
In a third step, a rectangular frame is generated, based on the bounding boxes of the connected regions, as the rectangular frame corresponding to the first frame image. As an example, the minimum rectangular frame enclosing the bounding boxes of all the connected regions may be determined as the rectangular frame corresponding to the first frame image.
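A combined sketch of the three steps above (names are illustrative; the sketch assumes at least one connected region survives the area filter):

```python
import cv2

def boxes_from_regions(regions, min_area):
    # Step 1: drop connected regions whose area is below the preset threshold.
    kept = [r for r in regions if cv2.countNonZero(r) >= min_area]
    # Step 2: bounding box (x, y, w, h) of each remaining region.
    boxes = [cv2.boundingRect(cv2.findNonZero(r)) for r in kept]
    # Step 3: minimum rectangle enclosing all bounding boxes.
    x1 = min(x for x, _, _, _ in boxes)
    y1 = min(y for _, y, _, _ in boxes)
    x2 = max(x + w for x, _, w, _ in boxes)
    y2 = max(y + h for _, y, _, h in boxes)
    return boxes, (x1, y1, x2 - x1, y2 - y1)
```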
Step 304, generating a first segmented image of the first frame image based on each rectangular frame in the first set of rectangular frames, to obtain a first set of segmented images.
In some embodiments, the executing body may generate a first segmented image of the first frame image based on each rectangular frame in the first set of rectangular frames, to obtain the first set of segmented images. As an example, first, a total rectangular frame enclosing every rectangular frame in the first set may be determined. Then, the image corresponding to the total rectangular frame is segmented to generate the first segmented image of the first frame image, obtaining the first segmented image set.
In some optional implementations of some embodiments, the method further includes: first, the first segmented images in the first segmented image set may be combined to obtain a combined image; then, a target background is added to the combined image to obtain a combined image with the target background added. As an example, the first segmented images of the set may be combined by determining the position of each first segmented image and then assembling the images according to those positions. Adding the target background to the combined first segmented images of the target video can achieve the effect of applying a special effect to the target object in the target video; a compositing sketch follows.
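A hedged compositing sketch of these two steps (the box/mask bookkeeping mirrors the per-rectangle crops described earlier; all names are illustrative):

```python
import cv2
import numpy as np

def composite_with_background(frame, masks, boxes, background):
    # Paste each segmented mask back at its rectangle to rebuild a
    # full-frame matte, then keep the frame where the matte is set and the
    # target background elsewhere.
    matte = np.zeros(frame.shape[:2], dtype=np.uint8)
    for (x, y, w, h), mask in zip(boxes, masks):
        matte[y:y + h, x:x + w] = np.maximum(matte[y:y + h, x:x + w], mask)
    background = cv2.resize(background, frame.shape[1::-1])
    return np.where((matte > 0)[..., None], frame, background)
```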
In some optional implementations of some embodiments, generating the first segmented image of the first frame image based on each rectangular frame in the first set of rectangular frames may be performed by inputting the image corresponding to each rectangular frame into a pre-trained image segmentation network and outputting a first segmented image, to obtain the first segmented image set. The image segmentation network may be one of the following: an FCN (Fully Convolutional Network), SegNet (an image semantic segmentation network), DeepLab, PSPNet (Pyramid Scene Parsing Network), or Mask R-CNN (an image instance segmentation network). An illustrative inference sketch follows.
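As an illustrative sketch of applying one such network to the image of a rectangular frame, using torchvision's DeepLabV3 (the pretrained=True flag and the person class index 15 follow older torchvision conventions and are assumptions, not part of the disclosure):

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# `pretrained=True` and class index 15 ("person") follow older torchvision
# releases (COCO weights with VOC label indices) and are assumptions.
model = deeplabv3_resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def segment_rectangle(crop_rgb):
    # One rectangular crop (RGB array) in, one binary person mask out.
    with torch.no_grad():
        out = model(preprocess(crop_rgb).unsqueeze(0))["out"][0]
    return (out.argmax(0) == 15).to(torch.uint8) * 255
```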
Optionally, generating the first segmented image of the first frame image based on each rectangular frame in the first set of rectangular frames, to obtain the first set of segmented images may include the following steps:
the first step is to cut out the first frame image based on the rectangular frame corresponding to the first frame image, and generate a rectangular image set corresponding to the first frame image.
And secondly, dividing a target object for each rectangular image in the rectangular image set corresponding to the first frame image, and generating a first divided image to obtain a first divided image set.
As can be seen from the above embodiments, it is first determined whether the first frame image is a target frame image in the target video; depending on the result, different segmentation modes may be used for the first frame image. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined and serves as a basis for segmenting the first frame image. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set corresponding to the first frame image and representing positions of the target object is generated; the position of the target object in the first frame image is thus determined accurately. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, obtaining a first segmented image set. Because the position of the target object in the first frame image is determined through the connected region set and the first rectangular frame set, the region to be segmented is reduced and the proportion of the target object within that region is increased. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
With continued reference to fig. 4, a flow 400 of further embodiments of video processing methods according to the present disclosure is shown. The video processing method comprises the following steps:
Step 401, in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of a second frame image associated with the first frame image.
Step 402, determining a connected region set corresponding to the second frame image based on the second segmented image set.
Step 403, generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object.
Step 404, generating a first segmented image of the first frame image based on each rectangular frame in the first set of rectangular frames, to obtain a first set of segmented images.
In some embodiments, the specific implementation of steps 401 to 404 and the technical effects thereof may refer to steps 301 to 304 in those embodiments corresponding to fig. 3, which are not described herein.
Step 405, in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing positions of the target object.
In some embodiments, in response to detecting that the first frame image is the target frame image, an execution subject of the video processing method (e.g., the electronic device 101 shown in fig. 2) may perform target detection on the first frame image to obtain a second rectangular frame set representing positions of the target object. As an example, a histogram of oriented gradients (HOG) detector may be used to perform the target detection on the first frame image, obtaining the second rectangular frame set representing positions of the target object; a sketch of this path follows.
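A minimal sketch of this HOG-based detection path, assuming OpenCV's built-in default people detector as the target-object detector:

```python
import cv2

# OpenCV's HOG descriptor with its default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_second_boxes(frame):
    # Each (x, y, w, h) is one rectangle of the second rectangular frame set.
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    return [tuple(box) for box in boxes]
```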
In some optional implementations of some embodiments, the second set of rectangular boxes may be obtained by inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing positions of the target object. The target detection network may be one of the following: SSD (Single Shot MultiBox Detector), R-CNN (Region-based Convolutional Neural Network), Fast R-CNN, SPP-Net (Spatial Pyramid Pooling Network), YOLO (You Only Look Once), FPN (Feature Pyramid Network), DCN (Deformable ConvNets), or the RetinaNet target detection algorithm.
Step 406, generating a third segmented image of the first frame image based on each rectangular frame in the second set of rectangular frames, to obtain a third set of segmented images.
In some embodiments, the specific implementation of step 406 and the technical effects thereof may refer to step 304 in those embodiments corresponding to fig. 3, which are not described herein.
As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 3, the flow 400 of the video processing method highlights the specific steps of detecting that the first frame image is the target frame image and obtaining the third segmented image set. These embodiments thereby enable more accurate and effective segmentation of the target organism in the frame images included in the target video.
With continued reference to fig. 5, as an implementation of the method described above for each of the above figures, the present disclosure provides some embodiments of a video processing apparatus, which apparatus embodiments correspond to those described above for fig. 3, and which apparatus is particularly applicable in a variety of electronic devices.
As shown in fig. 5, the video processing apparatus 500 of some embodiments includes: a first determination unit 501, a second determination unit 502, a first generation unit 503, and a second generation unit 504. The first determination unit 501 is configured to determine, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image. The second determination unit 502 is configured to determine a connected region set corresponding to the second frame image based on the second segmented image set. The first generation unit 503 is configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object. The second generation unit 504 is configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, obtaining a first segmented image set.
In some alternative implementations of some embodiments, the apparatus 500 may further include: a combination unit and an addition unit (not shown in the figure). Wherein the combining unit may be configured to combine the respective first segmented images of the first set of segmented images to obtain a combined image. The adding unit may be configured to add a target background to the above-described combined image, resulting in a combined image after adding the target background.
In some alternative implementations of some embodiments, the apparatus 500 may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, so as to obtain a second rectangular frame set representing positions of the target object. The third generation unit may be configured to generate a third segmented image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third segmented image set.
In some alternative implementations of some embodiments, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
In some optional implementations of some embodiments, the third generation unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, to obtain the third segmented image set.
In some alternative implementations of some embodiments, the first generation unit 503 may be further configured to: remove the connected regions whose area is smaller than a preset threshold from the connected region set; determine the bounding box of each connected region in the remaining set; and generate, based on the bounding boxes of the connected regions, a rectangular frame as the rectangular frame corresponding to the first frame image.
In some optional implementations of some embodiments, the second generation unit 504 may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set, generating first segmented images to obtain the first segmented image set.
It will be appreciated that the elements described in the apparatus 500 correspond to the various steps in the method described with reference to fig. 3. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 500 and the units contained therein, and are not described in detail herein.
Referring now to fig. 6, a schematic diagram of an electronic device 600 (e.g., the electronic device of fig. 2) suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 609, or from storage device 608, or from ROM 602. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device, or may exist alone without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to detecting that the first frame image is not a target frame image in the target video, determine a second segmented image set of a second frame image associated with the first frame image; determine a connected region set corresponding to the second frame image based on the second segmented image set; generate, based on the connected region set, a rectangular frame set corresponding to the first frame image and representing positions of the target object; and generate a first segmented image of the first frame image based on each rectangular frame in the rectangular frame set to obtain a first segmented image set.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes a first determination unit, a second determination unit, a first generation unit, and a second generation unit. The names of the units do not constitute a limitation of the units themselves in some cases; for example, the first determination unit may also be described as "a unit that determines a second segmented image set of the second frame image associated with the first frame image in response to detecting that the first frame image is not the target frame image in the target video".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
According to one or more embodiments of the present disclosure, there is provided a video processing method including: in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
According to one or more embodiments of the present disclosure, the above method further comprises: combining the first segmented images in the first segmented image set to obtain a combined image; and adding a target background to the combined image to obtain a combined image with the target background added.
According to one or more embodiments of the present disclosure, the above method further comprises: in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing positions of the target object; and generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmented image set.
According to one or more embodiments of the present disclosure, in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object, including: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the generating, based on each rectangular frame in the first set of rectangular frames, a first segmented image of the first frame image, to obtain a first set of segmented images, includes: inputting the image corresponding to each rectangular frame into a pre-trained image segmentation network, and outputting a first segmented image to obtain the first segmented image set.
According to one or more embodiments of the present disclosure, the generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object includes: removing the connected regions whose area is smaller than a preset threshold from the connected region set; determining the bounding box of each connected region in the remaining set; and generating, based on the bounding boxes of the connected regions, a rectangular frame as the rectangular frame corresponding to the first frame image.
According to one or more embodiments of the present disclosure, the generating, based on each rectangular frame in the first set of rectangular frames, a first segmented image of the first frame image, to obtain a first set of segmented images, includes: cropping the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and performing target object segmentation on each rectangular image in the rectangular image set, generating first segmented images to obtain the first segmented image set.
According to one or more embodiments of the present disclosure, there is provided a video processing apparatus including: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, obtaining a first segmented image set.
In accordance with one or more embodiments of the present disclosure, the apparatus may further include: a combination unit and an addition unit (not shown in the figure). Wherein the combining unit may be configured to combine the respective first segmented images of the first set of segmented images to obtain a combined image. The adding unit may be configured to add a target background to the above-described combined image, resulting in a combined image after adding the target background.
In accordance with one or more embodiments of the present disclosure, the apparatus may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, so as to obtain a second rectangular frame set representing positions of the target object. The third generation unit may be configured to generate a third segmented image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third segmented image set.
According to one or more embodiments of the present disclosure, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the third generation unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, to obtain the third segmented image set.
According to one or more embodiments of the present disclosure, the first generation unit may be further configured to: remove the connected regions whose area is smaller than a preset threshold from the connected region set; determine the bounding box of each connected region in the remaining set; and generate, based on the bounding boxes of the connected regions, a rectangular frame as the rectangular frame corresponding to the first frame image.
According to one or more embodiments of the present disclosure, the second generation unit may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set, generating first segmented images to obtain the first segmented image set.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, there is provided a computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements a method as described in any of the embodiments above.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions in which the above features are interchanged with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A video processing method, comprising:
in response to detecting that the first frame image is not a target frame image in the target video, determining a second set of segmented images of a second frame image associated with the first frame image;
determining a connected region set corresponding to the second frame image based on the second segmented image set;
generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object;
and cropping the first frame image based on each rectangular frame in the first rectangular frame set to obtain a rectangular image set, and segmenting each rectangular image in the rectangular image set to generate a first segmented image of the first frame image, obtaining a first segmented image set.
2. The method of claim 1, wherein the method further comprises:
combining each first segmented image in the first segmented image set to obtain a combined image;
and adding a target background to the combined image to obtain a combined image with the target background added.
3. The method of claim 1, wherein the method further comprises:
in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object;
and generating a third segmentation image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmentation image set.
4. A method according to claim 3, wherein said responsive to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second set of rectangular boxes characterizing a corresponding position of the target object, comprises:
and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
5. The method of claim 1, wherein the generating a first segmented image of the first frame image based on each rectangular box in the first set of rectangular boxes, resulting in a first set of segmented images, comprises:
and inputting the image corresponding to each rectangular frame in the first rectangular frame set into a pre-trained image segmentation network, and outputting a first segmented image to obtain a first segmented image set.
6. The method of claim 1, wherein the generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object comprises:
removing the connected regions whose area is smaller than a preset threshold from the connected region set;
determining the bounding box of each connected region in the connected region set after the removal;
and generating a rectangular frame as a first rectangular frame corresponding to the first frame image based on the bounding box of each connected region.
7. The method of claim 6, wherein the generating a first segmented image of the first frame image based on each rectangular box in the first set of rectangular boxes, resulting in a first set of segmented images, comprises:
cropping the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image;
and carrying out target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image, generating a first segmented image, and obtaining a first segmented image set.
8. A video processing apparatus comprising:
a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image;
a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set;
a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing positions of the target object;
and a second generation unit configured to crop the first frame image based on each rectangular frame in the first rectangular frame set to obtain a rectangular image set, and to segment each rectangular image in the rectangular image set to generate a first segmented image of the first frame image, obtaining a first segmented image set.
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-7.
CN202010711781.0A (priority date 2020-07-22, filing date 2020-07-22): Video processing method, apparatus, electronic device and computer readable medium. Granted as CN111815656B (en); legal status: Active.

Priority Applications (1)

CN202010711781.0A (priority date 2020-07-22, filing date 2020-07-22): Video processing method, apparatus, electronic device and computer readable medium


Publications (2)

CN111815656A (en), published 2020-10-23
CN111815656B (en), granted 2023-08-11

Family

ID=72862010

Family Applications (1)

CN202010711781.0A (Active, granted as CN111815656B): Video processing method, apparatus, electronic device and computer readable medium

Country Status (1)

Country Link
CN (1) CN111815656B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number; priority date; publication date; assignee; title
CN107123131A *; 2017-04-10; 2017-09-01; 安徽清新互联信息科技有限公司; A moving target detection method based on deep learning
CN107507221A *; 2017-07-28; 2017-12-22; 天津大学; Moving object detection and tracking method combining the frame difference method and a mixed Gaussian model
CN108447049A *; 2018-02-27; 2018-08-24; 中国海洋大学; A digital physiological organism segmentation method based on generative adversarial networks
CN108960090A *; 2018-06-20; 2018-12-07; 腾讯科技(深圳)有限公司; Video image processing method and device, computer readable medium and electronic device
CN110796664A *; 2019-10-14; 2020-02-14; 北京字节跳动网络技术有限公司; Image processing method and device, electronic device and computer readable storage medium


Also Published As

Publication number Publication date
CN111815656A (en) 2020-10-23


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
  Address (before and after): 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing
  Applicant changed from BEIJING BYTEDANCE NETWORK TECHNOLOGY Co., Ltd. to Tiktok vision (Beijing) Co., Ltd., and subsequently to Douyin Vision Co., Ltd.
GR01: Patent grant
GR01 Patent grant