CN111815656A - Video processing method, video processing device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN111815656A
CN111815656A (Application CN202010711781.0A)
Authority
CN
China
Prior art keywords
image
frame
frame image
rectangular
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010711781.0A
Other languages
Chinese (zh)
Other versions
CN111815656B (en)
Inventor
杨松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010711781.0A priority Critical patent/CN111815656B/en
Publication of CN111815656A publication Critical patent/CN111815656A/en
Application granted granted Critical
Publication of CN111815656B publication Critical patent/CN111815656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Abstract

Embodiments of the disclosure disclose a video processing method, a video processing device, an electronic device, and a computer-readable medium. One embodiment of the method comprises: in response to detecting that a first frame image is not a target frame image in a target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set. The embodiment increases the spatial proportion of the target organism within the region to be segmented, so that the target organism can be segmented simply, conveniently, and accurately in the frame images of the target video.

Description

Video processing method, video processing device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a video processing method and apparatus, an electronic device, and a computer-readable medium.
Background
Living-body segmentation (for example, of a human body) extracts a living body from an image, and most current techniques implement it with a Convolutional Neural Network (CNN). Living-body segmentation can be applied to scenarios such as replacing the background on a video terminal. However, current techniques segment poorly when the target organism occupies only a small area of the image. For example, when a video terminal applies a background-replacement special effect to the target organism in real time, segmentation may fail because the target organism is far away from the video terminal (e.g., 2-5 meters).
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose video processing methods, apparatuses, devices and computer readable media to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a video processing method, including: in response to detecting that a first frame image is not a target frame image in a target video, determining a second segmented image set of a second frame image associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
In a second aspect, some embodiments of the present disclosure provide a video processing apparatus, the apparatus comprising: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
One of the various embodiments of the present disclosure described above has the following advantageous effects. First, it is determined whether the first frame image is a target frame image in the target video, so that different segmentation methods can be applied accordingly. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined; this provides the basis on which the first frame image will be segmented. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set representing the position of the target object in the first frame image is generated; together, the connected region set and the first rectangular frame set accurately locate the target object in the first frame image. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, yielding a first segmented image set. Because the method locates the target object through the connected region set and the first rectangular frame set, the region to be segmented shrinks and the proportion of the target object within it increases. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1-2 are schematic diagrams of an application scenario of a video processing method of some embodiments of the present disclosure;
Fig. 3 is a flow diagram of some embodiments of a video processing method according to the present disclosure;
Fig. 4 is a flow diagram of further embodiments of a video processing method according to the present disclosure;
Fig. 5 is a schematic block diagram of some embodiments of a video processing apparatus according to the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of one application scenario of a video processing method according to some embodiments of the present disclosure.
As shown in fig. 1, reference numeral 102 denotes a target frame image in a target video, reference numeral 103 denotes a second frame image, and reference numeral 104 denotes a first frame image.
As shown in fig. 2, as an example, in response to detecting that the first frame image 104 is not the target frame image 102 in the target video, the electronic device 101 may determine the second segmented image 105 of the second frame image 103 associated with the above-described first frame image 104. Optionally, the electronic device 101 may determine the second segmented image 105 of the second frame image 103 through an image segmentation network. Then, the connected region 106 corresponding to the second frame image is determined from the second segmented image 105. Optionally, a binarized image may be obtained by binarizing the second segmented image; the white area in the binarized image is then taken as the connected region 106. Further, a first rectangular frame 107 representing the position of the target object in the first frame image 104 is generated from the connected region 106. Finally, a first segmented image 108 of the first frame image 104 is generated based on the first rectangular frame 107. Optionally, an image segmentation algorithm may be used to segment the image corresponding to the first rectangular frame 107, so as to obtain the first segmented image 108 of the first frame image 104.
It should be noted that the video processing method may be executed by the electronic device 101. The electronic device 101 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device. When it is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to fig. 3, a flow 300 of some embodiments of a video processing method according to the present disclosure is shown. The video processing method comprises the following steps:
step 301, in response to detecting that the first frame image is not the target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image.
In some embodiments, in response to detecting that the first frame image is not a target frame image in the target video, an executing subject of the video processing method (e.g., the electronic device 101 shown in fig. 2) may determine a second segmented image set of a second frame image associated with the first frame image. The first frame image and the second frame image may be obtained from the frame image sequence corresponding to the target video. The first frame image is in a time-series relationship with the second frame image; in the frame image sequence, the second frame image may be, for example, the frame image immediately preceding the first frame image. The target frame image may be an image designated in advance in the frame image sequence. It should be noted that target frame images may be marked in the frame image sequence at a predetermined frame interval.
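As a minimal illustration of the interval-based marking described above, the following Python sketch treats every N-th frame as a target frame; the interval of 5 is a hypothetical value, since the text does not specify one.

    # Hypothetical sketch: mark target (key) frames at a fixed interval.
    TARGET_FRAME_INTERVAL = 5  # assumed value; the text leaves this unspecified

    def is_target_frame(frame_index: int) -> bool:
        """Every TARGET_FRAME_INTERVAL-th frame of the sequence is a target frame."""
        return frame_index % TARGET_FRAME_INTERVAL == 0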
As an example, the first frame image may be input to a pre-trained target detection network to detect whether the first frame image is a target frame image in the target video. In response to detecting that the first frame image is not the target frame image, target object segmentation may be performed on the frame image immediately preceding the first frame image, and the obtained segmentation result is used as the second segmented image set of the second frame image.
Step 302, determining a connected region set corresponding to the second frame image based on the second segmented image set.
In some embodiments, the executing subject may determine the set of connected regions corresponding to the second frame image from the second segmented image set. A connected region (connected component) generally refers to an image area (blob) composed of foreground pixels that have the same pixel value and are adjacent to one another in the image. For example, the second frame image may be marked by locating the inner and outer contours of each connected region with the labeling algorithm of the open-source library cvBlob, so as to obtain the connected region set corresponding to the second frame image.
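The text cites the cvBlob labeling algorithm; as a stand-in, the following hedged Python sketch uses OpenCV's connected-component labeling to derive the connected region set from a segmented image. The single-channel mask format and the threshold of 127 are assumptions for illustration.

    import cv2
    import numpy as np

    def connected_regions(segmented: np.ndarray) -> np.ndarray:
        """Binarize a segmented image and label its connected regions.

        `segmented` is assumed to be a single-channel image in [0, 255]
        whose bright pixels belong to the target object.
        """
        _, binary = cv2.threshold(segmented, 127, 255, cv2.THRESH_BINARY)
        num_labels, _labels, stats, _centroids = cv2.connectedComponentsWithStats(
            binary, connectivity=8)
        # Row 0 of `stats` is the background; each remaining row describes one
        # connected region as (left, top, width, height, area).
        return stats[1:num_labels]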
Step 303, generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object, based on the connected region set.
In some embodiments, the executing subject may generate, based on the set of connected regions, a first set of rectangular frames corresponding to the first frame image and representing the position of the target object. As an example, first, a bounding frame for each connected region in the set of connected regions may be determined. Then, the minimum rectangular frame surrounding the bounding frames of the connected regions is determined as the rectangular frame corresponding to the first frame image.
In some optional implementations of some embodiments, generating a first set of rectangular frames corresponding to the first frame image and representing a corresponding position of the target object may include:
firstly, removing the connected regions with the area smaller than a preset threshold value in the connected region set. As an example, an average of the connected region areas in the set of connected regions may be determined first. And then, removing the communication regions with the area of the communication regions smaller than the average value in the communication region set.
And secondly, determining a surrounding frame of each communication area in the removed communication area set.
And a third step of generating a rectangular frame as a rectangular frame corresponding to the first frame image based on the bounding frame of each connected region. As an example, a minimum rectangular frame of the bounding frame that bounds each of the connected regions described above may be determined as the rectangular frame corresponding to the first frame image described above.
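A hedged sketch of the three steps above, continuing from the `stats` array produced by the earlier labeling sketch; filtering against the mean area follows the example given in the text and is only one possible choice of preset threshold.

    import numpy as np

    def rectangles_for_frame(stats: np.ndarray) -> list:
        """One bounding rectangle per surviving connected region, as
        (left, top, right, bottom)."""
        threshold = stats[:, 4].mean()  # example threshold: mean region area
        rects = []
        for left, top, width, height, area in stats:
            if area < threshold:
                continue  # first step: drop regions smaller than the threshold
            # second/third steps: the region's bounding frame becomes a rectangle
            rects.append((left, top, left + width, top + height))
        return rects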
Step 304, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, so as to obtain a first segmented image set.
In some embodiments, the executing entity may generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set. As an example, first, a total rectangular frame surrounding the first rectangular frame set may be determined from the rectangular frames in the set. Then, image segmentation is performed on the image corresponding to the total rectangular frame to generate the first segmented image of the first frame image, obtaining the first segmented image set. A sketch of this example follows.
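In Python, the example just described might look as follows; `segment_fn` stands in for whatever image segmentation algorithm is chosen and is an assumed callable, as is the (left, top, right, bottom) rectangle format.

    def segment_first_frame(frame, rects, segment_fn):
        """Crop the first frame to the total rectangle enclosing all
        rectangles in the first rectangular frame set, then segment the crop."""
        left = min(r[0] for r in rects)
        top = min(r[1] for r in rects)
        right = max(r[2] for r in rects)
        bottom = max(r[3] for r in rects)
        crop = frame[top:bottom, left:right]   # reduced region to segment
        mask = segment_fn(crop)                # first segmented image of the frame
        return (left, top, right, bottom), mask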
In some optional implementations of some embodiments, the method further comprises: first, the first segmented images in the first segmented image set may be combined to obtain a combined image; then, a target background is added to the combined image to obtain the combined image with the target background. As an example, combining the first segmented images in the first segmented image set may be performed by determining the position of each first segmented image and then combining them according to those positions. It should be noted that adding the target background to the combined first segmented images can achieve the effect of adding a special effect to the target object in the target video.
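A hedged sketch of the combination and background-replacement steps; it assumes each first segmented image is recorded as a ((left, top, right, bottom), mask) pair with a boolean foreground mask, and that `background` has the same size as the original frame.

    import numpy as np

    def composite_on_background(frame, segments, background):
        """Combine the segmented crops at their recorded positions, then keep
        the original pixels inside the combined mask and the target
        background everywhere else."""
        combined = np.zeros(frame.shape[:2], dtype=bool)
        for (left, top, right, bottom), mask in segments:
            combined[top:bottom, left:right] |= mask
        output = background.copy()
        output[combined] = frame[combined]  # target object over the new background
        return output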
In some optional implementations of some embodiments, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain the first segmented image set may be: inputting the image corresponding to each rectangular frame into a pre-trained image segmentation network and outputting a first segmented image, so as to obtain the first segmented image set. The image segmentation network may be one of the following: an FCN (Fully Convolutional Network), SegNet (a semantic segmentation network), the DeepLab semantic segmentation network, PSPNet (Pyramid Scene Parsing Network), or Mask R-CNN (Mask Region-based CNN, an image instance segmentation network).
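The text names the network families only; the following sketch shows how one of them, a DeepLab network with pretrained torchvision weights, could segment the image corresponding to one rectangular frame. The weight choice, the normalization constants, and the use of class index 15 ("person") are all assumptions beyond the text.

    import torch
    import torchvision.transforms.functional as TF
    from torchvision.models.segmentation import deeplabv3_resnet50

    model = deeplabv3_resnet50(weights="DEFAULT").eval()  # assumed weights

    def segment_crop(crop_rgb):
        """crop_rgb: HxWx3 uint8 array for one rectangular frame; returns a
        boolean mask that is True where the network predicts a person."""
        x = TF.to_tensor(crop_rgb).unsqueeze(0)        # 1x3xHxW in [0, 1]
        x = TF.normalize(x, mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])    # ImageNet statistics
        with torch.no_grad():
            logits = model(x)["out"]                   # 1x21xHxW class scores
        return (logits.argmax(1)[0] == 15).numpy()     # class 15: person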
Optionally, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set may include the following steps:
the first step is to crop the first frame image based on the rectangular frame corresponding to the first frame image, and generate a rectangular image set corresponding to the first frame image.
And secondly, performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
It can be seen from the foregoing embodiments that, first, whether the first frame image is the target frame image in the target video is determined, so that different segmentation methods can be applied accordingly. Then, in response to detecting that the first frame image is not a target frame image in the target video, a second segmented image set of a second frame image associated with the first frame image is determined; this provides the basis on which the first frame image will be segmented. Next, a connected region set corresponding to the second frame image is determined from the second segmented image set, and a first rectangular frame set representing the position of the target object in the first frame image is generated; together, these accurately locate the target object in the first frame image. Finally, a first segmented image of the first frame image is generated from each rectangular frame in the first rectangular frame set, yielding a first segmented image set. Because the method locates the target object through the connected region set and the first rectangular frame set, the region to be segmented shrinks and the proportion of the target object within it increases. The target organism in the frame images of the target video can therefore be segmented simply and accurately.
With continued reference to fig. 4, a flow 400 of further embodiments of a video processing method according to the present disclosure is shown. The video processing method comprises the following steps:
step 401, in response to detecting that the first frame image is not the target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image.
Step 402, determining a connected region set corresponding to the second frame image based on the second segmented image set.
Step 403, generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object, based on the connected region set.
Step 404, generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, to obtain a first segmented image set.
In some embodiments, for the specific implementation and technical effects of steps 401-404, reference may be made to steps 301-304 in the embodiments corresponding to fig. 3, which are not repeated here.
Step 405, in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object.
In some embodiments, in response to detecting that the first frame image is the target frame image, an executing entity (e.g., the electronic device 101 shown in fig. 2) of the video processing method may perform target detection on the first frame image to obtain a second rectangular frame set representing the position of the target object. As an example, target detection may be performed on the first frame image using a Histogram of Oriented Gradients (HOG) to obtain the second rectangular frame set representing the position of the target object.
In some optional implementations of some embodiments, obtaining the second set of rectangular frames may be: inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the position of the target object. The target detection network may be one of the following: the SSD (Single Shot MultiBox Detector) algorithm, the R-CNN (Region-based Convolutional Neural Networks) algorithm, the Fast R-CNN algorithm, the SPP-Net (Spatial Pyramid Pooling Network) algorithm, the YOLO (You Only Look Once) algorithm, the FPN (Feature Pyramid Network) algorithm, the DCN (Deformable ConvNets) algorithm, or the RetinaNet target detection algorithm.
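As one concrete instance of the HOG option mentioned above, the following sketch uses OpenCV's default HOG people detector to produce the second rectangular frame set from a target frame; the detectMultiScale parameters are illustrative defaults rather than values from the text.

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def detect_people(frame_bgr):
        """Return detected person rectangles as (left, top, width, height)."""
        rects, _weights = hog.detectMultiScale(
            frame_bgr, winStride=(8, 8), padding=(8, 8), scale=1.05)
        return rects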
Step 406, generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set, so as to obtain a third segmented image set.
In some embodiments, the specific implementation of step 406 and the technical effect brought by the implementation may refer to step 304 in those embodiments corresponding to fig. 3, and are not described herein again.
As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 3, the flow 400 of the video processing method in some embodiments corresponding to fig. 4 highlights the specific steps of detecting that the first frame image is the target frame image and obtaining the third segmented image set. These embodiments therefore enable the target organism in the frame images of the target video to be segmented more accurately and effectively.
With continuing reference to fig. 5, as an implementation of the above-described method for the above-described figures, the present disclosure provides some embodiments of a video processing apparatus, which correspond to those of the above-described method embodiments of fig. 3, and which may be particularly applicable to various electronic devices.
As shown in fig. 5, the video processing apparatus 500 of some embodiments includes: a first determining unit 501, a second determining unit 502, a first generating unit 503, and a second generating unit 504. The first determining unit 501 is configured to determine a second segmented image set of a second frame image associated with the first frame image in response to detecting that the first frame image is not a target frame image in the target video. The second determining unit 502 is configured to determine a connected region set corresponding to the second frame image based on the second segmented image set. The first generating unit 503 is configured to generate, based on the set of connected regions, a first rectangular frame set corresponding to the first frame image and representing the position of the target object. The second generating unit 504 is configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
In some optional implementations of some embodiments, the apparatus 500 may further include: a combining unit and an adding unit (not shown in the figure). The combining unit may be configured to combine the first segmented images in the first segmented image set to obtain a combined image. The adding unit may be configured to add the target background to the combined image, resulting in the combined image after the target background is added.
In some optional implementations of some embodiments, the apparatus 500 may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, and obtain a second rectangular frame set representing a corresponding position of the target object. The third generating unit may be configured to generate a third divided image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third set of divided images.
In some optional implementations of some embodiments, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
In some optional implementations of some embodiments, the third generating unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, so as to obtain the third segmented image set.
In some optional implementations of some embodiments, the first generating unit 503 may be further configured to: remove the connected regions whose area is smaller than a preset threshold value from the connected region set; determine a bounding frame for each connected region in the remaining set; and generate a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
In some optional implementations of some embodiments, the second generating unit 504 may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set to generate a first segmented image, obtaining a first segmented image set.
It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 3. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the electronic device of fig. 2) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (hypertext transfer protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the apparatus, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to detecting that a first frame image is not a target frame image in a target video, determine a second segmented image set of a second frame image associated with the first frame image; determine a connected region set corresponding to the second frame image based on the second segmented image set; generate a set of rectangular frames corresponding to the first frame image and representing the position of the target object based on the set of connected regions; and generate a first segmented image of the first frame image based on each rectangular frame in the rectangular frame set to obtain a first segmented image set.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including a first determining unit, a second determining unit, a first generating unit, and a second generating unit. The names of these units do not in some cases limit the units themselves; for example, the first determining unit may also be described as a "unit that determines a second segmented image set of a second frame image associated with a first frame image in response to detecting that the first frame image is not a target frame image in the target video".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
According to one or more embodiments of the present disclosure, there is provided a video processing method including: in response to detecting that the first frame image is not a target frame image in the target video, determining a second segmented image set of second frame images associated with the first frame image; determining a connected region set corresponding to the second frame image based on the second segmented image set; generating a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object based on the connected region set; and generating a first segmentation image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmentation image set.
According to one or more embodiments of the present disclosure, the method further includes: combining each first segmentation image in the first segmentation image set to obtain a combined image; and adding a target background to the combined image to obtain the combined image with the target background.
According to one or more embodiments of the present disclosure, the method further includes: in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing the position of the target object; and generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmented image set.
According to one or more embodiments of the present disclosure, the performing, in response to detecting that the first frame image is the target frame image, target detection on the first frame image to obtain a second rectangular frame set representing a corresponding position of the target object includes: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set includes: and inputting the first frame image into a pre-trained image segmentation network based on each rectangular frame, and outputting a first segmentation image to obtain a first segmentation image set.
According to one or more embodiments of the present disclosure, the generating a first rectangular frame set corresponding to the first frame image and representing the position of the target object based on the connected region set includes: removing the connected regions whose area is smaller than a preset threshold value from the connected region set; determining a bounding frame for each connected region in the remaining set; and generating a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
According to one or more embodiments of the present disclosure, the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set includes: cropping the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
According to one or more embodiments of the present disclosure, there is provided a video processing apparatus including: a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image; a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set; a first generation unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the position of the target object; and a second generation unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
According to one or more embodiments of the present disclosure, an apparatus may further include: a combining unit and an adding unit (not shown in the figure). The combining unit may be configured to combine the first segmented images in the first segmented image set to obtain a combined image. The adding unit may be configured to add the target background to the combined image, resulting in the combined image after the target background is added.
According to one or more embodiments of the present disclosure, an apparatus may further include: a detection unit and a third generation unit (not shown in the figure). The detection unit may be configured to perform target detection on the first frame image in response to detecting that the first frame image is the target frame image, and obtain a second rectangular frame set representing a corresponding position of the target object. The third generating unit may be configured to generate a third divided image of the first frame image based on each rectangular frame in the second set of rectangular frames, resulting in a third set of divided images.
According to one or more embodiments of the present disclosure, the detection unit may be further configured to: and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
According to one or more embodiments of the present disclosure, the third generating unit may be further configured to: input the image corresponding to each rectangular frame in the second rectangular frame set into a pre-trained image segmentation network and output a third segmented image, so as to obtain the third segmented image set.
According to one or more embodiments of the present disclosure, the first generating unit may be further configured to: remove the connected regions whose area is smaller than a preset threshold value from the connected region set; determine a bounding frame for each connected region in the remaining set; and generate a rectangular frame as the rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
According to one or more embodiments of the present disclosure, the second generating unit may be further configured to: crop the first frame image based on the rectangular frame corresponding to the first frame image to generate a rectangular image set corresponding to the first frame image; and perform target object segmentation on each rectangular image in the rectangular image set to generate a first segmented image, obtaining a first segmented image set.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as described in any of the embodiments above.
The foregoing description presents only preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A video processing method, comprising:
in response to detecting that a first frame image is not a target frame image in a target video, determining a second set of segmented images of a second frame image associated with the first frame image;
determining a set of connected regions corresponding to the second frame image based on the second set of segmented images;
generating a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object based on the connected region set;
and generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set to obtain a first segmented image set.
2. The method of claim 1, wherein the method further comprises:
combining each first segmented image in the first segmented image set to obtain a combined image;
and adding a target background to the combined image to obtain the combined image with the target background.
3. The method of claim 1, wherein the method further comprises:
in response to detecting that the first frame image is the target frame image, performing target detection on the first frame image to obtain a second rectangular frame set representing the corresponding position of the target object;
and generating a third segmented image of the first frame image based on each rectangular frame in the second rectangular frame set to obtain a third segmented image set.
4. The method of claim 3, wherein the performing target detection on the first frame image in response to detecting that the first frame image is the target frame image to obtain a second rectangular frame set representing a corresponding position of the target object comprises:
and inputting the first frame image into a pre-trained target detection network to obtain a second rectangular frame set representing the corresponding position of the target object.
5. The method of claim 1, wherein the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set, comprises:
inputting the first frame image into a pre-trained image segmentation network based on each rectangular frame in the first rectangular frame set, and outputting a first segmented image to obtain a first segmented image set.
6. The method of claim 1, wherein the generating, based on the set of connected regions, a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object comprises:
removing the connected regions with the area smaller than a preset threshold value in the connected region set;
determining a bounding frame of each connected region in the remaining connected region set;
generating a rectangular frame as a first rectangular frame corresponding to the first frame image based on the bounding frame of each connected region.
7. The method of claim 6, wherein the generating a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set, comprises:
based on the rectangular frame corresponding to the first frame image, cutting the first frame image to generate a rectangular image set corresponding to the first frame image;
and performing target object segmentation on each rectangular image in the rectangular image set corresponding to the first frame image to generate a first segmented image, so as to obtain a first segmented image set.
8. A video processing apparatus comprising:
a first determination unit configured to determine, in response to detecting that a first frame image is not a target frame image in a target video, a second segmented image set of a second frame image associated with the first frame image;
a second determination unit configured to determine a connected region set corresponding to the second frame image based on the second segmented image set;
a first generating unit configured to generate, based on the connected region set, a first rectangular frame set corresponding to the first frame image and representing the corresponding position of the target object;
a second generating unit configured to generate a first segmented image of the first frame image based on each rectangular frame in the first rectangular frame set, resulting in a first segmented image set.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any one of claims 1-7.
CN202010711781.0A 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium Active CN111815656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010711781.0A CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010711781.0A CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Publications (2)

Publication Number Publication Date
CN111815656A (en) 2020-10-23
CN111815656B (en) 2023-08-11

Family

ID=72862010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010711781.0A Active CN111815656B (en) 2020-07-22 2020-07-22 Video processing method, apparatus, electronic device and computer readable medium

Country Status (1)

Country Link
CN (1) CN111815656B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107123131A * 2017-04-10 2017-09-01 安徽清新互联信息科技有限公司 A moving target detection method based on deep learning
CN107507221A * 2017-07-28 2017-12-22 天津大学 Moving object detection and tracking method combining the frame difference method and a Gaussian mixture model
CN108447049A * 2018-02-27 2018-08-24 中国海洋大学 A digital physiological organism segmentation method based on generative adversarial networks
CN108960090A * 2018-06-20 2018-12-07 腾讯科技(深圳)有限公司 Video image processing method and device, computer-readable medium and electronic equipment
CN110796664A * 2019-10-14 2020-02-14 北京字节跳动网络技术有限公司 Image processing method and device, electronic equipment and computer-readable storage medium


Also Published As

Publication number Publication date
CN111815656B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN110516678B (en) Image processing method and device
CN109829432B (en) Method and apparatus for generating information
CN109255337B (en) Face key point detection method and device
CN109377508B (en) Image processing method and device
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN108510084B (en) Method and apparatus for generating information
CN111325792B (en) Method, apparatus, device and medium for determining camera pose
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
CN111757100B (en) Method and device for determining camera motion variation, electronic equipment and medium
CN112150490A (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN115272182B (en) Lane line detection method, lane line detection device, electronic equipment and computer readable medium
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN113658196A (en) Method and device for detecting ship in infrared image, electronic equipment and medium
CN110636331B (en) Method and apparatus for processing video
CN110852250B (en) Vehicle weight removing method and device based on maximum area method and storage medium
CN112150491A (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN111783777A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112085733A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113435393B (en) Forest fire smoke root node detection method, device and equipment
CN111784709B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111815656B (en) Video processing method, apparatus, electronic device and computer readable medium
CN111898529B (en) Face detection method and device, electronic equipment and computer readable medium
CN111815654A (en) Method, apparatus, device and computer readable medium for processing image
CN112085035A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111726476A (en) Image processing method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant