CN115082682A - Image segmentation method and device - Google Patents

Image segmentation method and device

Info

Publication number
CN115082682A
CN115082682A
Authority
CN
China
Prior art keywords
image frame
motion estimation
target object
area
region
Legal status
Pending
Application number
CN202210827424.XA
Other languages
Chinese (zh)
Inventor
胡彦强
周全
郝平昌
Current Assignee
Qingdao Xinxin Microelectronics Technology Co Ltd
Original Assignee
Qingdao Xinxin Microelectronics Technology Co Ltd
Application filed by Qingdao Xinxin Microelectronics Technology Co Ltd
Priority to CN202210827424.XA
Publication of CN115082682A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application discloses an image segmentation method and device for solving the problem of low real-time segmentation accuracy in the prior art. The method provided by the application comprises the following steps: acquiring a first area where at least one target object is located in a previous image frame of a current image frame; determining the position of a first motion estimation area according to the first area where the at least one target object is located; determining the position of a second motion estimation area in the current image frame according to the position of the first motion estimation area; identifying the at least one target object in the second motion estimation area according to the position of the second motion estimation area to obtain a second area where the at least one target object is located; and segmenting the second area where the at least one target object is located in the current image frame.

Description

Image segmentation method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image segmentation method and apparatus.
Background
With the development of multimedia technology and the rise of applications such as short video and live streaming, portrait segmentation technology has attracted increasing attention. In general, the higher the image resolution, the more detail the foreground portrait region retains, the higher the segmentation accuracy, and the larger the computational workload. However, the power consumption and chip performance of electronic devices are limited, and power consumption is often restricted to achieve real-time segmentation, which in turn reduces the segmentation accuracy.
Disclosure of Invention
The embodiment of the application provides an image segmentation method and device, which are used for solving the problem of low real-time segmentation precision in the prior art.
In a first aspect, an embodiment of the present application provides an image segmentation method, including:
acquiring a first area where at least one target object is located in a previous image frame of a current image frame; determining the position of a first motion estimation area according to a first area where the at least one target object is located, wherein the first motion estimation area covers the first area where any target object in the at least one target object is located; determining the position of a second motion estimation area in the current image frame according to the position of the first motion estimation area, wherein the position of the second motion estimation area in the current image frame is the same as the position of the first motion estimation area in the previous image frame; identifying the at least one target object in the second motion estimation area according to the position of the second motion estimation area, and obtaining a second area where the at least one target object is located; and segmenting a second area where the at least one target object is located in the current image frame.
Based on this scheme, the position of the motion estimation area of the current image frame is determined according to the position of the motion estimation area of the previous image frame, and the target object is then identified within the motion estimation area of the current image frame. Compared with performing image segmentation on the entire image, the computational effort is reduced. In addition, because the scheme performs image segmentation only within the motion estimation area, the segmentation accuracy can be greatly improved and the proportion of portrait pixels in the segmented region can be increased.
In a possible implementation manner, the number of target objects in the previous image frame is one, and the determining the first motion estimation region according to the first region in which the at least one target object is located in the previous image frame includes:
determining a first target frame of the target object, wherein the first target frame is a minimum bounding rectangle of a first area of the target object;
and carrying out boundary expansion on the first target frame to obtain the first motion estimation area.
In a possible implementation manner, the number of target objects in the previous image frame is N, where N is a positive integer, and the determining the first motion estimation area according to the first area where the at least one target object is located includes:
determining a second target frame according to the first areas where the N target objects are respectively located, wherein the second target frame is a minimum circumscribed rectangle comprising the first areas corresponding to the N target objects respectively;
and carrying out boundary expansion on the second target frame to obtain the first motion estimation area.
With this scheme, the first motion estimation area is determined after the boundary of the target frame of the previous image frame is expanded, so that the second motion estimation area in the current image frame, whose position is determined by the position of the first motion estimation area, can include all target objects, improving the accuracy of image segmentation.
In one possible implementation, the position of the first motion estimation region satisfies a condition shown by the following formula:
[Formula shown as image BDA0003744543810000021 in the original filing.]
wherein x_c_left, y_c_top represent the coordinates of the top-left vertex of the rectangle corresponding to the motion estimation region in the previous image frame; x_c_right, y_c_bottom represent the coordinates of the bottom-right vertex of that rectangle; x_d_left, y_d_top represent the coordinates of the top-left vertex of the minimum bounding rectangle in the previous image frame; x_d_right, y_d_bottom represent the coordinates of the bottom-right vertex of the minimum bounding rectangle in the previous image frame; w, h represent the pixel width and pixel height of the previous image frame; and α, β represent set coefficients of the boundary expansion.
In a possible implementation, the method further includes: determining that a current image frame is not a first image frame in any detection cycle before acquiring a first region in which at least one target object is located in a previous image frame of the current image frame.
In a possible implementation, the method further includes: when the current image frame is the first image frame in any detection period, identifying the at least one target object for the current image frame, and obtaining third areas where the at least one target object is respectively located; segmenting the third region in the current image frame.
Based on the scheme, when the current image frame is the first image frame of the detection period, the target object is identified for the current image frame so as to correct the area where the target object is located in the current image frame and improve the image segmentation precision.
In a possible implementation manner, the identifying the at least one target object in the second motion estimation region according to the position of the second motion estimation region to obtain a second region where the at least one target object is located includes: and taking the position of the second motion estimation area as the input of a neural network model, and identifying the at least one target object in the second motion estimation area through the neural network model to obtain a second area where the at least one target object is located.
In a second aspect, an embodiment of the present application provides an image segmentation apparatus, including:
the acquisition module is used for acquiring a first area where at least one target object is located in a previous image frame of a current image frame;
a determining module, configured to determine a position of a first motion estimation area according to a first area where the at least one target object is located, where the first motion estimation area covers the first area where any one target object of the at least one target object is located; and determine a position of a second motion estimation region in the current image frame according to the position of the first motion estimation region, wherein the position of the second motion estimation region in the current image frame is the same as the position of the first motion estimation region in the previous image frame;
the identification module is used for identifying the at least one target object in the second motion estimation area according to the position of the second motion estimation area to obtain a second area where the at least one target object is located;
and the segmentation module is used for segmenting a second area where the at least one target object is located in the current image frame.
In a possible implementation manner, the number of the target objects in the previous image frame is one, and the determining module, when determining the first motion estimation area according to the first area where the at least one target object is located, is specifically configured to: determining a first target frame of the target object, wherein the first target frame is a minimum bounding rectangle of a first area of the target object; and carrying out boundary expansion on the first target frame to obtain the first motion estimation area.
In a possible implementation manner, the number of target objects in the previous image frame is N, where N is a positive integer, and when determining the first motion estimation area according to the first area where the at least one target object is located, the determining module is specifically configured to: determining a second target frame according to the first areas where the N target objects are respectively located, wherein the second target frame is a minimum circumscribed rectangle comprising the first areas corresponding to the N target objects respectively; and carrying out boundary expansion on the second target frame to obtain the first motion estimation area.
In one possible implementation, the position of the first motion estimation region satisfies a condition shown by the following formula:
[Formula shown as image BDA0003744543810000041 in the original filing.]
wherein x_c_left, y_c_top represent the coordinates of the top-left vertex of the rectangle corresponding to the motion estimation region in the previous image frame; x_c_right, y_c_bottom represent the coordinates of the bottom-right vertex of that rectangle; x_d_left, y_d_top represent the coordinates of the top-left vertex of the minimum bounding rectangle in the previous image frame; x_d_right, y_d_bottom represent the coordinates of the bottom-right vertex of the minimum bounding rectangle in the previous image frame; w, h represent the pixel width and pixel height of the previous image frame; and α, β represent set coefficients of the boundary expansion.
In a possible implementation manner, the determining module is further configured to: determining that a current image frame is not a first image frame in any detection cycle before acquiring a first region in which at least one target object is located in a previous image frame of the current image frame.
In a possible implementation manner, the identification module is further configured to: when the current image frame is the first image frame in any detection period, identifying the at least one target object for the current image frame, and obtaining third areas where the at least one target object is respectively located;
the segmentation module is further configured to segment the third region in the current image frame.
In a possible implementation manner, when the identifying module identifies the at least one target object in the second motion estimation area according to the position of the second motion estimation area, and obtains a second area where the at least one target object is located, the identifying module is specifically configured to: and taking the position of the second motion estimation area as the input of a neural network model, and identifying the at least one target object in the second motion estimation area through the neural network model to obtain a second area where the at least one target object is located.
In a third aspect, an embodiment of the present application provides an execution apparatus, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the methods in the first aspect and different implementation manners in the first aspect according to the obtained program instructions.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect and the different implementations of the first aspect.
In addition, for technical effects brought by any one implementation manner of the second aspect to the fourth aspect, reference may be made to the technical effects brought by the first aspect and different implementation manners of the first aspect, and details are not described here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a block diagram of an image segmentation method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of a usage scenario of a display device according to an embodiment of the present application;
fig. 3 is a block diagram of a configuration of a control device 100 according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a hardware configuration of a display device 200 according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a software architecture of a terminal device according to an embodiment of the present application;
FIG. 6A is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 6B is a schematic diagram of another system architecture according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a schematic flowchart of an image segmentation method according to an embodiment of the present application;
fig. 9 is a schematic diagram of a method for determining a first target frame according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram of a first motion estimation region according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a target object provided by an embodiment of the present application;
fig. 12 is a schematic diagram of a second target frame determining method according to an embodiment of the present application;
fig. 13 is a schematic diagram of another first motion region estimation provided in the embodiment of the present application;
fig. 14 is a schematic diagram illustrating a position of a first motion estimation region according to an embodiment of the present application;
fig. 15 is a schematic diagram illustrating another position of a first motion estimation region according to an embodiment of the present application;
FIG. 16 is a schematic flow chart of another image segmentation provided in the embodiments of the present application;
FIG. 17 is a schematic diagram of a target box boundary provided by an embodiment of the present application;
fig. 18 is a schematic flowchart of image segmentation in a detection period according to an embodiment of the present disclosure;
FIG. 19 is a schematic view of another exemplary embodiment of a process for image segmentation during a detection period;
fig. 20 is a schematic diagram of an image segmentation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Compared with tasks such as artificial intelligence image classification, detection and key point positioning, the image segmentation needs more calculation power. In general, the greater the resolution of the image, the more detail of the foreground portrait region, the higher the segmentation accuracy, and the greater the computational effort. However, the power consumption and the performance of the chip of the electronic device are limited, and the power consumption is often controlled to achieve the purpose of real-time segmentation, thereby reducing the segmentation accuracy.
In order to solve the above problem, an embodiment of the present application provides an image segmentation method and an image segmentation apparatus, where a first region where at least one target object is located in a previous image frame of a current image frame is obtained, and a position of a first motion estimation region is determined according to the first region. And determining the position of a second motion estimation area in the current image frame according to the position of the first motion estimation area, identifying a target object in the second motion estimation area according to the position of the second motion estimation area, obtaining a second area where the target object is located, and further segmenting the second area where the target object is located in the current image frame. In some scenarios, a detection period may be set when performing image segmentation. When image segmentation is performed, a current image frame and a frame number may be acquired, and whether the current image frame is the first image frame in any detection period is determined according to the frame number. When the current image frame is the first image frame in the detection period, a target detection algorithm is executed to obtain the estimated position of the motion region. When the current image frame is not the first image frame in the detection period, the position of the motion estimation region of the current image frame is determined according to the region of the target object of the previous image frame of the current image frame. Further, the position of the motion estimation area of the previous image frame and the current image frame may be used as inputs of the neural network model, and the target object identification may be performed on the motion estimation area of the current image frame to obtain an area where at least one target object is located, so as to segment the area where at least one target object is located from the current image frame, as shown in fig. 1.
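As a rough, non-limiting illustration of the flow shown in fig. 1, the following Python sketch outlines the per-frame processing; the function names and interfaces in it (detect_objects, union_box, expand_boundary, segment_region) are assumptions made for illustration only and are not part of the disclosure.

```python
# Illustrative sketch of the per-frame flow in fig. 1. All function names
# (detect_objects, union_box, expand_boundary, segment_region) are assumptions
# made for illustration; the patent does not prescribe a concrete API.

def process_frame(frame, frame_id, period, prev_target_boxes):
    h, w = frame.shape[:2]
    if frame_id % period == 0:
        # First image frame of a detection period: run target detection on
        # the whole frame to locate the target objects.
        boxes = detect_objects(frame)                  # assumed detection model
    else:
        # Other frames: reuse the regions of the target objects found in the
        # previous image frame.
        boxes = prev_target_boxes
    # Expand the union of the target boxes to obtain the motion estimation
    # region, then identify and segment the target objects inside it.
    region = expand_boundary(union_box(boxes), w, h, alpha=0.1, beta=0.1)  # assumed coefficients
    mask, new_boxes = segment_region(frame, region)    # assumed segmentation model
    return mask, new_boxes
```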
The image segmentation method provided by the embodiment of the application can be realized by an execution device. In some embodiments, the enforcement device may be a terminal device. The terminal device may be a display device having a display function. The display device may include: smart televisions, cell phones, tablet computers, and the like.
The structure and application scenario of the execution device are described below by taking the execution device as a display device as an example. Fig. 2 is a schematic diagram of a usage scenario of the display device in the embodiment. As shown in fig. 2, the display apparatus 200 may also perform data communication with the server 400, and the user may operate the display apparatus 200 through the smart device 300 or the control device 100. In one possible example, the unsegmented image may be transmitted by the server 400 to the display device 200, and the display device 200 performs the image segmentation method.
In some embodiments, the control apparatus 100 may be a remote controller, and the communication between the remote controller and the display device includes at least one of an infrared protocol communication or a bluetooth protocol communication, and other short-distance communication methods, and controls the display device 200 in a wireless or wired manner. The user may control the display apparatus 200 by inputting a user instruction through at least one of a key on a remote controller, a voice input, a control panel input, and the like.
In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.
In some embodiments, the smart device 300 may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the smart device 300 and the display device 200 may also be used for communication of data.
In some embodiments, the display device 200 may also be controlled in a manner other than the control apparatus 100 and the smart device 300, for example, the voice instruction control of the user may be directly received by a module configured inside the display device 200 to obtain a voice instruction, or may be received by a voice control apparatus provided outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be allowed to be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display apparatus 200. The server 400 may be a cluster or a plurality of clusters, and may include one or more types of servers.
Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, and a memory. The control apparatus 100 may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive to the display device 200 to mediate interaction between the user and the display device 200.
In some embodiments, the communication interface 130 is used for external communication, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module.
In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
The embodiment will be specifically described below by taking the display device 200 as an example. It should be understood that the display apparatus 200 shown in fig. 4 is only an example, and the display apparatus 200 may have more or less components than those shown in fig. 4, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
Fig. 4 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments the controller comprises a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for displaying pictures, and a driving component for driving image display, a component for receiving image signals from the controller output, displaying video content, image content, and menu manipulation interface, and a user manipulation UI interface, etc.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals via wired or wireless reception, and demodulates audio/video signals, as well as EPG data signals, from a plurality of wireless or wired broadcast television signals.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, the detector 230 includes a light receiver (not shown), a sensor for collecting the intensity of ambient light; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices, that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation of connecting to a hyperlink page, document, image, etc., or performing an operation of a program corresponding to the icon.
In some embodiments, the controller comprises at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a Random Access Memory (RAM), a Read-Only Memory (ROM), first to nth interfaces for input/output, a communication bus (Bus), and the like.
The CPU processor is a control center of the display device 200, and includes a system on chip SOC, as shown in fig. 4, for executing an operating system and application program instructions stored in the memory, and executing various application programs, data, and contents according to various interactive instructions received from the outside, so as to finally display and play various audio and video contents. The CPU processor may include a plurality of processors. E.g. comprising a main processor and one or more sub-processors.
In some embodiments, a graphics processor for generating various graphics objects, such as: at least one of an icon, an operation menu, and a user input instruction display figure. The graphic processor comprises an arithmetic unit, which performs operation by receiving various interactive instructions input by a user and displays various objects according to display attributes; the system also comprises a renderer for rendering various objects obtained based on the arithmetic unit, wherein the rendered objects are used for being displayed on a display.
In some embodiments, the video processor is configured to receive an external video signal, and perform at least one of decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, image synthesis, and other video processing according to a standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the display device 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image composition module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module is used for demultiplexing the input audio and video data stream. And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like. And the image synthesis module is used for carrying out superposition mixing processing on the GUI signal input by the user or generated by the user and the video image after the zooming processing by the graphic generator so as to generate an image signal for display. And the frame rate conversion module is used for converting the frame rate of the input video. And the display formatting module is used for converting the received video output signal after the frame rate conversion, and changing the signal to be in accordance with the signal of the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform at least one of noise reduction, digital-to-analog conversion, and amplification processing to obtain a sound signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical buttons on the body of the display device, or the like).
In some embodiments, the system of the display device may include a Kernel (Kernel), a command parser (shell), a file system, and an application. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.
Referring to fig. 5, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer, respectively, from top to bottom.
In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides to let the applications in the application layer act. The application program can access the resources in the system and obtain the services of the system in execution through the API interface.
As shown in fig. 5, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the various applications as well as general navigational fallback functions, such as controlling exit, opening, fallback, etc. of the applications. The window manager is used for managing all window programs, such as obtaining the size of a display screen, judging whether a status bar exists, locking the screen, intercepting the screen, controlling the change of the display window (for example, reducing the display window, displaying a shake, displaying a distortion deformation, and the like), and the like.
In some embodiments, the system runtime layer provides support for the upper layer, i.e., the framework layer, and when the framework layer is used, the Android operating system runs the C/C++ library included in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 5, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, pressure sensor, etc.), power driver, and the like.
In other embodiments, the execution device may be an electronic device, the electronic device may be implemented by one or more servers, and the servers may be local servers or cloud servers. Referring to fig. 6A, the server 500 may be implemented by a physical server or a virtual server. The server can be realized by a single server, can be realized by a server cluster formed by a plurality of servers, and can realize the image segmentation method provided by the application through the single server or the server cluster. In fig. 6A, the server 500 is connected to the terminal device 600 and the display device 200 as an example. The server 500 may perform an image segmentation method. In some scenarios, the server 500 may also receive the image segmentation task sent by the terminal device 600 or send the image segmentation result to the terminal device 600. In other scenarios, the server 500 may also receive an image segmentation task sent by the display device 200 and perform image segmentation, or display a segmented image through the display device 200. As shown in fig. 6B, the server 500 is connected to the display apparatus 200 as an example. The server 500 may perform an image segmentation method. In some scenarios, the server 500 may receive the image segmentation task sent by the display device 200, segment according to the image segmentation task, and send the segmented image to the display device 200. The electronic device may also be a personal computer, a handheld or laptop device, a mobile device (such as a mobile phone, a tablet, a personal digital assistant, and the like).
As an example, referring to fig. 7, an electronic device may include a processor 510, a communication interface 520. The electronic device may also include memory 530. Of course, other components, not shown in fig. 7, may also be included in the electronic device.
The communication interface 520 is used for communicating with the display device, and is used for receiving an image segmentation task sent by the display device or sending an image segmentation result to the display device.
In the embodiments of the present application, the processor 510 may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The processor 510 is the control center of the electronic device; it connects the various parts of the electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 530 and calling data stored in the memory 530. Optionally, the processor 510 may include one or more processing units. The processor 510 may be a control component such as a processor, a microprocessor, or a controller, and may be, for example, a general purpose Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
The memory 530 may be used to store software programs and modules, and the processor 510 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 530. The memory 530 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to a business process, and the like. The memory 530, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 530 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 530 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 530 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
It should be noted that the structures shown in fig. 2 to 7 are only examples, and the embodiments of the present invention do not limit this.
An embodiment of the present application provides an image segmentation method, and fig. 8 exemplarily shows a flow of the image segmentation method, which may be performed by an execution device, which may be the display device 200 shown in fig. 4, and specifically may perform image segmentation by the controller 250 in the display device 200. Alternatively, the executing device may be the electronic device shown in fig. 7, and the image segmentation may be specifically executed by the processor 510 in the electronic device. The specific process is as follows:
801: A first area where at least one target object is located in a previous image frame of a current image frame is acquired.
In some embodiments, the execution device obtains a current image frame and the frame number of the current image frame. The current image frame may be an RGB image. When image segmentation is performed, a detection period is set, and whether the current image frame is the first image frame of a detection period can be determined according to its frame number. Before the first region in which the at least one target object is located in the previous image frame of the current image frame is acquired, it is determined that the current image frame is not the first image frame of any detection period. In some scenarios, when the image frames are acquired by the acquisition device, the frames are numbered sequentially from 0. The detection period may be denoted by θ, and the frame number of an image frame may be denoted by id. When the frame number of the current image frame satisfies the condition id % θ ≠ 0, a first area where at least one target object is located in the previous image frame of the current image frame is obtained. For example, suppose the frame number of the current image frame is 36 and the detection period is 25. Since 36 % 25 ≠ 0, the first region in which at least one target object is located in the previous image frame of the current image frame is acquired. In other scenarios, when the frame number of the current image frame satisfies the condition shown in the following formula, it is determined that the current image frame is not the first image frame of any detection period:
θ × (m - 1) - 1 ≠ id;
where θ denotes the detection period, m denotes the m-th detection period, and id denotes the frame number of the current image frame.
As an example, the frame rate of the image frames is 25 fps, so the execution device receives 25 image frames per second. When the set detection period duration is 1 second, the detection period of the image frames is 25. When the frame number of the current image frame is 45, it may be determined from the detection period that the frame numbers of the first image frames of the 1st, 2nd, and 3rd detection periods are 0, 25, and 50, respectively. Therefore, it may be determined that the current image frame is not the first image frame of a detection period, and the first region where the at least one target object is located in the previous image frame of the current image frame may be obtained. The target object may be a person, an animal, a plant, an article, etc., which is not particularly limited in this application.
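As an illustration of the detection-period check described above (frames numbered from 0, period θ), a minimal sketch follows; the helper name is an assumption.

```python
def is_first_frame_of_period(frame_id: int, period: int) -> bool:
    # Frames are numbered from 0, so the first image frame of each detection
    # period has frame_id % period == 0 (e.g. 0, 25, 50 for a period of 25).
    return frame_id % period == 0

# Example from the text: period = 25, current frame number = 45
assert not is_first_frame_of_period(45, 25)   # 45 % 25 = 20, not a first frame
assert is_first_frame_of_period(50, 25)       # 50 is the first frame of the 3rd period
```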
802: The position of a first motion estimation area is determined according to the first area where the at least one target object is located.
The first motion estimation area covers a first area where any one of the at least one target object is located.
In some embodiments, when a target object is included in the previous image frame, a first target frame of the target object in the previous image frame may be determined. The first target frame is the minimum bounding rectangle of the first region of the target object. It is understood that the first target frame may also have other shapes, and the present application is not limited thereto. For example, the first target frame may be a circle obtained by taking the geometric center of the first region of the target object as the origin and setting a radius. As an example, as shown in fig. 9, when the target object included in the previous image frame is a portrait, a first region of the portrait in the previous image frame may be determined, and the minimum bounding rectangle of the first region may then be used as the first target frame. Further, boundary expansion may be performed on the first target frame to obtain the position of the first motion estimation region. The position of the first motion estimation region satisfies the condition shown by the following formula:
[Formula shown as image BDA0003744543810000171 in the original filing.]
wherein x_c_left, y_c_top represent the coordinates of the top-left vertex of the rectangle corresponding to the motion estimation region in the previous image frame; x_c_right, y_c_bottom represent the coordinates of the bottom-right vertex of that rectangle; x_d_left, y_d_top represent the coordinates of the top-left vertex of the minimum bounding rectangle in the previous image frame; x_d_right, y_d_bottom represent the coordinates of the bottom-right vertex of the minimum bounding rectangle in the previous image frame; w, h represent the pixel width and pixel height of the previous image frame; and α, β represent set coefficients of the boundary expansion.
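The formula itself is available only as an image in the original filing. The following sketch shows one plausible boundary expansion that is consistent with the variable definitions above, clamping the expanded rectangle to the frame; it is an illustrative assumption rather than the disclosed formula.

```python
def expand_boundary(box, w, h, alpha, beta):
    """Hypothetical boundary expansion consistent with the definitions above.

    box: (x_d_left, y_d_top, x_d_right, y_d_bottom), the minimum bounding
         rectangle of the target region(s) in the previous image frame.
    w, h: pixel width and pixel height of the previous image frame.
    alpha, beta: set coefficients of the boundary expansion.
    Returns (x_c_left, y_c_top, x_c_right, y_c_bottom), the motion
    estimation region, clamped to the frame.
    """
    x_d_left, y_d_top, x_d_right, y_d_bottom = box
    x_c_left = max(0, int(x_d_left - alpha * w))
    y_c_top = max(0, int(y_d_top - beta * h))
    x_c_right = min(w, int(x_d_right + alpha * w))
    y_c_bottom = min(h, int(y_d_bottom + beta * h))
    return x_c_left, y_c_top, x_c_right, y_c_bottom
```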
In a possible example, the motion estimation area may also be determined by an algorithm such as optical flow prediction, which is not specifically limited in this application.
As an example, after determining a first target frame corresponding to a portrait in a previous image frame, the first target frame may be subjected to boundary expansion to obtain a position of a first motion estimation region, as shown in fig. 10.
In other embodiments, the previous image frame may include a plurality of target objects. When the number of target objects in the previous image frame is N, where N is a positive integer, the second target frame may be determined according to the first regions in which the N target objects are respectively located. The second target frame is the minimum bounding rectangle that includes the first regions respectively corresponding to the N target objects. As an example, the number of target objects in the previous image frame is 3, and the 3 target objects respectively correspond to first regions, as shown in fig. 11. The minimum bounding rectangle including the first regions respectively corresponding to the 3 target objects may be determined as the second target frame, as shown in fig. 12. Further, boundary expansion may be performed on the second target frame to obtain the position of the first motion estimation region. As an example, as shown in fig. 13, the second target frame determined by the first regions corresponding to the 3 target objects may be subjected to boundary expansion to obtain the position of the first motion estimation region.
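A short sketch of this union step for the multi-object case follows (the helper name and the example coordinates are assumptions); the resulting second target frame can then be passed to the boundary expansion described above.

```python
def union_box(boxes):
    # boxes: list of (x_left, y_top, x_right, y_bottom), one per first region.
    # The second target frame is the minimum bounding rectangle covering all
    # of the N first regions.
    x_left = min(b[0] for b in boxes)
    y_top = min(b[1] for b in boxes)
    x_right = max(b[2] for b in boxes)
    y_bottom = max(b[3] for b in boxes)
    return x_left, y_top, x_right, y_bottom

# Example with three first regions (coordinates assumed for illustration):
print(union_box([(100, 200, 220, 480), (260, 180, 380, 470), (420, 210, 540, 490)]))
# -> (100, 180, 540, 490)
```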
803: The position of the second motion estimation region in the current image frame is determined according to the position of the first motion estimation region.
In some embodiments, after the position of the first motion estimation region is determined, the position of the second motion estimation region in the current image frame may be determined based on the position of the first motion estimation region. The position of the second motion estimation region in the current image frame is the same as the position of the first motion estimation region in the previous image frame. Specifically, the position of the second motion estimation region in the current image frame may be determined according to the pixel coordinates of the first motion estimation region in the previous image frame. As an example, the position of the first motion estimation region in the previous image frame may be represented by the coordinates of two pixel points on a diagonal. For example, if the position of the first motion estimation region in the previous image frame is represented as ((376, 320), (542, 500)), the region enclosed by the pixel coordinates ((376, 320), (542, 500)) is the first motion estimation region. Further, the position of the second motion estimation region can be determined from these pixel coordinates: the region enclosed by the pixel coordinates ((376, 320), (542, 500)) in the current image frame is the position of the second motion estimation region, as shown in fig. 14. It is to be understood that the position of the first motion estimation region can also be represented by the pixel coordinates of its four vertices, as shown in fig. 15, which is not specifically limited in this application.
804: The at least one target object is identified in the second motion estimation area according to the position of the second motion estimation area, and a second area where the at least one target object is located is obtained.
In some embodiments, the position of the second motion estimation region may be used as an input of a neural network model, and the target object may be identified in the second motion estimation region by the neural network model to obtain a second region where the at least one target object is located. For example, the target object may be identified for the second motion estimation region by a neural network model to obtain a mask map. Further, a softmax layer may be added to the output results of the neural network model and a threshold value set to obtain a binary mask map. The second region in which the at least one target object is located may be determined from the binary mask map. For example, after adding the softmax layer to the output of the neural network model, the output may be normalized to a probability distribution of 0-1. Further, a threshold may be set to obtain a binary mask map, and a second region in which the at least one target object is located may be determined according to the binary mask map. For example, the threshold may be set to 0.5, and if the probability distribution is less than 0.5, the mask value of the area with the probability distribution less than 0.5 is set to 0, and the area with the mask value of 0 is the background area. If the probability distribution is greater than or equal to 0.5, the mask value of the region with the probability distribution greater than or equal to 0.5 is set to 1, and the region with the mask value of 1 is the second region where the target object is located.
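As a sketch of the mask post-processing described above, assuming the softmax output of the model is available as a per-pixel foreground probability map (numpy is used for illustration):

```python
import numpy as np

def binarize_mask(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # prob_map: per-pixel foreground probability in [0, 1], i.e. the softmax
    # output of the segmentation network for the second motion estimation area.
    # Pixels below the threshold get mask value 0 (background); the remaining
    # pixels get mask value 1 and form the second region of the target object.
    return (prob_map >= threshold).astype(np.uint8)
```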
In other embodiments, the current image frame may be cropped according to the location of the second motion estimation region to obtain the first image. Further, the first image may be used as an input of a neural network model, and the target object may be identified in the first image by the neural network model to obtain a second region where the at least one target object is located.
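A minimal sketch of this cropping variant, assuming the current image frame is held as a NumPy array in (height, width, channel) order and the region is given by its diagonal-corner coordinates; the names are illustrative:

```python
import numpy as np

def crop_to_region(frame, region):
    """frame:  H x W x C array holding the current image frame.
    region: ((x_left, y_top), (x_right, y_bottom)) pixel coordinates."""
    (x_left, y_top), (x_right, y_bottom) = region
    return frame[y_top:y_bottom, x_left:x_right]

current_frame = np.zeros((1080, 1920, 3), dtype=np.uint8)        # placeholder frame
first_image = crop_to_region(current_frame, ((376, 320), (542, 500)))
# `first_image` is what would be fed to the segmentation network instead of the full frame.
```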
Step 805: segment the second region where the at least one target object is located in the current image frame.
In some embodiments, after determining the second region where the at least one target object is located, the at least one target object may be segmented from the current image frame according to the second region. As an example, when one target object is included in the current image frame, the target object may be segmented from the current image frame according to the second region of the target object. When two target objects are included in the current image frame, the two target objects may be segmented from the current image frame according to the second regions to which the two target objects respectively correspond.
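A sketch of applying the binary mask to cut the target object out of the current image frame, pasting the region-level mask back at the position of the second motion estimation region; the helper and variable names are our assumptions:

```python
import numpy as np

def segment_object(frame, region, region_mask):
    (x_left, y_top), (x_right, y_bottom) = region
    full_mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    full_mask[y_top:y_bottom, x_left:x_right] = region_mask
    # Keep only the pixels inside the second region where the target object lies.
    return frame * full_mask[:, :, None]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)                 # placeholder current frame
region = ((376, 320), (542, 500))
region_mask = np.ones((500 - 320, 542 - 376), dtype=np.uint8)     # placeholder binary mask
segmented = segment_object(frame, region, region_mask)
```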
Based on the scheme, the position of the motion estimation region of the current image frame is determined according to the position of the motion estimation region of the previous image frame, and the target object is then identified within the motion estimation region of the current image frame. Compared with performing image segmentation on the entire image, the amount of computation is reduced. In addition, because the scheme performs image segmentation only on the image within the motion estimation region, the segmentation precision can be greatly improved and the proportion of pixels belonging to the portrait region within the segmented region can be increased. Furthermore, the first motion estimation region is determined after boundary expansion of the target frame of the previous image frame, so that the position of the second motion estimation region in the current image frame, determined from the position of the first motion estimation region, can be ensured to include all target objects, which improves the accuracy of image segmentation.
In some embodiments, when the frame number of the current image frame satisfies id % θ = 0 or θ × (m − 1) − 1 = id, the current image frame is determined to be the first image frame in the detection period. When the current image frame is the first image frame in any detection period, at least one target object may be identified for the current image frame, and third regions in which the at least one target object is respectively located are obtained. When the current image frame is the first image frame in any detection period, the flow of the image segmentation method is as shown in fig. 16, which specifically includes the following steps:
Step 1601: perform target detection on the current image frame to obtain target frame boundaries corresponding to at least one target object respectively.
In some embodiments, when the current image frame is the first image frame in the detection period, target detection may be performed on the current image frame through a target detection model to obtain the target frame boundary corresponding to at least one target object in the current image frame. It is to be understood that the target detection model may also be replaced by another model that roughly locates the target object, such as semantic segmentation or human skeleton point detection, which is not limited herein. As shown in fig. 17, when 2 target objects are included in the current image frame, the current image frame may be used as an input of the target detection model, and the target frame boundaries of the 2 target objects may be output by the target detection model. For example, the target detection model may determine that the target frame boundary of target object 1 is ((202, 320), (412, 540)), and the target frame boundary of target object 2 is ((380, 600), (510, 840)).
Step 1602: determine the position of a third motion estimation region of the current image frame according to the target frame boundary of the at least one target object.
In some embodiments, when a target object is included in the current image frame, the boundary of the target frame corresponding to the target object may be expanded to obtain the position of the third motion estimation region of the current image frame. When the current image frame includes N target objects, a target object region may be determined according to target frame boundaries of the N target objects, and the boundary of the target object region is expanded to determine a position of a third motion estimation region of the current image frame. The target object area is a minimum circumscribed rectangle including target frames corresponding to the N target objects respectively.
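A minimal sketch of building the target object area for N target objects, i.e. the minimum circumscribed rectangle enclosing their target frame boundaries; the box format and function name are our assumptions, and the inputs reuse the two example boundaries from step 1601:

```python
# Sketch: the target object area for N target objects is the minimum
# circumscribed rectangle that encloses all N target frame boundaries.
# Each box is ((x_left, y_top), (x_right, y_bottom)).

def minimum_bounding_rectangle(boxes):
    x_left   = min(b[0][0] for b in boxes)
    y_top    = min(b[0][1] for b in boxes)
    x_right  = max(b[1][0] for b in boxes)
    y_bottom = max(b[1][1] for b in boxes)
    return ((x_left, y_top), (x_right, y_bottom))

# Using the two example target frame boundaries from step 1601:
boxes = [((202, 320), (412, 540)), ((380, 600), (510, 840))]
target_object_area = minimum_bounding_rectangle(boxes)
# ((202, 320), (510, 840)); this area is then expanded at its boundary to give
# the third motion estimation region.
```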
Step 1603: identify at least one target object in the third motion estimation area of the current image frame according to the position of the third motion estimation area, and obtain the third regions where the at least one target object is respectively located.
In some embodiments, the position of the third motion estimation region may be used as an input of a neural network model, and the target object may be identified in the third motion estimation region by the neural network model to obtain a third region where the at least one target object is located. For details, refer to step 804, which is not described herein.
In other embodiments, the current image frame may be cropped according to the location of the third motion estimation region to obtain the second image. Further, the second image may be used as an input of the neural network model, and the target object may be identified in the second image through the neural network model to obtain a third region where the at least one target object is located.
Step 1604: segment the third region in the current image frame.
In some embodiments, after determining the third region where the at least one target object is located, the at least one target object may be segmented from the current image frame according to the third region. As an example, when a target object is included in the current image frame, the target object may be segmented from the current image frame according to a third region of the target object. When a plurality of target objects are included in the current image frame, the plurality of target objects may be segmented from the current image frame according to third regions to which the plurality of target objects respectively correspond.
Based on the scheme, when the current image frame is the first image frame of the detection period, the target detection is carried out on the current image frame so as to correct the area where the target object is located in the current image frame and improve the segmentation precision.
In some embodiments, within a detection period, target detection is performed on the first image frame to determine the target frame boundary where at least one target object is located, and boundary expansion is performed on the target frame boundary to obtain the position of the motion estimation region of the first image frame. Further, the first image frame and the position of the motion estimation region may be used as input of a neural network model to obtain the region of the at least one target object in the motion estimation region of the first image frame. The region of the at least one target object is segmented from the first image frame and output, and the region of the at least one target object of the first image frame is saved. When the execution device receives the second image frame, the region of the at least one target object of the first image frame may be acquired, and the position of the motion estimation region of the first image frame may be determined according to the region of the at least one target object in the first image frame. Further, the position of the second motion estimation region in the second image frame may be determined from the position of the motion estimation region of the first image frame, and the second image frame and the position of the second motion estimation region may be input into the neural network model to obtain the region of the at least one target object in the second motion estimation region of the second image frame. Further, the region of the at least one target object may be segmented from the second image frame and output, and the region of the at least one target object of the second image frame may be saved, as shown in fig. 18. In some scenarios, instead of inputting the position of the motion estimation region together with the image frame into the neural network model, the image frame may first be cropped according to the position of the motion estimation region, and the cropped image may then be input into the neural network model, as shown in fig. 19.
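Tying these steps together, a hedged per-frame sketch across a detection period follows; the callables passed in, the state dictionary, and the simple frame_id % theta == 0 test for "first image frame of a detection period" are illustrative assumptions rather than the literal implementation:

```python
# Sketch of the per-frame flow over a detection period. `detector`,
# `segment_in_region` and `regions_to_motion_region` stand in for the target
# detection model, the segmentation network applied inside a region, and the
# step that builds a motion estimation region (minimum bounding rectangle plus
# boundary expansion) from a set of boxes or target object regions.

def process_frame(frame, frame_id, theta, detector, segment_in_region,
                  regions_to_motion_region, state):
    if frame_id % theta == 0:
        # First image frame of the period: run target detection and build the
        # motion estimation region from the detected target frame boundaries.
        boxes = detector(frame)
        motion_region = regions_to_motion_region(boxes, frame.shape)
    else:
        # Other frames: build the motion estimation region from the target
        # object regions saved for the previous image frame; its position is
        # reused unchanged in the current frame.
        motion_region = regions_to_motion_region(state["prev_regions"], frame.shape)
    object_regions = segment_in_region(frame, motion_region)
    state["prev_regions"] = object_regions   # saved for the next image frame
    return object_regions
```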
Based on the same technical concept, the embodiment of the present application provides an image segmentation apparatus 2000, as shown in fig. 20. The apparatus 2000 may implement any step of the image segmentation method, and is not described herein again to avoid repetition. The apparatus 2000 includes an acquisition module 2001, a determination module 2002, a recognition module 2003, and a segmentation module 2004.
An obtaining module 2001, configured to obtain a first region where at least one target object is located in a previous image frame of a current image frame;
a determining module 2002, configured to determine a position of a first motion estimation area according to a first area where the at least one target object is located, where the first motion estimation area covers the first area where any target object in the at least one target object is located; determining a position of a second motion estimation region in the current image frame according to the position of the first motion estimation region, wherein the position of the second motion estimation region in the current image frame is the same as the position of the first motion estimation region in the previous image frame;
an identifying module 2003, configured to identify the at least one target object in the second motion estimation region according to the position of the second motion estimation region, and obtain a second region where the at least one target object is located;
a segmenting module 2004, configured to segment a second region in the current image frame where the at least one target object is located.
In a possible implementation manner, when the number of the target objects in the previous image frame is one, the determining module 2002 is specifically configured to, when determining the first motion estimation area according to the first area where the at least one target object is located: determining a first target frame of the target object, wherein the first target frame is a minimum bounding rectangle of a first area of the target object; and carrying out boundary expansion on the first target frame to obtain the first motion estimation area.
In a possible implementation manner, the number of target objects in the previous image frame is N, where N is a positive integer, and the determining module 2002, when determining the first motion estimation area according to the first area where the at least one target object is located, is specifically configured to: determining a second target frame according to the first areas where the N target objects are respectively located, wherein the second target frame is a minimum circumscribed rectangle comprising the first areas corresponding to the N target objects respectively; and carrying out boundary expansion on the second target frame to obtain the first motion estimation area.
In one possible implementation, the position of the first motion estimation region satisfies a condition shown by the following formula:
x_c_left = max(x_d_left − α·w, 0)
y_c_top = max(y_d_top − β·h, 0)
x_c_right = min(x_d_right + α·w, w)
y_c_bottom = min(y_d_bottom + β·h, h)
wherein x_c_left, y_c_top represent the coordinates of the top-left corner vertex of the rectangle corresponding to the motion estimation region in the previous image frame, and x_c_right, y_c_bottom represent the coordinates of the bottom-right corner vertex of that rectangle; x_d_left, y_d_top represent the coordinates of the top-left corner vertex of the minimum bounding rectangle in the previous image frame, and x_d_right, y_d_bottom represent the coordinates of the bottom-right corner vertex of the minimum bounding rectangle in the previous image frame; w, h represent the pixel width and the pixel height of the previous image frame; and α, β represent the preset coefficients of the boundary expansion.
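A minimal sketch of this boundary expansion, assuming (per the formula above) that the expansion amount is proportional to the frame's pixel width and height and that the result is clamped to the frame bounds; the function name and the default α, β values are illustrative assumptions:

```python
# Sketch: expand the minimum bounding rectangle by alpha*w horizontally and
# beta*h vertically, clamped to the frame, to obtain the motion estimation region.

def expand_boundary(rect, frame_width, frame_height, alpha=0.1, beta=0.1):
    (x_d_left, y_d_top), (x_d_right, y_d_bottom) = rect
    x_c_left   = max(x_d_left   - alpha * frame_width, 0)
    y_c_top    = max(y_d_top    - beta  * frame_height, 0)
    x_c_right  = min(x_d_right  + alpha * frame_width, frame_width)
    y_c_bottom = min(y_d_bottom + beta  * frame_height, frame_height)
    return ((x_c_left, y_c_top), (x_c_right, y_c_bottom))

# Example with the target object area from step 1602 and a 1920x1080 frame:
motion_region = expand_boundary(((202, 320), (510, 840)), 1920, 1080)
# With alpha = beta = 0.1 this gives ((10.0, 212.0), (702.0, 948.0)).
```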
In a possible implementation manner, the determining module 2002 is further configured to: determining that a current image frame is not a first image frame in any detection cycle before acquiring a first region in which at least one target object is located in a previous image frame of the current image frame.
In a possible implementation manner, the identification module 2003 is further configured to: when the current image frame is the first image frame in any detection period, identifying the at least one target object for the current image frame, and obtaining third areas where the at least one target object is respectively located;
the segmenting module 2004 is further configured to segment the third region in the current image frame.
In a possible implementation manner, when the identifying module 2003 identifies the at least one target object in the second motion estimation area according to the position of the second motion estimation area, and obtains a second area where the at least one target object is located, the identifying module is specifically configured to: and taking the position of the second motion estimation area as the input of a neural network model, and identifying the at least one target object in the second motion estimation area through the neural network model to obtain a second area where the at least one target object is located.
Based on the same technical concept, embodiments of the present application provide a computer-readable storage medium including computer program code, which, when run on a computer, causes the computer to perform any of the image segmentation methods as discussed above. Since the principle of solving the problem of the computer-readable storage medium is similar to that of the image segmentation method, the implementation of the computer-readable storage medium can refer to the implementation of the method, and repeated details are not repeated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An image segmentation method, comprising:
acquiring a first area where at least one target object is located in a previous image frame of a current image frame;
determining the position of a first motion estimation area according to a first area where the at least one target object is located, wherein the first motion estimation area covers the first area where any target object in the at least one target object is located;
determining a position of a second motion estimation region in the current image frame according to the position of the first motion estimation region, wherein the position of the second motion estimation region in the current image frame is the same as the position of the first motion estimation region in the previous image frame;
identifying the at least one target object in the second motion estimation area according to the position of the second motion estimation area, and obtaining a second area where the at least one target object is located;
and segmenting a second area where the at least one target object is located in the current image frame.
2. The method as claimed in claim 1, wherein the number of the target objects in the previous image frame is one, and the determining the first motion estimation region according to the first region where the at least one target object is located comprises:
determining a first target frame of the target object, wherein the first target frame is a minimum bounding rectangle of a first area of the target object;
and carrying out boundary expansion on the first target frame to obtain the first motion estimation area.
3. The method as claimed in claim 1, wherein the number of target objects in the previous image frame is N, where N is a positive integer, and the determining the first motion estimation region according to the first region where the at least one target object is located comprises:
determining a second target frame according to the first areas where the N target objects are respectively located, wherein the second target frame is a minimum circumscribed rectangle comprising the first areas corresponding to the N target objects respectively;
and carrying out boundary expansion on the second target frame to obtain the first motion estimation area.
4. A method as claimed in claim 2 or 3, wherein the position of the first motion estimation region satisfies the condition shown by the following formula:
x_c_left = max(x_d_left − α·w, 0)
y_c_top = max(y_d_top − β·h, 0)
x_c_right = min(x_d_right + α·w, w)
y_c_bottom = min(y_d_bottom + β·h, h)
wherein x_c_left, y_c_top represent the coordinates of the top-left corner vertex of the rectangle corresponding to the motion estimation region in the previous image frame, and x_c_right, y_c_bottom represent the coordinates of the bottom-right corner vertex of that rectangle; x_d_left, y_d_top represent the coordinates of the top-left corner vertex of the minimum bounding rectangle in the previous image frame, and x_d_right, y_d_bottom represent the coordinates of the bottom-right corner vertex of the minimum bounding rectangle in the previous image frame; w, h represent the pixel width and the pixel height of the previous image frame; and α, β represent the preset coefficients of the boundary expansion.
5. The method of any one of claims 1-3, further comprising:
determining that a current image frame is not a first image frame in any detection cycle before acquiring a first region in which at least one target object is located in a previous image frame of the current image frame.
6. The method of claim 5, wherein the method further comprises:
when the current image frame is the first image frame in any detection period, identifying the at least one target object for the current image frame, and obtaining third areas where the at least one target object is respectively located;
segmenting the third region in the current image frame.
7. The method as claimed in any one of claims 1 to 3, wherein said identifying said at least one target object in said second motion estimation region according to the position of said second motion estimation region, and obtaining a second region where said at least one target object is located, comprises:
and taking the position of the second motion estimation area as the input of a neural network model, and identifying the at least one target object in the second motion estimation area through the neural network model to obtain a second area where the at least one target object is located.
8. An image segmentation apparatus, comprising:
the acquisition module is used for acquiring a first area where at least one target object is located in a previous image frame of a current image frame;
a determining module, configured to determine a position of a first motion estimation area according to a first area where the at least one target object is located, where the first motion estimation area covers the first area where any target object of the at least one target object is located; determining a position of a second motion estimation region in the current image frame according to the position of the first motion estimation region, wherein the position of the second motion estimation region in the current image frame is the same as the position of the first motion estimation region in the previous image frame;
the identification module is used for identifying the at least one target object in the second motion estimation area according to the position of the second motion estimation area to obtain a second area where the at least one target object is located;
and the segmentation module is used for segmenting a second area where the at least one target object is located in the current image frame.
9. An execution device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory and for executing the method according to any one of claims 1 to 7 in accordance with the obtained program instructions.
10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-7.
CN202210827424.XA 2022-07-13 2022-07-13 Image segmentation method and device Pending CN115082682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210827424.XA CN115082682A (en) 2022-07-13 2022-07-13 Image segmentation method and device


Publications (1)

Publication Number Publication Date
CN115082682A true CN115082682A (en) 2022-09-20

Family

ID=83259530


Country Status (1)

Country Link
CN (1) CN115082682A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination