CN113902696A - Image processing method, image processing apparatus, electronic device, and medium

Info

Publication number: CN113902696A
Application number: CN202111156311.3A
Authority: CN (China)
Prior art keywords: frame image, target, current frame, pixel, image
Other languages: Chinese (zh)
Inventor: 于越
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111156311.3A
Legal status: Pending

Classifications

    All classifications fall under G (Physics) > G06 (Computing; Calculating or Counting):
    • G06T 7/0002 - Image analysis: inspection of images, e.g. flaw detection
    • G06F 18/214 - Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 - Pattern recognition: matching criteria, e.g. proximity measures
    • G06N 3/045 - Neural networks: combinations of networks
    • G06N 3/08 - Neural networks: learning methods
    • G06T 7/10 - Image analysis: segmentation; edge detection
    • G06T 2207/10004 - Image acquisition modality: still image; photographic image
    • G06T 2207/20021 - Special algorithmic details: dividing image into blocks, subimages or windows
    • G06T 2207/20081 - Special algorithmic details: training; learning
    • G06T 2207/20084 - Special algorithmic details: artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, apparatus, device, medium, and product, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The implementation scheme comprises the following steps: determining object feature information associated with a current frame image according to the current frame image to be recognized and an object recognition result for a previous frame image; and determining an object recognition result associated with the current frame image based on the object feature information, wherein the previous frame image comprises at least one frame image located before the current frame image in the image sequence, and the object recognition result comprises recognition result information for a target object.

Description

Image processing method, image processing apparatus, electronic device, and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the field of computer vision and deep learning techniques, which can be applied in image processing scenarios.
Background
Image segmentation techniques can be used to identify target objects in image sequences, and have wide application in fields such as autonomous driving, public safety, and content recommendation. The accuracy of the object recognition results affects the breadth and effectiveness of these applications. However, in some scenarios, when object information in an image frame is recognized using an image segmentation technique, the recognition accuracy can be low and the recognition results unstable.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, electronic device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided an image processing method including: determining object feature information associated with a current frame image according to the current frame image to be recognized and an object recognition result for a previous frame image; and determining an object recognition result associated with the current frame image based on the object feature information, wherein the previous frame image comprises at least one frame image located before the current frame image in the image sequence, and the object recognition result comprises recognition result information for a target object.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: a first processing module configured to determine object feature information associated with a current frame image according to the current frame image to be recognized and an object recognition result for a previous frame image; and a second processing module configured to determine an object recognition result associated with the current frame image based on the object feature information, where the previous frame image includes at least one frame image located before the current frame image in an image sequence, and the object recognition result includes recognition result information for a target object.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the above-described image processing method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the image processing method described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically shows a system architecture for an image processing method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an image processing method according to another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image processing method according to a further embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 6 schematically shows a block diagram of an electronic device for performing the image processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
The embodiment of the disclosure provides an image processing method, an image processing device, an electronic device and a medium. The image processing method comprises the following steps: determining object feature information associated with the current frame image according to the current frame image to be identified and an object identification result for the previous frame image, and determining the object identification result associated with the current frame image based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for the target object.
FIG. 1 schematically shows a system architecture for an image processing method and apparatus according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a database 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between database 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, network services, middleware services, and the like.
The database 101 may be a local database or a cloud database, and the database 101 stores object recognition results for different image frames in the image sequence, and the object recognition results include recognition result information for a target object. The server 103 may be configured to obtain an object identification result of a previous frame image associated with a current frame image from the database 101 according to the current frame image to be identified. The server 103 is further configured to determine object feature information associated with the current frame image according to the current frame image to be recognized and an object recognition result for the previous frame image, and determine an object recognition result associated with the current frame image based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, for example, at least one frame image located chronologically before the current frame image.
It should be noted that the image processing method provided by the embodiment of the present disclosure may be executed by the server 103. Accordingly, the image processing apparatus provided by the embodiment of the present disclosure may be provided in the server 103. The image processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 103 and is capable of communicating with the database 101 and/or the server 103. Accordingly, the image processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 103 and capable of communicating with the database 101 and/or the server 103.
It should be understood that the number of databases, networks, and servers in fig. 1 are merely illustrative. There may be any number of databases, networks, and servers, as desired for implementation.
An image processing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 4 in conjunction with the system architecture of fig. 1. The image processing method of the embodiment of the present disclosure may be executed by the server 103 shown in fig. 1, for example.
Fig. 2 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S220.
In operation S210, object feature information associated with a current frame image is determined according to a current frame image to be recognized and an object recognition result for a previous frame image.
In operation S220, an object recognition result associated with the current frame image is determined based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for the target object.
An exemplary flow of each operation of the image processing method of the present embodiment is described in detail below.
Illustratively, object feature information associated with a current frame image is determined from the current frame image to be recognized and an object recognition result for a previous frame image. The previous frame image includes at least one frame image located in the image sequence before the current frame image; illustratively, it includes at least one frame image temporally located in the video stream before the current frame image. The current frame image is the K-th frame image in the image sequence, where K is an integer greater than or equal to 2, and the previous frame images are the 1st through (K-1)-th frame images in the image sequence. The object recognition result describes recognition result information for the target object, for example, a confidence that each pixel in the previous frame image is a pixel constituting the target object. In the case where the current frame image is the first frame image in the image sequence, the object recognition result associated with the previous frame image may be taken to be a zero vector.
In one example, a deep convolutional network is used to extract image features of the current frame image: convolution and pooling computations are performed on the current frame image to obtain initial feature information of the current frame image. Enhancement processing is then performed on this initial feature information according to the object recognition result associated with the previous frame image, yielding the object feature information associated with the current frame image. The object feature information is a more targeted, attention-based feature representation, and a more accurate and efficient target object recognition operation can be realized based on it.
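For illustration only, this enhancement step might be implemented as a simple attention-style gating in which the previous frame's confidence map re-weights the current feature map. The patent does not specify the exact operation, so the residual gating below, and the function and parameter names, are assumptions (Python/PyTorch sketch):

import torch
import torch.nn.functional as F

def enhance_features(initial_feat: torch.Tensor,
                     prev_confidence: torch.Tensor) -> torch.Tensor:
    # initial_feat: (B, C, H, W) backbone features for the current frame.
    # prev_confidence: (B, 1, H0, W0) per-pixel confidence map from the
    # previous frame's recognition result (all zeros for the first frame).
    attn = F.interpolate(prev_confidence, size=initial_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Residual gating: keep the original features and amplify positions
    # that the previous result marked as likely target pixels.
    return initial_feat * (1.0 + attn)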
After the object feature information associated with the current frame image is obtained, the object recognition result associated with the current frame image is determined based on it. Illustratively, a deconvolution operation is performed on the object feature information of the current frame image using a deep convolutional network, obtaining the object recognition result for the current frame image.
After the object recognition result for the current frame image is obtained, the object recognition result of the current frame image and the object recognition results of the previous frame images may be combined by weighted averaging, and the weighted-average result may be used as the adjusted object recognition result of the current frame image. Illustratively, 0.5^d is selected as the weight, where d is the distance (e.g., the number of frames) between the previous frame image and the current frame image. This design can effectively reduce jitter between image frames and improve the consistency and accuracy of target object recognition.
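A minimal sketch of this exponential temporal smoothing, assuming the weight 0.5^d for a previous result at frame distance d (the helper name is illustrative):

import numpy as np

def smooth_result(current: np.ndarray, previous: list) -> np.ndarray:
    # current: (H, W) confidence map for frame K.
    # previous: confidence maps for frames K-1, K-2, ... (nearest first).
    weights = [1.0] + [0.5 ** d for d in range(1, len(previous) + 1)]
    stacked = np.stack([current] + previous, axis=0)
    w = np.asarray(weights)[:, None, None]
    # Weighted average over the current and previous recognition results.
    return (stacked * w).sum(axis=0) / w.sum()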
When the object identification result associated with the current frame image is determined based on the object feature information, a color value can be allocated to each pixel in the image area according to the image area where the target object is located in the current frame image, so as to obtain a semantic index image associated with the current frame image.
The color value assigned to each pixel within the image area may be a number or a character string that exists in binary form inside the computer. By limiting the number of bits of the color value of each pixel to no more than the color depth supported by the display system, the semantic index image can be displayed as an ordinary color image. Visually presenting the object recognition result for the current frame image as a color image on a display device intuitively expresses the semantic information of the target object and improves the intuitiveness and readability of the object recognition result.
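An illustrative construction of such a semantic index image, assuming a small hypothetical palette of 8-bit color values (within the color depth of an ordinary display system):

import numpy as np

# Hypothetical palette: class index -> (R, G, B), kept within 8 bits
# per channel so any ordinary display system can render it.
PALETTE = {0: (0, 0, 0), 1: (220, 20, 60), 2: (0, 128, 255)}

def semantic_index_image(label_map: np.ndarray) -> np.ndarray:
    # label_map: (H, W) integer class index for every pixel.
    rgb = np.zeros((*label_map.shape, 3), dtype=np.uint8)
    for cls, color in PALETTE.items():
        rgb[label_map == cls] = color  # color every pixel of this class
    return rgb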
Before target object recognition is carried out, saliency detection may be performed on the current frame image to determine a salient region whose area proportion in the current frame image is greater than a preset threshold. The target object is then recognized within the salient region according to the object feature information associated with the current frame image, which improves recognition efficiency, reduces noise interference during recognition, and reduces the computation required by the recognition operation. Illustratively, the image area occupied by an object whose area proportion is greater than 60% is taken as the salient region in the current frame image.
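A hedged sketch of this saliency pre-filter, assuming a binary saliency mask from an unspecified detector and the 60% area-ratio threshold from the example above:

import numpy as np
from scipy import ndimage

def salient_region(saliency_mask: np.ndarray, ratio_thresh: float = 0.6) -> np.ndarray:
    # saliency_mask: (H, W) binary output of any saliency detector.
    labeled, num = ndimage.label(saliency_mask)
    keep = np.zeros(saliency_mask.shape, dtype=bool)
    for i in range(1, num + 1):
        region = labeled == i
        if region.sum() / saliency_mask.size > ratio_thresh:
            keep |= region  # keep only regions above the area-ratio threshold
    return keep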
With the embodiments of the present disclosure, object feature information associated with a current frame image is determined according to the current frame image to be recognized and an object recognition result for a previous frame image, and the object recognition result associated with the current frame image is determined based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for the target object.
The object identification result associated with the current frame image is determined by combining the current frame image and the object identification result aiming at the previous frame image, so that the target object identification is carried out by fusing the related information of other frame images when the single frame image is detected, the accuracy of the object identification result is obviously improved on the basis of keeping the high efficiency of the single frame image detection and small error accumulation, and the consistency and stability of the object identification result are ensured.
FIG. 3 is a schematic diagram of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 3, operation S210 may include operations S310 to S330.
In operation S310, initial feature information of the current frame image is determined.
In operation S320, N target frame images for performing an enhancement process on initial feature information of a current frame image are determined among K-1 previous frame images, where N is a positive integer less than or equal to K-1.
In operation S330, an enhancement process is performed on the initial feature information according to the object recognition result associated with each target frame image, resulting in object feature information.
An exemplary flow of each operation of the image processing method of the present embodiment is described in detail below.
Illustratively, the current frame image to be recognized is input into a trained image segmentation model, and a feature extraction module of the image segmentation model performs feature extraction on the current frame image to obtain its initial feature information. The initial feature information may include, for example, grayscale features, color features, and texture features of the current frame image. In the embodiment of the present disclosure, the image segmentation model may be implemented using, for example, a convolutional neural network (CNN), and may be trained on a set of consecutive sample frames. The feature extraction module may be implemented using, for example, a residual network; to preserve more image feature information, some of the down-sampling layers in the residual network may be replaced with dilated (atrous) convolution layers. This embodiment does not limit this.
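As one possible reading of this backbone modification, the sketch below replaces the stride-2 down-sampling convolutions of the later stages with dilation-2 convolutions; the layer widths and depth are illustrative, not the patent's:

import torch.nn as nn

def make_block(in_ch: int, out_ch: int, dilate: bool) -> nn.Sequential:
    # When `dilate` is True, the stride-2 down-sampling is replaced by a
    # dilation-2 convolution, preserving spatial resolution.
    stride, dilation = (1, 2) if dilate else (2, 1)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                  padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

backbone = nn.Sequential(
    make_block(3, 64, dilate=False),    # ordinary down-sampling stage
    make_block(64, 128, dilate=False),  # ordinary down-sampling stage
    make_block(128, 256, dilate=True),  # down-sampling replaced by dilation
    make_block(256, 512, dilate=True),  # down-sampling replaced by dilation
)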
After obtaining the initial characteristic information of the current frame image, determining N target frame images for enhancing the initial characteristic information in K-1 previous frame images, wherein N is a positive integer less than or equal to K-1. As an example, among K-1 previous frame images, N previous frame images closest to the current frame image are taken as target frame images. The distance between the previous frame image and the current frame image includes a temporal distance and/or a content distance, and illustratively, N previous frame images that are temporally closest to the current frame image are taken as target frame images.
In determining the N target frame images based on the content distance, a similarity of the object feature information of each previous frame image and the initial feature information of the current frame image may be calculated as the content distance between the previous frame image and the current frame image. And according to the content distance between each previous frame image and the current frame image, taking the N previous frame images with the minimum content distance as target frame images.
In another example, the structural similarity between the current frame image and each of the previous frame images is determined as the content distance between the current frame image and each of the previous frame images according to the pixel array of the previous frame image and the pixel array of the current frame image. The structural similarity is used to represent the similarity of pixel structures of different image frames, and may include similarity information in terms of brightness, contrast, gray scale, pixel average value, pixel standard deviation, and the like, for example. And determining N previous frame images with the highest structural similarity as target frame images according to the structural similarity between the current frame image and each previous frame image.
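An illustrative selection of the N target frames by content distance, using the structural similarity (SSIM) named above as the measure; skimage's implementation is assumed to be available, and highest similarity corresponds to smallest content distance:

import numpy as np
from skimage.metrics import structural_similarity

def pick_target_frames(current: np.ndarray, previous: list, n: int) -> list:
    # current, previous[i]: 2-D grayscale frames as float arrays.
    rng = float(current.max() - current.min())
    scores = [structural_similarity(current, prev, data_range=rng)
              for prev in previous]
    # Take the indices of the n previous frames with the highest SSIM.
    return sorted(range(len(previous)), key=lambda i: scores[i], reverse=True)[:n]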
By way of further example, the global optical flow features of each previous frame image are extracted, and a first optical flow feature matrix for characterizing each previous frame image is respectively constructed. And extracting the global optical flow characteristics of the current frame image, and constructing a second optical flow characteristic matrix for representing the current frame image. And constructing an inter-frame similarity matrix associated with the image sequence based on the second optical flow feature matrix and each first optical flow feature matrix. And determining N previous frame images with the highest similarity with the current frame image as target frame images based on the inter-frame similarity matrix. The optical flow characteristics reflect pixel motion information of a space moving object on an imaging plane, and the global optical flow characteristics can be used for determining the relevance between adjacent frame images according to the change of pixels in an image sequence in a time domain and calculating the pixel motion information in the adjacent frame images.
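A sketch of the optical-flow variant under stated assumptions: the patent does not name a flow algorithm, so OpenCV's Farneback dense flow stands in here, and cosine similarity between flattened flow fields stands in for one row of the inter-frame similarity matrix:

import cv2
import numpy as np

def flow_feature(prev_gray: np.ndarray, gray: np.ndarray) -> np.ndarray:
    # Dense Farneback flow between two consecutive uint8 grayscale frames,
    # flattened into one row of an optical-flow feature matrix.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return flow.reshape(-1)

def rank_by_flow_similarity(prev_feats: list, cur_feat: np.ndarray, n: int) -> list:
    # Cosine similarities form one row of the inter-frame similarity matrix.
    sims = [float(np.dot(f, cur_feat) /
                  (np.linalg.norm(f) * np.linalg.norm(cur_feat) + 1e-8))
            for f in prev_feats]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:n]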
And according to the object identification results of the N target frame images, performing enhancement processing aiming at the initial characteristic information of the current frame image. The object recognition result associated with the target frame image indicates a confidence that each pixel in the target frame image is a pixel constituting the target object, for example, indicates a probability value that each pixel in the target frame image is a pixel constituting the target object. And according to the confidence degree which is indicated by the object identification result and is associated with each pixel in the target frame image, carrying out weight distribution on the corresponding pixel in the current frame image to obtain a pixel-based weight distribution result which is associated with the current frame image. In general, the higher the confidence associated with a pixel in the target frame image, the higher the weight assigned to the corresponding pixel in the current frame image.
In one example, for each target frame image, weights are assigned to the pixels at the same positions in the current frame image according to the confidence, indicated by the object recognition result, associated with each pixel in the target frame image, yielding a pixel-based weight distribution result associated with the current frame image.
In another example, for each target frame image, first target pixels whose confidence is higher than a preset threshold are determined according to the object recognition result associated with that target frame image. A motion vector of the first target pixels across the N target frame images is determined from the first target pixels found in each target frame image. According to this motion vector, second target pixels to be assigned weights are located in the current frame image. Weights are then assigned to the second target pixels in the current frame image according to the confidences associated with the first target pixels in each target frame image, yielding a pixel-based weight distribution result associated with the current frame image. Determining the motion vector associated with the first target pixels across the N target frame images may be implemented with an optical flow tracking algorithm, which is not described again in this embodiment.
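A sketch of this motion-compensated weighting, assuming Lucas-Kanade optical flow tracking (the patent leaves the tracker unspecified): high-confidence pixels of a target frame are tracked into the current frame, and their confidences become the weights there.

import cv2
import numpy as np

def propagate_weights(target_gray: np.ndarray, current_gray: np.ndarray,
                      confidence: np.ndarray, thresh: float = 0.8) -> np.ndarray:
    # target_gray/current_gray: uint8 grayscale frames; confidence: (H, W)
    # per-pixel confidence map of the target frame's recognition result.
    h, w = confidence.shape
    ys, xs = np.nonzero(confidence > thresh)            # first target pixels
    weights = np.zeros((h, w), dtype=np.float32)
    if len(xs) == 0:
        return weights
    pts = np.stack([xs, ys], axis=1).astype(np.float32).reshape(-1, 1, 2)
    moved, status, _ = cv2.calcOpticalFlowPyrLK(target_gray, current_gray, pts, None)
    moved, status = moved.reshape(-1, 2), status.reshape(-1)
    for (x0, y0), (x1f, y1f), ok in zip(zip(xs, ys), moved, status):
        if not ok:
            continue
        x1, y1 = int(round(x1f)), int(round(y1f))       # second target pixel
        if 0 <= x1 < w and 0 <= y1 < h:
            # The tracked pixel inherits the source pixel's confidence.
            weights[y1, x1] = max(weights[y1, x1], float(confidence[y0, x0]))
    return weights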
FIG. 4 is a schematic diagram of an image processing procedure according to an embodiment of the present disclosure.
As shown in FIG. 4, in the image processing process 400, the t-th frame image 401 to be recognized and the object recognition results 402 for the previous frame images are input into an image segmentation model 403, yielding an object recognition result associated with the t-th frame image 401. The previous frame images include the (t-1)-th and (t-2)-th frame images, which are temporally closest to the t-th frame image 401. The image segmentation model 403 can be implemented by, for example, a deep convolutional network, which is not limited in this embodiment.
Image feature extraction is performed on the t-th frame image 401 using the image segmentation model 403 to obtain the initial feature information of the t-th frame image 401. Combining the object recognition results 402 associated with the previous frame images, enhancement processing is performed on this initial feature information to obtain the object feature information of the t-th frame image 401, a more targeted, attention-based feature representation. The image segmentation model 403 identifies the target object in the t-th frame image 401 based on this object feature information and obtains the object recognition result.
Optionally, a consistency loss associated with the image segmentation model 403 is computed from the object recognition results of the t-th, (t-1)-th, and (t-2)-th frame images, and the parameters of the image segmentation model 403 are adjusted based on this consistency loss, so as to iteratively optimize the image segmentation model 403.
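The patent names a consistency loss but not its form; a mean-squared penalty on predictions for consecutive frames is assumed in this minimal sketch:

import torch

def consistency_loss(pred_t: torch.Tensor, pred_t1: torch.Tensor,
                     pred_t2: torch.Tensor) -> torch.Tensor:
    # pred_*: (B, C, H, W) recognition results for frames t, t-1, t-2.
    # Penalize differences between predictions for consecutive frames.
    return torch.mean((pred_t - pred_t1) ** 2) + torch.mean((pred_t1 - pred_t2) ** 2)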
Performing the object recognition operation for the current frame image according to the object recognition results associated with the previous frame images helps to significantly improve recognition accuracy and to improve the stability and continuity of the object recognition results.
Fig. 5 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the image processing apparatus 500 of the embodiment of the present disclosure includes, for example, a first processing module 510 and a second processing module 520.
The first processing module 510 is configured to determine object feature information associated with a current frame image according to a current frame image to be identified and an object identification result for a previous frame image. The second processing module 520 is configured to determine an object identification result associated with the current frame image based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for the target object.
With the embodiments of the present disclosure, object feature information associated with a current frame image is determined according to the current frame image to be recognized and an object recognition result for a previous frame image, and the object recognition result associated with the current frame image is determined based on the object feature information. The previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for the target object.
The object identification result associated with the current frame image is determined by combining the current frame image and the object identification result aiming at the previous frame image, so that the target object identification is carried out by fusing the related information of other frame images when the single frame image is detected, the accuracy of the object identification result is obviously improved on the basis of keeping the high efficiency of the single frame image detection and small error accumulation, and the consistency and stability of the object identification result are ensured.
According to the embodiment of the disclosure, the current frame image is a K-th frame image in the image sequence, and K is an integer greater than or equal to 2. The first processing module includes: the first processing submodule is used for determining initial characteristic information of the current frame image; the second processing submodule is used for determining N target frame images for enhancing the initial characteristic information of the current frame image in K-1 previous frame images, and N is a positive integer less than or equal to K-1; and the third processing submodule is used for performing enhancement processing aiming at the initial characteristic information according to the object identification result associated with each target frame image to obtain object characteristic information.
According to an embodiment of the present disclosure, the second processing submodule includes: and the first processing unit is used for taking N previous frame images which are closest to the current frame image in the K-1 previous frame images as target frame images. The distance from the current frame image includes a temporal distance and/or a content distance.
According to an embodiment of the present disclosure, a first processing unit includes: the first processing subunit is used for calculating the similarity between the object characteristic information of the previous frame image and the initial characteristic information of the current frame image as the content distance between the previous frame image and the current frame image aiming at any previous frame image; and the second processing subunit is used for taking the N previous frame images with the minimum content distance as the target frame images according to the content distance between each previous frame image and the current frame image.
According to an embodiment of the present disclosure, the object recognition result associated with the target frame image indicates a confidence that each pixel in the target frame image is a pixel constituting the target object. A third processing sub-module comprising: the second processing unit is used for distributing weights to corresponding pixels in the current frame image according to the confidence degree indicated by the object identification result and associated with each pixel in the target frame image aiming at any target frame image, and obtaining a pixel-based weight distribution result associated with the current frame image; and the third processing unit is used for performing enhancement processing on the initial characteristic information according to a weight distribution result based on the pixels related to the current frame image to obtain object characteristic information.
According to an embodiment of the present disclosure, a second processing unit includes: and the third processing subunit is used for carrying out weight distribution on pixels at the same position in the current frame image according to the confidence degree indicated by the object identification result and associated with each pixel in the target frame image, so as to obtain a pixel-based weight distribution result associated with the current frame image.
According to an embodiment of the present disclosure, a second processing unit includes: the fourth processing subunit is used for determining a first target pixel with the confidence level higher than a preset threshold value in each target frame image according to the object identification result associated with the target frame image aiming at each target frame image; the fifth processing subunit is used for determining a motion vector of a first target pixel in the N target frame images according to the determined first target pixel in each target frame image; the sixth processing subunit is configured to determine, according to the motion vector of the first target pixel in the N target frame images, a second target pixel to be subjected to weight allocation in the current frame image; and the seventh processing subunit is used for allocating weight to the second target pixel in the current frame image according to the confidence degree of the first target pixel in each target frame image, so as to obtain a pixel-based weight allocation result associated with the current frame image.
According to an embodiment of the present disclosure, a second processing module includes: the fourth processing submodule is used for identifying a target object in the current frame image based on the object characteristic information; and the fifth processing submodule is used for distributing color values to each pixel in the image area according to the image area of the target object in the current frame image to obtain a semantic index image associated with the current frame image.
According to an embodiment of the present disclosure, the apparatus further comprises: the third processing module is used for carrying out significance detection on the current frame image before identifying the target object in the current frame image and determining a significance region with the area ratio larger than a preset threshold value in the current frame image; and the second processing module is used for identifying the target object in the salient region based on the object characteristic information.
It should be noted that in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the information involved all comply with the relevant laws and regulations and do not violate public order or good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 schematically shows a block diagram of an electronic device for performing the image processing method of an embodiment of the present disclosure.
As shown in FIG. 6, the electronic device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 601 executes the respective methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. An image processing method comprising:
determining object characteristic information associated with a current frame image according to the current frame image to be identified and an object identification result aiming at a previous frame image; and
determining an object recognition result associated with the current frame image based on the object feature information,
wherein the previous frame image includes at least one frame image located before the current frame image in the image sequence, and the object recognition result includes recognition result information for a target object.
2. The method of claim 1, wherein,
the current frame image is the Kth frame image in the image sequence, K is an integer more than or equal to 2,
the determining of the object feature information associated with the current frame image according to the current frame image to be identified and the object identification result for the previous frame image includes:
determining initial characteristic information of the current frame image;
determining N target frame images for enhancing the initial characteristic information of the current frame image from K-1 previous frame images, wherein N is a positive integer less than or equal to K-1; and
and according to the object identification result associated with each target frame image, performing enhancement processing aiming at the initial characteristic information to obtain the object characteristic information.
3. The method according to claim 2, wherein the determining N target frame images for performing enhancement processing on the initial feature information of the current frame image among K-1 previous frame images comprises:
taking N previous frame images closest to the current frame image among the K-1 previous frame images as the target frame image,
wherein the distance comprises a temporal distance and/or a content distance.
4. The method according to claim 3, wherein the distance includes a content distance, and the regarding, as the target frame image, N previous frame images that are closest in distance to the current frame image among the K-1 previous frame images includes:
calculating the similarity between the object characteristic information of the previous frame image and the initial characteristic information of the current frame image as the content distance between the previous frame image and the current frame image aiming at any one previous frame image; and
and taking the N previous frame images with the minimum content distance as the target frame images according to the content distance between each previous frame image and the current frame image.
5. The method of claim 2, wherein the object recognition result associated with the target frame image indicates a confidence that each pixel in the target frame image is a pixel constituting the target object;
the enhancing processing for the initial feature information according to the object identification result associated with each target frame image to obtain the object feature information includes:
for any target frame image, according to the confidence degree indicated by the object identification result and associated with each pixel in the target frame image, allocating a weight to the corresponding pixel in the current frame image to obtain a pixel-based weight allocation result associated with the current frame image; and
and according to a pixel-based weight distribution result associated with the current frame image, performing enhancement processing on the initial characteristic information to obtain the object characteristic information.
6. The method of claim 5, wherein the assigning, for any one of the target frame images, a weight to a corresponding pixel in the current frame image according to the confidence level associated with each pixel in the target frame image indicated by the object identification result, resulting in a pixel-based weight assignment result associated with the current frame image, comprises:
and for any target frame image, performing weight distribution for pixels at the same position in the current frame image according to the confidence degree indicated by the object identification result and associated with each pixel in the target frame image, and obtaining a pixel-based weight distribution result associated with the current frame image.
7. The method of claim 5, wherein the assigning, for any one of the target frame images, a weight to a corresponding pixel in the current frame image according to the confidence level associated with each pixel in the target frame image indicated by the object identification result, resulting in a pixel-based weight assignment result associated with the current frame image, comprises:
aiming at each target frame image, determining a first target pixel with the confidence level higher than a preset threshold value in each target frame image according to an object identification result associated with the target frame image;
determining a motion vector of a first target pixel in the N target frame images according to the determined first target pixel in each target frame image;
determining a second target pixel to be subjected to weight distribution in the current frame image according to the motion vector of the first target pixel in the N target frame images; and
and according to the confidence degree of the first target pixel in each target frame image, distributing weights to the second target pixel in the current frame image to obtain a pixel-based weight distribution result associated with the current frame image.
8. The method of claim 1, wherein the determining an object identification result associated with the current frame image based on the object feature information comprises:
identifying a target object in the current frame image based on the object feature information; and
and according to the image area of the target object in the current frame image, distributing color values to each pixel in the image area to obtain a semantic index image associated with the current frame image.
9. The method of claim 8, wherein prior to identifying a target object in the current frame image, further comprising:
carrying out significance detection on the current frame image, and determining a significance region with the area proportion larger than a preset threshold value in the current frame image;
the identifying a target object in the current frame image based on the object feature information includes:
and identifying a target object in the salient region based on the object characteristic information.
10. An image processing apparatus, comprising:
a first processing module configured to determine object feature information associated with a current frame image according to the current frame image to be recognized and an object recognition result for a previous frame image; and
a second processing module configured to determine an object recognition result associated with the current frame image based on the object feature information,
wherein the previous frame image includes at least one frame image preceding the current frame image in an image sequence, and the object recognition result includes recognition result information for a target object.
11. The apparatus of claim 10, wherein
the current frame image is a Kth frame image in the image sequence, K being an integer greater than or equal to 2, and
the first processing module comprises:
a first processing submodule configured to determine initial feature information of the current frame image;
a second processing submodule configured to determine, among the K-1 previous frame images, N target frame images for enhancing the initial feature information of the current frame image, N being a positive integer less than or equal to K-1; and
a third processing submodule configured to perform enhancement processing on the initial feature information according to an object recognition result associated with each target frame image, to obtain the object feature information.
12. The apparatus of claim 11, wherein the second processing submodule comprises:
a first processing unit configured to take, as the target frame images, the N previous frame images closest to the current frame image among the K-1 previous frame images,
wherein the distance includes a temporal distance and/or a content distance.
13. The apparatus of claim 12, wherein the first processing unit comprises:
a first processing subunit configured to calculate, for any one of the previous frame images, a similarity between object feature information of the previous frame image and the initial feature information of the current frame image as a content distance between the previous frame image and the current frame image; and
a second processing subunit configured to take, as the target frame images, the N previous frame images with the smallest content distances according to the content distance between each previous frame image and the current frame image.
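Claims 12 and 13 leave the similarity measure unspecified; the sketch below uses cosine similarity as one plausible choice, converting it to a content distance and selecting the N previous frames with the smallest distances. All names are hypothetical.

```python
import numpy as np

def select_target_frames(prev_features, current_features, n):
    """Hypothetical sketch of claims 12-13: pick the N previous frames whose
    object features are most similar to the current frame's initial features.

    prev_features:    list of K-1 1-D feature vectors, one per previous frame.
    current_features: 1-D initial feature vector of the current frame.
    Returns the indices of the N selected previous frames.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    # Content distance: higher similarity means smaller distance.
    distances = [1.0 - cosine(f, current_features) for f in prev_features]
    # Indices of the N previous frames with the smallest content distance.
    return sorted(range(len(distances)), key=lambda i: distances[i])[:n]

# Usage with random feature vectors standing in for real network features.
prev = [np.random.rand(128) for _ in range(5)]
cur = np.random.rand(128)
targets = select_target_frames(prev, cur, n=2)
```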
14. The apparatus of claim 11, wherein
the object recognition result associated with a target frame image indicates a confidence that each pixel in the target frame image is a pixel constituting the target object, and
the third processing submodule comprises:
a second processing unit configured to assign, for any one of the target frame images, weights to corresponding pixels in the current frame image according to the confidence, indicated by the object recognition result, associated with each pixel in the target frame image, to obtain a pixel-based weight assignment result associated with the current frame image; and
a third processing unit configured to perform enhancement processing on the initial feature information according to the pixel-based weight assignment result associated with the current frame image, to obtain the object feature information.
15. The apparatus of claim 14, wherein the second processing unit comprises:
a third processing subunit configured to assign, for any one of the target frame images, weights to pixels at the same positions in the current frame image according to the confidence, indicated by the object recognition result, associated with each pixel in the target frame image, to obtain the pixel-based weight assignment result associated with the current frame image.
16. The apparatus of claim 14, wherein the second processing unit comprises:
a fourth processing subunit configured to determine, for each target frame image, first target pixels whose confidence is higher than a preset threshold in the target frame image according to the object recognition result associated with the target frame image;
a fifth processing subunit configured to determine a motion vector of the first target pixels across the N target frame images according to the first target pixels determined in each target frame image;
a sixth processing subunit configured to determine, according to the motion vector of the first target pixels across the N target frame images, second target pixels to be assigned weights in the current frame image; and
a seventh processing subunit configured to assign weights to the second target pixels in the current frame image according to the confidence of the first target pixels in each target frame image, to obtain the pixel-based weight assignment result associated with the current frame image.
17. The apparatus of claim 10, wherein the second processing module comprises:
a fourth processing submodule configured to identify a target object in the current frame image based on the object feature information; and
a fifth processing submodule configured to assign, according to the image region of the target object in the current frame image, a color value to each pixel in the image region, to obtain a semantic index image associated with the current frame image.
18. The apparatus of claim 17, further comprising:
a third processing module configured to perform saliency detection on the current frame image before the target object in the current frame image is identified, and to determine a salient region in the current frame image whose area proportion is larger than a preset threshold,
wherein the second processing module is configured to identify the target object in the salient region based on the object feature information.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
CN202111156311.3A 2021-09-29 2021-09-29 Image processing method, image processing apparatus, electronic device, and medium Pending CN113902696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111156311.3A CN113902696A (en) 2021-09-29 2021-09-29 Image processing method, image processing apparatus, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111156311.3A CN113902696A (en) 2021-09-29 2021-09-29 Image processing method, image processing apparatus, electronic device, and medium

Publications (1)

Publication Number Publication Date
CN113902696A (en) 2022-01-07

Family

ID=80442323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111156311.3A Pending CN113902696A (en) 2021-09-29 2021-09-29 Image processing method, image processing apparatus, electronic device, and medium

Country Status (1)

Country Link
CN (1) CN113902696A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596337A (en) * 2022-03-03 2022-06-07 捻果科技(深圳)有限公司 Self-recognition target tracking method and system based on linkage of multiple camera positions
CN114663980A (en) * 2022-04-01 2022-06-24 北京百度网讯科技有限公司 Behavior recognition method, and deep learning model training method and device
CN114663980B (en) * 2022-04-01 2023-04-18 北京百度网讯科技有限公司 Behavior recognition method, and deep learning model training method and device
CN114677572A (en) * 2022-04-08 2022-06-28 北京百度网讯科技有限公司 Object description parameter generation method and deep learning model training method
CN114677572B (en) * 2022-04-08 2023-04-18 北京百度网讯科技有限公司 Object description parameter generation method and deep learning model training method

Similar Documents

Publication Publication Date Title
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
CN113902696A (en) Image processing method, image processing apparatus, electronic device, and medium
CN112560996B (en) User portrait identification model training method, device, readable storage medium and product
CN113379627B (en) Training method of image enhancement model and method for enhancing image
CN112488060B (en) Target detection method, device, equipment and medium
CN112907552A (en) Robustness detection method, device and program product for image processing model
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN115358392A (en) Deep learning network training method, text detection method and text detection device
CN108734718B (en) Processing method, device, storage medium and equipment for image segmentation
CN113920313A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114882313B (en) Method, device, electronic equipment and storage medium for generating image annotation information
CN116363429A (en) Training method of image recognition model, image recognition method, device and equipment
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment
CN113379592A (en) Method and device for processing sensitive area in picture and electronic equipment
CN114693950B (en) Training method and device of image feature extraction network and electronic equipment
CN113099231B (en) Method and device for determining sub-pixel interpolation position, electronic equipment and storage medium
CN113570607B (en) Target segmentation method and device and electronic equipment
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN114495236B (en) Image segmentation method, apparatus, device, medium, and program product
CN115546701A (en) Matching feature determination method and device and electronic equipment
CN116486144A (en) Method for generating noise label detector, method and device for detecting noise label
CN114648672A (en) Method and device for constructing sample image set, electronic equipment and readable storage medium
CN117459719A (en) Reference frame selection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination