CN113066122A - Image processing method and device

Info

Publication number
CN113066122A
Authority
CN
China
Prior art keywords
image
target
article
sample
virtual
Prior art date
Legal status
Granted
Application number
CN202110301945.7A
Other languages
Chinese (zh)
Other versions
CN113066122B (en)
Inventor
郭林杰
马岳文
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110301945.7A
Publication of CN113066122A
Application granted
Publication of CN113066122B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/70: Image analysis; determining position or orientation of objects or cameras
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction


Abstract

Embodiments of the present specification provide an image processing method and an image processing apparatus. The image processing method includes: acquiring an article image of a target article; determining position coordinates of the target article in a camera coordinate system corresponding to the article image; adding at least one virtual article in the camera coordinate system according to the position coordinates; generating an intermediate image containing image features of the virtual article at a target view angle, the target view angle being determined according to the acquisition view angle of the article image; and performing image fusion on the intermediate image and the article image to obtain a sample image.

Description

Image processing method and device
This application is a divisional application of the application entitled "Image processing method and device", filed on May 15, 2020 under application No. 202010411827.7.
Technical Field
Embodiments of the present specification relate to the field of image processing technologies, and in particular to an image processing method and apparatus.
Background
With the development of deep learning, the accuracy of image recognition has improved significantly and the technology is widely applied in many fields. However, because deep learning depends heavily on training data, the accuracy of image recognition depends to a great extent on the acquisition and annotation of sample images. At present, both acquiring sample images and annotating them require manual participation, so obtaining a large amount of training data consumes considerable time and labor cost. A more effective scheme is therefore needed.
Disclosure of Invention
In view of this, embodiments of the present specification provide an image processing method. One or more embodiments of the present specification also relate to an image processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
In a first aspect of embodiments of the present specification, there is provided an image processing method including:
acquiring an article image of a target article;
determining position coordinates of the target article in a camera coordinate system corresponding to the article image;
adding at least one virtual article in the camera coordinate system according to the position coordinates;
generating an intermediate image containing image features of the virtual article at a target view angle, the target view angle being determined according to the acquisition view angle of the article image;
and performing image fusion on the intermediate image and the article image to obtain a sample image.
Optionally, the article image is obtained by:
issuing a movement instruction for the target article to a grabbing device, the movement instruction carrying an initial coordinate of an initial position and a target coordinate of a target position;
receiving an execution feedback notification for the movement instruction uploaded by the grabbing device;
issuing an image acquisition instruction for the target position to an image acquisition device;
and receiving the article image uploaded by the image acquisition device.
Optionally, the grabbing device grabs the target article from the initial position according to the initial coordinate, moves the target article to the target position corresponding to the target coordinate, and uploads an execution feedback notification for the movement instruction after the movement is completed;
and the image acquisition device performs image acquisition on the target article at the target position according to the target coordinate carried in the image acquisition instruction, obtaining and uploading the article image.
Optionally, the determining position coordinates of the target article in a camera coordinate system corresponding to the article image includes:
calculating the position coordinates of the target article in the camera coordinate system corresponding to the article image according to the target coordinate and a preset transformation matrix.
Optionally, the adding at least one virtual article in the camera coordinate system according to the position coordinates includes:
acquiring shape information of the target article contained in the execution feedback notification;
adding, at the position coordinates, a marked virtual article whose size matches the shape information;
determining a target view angle of the camera coordinate system according to the acquired acquisition view angle of the image acquisition device;
and adding at least one virtual article in a target area determined according to the marked virtual article and the target view angle.
Optionally, after the step of adding at least one virtual article in the camera coordinate system according to the position coordinates and before the step of generating an intermediate image containing image features of the virtual article at a target view angle, the method further includes:
determining an occlusion relationship between the marked virtual article and the virtual article according to their distances from the origin of the camera coordinate system;
correspondingly, the performing image fusion on the intermediate image and the article image to obtain a sample image includes:
screening out sample image units from the intermediate image units of the intermediate image and the image units of the article image according to the occlusion relationship to form the sample image.
Optionally, after the step of adding at least one virtual article in the camera coordinate system according to the position coordinates and before the step of generating an intermediate image containing image features of the virtual article at a target view angle, the method further includes:
performing image rendering on the marked virtual article and the virtual article from the target view angle to obtain a rendered image;
determining, according to the rendered image, an occlusion relationship between the marked image features of the marked virtual article and the image features;
correspondingly, the performing image fusion on the intermediate image and the article image to obtain a sample image includes:
judging, for the target image area where a target image unit in the article image is located, whether the rendered image area corresponding to that target image area in the rendered image belongs to an occluded area, according to the occlusion relationship;
if so, determining the intermediate image unit of the intermediate image area corresponding to the target image area in the intermediate image as the sample image unit of the sample image area corresponding to the target image area in the sample image;
and if not, determining the image unit as the sample image unit.
Optionally, the generating an intermediate image containing image features of the virtual article at a target view angle includes:
removing the marked virtual article from the camera coordinate system;
and performing image rendering on the virtual article from the target view angle to obtain the intermediate image containing the image features.
Optionally, the image processing method further includes:
performing image processing on the article image to obtain target mask information of the target image features of the target article in the article image;
performing image processing on the intermediate image to obtain mask information of the image features in the intermediate image;
performing a Boolean operation according to the target mask information and the mask information to obtain mask annotation information of the sample image features of the target article in the sample image;
annotating the sample image according to the mask annotation information to obtain an annotation image of the sample image; the sample image and the annotation image form a training sample pair.
Optionally, the performing image processing on the article image to obtain target mask information of the target image features of the target article in the article image includes:
performing image differencing between a pre-collected background image of the target position and the article image by means of an image difference algorithm to obtain the target image features;
and performing mask processing on the target image features to obtain the target mask information.
Optionally, the image processing method further includes:
and taking the training sample pair as a training sample, training the initial article identification model, and obtaining the article identification model after training.
Optionally, the image processing method further includes:
inputting a target image into the article identification model;
and performing, by the article identification model, article identification on the target image to obtain an article identification result for the target image output by the article identification model.
In a second aspect of embodiments of the present specification, there is provided an image processing apparatus including:
an acquisition module configured to acquire an article image of a target article;
a determining module configured to determine position coordinates of the target article in a camera coordinate system corresponding to the article image;
an adding module configured to add at least one virtual article in the camera coordinate system according to the position coordinates;
a generation module configured to generate an intermediate image containing image features of the virtual article at a target view angle, the target view angle being determined according to the acquisition view angle of the article image;
and a fusion module configured to perform image fusion on the intermediate image and the article image to obtain a sample image.
In a third aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to:
acquire an article image of a target article;
determine position coordinates of the target article in a camera coordinate system corresponding to the article image;
add at least one virtual article in the camera coordinate system according to the position coordinates;
generate an intermediate image containing image features of the virtual article at a target view angle, the target view angle being determined according to the acquisition view angle of the article image;
and perform image fusion on the intermediate image and the article image to obtain a sample image.
In a fourth aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the image processing method.
The present specification provides an image processing method which, on the basis of acquiring an article image of a target article, determines the position coordinates of the target article in the camera coordinate system corresponding to the article image and constructs a virtual scene resembling the article image based on those coordinates: at least one virtual article is added in the camera coordinate system according to the position coordinates, an intermediate image containing the image features of the virtual article at the target view angle is generated, and the intermediate image is fused with the article image to obtain a sample image. Since the virtual articles can be added at many different positions in the virtual scene, the sample images generated from those positions are correspondingly varied, which increases the diversity and richness of the sample images and improves both the generation efficiency and the image quality of the sample images.
Drawings
Fig. 1 is a process flow diagram of an image processing method provided in one embodiment of the present specification;
FIG. 2 is a schematic diagram of a coordinate system of a robot provided in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a camera coordinate system provided in one embodiment of the present description;
fig. 4 is a process flow diagram of an image processing method applied to an unmanned convenience store according to an embodiment of the present specification;
fig. 5 is a schematic diagram of an image processing apparatus provided in an embodiment of the present specification;
fig. 6 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. This specification may, however, be embodied in many forms other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit and scope; the specification is therefore not limited to the specific embodiments disclosed below.
The terminology used in the one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in one or more embodiments of this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Six degrees of freedom: an object in space has six degrees of freedom, namely freedom of movement along the three orthogonal coordinate axes x, y and z and freedom of rotation about those three axes. To determine the position of the object completely, all six degrees of freedom must therefore be specified.
Iterative Closest Point (ICP) algorithm: a point-set-to-point-set registration method. Given two data sets, it solves for a transformation matrix Rt between the point sets of the two data sets, so that the point sets contained in the two data sets can be spatially transformed through the transformation matrix Rt.
In this specification, an image processing method is provided, and one or more embodiments of the specification relate to an image processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
An embodiment of an image processing method provided in this specification is as follows:
fig. 1 shows a processing flowchart of an image processing method provided according to an embodiment of the present specification, including steps S102 to S110.
Step S102, an article image of the target article is acquired.
In practical applications of deep-learning-based article identification models, the accuracy of image recognition depends to a great extent on the acquisition and annotation of sample images, and both acquiring sample images and annotating them currently require manual participation, so obtaining a large amount of training data consumes considerable time and labor cost.
The target article refers to the article to be identified; specifically, the target article may be a beverage, an electric appliance, a daily necessity, a food, and the like, which is not limited herein. The article image refers to an image containing target image features of the target article, for example a picture of the target article obtained through image acquisition; the target image features refer to the features of the image that the target article presents in the article image.
In specific implementation, to guarantee the automation and sustainability of the image acquisition process, the image acquisition device and the grabbing device cooperate with each other: after the grabbing device moves the target article to the target position, the image acquisition device performs image acquisition on the target article. Because the acquisition can be carried out in an actual scene, the authenticity of the article image is increased. The article image is obtained in the following manner:
issuing a movement instruction for the target article to a grabbing device, the movement instruction carrying an initial coordinate of an initial position and a target coordinate of a target position;
receiving an execution feedback notification for the movement instruction uploaded by the grabbing device;
issuing an image acquisition instruction for the target position to an image acquisition device;
and receiving the article image uploaded by the image acquisition device.
Specifically, the movement instruction is an instruction issued to the grabbing device to move the target article from an initial position to a target position, carrying the initial coordinate of the initial position and the target coordinate of the target position. The execution feedback notification is a feedback notification generated from the result of the grabbing device moving the target article according to the movement instruction; the movement result includes information such as movement success, movement failure or movement interruption, and the notification may further include specific details of the result, for example the target coordinate of the target position where the target article is located after a successful movement, or the reason for a failed movement, which is not limited herein.
A six-degree-of-freedom coordinate expresses the six degrees of freedom an object has in space, namely freedom of movement along the three orthogonal coordinate axes x, y and z and freedom of rotation about those three axes; to determine the position of the object completely, all six values must be specified.
In practical applications, there may be one or multiple initial coordinates; when there are multiple, each initial position corresponds to one target coordinate, and the target coordinates corresponding to the initial positions may be the same or different, which is not limited herein.
Optionally, the grabbing device grabs the target article from the initial position according to the initial coordinate, moves the target article to the target position corresponding to the target coordinate, and uploads an execution feedback notification for the movement instruction after the movement is completed;
the grabbing equipment can be equipment such as a mechanical arm and a robot which can grab an article to move according to a moving instruction, and under the condition that the grabbing equipment is the mechanical arm, the initial coordinate and the target coordinate are six-degree-of-freedom coordinates in a mechanical arm coordinate system which is constructed by taking the mechanical arm as a center.
After moving the target article to the target coordinate, the grabbing device uploads an execution feedback notification so that the movement status of the target article is known; when the movement succeeds, that is, when the target article has been moved to the target position, image acquisition is performed on the target article at the target position.
The image acquisition instruction is an instruction issued to the image acquisition device to perform image acquisition at the target position, carrying the target coordinate of the target position; the image acquisition device refers to a device with a shooting function, such as a camera, a video camera, a scanner, a mobile phone or a tablet computer, which is not limited herein.
Optionally, the image acquisition device performs image acquisition on the target article at the target position according to the target coordinate carried in the image acquisition instruction, obtaining and uploading the article image.
In practical applications, the image acquisition device is calibrated in advance for the target coordinate, and the acquisition parameters of the image acquisition device corresponding to the target coordinate can be determined from the calibration result. After the image acquisition device receives the image acquisition instruction, it is adjusted according to the pre-calibrated acquisition parameters corresponding to the target coordinate, so that it can accurately capture the target article at the target position.
Taking a robot arm as the grabbing device as an example: a movement instruction for the target article is issued to the robot arm; according to the initial coordinate (x1, y1, z1, α1, β1, γ1) carried in the movement instruction, the robot arm grabs the target article at the initial position and moves it to the target position corresponding to the target coordinate (x2, y2, z2, α2, β2, γ2); after the movement is completed, the robot arm uploads a movement success notification for the movement instruction. After the movement success notification is received, an image acquisition instruction for the target position is issued to a camera device; the camera device obtains its acquisition parameters corresponding to the target coordinate (x2, y2, z2, α2, β2, γ2) carried in the image acquisition instruction, is adjusted based on those acquisition parameters, and captures a picture p1 of the target article.
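The device coordination described above can be summarized in code. The following is a minimal Python sketch of the control flow only; the Pose6DoF type, the Grabber and Camera interfaces, and all field names are hypothetical stand-ins introduced for illustration, not part of this specification or of any real device API.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Pose6DoF:
    """Six-degree-of-freedom coordinate: translation plus rotation."""
    x: float
    y: float
    z: float
    alpha: float
    beta: float
    gamma: float

class Grabber(Protocol):
    def move(self, initial: Pose6DoF, target: Pose6DoF) -> dict: ...

class Camera(Protocol):
    def capture(self, target: Pose6DoF): ...

def collect_item_image(grabber: Grabber, camera: Camera,
                       initial: Pose6DoF, target: Pose6DoF):
    # Issue the movement instruction carrying both coordinates and wait
    # for the grabbing device's execution feedback notification.
    feedback = grabber.move(initial, target)
    if feedback.get("result") != "success":
        raise RuntimeError(f"move failed: {feedback.get('reason')}")
    # Issue the image acquisition instruction for the target position;
    # the device applies its pre-calibrated acquisition parameters.
    item_image = camera.capture(target)
    return item_image, feedback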
Step S104, determining position coordinates of the target article in a camera coordinate system corresponding to the article image.
On the basis of obtaining the article image, the position coordinates of the target article in the camera coordinate system need to be determined, so that a virtual scene resembling the article image can be constructed based on the position coordinates.
In specific implementation, on the basis of obtaining the article image, the position coordinates of the target article in the camera coordinate system are determined according to a transformation matrix obtained in advance by coordinate system conversion between the grabbing coordinate system and the camera coordinate system, so that a virtual scene resembling the article image is constructed in the camera coordinate system based on the position coordinates, which increases the accuracy of the virtual scene.
Specifically, the position coordinates of the target article in the camera coordinate system corresponding to the article image are calculated according to the target coordinate and a preset transformation matrix.
The camera coordinate system corresponding to the article image refers to the real camera coordinate system constructed with the center of the lens of the image acquisition device as the origin; an identical camera coordinate system is constructed in the virtual scene.
The preset transformation matrix is a rigid transformation matrix that spatially transforms coordinates from the grabbing device coordinate system to the camera coordinate system. In specific implementation, the transformation matrix for spatial transformation between the camera coordinate system and the grabbing device coordinate system is calculated with an Iterative Closest Point (ICP) algorithm, from the three-dimensional coordinates of a number of positions in the camera coordinate system and the corresponding three-dimensional coordinates of the same positions in the grabbing device coordinate system.
The ICP algorithm is a point-set-to-point-set registration method: given two data sets, it solves for the transformation matrix Rt between their point sets, so that the point sets contained in the two data sets can be spatially transformed through Rt.
It should be noted that, since the coordinates in the camera coordinate system are three-dimensional, when the target coordinate of the target article is a six-degree-of-freedom coordinate it needs to be converted into a three-dimensional coordinate; for example, the target coordinate (x2, y2, z2, α2, β2, γ2) becomes the target three-dimensional coordinate (x2, y2, z2) after conversion.
Following the above example, after the picture p1 of the target article is obtained, the target coordinate (x2, y2, z2, α2, β2, γ2) of the target position of the target article is converted into the target three-dimensional coordinate (x2, y2, z2). By calculating the product of the target three-dimensional coordinate and the preset rigid transformation matrix Rt, the target three-dimensional coordinate (x2, y2, z2) is converted into the position coordinates 202 (Xc1, Yc1, Zc1) of the camera coordinate system, and these position coordinates 202 (Xc1, Yc1, Zc1) serve as the position coordinates 202 of the target article in the camera coordinate system. As shown in FIG. 2, the camera coordinate system is composed of the XC axis, the YC axis and the ZC axis, and the intersection point of the axes is the origin OC.
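For illustration, the Python sketch below shows the closed-form least-squares rigid transform that this kind of registration reduces to once point correspondences are fixed (the step that ICP iterates), and its application to the target coordinate. The calibration pairs and all numeric values are invented for the example.

import numpy as np

def solve_rigid_transform(p_grab: np.ndarray, p_cam: np.ndarray):
    """Least-squares rigid transform (R, t) from grabbing-device
    coordinates to camera coordinates, given N paired 3-D points
    (both arrays N x 3). With correspondences known from calibration,
    the single SVD step that ICP would otherwise iterate suffices."""
    mu_g, mu_c = p_grab.mean(axis=0), p_cam.mean(axis=0)
    H = (p_grab - mu_g).T @ (p_cam - mu_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_c - R @ mu_g
    return R, t

# Invented calibration pairs: four positions measured in both systems.
p_grab = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
p_cam = p_grab + np.array([0.2, -0.1, 0.5])     # a pure translation here
R, t = solve_rigid_transform(p_grab, p_cam)

# Drop the rotation components of the six-degree-of-freedom target
# coordinate, then map the 3-D point into the camera coordinate system.
target_6dof = np.array([0.40, 0.10, 0.05, 0.0, 0.0, 1.57])
position_cam = R @ target_6dof[:3] + t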
Step S106, adding at least one virtual article in the camera coordinate system according to the position coordinates.
On the basis of determining the position coordinates of the target article in the camera coordinate system, the virtual scene of the camera coordinate system is completed by adding at least one virtual article in the camera coordinate system according to the position coordinates; specifically, the at least one virtual article is added so that it does not overlap a certain range around the position coordinates.
Specifically, the virtual article may or may not be an article model corresponding to an article of the same type as the target article. In practical applications the virtual article is a three-dimensional article model, though it may also be a two-dimensional article model, which is not limited herein.
In practical applications, there are various ways of adding virtual articles in the camera coordinate system, for example adding corresponding virtual articles at preset coordinates according to a preset adding rule, or adding virtual articles at preset coordinates at random, which is not limited herein.
On the basis of the execution feedback notification for the movement instruction uploaded by the grabbing device, in an optional implementation manner provided by an embodiment of this specification, the adding at least one virtual article in the camera coordinate system according to the position coordinates is specifically implemented as follows:
acquiring shape information of the target article contained in the execution feedback notification;
adding, at the position coordinates, a marked virtual article whose size matches the shape information;
determining a target view angle of the camera coordinate system according to the acquired acquisition view angle of the image acquisition device;
and adding at least one virtual article in a target area determined according to the marked virtual article and the target view angle.
Specifically, the shape information refers to information that can indicate the shape of the target article, such as its diameter, height or circumference, which is not limited herein.
The marked virtual article refers to a three-dimensional virtual model used to mark the position of the target article in the camera coordinate system. The acquisition view angle refers to the view angle of the image acquisition device when acquiring the article image; it is determined as the target view angle of the camera coordinate system. At least one virtual article is randomly added in the target area determined by the marked virtual article and the target view angle, and the added virtual articles must not overlap.
Specifically, the target view angle refers to the image view angle from which an image is generated from the virtual articles in the camera coordinate system; it may also be the camera view angle of a virtual camera introduced into the camera coordinate system.
In practical applications, adding at least one virtual article in the target area determined according to the marked virtual article and the target view angle prevents the added virtual articles from falling outside the range of the target view angle, and prevents the added virtual articles from overlapping the marked virtual article.
Since the target view angle is consistent with the acquisition view angle of the image acquisition device that captures the article image, the acquisition view angle of the image acquisition device can be obtained directly. Alternatively, the virtual focal length and virtual film (sensor) of the virtual camera in the camera coordinate system can be determined according to the acquired acquisition focal length and film of the image acquisition device, and the target view angle of the virtual camera in the camera coordinate system can then be calculated from the virtual focal length and virtual film.
Following the above example, the circumscribed circle diameter L of the target article contained in the movement success notification uploaded by the robot arm is acquired, and, as shown in FIG. 2, a marked virtual article 204 with diameter L is added at the position coordinates 202 (Xc, Yc, Zc) in the camera coordinate system. As shown in FIG. 3, the acquisition view angle θ of the camera device when capturing picture p1 is acquired and determined as the target view angle θ1 of the camera coordinate system, and 2 virtual articles, virtual article 306 and virtual article 308, are randomly added in the target area determined by the marked virtual article 204 and the target view angle θ1.
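Where the target view angle must be computed from the virtual focal length and film rather than read off the device, the pinhole relation gives it directly. A small sketch with purely illustrative numbers:

import math

def view_angle(focal_length_mm: float, film_size_mm: float) -> float:
    """Pinhole angle of view along one axis: theta = 2 * atan(d / (2 f)),
    where d is the film (sensor) size and f is the focal length."""
    return 2.0 * math.atan(film_size_mm / (2.0 * focal_length_mm))

theta = view_angle(25.0, 35.0)   # illustrative focal length and film size
print(math.degrees(theta))       # about 70 degrees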
Further, after the marked virtual article and the virtual articles have been added in the camera coordinate system, an occlusion relationship may exist between them. In that case the occlusion relationship between the marked virtual article and the virtual articles needs to be determined, so that the intermediate image and the article image can subsequently be fused according to it. In specific implementation there are various ways of determining this occlusion relationship; in a first optional implementation manner provided by this specification, the occlusion relationship is determined as follows:
determining the occlusion relationship between the marked virtual article and the virtual article according to their distances from the origin in the camera coordinate system;
In practical applications, since the occlusion relationship is determined in the camera coordinate system, the relevant distances are those from the origin (i.e. the lens of the virtual camera) measured along the same direction: the distance between the marked virtual article and the origin, and the distance between each virtual article and the origin. The occlusion relationship is determined from these distances, the nearer article occluding the farther one.
When the marked virtual article and the virtual articles are three-dimensional models, the comparison is made along the same direction from the origin between the distances of the coordinates covered by the marked virtual article and the distances of the coordinates covered by each virtual article, and the occlusion relationship is determined accordingly.
Following the above example, as shown in FIG. 3, in the camera coordinate system, along the direction of the line connecting the origin OC and the position coordinates 202 (Xc, Yc, Zc), the distance between the marked virtual article 204 and the origin OC is d1, the distance between the virtual article 306 and the origin OC is d2, and the virtual article 308 has no corresponding coordinates in this direction. Since d2 < d1, it is determined that the virtual article 306 occludes the marked virtual article 204 and that no occlusion relationship exists between the virtual article 308 and the marked virtual article 204.
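The distance comparison can be sketched as follows. This is a deliberate simplification that treats each article as a single point and tests whether it lies near the ray from the origin through the marked virtual article; the angular tolerance and all coordinates are illustrative assumptions.

import numpy as np

def occludes(marker_pos, item_pos, origin=np.zeros(3), angle_tol=0.05):
    """Does a virtual article occlude the marked virtual article?
    Both are checked along the ray from the camera origin through the
    marker; the nearer one occludes the farther one (d2 < d1)."""
    ray = (marker_pos - origin) / np.linalg.norm(marker_pos - origin)
    to_item = item_pos - origin
    d_item = np.linalg.norm(to_item)
    # Is the article close enough to the marker's viewing ray?
    if float(to_item @ ray) / d_item < np.cos(angle_tol):
        return False                     # off the ray: no occlusion
    return d_item < np.linalg.norm(marker_pos - origin)

marker = np.array([0.0, 0.0, 2.0])       # like marked virtual article 204
item_front = np.array([0.0, 0.0, 1.2])   # like virtual article 306: d2 < d1
item_aside = np.array([1.5, 0.0, 2.0])   # like virtual article 308: off the ray
print(occludes(marker, item_front))      # True
print(occludes(marker, item_aside))      # False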
In addition to the foregoing implementation manner, in a second optional implementation manner provided in this specification, the occlusion relationship is determined as follows:
performing image rendering on the marked virtual article and the virtual article from the target view angle to obtain a rendered image;
determining, according to the rendered image, the occlusion relationship between the marked image features of the marked virtual article and the image features;
Specifically, rendering the marked virtual article and the virtual articles from the target view angle means rendering through a rendering engine. During rendering, the articles are drawn according to their distance relationships relative to the origin, so occlusion in the resulting rendered image follows the occlusion relationship between the marked virtual article and the virtual articles. Because an occluded image area is placed at the bottom layer of the image, the occlusion relationship between the marked image features of the marked virtual article and the image features can be determined from the rendered image.
The marked image features refer to the features of the image that the marked virtual article presents in the rendered image; similarly, the image features refer to the features of the image that a virtual article presents in the image.
Following the above example, as shown in FIG. 3, in the camera coordinate system, image rendering is performed on the marked virtual article 204, the virtual article 306 and the virtual article 308 from the target view angle θ1 (the view angle range corresponding to θ1 is shown by the hatched portion in FIG. 3) to obtain a rendered image. According to the rendered image, it is determined that the image features of the virtual article 306 occlude the marked image features of the marked virtual article 204, and that the marked image features of the marked virtual article 204 have no occlusion relationship with the image features of the virtual article 308.
In summary, determining the occlusion relationship between the marked image features of the marked virtual article and the image features from a rendered image obtained by rendering the marked virtual article and the virtual articles from the target view angle provides the basis for the subsequent fusion of the intermediate image and the article image, and improves the accuracy of the image fusion.
Step S108, generating an intermediate image containing the image features of the virtual article at the target view angle.
On the basis of adding at least one virtual article in the camera coordinate system, an intermediate image containing the image features of the virtual article at the target view angle is generated; that is, the virtual scene containing the virtual articles is imaged from the target view angle of the camera coordinate system to obtain the intermediate image.
In specific implementation, after the virtual articles have been added relative to the position coordinates, the added marked virtual article is first removed and the virtual articles are then rendered from the target view angle to obtain the intermediate image. This avoids having to segment the marked virtual article out of the image before fusing the intermediate image with the article image, and so increases the efficiency of the image fusion:
removing the marked virtual article from the camera coordinate system;
and performing image rendering on the virtual article from the target view angle to obtain the intermediate image containing the image features.
In practical applications, rendering the virtual articles from the target view angle means rendering the three-dimensional scene containing the virtual articles into a two-dimensional image, namely the intermediate image, from the target view angle.
Following the above example, the marked virtual article 204 is removed from the camera coordinate system, and image rendering is performed on the virtual article 306 and the virtual article 308 from the target view angle θ1 to obtain an intermediate image containing the image features of the virtual article 306 and the virtual article 308.
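As one possible concrete realization, the sketch below renders only the remaining virtual articles with an off-screen renderer. The use of pyrender and trimesh is an assumption made purely for illustration, chosen because pyrender exposes a perspective camera whose field of view can be set to the target view angle θ1; any rendering engine with the same capability would do, and all geometry and poses below are invented.

import numpy as np
import trimesh
import pyrender

def render_intermediate(item_meshes, item_poses, yfov, width=640, height=480):
    """Render only the virtual articles (the marked virtual article has
    already been removed) from the target view angle; returns the colour
    intermediate image and a depth map."""
    scene = pyrender.Scene(ambient_light=np.ones(3))
    for mesh, pose in zip(item_meshes, item_poses):
        scene.add(pyrender.Mesh.from_trimesh(mesh), pose=pose)
    # The virtual camera sits at the origin of the camera coordinate
    # system, looking down its -Z axis, with the target view angle as fov.
    scene.add(pyrender.PerspectiveCamera(yfov=yfov), pose=np.eye(4))
    renderer = pyrender.OffscreenRenderer(width, height)
    color, depth = renderer.render(scene)
    renderer.delete()
    return color, depth

# Two stand-ins for virtual articles 306 and 308, placed before the camera.
def pose(x, z):
    T = np.eye(4)
    T[0, 3], T[2, 3] = x, z
    return T

meshes = [trimesh.creation.cylinder(radius=0.03, height=0.15),
          trimesh.creation.box(extents=(0.06, 0.06, 0.12))]
color, depth = render_intermediate(
    meshes, [pose(-0.1, -0.6), pose(0.1, -0.6)], yfov=np.pi / 3)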
Step S110, performing image fusion on the intermediate image and the article image to obtain a sample image.
On the basis of generating the intermediate image containing the image features of the virtual article, the intermediate image and the article image are fused to obtain the sample image.
In specific implementation, when the intermediate image and the article image differ in image size and/or resolution, the intermediate image first needs to be scaled to the image size of the article image and/or have its resolution adjusted to that of the article image, and the image fusion is then carried out on that basis.
In practical applications, the intermediate image and the article image can be fused in various ways: the image features of the virtual articles in the intermediate image may be segmented out and the segmented image features fused with the article image, or the target image features of the target article in the article image may be segmented out and the segmented target image features fused with the intermediate image, which is not limited herein.
Corresponding to the first optional implementation manner of determining the occlusion relationship in step S106, in an optional implementation manner provided by an embodiment of this specification, the fusion of the intermediate image and the article image into the sample image is implemented as follows:
screening out sample image units from the intermediate image units of the intermediate image and the image units of the article image according to the occlusion relationship to form the sample image.
In specific implementation, the sample image units of the sample image are determined according to the occlusion relationship between the marked virtual article and the virtual articles. If the marked virtual article is occluded by a virtual article, the occluded image area is determined and the intermediate image units of the intermediate image are selected as the sample image units within that area; if the marked virtual article is not occluded, the unoccluded image area is determined and the image units of the article image are selected as the sample image units within it. The selected sample image units then form the sample image.
Specifically, an image unit refers to a pixel in the article image; several pixels of the article image may also form an image block or image area. Similarly, an intermediate image unit refers to a pixel in the intermediate image, and a sample image unit refers to a pixel in the sample image, several of which may likewise form an image block or image area.
Following the above example, it has been determined that the virtual article 306 occludes the marked virtual article 204 and that no occlusion relationship exists between the virtual article 308 and the marked virtual article 204. In the intermediate image containing the image features of the virtual article 306 and the virtual article 308, and in the picture p1 containing the target image features of the target article, the area where the image features of the virtual article 306 overlap the target image features of the marked virtual article 204 is handled according to the occlusion: within that area, the pixels of the intermediate image are selected as the pixels of the sample image; in the other areas, where the image features do not overlap, the pixels of the article image are selected as the pixels of the sample image. The selected pixels then form the sample image.
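Expressed over whole pixel arrays, the screening rule above is a single masked copy. A minimal NumPy sketch, assuming the intermediate image has already been scaled to the article image's size and that the occluded area is available as a boolean mask:

import numpy as np

def fuse(item_image: np.ndarray, intermediate: np.ndarray,
         occluded: np.ndarray) -> np.ndarray:
    """Screen out sample image units pixel by pixel: inside the occluded
    area take the intermediate-image pixel, elsewhere keep the article-
    image pixel. `occluded` is a boolean H x W mask of the occluded area."""
    assert item_image.shape == intermediate.shape
    sample = item_image.copy()
    sample[occluded] = intermediate[occluded]
    return sample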
Corresponding to the second optional implementation manner of determining the occlusion relationship in step S106, in an optional implementation manner provided by an embodiment of this specification, the fusion of the intermediate image and the article image into the sample image is implemented as follows:
judging, for the target image area where a target image unit in the article image is located, whether the rendered image area corresponding to that target image area in the rendered image belongs to an occluded area, according to the occlusion relationship;
if so, selecting the intermediate image unit in the intermediate image area corresponding to the target image area in the intermediate image, and determining the selected intermediate image unit as the sample image unit of the sample image area corresponding to the target image area in the sample image;
if not, selecting the image unit in the target image area and determining it as the sample image unit.
In specific implementation, during the fusion of the intermediate image and the article image, the target image area of every pixel in the article image is traversed, and it is judged whether the rendered image features corresponding to each target image area in the rendered image are occluded by the image features of a virtual article. If so, the pixels of the intermediate image area corresponding to the target image area in the intermediate image are selected and used as the pixels of the sample image area corresponding to the target image area in the sample image; if not, the pixels of the target image area are selected and used as the pixels of the sample image area.
The rendered image area refers to the image area in the rendered image corresponding to the target image area; an occluded area is an area in which image features occlude the marked image features, and it can be determined from the occlusion relationship.
It should be noted that "the rendered image area corresponding to the target image area in the rendered image", "the intermediate image area corresponding to the target image area in the intermediate image" and "the sample image area corresponding to the target image area in the sample image" all mean that the target image area, the rendered image area, the intermediate image area and the sample image area occupy the same position in their respective images.
Following the above example, the determined occlusion relationship is: the image features of the virtual article 306 occlude the marked image features of the marked virtual article 204, and the marked image features of the marked virtual article 204 have no occlusion relationship with the image features of the virtual article 308. The target image area of every pixel in picture p1 is traversed. For each, it is first judged whether the corresponding rendered image area in the rendered image belongs to the area where the marked image features are occluded by the image features of the virtual article 306. If so, the pixels of the corresponding intermediate image area in the intermediate image are selected and used as the pixels of the corresponding sample image area in the sample image; if not, a pixel of the target image area should be selected, and the pixel of the target image area in picture p1 is used as the pixel of the sample image area.
After the sample image is obtained, it needs to be annotated; the sample image and the annotation image form a training sample pair used as a training sample for supervised model training, which increases the accuracy of article identification of the article identification model produced by training on such pairs.
performing image processing on the article image to obtain target mask information of the target image features of the target article in the article image;
performing image processing on the intermediate image to obtain mask information of the image features in the intermediate image;
performing a Boolean operation according to the target mask information and the mask information to obtain mask annotation information of the sample image features of the target article in the sample image;
annotating the sample image according to the mask annotation information to obtain an annotation image of the sample image; the sample image and the annotation image form a training sample pair.
In practical applications, a mask is a binary image consisting of 0s and 1s: to extract a region of interest in an image, the image values inside the region of interest are represented by the value 1 and the image values of the other regions by the value 0, and the binary image formed in this way serves as the mask information.
Correspondingly, the mask information is the binary image formed by representing the image values of the image area occupied by the image features in the intermediate image by 1 and the image values of the remaining area of the intermediate image by 0. The target mask information is the binary image formed by representing the image values of the image area occupied by the target image features in the article image by 1 and the image values of the remaining area of the article image by 0. Similarly, the mask annotation information is the binary image formed by representing the image values of the image area occupied by the sample image features of the target article in the sample image by 1 and the remaining area of the sample image by 0.
In specific implementation, to obtain the mask information of specific image features in an image, an image processing method such as an image segmentation algorithm or a saliency detection algorithm may be used to extract the image area occupied by those features as the region of interest; the image values of the extracted area are represented by 1 and those of the other areas by 0, generating the mask information of the specific image features.
It should be noted that the mask annotation information corresponding to the sample image features of the target article in the sample image can be obtained by a Boolean operation on the target mask information and the mask information: first the intersection of the target mask information and the mask information is taken, the intersection being the part of the target image features occluded by the image features; then the difference between the target mask information and the intersection is taken, and that difference is the mask annotation information.
In practical applications, because occlusion relationships can be complex, the Boolean operation must also consider the case where the image features are occluded by the target image features; the image values of such a partially occluded area are taken as 1 and form part of the mask annotation information.
Following the above example, image segmentation is performed on picture p1 to obtain the target mask information M0 of the target image features of the target article in p1, and on the intermediate image to obtain the mask information M1 of the image features of the virtual article 306 and the mask information M2 of the image features of the virtual article 308. The intersection of M0 with M1 and M2 is taken to obtain the intersection M3; the area of this intersection in the sample image is the occluded area where the virtual article 306 occludes the marked virtual article 204. The difference between M0 and M3 is then calculated to obtain M4, which is used as the mask annotation information of the target article in the sample image, and the sample image and the mask annotation information M4 are combined into a training sample pair Z1.
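The Boolean operations of this example are one-liners on boolean mask arrays. A sketch using the names M0 through M4 from the example above:

import numpy as np

def mask_annotation(M0: np.ndarray, M1: np.ndarray, M2: np.ndarray):
    """Boolean-mask sketch of the example above. M0 is the target mask
    from picture p1; M1 and M2 are the masks of the image features of
    virtual articles 306 and 308 in the intermediate image. All three
    are boolean arrays of the same H x W shape."""
    M3 = M0 & (M1 | M2)   # intersection: occluded part of the target features
    M4 = M0 & ~M3         # difference: visible remainder, the annotation
    return M3, M4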
In specific implementation, there are various ways of performing image processing on the article image to obtain the target mask information of the target image features of the target article in the article image. In an optional implementation manner provided by an embodiment of this specification, this is implemented as follows:
performing image differencing between a pre-collected background image of the target position and the article image by means of an image difference algorithm to obtain the target image features;
and performing mask processing on the target image features to obtain the target mask information.
Specifically, the background image is an image acquired while no article is at the target position. In practical applications, before the movement instruction is issued to the grabbing device, an image acquisition instruction for the target position may first be issued to the image acquisition device to obtain the background image of the target position.
In practical applications, an image difference algorithm compares the differing parts of two images. In the embodiments of this specification, the difference between the background image and the article image obtained through the image difference algorithm is exactly the target image features of the target article. The image values of the image area occupied by the target image features are represented by 1 and the image values of the article image outside that area by 0, yielding the binary image corresponding to the target image features, namely the target mask information.
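A minimal OpenCV sketch of this background-differencing step; the threshold value and the morphological clean-up are assumptions for illustration, not prescribed by the method:

import cv2
import numpy as np

def target_mask(background: np.ndarray, item_image: np.ndarray,
                thresh: int = 25) -> np.ndarray:
    """Difference the article image against the pre-collected background
    of the target position; pixels that changed form the target image
    features, returned as a 0/1 mask (the target mask information)."""
    diff = cv2.absdiff(background, item_image)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 1, cv2.THRESH_BINARY)
    # Remove isolated noise pixels (an assumed clean-up step).
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)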
By repeatedly applying the above manner of obtaining a sample image and labeling it to obtain its labeled image, a large number of training sample pairs are obtained on the basis that each sample image and its labeled image form a training sample pair. An initial article identification model is then trained on these training sample pairs to obtain the article identification model, which reduces the labor cost and time cost of obtaining the model; building on the accuracy of the training sample pairs, the identification accuracy and identification efficiency of the article identification model are further improved.
The training sample pairs are used as training samples to train the initial article identification model, and the article identification model is obtained after training is completed.
Specifically, the training samples used for model training of the initial article identification model are obtained by repeatedly executing the above-described manner of obtaining sample images and labeled images.
The initial article identification model is a model constructed using a deep learning algorithm; correspondingly, the article identification model refers to the trained model, obtained from the initial article identification model, that is capable of identifying articles.
Following the above example, training sample pairs Z1, Z2, Z3, ..., Zn are obtained; these pairs are used as training samples, the initial article identification model is trained on them, and the article identification model is obtained after training is completed.
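As a rough sketch of this training stage, assuming PyTorch and a deliberately tiny fully convolutional segmentation network (this specification does not fix an architecture, so everything below is an assumption):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in segmentation network: per-pixel logit for "target article".
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder tensors standing in for the training sample pairs Z1..Zn:
# sample images (N x 3 x H x W) and their mask labeling images (N x 1 x H x W).
images = torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 2, (8, 1, 64, 64)).float()
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()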
In an optional implementation provided by an embodiment of this specification, on the basis of training the initial article identification model to obtain the article identification model, the image processing method further performs article identification through the article identification model:
inputting a target image into the item identification model;
and the article identification model performs article identification on the target image to obtain an article identification result, output by the article identification model, for the target image.
Specifically, the target image is an image to be identified, and the target image includes image features of an article to be identified.
Following the above example, a picture p2 containing image features of a target article is input into the article identification model, and the model performs article identification on p2 to obtain the identification result, output by the article identification model, for the target article in p2.
The image processing method provided in this specification is further described below with reference to fig. 4, taking its application in an unmanned convenience store as an example. Fig. 4 shows a processing flow chart of an image processing method applied to an unmanned convenience store according to an embodiment of the present specification; the specific steps include step S402 to step S436.
Step S402, a moving instruction for the target object is issued to the grabbing equipment.
Specifically, the movement instruction carries an initial coordinate of a shelf in the unmanned convenience store and a target coordinate of a checkout counter.
The grabbing device grabs the target article from the shelf according to the initial coordinate, moves the target article to the checkout counter corresponding to the target coordinate, and uploads an execution feedback notification for the movement instruction after the movement is completed.
Step S404, receiving the execution feedback notification uploaded by the grabbing device for the movement instruction.
Step S406, an image capture instruction for the checkout counter is issued to the image capture device.
Specifically, the image acquisition device captures an image of the target article at the checkout counter according to the target coordinate carried in the image acquisition instruction, and obtains and uploads the article image.
Step S408, receiving the object image of the target object uploaded by the image capturing device.
Step S410, calculating the position coordinates of the target object in the camera coordinate system corresponding to the object image according to the target coordinates and the preset transformation matrix.
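A minimal sketch of this coordinate conversion, under the assumption that the preset transformation matrix is a 4x4 homogeneous world-to-camera extrinsic matrix (the concrete values below are illustrative only, not taken from this specification):

import numpy as np

def to_camera_coords(target_coord, T):
    # Append 1 to form homogeneous coordinates, apply the preset
    # transformation matrix, and drop the homogeneous component.
    p_world = np.append(np.asarray(target_coord, dtype=float), 1.0)
    p_cam = T @ p_world
    return p_cam[:3]

# Illustrative matrix: identity rotation, camera 2 m from the checkout counter.
T = np.eye(4)
T[2, 3] = 2.0
print(to_camera_coords([0.5, 0.1, 0.0], T))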
Step S412, the shape information of the target article included in the execution feedback notification is acquired.
Step S414, a marked virtual article matching the size of the shape information is added at the position coordinates.
Step S416, determining a target view angle of the virtual camera in the camera coordinate system according to the acquired collection view angle of the image collection device.
Step S418, at least one virtual article is added in the target area determined according to the marked virtual article and the target view angle.
Step S420, image rendering is carried out on the marked virtual article and the virtual article from the target view angle, and a rendered image is obtained.
Step S422, according to the rendered image, determining the shielding relation between the marked image characteristic of the marked virtual article and the image characteristic of the virtual article.
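One way to read the shielding relation off the rendered image is to compare per-pixel depths. The sketch below assumes the renderer can output a depth map for the marked virtual article alone and one for the added virtual articles alone, with np.inf where nothing is drawn; these names and the depth-map interface are assumptions, not from this specification:

import numpy as np

def shielded_map(depth_marked, depth_virtual):
    # A pixel of the marked virtual article is shielded wherever some
    # virtual article is rendered closer to the camera (smaller depth).
    return (depth_virtual < depth_marked) & np.isfinite(depth_marked)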
Step S424, the marked virtual article is removed from the camera coordinate system.
Step S426, performing image rendering on the virtual object from the target view angle to obtain an intermediate image including image features.
Step S428, image fusion is performed on the intermediate image and the article image according to the shielding relation to obtain a sample image.
Specifically, the sample image is obtained in the following manner:
for the target image area where a target image unit in the article image is located, judging, according to the shielding relation, whether the rendered image area corresponding to that target image area in the rendered image belongs to a shielded area;
if so, determining the intermediate image unit in the intermediate image area of the intermediate image corresponding to the target image area as the sample image unit of the sample image area corresponding to the target image area in the sample image;
and if not, determining the image unit of the article image as the sample image unit.
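Under the assumption that the shielding relation has been reduced to a boolean per-pixel map (True where the rendered image area belongs to a shielded area), the selection rule above amounts to a per-pixel choice, sketched here with illustrative names:

import numpy as np

def fuse_sample(item_image, intermediate_image, shielded):
    # shielded: H x W boolean map derived from the shielding relation.
    sample = item_image.copy()
    # Shielded pixels take the intermediate image unit (virtual article in
    # front of the target article); all others keep the article image unit.
    sample[shielded] = intermediate_image[shielded]
    return sample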
Step S430, image processing is performed on the article image to obtain target mask information of the target image feature of the target article in the article image.
Specifically, the article image is subjected to image processing to obtain the target mask information of the target image feature, which is implemented by the following steps:
carrying out image difference on the pre-collected background image of the target position and the article image through an image difference algorithm to obtain the target image characteristics;
and carrying out mask processing on the target image characteristics to obtain the target mask information.
Step S432 is to perform image processing on the intermediate image to obtain mask information of image features in the intermediate image.
In practical applications, the execution order of step S430 and step S432 may be changed; moreover, step S430 may be executed as soon as the article image is obtained, and step S432 as soon as the intermediate image is obtained, which is not limited herein.
Step S434, performing boolean operation according to the target mask information and the mask information to obtain mask labeling information of the sample image features of the target object in the sample image.
And step S436, labeling the sample image according to the mask labeling information to obtain a labeled image of the sample image.
Specifically, the sample image and the labeled image form a training sample pair.
In practical application, the training sample pair is used as a training sample, the initial article recognition model is trained, and the article recognition model is obtained after training is completed.
On the basis of obtaining the article identification model, article identification is performed on a target image through the article identification model, specifically in the following manner:
inputting a target image into the item identification model;
and the article identification model performs article identification on the target image to obtain an article identification result, output by the article identification model, for the target image.
In practice, the target image may be an image containing image features of the items being traded in the unmanned convenience store.
In summary, this specification provides an image processing method that, on the basis of acquiring an article image of a target article, determines the position coordinates of the target article in the camera coordinate system corresponding to the article image and constructs a virtual scene similar to the article image based on those coordinates: at least one virtual article is added in the camera coordinate system according to the position coordinates, an intermediate image containing image features of the virtual article at a target view angle is generated, and the intermediate image and the article image are fused to obtain a sample image. Because the positions at which virtual articles can be added in the virtual scene are varied, the sample images generated from them are varied as well, which increases the diversity and richness of the sample images and improves both the generation efficiency and the image quality of the sample images.
An embodiment of an image processing apparatus provided in this specification is as follows:
corresponding to the above method embodiment, the present specification also provides an image processing apparatus embodiment, and fig. 5 shows a schematic diagram of an image processing apparatus provided in an embodiment of the present specification. As shown in fig. 5, the apparatus includes:
an acquisition module 502 configured to acquire an item image of a target item;
a determining module 504 configured to determine position coordinates of the target item in a camera coordinate system corresponding to the item image;
an adding module 506 configured to add at least one virtual item in the camera coordinate system according to the position coordinates;
a generation module 508 configured to generate an intermediate image containing image features of the virtual item at a target perspective; the target visual angle is determined according to the acquisition visual angle of the article image;
a fusion module 510 configured to image-fuse the intermediate image and the item image to obtain a sample image.
Optionally, the article image is obtained by:
issuing a moving instruction aiming at the target object to a grabbing device; the moving instruction carries an initial coordinate of an initial position and a target coordinate of a target position;
receiving an execution feedback notification uploaded by the grabbing device and aiming at the movement instruction;
issuing an image acquisition instruction aiming at the target position to image acquisition equipment;
and receiving the article image uploaded by the image acquisition equipment.
Optionally, the grabbing device grabs the target object from an initial position according to the initial coordinate, moves the target object to the target position corresponding to the target coordinate, and uploads an execution feedback notification for the movement instruction after the movement is completed;
and the image acquisition equipment acquires the image of the target object at the target position according to the target coordinate carried in the image acquisition instruction, and acquires and uploads the object image.
Optionally, the determining module 504 is further configured to: and calculating the position coordinates of the target object in a camera coordinate system corresponding to the object image according to the target coordinates and a preset transformation matrix.
Optionally, the adding module 506 includes:
an acquisition information sub-module configured to acquire shape information of the target item included in the execution feedback notification;
a marked virtual article adding sub-module configured to add, at the position coordinates, a marked virtual article matching the shape information in size;
a view angle determination submodule configured to determine a target view angle of the camera coordinate system from the acquired acquisition view angle of the image acquisition device;
an add virtual item sub-module configured to add at least one virtual item in a target area determined from the marked virtual item and the target perspective.
Optionally, the image processing apparatus further includes:
a first relation determining module configured to determine the shielding relation between the marked virtual article and the virtual article according to the respective distances of the marked virtual article and the virtual article from the origin in the camera coordinate system;
accordingly, the fusion module 510 is further configured to: and screening out sample image units from the intermediate image unit of the intermediate image and the image unit of the article image according to the shielding relation to form the sample image.
Optionally, the image processing apparatus further includes:
a rendering module configured to perform image rendering on the marked virtual article and the virtual article from the target perspective to obtain a rendered image;
a second relation determination module configured to determine an occlusion relation between a marking image feature of the marking virtual article and the image feature according to the rendered image;
accordingly, the fusion module 510 includes:
a judging sub-module configured to judge, for the target image area where a target image unit in the article image is located, whether the rendered image area corresponding to that target image area in the rendered image belongs to a shielded area according to the shielding relation;
a first determining sub-module, operated if so, configured to determine the intermediate image unit in the intermediate image area of the intermediate image corresponding to the target image area as the sample image unit of the sample image area corresponding to the target image area in the sample image;
and a second determining sub-module, operated if not, configured to determine the image unit of the article image as the sample image unit.
Optionally, the generating module 508 includes:
a removal sub-module configured to remove the marked virtual article in the camera coordinate system;
an image rendering sub-module configured to perform image rendering on the virtual article from the target perspective, obtaining the intermediate image including the image feature.
Optionally, the image processing apparatus further includes:
a first image processing module configured to perform image processing on the article image to obtain target mask information of a target image feature of the target article in the article image;
a second image processing module configured to perform image processing on the intermediate image to obtain mask information of the image features in the intermediate image;
the operation module is configured to perform Boolean operation according to the target mask information and the mask information to obtain mask marking information of sample image features of the target object in the sample image;
the marking module is configured to mark the sample image according to the mask marking information to obtain a marked image of the sample image; the sample image and the annotation image form a training sample pair.
Optionally, the first image processing module includes:
the image difference submodule is configured to perform image difference on a background image of the target position and the article image which are acquired in advance through an image difference algorithm to obtain the target image characteristics;
and the mask processing sub-module is configured to perform mask processing on the target image features to obtain the target mask information.
Optionally, the image processing apparatus further includes:
and the training module is configured to train the initial article recognition model by taking the training sample pair as a training sample, and obtain the article recognition model after the training is finished.
Optionally, the image processing apparatus further includes:
an input model module configured to input a target image into the item identification model;
and the article identification module is configured to perform article identification on the target image by the article identification model, and obtain an article identification result which is output by the article identification model and aims at the target image.
The above is a schematic configuration of an image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as the technical solution of the image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the image processing method.
The present specification provides an embodiment of a computing device as follows:
FIG. 6 illustrates a block diagram of a computing device 600 provided in accordance with one embodiment of the present description. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630, and a database 650 is used to store data.
Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 640 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 6 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.
The present specification provides a computing device comprising a memory 610, a processor 620, and computer instructions stored on the memory and executable on the processor, the processor 620 being configured to execute the following computer-executable instructions:
acquiring an article image of a target article;
determining the position coordinates of the target object in a camera coordinate system corresponding to the object image;
adding at least one virtual article in the camera coordinate system according to the position coordinates;
generating an intermediate image containing image features of the virtual article at a target viewing angle; the target visual angle is determined according to the acquisition visual angle of the article image;
and carrying out image fusion on the intermediate image and the article image to obtain a sample image.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the image processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the image processing method.
This specification provides one example of a computer-readable storage medium, comprising:
the present specification provides a computer readable storage medium storing computer instructions that, when executed by a processor, are operable to:
acquiring an article image of a target article;
determining the position coordinates of the target object in a camera coordinate system corresponding to the object image;
adding at least one virtual article in the camera coordinate system according to the position coordinates;
generating an intermediate image containing image features of the virtual article at a target viewing angle; the target visual angle is determined according to the acquisition visual angle of the article image;
and carrying out image fusion on the intermediate image and the article image to obtain a sample image.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the image processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the image processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (16)

1. An image processing method comprising:
acquiring an article image and an intermediate image of a target article, wherein the intermediate image comprises the image characteristics of a virtual article at a target view angle, and the target view angle is determined according to the acquisition view angle of the article image;
carrying out image fusion on the intermediate image and the article image to obtain a sample image;
obtaining mask marking information of the sample image feature according to target mask information of a target image feature and mask information of the image feature, wherein the target image feature is a target image feature of the target object in the object image, and the sample image feature is a sample image feature of the target object in the sample image;
and labeling the sample image according to the mask labeling information to obtain a labeled image of the sample image.
2. The image processing method according to claim 1, the intermediate image being generated by:
determining the position coordinates of the target object in a camera coordinate system corresponding to the object image;
adding at least one virtual article in the camera coordinate system according to the position coordinates;
generating an intermediate image containing image features of the virtual item at a target perspective.
3. The image processing method according to claim 2, wherein the article image is obtained by:
issuing a moving instruction aiming at the target object to a grabbing device; the moving instruction carries an initial coordinate of an initial position and a target coordinate of a target position;
receiving an execution feedback notification uploaded by the grabbing device and aiming at the movement instruction;
issuing an image acquisition instruction aiming at the target position to image acquisition equipment;
and receiving the article image uploaded by the image acquisition equipment.
4. The image processing method according to claim 3, wherein the grasping device grasps the target object from an initial position according to the initial coordinate, moves the target object to the target position corresponding to the target coordinate, and uploads an execution feedback notification for the movement instruction after the movement is completed;
and the image acquisition equipment acquires the image of the target object at the target position according to the target coordinate carried in the image acquisition instruction, and acquires and uploads the object image.
5. The image processing method of claim 3, the determining location coordinates of the target item in a camera coordinate system corresponding to the item image, comprising:
and calculating the position coordinates of the target object in a camera coordinate system corresponding to the object image according to the target coordinates and a preset transformation matrix.
6. The image processing method of claim 3, said adding at least one virtual item in said camera coordinate system according to said location coordinates, comprising:
acquiring shape information of the target item contained in the execution feedback notification;
adding a marked virtual article matched with the shape information in size at the position coordinates;
determining a target view angle of the camera coordinate system according to the acquired collection view angle of the image collection equipment;
and adding at least one virtual article in a target area determined according to the marked virtual article and the target visual angle.
7. The image processing method according to claim 6, further comprising, after the step of adding at least one virtual article to the camera coordinate system according to the position coordinates is performed and before the step of generating an intermediate image including image features of the virtual article at a target view angle is performed:
determining the shielding relation between the marked virtual article and the virtual article according to the marked virtual article and the distance between the virtual article and the origin point in the camera coordinate system;
correspondingly, the image fusing the intermediate image and the article image to obtain a sample image includes:
and screening out sample image units from the intermediate image unit of the intermediate image and the image unit of the article image according to the shielding relation to form the sample image.
8. The image processing method according to claim 6, further comprising, after the step of adding at least one virtual article to the camera coordinate system according to the position coordinates is performed and before the step of generating an intermediate image including image features of the virtual article at a target view angle is performed:
performing image rendering on the marked virtual article and the virtual article from the target view angle to obtain a rendered image;
determining the shielding relation between the marking image feature of the marking virtual article and the image feature according to the rendering image;
correspondingly, the image fusing the intermediate image and the article image to obtain a sample image includes:
judging whether a rendering image area corresponding to the rendering image in the target image area belongs to a shielding area or not according to the shielding relation aiming at the target image area where the target image unit in the article image is located;
if so, determining an intermediate image unit of the target image area in an intermediate image area corresponding to an intermediate image as a sample image unit of a sample image area corresponding to the target image area in a sample image;
and if not, determining the image unit as the sample image unit.
9. The image processing method of claim 7 or 8, the generating an intermediate image containing image features of the virtual article at a target perspective, comprising:
removing the marked virtual article from the camera coordinate system;
and performing image rendering on the virtual article from the target view angle to obtain the intermediate image containing the image characteristics.
10. The image processing method of claim 1, wherein obtaining mask annotation information for the sample image feature according to the target mask information for the target image feature and the mask information for the image feature comprises:
performing image processing on the article image to obtain target mask information of the target image characteristic of the target article in the article image;
performing image processing on the intermediate image to obtain mask information of the image features in the intermediate image;
and performing Boolean operation according to the target mask information and the mask information to obtain mask marking information of the sample image characteristics of the target object in the sample image.
11. The image processing method according to claim 10, wherein the image processing the article image to obtain target mask information of a target image feature of the target article in the article image comprises:
carrying out image difference on the pre-collected background image of the target position and the article image through an image difference algorithm to obtain the target image characteristics;
and carrying out mask processing on the target image characteristics to obtain the target mask information.
12. The image processing method according to claim 10, further comprising:
the sample image and the labeled image form a training sample pair;
and taking the training sample pair as a training sample, training an initial article identification model, and obtaining the article identification model after training.
13. The image processing method according to claim 12, further comprising:
inputting a target image into the item identification model;
and the article identification model identifies the object image to obtain an article identification result output by the article identification model and aiming at the object image.
14. An image processing apparatus comprising:
the system comprises an acquisition module, a storage module and a display module, wherein the acquisition module is configured to acquire an article image of a target article and an intermediate image, the intermediate image comprises image characteristics of a virtual article at a target view angle, and the target view angle is determined according to an acquisition view angle of the article image;
a fusion module configured to perform image fusion on the intermediate image and the article image to obtain a sample image;
the computing module is configured to obtain mask annotation information of the sample image feature according to target mask information of a target image feature and mask information of the image feature, wherein the target image feature is a target image feature of the target object in the object image, and the sample image feature is a sample image feature of the target object in a sample image;
and the marking module is configured to mark the sample image according to the mask marking information to obtain a marked image of the sample image.
15. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring an article image and an intermediate image of a target article, wherein the intermediate image comprises the image characteristics of a virtual article at a target view angle, and the target view angle is determined according to the acquisition view angle of the article image;
carrying out image fusion on the intermediate image and the article image to obtain a sample image;
obtaining mask marking information of the sample image feature according to target mask information of a target image feature and mask information of the image feature, wherein the target image feature is a target image feature of the target object in the object image, and the sample image feature is a sample image feature of the target object in the sample image;
and labeling the sample image according to the mask labeling information to obtain a labeled image of the sample image.
16. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image processing method of any one of claims 1 to 13.
CN202110301945.7A 2020-05-15 2020-05-15 Image processing method and device Active CN113066122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110301945.7A CN113066122B (en) 2020-05-15 2020-05-15 Image processing method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110301945.7A CN113066122B (en) 2020-05-15 2020-05-15 Image processing method and device
CN202010411827.7A CN111340878B (en) 2020-05-15 2020-05-15 Image processing method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010411827.7A Division CN111340878B (en) 2020-05-15 2020-05-15 Image processing method and device

Publications (2)

Publication Number Publication Date
CN113066122A true CN113066122A (en) 2021-07-02
CN113066122B CN113066122B (en) 2022-05-13

Family

ID=71186467

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110301945.7A Active CN113066122B (en) 2020-05-15 2020-05-15 Image processing method and device
CN202010411827.7A Active CN111340878B (en) 2020-05-15 2020-05-15 Image processing method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010411827.7A Active CN111340878B (en) 2020-05-15 2020-05-15 Image processing method and device

Country Status (1)

Country Link
CN (2) CN113066122B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592042A (en) * 2021-09-29 2021-11-02 科大讯飞(苏州)科技有限公司 Sample image generation method and device, and related equipment and storage medium thereof

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580717A (en) * 2020-12-17 2021-03-30 百度在线网络技术(北京)有限公司 Model training method, positioning element searching method and device
CN115393497A (en) * 2022-07-29 2022-11-25 中国第一汽车股份有限公司 Virtual imaging method, virtual imaging device, nonvolatile storage medium and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106104635A (en) * 2013-12-06 2016-11-09 奥瑞斯玛有限公司 Block augmented reality object
CN106937531A (en) * 2014-06-14 2017-07-07 奇跃公司 Method and system for producing virtual and augmented reality
CN107093199A (en) * 2017-03-29 2017-08-25 阿里巴巴集团控股有限公司 Method and device for generating clue figure
CN109584295A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 The method, apparatus and system of automatic marking are carried out to target object in image
CN109816766A (en) * 2017-11-20 2019-05-28 佳能株式会社 Image processing apparatus, image processing method and storage medium
CN110889890A (en) * 2019-11-29 2020-03-17 深圳市商汤科技有限公司 Image processing method and device, processor, electronic device and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609686B (en) * 2012-01-19 2014-03-12 宁波大学 Pedestrian detection method
CN102831584B (en) * 2012-08-02 2015-04-22 中山大学 Data-driven object image restoring system and method
CN105005970B (en) * 2015-06-26 2018-02-16 广东欧珀移动通信有限公司 The implementation method and device of a kind of augmented reality
CN106845515B (en) * 2016-12-06 2020-07-28 上海交通大学 Robot target identification and pose reconstruction method based on virtual sample deep learning
CN107471218B (en) * 2017-09-07 2020-09-11 南京理工大学 Binocular vision-based hand-eye coordination method for double-arm robot
CN109035369B (en) * 2018-07-12 2023-05-09 浙江工业大学 Sample expansion method for fusing virtual samples
CN109773798A (en) * 2019-03-28 2019-05-21 大连理工大学 A kind of double mechanical arms cooperation control method based on binocular vision
CN111168686B (en) * 2020-02-25 2021-10-29 深圳市商汤科技有限公司 Object grabbing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111340878B (en) 2021-03-05
CN113066122B (en) 2022-05-13
CN111340878A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN113066122B (en) Image processing method and device
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
JP2020507850A (en) Method, apparatus, equipment, and storage medium for determining the shape of an object in an image
EP3182371B1 (en) Threshold determination in for example a type ransac algorithm
JP6011102B2 (en) Object posture estimation method
JP2020523703A (en) Double viewing angle image calibration and image processing method, device, storage medium and electronic device
CN112270688B (en) Foreground extraction method, device, equipment and storage medium
JP2011198349A (en) Method and apparatus for processing information
Nguyen et al. 3D scanning system for automatic high-resolution plant phenotyping
CN105005964B (en) Geographic scenes panorama sketch rapid generation based on video sequence image
CN112348958A (en) Method, device and system for acquiring key frame image and three-dimensional reconstruction method
JP6880618B2 (en) Image processing program, image processing device, and image processing method
JP6017343B2 (en) Database generation device, camera posture estimation device, database generation method, camera posture estimation method, and program
JP2001274973A (en) Device and method for synthesizing microscopic image and computer-readable recording medium with microscopic image synthesizing processing program recorded thereon
CN113420776A (en) Multi-side joint detection article classification method based on model fusion
JP6016242B2 (en) Viewpoint estimation apparatus and classifier learning method thereof
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
JP2012221042A (en) Camera pause estimation device, camera pause estimation method and camera pause estimation program
CN114419158A (en) Six-dimensional attitude estimation method, network training method, device, equipment and medium
JP5975484B2 (en) Image processing device
Tatemichi et al. Category-level Object Pose Estimation in Heavily Cluttered Scenes by Generalized Two-stage Shape Reconstructor
CN112184797B (en) Method for spatially positioning key part of kilogram group weight
CN112991524B (en) Three-dimensional reconstruction method, electronic device and storage medium
Thisse et al. 3D Dense & Scaled Reconstruction Pipeline with Smartphone Acquisition
CN117789256A (en) Gesture recognition method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant