CN113869388A

CN113869388A - Target object identification method and device, equipment, medium and product thereof

Info

Publication number: CN113869388A
Application number: CN202111120489.2A
Authority: CN
Inventors: 刘涛; 兴百桥
Original assignee: Guangzhou Huaduo Network Technology Co Ltd
Current assignee: Guangzhou Huaduo Network Technology Co Ltd
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2021-12-31

Abstract

The application discloses a target object identification method and a device, equipment, medium and product thereof, wherein the method comprises the following steps: acquiring an article picture whether to be identified contains a target article; coding and decoding the article picture in multiple scales to obtain multiple image characteristic information capturing the article contour characteristics, and extracting an article segmentation picture from the article picture according to the multiple image characteristic information; classifying the image characteristic information respectively to obtain the classification probability that the image characteristic information contains the target object, and solving the average probability of a plurality of classification probabilities obtained by classifying all the image characteristic information; carrying out image recognition on the article segmentation graph to obtain the recognition probability that the article segmentation graph contains the target article; and fusing the average probability and the recognition probability to obtain a fusion probability, and judging that the object picture contains the target object when the fusion probability is greater than a preset threshold value. The target object can be identified from the object picture more accurately.

Description

Target object identification method and device, equipment, medium and product thereof

Technical Field

The present application relates to the field of image recognition technologies, and in particular, to a target object recognition method, and a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.

Background

Image recognition using an artificial neural network model has been the mainstream technology. For application scenarios such as e-commerce platforms, a large number of commodity pictures or other pictures are generated every day, and for various application needs including automatic commodity classification, commodity compliance detection, target commodity search, and the like, it is generally necessary to detect and identify target objects in the commodity pictures uploaded by users, and then perform further processing in response to different target objects and needs.

For an application scene such as an e-commerce platform, if a commodity picture uploaded by a merchant and used for displaying an article contains non-sold articles such as a knife, a sword and the like, due to the characteristics of smallness, light reflection, various forms and the like of the commodity, the existing neural network model sometimes has difficulty in accurately identifying the commodity picture, and thus, the e-commerce platform is greatly disturbed. It is not practical to rely entirely on manual investigation.

Therefore, how to accurately and efficiently identify a target object, that is, a target image, from various object pictures including an object image, so that an identification result is more reliable, and the method becomes a technical problem to be solved in the field.

Disclosure of Invention

A primary object of the present application is to solve at least one of the above problems and provide a target item identification method, and a corresponding apparatus, computer device, computer readable storage medium, and computer program product.

In order to meet various purposes of the application, the following technical scheme is adopted in the application:

a target object identification method adapted to one of the objects of the present application is provided, including the steps of:

acquiring an article picture whether to be identified contains a target article;

calling a pre-trained image segmentation model to perform multi-scale coding and decoding on the article picture, acquiring a plurality of image characteristic information capturing the article contour characteristics, and extracting an article segmentation picture from the article picture according to the plurality of image characteristic information;

classifying the image characteristic information in the image segmentation model respectively to obtain the classification probability that the image characteristic information contains the target object, and solving the average probability of a plurality of classification probabilities obtained by classifying all the image characteristic information;

calling a pre-trained image recognition model to perform image recognition on the article segmentation graph to obtain the recognition probability that the article segmentation graph contains the target article;

and fusing the average probability and the recognition probability to obtain a fusion probability, judging the result, judging that the object picture contains the target object when the fusion probability is greater than a preset threshold value, and otherwise, judging that the object picture does not contain the target object.

In a further embodiment, a pre-trained image segmentation model is called to perform multi-scale encoding and decoding on the article picture to obtain a plurality of image feature information for capturing the article contour features, and an article segmentation picture is extracted from the article picture according to the plurality of image feature information, including the following steps:

the pre-trained image segmentation model is subjected to multi-level coding, the size original drawing of the article picture is reduced in scale step by step, and intermediate characteristic information corresponding to each scale is correspondingly generated, wherein the intermediate characteristic information is used for representing the outline characteristics of the article in the article picture;

the image segmentation model after pre-training is decoded in a multi-stage mode, on the basis of image feature information generated by the middle feature information of the minimum scale, the middle feature information generated by the same-level codes of the image segmentation model is used as reference information in a stage-by-stage mode, and correspondingly decoded image feature information of a higher scale is used for representing the outline features of the articles in the article picture in a mask mode;

and performing image segmentation on the specification original image of the commodity picture by using an image segmentation model according to mask image data formed by fusing all image characteristic information so as to extract a commodity segmentation picture corresponding to the commodity in the commodity picture.

In an embodiment, the image segmentation module performs image segmentation on the specification original image of the article picture according to mask image data formed by fusing all image feature information, so as to extract an article segmentation map corresponding to an article from the article picture, and the method includes the following steps:

fully connecting all image characteristic information by a full connection layer in the image segmentation model to fuse and generate mask image data, wherein the mask image data inherits the contour characteristics of the article in the image characteristic information;

and performing image extraction on the specification original image of the article picture according to the mask image data to obtain an image in the outline corresponding to the outline characteristic of the article to form the article segmentation picture.

In a deepened embodiment, the fusion of the average probability and the recognition probability to obtain a fusion probability for result judgment includes the following steps:

obtaining the average probability and the recognition probability;

fusing the average probability and the recognition probability to calculate a fused probability, wherein the average probability and the recognition probability are smoothed by the same associated hyper-parameter;

and comparing the fusion probability with a preset threshold, when the fusion probability is greater than the preset threshold, judging that the object picture contains the target object, otherwise, judging that the object picture does not contain the target object.

In a specific embodiment, the image feature information corresponding to a plurality of scales generated by the image segmentation model is output to a two-classifier corresponding to each scale for performing two-classification decision to obtain the classification probability.

In a preferred embodiment, the image segmentation model is U²net model, the basic network structure of the image segmentation model is U²net model, the target object is a knife or a sword.

The target object identification device comprises a picture acquisition module, an image segmentation module, a contour classification module, an object classification module and a fusion judgment module, wherein the picture acquisition module is used for acquiring an object picture of whether a target object is included or not to be identified; the image segmentation module is used for calling a pre-trained image segmentation model to perform multi-scale coding and decoding on the article picture, acquiring a plurality of image characteristic information capturing the article contour characteristics, and extracting an article segmentation picture from the article picture according to the plurality of image characteristic information; the contour classification module is used for classifying the image characteristic information in the image segmentation model respectively to obtain the classification probability that the image characteristic information contains the target object, and solving the average probability of a plurality of classification probabilities obtained by classifying all the image characteristic information; the article classification module is used for calling a pre-trained image recognition model to perform image recognition on an article segmentation map to obtain the recognition probability that the article segmentation map contains the target article; and the fusion judging module is used for fusing the average probability and the recognition probability to obtain a fusion probability for result judgment, judging that the object picture contains the target object when the fusion probability is greater than a preset threshold value, and otherwise, judging that the object picture does not contain the target object.

In a further embodiment, the image segmentation module comprises: the coding path unit is configured to perform multi-level coding on a pre-trained image segmentation model, gradually reduce the dimension of the specification original image of the article picture, and correspondingly output intermediate feature information corresponding to each dimension, wherein the intermediate feature information is used for representing the contour feature of an article in the article picture; the decoding path unit is configured to perform multi-stage decoding on the pre-trained image segmentation model, on the basis of image feature information generated by the intermediate feature information of the minimum scale, and on the basis of intermediate feature information generated by the intermediate feature information of the same scale, correspondingly decode image feature information of a higher scale by using the intermediate feature information generated by the same level coding as reference information, wherein the image feature information is used for representing the contour feature of an article in the article picture in a mask mode; and the fusion and segmentation unit is used for performing image segmentation on the specification original image of the commodity picture according to mask image data formed by fusing all image characteristic information by using an image segmentation model so as to extract a commodity segmentation picture corresponding to the commodity from the commodity picture.

In an embodiment, the fusion segmentation unit comprises: the fusion subunit is used for performing full connection on all the image characteristic information by a full connection layer in the image segmentation model so as to generate mask image data through fusion, wherein the mask image data inherits the contour characteristics of the article in the image characteristic information; and the dividing subunit is used for carrying out image extraction on the specification original image of the article picture according to the mask image data to obtain an image in the outline corresponding to the outline characteristic of the article to form the article dividing picture.

In a further embodiment, the fusion decision module includes: the probability obtaining sub-block is used for obtaining the average probability and the recognition probability; a mean calculation sub-block for fusing the average probability and the recognition probability to calculate a fusion probability, wherein the average probability and the recognition probability are smoothed by the same associated hyper-parameter; and the result judgment sub-module is used for comparing the fusion probability with a preset threshold value, judging that the object picture contains the target object when the fusion probability is greater than the preset threshold value, and otherwise judging that the object picture does not contain the target object.

A computer device adapted for one of the purposes of the present application comprises a central processing unit and a memory, the central processing unit being configured to invoke execution of a computer program stored in the memory to perform the steps of the target item identification method described herein.

A computer-readable storage medium, which stores a computer program implemented according to the target item identification method in the form of computer-readable instructions, and when the computer program is called by a computer, executes the steps included in the method.

A computer program product, provided to adapt to another object of the present application, comprises computer programs/instructions which, when executed by a processor, implement the steps of the method described in any of the embodiments of the present application.

Compared with the prior art, the application has the following advantages:

firstly, carrying out image segmentation on an article picture needing target article identification by using an image segmentation model to obtain image feature information of a plurality of scales and representing the outline features of articles in the article picture, on the basis, carrying out two-way processing by one way, carrying out independent classification on the image feature information corresponding to each scale by the other way, obtaining the classification probability that each image feature information indicates that the article picture contains the target articles, and solving the average probability of the plurality of classification probabilities; and the other path fuses the image characteristic information corresponding to each scale to form mask image data which represents the contour characteristic of the captured article in the article picture, divides the article segmentation graph from the article picture according to the mask image data, calls an image recognition model to recognize whether the article segmentation graph is the target article or not, and obtains the recognition probability that the article segmentation graph contains the target article. And on the basis of obtaining the average probability and the recognition probability, further fusing the two probabilities into a single fusion probability, then comparing the fusion probability with a preset threshold, and judging that the object is contained in the object image when the fusion probability is higher than the preset threshold.

The method and the device have the advantages that the image feature information generated by the image segmentation model is multiplexed, the classification probability is calculated by utilizing the image feature information, a judgment factor about whether an article picture contains a target article is increased, and reference information can be provided for the identification result of the image identification model; for the image recognition model, the background image in the article picture does not interfere with the recognition of the article any more as the image recognition model only needs to perform image recognition on the article segmentation map, so that the target article can be recognized from the article segmentation map in a more efficient and accurate manner, and the corresponding recognition probability is obtained.

And finally, based on the fusion probability obtained by fusing the average probability obtained by the classification probability and the identification probability, with the help of an empirical threshold, comprehensively judging whether the object image contains the object, wherein the judgment result is more accurate and reliable.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart diagram of an exemplary embodiment of a target item identification method of the present application;

FIG. 2 is a schematic logical framework diagram of a network for implementing the target item identification method of the present application;

FIG. 3 shows a modified U-based scheme of the present application²A structural schematic block diagram of an image segmentation model of net;

FIG. 4 is a flowchart illustrating the operation of an image segmentation model according to an embodiment of the present disclosure;

fig. 5 is a schematic flowchart illustrating a process of determining whether an item picture includes a target item according to two probabilities in an embodiment of the present application;

FIG. 6 is a functional block diagram of a target item identification device of the present application;

fig. 7 is a schematic structural diagram of a computer device used in the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As will be appreciated by those skilled in the art, "client," "terminal," and "terminal device" as used herein include both devices that are wireless signal receivers, which are devices having only wireless signal receivers without transmit capability, and devices that are receive and transmit hardware, which have receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having single or multi-line displays or cellular or other communication devices without multi-line displays; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.

The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer, and is a hardware device having necessary components disclosed by the von neumann principle such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, an output device, etc., a computer program is stored in the memory, and the central processing unit calls a program stored in an external memory into the internal memory to run, executes instructions in the program, and interacts with the input and output devices, thereby completing a specific function.

It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.

One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.

Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and used for remote call at a client, and can also be deployed in a client with qualified equipment capability for direct call.

Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.

The person skilled in the art will know this: although the various methods of the present application are described based on the same concept so as to be common to each other, they may be independently performed unless otherwise specified. In the same way, for each embodiment disclosed in the present application, it is proposed based on the same inventive concept, and therefore, concepts of the same expression and concepts of which expressions are different but are appropriately changed only for convenience should be equally understood.

The embodiments to be disclosed herein can be flexibly constructed by cross-linking related technical features of the embodiments unless the mutual exclusion relationship between the related technical features is stated in the clear text, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of the prior art or solve the deficiencies of the prior art. Those skilled in the art will appreciate variations therefrom.

The target object identification method can be programmed into a computer program product and is deployed in a client or a server to run, so that the method can be executed by accessing an open interface after the computer program product runs and performing man-machine interaction with a process of the computer program product through a graphical user interface.

Referring to fig. 1, the target object identification method of the present application, in an exemplary embodiment thereof, includes the steps of:

step S1100, acquiring an article picture of whether the target article is included or not to be identified:

the article picture generally refers to a picture containing an article image and used for displaying the appearance shape of the article. In an exemplary application scenario for assisting the description, the article picture may be a commodity picture in an e-commerce platform, and an article displayed by the commodity picture is generally a vendable article, and the implementation of the present application is to identify a non-vendable article, such as a knife, a sword, etc., from the commodity picture, and therefore it is required to identify a corresponding target article, that is, a target image thereof, in the commodity picture.

In this application scenario, the target image is generally determined by the customized article of the e-commerce platform, i.e., the non-vendable commodity. Those skilled in the art will appreciate that the ability to recognize a target image from a picture of an article can be learned by pre-training an image recognition model, and therefore, a large number of pictures of the article containing various shapes of the non-vendable article can be used to train the image recognition model, and such image recognition model learns the ability to recognize the non-vendable article from the picture of the article. In other application scenarios, an image recognition model for recognizing a target image from a picture of an article can be prepared according to the principle. However, as shown in fig. 2, in the present application, before the image recognition model recognizes the article image, the feature extraction is performed on the article image by using the image segmentation model, which will be described in detail below, and will not be described here.

Since the image recognition model is a relatively mature technology, there are various image recognition models available in the prior art, and they are generally implemented based on CNN convolutional neural networks, including but not limited to ViT, Resnet, HTC, etc., and different such neural network-based image recognition models may have different convolutional structures, which results in different recognition modes, but in terms of function and purpose, as long as the recognition model has been trained in advance to a convergent state for a target item and put into a production stage, it can be implemented to recognize the target item from the item picture. Of course, the recognition capability of each model presents different recognition accuracy rates according to different model structures and different training samples, which is reflected as different confidence degrees of the recognition results of each recognition model.

In the application scenario of the e-commerce platform, if an article picture needs to be acquired, one implementation is to receive input of a user of the e-commerce platform, particularly input when a warehouse management user of a merchant instance configures commodity information, and take the article picture in the commodity information as the article picture, or certainly identify the picture issued to the evaluation area by a consumer and take the picture as the article picture; in another mode, the server of the e-commerce platform may process the commodity pictures in the e-commerce platform database in batch at the background, and identify the target image by using the commodity pictures as the article pictures.

Step S1200, calling a pre-trained image segmentation model to perform multi-scale coding and decoding on the article picture, obtaining a plurality of image characteristic information capturing the article contour characteristics, and extracting an article segmentation picture from the article picture according to the plurality of image characteristic information:

referring to FIG. 3, the present application employs a U-based scheme²And training the image segmentation model of the net basic network architecture until the image segmentation model is converged, and then preprocessing the to-be-recognized article picture by the preposed image recognition model in the application.

U²The net model is based on the residual convolution principle, and is constructed with a coding path and a decoding path, wherein the coding path and the decoding path are respectively provided with a plurality of stages of decoders with the same number, and the stages of the coder and the decoder can be flexibly determined by a person skilled in the art according to prior knowledge such as experiments, experience and the like. For example, fig. 3 shows six levels of encoders and decoders, the lowest level of encoders and decoders in the figure being shown as a single block diagram, mainly in that it directly subjects the intermediate feature information obtained from the previous level of encoding to a 1 x 1 convolutional kernel transformThe image characteristic information is formed later and is therefore generally illustrated as a single block diagram.

In the encoding path, in this embodiment, the six encoders (En _1 to En _5 and En _ De) in the side branch path in fig. 3 apply the specification original image corresponding to the article picture, that is, adapt to U²net cuts the article picture into the article picture with the specified specification as required by the specification of the input picture, and encodes the specification original picture step by step. The first-stage encoder at the top layer extracts the intermediate feature information corresponding to the first scale from the specification original image, then transmits the intermediate feature information to the encoder at the next stage to extract the intermediate feature information corresponding to the second scale, and by analogy, the scale of the specification original image is reduced step by step to extract the corresponding intermediate feature information, so that the six intermediate feature information corresponding to the specification original image can be obtained after the encoder encodes the specification original image step by step.

It can be understood that the intermediate feature information of each scale is an obtained representation after deep semantic understanding of the specification picture under the corresponding scale, and is information extracted from the outline feature of the article in the article picture. U shape²This ability of the net model is known to those skilled in the art, and the encoding path thereof can be provided with a deep semantic understanding ability of capturing the outline features of the item therein from the picture of the item by training it to converge with a sufficient amount of training samples.

In the decoding path, in this embodiment, in the six decoders (De _1 to De _5 and En _ De) in the right branch path in fig. 3, from the bottom layer, the first decoder En _ De performs 1 × 1 convolution transformation on the intermediate feature information output by the encoder En _5 to realize decoding, so as to obtain image feature information of a corresponding scale, and in addition, each of the other decoders at higher levels refers to the image feature information output by the encoder at the same level as the next level based on the image feature information output by the next level, so as to restore the image feature information of the scale corresponding to the level where the decoder is located, so as to analogize to the last decoder at the top layer, so as to obtain the image feature information of the same scale as the specification original image.

It can also be understood that the intermediate feature information of each scale provides context information with the intermediate feature information of the same scale, and fills the image feature information of the lower scale to obtain image feature information output of larger scale (Mask1 to Mask6), and these image feature information also include representations of outline features of the article in the article picture captured from the article picture. Unlike the intermediate feature information, the image feature information is decoded and restored, and processed to represent the contour feature in a mask form, which is essentially mask image data. It will therefore be appreciated that after progressive decoding via the decoding path, six different scales of mask image data corresponding to the number of levels will be available.

As can be understood from the disclosure of the structure and principle of the image segmentation model, the article picture of the present application obtains a plurality of image feature information after being subjected to the step-by-step encoding and the step-by-step decoding by the image segmentation model, and is based on U²net model principle, these image feature information can be further integrated to obtain Mask image data (Mask 7).

In the U2net model, a plurality of image characteristic information obtained by decoding paths of the image characteristic information are fully connected by adopting a full connection layer and processed into single Mask image data (Mask7), wherein the Mask image data is essentially a binary gray-scale image, and in the visual expression, when no article exists in the article image, the gray-scale image is a pure white picture; when an article exists in the article image, the gray-scale image is a picture with a foreground defined by a black area of the article and a background which is pure white. Therefore, as can be easily understood, after full connection, a mask picture which is commonly called as "matting" is obtained and can be used for implementing image segmentation.

Accordingly, the U2net model uses the Mask image data (Mask7) to segment an image defined by a corresponding outline from the specification original image based on the outline information indicated by the Mask image data, thereby forming an article segmentation map. Compared with the original article picture, the article segmentation picture deletes background image information and leaves article image information, and plays a role commonly known as 'cutout'.

It is understood that this object segmentation map can be provided for further recognition by the image recognition model in the present application to embody the role of the image segmentation model in preprocessing the object picture.

Step S1300, classifying each image feature information in the image segmentation model, obtaining a classification probability that the image feature information includes the target item, and calculating an average probability of a plurality of classification probabilities obtained by classifying all the image feature information:

with continuing reference to FIG. 3, the present application employs U for the image segmentation model²The basic structure of the net model is improved by inserting a classifier at each decoder (De _1 to De _5 and En _ De), wherein the classifier can be a multi-classifier, preferably, a two-classifier can be directly adopted. Each classifier is used for carrying out classification processing according to the image feature information generated by the corresponding decoder, and the classifier also participates in the training stage of the image segmentation model, so that whether the corresponding image feature information contains the target object or not can be judged according to the image feature information of each corresponding scale, and the corresponding classification probability is generated, particularly the classification probability corresponding to the target object is indicated. Accordingly, the image feature information of each scale generated by the image segmentation model is classified and judged by the corresponding classifier, and finally the classification probability of the target object is generated according to the image feature information indication.

The classification probabilities (Class1 to Class6) of the image feature information are expressed in different scales corresponding to the article pictures, and in order to make the classification probabilities have reference values and function conveniently, all the classification probabilities obtained according to the image feature information may be fused, for example, the average value is obtained to obtain the corresponding average probability for later use. In an alternative embodiment, considering that a larger scale is closer to the original image, the average probability may be obtained by performing weighted averaging on a plurality of classification probabilities, specifically, each classification probability may be matched with weights from large to small according to the size of the corresponding scale, and the obtained numbers after matching with the weights may be summed one by one to obtain an average.

It is easy to understand that, the image segmentation model is reconstructed and is accessed into a plurality of classifiers, the functions of the image segmentation model are expanded, the generated image feature information is multiplexed, the image feature information is used for classification and judgment, and richer decision reference information is provided for judging whether a target object exists in an object picture, and the decision reference information is obtained by classifying the image feature information of different scale features obtained based on deep semantic understanding and calculating classification probability, so that the significance of the decision on subsequent comprehensive classification judgment is obvious.

Step S1400, calling a pre-trained image recognition model to perform image recognition on the article segmentation map, and obtaining the recognition probability that the article segmentation map contains the target article:

as mentioned above, after "matting", the article segmentation map can be obtained, and the article segmentation map can be provided for the image recognition model for further target article recognition.

As mentioned above, the image recognition model may be selected from a variety of image recognition models that are superior in the prior art, including but not limited to: HTC (Hybrid Task Cascade for Instance Segmentation, mixed Task Cascade for Instance Segmentation), mask-rcnn, Resnet, ViT, and upgrade and evolution versions thereof, etc., are all mature image recognition models, and can be used as the image recognition model of the present application as long as a sufficient amount of corresponding training samples are adopted to train the images to converge. In the present application, ViT is recommended because of its excellent performance.

When the image recognition model is trained to be convergent, the target object can be recognized according to the object segmentation graph, the deep semantic feature information of the object segmentation graph is classified and mapped by the classifier, the probability that the deep semantic feature information is mapped to the classification which contains or does not contain the target object is obtained, and the recognition probability that the object segmentation graph contains the target object is output.

In the present application, it is understood that the image recognition model does not directly recognize the original image of the article picture, but recognizes the original image by using the article segmentation map generated by the image segmentation model as an input. The object segmentation image is preprocessed by the image segmentation model, so background information in the object image is deleted, the object image is the full information in the object segmentation image, and the image recognition model performs image recognition on the object image on the basis of the object image, so that classification judgment is obviously more accurate and efficient.

S1500, fusing the average probability and the recognition probability to obtain a fusion probability for result judgment, judging that the article picture contains a target article when the fusion probability is greater than a preset threshold, otherwise, not containing the target article:

through the processing process, two probability values are obtained for one article picture, wherein the two probability values are the average probability obtained through classification according to the image characteristic information generated by the image segmentation model, the identification probability obtained through identification and classification according to the article segmentation image generated by the image identification model, the two probability values can indicate the probability that the article picture contains the target article, and probability information corresponding to two decision reference dimensions is provided, so that the two probability values need to be fused to realize final judgment.

Therefore, in the present application, the average probability and the recognition probability may be combined in various ways, including but not limited to, any way of summing the two probabilities, averaging, weighting, and smoothly summing, as long as the applied way can embody that the two probability values play a role in making a decision, and accordingly, a single probability value will be obtained, which may be called a fusion probability.

The fusion probability integrates the classification information corresponding to the different scale features of the article picture and the classification information corresponding to the article image, so that the comprehensive judgment can be made according to the fusion probability.

In order to realize comprehensive judgment, a threshold value can be preset by combining with prior knowledge such as experiments and/or experience, the fusion probability is compared with the threshold value, when the fusion probability is higher than the threshold value, the object picture can be judged to contain the target object, otherwise, the object picture is judged not to contain the target object. Therefore, the preset threshold value plays a role in adjusting the confidence of the fusion probability and plays an auxiliary role in improving the confidence of the judgment result.

Referring to fig. 4, in a further embodiment, the step S1200 of calling a pre-trained image segmentation model to perform multi-scale encoding and decoding on the article picture to obtain a plurality of image feature information for capturing the article contour features, and extracting the article segmentation picture from the article picture according to the plurality of image feature information includes the following steps:

step S1210, performing multi-level coding on the pre-trained image segmentation model, gradually reducing the dimension of the specification original image of the article picture, and correspondingly generating intermediate feature information corresponding to each dimension, wherein the intermediate feature information is used for representing the contour features of the articles in the article picture:

referring back to fig. 3 and in conjunction with the foregoing, in the encoding branch path of the image segmentation model, the specification original image of the article picture is scaled down one by one, and the intermediate feature information is extracted at each scale by a corresponding encoder, wherein after backward intermediate feature information with a larger scale is down-sampled, the intermediate feature information with a smaller scale is obtained and transmitted to a forward encoder, and meanwhile, the intermediate feature information with a smaller scale is also transmitted to a decoder at the same level as the encoder, and is used as an input of the decoder. The current encoder performs higher semantic feature extraction on the intermediate feature information generated by the backward encoder, and generates intermediate feature information of a corresponding scale, so that the intermediate feature information is trained in advance, and each intermediate feature information extracts and represents the outline feature of the article in the article picture.

Step S1220, performing multi-level decoding on the pre-trained image segmentation model, using the image feature information generated from the minimum-scale intermediate feature information as a basis, and using the intermediate feature information generated from the same level code thereof as reference information level by level, and correspondingly decoding higher-scale image feature information, where the image feature information is used to represent the contour features of the article in the article picture in a mask form:

in the decoding branch path of the image segmentation model, the intermediate characteristic information generated by the last-stage encoder of the encoding branch path is used as the basis, the intermediate characteristic information is converted into image characteristic information through 1 x 1 convolution kernel, the image characteristic information is divided into three paths for output, one path is used for full connection, the other path is used for scale reduction of a decoder at a higher stage before the image characteristic information is provided, and the other path is used for accessing a second classifier for classification so as to obtain the classification probability when the corresponding image characteristic information contains the target object.

The other decoders of the decoding branch path take two paths of data as input, one is image characteristic information provided by the decoder at the back, the other is intermediate characteristic information generated by the encoder at the same level as the decoder, the latter is used as context information of the former to provide reference, so that the current decoder can restore image characteristic information with higher scale than the decoder at the back, the method is divided into three paths of output in the same way, and the like until the last decoder generates image characteristic information with the same size as the specification original image of the article picture.

Since the image feature information is restored by means of the intermediate feature information, it can be understood that the image feature information also represents the contour feature in the article picture in the form of a mask.

Step S1230, the image segmentation model performs image segmentation on the specification original image of the article picture according to the mask image data formed by fusing all the image feature information, so as to extract an article segmentation map corresponding to an article from the article picture:

in an embodiment, the step S1230 may be implemented by the following specific steps:

step S1231, performing full connection on all image feature information by a full connection layer in the image segmentation model to generate mask image data by fusion, where the mask image data inherits the contour feature of the article in the image feature information:

inputting the image characteristic information of each decoder of the decoding branch path into a full connection layer of an image segmentation model, and fully connecting all the image characteristic information by the full connection layer to synthesize contour characteristics of different scales so as to realize the fusion of a plurality of image characteristic information, thereby obtaining fused mask image data (mask7), wherein the mask image data inherits the contour characteristics of the article represented in the image characteristic information. In conjunction with the foregoing description, it will be appreciated that the mask image data is essentially a mask picture (mask7) that represents the image area occupied by the item in the item picture by the back-blending of a black foreground with a white background, and that serves as an image mask, which may be used for what is colloquially referred to as "matting".

Step S1232, performing image extraction on the specification original image of the article picture according to the mask image data, and obtaining an image in the outline corresponding to the outline feature of the article to form the article segmentation map:

after the mask required for image segmentation is formed by the corresponding mask image data, the data of the mask can be used for extracting an image of an area occupied by an article from the specification original image of the article picture, namely an image corresponding to an area masked by a black background of the mask, and the image is used as an article segmentation picture. To this end, the following formula can be used:

Output_Img＝mask*Img_Org+(1-mask)*255

wherein, Output_ImgFor the article segmentation graph, mask refers to the mask picture (mask7 in FIG. 3) output after the image segmentation model is fully connected, and Img_OrgIs the specification original drawing of the article picture.

Thus, the object segmentation chart required by the image recognition model can be obtained. Indeed, in some alternative embodiments, the image recognition model may also perform pre-processing on the article picture by itself, such as scaling to a fixed size, image normalization, etc., so as to adapt to the input requirement of the image recognition model, as long as the input condition of the image recognition model is satisfied, so that the image recognition model can make a recognition probability that the article in the article picture belongs to the target article. In this case, the equivalent substitution of the related art means of the present application should be also considered.

In this embodiment, a detailed process of the image segmentation model for processing the article picture and various modified embodiments thereof are further shown, and it can be seen that the image segmentation model of the present application is suitable for adopting various variants based on the Unet, and is not limited to U²net, and can be popularized and compatible with other models with equivalent graph splitting capability.

Referring to fig. 5, in an alternative embodiment, a smoothing function pair is used to determine the fusion probability, and therefore, the step S1500 further includes the following steps:

step S1510, obtaining the average probability and the recognition probability:

the average probability has been generated in the process disclosed above, and the recognition probability has been output by the image recognition model, so it is only required to be directly called.

Wherein, for the average probability, the following formula can be applied for calculation:

Output_Cls＝(cls_1+cls_2+……+cls_n)/n

wherein, Output_ClsThe image feature information of each scale of the image segmentation model comprises the average probability of the corresponding classification probability when the contour feature of the target feature is contained, and cls _ n is the nth classification probability, wherein the number of n corresponds to the number of the image feature information. By adopting the mode, simple mean value calculation is realized, and the operation efficiency is higher.

Step S1520, fuse the average probability and the recognition probability to calculate a fused probability, wherein the average probability and the recognition probability are smoothed by the same associated hyper-parameter:

the fusion probability is calculated using the following formula:

Output_Prob＝W_Vit*Output_Vit+(1-W_Vit)*Output_Cls

wherein, Output_ProbTo fuse probabilities, Output_VitThe recognition probability, Output, of the target object contained in the segmentation map of the characteristic object Output by the image recognition model_ClsAverage probability, W, obtained after averaging the respective classification probabilities in the image segmentation model_VitFor the purpose of super-parameter, for achieving smoothing, the skilled person can flexibly adjust the value of the super-parameter according to actual conditions or experimental data.

The fusion probability is calculated by adopting the formula, so that the average probability and the recognition probability are organically unified, the classification information obtained based on the image segmentation model and the classification information obtained based on the image recognition model are mutually referred, and whether the object picture contains the target image corresponding to the target object can be more accurately judged.

Step S1530, comparing the fusion probability with a preset threshold, when the fusion probability is greater than the preset threshold, determining that the article picture includes the target article, otherwise, determining that the article picture does not include the target article: this is to be understood in connection with the exemplary embodiments of this application.

In this embodiment, a smoothing function is further applied to fuse the average probability and the recognition probability, so that a person skilled in the art can flexibly adjust the functions of two types of data, namely, image feature information with different scales and an article segmentation map, in the process of identifying a target article in an article picture according to the principle of the present application, thereby more flexibly utilizing the technical scheme of the present application to serve practical needs, and making the technical scheme of the present application have universality for serving the need of identifying the target article from the article picture.

Referring to fig. 6, a target object identification apparatus adapted to one of the objectives of the present application is a functional implementation of the target object identification method of the present application, and the apparatus includes: the system comprises a picture acquisition module 1100, an image segmentation module 1200, an outline classification module 1300, an article classification module 1400 and a fusion judgment module 1500, wherein the picture acquisition module 1100 is used for acquiring an article picture whether to be identified contains a target article; the image segmentation module 1200 is configured to invoke a pre-trained image segmentation model to perform multi-scale encoding and decoding on the article picture, obtain a plurality of image feature information capturing the article contour features of the article picture, and extract the article segmentation picture from the article picture according to the plurality of image feature information; the contour classification module 1300 is configured to classify the image feature information in the image segmentation model, obtain a classification probability that the image feature information includes the target item, and obtain an average probability of a plurality of classification probabilities obtained by classifying all the image feature information; the article classification module 1400 is configured to invoke a pre-trained image recognition model to perform image recognition on an article segmentation map, so as to obtain a recognition probability that the article segmentation map includes the target article; the fusion decision module 1500 is configured to fuse the average probability and the recognition probability to obtain a fusion probability, perform result judgment, and when the fusion probability is greater than a preset threshold, determine that the article picture contains a target article, otherwise, determine that the article picture does not contain the target article.

In a further embodiment, the image segmentation module 1200 comprises: the coding path unit is configured to perform multi-level coding on a pre-trained image segmentation model, gradually reduce the dimension of the specification original image of the article picture, and correspondingly output intermediate feature information corresponding to each dimension, wherein the intermediate feature information is used for representing the contour feature of an article in the article picture; the decoding path unit is configured to perform multi-stage decoding on the pre-trained image segmentation model, on the basis of image feature information generated by the intermediate feature information of the minimum scale, and on the basis of intermediate feature information generated by the intermediate feature information of the same scale, correspondingly decode image feature information of a higher scale by using the intermediate feature information generated by the same level coding as reference information, wherein the image feature information is used for representing the contour feature of an article in the article picture in a mask mode; and the fusion and segmentation unit is used for performing image segmentation on the specification original image of the commodity picture according to mask image data formed by fusing all image characteristic information by using an image segmentation model so as to extract a commodity segmentation picture corresponding to the commodity from the commodity picture.

In a further embodiment, the fusion decision module 1500 includes: the probability obtaining sub-block is used for obtaining the average probability and the recognition probability; a mean calculation sub-block for fusing the average probability and the recognition probability to calculate a fusion probability, wherein the average probability and the recognition probability are smoothed by the same associated hyper-parameter; and the result judgment sub-module is used for comparing the fusion probability with a preset threshold value, judging that the object picture contains the target object when the fusion probability is greater than the preset threshold value, and otherwise judging that the object picture does not contain the target object.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Fig. 7 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can make a processor realize a target item identification method when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, may cause the processor to perform the target item identification method of the present application. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In this embodiment, the processor is configured to execute specific functions of each module and its sub-module in fig. 6, and the memory stores program codes and various data required for executing the modules or sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in the present embodiment stores program codes and data necessary for executing all modules and sub-modules in the target object recognition device of the present application, and the server can call the program codes and data of the server to execute the functions of all sub-modules.

The present application also provides a storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the target item identification method of any of the embodiments of the present application.

The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In summary, in the present application, an image segmentation model is used to extract an article segmentation map from an article image to be identified whether to include a target article, the image identification model is used to identify the article segmentation map to obtain an identification probability that the article picture includes the target article, and a plurality of scales of image feature information extracted by the image segmentation model are used to classify the article picture to obtain an average probability that the article picture includes the target article, and then the final determination result of whether the article picture includes the target image corresponding to the target article is obtained by comparing the fusion probability of the average probability and the identification probability with a preset threshold, because a plurality of feature information sources are integrated on the basis of determining whether the article picture includes the target article, the determination of whether the article picture includes the target article is more accurate, and a more reliable target article identification determination result can be obtained, therefore, the manual marking cost is reduced, the technical scheme of the application is applied to the E-commerce platform scene to identify the commodity pictures, and the identification efficiency of the E-commerce platform for processing mass commodity pictures can be improved.

Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.

The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations should also be regarded as the protection scope of the present application.

Claims

1. A target object identification method is characterized by comprising the following steps:

2. The method for identifying the target object according to claim 1, wherein a pre-trained image segmentation model is called to perform multi-scale coding and decoding on the object picture, so as to obtain a plurality of image feature information for capturing the object contour features, and the object segmentation picture is extracted from the object picture according to the plurality of image feature information, comprising the following steps:

3. The method for identifying the target item according to claim 1, wherein the image segmentation model performs image segmentation on the specification original image of the item picture according to mask image data obtained by fusing all image feature information so as to extract an item segmentation map corresponding to the item from the item picture, and the method comprises the following steps:

4. The method for identifying the target item according to any one of claims 1 to 3, wherein the step of fusing the average probability and the identification probability to obtain a fused probability for result judgment comprises the following steps:

obtaining the average probability and the recognition probability;

5. The method for identifying the target item according to any one of claims 1 to 3, wherein the image feature information corresponding to a plurality of scales generated by the image segmentation model is output to a two-classifier corresponding to each scale for performing two-classification decision to obtain the classification probability.

6. The target item identification method of claim 5, wherein: the basic network architecture of the image segmentation model is U²net model, the target object is a knife or a sword.

7. A target item identification device, comprising:

the image acquisition module is used for acquiring an article image whether to be identified contains a target article;

the image segmentation module is used for calling a pre-trained image segmentation model to carry out multi-scale coding and decoding on the article picture, obtaining a plurality of image characteristic information capturing the article contour characteristics, and extracting an article segmentation picture from the article picture according to the plurality of image characteristic information;

the contour classification module is used for classifying the image characteristic information in the image segmentation model respectively to obtain the classification probability that the image characteristic information contains the target object, and solving the average probability of a plurality of classification probabilities obtained by classifying all the image characteristic information;

the article classification module is used for calling a pre-trained image recognition model to perform image recognition on an article segmentation graph to obtain the recognition probability that the article segmentation graph contains the target article;

and the fusion judgment module is used for fusing the average probability and the recognition probability to obtain a fusion probability for result judgment, judging that the object picture contains the target object when the fusion probability is greater than a preset threshold value, and otherwise, judging that the object picture does not contain the target object.

8. A computer device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 7, which, when invoked by a computer, performs the steps comprised by the corresponding method.

10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 7.