CN116977654A - Image processing method, device, computer equipment and storage medium
- Publication number
- CN116977654A (application number CN202310256398.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- category
- property
- feature map
- Prior art date
- Legal status
- Pending
Classifications (all under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V10/00—Arrangements for image or video recognition or understanding)
- G06V10/40—Extraction of image or video features
- G06V10/764—Using pattern recognition or machine learning: classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
Abstract
The embodiment of the application provides an image processing method, an image processing device, computer equipment and a storage medium, which can be applied to fields such as artificial intelligence and deep learning. The method comprises the following steps: encoding the target image to obtain a first feature map of the target image, and extracting features from the first feature map to obtain a second feature map; performing attention learning on the first feature map using the second feature map to obtain an image characterization vector of the target image; performing attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and performing property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image; and determining the target category of the target image according to the property category of the target image and the attribute category of the target image. By adopting the embodiment of the application, image recognition can be completed with the assistance of the intellectual property information of the image, so that the accuracy of image recognition can be improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing device, a computer device, and a storage medium.
Background
Animation, Comic, and Game (ACG) recognition refers to methods that distinguish specific categories of pictures. ACG works are often produced in batches by one author or around one theme (i.e., an intellectual property (Intellectual Property, IP) series); such series are updated in real time, and the categories in the updated works are rich and diverse. Existing ACG recognition models mainly use deep learning classification methods to assist manual secondary auditing. However, these methods only consider the classification problem and neglect the fact that pictures are updated along an IP series; moreover, existing classification methods often perform transfer learning directly, ignoring the task characteristics unique to IP data, so deep learning classification methods cannot classify IP-series ACG data well. Existing methods judge IP-series pictures mainly by retrieval: after the features of a picture are obtained, whether pictures belong to the same IP is judged by matching local or global operators. However, this retrieval method requires the picture itself to carry tag information and cannot distinguish normal categories well. Therefore, how to improve the accuracy of IP-series image recognition has become a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, computer equipment and a storage medium, which can utilize intellectual property information of an image to assist in completing image identification, thereby improving the accuracy of image identification.
In a first aspect, an embodiment of the present application provides an image processing method, including:
encoding the target image to obtain a first feature map of the target image, and extracting features of the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image;
performing attention learning on the first feature map using the second feature map to obtain an image characterization vector of the target image;
performing attribute type prediction on the target image based on the image characterization vector to obtain an attribute type of the target image, and performing property type prediction on the intellectual property of the target image based on the second feature map to obtain a property type of the target image;
and determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
The processing unit is used for encoding the target image to obtain a first feature map of the target image, and performing feature extraction on the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image;
the processing unit is also used for performing attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image;
the processing unit is also used for carrying out attribute category prediction on the target image based on the image characterization vector to obtain the attribute category of the target image, and carrying out property category prediction on the intellectual property of the target image based on the second feature map to obtain the property category of the target image;
and the determining unit is used for determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor, a communication interface, and a memory, where the processor, the communication interface, and the memory are connected to each other, where the memory stores a computer program, and the processor is configured to invoke the computer program to execute the image processing method provided by the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program that when executed by a processor implements the image processing method provided by the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the terminal reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the terminal executes the image processing method provided by the embodiment of the application.
In the embodiment of the application, the property feature characterizing the intellectual property of the image is superimposed as attention into the image features of the image, so that recognition of the intellectual property information of the image can be enhanced, and the intellectual property information of the target image can be used to assist in completing image recognition, thereby improving the accuracy of image recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 2a is a schematic diagram of an encoder according to an embodiment of the present application;
fig. 2b is a schematic structural diagram of a residual block according to an embodiment of the present application;
FIG. 3 is a flow chart of a training method for a class identification model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an attention mechanism according to an embodiment of the present application;
- FIG. 5 is a schematic structural diagram of a class processing model provided by an embodiment of the present application;
fig. 6 is a schematic diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to facilitate understanding of the embodiments of the present application, the image processing method of the present application is described below.
In order to improve the accuracy of image recognition, the embodiment of the application provides an image processing scheme. The general implementation procedure of the image processing scheme provided in the embodiment of the present application is as follows: the computer device encodes the target image to obtain a first feature map of the target image, and performs feature extraction on the first feature map to obtain a second feature map, where the first feature map is used for characterizing image features of the target image and the second feature map is used for characterizing property features of the intellectual property of the target image; performs attention learning on the first feature map using the second feature map to obtain an image characterization vector of the target image; performs attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and performs property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image; and determines the target category of the target image according to the property category of the target image and the attribute category of the target image.
Practice shows that the image processing scheme provided by the embodiment of the application has the following beneficial effects: (1) the learned property features of an image are superimposed into its image features, and the intellectual property information of the image is used to assist in completing image recognition, so the accuracy of image recognition can be improved; (2) the computer device can recognize most images to be predicted using the scheme provided by the embodiment of the application and return content-audit results based on the recognition output, so the scheme has a wide application range, can serve all Internet enterprises, and assists in auditing picture content; (3) compared with auditing content manually alone, the scheme greatly reduces the workload of manual review, improves efficiency, offers higher stability, and can help operators obtain accurate results more quickly.
It should be noted that: in a specific implementation, the above scheme may be performed by a computer device, which may be a terminal or a server; among them, the terminals mentioned herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, smart watches, smart televisions, smart car terminals, and the like; a wide variety of clients (APP) may be running within the terminal, such as a video play client, a social client, a browser client, a streaming client, an educational client, and so forth. The server mentioned herein may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), basic cloud computing services such as big data and artificial intelligence platforms, and so on. In addition, the computer device in the embodiment of the present application may be located outside the blockchain network or inside the blockchain network, which is not limited thereto; a blockchain network is a network composed of a point-to-point network (P2P network) and a blockchain, and a blockchain refers to a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm, etc., which is essentially a decentralised database, and is a string of data blocks (or referred to as blocks) generated by association using a cryptographic method.
The image processing method provided by the embodiment of the application can be implemented based on artificial intelligence (Artificial Intelligence, AI) technology. Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason and make decisions. Artificial intelligence is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic AI technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronic integration. Artificial intelligence software technology mainly covers computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
The image processing method provided by the embodiment of the application mainly relates to the deep learning technology within AI. Deep learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how to reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout all areas of artificial intelligence. Deep learning and machine learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The image processing method provided by the embodiment of the application can also be implemented based on an artificial intelligence cloud service, commonly referred to as AI as a Service (AIaaS). This is currently the mainstream service mode for artificial intelligence platforms; specifically, an AIaaS platform splits several common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI theme mall: all developers can access one or more artificial intelligence services provided by the platform through an API interface, and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own proprietary cloud artificial intelligence services.
In order to facilitate understanding of the embodiments of the present application, a detailed description will be given below of a specific implementation of the image processing scheme provided in the present application.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the application, where the image processing method may be performed by a computer device. As shown in fig. 1, the image processing method may include, but is not limited to, the following steps:
s101, performing coding processing on a target image to obtain a first feature map of the target image, and performing feature extraction on the first feature map to obtain a second feature map.
The first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image.
In an alternative embodiment, the computer device encodes the target image to obtain a first feature map of the target image, which may include: and coding the target image by using the convolutional neural network to obtain a first feature map of the target image.
In an alternative embodiment, the computer device performs feature extraction on the first feature map to obtain a second feature map, which may include: extracting the features of the first feature map by using a subnetwork formed by residual blocks (ResBlocks) to obtain the second feature map. Optionally, the subnetwork may include, but is not limited to, 6 ResBlocks.
Alternatively, the convolutional neural network and the residual blocks may constitute an encoder. The process by which the computer device obtains the first and second feature maps is described below using the encoder shown in FIG. 2a as an example. As shown in FIG. 2a, the encoder 301 consists of 3 convolutional layers and 6 ResBlocks. Assume that the size of the target image is W×H×C, where W denotes the width of the target image, H denotes the height, and C denotes the number of channels (C=3). After the 3 convolutional layers, a first feature map of size (W/4)×(H/4)×128 is obtained. After the first feature map passes through the subnetwork composed of 6 ResBlocks, a new feature map, namely the second feature map, is obtained, whose size is also (W/4)×(H/4)×128. Optionally, the structure of a ResBlock is shown in FIG. 2b: each ResBlock consists of two convolutional layers (weight layers). The ResBlock also includes a residual branch for conveying lower-level information, which enables the network to learn deeper features.
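As a concrete illustration, the following PyTorch sketch reproduces the shapes described above. It is a hypothetical reconstruction: the kernel sizes, strides, and intermediate channel widths are assumptions chosen only to satisfy the stated W×H×3 to (W/4)×(H/4)×128 mapping, and the ResBlock follows the two-weight-layer-plus-residual-branch structure of FIG. 2b.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two conv (weight) layers plus a residual branch, per FIG. 2b."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # The residual branch carries lower-level information forward.
        return torch.relu(self.body(x) + x)

class Encoder(nn.Module):
    """W x H x 3 -> (W/4) x (H/4) x 128 first feature map; 6 ResBlocks then
    produce the second feature map of the same size."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(  # the 3 convolutional layers
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(128) for _ in range(6)])

    def forward(self, x):
        first = self.conv(x)       # first feature map: image features
        second = self.res(first)   # second feature map: property (IP) features
        return first, second

# e.g. x = torch.randn(1, 3, 224, 224) -> both maps of shape (1, 128, 56, 56)
```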
S102, performing attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image.
From the foregoing, the first feature map is used to characterize the image features of the target image, and the second feature map is used to characterize the intellectual property features of the target image. Therefore, step S102 superimposes the property feature characterizing the intellectual property of the target image as attention onto the image features of the target image, so that the resulting image characterization vector of the target image contains the property feature of the target image.
S103, carrying out attribute type prediction on the target image based on the image characterization vector to obtain the attribute type of the target image, and carrying out property type prediction on the intellectual property of the target image based on the second feature map to obtain the property type of the target image.
S104, determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
In an alternative embodiment, the determining, by the computer device, the target category of the target image according to the property category of the target image and the attribute category of the target image may include: dictionary learning is carried out on the second feature map, and a target dictionary matrix is obtained; and determining the target category of the target image according to the target dictionary matrix, the property category of the target image and the attribute category of the target image.
In this embodiment, the determining, by the computer device, the target category of the target image according to the target dictionary matrix, the property category of the target image, and the attribute category of the target image may include: determining the distance between the second feature map of the target image and each property feature in the target dictionary matrix; if there is a property feature whose distance from the second feature map is smaller than or equal to a preset value, determining the property category of the target image as a known property category; and taking the attribute category of the target image as the target category of the target image.
Optionally, if there is no property feature whose distance from the second feature map is smaller than or equal to the preset value, determining that the property category of the target image is an unknown property category; and generating prompt information, where the prompt information includes the attribute category of the target image and is used for prompting that the attribute category can assist in determining the target category of the target image.
For example, assume that the preset value is d, the property category of image a is category 2, and the attribute category of image a is category A. The computer device may determine the distance between the second feature map of image a and each property feature in the target dictionary matrix; if there is a property feature whose distance from the second feature map is smaller than or equal to d, the property category of image a is determined to be a known category, and the target category of image a may be determined to be category A. If no property feature is within distance d of the second feature map, the property category of image a is determined to be an unknown category; prompt information is then generated, which includes the attribute category of image a being category A and which prompts that category A can assist in determining the target category of image a. That is, category A can only serve as the reference category of image a.
Optionally, after determining that the property category of the target image is an unknown property category, the computer device may further add the property category of the target image to the property category library and update the target dictionary matrix using the dictionary learning method. In this way, dynamically added IP data can be accommodated.
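A minimal sketch of this known/unknown decision, assuming the second feature map has been pooled into a vector and using Euclidean distance as the (unspecified) metric, is given below; all names are hypothetical.

```python
import numpy as np

def decide_target_category(second_feat: np.ndarray,   # pooled property feature, shape (m,)
                           dictionary: np.ndarray,    # target dictionary matrix, shape (m, K)
                           attribute_category: str,
                           preset_value: float):
    """Return the target category if the IP is known, else flag a new IP."""
    # Distance from the image's property feature to every dictionary atom.
    dists = np.linalg.norm(dictionary - second_feat[:, None], axis=0)
    if dists.min() <= preset_value:
        # Known property category: the attribute category is the target category.
        return attribute_category
    # Unknown property category: the attribute category is only a reference;
    # the new IP would be added to the library and the dictionary re-learned.
    return None
```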
In the embodiment of the application, the computer device encodes the target image to obtain a first feature map of the target image and performs feature extraction on the first feature map to obtain a second feature map, where the first feature map is used for characterizing image features of the target image and the second feature map is used for characterizing property features of the intellectual property of the target image; performs attention learning on the first feature map using the second feature map to obtain an image characterization vector of the target image; performs attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and performs property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image; and determines the target category of the target image according to the property category of the target image and the attribute category of the target image. Therefore, by adopting the embodiment of the application, superimposing the property feature characterizing the intellectual property of the image as attention into the image features of the image can enhance recognition of the intellectual property information of the image, so the intellectual property information of the target image can be used to assist in completing image recognition, improving the accuracy of image recognition.
In an alternative embodiment, in the image processing method shown in fig. 1, the target class of the target image may be determined by calling a class identification model. The training method of the class identification model is explained below with reference to fig. 3. Referring to fig. 3, fig. 3 is a flowchart of a training method of a class identification model according to an embodiment of the present application, where the training method of the class identification model may be executed by a computer device. As shown in fig. 3, the training method of the class identification model may include, but is not limited to, the following steps:
s301, acquiring a training image, an attribute type label of the training image and a property type label of intellectual property of the training image.
S302, calling an initial category recognition model, acquiring a first feature map of the training image, and extracting features of the first feature map of the training image to acquire a second feature map of the training image.
The first feature map of the training image is used for representing image features of the training image, and the second feature map of the training image is used for representing property features of intellectual property of the training image.
In an alternative embodiment, the initial class identification model includes a first initial classification network (alternatively referred to as an initial backbone network) for determining a predicted attribute class of the training image and a second initial classification network (alternatively referred to as an initial IP network) for determining a property class of the intellectual property of the training image.
Alternatively, the computer device may acquire the first feature map of the training image using an encoder in the first initial classification network.
S303, performing attention learning on the first feature map of the training image by using the second feature map of the training image to obtain an image characterization vector of the training image.
In an alternative embodiment, the computer device may employ an attention structure as shown in FIG. 4 and use the second feature map of the training image to perform attention learning on the first feature map of the training image. Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a soft attention mechanism according to an embodiment of the present application. The soft attention mechanism computes a weighted average of N pieces of input information and feeds it to the neural network, instead of selecting only 1 piece out of the N, with more important information receiving larger weights. As shown in FIG. 4, X_1, X_2, …, X_N denote the feature values of the feature layers; q denotes a query vector, a task-related representation vector used as the reference for feature selection; s denotes an attention scoring function that computes the correlation between an input feature and the query vector to obtain the probability distribution over features being selected (also known as the attention distribution); α_1, α_2, …, α_N denote the weight values of the specific categories, obtained by passing the outputs of the attention scoring function through a softmax function; and a denotes the task-related feature information filtered out as the weighted average of the input features under the attention distribution.
Alternatively, the attention scoring function may be an additive model, a dot product model, a scaled dot product model, a bilinear model, or the like, without limitation.
In the application, with the soft attention mechanism shown in FIG. 4, each feature layer can be associated with each piece of category information, a soft information selection mechanism weights the results obtained from the query, and the picture tag information is carried in at the same time; that is, the input information is summarized as a weighted average under the guidance of the tag information. Optionally, the soft attention is calculated as in formula (1):

$$a = \sum_{i=1}^{N} \alpha_i X_i, \qquad \alpha_i = \operatorname{softmax}\big(s(X_i, q)\big) \tag{1}$$

In formula (1), X_i denotes the feature value of each feature layer, i ∈ [1, N], where i is a positive integer; α_i denotes the weight value of a specific category; L_j denotes the IP tag information, an M-dimensional vector (M being the number of IPs); and q denotes the query vector.
S304, carrying out attribute type prediction on the training image based on the image characterization vector of the training image to obtain a predicted attribute type of the training image, and carrying out property type prediction on the intellectual property of the training image based on the second feature map to obtain a predicted property type of the training image.
In an alternative embodiment, the computer device may determine the predicted attribute category of the training image based on the first initial classification network, and may determine the predicted property category of the intellectual property of the training image based on the second initial classification network.
S305, training the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain a category recognition model.
In an alternative embodiment, the class identification model includes a first target classification network (or referred to as a target backbone network) for determining the attribute category of the target image and a second target classification network (or referred to as a target IP network) for determining the property category of the target image. The computer device trains the initial class recognition model in the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image, and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image, to obtain the class identification model. This may include: training the first initial classification network in the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image to obtain the first target classification network, where the first initial classification network is used for determining the predicted attribute category of the training image; and training the second initial classification network in the direction of reducing the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain the second target classification network, where the second initial classification network is used for determining the property category of the intellectual property of the training image.
Alternatively, the difference between the predicted attribute category of the training image and the attribute category label of the training image may be measured by a loss function (e.g., denoted as L_gcls); the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image may be measured by a loss function (e.g., denoted as L_ipcls). Optionally, the loss functions may include, but are not limited to, the cross-entropy loss function.
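Since both differences can be measured with cross-entropy, one training step can be sketched as follows. The model interface, the name training_step, and the ip_weight balancing term are assumptions made for illustration; the embodiment trains each network on its own loss.

```python
import torch.nn.functional as F

def training_step(model, images, attr_labels, ip_labels, ip_weight=1.0):
    """Hedged sketch of the joint objective: the backbone's attribute loss
    L_gcls plus the IP network's property loss L_ipcls, both cross-entropy.
    `model` is assumed to return (attribute logits, IP logits)."""
    attr_logits, ip_logits = model(images)
    l_gcls = F.cross_entropy(attr_logits, attr_labels)    # attribute category loss
    l_ipcls = F.cross_entropy(ip_logits, ip_labels)       # property (IP) category loss
    return l_gcls + ip_weight * l_ipcls
```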
Optionally, the computer device may further distinguish the property features by introducing dictionary learning, so as to avoid the low accuracy in recognizing the property category of an image that is caused by high similarity between the IPs of images, that is, by small inter-class spacing.
In an alternative embodiment, the computer device may input the second feature map of the training image, the initial dictionary matrix, and the sparse matrix into the initial dictionary learning model, update the initial dictionary matrix to obtain an updated dictionary matrix, where the initial dictionary matrix is determined based on the second feature map of the training image; and training the initial dictionary learning model according to the direction of reducing the difference between the first feature map of the training image and the updated dictionary matrix to obtain the target dictionary learning model. Optionally, the computer device may randomly select K column vectors from the feature matrix corresponding to the second feature map of the training image as atoms of the initial dictionary matrix, to obtain the initial dictionary matrix.
The specific procedure for dictionary learning is as follows:

Step 1, initialization. The computer device randomly takes K column vectors {d_1, d_2, …, d_K} from the feature matrix Y corresponding to the second feature map of the training image as the atoms of the initial dictionary, obtaining the initial dictionary matrix D^(0) ∈ R^(m×K); the dimension of Y is (W/4) × (H/4) × 128. Let j = 0, and repeat steps 2 and 3 until a specified number of iteration steps is reached or the error converges below a specified value.

Step 2, sparse coding. Based on the dictionary matrix D^(j) obtained in the previous step, perform sparse coding to obtain X^(j) ∈ R^(K×n).

Step 3, dictionary update. Update the dictionary D^(j) column by column, i.e., each dictionary column d_k ∈ {d_1, d_2, …, d_K}.

The dictionary update proceeds as steps 3.1 to 3.5 below.

Step 3.1, when updating d_k, calculate the error matrix E_k as in formula (2):

$$E_k = Y - \sum_{j \neq k} d_j x_T^j \tag{2}$$

In formula (2), Y denotes the feature matrix corresponding to the second feature map of the image; d_j denotes the j-th (j ≠ k) column vector among the K column vectors; and x_T^j denotes the j-th row vector of the sparse coding matrix.

Step 3.2, take the k-th row vector x_T^k of the sparse matrix and the set ω_k of indexes at which it is non-zero.

For example, assume that the sparse matrix has K rows and 5 columns and that only the 1st, 3rd, and 5th elements of the 2nd row vector are non-zero; then the index set is ω_2 = {1, 3, 5}, and restricting the 2nd row vector to these indexes yields a shortened vector x′_T^2.

Step 3.3, take from E_k the columns corresponding to the indexes in ω_k, giving E′_k. That is, E′_k retains only the columns of E_k at which x_T^k is non-zero.

Step 3.4, perform singular value decomposition on E′_k: E′_k = UΣV^T. Take the 1st column of U to update the k-th column of the dictionary, i.e., d_k = U(·, 1); let x′_T^k = Σ(1, 1) · V(·, 1)^T; after obtaining x′_T^k, update the corresponding entries of x_T^k with it.

Step 3.5, let j = j + 1.

While training the dictionary learning model, the error matrix E_k corresponds to the dictionary learning loss L_dloss. By continuously reducing this error loss, the computer device makes the dictionary matrix represent the most essential features of each IP. At test time, when a target image arrives at the dictionary, the computer device can measure the gap between the second feature map of the target image and the target dictionary matrix; if the gap is too large, the target image is a brand-new image and represents a new IP category.
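The column-by-column update in steps 3.1 to 3.5 matches a K-SVD-style procedure, which can be sketched as follows. The sketch assumes the sparse coding of step 2 (e.g., by orthogonal matching pursuit) has already produced X, and the 1-based indexes of the text become 0-based in code.

```python
import numpy as np

def dictionary_update(Y: np.ndarray, D: np.ndarray, X: np.ndarray):
    """One pass of the column-by-column dictionary update (steps 3.1 to 3.5).
    Y: (m, n) feature matrix; D: (m, K) dictionary; X: (K, n) sparse codes."""
    K = D.shape[1]
    for k in range(K):
        omega = np.nonzero(X[k])[0]          # step 3.2: non-zero index set
        if omega.size == 0:
            continue
        # steps 3.1 and 3.3: error without atom k, restricted to columns omega
        E = Y - D @ X + np.outer(D[:, k], X[k])
        E_k = E[:, omega]
        # step 3.4: rank-1 approximation via SVD
        U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                    # d_k = U(., 1)
        X[k, omega] = S[0] * Vt[0]           # x'_T^k = Sigma(1,1) * V(., 1)^T
    return D, X
```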
Therefore, by adopting the embodiment of the application, on one hand, the class identification model can classify the data of the same IP as much as possible by introducing the label identification capability of the IP. On the other hand, the property features learned by the IP network are used as attention to be superimposed into the backbone network, so that the learning capability of the backbone network for specific tasks of the IP can be enhanced. In addition, because the data of the IP related tasks are less, the embodiment of the application can enable the model to learn the most essential characteristic representing the IP in the image by introducing dictionary learning, and the whole performance of the model can be gradually updated by gradually updating the dictionary matrix in the dictionary learning.
In addition, the class identification model provided by the embodiment of the application can be applied to the IP related image identification task, and as the IP related image can be dynamically expanded along with the time, the class identification model provided by the embodiment of the application can ensure that the image processing service has stronger expansibility.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a class processing model according to an embodiment of the present application. The class processing model shown in FIG. 5 includes a backbone network 501 (corresponding to the first target classification network) and an IP network 502 (corresponding to the second target classification network), where the IP network 502 includes a dictionary learning module (corresponding to the target dictionary learning model).
In an alternative embodiment, the computer device may invoke the class processing model shown in FIG. 5 to carry out the image processing method shown in FIG. 1. The computer device may input the target image into the backbone network 501 and the IP network 502 at the same time, encode the target image using an encoder in the backbone network 501 to obtain a first feature map 503 of the target image, and perform feature extraction on the first feature map 503 to obtain a second feature map 504, where the first feature map 503 is used to characterize the image features of the target image and the second feature map 504 is used to characterize the intellectual property features of the target image; superimpose the second feature map 504 as attention into the first feature map 503 to obtain the image characterization vector of the target image; input the image characterization vector of the target image into the classifier of the backbone network 501 to obtain the attribute category of the target image, and input the second feature map 504 of the target image into the classifier of the IP network to obtain the property category of the intellectual property of the target image; and search the existing IP information using the dictionary in the IP network. If there is a property feature whose distance from the second feature map 504 is smaller than or equal to the preset value, the property category of the target image is determined to be a known property category, and the attribute category of the target image can be used as the target category of the target image; that is, the target category of the target image can directly adopt the category output by the backbone network 501. If there is no property feature whose distance from the second feature map 504 is smaller than or equal to the preset value, that is, the distances between all property features and the second feature map 504 are greater than the preset value, the property category of the target image is determined to be an unknown property category; in this case, the property category corresponding to the target image may be newly added to the IP library alone and the dictionary learning library (i.e., the target dictionary matrix) may be updated, with the category output by the backbone network 501 serving only as a reference. A sketch of this end-to-end flow is given below.
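Tying the pieces together, a minimal sketch of the FIG. 5 flow might look as follows, reusing the hypothetical encoder and soft_attention sketched earlier. Mean-pooling the second feature map into the query/property vector and the two linear classifier heads are assumptions, not the patented implementation.

```python
import torch

@torch.no_grad()
def classify(image, encoder, backbone_head, ip_head, dictionary, threshold):
    """Hedged sketch of the FIG. 5 flow: backbone + IP network + dictionary check."""
    first, second = encoder(image)                 # (1, 128, h, w) feature maps
    pooled = second.mean(dim=(2, 3)).squeeze(0)    # assumed pooling of IP features
    # The second feature map acts as attention over the first feature map.
    token = soft_attention(first.flatten(2).squeeze(0).T, pooled)
    attr_category = backbone_head(token).argmax().item()  # attribute category
    ip_category = ip_head(pooled).argmax().item()         # property (IP) category
    # Fine-grained dictionary check decides known vs. brand-new IP.
    dists = torch.linalg.vector_norm(dictionary - pooled[:, None], dim=0)
    known = bool((dists <= threshold).any())
    # Known IP: the attribute category is the target; otherwise only a reference.
    return (attr_category if known else None), ip_category, known
```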
Therefore, in the embodiment of the application, on one hand, superimposing the features learned by the IP network into the backbone network as attention can enhance the backbone network's recognition of IP-specific tasks, thereby improving the accuracy of image recognition. On the other hand, the property category of the target image is coarsely classified by the IP network and then finely distinguished through dictionary learning; the IP is settled through this coarse-then-fine procedure: if the distance between the target image and some IP in the dictionary is smaller than or equal to the preset value, the target image is merged into the existing IP; otherwise, the IP category corresponding to the target image is newly added alone and a new category cluster is expanded in the dictionary learning. This solves the problem of dynamic expansion of IP data and, at the same time, the problem of scarce IP data. In addition, by introducing the tag identification capability of the IP, data of the same IP can be grouped together as much as possible.
It should be noted that, when the embodiment of the present application is applied to a specific product or technology, the target image, the training image, etc. related to the embodiment of the present application are acquired after the permission or consent of the user is obtained; and the collection, use and processing of target images, training images, etc. requires compliance with relevant laws and regulations and standards of the relevant countries and regions.
Based on the above description of the related embodiments of the image processing method, the embodiments of the present application also provide an image processing apparatus, which may be a computer program (including program code) running in a computer device. Referring to fig. 6, fig. 6 is a schematic diagram of an image processing apparatus according to an embodiment of the present application, where the image processing apparatus may perform the image processing method shown in fig. 1, and the image processing apparatus may include:
the processing unit 601 is configured to perform encoding processing on the target image to obtain a first feature map of the target image, and perform feature extraction on the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image;
the processing unit 601 is further configured to perform attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image;
the processing unit 601 is further configured to perform attribute category prediction on the target image based on the image characterization vector to obtain the attribute category of the target image, and perform property category prediction on the intellectual property of the target image based on the second feature map to obtain the property category of the target image;
A determining unit 602, configured to determine a target category of the target image according to the property category of the target image and the attribute category of the target image.
In an alternative embodiment, the determining unit 602 is specifically configured to, when determining the target category of the target image according to the property category of the target image and the attribute category of the target image:
dictionary learning is carried out on the second feature map, and a target dictionary matrix is obtained;
and determining the target category of the target image according to the target dictionary matrix, the property category of the target image and the attribute category of the target image.
In an alternative embodiment, the determining unit 602 is specifically configured to, when determining the target category of the target image according to the target dictionary matrix, the property category of the target image, and the attribute category of the target image:
determining a distance between the second feature map of the target image and each property feature in the target dictionary matrix;
if there is a property feature whose distance from the second feature map is smaller than or equal to a preset value, determining the property category of the target image as a known property category;
and taking the attribute category of the target image as the target category of the target image.
In an alternative embodiment, the determining unit 602 is further configured to:
if there is no property feature whose distance from the second feature map is smaller than or equal to the preset value, determining that the property category of the target image is an unknown property category;
and generating prompt information, wherein the prompt information comprises attribute categories of the target image, and the prompt information is used for prompting the attribute categories to assist in determining the target categories of the target image.
In an alternative embodiment, the image processing apparatus further comprises a training unit 603.
In an alternative embodiment, the target class of the target image is determined by invoking a class identification model; training unit 603 for:
acquiring a training image, an attribute type label of the training image and a property type label of intellectual property of the training image;
invoking an initial category recognition model to obtain a first feature map of the training image, and extracting features of the first feature map of the training image to obtain a second feature map of the training image; the first feature map of the training image is used for representing image features of the training image, and the second feature map of the training image is used for representing property features of intellectual property of the training image;
performing attention learning on the first feature map of the training image by using the second feature map of the training image to obtain an image characterization vector of the training image;
Performing attribute type prediction on the training image based on the image characterization vector of the training image to obtain a predicted attribute type of the training image, and performing property type prediction on the intellectual property of the training image based on the second feature map to obtain a predicted property type of the training image;
and training the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain a category recognition model.
In an alternative embodiment, the category recognition model includes a first target classification network for determining the attribute category of the target image and a second target classification network for determining the property category of the target image;
the training unit 603 is configured to train the initial class identification model according to a direction of reducing a difference between a predicted attribute class of the training image and an attribute class label of the training image, and a difference between a predicted property class of the training image and a property class label of an intellectual property of the training image, so as to obtain a class identification model, and is specifically configured to:
Training the first initial classification network according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image to obtain a first target classification network; the first initial classification network is used for determining the predicted attribute category of the training image;
training the second initial classification network according to the direction of reducing the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain a second target classification network; the second initial classification network is used to determine the property category of the intellectual property of the training image.
In an alternative embodiment, the target dictionary matrix is determined based on a target dictionary learning model; the training unit 603 is further configured to:
inputting the second feature map, the initial dictionary matrix and the sparse matrix of the training image into an initial dictionary learning model, and updating the initial dictionary matrix to obtain an updated dictionary matrix; the initial dictionary matrix is determined based on the second feature map of the training image;
and training the initial dictionary learning model according to the direction of reducing the difference between the first feature map of the training image and the updated dictionary matrix to obtain the target dictionary learning model.
According to the embodiments provided by the present application, the units of the image processing apparatus shown in fig. 6 may be individually or wholly combined into one or more other units, or one (or more) of them may be further split into multiple functionally smaller units; either way, the same operations can be realized without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by a single unit. In other embodiments of the present application, the image processing apparatus may likewise include other units, and in practical applications these functions may be realized with the assistance of, or through the cooperation of, multiple other units.
According to an embodiment provided by the present application, the image processing apparatus shown in fig. 6 may be constructed, and the image processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 1 on a general-purpose computer device that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded into the above computer device via the computer-readable storage medium, and executed therein.
It can be understood that, for the specific implementation of each unit in the image processing apparatus provided in the embodiments of the present application and the beneficial effects it can achieve, reference may be made to the description of the foregoing image processing method embodiments; details are not repeated here.
Based on the description of the method embodiments and the apparatus embodiments, an embodiment of the present application further provides a computer device. Referring to fig. 7, the computer device includes at least a processor 701, a memory 702, and a communication interface 703. The processor 701, the memory 702, and the communication interface 703 may be connected by a bus 704 or in other ways; the embodiments of the present application take connection by the bus 704 as an example.
Among them, the processor 701 (or central processing unit (Central Processing Unit, CPU)) is the computing core and control core of the computer device, which can parse various instructions within the computer device and process various data of the computer device. For example, the CPU can parse a power-on/off instruction sent by a user to the computer device and control the computer device to perform power-on/off operations; for another example, the CPU can transfer various types of interaction data between internal structures of the computer device, and so on. The communication interface 703 may optionally include a standard wired interface or a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.), and is controlled by the processor 701 to transmit and receive data. The memory 702 (Memory) is a memory device in the computer device, used for storing computer programs and data. It can be understood that the memory 702 here may include both the built-in memory of the computer device and the extended memory supported by the computer device. The memory 702 provides storage space that stores the operating system of the computer device, which may include, but is not limited to: a Windows system, a Linux system, an Android system, an iOS system, etc.; the present application is not limited in this regard. In an alternative implementation, the processor 701 of the embodiments of the present application may perform the following operations by executing the computer program stored in the memory 702:
Encoding the target image to obtain a first feature map of the target image, and extracting features of the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image;
performing attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image;
performing attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and performing property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image;
and determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
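At inference time, the four operations above reduce to one forward pass followed by a category-resolution step. A brief sketch, reusing the hypothetical model from the training example earlier:

```python
model.eval()
target_image = torch.randn(3, 64, 64)  # placeholder target image
with torch.no_grad():
    attribute_logits, property_logits = model(target_image.unsqueeze(0))
attribute_category = attribute_logits.argmax(dim=1).item()
property_category = property_logits.argmax(dim=1).item()
# The target category is then resolved from these two predictions,
# e.g. via the dictionary-based check sketched after the next passage.
```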
In an alternative embodiment, the processor 701, when executing the determining the target category of the target image according to the property category of the target image and the attribute category of the target image, specifically executes:
performing dictionary learning on the second feature map to obtain a target dictionary matrix;
and determining the target category of the target image according to the target dictionary matrix, the property category of the target image and the attribute category of the target image.
In an alternative embodiment, the processor 701, when executing the determining the target category of the target image according to the target dictionary matrix, the property category of the target image, and the attribute category of the target image, specifically executes:
determining a distance between the second feature map of the target image and each property feature in the target dictionary matrix;
if there is a property feature whose distance from the second feature map is smaller than or equal to a preset value, determining the property category of the target image as a known property category;
and taking the attribute category of the target image as the target category of the target image.
In an alternative embodiment, processor 701 further performs:
if there is no property feature whose distance from the second feature map is smaller than or equal to the preset value, determining that the property category of the target image is an unknown property category;
and generating prompt information, wherein the prompt information includes the attribute category of the target image and is used for presenting the attribute category to assist in determining the target category of the target image.
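Taken together, the two branches above amount to an open-set check against the target dictionary matrix. The following sketch is one hedged reading of that check; the Euclidean distance, the dictionary stored as one property feature per row, and the function name `resolve_target_category` are all assumptions introduced here.

```python
def resolve_target_category(second_map_vec, target_dictionary, attribute_category, preset_value):
    """Compare the image's property feature vector (D,) against each
    property feature (row) of the target dictionary matrix (K x D)."""
    # Euclidean distance from the feature vector to every dictionary atom.
    distances = torch.cdist(second_map_vec.unsqueeze(0), target_dictionary).squeeze(0)
    if (distances <= preset_value).any():
        # Known property category: the attribute category becomes the target category.
        return {"target_category": attribute_category}
    # Unknown property category: generate prompt information carrying the
    # attribute category to assist in determining the target category.
    return {"prompt": {"attribute_category": attribute_category,
                       "property_category": "unknown"}}
```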
In an alternative embodiment, the target class of the target image is determined by invoking a class identification model; the processor 701 also performs:
acquiring a training image, an attribute type label of the training image and a property type label of intellectual property of the training image;
invoking an initial category recognition model to obtain a first feature map of the training image, and extracting features of the first feature map of the training image to obtain a second feature map of the training image; the first feature map of the training image is used for representing image features of the training image, and the second feature map of the training image is used for representing property features of the intellectual property of the training image;
performing attention learning on the first feature map of the training image by using the second feature map of the training image to obtain an image characterization vector of the training image;
performing attribute category prediction on the training image based on the image characterization vector of the training image to obtain a predicted attribute category of the training image, and performing property category prediction on the intellectual property of the training image based on the second feature map to obtain a predicted property category of the training image;
and training the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain a category recognition model.
In an alternative embodiment, the category recognition model includes a first target classification network for determining an attribute category of the target image and a second target classification network for determining a property category of the target image;
the processor 701, when executing the training of the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain the category recognition model, specifically performs:
training the first initial classification network according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image to obtain a first target classification network; the first initial classification network is used for determining the predicted attribute category of the training image;
training the second initial classification network according to the direction of reducing the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain a second target classification network; the second initial classification network is used for determining the property category of the intellectual property of the training image.
In an alternative embodiment, the target dictionary matrix is determined based on a target dictionary learning model; the processor 701 also performs:
inputting the second feature map of the training image, the initial dictionary matrix, and the sparse matrix into an initial dictionary learning model, and updating the initial dictionary matrix to obtain an updated dictionary matrix; the initial dictionary matrix is determined based on the second feature map of the training image;
and training the initial dictionary learning model according to the direction of reducing the difference between the first feature map of the training image and the updated dictionary matrix to obtain the target dictionary learning model.
In a specific implementation, the processor 701, the memory 702, and the communication interface 703 described in the embodiments of the present application can execute the implementation of the computer device described in the image processing method provided in the embodiments of the present application, and can also execute the implementation described in the image processing apparatus provided in the embodiments of the present application; details are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium having a computer program stored therein, which, when run on a computer, causes the computer to perform the image processing method of any one of the possible implementations described above. For the specific implementation, reference may be made to the foregoing description; details are not repeated here.
The embodiments of the present application further provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the image processing method of any one of the possible implementations described above. For the specific implementation, reference may be made to the foregoing description; details are not repeated here.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, since some steps may, according to the present application, be performed in another order or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above disclosure is illustrative only of some embodiments of the application and is not intended to limit the scope of the application, which is defined by the claims and their equivalents.
Claims (11)
1. An image processing method, the method comprising:
encoding the target image to obtain a first feature map of the target image, and extracting features of the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of intellectual property of the target image;
performing attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image;
performing attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and performing property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image;
and determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
2. The method of claim 1, wherein the determining the target category of the target image based on the property category of the target image and the attribute category of the target image comprises:
performing dictionary learning on the second feature map to obtain a target dictionary matrix;
and determining the target category of the target image according to the target dictionary matrix, the property category of the target image and the attribute category of the target image.
3. The method of claim 2, wherein the determining the target class of the target image based on the target dictionary matrix, the property class of the target image, and the attribute class of the target image comprises:
determining a distance between the second feature map of the target image and each property feature in the target dictionary matrix;
if there is a property feature whose distance from the second feature map is smaller than or equal to a preset value, determining the property category of the target image as a known property category;
and taking the attribute category of the target image as the target category of the target image.
4. A method according to claim 3, characterized in that the method further comprises:
if there is no property feature whose distance from the second feature map is smaller than or equal to the preset value, determining that the property category of the target image is an unknown property category;
and generating prompt information, wherein the prompt information includes the attribute category of the target image and is used for presenting the attribute category to assist in determining the target category of the target image.
5. The method of claim 1, wherein the target class of the target image is determined by invoking a class identification model; the method further comprises the steps of:
acquiring a training image, an attribute type label of the training image and a property type label of intellectual property of the training image;
invoking an initial category recognition model, acquiring a first feature map of the training image, and extracting features of the first feature map of the training image to obtain a second feature map of the training image; the first feature map of the training image is used for representing image features of the training image, and the second feature map of the training image is used for representing property features of the intellectual property of the training image;
performing attention learning on the first feature map of the training image by using the second feature map of the training image to obtain an image characterization vector of the training image;
performing attribute category prediction on the training image based on the image characterization vector of the training image to obtain a predicted attribute category of the training image, and performing property category prediction on the intellectual property of the training image based on the second feature map to obtain a predicted property category of the training image;
and training the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain the category recognition model.
6. The method of claim 5, wherein the category-recognition model comprises a first target classification network and a second target classification network, wherein the first target classification network is used to determine an attribute category of the target image and the second target classification network is used to determine a property category of the target image;
the training the initial category recognition model according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image and the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain the category recognition model, comprising:
training a first initial classification network according to the direction of reducing the difference between the predicted attribute category of the training image and the attribute category label of the training image to obtain a first target classification network; the first initial classification network is used for determining a predicted attribute category of the training image;
training a second initial classification network according to the direction of reducing the difference between the predicted property category of the training image and the property category label of the intellectual property of the training image to obtain the second target classification network; the second initial classification network is used for determining the property category of the intellectual property of the training image.
7. The method of claim 2, wherein the target dictionary matrix is determined based on a target dictionary learning model; the method further comprises the steps of:
inputting the second feature map of the training image, the initial dictionary matrix, and the sparse matrix into an initial dictionary learning model, and updating the initial dictionary matrix to obtain an updated dictionary matrix; the initial dictionary matrix is determined based on the second feature map of the training image;
and training the initial dictionary learning model according to the direction of reducing the difference between the first feature map of the training image and the updated dictionary matrix to obtain a target dictionary learning model.
8. An image processing apparatus, characterized in that the apparatus comprises:
a processing unit, configured to perform encoding processing on the target image to obtain a first feature map of the target image, and perform feature extraction on the first feature map to obtain a second feature map; the first feature map is used for representing image features of the target image, and the second feature map is used for representing property features of the intellectual property of the target image;
the processing unit is further configured to perform attention learning on the first feature map by using the second feature map to obtain an image characterization vector of the target image;
the processing unit is further configured to perform attribute category prediction on the target image based on the image characterization vector to obtain an attribute category of the target image, and perform property category prediction on the intellectual property of the target image based on the second feature map to obtain a property category of the target image;
and the determining unit is used for determining the target category of the target image according to the property category of the target image and the attribute category of the target image.
9. A computer device comprising a memory, a communication interface, and a processor, wherein the memory, the communication interface, and the processor are interconnected; the memory stores a computer program, and the processor calls the computer program stored in the memory for realizing the image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the image processing method of any one of claims 1 to 7.
11. A computer program product, characterized in that the computer program product comprises a computer program or computer instructions which, when executed by a processor, implement the image processing method according to any of claims 1 to 7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310256398.4A (CN116977654A) | 2023-03-03 | 2023-03-03 | Image processing method, device, computer equipment and storage medium |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN116977654A (en) | 2023-10-31 |
Family
ID=88482064