CN116977987A - Image recognition method, device, electronic equipment and storage medium


Info

Publication number: CN116977987A
Application number: CN202310003814.XA
Authority: CN (China)
Prior art keywords: model, network layer, probability, loss value, initial
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 朱城, 鄢科
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202310003814.XA; publication of CN116977987A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Abstract

The application discloses an image recognition method, an image recognition device, an electronic device and a storage medium. The embodiments of the application relate to the technical fields of artificial intelligence machine learning, cloud technology and the like. The method comprises: acquiring a target image; recognizing the target image through an image recognition model to obtain prediction probabilities respectively corresponding to a plurality of preset categories; and obtaining a recognition result of the target image according to the prediction probabilities respectively corresponding to the plurality of preset categories. The shared network layer of the image recognition model is obtained through training with a feature matching loss value, and the feature matching loss value is used to characterize the degree of independence between a first model for a first category and a second model for a second category, so that the image recognition model including the shared network layer has good recognition capability for both the first category and the second category. This improves the recognition effect of the image recognition model and, in turn, the accuracy of the recognition result of the target image obtained through the image recognition model.

Description

Image recognition method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition device, an electronic device, and a storage medium.
Background
Currently, an image to be detected can be recognized through an image recognition model for a specific category to obtain a label for that category; the label characterizes whether the image to be detected includes an object under the specific category. For example, with an image recognition model for cats, the determined label of the image to be detected indicates either that the image includes a cat or that it does not include a cat.
In order to enable an image recognition model to recognize objects of multiple categories, image recognition models for different categories can be directly combined into a multi-category image recognition model. In the multi-category image recognition model, the different categories share several layers of the same network at the front end of the model and branch apart at the back end of the model, with different branches used to obtain labels for different categories. For example, the same data is input into the multi-category image recognition model to obtain the label corresponding to each category's branch.
However, when different categories share several layers of the same network, the shared network layers cause the recognition capabilities for the different categories to interfere with each other, so the recognition effect of the multi-category image recognition model is poor, and the accuracy of the recognition result of the image to be detected determined according to the multi-category image recognition model is low.
Disclosure of Invention
In view of the above, the embodiments of the present application provide an image recognition method, an image recognition device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an image recognition method, including: acquiring a target image; recognizing the target image through an image recognition model to obtain prediction probabilities respectively corresponding to a plurality of preset categories, wherein the prediction probability of each preset category characterizes the probability that the target image includes an object under that preset category; the image recognition model includes a shared network layer, the shared network layer being a network layer in a first model that is shared with a second model and being obtained through training with a feature matching loss value, the feature matching loss value being used to characterize the degree of independence between the first model and the second model, the first model being a model for a first category and the second model being a model for a second category; the first category includes at least one preset category, and the second category includes at least one preset category different from the first category; and obtaining a recognition result of the target image according to the prediction probabilities respectively corresponding to the plurality of preset categories.
In a second aspect, an embodiment of the present application provides an image recognition apparatus, including: an acquisition module, configured to acquire a target image; a recognition module, configured to recognize the target image through an image recognition model to obtain prediction probabilities respectively corresponding to a plurality of preset categories, wherein the prediction probability of each preset category characterizes the probability that the target image includes an object under that preset category; the image recognition model includes a shared network layer, the shared network layer being a network layer in a first model that is shared with a second model and being obtained through training with a feature matching loss value, the feature matching loss value being used to characterize the degree of independence between the first model and the second model, the first model being a model for a first category and the second model being a model for a second category; the first category includes at least one preset category, and the second category includes at least one preset category different from the first category; and a result obtaining module, configured to obtain a recognition result of the target image according to the prediction probabilities respectively corresponding to the plurality of preset categories.
Optionally, the device further comprises a model training module, configured to: acquire the foremost preset number of network layers from the first model as a first initial shared network layer; determine, through the first model, a first probability for a first sample image, the first probability characterizing the probability that the first sample image includes an object under the first category; determine, through a first independent model, a second probability for the first sample image, the second probability characterizing the probability that the first sample image includes an object under the second category, wherein the first independent model comprises the first initial shared network layer and a first independent network layer, the first independent network layer refers to the network layers in the second model other than a first related network layer, and the first related network layer refers to the network layers in the second model corresponding to the first initial shared network layer; determine the feature matching loss value according to the first probability and the second probability; and iteratively train the first model according to the feature matching loss value to obtain the image recognition model.
Optionally, the model training module is further configured to: determine a first loss value according to the first probability, where the first loss value is used to characterize the accuracy of the first probability predicted by the first model; determine a second loss value according to the second probability, where the second loss value is used to characterize the accuracy of the second probability predicted by the first independent model; obtain a final loss value according to the first loss value, the second loss value and the feature matching loss value; and iteratively train the first model according to the final loss value to obtain the image recognition model.
Optionally, the model training module is further configured to calculate a sum of the first loss value, the second loss value, and the feature matching loss value as a final loss value.
Optionally, the model training module is further configured to: iteratively train the first model according to the final loss value; if the number of iterations reaches the preset number and the trained first model does not meet the preset condition, acquire the trained first initial shared network layer from the trained first model as the shared network layer; and obtain the image recognition model according to the shared network layer, the trained first model and the first independent network layer.
Optionally, the model training module is further configured to: iteratively train the first model according to the final loss value; if the trained first model meets the preset condition and the number of iterations has not reached the preset number, acquire, from the trained first model, the trained first initial shared network layer together with the one network layer immediately following it as a new first initial shared network layer; take the trained first model as a new first model; return to the step of determining a first probability for the first sample image through the first model, until the number of iterations reaches the preset number or the trained first initial shared network layer is the entire trained first model; take the trained first initial shared network layer obtained in the last training pass as the shared network layer; and obtain the image recognition model according to the shared network layer and the trained first model and trained second model obtained in the last training pass.
Optionally, the model training module is further configured to: take the shared network layer as the image recognition model if the trained first initial shared network layer obtained in the last training pass is the entire trained first model; if it is not the entire trained first model, acquire a second independent network layer and a third independent network layer, and obtain the image recognition model according to the shared network layer, the second independent network layer and the third independent network layer, wherein the second independent network layer refers to the network layers in the trained first model obtained in the last training pass other than the shared network layer, the third independent network layer refers to the network layers in the second model other than a second related network layer, and the second related network layer refers to the network layers in the second model corresponding to the shared network layer.
Optionally, there are a plurality of second models; the model training module is further configured to: determine a third model from the plurality of second models; acquire the foremost preset number of network layers from the first model as a second initial shared network layer; determine, through the first model, a third probability for a second sample image, the third probability characterizing the probability that the second sample image includes an object under the first category; determine, through a second independent model, a fourth probability for the second sample image, the fourth probability characterizing the probability that the second sample image includes an object under a third category, the third category including the preset categories targeted by the third model, the second independent model comprising the second initial shared network layer and a fourth independent network layer, the fourth independent network layer being the network layers in the third model other than a third related network layer, and the third related network layer being the network layers in the third model corresponding to the second initial shared network layer; determine a feature matching loss value according to the third probability and the fourth probability; iteratively train the first model through the feature matching loss value; obtain a new first model according to the trained first model and the fourth independent network layer; determine, from the plurality of second models, a second model different from the third model as a new third model; return to the step of determining a third probability for the second sample image through the first model, until the plurality of second models have been traversed; and take the new first model obtained in the last training pass as the image recognition model. A hedged sketch of this merging loop follows.
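The loop below is hedged pseudocode for the merging procedure just described; `train_round` and `graft_independent_layers` stand in for the feature-matching training and layer-combination steps and are assumptions, not the patent's API:

```python
def merge_models(first_model, second_models, train_round, graft_independent_layers):
    remaining = list(second_models)
    while remaining:
        third_model = remaining.pop(0)   # determine a third model from the second models
        # Retrain the first model against the third model with the feature matching loss.
        first_model = train_round(first_model, third_model)
        # Graft the third model's independent layers on, giving the new first model.
        first_model = graft_independent_layers(first_model, third_model)
    return first_model                   # the image recognition model
```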
Optionally, the device further includes a model determining module, configured to: acquire initial models corresponding to a plurality of initial categories, each initial category including at least one preset category; sort the plurality of initial models according to each initial model's recognition capability and number of preset categories to obtain a model sequence; and take the initial model ranked first in the model sequence as the first model.
Optionally, the model determining module is further configured to: acquire a first weight corresponding to the recognition capability of each initial model and a second weight corresponding to the number of preset categories of each initial model; determine a score for each initial model according to its first weight, second weight, recognition capability and number of preset categories; and sort the plurality of initial models by score from high to low to obtain the model sequence, as sketched below.
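A simple sketch of this weighted scoring and ordering; the weights and field names are illustrative assumptions:

```python
def rank_initial_models(models, w_capability: float, w_categories: float):
    def score(m):
        return w_capability * m["capability"] + w_categories * m["num_categories"]
    return sorted(models, key=score, reverse=True)  # model sequence, highest score first

# The first model is then the initial model ranked first, e.g.:
# first_model = rank_initial_models(initial_models, w_capability=0.6, w_categories=0.4)[0]
```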
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the method described above.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the electronic device to perform the method described above.
According to the image recognition method, device, electronic equipment and storage medium of the present application, the shared network layer of the image recognition model is obtained through training with a feature matching loss value, and the feature matching loss value is used to characterize the degree of independence between the first model for the first category and the second model for the second category. The shared network layer obtained through training therefore has strong recognition capability for both the first category and the second category, so the image recognition model including the shared network layer recognizes both categories well, which improves the recognition effect of the image recognition model and, in turn, the accuracy of the recognition result of the target image obtained through it.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic diagram of an application scenario proposed by an embodiment of the present application;
FIG. 2 is a flow chart of an image recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart of an image recognition method according to another embodiment of the present application;
FIG. 4 shows a schematic diagram of a network of a resnet50 in an embodiment of the application;
FIG. 5 is a flow chart of an image recognition method according to still another embodiment of the present application;
FIG. 6 is a flowchart of an image recognition method according to still another embodiment of the present application;
FIG. 7 is a flow chart illustrating an image recognition method according to one embodiment of the present application;
FIG. 8 is a flow chart of a training method of an image recognition model in an embodiment of the application;
FIG. 9 is a flow chart of yet another training method of an image recognition model in an embodiment of the present application;
FIG. 10 is a block diagram of an image recognition device according to an embodiment of the present application;
fig. 11 shows a block diagram of an electronic device for performing an image recognition method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art without inventive effort based on the embodiments of the present application fall within the scope of protection of the present application.
In the following description, the terms "first", "second", and the like are merely used to distinguish between similar objects and do not represent a particular ordering of the objects, it being understood that the "first", "second", or the like may be interchanged with one another, if permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
It should be noted that: references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may represent: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The application discloses an image recognition method, an image recognition device, an electronic device and a storage medium, and relates to artificial intelligence machine learning, cloud technology and the like.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently, and cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may in the future have its own identification mark, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong back-end system support, which can only be realized through cloud computing.
Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that, through functions such as cluster applications, grid technology and distributed storage file systems, integrates a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) to work cooperatively via application software or application interfaces, providing data storage and service access functions externally.
At present, the storage method of such a storage system is as follows: when logical volumes are created, each logical volume is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system. The file system divides the data into a plurality of parts; each part is an object, and an object contains not only the data itself but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access it according to the storage location information of each object.
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: physical storage space is divided in advance into stripes according to estimates of the capacity of the objects to be stored in the logical volume (estimates which often leave a large margin relative to the capacity of the objects actually to be stored) and the redundant array of independent disks (RAID, Redundant Array of Independent Disks) scheme; a logical volume can then be understood as a stripe, and physical storage space is thereby allocated to the logical volume.
When deployed online, content understanding requires models capable of recognizing various kinds of content, including item recognition, human-body attribute recognition, apparel recognition, and so forth.
Recognition models for different categories can be combined to obtain a multi-category recognition model; for example, image recognition models for different categories can be directly combined into a multi-category image recognition model. In the multi-category image recognition model, the different categories share several layers of the same network at the front end of the model and branch apart at the back end, with different branches used to obtain labels for different categories. For example, the same data is input into the multi-category image recognition model to obtain the label corresponding to each category's branch.
However, the existing labeling capabilities of the per-category models may couple with or repel each other. In addition, the difficulty of accumulating data differs across the recognition capabilities of models for different categories, so there is a data imbalance problem. If the models for different categories are combined directly by brute force, the recognition capability of the combined multi-category recognition model for some categories is reduced compared with the corresponding models before combination, so the recognition effect of the multi-category recognition model is poor, and the accuracy of recognition results obtained according to it is low.
In addition, a multi-category recognition model can be built without having different categories share several layers of one network: each category trains its own branch independently, and the branches of the categories influence each other in the shallow part of the network through means such as regularization constraints on the parameters.
However, this approach introduces a regularization term and places higher requirements on the parameters of the multi-category recognition model, and the newly added categories are opaque to one another, so it is difficult for the multi-category recognition model to capture the differences among the categories, and its recognition effect is poor.
Based on the above, the inventors propose the image recognition method of the present application: the degree of independence between the first model for the first category and the second model for the second category is constrained through a feature matching loss value, so that the shared network layer obtained through training with the feature matching loss value has strong recognition capability for both the first category and the second category. The image recognition model including the shared network layer can thus accurately capture the difference between the first category and the second category, which improves the recognition effect of the image recognition model and, in turn, the accuracy of the recognition result of the target image obtained through it.
As shown in fig. 1, an application scenario to which the embodiments of the present application are applicable includes a terminal 20 and a server 10, where the terminal 20 and the server 10 are connected through a wired or wireless network. The terminal 20 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart home appliance, a vehicle-mounted terminal, an aircraft, a wearable device, a virtual reality device, or another terminal device capable of page presentation, or may be a device running other applications capable of invoking a page-presentation application (e.g., instant messaging applications, shopping applications, search applications, game applications, forum applications, map and traffic applications, etc.).
The server 10 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (content delivery networks), big data and artificial intelligence platforms. The server 10 may be used to provide services for applications running on the terminal 20.
The terminal 20 may send a target image to the server 10, and the server 10 may recognize the target image based on a preset image recognition model, obtain the recognition result of the target image, and feed the recognition result back to the terminal 20.
The server 10 may train to obtain an image recognition model according to the training sample, and preset the image recognition model in a local storage space of the server 10, so as to recognize the target image when receiving the target image sent by the terminal 20.
In another embodiment, the terminal 20 may be configured to identify the target image according to the image identification model issued by the server 10, so as to obtain the identification result of the target image.
Alternatively, the server 10 may store the obtained image recognition model in a cloud storage system, and the terminal 20 acquires the image recognition model from the cloud storage system when executing the image recognition method of the present application.
For convenience of description, in the following embodiments, description will be made as an example in which image recognition is performed by an electronic device.
Referring to fig. 2, fig. 2 is a flowchart illustrating an image recognition method according to an embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S110, acquiring a target image.
In the present application, the image recognition model is an image recognition model for a plurality of preset categories, which may refer to animals, commodities, articles, buildings, and persons, etc., for example, the plurality of preset categories may include cats, dogs, bicycles, and persons, and for another example, the plurality of preset categories may include cats, persons, buildings, hills, and trees.
The target image may be an image including objects under one or more of the preset categories; for example, if the plurality of preset categories includes cats, dogs, bicycles and people, the target image may include a person, or may include a bicycle and a cat. The target image may also be an image that includes no object under any of the preset categories; for example, if the plurality of preset categories includes cats, dogs, bicycles and people, a target image containing only a river includes an object under none of them.
As an embodiment, the target image may include a plurality of objects under one preset category or under a plurality of preset categories. For example, the plurality of preset categories may include cats, dogs, bicycles and people, and the target image may include old people, children and young people; for another example, the plurality of preset categories may include cats, dogs, bicycles and people, and the target image may include short-haired cats and long-haired cats.
When the object under the preset category is an article or an animal with a changeable posture, the posture of the object under the preset category included in the target image may be different, for example, the preset category may include a cat, and the cat in the target image may be sitting, running, lying, or the like.
The target image may be an image obtained from a network (such as a cloud platform, a web page, and a chat log), may be an image obtained by capturing a user, or may be an image selected from multimedia information (such as a video).
S120, recognizing the target image through an image recognition model to obtain prediction probabilities respectively corresponding to a plurality of preset categories, wherein the prediction probability of each preset category characterizes the probability that the target image includes an object under that preset category; the image recognition model includes a shared network layer, the shared network layer being a network layer in a first model that is shared with a second model and being obtained through training with a feature matching loss value, the feature matching loss value being used to characterize the degree of independence between the first model and the second model, the first model being a model for a first category and the second model being a model for a second category; the first category includes at least one preset category, and the second category includes at least one preset category different from the first category.
The image recognition model is for a first category and a second category, while the first category includes at least one preset category, the second category includes at least one preset category different from the first category, for example, the plurality of preset categories may include cats, dogs, bicycles, and people, the first category includes cats, dogs, and people, and the second category includes bicycles.
The higher the prediction probability of a preset class, the higher the likelihood that the target image includes an object under the preset class; the lower the prediction probability for a preset class, the lower the likelihood that the target image includes objects under that preset class.
The image recognition model may include the shared network layer, first-type independent network layers and second-type independent network layers. The target image may be input into the shared network layer to obtain the output result of the shared network layer, and the output result of the shared network layer is then input into the first-type independent network layers and the second-type independent network layers respectively to obtain the prediction probabilities output by each, wherein the prediction probabilities output by the first-type independent network layers include the prediction probabilities corresponding to the preset categories in the first category, and the prediction probabilities output by the second-type independent network layers include the prediction probabilities corresponding to the preset categories in the second category (a minimal structural sketch follows the example below).
For example, the plurality of preset categories may include cats, dogs, bicycles, and humans, the first category including cats, dogs, and humans, the second category including bicycles; the prediction probabilities output by the first class independent network layer comprise the prediction probabilities respectively corresponding to cats, dogs and people, and the prediction probabilities output by the second class independent network layer comprise the prediction probabilities corresponding to bicycles.
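As an illustration of this split, the following is a minimal sketch, assuming PyTorch modules; the class and variable names are illustrative and not from the patent:

```python
import torch
import torch.nn as nn

class CombinedRecognizer(nn.Module):
    """Shared trunk plus two independent branches, one per category group."""

    def __init__(self, shared: nn.Module, branch_a: nn.Module, branch_b: nn.Module):
        super().__init__()
        self.shared = shared      # shared network layer(s)
        self.branch_a = branch_a  # first-type independent network layers
        self.branch_b = branch_b  # second-type independent network layers

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.shared(image)      # one pass through the shared trunk
        probs_a = self.branch_a(feat)  # prediction probabilities for the first category
        probs_b = self.branch_b(feat)  # prediction probabilities for the second category
        return torch.cat([probs_a, probs_b], dim=-1)
```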
First, an image recognition model for the first category, which can accurately recognize each preset category included in the first category, may be obtained as the first model, and an image recognition model for the second category, which can accurately recognize each preset category in the second category, may be obtained as the second model, wherein the first model and the second model have the same network structure but different network parameters.
A first number of the foremost network layers in the first model is selected to be shared by the first model and the second model, as the selected network layers; the part of the first model other than the selected network layers is taken as the first-type independent network layers, and the part of the second model other than the layers corresponding to the selected network layers is taken as the second-type independent network layers, where the part of the second model corresponding to the selected network layers is the foremost first number of network layers in the second model, and the first number may be a non-zero natural number.
For example, the first model and the second model each include 30 layers, with the first 10 layers of the first model as the selected network layers, the last 20 layers of the first model as the first class independent network layers, and the last 20 layers of the second model as the second class independent network layers.
Then, feature matching loss values for representing the degree of independence between the first model and the second model can be determined, the selected network layer is trained through the feature matching loss values until training is finished, a shared network layer is obtained, the shared network layer, the first type independent network layer and the second type independent network layer are combined, and an image recognition model is obtained, wherein the image recognition model aims at the first type and the second type.
The first model and the second model are also called independent classification models; an independent classification model refers to a separate classification model for each capability, typically one for the existing capability and one for each newly added capability. The image recognition model is called a combined classification model; a combined classification model can be a single classification model covering both the existing capabilities and the newly added capabilities, with more general categories, but its recognition capability for each individual capability is often weaker than the effect of the independent classification model for that capability.
It will be appreciated that the first model is essentially the base model for the image recognition model, and that incorporating the second model into the first model results in the image recognition model such that the first model and the second model share a shared network layer.
In this embodiment, the selected sample image may be input into the first model and the second model respectively, to obtain the prediction probability of the first model (characterizing the probability that the sample image includes an object of the first category) and the prediction probability of the second model (characterizing the probability that the sample image includes an object of the second category), and the feature matching loss value may be determined from these two prediction probabilities. For example, one cross entropy loss value may be determined from the prediction probability of the first model and another from the prediction probability of the second model, and the two cross entropy loss values may then be summed (possibly with weights) to obtain the feature matching loss value.
S130, obtaining a recognition result of the corresponding target image according to the prediction probabilities corresponding to the preset categories.
After the prediction probabilities corresponding to the preset categories are obtained, the preset categories included in the target image and the preset categories not included in the target image are determined according to the prediction probabilities corresponding to the preset categories and serve as recognition results of the target image. For example, the plurality of preset categories may include cat, dog, bicycle and person, and according to the prediction probabilities of cat, dog, bicycle and person, determining that the target image includes cat and dog, and the target image does not include person and bicycle, the recognition result of the target image is: the target image includes cats and dogs, and the target image does not include people and bicycles.
A corresponding probability threshold may be set for each preset category, and the preset categories included in and excluded from the target image are determined according to the comparison between each preset category's prediction probability and its probability threshold. For each preset category: if the prediction probability corresponding to the preset category reaches the probability threshold corresponding to that preset category, the recognition result corresponding to the preset category is that the target image includes an object under that preset category; if it does not reach the threshold, the recognition result is that the target image does not include an object under that preset category. The prediction probabilities and probability thresholds of all the preset categories are traversed, the recognition result corresponding to each preset category is determined, and the recognition results corresponding to the preset categories are aggregated to obtain the recognition result of the target image. The probability thresholds of different preset categories may be the same or different.
For example, the plurality of preset categories may include cat, dog, bicycle and person, with probability thresholds of: cat 0.7, dog 0.65, bicycle 0.8 and person 0.7. If the prediction probabilities output by the image recognition model for the target image are cat 0.3, dog 0.8, bicycle 0.1 and person 0.8, then, comparing the prediction probabilities against the probability thresholds, the prediction probabilities of dog and person reach their corresponding thresholds, and the recognition result of the target image is determined as: the target image includes a person and a dog, and the target image does not include a cat or a bicycle. A sketch of this thresholding follows.
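The per-category thresholding can be stated compactly; this hedged sketch mirrors the worked example above:

```python
thresholds = {"cat": 0.7, "dog": 0.65, "bicycle": 0.8, "person": 0.7}
predictions = {"cat": 0.3, "dog": 0.8, "bicycle": 0.1, "person": 0.8}

# A category is present if its prediction probability reaches its threshold.
result = {c: predictions[c] >= thresholds[c] for c in thresholds}
# result == {"cat": False, "dog": True, "bicycle": False, "person": True},
# i.e. the target image includes a dog and a person but no cat or bicycle.
```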
In this embodiment, the shared network layer of the image recognition model is obtained through training with a feature matching loss value, and the feature matching loss value is used to characterize the degree of independence between the first model for the first category and the second model for the second category. The shared network layer obtained through training therefore has stronger recognition capability for the first category and the second category, so the image recognition model including the shared network layer recognizes both categories well, which improves the recognition effect of the image recognition model and, in turn, the accuracy of the recognition result of the target image obtained through it.
Referring to fig. 3, fig. 3 is a flowchart illustrating an image recognition method according to another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S210, acquiring the foremost preset number of network layers from the first model as a first initial shared network layer, and determining a first probability for a first sample image through the first model.
Wherein the first probability characterizes a probability that the first sample image includes an object under the first class.
In this embodiment, the first model and the second model may both be structures based on the resnet50 network, i.e. the first model and the second model each comprise 50 network layers.
As shown in fig. 4, the resnet50 network includes an input convolutional stage (not shown in fig. 4, before the first block of the first group), a first group containing 3 blocks, a second group containing 4 blocks, a third group containing 6 blocks, and a fourth group containing 3 blocks, each block containing 3 convolutional layers, followed by a fully connected layer (not shown in fig. 4, after the fourth group).
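For reference, this grouping can be inspected directly, assuming the torchvision library is available; the snippet is illustrative and not part of the patent:

```python
from torchvision.models import resnet50

model = resnet50(weights=None)
for name, _ in model.named_children():
    print(name)
# conv1, bn1, relu, maxpool  -> input convolutional stage
# layer1..layer4             -> the four groups (3, 4, 6, 3 blocks)
# avgpool, fc                -> pooling and the fully connected layer
# 16 blocks x 3 convolutions + the input convolution + fc = 50 layers
```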
The preset number may be a natural number other than 0, for example, the preset number may be 3, at which time the first three network layers in the first model are determined to be initial shared network layers.
An image including an object of a first class may be acquired as a first sample image, and the first sample image may be input into a first model (entire model), resulting in a prediction probability output by the first model as a first probability for the first sample image.
S220, determining a second probability for the first sample image through the first independent model.
The second probability characterizes the probability that the first sample image comprises an object under a second class, the first independent model comprises a first initial shared network layer and a first independent network layer, the first independent network layer refers to a network layer except for a first relevant network layer in the second model, and the first relevant network layer refers to a network layer corresponding to the first initial shared network layer in the second model.
The foremost preset number of network layers in the second model is taken as the first related network layers corresponding to the first initial shared network layer, the network layers in the second model other than the first related network layers are taken as the first independent network layer, and the first initial shared network layer is combined with the first independent network layer to form the first independent model.
For example, the preset number is 3, the first model and the second model each include 50 layers, the first 3 layers of the first model are used as the first initial shared network layer, the second 47 layers of the second model are used as the first independent network layers, and the first 3 layers of the first model and the second 47 layers of the second model are combined to obtain the first independent model of 50 layers.
After the first independent model is obtained, the first sample image is input into the first independent model, and the prediction probability output by the first independent model is obtained and used as the second probability aiming at the first sample image.
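A minimal sketch of this grafting step, under the simplifying assumption that each model is an nn.Sequential of 50 layers; the helper name is not from the patent:

```python
import torch.nn as nn

def build_first_independent_model(first_model: nn.Sequential,
                                  second_model: nn.Sequential,
                                  n_shared: int = 3) -> nn.Sequential:
    # First initial shared network layer: foremost n_shared layers of the first model.
    shared_prefix = list(first_model.children())[:n_shared]
    # First independent network layer: the second model minus its first n_shared layers.
    independent_suffix = list(second_model.children())[n_shared:]
    return nn.Sequential(*shared_prefix, *independent_suffix)
```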
And S230, determining a feature matching loss value according to the first probability and the second probability.
After the first probability and the second probability are obtained, determining the feature matching loss value according to the first probability and the second probability.
For example, the feature matching loss value may be calculated according to formula one. Reconstructed from the variable definitions below, formula one may take the negative log-likelihood form

$$L_{fea} = -\frac{1}{N}\sum_{j=1}^{N}\left(\log p_j^{a} + \log p_j^{b}\right)$$

wherein $L_{fea}$ is the feature matching loss value, $j$ indexes the $j$-th sample image among the first sample images, $p_j^{a}$ is the first probability for the $j$-th sample image, $p_j^{b}$ is the second probability for the $j$-th sample image, and $N$ is the total number of first sample images.

The probability $p_j^{a}$, determined through the first model, and the probability $p_j^{b}$, determined through the first independent model, may be computed according to formula two:

$$p_j^{a} = \frac{\exp\left(v_j \cdot z_j / T\right)}{\sum_{k=1}^{N}\exp\left(v_j \cdot z_k / T\right)}, \qquad p_j^{b} = \frac{\exp\left(z_j \cdot v_j / T\right)}{\sum_{k=1}^{N}\exp\left(z_j \cdot v_k / T\right)}$$

wherein $v_j$ refers to the feature determined by the first model for the $j$-th sample image among the first sample images, $v_k$ refers to the feature determined by the first model for the $k$-th sample image, $z_j$ refers to the feature determined by the first independent model for the $j$-th sample image, $z_k$ refers to the feature determined by the first independent model for the $k$-th sample image, and $T$ is an adjustment factor, which may be 0.2.
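Under the reconstruction above, the feature matching loss is an InfoNCE-style contrastive loss over the two models' features; the following sketch implements that reading and is an assumption, not the patent's published code:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(v: torch.Tensor, z: torch.Tensor, t: float = 0.2) -> torch.Tensor:
    """v: (N, D) features from the first model; z: (N, D) features from the
    first independent model; t: the adjustment factor (temperature)."""
    logits = v @ z.T / t                           # pairwise similarities v_j . z_k / T
    targets = torch.arange(v.size(0), device=v.device)
    loss_a = F.cross_entropy(logits, targets)      # mean of -log p_j^a over j
    loss_b = F.cross_entropy(logits.T, targets)    # mean of -log p_j^b over j
    return loss_a + loss_b
```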
S240, performing iterative training on the first model according to the feature matching loss value to obtain an image recognition model.
After the feature matching loss value is obtained, the first initial shared network layer and the network layers of the first model other than the first initial shared network layer are trained according to the feature matching loss value until training ends. The trained first initial shared network layer is obtained as the shared network layer, the trained network layers of the first model other than the first initial shared network layer are obtained as the first-type independent network layers, the first independent network layer is obtained as the second-type independent network layers, and the image recognition model is obtained according to the shared network layer, the first-type independent network layers and the second-type independent network layers.
As an embodiment, S230 may include: determining a first loss value according to the first probability, wherein the first loss value is used for representing the accuracy of the first probability predicted by the first model; determining a second loss value according to the second probability, wherein the second loss value is used for representing the accuracy of the second probability predicted by the first independent model; obtaining a final loss value according to the first loss value, the second loss value and the feature matching loss value; and carrying out iterative training on the first model according to the final loss value to obtain an image recognition model.
A cross entropy loss value may be calculated from the first probability as the first loss value, and another from the second probability as the second loss value. The first loss value, the second loss value and the feature matching loss value are then combined to obtain the final loss value. Determining the final loss value may include: calculating the sum of the first loss value, the second loss value and the feature matching loss value as the final loss value.
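Writing $L_{1}$ for the first loss value, $L_{2}$ for the second loss value and $L_{fea}$ for the feature matching loss value (notation assumed here, consistent with the formulas above), this embodiment's final loss is simply

$$L_{final} = L_{1} + L_{2} + L_{fea}$$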
After the final loss value is determined, the first initial shared network layer in the first model can be trained through the final loss value, and the trained first initial shared network layer is obtained as the shared network layer; the network layers of the first model other than the first initial shared network layer are trained to obtain the first-type independent network layers; the first independent network layer is obtained as the second-type independent network layers; and the image recognition model is obtained according to the shared network layer, the first-type independent network layers and the second-type independent network layers.
The first sample image may be divided into a plurality of batches of samples, a final loss value is determined for each batch, and the first model is iteratively trained according to the determined final loss values until all of the plurality of batches are traversed to obtain the image recognition model.
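A hedged sketch of this batched training loop, reusing the feature_matching_loss sketch above and assuming (as an illustration, not the patent's interface) that each model returns its prediction probabilities together with its features:

```python
import torch
import torch.nn.functional as F

def train_epoch(first_model, first_independent_model, dataloader, optimizer, t=0.2):
    for images, labels_a, labels_b in dataloader:    # one batch of first sample images
        p_a, v = first_model(images)                 # first probability + features
        p_b, z = first_independent_model(images)     # second probability + features
        loss_1 = F.binary_cross_entropy(p_a, labels_a)   # first loss value
        loss_2 = F.binary_cross_entropy(p_b, labels_b)   # second loss value
        final_loss = loss_1 + loss_2 + feature_matching_loss(v, z, t)
        optimizer.zero_grad()
        final_loss.backward()
        optimizer.step()
```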
S250, acquiring a target image; identifying a target image through an image identification model to obtain prediction probabilities corresponding to a plurality of preset categories respectively; and obtaining the recognition result of the corresponding target image according to the prediction probability corresponding to each of the preset categories.
The description of S250 refers to the descriptions of S110 to S130 above, and will not be repeated here.
In this embodiment, the first probability is determined through the first model and the second probability through the first independent model, and the feature matching loss value is then determined according to the first probability and the second probability, so that the feature matching loss value can accurately characterize the degree of independence between the first model and the second model. The shared network layer obtained through training therefore has stronger recognition capability for the first category and the second category, which improves the recognition effect of the image recognition model and, in turn, the accuracy of the recognition result of the target image.
Meanwhile, the first sample images are processed through the first model and the first independent model respectively, which avoids the differing difficulty of data accumulation across categories, reduces the probability of the data imbalance problem occurring, and further improves the recognition effect of the trained image recognition model.
Referring to fig. 5, fig. 5 shows a flowchart of an image recognition method according to still another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
s310, determining a first probability and a second probability for a first sample image according to the first model and the first independent model; determining a feature matching loss value according to the first probability and the second probability; determining a first loss value according to the first probability; determining a second loss value according to the second probability; and obtaining a final loss value according to the first loss value, the second loss value and the feature matching loss value.
The description of S310 refers to the descriptions of S210 to S230 above, and will not be repeated here.
S320, performing iterative training on the first model according to the final loss value; if the iteration number reaches the preset number and the trained first model does not meet the preset condition, acquiring a trained first initial shared network layer from the trained first model as a shared network layer.
In the present embodiment, the preset number of times may be set based on the requirements and the scene (for example, whether the scene calls for coarse-grained or fine-grained classification); for example, the preset number of times may be 10000. The preset condition may mean that the final loss value converges, where convergence may mean that the differences between the final loss values over the last several iterations are smaller than a target value; the target value may likewise be determined based on the requirements and the scene, which is not limited by the present application.
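As a minimal sketch of such a convergence check (the window size and target value below are assumptions chosen for illustration):

```python
def loss_converged(loss_history, window=5, target_value=1e-4):
    # The final loss value is treated as converged when the spread of the
    # values over the last several iterations is smaller than the target value.
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) < target_value
```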
When the iteration times reach the preset times and the trained first model does not meet the preset conditions, determining that the training process is terminated, and after the training is finished, acquiring a trained first initial shared network layer from the trained first model as a shared network layer.
S330, obtaining an image recognition model according to the shared network layer, the trained first model and the first independent network layer.
The network layers in the trained first model other than the shared network layer are acquired as the first-type independent network layer, the first independent network layer is acquired as the second-type independent network layer, and the image recognition model is obtained according to the shared network layer, the first-type independent network layer and the second-type independent network layer.
S340, acquiring a target image; identifying a target image through an image identification model to obtain prediction probabilities corresponding to a plurality of preset categories respectively; and obtaining the recognition result of the corresponding target image according to the prediction probability corresponding to each of the preset categories.
The description of S340 refers to the descriptions of S110 to S130 above, and will not be repeated here.
In this embodiment, when the iteration number reaches the preset number and the trained first model does not meet the preset condition, training is stopped, and the image recognition model is obtained according to the shared network layer, the trained first model and the first independent network layer. This avoids the reduction in recognition capability for the first category and the second category that would be caused by further merging the first model and the second model, ensures the recognition capability of the image recognition model for the first category and the second category, and improves the accuracy of the recognition result obtained through the image recognition model.
Referring to fig. 6, fig. 6 shows a flowchart of an image recognition method according to still another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S410, determining a first probability and a second probability for a first sample image according to the first model and the first independent model; determining a feature matching loss value according to the first probability and the second probability; determining a first loss value according to the first probability; determining a second loss value according to the second probability; and obtaining a final loss value according to the first loss value, the second loss value and the feature matching loss value.
The description of S410 refers to the descriptions of S210 to S230 above, and will not be repeated here.
S420, performing iterative training on the first model according to the final loss value; if the trained first model meets the preset condition and the iteration times do not reach the preset times, acquiring one network layer positioned behind the trained first initial shared network layer and the trained first initial shared network layer from the trained first model as a new first initial shared network layer.
If the trained first model meets the preset condition and the iteration number does not reach the preset number, a network layer positioned immediately behind the trained first initial shared network layer in the trained first model is further acquired, and the acquired network layer is merged with the trained first initial shared network layer to serve as the new first initial shared network layer.
In other words, if the trained first model meets the preset condition and the iteration number does not reach the preset number, the top-ranked second number of network layers of the trained first model is acquired as the new first initial shared network layer, where the second number equals the number of layers in the current first initial shared network layer plus one (for the first expansion, the preset number plus one).
S430, acquiring the trained first model as a new first model; and returning to the step of determining the first probability for the first sample image through the first model until the iteration number reaches a preset number or the trained first initial shared network layer is the trained first model.
The trained first model is acquired as the new first model; a new first probability is determined for the first sample image through the new first model; a new first independent model is determined according to the new first model, and a new second probability is determined for the first sample image through the new first independent model; then, according to the new first probability and the new second probability, the process of determining the final loss value and training the first model through the final loss value in the above embodiment is repeated.
For example, the first model and the second model are both 50 layers; the first initial shared network layer determined in the first training process is 3 layers, and the first independent network layer is the last 47 layers of the second model. After one round of training, if the trained first model meets the preset condition and the iteration number does not reach the preset number, the first 4 layers of the trained first model are taken as the new first initial shared network layer, the last 46 layers of the second model are taken as the new first independent network layer, the new first initial shared network layer and the new first independent network layer are merged into the new first independent model, and the training process is repeated.
Training continues in this way until the iteration number reaches the preset number, or the trained first initial shared network layer is the trained first model (all network layers of the first model are shared), at which point the training process is determined to be terminated.
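This progressive expansion of the shared prefix can be sketched as follows, assuming the helper callables train_once and converged exist (all names here are hypothetical):

```python
def merge_progressively(num_layers, preset_number, max_iterations,
                        train_once, converged):
    # Start by sharing the top-ranked `preset_number` layers of the first model.
    shared_count = preset_number
    iterations = 0
    while iterations < max_iterations:
        # train_once trains the first model with the current shared prefix for
        # one round and returns its recent final loss values.
        losses = train_once(shared_count)
        iterations += 1
        if shared_count == num_layers:
            break  # the trained first initial shared network layer is the whole model
        if converged(losses):
            shared_count += 1  # share one more network layer
    return shared_count  # depth of the resulting shared network layer
```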
S440, acquiring a trained first initial shared network layer obtained in the last training process as a shared network layer; and obtaining an image recognition model according to the shared network layer, the trained first model and the trained second model obtained in the last training process.
After training stops according to the constraint of S430, the trained first initial shared network layer obtained in the last training process is used as the shared network layer. According to the shared network layer, an independent part is determined from the trained first model obtained in the last training process and taken as the first-type independent network layer, an independent part is obtained from the second model as the second-type independent network layer, and the image recognition model is obtained according to the shared network layer, the first-type independent network layer and the second-type independent network layer.
As an embodiment, obtaining an image recognition model according to the shared network layer, the first model after training obtained in the last training process, and the second model, includes: and if the trained first initial shared network layer obtained in the last training process is the trained first model, acquiring the shared network layer as an image recognition model. At this point, the first model does not include an independent portion outside of the shared network layer, and the second model does not include an independent portion.
As still another embodiment, obtaining an image recognition model according to the shared network layer, the first model after training obtained in the last training process, and the second model, includes: if the trained first initial shared network layer obtained in the last training process is not the trained first model, obtaining a second independent network layer and a third independent network layer; and obtaining an image recognition model according to the shared network layer, a second independent network layer and a third independent network layer, wherein the second independent network layer refers to a network layer except the shared network layer in the first trained model obtained in the last training process, the third independent network layer refers to a network layer except the second related network layer in the second model, and the second related network layer refers to a network layer corresponding to the shared network layer in the second model.
If training stopped because the number of iterations reached the preset number, and the trained first initial shared network layer is not the trained first model, the trained first initial shared network layer obtained in the last training process is acquired as the shared network layer; the part of the trained first model obtained in the last training process other than the shared network layer is acquired as the second independent network layer (i.e., the first-type independent network layer in S120); and the network layers in the second model other than the second related network layer are acquired as the third independent network layer (i.e., the second-type independent network layer in S120). The second related network layer, corresponding to the shared network layer, refers to a number of top-ranked network layers in the second model occupying the same positions as the shared network layer.
And combining the shared network layer, the second independent network layer and the third independent network layer to obtain the image recognition model.
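A minimal PyTorch sketch of the combined structure, assuming each part is available as a list of layers (the class and attribute names are illustrative):

```python
import torch.nn as nn

class MergedRecognitionModel(nn.Module):
    # Shared network layer plus one independent branch per category group.
    def __init__(self, shared_layers, second_independent, third_independent):
        super().__init__()
        self.shared = nn.Sequential(*shared_layers)
        self.first_branch = nn.Sequential(*second_independent)    # first-category part
        self.second_branch = nn.Sequential(*third_independent)    # second-category part

    def forward(self, x):
        features = self.shared(x)
        # Each independent branch reuses the shared features and outputs the
        # prediction probabilities for its own preset categories.
        return self.first_branch(features), self.second_branch(features)
```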
S450, acquiring a target image; identifying a target image through an image identification model to obtain prediction probabilities corresponding to a plurality of preset categories respectively; and obtaining the recognition result of the corresponding target image according to the prediction probability corresponding to each of the preset categories.
The description of S450 refers to the descriptions of S110 to S130 above, and will not be repeated here.
In this embodiment, the first model and the second model are merged and trained until the iteration number reaches the preset number or the trained first initial shared network layer is the trained first model, at which point training is stopped. The first model and the second model are thereby merged to the greatest extent, so the merged image recognition model is simpler, its recognition effect is better, its size is smaller, and storage space is saved.
Referring to fig. 7, fig. 7 shows a flowchart of an image recognition method according to still another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S510, acquiring initial models corresponding to a plurality of initial categories respectively; sequencing a plurality of initial models according to the identification capacity corresponding to each initial model and the preset category number to obtain a model sequence; an initial model that is ranked first is obtained from a sequence of models as a first model.
Wherein each initial category includes at least one preset category. The image recognition model of each of the plurality of initial categories may be acquired as an initial model (each initial model may be identical in structure to the first model; that is, each initial model is obtained by training a network of the same structure, although the network parameters of the different initial models may differ), and the recognition capability of each initial model and the number of preset categories it targets may be determined. The recognition capability of an initial model can be characterized by its AUC (Area Under ROC Curve). For example, an initial model targeting the preset categories person and cat has a preset category number of 2.
The plurality of initial models are sorted from high to low according to recognition capability to obtain a sorting result, and the sorting result is fine-tuned according to the preset category number of each initial model to obtain the model sequence, where fine-tuning may advance the ranking of initial models with a larger preset category number and delay the ranking of those with a smaller preset category number.
Then, the initial model ranked first is acquired from the model sequence as the first model. The first model has a higher recognition capability and targets more preset categories, so using it as the base model of the image recognition model gives the obtained image recognition model a better recognition capability. The initial category corresponding to the first model is the first category.
As an implementation manner, according to the recognition capability and the preset category number corresponding to each initial model, sorting the plurality of initial models to obtain a model sequence, including: acquiring a first weight corresponding to the identification capacity of each initial model and a second weight corresponding to the number of preset categories of each initial model; determining the score of each initial model according to the first weight, the second weight, the identification capability and the preset category number corresponding to each initial model; and sequencing the plurality of initial models according to the scores from high to low to obtain a model sequence.
The specific values of the first weight and the second weight are not limited in the application. The first weights of the initial models can be the same or different, the second weights of the initial models can be the same or different, the identification capacity and the number of preset categories are weighted and summed through the first weights and the second weights to obtain a summation result, the summation result is used as the score of the initial model, and the initial models are sequenced from high to low according to the score to obtain a model sequence.
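For illustration, the weighted scoring and sorting could be sketched as below; the function name and the equal default weights are assumptions:

```python
def rank_initial_models(models, aucs, category_counts,
                        first_weight=0.5, second_weight=0.5):
    # Score each initial model as a weighted sum of its recognition capability
    # (AUC) and its number of preset categories, then sort from high to low.
    scores = [first_weight * auc + second_weight * count
              for auc, count in zip(aucs, category_counts)]
    order = sorted(range(len(models)), key=scores.__getitem__, reverse=True)
    return [models[i] for i in order]  # the first entry is taken as the first model
```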
S520, determining a third model from the plurality of second models.
When there are a plurality of second models, one second model may be determined as the third model. The determination may be random, or may follow a certain rule; for example, a model sequence of the plurality of second models is determined in the manner of S510, and the second model ranked first in that model sequence is determined as the third model.
Or after the model sequence of the initial model is obtained, selecting the first model, taking the model sequence formed by the rest initial models as the model sequence of the second model, and directly selecting the second model with the first rank from the model sequence of the second model as the third model.
S530, acquiring a preset number of network layers with the forefront ordering from the first model to serve as a second initial shared network layer; determining, by the first model, a third probability for the second sample image, the third probability characterizing a probability that the second sample image includes an object under the first class; determining, by the second independent model, a fourth probability for the second sample image, the fourth probability characterizing a probability that the second sample image includes an object under a third category, the third category including a preset category for the third model, the second independent model including a second initial shared network layer and a fourth independent network layer, the fourth independent network layer being a network layer in the third model other than the third related network layer, the third related network layer being a network layer in the third model corresponding to the second initial shared network layer; determining a feature matching loss value according to the third probability and the fourth probability; performing iterative training on the first model through the feature matching loss value; and obtaining a new first model according to the trained first model and the fourth independent network layer.
The description of the second sample image refers to the description of the first sample image above, and will not be repeated here. The training process of S530 is similar to the training processes of S210-S240 and will not be described again here.
For the processing of the third probability, refer to the processing of the first probability; for the fourth probability, refer to the second probability; for the second independent model, refer to the description of the first independent model; for the fourth independent network layer, refer to the description of the first independent network layer; for the third model, refer to the description of the second model in S210-S240; and for the third related network layer, refer to the description of the first related network layer.
In the merging process of the first model and the third model, the first model after the last training and the corresponding fourth independent network layer (namely the independent part in the third model in the last training process) in the last training process can be merged to obtain a new first model. The new first model includes the original first model (including the shared portion of the first model as well as the independent portion) and the independent portion of the third model.
S540, determining a second model different from the third model from the plurality of second models as a new third model; returning to the step of determining a third probability for the second sample image by the first model until traversing the plurality of second models; and acquiring a new first model obtained in the last training process as an image recognition model.
And continuing to select the unselected second model from the second models as a new third model, continuing to combine the new first model and the new third model according to the description of the S530 until a plurality of second models are traversed, and stopping the training process.
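The outer traversal over the second models can be sketched as follows; merge_pair stands for the S530-style training that merges one third model into the current first model (all names hypothetical):

```python
def merge_all_second_models(first_model, second_models, merge_pair):
    # Traverse the second models, merging each in turn as the third model.
    remaining = list(second_models)       # e.g. already sorted as in S510
    while remaining:
        third_model = remaining.pop(0)    # next second model to merge
        first_model = merge_pair(first_model, third_model)
    return first_model                    # used as the image recognition model
```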
In this embodiment, after the training process is stopped, a new first model obtained in the last training process is obtained and used as an image recognition model, so as to combine a plurality of second models with the first model, and obtain the image recognition model.
S550, acquiring a target image; identifying a target image through an image identification model to obtain prediction probabilities corresponding to a plurality of preset categories respectively; and obtaining the recognition result of the corresponding target image according to the prediction probability corresponding to each of the preset categories.
The description of S550 refers to the descriptions of S110 to S130 above, and will not be repeated here.
In this embodiment, the determined first model has a higher recognition capability, and using it as the base model of the image recognition model gives the resulting image recognition model a better recognition capability. Meanwhile, a plurality of second models can be merged into the first model to obtain the image recognition model, so the image recognition model covers more preset categories, its range of application is wider, and its generalization is higher.
Meanwhile, when other types of models need to be combined into the image recognition model, the models can be combined according to the mode, and the new type recognition capability is obtained under the condition that the original recognition capability of each type of the image recognition model is ensured, so that the generalization and the applicability of the image recognition model are improved.
In order to explain the technical scheme of the application more clearly, the image recognition method of the application is described below in combination with an exemplary scene. The plurality of initial categories include initial category one, initial category two and initial category three: initial category one includes the preset categories cat and dog, initial category two includes the preset category person, and initial category three includes the preset category mobile phone. Initial model one for initial category one, initial model two for initial category two and initial model three for initial category three are obtained by training a resnet50 network on training samples; that is, initial model one, initial model two and initial model three each comprise a 50-layer network structure. The preset number is 3.
1. Determination of a first model
The AUC value corresponding to initial model one is determined as a first value, the AUC value corresponding to initial model two as a second value, and the AUC value corresponding to initial model three as a third value. A first weight of 0.5 corresponding to recognition capability and a second weight of 0.5 corresponding to the preset category number are obtained. The first value and the preset category number of initial model one are weighted and summed according to the first weight and the second weight to obtain a first score; the second value and the preset category number of initial model two are weighted and summed to obtain a second score; and the third value and the preset category number of initial model three are weighted and summed to obtain a third score. The first score is larger than the second score, and the second score is larger than the third score; at this time, the determined model sequence is: initial model one first, initial model two second, initial model three third.
Initial model one is determined as the first model, and initial model two and initial model three are determined as second models. From initial model two and initial model three, initial model two is determined as the second model to be merged first, and initial model three as the second model to be merged second. At this time, initial category one is the first category, and the second categories include initial category two and initial category three.
2. Model training process
An image including a cat, a dog, and a person is acquired as a first sample image.
As shown in fig. 8, the top 3 network layers are acquired from the first model (initial model one) as the first initial shared network layer; the last 47 layers of the second model (initial model two) are determined as the first independent network layer, and the first independent network layer and the first initial shared network layer are merged to form the first independent model. The last 47 layers of the first model are determined as the fifth independent network layer.
The first sample image is input into the first model to obtain the prediction probability output by the first model as the first probability, and into the first independent model to obtain the prediction probability output by the first independent model as the second probability. The feature matching loss value is calculated according to the first probability and the second probability through formula I; the cross entropy value calculated according to the first probability is taken as the first loss value, and the cross entropy value calculated according to the second probability is taken as the second loss value; the first loss value, the second loss value and the feature matching loss value are summed to obtain the final loss value, and the first model is iteratively trained through the final loss value.
The trained first model meets the preset condition and the iteration number does not reach the preset number, so the first 4 layers of the trained first model are determined as the new first initial shared network layer, the last 46 network layers of the second model (initial model two) are determined as the new first independent network layer, a new first independent model is obtained according to the new first initial shared network layer and the new first independent network layer, the trained first model is taken as the new first model, and the training process of the previous paragraph is repeated for iterative training.
As shown in fig. 9, when training stops, the trained first initial shared network layer in the first model obtained by the last training includes the first 30 layers. The first 30 layers of the first model obtained by the last training are acquired as the shared network layer, the last 20 layers of that first model are acquired as the second independent network layer, the last 20 layers of the second model (initial model two) are acquired as the third independent network layer, and the shared network layer, the second independent network layer and the third independent network layer are combined to obtain a combined model.
Taking the combined model as the new first model and initial model three as the new second model, images including cats, dogs, people and mobile phones are acquired as new first sample images, and the new first model and the new second model (initial model three) are merged in the above manner.
At this time, for the second merging process, the first 27 layers of the trained first model serve as the shared network layer, the last 23 layers of the trained first model serve as the second independent network layer, and the last 23 layers of the second model (initial model three) serve as the third independent network layer. The shared network layers (from both the first and the second merging process), the second independent network layers, and the third independent network layers (from both the first and the second merging process) are merged to obtain the image recognition model.
The image recognition model comprises a shared network layer, an independent network part for initial category one, an independent network part for initial category two, and an independent network part for initial category three. Initial category one and initial category two share the first 30 layers; the independent network parts for initial category one and initial category two each comprise 20 layers; and the independent network part for initial category three comprises 23 layers.
The image recognition model obtained by training in this way ensures that merging the categories does not impair the original capability of each model, and the recognition effect of the image recognition model is good.
3. Image recognition
A target image is acquired; the target image shows a person holding a cat while making a call on a mobile phone.
The target image is input into the image recognition model. The prediction probabilities output by the image recognition model's independent network part for initial category one include cat 0.8 and dog 0.2; the prediction probability output by the independent network part for initial category two includes person 0.8; and the prediction probability output by the independent network part for initial category three includes mobile phone 0.9.
The probability thresholds corresponding to cat, dog, person and mobile phone are all 0.7. At this time, it is determined according to the probability thresholds that the target image includes a person, a cat and a mobile phone, and the obtained recognition result of the target image is that the target image includes a person, a cat and a mobile phone.
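For illustration, this thresholding step might be sketched as follows (the dict-based interface is an assumption):

```python
def recognition_result(predictions, threshold=0.7):
    # predictions maps each preset category to its prediction probability,
    # e.g. {"cat": 0.8, "dog": 0.2, "person": 0.8, "mobile phone": 0.9};
    # a category is kept when its probability reaches the probability threshold.
    return [name for name, prob in predictions.items() if prob >= threshold]
```

With the probabilities above, this returns ["cat", "person", "mobile phone"], matching the recognition result of the scene.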
As one implementation mode, the target image is determined to comprise a person, a cat and a mobile phone, and three labels of the person, the cat and the mobile phone can be added to the target image, so that the identification result of the target image can be obtained according to the added labels.
After the recognition result of the target image is obtained, a reply can be returned to the user promptly according to the recognition result, so that the user can conveniently view the result, and the response speed of image recognition is improved.
Referring to fig. 10, fig. 10 shows a block diagram of an image recognition apparatus according to an embodiment of the application, an apparatus 1100 includes:
an acquisition module 1110 for acquiring a target image;
the identifying module 1120 is configured to identify, by using an image identifying model, a target image, so as to obtain prediction probabilities corresponding to a plurality of preset categories, where the prediction probability of each preset category characterizes a probability that the target image includes an object under the preset category; the image recognition model comprises a shared network layer, wherein the shared network layer is a network layer shared by a second model in a first model, the image recognition model is obtained through feature matching loss value training, the feature matching loss value is used for representing the degree of independence between the first model and the second model, the first model is a model aiming at a first class, and the second model is a model aiming at a second class; the first category comprises at least one preset category and the second category comprises at least one preset category different from the first category;
the result obtaining module 1130 is configured to obtain a recognition result of the corresponding target image according to the prediction probabilities corresponding to the preset categories.
Optionally, the device further comprises a model training module, configured to obtain, from the first model, a preset number of network layers with the forefront ranking, as a first initial shared network layer; determining, by the first model, a first probability for the first sample image, the first probability characterizing a probability that the first sample image includes an object under the first class; determining a second probability for the first sample image by a first independent model, wherein the second probability characterizes the probability that the first sample image comprises objects under a second class, the first independent model comprises a first initial shared network layer and a first independent network layer, the first independent network layer refers to a network layer except for a first relevant network layer in the second model, and the first relevant network layer refers to a network layer corresponding to the first initial shared network layer in the second model; determining a feature matching loss value according to the first probability and the second probability; and carrying out iterative training on the first model according to the feature matching loss value to obtain an image recognition model.
Optionally, the model training module is further configured to determine a first loss value according to the first probability, where the first loss value is used to characterize an accuracy of the first probability of the first model prediction; determining a second loss value according to the second probability, wherein the second loss value is used for representing the accuracy of the second probability predicted by the first independent model; obtaining a final loss value according to the first loss value, the second loss value and the feature matching loss value; and carrying out iterative training on the first model according to the final loss value to obtain an image recognition model.
Optionally, the model training module is further configured to calculate a sum of the first loss value, the second loss value, and the feature matching loss value as a final loss value.
Optionally, the model training module is further configured to perform iterative training on the first model according to the final loss value; if the iteration times reach the preset times and the trained first model does not meet the preset conditions, acquiring a trained first initial shared network layer from the trained first model as a shared network layer; and obtaining an image recognition model according to the shared network layer, the trained first model and the first independent network layer.
Optionally, the model training module is further configured to perform iterative training on the first model according to the final loss value; if the trained first model meets the preset condition and the iteration times do not reach the preset times, acquiring one network layer positioned behind the trained first initial shared network layer and the trained first initial shared network layer from the trained first model as a new first initial shared network layer; acquiring a trained first model as a new first model; returning to the step of determining a first probability for the first sample image by the first model until the number of iterations reaches a preset number or the trained first initial shared network layer is the trained first model; acquiring a trained first initial shared network layer obtained in the last training process as a shared network layer; and obtaining an image recognition model according to the shared network layer, the trained first model and the trained second model obtained in the last training process.
Optionally, the model training module is further configured to obtain a shared network layer as the image recognition model if the trained first initial shared network layer obtained in the last training process is the trained first model; if the trained first initial shared network layer obtained in the last training process is not the trained first model, obtaining a second independent network layer and a third independent network layer; and obtaining an image recognition model according to the shared network layer, a second independent network layer and a third independent network layer, wherein the second independent network layer refers to a network layer except the shared network layer in the first trained model obtained in the last training process, the third independent network layer refers to a network layer except the second related network layer in the second model, and the second related network layer refers to a network layer corresponding to the shared network layer in the second model.
Optionally, the second model comprises a plurality; the model training module is further used for determining a third model from the plurality of second models; acquiring a preset number of network layers with the forefront ordering from the first model to serve as a second initial shared network layer; determining, by the first model, a third probability for the second sample image, the third probability characterizing a probability that the second sample image includes an object under the first class; determining, by the second independent model, a fourth probability for the second sample image, the fourth probability characterizing a probability that the second sample image includes an object under a third category, the third category including a preset category for the third model, the second independent model including a second initial shared network layer and a fourth independent network layer, the fourth independent network layer being a network layer in the third model other than the third related network layer, the third related network layer being a network layer in the third model corresponding to the second initial shared network layer; determining a feature matching loss value according to the third probability and the fourth probability; performing iterative training on the first model through the feature matching loss value; obtaining a new first model according to the trained first model and the fourth independent network layer; determining a second model different from the third model from the plurality of second models as a new third model; returning to the step of determining a third probability for the second sample image by the first model until traversing the plurality of second models; and acquiring a new first model obtained in the last training process as an image recognition model.
Optionally, the device further includes a model determining module, configured to obtain initial models corresponding to a plurality of initial categories, where each initial category includes at least one preset category; sequencing a plurality of initial models according to the identification capacity corresponding to each initial model and the preset category number to obtain a model sequence; an initial model that is ranked first is obtained from a sequence of models as a first model.
Optionally, the model determining module is further configured to obtain a first weight corresponding to the identification capability of each initial model and a second weight corresponding to the number of preset categories of each initial model; determining the score of each initial model according to the first weight, the second weight, the identification capability and the preset category number corresponding to each initial model; and sequencing the plurality of initial models according to the scores from high to low to obtain a model sequence.
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
Fig. 11 shows a block diagram of an electronic device for performing an image recognition method according to an embodiment of the present application. The electronic device may be the terminal 20 or the server 10 in fig. 1, and it should be noted that, the computer system 1200 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 11, the computer system 1200 includes a central processing unit (Central Processing Unit, CPU) 1201 which can perform various appropriate actions and processes, such as performing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access Memory (Random Access Memory, RAM) 1203. In the RAM 1203, various programs and data required for the system operation are also stored. The CPU1201, ROM1202, and RAM 1203 are connected to each other through a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output portion 1207 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker, etc.; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network ) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. The drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 1210 as needed, so that a computer program read out therefrom is installed into the storage section 1208 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1209, and/or installed from the removable media 1211. When executed by a Central Processing Unit (CPU) 1201, performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable storage medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the electronic device to perform the method of any of the embodiments described above.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause an electronic device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. An image recognition method, the method comprising:
acquiring a target image;
identifying the target image through an image identification model to obtain the prediction probabilities corresponding to a plurality of preset categories, wherein the prediction probability of each preset category represents the probability that the target image comprises an object under the preset category; the image recognition model comprises a shared network layer, wherein the shared network layer is a network layer shared by a second model in a first model, the image recognition model is obtained through feature matching loss value training, the feature matching loss value is used for representing the degree of independence between the first model and the second model, the first model is a model aiming at a first category, and the second model is a model aiming at a second category; the first category includes at least one of the preset categories, and the second category includes at least one of the preset categories that is different from the first category;
And obtaining a recognition result corresponding to the target image according to the prediction probabilities corresponding to the preset categories.
2. The method according to claim 1, wherein the image recognition model acquisition method includes:
acquiring a preset number of network layers with the forefront ordering from the first model to serve as a first initial shared network layer;
determining, by the first model, a first probability for a first sample image, the first probability characterizing a probability that the first sample image includes an object under the first class;
determining a second probability for the first sample image by a first independent model, wherein the second probability characterizes the probability that the first sample image comprises objects in the second class, the first independent model comprises the first initial shared network layer and a first independent network layer, the first independent network layer refers to a network layer except for a first relevant network layer in the second model, and the first relevant network layer refers to a network layer corresponding to the first initial shared network layer in the second model;
determining a feature matching loss value according to the first probability and the second probability;
And carrying out iterative training on the first model according to the characteristic matching loss value to obtain an image recognition model.
3. The method according to claim 2, wherein the performing iterative training on the first model according to the feature matching loss value to obtain an image recognition model includes:
determining a first loss value according to the first probability, wherein the first loss value is used for representing the accuracy of the first probability predicted by the first model;
determining a second loss value according to the second probability, wherein the second loss value is used for representing the accuracy of the second probability predicted by the first independent model;
obtaining a final loss value according to the first loss value, the second loss value and the feature matching loss value;
and carrying out iterative training on the first model according to the final loss value to obtain an image recognition model.
4. A method according to claim 3, wherein said deriving a final loss value from said first loss value, said second loss value, and said feature matching loss value comprises:
and calculating the sum of the first loss value, the second loss value and the characteristic matching loss value as a final loss value.
5. A method according to claim 3, wherein said iteratively training said first model based on said final loss value to obtain an image recognition model comprises:
performing iterative training on the first model according to the final loss value;
if the iteration times reach the preset times and the trained first model does not meet the preset conditions, acquiring a trained first initial shared network layer from the trained first model as the shared network layer;
and obtaining the image recognition model according to the shared network layer, the trained first model and the first independent network layer.
6. A method according to claim 3, wherein said iteratively training said first model based on said final loss value to obtain an image recognition model comprises:
performing iterative training on the first model according to the final loss value;
if the trained first model meets the preset condition and the iteration times do not reach the preset times, acquiring a network layer positioned behind the trained first initial shared network layer and the trained first initial shared network layer from the trained first model as a new first initial shared network layer;
Acquiring the trained first model as a new first model;
returning to the step of determining the first probability for the first sample image through the first model until the iteration number reaches a preset number or the trained first initial shared network layer is the trained first model;
acquiring a trained first initial shared network layer obtained in the last training process as the shared network layer;
and obtaining an image recognition model according to the shared network layer, the trained first model obtained in the last training process and the second model.
7. The method of claim 6, wherein the obtaining the image recognition model from the shared network layer, the first trained model obtained from the last training process, and the second model comprises:
if the trained first initial shared network layer obtained in the last training process is the trained first model, acquiring the shared network layer as the image recognition model;
if the trained first initial shared network layer obtained in the last training process is not the trained first model, a second independent network layer and a third independent network layer are obtained; and obtaining the image recognition model according to the shared network layer, the second independent network layer and the third independent network layer, wherein the second independent network layer refers to a network layer except the shared network layer in a trained first model obtained in the last training process, the third independent network layer refers to a network layer except a second related network layer in the second model, and the second related network layer refers to a network layer corresponding to the shared network layer in the second model.
8. The method of claim 1, wherein the second model comprises a plurality of; the method for acquiring the image recognition model comprises the following steps:
determining a third model from the plurality of second models;
acquiring a preset number of network layers with the forefront ordering from the first model to serve as a second initial shared network layer;
determining, by the first model, a third probability for a second sample image, the third probability characterizing a probability that the second sample image includes an object under the first class;
determining, by a second independent model, a fourth probability for the second sample image, the fourth probability characterizing a probability that the second sample image includes an object under a third category, the third category including a preset category for the third model, the second independent model including the second initial shared network layer and a fourth independent network layer, the fourth independent network layer being a network layer of the third model other than a third related network layer, the third related network layer being a network layer of the third model corresponding to the second initial shared network layer;
determining a feature matching loss value according to the third probability and the fourth probability;
performing iterative training on the first model through the feature matching loss value;
obtaining a new first model according to the trained first model and the fourth independent network layer;
determining, from the plurality of second models, a second model different from the third model as a new third model;
returning to the step of determining, by the first model, a third probability for the second sample image, until the plurality of second models have been traversed;
and acquiring the new first model obtained in the last training process as the image recognition model.
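Claim 8's traversal over the plurality of second models can be summarized as a fold. The sketch below is deliberately schematic: train_with_matching_loss stands for the third/fourth-probability training on the feature matching loss value, and make_joint for attaching the fourth independent network layer to the trained first model; both are our placeholders, not patent terms.

```python
def merge_all_models(first_model, second_models, make_joint, train_with_matching_loss):
    """Fold every second model into one shared front end (claim 8 sketch)."""
    remaining = list(second_models)
    while remaining:                       # traverse the plurality of second models
        third_model = remaining.pop(0)     # determine a third model
        # train the first model on the feature matching loss value derived from
        # the third and fourth probabilities for the second sample image
        train_with_matching_loss(first_model, third_model)
        # new first model = trained first model + fourth independent network layer
        first_model = make_joint(first_model, third_model)
    return first_model                     # the image recognition model
```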
9. The method of claim 1, wherein the determining of the first model comprises:
acquiring initial models corresponding to a plurality of initial categories respectively, wherein each initial category comprises at least one preset category;
sorting the plurality of initial models according to the recognition capability and the number of preset categories corresponding to each initial model to obtain a model sequence;
and acquiring, from the model sequence, the initial model ranked first as the first model.
10. The method of claim 9, wherein the sorting the plurality of initial models according to the recognition capability and the number of preset categories corresponding to each initial model to obtain a model sequence comprises:
acquiring a first weight corresponding to the recognition capability of each initial model and a second weight corresponding to the number of preset categories of each initial model;
determining a score of each initial model according to the first weight, the second weight, the recognition capability and the number of preset categories corresponding to each initial model;
and sorting the plurality of initial models in descending order of the scores to obtain the model sequence.
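Claims 9 and 10 specify a weighted ranking but leave the scoring function open. The sketch below assumes the simplest linear combination, score = first weight x recognition capability + second weight x number of preset categories; the weights, model names, and capability numbers are illustrative assumptions only.

```python
def rank_initial_models(models, capability, num_categories, w1=0.7, w2=0.3):
    """Score and sort initial models from high to low (claims 9-10 sketch)."""
    scores = {m: w1 * capability[m] + w2 * num_categories[m] for m in models}
    return sorted(models, key=scores.get, reverse=True)  # sequence, high to low

sequence = rank_initial_models(
    ["model_a", "model_b"],                         # hypothetical initial models
    capability={"model_a": 0.92, "model_b": 0.88},  # e.g. validation accuracy
    num_categories={"model_a": 10, "model_b": 40},
)
first_model_name = sequence[0]  # the first-ranked initial model becomes the first model
```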
11. An image recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring a target image;
the identification module is used for identifying the target image through an image recognition model to obtain prediction probabilities respectively corresponding to a plurality of preset categories, the prediction probability of each preset category representing the probability that the target image includes an object under the preset category; the image recognition model comprises a shared network layer, the shared network layer being a network layer of a first model that is shared with a second model; the image recognition model is obtained through training with a feature matching loss value, the feature matching loss value being used for characterizing the degree of independence between the first model and the second model, the first model being a model for a first category and the second model being a model for a second category; the first category includes at least one of the preset categories, and the second category includes at least one of the preset categories that is different from the first category;
and the result obtaining module is used for obtaining the recognition result corresponding to the target image according to the prediction probabilities corresponding to the preset categories.
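At inference time the apparatus of claim 11 reduces to a short pipeline. The sketch below assumes a PyTorch model emitting one logit per preset category, a per-category sigmoid, and a 0.5 decision threshold; the threshold and all names are our assumptions rather than anything the claim fixes.

```python
import torch

@torch.no_grad()
def recognize(image_recognition_model, target_image, preset_categories, threshold=0.5):
    """Acquisition, identification, and result-obtaining modules in one pass."""
    image_recognition_model.eval()
    logits = image_recognition_model(target_image.unsqueeze(0))  # identification module
    probs = torch.sigmoid(logits).squeeze(0)  # one prediction probability per preset category
    # result obtaining module: keep every category whose probability clears the threshold
    return [c for c, p in zip(preset_categories, probs.tolist()) if p > threshold]
```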
12. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-10.
13. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein program code that is callable by a processor to perform the method according to any one of claims 1-10.
CN202310003814.XA 2023-01-03 2023-01-03 Image recognition method, device, electronic equipment and storage medium Pending CN116977987A (en)

Priority Applications (1)

Application Number: CN202310003814.XA | Priority Date: 2023-01-03 | Filing Date: 2023-01-03 | Title: Image recognition method, device, electronic equipment and storage medium

Publications (1)

Publication Number: CN116977987A | Publication Date: 2023-10-31

Family

ID=88473733

Country Status (1)

Country: CN | Publication: CN116977987A (en)

Legal Events

Date Code Title Description
PB01 Publication