CN113705323A - Image recognition method, device, equipment and storage medium

Image recognition method, device, equipment and storage medium

Info

Publication number
CN113705323A
Authority
CN
China
Prior art keywords
image
objects
features
target
feature
Prior art date
Legal status
Granted
Application number
CN202110661352.1A
Other languages
Chinese (zh)
Other versions
CN113705323B (en)
Inventor
蔡德
肖凯文
叶虎
马兆轩
韩骁
Current Assignee
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110661352.1A
Publication of CN113705323A
Application granted
Publication of CN113705323B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method, apparatus, device and storage medium, and belongs to the technical field of computers. In the technical solution provided by the embodiments of the application, when the category of an image is determined, the influence of different objects on the classification result is taken into account: the influence of the target objects and of the reference objects on the classification result is combined, and the image features of the image are determined based on the features of the target objects and the features of the reference objects. The resulting image features therefore have stronger expressive power, and a more accurate result can be obtained when the category of the image is determined based on these image features.

Description

Image recognition method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image recognition method, an image recognition apparatus, an image recognition device, and a storage medium.
Background
With the development of computer technology, artificial intelligence has advanced rapidly. As a branch of artificial intelligence, image recognition is being applied ever more widely. For example, in medical scenarios an image recognition model is used to recognize medical images and to discover lesions that are difficult to distinguish with the naked eye, thereby assisting doctors in determining a treatment plan.
In the related art, an image recognition model often directly extracts features of an image to be recognized to obtain a feature map of the image, and the category of the image is determined based on the feature map of the image.
However, different objects shown in an image affect its classification result differently; in medical image recognition, for example, positive cells and negative cells contribute differently to the classification of the image. Because the related-art method does not take this difference into account, the accuracy of image recognition performed in this way is not high.
Disclosure of Invention
The embodiments of the application provide an image recognition method, apparatus, device and storage medium that can improve the accuracy of image recognition. The technical solution is as follows:
in one aspect, an image recognition method is provided, and the method includes:
acquiring an image to be recognized, wherein the image comprises a plurality of objects;
performing feature extraction on the image to obtain a plurality of object features, wherein one object feature corresponds to one object;
determining a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object features, wherein the target objects and the reference objects are different classes of objects;
fusing a plurality of target object features and a plurality of reference object features to obtain image features of the image, wherein the target object features are object features corresponding to the target object, and the reference object features are object features corresponding to the reference object;
based on the image features, a category of the image is determined.
In one aspect, an image recognition apparatus is provided, the apparatus including:
an image acquisition module, configured to acquire an image to be recognized, where the image includes a plurality of objects;
a feature extraction module, configured to perform feature extraction on the image to obtain a plurality of object features, where one object feature corresponds to one object;
an object determination module, configured to determine a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object features, where the target objects and the reference objects are different classes of objects;
an image feature acquisition module, configured to fuse a plurality of target object features and a plurality of reference object features to obtain image features of the image, where the target object features are the object features corresponding to the target objects and the reference object features are the object features corresponding to the reference objects;
a classification module, configured to determine the category of the image based on the image features.
In a possible implementation manner, the image feature acquisition module is configured to encode an object feature set based on an attention mechanism to obtain image features of the image, where the object feature set includes the plurality of target object features and the plurality of reference object features.
In a possible implementation manner, the image feature acquisition module is configured to multiply the object feature set by a first transformation matrix to obtain a query matrix of the image; multiply the object feature set by a second transformation matrix to obtain a key matrix of the image; multiply the query matrix by the transposed key matrix to obtain an attention matrix of the image; and multiply the attention matrix by the object feature set to obtain the image features of the image.
In a possible implementation manner, the classification module is configured to perform dimension reduction on the image features to obtain dimension-reduced image features; perform normalization on the dimension-reduced image features to obtain the probabilities that the image belongs to different categories; and determine the category of the image based on the probabilities.
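To make this classification step concrete, here is a minimal NumPy sketch, not taken from the patent: the dimension reduction is assumed to be a single fully connected layer and the normalization is assumed to be a softmax, and the weights, feature values and class names are hypothetical.

```python
import numpy as np

def classify_image(image_feature, w_reduce, b_reduce, class_names):
    """Reduce the image feature to one logit per class, normalize to
    probabilities, and pick the most probable class."""
    # Dimension reduction: a single fully connected layer (assumed form).
    logits = image_feature @ w_reduce + b_reduce          # shape (num_classes,)
    # Normalization: softmax maps the logits into (0, 1) so they sum to 1.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return class_names[int(np.argmax(probs))], probs

# Hypothetical example: a 4-dimensional image feature and two classes.
feature = np.array([0.7, 1.2, -0.3, 0.5])
w = np.random.default_rng(0).normal(size=(4, 2))   # stand-in for trained weights
b = np.zeros(2)
label, probs = classify_image(feature, w, b, ["negative", "positive"])
print(label, probs)
```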
In a possible implementation manner, the feature extraction module is configured to perform convolution processing on the image to obtain a feature map corresponding to the image; determining a plurality of target regions on the feature map, one of the target regions corresponding to one of the objects; and performing pooling processing on the partial feature maps corresponding to the target areas to obtain the object features.
In a possible implementation manner, the feature extraction module is configured to divide the feature map into a plurality of candidate regions; determining the probability that the plurality of candidate areas correspond to the object based on the partial feature maps corresponding to the plurality of candidate areas respectively; and determining the candidate area with the probability greater than or equal to a third probability threshold as the target area.
In a possible embodiment, the apparatus further comprises:
a display module, configured to display a classification page of the image, where the classification page displays the image, at least one target object, at least one reference object, and a category of the image.
In a possible implementation, the classification module is further configured to determine categories of the target objects based on the target object features;
the display module is further configured to display the category of the at least one target object on the classification page.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories having at least one computer program stored therein, the computer program being loaded and executed by the one or more processors to implement the image recognition method.
In one aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, which is loaded and executed by a processor to implement the image recognition method.
In one aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising program code, the program code being stored in a computer-readable storage medium, the program code being read by a processor of a computer device from the computer-readable storage medium, the program code being executed by the processor such that the computer device performs the image recognition method described above.
According to the technical solution provided by the embodiments of the application, when the category of an image is determined, the influence of different objects on the classification result is taken into account: the influence of the target objects and of the reference objects on the classification result is combined, and the image features of the image are determined based on the features of the target objects and the features of the reference objects. The resulting image features therefore have stronger expressive power, and a more accurate result can be obtained when the category of the image is determined based on these image features.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of an image recognition method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an image recognition model provided in an embodiment of the present application;
Fig. 3 is a flowchart of an image recognition method provided in an embodiment of the present application;
Fig. 4 is a flowchart of an image recognition method provided in an embodiment of the present application;
Fig. 5 is a schematic view of an interface provided by an embodiment of the present application;
Fig. 6 is a flowchart of an image recognition method provided in an embodiment of the present application;
Fig. 7 is a flowchart of an image recognition method provided in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application means one or more, "a plurality" means two or more, for example, a plurality of reference face images means two or more reference face images.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Computer Vision (CV) is the science of how to make machines "see"; it refers to using cameras and computers in place of human eyes to identify, track and measure targets, and to further process images so that they become more suitable for human observation or for transmission to instruments for detection.
The technical solution provided by the embodiments of the application can also be combined with cloud technology; for example, the trained image recognition model is deployed on a cloud server. Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data.
The medical cloud in cloud technology combines medical technology with new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data and the Internet of Things, and uses cloud computing to create a medical and health service cloud platform so that medical resources are shared and the reach of medical services is expanded. Thanks to cloud computing, the medical cloud improves the efficiency of medical institutions and makes it more convenient for residents to seek medical care. Appointment registration, electronic medical records and medical insurance in existing hospitals are all products of combining cloud computing with the medical field, and the medical cloud also has the advantages of data security, information sharing, dynamic scalability and overall planning. Illustratively, the image recognition model provided by the embodiments of the application is deployed on such a medical and health service cloud platform.
Transformer: a neural network based on a self attention mechanism is widely applied to the fields of voice recognition, image recognition, natural language processing and the like. New output sequences are formed by adding one or more attention-based weights to the input sequence, weakening or forgetting content that does not conform to the attention model.
Normalization: mapping arrays with different value ranges onto the (0, 1) interval, which facilitates data processing. In some cases the normalized values can be used directly as probabilities.
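As an illustration only (the patent does not give a formula), a common way to map an array onto the (0, 1) interval is min-max normalization:

```python
import numpy as np

def min_max_normalize(x):
    """Map an array with an arbitrary value range onto the (0, 1) interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_normalize([10, 20, 50, 100]))  # [0.         0.11111111 0.44444444 1.        ]
```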
Random inactivation (Dropout): a method for optimizing artificial neural networks with a deep structure. During learning it randomly zeroes part of the weights or outputs of the hidden layers, which reduces the interdependence between nodes, regularizes the neural network, and reduces its structural risk. For example, during model training there is a vector (1, 2, 3, 4); after this vector is passed through a random inactivation layer, the layer can randomly set one of its numbers to 0, for example turning the 2 into 0, so that the vector becomes (1, 0, 3, 4).
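The behaviour described above can be sketched in a few lines. This is a generic illustration of random inactivation rather than code from the patent; unlike the fixed example above, the positions zeroed here are chosen at random, and the rescaling used by standard inverted dropout is omitted for simplicity.

```python
import numpy as np

def dropout(x, drop_prob=0.25, rng=np.random.default_rng()):
    """Randomly zero elements of x with probability drop_prob
    (training-time behaviour; no rescaling in this simplified sketch)."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

print(dropout(np.array([1.0, 2.0, 3.0, 4.0])))  # e.g. [1. 0. 3. 4.]
```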
Learning Rate: the learning rate governs how the model adjusts the network weights using the gradient of the loss function in gradient descent. If the learning rate is too large, the loss function may jump straight over the global optimum and the loss remains large; if the learning rate is too small, the loss function changes slowly, which greatly increases the convergence cost of the network and makes it easy to get stuck in a local minimum or saddle point.
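A one-line gradient-descent update makes the role of the learning rate concrete; this is a generic illustration, not part of the patent.

```python
def sgd_step(weights, gradients, learning_rate=0.01):
    """Basic gradient-descent update: a large learning_rate can overshoot the
    optimum, a small one slows convergence, as described above."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

print(sgd_step([0.5, -1.0], [0.2, -0.4]))  # [0.498, -0.996]
```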
Embedded Coding (Embedded Coding): embedded coding expresses a correspondence mathematically: data in a space X is mapped to a space Y through a function F, where F is an injective function and the mapping preserves structure. Injective means that each mapped value corresponds uniquely to a value before mapping; structure preservation means that the order relation of the data is the same before and after mapping. For example, suppose data X1 and X2 exist before mapping and are mapped to Y1 and Y2 respectively; if X1 > X2 before mapping, then correspondingly Y1 > Y2 after mapping. For words, mapping them into another space in this way facilitates subsequent machine learning and processing.
Attention weight: represents the importance of certain data in the training or prediction process, where importance indicates how strongly the input data affects the output data. Data of high importance has a high attention weight; data of low importance has a low attention weight. The importance of data differs from scene to scene, and the process of training the model's attention weights is the process of determining the importance of the data.
Alternatively, the computer device provided in the embodiments of the present application may be implemented as a terminal or a server, and an implementation environment formed by the terminal and the server is described below.
Fig. 1 is a schematic diagram of an implementation environment of an image recognition method according to an embodiment of the present application, and referring to fig. 1, the implementation environment may include a terminal 110 and a server 140.
The terminal 110 is connected to the server 140 through a wireless network or a wired network. Optionally, the terminal 110 is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart watch, etc., but is not limited thereto. The terminal 110 is installed and operated with an application program supporting image recognition.
Optionally, the server 140 is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data and artificial intelligence platforms. In some embodiments, the user can upload the image to be recognized to the server 140 through the terminal 110, and the server 140 performs the image recognition method provided by these embodiments on the uploaded image. After the recognition is finished, the server 140 sends the recognition result to the terminal 110, and the user can view the recognition result of the image through the terminal 110.
Optionally, the terminal 110 generally refers to one of a plurality of terminals, and the embodiment of the present application is illustrated by the terminal 110.
Those skilled in the art will appreciate that the number of terminals 110 may be greater or fewer. For example, only one terminal 110 is provided, or tens or hundreds of terminals 110 are provided, or more, and other terminals are also included in the implementation environment. The number of terminals and the type of the device are not limited in the embodiments of the present application.
After the implementation environment of the image recognition method provided by the embodiment of the present application is introduced, an application scenario of the image recognition method provided by the embodiment of the present application is described below. It should be noted that the computer device in the following description may be implemented as the terminal 110 or the server 140 in the foregoing implementation environment, which is not limited in this embodiment of the application.
The image identification method provided by the embodiment of the application can be applied to various image identification scenes. For example, the present invention is applicable to a scene for identifying a cell image, a scene for identifying a material image, and the like, and the present invention is not limited thereto.
In a scenario of recognizing cell images, the method can be used, for example, to determine whether positive cells exist in a cervical cell image, to determine the type of the positive cells in the image, or to determine whether the image as a whole is positive; the embodiments of the application do not limit the specific task. Here a positive cell refers to a cell in which a lesion exists, that is, a diseased cell. Because recognizing cervical cell images requires considerable experience, and such recognition is the basis of effective treatment, medical personnel with limited experience can use the image recognition method provided by the embodiments of the application to improve the accuracy of recognizing cervical cell images. When a cervical cell image needs to be recognized, the medical staff uploads it to the server through the terminal, and the server performs the image recognition method provided by the embodiments of the application on the received image to determine the disease corresponding to the image. The server returns the corresponding disease to the terminal, and the terminal displays the recognition result to the medical staff; the result serves as a reference for diagnosing the disease and thus assists the medical staff in formulating a treatment plan.
It should be noted that the above description takes the recognition of cervical cell images as an example; in other possible embodiments, the image recognition method provided by the embodiments of the application can also be applied to recognizing other types of cell images, for example liver cell images or kidney cell images, which is not limited by the embodiments of the application.
In a scenario of recognizing material images, the method can be applied, for example, to identifying twins in a material. Twins may form when a material is deformed or annealed, and they differ from normal crystals. The twin boundary is a special low-energy interface, and some studies have shown that increasing the number of twins in a metallic material can improve its electrical conductivity. For this reason, some researchers have been working on methods of increasing the number of twins in metallic materials. When researchers treat a metallic material with different processes in the hope of adding twins to it, they need to photograph the treated material to obtain a material image; in some embodiments, the researchers photograph the metallic material with a Scanning Electron Microscope (SEM) to obtain the image. Identifying twins in a material image also requires a certain amount of experience, and some twins are difficult to recognize with the naked eye because they are small, so the researcher can upload the material image to the server through the terminal and have the server recognize it. The server returns the recognition result for the material to the terminal, the terminal displays it to the user, and the user can promptly find twins that are difficult to recognize with the naked eye.
In addition, the above takes identifying twins in a material image as an example; in other possible embodiments, the image recognition method provided by the embodiments of the application can also be applied to identifying other types of crystals, or to identifying dislocations, in a material image, which is not limited by the embodiments of the application.
In the embodiment of the present application, a computer device can implement the image recognition method provided in the embodiment of the present application by using an image recognition model, and a structure of the image recognition model provided in the embodiment of the present application will be described below with reference to fig. 2.
Referring to fig. 2, the image recognition model 200 includes two parts, the first part is an object feature set obtaining unit 201, and the second part is an image recognition unit 202. The object feature set obtaining unit 201 is configured to obtain an object feature set, where the object feature set includes a plurality of target object features and a plurality of reference object features, and the target object and the reference object are different types of objects. The image recognition unit 202 is used to determine the category of the image.
In some embodiments, the object feature set obtaining unit 201 includes a feature map extraction layer 2011, an object feature extraction layer 2012, an object type determination layer 2013, and an object feature set obtaining layer 2014. The image recognition unit 202 includes an image feature acquisition layer 2021 and an image classification layer 2022.
The feature map extraction layer 2011 is configured to extract a feature map of an image.
The object feature extraction layer 2012 is used to obtain object features of a plurality of objects in the image based on the feature map of the image.
The object type determination layer 2013 is configured to determine types of the plurality of objects based on object features of the plurality of objects.
The object feature set acquisition layer 2014 is used for acquiring an object feature set including a plurality of target object features and a plurality of reference object features.
The image feature acquisition layer 2021 is configured to acquire the image features of the image based on the object feature set.
The image classification layer 2022 is used to determine the category of the image based on the image features of the image.
Having described the implementation environment, the application scenarios and the structure of the image recognition model, the image recognition method provided by the embodiments of the present application is described below. In the embodiments of the present application, either the server or the terminal can serve as the execution subject of the image recognition method, or the method can be implemented through interaction between the terminal and the server, where the terminal is the terminal 110 and the server is the server 140 in the implementation environment described above. In the interactive case, the terminal sends a sample data set to the server, the server trains the image recognition model and returns the trained model to the terminal, and the terminal performs image recognition based on the model. The embodiments of the present application do not limit the execution subject. In the following description the execution subject is taken to be the server as an example.
fig. 3 is a flowchart of an image recognition method provided in an embodiment of the present application, and referring to fig. 3, the method includes:
301. the server acquires an image to be recognized, the image including a plurality of objects.
In different scenarios the image is a different type of image. For example, in the scenario of recognizing cervical cell images, the image is a cervical cell image and the objects are cervical cells; in the scenario of recognizing material images, the image is a material image and the objects are crystals in the material.
302. The server extracts the features of the image to obtain a plurality of object features, wherein one object feature corresponds to one object.
The process of extracting the features of the image is a process of abstracting an object in the image into a feature vector, and the feature vector can represent the features of the object.
303. The server determines a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object characteristics, wherein the target objects and the reference objects are different types of objects.
In different scenarios the target objects and the reference objects have different meanings. For example, in the scenario of recognizing cervical cell images, the target objects are positive cells and the reference objects are negative cells; in the scenario of recognizing material images, the target objects are twins and the reference objects are ordinary crystals.
304. The server fuses the multiple target object features and the multiple reference object features to obtain image features of the image, wherein the target object features are object features corresponding to the target object, and the reference object features are object features corresponding to the reference object.
The image characteristics can reflect the characteristics of the image.
305. The server determines a category of the image based on the image features.
For a cervical cell image, the category is the disease category corresponding to the image. For a material image, the category is, for example, whether the image includes twins; this is not limited by the embodiments of the application.
According to the technical solution provided by the embodiments of the application, when the category of an image is determined, the influence of different objects on the classification result is taken into account: the influence of the target objects and of the reference objects on the classification result is combined, and the image features of the image are determined based on the features of the target objects and the features of the reference objects. The resulting image features therefore have stronger expressive power, and a more accurate result can be obtained when the category of the image is determined based on these image features.
It should be noted that steps 301-305 above are only a brief description of the image recognition method provided by the embodiments of the application; the method is described in more detail below with reference to some examples.
fig. 4 is a flowchart of an image recognition method provided in an embodiment of the present application, and referring to fig. 4, the method includes:
401. the server acquires an image to be recognized, the image including a plurality of objects.
In one possible implementation mode, the server acquires the image to be identified uploaded by the terminal. The server is also the server 140 in the above implementation environment, and the terminal is also the terminal 110 in the above implementation environment.
In the scenario of recognizing cell images, the image is a cell image, the objects are cells, and the terminal is a terminal deployed in a hospital. The terminal can connect to the server through a network, the server provides a cloud service for image recognition, and the user (medical staff) can use the cloud service through the terminal. In this case, after acquiring a cell image to be recognized in the hospital, for example a cervical cell image, the user can upload the cervical cell image to the server through the terminal, and the server performs the subsequent image recognition. In some embodiments, the cervical cell image is also referred to as a full-field digital slide image of the cervix (Whole Slide Image, WSI). No matter where the hospital is located, the image recognition service provided by the server can be used through the terminal, which widens the application range of the image recognition method provided by the embodiments of the application.

In the scenario of recognizing material images, the terminal is a terminal deployed in a laboratory. The terminal can establish a network connection with the server, the server provides a cloud service for image recognition, and the user (an experimenter) can use the cloud service through the terminal. In this case, after acquiring a material image to be recognized in the laboratory, for example a metallic material image, the user can upload the image to the server through the terminal, and the server performs the subsequent image recognition. Again, no matter where the laboratory is located, the image recognition service provided by the server can be used through the terminal, which widens the application range of the image recognition method provided by the embodiments of the application.
In this embodiment, the user can upload an image to be recognized to the server through the terminal, and the server recognizes the image.
For example, the terminal acquires an image to be identified through the shooting device, and sends an image identification request to the server, where the image identification request carries the image. The server acquires an image recognition request, and acquires the image from the image recognition request.
The above embodiments will be described below in different scenarios.
Scene 1: in a scene of recognizing cell images, for example cervical cell images, the shooting device is an Optical Microscope (OM). The user prepares the cervical cells by sedimentation or by a membrane method, where preparation refers to the process of loading the cervical cells onto a slide. The slide loaded with the cervical cells is placed on the stage of the OM, and the focus of the OM is adjusted through the terminal; while the focus is being adjusted, the terminal can display in real time the picture captured by the OM. When the focal distance has been adjusted appropriately, that is, when the image displayed on the terminal is sufficiently sharp, the user shoots the cervical cell image through the terminal. Of course, the user can also move the prepared cervical cells, so that the terminal can shoot images of the cervical cells at different positions, and the terminal can store the multiple captured cervical cell images for later use. The user can select among the stored cervical cell images through the terminal; the selected image is the image to be recognized. The user can then send an image recognition request carrying the cervical cell image to the server through the terminal. The server receives the image recognition request and obtains the cervical cell image from it.
For example, referring to fig. 5, an application for image recognition is running on the terminal, and the application is provided with an image recognition interface 501. An image selection control 502 is displayed on the image recognition interface 501. In response to the click operation on the image selection control 502, the terminal displays an image selection area 503 on the image recognition interface, and a plurality of cervical cell images shot by the OM device are displayed in the image selection area 503. In response to the selection operation of any cervical cell image, the terminal sends an image identification request to the server, wherein the image identification request carries the selected cervical cell image. The server acquires an image identification request, and acquires the cervical cell image from the image identification request.
Scene 2: in a scene of recognizing material images, the image is a material image and the objects are crystals in the material, for example a scene of recognizing a metallic material image and determining whether it contains twins. The shooting device is any one of a Scanning Electron Microscope (SEM), an Optical Microscope (OM) or a camera. Taking the shooting device as a scanning electron microscope as an example, for a batch of metallic material the user can control the scanning electron microscope through the terminal to photograph the material and obtain a plurality of metallic material images. The user can select among the metallic material images; the selected image is the image to be recognized. The user can then send an image recognition request carrying the metallic material image to the server through the terminal, the terminal here being the terminal connected to the scanning electron microscope. The server receives the image recognition request and obtains the metallic material image from it.
It should be noted that, in a scene in which a material image is recognized, an application program for image recognition as described in scene 1 may also be run on the terminal. The user can select an image to be recognized through the application.
In one possible embodiment, the server loads the image to be recognized from a correspondingly maintained database.
For example, a database maintained by the server correspondingly stores a plurality of images uploaded by the user, and the server loads the image to be identified from the plurality of images. For example, an application program for image recognition is run on the terminal, and a user can upload an image to a database correspondingly maintained by the server through the application program. In some embodiments, the application is capable of exposing a plurality of images stored in a database. When image recognition needs to be performed on an uploaded image, a user can select an image in the database through the application program, and the selected image is the image to be recognized. And responding to the selection operation of any image stored in the database, and sending an image acquisition request to the server by the terminal through the application program, wherein the image acquisition request carries the identifier of the selected image. In response to receiving the image acquisition request, the server acquires the identifier from the image acquisition request, queries in a correspondingly maintained database based on the identifier, and acquires an image corresponding to the identifier, wherein the image is the image to be identified.
402. The server extracts the features of the image to obtain a plurality of object features, wherein one object feature corresponds to one object.
In a possible implementation manner, the server performs convolution processing on the image to obtain a feature map corresponding to the image. The server determines a plurality of target areas on the feature map, one target area corresponding to one object. And the server performs pooling on the partial feature maps respectively corresponding to the target areas to obtain a plurality of object features.
In order to more clearly explain the above embodiment, the above embodiment is explained in three parts.
The first section describes a method of obtaining a feature map corresponding to the image by performing convolution processing on the image by the server.
In a possible implementation manner, the server inputs the image into the image recognition model, and performs convolution processing on the image through the image recognition model to obtain a feature map corresponding to the image. The image recognition model is also the image recognition model 200 in fig. 2.
For example, the server inputs the image into the object feature set obtaining unit 201 of the image recognition model 200, and performs convolution processing on the image through the feature map extraction layer 2011 of the object feature set obtaining unit 201 to obtain the feature map corresponding to the image. In some embodiments, the feature map extraction layer 2011 includes five convolution layers and two pooling layers: after the image is input into the feature map extraction layer 2011, the server performs convolution processing on the image through the five convolution layers to obtain a depth feature map of the image, and then performs either mean pooling or maximum pooling on the depth feature map through the two pooling layers to obtain the feature map of the image.
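As a rough illustration of such a feature map extraction layer, the PyTorch sketch below stacks five convolution layers followed by two pooling layers; the channel counts, kernel sizes and the use of max pooling are assumptions, since the passage does not specify them.

```python
import torch
import torch.nn as nn

class FeatureMapExtractor(nn.Module):
    """Five convolution layers then two pooling layers, in the spirit of layer 2011.
    All hyperparameters below are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )
        self.pools = nn.Sequential(nn.MaxPool2d(2), nn.MaxPool2d(2))

    def forward(self, image):
        depth_feature_map = self.convs(image)   # depth feature map of the image
        return self.pools(depth_feature_map)    # feature map of the image

feature_map = FeatureMapExtractor()(torch.randn(1, 3, 224, 224))
print(feature_map.shape)  # torch.Size([1, 256, 56, 56])
```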
The second section describes a method for determining a plurality of target areas on a feature map by a server.
In one possible implementation, the server divides the feature map into a plurality of candidate areas. And the server determines the probability of the plurality of candidate areas corresponding to the object based on the partial feature maps respectively corresponding to the plurality of candidate areas. And the server determines the candidate area with the probability greater than or equal to the third probability threshold as the target area.
For example, the server divides the image into a plurality of image blocks and determines the similarity between every two image blocks. The server merges every two image blocks whose similarity is greater than or equal to a similarity threshold into one target image block. After a number of iterations, the server has divided the image into a number of target image blocks. The server then maps the target image blocks on the image onto the feature map to obtain a plurality of candidate areas. The server pools the partial feature map corresponding to each candidate area to obtain a partial feature for each, where one partial feature corresponds to one candidate area and the pooling is either maximum pooling or mean pooling. The server normalizes the plurality of partial features to obtain the probability that each candidate area corresponds to an object. The server determines the candidate areas whose probability is greater than or equal to the third probability threshold as the target areas, where the third probability threshold is set by a technician according to the actual situation and is not limited by the embodiments of the application.
When determining the similarity between two image blocks, the server can determine it based on at least one of the color distance and the texture distance of the two blocks. The color distance refers to the similarity of the RGB channel values of the two image blocks, for example the difference between the mean values of their RGB channels: the greater the difference, the lower the similarity; the smaller the difference, the higher the similarity. Alternatively, it is the similarity between the color histograms of the RGB channels of the two image blocks, which is not limited by the embodiments of the application. The texture distance is the similarity between the texture histograms of the two image blocks, the texture histograms being obtained by the server by applying a fast scale-invariant feature transform to the image blocks. In addition, when determining the similarity based on at least one of the color distance and the texture distance, the server can assign a higher merging weight to smaller image blocks and a lower merging weight to already merged image blocks; the more times a block has been merged, the lower its merging weight. The merging weight indicates how easily image blocks are merged: the higher the merging weight, the more easily the blocks are merged, and the lower the merging weight, the less easily they are merged. During merging, the server can determine the product of the merging weights of two image blocks, and the two blocks can be merged only when the product is greater than or equal to a product threshold; otherwise they cannot be merged. The product threshold is set by a technician according to the actual situation and is not limited by the embodiments of the application. In this way, the situation in which a larger image block keeps "swallowing" smaller image blocks, leading to a poor division into target image blocks, can be avoided.
As for mapping the plurality of target image blocks on the image onto the feature map to obtain the plurality of candidate areas: since the size of the image may differ from the size of the feature map, the server can scale the image, that is, adjust its size to be the same as that of the feature map, so that the target image blocks on the image can be mapped onto the feature map. During the mapping, the server determines the coordinates of each target image block and, based on these coordinates, determines the candidate area corresponding to each target image block on the feature map. In some embodiments, the server can determine the candidate area corresponding to each target image block on the feature map based on the coordinates of the four vertices of each target image block.
As for normalizing the plurality of partial features to obtain the probability that the candidate areas correspond to the object: the server can normalize the plurality of partial features with a Softmax function to obtain, for each candidate area, the probability that it corresponds to an object.
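The scoring and selection of target areas described above can be sketched as follows; the max pooling choice, the two-logit projection used to produce an object probability, and the threshold value are all illustrative assumptions rather than details given in the patent.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_target_areas(candidate_feature_maps, third_prob_threshold=0.5):
    """candidate_feature_maps: list of (H_i, W_i, C) partial feature maps,
    one per candidate area. Returns indices of areas kept as target areas."""
    target_indices = []
    for i, fmap in enumerate(candidate_feature_maps):
        partial_feature = fmap.max(axis=(0, 1))        # max pooling over the area
        # Two-way logits (object vs. not-object); this projection is an assumed
        # stand-in for the trained layer that scores each candidate area.
        logits = np.array([partial_feature.mean(), -partial_feature.mean()])
        prob_object = softmax(logits)[0]               # normalized probability
        if prob_object >= third_prob_threshold:
            target_indices.append(i)
    return target_indices

areas = [np.random.default_rng(i).random((4, 4, 8)) for i in range(3)]
print(select_target_areas(areas))
```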
The third section describes the method in which the server pools the partial feature maps corresponding to the target areas to obtain the plurality of object features.
In a possible embodiment, the server performs full connection processing and maximum value pooling or mean value pooling on partial feature maps corresponding to a plurality of target areas, respectively, to obtain a plurality of object features, where one object feature corresponds to one target area, that is, to one object.
For example, referring to fig. 2, the server inputs the partial feature maps corresponding to the target regions into the object feature extraction layer 2012 of the object feature set obtaining unit 201, and performs full connection processing and maximum value pooling processing or mean value pooling processing on the partial feature maps corresponding to the target regions through the object feature extraction layer 2012 to obtain a plurality of object features.
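A minimal sketch of how one object feature might be computed from the partial feature map of a target area follows; the order of the full-connection and pooling steps and the weight values are assumptions for illustration.

```python
import numpy as np

def object_feature_from_area(area_feature_map, w_fc, use_max_pool=True):
    """area_feature_map: (H, W, C) slice of the feature map for one target area.
    Applies a fully connected projection to every spatial position, then
    max (or mean) pooling over the area -> one object feature vector."""
    h, w, c = area_feature_map.shape
    positions = area_feature_map.reshape(h * w, c)
    projected = positions @ w_fc                      # full-connection processing
    return projected.max(axis=0) if use_max_pool else projected.mean(axis=0)

rng = np.random.default_rng(0)
area = rng.random((6, 6, 16))                         # hypothetical target area
w_fc = rng.normal(size=(16, 32))                      # stand-in for trained weights
print(object_feature_from_area(area, w_fc).shape)     # (32,)
```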
403. The server determines a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object characteristics, wherein the target objects and the reference objects are different types of objects.
In different scenarios the target objects and the reference objects have different meanings. For example, in the scenario of recognizing cervical cell images, the target objects are positive cells and the reference objects are negative cells. In some embodiments, the positive cells include multiple types of cells, for example ASCUS (Atypical Squamous Cells of Undetermined Significance) cells, LSIL (Low-grade Squamous Intraepithelial Lesion) cells, HSIL (High-grade Squamous Intraepithelial Lesion) cells, and the like, while the negative cells are intermediate squamous epithelial cells, that is, normal cells. In the scenario of recognizing material images, the target objects are twins and the reference objects are ordinary crystals.
In one possible embodiment, the server determines the probability that the plurality of objects are the target object based on the plurality of object features. The server determines a plurality of objects having a probability greater than or equal to a first probability threshold as a plurality of target objects. The server determines a plurality of objects having probabilities less than a second probability threshold as a plurality of reference objects, the first probability threshold being greater than the second probability threshold. The first probability threshold and the second probability threshold are set by a technician according to an actual situation, which is not limited in the embodiment of the present application.
For example, the server performs full-connection processing and normalization processing on the object features to obtain the probability that each object is a target object. In some embodiments the server can represent this probability with a score whose value range is (0, 1): the higher the score corresponding to an object, the higher the probability, or confidence, that the object is a target object; the lower the score, the lower that probability or confidence. The server determines the objects whose probability is greater than or equal to the first probability threshold as the target objects, and the objects whose probability is less than the second probability threshold as the reference objects. If the server uses scores to represent the probabilities, the first probability threshold is also called the first score threshold and the second probability threshold is also called the second score threshold; in some embodiments the first score threshold is 0.8 and the second score threshold is 0.2.
For example, referring to fig. 2, the server inputs a plurality of object features into the object type determination layer 2013, and performs full connection processing and normalization processing on the plurality of object features through the object type determination layer 2013, thereby outputting the probability that the plurality of objects are target objects. The server determines a plurality of target objects and a plurality of reference objects from the plurality of objects based on the probabilities.
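The split into target objects and reference objects can be sketched like this; the scoring function below is a hypothetical stand-in for the trained object type determination layer 2013, and the 0.8 and 0.2 thresholds are the example values mentioned above.

```python
import numpy as np

def split_objects(object_features, score_fn, first_threshold=0.8, second_threshold=0.2):
    """Return (target_features, reference_features): objects whose score is
    >= first_threshold become target objects, objects whose score is
    < second_threshold become reference objects; the rest are discarded."""
    targets, references = [], []
    for feat in object_features:
        score = score_fn(feat)        # probability that this object is a target object
        if score >= first_threshold:
            targets.append(feat)
        elif score < second_threshold:
            references.append(feat)
    return targets, references

# Hypothetical scoring function standing in for the trained layer.
score_fn = lambda f: 1.0 / (1.0 + np.exp(-f.sum()))
feats = [np.array([2.0, 1.0]), np.array([-3.0, -1.0]), np.array([0.1, 0.2])]
targets, references = split_objects(feats, score_fn)
print(len(targets), len(references))  # 1 1
```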
404. The server fuses the multiple target object features and the multiple reference object features to obtain image features of the image, wherein the target object features are object features corresponding to the target object, and the reference object features are object features corresponding to the reference object.
In one possible implementation, the server encodes an object feature set based on an attention mechanism to obtain the image features of the image, where the object feature set includes the plurality of target object features and the plurality of reference object features. If the server represents the target object features by target object feature vectors and the reference object features by reference object feature vectors, the object feature set is the set of feature vectors consisting of the target object feature vectors and the reference object feature vectors. In some embodiments, the target object feature vectors and the reference object feature vectors have the same size, for example 1 × C, and the object feature set then has size N × C, where N is the total number of target object feature vectors and reference object feature vectors, C is the dimension of the feature vectors, and both N and C are positive integers.
For example, the server multiplies the set of object features by the first transformation matrix to obtain a query matrix of the image. And the server multiplies the object feature set by the second transformation matrix to obtain a key matrix of the image. And the server multiplies the query matrix by the transposed key matrix to obtain the attention matrix of the image. And the server multiplies the attention matrix and the object feature set to obtain the image features of the image.
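These four multiplications can be written out directly. The sketch below follows the passage literally (no scaling or softmax is applied to the attention matrix, since the passage does not mention them); the transformation matrices here are random placeholders for the weight matrices that would be learned when training the image recognition model.

```python
import numpy as np

def fuse_by_attention(object_feature_set, w_q, w_k):
    """object_feature_set: (N, C) matrix stacking the target object features and
    the reference object features.  Returns the image features of the image."""
    query = object_feature_set @ w_q            # query matrix of the image
    key = object_feature_set @ w_k              # key matrix of the image
    attention = query @ key.T                   # attention matrix (N, N)
    return attention @ object_feature_set       # image features of the image

rng = np.random.default_rng(0)
features = rng.random((5, 8))                   # e.g. 3 target + 2 reference features, C = 8
w_q, w_k = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
print(fuse_by_attention(features, w_q, w_k).shape)   # (5, 8)
```

In a Transformer-style implementation the attention matrix would usually also be scaled and passed through a softmax before the final multiplication, but the passage describes the plain product.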
In order to more clearly illustrate the above example, the following description will be divided into four parts.
The first section explains a method in which a server multiplies an object feature set by a first transformation matrix to obtain an image query matrix.
In a possible embodiment, the server represents the target object features and the reference object features in the form of vectors, and the object feature set is then a set of feature vectors, that is, an object feature matrix formed by the feature vectors. The server multiplies the object feature matrix by a first transformation matrix to obtain the query (Query) matrix of the image, where the first transformation matrix is a weight matrix obtained by model training, for example by training the image recognition model 200.
For example, referring to FIG. 2, the server can multiply the object feature set by the first transformation matrix through the image recognition unit 202 of the image recognition model 200, that is, through the image feature acquisition layer 2021, to obtain the query matrix of the image. For example, the server determines two target objects and one reference object from the plurality of objects, that is, obtains two target object feature vectors and one reference object feature vector. If the two target object feature vectors are (2, 1, 2) and (1, 3, 1), respectively, and the reference object feature vector is (1, 1, 2), the server can obtain the object feature matrix whose rows are (2, 1, 2), (1, 3, 1) and (1, 1, 2). The server multiplies this object feature matrix by the first transformation matrix W^Q to obtain the query matrix Q of the image.
The second section explains a method in which the server multiplies the object feature set by the second transformation matrix to obtain the key matrix of the image.
In a possible embodiment, the server represents the target object features and the reference object features in the form of vectors, and the object feature set is a feature vector set, or an object feature matrix formed by the feature vectors. The server multiplies the object feature matrix by a second transformation matrix to obtain a key (Key) matrix of the image, wherein the second transformation matrix is a weight matrix obtained by model training, for example, a weight matrix obtained by training the image recognition model 200.
For example, referring to fig. 2, the server can multiply the object feature set by the second transformation matrix through the image recognition unit 202 of the image recognition model 200, that is, through the image feature acquisition layer 2021, to obtain the key matrix of the image. For example, the server determines two target objects and one reference object from the plurality of objects, that is, obtains two target object feature vectors and one reference object feature vector. If the two target object feature vectors are (2, 1, 2) and (1, 3, 1), respectively, and the reference object feature vector is (1, 1, 2), the server can obtain the object feature matrix whose rows are (2, 1, 2), (1, 3, 1) and (1, 1, 2). The server multiplies this object feature matrix by the second transformation matrix W^K to obtain the key matrix K of the image.
The third section explains a method in which the server multiplies the query matrix by the transposed key matrix to obtain an attention matrix of the image.
In a possible implementation, the server obtains a transpose of the key matrix, and multiplies the query matrix by the transpose to obtain an attention matrix of the image.
For example, referring to fig. 2, the server can, through the image recognition unit 202 of the image recognition model 200, that is, through the image feature acquisition layer 2021, obtain the transpose K^T of the key matrix K of the image, and multiply the query matrix Q of the image by the transpose K^T to obtain the attention matrix of the image.
In some embodiments, the Attention matrix of the image is also referred to as a Self-Attention Map (Self Attention Map).
The fourth section explains a method in which the server multiplies the attention matrix by the object feature set to obtain the image features of the image.
In one possible embodiment, referring to fig. 2, the server can multiply the attention matrix by the object feature set through the image recognition unit 202 of the image recognition model 200, that is, through the image feature acquisition layer 2021, to obtain the image features of the image. For example, the server uses the object feature matrix to represent the object feature set, and multiplies the attention matrix of the image by the object feature matrix to obtain the image feature matrix of the image. The image feature matrix can also represent the image features of the image.
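Putting the four sections together, the following is a minimal NumPy sketch of the attention-based fusion, using the object feature matrix from the example above (rows (2, 1, 2), (1, 3, 1) and (1, 1, 2)). The numerical values of W^Q and W^K are illustrative assumptions, not the trained weight matrices of the image recognition model 200.

```python
import numpy as np

# Object feature set as a matrix: two target object feature vectors and one
# reference object feature vector stacked as rows (N = 3, C = 3).
X = np.array([[2.0, 1.0, 2.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])

# Transformation matrices (assumed values for illustration; in the patent they
# are weight matrices obtained by training the image recognition model 200).
W_Q = np.array([[1.0, 0.0, 0.5],
                [0.0, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
W_K = W_Q.copy()

Q = X @ W_Q          # query matrix of the image
K = X @ W_K          # key matrix of the image
A = Q @ K.T          # attention matrix (self-attention map)
F = A @ X            # image feature matrix, i.e. the image features

print(Q, K, A, F, sep="\n\n")
```

The four products correspond one-to-one to the four sections above: query matrix, key matrix, attention matrix, and image feature matrix.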
405. The server determines a category of the image based on the image features.
In a possible implementation manner, the server performs dimension reduction processing on the image features to obtain the image features after dimension reduction. The server performs normalization processing on the image features after dimension reduction to obtain the probabilities that the image belongs to different categories, and determines the category of the image based on the probabilities. In the scene of identifying cervical cell images, the category of the image is the severity of the cervical disease; in some embodiments, the severity can be represented by a grade, where grade one represents a lower severity and grade five represents a higher severity, and the server outputs the category of the image based on the image features, that is, the grade to which the cervical cell image belongs. In the scene of identifying material images, the category of the image is the number of twin crystals; in some embodiments, a grade can also be used to represent the number of twin crystals, where grade one represents a small number of twin crystals and grade five represents a large number of twin crystals, and the server outputs the category of the image based on the image features, that is, the grade to which the material image belongs.
The above embodiments are explained below by two examples.
Example 1: the server performs full connection processing on the image features to obtain the image features after dimension reduction, that is, the dimension reduction processing here is full connection processing. The server then performs normalization processing on the image features after dimension reduction to obtain a probability distribution list of the image belonging to different categories, where each numerical value in the probability distribution list represents the probability that the image corresponds to one category. The server determines the category of the image based on the probability distribution list.
For example, the server multiplies the image features by the full connection weight matrix to obtain the image features after dimension reduction. If the server uses the image feature matrix to represent the image features and the full connection weight matrix is (0.1, -0.1, 0)^T, the server multiplies the image feature matrix by the full connection weight matrix (0.1, -0.1, 0)^T to obtain the reduced-dimension image features (2, 1.5, 3.6)^T. The server normalizes the reduced-dimension image features (2, 1.5, 3.6)^T, that is, processes (2, 1.5, 3.6)^T with a Softmax function, to obtain the probability distribution list (0.28, 0.21, 0.51) of the image belonging to different categories. The server can determine the category corresponding to the largest value 0.51 in the probability distribution list (0.28, 0.21, 0.51) as the category of the image. In some embodiments, referring to fig. 2, the above process is implemented by the image classification layer 2022.
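A minimal sketch of Example 1 follows, with an assumed 3 × 3 image feature matrix chosen so that the full connection step reproduces the reduced-dimension features (2, 1.5, 3.6)^T; the matrix itself is illustrative, since the example's actual matrix is given only in the accompanying figure. The sketch shows the full connection step and two common normalizations (a literal Softmax and a simple sum normalization).

```python
import numpy as np

# Assumed 3 x 3 image feature matrix, chosen so that the full connection
# step below reproduces the reduced-dimension features (2, 1.5, 3.6)^T.
F = np.array([[20.0, 0.0, 11.0],
              [15.0, 0.0,  4.0],
              [36.0, 0.0, 20.0]])

# Dimension reduction by full connection: multiply by the weight matrix.
w_fc = np.array([0.1, -0.1, 0.0])
reduced = F @ w_fc                               # -> [2. , 1.5, 3.6]

# Normalization, variant A: Softmax over the reduced features.
softmax_probs = np.exp(reduced) / np.exp(reduced).sum()

# Normalization, variant B: divide each entry by the sum of the entries
# (this reproduces the distribution (0.28, 0.21, 0.51) quoted above).
sum_probs = reduced / reduced.sum()

# The category corresponding to the largest probability is the image category.
print(reduced, softmax_probs, sum_probs, int(np.argmax(sum_probs)))
```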
Example 2: the server performs pooling processing on the image features to obtain the image features after dimension reduction, that is, the dimension reduction processing here is pooling processing. The server then performs normalization processing on the image features after dimension reduction to obtain a probability distribution list of the image belonging to different categories, where each numerical value in the probability distribution list represents the probability that the image corresponds to one category. The server determines the category of the image based on the probability distribution list.
For example, the server performs maximum pooling on the image features to obtain the image features after dimension reduction. If the server uses the image feature matrix to represent the image features, the server performs maximum pooling on the image feature matrix, that is, takes the maximum value of each row of the image feature matrix, to obtain the reduced-dimension image features (1813, 1634, 3354)^T. The server normalizes the reduced-dimension image features (1813, 1634, 3354)^T, that is, processes (1813, 1634, 3354)^T, to obtain the probability distribution list (0.27, 0.24, 0.49) of the image belonging to different categories. The server can determine the category corresponding to the largest value 0.49 in the probability distribution list (0.27, 0.24, 0.49) as the category of the image. In some embodiments, referring to fig. 2, the above process is implemented by the image classification layer 2022.
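Similarly, a minimal sketch of Example 2, with an assumed image feature matrix whose row maxima are 1813, 1634 and 3354 as in the example; the normalization is written as a sum normalization, which yields roughly the distribution (0.27, 0.24, 0.49) quoted above.

```python
import numpy as np

# Assumed image feature matrix whose per-row maxima match the example.
F = np.array([[1813.0,  902.0,  131.0],
              [ 350.0, 1634.0,  780.0],
              [ 640.0,  270.0, 3354.0]])

# Dimension reduction by maximum pooling over each row of the matrix.
reduced = F.max(axis=1)                  # -> [1813., 1634., 3354.]

# Normalization of the pooled values into a probability distribution list
# (dividing by the sum gives roughly (0.27, 0.24, 0.49)).
probs = reduced / reduced.sum()

# The category corresponding to the largest value (about 0.49) is chosen.
print(reduced, probs, int(np.argmax(probs)))
```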
In addition, the above steps 401 to 405 will be described below with reference to fig. 6.
Referring to fig. 6, taking the application scenario of identifying a cervical cell image as an example, the server acquires a cervical cell image 601 to be identified, where the cervical cell image 601 includes a plurality of cervical cells (objects). The server determines a plurality of suspicious positive cells 602 (target objects) and a plurality of reference cells 603 (reference objects) from the cervical cell image 601. The server then classifies the cervical cell image 601 based on the attention mechanism by combining the plurality of suspicious positive cells and the plurality of reference cells, and obtains a classification result 604.
On the basis of fig. 6, further description is made in conjunction with fig. 2 and fig. 7.
Referring to fig. 2 and 7, part (a) of fig. 7 corresponds to the operations performed by the first part of the image recognition model 200, the object feature set acquisition unit 201, and part (b) of fig. 7 corresponds to the operations performed by the second part, the image recognition unit 202. After the server acquires the cervical cell image to be identified, the object feature set can be acquired by the object feature set acquisition unit 201. In some embodiments, the object feature set acquisition unit 201, also referred to as a suspicious cell detector, determines the suspicious positive cells and the reference cells and generates a cell feature set based on their features. When determining the suspicious positive cells and the reference cells from the cervical cell image, a score-based manner is adopted, that is, cells whose scores are higher than the first score threshold are determined as suspicious positive cells, and cells whose scores are lower than the second score threshold are determined as reference cells. After that, the server processes the cell feature set through the second part, the image recognition unit 202, that is, generates a query matrix q and a key matrix k of the cervical cell image based on the attention mechanism, and generates a self-attention map (Self Attention Map) based on the query matrix q and the key matrix k. The server multiplies the self-attention map by the cell feature set to obtain the image features of the cervical cell image, which are also referred to as full-slice features. The server then classifies based on the image features to obtain the category of the cervical cell image.
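To tie the pipeline of fig. 6 and fig. 7 together, the following is a minimal end-to-end sketch. The per-cell features, scores, transformation matrices and score thresholds are all illustrative assumptions; the real detector and weights come from the trained image recognition model 200, and the classification step mirrors Example 2 above.

```python
import numpy as np

def classify_slide(cell_features, cell_scores, w_q, w_k,
                   first_score_thresh=0.8, second_score_thresh=0.2):
    """Sketch of the fig. 6 / fig. 7 pipeline: the suspicious cell detector
    selects suspicious positive cells and reference cells by score, the
    attention step fuses their features into full-slice features, and the
    classifier turns those into a category. All weights are assumed."""
    # Suspicious cell detector (object feature set acquisition unit 201).
    positives = cell_features[cell_scores >= first_score_thresh]
    references = cell_features[cell_scores < second_score_thresh]
    feats = np.concatenate([positives, references], axis=0)   # N x C

    # Image recognition unit 202: query, key, self-attention map, fusion.
    q, k = feats @ w_q, feats @ w_k
    attention = q @ k.T
    slide_features = attention @ feats                        # full-slice features

    # Classification: max pooling per row, then normalization (cf. Example 2).
    reduced = slide_features.max(axis=1)
    probs = reduced / reduced.sum()
    return int(np.argmax(probs)), probs

# Toy usage: ten cells with C = 3 features each and fixed detector scores.
rng = np.random.default_rng(0)
features = np.abs(rng.normal(size=(10, 3)))
scores = np.array([0.95, 0.10, 0.50, 0.85, 0.05, 0.30, 0.90, 0.15, 0.60, 0.40])
w_q = w_k = np.eye(3)
print(classify_slide(features, scores, w_q, w_k))
```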
It should be noted that the steps 401 to 405 are described by taking the server as the execution subject as an example; in other possible embodiments, the steps 401 to 405 may also be executed by taking the terminal as the execution subject, which is not limited in this embodiment of the present application.
Optionally, after step 405, the server can also deliver the category of the image to the terminal, and the terminal displays the category of the image; alternatively, if steps 401 to 405 are executed by the terminal, the terminal can directly display the category of the image. The method for displaying the category of the image is described in step 406 below.
406. The terminal displays a classification page of the image, and the classification page displays the image, at least one target object, at least one reference object and the category of the image.
In one possible embodiment, the server determines the categories of the plurality of target objects based on the plurality of target object features. The server sends the image, the image of the at least one target object, the image of the at least one reference object, the category of the image and the category of the at least one target object to the terminal, and the terminal generates the classification page of the image based on them, where the classification page displays the image, the image of the at least one target object, the image of the at least one reference object, the category of the image and the category of the at least one target object. In the scene of cervical cell image recognition, positive cells of various categories and middle-layer reference cells can be displayed to the user through the classification page, so that when the user sees a positive slide, the user can quickly find the middle-layer reference cells and the various positive cells for interpretation, which reduces the workload of the user and improves the interpretation efficiency. In some embodiments, the classification page of the image is a page in a TBS (The Bethesda System) structured report.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
According to the technical scheme provided by the embodiment of the application, when the category of the image is determined, the influence of different objects on the image classification result is considered, namely the influence of the target object and the reference object on the image classification result is combined, the image characteristics of the image are determined based on the characteristics of the target object and the characteristics of the reference object, so that the obtained image characteristics have stronger expression capability, and when the category of the image is determined based on the image characteristics, a more accurate result can be obtained.
When the technical scheme provided by the embodiment of the application is applied to a scene of identifying cell images, for example, a scene of assisted negative screening for cervical cell images, doctors often face a large number of negative slides (about 80% to 90%) during actual slide reading, and the method can screen out a part of the negative slides on the basis of ensuring positive recall, thereby better assisting doctors in slide reading.
Fig. 8 is a schematic structural diagram of an image recognition apparatus provided in an embodiment of the present application, and referring to fig. 8, the apparatus includes: an image acquisition module 801, a feature extraction module 802, an object determination module 803, an image feature acquisition module 804, and a classification module 805.
An image obtaining module 801, configured to obtain an image to be identified, where the image includes a plurality of objects.
The feature extraction module 802 is configured to perform feature extraction on the image to obtain a plurality of object features, where one object feature corresponds to one object.
An object determining module 803, configured to determine, based on the plurality of object features, a plurality of target objects and a plurality of reference objects from the plurality of objects, where the target objects and the reference objects are different types of objects.
The image feature obtaining module 804 is configured to fuse the multiple target object features and the multiple reference object features to obtain image features of the image, where the target object features are object features corresponding to the target object, and the reference object features are object features corresponding to the reference object.
A classification module 805 configured to determine a category of the image based on the image feature.
In a possible implementation, the object determining module 803 is configured to determine probabilities that a plurality of objects are target objects based on a plurality of object features. A plurality of objects having a probability greater than or equal to a first probability threshold are determined as a plurality of target objects. Determining a plurality of objects having a probability less than a second probability threshold as a plurality of reference objects, the first probability threshold being greater than the second probability threshold.
In a possible implementation manner, the object determining module 803 is configured to perform full connection processing and normalization processing on the multiple object features to obtain probabilities that the multiple objects are target objects.
In a possible implementation manner, the image feature obtaining module 804 is configured to encode an object feature set based on an attention mechanism to obtain image features of an image, where the object feature set includes a plurality of target object features and a plurality of reference object features.
In a possible implementation manner, the image feature obtaining module 804 is configured to multiply the set of object features with the first transformation matrix to obtain a query matrix of the image. And multiplying the object feature set by the second transformation matrix to obtain a key matrix of the image. And multiplying the query matrix and the transposed key matrix to obtain the attention matrix of the image. And multiplying the attention matrix and the object feature set to obtain the image features of the image.
In a possible implementation manner, the classification module 805 is configured to perform a dimension reduction process on the image feature to obtain a dimension-reduced image feature. And carrying out normalization processing on the image features subjected to dimension reduction to obtain the probability that the images belong to different categories. Based on the probabilities, a category of the image is determined.
In a possible implementation manner, the feature extraction module 802 is configured to perform convolution processing on the image to obtain a feature map corresponding to the image. A plurality of target regions are determined on the feature map, one target region corresponding to one object. And performing pooling processing on the partial feature maps corresponding to the target areas to obtain a plurality of object features.
In a possible implementation manner, the feature extraction module 802 is configured to divide the feature map into a plurality of candidate regions. And determining the probability of the plurality of candidate areas corresponding to the object based on the partial feature maps respectively corresponding to the plurality of candidate areas. And determining the candidate area with the probability greater than or equal to the third probability threshold as the target area.
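For illustration, the following is a minimal sketch of the flow these two configurations describe: convolution into a feature map, scoring of grid candidate regions, keeping the regions whose probability of corresponding to an object passes a threshold, and pooling each kept region into an object feature. The kernel, the grid of candidate regions, the sigmoid scoring and the threshold value are all illustrative assumptions, not the patent's trained components.

```python
import numpy as np

def extract_object_features(image, conv_kernel, region_size=4, third_thresh=0.5):
    """Convolve the image into a feature map, score grid candidate regions,
    keep regions whose object probability passes the threshold, and pool
    each kept region into one object feature. All parameters are illustrative."""
    h, w = image.shape
    kh, kw = conv_kernel.shape
    # Plain valid convolution to obtain the feature map.
    fmap = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * conv_kernel)

    object_features = []
    for i in range(0, fmap.shape[0] - region_size + 1, region_size):
        for j in range(0, fmap.shape[1] - region_size + 1, region_size):
            region = fmap[i:i + region_size, j:j + region_size]
            # Score the candidate region from its partial feature map.
            prob = 1.0 / (1.0 + np.exp(-region.mean()))
            if prob >= third_thresh:                    # target region
                object_features.append(region.max(axis=0))  # pooled object feature
    return np.array(object_features)

# Toy usage with a random 16 x 16 single-channel image and a 3 x 3 kernel.
rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
kernel = np.ones((3, 3)) / 9.0
print(extract_object_features(img, kernel))
```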
In one possible embodiment, the apparatus further comprises:
the display module is used for displaying a classification page of the image, and the classification page is displayed with the image, at least one target object, at least one reference object and the category of the image.
In a possible implementation, the classification module 805 is further configured to determine a category of the target objects based on the target object features.
And the display module is also used for displaying the category of the at least one target object on the classification page.
According to the technical scheme provided by the embodiment of the application, when the category of the image is determined, the influence of different objects on the image classification result is considered, namely the influence of the target object and the reference object on the image classification result is combined, the image characteristics of the image are determined based on the characteristics of the target object and the characteristics of the reference object, so that the obtained image characteristics have stronger expression capability, and when the category of the image is determined based on the image characteristics, a more accurate result can be obtained.
An embodiment of the present application provides a computer device, configured to perform the foregoing method, where the computer device may be implemented as a terminal or a server, and a structure of the terminal is introduced below:
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 900 may be: a smartphone, a tablet, a laptop, or a desktop computer. Terminal 900 may also be referred to by other names such as user equipment, portable terminals, laptop terminals, desktop terminals, and the like.
In general, terminal 900 includes: one or more processors 901 and one or more memories 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 902 is used to store at least one computer program for execution by the processor 901 to implement the image recognition methods provided by the method embodiments herein.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over the surface of the display screen 905. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard.
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication.
The positioning component 908 is used to locate the current geographic Location of the terminal 900 for navigation or LBS (Location Based Service).
Power supply 909 is used to provide power to the various components in terminal 900. The power source 909 may be alternating current, direct current, disposable or rechargeable.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user with respect to the terminal 900.
The pressure sensor 913 may be disposed on a side bezel of the terminal 900 and/or underneath the display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the display screen 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 905.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915.
The proximity sensor 916 is used to collect the distance between the user and the front face of the terminal 900.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
The computer device may also be implemented as a server, and the following describes a structure of the server:
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1000 may vary greatly in configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the one or more memories 1002 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1001 to implement the methods provided by the foregoing method embodiments. Of course, the server 1000 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 1000 may also include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including a computer program, executable by a processor, is also provided to perform the image recognition method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which includes program code stored in a computer-readable storage medium, which is read by a processor of a computer apparatus from the computer-readable storage medium, and which is executed by the processor to cause the computer apparatus to execute the above-described image recognition method.
In some embodiments, the computer program according to the embodiments of the present application may be deployed to be executed on one computer device or on multiple computer devices located at one site, or may be executed on multiple computer devices distributed at multiple sites and interconnected by a communication network, and the multiple computer devices distributed at the multiple sites and interconnected by the communication network may constitute a block chain system.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An image recognition method, characterized in that the method comprises:
acquiring an image to be recognized, wherein the image comprises a plurality of objects;
performing feature extraction on the image to obtain a plurality of object features, wherein one object feature corresponds to one object;
determining a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object features, wherein the target objects and the reference objects are different classes of objects;
fusing a plurality of target object features and a plurality of reference object features to obtain image features of the image, wherein the target object features are object features corresponding to the target object, and the reference object features are object features corresponding to the reference object;
based on the image features, a category of the image is determined.
2. The method of claim 1, wherein determining a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object features comprises:
determining probabilities that the plurality of objects are the target object based on the plurality of object features;
determining a plurality of objects having a probability greater than or equal to a first probability threshold as the plurality of target objects;
determining a plurality of objects having probabilities less than a second probability threshold as the plurality of reference objects, the first probability threshold being greater than the second probability threshold.
3. The method of claim 2, wherein determining the probability that the plurality of objects are the target object based on the plurality of object features comprises:
and carrying out full connection processing and normalization processing on the object characteristics to obtain the probability that the objects are the target objects.
4. The method of claim 1, wherein fusing the plurality of target object features and the plurality of reference object features to obtain the image features of the image comprises:
and encoding an object feature set based on an attention mechanism to obtain image features of the image, wherein the object feature set comprises the plurality of target object features and the plurality of reference object features.
5. The method of claim 4, wherein the encoding a set of object features based on an attention mechanism to obtain image features of the image comprises:
multiplying the object feature set by a first transformation matrix to obtain a query matrix of the image;
multiplying the object feature set by a second transformation matrix to obtain a key matrix of the image;
multiplying the query matrix by the transposed key matrix to obtain an attention matrix of the image;
and multiplying the attention matrix and the object feature set to obtain the image features of the image.
6. The method of claim 1, wherein the determining the category of the image based on the image feature comprises:
performing dimension reduction processing on the image features to obtain the image features after dimension reduction;
carrying out normalization processing on the image features subjected to dimension reduction to obtain the probability that the images belong to different categories;
based on the probability, a category of the image is determined.
7. The method of claim 1, wherein the extracting the features of the image to obtain a plurality of object features comprises:
performing convolution processing on the image to obtain a feature map corresponding to the image;
determining a plurality of target regions on the feature map, one of the target regions corresponding to one of the objects;
and performing pooling processing on the partial feature maps corresponding to the target areas to obtain the object features.
8. The method of claim 7, wherein the determining a plurality of target regions on the feature map comprises:
dividing the feature map into a plurality of candidate areas;
determining the probability that the plurality of candidate areas correspond to the object based on the partial feature maps corresponding to the plurality of candidate areas respectively;
and determining the candidate area with the probability greater than or equal to a third probability threshold as the target area.
9. The method of claim 1, wherein after determining the category of the image based on the image feature, the method further comprises:
displaying a classification page of the image, on which the image, at least one target object, at least one reference object, and a category of the image are displayed.
10. The method of claim 9, wherein prior to displaying the classified page of images, the method further comprises:
determining categories of the plurality of target objects based on the plurality of target object features;
the displaying the classification page of the image includes:
displaying the category of the at least one target object on the classification page.
11. An image recognition apparatus, characterized in that the apparatus comprises:
the device comprises an image acquisition module, a recognition module and a recognition module, wherein the image acquisition module is used for acquiring an image to be recognized, and the image comprises a plurality of objects;
the characteristic extraction module is used for extracting the characteristics of the image to obtain a plurality of object characteristics, and one object characteristic corresponds to one object;
an object determination module, configured to determine a plurality of target objects and a plurality of reference objects from the plurality of objects based on the plurality of object features, where the target objects and the reference objects are different classes of objects;
the image characteristic acquisition module is used for fusing a plurality of target object characteristics and a plurality of reference object characteristics to obtain image characteristics of the image, wherein the target object characteristics are object characteristics corresponding to the target object, and the reference object characteristics are object characteristics corresponding to the reference object;
a classification module to determine a category of the image based on the image feature.
12. The apparatus of claim 11, wherein the object determination module is configured to determine probabilities of the plurality of objects being the target object based on the plurality of object features; determining a plurality of objects having a probability greater than or equal to a first probability threshold as the plurality of target objects; determining a plurality of objects having probabilities less than a second probability threshold as the plurality of reference objects, the first probability threshold being greater than the second probability threshold.
13. The apparatus of claim 12, wherein the object determining module is configured to perform full-join processing and normalization processing on the object features to obtain probabilities that the objects are the target objects.
14. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, in which at least one computer program is stored, which is loaded and executed by the one or more processors to implement the image recognition method according to any one of claims 1 to 10.
15. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the image recognition method according to any one of claims 1 to 10.
CN202110661352.1A 2021-06-15 2021-06-15 Image recognition method, device, equipment and storage medium Active CN113705323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110661352.1A CN113705323B (en) 2021-06-15 2021-06-15 Image recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110661352.1A CN113705323B (en) 2021-06-15 2021-06-15 Image recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113705323A true CN113705323A (en) 2021-11-26
CN113705323B CN113705323B (en) 2022-09-09

Family

ID=78648087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110661352.1A Active CN113705323B (en) 2021-06-15 2021-06-15 Image recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113705323B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110192206A (en) * 2017-05-23 2019-08-30 谷歌有限责任公司 Sequence based on attention converts neural network
US20210064924A1 (en) * 2017-10-27 2021-03-04 Google Llc Attention-based image generation neural networks
CN112926731A (en) * 2019-12-06 2021-06-08 三星电子株式会社 Apparatus and method for performing matrix multiplication operations for neural networks
CN111597922A (en) * 2020-04-28 2020-08-28 腾讯科技(深圳)有限公司 Cell image recognition method, system, device, equipment and medium

Also Published As

Publication number Publication date
CN113705323B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
CN111325726A (en) Model training method, image processing method, device, equipment and storage medium
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN111476306A (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN111476783B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN113610750B (en) Object identification method, device, computer equipment and storage medium
CN111931877B (en) Target detection method, device, equipment and storage medium
CN111243668B (en) Method and device for detecting molecule binding site, electronic device and storage medium
CN111598168B (en) Image classification method, device, computer equipment and medium
CN111930964B (en) Content processing method, device, equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN112733970B (en) Image classification model processing method, image classification method and device
CN111091166A (en) Image processing model training method, image processing device, and storage medium
CN111192262A (en) Product defect classification method, device, equipment and medium based on artificial intelligence
CN114332530A (en) Image classification method and device, computer equipment and storage medium
CN112749728A (en) Student model training method and device, computer equipment and storage medium
CN113505256B (en) Feature extraction network training method, image processing method and device
CN110675412A (en) Image segmentation method, training method, device and equipment of image segmentation model
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN111984803B (en) Multimedia resource processing method and device, computer equipment and storage medium
CN113724189A (en) Image processing method, device, equipment and storage medium
CN113569042A (en) Text information classification method and device, computer equipment and storage medium
CN112037305A (en) Method, device and storage medium for reconstructing tree-like organization in image
CN113705323B (en) Image recognition method, device, equipment and storage medium
CN113743186B (en) Medical image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20211118

Address after: 518000 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong

Applicant after: Tencent Medical Health (Shenzhen) Co.,Ltd.

Address before: 518057 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 floors

Applicant before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

TA01 Transfer of patent application right
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant