CN114519401A - Image classification method and device, electronic equipment and storage medium

Image classification method and device, electronic equipment and storage medium

Info

Publication number
CN114519401A
Authority
CN
China
Prior art keywords
target
key point
information
corrected
point information
Prior art date
Legal status
Pending
Application number
CN202210163242.7A
Other languages
Chinese (zh)
Inventor
周细文
陈筱
熊文硕
曾凡涛
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210163242.7A
Publication of CN114519401A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

An image classification method and device, an electronic device and a storage medium. The method comprises: acquiring a target image and identifying first key point information from the target image; correcting the first key point information multiple times according to a preset rule to obtain a key point information set, where the set comprises the target key point information of each correction; inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction; inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information; and classifying the target image according to the target feature information to obtain an image classification result. The method continuously corrects the key point information, preventing errors caused by unbalanced data distribution from accumulating, and uses the feature weight maps to guide the second preset model to learn more effective and interpretable target feature information from the target image, improving the reliability of image classification.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an image classification method and device, electronic equipment and a storage medium.
Background
In recent years, image classification methods based on computer vision have been widely used in fields such as disease prediction, risk prediction, and abnormality prediction. Existing image classification methods usually train a deep model on positive and negative image samples with a balanced distribution; when the model is deep enough, wide enough, and has enough channels per layer, it can fit the sample distribution effectively and achieve good classification results. In practice, however, when image samples are scarce and unevenly distributed, the feature differences between positive and negative samples are hard to distinguish, so the existing deep model is easily disturbed by noise when identifying image features; errors accumulate and affect subsequent training, reducing the reliability of image classification.
Disclosure of Invention
The application provides an image classification method and device, an electronic device and a storage medium, and mainly aims to improve the reliability of image classification.
In order to achieve the above object, an embodiment of the present application provides an image classification method, where the method includes:
acquiring a target image, and identifying first key point information from the target image;
correcting the first key point information multiple times according to a preset rule to obtain a key point information set, wherein the key point information set comprises the target key point information of each correction;
inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction;
inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information;
and classifying the target image according to the target feature information to obtain an image classification result.
In order to achieve the above object, an embodiment of the present application further provides an image classification apparatus, including:
the acquisition module is used for acquiring a target image;
the identification module is used for identifying first key point information in the target image;
the correction module is used for correcting the first key point information multiple times according to a preset rule to obtain a key point information set, wherein the key point information set comprises the target key point information of each correction;
the generating module is used for inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction;
the extraction module is used for inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information;
and the classification module is used for classifying the target image according to the target feature information to obtain an image classification result.
In order to achieve the above object, an embodiment of the present application further provides an electronic device including a memory and a processor, wherein the memory stores a program that, when executed by the processor, implements the steps of the foregoing method.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the foregoing method.
With the image classification method and device, electronic equipment, and storage medium of the present application, first key point information can be identified from the acquired target image and corrected multiple times according to a preset rule to obtain the target key point information of each correction, so that errors in the key point information are continuously corrected and errors caused by unbalanced data distribution do not accumulate and disturb the subsequent feature learning process. On this basis, a feature weight map is generated for each correction from the target key point information, and the target image and each corrected feature weight map are input into the second preset model for feature extraction; the feature weight maps guide the second preset model to identify highly discriminative key regions in the target image and to learn more effective and interpretable target feature information from those regions, thereby improving the reliability of classifying the target image according to the target feature information.
Drawings
Fig. 1 is a block diagram of an electronic device to which an embodiment of the present application is applied;
Fig. 2 is a schematic flowchart of an image classification method according to an embodiment of the present application;
Fig. 3 is a detailed flowchart of step S210 in Fig. 2;
Fig. 4 is a schematic diagram of a system to which embodiments of the present application are applied;
Fig. 5 is a block diagram of an image classification apparatus according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" are used to denote elements only for convenience of description and have no specific meaning by themselves; "module", "component", and "unit" may therefore be used interchangeably.
The embodiments of the application can acquire and process relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology, and application system of using digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. AI software technology mainly covers computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
Computer vision, an important branch of artificial intelligence, uses machines to understand and analyze images. In recent years, image classification methods based on computer vision have been widely used in fields such as disease prediction, risk prediction, and abnormality prediction. Existing methods usually train a deep model on positive and negative image samples with a balanced distribution; when the model is deep enough, wide enough, and has enough channels per layer, it can fit the sample distribution effectively and achieve good classification results. In practice, however, when image samples are scarce and unevenly distributed, the feature differences between positive and negative samples are hard to distinguish, and the model's depth, width, and channel counts cannot be balanced effectively without overfitting. As a result, the existing deep model is easily disturbed by noise when identifying image features; errors accumulate and affect subsequent training, reducing the reliability of image classification.
In order to solve the above problem, the present application provides an image classification method applied to an electronic device. Referring to Fig. 1, Fig. 1 is a block diagram of an electronic device to which an embodiment of the present application is applied.
In this embodiment, the electronic device may be any device with computing capability, such as a server, smartphone, tablet computer, laptop, or desktop computer.
The electronic device includes: memory 11, processor 12, network interface 13, and data bus 14.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device, such as its hard disk. In other embodiments, it may be an external memory of the electronic device, such as a plug-in hard disk, SmartMedia card (SMC), Secure Digital (SD) card, or flash memory card (Flash Card).
In the present embodiment, the readable storage medium of the memory 11 is generally used for storing an image classification program installed in the electronic device, a plurality of sample sets, a pre-trained model, and the like. The memory 11 may also be used to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a central processing unit (CPU), microprocessor, or other data processing chip, executes the program code stored in the memory 11 or processes data, for example running the image classification program.
The network interface 13 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the electronic device and other electronic devices.
The data bus 14 is used to enable connection communication between these components.
Optionally, the electronic device may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition capability, and a voice output device such as a speaker or headset; optionally, it may also include a standard wired or wireless interface.
Optionally, the electronic device may further include a display, which may also be called a display screen or display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display is used to show information processed in the electronic device and to present a visual user interface.
Optionally, the electronic device further includes a touch sensor. The area the touch sensor provides for the user's touch operations is called the touch area. The touch sensor may be a resistive or capacitive touch sensor, and may be a contact sensor or a proximity sensor. It may be a single sensor, or several sensors arranged, for example, in an array.
In addition, the area of the display may be the same as or different from that of the touch sensor. Optionally, the display is stacked with the touch sensor to form a touch display screen, through which the device detects user-triggered touch operations.
The following describes an image classification method disclosed in the embodiments of the present application in detail.
As shown in Fig. 2, Fig. 2 is a schematic flowchart of an image classification method according to an embodiment of the present application. Based on the electronic device embodiment shown in Fig. 1, the processor 12 implements the following steps when executing the program stored in the memory 11:
step S200: and acquiring a target image, and identifying first key point information from the target image.
The method and the device can be applied to various image recognition and classification scenarios, such as face recognition, traffic control systems, satellite image object localization, pedestrian detection, and medical image processing. The target image is an image containing at least one recognizable object, which may be any object with particular properties (shape, gray scale, texture, and the like), such as a face, a human body, a pathological target in a medical image (for example a tumor) or tissue, or a vehicle in a traffic detection scenario. For example, if the target image is a face image, the face in it is the recognizable object.
In this embodiment of the application, the first key point information comprises a plurality of key points identified from the target image and their coordinates in the target image. The key point types depend on the type of the target image; for a face image, for example, the key points include, but are not limited to, the eye corners, nose tip, mouth corners, face contour, and eyebrows. Specifically, the first key point information may be identified from the target image using an existing machine learning library, for example the Dlib library, with a face detection model such as a Histogram of Oriented Gradients (HOG) feature model or a Convolutional Neural Network (CNN) model identifying 68 face key points from the face image.
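As an illustration of this step, the following is a minimal Python sketch of 68-point face key point extraction with the Dlib library mentioned above. The HOG-based detector and the shape_predictor_68_face_landmarks.dat model file are part of Dlib's standard distribution; the helper name first_keypoint_info is ours.

    import cv2
    import dlib

    # Dlib's HOG-based face detector and its standard 68-point shape
    # predictor (the .dat model file is distributed with Dlib).
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def first_keypoint_info(image_path):
        """Return (x, y) coordinates of the 68 face key points, or [] if no face."""
        gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)  # upsample once to catch small faces
        if not faces:
            return []
        shape = predictor(gray, faces[0])
        return [(shape.part(k).x, shape.part(k).y) for k in range(68)]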
Step S210: correcting the first key point information multiple times according to a preset rule to obtain a key point information set, wherein the key point information set comprises the target key point information of each correction.
In this embodiment, the first key point information may specifically be input into a correction model and corrected multiple times to obtain the key point information set. The correction model may be a neural network model (such as a CNN, recurrent neural network, long short-term memory network, or gated recurrent unit network), a Bayesian model, or an autoencoder deep learning model, without particular limitation. In one implementation, a number of key point samples and annotation data for correcting them may be collected, where a key point sample is a set of key points identified from a sample image, including abnormal key points with problems such as position deviation, abnormal values (for example, values that are obviously too large or too small), or an unspecified target key point type, and the annotation data represents the correction results for the abnormal key points. On this basis, a pre-built correction model is trained with the key point samples, and the accuracy of the trained model is verified against the annotation data. If the accuracy is below a preset threshold, the number of key point samples is increased and the training step is repeated; once the accuracy reaches the preset threshold, training ends.
Referring to Fig. 3, Fig. 3 is a detailed flowchart of one embodiment of step S210 in Fig. 2. As shown in Fig. 3, step S210 includes, but is not limited to, the following steps S211 to S214.
Step S211: acquiring classification task information of the target image, and determining the target key point types according to the classification task information.
The classification task information indicates the classification task for the target image. The task may be specified manually or determined from the image type: for a face image, face recognition or disease diagnosis; for a medical image, lesion identification; for a human body image, posture recognition; and so on. Different classification tasks require different target key point types, which can be set and adjusted manually. For example, in a face recognition task the target key point types include the facial-feature key points; in a lesion identification task they include lesion contour points, and so on.
Step S212: adjusting the first key point information according to the target image and the target key point types to obtain second key point information satisfying the target key point types.
In practice, for different types of target images, a generic key point identification method (for example, training a generic key point identification model or directly calling an existing one) may first be used to identify the first key point information. Understandably, the first key point information does not necessarily cover all the target key point types actually required. Therefore, in step S212, the missing key point types may be determined from the target key point types and the first key point information; then, based on the first key point information and the missing types, the key points of the missing types and their coordinates are further located in the target image and combined with the first key point information to form the second key point information. This adaptively adjusts the first key point information and ensures full coverage of the target key point types required by the classification task.
For example, suppose the classification task is disease diagnosis and the target image is a face image. The first key point information includes 68 specific face key points, such as lower-lip, face-contour, and eyebrow key points, but not key points related to the disease condition such as the forehead, under-eye bags, chin, and the area between the brows. The first key point information can therefore be refined by locating the chin key point and its coordinates in the face image from the lower-lip and face-contour key points, locating the between-brow key point and its coordinates from the left and right eyebrow key points, and so on.
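A minimal sketch of this kind of adaptive adjustment follows, assuming the missing chin and between-brow points can be derived geometrically from already-detected points. The index parameters and the midpoint heuristics are illustrative assumptions, not the patent's prescribed computation.

    import numpy as np

    def extend_keypoints(points, lip_idx, contour_idx, lbrow_idx, rbrow_idx):
        """Hypothetical adaptation step: derive key points of missing types
        (chin, between-brow) from already-detected points via midpoints."""
        pts = np.asarray(points, dtype=np.float32)
        chin = (pts[lip_idx] + pts[contour_idx]) / 2.0         # lower lip + jaw contour
        brow_center = (pts[lbrow_idx] + pts[rbrow_idx]) / 2.0  # left + right eyebrow
        return np.vstack([pts, chin, brow_center])             # second key point information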
Step S213: determining the first corrected target key point information according to the second key point information.
Step S214: correcting the second key point information N times, wherein the i-th correction of the second key point information yields the (i+1)-th corrected target key point information.
Here N and i are both positive integers and i ∈ [1, N]. Optionally, the correction model may include N correction units connected in sequence. Accordingly, step S214 becomes: inputting the second key point information into the correction model to obtain the key point information output by each correction unit, with the key point information output by the i-th correction unit taken as the (i+1)-th corrected target key point information.
For example, referring to Fig. 4, a schematic diagram of a system to which embodiments of the present application are applied: the second key point information is input into the first correction unit a1 of the correction model, yielding a1's output key point information, i.e. the second corrected target key point information. That output is then input into the second correction unit a2, yielding a2's output, i.e. the third corrected target key point information. This repeats until the N-th correction unit outputs its key point information, i.e. the (N+1)-th corrected target key point information, completing the correction.
As an optional implementation, at the i-th correction of the second key point information, correction parameters may be generated from the i-th corrected target key point information, and the i-th corrected target key point information is then corrected with these parameters to obtain the (i+1)-th corrected target key point information.
Optionally, each correction unit in the correction model may further include a learning layer and a correction layer. The correction layer may be an addition unit, and the learning layer may be a CNN module consisting of several connected convolutional layers, for example five, without particular limitation. More specifically, the second key point information is input into the learning layer of the first correction unit to obtain that unit's correction parameters; the parameters and the second key point information are then input into the first unit's correction layer to obtain its output key point information. Because the N correction units are chained, the key point information output by the i-th unit is input into the learning layer of the (i+1)-th unit to obtain the (i+1)-th unit's correction parameters, and these parameters together with the i-th unit's output are input into the (i+1)-th unit's correction layer to obtain the (i+1)-th unit's output key point information. The learning region of the key point features therefore need not be fully constrained in advance; each correction unit corrects in association with the previous unit's output, which is more flexible.
Further, as an optional implementation, the i-th corrected target key point information includes first coordinates of a plurality of key points, and the (i+1)-th corrected target key point information includes second coordinates of those key points, where the second coordinates satisfy:
(x', y') = (x, y) + (dx, dy), where (x', y') is the second coordinate, (x, y) is the first coordinate, and dx and dy are the correction parameters, so that the correction amounts for each key point's abscissa and ordinate are learned and adjusted independently.
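The following is a minimal PyTorch sketch of one such correction unit, assuming key points arrive as a (batch, 2, num_keypoints) tensor. The five-convolution learning layer follows the example above; the channel width and kernel size are assumptions.

    import torch.nn as nn

    class CorrectionUnit(nn.Module):
        """One of the N chained correction units: a convolutional learning layer
        predicts per-keypoint offsets (dx, dy); an additive correction layer
        applies them, i.e. (x', y') = (x, y) + (dx, dy)."""
        def __init__(self, channels=16, num_layers=5):
            super().__init__()
            layers, in_ch = [], 2  # two input channels: x and y per key point
            for _ in range(num_layers):
                layers += [nn.Conv1d(in_ch, channels, kernel_size=3, padding=1), nn.ReLU()]
                in_ch = channels
            layers.append(nn.Conv1d(channels, 2, kernel_size=3, padding=1))  # -> (dx, dy)
            self.learning_layer = nn.Sequential(*layers)

        def forward(self, keypoints):
            # keypoints: (batch, 2, num_keypoints)
            offsets = self.learning_layer(keypoints)  # correction parameters dx, dy
            return keypoints + offsets                # correction layer: addition

Chaining N such units, each taking the previous unit's output, then yields the key point information set of step S214.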
Step S220: inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction.
In this embodiment of the application, the first preset model may be an attention model, which determines the attention region indicated by the target key point information in the target image and weights that region to obtain the corresponding feature weight map, focusing attention on the effective feature region. Specifically, the attention model may adopt a Convolutional Block Attention Module (CBAM), a squeeze-and-excitation (SE) module, or an Efficient Pyramid Squeeze Attention (EPSA) module, or a combination of at least two of these, without limitation.
As an alternative embodiment, step S220 may include, but is not limited to, the following implementation steps:
inputting the target key point information of each correction into the first preset model for attention-parameter learning according to a preset learning formula to obtain the attention parameter for each correction, and generating the feature weight map for each correction from the target key point information and the attention parameter of that correction, wherein the preset learning formula is:
g(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
where g(x, y) satisfies a Gaussian distribution, x and y are the abscissa and ordinate of each key point in the target key point information, and σ is the attention parameter.
Specifically, after the attention parameter of each correction is obtained, it is substituted into the preset learning formula to obtain the weight value of each key point in the target key point information. Weighting the target key point information with these weight values yields the feature weight map for that correction and locates the attention region indicated by the target key point information; the attention distribution over the key points in the feature weight map satisfies a Gaussian distribution.
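A minimal sketch of generating such a feature weight map follows, assuming the Gaussian form above is centred at each target key point and that per-pixel responses are combined by maximum; both choices are illustrative assumptions.

    import numpy as np

    def feature_weight_map(keypoints, sigma, height, width):
        """Weight map with a Gaussian of width sigma (the learned attention
        parameter) centred at every target key point."""
        ys, xs = np.mgrid[0:height, 0:width]
        weight = np.zeros((height, width), dtype=np.float32)
        for kx, ky in keypoints:
            g = np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2.0 * sigma ** 2))
            weight = np.maximum(weight, g)  # keep the strongest response per pixel
        return weight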
In some implementations, the first preset model may also include a plurality of attention units, each of which may include several sequentially connected convolutional layers, for example three, without limitation.
In practice, the i-th correction unit may correspond to the (i+1)-th attention unit; that is, the output of the i-th correction unit is connected to the input of the (i+1)-th attention unit. Taking Fig. 4 as an example: the second key point information is input into the first attention unit b1 to obtain the feature weight map output by b1, i.e. the first corrected feature weight map; the key point information output by correction unit a1 is input into the second attention unit b2 to obtain the feature weight map output by b2, i.e. the second corrected feature weight map; and so on, until the feature weight map of the (N+1)-th correction is obtained through the (N+1)-th attention unit b(N+1).
Step S230: inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information.
As an optional implementation, the second preset model may specifically include N+1 feature extraction units, each of which is a pre-trained CNN module. Accordingly, step S230 may include, but is not limited to, the following steps:
and inputting the target image and the feature weight graph corrected each time into a second preset model for N +1 times of feature extraction. In the first feature extraction, the target image and the first corrected feature weight map may be subjected to fusion processing to obtain first corrected feature information. Specifically, the second preset model may further include an initial extraction unit, where the initial extraction unit may also be a CNN module, and the target image is input into the initial extraction unit to perform feature extraction, so as to obtain an original feature image (feature map), and then the original feature image is weighted by using the first corrected feature weight map, so as to obtain first corrected feature information. Based on the first characteristic information, the first corrected characteristic information is input into a second preset model for characteristic extraction, and second characteristic information extracted for the first time is obtained.
In the (j+1)-th round of feature extraction, the j-th extracted second feature information and the (j+1)-th corrected feature weight map are fused to obtain the (j+1)-th corrected first feature information, which is then input into the second preset model for feature extraction to obtain the (j+1)-th extracted second feature information, where j is a positive integer and j ∈ [1, N].
Finally, the (N+1)-th extracted second feature information is taken as the target feature information.
In practice, the j-th feature extraction unit may correspond to the j-th attention unit, and the (N+1)-th feature extraction unit to the (N+1)-th attention unit. Taking Fig. 4 as an example: the feature weight map output by attention unit b1 is fused with the target image (or the original feature image) and input into the first feature extraction unit c1 to obtain the second feature information output by c1, i.e. the first extracted second feature information; the feature weight map output by attention unit b2 is fused with the second feature information output by c1 and input into the second feature extraction unit c2 to obtain the second feature information output by c2, i.e. the second extracted second feature information; and so on, until the second feature information output by the (N+1)-th feature extraction unit c(N+1), i.e. the (N+1)-th extracted second feature information, is obtained.
In this way, the feature weight maps output by the attention units feed different feature extraction units in the second preset model and apply weighting at different extraction stages, guiding each unit to extract feature detail in the weighted key feature regions of the image. Moreover, because the feature weight maps are generated from continuously corrected target key point information, key point correction runs through the whole feature extraction process, giving the second preset model the ability to adapt to changes in the feature map. In other words, compared with relying on an attention mechanism that takes the coordinate of the feature map's maximum over the channel or temporal dimension as the attention point for feature extraction, this application reduces error accumulation: such maxima are unstable during training and often fall in completely irrelevant positions, biasing the learned features and making the training result hard to interpret. The method therefore helps extract interpretable image features effectively when the sample distribution is unbalanced, enabling more accurate image classification results.
Understandably, the second feature information may be a heat map for each correction, and by adjusting the convolution parameters of the feature extraction units, the size of each heat map gradually decreases as feature extraction proceeds, until the (N+1)-th extracted second feature information reaches the specified image size. Further, for the mutually corresponding feature extraction unit, attention unit, and correction unit (such as feature extraction unit c2, attention unit b2, and correction unit a1 in Fig. 4), the network parameters (such as convolution parameters) of the three are related, so that key point correction, the attention mechanism, and the different feature extraction stages stay synchronized and cooperate as a whole.
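The following is a minimal PyTorch sketch of this staged extraction, assuming element-wise multiplication as the fusion operation and stride-2 convolutions for the gradually shrinking heat maps; layer sizes and the interpolation-based resizing are assumptions.

    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedFeatureExtractor(nn.Module):
        """Sketch of the second preset model: an initial extraction unit
        followed by N+1 feature extraction stages, each preceded by fusion
        with that stage's feature weight map."""
        def __init__(self, num_stages, channels=32):
            super().__init__()
            self.initial = nn.Conv2d(3, channels, 3, padding=1)  # initial extraction unit
            self.stages = nn.ModuleList([
                nn.Sequential(nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                              nn.ReLU())
                for _ in range(num_stages)])

        def forward(self, image, weight_maps):
            # image: (B, 3, H, W); weight_maps: list of N+1 maps, each (B, 1, h, w).
            feat = self.initial(image)  # original feature image
            for stage, wmap in zip(self.stages, weight_maps):
                wmap = F.interpolate(wmap, size=feat.shape[-2:])  # match resolution
                feat = stage(feat * wmap)  # fusion, then feature extraction
            return feat  # target feature information after the last stage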
Step S240: classifying the target image according to the target feature information to obtain an image classification result.
In this embodiment of the application, the implementation of step S240 may depend on the classification task information, making it suitable for diverse image classification scenarios. As an optional implementation, in the disease diagnosis scenario, step S240 specifically includes the following steps:
and inputting the target characteristic information into a face attribute prediction module to obtain a face attribute prediction result. And inputting the target characteristic information into a face abnormity prediction module to obtain a face abnormity prediction result. And classifying the target images according to the face attribute prediction result and the face abnormity prediction result, and determining an image classification result, wherein the image classification result comprises the target disease identified from the target images or the target disease not identified.
The face attribute prediction module predicts the abnormal face attributes present in the target image from the target feature information; these may include, but are not limited to, facial asymmetry, wide inter-ocular distance, a flat face, a flat nasal root, narrow palpebral fissures, inner-corner skin folds (epicanthal folds), and upward-slanting outer eye corners. The face anomaly prediction module predicts from the target feature information whether the target image shows a dementia-related manifestation, which characterizes target disease types such as Down syndrome and intellectual abnormality. Both modules may adopt a classification model based on a CNN (such as VGG, GoogLeNet, or ResNet) or naive Bayes, without particular limitation. The two modules use different parameters and do not share weights, each independently learning the image features that are hard to distinguish.
Optionally, when the face attribute prediction module and the face anomaly prediction module are trained, their overall loss value is a weighted sum of the first loss value of the face attribute prediction module and the second loss value of the face anomaly prediction module; for example, the overall loss equals the first loss plus the second loss. The overall loss is backpropagated to adjust the parameters of both modules until training ends. The first loss value is the cross entropy of the face attribute prediction module, and the second loss value is the cross entropy of the face anomaly prediction module.
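A minimal PyTorch sketch of this joint loss follows, treating both heads as single-label classifiers for simplicity (a multi-label attribute head would use binary cross entropy instead); the feature dimension and class counts are assumed.

    import torch.nn as nn

    FEAT_DIM, NUM_ATTRIBUTES = 256, 7  # assumed sizes for illustration

    attribute_head = nn.Linear(FEAT_DIM, NUM_ATTRIBUTES)  # face attribute prediction module
    anomaly_head = nn.Linear(FEAT_DIM, 2)                 # face anomaly prediction module
    criterion = nn.CrossEntropyLoss()

    def overall_loss(features, attr_labels, anomaly_labels, w1=1.0, w2=1.0):
        """Weighted sum of the two modules' cross-entropy losses; one backward
        pass through this value updates both heads (they share no weights)."""
        first_loss = criterion(attribute_head(features), attr_labels)    # face attribute loss
        second_loss = criterion(anomaly_head(features), anomaly_labels)  # face anomaly loss
        return w1 * first_loss + w2 * second_loss  # the text's example uses w1 = w2 = 1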
Combining the specific abnormal face attributes with the dementia-manifestation prediction therefore allows a more accurate judgment of whether the target disease can be identified in the target image.
In other alternative embodiments, different classification models may be trained for application scenarios such as lesion identification, human posture detection, or vehicle recognition. In practice, after the target feature information is obtained through steps S200 to S230, the classification model corresponding to the classification task information is called, and the target feature information is input into it to obtain an image classification result matching the current task. Understandably, different correction models and/or second preset models may also be trained for different classification tasks as needed, which is flexible and meets diverse classification requirements.
Implementing this method embodiment therefore continuously corrects errors in the key point information, preventing errors caused by unbalanced data distribution from accumulating and disturbing the subsequent feature learning process; the feature weight maps guide the second preset model to identify highly discriminative key regions in the target image and to learn more effective and interpretable target feature information from them, improving the reliability of classifying images according to the target feature information.
An embodiment of the application further provides an image classification apparatus. Referring to Fig. 5, Fig. 5 is a block diagram of an image classification apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the image classification apparatus 500 includes an obtaining module 510, an identifying module 520, a correction module 530, a generating module 540, an extraction module 550, and a classification module 560, wherein:
The obtaining module 510 is configured to obtain a target image.
The identifying module 520 is configured to identify the first key point information in the target image.
The correction module 530 is configured to correct the first key point information multiple times according to a preset rule to obtain a key point information set, where the set includes the target key point information of each correction.
The generating module 540 is configured to input the target key point information of each correction into a first preset model to obtain a feature weight map for each correction.
The extraction module 550 is configured to input the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information.
The classification module 560 is configured to classify the target image according to the target feature information to obtain an image classification result.
It should be noted that, for the specific implementation process of this embodiment, reference may be made to the specific implementation process of the foregoing method embodiment, and details are not described again.
An embodiment of the application further provides an electronic device including a memory and a processor, wherein the memory stores a program that, when executed by the processor, implements the image classification method described above.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the image classification method described above.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and the scope of the claims of the present application is not limited thereby. Any modifications, equivalents and improvements which may occur to those skilled in the art without departing from the scope and spirit of the present application are intended to be within the scope of the claims of the present application.

Claims (10)

1. A method of image classification, the method comprising:
acquiring a target image, and identifying first key point information from the target image;
correcting the first key point information multiple times according to a preset rule to obtain a key point information set, wherein the key point information set comprises the target key point information of each correction;
inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction;
inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information;
and classifying the target image according to the target feature information to obtain an image classification result.
2. The method according to claim 1, wherein the correcting the first key point information multiple times according to a preset rule to obtain a key point information set comprises:
acquiring classification task information of the target image, and determining target key point types according to the classification task information;
adjusting the first key point information according to the target image and the target key point types to obtain second key point information satisfying the target key point types;
determining the first corrected target key point information according to the second key point information;
and correcting the second key point information N times, and obtaining the (i+1)-th corrected target key point information at the i-th correction of the second key point information, wherein N and i are both positive integers, and i ∈ [1, N].
3. The method according to claim 2, wherein the obtaining the (i+1)-th corrected target key point information at the i-th correction of the second key point information comprises:
at the i-th correction of the second key point information, generating correction parameters according to the i-th corrected target key point information;
and correcting the i-th corrected target key point information with the correction parameters to obtain the (i+1)-th corrected target key point information.
4. The method according to claim 3, wherein the i-th corrected target key point information comprises first coordinates of a plurality of key points, and the (i+1)-th corrected target key point information comprises second coordinates of the plurality of key points, the second coordinates satisfying:
(x', y') = (x, y) + (dx, dy), wherein (x', y') is the second coordinate, (x, y) is the first coordinate, and dx and dy are the correction parameters.
5. The method according to any one of claims 1 to 4, wherein the inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction comprises:
inputting the target key point information of each correction into the first preset model for attention-parameter learning according to a preset learning formula to obtain the attention parameter for each correction;
generating the feature weight map for each correction according to the target key point information and the attention parameter of each correction;
wherein the preset learning formula is:
g(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
wherein g(x, y) satisfies a Gaussian distribution, x and y are respectively the abscissa and ordinate of each key point in the target key point information, and σ is the attention parameter.
6. The method according to any one of claims 1 to 4, wherein the inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information comprises:
inputting the target image and each corrected feature weight map into the second preset model for N+1 rounds of feature extraction, wherein N is a positive integer;
in the first round of feature extraction, fusing the target image with the first corrected feature weight map to obtain first corrected first feature information, and inputting the first corrected first feature information into the second preset model for feature extraction to obtain first extracted second feature information;
in the (j+1)-th round of feature extraction, fusing the j-th extracted second feature information with the (j+1)-th corrected feature weight map to obtain (j+1)-th corrected first feature information, and inputting the (j+1)-th corrected first feature information into the second preset model for feature extraction to obtain (j+1)-th extracted second feature information, wherein j is a positive integer and j ∈ [1, N];
and taking the (N+1)-th extracted second feature information as the target feature information.
7. The method according to any one of claims 1 to 4, wherein the classifying the target image according to the target feature information to obtain an image classification result comprises:
inputting the target feature information into a face attribute prediction module to obtain a face attribute prediction result;
inputting the target feature information into a face anomaly prediction module to obtain a face anomaly prediction result;
and classifying the target image according to the face attribute prediction result and the face anomaly prediction result to determine the image classification result, wherein the image classification result comprises that the target disease is identified in the target image or that the target disease is not identified.
8. An image classification apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image;
the identification module is used for identifying first key point information in the target image;
the correction module is used for correcting the first key point information multiple times according to a preset rule to obtain a key point information set, wherein the key point information set comprises the target key point information of each correction;
the generating module is used for inputting the target key point information of each correction into a first preset model to obtain a feature weight map for each correction;
the extraction module is used for inputting the target image and each corrected feature weight map into a second preset model for feature extraction to obtain target feature information;
and the classification module is used for classifying the target image according to the target feature information to obtain an image classification result.
9. An electronic device, characterized in that the electronic device comprises a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling a connection communication between the processor and the memory, the program, when executed by the processor, implementing the steps of the image classification method according to any one of claims 1 to 7.
10. A storage medium for computer readable storage, characterized in that the storage medium stores one or more programs which are executable by one or more processors to implement the steps of the image classification method of any one of claims 1 to 7.
CN202210163242.7A 2022-02-22 2022-02-22 Image classification method and device, electronic equipment and storage medium Pending CN114519401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210163242.7A CN114519401A (en) 2022-02-22 2022-02-22 Image classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210163242.7A CN114519401A (en) 2022-02-22 2022-02-22 Image classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114519401A true CN114519401A (en) 2022-05-20

Family

ID=81599233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210163242.7A Pending CN114519401A (en) 2022-02-22 2022-02-22 Image classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114519401A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578451A (en) * 2022-09-30 2023-01-06 北京百度网讯科技有限公司 Image processing method, and training method and device of image processing model
CN115578451B (en) * 2022-09-30 2024-01-23 北京百度网讯科技有限公司 Image processing method, training method and device of image processing model

Similar Documents

Publication Publication Date Title
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
CN110599451B (en) Medical image focus detection and positioning method, device, equipment and storage medium
Liu et al. A deep spatial contextual long-term recurrent convolutional network for saliency detection
CN108280477B (en) Method and apparatus for clustering images
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
KR20210048523A (en) Image processing method, apparatus, electronic device and computer-readable storage medium
CN112102237A (en) Brain tumor recognition model training method and device based on semi-supervised learning
CN111414946B (en) Artificial intelligence-based medical image noise data identification method and related device
CN108491823B (en) Method and device for generating human eye recognition model
CN110765882B (en) Video tag determination method, device, server and storage medium
WO2021031817A1 (en) Emotion recognition method and device, computer device, and storage medium
CN112560710B (en) Method for constructing finger vein recognition system and finger vein recognition system
CN113688912B (en) Method, device, equipment and medium for generating countermeasure sample based on artificial intelligence
WO2021169642A1 (en) Video-based eyeball turning determination method and system
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
KR20220004009A (en) Key point detection method, apparatus, electronic device and storage medium
CN112948612A (en) Human body cover generation method and device, electronic equipment and storage medium
CN114495241A (en) Image identification method and device, electronic equipment and storage medium
CN111898561A (en) Face authentication method, device, equipment and medium
CN114519401A (en) Image classification method and device, electronic equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111598144B (en) Training method and device for image recognition model
KR102325250B1 (en) companion animal identification system and method therefor
CN111476775B (en) DR symptom identification device and method
CN110934565B (en) Method and device for measuring pupil diameter and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination