CN115661486B - Intelligent image feature extraction method and device

Publication number: CN115661486B
Application number: CN202211701613.9A
Authority: CN (China)
Other versions: CN115661486A (Chinese)
Inventors: 陈畅新, 李展铿
Assignee: Youmi Technology Co ltd
Legal status: Active
Abstract

The invention discloses an intelligent image feature extraction method and device, wherein the method comprises the following steps: training a preset feature extraction model to be trained according to a determined target training image set to obtain a trained feature extraction model; and judging whether the trained feature extraction model has converged, and if so, determining the trained feature extraction model as the target feature extraction model. Compared with traditional local feature extraction, the extracted global features have more distinctive convolution features, so that even as the size of the image feature map gradually decreases, relevant details such as feature map edges are retained. Image features can thus be extracted reliably, accurately and effectively, which improves the accuracy of subsequent image classification tasks and, in turn, the reliability and accuracy of operations such as identification, classification and retrieval of image data.

Description

Intelligent image feature extraction method and device
Technical Field
The invention relates to the technical field of image processing, in particular to an intelligent image feature extraction method and device.
Background
With the rapid development of image processing technology, image feature extraction technology has been widely applied in image classification tasks to identify, classify, and retrieve a large amount of image data.
Currently, image features are generally extracted by using a deep convolutional neural network to extract local features of an image. However, practice shows that during local feature extraction the size of the image feature map gradually decreases as the convolutional network deepens, so detail features such as the edges of shallow feature maps are easily ignored and the convolution features of different feature maps are not highly distinctive. This makes it difficult to improve the accuracy of image classification tasks and poses a major challenge to them. Therefore, it is important to provide a method capable of improving the accuracy of image feature extraction.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and an apparatus for intelligently extracting image features, which can facilitate reliable, accurate and effective extraction of image features, and further facilitate improvement of accuracy of subsequent image classification tasks, thereby facilitating improvement of reliability and accuracy of operations such as identification, classification and retrieval of image data.
In order to solve the above technical problem, a first aspect of the present invention discloses an intelligent image feature extraction method, including:
determining a target training image set for training; the target training image set comprises a first training image set and a second training image set corresponding to the first training image set, and the second training image set is obtained by performing image preprocessing on the first training image set;
according to the target training image set, performing model training operation on a preset feature extraction model to be trained to obtain a trained feature extraction model; the feature extraction model to be trained comprises a sampling layer, a target feature processing layer and a category prediction layer;
judging whether the trained feature extraction model is converged, if so, determining the trained feature extraction model as a target feature extraction model; the target feature extraction model is used for extracting image features of the target image to be processed.
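For illustration only, the following minimal sketch shows how these three steps chain together, assuming a PyTorch-style implementation; the stand-in model, tensor shapes, labels and loss threshold are all hypothetical and are not part of the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in model; the disclosed architecture is detailed below.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
images = torch.randn(8, 3, 32, 32)       # target training image set (shape assumed)
labels = torch.randint(0, 2, (8,))       # per-image class labels (assumed)
loss_threshold = 0.05                    # preset loss parameter threshold (assumed)

for _ in range(1000):                    # keep training until the convergence test passes
    loss = F.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() <= loss_threshold:    # convergence judgment of the third step
        target_model = model             # trained model becomes the target model
        break
```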
As an optional implementation manner, in the first aspect of the present invention, the performing, according to the target training image set, a model training operation on a preset feature extraction model to be trained to obtain a trained feature extraction model includes:
performing image sampling operation on each target training image in the target training image set through a sampling layer of a preset feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image; the first convolution parameters of the sampling layer comprise sampling convolution kernel size parameters and/or sampling convolution step size parameters;
performing feature processing operation on the sampled training images corresponding to each target training image through a target feature processing layer of the feature extraction model to be trained to obtain target feature maps corresponding to all the target training images;
and executing category prediction operation on the target feature map corresponding to each target training image through a category prediction layer of the feature extraction model to be trained to obtain a category prediction score of the target feature map corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
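The three layers named above can be pictured as a single forward pass. The sketch below is an assumed PyTorch rendering; the kernel size, stride, channel counts and the internals of the target feature processing layer are placeholders, not the disclosed design:

```python
import torch
import torch.nn as nn

class FeatureExtractionSketch(nn.Module):
    """Sampling layer -> target feature processing layer -> class prediction layer."""
    def __init__(self, n1: int = 3, stride: int = 2, c: int = 16, num_classes: int = 2):
        super().__init__()
        # Sampling layer: N1xN1 convolution with a sampling stride (first convolution parameters).
        self.sampling = nn.Sequential(
            nn.Conv2d(3, c, n1, stride=stride, padding=n1 // 2),
            nn.BatchNorm2d(c), nn.ReLU())
        # Placeholder target feature processing layer (its internals are detailed below).
        self.feature = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU())
        # Class prediction layer mapping the target feature map to class prediction scores.
        self.predict = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(c, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sampled = self.sampling(x)            # sampled training image
        feature_map = self.feature(sampled)   # target feature map
        return self.predict(feature_map)      # target class prediction score

scores = FeatureExtractionSketch()(torch.randn(2, 3, 64, 64))   # -> shape (2, 2)
```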
As an optional implementation manner, in the first aspect of the present invention, the target feature processing layer includes a plurality of global feature processing layers;
the obtaining, by the target feature processing layer of the to-be-trained feature extraction model, a target feature map corresponding to all target training images by performing feature processing operations on the sampled training images corresponding to each target training image includes:
for each global feature processing layer, determining all images to be processed corresponding to the global feature processing layer according to the sampled training images corresponding to all the target training images, and performing feature processing operation on all the images to be processed through the global feature processing layer to obtain undetermined feature maps corresponding to all the images to be processed;
determining current feature processing rounds corresponding to all the target training images, and judging whether the current feature processing rounds are larger than or equal to a preset round threshold value;
when the judgment result is negative, incrementing the current feature processing round by 1, determining the undetermined feature maps corresponding to all the images to be processed as all the images to be processed corresponding to a next global feature processing layer, and triggering execution of the operations of performing the feature processing operation on all the images to be processed through the global feature processing layer to obtain the undetermined feature maps corresponding to all the images to be processed, determining the current feature processing round corresponding to all the target training images, and judging whether the current feature processing round is greater than or equal to the preset round threshold; the global feature processing layer here is the next global feature processing layer;
and when the judgment result is yes, determining the undetermined feature maps corresponding to all the images to be processed as the target feature maps corresponding to all the target training images.
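A minimal sketch of this round-based loop, assuming that the preset round threshold equals the number of global feature processing layers M (an assumption; the claim leaves the threshold open) and using trivial stand-in layers:

```python
import torch
import torch.nn as nn

layers = nn.ModuleList(
    nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
    for _ in range(3))                      # stand-ins for M global feature processing layers
round_threshold = len(layers)               # preset round threshold, assumed equal to M

x = torch.randn(2, 8, 32, 32)               # sampled training images (shape assumed)
current_round = 1
while True:
    x = layers[current_round - 1](x)        # undetermined feature maps of this layer
    if current_round >= round_threshold:    # judgment against the round threshold
        target_feature_maps = x             # undetermined maps become the target feature maps
        break
    current_round += 1                      # increment the current feature processing round
```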
As an optional implementation manner, in the first aspect of the present invention, each of the global feature processing layers includes a corresponding global feature extractor and a corresponding global feature fuser;
the obtaining, by the global feature processing layer, the undetermined feature map corresponding to all the images to be processed by performing feature processing operations on all the images to be processed includes:
performing feature extraction operation on each image to be processed through a global feature extractor corresponding to the global feature processing layer and a preset second convolution parameter of the global feature extractor to obtain a feature map to be processed corresponding to each image to be processed; the second convolution parameters of the global feature extractor comprise a feature extraction convolution kernel size parameter and/or a feature extraction convolution step size parameter;
performing feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image through a global feature fusion device corresponding to the global feature processing layer to obtain a feature fusion map corresponding to each to-be-processed image;
and performing addition operation on the to-be-processed feature map corresponding to each to-be-processed image and the corresponding feature fusion map to obtain the to-be-determined feature maps corresponding to all the to-be-processed images.
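The extractor-fuser-addition structure of one global feature processing layer can be sketched as follows (PyTorch; channel count, kernel size and stride are assumed values, and the fuser here is a placeholder for the GCModule chain sketched further below):

```python
import torch
import torch.nn as nn

class GlobalFeatureProcessingLayer(nn.Module):
    """Global feature extractor -> global feature fuser, plus the residual addition."""
    def __init__(self, c: int = 8, n2: int = 3, stride: int = 1):
        super().__init__()
        # Global feature extractor with its second convolution parameters (kernel/stride).
        self.extractor = nn.Sequential(
            nn.Conv2d(c, c, n2, stride=stride, padding=n2 // 2),
            nn.BatchNorm2d(c), nn.ReLU())
        # Stand-in global feature fuser (a chain of feature fusion modules in the disclosure).
        self.fuser = nn.Sequential(nn.Conv2d(c, c, 1), nn.BatchNorm2d(c), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.extractor(x)     # feature map to be processed
        fused = self.fuser(fmap)     # feature fusion map
        return fmap + fused          # addition operation -> undetermined feature map

pending = GlobalFeatureProcessingLayer()(torch.randn(2, 8, 32, 32))
```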
As an optional implementation manner, in the first aspect of the present invention, the performing, by the global feature fusion device corresponding to the global feature processing layer, a feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image to obtain a feature fusion map corresponding to each to-be-processed image includes:
inputting all the images to be processed into a global feature fusion device corresponding to the global feature processing layer, so that each feature fusion module of the global feature fusion device performs the following operations:
for each image to be processed, performing image copying operation on the image to be processed to obtain a plurality of copied images corresponding to the image to be processed, and performing feature point position scrambling operation on all feature points in each copied image to obtain all scrambled feature points corresponding to each copied image; the image to be processed and each copied image comprise a plurality of corresponding feature points, and all the feature points in each copied image are matched with all the feature points in the image to be processed;
for each copied image of each image to be processed, extracting spatial structure feature information of each disordered feature point corresponding to the copied image, and performing feature point position recovery operation on all the disordered feature points according to the spatial structure feature information of each disordered feature point to obtain all recovered feature points corresponding to the copied image; the spatial structure feature information of each disordered feature point comprises spatial structure association relation information between the disordered feature point and any other disordered feature point in the copied image;
for each image to be processed, according to a predetermined first feature value of each feature point in the image to be processed, a corresponding second feature value of each restored feature point in each copied image, a first weight of the image to be processed and a corresponding second weight of each copied image, performing feature weighted summation operation on each feature point in the image to be processed to obtain all weighted summed feature points of the image to be processed;
for each image to be processed, according to the spatial structure feature information corresponding to each weighted-and-summed feature point in the image to be processed, performing feature fusion operation on each weighted-and-summed feature point to obtain all fused feature points of the image to be processed, and determining, according to all the fused feature points of the image to be processed, the undetermined feature fusion map corresponding to the image to be processed;
after the undetermined feature fusion maps corresponding to all the images to be processed are obtained, judging whether a next feature fusion module exists; when the judgment result is yes, determining the undetermined feature fusion maps corresponding to all the images to be processed as all the images to be processed corresponding to the next feature fusion module, inputting all the corresponding images to be processed into the next feature fusion module, and triggering execution of the image copying operation on each image to be processed to obtain a plurality of copied images corresponding to that image to be processed; and when the judgment result is negative, determining the undetermined feature fusion maps corresponding to all the images to be processed as the feature fusion maps corresponding to all the images to be processed.
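A compact sketch of one such feature fusion module and an N-module chain is given below. It is an interpretation under stated assumptions: the Conv Module is shared across branches for brevity, and position recovery here reuses the stored permutation, whereas the disclosure recovers positions from the learned spatial structure feature information:

```python
import torch
import torch.nn as nn

class GCModule(nn.Module):
    """One feature fusion module: the original branch plus (k-1) shuffled copies."""
    def __init__(self, c: int = 8, k: int = 3):
        super().__init__()
        self.k = k
        # Stand-in Conv Module, shared across all k branches for brevity.
        self.conv = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.BatchNorm2d(c), nn.ReLU())
        self.weights = nn.Parameter(torch.full((k,), 1.0 / k))   # first/second branch weights
        self.fuse = nn.Sequential(nn.Conv2d(c, c, 1), nn.BatchNorm2d(c), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        branches = [self.conv(x)]                    # branch 1 keeps point positions
        for _ in range(self.k - 1):                  # (k-1) copied images
            perm = torch.randperm(h * w)             # scramble feature-point positions
            shuffled = x.flatten(2)[:, :, perm].view(b, c, h, w)
            feat = self.conv(shuffled).flatten(2)
            restored = torch.empty_like(feat)
            restored[:, :, perm] = feat              # position recovery operation
            branches.append(restored.view(b, c, h, w))
        weighted = sum(wt * br for wt, br in zip(self.weights, branches))
        return self.fuse(weighted)                   # fused, undetermined feature map

fuser = nn.Sequential(GCModule(), GCModule())        # N chained feature fusion modules
out = fuser(torch.randn(2, 8, 16, 16))
```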
As an optional implementation manner, in the first aspect of the present invention, the performing, by the class prediction layer of the to-be-trained feature extraction model, a class prediction operation on the target feature map corresponding to each target training image to obtain a class prediction score of the target feature map corresponding to each target training image, where the class prediction score is used as the target class prediction score of each target training image, includes:
for a target feature map corresponding to each target training image, executing category probability calculation operation on each target feature point in the target feature map through a category prediction layer of the feature extraction model to be trained to obtain category probability parameters of all the target feature points of the target feature map, and determining the target category probability parameters of the target feature map according to the category probability parameters of all the target feature points of the target feature map;
and executing an objective function conversion operation on the target category probability parameters of the target feature map corresponding to each target training image to obtain the category prediction score of the target feature map corresponding to each target training image, which is used as the target category prediction score of each target training image.
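A minimal sketch of such a class prediction layer, assuming mean aggregation over feature points and a softmax as the objective function conversion (both are assumptions; the claim fixes neither choice):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassPredictionLayer(nn.Module):
    """Per-feature-point class probabilities, aggregated and converted to scores."""
    def __init__(self, c: int = 8, num_classes: int = 2, n3: int = 1):
        super().__init__()
        # N3xN3 convolution producing a class probability parameter per feature point.
        self.point_head = nn.Conv2d(c, num_classes, n3)

    def forward(self, target_feature_map: torch.Tensor) -> torch.Tensor:
        point_params = self.point_head(target_feature_map)   # per-point class parameters
        class_params = point_params.mean(dim=(2, 3))          # aggregate over all points
        return F.softmax(class_params, dim=1)                 # assumed conversion function

scores = ClassPredictionLayer()(torch.randn(2, 8, 16, 16))    # target class prediction scores
```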
As an alternative implementation, in the first aspect of the present invention, all of the target training images in the target training image set include all of the first training images in the first training image set and all of the second training images in the second training image set;
wherein, the judging whether the trained feature extraction model converges comprises:
calculating a target loss parameter of the trained feature extraction model according to the target class prediction score of each first training image, the target class prediction score of each second training image, a preset first label of each first training image and a preset second label of each second training image, and judging whether the target loss parameter is less than or equal to a preset loss parameter threshold value;
and when the judgment result is yes, determining that the trained feature extraction model is converged.
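Under the assumption that the target loss parameter is a cross-entropy over both image sets (the claim does not fix the loss form), the convergence test could look like this:

```python
import torch
import torch.nn.functional as F

# Dummy prediction scores standing in for the model outputs on both image sets.
first_scores = torch.randn(4, 2)                   # scores for the first training images
second_scores = torch.randn(4, 2)                  # scores for the second training images
first_labels = torch.zeros(4, dtype=torch.long)    # preset first labels (e.g. "real")
second_labels = torch.ones(4, dtype=torch.long)    # preset second labels (e.g. "preprocessed")

target_loss = F.cross_entropy(torch.cat([first_scores, second_scores]),
                              torch.cat([first_labels, second_labels]))
converged = target_loss.item() <= 0.05             # preset loss parameter threshold
```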
The second aspect of the present invention discloses an intelligent image feature extraction device, which comprises:
the determining module is used for determining a target training image set for training; the target training image set comprises a first training image set and a second training image set corresponding to the first training image set, and the second training image set is obtained by performing image preprocessing on the first training image set;
the training module is used for executing model training operation on a preset feature extraction model to be trained according to the target training image set to obtain a trained feature extraction model; the feature extraction model to be trained comprises a sampling layer, a target feature processing layer and a category prediction layer;
the judging module is used for judging whether the trained feature extraction model is converged;
the determining module is further configured to determine the trained feature extraction model as a target feature extraction model when the judging result of the judging module is yes; the target feature extraction model is used for extracting image features of a target image to be processed.
As an optional implementation manner, in the second aspect of the present invention, the training module performs a model training operation on a preset feature extraction model to be trained according to the target training image set, and a manner of obtaining the trained feature extraction model specifically includes:
performing image sampling operation on each target training image in the target training image set through a sampling layer of a preset feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image; the first convolution parameters of the sampling layer comprise sampling convolution kernel size parameters and/or sampling convolution step size parameters;
performing feature processing operation on the sampled training images corresponding to each target training image through a target feature processing layer of the feature extraction model to be trained to obtain target feature maps corresponding to all the target training images;
and executing category prediction operation on the target feature map corresponding to each target training image through a category prediction layer of the feature extraction model to be trained to obtain a category prediction score of the target feature map corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
As an optional implementation manner, in the second aspect of the present invention, the target feature processing layer includes a plurality of global feature processing layers;
the method for obtaining the target feature maps corresponding to all the target training images by the training module executing the feature processing operation on the sampled training images corresponding to each target training image through the target feature processing layer of the feature extraction model to be trained specifically comprises the following steps:
for each global feature processing layer, determining all images to be processed corresponding to the global feature processing layer according to the sampled training images corresponding to all the target training images, and performing feature processing operation on all the images to be processed through the global feature processing layer to obtain undetermined feature maps corresponding to all the images to be processed;
determining current feature processing rounds corresponding to all the target training images, and judging whether the current feature processing rounds are larger than or equal to a preset round threshold value;
if not, incrementing the current feature processing round by 1, determining the undetermined feature maps corresponding to all the images to be processed as all the images to be processed corresponding to a next global feature processing layer, and triggering execution of the operations of performing the feature processing operation on all the images to be processed through the global feature processing layer to obtain the undetermined feature maps corresponding to all the images to be processed, determining the current feature processing round corresponding to all the target training images, and judging whether the current feature processing round is greater than or equal to the preset round threshold; the global feature processing layer here is the next global feature processing layer;
and when the judgment result is yes, determining the undetermined feature maps corresponding to all the images to be processed as the target feature maps corresponding to all the target training images.
As an optional implementation manner, in the second aspect of the present invention, each global feature processing layer includes a corresponding global feature extractor and a corresponding global feature fusion device;
the method for obtaining the undetermined feature maps corresponding to all the images to be processed by the training module through the global feature processing layer by performing the feature processing operation on all the images to be processed specifically comprises the following steps:
performing feature extraction operation on each image to be processed through a global feature extractor corresponding to the global feature processing layer and a preset second convolution parameter of the global feature extractor to obtain a feature map to be processed corresponding to each image to be processed; the second convolution parameter of the global feature extractor comprises a feature extraction convolution kernel size parameter and/or a feature extraction convolution step size parameter;
performing feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image through a global feature fusion device corresponding to the global feature processing layer to obtain a feature fusion map corresponding to each to-be-processed image;
and performing addition operation on the to-be-processed feature map corresponding to each to-be-processed image and the corresponding feature fusion map to obtain the to-be-determined feature maps corresponding to all the to-be-processed images.
As an optional implementation manner, in the second aspect of the present invention, the manner in which the training module performs, through the global feature fusion device corresponding to the global feature processing layer, a feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image to obtain the feature fusion map corresponding to each to-be-processed image is specifically:
inputting all the images to be processed into a global feature fusion device corresponding to the global feature processing layer, so that each feature fusion module of the global feature fusion device performs the following operations:
for each image to be processed, performing image copying operation on the image to be processed to obtain a plurality of copied images corresponding to the image to be processed, and performing feature point position scrambling operation on all feature points in each copied image to obtain all scrambled feature points corresponding to each copied image; the image to be processed and each copied image comprise a plurality of corresponding feature points, and all the feature points in each copied image are matched with all the feature points in the image to be processed;
for each copied image of each image to be processed, extracting spatial structure feature information of each disordered feature point corresponding to the copied image, and performing feature point position recovery operation on all the disordered feature points according to the spatial structure feature information of each disordered feature point to obtain all recovered feature points corresponding to the copied image; the spatial structure feature information of each disordered feature point comprises spatial structure association relation information between the disordered feature point and any other disordered feature point in the copied image;
for each image to be processed, according to a predetermined first feature value of each feature point in the image to be processed, a corresponding second feature value of each restored feature point in each copied image, a first weight of the image to be processed and a corresponding second weight of each copied image, performing feature weighted summation operation on each feature point in the image to be processed to obtain all weighted summed feature points of the image to be processed;
for each image to be processed, according to the spatial structure feature information corresponding to each weighted-and-summed feature point in the image to be processed, performing feature fusion operation on each weighted-and-summed feature point to obtain all fused feature points of the image to be processed, and determining, according to all the fused feature points of the image to be processed, the undetermined feature fusion map corresponding to the image to be processed;
after the undetermined feature fusion maps corresponding to all the images to be processed are obtained, judging whether a next feature fusion module exists; when the judgment result is yes, determining the undetermined feature fusion maps corresponding to all the images to be processed as all the images to be processed corresponding to the next feature fusion module, inputting all the corresponding images to be processed into the next feature fusion module, and triggering execution of the image copying operation on each image to be processed to obtain a plurality of copied images corresponding to that image to be processed; and when the judgment result is negative, determining the undetermined feature fusion maps corresponding to all the images to be processed as the feature fusion maps corresponding to all the images to be processed.
As an optional implementation manner, in the second aspect of the present invention, the manner in which the training module performs a class prediction operation on the target feature map corresponding to each target training image through a class prediction layer of the feature extraction model to be trained, to obtain a class prediction score of the target feature map corresponding to each target training image that is used as the target class prediction score of each target training image, is specifically:
for a target feature map corresponding to each target training image, executing category probability calculation operation on each target feature point in the target feature map through a category prediction layer of the feature extraction model to be trained to obtain category probability parameters of all the target feature points of the target feature map, and determining the target category probability parameters of the target feature map according to the category probability parameters of all the target feature points of the target feature map;
and executing an objective function conversion operation on the target category probability parameters of the target feature map corresponding to each target training image to obtain the category prediction score of the target feature map corresponding to each target training image, which is used as the target category prediction score of each target training image.
As an alternative embodiment, in the second aspect of the present invention, all of the target training images in the target training image set include all of the first training images in the first training image set and all of the second training images in the second training image set;
the mode for judging whether the trained feature extraction model converges by the judging module specifically comprises the following steps:
calculating a target loss parameter of the trained feature extraction model according to the target category prediction score of each first training image, the target category prediction score of each second training image, a preset first label of each first training image and a preset second label of each second training image, and judging whether the target loss parameter is less than or equal to a preset loss parameter threshold value;
and when the judgment result is yes, determining that the trained feature extraction model is converged.
The third aspect of the present invention discloses another intelligent image feature extraction device, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute the intelligent image feature extraction method disclosed by the first aspect of the invention.
In a fourth aspect, the present invention discloses a computer storage medium, which stores computer instructions for executing the method for intelligently extracting image features disclosed in the first aspect of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, a target training image set for training is determined; a model training operation is performed on a preset feature extraction model to be trained according to the target training image set to obtain a trained feature extraction model; and whether the trained feature extraction model has converged is judged, and if so, the trained feature extraction model is determined as the target feature extraction model. Compared with traditional local feature extraction, the extracted global features have more distinctive convolution features, so that even as the size of the image feature map gradually decreases, relevant details such as feature map edges are retained; image features can thus be extracted reliably, accurately and effectively, which improves the accuracy of subsequent image classification tasks and, in turn, the reliability and accuracy of operations such as identification, classification and retrieval of image data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of building a feature extraction model to be trained, which is disclosed by the embodiment of the invention;
FIG. 2 is a schematic diagram of an image processing flow of a feature fusion module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image processing flow of another feature fusion module according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of an intelligent image feature extraction method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of another method for intelligently extracting image features according to the embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an intelligent image feature extraction apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another intelligent image feature extraction device disclosed in the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or article that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or article.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The invention discloses an intelligent image feature extraction method and device, which can facilitate reliable, accurate and effective extraction of image features, and further facilitate improvement of the accuracy of subsequent image classification tasks, thereby helping to improve the reliability and accuracy of operations such as identification, classification and retrieval of image data. Detailed descriptions follow.
Example one
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating an intelligent image feature extraction method according to an embodiment of the present invention. The intelligent image feature extraction method described in fig. 4 may be applied to image data identification, classification, and retrieval performed by using image feature extraction, and the embodiment of the present invention is not limited thereto. Optionally, the method may be implemented by a feature extraction model training system, and the feature extraction model training system may be integrated in the feature extraction model training device, or may be a local server or a cloud server for processing a feature extraction model training process, and the embodiment of the present invention is not limited. As shown in fig. 4, the intelligent image feature extraction method may include the following operations:
101. a set of target training images for training is determined.
In the embodiment of the present invention, optionally, the target training image set includes a first training image set and a second training image set corresponding to the first training image set, where the second training image set is obtained by performing image preprocessing on the first training image set. Specifically, the first training image set may be understood as real image data, and the second training image set may be understood as image data obtained by performing image preprocessing (such as PS tool processing, image generation network processing, and the like) on the real image data, for example converting the color of a blue-sky background in an original image A into green, or adding an animal image to the blue-sky background in image A.
102. And according to the target training image set, performing model training operation on a preset feature extraction model to be trained to obtain a trained feature extraction model.
In the embodiment of the present invention, optionally, the feature extraction model to be trained includes a sampling layer, a target feature processing layer, and a category prediction layer. Specifically, as shown in fig. 1, fig. 1 is a schematic diagram of constructing the feature extraction model to be trained disclosed in the embodiment of the present invention, wherein the sampling layer may be constructed as shown in the first rectangular frame and may be composed of an N1×N1 convolution layer (N1×N1 Conv), a normalization layer (BN) and an activation layer (ACT); the target feature processing layer may be composed of the M global feature processing layers shown in the large dashed box; and the category prediction layer may be constructed as the bottom rectangular frame and may be composed of an N3×N3 convolution layer (N3×N3 Conv), a normalization layer (BN) and an activation layer (ACT). Furthermore, the sampling layer is used for sampling each target training image, the target feature processing layer is used for extracting and fusing the feature map corresponding to each target training image, and the category prediction layer is used for determining the category prediction score to which each target training image belongs.
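A hypothetical helper for the repeated Conv -> BN -> ACT unit of fig. 1, with assumed values N1 = 3, N3 = 1, M = 4 and ReLU as the activation (none of these values is fixed by the disclosure):

```python
import torch.nn as nn

def conv_bn_act(c_in: int, c_out: int, k: int, stride: int = 1) -> nn.Sequential:
    """The repeated Conv -> BN -> ACT unit of fig. 1 (ACT assumed to be ReLU)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2),
                         nn.BatchNorm2d(c_out), nn.ReLU())

sampling_layer = conv_bn_act(3, 16, k=3, stride=2)    # N1 = 3 assumed
global_layers = nn.ModuleList(conv_bn_act(16, 16, k=3) for _ in range(4))  # M = 4 assumed
prediction_conv = conv_bn_act(16, 2, k=1)             # N3 = 1 assumed
```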
103. And judging whether the trained feature extraction model is converged, and if so, determining the trained feature extraction model as a target feature extraction model.
In the embodiment of the invention, the target feature extraction model can be applied to an image classification task, wherein the target feature extraction model is used for extracting the image features of a target image to be processed, and the extracted image features are the global features of the image to be processed. Further, the method may also include: when it is judged that the trained feature extraction model has not converged, adjusting the parameters of the trained feature extraction model to obtain a new feature extraction model to be trained, and repeatedly executing the operation of performing a model training operation on the preset feature extraction model to be trained according to the target training image set in step 102 to obtain a trained feature extraction model, and the operation of judging whether the trained feature extraction model converges in step 103. Optionally, adjusting the parameters of the trained feature extraction model may include adjusting its learning rate, number of iterations, data augmentation mode, and the like.
Therefore, based on the trained target feature extraction model, the global features of the image to be processed can be extracted. Compared with traditional local feature extraction, the global image features extracted by the target feature extraction model have more distinctive convolution features, so that even as the size of the image feature map gradually decreases, relevant details such as feature map edges are retained; the image features can thus be extracted reliably, accurately and effectively, which improves the accuracy of subsequent image classification tasks and, in turn, the reliability and accuracy of operations such as identification, classification and retrieval of image data.
Example two
Referring to fig. 5, fig. 5 is a schematic flowchart of an intelligent image feature extraction method according to an embodiment of the present invention. The method for intelligently extracting image features described in fig. 5 may be applied to image data identification, classification, and retrieval performed by using image feature extraction, and the embodiment of the present invention is not limited thereto. Optionally, the method may be implemented by a feature extraction model training system, and the feature extraction model training system may be integrated in the feature extraction model training device, or may be a local server or a cloud server for processing a feature extraction model training process, and the embodiment of the present invention is not limited. As shown in fig. 5, the intelligent image feature extraction method may include the following operations:
201. a set of target training images for training is determined.
202. And performing image sampling operation on each target training image in the target training image set through a preset sampling layer of the feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image.
In this embodiment of the present invention, optionally, the first convolution parameter of the sampling layer includes a sampling convolution kernel size parameter and/or a sampling convolution step size parameter; for example, the sampling convolution kernel size parameter is N1×N1 and the sampling convolution step size parameter is L1.
203. And performing feature processing operation on the sampled training images corresponding to each target training image through a target feature processing layer of the feature extraction model to be trained to obtain target feature maps corresponding to all the target training images.
In the embodiment of the present invention, as shown in fig. 1, the target feature processing layer may be composed of M global feature processing layers, where each global feature processing layer may be composed of a corresponding global feature extractor and a corresponding global feature fusion device. Further, the corresponding global feature extractor may be composed of an N2×N2 convolution layer (N2×N2 Conv), a normalization layer (BN) and an activation layer (ACT), and the corresponding global feature fusion device may be composed of N feature fusion modules (GCModules). Still further, the values of M and N may be determined according to image processing requirements: for example, when model accuracy needs to be improved, the specific values of M and N may be increased, and when model performance needs to be preserved, they may be appropriately decreased.
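The trade-off described here can be captured in a small, purely illustrative configuration object (the field names and default values are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class FeatureModelConfig:
    """Hypothetical knobs mirroring the text: raise M and N for accuracy,
    lower them to preserve model performance (speed)."""
    m_global_layers: int = 4      # M: number of global feature processing layers
    n_fusion_modules: int = 2     # N: feature fusion modules per global feature fuser

high_accuracy = FeatureModelConfig(m_global_layers=8, n_fusion_modules=4)
lightweight = FeatureModelConfig(m_global_layers=2, n_fusion_modules=1)
```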
204. And performing category prediction operation on the target feature map corresponding to each target training image through a category prediction layer of the feature extraction model to be trained to obtain a category prediction score of the target feature map corresponding to each target training image as the target category prediction score of each target training image.
In the embodiment of the present invention, the target category prediction score of each target training image may be understood as a prediction score of a certain category corresponding to the target training image. Optionally, the class prediction operation may be to predict a single classification task, or may be to predict a multi-classification task.
205. And judging whether the trained feature extraction model is converged, and if so, determining the trained feature extraction model as a target feature extraction model.
In the embodiment of the present invention, for other descriptions of step 201 and step 205, please refer to the detailed description of step 101 and step 103 in the first embodiment, and the embodiment of the present invention is not described again.
Therefore, the training operation of the feature extraction model to be trained can be completed through the processing operation of the sampling layer, the target feature processing layer and the category prediction layer on the target training image in a targeted manner, so that the reliability and the accuracy of the training of the feature extraction model to be trained can be improved, the training effectiveness of the feature extraction model to be trained can be improved, and the trained feature extraction model can be trained reliably and accurately; meanwhile, through the flexible construction of the target feature processing layer, the intelligent and flexible construction mode of the feature extraction model to be trained is realized, so that the image processing requirements of users are met.
In an optional embodiment, the performing, by the target feature processing layer of the feature extraction model to be trained in step 203, a feature processing operation on the sampled training images corresponding to each target training image to obtain target feature maps corresponding to all target training images includes:
for each global feature processing layer, determining all images to be processed corresponding to the global feature processing layer according to the sampled training images corresponding to all target training images, and executing feature processing operation on all the images to be processed through the global feature processing layer to obtain undetermined feature maps corresponding to all the images to be processed;
determining current feature processing rounds corresponding to all target training images, and judging whether the current feature processing rounds are larger than or equal to a preset round threshold value;
when the judgment result is negative, incrementing the current feature processing round by 1, determining the undetermined feature maps corresponding to all the images to be processed as all the images to be processed corresponding to the next global feature processing layer, and triggering execution of the operations of performing the feature processing operation on all the images to be processed through the global feature processing layer to obtain the undetermined feature maps corresponding to all the images to be processed, determining the current feature processing round corresponding to all the target training images, and judging whether the current feature processing round is greater than or equal to the preset round threshold;
and when the judgment result is yes, determining the undetermined feature maps corresponding to all the images to be processed as the target feature maps corresponding to all the target training images.
In this alternative embodiment, in the triggered execution operation, the global feature processing layer is the next global feature processing layer. Optionally, the target feature processing layer includes a plurality of global feature processing layers, and as shown in fig. 1, the construction sequence among all the global feature processing layers is based on the sequence of image feature processing, that is, after the current global feature processing layer finishes performing the image feature processing operation to obtain the image feature processing result, the image feature processing result is transmitted to the next global feature processing layer, and the next global feature processing layer continues to perform the image feature processing operation on the image feature processing result until all the global feature processing layers finish performing the image feature processing operation, so as to obtain the target feature maps corresponding to all the target training images.
Optionally, each global feature processing layer includes a corresponding feature extraction convolution kernel size parameter and/or feature extraction convolution step size parameter. For example, the first global feature processing layer may retain the image size of the sampled training image corresponding to each target training image by using a feature extraction convolution step size parameter with a specific value a, while the second to M-th global feature processing layers may downsample the image feature processing result of each target training image step by step by using a feature extraction convolution step size parameter with a specific value b; the specific feature extraction convolution kernel size parameter and/or feature extraction convolution step size parameter are not limited here. For example, suppose the image size of the sampled training image corresponding to each target training image is c × d1 (spatial size c, channel number d1); after processing by the first global feature processing layer, the image size corresponding to each target training image becomes c × d2 (d2 being the first channel number), and after processing by the second global feature processing layer it becomes (c/2) × d3 (d3 being the second channel number), and so on.
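A short sketch of this size progression, assuming a = 1 and b = 2 for the stride values and arbitrary channel counts (the text leaves all of these unspecified):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)                       # sampled image: size c x c, d1 channels
layer1 = nn.Conv2d(16, 32, 3, stride=1, padding=1)   # value a assumed 1: spatial size retained
layer2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)   # value b assumed 2: spatial size halved

y1 = layer1(x)     # -> (1, 32, 64, 64): size c x c, d2 channels
y2 = layer2(y1)    # -> (1, 64, 32, 32): size (c/2) x (c/2), d3 channels
```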
Therefore, the optional embodiment can perform feature processing on the target training image through the plurality of global feature processing layers based on the image processing requirement to obtain the target feature map corresponding to the target training image, so that the reliability and the accuracy of the feature processing operation of the target feature processing layers are improved, the reliability and the accuracy of the target feature map corresponding to the obtained target training image are improved, and the reliable and accurate class prediction of the target feature map by the subsequent class prediction layer is facilitated.
In another optional embodiment, in the above step, the performing, by the global feature processing layer, a feature processing operation on all the images to be processed to obtain pending feature maps corresponding to all the images to be processed includes:
performing feature extraction operation on each image to be processed through a global feature extractor corresponding to the global feature processing layer and a preset second convolution parameter of the global feature extractor to obtain a feature map to be processed corresponding to each image to be processed;
performing feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image through a global feature fusion device corresponding to the global feature processing layer to obtain a feature fusion map corresponding to each to-be-processed image;
and performing addition operation on the to-be-processed feature map corresponding to each to-be-processed image and the corresponding feature fusion map to obtain the to-be-determined feature maps corresponding to all the to-be-processed images.
In this alternative embodiment, the addition operation may be as shown at "plus" in fig. 1; it can prevent the gradient from vanishing and thereby speed up training convergence. Each global feature processing layer comprises a corresponding global feature extractor and a corresponding global feature fusion device. Optionally, the second convolution parameter of the global feature extractor includes a feature extraction convolution kernel size parameter and/or a feature extraction convolution step size parameter. Further optionally, the global feature fusion device may be composed of a plurality of feature fusion modules (GCModules). Specifically, through the feature processing operation of the target feature processing layer, all target training images undergo multiple feature extraction and feature fusion operations, so that after global feature extraction and fusion even the shallow convolution features of the target training images differ from one another to a certain extent, which increases the distinctiveness of the convolution features of different training images that use the same template.
Therefore, the optional embodiment can build each global feature processing layer according to the image processing requirement, and an intelligent image processing mode of the global feature processing layer on the image to be processed is realized, so that the feature processing reliability and accuracy of the global feature processing layer on the image to be processed can be improved, the reliability and accuracy of the undetermined feature map corresponding to the obtained image to be processed can be further improved, and the target feature map corresponding to the target training image can be accurately obtained.
In another optional embodiment, the performing, by a global feature fusion device corresponding to the global feature processing layer in the foregoing step, a feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image to obtain a feature fusion map corresponding to each to-be-processed image includes:
inputting all images to be processed into a global feature fusion device corresponding to the global feature processing layer, so that each feature fusion module of the global feature fusion device executes the following operations:
for each image to be processed, performing image copying operation on the image to be processed to obtain a plurality of copied images corresponding to the image to be processed, and performing feature point position scrambling operation on all feature points in each copied image to obtain all scrambled feature points corresponding to each copied image;
for each copied image of each image to be processed, extracting the spatial structure feature information of each disordered feature point corresponding to the copied image, and performing feature point position recovery operation on all the disordered feature points according to the spatial structure feature information of each disordered feature point to obtain all recovered feature points corresponding to the copied image;
for each image to be processed, performing feature weighted summation operation on each feature point in the image to be processed according to a predetermined first feature value of each feature point in the image to be processed, a corresponding second feature value of each restored feature point in each copied image, a first weight of the image to be processed and a corresponding second weight of each copied image, so as to obtain all weighted summed feature points of the image to be processed;
for each image to be processed, according to the spatial structure feature information corresponding to each weighted-and-summed feature point in the image to be processed, performing feature fusion operation on each weighted-and-summed feature point to obtain all fused feature points of the image to be processed, and determining, according to all the fused feature points of the image to be processed, the undetermined feature fusion map corresponding to the image to be processed;
after the undetermined feature fusion maps corresponding to all the images to be processed are obtained, judging whether a next feature fusion module exists; when the judgment result is yes, determining the undetermined feature fusion maps corresponding to all the images to be processed as all the images to be processed corresponding to the next feature fusion module, inputting all the corresponding images to be processed into the next feature fusion module, and triggering execution of the image copying operation on each image to be processed to obtain a plurality of copied images corresponding to that image to be processed; and when the judgment result is negative, determining the undetermined feature fusion maps corresponding to all the images to be processed as the feature fusion maps corresponding to all the images to be processed.
In this alternative embodiment, a plurality of feature fusion modules are used to perform a plurality of feature fusion operations on all images to be processed. Fig. 2 is a schematic diagram of an image processing flow of a feature fusion module (i.e., a certain GCModule in fig. 1) according to an embodiment of the present invention, wherein the feature fusion processing module, i.e., the Conv Module in the feature fusion module, may be composed of a plurality of convolution modules, normalization modules, and activation modules; for example, the first layer of the Conv Module is composed of a first convolution layer and a first normalization layer, the second layer of the Conv Module is composed of a second convolution layer, a second normalization layer, and a first activation layer, and the third layer of the Conv Module is composed of a third convolution layer and a third normalization layer, which is not limited in this embodiment of the present invention.
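One possible PyTorch reading of this three-layer Conv Module layout (channel count assumed; only a sketch, since the exact kernel sizes are not given):

```python
import torch.nn as nn

# One possible reading of the three-layer Conv Module layout described above.
conv_module = nn.Sequential(
    nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8),             # layer 1: conv1 + bn1
    nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),  # layer 2: conv2 + bn2 + act1
    nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8),             # layer 3: conv3 + bn3
)
```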
Further, the image to be processed and each copied image include a plurality of corresponding feature points, and all the feature points in each copied image are matched with all the feature points in the image to be processed. Still further, the spatial structure feature information of each shuffled feature point includes spatial structure incidence relation information between the shuffled feature point and any other shuffled feature point in the copied image. Specifically, as shown in fig. 2, each feature fusion module may include k branches.
For example, as shown in fig. 2 and fig. 3, fig. 3 is a schematic diagram of an image processing flow of another feature fusion module according to an embodiment of the present invention, and each numeral in fig. 3 represents the corresponding position of a feature point. For each image to be processed, the k branches indicate that the image to be processed is duplicated into (k-1) copied images. The first branch retains the image to be processed, so the spatial structure feature information of the original image is maintained; the second branch to the kth branch scramble the feature point positions of the corresponding copied images, so that each feature point of any copied image can be randomly associated with any other feature point, thereby capturing the relations between feature points at different positions. Then, each branch performs feature extraction on the image to be processed or on the corresponding copied image through the Conv Module (feature extraction may be performed by convolution operation), after which the second branch to the kth branch restore all scrambled feature points of the corresponding copied images according to the extracted spatial structure feature information of each scrambled feature point. Next, the feature points at corresponding positions of the image to be processed are weighted and summed according to the weight of each branch (namely the first weight of the image to be processed and the second weights of all copied images); for instance, the first feature point of the original image and the first feature points of the copied images are weighted and summed to obtain the first weighted-and-summed feature point of the image to be processed. Finally, after all weighted-and-summed feature points corresponding to all the images to be processed are obtained, the multi-spatial-structure feature information learned by each weighted-and-summed feature point is fused through a convolution layer, a normalization layer and an activation layer to capture the global feature information.
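The branch flow just described might be approximated by the following minimal sketch, reusing the ConvModule above; the learnable scalar branch weights and the 1x1 fusion convolution are assumptions of this illustration, not requirements of the embodiment:

```python
import torch
import torch.nn as nn

# Sketch of one k-branch feature fusion module (a GCModule-like block).
class FeatureFusionModule(nn.Module):
    def __init__(self, channels: int = 64, k: int = 4):
        super().__init__()
        self.k = k
        self.branches = nn.ModuleList([ConvModule(channels) for _ in range(k)])
        # first weight (original image) + (k-1) second weights (copied images);
        # making them learnable scalars is an assumption of this sketch
        self.weights = nn.Parameter(torch.ones(k) / k)
        self.fuse = nn.Sequential(  # convolution + normalization + activation fusion
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        out = self.weights[0] * self.branches[0](x)  # branch 1 keeps the original layout
        for i in range(1, self.k):                   # branches 2..k use shuffled copies
            perm = torch.randperm(n, device=x.device)      # scramble feature point positions
            shuffled = x.flatten(2)[:, :, perm].view(b, c, h, w)
            feat = self.branches[i](shuffled).flatten(2)   # per-branch feature extraction
            restored = feat[:, :, torch.argsort(perm)]     # restore original positions
            out = out + self.weights[i] * restored.view(b, c, h, w)
        return self.fuse(out)  # fuse the learned multi-spatial-structure information
```

Here `torch.randperm` plays the role of the feature point position scrambling and `torch.argsort(perm)` supplies its inverse, so every forward pass randomly associates each feature point with feature points at different positions.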
It should be noted that a conventional local feature extraction network is generally composed of a large number of convolution modules and a small number of fully connected layers: the many convolution layers at the front are only used for extracting local features of the image, and only the fully connected layers at the back fuse the image features. As the convolution network deepens, the size of the feature map gradually decreases; the shallow feature maps extract detail features such as edges, while the deep feature maps extract high-level semantic features. Finally, based on the deep semantic features, the image features are fused by a simple fully connected layer, but the fully connected layer destroys the spatial structure of the image, so the detail information of the shallow feature maps is easily ignored and the final image feature fusion effect is poor. In the present technical scheme, multiple rounds of spatial feature fusion are performed at the shallow layers, which strengthens the feature correlation among feature points at different spatial positions and thus improves the subsequent classification and identification precision of the images.
Therefore, the optional embodiment can obtain the feature fusion graphs corresponding to all the images to be processed by performing multiple times of spatial feature fusion on the shallow layer, which is beneficial for the feature extraction model to be trained to learn the multi-spatial structure feature information among the feature points of each image, and further beneficial for the feature extraction model to be trained to capture the global feature information of each image, thereby being beneficial for improving the reliability and the accuracy of the obtained feature fusion graphs corresponding to the images to be processed, and realizing the accurate training of the feature extraction model to be trained.
In yet another optional embodiment, the performing, by the class prediction layer of the feature extraction model to be trained in step 204, a class prediction operation on the target feature map corresponding to each target training image to obtain a class prediction score of the target feature map corresponding to each target training image, as the target class prediction score of each target training image, includes:
for a target feature map corresponding to each target training image, executing category probability calculation operation on each target feature point in the target feature map through a category prediction layer of a feature extraction model to be trained to obtain category probability parameters of all target feature points of the target feature map, and determining the target category probability parameters of the target feature map according to the category probability parameters of all target feature points of the target feature map;
and executing target function conversion operation on the target category probability parameters of the target characteristic diagram corresponding to each target training image to obtain a category prediction score of the target characteristic diagram corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
In this alternative embodiment, the class probability calculation operation may be implemented by an N3*N3 convolution, whose output feature dimension can represent the probability that each target feature point belongs to a given category; the mean value of the obtained class probability parameters of all the target feature points is then calculated to obtain the target class probability parameter of the target feature map. Optionally, if the class prediction operation is for a single classification task, the objective function conversion operation may be performed by softmax; if the class prediction operation is for a multi-classification task, the objective function conversion operation may be performed by sigmoid activation.
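A hedged sketch of such a class prediction layer, with assumed values for the N3 kernel size, the channel width and the number of classes, might read:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative class prediction layer; n3, channels and num_classes are assumptions.
class ClassPredictionLayer(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 10, n3: int = 3):
        super().__init__()
        # N3*N3 convolution whose output dimension carries per-point class probabilities
        self.prob = nn.Conv2d(channels, num_classes, kernel_size=n3, padding=n3 // 2)

    def forward(self, feature_map, multi_label: bool = False):
        point_probs = self.prob(feature_map)         # class probability per target feature point
        target_probs = point_probs.mean(dim=(2, 3))  # mean over all target feature points
        if multi_label:
            return torch.sigmoid(target_probs)       # multi-classification task
        return F.softmax(target_probs, dim=1)        # single classification task
```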
Therefore, the optional embodiment can calculate the target class prediction score of the target training image through the feature point class probability calculation operation on the target feature map of the target training image and the function conversion operation on the target class probability parameter of the target feature map, so that the reliability and the accuracy of the calculation operation on the target class prediction score of the target training image are improved, and the reliability and the accuracy of the training operation on the feature extraction model to be trained are improved in turn, whereby an accurate and reliable target feature extraction model is obtained and accurate and reliable global feature extraction is performed on the target image to be processed.
In yet another alternative embodiment, the determining whether the trained feature extraction model converges in step 205 includes:
calculating a target loss parameter of the trained feature extraction model according to the target category prediction score of each first training image, the target category prediction score of each second training image, a preset first label of each first training image and a preset second label of each second training image, and judging whether the target loss parameter is less than or equal to a preset loss parameter threshold value;
and when the judgment result is yes, determining that the trained feature extraction model is converged.
In this alternative embodiment, all of the target training images comprise all of the first training images in the first training image set and all of the second training images in the second training image set. Further, before determining that the trained feature extraction model converges, the method further comprises: determining the verification index parameter trend of the trained feature extraction model, judging whether the verification index parameter trend is a preset trend, and triggering execution of the operation of determining that the trained feature extraction model converges when the verification index parameter trend is judged to be the preset trend. Optionally, the preset first label of each first training image may be 1, and the preset second label of each second training image may be 0. Specifically, the feature extraction model to be trained learns the spatial structure feature information of the real image data and of the preprocessed image data, so that it can learn the correlation between the target and the background and judge the reasonableness of the image content.
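As an illustration only, the convergence test might be sketched as follows; using binary cross-entropy is an assumption of this sketch, since the embodiment only requires some target loss computed from the prediction scores and the 1/0 labels:

```python
import torch
import torch.nn.functional as F

# Hedged sketch: scores are assumed to lie in [0, 1] (e.g. softmax/sigmoid outputs).
def has_converged(first_scores: torch.Tensor,
                  second_scores: torch.Tensor,
                  loss_threshold: float = 0.05) -> bool:  # threshold is an assumption
    scores = torch.cat([first_scores, second_scores])
    labels = torch.cat([torch.ones_like(first_scores),     # preset first label = 1
                        torch.zeros_like(second_scores)])  # preset second label = 0
    target_loss = F.binary_cross_entropy(scores, labels)
    return bool(target_loss <= loss_threshold)
```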
Therefore, the optional embodiment can calculate the target loss parameter according to the relevant category prediction score and the label in a targeted manner, so that whether the trained feature extraction model is converged is judged according to the target loss parameter, the target feature extraction model is trained, the reliability and the accuracy of the calculated target loss parameter can be improved, the reliability and the accuracy of convergence judgment operation on the trained feature extraction model can be improved, the target feature extraction model can be effectively trained, and the extraction requirement of the image global feature of the user can be met.
EXAMPLE III
Referring to fig. 6, fig. 6 is a schematic structural diagram of an intelligent image feature extraction device according to an embodiment of the present invention. As shown in fig. 6, the intelligent image feature extraction device may include:
a determining module 301, configured to determine a target training image set for training;
the training module 302 is configured to perform a model training operation on a preset feature extraction model to be trained according to the target training image set to obtain a trained feature extraction model;
a judging module 303, configured to judge whether the trained feature extraction model converges;
the determining module 301 is further configured to determine the trained feature extraction model as the target feature extraction model when the determination result of the determining module 303 is yes.
In the embodiment of the invention, the target training image set comprises a first training image set and a second training image set corresponding to the first training image set, wherein the second training image set is obtained by performing image preprocessing on the first training image set; the feature extraction model to be trained comprises a sampling layer, a target feature processing layer and a category prediction layer; the target feature extraction model is used for extracting image features of the target image to be processed.
It can be seen that the intelligent image feature extraction device described in fig. 6 can extract the global features of the image to be processed based on the trained target feature extraction model, and compared with the conventional local feature extraction, the image global features extracted by the target feature extraction model have more distinctive convolution features, so that even if the size of the image feature map is gradually reduced, the relevant details such as the edge of the feature map can be retained, which is beneficial to reliably, accurately and effectively extracting the image features, and further beneficial to improving the accuracy of the subsequent image classification task, thereby being beneficial to improving the reliability and accuracy of operations such as identification, classification and retrieval of image data.
In an optional embodiment, the training module 302 performs a model training operation on a preset feature extraction model to be trained according to a target training image set, and the mode of obtaining the trained feature extraction model specifically includes:
performing image sampling operation on each target training image in the target training image set through a sampling layer of a preset feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image;
performing characteristic processing operation on the sampled training images corresponding to each target training image through a target characteristic processing layer of the to-be-trained characteristic extraction model to obtain target characteristic graphs corresponding to all the target training images;
and executing category prediction operation on the target characteristic graph corresponding to each target training image through a category prediction layer of the to-be-trained characteristic extraction model to obtain a category prediction score of the target characteristic graph corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
In this alternative embodiment, the first convolution parameters of the sampling layer include a sampling convolution kernel size parameter and/or a sampling convolution step size parameter.
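For illustration, the sampling layer might reduce to a single strided convolution, as in the sketch below, where the kernel size and stride values merely stand in for the first convolution parameters:

```python
import torch.nn as nn

# Hypothetical sampling layer; all sizes are assumptions of this sketch.
sampling_layer = nn.Conv2d(
    in_channels=3,    # RGB target training image
    out_channels=64,  # assumed feature width
    kernel_size=7,    # sampling convolution kernel size parameter
    stride=2,         # sampling convolution step size parameter
    padding=3,
)
```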
It can be seen that the intelligent image feature extraction device described in fig. 6 can perform the training operation of the feature extraction model to be trained through the processing operation of the sampling layer, the target feature processing layer and the category prediction layer on the target training image, which is favorable for improving the training reliability and accuracy of the feature extraction model to be trained, and further favorable for improving the training effectiveness of the feature extraction model to be trained, so that the trained feature extraction model can be trained reliably and accurately; meanwhile, through the flexible construction of the target feature processing layer, the intelligent and flexible construction mode of the feature extraction model to be trained is realized, so that the image processing requirements of users are met.
In another optional embodiment, the method for obtaining the target feature maps corresponding to all the target training images by the training module 302 performing the feature processing operation on the sampled training images corresponding to each target training image through the target feature processing layer of the feature extraction model to be trained specifically includes:
for each global feature processing layer, determining all images to be processed corresponding to the global feature processing layer according to the sampled training images corresponding to all target training images, and performing feature processing operation on all the images to be processed through the global feature processing layer to obtain undetermined feature maps corresponding to all the images to be processed;
determining current feature processing rounds corresponding to all target training images, and judging whether the current feature processing rounds are larger than or equal to a preset round threshold value;
when the judgment result is negative, increasing the current feature processing round by 1, determining the undetermined feature maps corresponding to all the images to be processed as all the images to be processed corresponding to the next global feature processing layer, and triggering execution of the operations of performing the feature processing operation on all the images to be processed through the global feature processing layer to obtain the undetermined feature maps corresponding to all the images to be processed, determining the current feature processing rounds corresponding to all the target training images, and judging whether the current feature processing round is greater than or equal to the preset round threshold;
and when the judgment result is yes, determining the undetermined feature maps corresponding to all the images to be processed as the target feature maps corresponding to all the target training images.
In this alternative embodiment, the target feature handling layer includes a plurality of global feature handling layers; the global feature processing layer is the next global feature processing layer.
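The round-controlled chaining of global feature processing layers described above might be sketched as follows; `layers` and `round_threshold` are assumed inputs of this illustration:

```python
# Hedged sketch of the target feature processing loop.
def run_target_feature_processing(images, layers, round_threshold: int):
    current_round = 1
    pending = images
    for layer in layers:
        pending = layer(pending)              # feature processing operation
        if current_round >= round_threshold:  # round threshold reached:
            return pending                    # pending maps become the target maps
        current_round += 1                    # otherwise feed the next layer
    return pending
```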
Therefore, the intelligent extraction device for implementing the image features described in fig. 6 can perform feature processing on the target training image through the plurality of global feature processing layers based on the image processing requirement to obtain the target feature map corresponding to the target training image, so that the reliability and accuracy of the feature processing operation of the target feature processing layers are improved, the reliability and accuracy of the target feature map corresponding to the obtained target training image are improved, and the reliable and accurate class prediction of the target feature map by the subsequent class prediction layer is facilitated.
In another optional embodiment, the manner for the training module 302 to perform the feature processing operation on all the images to be processed through the global feature processing layer to obtain the pending feature maps corresponding to all the images to be processed specifically is as follows:
performing feature extraction operation on each image to be processed through a global feature extractor corresponding to the global feature processing layer and a preset second convolution parameter of the global feature extractor to obtain a feature map to be processed corresponding to each image to be processed;
performing feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image through a global feature fusion device corresponding to the global feature processing layer to obtain a feature fusion map corresponding to each to-be-processed image;
and performing addition operation on the to-be-processed feature map corresponding to each to-be-processed image and the corresponding feature fusion map to obtain the to-be-determined feature maps corresponding to all the to-be-processed images.
In this optional embodiment, each global feature processing layer includes a corresponding global feature extractor and a corresponding global feature fuser; the second convolution parameters of the global feature extractor include a feature extraction convolution kernel size parameter and/or a feature extraction convolution step size parameter.
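Putting the extractor, the fuser and the addition together, one global feature processing layer might be sketched as follows, reusing the FeatureFusionModule from the earlier sketch; the stride-1 extractor and the number of chained fusion modules are assumptions:

```python
import torch.nn as nn

# Illustrative global feature processing layer: extractor + fuser + residual add.
class GlobalFeatureProcessingLayer(nn.Module):
    def __init__(self, channels: int = 64, num_fusion_modules: int = 2):
        super().__init__()
        # second convolution parameters: kernel size and step size (assumed values)
        self.extractor = nn.Conv2d(channels, channels, kernel_size=3,
                                   stride=1, padding=1)
        self.fuser = nn.Sequential(
            *[FeatureFusionModule(channels) for _ in range(num_fusion_modules)]
        )

    def forward(self, image):
        pending = self.extractor(image)  # feature map to be processed
        fused = self.fuser(pending)      # feature fusion map
        return pending + fused           # addition -> undetermined feature map
```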
Therefore, the intelligent extraction device for the image features described in fig. 6 can be used for constructing each global feature processing layer according to the image processing requirements, and an intelligent image processing mode of the global feature processing layer on the image to be processed is realized, so that the feature processing reliability and accuracy of the global feature processing layer on the image to be processed can be improved, the reliability and accuracy of the undetermined feature map corresponding to the obtained image to be processed can be improved, and the target feature map corresponding to the target training image can be accurately acquired.
In yet another optional embodiment, the manner of obtaining the feature fusion map corresponding to each image to be processed by the training module 302 executing the feature fusion operation on the feature map to be processed corresponding to each image to be processed through the global feature fusion device corresponding to the global feature processing layer is specifically as follows:
inputting all images to be processed into a global feature fusion device corresponding to the global feature processing layer, so that each feature fusion module of the global feature fusion device executes the following operations:
for each image to be processed, performing image copying operation on the image to be processed to obtain a plurality of copied images corresponding to the image to be processed, and performing feature point position scrambling operation on all feature points in each copied image to obtain all scrambled feature points corresponding to each copied image;
for each copied image of each image to be processed, extracting the spatial structure feature information of each disordered feature point corresponding to the copied image, and performing feature point position recovery operation on all the disordered feature points according to the spatial structure feature information of each disordered feature point to obtain all recovered feature points corresponding to the copied image;
for each image to be processed, performing feature weighted summation operation on each feature point in the image to be processed according to a predetermined first feature value of each feature point in the image to be processed, a corresponding second feature value of each restored feature point in each copied image, a first weight of the image to be processed and a corresponding second weight of each copied image, so as to obtain all weighted summed feature points of the image to be processed;
for each image to be processed, according to spatial structure feature information corresponding to each weighted and summed feature point in the image to be processed, performing feature fusion operation on each weighted and summed feature point to obtain all fused feature points of the image to be processed, and determining an undetermined feature fusion graph corresponding to the image to be processed according to all the fused feature points of the image to be processed;
after undetermined feature fusion graphs corresponding to all images to be processed are obtained, judging whether a next feature fusion module exists or not; when the judgment result is yes, determining the undetermined feature fusion images corresponding to all the images to be processed as all the images to be processed corresponding to the next feature fusion module, inputting all the corresponding images to be processed into the next feature fusion module, and triggering to execute image copying operation on each image to be processed to obtain a plurality of copied images corresponding to the images to be processed; and when the judgment result is negative, determining the undetermined feature fusion graphs corresponding to all the images to be processed as feature fusion graphs corresponding to all the images to be processed.
In this optional embodiment, the image to be processed and each copied image include a plurality of corresponding feature points, and all the feature points in each copied image are matched with all the feature points in the image to be processed; the spatial structure feature information of each shuffled feature point includes spatial structure association relationship information between the shuffled feature point and any other shuffled feature point in the copied image.
It can be seen that, by implementing the intelligent extraction device for image features described in fig. 6, multiple spatial feature fusions can be performed on a shallow layer to obtain feature fusion maps corresponding to all images to be processed, which is beneficial for the feature extraction model to be trained to learn the multi-spatial structure feature information between each feature point of each image, and further beneficial for the feature extraction model to be trained to capture the global feature information of each image, thereby being beneficial for improving the reliability and accuracy of the obtained feature fusion maps corresponding to the images to be processed, and realizing the precise training of the feature extraction model to be trained.
In yet another optional embodiment, the training module 302 performs a class prediction operation on the target feature map corresponding to each target training image through a class prediction layer of the feature extraction model to be trained to obtain a class prediction score of the target feature map corresponding to each target training image, and a specific way of using the class prediction score as the target class prediction score of each target training image is as follows:
for a target feature map corresponding to each target training image, executing category probability calculation operation on each target feature point in the target feature map through a category prediction layer of a feature extraction model to be trained to obtain category probability parameters of all target feature points of the target feature map, and determining the target category probability parameters of the target feature map according to the category probability parameters of all target feature points of the target feature map;
and executing target function conversion operation on the target category probability parameters of the target characteristic diagram corresponding to each target training image to obtain a category prediction score of the target characteristic diagram corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
It can be seen that, by implementing the intelligent image feature extraction device described in fig. 6, the target class prediction score of the target training image can be calculated through the feature point class probability calculation operation on the target feature map of the target training image and the function conversion operation on the target class probability parameter of the target feature map, which is favorable for improving the reliability and accuracy of the calculation operation on the target class prediction score of the target training image, and further favorable for improving the reliability and accuracy of the training operation on the feature extraction model to be trained, thereby being favorable for obtaining an accurate and reliable target feature extraction model and performing accurate and reliable global feature extraction on the target image to be processed.
In yet another optional embodiment, the determining module 303 determines whether the feature extraction model converges after the training specifically as follows:
calculating a target loss parameter of the trained feature extraction model according to the target category prediction score of each first training image, the target category prediction score of each second training image, a preset first label of each first training image and a preset second label of each second training image, and judging whether the target loss parameter is less than or equal to a preset loss parameter threshold value;
and when the judgment result is yes, determining that the feature extraction model after training is converged.
In this alternative embodiment, all of the target training images include all of the first training images in the first set of training images and all of the second training images in the second set of training images.
It can be seen that, the intelligent extraction device for implementing the image features described in fig. 6 can calculate the target loss parameters in a targeted manner according to the relevant category prediction scores and the labels, so as to judge whether the trained feature extraction model converges according to the target loss parameters, so as to train the target feature extraction model, thereby improving the reliability and accuracy of the calculated target loss parameters, further improving the reliability and accuracy of the convergence judgment operation on the trained feature extraction model, and further effectively training the target feature extraction model to meet the extraction requirements of the image global features of the user.
Example four
Referring to fig. 7, fig. 7 is a schematic structural diagram of another intelligent image feature extraction device according to an embodiment of the present invention. As shown in fig. 7, the intelligent image feature extraction device may include:
a memory 401 storing executable program code;
a processor 402 coupled with the memory 401;
the processor 402 calls the executable program code stored in the memory 401 to execute the steps in the method for intelligently extracting image features described in the first embodiment or the second embodiment of the present invention.
EXAMPLE five
The embodiment of the invention discloses a computer storage medium, which stores computer instructions, and the computer instructions are used for executing the steps in the intelligent image feature extraction method described in the first embodiment or the second embodiment of the invention when being called.
EXAMPLE six
The embodiment of the invention discloses a computer program product, which comprises a non-transitory computer readable storage medium storing a computer program, wherein the computer program is operable to make a computer execute the steps in the intelligent image feature extraction method described in the first embodiment or the second embodiment.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, wherein the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc-Read-Only Memory (CD-ROM) or other Memory capable of storing data, a magnetic tape, or any other computer-readable medium capable of storing data.
Finally, it should be noted that the method and the apparatus for intelligent extraction of image features disclosed in the embodiments of the present invention are only preferred embodiments, which are used merely to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An intelligent image feature extraction method is characterized by comprising the following steps:
determining a target training image set for training; the target training image set comprises a first training image set and a second training image set corresponding to the first training image set, and the second training image set is obtained by performing image preprocessing on the first training image set;
according to the target training image set, performing model training operation on a preset feature extraction model to be trained to obtain a trained feature extraction model; the feature extraction model to be trained comprises a sampling layer, a target feature processing layer and a category prediction layer;
judging whether the trained feature extraction model is converged, if so, determining the trained feature extraction model as a target feature extraction model; the target feature extraction model is used for extracting image features of a target image to be processed;
according to the target training image set, executing model training operation on a preset feature extraction model to be trained to obtain a trained feature extraction model, and the method comprises the following steps:
performing image sampling operation on each target training image in the target training image set through a sampling layer of a preset feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image; the first convolution parameters of the sampling layer comprise sampling convolution kernel size parameters and/or sampling convolution step length parameters;
performing feature processing operation on the sampled training images corresponding to each target training image through a target feature processing layer of the feature extraction model to be trained to obtain target feature maps corresponding to all the target training images; the target feature maps corresponding to all the target training images are determined according to undetermined feature maps corresponding to-be-processed images corresponding to all the sampled training images determined by the target feature processing layer; the undetermined feature maps corresponding to all the images to be processed are obtained by performing addition operation on the feature maps to be processed corresponding to each image to be processed and the corresponding feature fusion maps; wherein, the feature fusion map corresponding to each image to be processed is obtained by the following method: after the to-be-processed image is copied, the feature points are disordered, restored and subjected to weighted summation fusion operation to obtain the to-be-determined feature fusion image corresponding to the to-be-processed image, the to-be-determined feature fusion image corresponding to the to-be-processed image is subjected to multiple times of fusion through a plurality of feature fusion modules to obtain the to-be-determined feature fusion image;
and executing category prediction operation on the target feature map corresponding to each target training image through a category prediction layer of the feature extraction model to be trained to obtain a category prediction score of the target feature map corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
2. The intelligent image feature extraction method according to claim 1, wherein the target feature processing layer includes a plurality of global feature processing layers;
the obtaining, by the target feature processing layer of the to-be-trained feature extraction model, a target feature map corresponding to all target training images by performing feature processing operations on the sampled training images corresponding to each target training image includes:
for each global feature processing layer, determining all images to be processed corresponding to the global feature processing layer according to the sampled training images corresponding to all the target training images, and performing feature processing operation on all the images to be processed through the global feature processing layer to obtain undetermined feature maps corresponding to all the images to be processed;
determining current feature processing rounds corresponding to all the target training images, and judging whether the current feature processing rounds are larger than or equal to a preset round threshold value;
when the judgment result is negative, increasing the current feature processing round by 1, determining undetermined feature maps corresponding to all the images to be processed as all the images to be processed corresponding to a next global feature processing layer, triggering and executing the operation of performing feature processing operation on all the images to be processed through the global feature processing layer to obtain the undetermined feature maps corresponding to all the images to be processed, determining the current feature processing round corresponding to all the target training images, and judging whether the current feature processing round is larger than or equal to a preset round threshold value or not; the global feature processing layer is the next global feature processing layer;
and when the judgment result is yes, determining the undetermined feature maps corresponding to all the images to be processed as the target feature maps corresponding to all the target training images.
3. The intelligent image feature extraction method according to claim 2, wherein each global feature processing layer comprises a corresponding global feature extractor and a corresponding global feature fusion device;
the obtaining, by the global feature processing layer, the undetermined feature map corresponding to all the images to be processed by performing feature processing operations on all the images to be processed includes:
performing feature extraction operation on each image to be processed through a global feature extractor corresponding to the global feature processing layer and a preset second convolution parameter of the global feature extractor to obtain a feature map to be processed corresponding to each image to be processed; the second convolution parameters of the global feature extractor comprise a feature extraction convolution kernel size parameter and/or a feature extraction convolution step parameter;
performing feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image through a global feature fusion device corresponding to the global feature processing layer to obtain a feature fusion map corresponding to each to-be-processed image;
and performing addition operation on the to-be-processed feature map corresponding to each to-be-processed image and the corresponding feature fusion map to obtain the to-be-determined feature maps corresponding to all the to-be-processed images.
4. The intelligent image feature extraction method according to claim 3, wherein the performing, by a global feature fusion device corresponding to the global feature processing layer, a feature fusion operation on the to-be-processed feature map corresponding to each to-be-processed image to obtain a feature fusion map corresponding to each to-be-processed image includes:
inputting all the images to be processed into a global feature fusion device corresponding to the global feature processing layer, so that each feature fusion module of the global feature fusion device performs the following operations:
for each image to be processed, performing image copying operation on the image to be processed to obtain a plurality of copied images corresponding to the image to be processed, and performing feature point position scrambling operation on all feature points in each copied image to obtain all scrambled feature points corresponding to each copied image; the image to be processed and each copied image comprise a plurality of corresponding feature points, and all the feature points in each copied image are matched with all the feature points in the image to be processed;
for each copied image of each image to be processed, extracting spatial structure feature information of each disordered feature point corresponding to the copied image, and performing feature point position recovery operation on all the disordered feature points according to the spatial structure feature information of each disordered feature point to obtain all recovered feature points corresponding to the copied image; the spatial structure feature information of each disordered feature point comprises spatial structure incidence relation information between the disordered feature point and any other disordered feature point in the copied image;
for each image to be processed, according to a predetermined first feature value of each feature point in the image to be processed, a corresponding second feature value of each restored feature point in each copied image, a first weight of the image to be processed and a corresponding second weight of each copied image, performing feature weighted summation operation on each feature point in the image to be processed to obtain all weighted summed feature points of the image to be processed;
for each image to be processed, performing feature fusion operation on each weighted and summed feature point according to spatial structure feature information corresponding to each weighted and summed feature point in the image to be processed to obtain all fused feature points of the image to be processed, and determining a pending feature fusion map corresponding to the image to be processed according to all the fused feature points of the image to be processed;
after the undetermined feature fusion graphs corresponding to all the images to be processed are obtained, judging whether a next feature fusion module exists or not; when the judgment result is yes, determining the pending feature fusion graphs corresponding to all the images to be processed as all the images to be processed corresponding to the next feature fusion module, inputting all the corresponding images to be processed into the next feature fusion module, and triggering and executing the image copying operation on the images to be processed for each image to be processed to obtain a plurality of copied images corresponding to the images to be processed; and when the judgment result is negative, determining the undetermined feature fusion graphs corresponding to all the images to be processed as feature fusion graphs corresponding to all the images to be processed.
5. The method according to any one of claims 1 to 4, wherein performing, by a class prediction layer of the feature extraction model to be trained, a class prediction operation on a target feature map corresponding to each target training image to obtain a class prediction score of the target feature map corresponding to each target training image, as the target class prediction score of each target training image, includes:
for a target feature map corresponding to each target training image, performing category probability calculation operation on each target feature point in the target feature map through a category prediction layer of the feature extraction model to be trained to obtain category probability parameters of all the target feature points of the target feature map, and determining the target category probability parameters of the target feature map according to the category probability parameters of all the target feature points of the target feature map;
and executing target function conversion operation on the target category probability parameters of the target characteristic diagram corresponding to each target training image to obtain the category prediction score of the target characteristic diagram corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
6. The intelligent image feature extraction method according to claim 5, wherein all of the target training images comprise all of the first training images in the first training image set and all of the second training images in the second training image set;
wherein the judging whether the trained feature extraction model converges comprises:
calculating a target loss parameter of the trained feature extraction model according to the target class prediction score of each first training image, the target class prediction score of each second training image, a preset first label of each first training image and a preset second label of each second training image, and judging whether the target loss parameter is less than or equal to a preset loss parameter threshold value;
and when the judgment result is yes, determining that the trained feature extraction model is converged.
7. An intelligent image feature extraction device, characterized in that the device comprises:
the determining module is used for determining a target training image set for training; the target training image set comprises a first training image set and a second training image set corresponding to the first training image set, and the second training image set is obtained by performing image preprocessing on the first training image set;
the training module is used for executing model training operation on a preset feature extraction model to be trained according to the target training image set to obtain a trained feature extraction model; the feature extraction model to be trained comprises a sampling layer, a target feature processing layer and a category prediction layer;
the judging module is used for judging whether the trained feature extraction model is converged;
the determining module is further configured to determine the trained feature extraction model as a target feature extraction model when the judging result of the judging module is yes; the target feature extraction model is used for extracting image features of a target image to be processed;
the training module executes model training operation on a preset feature extraction model to be trained according to the target training image set, and the mode of obtaining the trained feature extraction model specifically comprises the following steps:
performing image sampling operation on each target training image in the target training image set through a sampling layer of a preset feature extraction model to be trained and a first convolution parameter of the sampling layer to obtain a sampled training image corresponding to each target training image; the first convolution parameters of the sampling layer comprise sampling convolution kernel size parameters and/or sampling convolution step length parameters;
performing feature processing operation on the sampled training images corresponding to each target training image through a target feature processing layer of the feature extraction model to be trained to obtain target feature maps corresponding to all the target training images; the target feature maps corresponding to all the target training images are determined according to the undetermined feature maps corresponding to the to-be-processed images corresponding to all the sampled training images, as determined by the target feature processing layer; the undetermined feature maps corresponding to all the images to be processed are obtained by performing addition operation on the feature map to be processed corresponding to each image to be processed and the corresponding feature fusion map; the feature fusion map corresponding to each image to be processed is obtained by the following method: after the to-be-processed image is copied, the feature points are scrambled, restored and subjected to weighted summation fusion operation to obtain a pending feature fusion map corresponding to the to-be-processed image, and the pending feature fusion map corresponding to the to-be-processed image is subjected to multiple times of fusion through a plurality of feature fusion modules to obtain the feature fusion map;
and executing category prediction operation on the target feature graph corresponding to each target training image through a category prediction layer of the feature extraction model to be trained to obtain a category prediction score of the target feature graph corresponding to each target training image, wherein the category prediction score is used as the target category prediction score of each target training image.
8. An intelligent image feature extraction device, characterized in that the device comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the intelligent extraction method of image features according to any one of claims 1 to 6.
9. A computer storage medium storing computer instructions which, when invoked, perform a method for intelligent extraction of image features according to any one of claims 1 to 6.
CN202211701613.9A 2022-12-29 2022-12-29 Intelligent image feature extraction method and device Active CN115661486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211701613.9A CN115661486B (en) 2022-12-29 2022-12-29 Intelligent image feature extraction method and device


Publications (2)

Publication Number Publication Date
CN115661486A CN115661486A (en) 2023-01-31
CN115661486B true CN115661486B (en) 2023-04-07

Family

ID=85022421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211701613.9A Active CN115661486B (en) 2022-12-29 2022-12-29 Intelligent image feature extraction method and device

Country Status (1)

Country Link
CN (1) CN115661486B (en)


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399929B (en) * 2017-11-01 2023-04-28 腾讯科技(深圳)有限公司 Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium
CN111582008B (en) * 2019-02-19 2023-09-08 富士通株式会社 Device and method for training classification model and device for classifying by using classification model
CN110796039B (en) * 2019-10-15 2021-04-27 北京达佳互联信息技术有限公司 Face flaw detection method and device, electronic equipment and storage medium
CN110929774B (en) * 2019-11-18 2023-11-14 腾讯科技(深圳)有限公司 Classification method, model training method and device for target objects in image
CN111275128B (en) * 2020-02-13 2023-08-25 平安科技(深圳)有限公司 Image recognition model training method and system and image recognition method
CN111553420B (en) * 2020-04-28 2023-08-15 北京邮电大学 X-ray image identification method and device based on neural network
CN115497163A (en) * 2022-09-21 2022-12-20 电子科技大学 Teaching scene behavior recognition network processing method and device based on images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292875A (en) * 2017-06-29 2017-10-24 西安建筑科技大学 A kind of conspicuousness detection method based on global Local Feature Fusion
CN114863229A (en) * 2022-03-28 2022-08-05 北京百度网讯科技有限公司 Image classification method and training method and device of image classification model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Xinyi et al., "Light Field Image Super-Resolution Based on Feature Interaction Fusion and Attention", Laser & Optoelectronics Progress, 2022, pp. 1-15. *

Also Published As

Publication number Publication date
CN115661486A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
Cao et al. An attention enhanced bidirectional LSTM for early forest fire smoke recognition
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
US10002290B2 (en) Learning device and learning method for object detection
CN109325440B (en) Human body action recognition method and system
CN111814810A (en) Image recognition method and device, electronic equipment and storage medium
JP2022141931A (en) Method and device for training living body detection model, method and apparatus for living body detection, electronic apparatus, storage medium, and computer program
CN112905997B (en) Method, device and system for detecting poisoning attack facing deep learning model
CN109903053B (en) Anti-fraud method for behavior recognition based on sensor data
CN110349179B (en) Visible light infrared vision tracking method and device based on multiple adapters
CN112434599A (en) Pedestrian re-identification method based on random shielding recovery of noise channel
CN111144369A (en) Face attribute identification method and device
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
JP2011181016A (en) Discriminator creation device, method and program
CN115661486B (en) Intelligent image feature extraction method and device
CN111571567A (en) Robot translation skill training method and device, electronic equipment and storage medium
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
WO2022227512A1 (en) Single-stage dynamic pose recognition method and apparatus, and terminal device
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium
CN110795705B (en) Track data processing method, device and equipment and storage medium
CN110163106A (en) Integral type is tatooed detection and recognition methods and system
CN111368866A (en) Picture classification method, device and system
CN114723050B (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN113326509B (en) Method and device for detecting poisoning attack of deep learning model based on mutual information
CN116363712B (en) Palmprint palm vein recognition method based on modal informativity evaluation strategy
US20240127631A1 (en) Liveness detection method and apparatus, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant