CN115937537A - Intelligent identification method, device and equipment for target image and storage medium - Google Patents

Intelligent identification method, device and equipment for target image and storage medium

Info

Publication number
CN115937537A
Authority
CN
China
Prior art keywords
image
sample
initial
target
preset
Prior art date
Legal status
Pending
Application number
CN202211575082.3A
Other languages
Chinese (zh)
Inventor
Yang Fei (杨飞)
Xiong Jiale (熊佳乐)
Current Assignee
Northking Information Technology Co ltd
Original Assignee
Northking Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Northking Information Technology Co ltd
Priority to CN202211575082.3A
Publication of CN115937537A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent identification method, device, equipment and storage medium for a target image. The method comprises the following steps: inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales; inputting the first feature maps into a second sub-model to obtain at least three second feature maps of different scales; and inputting the second feature maps into a third sub-model to obtain target information of a target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient. The technical scheme of the embodiments of the invention strikes a balance between the speed and the accuracy of image recognition, can quickly and accurately determine the images of targets to be recognized in various simple or complex scenes, and solves the problem of poor image recognition in complex scenes.

Description

Intelligent identification method, device and equipment for target image and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for intelligently identifying a target image.
Background
With the rapid development of new-generation information technologies represented by artificial intelligence, big data and cloud computing, image recognition has become one of the most fundamental capabilities in the process of enterprise digital transformation.
At present, detection algorithms based on deep learning are generally used to extract target features from target images; such methods can capture the distinguishing features among different target images and have good anti-interference capability. The mainstream detection algorithms include the YOLO (You Only Look Once) series of detection algorithms and the RCNN (Region Convolutional Neural Networks) detection algorithms, and the two kinds of detection algorithms are suited to different detection scenes.
However, conventional target image detection methods have certain limitations and cannot achieve good results in complex scenes, such as scenes with wrinkled, dark or deformed images.
Disclosure of Invention
The invention provides an intelligent identification method, device, equipment and storage medium of a target image, and aims to solve the problem of poor identification effect of the target image in a complex scene.
In a first aspect, the present invention provides a method for intelligently identifying a target image, including:
inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model;
inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
In a second aspect, the present invention provides an apparatus for intelligently recognizing a target image, including:
the first feature map determining module is used for inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model;
the second feature map determining module is used for inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
and the target image determining module is used for inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
In a third aspect, the present invention provides an electronic device comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the method of intelligent identification of a target image of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a processor to implement the method for intelligently identifying a target image of the first aspect when executed.
The invention provides an intelligent identification scheme for a target image: an initial image is input into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model; the first feature maps are input into the second sub-model, which is constructed based on a feature pyramid network, to obtain at least three second feature maps of different scales; and the second feature maps are input into the third sub-model to obtain target information of a target to be identified, and a target image of the target to be identified is determined from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient. By adopting this technical scheme, the initial image is processed by the first sub-model of the preset feature extraction model to obtain features of the initial image at different scales (the first feature maps); these features are then processed by the second sub-model constructed based on the feature pyramid network to obtain fine features of the initial image at different scales (the second feature maps); finally, all the second feature maps are input into the third sub-model to obtain the target information of the target to be recognized in the initial image, from which the target image of the target to be recognized can be determined.
It should be understood that the statements herein do not identify key or critical features of the invention, nor do they limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for intelligently identifying a target image according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for intelligently identifying a target image according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a sample direction provided according to the second embodiment of the present invention;
FIG. 4 is a schematic diagram of an initial sample image according to a second embodiment of the present invention;
FIG. 5 is a schematic diagram of a sample image according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of a target detection box according to a second embodiment of the present invention;
fig. 7 is a schematic structural diagram of an intelligent target image recognition apparatus according to a third embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. In the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a method for intelligently identifying a target image according to an embodiment of the present invention. This embodiment is applicable to the case of determining a target image. The method may be executed by an intelligent identification apparatus for a target image, which may be implemented in the form of hardware and/or software and may be configured in an electronic device, such as a mobile phone; the electronic device may be composed of one physical entity or of two or more physical entities.
As shown in fig. 1, the method for intelligently identifying a target image according to an embodiment of the present invention specifically includes the following steps:
s101, inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model.
In this embodiment, a preset image pickup device, such as a mobile phone camera, may be used to obtain an image (an initial image) containing at least one target to be recognized. The image is then input into a preset feature extraction model, and through the processing of a first sub-model of the preset feature extraction model, at least three feature maps of different scales (first feature maps) may be obtained, such as feature maps with scales of 80×80, 40×40 and 20×20. The target to be recognized may be any object, such as a person, an animal or any inanimate object. The scale may be understood as the size of the picture; for example, a scale of 80×80 represents a picture whose length and width are both 80 pixels.
Optionally, the first sub-model at least includes a convolution layer, a batch normalization layer, an activation function layer, and a pooling layer.
Specifically, the convolution layer may process the initial image with a plurality of different convolution kernels to obtain feature maps of different levels. The batch normalization layer may calculate the mean and variance of the pixel values of all feature maps output by the convolution layer, then determine the difference between the pixel values and the mean, and divide the difference by the variance, thereby realizing normalization. The activation function layer may be a PRelu (Parametric Rectified Linear Unit) layer or a Relu (Rectified Linear Unit) layer, and applies non-linear processing to the feature maps output by the batch normalization layer. The pooling layer may perform downsampling on the feature maps output by the activation function layer, thereby obtaining first feature maps of different scales.
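For illustration, one such stage of the first sub-model might be sketched as follows in PyTorch; the kernel sizes, channel counts and the choice of max pooling are assumptions for the sketch, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class FirstStage(nn.Module):
    """One stage of the first sub-model: convolution, batch
    normalization, PReLU activation, then pooled downsampling."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)  # normalizes with batch mean and variance
        self.act = nn.PReLU(c_out)       # learnable non-linearity
        self.pool = nn.MaxPool2d(2)      # downsampling halves the scale

    def forward(self, x):
        return self.pool(self.act(self.bn(self.conv(x))))

x = torch.randn(1, 3, 640, 640)      # a batch with one initial image
print(FirstStage(3, 32)(x).shape)    # torch.Size([1, 32, 320, 320])
```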
S102, inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network.
In this embodiment, after the first feature maps are obtained, the second submodel of the preset feature extraction model may be used to fuse the first feature maps of different scales to obtain at least three second feature maps of different scales. Because the second submodel inherits the favorable characteristics of the feature pyramid network, the accuracy of the first feature maps can be improved while feature data are processed quickly. The scale of each second feature map is generally the same as that of the corresponding first feature map, but the second feature map contains feature information of finer granularity. The second submodel usually includes a plurality of feature extraction layers.
S103, inputting the second feature map into the third sub-model to obtain target information of a target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
In this embodiment, after the second feature maps are obtained, all the second feature maps may be input into a third sub-model. The third sub-model may include a plurality of convolution layers, such as convolution layers formed by a plurality of 1×1 convolution kernels, which may be used to extract target information from the second feature maps of different scales; each target to be recognized in each second feature map may correspond to one set of target information comprising position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient. Target information meeting preset requirements can then be screened from the multiple sets of target information, for example the requirement that the category confidence coefficient, the direction confidence coefficient and the deflection angle confidence coefficient are all greater than the corresponding preset confidence thresholds. According to the position information, the direction corresponding to the direction confidence coefficient and the deflection angle corresponding to the deflection angle confidence coefficient in the target information meeting the preset requirements, the target image of the target to be recognized can be cut out from the initial image, thereby obtaining the target image.
The position information in the target information may be understood as the position coordinates of the target to be recognized in the image coordinate system of the initial image. The category confidence coefficient may be understood as the confidence that the target to be recognized belongs to a preset category. The direction confidence coefficient may be understood as the confidence that the target to be recognized is in a preset direction. The deflection angle confidence coefficient may be understood as the confidence that the target to be recognized is at a preset deflection angle. The preset category of the target to be recognized may be a person, an animal or any inanimate object, such as a bill; the preset direction may be an orientation of the target to be recognized, such as upward, downward, left or right. The orientation and the deflection angle of the target to be recognized may be determined from setting information of the target to be recognized: if the target to be recognized is a bill, the direction of the characters in the bill may serve as the reference, that is, if the characters point upward, the orientation of the target to be recognized is upward, and if the characters are deflected by 30 degrees with respect to the horizontal line, the deflection angle of the target to be recognized is 30 degrees; if the target to be recognized is a person, an animal, a plant or the like, the orientation and the deflection angle may be determined from setting information such as the shape of a preset part of the target to be recognized. The preset deflection angle may be any angle within a preset range, such as any angle greater than or equal to 0 degrees and less than 360 degrees.
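As an illustration only, decoding one set of target information from a head output vector might look as follows; the vector layout, the category/direction/angle counts and the thresholds are all hypothetical:

```python
import numpy as np

NUM_CLASSES, NUM_DIRS, NUM_ANGLES = 2, 4, 180  # hypothetical counts

def decode(vec, cls_thr=0.5, dir_thr=0.5, ang_thr=0.5):
    """Split one prediction vector [x, y, w, h, class scores,
    direction scores, angle scores] into position information and the
    three confidences, keeping it only if every confidence exceeds
    its preset threshold."""
    box = vec[:4]                                           # position information
    cls = vec[4:4 + NUM_CLASSES]                            # category confidences
    drc = vec[4 + NUM_CLASSES:4 + NUM_CLASSES + NUM_DIRS]   # direction confidences
    ang = vec[4 + NUM_CLASSES + NUM_DIRS:]                  # deflection angle confidences
    c, d, a = int(cls.argmax()), int(drc.argmax()), int(ang.argmax())
    ok = cls[c] > cls_thr and drc[d] > dir_thr and ang[a] > ang_thr
    return (box, c, d, a) if ok else None

print(decode(np.random.rand(4 + NUM_CLASSES + NUM_DIRS + NUM_ANGLES)))
```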
The method for intelligently identifying a target image provided by the embodiment of the invention inputs the initial image into a first submodel of a preset feature extraction model to obtain at least three first feature maps of different scales; inputs the first feature maps into a second submodel, constructed based on a feature pyramid network, to obtain at least three second feature maps of different scales; and inputs the second feature maps into a third submodel to obtain target information of a target to be identified, the target image of the target to be identified being determined from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient. With this technical scheme, the initial image is processed by the first sub-model of the preset feature extraction model to obtain features of the initial image at different scales (the first feature maps); these features are then processed by the second sub-model constructed based on the feature pyramid network to obtain fine features of the initial image at different scales (the second feature maps); and finally all the second feature maps are input into the third sub-model to obtain the target information of the target to be recognized in the initial image, so that the target image of the target to be recognized can be determined according to the target information. The scheme thus strikes a balance between the speed and the accuracy of image recognition and can quickly and accurately determine the images of targets to be recognized in various simple or complex scenes.
Example two
Fig. 2 is a flowchart of an intelligent target image identification method according to a second embodiment of the present invention, and the technical solution of the second embodiment of the present invention is further optimized based on the above optional technical solutions, and a specific way of determining a target image is given.
Optionally, the inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales includes: inputting the first feature map into the second submodel, performing second convolution processing on the deep feature map by using the second submodel, performing first up-sampling processing on the second convolution processing result, and splicing the first up-sampling result and the middle-layer feature map to obtain a first splicing feature; after third convolution processing is performed on the first splicing feature, performing second up-sampling processing on the third convolution processing result, and splicing the second up-sampling result and the shallow feature map to obtain a second splicing feature; after fourth convolution processing is performed on the second splicing feature, performing fifth convolution processing on the fourth convolution processing result, and splicing the fifth convolution processing result and the third convolution processing result to obtain a third splicing feature; and performing sixth convolution processing on the third splicing feature, performing seventh convolution processing on the sixth convolution processing result, splicing the seventh convolution processing result and the second convolution processing result to obtain a fourth splicing feature, and performing eighth convolution processing on the fourth splicing feature to obtain an eighth convolution processing result, wherein the second feature map comprises the fourth convolution processing result, the sixth convolution processing result and the eighth convolution processing result. The advantage of this arrangement is that the second submodel processes the first feature maps by convolution and splicing, so that deep features are optimized with shallow features and the propagation of position information through the model is enhanced.
Optionally, the inputting the second feature map into the third sub-model to obtain target information of a target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, includes: inputting the second feature maps into the third submodel and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third submodel to obtain target information of a detection frame of the target to be identified; screening the target information of the detection frames by using a non-maximum suppression algorithm to determine the target position, the target direction and the target deflection angle of a target detection frame; and cutting out an image corresponding to the target detection frame from the initial image according to the target position, the target direction and the target deflection angle to obtain a target image of the target to be identified. The advantage of this arrangement is that screening out redundant detection frames with the non-maximum suppression algorithm enhances the accuracy of the target image.
As shown in fig. 2, the second embodiment of the present invention provides an intelligent target image identification method, which specifically includes the following steps:
s201, inputting the initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales.
Optionally, the inputting the initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales includes: inputting the initial image into a convolution layer, a batch normalization layer and an activation function layer to obtain a shallow feature map, a middle feature map and an initial deep feature map which have different scales; performing first convolution processing on the initial deep layer feature map, and then performing pooling processing on a first convolution processing result to obtain a plurality of pooling processing results with different scales, wherein the scale of the initial deep layer feature map is smaller than that of the shallow layer feature map and that of the middle layer feature map; and splicing the pooling processing results of the different scales to obtain a deep layer feature map, wherein the first feature map comprises the deep layer feature map, the shallow layer feature map and the middle layer feature map. The advantage of this arrangement is that the first feature map obtained by the processing of the convolutional layer, the batch normalization layer, the activation function layer and the pooling layer in the first submodel contains different levels of feature information.
Specifically, the initial image is input into a first sub-model comprising a convolution layer, a batch normalization layer and an activation function layer, so that three first feature maps of different scales can be obtained, namely a shallow feature map, a middle-layer feature map and an initial deep feature map, wherein the scale of the shallow feature map is larger than that of the middle-layer feature map, and the scale of the middle-layer feature map is larger than that of the initial deep feature map. A convolution layer may be used to extract features from the initial deep feature map; the extracted features are then pooled to obtain a plurality of features of different scales (the pooling processing results), and finally the features of different scales are spliced to obtain the deep feature map. At least four pooling kernel sizes may be used, such as 1×1, 5×5, 9×9 and 13×13.
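A minimal sketch of this multi-kernel pooling and splicing, in the style of spatial pyramid pooling, is shown below; the channel count and input scale are assumptions, and the 1×1, 5×5, 9×9 and 13×13 kernel sizes follow the example above:

```python
import torch
import torch.nn as nn

class SplicedPooling(nn.Module):
    """Pool the convolved initial deep feature map with several kernel
    sizes at stride 1, then splice the results along the channel axis."""
    def __init__(self, kernels=(1, 5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 512, 20, 20)     # first convolution processing result
print(SplicedPooling()(x).shape)    # torch.Size([1, 2048, 20, 20])
```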
Optionally, the determining manner of the preset feature extraction model includes:
1) Obtaining a sample image containing a preset real frame, wherein the sample image contains at least one sample target image, the preset real frame is used for framing the sample target image, and the preset real frame is configured with a sample label.
Specifically, the images of all sample targets to be recognized, that is, the sample target images, may be framed in the sample images by real frames (preset real frames), and each preset real frame is correspondingly configured with a sample label. The sample label may include a sample position label, a sample category label, a sample direction label, a sample deflection angle label and the like. The sample position label may be the position coordinates of the preset real frame in the image coordinate system of the sample image; the sample category label may be the category to which the object in the preset real frame belongs, such as a bill or an identity card; the sample direction label may be the direction of the preset real frame, which may include the positive direction, the downward direction, the left direction and the right direction (fig. 3 is a schematic diagram of the sample directions); and the sample deflection angle label may be the angle by which the preset real frame deviates from the horizontal line.
2) And inputting the sample image into a first initial sub-model of a preset initial model to obtain at least three first sample characteristic maps with different scales, wherein the preset initial model at least comprises the first initial sub-model, a second initial sub-model and a third initial sub-model.
Specifically, the sample image may be input into a first initial sub-model, so that a first sample feature map of at least three different scales may be obtained. The first initial sub-model may include an initial convolutional layer, an initial batch normalization layer, an initial activation function layer, and an initial pooling layer.
Optionally, before the sample image is input into the first initial sub-model, the sample deflection angle of the sample target image may be converted into an angle within a preset angle range. For example, if the range of the sample deflection angle is greater than -90 degrees and less than or equal to 0 degrees, a conversion of the following form may be used:

θ = -angle, if width ≥ height
θ = 90° - angle, if width < height

so that the sample deflection angle is converted into an angle in the range of 0 degrees to 180 degrees, where θ is the converted angle, angle is the measured sample deflection angle, and width and height are respectively the width and the height of the sample target image. A preset tool, such as OpenCV, may be used to determine the sample deflection angle of the sample target image; if the sample deflection angle is negative, the deflection direction corresponding to the sample deflection angle is the negative direction, where the left is taken as the positive direction and the right as the negative direction.
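For illustration, a minimal sketch of this conversion is shown below, assuming the deflection angle is obtained from OpenCV's cv2.minAreaRect under its legacy (-90, 0] angle convention:

```python
import cv2
import numpy as np

def convert_angle(rect):
    """Convert a minAreaRect deflection angle in (-90, 0] to the
    0-180 degree range, as described above.  NOTE: OpenCV >= 4.5
    returns angles in (0, 90] instead, so this assumes the legacy
    convention."""
    (_, _), (width, height), angle = rect
    if width >= height:
        return -angle          # falls in [0, 90)
    return 90.0 - angle        # falls in [90, 180)

points = np.array([[10, 10], [60, 20], [50, 70], [0, 60]], dtype=np.float32)
print(convert_angle(cv2.minAreaRect(points)))
```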
3) And inputting the first sample feature map into the second initial submodel to obtain at least three second sample feature maps with different scales, wherein the second initial submodel is constructed based on a feature pyramid network.
4) And inputting the second sample characteristic diagram into the third initial sub-model to obtain sample target information of a sample detection frame of a sample target to be identified, wherein the sample target information comprises sample position information, sample category confidence, sample direction confidence and sample deflection angle confidence.
Specifically, through the processing of the third initial submodel, a detection frame of the sample target to be identified, that is, a sample detection frame, may be obtained, and sample target information of the detection frame may also be determined.
5) And determining a loss function according to the sample target information and the sample label, and training the preset initial model by using the loss function to obtain a preset feature extraction model.
Specifically, a loss function can be established according to the sample target information and the sample labels; for example, a regression loss function can be determined according to the sample position information in the sample target information and the sample position label, and this loss function can be taken as a loss function of the model. The loss function measures the difference between the sample detection frame and the preset real frame (the value of the loss function), and the preset initial model is trained according to the magnitude of this value. When the value of the loss function is smaller than a preset loss function value, the training of the preset initial model is completed, and the trained preset initial model is the preset feature extraction model.
The advantage of the above arrangement is that the accuracy of the preset feature extraction model is ensured by training the preset initial model by using the sample image containing the preset real frame and determining whether the training is completed or not by the loss function.
Further, the loss function includes: a regression loss function, a confidence loss function, a category loss function, an angle loss function and a direction loss function, wherein the regression loss function is determined based on the intersection-over-union of the sample detection frame and the preset real frame and on the area of the minimum convex closed frame of the sample detection frame and the preset real frame. The advantage of this arrangement is that, by using the five loss functions, the training effect of the preset initial model is evaluated comprehensively and whether the training of the preset initial model is completed is determined comprehensively, which improves the accuracy of the preset feature extraction model.
For example, the loss function may be determined in the following manner:
Loss = αL1 + βL2 + θL3 + γL4 + δL5

where α, β, θ, γ and δ are coefficients of the loss function, L1 is the regression loss function, L2 is the confidence loss function, L3 is the category loss function, L4 is the angle loss function, and L5 is the direction loss function. The regression loss function may be determined in the following manner:

L1 = 1 - IOU + |C \ (A ∪ B)| / |C|

wherein A represents the area of the preset real frame, B represents the area of the sample detection frame, |·| represents taking the area, IOU represents the intersection-over-union of A and B, C represents the area of the minimum convex closed frame of A and B, and \ represents the subtraction operation on areas. The confidence loss function, the category loss function, the angle loss function and the direction loss function may each be determined in the following manner:

L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]

wherein y is the probability of the preset real frame and ŷ is the confidence of the sample detection frame, that is, the sample category confidence, the sample direction confidence or the sample deflection angle confidence. If the sample label of the preset real frame is consistent with the sample target information of the corresponding sample detection frame, y = 1; if they are inconsistent, y = 0. For example, if the sample category label of the preset real frame is an identity card and the category corresponding to the sample category confidence of the sample detection frame is an invoice, then y = 0.
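As an illustration, a minimal PyTorch sketch of the regression loss for axis-aligned boxes is given below; the (x1, y1, x2, y2) tensor layout is an assumption, and for two axis-aligned boxes the minimum convex closed frame C is simply their smallest enclosing box:

```python
import torch

def giou_loss(pred, target):
    r"""Regression loss L1 = 1 - IOU + |C \ (A ∪ B)| / |C| for boxes
    given as (x1, y1, x2, y2) rows."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_b = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_a + area_b - inter
    iou = inter / union.clamp(min=1e-9)
    # C: smallest enclosing (minimum convex closed) frame of A and B
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c_area = (cw * ch).clamp(min=1e-9)
    return (1 - iou + (c_area - union) / c_area).mean()

# L2..L5 are binary cross-entropy terms on the respective confidences:
bce = torch.nn.BCELoss()
```

The total loss would then be the weighted sum Loss = αL1 + βL2 + θL3 + γL4 + δL5, with the coefficients as tuning hyperparameters.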
Optionally, before the obtaining of the sample image containing the preset real frame, the method further includes:
1) Obtaining an initial sample image containing a preset initial real frame, wherein the initial sample image contains at least one initial sample target image, the preset initial real frame is used for framing the initial sample target image, the preset initial real frame is configured with an initial sample label, and the initial sample label contains an initial sample direction label.
2) And carrying out preset image processing on the initial sample image to obtain a sample image, and updating an initial sample label according to the processing process of the preset image processing to determine the sample label.
Specifically, the preset image processing may include fixed-image-scale processing, image stitching processing, image translation processing, image cropping processing, image flipping processing, image rotation processing and the like, and one or more processing modes may be selected from the preset image processing with equal probability to process the initial sample image; the processed initial sample image is the sample image. If the initial sample image after the preset image processing is inconsistent with the information corresponding to the initial sample label, the initial sample label needs to be adjusted according to the processing procedure of the image processing to obtain the sample label. For example, if the initial sample direction label is the positive direction and the image flipping processing turns the initial sample image by 180 degrees, the flipped initial sample image is the sample image, and the initial sample label can be updated to the downward direction, which is then the sample direction label of the sample image.
The fixed-image-scale processing may unify all initial sample images to a specific size, for example 640×640. To avoid destroying the characteristics of the original initial sample image, the initial sample image may be scaled proportionally, and an initial sample image that does not fill the target scale may be padded, for example with the RGB pixel value (114, 114, 114), to unify the scales. The image stitching processing may randomly select a preset number of initial sample images, such as four, stitch them after fixed-image-scale processing, and then scale the result to a set scale, such as 640×640. For example, a point may be randomly selected within the central area of a preset region with a scale of 1280×1280, and the four selected initial sample images randomly placed in the four areas to the upper left, lower left, upper right and lower right of the selected point. If an initial sample image exceeds the range of the preset region, that is, the initial sample image is truncated, the part within the preset region is retained if its area accounts for more than 40% of the total area of that initial sample image; if the area within the preset region is less than 40%, the initial sample image is determined to be invalid. Finally, the stitched initial sample images are scaled to the set scale, completing the image stitching processing. The image translation processing may translate the initial sample image up, down, left and right, with a preset translation size. The image cropping processing may crop a random area of the initial sample image; if some initial sample images are truncated in this process, the retained initial sample image may be determined in the same way as the truncation handling described above, and the initial sample image is then fixed to a specific scale. The image flipping processing may randomly flip the initial sample image up and down and left and right. The image rotation processing may randomly rotate the input initial sample image, where the rotation angle range may be preset, for example clockwise or counterclockwise and less than 44 degrees. Fig. 4 is a schematic diagram of an initial sample image, and fig. 5 is a schematic diagram of a sample image; the sample image obtained by performing fixed-image-scale processing, image stitching processing, image rotation processing and the like on the initial sample image shown in fig. 4 is shown in fig. 5.
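For illustration, the fixed-image-scale step (proportional scaling plus padding with pixel value (114, 114, 114)) might be sketched as follows; the 640×640 target size follows the example above:

```python
import cv2
import numpy as np

def fixed_scale(image, size=640, pad_value=114):
    """Scale `image` proportionally so its longer side equals `size`,
    then pad the remainder with pixel value (114, 114, 114), preserving
    the characteristics (aspect ratio) of the original image."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    rh, rw = resized.shape[:2]
    top, left = (size - rh) // 2, (size - rw) // 2
    canvas[top:top + rh, left:left + rw] = resized
    return canvas, scale, (left, top)  # scale/offsets let labels be remapped
```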
Performing preset image processing on the initial sample image to obtain a sample image, and updating an initial sample label according to the preset image processing process to determine the sample label, including:
if the initial sample image is determined to be subjected to image rotation processing, determining a preset angle before the image rotation processing is executed, wherein the preset angle is the angle which is closest to the rotation angle corresponding to the image rotation processing in a preset angle set; rotating the initial sample image to a first position corresponding to the preset angle, and determining a first sequence of first corner point coordinates according to a preset sorting mode, wherein the first corner point coordinates are coordinates of corner points of a preset initial real frame after the initial sample image is rotated by the preset angle, and the preset sorting mode comprises sorting of absolute values of coordinates of preset coordinate axes; and after the initial sample image is restored to the initial position before rotation, rotating the initial sample image to a second position corresponding to the rotation angle, determining the initial sample image at the second position as a sample image, determining a second sequence of second corner point coordinates according to the preset sorting mode, and if the second sequence is the same as the first sequence, determining a sample direction corresponding to the preset angle as a sample direction label in a sample label of the sample image.
For example, suppose the initial sample image is in the positive direction, the rotation angle corresponding to the image rotation processing is 93 degrees with a left rotation direction, and the preset angle set includes 90 degrees, 180 degrees, 270 degrees and 360 degrees; then the preset angle is 90 degrees, also with the left rotation direction. The initial sample image may first be rotated by 90 degrees, and the corner coordinates of the initial sample image in the preset coordinate system sorted from small to large by their abscissa values; suppose this sorting (the first order) is corner 1, corner 3, corner 2, corner 4. After the initial sample image is restored to its original position, it is rotated by 93 degrees, and the corner coordinates are again sorted from small to large by their abscissa values. If the sorting of the corner coordinates at this time (the second order) is still corner 1, corner 3, corner 2, corner 4, the sample direction corresponding to the 90-degree rotation (the left direction) may be determined as the sample direction label of the sample image. If the first order and the second order differ and the rotation angle is greater than the preset angle, the direction following the sample direction corresponding to the preset angle may be determined as the sample direction label of the sample image; for example, if the direction following the left direction is the downward direction, the downward direction is the sample direction label. If the first order and the second order differ and the rotation angle is smaller than the preset angle, the direction preceding the sample direction corresponding to the preset angle may be determined as the sample direction label of the sample image; for example, if the direction preceding the left direction is the positive direction, the positive direction is the sample direction label.
Optionally, instead of presetting the angle set, the modulus and the quotient of the rotation angle corresponding to the image rotation processing with respect to a preset angle, such as 90 degrees, may be calculated. The initial sample image is rotated to a set angle corresponding to the product of the preset angle and the quotient, and a third order of third corner coordinates is determined according to the preset sorting mode; the initial sample image is restored to its initial position before rotation and then rotated to the position corresponding to the rotation angle to obtain the sample image, and a fourth order of fourth corner coordinates is determined according to the preset sorting mode. If the fourth order is the same as the third order, the sample direction corresponding to the set angle is determined as the sample direction label in the sample label of the sample image.
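A minimal sketch of the corner-order comparison described above is given below, assuming four corner points sorted by their abscissa values; the coordinates and angles are hypothetical:

```python
import numpy as np

def corner_order(corners):
    """Indices of the four corner points sorted by abscissa value."""
    return tuple(int(i) for i in np.argsort(corners[:, 0], kind="stable"))

def rotate_points(corners, angle_deg, center):
    """Rotate corner points counterclockwise by angle_deg around center."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return (corners - center) @ rot.T + center

corners = np.array([[100.0, 50.0], [300.0, 60.0], [290.0, 200.0], [90.0, 190.0]])
center = corners.mean(axis=0)
first = corner_order(rotate_points(corners, 90, center))   # preset angle
second = corner_order(rotate_points(corners, 93, center))  # actual rotation
keep_label = first == second  # same order -> keep the preset-angle direction label
```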
The advantage of the above arrangement is that, through the preset image processing of the initial sample image and the correction of the sample label, both the diversity of the sample images and the accuracy of the sample labels are guaranteed, which improves the precision of the preset feature extraction model.
S202, inputting the first feature map into the second submodel, performing second convolution processing on the deep feature map by using the second submodel, performing first up-sampling processing on a second convolution processing result, and splicing the first up-sampling result and the middle-layer feature map to obtain a first splicing feature.
For example, if the three first feature maps are c3, c4 and c5 respectively, where c3 is the shallow feature map determined by 8-fold downsampling, c4 is the middle-layer feature map determined by 16-fold downsampling, and c5 is the deep feature map determined by 32-fold downsampling, then the deep feature map may be subjected to second convolution processing by the second submodel to obtain a refined second convolution processing result p4. First up-sampling processing may then be performed on p4; the obtained first up-sampling result has the same scale as the middle-layer feature map, and the first up-sampling result and the middle-layer feature map may be spliced to obtain the first splicing feature.
S203, after the first splicing characteristic is subjected to third convolution processing, second up-sampling processing is carried out on a third convolution processing result, and a second up-sampling result and a shallow layer characteristic diagram are spliced to obtain a second splicing characteristic.
For example, after third convolution processing is performed on the obtained first splicing feature, a third convolution processing result p3 may be obtained. Second up-sampling processing is then performed on p3; the obtained second up-sampling result has the same scale as the shallow feature map, and the second up-sampling result and the shallow feature map may be spliced to obtain the second splicing feature.
S204, after fourth convolution processing is carried out on the second splicing feature, fifth convolution processing is carried out on a fourth convolution processing result, and the fifth convolution processing result and the third convolution processing result are spliced to obtain a third splicing feature.
For example, after fourth convolution processing is performed on the obtained second splicing feature, a fourth convolution processing result h3 may be obtained. Fifth convolution processing is then performed on h3; the obtained fifth convolution processing result has the same scale as p3, and the fifth convolution processing result and p3 may be spliced to obtain the third splicing feature.
S205, after performing sixth convolution processing on the third splicing feature, performing seventh convolution processing on a sixth convolution processing result, splicing the seventh convolution processing result and the second convolution processing result to obtain a fourth splicing feature, and performing eighth convolution processing on the fourth splicing feature to obtain an eighth convolution processing result.
Wherein the second feature map includes the fourth convolution processing result, the sixth convolution processing result, and the eighth convolution processing result.
For example, after sixth convolution processing is performed on the obtained third splicing feature, a sixth convolution processing result h4 may be obtained. Seventh convolution processing is then performed on h4; the obtained seventh convolution processing result has the same scale as p4, and the seventh convolution processing result and p4 may be spliced to obtain the fourth splicing feature. After eighth convolution processing is performed on the fourth splicing feature, an eighth convolution processing result h5 may be obtained, thereby obtaining the second feature maps h3, h4 and h5.
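For illustration, the fusion steps S202–S205 might be sketched as follows in PyTorch; the channel counts, kernel sizes and activation are assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv + BN + activation; stride 2 halves the scale (downsampling)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SecondSubmodel(nn.Module):
    """Sketch of S202-S205: top-down fusion, then bottom-up fusion."""
    def __init__(self, c3=256, c4=512, c5=1024):
        super().__init__()
        self.conv2 = ConvBlock(c5, c4)            # second convolution -> p4
        self.conv3 = ConvBlock(c4 + c4, c3)       # third convolution  -> p3
        self.conv4 = ConvBlock(c3 + c3, c3, k=3)  # fourth convolution -> h3
        self.conv5 = ConvBlock(c3, c3, k=3, s=2)  # fifth convolution (downsample)
        self.conv6 = ConvBlock(c3 + c3, c4, k=3)  # sixth convolution  -> h4
        self.conv7 = ConvBlock(c4, c4, k=3, s=2)  # seventh convolution (downsample)
        self.conv8 = ConvBlock(c4 + c4, c5, k=3)  # eighth convolution -> h5
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, c3, c4, c5):
        p4 = self.conv2(c5)
        x = torch.cat([self.up(p4), c4], dim=1)    # first splicing feature
        p3 = self.conv3(x)
        x = torch.cat([self.up(p3), c3], dim=1)    # second splicing feature
        h3 = self.conv4(x)
        x = torch.cat([self.conv5(h3), p3], dim=1) # third splicing feature
        h4 = self.conv6(x)
        x = torch.cat([self.conv7(h4), p4], dim=1) # fourth splicing feature
        h5 = self.conv8(x)
        return h3, h4, h5                          # the second feature maps
```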
S206, inputting the second feature maps into the third submodel and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third submodel to obtain target information of the detection frames of the target to be identified.
For example, as described above, the target information of the detection frame of the target to be recognized may be obtained by inputting h3, h4, and h5 into the third submodel, and performing ninth convolution processing on the current second feature map by using the third submodel for each second feature map.
S207, screening the target information of the detection frame by using a non-maximum suppression algorithm to determine the target position, the target direction and the target deflection angle of the target detection frame.
Specifically, since the second feature maps may contain multiple detection frames for multiple targets to be recognized, the detection frames need to be screened. Using a non-maximum suppression algorithm, redundant detection frames may be removed, for example detection frames whose category confidence is too low, so as to obtain the target detection frame. The position information of the target detection frame is the target position, the category corresponding to the category confidence of the target detection frame is the target category, the direction corresponding to the direction confidence of the target detection frame is the target direction, and the deflection angle corresponding to the deflection angle confidence of the target detection frame is the target deflection angle.
For example, the screening steps corresponding to the non-maximum suppression algorithm may be: 1) filtering out the detection frames whose category confidence is lower than a classification score threshold; 2) sorting the category confidences of the retained detection frames to determine the maximum category confidence and the corresponding first detection frame; 3) traversing the remaining detection frames other than the first detection frame, and if the rotated intersection-over-union of the current detection frame and the first detection frame is greater than a set intersection-over-union threshold, deleting the current detection frame while retaining the first detection frame; and 4) repeating steps 2) and 3) on the remaining detection frames until all detection frames have been processed.
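As an illustration, these screening steps might be sketched as follows, using the shapely library to compute the rotated intersection-over-union of corner-point polygons; the thresholds are hypothetical:

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_iou(box_a, box_b):
    """Rotated intersection-over-union of two boxes given as (4, 2) corner arrays."""
    pa, pb = Polygon(box_a), Polygon(box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(boxes, scores, score_thr=0.25, iou_thr=0.45):
    """Steps 1)-4): drop low-confidence frames, then repeatedly keep the
    highest-scoring frame and delete frames overlapping it too much."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if rotated_iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```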
S208, cutting out an image corresponding to the target detection frame from the initial image according to the target position, the target direction and the target deflection angle so as to obtain a target image of the target to be recognized.
Specifically, fig. 6 is a schematic diagram of target detection frames. Fig. 6 contains four images; the rectangular frame in each image is a detection frame, the detection frames in the left column are conventional detection frames generated by a conventional image detection algorithm, and the detection frames in the right column are target detection frames generated according to the present method. It can clearly be seen that the target detection frames locate the targets to be recognized more accurately than the conventional detection frames, that is, the recognition effect of this method on the target to be recognized is better than that of the conventional method. The image corresponding to the target detection frame can be cut out from the initial image according to the target position and the target deflection angle; the direction of the cut-out image is then corrected according to the target direction and the target deflection angle, rotating the image to the positive direction, so that the target image is obtained.
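For illustration, cutting out and deskewing the target detection frame with OpenCV might be sketched as follows; the sign convention of the angle and the final direction correction are assumptions:

```python
import cv2

def crop_target(image, center, size, angle_deg):
    """Rotate the whole image so the target detection frame becomes
    axis-aligned, then crop it out.  A final rotation by a multiple of
    90 degrees (chosen from the target direction) would bring the
    content to the positive direction."""
    w, h = size
    m = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    rotated = cv2.warpAffine(image, m, (image.shape[1], image.shape[0]))
    x, y = int(center[0] - w / 2), int(center[1] - h / 2)
    return rotated[y:y + int(h), x:x + int(w)]
```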
According to the intelligent identification method for a target image provided by the embodiment of the invention, the initial image is processed by the first sub-model of the preset feature extraction model to obtain first feature maps of different scales. The second sub-model, constructed based on the feature pyramid network, then performs multiple convolution operations on the first feature maps, aligning the first feature maps of different scales with the scales of the convolution processing results and splicing them, so that fine features with the same scales as the first feature maps (the second feature maps) can be obtained. Finally, all the second feature maps are input into the third sub-model to obtain the target information of the target to be identified in the initial image, and the target information is screened with the non-maximum suppression algorithm to determine the target detection frame and the target image corresponding to the target detection frame. Because the method processes the initial image with a lightweight feature extraction model, targets to be identified in any direction can be detected and classified in different scenes, the direction of the image of the target to be identified can be corrected, and the working efficiency of image recognition is greatly improved.
EXAMPLE III
Fig. 7 is a schematic structural diagram of an intelligent target image recognition apparatus according to a third embodiment of the present invention. As shown in fig. 7, the apparatus includes: a first feature map determination module 301, a second feature map determination module 302, and a target image determination module 303, wherein:
the first feature map determining module is used for inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model;
the second feature map determining module is used for inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
and the target image determining module is used for inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
The apparatus for intelligently identifying a target image provided by the embodiment of the invention processes the initial image with the first sub-model of the preset feature extraction model to obtain features of the initial image at different scales (the first feature maps), processes these features with the second sub-model constructed based on the feature pyramid network to obtain fine features of the initial image at different scales (the second feature maps), and inputs all the second feature maps into the third sub-model to obtain the target information of the target to be identified in the initial image, from which the target image of the target to be identified can be determined.
Optionally, the first sub-model at least includes a convolution layer, a batch normalization layer, an activation function layer, and a pooling layer.
Optionally, the first feature map determining module includes:
the characteristic diagram determining unit is used for inputting the initial image into the convolutional layer, the batch normalization layer and the activation function layer to obtain a shallow characteristic diagram, a middle characteristic diagram and an initial deep characteristic diagram which have different scales;
the pooling processing unit is used for performing first convolution processing on the initial deep feature map and then performing pooling processing on the first convolution processing result to obtain a plurality of pooling processing results with different scales, wherein the scale of the initial deep feature map is smaller than that of the shallow feature map and that of the middle-layer feature map;
and the splicing unit is used for splicing the pooling processing results of the different scales to obtain a deep characteristic map, wherein the first characteristic map comprises the deep characteristic map, the shallow characteristic map and the middle characteristic map.
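By way of a non-limiting illustration, the conv-then-multi-scale-pooling splicing described above could look like the following PyTorch sketch (the class name, the 5/9/13 pooling kernels and the channel counts are assumptions of this sketch, not values fixed by the embodiment):

```python
import torch
import torch.nn as nn

class DeepFeatureSPP(nn.Module):
    """Sketch of the first-conv-then-pooling step on the initial deep
    feature map; kernel sizes and channels are illustrative assumptions."""

    def __init__(self, channels: int = 512):
        super().__init__()
        # First convolution processing applied to the initial deep feature map.
        self.first_conv = nn.Conv2d(channels, channels // 2, kernel_size=1)
        # Pooling at several scales; stride 1 with matching padding preserves
        # the spatial size, so the results can be spliced along channels.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in (5, 9, 13)
        )

    def forward(self, initial_deep: torch.Tensor) -> torch.Tensor:
        x = self.first_conv(initial_deep)
        pooled = [pool(x) for pool in self.pools]
        # Splice the pooling results of different scales (together with the
        # convolution result) to obtain the deep feature map.
        return torch.cat([x] + pooled, dim=1)
```

With stride-1 pooling and matching padding, the three pooling results keep the spatial size of the convolution result, which is what allows the splicing unit to concatenate them along the channel dimension.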
Optionally, the second feature map determining module includes:
the first splicing feature determining unit is used for inputting the first feature map into the second sub-model, performing second convolution processing on the deep feature map by using the second sub-model, performing first up-sampling processing on the second convolution processing result, and splicing the first up-sampling result with the middle feature map to obtain a first splicing feature;
the second splicing feature determining unit is used for performing third convolution processing on the first splicing feature, then performing second up-sampling processing on the third convolution processing result, and splicing the second up-sampling result with the shallow feature map to obtain a second splicing feature;
the third splicing feature determining unit is used for performing fourth convolution processing on the second splicing feature, then performing fifth convolution processing on the fourth convolution processing result, and splicing the fifth convolution processing result with the third convolution processing result to obtain a third splicing feature;
and the second feature map determining unit is used for performing sixth convolution processing on the third splicing feature, then performing seventh convolution processing on the sixth convolution processing result, splicing the seventh convolution processing result with the second convolution processing result to obtain a fourth splicing feature, and performing eighth convolution processing on the fourth splicing feature to obtain an eighth convolution processing result, wherein the second feature map comprises the fourth convolution processing result, the sixth convolution processing result and the eighth convolution processing result.
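For concreteness, the splicing order of the second sub-model can be sketched as follows. This is a hedged sketch only: the 128/256/512 channel widths, the 3x3 kernels, and the stride-2 downsampling in the fifth and seventh convolutions are assumptions made so that the spliced tensors match in scale; none of these values are fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cbs(cin: int, cout: int, stride: int = 1) -> nn.Sequential:
    """Conv + batch normalization + activation; a hypothetical building block."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.SiLU(),
    )

class SecondSubModelSketch(nn.Module):
    """Minimal sketch of the second sub-model's splicing order."""

    def __init__(self, c_s: int = 128, c_m: int = 256, c_d: int = 512):
        super().__init__()
        self.conv2 = cbs(c_d, c_m)            # second convolution
        self.conv3 = cbs(c_m * 2, c_s)        # third convolution
        self.conv4 = cbs(c_s * 2, c_s)        # fourth convolution (output 1)
        self.conv5 = cbs(c_s, c_s, stride=2)  # fifth: back down one scale
        self.conv6 = cbs(c_s * 2, c_m)        # sixth convolution (output 2)
        self.conv7 = cbs(c_m, c_m, stride=2)  # seventh: down one more scale
        self.conv8 = cbs(c_m * 2, c_d)        # eighth convolution (output 3)

    def forward(self, shallow, middle, deep):
        r2 = self.conv2(deep)
        up1 = F.interpolate(r2, scale_factor=2)           # first up-sampling
        r3 = self.conv3(torch.cat([up1, middle], dim=1))  # first splice -> conv3
        up2 = F.interpolate(r3, scale_factor=2)           # second up-sampling
        r4 = self.conv4(torch.cat([up2, shallow], dim=1)) # second splice -> conv4
        r5 = self.conv5(r4)
        r6 = self.conv6(torch.cat([r5, r3], dim=1))       # third splice -> conv6
        r7 = self.conv7(r6)
        r8 = self.conv8(torch.cat([r7, r2], dim=1))       # fourth splice -> conv8
        return r4, r6, r8
```

The returned triple corresponds to the fourth, sixth and eighth convolution processing results, i.e. the three second feature maps of different scales.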
Optionally, the target image determining module includes:
the target information determining unit is used for inputting the second feature maps into the third sub-model and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third sub-model to obtain target information of a detection frame of the target to be identified;
the screening unit is used for screening the target information of the detection frame by using a non-maximum suppression algorithm so as to determine the target position, the target direction and the target deflection angle of the target detection frame;
and the target image determining unit is used for cutting out an image corresponding to the target detection frame from the initial image according to the target position, the target direction and the target deflection angle so as to obtain a target image of the target to be recognized.
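As a hedged illustration, the cropping step could be realized with OpenCV roughly as follows. The 0/90/180/270-degree encoding of the target direction, and all function and parameter names, are assumptions of this sketch rather than features of the embodiment.

```python
import cv2
import numpy as np

def crop_target_image(initial_image: np.ndarray,
                      center: tuple, size: tuple,
                      deflection_deg: float, direction_deg: int) -> np.ndarray:
    """Cut the rotated detection frame out of the initial image and
    correct its orientation (sketch under assumed conventions)."""
    (cx, cy), (w, h) = center, size
    height, width = initial_image.shape[:2]
    # Rotate the whole image by the deflection angle so the target
    # detection frame becomes axis-aligned around its center.
    M = cv2.getRotationMatrix2D((cx, cy), deflection_deg, 1.0)
    rotated = cv2.warpAffine(initial_image, M, (width, height))
    # Cut out the now axis-aligned frame.
    x0, y0 = int(round(cx - w / 2)), int(round(cy - h / 2))
    crop = rotated[max(y0, 0):y0 + int(h), max(x0, 0):x0 + int(w)]
    # Correct the coarse direction in 90-degree steps (assumed encoding).
    return np.rot90(crop, k=(direction_deg // 90) % 4)
```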
Optionally, the determining manner of the preset feature extraction model includes:
acquiring a sample image containing a preset real frame, wherein the sample image contains at least one sample target image, the preset real frame is used for framing the sample target image, and the preset real frame is configured with a sample label;
inputting the sample image into a first initial sub-model of a preset initial model to obtain at least three first sample feature maps with different scales, wherein the preset initial model at least comprises the first initial sub-model, a second initial sub-model and a third initial sub-model;
inputting the first sample feature map into the second initial sub-model to obtain at least three second sample feature maps with different scales, wherein the second initial sub-model is constructed based on a feature pyramid network;
inputting the second sample feature map into the third initial sub-model to obtain sample target information of a sample detection frame of a sample target to be identified, wherein the sample target information comprises sample position information, a sample category confidence coefficient, a sample direction confidence coefficient and a sample deflection angle confidence coefficient;
and determining a loss function according to the sample target information and the sample label, and training the preset initial model with the loss function to obtain the preset feature extraction model.
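A minimal sketch of one training step under this procedure might look as follows, where `model` and `loss_fn` are placeholders standing in for the preset initial model and the combined loss described below; none of these names come from the embodiment itself.

```python
import torch

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
               sample_image: torch.Tensor, sample_labels, loss_fn) -> float:
    """One training step of the preset initial model (illustrative sketch)."""
    optimizer.zero_grad()
    # Forward pass through the three initial sub-models in sequence,
    # yielding the sample target information of the sample detection frames.
    sample_info = model(sample_image)
    # Combined loss determined from the sample target information and labels.
    loss = loss_fn(sample_info, sample_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```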
Optionally, before the acquiring of the sample image containing the preset real frame, the method further includes:
acquiring an initial sample image containing a preset initial real frame, wherein the initial sample image contains at least one initial sample target image, the preset initial real frame is used for framing the initial sample target image, the preset initial real frame is configured with an initial sample label, and the initial sample label contains an initial sample direction label;
performing preset image processing on the initial sample image to obtain a sample image, and updating the initial sample label according to the preset image processing to determine a sample label.
Performing preset image processing on the initial sample image to obtain a sample image, and updating the initial sample label according to the preset image processing to determine a sample label, includes:
if it is determined that image rotation processing is performed on the initial sample image, determining a preset angle before the image rotation processing is executed, wherein the preset angle is the angle in a preset angle set closest to the rotation angle corresponding to the image rotation processing;
rotating the initial sample image to a first position corresponding to the preset angle, and determining a first sequence of first corner coordinates according to a preset sorting mode, wherein the first corner coordinates are the coordinates of the corners of the preset initial real frame after the initial sample image is rotated by the preset angle, and the preset sorting mode comprises sorting by the absolute values of the coordinates on a preset coordinate axis;
and after restoring the initial sample image to its position before rotation, rotating it to a second position corresponding to the rotation angle, determining the initial sample image at the second position as the sample image, and determining a second sequence of second corner coordinates according to the preset sorting mode; if the second sequence is the same as the first sequence, determining the sample direction corresponding to the preset angle as the sample direction label in the sample label of the sample image.
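The corner-sorting check described above can be sketched as follows. Sorting by the absolute y-coordinate and a 0/90/180/270-degree preset angle set are assumptions of this sketch; the embodiment leaves the coordinate axis and the angle set configurable.

```python
import numpy as np

def rotate_points(pts: np.ndarray, angle_deg: float, center) -> np.ndarray:
    """Rotate corner points about a center (counterclockwise, degrees)."""
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    pts = np.asarray(pts, dtype=float)
    return (pts - center) @ R.T + center

def direction_label(corners, rotation_angle: float, center,
                    preset_angles=(0, 90, 180, 270)):
    """Sketch of the direction-label update (assumed axis and angle set)."""
    # Preset angle: the member of the preset set closest to the actual angle.
    preset = min(preset_angles,
                 key=lambda a: min(abs(a - rotation_angle),
                                   360 - abs(a - rotation_angle)))
    # First sequence: corner order sorted by |y| after the preset rotation.
    first = np.argsort(np.abs(rotate_points(corners, preset, center)[:, 1]))
    # Second sequence: the same sort after the actual rotation.
    second = np.argsort(
        np.abs(rotate_points(corners, rotation_angle, center)[:, 1]))
    # If the sequences agree, the preset angle's direction becomes the label.
    return preset if np.array_equal(first, second) else None
```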
Further, the loss function includes a regression loss function, a confidence loss function, a category loss function, an angle loss function and a direction loss function, wherein the regression loss function is determined based on the intersection-over-union (IoU) of the sample detection frame and the preset real frame and on the area of the minimum convex closed frame enclosing the sample detection frame and the preset real frame.
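This regression term matches the well-known GIoU formulation; a minimal sketch for axis-aligned boxes follows. Note that the embodiment's detection frames may be rotated, in which case the minimum convex closed frame would be a convex hull rather than the axis-aligned enclosing box computed here.

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """GIoU-style regression loss for boxes given as [x1, y1, x2, y2]."""
    # Intersection rectangle.
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    # Union and intersection-over-union.
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # Smallest enclosing box (the "minimum convex closed frame" for
    # axis-aligned boxes).
    lt_c = torch.min(pred[..., :2], target[..., :2])
    rb_c = torch.max(pred[..., 2:], target[..., 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(dim=-1).clamp(min=1e-7)
    giou = iou - (area_c - union) / area_c
    return (1 - giou).mean()
```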
The intelligent identification apparatus for a target image provided by this embodiment of the invention can execute the intelligent identification method for a target image provided by any embodiment of the invention, and has the functional modules corresponding to the executed method and its beneficial effects.
Example four
FIG. 8 shows a schematic block diagram of an electronic device 40 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the electronic device 40 includes at least one processor 41 and a memory communicatively connected to the at least one processor 41, such as a read-only memory (ROM) 42 and a random access memory (RAM) 43. The memory stores a computer program executable by the at least one processor, and the processor 41 may perform various suitable actions and processes according to the computer program stored in the ROM 42 or loaded from a storage unit 48 into the RAM 43. The RAM 43 may also store various programs and data necessary for the operation of the electronic device 40. The processor 41, the ROM 42 and the RAM 43 are connected to one another via a bus 44; an input/output (I/O) interface 45 is also connected to the bus 44.
A number of components in the electronic device 40 are connected to the I/O interface 45, including: an input unit 46 such as a keyboard, a mouse, or the like; an output unit 47 such as various types of displays, speakers, and the like; a storage unit 48 such as a magnetic disk, optical disk, or the like; and a communication unit 49 such as a network card, modem, wireless communication transceiver, etc. The communication unit 49 allows the electronic device 40 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Processor 41 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 41 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. Processor 41 performs the various methods and processes described above, such as a method of intelligent identification of a target image.
In some embodiments, the method for intelligent identification of a target image may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 48. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 40 via the ROM 42 and/or the communication unit 49. When the computer program is loaded into the RAM 43 and executed by the processor 41, one or more steps of the above described method of intelligent recognition of a target image may be performed. Alternatively, in other embodiments, the processor 41 may be configured by any other suitable means (e.g., by means of firmware) to perform the intelligent identification method of the target image.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowcharts and/or block diagrams to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
The electronic device provided above can be used to execute the intelligent identification method for a target image provided by any of the above embodiments, and has the corresponding functions and advantages.
EXAMPLE five
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a computer processor, cause the processor to perform a method of intelligent identification of a target image, the method comprising:
inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model;
inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
In the context of the present invention, a computer readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The storage medium provided above can be used to execute the intelligent identification method for a target image provided by any of the above embodiments, and has the corresponding functions and advantages.
It should be noted that, in the embodiment of the above-mentioned intelligent target image recognition apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An intelligent identification method of a target image is characterized by comprising the following steps:
inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises the first sub-model, a second sub-model and a third sub-model;
inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
and inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, category confidence coefficient, direction confidence coefficient and deflection angle confidence coefficient.
2. The method of claim 1, wherein the first submodel includes at least a convolutional layer, a batch normalization layer, an activation function layer, and a pooling layer; inputting the initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the method comprises the following steps:
inputting the initial image into a convolution layer, a batch normalization layer and an activation function layer to obtain a shallow feature map, a middle feature map and an initial deep feature map which have different scales;
performing first convolution processing on the initial deep layer feature map, and then performing pooling processing on a first convolution processing result to obtain a plurality of pooling processing results with different scales, wherein the scale of the initial deep layer feature map is smaller than that of the shallow layer feature map and that of the middle layer feature map;
and splicing the pooling processing results of the different scales to obtain a deep layer feature map, wherein the first feature map comprises the deep layer feature map, the shallow layer feature map and the middle layer feature map.
3. The method of claim 2, wherein the inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales comprises:
inputting the first feature map into the second submodel, performing second convolution processing on the deep feature map by using the second submodel, performing first up-sampling processing on a second convolution processing result, and splicing the first up-sampling result and the middle layer feature map to obtain a first splicing feature;
after the first splicing characteristic is subjected to third convolution processing, second up-sampling processing is carried out on a third convolution processing result, and a second up-sampling result and the shallow layer characteristic diagram are spliced to obtain a second splicing characteristic;
performing fourth convolution processing on the second splicing feature, performing fifth convolution processing on a fourth convolution processing result, and splicing the fifth convolution processing result and the third convolution processing result to obtain a third splicing feature;
and after performing sixth convolution processing on the third splicing feature, performing seventh convolution processing on a sixth convolution processing result, splicing the seventh convolution processing result and the second convolution processing result to obtain a fourth splicing feature, and performing eighth convolution processing on the fourth splicing feature to obtain an eighth convolution processing result, wherein the second feature map comprises the fourth convolution processing result, the sixth convolution processing result and the eighth convolution processing result.
4. The method according to any one of claims 1 to 3, wherein the inputting the second feature map into the third submodel to obtain object information of an object to be recognized, and determining an object image of the object to be recognized from the initial image according to the object information comprises:
inputting the second feature maps into the third sub-model and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third sub-model to obtain target information of a detection frame of the target to be identified;
screening the target information of the detection frame by using a non-maximum suppression algorithm to determine the target position, the target direction and the target deflection angle of the target detection frame;
and cutting out an image corresponding to the target detection frame from the initial image according to the target position, the target direction and the target deflection angle so as to obtain a target image of the target to be identified.
5. The method of claim 1, wherein the determining manner of the preset feature extraction model comprises:
acquiring a sample image containing a preset real frame, wherein the sample image at least contains one sample target image, the preset real frame is used for framing the sample target image, and the preset real frame is configured with a sample label;
inputting the sample image into a first initial sub-model of a preset initial model to obtain at least three first sample feature maps with different scales, wherein the preset initial model at least comprises the first initial sub-model, a second initial sub-model and a third initial sub-model;
inputting the first sample feature map into the second initial submodel to obtain at least three second sample feature maps with different scales, wherein the second initial submodel is constructed based on a feature pyramid network;
inputting the second sample characteristic diagram into the third initial sub-model to obtain sample target information of a sample detection frame of a sample target to be identified, wherein the sample target information comprises sample position information, a sample category confidence coefficient, a sample direction confidence coefficient and a sample deflection angle confidence coefficient;
and determining a loss function according to the sample target information and the sample label, and training the preset initial model by using the loss function to obtain a preset feature extraction model.
6. The method according to claim 5, further comprising, before the acquiring of the sample image containing the preset real frame:
acquiring an initial sample image containing a preset initial real frame, wherein the initial sample image at least contains one initial sample target image, the preset initial real frame is used for framing the initial sample target image, the preset initial real frame is configured with an initial sample label, and the initial sample label contains an initial sample direction label;
performing preset image processing on the initial sample image to obtain a sample image, and updating an initial sample label according to the processing process of the preset image processing to determine a sample label;
performing preset image processing on the initial sample image to obtain a sample image, and updating an initial sample label according to the preset image processing process to determine the sample label, including:
if the initial sample image is determined to be subjected to image rotation processing, determining a preset angle before the image rotation processing is executed, wherein the preset angle is the angle which is closest to the rotation angle corresponding to the image rotation processing in a preset angle set;
rotating the initial sample image to a first position corresponding to the preset angle, and determining a first sequence of first corner point coordinates according to a preset sorting mode, wherein the first corner point coordinates are coordinates of corner points of a preset initial real frame after the initial sample image is rotated by the preset angle, and the preset sorting mode comprises sorting of absolute values of coordinates of preset coordinate axes;
and after the initial sample image is restored to the initial position before rotation, rotating the initial sample image to a second position corresponding to the rotation angle, determining the initial sample image at the second position as a sample image, determining a second sequence of second corner point coordinates according to the preset sorting mode, and if the second sequence is the same as the first sequence, determining a sample direction corresponding to the preset angle as a sample direction label in a sample label of the sample image.
7. The method of claim 5, wherein the loss function comprises: a regression loss function, a confidence loss function, a category loss function, an angle loss function and a direction loss function, wherein the regression loss function is determined based on the intersection-over-union of the sample detection frame and the preset real frame and on the area of the minimum convex closed frame of the sample detection frame and the preset real frame.
8. An apparatus for intelligently recognizing a target image, comprising:
the first feature map determining module is used for inputting the initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps with different scales, wherein the preset feature extraction model at least comprises a first sub-model, a second sub-model and a third sub-model;
the second feature map determining module is used for inputting the first feature map into the second submodel to obtain at least three second feature maps with different scales, wherein the second submodel is constructed based on a feature pyramid network;
and the target image determining module is used for inputting the second feature map into the third sub-model to obtain target information of the target to be recognized, and determining a target image of the target to be recognized from the initial image according to the target information, wherein the target information comprises position information, a category confidence coefficient, a direction confidence coefficient and a deflection angle confidence coefficient.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of intelligent identification of target images of any of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the method of intelligently identifying a target image according to any one of claims 1 to 7 when executed.
CN202211575082.3A 2022-12-08 2022-12-08 Intelligent identification method, device and equipment for target image and storage medium Pending CN115937537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211575082.3A CN115937537A (en) 2022-12-08 2022-12-08 Intelligent identification method, device and equipment for target image and storage medium

Publications (1)

Publication Number Publication Date
CN115937537A true CN115937537A (en) 2023-04-07

Family

ID=86553403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211575082.3A Pending CN115937537A (en) 2022-12-08 2022-12-08 Intelligent identification method, device and equipment for target image and storage medium

Country Status (1)

Country Link
CN (1) CN115937537A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994002A (en) * 2023-09-25 2023-11-03 杭州安脉盛智能技术有限公司 Image feature extraction method, device, equipment and storage medium
CN116994002B (en) * 2023-09-25 2023-12-19 杭州安脉盛智能技术有限公司 Image feature extraction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2020221013A1 (en) Image processing method and apparaus, and electronic device and storage medium
WO2020199468A1 (en) Image classification method and device, and computer readable storage medium
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN114550177B (en) Image processing method, text recognition method and device
CN109886159B (en) Face detection method under non-limited condition
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
US20240193923A1 (en) Method of training target object detection model, method of detecting target object, electronic device and storage medium
CN112989995B (en) Text detection method and device and electronic equipment
CN111680690A (en) Character recognition method and device
CN113239807B (en) Method and device for training bill identification model and bill identification
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium
CN116740355A (en) Automatic driving image segmentation method, device, equipment and storage medium
CN114283343A (en) Map updating method, training method and equipment based on remote sensing satellite image
CN113378837A (en) License plate shielding identification method and device, electronic equipment and storage medium
CN115620321B (en) Table identification method and device, electronic equipment and storage medium
CN115345895B (en) Image segmentation method and device for visual detection, computer equipment and medium
CN116091709A (en) Three-dimensional reconstruction method and device for building, electronic equipment and storage medium
CN115761698A (en) Target detection method, device, equipment and storage medium
CN113361535B (en) Image segmentation model training, image segmentation method and related device
CN115601616A (en) Sample data generation method and device, electronic equipment and storage medium
CN114511862A (en) Form identification method and device and electronic equipment
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN113792671A (en) Method and device for detecting face synthetic image, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination