CN116935057A - Target evaluation method, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number
CN116935057A
CN116935057A
Authority
CN
China
Prior art keywords: style, feature extraction, structural, image, extraction network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310664064.0A
Other languages
Chinese (zh)
Inventor
刘卓异
王若楠
杨学迅
孙志亮
黄鹏
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202310664064.0A priority Critical patent/CN116935057A/en
Publication of CN116935057A publication Critical patent/CN116935057A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/764 Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V30/18 Character recognition: Extraction of features or characteristics of the image
    • G06V30/19007 Character recognition using electronic means: Matching; Proximity measures
    • G06V30/19173 Character recognition using electronic means: Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target evaluation method, an electronic device, and a computer-readable storage medium. The method includes: obtaining a copy image and a template image, and extracting features of multiple dimensions from the copy image and the template image respectively, where the features of multiple dimensions include at least structural features and style features, the structural features representing the composition structure of the target in the image and the style features representing the external form of the target in the image; and determining a score for the target in the copy image based on the multi-dimensional features of the copy image and those of the template image. With this scheme, the efficiency and accuracy of copy evaluation can be improved.

Description

Target evaluation method, electronic device, and computer-readable storage medium
Technical Field
The present application relates to the field of image processing technology, and in particular, to a target evaluation method, an electronic device, and a computer-readable storage medium.
Background
Copying is an important way to learn, and beginners in particular are increasingly interested in gauging their current level. Conventional copy evaluation relies either on professionals in the relevant field, which is inefficient, or on a single algorithm, which has limited accuracy. In view of this, how to improve the efficiency and accuracy of copy evaluation has become a problem to be solved.
Disclosure of Invention
The application mainly solves the technical problem of providing a target evaluation method, an electronic device, and a computer-readable storage medium that can improve the efficiency and accuracy of copy evaluation.
To solve the above technical problem, a first aspect of the present application provides a target evaluation method, including: obtaining a copy image and a template image, and respectively extracting characteristics of the copy image and the template image in multiple dimensions; the characteristics of the multiple dimensions at least comprise structural characteristics and style characteristics, wherein the structural characteristics represent a composition structure corresponding to a target in the image, and the style characteristics represent an external form corresponding to the target in the image; and determining scores corresponding to targets in the copy image based on the characteristics of the plurality of dimensions corresponding to the copy image and the characteristics of the plurality of dimensions corresponding to the template image.
To solve the above technical problem, a second aspect of the present application provides an electronic device, including: a memory and a processor coupled to each other, wherein the memory stores program data, and the processor invokes the program data to perform the method of the first aspect.
To solve the above technical problem, a third aspect of the present application provides a computer-readable storage medium having stored thereon program data which, when executed by a processor, implements the method described in the first aspect.
According to the above scheme, after the copy image and the template image are obtained, features are extracted from each image in multiple dimensions. The features of multiple dimensions include at least structural features, which represent the composition structure of the target in the image, and style features, which represent the external form of the target in the image; obtaining features of at least two dimensions improves the comprehensiveness of feature extraction. The target in the copy image is then evaluated based on the multi-dimensional features of the copy image and those of the template image, and a score for the target in the copy image is determined, so that the evaluation rests on comprehensive features and its accuracy is improved. Moreover, the whole evaluation process only requires collecting images of the target, from which the multi-dimensional features are extracted, so the efficiency of copy evaluation is improved as well.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort. Wherein:
FIG. 1 is a schematic flow chart of an embodiment of a target evaluation method according to the present application;
FIG. 2 is a schematic flow chart of another embodiment of the objective evaluation method of the present application;
FIG. 3 is a schematic diagram of a style feature extraction network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of a structural feature extraction network according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an electronic device according to the present application;
FIG. 6 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two.
The target evaluation method provided by the application evaluates the target in a copy image; its application scenarios include at least handwriting copying and drawing copying.
Referring to fig. 1, fig. 1 is a flow chart of an embodiment of a target evaluation method according to the present application, the method includes:
s101: obtaining a copy image and a template image, and respectively extracting characteristics of the copy image and the template image in multiple dimensions, wherein the characteristics of the multiple dimensions at least comprise structural characteristics and style characteristics, the structural characteristics represent a corresponding composition structure of a target in the image, and the style characteristics represent an external form corresponding to the target in the image.
Specifically, after the copy image and the template image are obtained, features are extracted from each image in multiple dimensions, yielding the multi-dimensional features of the copy image and of the template image. These features include at least structural features and style features, so features of at least two dimensions are obtained and the comprehensiveness of feature extraction is improved.
Further, the target in the template image is a reference object, and the target in the copy image is a copy object.
In an application mode, a copy image and a template image corresponding to a target are obtained, structural features and style features corresponding to the copy image and the template image are extracted respectively, and features of multiple dimensions corresponding to the copy image and the template image are obtained.
In another application mode, a copy image and a template image corresponding to the target are obtained, structural features, style features and other features corresponding to the copy image and the template image are extracted respectively, and features of multiple dimensions corresponding to the copy image and the template image are obtained, wherein the other features are different from the structural features and the style features.
In an application scene, inputting a copy image into a trained feature extraction model so that the feature extraction model extracts structural features and style features corresponding to targets in the copy image, inputting a template image into the trained feature extraction model so that the feature extraction model extracts the structural features and style features corresponding to the targets in the template image, wherein the feature extraction model comprises a structural feature extraction network and a style feature extraction network, the structural feature extraction network is used for extracting the structural features of the targets, and the style feature extraction network is used for extracting the style features of the targets.
In another application scene, the copy image and the template image are respectively input into a trained structural feature extraction network so that the structural feature extraction network extracts structural features corresponding to the targets in the copy image and structural features corresponding to the targets in the template image, and the copy image and the template image are respectively input into a trained style feature extraction network so that the style feature extraction network extracts style features corresponding to the targets in the copy image and style features corresponding to the targets in the template image.
In a specific application scenario, the targets in the copy image and the template image are characters: the composition structure of a character corresponds to its strokes, and its external form corresponds to its font style. The structural and style features of the characters in the copy image and in the template image are extracted, yielding the multi-dimensional features of both images, thereby adapting to the handwriting-copy scenario.
In another specific application scenario, the targets in the copy image and the template image are drawings: the composition structure of a drawing corresponds to its pattern, and its external form corresponds to its drawing style. The structural, style, and color features of the drawings in the copy image and in the template image are extracted, where the color features represent the color-space distribution of the target in the image, yielding the multi-dimensional features of both images and thereby adapting to the drawing-copy scenario.
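As an illustration of this extraction step, here is a minimal PyTorch-style sketch, assuming the two networks have already been trained; the names extract_features, struct_net, and style_net are illustrative, not from the patent.

```python
import torch

# Minimal sketch: run one preprocessed image (N x C x H x W tensor) through
# the trained structural and style feature extraction networks.
def extract_features(image: torch.Tensor, struct_net, style_net):
    """Return the multi-dimensional features (structure, style) of one image."""
    with torch.no_grad():
        structural = struct_net(image)  # composition structure of the target
        style = style_net(image)        # external form (e.g. font style)
    return structural, style

# copy_feats = extract_features(copy_img, struct_net, style_net)
# template_feats = extract_features(template_img, struct_net, style_net)
```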
S102: and determining scores corresponding to the targets in the copy image based on the characteristics of the plurality of dimensions corresponding to the copy image and the characteristics of the plurality of dimensions corresponding to the template image.
Specifically, based on the characteristics of multiple dimensions corresponding to the copy image and the characteristics of multiple dimensions corresponding to the template image, the targets in the copy image are evaluated, and the scores corresponding to the targets in the copy image are determined, so that the copy evaluation is performed based on the comprehensive characteristics, and the accuracy of the copy evaluation is improved.
It can be understood that the whole evaluation process only requires collecting an image of the target; the multi-dimensional features are then extracted from the image for evaluation, so the efficiency of copy evaluation is improved.
In an application mode, the features of the multiple dimensions corresponding to the copy image are fused to obtain first fusion features, the features of the multiple dimensions corresponding to the template image are fused to obtain second fusion features, the first fusion features and the second fusion features are compared to obtain fusion feature comparison results, and scores corresponding to targets in the copy image are determined based on the fusion feature comparison results.
In another application mode, the multi-dimensional features of the copy image and those of the template image are compared type by type according to the kind of feature each represents, obtaining multiple sub-feature comparison results, and the score of the target in the copy image is determined based on these sub-feature comparison results.
In an application scene, fusing the features of multiple dimensions corresponding to the copy image to obtain a first fused feature, fusing the features of multiple dimensions corresponding to the template image to obtain a second fused feature, obtaining Euclidean distances between the first fused feature and the second fused feature, determining the similarity of the first fused feature and the second fused feature based on the Euclidean distance, and determining scores corresponding to targets in the copy image by using the similarity.
In another application scenario, the multi-dimensional features of the copy image and those of the template image are compared type by type to obtain multiple sub-feature comparison scores; a weight is determined for each sub-feature comparison result based on the kind of feature it represents, the sub-feature comparison scores are weighted and summed to obtain a target comparison score, and the target comparison score is used to determine the score of the target in the copy image.
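A minimal sketch of the weighted comparison mode just described follows; the weight values and the normalization are illustrative assumptions, as the patent does not fix concrete numbers.

```python
# Combine per-dimension comparison scores (e.g. structure vs. style) into a
# single target score using weights chosen per feature type (values assumed).
def weighted_target_score(sub_scores: dict[str, float],
                          weights: dict[str, float]) -> float:
    total_w = sum(weights[k] for k in sub_scores)
    return sum(sub_scores[k] * weights[k] for k in sub_scores) / total_w

# Example (assumed weights): structure weighted above style for handwriting.
# score = weighted_target_score({"structure": 0.9, "style": 0.7},
#                               {"structure": 0.6, "style": 0.4})
```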
According to the above scheme, after the copy image and the template image are obtained, features are extracted from each image in multiple dimensions. The features of multiple dimensions include at least structural features, which represent the composition structure of the target in the image, and style features, which represent the external form of the target in the image; obtaining features of at least two dimensions improves the comprehensiveness of feature extraction. The target in the copy image is then evaluated based on the multi-dimensional features of the copy image and those of the template image, and a score for the target in the copy image is determined, so that the evaluation rests on comprehensive features and its accuracy is improved. Moreover, the whole evaluation process only requires collecting images of the target, from which the multi-dimensional features are extracted, so the efficiency of copy evaluation is improved as well.
Referring to fig. 2, fig. 2 is a flow chart of another embodiment of the target evaluation method according to the present application, the method includes:
s201: obtaining a copy image and a template image, and respectively extracting characteristics of the copy image and the template image in multiple dimensions, wherein the characteristics of the multiple dimensions at least comprise structural characteristics and style characteristics, the structural characteristics represent a corresponding composition structure of a target in the image, and the style characteristics represent an external form corresponding to the target in the image.
Specifically, a copy image and a template image are obtained, and features of the copy image and the template image are extracted from at least two dimensions of a composition structure and an external form, so that features of a plurality of dimensions corresponding to the copy image and the template image respectively are obtained.
In an application scene, respectively extracting characteristics of a copy image and a template image in multiple dimensions, including: respectively inputting the copy image and the template image into a feature extraction model to obtain a first structural feature and a first style feature corresponding to the copy image, and a second structural feature and a second style feature corresponding to the template image; the feature extraction model comprises a structural feature extraction network for extracting structural features and a style feature extraction network for extracting style features, the structural feature extraction network is trained by using a plurality of first training sample pairs, the style feature extraction network is trained by using a plurality of second training sample pairs, the first training sample pairs comprise samples with the same composition structure, and the second training sample pairs comprise samples with the same external form.
Specifically, the copy image is input into the feature extraction model so that the structural feature extraction network in the model extracts the first structural features of the copy image and the style feature extraction network extracts its first style features; the template image is likewise input into the feature extraction model so that the structural feature extraction network extracts the second structural features of the template image and the style feature extraction network extracts its second style features. This improves feature extraction efficiency and, by avoiding a single network for all features, yields multiple kinds of features and improves their comprehensiveness.
It can be understood that the structural feature extraction network is trained with a plurality of first training sample pairs and the style feature extraction network with a plurality of second training sample pairs, where a first training sample pair comprises two samples with the same composition structure and a second training sample pair comprises two samples with the same external form. Thus the trained structural feature extraction network extracts the same structural features whenever the targets' structures are the same, regardless of style, and the trained style feature extraction network extracts the same style features whenever the targets' styles are the same, regardless of structure.
In an implementation scenario, the style feature extraction network includes a preset number of convolution layers and at least one residual network layer that are cascaded in sequence, the structural feature extraction network includes a preset number of convolution layer groups, and each convolution layer group includes at least two convolution layers.
Specifically, the style feature extraction network includes a preset number of convolution layers, while the structural feature extraction network includes a preset number of convolution layer groups, each containing at least two convolution layers. The total number of convolution layers in the structural feature extraction network therefore exceeds the number in the style feature extraction network, so the structural network can extract deeper features, while the style network extracts shallower features and reinforces the shallow features through the residual network layers.
Further, style features represent the external form of the target and structural features represent its composition structure. For example, when the target is a character, the external form corresponds to the font style and the composition structure corresponds to the strokes; font style is an abstract concept while strokes are a concrete one, so style features should be derived from shallow features to reduce the risk of overfitting, whereas structural features should be derived from deep features so that the structure is represented concretely. With this structural design, the style features extracted by the style feature extraction network and the structural features extracted by the structural feature extraction network are both more accurate.
Optionally, the number of channels of the cascaded convolution layers in the style feature extraction network increases layer by layer; in the structural feature extraction network, convolution layers within the same group share the same number of channels, the number of channels increases from group to group along the cascade, and the number of convolution layers in a group is positively correlated with that group's number of channels.
Specifically, the channel counts increase along the cascade in both the style and structural feature extraction networks, so each network outputs more sub-features layer by layer. Keeping the channel count constant within a convolution layer group keeps the group's outputs consistent, and because the number of layers in a group is positively correlated with its channel count, groups with more channels contain more layers, which improves the accuracy of the sub-features the group outputs at its channel count.
It should be noted that the convolution kernel of the first convolution layer in the style feature extraction network is larger than those of the other convolution layers, and its stride is smaller than theirs; the convolution kernels in all convolution layer groups are the same size, and within a group the stride of the first convolution layer is smaller than that of the last.
Specifically, the first convolution layer of the style feature extraction network uses the largest kernel, so the target's features are first extracted coarsely there, while its stride is the smallest. The stride is matched to the kernel size: a larger kernel is paired with a smaller stride and a smaller kernel with a larger stride, improving the rationality of feature extraction and suiting the style network's need to extract shallow features.
Further, all convolution kernels in the structural feature extraction network are the same size, keeping the precision of the extracted features consistent, and within a group the first convolution layer's stride is smaller than the last's, so each group performs at least one relatively fine extraction in its first layer and at least one relatively coarse extraction in its last layer, compensating for the features output by the preceding layers.
In a specific embodiment, the preset number is four: the style feature extraction network includes a first, second, third, and fourth convolution layer cascaded in sequence, followed by a residual network layer cascaded after the fourth convolution layer. The channel counts of the four convolution layers grow exponentially; the first layer's kernel is larger than those of the second, third, and fourth layers, its stride is smaller than theirs, and the second, third, and fourth layers share the same kernel size and stride. This realizes the composition of the style feature extraction network and facilitates extracting more accurate style features.
Further, the structural feature extraction network includes a first, second, third, and fourth convolution layer group cascaded in sequence; their channel counts grow exponentially and their kernel sizes are the same. The first and second groups contain the same number of convolution layers, the third group contains more layers than the second, and the fourth contains more than the third. This realizes the composition of the structural feature extraction network and facilitates extracting more accurate structural features.
It can be understood that the preset numbers for the style and structural feature extraction networks may take other custom values, and the number of residual network layers in the style network and of convolution layers per group in the structural network may be set to other compositions satisfying the above relationships; the present application does not specifically limit this.
In a specific implementation scenario, referring to FIG. 3 and FIG. 4, FIG. 3 is a schematic structural diagram of an embodiment of the style feature extraction network of the present application and FIG. 4 is a schematic structural diagram of an embodiment of the structural feature extraction network, taking a preset number of four as an example. As shown in FIG. 3, the first convolution layer has a 7×7 kernel, 64 channels, and stride 1; the second has a 3×3 kernel, 128 channels, and stride 2; the third has a 3×3 kernel, 256 channels, and stride 2; the fourth has a 3×3 kernel, 128 channels, and stride 2; and several residual network layers are cascaded after the fourth convolution layer. As shown in FIG. 4, all convolution kernels in the structural feature extraction network are 3×3, and each dashed box corresponds to one convolution layer group. The first group has 64 channels, the second 128, the third 256, and the fourth 512; the first and second groups contain 2 convolution layers each, the third contains 3, and the fourth contains 6, for 13 convolution layers in total. In the third group, the first two convolution layers are identical with stride 1 and the last has stride 2. The fourth group comprises two identical substructures of three convolution layers each, in which the first two layers are identical with stride 1 and the last has stride 2. Thus, as the channel count grows, the number of convolution layers grows with it, improving the accuracy of feature extraction.
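A hedged PyTorch sketch consistent with the configuration of FIG. 3 and FIG. 4 follows; the activation functions, padding, residual-block design, and number of residual layers are assumptions, since the text specifies only kernel sizes, channel counts, strides, and layer counts.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k, s):
    """Convolution with 'same'-style padding plus ReLU (activation assumed)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, s, k // 2),
                         nn.ReLU(inplace=True))

class ResidualBlock(nn.Module):
    """Basic residual layer; the exact residual design is not specified."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, 1, 1))

    def forward(self, x):
        return torch.relu(self.body(x) + x)

# Style branch (FIG. 3): 7x7/64/s1 -> 3x3/128/s2 -> 3x3/256/s2 -> 3x3/128/s2,
# followed by residual network layers (count assumed).
style_net = nn.Sequential(
    conv(3, 64, 7, 1), conv(64, 128, 3, 2),
    conv(128, 256, 3, 2), conv(256, 128, 3, 2),
    ResidualBlock(128), ResidualBlock(128))

# Structural branch (FIG. 4): four groups, all 3x3 kernels, channels
# 64/128/256/512, layer counts 2/2/3/6 (13 in total); within each group the
# earlier strides are 1 and the last stride is 2.
structural_net = nn.Sequential(
    conv(3, 64, 3, 1), conv(64, 64, 3, 2),                             # group 1
    conv(64, 128, 3, 1), conv(128, 128, 3, 2),                         # group 2
    conv(128, 256, 3, 1), conv(256, 256, 3, 1), conv(256, 256, 3, 2),  # group 3
    conv(256, 512, 3, 1), conv(512, 512, 3, 1), conv(512, 512, 3, 2),  # group 4,
    conv(512, 512, 3, 1), conv(512, 512, 3, 1), conv(512, 512, 3, 2))  # two substructures
```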
It should be noted that each first training sample pair is matched with a structural label and each second training sample pair with a style label, and the training process of the feature extraction model includes: cascading the structural feature extraction network with a corresponding structural classification network to obtain a structural classification model, and cascading the style feature extraction network with a corresponding style classification network to obtain a style classification model; training the structural classification model with the first training sample pairs and their structural labels and the style classification model with the second training sample pairs and their style labels until a first convergence condition is met, obtaining a pre-trained structural feature extraction network and a pre-trained style feature extraction network; and training the pre-trained structural feature extraction network with the first training sample pairs and the pre-trained style feature extraction network with the second training sample pairs until a second convergence condition is met, obtaining the trained feature extraction model.
Specifically, the structural feature extraction network and the style feature extraction network are pre-trained, the structural feature extraction network and the structural classification network are cascaded to obtain a structural classification model, a first training sample pair is input into the structural classification model, so that the structural classification network outputs a predicted structural category corresponding to a sample in the first training sample pair, the predicted structural category is compared with a structural label of the first training sample pair, and a first pre-training loss is determined, so that the structural feature extraction network is adjusted based on the first pre-training loss.
Similarly, the style feature extraction network is cascaded with the style classification network to obtain a style classification model; a second training sample pair is input so that the style classification network outputs the predicted style category of each sample, the predictions are compared with the pair's style label, and a second pre-training loss is determined, from which the style feature extraction network is adjusted. This continues until the first convergence condition is met, yielding pre-trained structural and style feature extraction networks: the pre-trained structural network has a feature extraction capability whose features match the target's composition structure, and the pre-trained style network has a capability whose features match the target's external form, improving training efficiency and precision.
Optionally, the first convergence condition includes that the first pre-training loss and the second pre-training loss are both smaller than a first loss threshold, or the number of loops is greater than a first number threshold and the first pre-training loss and the second pre-training loss are each smaller than a respective corresponding loss threshold.
In a specific application scenario, the structural classification network and the style classification network have the same structure: a max pooling layer, three fully-connected layers, and an output layer, which integrate the extracted features and output the most probable category for the integrated features. In other application scenarios the two classification networks may use other component structures, or structures different from each other; the present application does not specifically limit this.
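As a hedged illustration of this pre-training stage, the classification head just described (a max pooling layer, three fully-connected layers, and an output layer) might be sketched as follows; the hidden width, the adaptive pooling choice, and the class counts are assumptions.

```python
import torch.nn as nn

# Assumed classification head: max pooling, three fully-connected layers,
# and an output layer producing class logits (softmax applied in the loss).
class ClassifierHead(nn.Module):
    def __init__(self, in_ch: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(in_ch, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes))

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))

# Pre-training cascades each extraction network with its head and trains with
# cross-entropy against the structure / style labels, e.g. (names illustrative):
# struct_model = nn.Sequential(structural_net, ClassifierHead(512, n_struct_classes))
# loss = nn.CrossEntropyLoss()(struct_model(images), structure_labels)
```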
Further, after the pre-trained structural feature extraction network and the style feature extraction network are obtained, a first training sample pair is input into the pre-trained structural feature extraction network, and prediction structural features corresponding to two samples in the first training sample pair are obtained, so that the prediction structural features corresponding to the two samples are compared, a first prediction loss is determined, and the structural feature extraction network is adjusted based on the first prediction loss.
Optionally, the first prediction loss is obtained by comparing the predicted structural features of the two samples based on the Euclidean distance, which can be expressed by the formula:

$L_1 = \sqrt{\sum_{c=1}^{C}\left(M_1(c)-M_2(c)\right)^2}$

where C represents the number of channels of the feature, and M_1(c) and M_2(c) represent the elements of the predicted structural features of the two samples in the first training sample pair.
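In code, this pair loss might be sketched as follows; the reduction of the feature maps to per-channel vectors of length C is an assumption, since the formula sums over channels only, and the helper name is hypothetical.

```python
import torch

# Sketch of the pair comparison used during fine-tuning, assuming the
# predicted structural features have been reduced to per-channel vectors
# of shape (C,), e.g. by global pooling.
def pair_distance(m1: torch.Tensor, m2: torch.Tensor) -> torch.Tensor:
    """Euclidean distance over channels between two feature vectors."""
    return torch.sqrt(torch.sum((m1 - m2) ** 2))

# Fine-tuning pulls together the features of two same-structure samples:
# loss = pair_distance(struct_feats_sample1, struct_feats_sample2)
```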
It can be understood that the training process of the pre-trained style feature extraction network is similar to that of the structural feature extraction network, and the second training sample pair is input into the pre-trained style feature extraction network to obtain the prediction style features corresponding to the two samples in the second training sample pair, so that the prediction style features corresponding to the two samples are compared, the second prediction loss is determined, and the style feature extraction network is adjusted based on the second prediction loss.
Optionally, the second convergence condition includes that the first predicted loss and the second predicted loss are both smaller than the second loss threshold, or the number of loops is greater than the second number of times threshold and the first predicted loss and the second predicted loss are each smaller than the respective corresponding loss threshold.
Further, the trained structural feature extraction network can accurately extract the target's structural features even when the target takes on varied composition structures, and the trained style feature extraction network can accurately extract the target's style features even when the target appears in new external forms.
S202: and fusing the characteristics of the multiple dimensions corresponding to the copy image to obtain a first fused characteristic, and fusing the characteristics of the multiple dimensions corresponding to the template image to obtain a second fused characteristic.
Specifically, the features of the multiple dimensions corresponding to the copy image are fused to obtain a first fusion feature, so that the accuracy of the representation of the first fusion feature is improved, the features of the multiple dimensions corresponding to the template image are fused to obtain a second fusion feature, and the accuracy of the representation of the second fusion feature is improved.
It can be understood that when the features of the multiple dimensions corresponding to the copy image are the first structural feature and the first style feature, the first structural feature and the first style feature are fused to obtain a first fused feature, and when the features of the multiple dimensions corresponding to the template image are the second structural feature and the second style feature, the second structural feature and the second style feature are fused to obtain a second fused feature.
In an application scenario, the first structural feature and first style feature of the copy image are input into a feature fusion model to obtain the first fusion feature, and the second structural feature and second style feature of the template image are input into the same feature fusion model to obtain the second fusion feature. The feature fusion model includes a bilinear attention network, and the first and second fusion features have the same number of channels; fusing the multi-dimensional features through a bilinear attention network improves the precision of feature fusion.
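The text names a bilinear attention network for fusion without giving its exact form, so the following module is only an assumed bilinear-attention-style sketch; the projection dimensions and gating scheme are illustrative.

```python
import torch
import torch.nn as nn

# Assumed bilinear-attention-style fusion of structural and style features.
# Both inputs are per-image feature vectors; the output keeps a fixed channel
# count, matching the requirement that both fusion features share a size.
class BilinearAttentionFusion(nn.Module):
    def __init__(self, struct_dim: int, style_dim: int, out_dim: int):
        super().__init__()
        self.proj_struct = nn.Linear(struct_dim, out_dim)
        self.proj_style = nn.Linear(style_dim, out_dim)
        self.attn = nn.Linear(out_dim, out_dim)

    def forward(self, struct_feat: torch.Tensor, style_feat: torch.Tensor):
        a = self.proj_struct(struct_feat)
        b = self.proj_style(style_feat)
        gate = torch.sigmoid(self.attn(a * b))  # bilinear interaction as attention
        return gate * a + (1.0 - gate) * b      # fused feature

# fuse = BilinearAttentionFusion(512, 128, 256)
# f1 = fuse(copy_struct_vec, copy_style_vec)          # first fusion feature
# f2 = fuse(template_struct_vec, template_style_vec)  # second fusion feature
```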
S203: and obtaining a score corresponding to the target in the copy image based on the first fusion feature and the second fusion feature.
Specifically, the first fusion feature and the second fusion feature are compared, so that the similarity between the first fusion feature and the second fusion feature is determined, and the score corresponding to the target in the copy image is determined by using the similarity.
In an application scenario, the first fusion feature and the second fusion feature are compared to obtain the Euclidean distance between them, a process expressed by the formula:

$d = \sqrt{\sum_{c=1}^{C}\left(f_1(c)-f_2(c)\right)^2}$

where C represents the number of channels of the feature, f_1(c) represents an element of the first fusion feature, and f_2(c) represents an element of the second fusion feature.
Further, the similarity of the first fusion feature and the second fusion feature is determined based on the Euclidean distance, a process expressed by the formula:

$s = e^{-(a \cdot d + b)}$

where e is the natural constant and a and b are weight values that can be adjusted according to the actual situation, with defaults a = 1 and b = 0. This calculation yields a similarity value in the interval 0 to 1.
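A short sketch of this scoring step follows; it assumes the exponential form given above and reuses the hypothetical pair_distance helper for the fused features f1 and f2.

```python
import math

# Map a fused-feature Euclidean distance to a similarity in (0, 1],
# following the assumed exponential form with defaults a=1, b=0.
def similarity_score(distance: float, a: float = 1.0, b: float = 0.0) -> float:
    return math.exp(-(a * distance + b))

# e.g. a percentage score for the target in the copy image:
# score = 100.0 * similarity_score(pair_distance(f1, f2).item())
```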
Optionally, the objects in the copy image and the template image are characters, the composition structure of the characters corresponds to the character strokes of the characters, and the external form of the characters corresponds to the font style of the characters, so that the method is suitable for the handwriting copy scene, and the handwriting copy scoring efficiency and accuracy are improved.
In this embodiment, a feature extraction model comprising a structural feature extraction network and a style feature extraction network extracts the multi-dimensional features of the copy image and the template image, avoiding a single network and yielding multiple kinds of features, which improves their comprehensiveness. Each network's composition is matched to the features it must extract, making the extracted style features and structural features more accurate. The multi-dimensional features of the copy image are fused into a first fusion feature and those of the template image into a second fusion feature, improving the accuracy of each representation, and the score for the target in the copy image is determined by comparing the two fusion features, improving the accuracy of copy scoring.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to the present application. The electronic device 50 includes a memory 501 and a processor 502 coupled to each other; the memory 501 stores program data (not shown), and the processor 502 invokes the program data to implement the method of any of the above embodiments. For the related content, refer to the detailed description of the above method embodiments, which is not repeated here.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium 60 according to the present application. The computer-readable storage medium 60 stores program data 600 which, when executed by a processor, implements the method of any of the above embodiments. For the related content, refer to the above method embodiments, which is not repeated here.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (10)

1. A method of target evaluation, the method comprising:
obtaining a copy image and a template image, and respectively extracting characteristics of the copy image and the template image in multiple dimensions; the characteristics of the multiple dimensions at least comprise structural characteristics and style characteristics, wherein the structural characteristics represent a composition structure corresponding to a target in the image, and the style characteristics represent an external form corresponding to the target in the image;
and determining scores corresponding to targets in the copy image based on the characteristics of the plurality of dimensions corresponding to the copy image and the characteristics of the plurality of dimensions corresponding to the template image.
2. The method according to claim 1, wherein the extracting features of the copy image and the template image in a plurality of dimensions, respectively, includes:
respectively inputting the copy image and the template image into a feature extraction model to obtain a first structural feature and a first style feature corresponding to the copy image and a second structural feature and a second style feature corresponding to the template image;
the feature extraction model comprises a structural feature extraction network for extracting structural features and a style feature extraction network for extracting style features, wherein the structural feature extraction network is trained by using a plurality of first training sample pairs, the style feature extraction network is trained by using a plurality of second training sample pairs, the first training sample pairs comprise samples with two identical composition structures, and the second training sample pairs comprise samples with two identical external forms.
3. The method of claim 2, wherein the style feature extraction network comprises a predetermined number of convolutional layers and at least one residual network layer in cascade, the structural feature extraction network comprises the predetermined number of convolutional layer groups, and each of the convolutional layer groups comprises at least two convolutional layers.
4. The method for evaluating a target according to claim 3, wherein the number of channels of the cascaded convolution layers in the style feature extraction network increases layer by layer; and in the structural feature extraction network, the convolution layers within the same convolution layer group have the same number of channels, the number of channels of the cascaded convolution layer groups increases group by group, and the number of convolution layers in a convolution layer group is positively correlated with the number of channels of that group.
5. The method according to claim 3 or 4, wherein the convolution kernel size of the first convolution layer in the style feature extraction network is larger than the convolution kernel sizes of the other convolution layers, the sliding step size of the first convolution layer is smaller than the sliding step sizes of the other convolution layers, the convolution kernel sizes of the convolution layers in all the convolution layer groups are the same, and the sliding step size of the first convolution layer in the same convolution layer group is smaller than the sliding step size of the last convolution layer.
6. The method of claim 2, wherein the first training sample pair matches a structural tag and the second training sample pair matches a style tag, and wherein the training process of the feature extraction model comprises:
cascading the structural feature extraction network and the corresponding structural classification network to obtain a structural classification model, cascading the style feature extraction network and the corresponding style classification network to obtain a style classification model;
training the structural classification model by using the first training sample pair and the matched structural label, and training the style classification model by using the second training sample pair and the matched style label until a first convergence condition is met, so as to obtain the pre-trained structural feature extraction network and the pre-trained style feature extraction network;
and training the pre-trained structural feature extraction network by using the first training sample pairs, and training the pre-trained style feature extraction network by using the second training sample pairs, until a second convergence condition is met, so as to obtain the trained feature extraction model.
7. The method of claim 1, wherein determining the score corresponding to the object in the copy image based on the feature of the plurality of dimensions corresponding to the copy image and the feature of the plurality of dimensions corresponding to the template image comprises:
fusing the features of the multiple dimensions corresponding to the copy image to obtain a first fused feature, and fusing the features of the multiple dimensions corresponding to the template image to obtain a second fused feature;
and obtaining a score corresponding to the target in the copy image based on the first fusion feature and the second fusion feature.
8. The method according to any one of claims 1 to 7, wherein the object in the copy image and the template image is a character, the constituent structure of the character corresponds to a character stroke of the character, and the external form of the character corresponds to a font style of the character.
9. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor invokes to perform the method of any of claims 1-8.
10. A computer readable storage medium having stored thereon program data, which when executed by a processor, implements the method of any of claims 1-8.
CN202310664064.0A 2023-06-06 2023-06-06 Target evaluation method, electronic device, and computer-readable storage medium Pending CN116935057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310664064.0A CN116935057A (en) 2023-06-06 2023-06-06 Target evaluation method, electronic device, and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN116935057A true CN116935057A (en) 2023-10-24

Family

ID=88374536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310664064.0A Pending CN116935057A (en) 2023-06-06 2023-06-06 Target evaluation method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN116935057A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456208A (en) * 2023-11-07 2024-01-26 广东新裕信息科技有限公司 Double-flow sketch quality evaluation method based on significance detection


Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN106649542B (en) System and method for visual question answering
EP3779774B1 (en) Training method for image semantic segmentation model and server
CN108228686B (en) Method and device for realizing image-text matching and electronic equipment
CN110188223B (en) Image processing method and device and computer equipment
CN110287328B (en) Text classification method, device and equipment and computer readable storage medium
US8131786B1 (en) Training scoring models optimized for highly-ranked results
JP4618098B2 (en) Image processing system
US20230385409A1 (en) Unstructured text classification
WO2020253063A1 (en) Method and device for searching for similar images
CN109829478B (en) Problem classification method and device based on variation self-encoder
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
CN107862680B (en) Target tracking optimization method based on correlation filter
CN111160229A (en) Video target detection method and device based on SSD (solid State disk) network
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN116935057A (en) Target evaluation method, electronic device, and computer-readable storage medium
CN112434533A (en) Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium
CN109597906B (en) Image retrieval method and device
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
Sitorus et al. Sensing trending topics in twitter for greater Jakarta area
CN113420833A (en) Visual question-answering method and device based on question semantic mapping
CN109101984A (en) A kind of image-recognizing method and device based on convolutional neural networks
CN116129189A (en) Plant disease identification method, plant disease identification equipment, storage medium and plant disease identification device
CN113177603B (en) Training method of classification model, video classification method and related equipment
CN114863174A (en) Small sample classification algorithm based on multi-scale attention feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination