CN111079785A - Image identification method and device and terminal equipment - Google Patents

Image identification method and device and terminal equipment

Info

Publication number
CN111079785A
Authority
CN
China
Prior art keywords
sample
neural network
image
preset
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911097616.4A
Other languages
Chinese (zh)
Inventor
宋方良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201911097616.4A priority Critical patent/CN111079785A/en
Publication of CN111079785A publication Critical patent/CN111079785A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application is applicable to the technical field of image processing and provides an image identification method, an image identification device, and terminal equipment. The method comprises the following steps: acquiring a sample training set; executing a feature extraction operation, in which a sample image group in the sample training set is input into a twin neural network to obtain a first feature vector and a second feature vector, both of which are normalized feature vectors; calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function; if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count, updating the twin neural network according to the loss value, adding 1 to the iteration count, and returning to the feature extraction operation; and if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, performing image recognition on an image group to be processed by using the twin neural network. The method and device can solve the problem of low accuracy in existing image identification methods.

Description

Image identification method and device and terminal equipment
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image recognition method, an image recognition device, and a terminal device.
Background
In the current field of image processing, recognition tasks such as face recognition and pedestrian re-identification are fine-grained image classification problems: the number of classes is large, the differences between images of different faces or different pedestrians are small, and recognition is therefore difficult.
Existing image recognition methods still suffer from low accuracy on such fine-grained classification problems. Various image recognition schemes have been explored in the hope of reducing the distance between image features of the same person or object and increasing the distance between image features of different persons or objects, so that fine-grained classification problems can be handled with higher precision. How to improve the accuracy of current image recognition methods has therefore become an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present application provide an image recognition method, an image recognition device, and a terminal device, so as to solve the problem that an existing image recognition method is low in accuracy.
A first aspect of an embodiment of the present application provides an image recognition method, including:
acquiring a sample training set, wherein the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
executing a feature extraction operation, wherein the feature extraction operation comprises inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, and the first feature vector and the second feature vector are normalized feature vectors;
calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count, updating the twin neural network according to the loss value, adding 1 to the iteration count, and returning to the feature extraction operation;
and if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, performing image recognition on an image group to be processed by using the twin neural network.
A second aspect of an embodiment of the present application provides an image recognition apparatus, including:
the training sample module is used for acquiring a sample training set, wherein the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
the feature extraction module is used for executing a feature extraction operation, wherein the feature extraction operation comprises inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, and the first feature vector and the second feature vector are normalized feature vectors;
the loss calculation module is used for calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
the model updating module is used for updating the twin neural network according to the loss value, adding 1 to the iteration count, and triggering the feature extraction module if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count;
and the image identification module is used for performing image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count.
A third aspect of the embodiments of the present application provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method described above when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, implements the steps of the method as described above.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to implement the steps of the method as described above.
Compared with the prior art, the embodiments of the present application have the following advantages:
According to the image identification method provided herein, when the twin neural network is used to extract features from an image group, the feature vectors are normalized. Normalizing the feature vectors makes the extracted deep image features more separable, which improves the image recognition effect. Moreover, after normalization the feature vectors lie on the surface of a hypersphere rather than in Euclidean space, so the method replaces the Euclidean-distance-based contrastive loss function with a cosine-distance-based contrastive loss function. This measures the loss between normalized feature vectors more faithfully, trains the twin neural network better, improves its recognition accuracy, and thereby addresses the low accuracy of existing image recognition methods.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a terminal device provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of loss value variation provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of another loss value variation provided by an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not intended to indicate or imply relative importance.
Embodiment one:
Referring to Fig. 1, the image recognition method provided by the first embodiment of the present application includes the following steps:
s101, obtaining a sample training set, wherein the training sample set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
When a neural network is used for image recognition, the choice of network type, the model configuration, and the model training are all closely related to the recognition accuracy of the network.
In the image recognition method of this embodiment, a twin neural network is selected as the neural network for image recognition. A twin neural network is a conjoined network consisting of two sub-networks that share weights; it has two inputs and measures the degree of similarity between them, which yields a good recognition effect in fine-grained recognition scenarios such as face recognition and pedestrian re-identification.
After the twin neural network is selected, it needs to be trained. At the start of training, a sample training set is acquired. Because the twin neural network has two inputs, the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups; each sample image group comprises two sample images, which are fed into the two inputs of the twin neural network respectively.
The sample mark identifies whether the two images in a sample image group are similar images, as the dataset sketch below illustrates.
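To make the arrangement of the training data concrete, the following is a minimal PyTorch sketch of such a paired training set; the class name, the in-memory image list, and the random pairing strategy are illustrative assumptions rather than anything specified by the patent.

```python
import random
from typing import List, Tuple

import torch
from torch.utils.data import Dataset


class SamplePairDataset(Dataset):
    """Hypothetical paired dataset: each item is (image_a, image_b, mark).

    The mark follows the convention described above: 1 means the two
    sample images are similar (same identity), 0 means non-similar.
    """

    def __init__(self, images: List[torch.Tensor], labels: List[int]):
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        img_a, label_a = self.images[idx], self.labels[idx]
        # Randomly build a positive or a negative pair for this anchor image.
        if random.random() < 0.5:
            candidates = [i for i, l in enumerate(self.labels) if l == label_a and i != idx]
        else:
            candidates = [i for i, l in enumerate(self.labels) if l != label_a]
        j = random.choice(candidates) if candidates else idx
        mark = 1.0 if self.labels[j] == label_a else 0.0
        return img_a, self.images[j], torch.tensor(mark)
```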
Step S102, inputting the sample image groups in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to each sample image group, wherein the first feature vector and the second feature vector are normalized feature vectors.
After the sample training set is acquired, the feature extraction operation can be performed: the two sample images of each sample image group in the sample training set are input into the twin neural network respectively.
The twin neural network extracts a first feature vector and a second feature vector for each sample image group in the sample training set; a sample image group contains two sample images, and each sample image corresponds to one feature vector. The first feature vector and the second feature vector are normalized feature vectors.
Normalizing the feature vectors makes the extracted deep image features more separable, which improves the recognition effect of the subsequent image recognition process.
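For illustration, a twin network with normalized outputs might be sketched in PyTorch as follows; the tiny convolutional backbone is a stand-in (the patent does not fix a particular architecture), and the essential points are the shared weights and the L2 normalization of each output feature vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwinNetwork(nn.Module):
    """Two inputs, one shared backbone, unit-norm feature vectors."""

    def __init__(self, feature_dim: int = 128):
        super().__init__()
        # Stand-in backbone; any CNN producing a feature vector would do.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim),
        )

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        f = self.backbone(x)
        # Normalization step: features now lie on the unit hypersphere.
        return F.normalize(f, p=2, dim=1)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # The same weights process both inputs (the "twin" property).
        return self.embed(x1), self.embed(x2)
```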
Step S103, calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function.
After the first feature vector and the second feature vector corresponding to each sample image group in the sample training set are extracted, a loss value can be calculated according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function.
In some possible implementations, the calculating of the loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function includes:
A1, calculating the cosine distance between the first feature vector and the second feature vector;
A2, if the sample mark indicates that the two images in the sample image group are similar images, taking the square of the cosine distance as the loss value;
A3, if the sample mark indicates that the two images in the sample image group are non-similar images, calculating the difference between a preset distance threshold and the cosine distance, and taking the square of the maximum of 0 and the difference as the loss value.
The cosine-distance-based contrastive loss function can be expressed as:

L = Y \cdot d(f_1, f_2)^2 + (1 - Y) \cdot \left[ \max\left(0,\; m - d(f_1, f_2)\right) \right]^2, \qquad d(f_1, f_2) = 1 - \cos\langle f_1, f_2 \rangle

wherein L denotes the cosine-distance-based contrastive loss function, f_1 denotes the first feature vector, f_2 denotes the second feature vector, \langle f_1, f_2 \rangle denotes the angle between the first feature vector and the second feature vector, \cos\langle f_1, f_2 \rangle denotes their cosine similarity, d(f_1, f_2) denotes the cosine distance between them, m is the preset distance threshold, Y is the sample mark corresponding to the sample image group (Y = 1 indicates that the two sample images in the sample image group are similar images, and Y = 0 indicates that they are non-similar images), and max() is the maximum-value function.
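Under this reading (cosine distance d = 1 − cos⟨f1, f2⟩), the loss can be sketched in PyTorch as below; the function name and the example margin value m = 0.5 are assumptions.

```python
import torch


def cosine_contrastive_loss(f1: torch.Tensor, f2: torch.Tensor,
                            y: torch.Tensor, m: float = 0.5) -> torch.Tensor:
    """Contrastive loss on the cosine distance of unit-norm features.

    f1, f2: (batch, dim) normalized feature vectors.
    y:      (batch,) sample marks, 1 = similar pair, 0 = non-similar pair.
    m:      preset distance threshold (margin); 0.5 is an arbitrary example.
    """
    # Since f1 and f2 are unit vectors, the dot product is the cosine similarity.
    cos = (f1 * f2).sum(dim=1)
    d = 1.0 - cos                                           # cosine distance
    pos = y * d.pow(2)                                      # similar pairs: d^2
    neg = (1.0 - y) * torch.clamp(m - d, min=0.0).pow(2)    # non-similar: max(0, m - d)^2
    return (pos + neg).mean()
```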
Conventionally, when a twin neural network is applied to image recognition, a Euclidean-distance-based contrastive loss function is selected to calculate the loss value, and a cosine-distance-based contrastive loss function is considered a poor measure of the loss between (unnormalized) feature vectors.
In the image recognition method of this embodiment, however, the extracted feature vectors are normalized as part of the feature extraction operation of the twin neural network. After normalization, the feature vectors no longer range over Euclidean space but lie on the surface of a hypersphere. In this setting the Euclidean distance no longer measures the loss between feature vectors well, whereas a cosine-distance-based contrastive loss function measures it better, improves the update of the network parameters in the twin neural network, and effectively improves the recognition accuracy of the twin neural network.
Taking Figs. 4 and 5 as an example, the ordinate in Figs. 4 and 5 represents the loss value of a sample image group, and the abscissa represents the cosine value between the feature vectors of the two sample images in the group.
When normalized feature vectors are combined with the Euclidean-distance-based contrastive loss function, the loss of a positive sample image group varies as shown by line 1, and the loss of a negative sample image group varies as shown by line 2. For a positive sample image group, the derivative of the loss value with respect to the cosine value is a constant. That is, a hard sample image group (one whose two sample images are difficult to identify as matching, for example, face images of the same person that differ greatly in appearance) generates a gradient of the same length as an easy sample image group (one whose two sample images are easy to identify, for example, face images of the same person taken in similar scenes), which is clearly unreasonable.
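The constant-derivative claim follows directly from the normalization; a one-line derivation (not in the original text, but consistent with it) is:

```latex
% For unit-norm feature vectors, the squared Euclidean distance is linear in the cosine:
\|f_1 - f_2\|^2 = \|f_1\|^2 + \|f_2\|^2 - 2\, f_1 \cdot f_2 = 2 - 2\cos\langle f_1, f_2 \rangle .
% Hence the Euclidean positive-pair loss is linear in \cos\langle f_1, f_2 \rangle,
% and its derivative with respect to the cosine value is the constant -2,
% regardless of whether the pair is hard or easy. By contrast, the cosine loss
% (1 - \cos\langle f_1, f_2 \rangle)^2 has derivative -2(1 - \cos\langle f_1, f_2 \rangle),
% whose magnitude grows as the cosine falls, i.e. for harder positive pairs.
```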
When the cosine-distance-based contrastive loss function is used instead, the loss of a positive sample image group varies as shown by line 3, and the loss of a negative sample image group varies as shown by line 4. With this loss function, the curve of the loss value against the cosine value is strictly convex for both positive and negative sample image groups, so hard sample image groups generate longer gradients than easy ones.
In summary, when normalized feature vectors are used, the cosine-distance-based contrastive loss function measures the loss between feature vectors better than the Euclidean-distance-based one, effectively improves the training process of the twin neural network, and improves its recognition accuracy.
Step S104, if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count, updating the twin neural network according to the loss value, adding 1 to the iteration count, and returning to step S102.
The goal of training the twin neural network is to make the loss as small as possible. Therefore, after the loss value is calculated according to the first feature vector, the second feature vector, the sample mark, and the cosine-distance-based contrastive loss function, a loss value greater than the preset loss threshold together with an iteration count below the preset iteration count indicates that training of the twin neural network should continue.
At this point, the network parameters of the twin neural network are updated according to the loss value; the update algorithm may be selected according to the actual situation. For example, batch gradient descent, stochastic gradient descent, or mini-batch gradient descent may be used to update the network parameters of the twin neural network.
After the network parameters of the twin neural network are updated, the method returns to step S102, re-executes the feature extraction operation, and starts the next round of iterative training, as the sketch below illustrates.
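Putting steps S102 to S105 together, a schematic training loop might read as follows; the optimizer choice (plain SGD), learning rate, batch size, and threshold values are illustrative assumptions, and TwinNetwork, cosine_contrastive_loss, and the paired dataset are taken from the sketches above.

```python
import torch
from torch.utils.data import DataLoader


def train(dataset, loss_threshold: float = 0.01, max_iterations: int = 10000):
    net = TwinNetwork()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    iteration = 0
    while iteration < max_iterations:
        for img_a, img_b, mark in loader:
            f1, f2 = net(img_a, img_b)                    # feature extraction (S102)
            loss = cosine_contrastive_loss(f1, f2, mark)  # loss calculation (S103)
            if loss.item() <= loss_threshold:
                return net                                # training target reached (S105)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                              # update network parameters (S104)
            iteration += 1
            if iteration >= max_iterations:
                break
    return net                                            # iteration cap reached (S105)
```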
Step S105, if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, performing image recognition on the image group to be processed by using the twin neural network.
If the loss value is less than or equal to the preset loss threshold, or the iteration count is greater than or equal to the preset iteration count, the training objective has been met or the upper limit on the number of training iterations has been reached. At this point, the iterative training process can be stopped, and the trained twin neural network can be used to perform image recognition on the image group to be processed.
During training of the twin neural network, the sample training set may contain too few sample image groups; an insufficient number of sample image groups leads to a poorly trained twin neural network and reduces the accuracy of image recognition.
Therefore, in some possible implementations, after the sample training set is acquired, a preset number of sample image groups in the sample training set may be selected, and at least one sample image in each selected group may be subjected to a morphological transformation (that is, either one sample image in the group is transformed, or both are).
The morphological transformation may include one or more of cropping, rotating, flipping, brightness adjustment, sharpening, and adding noise. When both sample images in a group are transformed, the same or different transformation methods may be applied to the two images.
After the morphological transformation of at least one sample image in a group, a new sample image group is obtained; adding the new group to the sample training set expands the number of sample image groups in the set. A possible realization of this step is sketched below.
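As one possible realization of this expansion step, the sketch below builds a new sample image group by applying a random morphological transformation with torchvision; the particular transform set, parameters, and probabilities are assumptions for illustration (noise addition, also named in the text, is omitted because torchvision has no stock transform for it), and PIL images are assumed as input.

```python
import random

import torchvision.transforms as T

# A pool of the morphological transformations named above; parameters are examples.
MORPH_TRANSFORMS = [
    T.RandomCrop(96, padding=8),                           # cropping
    T.RandomRotation(degrees=15),                          # rotating
    T.RandomHorizontalFlip(p=1.0),                         # flipping
    T.ColorJitter(brightness=0.4),                         # brightness adjustment
    T.RandomAdjustSharpness(sharpness_factor=2.0, p=1.0),  # sharpening
]


def expand_pair(img_a, img_b):
    """Return a new sample image group derived from (img_a, img_b).

    Transforms either one image of the group or both; when both are
    transformed, the two methods may be the same or different.
    """
    if random.random() < 0.5:
        return random.choice(MORPH_TRANSFORMS)(img_a), img_b
    return random.choice(MORPH_TRANSFORMS)(img_a), random.choice(MORPH_TRANSFORMS)(img_b)
```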
In addition, expanding the sample training set by image morphological transformation not only increases the number of sample image groups but also improves the robustness of the twin neural network against interference. For example, if during training all sample images in the sample training set are face images with sufficient brightness and clear detail, the trained twin neural network may reach wrong conclusions when asked to identify face images of the same person taken at different brightness levels. Expanding the sample training set by morphological transformation during training therefore increases the diversity of the sample images and further improves the robustness of the twin neural network.
In some possible implementations, the performing of image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count includes:
B1, if the loss value is less than or equal to the preset loss threshold, or the iteration count is greater than or equal to the preset iteration count, testing the twin neural network with a sample test set and calculating the accuracy of the twin neural network, wherein the sample test set comprises at least one test image group and sample marks corresponding to the test image groups, and each test image group comprises two test images;
After the twin neural network has been trained, its accuracy can be tested.
A sample test set is acquired; it comprises at least one test image group and sample marks corresponding to the test image groups, and each test image group comprises two test images.
The test image groups in the sample test set are input into the trained twin neural network to test it. The twin neural network outputs a recognition result for each test image group; if the result agrees with the corresponding sample mark, the recognition is correct, and if it does not, the recognition is wrong. Counting the proportion of correct recognition results over all test image groups yields the accuracy of the twin neural network.
B2, if the accuracy of the twin neural network is lower than a preset accuracy threshold, replacing the sample training set, and returning to the feature extraction operation of step S102;
If the accuracy of the twin neural network is lower than the preset accuracy threshold, the recognition accuracy of the twin neural network is too low and the network needs to be retrained. In this case, the sample training set may be replaced with a new one, and the method returns to step S102 to repeat the iterative training of the twin neural network.
And B3, if the accuracy of the twin neural network is higher than or equal to a preset accuracy threshold, performing image recognition on the image group to be processed by using the twin neural network.
If the accuracy of the twin neural network is higher than or equal to the preset accuracy threshold, the recognition accuracy of the twin neural network meets the requirements of the user, and the twin neural network can be used for carrying out image recognition on the image group to be processed.
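A sketch of the accuracy test in B1 is given below; the decision rule (thresholding the cosine similarity of the two feature vectors) and the threshold value are assumptions, since the text does not spell out how the network output is mapped to a similar/non-similar verdict.

```python
import torch
from torch.utils.data import DataLoader


@torch.no_grad()
def test_accuracy(net, test_set, similar_threshold: float = 0.5) -> float:
    """Fraction of test image groups whose verdict matches the sample mark."""
    net.eval()
    correct, total = 0, 0
    for img_a, img_b, mark in DataLoader(test_set, batch_size=32):
        f1, f2 = net(img_a, img_b)
        cos = (f1 * f2).sum(dim=1)
        # Assumed rule: a pair is judged "similar" when the cosine similarity
        # exceeds the threshold; the threshold value is only an example.
        pred = (cos > similar_threshold).float()
        correct += (pred == mark).sum().item()
        total += mark.numel()
    return correct / total
```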
In the image recognition method provided by this embodiment, when the twin neural network extracts features from an image group, the feature vectors are normalized; this normalization makes the extracted deep image features more separable and thereby improves the image recognition effect. Moreover, after normalization the feature vectors are mapped onto the surface of a hypersphere rather than into Euclidean space, so the method replaces the Euclidean-distance-based contrastive loss function with a cosine-distance-based one. This measures the loss between normalized feature vectors better, trains the twin neural network better, improves its image recognition accuracy, and solves the problem of low accuracy in existing image recognition methods.
After the sample training set is acquired, it can be expanded by morphological transformation, which increases the number of sample image groups in the set and improves the robustness of the twin neural network against interference.
After training of the twin neural network is completed, the network can be tested on the sample test set. If its accuracy is lower than the preset accuracy threshold, the sample training set is replaced and the twin neural network is retrained; if its accuracy is higher than or equal to the preset accuracy threshold, the twin neural network is used to perform image recognition on the image group to be processed. Testing the twin neural network on the sample test set ensures its recognition accuracy, so that image recognition of the images to be processed can be carried out more accurately.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Embodiment two:
The second embodiment of the present application provides an image recognition apparatus. For convenience of description, only the parts related to the present application are shown. As shown in Fig. 2, the image recognition apparatus includes:
a training sample module 201, configured to acquire a sample training set, where the sample training set includes at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups includes two sample images;
a feature extraction module 202, configured to perform a feature extraction operation, where the feature extraction operation includes inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, the first feature vector and the second feature vector being normalized feature vectors;
a loss calculation module 203, configured to calculate a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
a model updating module 204, configured to, if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count, update the twin neural network according to the loss value, add 1 to the iteration count, and trigger the feature extraction module;
and an image identification module 205, configured to perform image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count.
Further, the loss calculation module 203 includes:
a cosine distance sub-module, configured to calculate the cosine distance between the first feature vector and the second feature vector;
a first loss sub-module, configured to take the square of the cosine distance as the loss value if the sample mark indicates that the two images in the sample image group are similar images;
and a second loss sub-module, configured to, if the sample mark indicates that the two images in the sample image group are non-similar images, calculate the difference between a preset distance threshold and the cosine distance and take the square of the maximum of 0 and the difference as the loss value.
Further, the apparatus may also include:
a sample expansion module, configured to select a preset number of sample image groups in the sample training set, perform a morphological transformation on at least one sample image in each selected group to obtain a new sample image group, and add the new sample image group to the sample training set.
Further, the morphological transformation includes one or more of cropping, rotating, flipping, brightness adjustment, sharpening, and adding noise.
Further, the image recognition module 205 includes:
a test sub-module, configured to, if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, test the twin neural network with a sample test set and calculate the accuracy of the twin neural network, where the sample test set includes at least one test image group and sample marks corresponding to the test image groups, and each test image group includes two test images;
a reset sub-module, configured to replace the sample training set and trigger the feature extraction module if the accuracy of the twin neural network is lower than a preset accuracy threshold;
and a recognition sub-module, configured to perform image recognition on the image group to be processed by using the twin neural network if the accuracy of the twin neural network is higher than or equal to the preset accuracy threshold.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Embodiment three:
Fig. 3 is a schematic diagram of a terminal device provided by the third embodiment of the present application. As shown in Fig. 3, the terminal device 3 of this embodiment includes: a processor 30, a memory 31, and a computer program 32 stored in the memory 31 and executable on the processor 30. When executing the computer program 32, the processor 30 implements the steps in the above image recognition method embodiment, such as steps S101 to S105 shown in Fig. 1; alternatively, the processor 30 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of modules 201 to 205 shown in Fig. 2.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 32 in the terminal device 3. For example, the computer program 32 may be divided into a training sample module, a feature extraction module, a loss calculation module, a model update module, and an image recognition module, each of which has the following specific functions:
the training sample module is used for acquiring a sample training set, wherein the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
the feature extraction module is used for executing a feature extraction operation, wherein the feature extraction operation comprises inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, and the first feature vector and the second feature vector are normalized feature vectors;
the loss calculation module is used for calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
the model updating module is used for updating the twin neural network according to the loss value, adding 1 to the iteration count, and triggering the feature extraction module if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count;
and the image identification module is used for performing image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count.
The terminal device 3 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will understand that Fig. 3 is only an example of the terminal device 3 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, a bus, and the like.
The Processor 30 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the terminal device 3, such as a hard disk or a memory of the terminal device 3. The memory 31 may also be an external storage device of the terminal device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the terminal device 3. The memory 31 is used for storing the computer program and other programs and data required by the terminal device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image recognition method, comprising:
acquiring a sample training set, wherein the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
executing a feature extraction operation, wherein the feature extraction operation comprises inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, and the first feature vector and the second feature vector are normalized feature vectors;
calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count, updating the twin neural network according to the loss value, adding 1 to the iteration count, and returning to the feature extraction operation;
and if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, performing image recognition on an image group to be processed by using the twin neural network.
2. The image recognition method of claim 1, wherein the calculating of a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function comprises:
calculating the cosine distance between the first feature vector and the second feature vector;
if the sample mark indicates that the two images in the sample image group are similar images, taking the square of the cosine distance as the loss value;
and if the sample mark indicates that the two images in the sample image group are non-similar images, calculating the difference between a preset distance threshold and the cosine distance, and taking the square of the maximum of 0 and the difference as the loss value.
3. The image recognition method of claim 1, wherein after the acquiring of the sample training set, the method further comprises:
selecting a preset number of sample image groups in the sample training set, performing morphological transformation on at least one sample image in the sample image groups to obtain a new sample image group, and adding the new sample image group to the sample training set.
4. The image recognition method of claim 3, wherein the morphological transformation comprises one or more of cropping, rotating, flipping, brightness adjustment, sharpening, and adding noise.
5. The image recognition method of claim 1, wherein the performing of image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count comprises:
if the loss value is less than or equal to the preset loss threshold, or the iteration count is greater than or equal to the preset iteration count, testing the twin neural network with a sample test set and calculating the accuracy of the twin neural network, wherein the sample test set comprises at least one test image group and sample marks corresponding to the test image groups, and each test image group comprises two test images;
if the accuracy of the twin neural network is lower than a preset accuracy threshold, replacing the sample training set, and returning to the feature extraction operation;
and if the accuracy of the twin neural network is higher than or equal to the preset accuracy threshold, performing image recognition on the image group to be processed by using the twin neural network.
6. An image recognition apparatus, comprising:
the training sample module is used for acquiring a sample training set, wherein the sample training set comprises at least one group of sample image groups and sample marks corresponding to the sample image groups, and each group of sample image groups comprises two sample images;
the feature extraction module is used for executing a feature extraction operation, wherein the feature extraction operation comprises inputting the sample image group in the sample training set into a twin neural network to obtain a first feature vector and a second feature vector corresponding to the sample image group, and the first feature vector and the second feature vector are normalized feature vectors;
the loss calculation module is used for calculating a loss value according to the first feature vector, the second feature vector, the sample mark, and a cosine-distance-based contrastive loss function;
the model updating module is used for updating the twin neural network according to the loss value, adding 1 to the iteration count, and triggering the feature extraction module if the loss value is greater than a preset loss threshold and the iteration count is less than a preset iteration count;
and the image identification module is used for performing image recognition on the image group to be processed by using the twin neural network if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count.
7. The image recognition apparatus according to claim 6, further comprising:
and the sample expansion module is used for selecting a preset number of sample image groups in the sample training set, performing morphological transformation on at least one sample image in the sample image groups to obtain a new sample image group, and adding the new sample image group to the sample training set.
8. The image recognition apparatus of claim 6, wherein the image recognition module comprises:
a test sub-module, configured to, if the loss value is less than or equal to the preset loss threshold or the iteration count is greater than or equal to the preset iteration count, test the twin neural network with a sample test set and calculate the accuracy of the twin neural network, wherein the sample test set comprises at least one test image group and sample marks corresponding to the test image groups, and each test image group comprises two test images;
a reset sub-module, configured to replace the sample training set and trigger the feature extraction module if the accuracy of the twin neural network is lower than a preset accuracy threshold;
and a recognition sub-module, configured to perform image recognition on the image group to be processed by using the twin neural network if the accuracy of the twin neural network is higher than or equal to the preset accuracy threshold.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201911097616.4A 2019-11-11 2019-11-11 Image identification method and device and terminal equipment Pending CN111079785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911097616.4A CN111079785A (en) 2019-11-11 2019-11-11 Image identification method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911097616.4A CN111079785A (en) 2019-11-11 2019-11-11 Image identification method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN111079785A (en) 2020-04-28

Family

ID=70310790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911097616.4A Pending CN111079785A (en) 2019-11-11 2019-11-11 Image identification method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111079785A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117744A (en) * 2018-07-20 2019-01-01 杭州电子科技大学 A kind of twin neural network training method for face verification
CN110147732A (en) * 2019-04-16 2019-08-20 平安科技(深圳)有限公司 Refer to vein identification method, device, computer equipment and storage medium

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626235A (en) * 2020-05-29 2020-09-04 北京华捷艾米科技有限公司 Training method and device for face recognition neural network
CN111680596A (en) * 2020-05-29 2020-09-18 北京百度网讯科技有限公司 Positioning truth value verification method, device, equipment and medium based on deep learning
CN111680596B (en) * 2020-05-29 2023-10-13 北京百度网讯科技有限公司 Positioning true value verification method, device, equipment and medium based on deep learning
CN111797700A (en) * 2020-06-10 2020-10-20 南昌大学 Vehicle re-identification method based on fine-grained discrimination network and second-order reordering
CN111950653B (en) * 2020-08-24 2021-09-10 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN111950653A (en) * 2020-08-24 2020-11-17 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN112016679A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Method and device for determining test sample class of twin network and terminal equipment
CN112016679B (en) * 2020-09-09 2024-02-13 平安科技(深圳)有限公司 Test sample category determining method and device for twin network and terminal equipment
CN112598643A (en) * 2020-12-22 2021-04-02 百度在线网络技术(北京)有限公司 Depth counterfeit image detection and model training method, device, equipment and medium
CN112598643B (en) * 2020-12-22 2023-06-23 百度在线网络技术(北京)有限公司 Depth fake image detection and model training method, device, equipment and medium
CN113052295A (en) * 2021-02-27 2021-06-29 华为技术有限公司 Neural network training method, object detection method, device and equipment
CN113052295B (en) * 2021-02-27 2024-04-12 华为技术有限公司 Training method of neural network, object detection method, device and equipment
CN113240019A (en) * 2021-05-19 2021-08-10 深圳市智影医疗科技有限公司 Verification set loss curve correction method and device, terminal device and storage medium
CN113283446A (en) * 2021-05-27 2021-08-20 平安科技(深圳)有限公司 Method and device for identifying target object in image, electronic equipment and storage medium
CN113283446B (en) * 2021-05-27 2023-09-26 平安科技(深圳)有限公司 Method and device for identifying object in image, electronic equipment and storage medium
CN113779643A (en) * 2021-09-24 2021-12-10 重庆傲雄在线信息技术有限公司 Signature handwriting recognition system and method based on pre-training technology and storage medium
CN114155366B (en) * 2022-02-07 2022-05-20 北京每日优鲜电子商务有限公司 Dynamic cabinet image recognition model training method and device, electronic equipment and medium
CN114155366A (en) * 2022-02-07 2022-03-08 北京每日优鲜电子商务有限公司 Dynamic cabinet image recognition model training method and device, electronic equipment and medium
CN115392403A (en) * 2022-10-26 2022-11-25 北京必示科技有限公司 Abnormal change detection method, device, equipment and storage medium

Similar Documents

Publication Title
CN111079785A (en) Image identification method and device and terminal equipment
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
CN107944020B (en) Face image searching method and device, computer device and storage medium
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN109740633B (en) Image similarity calculation method and device and storage medium
CN109740606B (en) Image identification method and device
Elaiwat et al. 3-D face recognition using curvelet local features
CN113221918B (en) Target detection method, training method and device of target detection model
US20140140583A1 (en) Image recognition apparatus and image recognition method for identifying object
JP6997369B2 (en) Programs, ranging methods, and ranging devices
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
Al-asadi et al. Object detection and recognition by using enhanced speeded up robust feature
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN104123554A (en) SIFT image characteristic extraction method based on MMTD
CN115690803A (en) Digital image recognition method and device, electronic equipment and readable storage medium
CN117197904A (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
CN108960246B (en) Binarization processing device and method for image recognition
CN113378837A (en) License plate shielding identification method and device, electronic equipment and storage medium
Liu et al. Circuit sketch recognition
CN110287786B (en) Vehicle information identification method and device based on artificial intelligence anti-interference
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium
CN109871779B (en) Palm print identification method and electronic equipment
CN111931557A (en) Specification identification method and device for bottled drink, terminal equipment and readable storage medium
CN108229498B (en) Zipper piece identification method, device and equipment
CN115439733A (en) Image processing method, image processing device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428