CN107992894B - Image recognition method, image recognition device and computer-readable storage medium - Google Patents

Image recognition method, image recognition device and computer-readable storage medium

Info

Publication number
CN107992894B
Authority
CN
China
Prior art keywords: image, resolution, preset, preset resolution, layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711318139.0A
Other languages
Chinese (zh)
Other versions
CN107992894A (en)
Inventor
张水发 (Zhang Shuifa)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201711318139.0A priority Critical patent/CN107992894B/en
Publication of CN107992894A publication Critical patent/CN107992894A/en
Application granted granted Critical
Publication of CN107992894B publication Critical patent/CN107992894B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/29 Graphical models, e.g. Bayesian networks

Abstract

The present disclosure relates to an image recognition method, an image recognition apparatus, and a computer-readable storage medium. The method includes: acquiring the resolution of a first image to be identified; when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, wherein the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1; and identifying the third image through a specified classifier. The pixel features of the third image are accurate, that is, the degree of distortion of the third image relative to the first image is small, and the resolution of the third image is a resolution at which the specified classifier can accurately identify images, so the third image, and hence the first image, can be identified accurately by the specified classifier.

Description

Image recognition method, image recognition device and computer-readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image recognition method and apparatus, and a computer-readable storage medium.
Background
With the development of image processing technology, classification models such as deep convolutional neural network models have become indispensable tools in image recognition. Before an image is recognized by a classification model, the model is first trained; the trained classification model is called a classifier, which can then be used to recognize images.
In the related art, when an image is identified with a classifier, the image may be input directly to the classifier; the convolutional layers, activation layers, pooling layers, fully connected layers, and the like included in the classifier then process the image, and the class probability layer included in the classifier outputs the final image recognition result.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an image recognition method, apparatus, and computer-readable storage medium.
According to a first aspect of embodiments of the present disclosure, there is provided an image recognition method, the method including:
acquiring the resolution of a first image to be identified;
when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, wherein the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1;
and identifying the third image by a specified classifier.
Optionally, generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in a specified multi-scale layer includes:
scaling the first image into a second image having the first preset resolution or the second preset resolution according to the resolution of the first image;
and generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in the specified multi-scale layer.
Optionally, the scaling the first image into a second image with the first preset resolution or the second preset resolution according to the resolution of the first image includes:
when the resolution of the first image is greater than the first preset resolution, scaling the first image into a second image with the first preset resolution;
when the resolution of the first image is smaller than the second preset resolution, scaling the first image into a second image with the second preset resolution.
Optionally, the generating, by a convolution layer or a deconvolution layer included in the specified multi-scale layer, a third image with the third preset resolution corresponding to the second image includes:
when the resolution of the second image is the first preset resolution, generating a third image with a third preset resolution corresponding to the second image through the convolutional layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating a third image with the third preset resolution corresponding to the second image through a deconvolution layer contained in the specified multi-scale layer.
Optionally, after acquiring the resolution of the first image to be recognized, the method further includes:
when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution, scaling the first image into a third image with the third preset resolution.
Optionally, before generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in a specified multi-scale layer, the method further includes:
acquiring a plurality of preset image sets, wherein all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and training a multi-scale layer to be trained and a classification model by using the plurality of preset image sets to obtain the designated multi-scale layer and the designated classifier.
According to a second aspect of the embodiments of the present disclosure, there is provided an image recognition apparatus, the apparatus including:
the first acquisition module is used for acquiring the resolution of a first image to be identified;
a generating module, configured to generate, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in a specified multi-scale layer when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, where the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1;
and the identification module is used for identifying the third image through a specified classifier.
Optionally, the generating module includes:
a scaling sub-module, configured to scale the first image into a second image with the first preset resolution or the second preset resolution according to the resolution of the first image;
and the generation submodule is used for generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in the specified multi-scale layer.
Optionally, the scaling sub-module is further configured to:
when the resolution of the first image is greater than the first preset resolution, scaling the first image into a second image with the first preset resolution;
when the resolution of the first image is smaller than the second preset resolution, scaling the first image into a second image with the second preset resolution.
Optionally, the generation submodule is further configured to:
when the resolution of the second image is the first preset resolution, generating a third image with a third preset resolution corresponding to the second image through the convolutional layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating a third image with the third preset resolution corresponding to the second image through a deconvolution layer contained in the specified multi-scale layer.
Optionally, the apparatus further comprises:
a scaling module, configured to scale the first image into a third image with the third preset resolution when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution.
Optionally, the apparatus further comprises:
the second obtaining module is used for obtaining a plurality of preset image sets, and all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and the training module is used for training the multi-scale layer to be trained and the classification model by using the plurality of preset image sets to obtain the designated multi-scale layer and the designated classifier.
According to a third aspect of the embodiments of the present disclosure, there is provided an image recognition apparatus, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor, implement the steps of the method of the first aspect described above.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
In the embodiments of the present disclosure, the resolution of a first image to be identified may be acquired. Then, when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, that is, when it differs greatly from a third preset resolution, a third image with the third preset resolution may be generated from the first image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, and finally the third image is identified by a specified classifier. Because the third image is generated by convolution or deconvolution, the features of its pixels are accurate, that is, the degree of distortion of the third image relative to the first image is small. Since the resolution of the third image is a resolution at which the specified classifier can accurately identify images, the specified classifier can identify the third image accurately, so that identification of the first image is accurately realized and the accuracy of image recognition is high.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart illustrating an image recognition method according to an exemplary embodiment.
FIG. 2A is a flow chart illustrating another method of image recognition according to an example embodiment.
FIG. 2B is a schematic diagram illustrating a network architecture for image recognition, according to an example embodiment.
Fig. 3A is a block diagram illustrating a first type of image recognition device according to an exemplary embodiment.
FIG. 3B is a block diagram illustrating a generation module in accordance with an exemplary embodiment.
Fig. 3C is a block diagram illustrating a second type of image recognition device according to an exemplary embodiment.
Fig. 3D is a block diagram illustrating a third image recognition apparatus according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a fourth image recognition apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
For convenience of understanding, before explaining the embodiments of the present disclosure in detail, an application scenario related to the embodiments of the present disclosure will be described.
With the development of image processing technology, classification models such as deep convolutional neural network models have become indispensable tools in image recognition. Before an image is recognized by a classification model, the model is first trained; the trained classification model is called a classifier, which can then be used to recognize images. In the related art, an image may be input directly to a classifier, which then outputs the final image recognition result. When a classifier recognizes an image, it extracts features of the image from the pixels of the image and then recognizes the image according to those features, so the classifier can only accurately recognize images whose resolution lies within a certain range. Therefore, if an image is input directly to the classifier and its resolution differs greatly from the range the classifier can handle, the recognition result will be inaccurate. To this end, the present disclosure provides an image recognition method that improves the accuracy of image recognition by adjusting the resolution of the image.
The image recognition method provided by the present disclosure can be applied to image recognition scenarios. For example, many trademark images of different types exist in daily life. To check whether a trademark image infringes, trademark images are usually recognized by a classifier, and their resolutions can differ greatly, so the image recognition method provided by the present disclosure can be adopted to ensure accurate recognition of trademark images.
Next, an image recognition method provided by an embodiment of the present disclosure will be described in detail with reference to the drawings.
Fig. 1 is a flowchart illustrating an image recognition method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
In step 101, a resolution of a first image to be identified is acquired.
In step 102, when the resolution of the first image is greater than the first preset resolution or less than the second preset resolution, a third image with a third preset resolution is generated from the first image through a convolution layer or a deconvolution layer included in a specified multi-scale layer, where the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1.
In step 103, the third image is identified by a specified classifier.
In the embodiments of the present disclosure, the resolution of a first image to be identified may be acquired. Then, when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, that is, when it differs greatly from a third preset resolution, a third image with the third preset resolution may be generated from the first image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, and finally the third image is identified by a specified classifier. Because the third image is generated by convolution or deconvolution, the features of its pixels are accurate, that is, the degree of distortion of the third image relative to the first image is small. Since the resolution of the third image is a resolution at which the specified classifier can accurately identify images, the specified classifier can identify the third image accurately, so that identification of the first image is accurately realized and the accuracy of image recognition is high.
Optionally, generating, from the first image, a third image having a third preset resolution through a convolution layer or a deconvolution layer included in the specified multi-scale layer includes:
scaling the first image into a second image with a first preset resolution or a second preset resolution according to the resolution of the first image;
and generating a third image with a third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in the specified multi-scale layer.
Optionally, scaling the first image into a second image having a first preset resolution or a second preset resolution according to the resolution of the first image, including:
when the resolution of the first image is greater than a first preset resolution, scaling the first image into a second image with the first preset resolution;
and when the resolution of the first image is less than a second preset resolution, scaling the first image into a second image with the second preset resolution.
Optionally, generating a third image with a third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer included in the specified multi-scale layer includes:
when the resolution of the second image is the first preset resolution, generating a third image with a third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating a third image with a third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
Optionally, after acquiring the resolution of the first image to be identified, the method further includes:
and when the resolution of the first image is less than or equal to a first preset resolution and greater than or equal to a second preset resolution, scaling the first image into a third image with a third preset resolution.
Optionally, before generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in the specified multi-scale layer, the method further includes:
acquiring a plurality of preset image sets, wherein all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and training the multi-scale layer to be trained and the classification model by using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present disclosure, which are not described in detail here again.
Fig. 2A is a flowchart illustrating an image recognition method according to an exemplary embodiment, and the image recognition method provided in the embodiment of fig. 1 will be described in conjunction with fig. 2A. As shown in fig. 2A, the method includes the following steps.
In step 201, the resolution of the first image to be identified is acquired.
It should be noted that the resolution of an image refers to the number of pixels the image contains per inch. The resolution reflects properties such as the size of the image and determines the precision of image details: the higher the resolution, the more pixels the image contains and the clearer it is; the lower the resolution, the fewer pixels it contains and the blurrier it is.
In step 202, when the resolution of the first image is greater than the first preset resolution or less than the second preset resolution, a third image having a third preset resolution is generated from the first image through a convolution layer or a deconvolution layer included in a specified multi-scale layer.
It should be noted that the first preset resolution, the second preset resolution, and the third preset resolution may be set in advance according to different requirements, where the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1. For example, if N is 4 and the first preset resolution is 448 × 448, then the third preset resolution is 224 × 224 and the second preset resolution is 112 × 112.
In addition, the specified multi-scale layer may be set in advance. It is a layer that scales an image by a factor of N, that is, it can scale an image by a factor of N either through the convolution layer it contains or through the deconvolution layer it contains.
Moreover, the convolution layer and the deconvolution layer process an image in opposite ways. The convolution layer divides the image into a plurality of regions and extracts the features of the plurality of pixels in each region as a single feature; that is, the convolution layer reduces the number of pixels in the image, lowers its resolution, and shrinks the image. The deconvolution layer restores the feature of each pixel in the image into a plurality of features; that is, the deconvolution layer increases the number of pixels in the image, raises its resolution, and enlarges the image.
The implementation process of step 202 may be: scaling the first image into a second image with the first preset resolution or the second preset resolution according to the resolution of the first image, and then generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in the specified multi-scale layer.
It should be noted that, after the second image is input into the specified multi-scale layer, the image output by that layer corresponds to the second image; since the convolution layer or the deconvolution layer included in the specified multi-scale layer scales the image by a factor of N, the output is the third image with the third preset resolution.
The implementation process of scaling the first image into the second image with the first preset resolution or the second preset resolution according to the resolution of the first image may be: when the resolution of the first image is greater than a first preset resolution, scaling the first image into a second image with the first preset resolution; and when the resolution of the first image is less than a second preset resolution, scaling the first image into a second image with the second preset resolution.
For example, the first image has a resolution of 552 x 552, and assuming that the first preset resolution is 448 x 448, the first image may be scaled to a second image having a resolution of 448 x 448 since 552 x 552 is greater than 448 x 448.
For another example, the resolution of the first image is 56 × 56, and assuming that the second preset resolution is 112 × 112, the first image may be scaled to the second image with the resolution of 112 × 112 because 56 × 56 is smaller than 112 × 112.
The implementation process of generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer included in the specified multi-scale layer may be: when the resolution of the second image is the first preset resolution, generating a third image with the third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer; and when the resolution of the second image is the second preset resolution, generating a third image with the third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
It should be noted that, since the third preset resolution is smaller than the first preset resolution, when the resolution of the second image is the first preset resolution, the second image can be reduced by a factor of N through the convolution layer included in the specified multi-scale layer, so as to generate the third image with the third preset resolution corresponding to the second image. Likewise, because the second preset resolution is smaller than the third preset resolution, when the resolution of the second image is the second preset resolution, the second image can be enlarged by a factor of N through the deconvolution layer included in the specified multi-scale layer, so as to generate the third image with the third preset resolution corresponding to the second image.
It should be noted that, when processing an image, the convolution layer and the deconvolution layer are usually configured with a convolution kernel and a stride; when the stride equals the side length of the convolution kernel, the area of the convolution kernel is the scaling factor of the image. Therefore, before generating the third image with the third preset resolution through the convolution layer or the deconvolution layer included in the specified multi-scale layer, the convolution kernels of the convolution layer and the deconvolution layer in the specified multi-scale layer can each be set so that the kernel area equals N, that is, a √N × √N kernel with a stride of √N.
For example, if N is 4, the convolution kernels of the convolution layer and the deconvolution layer included in the specified multi-scale layer may each be set to 2 × 2 with a stride of 2; the convolution layer then reduces the image by a factor of 4, and the deconvolution layer enlarges the image by a factor of 4.
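To make this concrete, below is a minimal sketch (not code from the patent) of how such a specified multi-scale layer could be built in PyTorch, assuming N = 4, a 2 × 2 kernel, and a stride of 2 as in the example above; the class name, channel count, and mode argument are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleLayer(nn.Module):
    """Scales an image by a pixel-count factor of N = 4 (2x per side).

    A strided convolution halves each spatial side (448 -> 224);
    a transposed convolution doubles each side (112 -> 224).
    Hypothetical sketch: kernel size, stride, and channel count follow
    the N = 4 example in the description, not a released model.
    """

    def __init__(self, channels: int = 3):
        super().__init__()
        # Downscaling branch: 2x2 kernel, stride 2 -> side length / 2.
        self.down = nn.Conv2d(channels, channels, kernel_size=2, stride=2)
        # Upscaling branch: 2x2 kernel, stride 2 -> side length * 2.
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor, mode: str) -> torch.Tensor:
        if mode == "down":   # second image at 448x448 -> third image at 224x224
            return self.down(x)
        if mode == "up":     # second image at 112x112 -> third image at 224x224
            return self.up(x)
        raise ValueError("mode must be 'down' or 'up'")


layer = MultiScaleLayer()
print(layer(torch.randn(1, 3, 448, 448), "down").shape)  # torch.Size([1, 3, 224, 224])
print(layer(torch.randn(1, 3, 112, 112), "up").shape)    # torch.Size([1, 3, 224, 224])
```

In the patent's scheme these convolution weights are not fixed resamplers; they are learned jointly with the classifier, as described later.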
It should be noted that, in the embodiments of the present disclosure, the third image with the third preset resolution is generated from the first image through a convolution layer or a deconvolution layer included in the specified multi-scale layer so that the third image can be accurately identified by the specified classifier in the subsequent step; in other words, the third preset resolution is a resolution at which the specified classifier can accurately identify images.
In practical applications, if the resolution of the image to be identified differs greatly from the third preset resolution, directly scaling it to the third preset resolution may distort the image through excessive scaling and thus harm the accuracy of subsequent recognition. Therefore, in the embodiments of the present disclosure, the first image is first scaled to a second image whose resolution is closer to its own, which avoids such distortion, and the third image corresponding to the second image is then generated through a convolution layer or a deconvolution layer included in the specified multi-scale layer.
In step 203, when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution, the first image is scaled to a third image with a third preset resolution.
For example, suppose the first image has a resolution of 336 × 336 and the first, second, and third preset resolutions are 448 × 448, 112 × 112, and 224 × 224, respectively. Since 336 × 336 is less than 448 × 448 and greater than 112 × 112, the first image may be scaled directly to a third image with a resolution of 224 × 224.
It should be noted that, since the third preset resolution lies between the first preset resolution and the second preset resolution, when the resolution of the first image also lies between them, it differs only slightly from the third preset resolution. The first image can therefore be scaled directly into a third image with the third preset resolution, which avoids image distortion as well as unnecessary operations, saving resources.
In step 204, the third image is identified by the specified classifier.
It should be noted that the specified classifier may be set in advance and is used to identify an image and obtain the image recognition result. In addition, in the embodiments of the present disclosure, the third image is obtained from the first image, so the recognition result obtained by identifying the third image with the specified classifier is the image recognition result of the first image.
Furthermore, because the degree of distortion of the third image relative to the first image is small and the third preset resolution is a resolution at which the specified classifier can accurately perform image recognition, identifying the third image with the specified classifier accurately realizes identification of the first image, and the accuracy of image recognition is high.
The following describes an image recognition method provided by the embodiment of the present disclosure with reference to fig. 2B.
Referring to Fig. 2B, assume that the first preset resolution is 448 × 448, the second preset resolution is 112 × 112, and the third preset resolution is 224 × 224. When the resolution of the first image is greater than 448 × 448, the first image is scaled into a second image with a resolution of 448 × 448, and a third image with a resolution of 224 × 224 is then generated through the convolution layer included in the specified multi-scale layer. When the resolution of the first image is greater than or equal to 112 × 112 and less than or equal to 448 × 448, the first image is scaled directly into a third image with a resolution of 224 × 224. When the resolution of the first image is less than 112 × 112, the first image is scaled into a second image with a resolution of 112 × 112, and a third image with a resolution of 224 × 224 is then generated through the deconvolution layer included in the specified multi-scale layer. The third image is then input to the specified classifier, which outputs the image recognition result of the third image; this result is also the image recognition result of the first image.
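As a rough illustration of this routing (a hedged sketch under the 448/224/112 assumption, not the patent's code), the preprocessing and recognition step could be written as follows, reusing the MultiScaleLayer sketched above; the bilinear resize and the square-image assumption are illustrative choices.

```python
import torch.nn.functional as F

FIRST, SECOND, THIRD = 448, 112, 224  # preset side lengths from the example

def recognize(first_image, multi_scale_layer, classifier):
    """Route a (1, C, H, W) tensor to the 224x224 input the classifier expects."""
    side = first_image.shape[-1]  # assume a square image for simplicity
    if side > FIRST:
        # Shrink to the first preset resolution, then convolve down to 224x224.
        second = F.interpolate(first_image, size=(FIRST, FIRST),
                               mode="bilinear", align_corners=False)
        third = multi_scale_layer(second, "down")
    elif side < SECOND:
        # Enlarge to the second preset resolution, then deconvolve up to 224x224.
        second = F.interpolate(first_image, size=(SECOND, SECOND),
                               mode="bilinear", align_corners=False)
        third = multi_scale_layer(second, "up")
    else:
        # Already close to the third preset resolution: plain resize is enough.
        third = F.interpolate(first_image, size=(THIRD, THIRD),
                              mode="bilinear", align_corners=False)
    # The recognition result of the third image is the result of the first image.
    return classifier(third)
```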
Further, before the third image with the third preset resolution is generated from the first image through a convolution layer or a deconvolution layer included in the specified multi-scale layer, the specified multi-scale layer and the specified classifier may be generated. To do so, a plurality of preset image sets may be acquired, and the multi-scale layer to be trained and the classification model are trained using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier.
It should be noted that the plurality of preset image sets may be prepared in advance, and all the preset images included in each preset image set belong to the same category; that is, every preset image carries a category identifier, and all the preset images within one preset image set share the same category identifier.
In addition, in order to give the trained specified multi-scale layer and specified classifier better robustness, the preset images included in the plurality of preset image sets may be obtained in advance through operations such as cropping and flipping, which is not limited by the embodiments of the present disclosure.
The implementation process of training the multi-scale layer to be trained and the classification model using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier may be as follows. A preset image is selected from the images included in the plurality of preset image sets, and the following processing is performed on the selected image until every preset image in the plurality of preset image sets has been processed: the preset image is input to the multi-scale layer to be trained; the image output by the multi-scale layer is input to the classification model to be trained, which outputs the image category; the loss value of the preset image is calculated through a preset loss function; the loss value is propagated back to each layer of the classification model and to the multi-scale layer; the loss value is substituted into the partial-derivative function of each parameter in each layer of the classification model and in the multi-scale layer to determine the partial derivative of each parameter, that is, its specific error value; and each parameter is updated based on its partial derivative, completing one adjustment of all parameters. Preset images are input continuously and this processing is repeated, so that all parameters of the multi-scale layer and the classification model are learned continuously; after multiple updates the parameters reach the target parameters, training is completed, and the specified multi-scale layer and the specified classifier are obtained.
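A minimal sketch of this joint training loop, assuming each training sample provides an image already resized to 448 × 448 or 112 × 112 together with its category label and the corresponding "down" or "up" branch, might look like the following; the optimizer, loss function, and data format are illustrative stand-ins for the unspecified preset loss function and preset image sets.

```python
import torch
import torch.nn as nn

def train_jointly(multi_scale_layer, classification_model, preset_image_sets,
                  epochs: int = 10, lr: float = 1e-3):
    """Jointly update the multi-scale layer and the classifier by backpropagation."""
    # One optimizer over both modules, so each loss value adjusts all parameters at once.
    params = list(multi_scale_layer.parameters()) + list(classification_model.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # stand-in for the preset loss function

    for _ in range(epochs):
        for image, label, mode in preset_image_sets:  # mode: "down" or "up" per image
            optimizer.zero_grad()
            third = multi_scale_layer(image, mode)   # image output by the multi-scale layer
            logits = classification_model(third)     # class scores for the preset image
            loss = loss_fn(logits, label)            # loss value of the preset image
            loss.backward()                          # propagate the loss back through both modules
            optimizer.step()                         # one adjustment of every parameter
    return multi_scale_layer, classification_model   # the specified multi-scale layer and classifier
```

Training both modules with a single optimizer is what lets the loss value adjust the multi-scale layer and the classification model at the same time, matching the simultaneous training described above.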
It should be noted that the preset loss function may be preset according to different requirements, which is not limited in the embodiment of the present disclosure.
In addition, in the embodiment of the disclosure, the multi-scale layer to be trained and the classification model are trained simultaneously to obtain the designated multi-scale layer and the designated classifier, so that the accuracy of the designated classifier in identifying the image output by the designated multi-scale layer is higher.
Of course, in practical applications, when the plurality of preset image sets are used to train the multi-scale layer to be trained and the classification model to obtain the specified multi-scale layer and the specified classifier, training modes other than the one described above may be used, for example, end-to-end (end2end) training, bottom-up unsupervised learning, top-down supervised learning, and the like, which is not limited by the present disclosure.
In the embodiments of the present disclosure, the resolution of a first image to be identified may be acquired. Then, when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, that is, when it differs greatly from a third preset resolution, a third image with the third preset resolution may be generated from the first image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, and finally the third image is identified by a specified classifier. Because the third image is generated by convolution or deconvolution, the features of its pixels are accurate, that is, the degree of distortion of the third image relative to the first image is small. Since the resolution of the third image is a resolution at which the specified classifier can accurately identify images, the specified classifier can identify the third image accurately, so that identification of the first image is accurately realized and the accuracy of image recognition is high.
Fig. 3A is a block diagram illustrating an image recognition apparatus according to an exemplary embodiment. Referring to fig. 3A, the apparatus includes a first obtaining module 301, a generating module 302, and an identifying module 303.
A first obtaining module 301, configured to obtain a resolution of a first image to be identified.
A generating module 302, configured to generate, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in a specified multi-scale layer when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, where the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1.
An identifying module 303, configured to identify the third image through a specified classifier.
Optionally, referring to fig. 3B, the generating module 302 includes:
a scaling sub-module 3021, configured to scale the first image into a second image having a first preset resolution or a second preset resolution according to a resolution of the first image.
A generating sub-module 3022, configured to generate a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer included in the specified multi-scale layer.
Optionally, the scaling submodule 3021 is further configured to:
when the resolution of the first image is greater than a first preset resolution, scaling the first image into a second image with the first preset resolution;
and when the resolution of the first image is less than a second preset resolution, scaling the first image into a second image with the second preset resolution.
Optionally, the generating submodule 3022 is further configured to:
when the resolution of the second image is the first preset resolution, generating a third image with the third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating a third image with the third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
Optionally, referring to fig. 3C, the apparatus further comprises:
the scaling module 304 is configured to scale the first image into a third image with a third preset resolution when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution.
Optionally, referring to fig. 3D, the apparatus further comprises:
a second obtaining module 305, configured to obtain a plurality of preset image sets, where all preset images included in each of the preset image sets belong to the same category.
The training module 306 is configured to train the multi-scale layer to be trained and the classification model using the plurality of preset image sets, so as to obtain a designated multi-scale layer and a designated classifier.
In the embodiments of the present disclosure, the resolution of a first image to be identified may be acquired. Then, when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, that is, when it differs greatly from a third preset resolution, a third image with the third preset resolution may be generated from the first image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, and finally the third image is identified by a specified classifier. Because the third image is generated by convolution or deconvolution, the features of its pixels are accurate, that is, the degree of distortion of the third image relative to the first image is small. Since the resolution of the third image is a resolution at which the specified classifier can accurately identify images, the specified classifier can identify the third image accurately, so that identification of the first image is accurately realized and the accuracy of image recognition is high.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an image recognition apparatus 400 according to an exemplary embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.
The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the apparatus 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply components 406 provide power to the various components of device 400. The power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power supplies for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing various aspects of status assessment for the apparatus 400. For example, the sensor component 414 may detect the open/closed state of the apparatus 400 and the relative positioning of components such as the display and keypad of the apparatus 400; it may also detect a change in the position of the apparatus 400 or of a component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the methods provided by the embodiments illustrated in fig. 1 or fig. 2A and described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform an image recognition method, the method comprising:
acquiring the resolution of a first image to be identified;
when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, wherein the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1;
and identifying the third image through a specified classifier.
Optionally, generating, from the first image, a third image having a third preset resolution through a convolution layer or a deconvolution layer included in the specified multi-scale layer includes:
scaling the first image into a second image with the first preset resolution or the second preset resolution according to the resolution of the first image;
and generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in the specified multi-scale layer.
Optionally, scaling the first image into a second image having a first preset resolution or a second preset resolution according to the resolution of the first image, including:
when the resolution of the first image is greater than a first preset resolution, scaling the first image into a second image with the first preset resolution;
and when the resolution of the first image is less than a second preset resolution, scaling the first image into a second image with the second preset resolution.
Optionally, generating a third image with the third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer included in the specified multi-scale layer includes:
when the resolution of the second image is the first preset resolution, generating a third image with the third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating a third image with the third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
Optionally, after acquiring the resolution of the first image to be identified, the method further includes:
and when the resolution of the first image is less than or equal to a first preset resolution and greater than or equal to a second preset resolution, scaling the first image into a third image with a third preset resolution.
Optionally, before generating, from the first image, a third image with a third preset resolution through a convolution layer or a deconvolution layer included in the specified multi-scale layer, the method further includes:
acquiring a plurality of preset image sets, wherein all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and training the multi-scale layer to be trained and the classification model by using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier.
In the embodiments of the present disclosure, the resolution of a first image to be identified may be acquired. Then, when the resolution of the first image is greater than a first preset resolution or less than a second preset resolution, that is, when it differs greatly from a third preset resolution, a third image with the third preset resolution may be generated from the first image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, and finally the third image is identified by a specified classifier. Because the third image is generated by convolution or deconvolution, the features of its pixels are accurate, that is, the degree of distortion of the third image relative to the first image is small. Since the resolution of the third image is a resolution at which the specified classifier can accurately identify images, the specified classifier can identify the third image accurately, so that identification of the first image is accurately realized and the accuracy of image recognition is high.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (8)

1. An image recognition method, characterized in that the method comprises:
acquiring the resolution of a first image to be identified;
when the resolution of the first image is greater than a first preset resolution, scaling the first image into a second image with the first preset resolution; when the resolution of the first image is less than a second preset resolution, scaling the first image into a second image with the second preset resolution; and generating a third image with a third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, wherein the third preset resolution is a resolution at which a specified classifier can accurately identify an image, the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1;
when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution, scaling the first image into a third image with the third preset resolution;
and identifying the third image through the specified classifier, wherein the specified classifier and the specified multi-scale layer are obtained by simultaneously training a multi-scale layer to be trained and a classification model.
2. The method according to claim 1, wherein the generating a third image with a third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer comprises:
when the resolution of the second image is the first preset resolution, generating the third image with the third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generating the third image with the third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
3. The method according to any one of claims 1-2, further comprising:
acquiring a plurality of preset image sets, wherein all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and training a multi-scale layer to be trained and a classification model by using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier.
4. An image recognition apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire the resolution of a first image to be identified;
a generating module, configured to: when the resolution of the first image is greater than a first preset resolution, scale the first image into a second image with the first preset resolution; when the resolution of the first image is less than a second preset resolution, scale the first image into a second image with the second preset resolution; and generate a third image with a third preset resolution corresponding to the second image through a convolution layer or a deconvolution layer contained in a specified multi-scale layer, wherein the third preset resolution is a resolution at which a specified classifier can accurately identify an image, the first preset resolution is N times the third preset resolution, the third preset resolution is N times the second preset resolution, and N is greater than 1;
a scaling module, configured to scale the first image into a third image with the third preset resolution when the resolution of the first image is less than or equal to the first preset resolution and greater than or equal to the second preset resolution;
and an identification module, configured to identify the third image through the specified classifier, wherein the specified classifier and the specified multi-scale layer are obtained by simultaneously training a multi-scale layer to be trained and a classification model.
5. The apparatus according to claim 4, wherein the generating module comprises a generation submodule;
the generation submodule is configured to:
when the resolution of the second image is the first preset resolution, generate the third image with the third preset resolution corresponding to the second image through the convolution layer contained in the specified multi-scale layer;
and when the resolution of the second image is the second preset resolution, generate the third image with the third preset resolution corresponding to the second image through the deconvolution layer contained in the specified multi-scale layer.
6. The apparatus according to any one of claims 4-5, further comprising:
a second acquisition module, configured to acquire a plurality of preset image sets, wherein all preset images included in each preset image set in the plurality of preset image sets belong to the same category;
and a training module, configured to train a multi-scale layer to be trained and a classification model by using the plurality of preset image sets to obtain the specified multi-scale layer and the specified classifier.
7. An image recognition apparatus, characterized in that the apparatus comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method according to any one of claims 1-3.
8. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-3.
CN201711318139.0A 2017-12-12 2017-12-12 Image recognition method, image recognition device and computer-readable storage medium Active CN107992894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711318139.0A CN107992894B (en) 2017-12-12 2017-12-12 Image recognition method, image recognition device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711318139.0A CN107992894B (en) 2017-12-12 2017-12-12 Image recognition method, image recognition device and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN107992894A CN107992894A (en) 2018-05-04
CN107992894B true CN107992894B (en) 2022-02-08

Family

ID=62037195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711318139.0A Active CN107992894B (en) 2017-12-12 2017-12-12 Image recognition method, image recognition device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107992894B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429458B (en) * 2020-03-20 2021-06-08 北京创世云科技股份有限公司 Image restoration method and device and electronic equipment
CN112149756A (en) * 2020-10-14 2020-12-29 深圳前海微众银行股份有限公司 Model training method, image recognition method, device, equipment and storage medium
CN112965604A (en) * 2021-03-29 2021-06-15 深圳市优必选科技股份有限公司 Gesture recognition method and device, terminal equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407903A (en) * 2016-08-31 2017-02-15 四川瞳知科技有限公司 Multiple dimensioned convolution neural network-based real time human body abnormal behavior identification method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852388A (en) * 2006-05-12 2006-10-25 北京中星微电子有限公司 Image minifying-amplifying device
CN101325040A (en) * 2008-07-16 2008-12-17 宇龙计算机通信科技(深圳)有限公司 Mobile terminal capable of adjusting resolution and method for adjusting resolution of the mobile terminal
CN103745234A (en) * 2014-01-23 2014-04-23 东北大学 Band steel surface defect feature extraction and classification method
CN104361559A (en) * 2014-11-12 2015-02-18 惠州Tcl移动通信有限公司 Image magnifying method and system based on small image memory
CN105825243A (en) * 2015-01-07 2016-08-03 阿里巴巴集团控股有限公司 Method and device for certificate image detection
CN106250838A (en) * 2016-07-27 2016-12-21 乐视控股(北京)有限公司 Vehicle identification method and system
CN106530227A (en) * 2016-10-27 2017-03-22 北京小米移动软件有限公司 Image restoration method and device
CN106778773A (en) * 2016-11-23 2017-05-31 北京小米移动软件有限公司 The localization method and device of object in picture
CN106919920A (en) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolution feature and spatial vision bag of words
CN107341517A (en) * 2017-07-07 2017-11-10 哈尔滨工业大学 The multiple dimensioned wisp detection method of Fusion Features between a kind of level based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Key Technologies of Text Image Layout Analysis; Wei Chuanyi; China Master's Theses Full-text Database, Information Science and Technology; 2017-04-15 (No. 04); p. 22 *

Also Published As

Publication number Publication date
CN107992894A (en) 2018-05-04

Similar Documents

Publication Publication Date Title
CN106651955B (en) Method and device for positioning target object in picture
KR101694643B1 (en) Method, apparatus, device, program, and recording medium for image segmentation
CN107944447B (en) Image classification method and device
US9959484B2 (en) Method and apparatus for generating image filter
CN106778773B (en) Method and device for positioning target object in picture
CN108062547B (en) Character detection method and device
CN107464253B (en) Eyebrow positioning method and device
CN111461182B (en) Image processing method, image processing apparatus, and storage medium
CN106557759B (en) Signpost information acquisition method and device
CN107563994B (en) Image significance detection method and device
CN106446946B (en) Image recognition method and device
CN107967459B (en) Convolution processing method, convolution processing device and storage medium
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
CN108009563B (en) Image processing method and device and terminal
CN107992894B (en) Image recognition method, image recognition device and computer-readable storage medium
CN108154093B (en) Face information identification method and device, electronic equipment and machine-readable storage medium
CN110619325A (en) Text recognition method and device
CN109784327B (en) Boundary box determining method and device, electronic equipment and storage medium
US9665925B2 (en) Method and terminal device for retargeting images
CN107239758B (en) Method and device for positioning key points of human face
CN112200040A (en) Occlusion image detection method, device and medium
CN106469446B (en) Depth image segmentation method and segmentation device
CN107885464B (en) Data storage method, device and computer readable storage medium
CN114666490B (en) Focusing method, focusing device, electronic equipment and storage medium
CN110659726B (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant