CN108764051B - Image processing method and device and mobile terminal


Info

Publication number
CN108764051B
Authority
CN
China
Prior art keywords
scene
image
classification
neural network
convolutional neural
Prior art date
Legal status
Active
Application number
CN201810399087.2A
Other languages
Chinese (zh)
Other versions
CN108764051A (en)
Inventor
张弓
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201810399087.2A
Publication of CN108764051A
Application granted
Publication of CN108764051B
Current legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Abstract

The application discloses an image processing method, an image processing apparatus and a mobile terminal. The method comprises: identifying primary scenes in an image to be identified through a deep convolutional neural network to obtain a first classification result; identifying secondary scenes in the first classification result through a shallow convolutional neural network to obtain a second classification result; and outputting scene fine classification information of the image to be identified based on the second classification result. By classifying the coarse scene categories and the fine scene categories in sequence with cascaded convolutional neural networks, the method avoids the high computation cost of finely classifying fine scene categories with a single large-scale network, strikes a good balance between computation and accuracy, and makes deployment on a mobile terminal feasible.

Description

Image processing method and device and mobile terminal
Technical Field
The present application relates to the field of mobile terminal technologies, and in particular, to an image processing method and apparatus, and a mobile terminal.
Background
Existing scene classification distinguishes coarse scene categories from fine ones. Coarse categories cover loosely related scenes such as sky, grass and food. Users, however, often care more about the fine categories within them: the sky may contain clouds, the sun or haze; grass may be fresh green or mixed yellow-green; food may be fruit, vegetables or meat. Being able to identify the different fine-grained scenes allows more refined post-processing and improves how photos are displayed.
At present, however, fine classification of complex scenes by deep learning mainly relies on a single large-scale network, which is computationally expensive and slow, and places great pressure on deployment on mobile terminals.
Disclosure of Invention
In view of the above problems, the present application provides an image processing method, an image processing apparatus and a mobile terminal.
In a first aspect, an embodiment of the present application provides an image processing method, including: in an image to be identified, identifying primary scenes through a deep convolutional neural network to obtain a first classification result, wherein each type of primary scene comprises at least one type of secondary scene; in the first classification result, identifying a secondary scene through a shallow convolutional neural network to obtain a second classification result, wherein the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network; and outputting scene fine classification information of the image to be recognized based on the second classification result.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: the first-level classification module is used for identifying first-level scenes in the image to be identified through a deep convolutional neural network to obtain a first classification result, wherein each first-level scene comprises at least one second-level scene; the second-stage classification module is used for identifying a second-stage scene through a shallow convolutional neural network in the first classification result to obtain a second classification result, wherein the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network; and the output module is used for outputting the scene fine classification information of the image to be recognized based on the second classification result.
In a third aspect, an embodiment of the present application provides a mobile terminal, which includes a display, a memory, and a processor, where the display and the memory are coupled to the processor, and the memory stores instructions, and when the instructions are executed by the processor, the processor performs the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having program code executable by a processor, where the program code causes the processor to execute the method of the first aspect.
Compared with the prior art, the image processing method, image processing apparatus and mobile terminal provided by the embodiments of the application classify the primary scenes in the image to be recognized through a deep convolutional neural network, then classify the secondary scenes within each class of primary scene through a shallow convolutional neural network, and finally output the scene fine classification information of the image to be recognized. By classifying the coarse scene categories and the fine scene categories in sequence with cascaded convolutional neural networks, the embodiments avoid the high computation cost of finely classifying fine scene categories with a single large-scale network, strike a good balance between computation and accuracy, and make deployment on a mobile terminal feasible.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application, and that other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart illustrating an image processing method according to a first embodiment of the present application;
FIG. 2 is a flow chart illustrating an image processing method according to a second embodiment of the present application;
fig. 3 shows a block diagram of an image processing apparatus according to a third embodiment of the present application;
fig. 4 shows a block diagram of an image processing apparatus according to a fourth embodiment of the present application;
fig. 5 is a block diagram illustrating a structure of a mobile terminal according to an embodiment of the present application;
fig. 6 shows a block diagram of a mobile terminal for performing an image processing method according to an embodiment of the present application;
fig. 7 is a schematic diagram of a conventional process for identifying and classifying image scenes by using a single convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the continuous development of machine learning and deep learning, the method for identifying image scenes by adopting a machine learning model is widely applied to the fine classification of image scenes.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a conventional process of identifying and classifying an image scene by using a single Convolutional Neural Network (CNN). In fig. 7, a test image is input; about 2000 candidate regions (region proposals) are extracted from the image bottom-up with a selective search algorithm; each candidate region is warped to 227 × 227 and fed into the CNN; the output of the CNN's last fully connected layer is used as the feature; and finally the CNN feature extracted from each candidate region is fed into an SVM (Support Vector Machine) for classification.
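For concreteness, the following is a minimal sketch of this kind of single-network pipeline (not the method claimed in this application), assuming PyTorch/torchvision and scikit-learn are available. The region proposals are passed in as plain bounding boxes rather than generated by selective search, and the AlexNet-style feature extractor merely stands in for whatever CNN the pipeline of fig. 7 uses.

```python
# Conventional single-network pipeline (R-CNN style), simplified:
# crop each candidate region, warp it to 227x227, extract CNN features,
# then classify the features with an SVM.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from sklearn.svm import LinearSVC

cnn = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
cnn.classifier = cnn.classifier[:-1]   # drop the final FC layer; its 4096-d input is the feature
cnn.eval()

def region_features(image, boxes):
    """image: 3xHxW float tensor in [0, 1]; boxes: iterable of (x, y, w, h)."""
    feats = []
    with torch.no_grad():
        for x, y, w, h in boxes:
            crop = image[:, y:y + h, x:x + w]
            crop = TF.resize(crop, [227, 227])          # warp to the fixed CNN input size
            feats.append(cnn(crop.unsqueeze(0)).squeeze(0))
    return torch.stack(feats).numpy()

# Training the SVM on annotated regions (labels are region-level scene labels):
# svm = LinearSVC().fit(region_features(train_image, train_boxes), train_labels)
```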
However, after studying such scene classification systems, the inventors found that because a single large-scale network is used to finely classify every subclass of a complex scene, the computation is heavy and slow (for example, the VGG16 model needs 47 seconds to process one image), computing resources are used inefficiently, and deployment on a mobile terminal comes under great pressure. In the course of this research, the inventors investigated why existing scene classification models are so computationally expensive, how the structure of the classification model can be optimized to reduce computation and improve scene discrimination efficiency, and what a feasible mobile deployment scheme looks like, and propose the image processing method, image processing apparatus and mobile terminal of the embodiments of the present application.
The image processing method, device, mobile terminal and storage medium provided by the embodiments of the present application will be described in detail by specific embodiments.
First embodiment
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an image processing method according to a first embodiment of the present application. The image processing method classifies the primary scenes in the image to be recognized through a deep convolutional neural network, then classifies the secondary scenes within each class of primary scene through a shallow convolutional neural network, and finally outputs the scene fine classification information of the image to be recognized. It thereby avoids the high computation cost of finely classifying fine scene categories with a single large-scale network, strikes a good balance between computation and accuracy, and makes deployment on a mobile terminal feasible. In a specific embodiment, the image processing method is applied to the image processing apparatus 300 shown in fig. 3 and to the mobile terminal 100 (fig. 5) equipped with the image processing apparatus 300, and is used to improve the efficiency of fine scene classification when the mobile terminal 100 captures images. The flow shown in fig. 1 is described in detail below, taking a mobile phone as an example. The image processing method may specifically include the following steps:
step S101: in the image to be identified, a first-level scene is identified through a deep convolutional neural network, and a first classification result is obtained.
In the embodiment of the application, the image to be identified may be an image displayed in an image acquisition mode while shooting with a mobile phone camera, an image stored in a local album after shooting, an image acquired from the cloud, and the like; it may be a two-dimensional plane image or a three-dimensional stereo image. A primary scene refers to a scene at a higher level in the scene classification; a secondary scene refers to a scene at a level below the primary scene. For example, an image captured by a mobile phone camera may contain sky, grass and a house, and the sky may be a midday sky, an evening sky or a late-night sky. In this example the sky, grass and house, which have no obvious relationship to one another, can be regarded as primary scenes, while the midday sky, evening sky and late-night sky, which are subordinate to the sky, can be regarded as secondary scenes.
It can be understood that each image to be recognized contains at least one class of primary scenes; each class of primary scene includes at least one class of secondary scene. It can be understood that, besides the primary scene and the secondary scene subordinate to the primary scene, there may be a tertiary scene subordinate to the secondary scene, a quaternary scene subordinate to the tertiary scene, and the like.
In this embodiment, the first classification result contains the primary scenes classified in the image to be recognized after recognition by the deep convolutional neural network. In one implementation, each class of primary scene identified and classified by the deep convolutional neural network can be treated as an independent image region for the subsequent secondary-scene classification.
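As a rough illustration of step S101 (a sketch under assumptions, not the patent's exact model), the snippet below treats the deep convolutional neural network as a VGG-style classifier whose head has been re-trained for a small set of primary-scene classes; the class names and the whole-image (rather than region-level) classification are illustrative simplifications.

```python
import torch
import torch.nn as nn
import torchvision.models as models

PRIMARY_CLASSES = ["sky", "grass", "house"]        # assumed coarse (primary-scene) classes

deep_cnn = models.vgg19(weights=None)              # deep network: 19 convolutional/FC layers
deep_cnn.classifier[-1] = nn.Linear(4096, len(PRIMARY_CLASSES))   # coarse-class head
deep_cnn.eval()                                    # weights would come from training on primary-scene data

def classify_primary(image_batch: torch.Tensor):
    """image_batch: Nx3x224x224 tensor; returns one primary-scene label per image."""
    with torch.no_grad():
        logits = deep_cnn(image_batch)
    return [PRIMARY_CLASSES[i] for i in logits.argmax(dim=1).tolist()]
```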
Step S102: and in the first classification result, identifying a secondary scene through a shallow convolutional neural network, and acquiring a second classification result.
In this embodiment, "deep" and "shallow" convolutional neural networks are relative terms, the deep convolutional neural network having more layers than the shallow one. The number of layers of a convolutional neural network may be taken as the number of convolutional layers and fully connected layers it contains.
In one implementation, the deep convolutional neural network may be VGG and the shallow convolutional neural network may be AlexNet, where VGG has 19 convolutional/fully connected layers and AlexNet has 8. In other possible implementations, other convolutional neural networks such as GoogLeNet (22 layers) or ResNet (152 to 1000 layers) may also be chosen to implement the image processing method of this embodiment.
It will be appreciated that "deep" and "shallow" are relative: VGG may serve as the deep convolutional neural network in one embodiment, and as the shallow convolutional neural network in another, for example when a ResNet with more layers is used as the deep convolutional neural network.
In this embodiment, the second classification result contains the secondary scenes classified from the first classification result after further recognition by the shallow convolutional neural network. In one implementation, after each class of primary scene is identified and classified by the shallow convolutional neural network, each class of secondary scene can also be treated as an independent image region.
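Continuing the illustration, the second stage can be sketched as one small (AlexNet-sized) network per primary class, cascaded after the deep network of step S101; the secondary class lists below are assumptions made up for the example.

```python
import torch
import torch.nn as nn
import torchvision.models as models

SECONDARY_CLASSES = {
    "sky":   ["midday sky", "evening sky", "night sky"],
    "grass": ["green grass", "yellow-green grass"],
    "house": ["brick house", "wooden house"],
}

def make_shallow_cnn(num_classes: int) -> nn.Module:
    net = models.alexnet(weights=None)             # shallow network: 8 convolutional/FC layers
    net.classifier[-1] = nn.Linear(4096, num_classes)
    return net.eval()

# One shallow network per primary class, forming the cascade's second stage.
shallow_cnns = {name: make_shallow_cnn(len(classes))
                for name, classes in SECONDARY_CLASSES.items()}

def classify_secondary(primary_label: str, region: torch.Tensor) -> str:
    """region: 1x3x224x224 crop of the primary-scene area produced by the first stage."""
    with torch.no_grad():
        logits = shallow_cnns[primary_label](region)
    return SECONDARY_CLASSES[primary_label][logits.argmax(dim=1).item()]
```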
Step S103: and outputting scene fine classification information of the image to be recognized based on the second classification result.
In this embodiment, the second classification result obtained in the previous step contains at least one class of secondary scene classified in the image to be recognized. To facilitate subsequent operations such as image processing, parameters such as the position and type of each class of secondary scene in the second classification result may be integrated into scene fine classification information that can be recognized by a subsequent processing module.
An existing single large-scale network, trained on large data sets covering many scene types, achieves very high recognition and classification accuracy, but the computation it requires is also very large, and current mobile-terminal hardware struggles to provide it, so the scene classification accuracy achievable on mobile terminals has long been hard to improve. The image processing method provided by this embodiment performs scene classification with cascaded convolutional neural networks, so that the classification accuracy remains up to standard while the computation required is greatly reduced, which in turn lowers the hardware resources needed for fine scene classification and improves classification efficiency.
According to the image processing method provided by the first embodiment of the application, the deep convolutional neural network classifies the primary scenes in the image to be recognized, the shallow convolutional neural network then classifies the secondary scenes within each class of primary scene, and finally the scene fine classification information of the image to be recognized is output. This avoids the high computation cost of finely classifying fine scene categories with a single large-scale network, strikes a good balance between computation and accuracy, and makes deployment on a mobile terminal feasible.
Second embodiment
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an image processing method according to a second embodiment of the present application. The following will describe the flow shown in fig. 2 in detail by taking a mobile phone as an example. The above-mentioned image processing method may specifically include the steps of:
step S201: and acquiring an image to be identified in an image acquisition mode.
In this embodiment, the image to be recognized may be an image acquired by components such as a mobile phone camera in an image acquisition mode when the mobile phone camera is used for shooting. In order to perform more fine post-processing on the image to be recognized, the steps of the image processing method provided by the embodiment may be performed before the image processing is performed on the image to be recognized.
It can be understood that, in other embodiments, the image to be identified may also be an image acquired through a local memory of a mobile phone, a cloud server, a browser webpage end, or the like, and the image acquisition mode may also be an image acquisition mode of a local album, an image acquisition mode when the mobile terminal acquires image data from the cloud server, an image acquisition mode when a browser webpage loads a picture, or the like.
Step S202: and classifying the primary scene in the image to be identified through a deep convolutional neural network.
In this embodiment, the data of the image to be recognized may be input into a trained deep convolutional neural network, and the deep convolutional neural network is used to recognize and classify the first-level scene in the image to be recognized.
Step S203: and acquiring a first classification result, wherein the first classification result comprises at least one class of the primary scene in the image to be recognized.
In this embodiment, the recognition and classification performed by the deep convolutional neural network outputs classified image data for at least one class of primary scene; the image to be identified contains at least one such class. It can be understood that if the image to be recognized contains only one class of primary scene, the first classification result obtained after recognition and classification by the deep convolutional neural network likewise contains at most that one class of primary scene.
In this embodiment, step S204 may be directly performed after step S203, or step S208 may be performed first.
Step S204: and classifying the secondary scenes in each class of primary scenes in the first classification result through a shallow convolutional neural network.
In this embodiment, because the deep convolutional neural network and the shallow convolutional neural network are cascaded, the primary-scene image data identified and classified by the deep convolutional neural network can be automatically matched to the input layer of the shallow convolutional neural network corresponding to that primary scene, which then classifies the secondary scenes within it.
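The automatic matching described above can be sketched as a simple lookup: each primary-scene region produced by the first stage is routed to the shallow network registered for that primary class (this reuses classify_secondary from the earlier sketch, and the region handling is deliberately simplified).

```python
def cascade_classify(first_stage_result):
    """first_stage_result: list of (primary_label, region_tensor) pairs from step S203."""
    second_stage_result = []
    for primary_label, region in first_stage_result:
        # The cascade relationship: the matched shallow network refines the primary scene.
        secondary_label = classify_secondary(primary_label, region)
        second_stage_result.append((primary_label, secondary_label, region))
    return second_stage_result
```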
Step S205: and acquiring a second classification result, wherein the second classification result comprises at least one class of the secondary scene in the image to be identified.
In this embodiment, the recognition and classification performed by the shallow convolutional neural network outputs classified image data for at least one class of secondary scene; each class of primary scene contained in the first classification result contains at least one class of secondary scene. It can be understood that if the image to be recognized contains only one class of primary scene, and that primary scene contains only one class of secondary scene, the second classification result obtained after recognition and classification by the shallow convolutional neural network likewise contains at most one class of secondary scene.
There may be special cases in which a primary scene in the image to be recognized cannot be recognized or is recognized incorrectly. For example, suppose the trained deep convolutional neural network can only recognize sky, grass and houses, and the image to be identified contains only ocean. In the first possible case, the first classification result output by the deep convolutional neural network contains no primary scene at all, because a deep convolutional neural network never trained on an ocean data set cannot recognize the ocean. In the second possible case, because the ocean and the sky are highly similar in some dimensions of the image data, the ocean in the image to be identified is classified as sky in the first classification result, which is a classification error. Likewise, the shallow convolutional neural network may encounter the same problems when identifying and classifying secondary scenes.
For the first case, in which a new scene cannot be identified, a new scene type can later be defined and a data set of that type fed in to train the convolutional neural network so that it can recognize and classify the new scene. For the second case, scene recognition error, the parameters of the convolutional neural network model need to be adjusted to optimize its image recognition accuracy and reduce misrecognition.
Step S206: and setting a classification mark for the secondary scene in the second classification result.
In this embodiment, the second classification result may contain the image data of each classified class of secondary scene. To facilitate subsequent operations such as image processing, a classification mark may be set for each classified class of secondary scene, establishing a matching relationship between each class of secondary scene and its corresponding subsequent processing module. The classification mark may be an identifier added after the secondary scene is identified, or may be a feature of the secondary-scene image itself.
Step S207: and outputting scene fine classification information of the image to be recognized containing the classification mark.
The scene fine classification information may include the image data of all secondary scenes, tagged with their classification marks, obtained after fine classification of the image to be identified. In one implementation, the scene fine classification information may be fed directly to the image processing module so that the secondary scenes can be processed by class.
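A minimal sketch of steps S206-S207, under the assumption that the fine classification information is a list of records carrying each secondary scene's classification mark and position (the field names are illustrative, not specified by the application; it builds on the earlier cascade_classify sketch):

```python
def build_fine_classification_info(second_stage_result, boxes):
    """second_stage_result: output of cascade_classify(); boxes: one (x, y, w, h) per region."""
    info = []
    for (primary, secondary, _region), box in zip(second_stage_result, boxes):
        info.append({
            "primary_scene": primary,
            "secondary_scene": secondary,   # classification mark matched by the processing module
            "position": box,                # where the secondary scene sits in the image
        })
    return info
```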
In this embodiment, step S208 may be further performed after step S203.
Step S208: and judging whether a simple classification instruction is received.
If the simple classification instruction is received, execute step S209; if the simple classification instruction is not received, execute step S204.
Step S209: and outputting the scene simple classification information of the image to be recognized based on the first classification result.
In this embodiment, the step S209 is performed to output the scene simple classification information of the image to be recognized, where the scene simple classification information only includes classification information of a classified primary scene, and the specific implementation steps may refer to the step S206 to the step S207.
In this embodiment, a judgment step S208 may be added after step S203 to determine whether a simple classification instruction has been received. If it has, step S209 is executed to output the scene simple classification information of the image to be recognized; if it has not, steps S204 to S207 are executed and the scene fine classification information of the image to be recognized is output. This gives the user a choice of classification granularity: the user can decide, as needed, whether the image to be identified receives simple or fine processing. It can be understood that the image processing method of this embodiment may also output both the scene simple classification information and the scene fine classification information for the same image to be recognized.
It should be noted that, in the scene fine classification information and the scene simple classification information above, "fine" and "simple" are relative concepts expressing the relationship between the secondary-scene classification and the primary-scene classification: the secondary classification is finer than the primary classification. It does not mean that a secondary scene contains more elements or categories than a primary scene, nor that one scene's subordinate classification is finer than another scene's within the same level. For example, suppose the primary scenes in an image to be recognized are a cat, sky and grass; the secondary classification of the cat includes its ears, eyes, nose and so on, while the secondary classification of the sky in this image only includes blue sky. From that result the sky has only one secondary scene and the cat at least three, but the cat classification is not therefore "finer" than the sky classification: the shallow convolutional neural network can also recognize secondary sky scenes such as red sky and black sky, and even if the number of secondary sky types it can recognize is smaller than the number of secondary cat types, scenes at the same level are not comparable in fineness. "Fine" and "simple" in this embodiment only distinguish a lower-level scene classification from an upper-level one.
In the present embodiment, step S210 and step S211 may be performed after step S203.
Step S210: and judging whether an unclassified primary scene exists in the image to be identified.
If an unclassified primary scene exists in the image to be recognized, execute step S211; if no unclassified primary scene exists in the image to be identified, end the process.
Step S211: inputting a data set containing at least one class of the unclassified primary scenes, and training the deep convolutional neural network. In the present embodiment, step S212 may be performed after step S207.
Step S212: and judging whether an unclassified secondary scene exists in the image to be identified.
If an unclassified secondary scene exists in the image to be recognized, execute step S213; if no unclassified secondary scene exists in the image to be identified, end the process.
Step S213: inputting a data set containing at least one type of the unclassified secondary scenes, and training the shallow convolutional neural network.
In this embodiment, steps S210, S211, S212 and S213 can be used to solve the problem that the convolutional neural networks cannot identify new scenes. By feeding a new primary-scene data set to the deep convolutional neural network for training, and a new secondary-scene data set to the shallow convolutional neural network for training, the cascaded convolutional neural networks of this embodiment can be optimized and the applicable range of the image processing method expanded.
In one implementation, a convolutional neural network for recognizing a new scene can be trained by transfer learning, using a network already trained to recognize the old scenes as the pre-trained model, which improves the training efficiency of the new model.
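A minimal transfer-learning sketch of that idea (assuming PyTorch; the data loader, class count and hyperparameters are placeholders): start from a network already trained on the old scenes, freeze its convolutional backbone, and train only a new classification head on the data set that contains the new scene class.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def finetune_for_new_scenes(num_classes: int, loader, epochs: int = 5) -> nn.Module:
    net = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)   # pre-trained "old scene" model
    for p in net.features.parameters():
        p.requires_grad = False                                    # keep the learned features
    net.classifier[-1] = nn.Linear(4096, num_classes)              # new head for the new class set
    optimizer = torch.optim.Adam(net.classifier[-1].parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:                              # data set containing the new scene
            optimizer.zero_grad()
            loss = loss_fn(net(images), labels)
            loss.backward()
            optimizer.step()
    return net
```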
In this embodiment, step S214 may be further performed after step S209.
Step S214: and classifying each primary scene image in the images to be recognized based on the simple scene classification information.
In this embodiment, step S215 may be further performed after step S207.
Step S215: and classifying each secondary scene image in the image to be identified based on the scene fine classification information.
In this embodiment, the features of each primary scene image in the image to be recognized may be extracted according to the scene simple classification information, and each different type of primary scene image routed to the image processing system corresponding to its features; similarly, the features of each secondary scene image may be extracted according to the scene fine classification information, and each different type of secondary scene image routed to the corresponding image processing system. After step S214 is executed, the individually processed primary scene images may be merged to obtain the image to be recognized after simple image processing; after step S215 is executed, the individually processed secondary scene images may be merged to obtain the image to be recognized after fine image processing.
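As a final illustration of steps S214/S215 (a sketch building on the earlier ones; the per-scene enhancement functions are placeholders invented for the example), each classified region is routed to the processing matched to its label and the processed regions are merged back into the image:

```python
def enhance_sky(region):   return region     # placeholder for sky-specific post-processing
def enhance_grass(region): return region     # placeholder for grass-specific post-processing

PROCESSORS = {"sky": enhance_sky, "grass": enhance_grass}

def process_by_classification(image, fine_info):
    """image: 3xHxW tensor; fine_info: output of build_fine_classification_info()."""
    for entry in fine_info:
        x, y, w, h = entry["position"]
        processor = PROCESSORS.get(entry["primary_scene"], lambda r: r)
        image[:, y:y + h, x:x + w] = processor(image[:, y:y + h, x:x + w])
    return image
```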
Compared with the first embodiment, the image processing method provided by the second embodiment of the application can output both the scene simple classification information and the scene fine classification information, making it easy to compare the effects of simple and fine classification and giving the user a personalized choice; and by training the cascaded convolutional neural networks on new scene data sets, the recognition accuracy can be continuously optimized and the applicable range of the image processing method expanded, making the scheme more intelligent in application.
Third embodiment
Referring to fig. 3, fig. 3 is a block diagram of an image processing apparatus 300 according to a third embodiment of the present application. As will be explained below with respect to the block diagram shown in fig. 3, the image processing apparatus 300 includes: a primary classification module 310, a secondary classification module 320, and an output module 330, wherein:
the first-level classification module 310 is configured to identify a first-level scene in the image to be identified through a deep convolutional neural network, and obtain a first classification result, where each type of the first-level scene includes at least one type of second-level scene.
And a secondary classification module 320, configured to identify a secondary scene through a shallow convolutional neural network in the first classification result, and obtain a second classification result, where the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network.
An output module 330, configured to output scene fine classification information of the image to be recognized based on the second classification result.
The image processing apparatus provided by the third embodiment of the application classifies the primary scenes in the image to be recognized through a deep convolutional neural network, then classifies the secondary scenes within each class of primary scene through a shallow convolutional neural network, and finally outputs the scene fine classification information of the image to be recognized. This avoids the high computation cost of finely classifying fine scene categories with a single large-scale network, obtains a good balance between computation and accuracy, and makes deployment on a mobile terminal feasible.
Fourth embodiment
Referring to fig. 4, fig. 4 is a block diagram illustrating an image processing apparatus 400 according to a fourth embodiment of the present application. As explained below with respect to the block diagram of fig. 4, the image processing apparatus 400 includes: a primary classification module 410, a secondary classification module 420, an output module 430, an instruction module 440, a simple output module 442, a primary recognition module 450, a primary training module 452, a secondary recognition module 460, a secondary training module 462, a primary processing module 470, and a secondary processing module 480, wherein:
the first-level classification module 410 is configured to identify a first-level scene in the image to be identified through a deep convolutional neural network, and obtain a first classification result, where each type of the first-level scene includes at least one type of second-level scene. Further, the primary classification module 410 includes: a preview unit 411, a primary classification unit 412, and a primary acquisition unit 413, wherein:
the preview unit 411 is used for acquiring an image to be recognized in an image acquisition mode;
and the primary classification unit 412 is configured to classify a primary scene in the image to be recognized through a deep convolutional neural network.
The primary obtaining unit 413 is configured to obtain a first classification result, where the first classification result includes at least one type of the primary scene in the image to be recognized.
And the secondary classification module 420 is configured to identify a secondary scene through a shallow convolutional neural network in the first classification result, and obtain a second classification result, where the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network. Further, the secondary classification module 420 includes: a secondary classification unit 421 and a secondary acquisition unit 422, wherein:
the secondary classification unit 421 is configured to classify a secondary scene in each class of primary scenes in the first classification result through a shallow convolutional neural network.
The secondary obtaining unit 422 is configured to obtain a second classification result, where the second classification result includes at least one type of the secondary scene in the image to be identified.
And an output module 430, configured to output scene fine classification information of the image to be recognized based on the second classification result. Further, the output module 430 includes: a marking unit 431 and a fine output unit 432, wherein:
a marking unit 431, configured to set a classification mark for the secondary scene in the second classification result;
a fine output unit 432, configured to output scene fine classification information of the image to be recognized including the classification mark.
The instruction module 440 is configured to determine whether a simple classification instruction is received.
A simple output module 442, configured to output scene simple classification information of the image to be recognized based on the first classification result.
The primary recognition module 450 is configured to determine whether an unclassified primary scene exists in the image to be recognized.
A primary training module 452 configured to input a data set including at least one type of the unclassified primary scenes and train the deep convolutional neural network.
And a secondary recognition module 460, configured to determine whether an unclassified secondary scene exists in the image to be recognized.
A secondary training module 462, configured to input a data set including at least one type of the unclassified secondary scenes, and train the shallow convolutional neural network.
And a primary processing module 470, configured to perform classification processing on each primary scene image in the image to be identified based on the scene simple classification information.
And the secondary processing module 480 is configured to perform classification processing on each secondary scene image in the image to be identified based on the scene fine classification information.
Compared with the image processing apparatus of the third embodiment, the image processing apparatus provided by the fourth embodiment of the application can output both the scene simple classification information and the scene fine classification information, making it easy to compare the effects of simple and fine classification and giving the user a personalized choice; and by training the cascaded convolutional neural networks on new scene data sets, the recognition accuracy can be continuously optimized and the applicable range expanded, making the scheme more intelligent in application.
Fifth embodiment
A fifth embodiment of the present application provides a mobile terminal comprising a display, a memory, and a processor, the display and the memory coupled to the processor, the memory storing instructions that, when executed by the processor, perform:
in an image to be identified, identifying primary scenes through a deep convolutional neural network to obtain a first classification result, wherein each type of primary scene comprises at least one type of secondary scene;
in the first classification result, identifying a secondary scene through a shallow convolutional neural network to obtain a second classification result, wherein the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network;
and outputting scene fine classification information of the image to be recognized based on the second classification result.
Sixth embodiment
A sixth embodiment of the present application provides a computer-readable storage medium having program code executable by a processor, the program code causing the processor to perform:
in an image to be identified, identifying primary scenes through a deep convolutional neural network to obtain a first classification result, wherein each type of primary scene comprises at least one type of secondary scene;
in the first classification result, identifying a secondary scene through a shallow convolutional neural network to obtain a second classification result, wherein the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network;
and outputting scene fine classification information of the image to be recognized based on the second classification result.
In summary, according to the image processing method, image processing apparatus and mobile terminal provided by the application, the deep convolutional neural network classifies the primary scenes in the image to be recognized, a shallow convolutional neural network then classifies the secondary scenes within each class of primary scene, and finally the scene fine classification information of the image to be recognized is output. Compared with the prior art, the embodiments of the application classify the coarse scene categories and the fine scene categories in sequence with cascaded convolutional neural networks, avoid the high computation cost of finely classifying fine scene categories with a single large-scale network, obtain a good balance between computation and accuracy, and make deployment on a mobile terminal feasible.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. For any processing manner described in the method embodiment, all the processing manners may be implemented by corresponding processing modules in the apparatus embodiment, and details in the apparatus embodiment are not described again.
Referring to fig. 5, based on the image processing method and apparatus, the embodiment of the present application further provides a mobile terminal 100, which includes an electronic body 10, where the electronic body 10 includes a housing 12 and a main display 120 disposed on the housing 12. The housing 12 may be made of metal, such as steel or aluminum alloy. In this embodiment, the main display 120 generally includes a display panel 111, and may also include a circuit or the like for responding to a touch operation performed on the display panel 111. The Display panel 111 may be a Liquid Crystal Display (LCD) panel, and in some embodiments, the Display panel 111 is a touch screen 109.
Referring to fig. 6, in an actual application scenario, the mobile terminal 100 may be used as a smart phone terminal, in which case the electronic body 10 generally further includes one or more processors 102 (only one is shown in the figure), a memory 104, an RF (Radio Frequency) module 106, an audio circuit 110, a sensor 114, an input module 118, and a power module 122. It will be understood by those skilled in the art that the structure shown in fig. 5 is merely illustrative and is not intended to limit the structure of the electronic body 10. For example, the electronics body section 10 may also include more or fewer components than shown in FIG. 5, or have a different configuration than shown in FIG. 5.
Those skilled in the art will appreciate that all other components are peripheral devices with respect to the processor 102, and the processor 102 is coupled to the peripheral devices through a plurality of peripheral interfaces 124. The peripheral interface 124 may be implemented based on the following criteria: universal Asynchronous Receiver/Transmitter (UART), General Purpose Input/Output (GPIO), Serial Peripheral Interface (SPI), and Inter-Integrated Circuit (I2C), but the present invention is not limited to these standards. In some examples, the peripheral interface 124 may comprise only a bus; in other examples, the peripheral interface 124 may also include other elements, such as one or more controllers, for example, a display controller for interfacing with the display panel 111 or a memory controller for interfacing with a memory. These controllers may also be separate from the peripheral interface 124 and integrated within the processor 102 or a corresponding peripheral.
The memory 104 may be used to store software programs and modules, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the electronic body portion 10 or the primary display 120 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The RF module 106 is configured to receive and transmit electromagnetic waves and to convert between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices. The RF module 106 may include various existing circuit elements for performing these functions, such as an antenna, a radio-frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and so forth. The RF module 106 may communicate with various networks such as the internet, an intranet or a wireless network, or with other devices via a wireless network. The wireless network may comprise a cellular telephone network, a wireless local area network, or a metropolitan area network. The wireless network may use various communication standards, protocols and technologies, including, but not limited to, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (WiMAX), any other suitable protocol for instant messaging, and even protocols that have not yet been developed.
The audio circuitry 110, earpiece 101, sound jack 103, microphone 105 collectively provide an audio interface between a user and the electronic body portion 10 or the main display 120. Specifically, the audio circuit 110 receives sound data from the processor 102, converts the sound data into an electrical signal, and transmits the electrical signal to the earpiece 101. The earpiece 101 converts the electrical signal into sound waves that can be heard by the human ear. The audio circuitry 110 also receives electrical signals from the microphone 105, converts the electrical signals to sound data, and transmits the sound data to the processor 102 for further processing. Audio data may be retrieved from the memory 104 or through the RF module 106. In addition, audio data may also be stored in the memory 104 or transmitted through the RF module 106.
The sensor 114 is disposed in the electronic body portion 10 or the main display 120. Examples of the sensor 114 include, but are not limited to: light sensors, motion sensors, pressure sensors, gravitational acceleration sensors, and other sensors.
Specifically, the sensors 114 may include a light sensor 114F and a pressure sensor 114G. The pressure sensor 114G may detect the pressure generated by pressing on the mobile terminal 100; that is, it detects pressure generated by contact or pressing between the user and the mobile terminal, for example between the user's ear and the mobile terminal. Accordingly, the pressure sensor 114G may be used to determine whether contact or pressing has occurred between the user and the mobile terminal 100, as well as the magnitude of the pressure.
Referring to fig. 5 again, in the embodiment shown in fig. 5, the light sensor 114F and the pressure sensor 114G are disposed adjacent to the display panel 111. The light sensor 114F may turn off the display output when an object is near the main display 120, for example, when the electronic body portion 10 moves to the ear.
As one of the motion sensors, the gravitational acceleration sensor can detect the magnitude of acceleration in various directions (generally along three axes) and the magnitude and direction of gravity when stationary, and can be used in applications that recognize the attitude of the mobile terminal 100 (such as landscape/portrait switching, related games, magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). In addition, the electronic body 10 may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer and a thermometer, which are not described here.
in this embodiment, the input module 118 may include the touch screen 109 disposed on the main display 120, and the touch screen 109 may collect touch operations of the user (for example, operations of the user on or near the touch screen 109 using any suitable object or accessory such as a finger, a stylus, etc.) and drive the corresponding connection device according to a preset program. Optionally, the touch screen 109 may include a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 102, and can receive and execute commands sent by the processor 102. In addition, the touch detection function of the touch screen 109 may be implemented by various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch screen 109, in other variations, the input module 118 may include other input devices, such as keys 107. The keys 107 may include, for example, character keys for inputting characters, and control keys for activating control functions. Examples of such control keys include a "back to home" key, a power on/off key, and the like.
The main display 120 is used to display information input by a user, information provided to the user, and various graphic user interfaces of the electronic body section 10, which may be composed of graphics, text, icons, numbers, video, and any combination thereof, and in one example, the touch screen 109 may be provided on the display panel 111 so as to be integrated with the display panel 111.
The power module 122 is used to provide power supply to the processor 102 and other components. Specifically, the power module 122 may include a power management system, one or more power sources (e.g., batteries or ac power), a charging circuit, a power failure detection circuit, an inverter, a power status indicator light, and any other components associated with the generation, management, and distribution of power within the electronic body portion 10 or the primary display 120.
The mobile terminal 100 further comprises a locator 119, the locator 119 being configured to determine an actual location of the mobile terminal 100. In this embodiment, the locator 119 implements the positioning of the mobile terminal 100 by using a positioning service, which is understood to be a technology or a service for obtaining the position information (e.g., longitude and latitude coordinates) of the mobile terminal 100 by using a specific positioning technology and marking the position of the positioned object on an electronic map.
It should be understood that the mobile terminal 100 described above is not limited to a smartphone; it refers to any computer device that can be used while mobile. Specifically, the mobile terminal 100 refers to a mobile computer device equipped with an intelligent operating system, and includes, but is not limited to, a smartphone, a smart watch, a tablet computer, and the like.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (mobile terminal) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having appropriate combinational logic gates, Programmable Gate Arrays (PGA), Field Programmable Gate Arrays (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments. In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application; variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
in an image to be recognized, identifying primary scenes through a deep convolutional neural network to obtain a first classification result, wherein each class of primary scene comprises at least one class of secondary scene;
determining whether a simple classification instruction has been received;
if a simple classification instruction is received, outputting scene simple classification information of the image to be recognized based on the first classification result;
if the simple classification instruction is not received, in the first classification result, identifying a secondary scene in each class of the primary scene through a shallow convolutional neural network to obtain a second classification result, wherein the shallow convolutional neural network is cascaded to the deep convolutional neural network, and the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network;
outputting scene fine classification information of the image to be recognized based on the second classification result;
determining whether an unclassified secondary scene exists in the image to be recognized;
and if an unclassified secondary scene exists in the image to be recognized, inputting a data set containing at least one class of the unclassified secondary scenes, and training the shallow convolutional neural network.
2. The method of claim 1, wherein identifying a primary scene in the image to be recognized through a deep convolutional neural network and obtaining a first classification result comprises:
acquiring the image to be recognized by means of image acquisition;
classifying the primary scene in the image to be recognized through the deep convolutional neural network;
and acquiring a first classification result, wherein the first classification result comprises at least one class of the primary scene in the image to be recognized.
3. The method of claim 2, wherein in the first classification result, identifying a secondary scene in each of the classes of the primary scenes through a shallow convolutional neural network, and obtaining a second classification result comprises:
classifying the secondary scenes in each class of primary scenes in the first classification result through a shallow convolutional neural network;
and acquiring a second classification result, wherein the second classification result comprises at least one class of the secondary scene in the image to be recognized.
4. The method according to claim 3, wherein outputting scene fine classification information of the image to be recognized based on the second classification result comprises:
setting a classification mark for the secondary scene in the second classification result;
and outputting scene fine classification information of the image to be recognized containing the classification mark.
5. The method of claim 1, wherein after the first classification result is obtained by identifying a primary scene in the image to be recognized through a deep convolutional neural network, the method further comprises:
and outputting the scene simple classification information of the image to be recognized based on the first classification result.
6. The method of claim 1, further comprising:
determining whether an unclassified primary scene exists in the image to be recognized;
if an unclassified primary scene exists in the image to be recognized, inputting a data set containing at least one class of the unclassified primary scenes, and training the deep convolutional neural network.
7. The method according to claim 1, wherein after outputting scene fine classification information of the image to be recognized based on the second classification result, the method further comprises:
and classifying each secondary scene image in the image to be recognized based on the scene fine classification information.
8. The method according to claim 5, wherein after outputting scene simple classification information of the image to be recognized based on the first classification result, the method further comprises:
and classifying each primary scene image in the image to be recognized based on the scene simple classification information.
9. An image processing apparatus, characterized in that the apparatus comprises:
the primary classification module is used for identifying primary scenes in the image to be recognized through a deep convolutional neural network to obtain a first classification result, wherein each class of primary scene comprises at least one class of secondary scene;
the instruction module is used for determining whether a simple classification instruction has been received;
the simple output module is used for outputting scene simple classification information of the image to be recognized based on the first classification result if a simple classification instruction is received;
the secondary classification module is used for identifying, in the first classification result, a secondary scene in each class of the primary scenes through a shallow convolutional neural network if the simple classification instruction is not received, and acquiring a second classification result, wherein the shallow convolutional neural network is cascaded to the deep convolutional neural network, and the number of layers of the deep convolutional neural network is greater than that of the shallow convolutional neural network;
the output module is used for outputting scene fine classification information of the image to be recognized based on the second classification result;
the secondary recognition module is used for determining whether an unclassified secondary scene exists in the image to be recognized;
and the secondary training module is used for inputting a data set containing at least one class of the unclassified secondary scenes and training the shallow convolutional neural network if an unclassified secondary scene exists in the image to be recognized.
10. A mobile terminal comprising a display, a memory, and a processor, the display and the memory being coupled to the processor, the memory storing instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-8.
11. A computer-readable storage medium having program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1-8.
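For illustration only, and not as part of the claims or the patented implementation, the following is a minimal sketch of the cascaded coarse-to-fine classification flow recited in claims 1-4: a deep convolutional neural network first assigns the image to be recognized to a primary (coarse) scene class, and, unless a simple classification instruction has been received, a shallow convolutional neural network cascaded behind it refines that result into a secondary (fine) scene class. The sketch is written in Python with PyTorch; the network architectures, scene class lists, and image size are assumptions introduced only for the example.

```python
# Illustrative sketch of cascaded coarse-to-fine scene classification.
# All class names, depths, and sizes below are assumptions, not taken from the patent.
import torch
import torch.nn as nn

PRIMARY_CLASSES = ["landscape", "people", "food"]        # hypothetical primary (coarse) scenes
SECONDARY_CLASSES = {                                    # hypothetical secondary (fine) scenes
    "landscape": ["beach", "mountain", "city"],
    "people": ["portrait", "group"],
    "food": ["dessert", "main_course"],
}

def make_cnn(num_classes: int, depth: int) -> nn.Sequential:
    """Build a small CNN; 'depth' sets the number of conv blocks, so the deep
    (primary) network has more layers than each shallow (secondary) network."""
    layers, channels = [], 3
    for i in range(depth):
        out_ch = 16 * (i + 1)
        layers += [nn.Conv2d(channels, out_ch, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        channels = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, num_classes)]
    return nn.Sequential(*layers)

# One deep CNN for primary scenes; one shallow CNN per primary scene for its secondary scenes.
primary_net = make_cnn(len(PRIMARY_CLASSES), depth=6)
secondary_nets = {name: make_cnn(len(subs), depth=2) for name, subs in SECONDARY_CLASSES.items()}

def classify(image: torch.Tensor, simple_instruction: bool = False) -> dict:
    """image: (1, 3, H, W) tensor. Returns the coarse result only when a
    'simple classification instruction' is received; otherwise also the fine result."""
    with torch.no_grad():
        primary_idx = primary_net(image).argmax(dim=1).item()
        primary = PRIMARY_CLASSES[primary_idx]
        if simple_instruction:                       # coarse (simple) classification branch
            return {"primary_scene": primary}
        fine_net = secondary_nets[primary]           # cascaded shallow CNN for this coarse class
        secondary_idx = fine_net(image).argmax(dim=1).item()
        return {"primary_scene": primary,
                "secondary_scene": SECONDARY_CLASSES[primary][secondary_idx]}

if __name__ == "__main__":
    dummy = torch.randn(1, 3, 224, 224)              # stand-in for an image to be recognized
    print(classify(dummy))                           # fine classification
    print(classify(dummy, simple_instruction=True))  # coarse classification only
```

In a real deployment the primary and secondary networks would be trained separately on labeled scene data, and, in the spirit of claims 1 and 6, an unclassified primary or secondary scene would trigger further training of the corresponding network on a data set containing that scene.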
CN201810399087.2A 2018-04-28 2018-04-28 Image processing method and device and mobile terminal Active CN108764051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810399087.2A CN108764051B (en) 2018-04-28 2018-04-28 Image processing method and device and mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810399087.2A CN108764051B (en) 2018-04-28 2018-04-28 Image processing method and device and mobile terminal

Publications (2)

Publication Number Publication Date
CN108764051A CN108764051A (en) 2018-11-06
CN108764051B true CN108764051B (en) 2021-07-13

Family

ID=64012246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810399087.2A Active CN108764051B (en) 2018-04-28 2018-04-28 Image processing method and device and mobile terminal

Country Status (1)

Country Link
CN (1) CN108764051B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399840B (en) * 2019-05-22 2024-04-02 西南科技大学 Rapid lawn semantic segmentation and boundary detection method
CN110363146A (en) * 2019-07-16 2019-10-22 杭州睿琪软件有限公司 A kind of object identification method, device, electronic equipment and storage medium
CN110569913A (en) * 2019-09-11 2019-12-13 北京云迹科技有限公司 Scene classifier training method and device, scene recognition method and robot
CN113283270B (en) * 2020-02-20 2024-08-23 京东方科技集团股份有限公司 Image processing method and device, screening system and computer readable storage medium
CN113109666B (en) * 2021-04-09 2024-03-15 河南省博海大数据科技有限公司 Rail circuit fault diagnosis method based on deep convolutional neural network
CN117173549B (en) * 2023-08-22 2024-03-22 中国科学院声学研究所 Multi-scale target detection method and system for synthetic aperture sonar image under complex scene

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330029A1 (en) * 2010-06-07 2017-11-16 Affectiva, Inc. Computer based convolutional processing for image analysis
US10387773B2 (en) * 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
CN105335710A (en) * 2015-10-22 2016-02-17 合肥工业大学 Fine vehicle model identification method based on multi-stage classifier
CN105975915B (en) * 2016-04-28 2019-05-21 大连理工大学 A kind of front vehicles parameter identification method based on multitask convolutional neural networks
US11137462B2 (en) * 2016-06-10 2021-10-05 Board Of Trustees Of Michigan State University System and method for quantifying cell numbers in magnetic resonance imaging (MRI)
US11514642B2 (en) * 2016-10-08 2022-11-29 Purdue Research Foundation Method and apparatus for generating two-dimensional image data describing a three-dimensional image
CN107688784A (en) * 2017-08-23 2018-02-13 福建六壬网安股份有限公司 A kind of character identifying method and storage medium based on further feature and shallow-layer Fusion Features

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824054A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded depth neural network-based face attribute recognition method
CN107798653A (en) * 2017-09-20 2018-03-13 北京三快在线科技有限公司 A kind of method of image procossing and a kind of device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Facial feature point localization algorithm based on cascaded deep convolutional neural networks; Li Yue; 《计算机技术》 (Computer Technology); 2017-07-01 (No. 7); p. 37 *

Also Published As

Publication number Publication date
CN108764051A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764051B (en) Image processing method and device and mobile terminal
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN109086709B (en) Feature extraction model training method and device and storage medium
CN109189950B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN109190648B (en) Simulation environment generation method and device, mobile terminal and computer readable storage medium
KR20160103398A (en) Method and apparatus for measuring the quality of the image
CN106874906B (en) Image binarization method and device and terminal
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN114722937B (en) Abnormal data detection method and device, electronic equipment and storage medium
CN115471662B (en) Training method, recognition method, device and storage medium for semantic segmentation model
CN113723378B (en) Model training method and device, computer equipment and storage medium
CN107766403A (en) A kind of photograph album processing method, mobile terminal and computer-readable recording medium
CN111984803B (en) Multimedia resource processing method and device, computer equipment and storage medium
CN111325220B (en) Image generation method, device, equipment and storage medium
CN114723987B (en) Training method of image tag classification network, image tag classification method and device
CN109062648B (en) Information processing method and device, mobile terminal and storage medium
CN108846817B (en) Image processing method and device and mobile terminal
KR101995799B1 (en) Place recognizing device and method for providing context awareness service
CN108803972B (en) Information display method, device, mobile terminal and storage medium
CN115841575A (en) Key point detection method, device, electronic apparatus, storage medium, and program product
CN110728167A (en) Text detection method and device and computer readable storage medium
CN111753813A (en) Image processing method, device, equipment and storage medium
CN111444813A (en) Method, device, equipment and storage medium for identifying attribute classification of target object
CN113343709B (en) Method for training intention recognition model, method, device and equipment for intention recognition
CN110197459A (en) Image stylization generation method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant