CN110674873A - Image classification method and device, mobile terminal and storage medium - Google Patents


Info

Publication number: CN110674873A (granted as CN110674873B)
Application number: CN201910906770.5A
Authority: CN (China)
Inventor: 尚太章
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangzhou Xinguang Enterprise Management Consulting Co ltd
Prior art keywords: target, classification, aspect ratio, target detection, detection frame
Other languages: Chinese (zh)
Legal status: Active (granted)

Classifications

    • G — Physics
    • G06 — Computing; calculating or counting
    • G06F — Electric digital data processing
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application provide an image classification method and apparatus, a mobile terminal, and a storage medium. The method is applied to the mobile terminal and includes the following steps: acquiring a first target detection frame contained in an image to be classified, and calculating a first aspect ratio of the first target detection frame; determining the target aspect ratio interval into which the first aspect ratio falls, and determining the target classification detection model corresponding to that interval according to the correspondence between aspect ratio intervals and classification detection models; and inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame. The method and apparatus can improve the accuracy of image classification and recognition.

Description

Image classification method and device, mobile terminal and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image classification method and apparatus, a mobile terminal, and a storage medium.
Background
Image recognition algorithms currently rely on deep learning. However, existing deep learning algorithms require all input pictures to have a uniform shape; for example, the aspect ratio of every input picture must equal a fixed proportion.
For pictures whose aspect ratio is far larger or far smaller than this fixed proportion, classification with such an algorithm deforms the original image considerably, which degrades recognition and classification quality, produces more false identifications, and thus reduces the accuracy of image classification and recognition.
Disclosure of Invention
The embodiment of the application provides an image classification method, an image classification device, a mobile terminal and a storage medium, which can improve the accuracy of image classification and identification.
A first aspect of an embodiment of the present application provides an image classification method, including:
acquiring a first target detection frame contained in an image to be classified, and calculating a first aspect ratio of the first target detection frame;
determining a target aspect ratio interval into which the first aspect ratio falls, and determining a target classification detection model corresponding to the target aspect ratio interval according to the correspondence between aspect ratio intervals and classification detection models;
and inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
A second aspect of an embodiment of the present application provides an image classification apparatus, including:
an acquiring unit, configured to acquire a first target detection frame contained in an image to be classified;
a calculation unit configured to calculate a first aspect ratio of the first target detection frame;
a determining unit, configured to determine a target aspect ratio interval into which the first aspect ratio falls, and to determine a target classification detection model corresponding to the target aspect ratio interval according to the correspondence between aspect ratio intervals and classification detection models;
and the classification unit is used for inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
A third aspect of an embodiment of the present application provides a mobile terminal, including a processor and a memory, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions in the first aspect of the embodiment of the present application.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform part or all of the steps as described in the first aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.
In the embodiments of the application, when an image is to be classified, a first target detection frame contained in the image is acquired and its first aspect ratio is calculated; the target aspect ratio interval into which the first aspect ratio falls is determined, and the target classification detection model corresponding to that interval is determined according to the correspondence between aspect ratio intervals and classification detection models; the first target detection frame is then input into the target classification detection model for classification, yielding a classification result. Because the model is chosen according to the aspect ratio of the detected frame, different classification detection models can be selected for different aspect ratios, and detection frames of different aspect ratios are sent to different classification networks for recognition, which improves the accuracy with which the target detection frames are recognized and classified.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described here show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image classification method provided in an embodiment of the present application;
FIG. 2a is a schematic diagram illustrating a method for detecting a plurality of target detection frames from an image to be classified according to an embodiment of the present disclosure;
fig. 2b is a schematic diagram illustrating an aspect ratio of a target detection frame according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another image classification method provided in an embodiment of the present application;
FIG. 4 is a schematic flowchart of another image classification method provided in the embodiments of the present application;
FIG. 5a is a schematic diagram illustrating a selection of a classification detection model when a target detection frame with an aspect ratio within a range of 0.75 to 1.5 is classified according to an embodiment of the present disclosure;
fig. 5b is a schematic diagram illustrating selection of a classification detection model when a target detection frame with an aspect ratio within an interval greater than 1.5 is subjected to classification processing according to an embodiment of the present application;
fig. 5c is a schematic diagram illustrating selection of a classification detection model when a target detection frame with an aspect ratio within an interval smaller than 0.75 is subjected to classification processing according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the present application, not all of them; all other embodiments derived from them by a person skilled in the art without creative effort fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The mobile terminal according to the embodiments of the present application may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), Mobile Stations (MS), terminal devices (terminal device), and so on. For convenience of description, the above-mentioned devices are collectively referred to as a mobile terminal.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an image classification method according to an embodiment of the present disclosure. As shown in fig. 1, the image classification method is applied to a mobile terminal, and may include the following steps.
101, the mobile terminal acquires a first target detection frame contained in the image to be classified, and calculates a first aspect ratio of the first target detection frame.
The image classification method provided by the embodiment of the application can be used for classifying the target detection frames contained in the image to be classified. The image to be classified may include one or more target detection frames, and when the image to be classified includes a plurality of target detection frames, the first target detection frame may be one of the plurality of target detection frames detected in the image to be classified.
The first target detection frame may be a rectangular frame containing the target to be detected in the image to be classified. The target to be detected can comprise a person, an animal, an automobile, a fruit and other targets needing to be detected and classified.
The first target detection frame is not merely a rectangle's outline; it is the rectangular frame together with the image it encloses, that is, the rectangular region of the image to be classified that contains the target to be detected.
Specifically, please refer to fig. 2a, which is a schematic diagram of detecting a plurality of target detection frames from an image to be classified according to the present application. As shown in fig. 2a, the image on the left has not yet undergone target detection, while the image on the right shows the 4 target detection frames obtained after target detection: target detection frame 1, target detection frame 2, target detection frame 3, and target detection frame 4. Each detection frame is a rectangular frame and contains a target to be detected. For example, the target contained in target detection frame 1 is a human face, that in target detection frame 2 is a whole-body photograph, that in target detection frame 3 is an apple, and that in target detection frame 4 is an automobile.
In the embodiment of the present application, the first aspect ratio refers to an aspect ratio of the first target detection frame. The aspect ratio of the target detection frame is the ratio of the length and the width of the target detection frame. Specifically, please refer to fig. 2b, where fig. 2b is a schematic diagram of an aspect ratio of a target detection frame according to an embodiment of the present disclosure. The first target detection frame in fig. 2b is illustrated by taking the target detection frame 3 in fig. 2a as an example, and as shown in fig. 2b, when the length of the first target detection frame is denoted by w and the width is denoted by h, the aspect ratio of the first target detection frame is w/h.
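The aspect ratio computation in step 101 can be sketched as follows. The `(x, y, w, h)` box format is an illustrative assumption; the patent only specifies that the ratio is w/h.

```python
# Sketch of step 101: computing the aspect ratio w/h of a detection
# frame. The (x, y, w, h) tuple format is an assumption for illustration.

def aspect_ratio(box):
    """Return w/h for a detection frame given as (x, y, w, h)."""
    x, y, w, h = box
    if h <= 0:
        raise ValueError("detection frame height must be positive")
    return w / h

# Example: a frame with length 120 px and width 80 px has aspect ratio 1.5.
print(aspect_ratio((10, 20, 120, 80)))  # → 1.5
```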
102, the mobile terminal determines the target aspect ratio interval into which the first aspect ratio falls, and determines the target classification detection model corresponding to that interval according to the correspondence between aspect ratio intervals and classification detection models.
In this embodiment, the correspondence between aspect ratio intervals and classification detection models may be preset and stored in a memory (e.g., a non-volatile memory) of the mobile terminal. The correspondence may include at least two selectable aspect ratio intervals and at least two selectable classification detection models. The selectable aspect ratio intervals correspond one-to-one to the selectable classification detection models: each selectable interval corresponds to exactly one model. For example, see Table 1, the correspondence table between aspect ratio intervals and classification detection models provided in the embodiment of the present application.
TABLE 1

Aspect ratio interval    Corresponding classification detection model
0.75–1.5                 Classification detection model 1
Greater than 1.5         Classification detection model 2
Less than 0.75           Classification detection model 3
Classification detection model 1 classifies target detection frames whose aspect ratio lies in the interval 0.75–1.5, model 2 those with an aspect ratio greater than 1.5, and model 3 those with an aspect ratio less than 0.75. Each model is most accurate within its own interval; for example, model 1 may be most accurate for frames with an aspect ratio of exactly 1, model 2 for an aspect ratio of 2, and model 3 for an aspect ratio of 0.5.
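The interval lookup of step 102 amounts to a simple routing function over Table 1. The string keys below are placeholders for the three classification detection models, not names from the patent.

```python
# Minimal sketch of step 102, using Table 1's intervals. The returned
# keys are illustrative placeholders for the three model objects.

def select_model(aspect_ratio):
    """Map an aspect ratio to the classification model from Table 1."""
    if aspect_ratio > 1.5:
        return "model_2"   # reference aspect ratio 2
    if aspect_ratio < 0.75:
        return "model_3"   # reference aspect ratio 0.5
    return "model_1"       # interval 0.75–1.5, reference aspect ratio 1

print(select_model(1.0))   # → model_1
print(select_model(2.3))   # → model_2
print(select_model(0.5))   # → model_3
```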
The target classification detection models may be network models designed with a residual neural network (ResNet) as the backbone and with input aspect ratios of w/h = 1, w/h = 2, and w/h = 0.5, or network models built on other backbones such as MobileNet, Xception, the Visual Geometry Group network (VGG), Inception, or DarkNet, with the same corresponding input aspect ratios.
103, the mobile terminal inputs the first target detection frame into the target classification detection model for classification, and obtains a classification result of the first target detection frame.
In the embodiment of the application, after the mobile terminal inputs the first target detection frame into the target classification detection model, the first target detection frame is classified through the target classification detection model, so that the classification result of the first target detection frame can be obtained.
In this way, the target classification detection model is determined according to the aspect ratio of the first target detection frame contained in the image to be classified: different classification detection models can be selected for different aspect ratios, and detected frames of different aspect ratios are sent to different classification networks for recognition, which improves the accuracy of recognizing and classifying the target detection frames.
Referring to fig. 3, fig. 3 is a schematic flowchart of another image classification method provided in the present application. As shown in fig. 3, the image classification method is applied to a mobile terminal, and may include the following steps.
301, the mobile terminal obtains a first target detection frame included in the image to be classified, and calculates a first aspect ratio of the first target detection frame.
302, the mobile terminal determines the target aspect ratio interval into which the first aspect ratio falls, and determines the target classification detection model corresponding to that interval according to the correspondence between aspect ratio intervals and classification detection models.
The specific implementation of steps 301 to 302 in the embodiment of the present application may refer to steps 101 to 102 shown in fig. 1, which are not described herein again.
303, the mobile terminal detects whether the first aspect ratio is equal to the target reference aspect ratio corresponding to the target aspect ratio interval. If yes, step 304 is performed; otherwise, steps 305 and 306 are performed.
In the embodiment of the present application, the target reference aspect ratio is one value within the target aspect ratio interval. For example, if the target aspect ratio interval is 0.75–1.5, the corresponding target reference aspect ratio may be 1. The target classification detection model can directly classify a target detection frame whose aspect ratio equals the target reference aspect ratio.
304, the mobile terminal inputs the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
The specific implementation of step 304 in the embodiment of the present application may refer to step 103 shown in fig. 1, which is not described herein again.
305, the mobile terminal performs scaling processing on the first target detection frame to obtain a scaled first target detection frame, where the aspect ratio of the scaled first target detection frame equals the target reference aspect ratio.
306, the mobile terminal inputs the scaled first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
In the embodiment of the present application, the target classification detection model is designed to require that the aspect ratio of the input image is equal to the target reference aspect ratio. If the aspect ratio of the input image is equal to the target reference aspect ratio, the input image can be classified through the target classification detection model without being processed. If the aspect ratio of the input image is not equal to the target reference aspect ratio, the aspect ratio of the input image needs to be adjusted, the aspect ratio of the input image is adjusted to the target reference aspect ratio, and the image with the adjusted aspect ratio is input into the target classification detection model for classification.
If the aspect ratio of the first target detection frame is not equal to the target reference aspect ratio, scaling the first target detection frame to obtain a scaled first target detection frame, so that the aspect ratio of the scaled first target detection frame is the target reference aspect ratio.
The scaling process may include performing a reduction process or an enlargement process on the target detection frame.
Specifically, the target aspect ratio interval has an upper limit value, a lower limit value, and a target reference aspect ratio.
Optionally, if the aspect ratio of the first target detection frame is between the target reference aspect ratio and the upper limit value, the mobile terminal performs scaling processing on the first target detection frame to obtain a scaled first target detection frame, specifically:
the mobile terminal keeps the width of the first target detection frame unchanged and compresses the frame along the length direction so that its aspect ratio equals the target reference aspect ratio; alternatively,
the mobile terminal keeps the length of the first target detection frame unchanged and stretches the frame along the width direction so that its aspect ratio equals the target reference aspect ratio.
For example, if the target aspect ratio range is 0.75 to 1.5, the upper limit value is 1.5, the lower limit value is 0.75, and the target reference aspect ratio is 1. If the aspect ratio of the first target detection frame is between 1 and 1.5, the aspect ratio of the first target detection frame is larger than the target reference aspect ratio, that is, the length of the first target detection frame is larger or the width of the first target detection frame is smaller. The mobile terminal may compress the length of the first target detection box, or stretch the width of the first target detection box, or simultaneously compress the length of the first target detection box and stretch the width of the first target detection box, so that the aspect ratio of the first target detection box is equal to the target reference aspect ratio.
Optionally, if the aspect ratio of the first target detection frame is between the lower limit and the target reference aspect ratio, the mobile terminal performs scaling processing on the first target detection frame to obtain a scaled first target detection frame, specifically:
the mobile terminal keeps the width of the first target detection frame unchanged and stretches the frame along the length direction so that its aspect ratio equals the target reference aspect ratio; alternatively,
the mobile terminal keeps the length of the first target detection frame unchanged and compresses the frame along the width direction so that its aspect ratio equals the target reference aspect ratio.
For example, if the target aspect ratio range is 0.75 to 1.5, the upper limit value is 1.5, the lower limit value is 0.75, and the target reference aspect ratio is 1. If the aspect ratio of the first target detection frame is between 0.75 and 1, the aspect ratio of the first target detection frame is smaller than the target reference aspect ratio, that is, the length of the first target detection frame is smaller or the width of the first target detection frame is larger. The mobile terminal may stretch the length of the first target detection box, or compress the width of the first target detection box, or simultaneously stretch the length of the first target detection box and compress the width of the first target detection box, so that the aspect ratio of the first target detection box is equal to the target reference aspect ratio.
In the embodiment of the application, because the gap between the upper and lower limits of the target aspect ratio interval is small, compressing or stretching the first target detection frame does not deform it greatly, and therefore has little effect on its recognition and classification.
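The scaling of step 305 can be sketched on pixel data as below. Nearest-neighbour interpolation and the keep-height-adjust-length choice are illustrative assumptions; the patent describes both length-wise and width-wise alternatives and does not fix a resampling method.

```python
# Sketch of step 305: resize a frame's pixels so that the new
# width/height ratio equals the target reference aspect ratio.
# Nearest-neighbour resampling is an assumption for illustration.

def rescale_to_reference(pixels, reference_ratio):
    """Resize a row-major 2D pixel list so new_w / new_h == reference_ratio.
    The height is kept fixed and the length (width) adjusted, one of the
    alternatives described above."""
    old_h, old_w = len(pixels), len(pixels[0])
    new_h = old_h
    new_w = round(reference_ratio * new_h)
    return [
        [pixels[i][j * old_w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]

# A 2x4 frame (w/h = 2) compressed along its length to w/h = 1:
img = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(rescale_to_reference(img, 1.0))  # → [[1, 3], [5, 7]]
```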
Optionally, step 306 may specifically include the following steps:
and the mobile terminal inputs the scaled first target detection frame into the trained target classification detection model to perform convolution operation, pooling operation and classification operation, so as to obtain a classification result of the first target detection frame.
The pooling operation may include maximum pooling (max-pooling) and mean-pooling (mean-pooling), among others. The mobile terminals may be maximally pooled before being averaged. The classification operation can be performed by a softmax function.
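The pooling and classification operations named above can be illustrated in isolation. This is not the patent's actual network, only minimal reference implementations of max-pooling and softmax.

```python
import math

# Illustrative building blocks for step 306: a 2x2 max-pooling layer
# and a numerically stable softmax classification layer.

def max_pool_2x2(fmap):
    """2x2 max-pooling over a feature map given as a list of rows
    (height and width assumed even)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

fmap = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
pooled = max_pool_2x2(fmap)                      # local maxima
probs = softmax([v for row in pooled for v in row])
print(pooled)                                    # → [[5, 7], [13, 15]]
print(round(sum(probs), 6))                      # → 1.0
```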
Referring to fig. 4, fig. 4 is a schematic flowchart of another image classification method provided in the present application. As shown in fig. 4, the image classification method is applied to a mobile terminal, and may include the following steps.
401, the mobile terminal obtains an image to be classified.
402, the mobile terminal inputs the image to be classified into the target detection network model, and identifies at least one target detection frame contained in the image to be classified through the target detection network model, wherein the first target detection frame is any one of the at least one target detection frame contained in the image to be classified.
In the embodiment of the present application, the first target detection frame is any one of at least one target detection frame included in the image to be classified. The mobile terminal may classify each of the at least one object detection box.
The target detection network model may be any one of a region-based convolutional neural network (RCNN) model, a You Only Look Once (YOLO) model, or a Single Shot MultiBox Detector (SSD) model.
The target detection network model of the application can detect many kinds of target, for example people, animals, automobiles, and fruit. Because the aspect ratios of different targets differ greatly, classifying all of them with a single classification detection model of one fixed input aspect ratio (for example, a model whose input aspect ratio is 1:1) would reduce classification and recognition accuracy.
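The full detect-then-classify pipeline of steps 401–402 plus the earlier routing can be sketched end to end. The detector and classifiers below are stubs, not a real RCNN/YOLO/SSD or the patent's trained models.

```python
# End-to-end sketch: run a detector over the image, then route each
# detected frame to a per-aspect-ratio classifier per Table 1.
# `detect` and the classifiers are placeholder callables.

def classify_image(image, detect, classifiers):
    """detect(image) -> list of (x, y, w, h) frames; classifiers maps a
    model key to a callable that returns a class label for a frame."""
    results = []
    for (x, y, w, h) in detect(image):
        ratio = w / h
        if ratio > 1.5:
            model = classifiers["model_2"]
        elif ratio < 0.75:
            model = classifiers["model_3"]
        else:
            model = classifiers["model_1"]
        results.append(model((x, y, w, h)))
    return results

# Toy usage with stub detector/classifiers that just report the route:
stub_detect = lambda img: [(0, 0, 100, 100), (0, 0, 300, 100)]
stubs = {k: (lambda box, k=k: k) for k in ("model_1", "model_2", "model_3")}
print(classify_image(None, stub_detect, stubs))  # → ['model_1', 'model_2']
```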
For example, referring to table 1, the classification detection model 1 is used to classify the target detection frame with the aspect ratio within the range of 0.75-1.5, and the classification detection model 1 requires that the aspect ratio of the input target detection frame is equal to 1; the classification detection model 2 is used for classifying the target detection frame with the length-width ratio within a range larger than 1.5, and the classification detection model 2 requires that the length-width ratio of the input target detection frame is equal to 2; the classification detection model 3 is used for performing classification processing on the target detection frame with the aspect ratio within an interval smaller than 0.75, and the classification detection model 3 requires that the aspect ratio of the input target detection frame is equal to 0.5. Referring to fig. 5a, fig. 5b and fig. 5c, fig. 5a is a schematic diagram illustrating a selection of a classification detection model when a target detection frame with an aspect ratio within a range of 0.75 to 1.5 is subjected to classification processing according to an embodiment of the present disclosure; fig. 5b is a schematic diagram illustrating selection of a classification detection model when a target detection frame with an aspect ratio within an interval greater than 1.5 is subjected to classification processing according to an embodiment of the present application; fig. 5c is a schematic diagram illustrating selection of a classification detection model when a target detection frame with an aspect ratio within an interval smaller than 0.75 is subjected to classification processing according to an embodiment of the present application.
As shown in fig. 5a, when the aspect ratio of the first target detection frame is in the interval 0.75-1.5: if the first target detection frame is input into the classification detection model 1, it first needs to be converted to the aspect ratio required by the classification detection model 1 (w/h = 1) and then input for classification detection; if it is input into the classification detection model 2, it first needs to be converted to the aspect ratio required by the classification detection model 2 (w/h = 2); if it is input into the classification detection model 3, it first needs to be converted to the aspect ratio required by the classification detection model 3 (w/h = 0.5). As can be seen from fig. 5a, converting the first target detection frame to the aspect ratio required by the classification detection model 1 causes the least deformation, and therefore the classification detection result obtained by classifying the first target detection frame with the classification detection model 1 is the most accurate.
As shown in fig. 5b, when the aspect ratio of the first target detection frame is greater than 1.5: if the first target detection frame is input into the classification detection model 1, it first needs to be converted to the aspect ratio required by the classification detection model 1 (w/h = 1) and then input for classification detection; if it is input into the classification detection model 2, it first needs to be converted to the aspect ratio required by the classification detection model 2 (w/h = 2); if it is input into the classification detection model 3, it first needs to be converted to the aspect ratio required by the classification detection model 3 (w/h = 0.5). As can be seen from fig. 5b, converting the first target detection frame to the aspect ratio required by the classification detection model 2 causes the least deformation, and therefore the classification detection result obtained by classifying the first target detection frame with the classification detection model 2 is the most accurate.
As shown in fig. 5c, when the aspect ratio of the first target detection frame is smaller than 0.75: if the first target detection frame is input into the classification detection model 1, it first needs to be converted to the aspect ratio required by the classification detection model 1 (w/h = 1) and then input for classification detection; if it is input into the classification detection model 2, it first needs to be converted to the aspect ratio required by the classification detection model 2 (w/h = 2); if it is input into the classification detection model 3, it first needs to be converted to the aspect ratio required by the classification detection model 3 (w/h = 0.5). As can be seen from fig. 5c, converting the first target detection frame to the aspect ratio required by the classification detection model 3 causes the least deformation, and therefore the classification detection result obtained by classifying the first target detection frame with the classification detection model 3 is the most accurate.
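The "least deformation" argument of figs. 5a-5c can be checked numerically: forcing a frame of aspect ratio r to a reference ratio ref stretches one axis relative to the other by max(r/ref, ref/r), and for a sample ratio from each interval of table 1 the model assigned to that interval indeed minimizes this factor. The helper name below is an assumption for illustration, not from the source.

```python
def distortion(aspect_ratio, ref):
    # Anisotropic-scaling factor when forcing w/h = aspect_ratio to
    # w/h = ref; 1.0 means no shape change, larger means more deformation.
    r = aspect_ratio / ref
    return max(r, 1.0 / r)

refs = {"model_1": 1.0, "model_2": 2.0, "model_3": 0.5}
for r in (1.2, 3.0, 0.4):   # one sample ratio per interval of Table 1
    best = min(refs, key=lambda m: distortion(r, refs[m]))
    print(r, "->", best)
# prints:
# 1.2 -> model_1
# 3.0 -> model_2
# 0.4 -> model_3
```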
403, the mobile terminal calculates a first aspect ratio of the first target detection frame.
And 404, the mobile terminal determines a target aspect ratio section in which the first aspect ratio falls, and determines a target classification detection model corresponding to the target aspect ratio section according to the corresponding relation between the aspect ratio section and the classification detection model.
405, the mobile terminal detects whether the first aspect ratio is equal to a target reference aspect ratio corresponding to the target aspect ratio section; if yes, step 406 is executed, otherwise steps 407 to 408 are executed.
And 406, the mobile terminal inputs the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
And 407, the mobile terminal performs scaling processing on the first target detection frame to obtain a scaled first target detection frame, and the aspect ratio of the scaled first target detection frame is the target reference aspect ratio.
And 408, inputting the scaled first target detection frame into the target classification detection model by the mobile terminal for classification to obtain a classification result of the first target detection frame.
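Steps 403 to 408 above can be sketched end to end as follows. This is a hedged illustration only: the model stand-ins and the trick of adjusting just the width are placeholders for an actual image rescaling, and the interval bounds again come from table 1.

```python
def classify_frame(frame_w, frame_h, models):
    """Sketch of steps 403-408: compute the aspect ratio, pick the model
    for its interval, rescale only when the ratio differs from the model's
    reference ratio, then classify."""
    ratio = frame_w / frame_h                      # step 403
    if ratio > 1.5:                                # step 404: pick interval
        model, ref = models["model_2"], 2.0
    elif ratio < 0.75:
        model, ref = models["model_3"], 0.5
    else:
        model, ref = models["model_1"], 1.0
    if ratio != ref:                               # step 405: ratio check
        frame_w = frame_h * ref                    # step 407: crude rescale
    return model(frame_w, frame_h)                 # steps 406/408: classify

# Dummy stand-ins for the three classification models (placeholders that
# just report which model ran and the aspect ratio it received).
models = {m: (lambda name: lambda w, h: (name, w / h))(m)
          for m in ("model_1", "model_2", "model_3")}
print(classify_frame(300, 100, models))   # ('model_2', 2.0)
```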
Optionally, after step 403 is executed, the following steps may also be executed:
(11) the mobile terminal determines whether the first aspect ratio falls into a specific aspect ratio section corresponding to the specific target detection network model, wherein the specific aspect ratio section belongs to the target aspect ratio section;
(12) if the first aspect ratio falls within the specific aspect ratio section, the step of determining the target classification detection model corresponding to the target aspect ratio section according to the correspondence relationship between the aspect ratio section and the classification detection model is performed.
Optionally, in a case where the first aspect ratio does not fall within the specific aspect ratio section, it is determined that the first object detection frame does not include the specific object.
In the embodiment of the application, if the target to be detected contained in the image to be classified is a certain specific target, a specific target detection network model dedicated to detecting that specific target can be adopted. For example, when the specific target is a human face, a dedicated face detection network model may be used. Since the aspect ratio of a specific target is generally confined to a fixed range, if the first target detection frame contains a specific target of a certain type, the first aspect ratio falls within the specific aspect ratio section, and a specific target classification detection model may be used for classifying it. If the first aspect ratio of the first target detection frame does not fall within the specific aspect ratio section, it is determined that the first target detection frame does not contain the specific target, and subsequent classification processing is not performed.
According to the embodiment of the application, a specific target can be identified and classified, and the accuracy of identifying and classifying the specific target is improved.
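The specific-target gating of steps (11)-(12) can be sketched as below. The (0.7, 0.9) interval is an invented example of a specific aspect ratio section (e.g. a plausible w/h range for faces); it is not given in the source, and the function name is a placeholder.

```python
def gate_specific_target(aspect_ratio, specific_interval=(0.7, 0.9)):
    """Only run the specific-target classifier when the frame's aspect
    ratio falls inside the specific aspect ratio section; otherwise the
    frame is judged not to contain the specific target and classification
    is skipped. The default interval bounds are illustrative only."""
    lo, hi = specific_interval
    if lo <= aspect_ratio <= hi:
        return "classify with specific model"   # step (12)
    return "no specific target"                 # skip classification

print(gate_specific_target(0.8))   # classify with specific model
print(gate_specific_target(1.6))   # no specific target
```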
Optionally, if the image to be classified does not include any target detection frame, the next image to be classified is obtained for detection and classification.
According to the embodiment of the application, the target classification detection model is determined according to the aspect ratio of the first target detection frame contained in the image to be classified; different classification detection models can be selected for different aspect ratios, and the detected target detection frames with different aspect ratios are scaled and then sent to the corresponding classification networks for identification, thereby improving the accuracy of identifying and classifying the target detection frames.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It can be understood that, in order to implement the above functions, the mobile terminal includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in hardware, or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the embodiment of the present application, the mobile terminal may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In accordance with the above, referring to fig. 6, fig. 6 is a schematic structural diagram of an image classification apparatus provided in an embodiment of the present application and applied to a mobile terminal. The image classification apparatus 600 may include an obtaining unit 601, a calculating unit 602, a determining unit 603 and a classifying unit 604, where:
an obtaining unit 601, configured to obtain a first target detection frame included in an image to be classified;
a calculating unit 602, configured to calculate a first aspect ratio of the first target detection frame;
a determining unit 603, configured to determine a target aspect ratio section into which the first aspect ratio falls, and determine a target classification detection model corresponding to the target aspect ratio section according to a correspondence between the aspect ratio section and the classification detection model;
the classifying unit 604 is configured to input the first target detection box into the target classification detection model for classification, so as to obtain a classification result of the first target detection box.
Optionally, the image classification apparatus 600 may further include a detection unit 605.
The detecting unit 605 is configured to detect whether the first aspect ratio is equal to a target reference aspect ratio corresponding to the target aspect ratio section;
the classifying unit 604 is further configured to, if the first aspect ratio is equal to the target reference aspect ratio, input the first target detection box into the target classification detection model for classification, so as to obtain a classification result of the first target detection box.
Optionally, the image classification apparatus 600 may further include a scaling processing unit 606.
A scaling unit 606, configured to, if the first aspect ratio is not equal to the target reference aspect ratio, perform scaling processing on the first target detection frame to obtain a scaled first target detection frame, where an aspect ratio of the scaled first target detection frame is the target reference aspect ratio;
the classifying unit 604 is further configured to input the scaled first target detection box into the target classification detection model for classification, so as to obtain a classification result of the first target detection box.
Optionally, the classifying unit 604 inputs the scaled first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame, specifically: the classifying unit 604 inputs the scaled first target detection frame into a trained target classification detection model to perform convolution operations, pooling operations and classification operations, so as to obtain the classification result of the first target detection frame.
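The convolution, pooling, and classification operations mentioned above can be illustrated with a minimal NumPy forward pass. This is not the application's trained model: the kernel and classifier weights are random, and the shapes (an 8x8 grayscale crop, 4 classes) are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid-mode 2-D convolution (cross-correlation, as in CNNs)."""
    h, w = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

def classify(x, weights):
    """Flatten the pooled feature map and apply a softmax classifier."""
    logits = x.ravel() @ weights
    e = np.exp(logits - logits.max())
    return e / e.sum()

crop = rng.random((8, 8))          # stands in for the scaled detection frame
feat = max_pool(np.maximum(conv2d(crop, rng.random((3, 3))), 0))  # conv+ReLU+pool
probs = classify(feat, rng.random((feat.size, 4)))   # 4 example classes
print(probs.round(3))
```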
Optionally, the obtaining unit 601 is further configured to obtain an image to be classified before obtaining a first target detection frame included in the image to be classified;
the obtaining unit 601 obtains a first target detection frame included in the image to be classified, specifically: the obtaining unit 601 inputs the image to be classified into a target detection network model, and identifies at least one target detection frame included in the image to be classified through the target detection network model, where the first target detection frame is any one of the at least one target detection frame included in the image to be classified.
The determining unit 603 is further configured to determine whether the first aspect ratio falls into a specific aspect ratio section corresponding to the specific target detection network model, where the specific aspect ratio section belongs to the target aspect ratio section;
the determining unit 603 is further configured to determine, if the first aspect ratio falls in the specific aspect ratio section, a target classification detection model corresponding to the target aspect ratio section according to a correspondence relationship between the aspect ratio section and the classification detection model.
The determining unit 603 is further configured to determine that the first object detection frame does not include a specific object if the first aspect ratio does not fall within the specific aspect ratio section.
The obtaining unit 601, the calculating unit 602, the determining unit 603, the classifying unit 604, the detecting unit 605 and the scaling processing unit 606 may correspond to a processor of the mobile terminal.
In the embodiment of the application, the target classification detection model corresponding to the first target detection frame is determined according to the aspect ratio of the first target detection frame contained in the image to be classified, different classification detection models can be determined according to different aspect ratios, and the detected target detection frames with different length-width ratios are respectively sent to different classification networks for identification, so that the accuracy of identification and classification of the target detection frames is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a mobile terminal according to an embodiment of the present disclosure. As shown in fig. 7, the mobile terminal 700 includes a processor 701 and a memory 702, and the processor 701 and the memory 702 may be connected to each other through a communication bus 703. The communication bus 703 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 703 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean there is only one bus or one type of bus. The memory 702 is used for storing a computer program comprising program instructions, and the processor 701 is configured to invoke the program instructions, the program comprising instructions for performing the methods shown in fig. 1 to 4.
The processor 701 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
The memory 702 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via the bus, or may be integrated with the processor.
In addition, the mobile terminal 700 may further include a camera, a display, a communication interface, an antenna, and other common components, which are not described in detail herein.
In the embodiment of the application, the target classification detection model corresponding to the first target detection frame is determined according to the aspect ratio of the first target detection frame contained in the image to be classified, different classification detection models can be determined according to different aspect ratios, and the detected target detection frames with different length-width ratios are respectively sent to different classification networks for identification, so that the accuracy of identification and classification of the target detection frames is improved.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the image classification methods as described in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to execute part or all of the steps of any one of the image classification methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing has described the embodiments of the present application in detail to illustrate the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An image classification method, comprising:
acquiring a first target detection frame contained in an image to be classified, and calculating a first aspect ratio of the first target detection frame;
determining a target aspect ratio section in which the first aspect ratio falls, and determining a target classification detection model corresponding to the target aspect ratio section according to the correspondence between the aspect ratio section and the classification detection model;
and inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
2. The method of claim 1, wherein after determining the target classification detection model corresponding to the target aspect ratio section according to the correspondence between the aspect ratio section and the classification detection model, the method further comprises:
detecting whether the first aspect ratio is equal to a target reference aspect ratio corresponding to the target aspect ratio section;
and under the condition that the first aspect ratio is equal to the target reference aspect ratio, performing the step of inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
3. The method of claim 2, further comprising:
under the condition that the first aspect ratio is not equal to the target reference aspect ratio, performing scaling processing on the first target detection frame to obtain a scaled first target detection frame, wherein the aspect ratio of the scaled first target detection frame is the target reference aspect ratio;
and inputting the scaled first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
4. The method according to claim 3, wherein the inputting the scaled first target detection box into the target classification detection model for classification to obtain a classification result of the first target detection box comprises:
and inputting the scaled first target detection frame into a trained target classification detection model to perform convolution operation, pooling operation and classification operation to obtain a classification result of the first target detection frame.
5. The method according to any one of claims 1 to 4, wherein before the obtaining of the first target detection frame included in the image to be classified, the method further comprises:
acquiring an image to be classified;
the acquiring of the first target detection frame included in the image to be classified includes:
inputting the image to be classified into a target detection network model, and identifying at least one target detection frame contained in the image to be classified through the target detection network model, wherein the first target detection frame is any one of the at least one target detection frame contained in the image to be classified.
6. The method of claim 5, wherein the object detection network model comprises a specific object detection network model, and wherein after calculating the first aspect ratio of the first object detection box, the method further comprises:
determining whether the first aspect ratio falls into a specific aspect ratio section corresponding to the specific target detection network model, the specific aspect ratio section belonging to the target aspect ratio section;
and if the first aspect ratio falls within the specific aspect ratio section, performing the step of determining a target classification detection model corresponding to the target aspect ratio section according to a correspondence relationship between the aspect ratio section and the classification detection model.
7. The method of claim 6, further comprising:
determining that the first object detection frame does not include a specific object in a case where the first aspect ratio does not fall within the specific aspect ratio section.
8. An image classification apparatus, comprising:
the device comprises an acquisition unit, a classification unit and a classification unit, wherein the acquisition unit is used for acquiring a first target detection frame contained in an image to be classified;
a calculation unit configured to calculate a first aspect ratio of the first target detection frame;
the determining unit is used for determining a target aspect ratio section into which the first aspect ratio falls, and determining a target classification detection model corresponding to the target aspect ratio section according to the corresponding relation between the aspect ratio section and the classification detection model;
and the classification unit is used for inputting the first target detection frame into the target classification detection model for classification to obtain a classification result of the first target detection frame.
9. A mobile terminal comprising a processor and a memory, the memory for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 7.
CN201910906770.5A 2019-09-24 2019-09-24 Image classification method and device, mobile terminal and storage medium Active CN110674873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906770.5A CN110674873B (en) 2019-09-24 2019-09-24 Image classification method and device, mobile terminal and storage medium


Publications (2)

Publication Number Publication Date
CN110674873A true CN110674873A (en) 2020-01-10
CN110674873B CN110674873B (en) 2022-05-27

Family

ID=69078660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906770.5A Active CN110674873B (en) 2019-09-24 2019-09-24 Image classification method and device, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN110674873B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291749A (en) * 2020-01-20 2020-06-16 深圳市优必选科技股份有限公司 Gesture recognition method and device and robot
CN111310808A (en) * 2020-02-03 2020-06-19 平安科技(深圳)有限公司 Training method and device of picture recognition model, computer system and storage medium
CN112505049A (en) * 2020-10-14 2021-03-16 上海互觉科技有限公司 Mask inhibition-based method and system for detecting surface defects of precision components
CN115033154A (en) * 2021-02-23 2022-09-09 北京小米移动软件有限公司 Thumbnail generation method, thumbnail generation device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392214A (en) * 2017-07-26 2017-11-24 成都快眼科技有限公司 A kind of object detection method based on full convolution splitting network
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108520197A (en) * 2018-02-28 2018-09-11 中国航空工业集团公司洛阳电光设备研究所 A kind of Remote Sensing Target detection method and device
CN108681718A (en) * 2018-05-20 2018-10-19 北京工业大学 A kind of accurate detection recognition method of unmanned plane low target
CN109508636A (en) * 2018-10-08 2019-03-22 百度在线网络技术(北京)有限公司 Vehicle attribute recognition methods, device, storage medium and electronic equipment
CN109886230A (en) * 2019-02-28 2019-06-14 中南大学 A kind of image object detection method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291749A (en) * 2020-01-20 2020-06-16 Shenzhen UBTECH Technology Co., Ltd. Gesture recognition method and device and robot
CN111291749B (en) * 2020-01-20 2024-04-23 Shenzhen UBTECH Technology Co., Ltd. Gesture recognition method and device and robot
CN111310808A (en) * 2020-02-03 2020-06-19 Ping An Technology (Shenzhen) Co., Ltd. Training method and device of picture recognition model, computer system and storage medium
WO2021155650A1 (en) * 2020-02-03 2021-08-12 Ping An Technology (Shenzhen) Co., Ltd. Image recognition model training method and apparatus, computer system, and storage medium
CN111310808B (en) * 2020-02-03 2024-03-22 Ping An Technology (Shenzhen) Co., Ltd. Training method and device for picture recognition model, computer system and storage medium
CN112505049A (en) * 2020-10-14 2021-03-16 Shanghai Hujue Technology Co., Ltd. Mask suppression-based method and system for detecting surface defects of precision components
CN112505049B (en) * 2020-10-14 2021-08-03 Shanghai Hujue Technology Co., Ltd. Mask suppression-based method and system for detecting surface defects of precision components
CN115033154A (en) * 2021-02-23 2022-09-09 Beijing Xiaomi Mobile Software Co., Ltd. Thumbnail generation method, thumbnail generation device and storage medium

Also Published As

Publication number Publication date
CN110674873B (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN110674873B (en) Image classification method and device, mobile terminal and storage medium
CN111935479B (en) Target image determination method and device, computer equipment and storage medium
CN110667593B (en) Deep learning-based driving reminder method, device, equipment and storage medium
CN109559303B (en) Method and device for identifying calcification points and computer-readable storage medium
CN111814776B (en) Image processing method, device, server and storage medium
CN112418135A (en) Human behavior recognition method and device, computer equipment and readable storage medium
CN111861938B (en) Image denoising method and device, electronic equipment and readable storage medium
CN111612000B (en) Commodity classification method and device, electronic equipment and storage medium
CN111680546A (en) Attention detection method, attention detection device, electronic equipment and storage medium
CN114494775A (en) Video segmentation method, device, equipment and storage medium
CN111080665B (en) Image frame recognition method, device, equipment and computer storage medium
CN115457466A (en) Inspection video-based hidden danger detection method and system and electronic equipment
CN109711287B (en) Face acquisition method and related product
CN113807166A (en) Image processing method, device and storage medium
CN112333537B (en) Video integration method, device and computer readable storage medium
CN113129298A (en) Sharpness recognition method for text images
CN112084874B (en) Object detection method and device and terminal equipment
CN110008907B (en) Age estimation method and device, electronic equipment and computer readable medium
CN109685069B (en) Image detection method, device and computer readable storage medium
CN109543565B (en) Quantity determination method and device
CN113705587A (en) Image quality scoring method, device, storage medium and electronic equipment
CN111985423A (en) Living body detection method, living body detection device, living body detection equipment and readable storage medium
CN110785769A (en) Face gender identification method, and training method and device of face gender classifier
CN109740671B (en) Image identification method and device
CN111160353A (en) License plate recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240424

Address after: Room A039, Room 801, No. 190 Kaitai Avenue, Huangpu District, Guangzhou City, Guangdong Province, 510700

Patentee after: Guangzhou Xinguang Enterprise Management Consulting Co.,Ltd.

Country or region after: China

Address before: No. 18 Haibin Road, Wusha, Chang'an Town, Dongguan City, Guangdong Province, 523860

Patentee before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Country or region before: China
