CN108256555B

CN108256555B - Image content identification method and device and terminal

Info

Publication number: CN108256555B
Application number: CN201711394566.7A
Authority: CN
Inventors: 张志伟; 杨帆
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2020-10-16
Anticipated expiration: 2037-12-21
Also published as: CN108256555A

Abstract

The embodiment of the invention provides an image content identification method, an image content identification device and a terminal, wherein the method comprises the following steps: inputting a sample image into a convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for carrying out iterative training on the convolutional neural network; determining a number of trained iterations for the convolutional neural network; based on the trained iteration times, adjusting a loss function to obtain a target loss function; performing iterative training according to the target loss function to obtain a target convolutional neural network; and identifying the content of the image to be identified through the target convolutional neural network. The convolutional neural network training scheme provided by the embodiment of the invention can better fit the distribution of complex image samples, and reduce the number of sample images with intermediate probability value distribution, thereby increasing the recall rate of the samples under the condition of ensuring the accuracy rate of the recognition result of the convolutional neural network.

Description

Image content identification method and device and terminal

Technical Field

The present invention relates to the field of image recognition technologies, and in particular, to an image content recognition method, an image content recognition device, and a terminal.

Background

Deep learning is widely applied in video images, speech recognition, natural language processing and other related fields. The convolutional neural network is used as an important branch of deep learning, and due to the ultra-strong fitting capability and the end-to-end global optimization capability of the convolutional neural network, the accuracy of a prediction result obtained in computer vision tasks such as target detection and classification is greatly improved.

In practical applications, however, the results produced by convolutional neural networks are not generally used directly. Taking a binary classification task as an example, a convolutional neural network will give its probability in a certain class for an input data. The probability threshold is set according to a specific application scenario, and a higher threshold is usually set to obtain a higher accuracy, but the recall rate of the image sample is correspondingly reduced, and obviously, the accuracy rate of the recognition result is inversely proportional to the recall rate of the image sample. Therefore, the technical problems urgently needed to be solved by the technical personnel in the field are as follows: how to increase the recall rate of the sample under the condition of ensuring the accuracy rate of the identification result of the convolutional neural network.

Disclosure of Invention

The embodiment of the invention provides an image content identification method, an image content identification device and a terminal, and aims to solve the problem that in the prior art, the accuracy of a convolutional neural network identification result and the recall rate of a sample cannot be considered at the same time.

According to an aspect of the present invention, there is provided an image content recognition method, the method including: inputting a sample image into a convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for carrying out iterative training on the convolutional neural network; determining a number of trained iterations for the convolutional neural network; based on the trained iteration times, adjusting a loss function to obtain a target loss function; performing iterative training according to the target loss function to obtain a target convolutional neural network; and identifying the content of the image to be identified through the target convolutional neural network.

Optionally, the step of adjusting a preset loss function to obtain a target loss function based on the trained iteration number includes: extracting a preset loss function, and judging whether the number of trained iterations is greater than a first preset number; if not, adjusting the hyperparameter in the preset loss function to be 0 to obtain a target loss function; and if so, adjusting the hyperparameter in the preset loss function to a preset value to obtain a target loss function.

Optionally, the preset loss function is as follows:

sin FocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

wherein p is_tIs a probability value, gamma is a hyper-parameter, i is an iteration number upper limit value, and s is a trained iteration number;

optionally, the step of adjusting a preset loss function to obtain a target loss function based on the trained iteration number includes: determining an upper limit value of the iteration times; and substituting the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

Optionally, the step of performing the iterative training according to the target loss function includes: determining a characteristic map corresponding to the sample image through the convolutional neural network; carrying out average pooling on the feature maps, and carrying out dimension reduction processing on the feature maps after the average pooling to obtain feature vectors; the first feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value; calculating an average loss value of the convolutional neural network based on the target loss function; and calculating partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and updating model parameters corresponding to the convolutional neural network according to the gradient values.

According to another aspect of the present invention, there is provided an image content recognition apparatus, the apparatus including: the input module is configured to input a sample image into the convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for performing iterative training on the convolutional neural network; a determination module configured to determine a number of trained iterations to the convolutional neural network; a loss function adjusting module configured to adjust a loss function to obtain a target loss function based on the trained iteration number; the training module is configured to perform the iterative training according to the target loss function to obtain a target convolutional neural network; and the prediction module is configured to identify the content of the image to be identified through the target convolutional neural network.

Optionally, the loss function adjusting module includes: the extraction submodule is configured to extract a preset loss function and judge whether the number of trained iterations is greater than a first preset number; the first adjusting submodule is configured to adjust the hyperparameter in the preset loss function to 0 if the target loss function is not the preset loss function, and obtain a target loss function; and the second adjusting submodule is configured to adjust the hyperparameter in the preset loss function to a preset value if the target loss function is in the preset state, so as to obtain the target loss function.

Optionally, the preset loss function is as follows:

sin FocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

optionally, the loss function adjusting module includes: an upper limit value determination submodule configured to determine an upper limit value of the number of iterations; and the substitution submodule is configured to substitute the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

Optionally, the training module comprises: a feature map determination submodule configured to determine a feature map corresponding to the sample image through the convolutional neural network; the processing submodule is configured to perform average pooling on the feature maps, and perform dimension reduction processing on the feature maps after the average pooling to obtain feature vectors; the first feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value; a calculation sub-module configured to calculate an average loss value of the convolutional neural network based on the target loss function; and the updating submodule is configured to calculate the partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and update the model parameters corresponding to the convolutional neural network according to the gradient values.

According to still another aspect of the present invention, there is provided a terminal including: the image content recognition method comprises a memory, a processor and an image content recognition program which is stored on the memory and can run on the processor, wherein the image content recognition program realizes any step of the image content recognition method when being executed by the processor.

According to still another aspect of the present invention, there is provided a computer-readable storage medium having stored thereon an image content recognition program which, when executed by a processor, implements the steps of any one of the image content recognition methods described in the present invention.

Compared with the prior art, the invention has the following advantages:

according to the image content identification scheme provided by the embodiment of the invention, when the convolutional neural network is subjected to iterative training according to the input sample image, the target loss function used for iterative training each time is dynamically adjusted based on the iteration times, so that the distribution of complex image samples can be better fitted, the number of sample images with intermediate probability value distribution is reduced, and the recall rate of the samples is increased under the condition of ensuring the accuracy rate of the identification result of the convolutional neural network.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

fig. 1 is a flowchart illustrating steps of an image content recognition method according to a first embodiment of the present invention;

FIG. 2 is a flowchart illustrating steps of an image content recognition method according to a second embodiment of the present invention;

fig. 3 is a block diagram of an image content recognition apparatus according to a third embodiment of the present invention;

fig. 4 is a block diagram of a terminal according to a fourth embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Example one

Referring to fig. 1, a flowchart illustrating steps of an image content identification method according to a first embodiment of the present invention is shown.

The image content identification method of the embodiment of the invention can comprise the following steps:

step 101: in the process of training the convolutional neural network, a sample image is input into the convolutional neural network.

Wherein the sample image is used for iterative training of the convolutional neural network.

The convolutional neural network of the embodiment of the invention can be a multi-classification content identification model and can identify the class of the image; models can also be identified for the binary content, and whether an image belongs to a certain category can be identified. After the convolutional neural network modeling is completed, a large number of sample images are required to be adopted for carrying out multiple iterative training on the convolutional neural network modeling, so that the convergence of the convolutional neural network is ensured, and the accuracy of a prediction result is ensured. The specific process of training the convolutional neural network through each sample image is the same, and in the embodiment of the invention, one iterative training of the convolutional neural network is performed by inputting one sample image as an example.

Step 102: the number of trained iterations to the convolutional neural network is determined.

When the convolutional neural network is trained, iterative training needs to be carried out for a plurality of times, and iterative training needs to be carried out once when a sample image is input each time. During training, the system accumulatively records the training iteration times of the convolutional neural network, and the recorded training iteration times are used for adjusting the loss function during the next iterative training.

For example: before step 101 is executed, 100 times of iterative training are performed on the convolutional neural network, and in step 101, the 101 st iterative training is performed on the input sample image, so that the number of trained iterations is determined to be 100.

Step 103: and adjusting the loss function to obtain a target loss function based on the trained iteration times.

An adjustable variable is set in the loss function, and the adjustable variable changes along with the change of the iteration times. Therefore, the dynamic adjustment of the target loss function along with the change of the iteration times is realized.

The target loss function used for iterative training each time is dynamically adjusted based on the iteration times, the gradient of a simple sample in parameter training can be reduced, the distribution of a complex image sample can be better fitted, and the number of sample images with intermediate probability value distribution is reduced.

Step 104: and carrying out iterative training according to the target loss function to obtain the target convolutional neural network.

For the specific process of performing iterative training on the convolutional neural network based on the target loss function, the parameter correlation technique is only required, and this is not specifically limited in the embodiment of the present invention. The convergence degree of the convolutional neural network can be detected through the target loss function, and the model parameters corresponding to the convolutional neural network can be updated through the gradient values obtained through calculation of the target loss function.

And repeatedly executing a plurality of times of convolutional neural network iterative training, generating a target convolutional neural network when the convolutional neural network converges to a preset degree, and subsequently identifying the image content through the target convolutional neural network.

Step 105: and identifying the content of the image to be identified through the target convolutional neural network.

If the target convolutional neural network is a two-classification recognition model, after content recognition is carried out on the image to be recognized, a result indicating whether the image to be recognized is a specific-class image or not can be output.

Among other things, the target convolutional neural network may be trained to recognize any particular class of images, such as: and the image of the A category and the target convolutional neural network can identify whether the image to be identified belongs to the image of the A category or not.

According to the image content identification method provided by the embodiment of the invention, when the convolutional neural network is subjected to iterative training according to the input sample image, the target loss function used for iterative training each time is dynamically adjusted based on the iteration times, so that the distribution of complex image samples can be better fitted, the number of sample images with intermediate probability value distribution is reduced, and the recall rate of the samples is increased under the condition of ensuring the accuracy rate of the identification result of the convolutional neural network.

Example two

Referring to fig. 2, a flowchart illustrating steps of an image content identification method according to a second embodiment of the present invention is shown.

The image content identification method of the embodiment of the invention specifically comprises the following steps:

step 201: in the process of training the convolutional neural network, a sample image is input into the convolutional neural network.

Wherein the sample image is used for iterative training of the convolutional neural network. After the convolutional neural network modeling is completed, a large number of sample images are required to be adopted for carrying out multiple iterative training on the convolutional neural network modeling, so that the convergence of the convolutional neural network is ensured, and the accuracy of a prediction result is ensured. The specific process of training the convolutional neural network through each sample image is the same, and in the embodiment of the invention, one iterative training of the convolutional neural network is performed by inputting one sample image as an example.

Step 202: the number of trained iterations to the convolutional neural network is determined.

The system accumulates the times of each iterative training in the process of iterative training of the convolutional neural network. For example: the previous iterative training is 50 th, the current iterative training is 51 th, and the corresponding next iterative training is 52 th.

Step 203: and adjusting the loss function to obtain a target loss function based on the trained iteration times.

In the specific implementation process, a person skilled in the art can set different loss functions according to actual requirements. If the set loss functions are different, the specific adjustment mode of the loss functions based on the trained iteration times is also different. No matter how the loss function is set and adjusted, the iterative training of the convolutional neural network through the target loss function can be guaranteed, the extraction of simple samples in parameter training is reduced, and the model is better fitted to the distribution of complex samples.

Alternatively, the loss function may be preset to: FocallLoss ═ - (1-p)_t)^γlog(p_t) (ii) a Wherein p is_tGamma is a hyper-parameter, being a probability value. p is a radical of_tThe large sample image is a simple sample image, whereas p_tThe small sample image is a complex sample image.

Aiming at the preset loss function, based on the trained iteration times, when the preset loss function is adjusted to obtain a target loss function: extracting a preset loss function, and judging whether the number of trained iterations is greater than a first preset number; if not, adjusting the hyperparameter in the preset loss function to be 0 to obtain a target loss function; if yes, adjusting the super-parameter in the preset loss function to be a preset value, and obtaining the target loss function.

In the above manner of adjusting the preset loss function, γ is set to 0 at the initial stage of model training, and the model is trained for a period of time, when the model is almost converged, γ is adjusted to a preset value at this time, and the preset value is a value other than 0, in order to allow the model to better learn the distribution of the difficult samples. The method for adjusting the preset loss function can fit the distribution of complex samples, and reduces the number of sample images with distributed intermediate probability values. The disadvantage is that the hyper-parameter gamma has a sudden change from 0 to a preset value in the training process, which can have a great influence on the model parameters instantaneously.

The embodiment of the present invention further provides a preferably preset function, and the preferably preset loss function is as follows:

sin FocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

for the preferably preset loss function, based on the trained iteration number, the way of adjusting the preset loss function to obtain the target loss function is as follows: firstly, determining an upper limit value of iteration times; and substituting the upper limit value of the iteration times and the trained iteration times into a preset loss function to obtain a target loss function.

The upper limit value of the iteration number may be set by a person skilled in the art according to actual requirements, and is not particularly limited in the embodiment of the present invention. The method for preferably presetting the loss function and adjusting the preset loss function can be used for fitting the distribution of the complex samples and reducing the number of sample images with the distributed intermediate probability values. Since γ sin (2 π clip (s-i,0, i/2)/i) is gradual, the loss function variation does not have a large impact on the model parameters instantaneously.

Step 204: and determining a corresponding characteristic map of the sample image through a convolutional neural network.

In the embodiment of the invention, the sample image can be a single frame image in a video or can be only a multimedia image. An image is input into a convolutional neural network, and a characteristic map is obtained after the image passes through a convolutional layer or a pooling layer. As for a specific processing mode of inputting the sample image into the convolutional neural network to obtain the feature map, reference may be made to the existing related art, which is not specifically limited in the embodiment of the present invention.

Step 205: and carrying out average pooling on the feature maps, and carrying out dimension reduction processing on the feature maps after the average pooling to obtain feature vectors.

The feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value. The convolutional neural network comprises a plurality of labels, and the probability value is the matching degree of the sample image and the classification label. It should be noted that, if the convolutional neural network is a second classification content identification model, the convolutional neural network includes two classification labels, which are a label for indicating the classification and a label for indicating that the classification is not included; if the convolutional neural network is a multi-classification content identification model, the convolutional neural network comprises classification labels corresponding to the classifications.

Step 206: an average loss value of the convolutional neural network is calculated based on the target loss function.

When calculating the average loss value, firstly calculating the loss value corresponding to each point in the characteristic vector through a target loss function; and then calculating the average value of the loss values corresponding to each point to obtain an average loss value.

Whether the iterative training of the convolutional neural network can be ended or not can be judged through the average loss value. Specifically, judging whether the average loss value is smaller than a preset loss value; if yes, finishing the iterative training of the convolutional neural network without returning to the step 201 to input the sample image into the convolutional neural network; if not, returning to the step 201 to continue inputting the sample image into the convolutional neural network, and performing iterative training on the convolutional neural network until the average loss value is smaller than the preset loss value.

And if the average loss value is smaller than the preset loss value, determining that the convolutional neural network converges to the preset standard. The preset loss value may be set by a person skilled in the art according to actual requirements, and is not particularly limited in the embodiment of the present invention. The smaller the preset loss value is, the better the convergence of the convolutional neural network after the training is finished; the larger the preset loss value is, the easier the iterative training of the convolutional neural network is.

Step 207: and calculating partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and updating model parameters corresponding to the convolutional neural network according to the gradient values to obtain the target convolutional neural network.

Specifically, to calculate the objective loss function, the partial derivatives at each point in the eigenvector are scaled to obtain a gradient value. The iterative training of the convolutional neural network is substantially continuous updating of the model parameters until the convolutional neural network converges to a preset standard, and then the image content can be predicted.

It should be noted that, in a specific implementation process, step 207 is not limited to be performed after step 206, and may also be performed before step 206.

Step 208: and identifying the content of the image to be identified through the target convolutional neural network.

The target convolutional neural network obtained through training in the steps 201 to 207 can be better fitted to the distribution of the complex image samples, and the number of sample images with the intermediate probability value distribution is reduced. Therefore, when the image to be recognized is input into the target convolutional neural network for content recognition, an accurate recognition result can be obtained.

EXAMPLE III

Referring to fig. 3, a block diagram of an image content recognition apparatus according to a third embodiment of the present invention is shown.

The icon content recognition device of the embodiment of the invention can comprise: an input module 301, configured to input a sample image into a convolutional neural network in a process of training the convolutional neural network, wherein the sample image is used for iteratively training the convolutional neural network; a determination module 302 configured to determine a number of trained iterations for the convolutional neural network; a loss function adjusting module 303 configured to adjust a loss function to obtain a target loss function based on the trained iteration number; a training module 304, configured to perform the iterative training according to the target loss function to obtain a target convolutional neural network; a prediction module 305 configured to perform content recognition on the image to be recognized through the target convolutional neural network.

Preferably, the loss function adjusting module 303 may include: an extraction submodule 3031, configured to extract a preset loss function, and determine whether the number of trained iterations is greater than a first preset number; a first adjusting submodule 3032, configured to adjust the hyperparameter in the preset loss function to 0 if not, so as to obtain a target loss function; and the second adjusting submodule 3033 is configured to, if yes, adjust the hyperparameter in the preset loss function to a preset value, so as to obtain a target loss function.

Preferably, the preset loss function is as follows:

sin FocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

wherein p is_tThe value is a probability value, gamma is a hyperparameter i and is an upper limit value of the number of iterations, and s is the number of trained iterations;

preferably, the loss function adjusting module 303 may include: an upper limit value determination submodule 3034 configured to determine an upper limit value of the number of iterations; and a substitution submodule 3035 configured to substitute the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

Preferably, the training module 304 may include: a feature map determination submodule 3041 configured to determine, by the convolutional neural network, a feature map corresponding to the sample image; the processing submodule 3042 is configured to perform average pooling on the feature map, and perform dimension reduction processing on the feature map after the average pooling to obtain a feature vector; the first feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value; a calculation submodule 3043 configured to calculate an average loss value of the convolutional neural network based on the target loss function; the updating submodule 3044 is configured to calculate partial derivatives of the target loss function at each point in the feature vector to obtain a gradient value, and update the model parameter corresponding to the convolutional neural network according to the gradient value.

The image content identification device of the embodiment of the present invention is used to implement the corresponding icon content identification method in the first embodiment and the second embodiment, and has the corresponding beneficial effects as the method embodiment, which are not described herein again.

Example four

Referring to fig. 4, a block diagram of a terminal for icon content identification according to a fourth embodiment of the present invention is shown.

The terminal of the embodiment of the invention can comprise: the system comprises a memory, a processor and an icon content identification program which is stored on the memory and can run on the processor, wherein when the icon content identification program is executed by the processor, the steps of any icon content identification method in the invention are realized.

FIG. 4 is a block diagram illustrating a convolutional neural network training terminal 600, according to an example embodiment. For example, the terminal 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so forth.

Referring to fig. 4, terminal 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.

The processing component 602 generally controls overall operation of the terminal 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operations at the terminal 600. Examples of such data include instructions for any application or method operating on terminal 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power supply component 606 provides power to the various components of terminal 600. The power components 606 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 600.

The multimedia component 608 comprises a screen providing an output interface between the terminal 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 600 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the terminal 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing various aspects of status assessment for the terminal 600. For example, sensor component 614 can detect an open/closed state of terminal 600, relative positioning of components, such as a display and keypad of terminal 600, change in position of terminal 600 or a component of terminal 600, presence or absence of user contact with terminal 600, orientation or acceleration/deceleration of terminal 600, and temperature change of terminal 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communications between the terminal 600 and other devices in a wired or wireless manner. The terminal 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the terminal 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing a convolutional neural network training method, and in particular, an image content recognition method comprising: inputting a sample image into a convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for carrying out iterative training on the convolutional neural network; determining a number of trained iterations for the convolutional neural network; based on the trained iteration times, adjusting a loss function to obtain a target loss function; performing iterative training according to the target loss function to obtain a target convolutional neural network; and identifying the content of the image to be identified through the target convolutional neural network.

Preferably, the step of adjusting a preset loss function to obtain a target loss function based on the trained iteration number includes: extracting a preset loss function, and judging whether the number of trained iterations is greater than a first preset number; if not, adjusting the hyperparameter in the preset loss function to be 0 to obtain a target loss function; and if so, adjusting the hyperparameter in the preset loss function to a preset value to obtain a target loss function.

Preferably, the preset loss function is as follows:

sin FocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

preferably, the step of adjusting a preset loss function to obtain a target loss function based on the trained iteration number includes: determining an upper limit value of the iteration times; and substituting the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

Preferably, the step of performing an iterative training according to the objective loss function includes: determining a characteristic map corresponding to the sample image through the convolutional neural network; carrying out average pooling on the feature maps, and carrying out dimension reduction processing on the feature maps after the average pooling to obtain feature vectors; the first feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value; calculating an average loss value of the convolutional neural network based on the target loss function; and calculating partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and updating model parameters corresponding to the convolutional neural network according to the gradient values.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the terminal 600 to perform the convolutional neural network training method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The instructions in the storage medium, when executed by a processor of the terminal, enable the terminal to perform the steps of any one of the convolutional neural network training methods described in the present invention.

When the terminal provided by the embodiment of the invention carries out iterative training on the convolutional neural network according to the input sample image, the target loss function used for iterative training each time is dynamically adjusted based on the iteration times, the distribution of complex image samples can be better fitted, the number of sample images with intermediate probability value distribution is reduced, and thus the recall rate of the samples is increased under the condition of ensuring the accuracy rate of the recognition result of the convolutional neural network.

For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The image content recognition scheme provided herein is not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The structure required to construct a system incorporating aspects of the present invention will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an image content recognition scheme according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims

1. An image content recognition method, characterized in that the method comprises:

inputting a sample image into a convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for carrying out iterative training on the convolutional neural network;

determining a number of trained iterations for the convolutional neural network;

based on the trained iteration times, adjusting a loss function to obtain a target loss function;

performing iterative training according to the target loss function to obtain a target convolutional neural network;

identifying the content of the image to be identified through the target convolutional neural network;

the loss function is provided with an adjustable variable, and the target loss function is dynamically adjusted through the change of the adjustable variable along with the iteration times;

the step of adjusting a preset loss function to obtain a target loss function based on the trained iteration number comprises:

extracting a preset loss function, and judging whether the number of trained iterations is greater than a first preset number; if not, adjusting the hyperparameter in the preset loss function to be 0 to obtain a target loss function;

if so, adjusting the hyperparameter in the preset loss function to a preset value to obtain a target loss function;

the preset loss function is as follows:

sinFocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

2. the method of claim 1, wherein the step of adjusting a pre-set loss function to obtain a target loss function based on the number of trained iterations comprises:

determining an upper limit value of the iteration times;

and substituting the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

3. The method of claim 1 or 2, wherein the step of performing an iterative training according to the objective loss function comprises:

determining a characteristic map corresponding to the sample image through the convolutional neural network;

carrying out average pooling on the feature maps, and carrying out dimension reduction processing on the feature maps after the average pooling to obtain feature vectors; the feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value;

calculating an average loss value of the convolutional neural network based on the target loss function;

and calculating partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and updating model parameters corresponding to the convolutional neural network according to the gradient values.

4. An image content recognition apparatus, characterized in that the apparatus comprises:

the input module is configured to input a sample image into the convolutional neural network in the process of training the convolutional neural network, wherein the sample image is used for performing iterative training on the convolutional neural network;

a determination module configured to determine a number of trained iterations to the convolutional neural network;

a loss function adjusting module configured to adjust a loss function to obtain a target loss function based on the trained iteration number;

the training module is configured to perform the iterative training according to the target loss function to obtain a target convolutional neural network;

the prediction module is configured to identify the content of the image to be identified through the target convolutional neural network;

the loss function adjustment module includes:

the extraction submodule is configured to extract a preset loss function and judge whether the number of trained iterations is greater than a first preset number;

the first adjusting submodule is configured to adjust the hyperparameter in the preset loss function to 0 if the target loss function is not the preset loss function, and obtain a target loss function;

the second adjusting submodule is configured to adjust the super-parameter in the preset loss function to a preset value if the super-parameter in the preset loss function is in the preset value, so that a target loss function is obtained;

the preset loss function is as follows:

sinFocallLoss＝-(1-p_t)^{γsin(2π*clip(s-i,0,i/2)/i)}log(p_t)

5. the apparatus of claim 4, wherein the loss function adjustment module comprises:

an upper limit value determination submodule configured to determine an upper limit value of the number of iterations;

and the substitution submodule is configured to substitute the iteration number upper limit value and the trained iteration number into the preset loss function to obtain a target loss function.

6. The apparatus of claim 4 or 5, wherein the training module comprises:

a feature map determination submodule configured to determine a feature map corresponding to the sample image through the convolutional neural network;

the processing submodule is configured to perform average pooling on the feature maps, and perform dimension reduction processing on the feature maps after the average pooling to obtain feature vectors; the feature vector comprises a plurality of points, and each point corresponds to a classification label in the convolutional neural network and a probability value;

a calculation sub-module configured to calculate an average loss value of the convolutional neural network based on the target loss function;

and the updating submodule is configured to calculate the partial derivatives of the target loss function at each point in the feature vector to obtain gradient values, and update the model parameters corresponding to the convolutional neural network according to the gradient values.

7. A terminal, comprising: memory, processor and image content recognition program stored on the memory and executable on the processor, which image content recognition program, when executed by the processor, carries out the steps of the image content recognition method according to any one of claims 1 to 3.

8. A computer-readable storage medium, characterized in that an image content identification program is stored on the computer-readable storage medium, which image content identification program, when executed by a processor, implements the steps of the image content identification method according to any one of claims 1 to 3.