CN113569889A - Image recognition method based on artificial intelligence and related device

Image recognition method based on artificial intelligence and related device

Info

Publication number
CN113569889A
Authority
CN
China
Prior art keywords
image
label
target
attention
sample
Prior art date
Legal status
Pending
Application number
CN202110083832.4A
Other languages
Chinese (zh)
Inventor
习洋洋
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110083832.4A
Publication of CN113569889A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The application discloses an artificial intelligence based image recognition method and a related apparatus, applied to computer vision technology. An input image is acquired; the input image is input into an attention-guided target recognition network in a target model to obtain an attention map and an image feature map; the image feature map is then input into a hierarchical recognition network in the target model to obtain a first type label and a second type label. The method realizes an image hierarchical recognition process guided by the attention area: an enhanced image is obtained using the attention area in the attention map, so that the model concentrates on learning the key parts of the data; the result is presented through hierarchical recognition, the detailed parts of the image are identified, and the accuracy of image recognition is improved.

Description

Image recognition method based on artificial intelligence and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an artificial intelligence based image recognition method and a related apparatus.
Background
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
The identification of vulgar images is an important application of artificial intelligence. In the schemes provided by the related art, an end-to-end model is usually constructed from massive labeled data; that is, all processing of the picture to be identified is completed by the end-to-end model, which outputs a vulgar or non-vulgar result.
However, acquiring massive annotation data is time-consuming and labor-intensive, and the details of a vulgar picture may be missed, which affects the accuracy of image recognition.
Disclosure of Invention
In view of this, the present application provides an image recognition method based on artificial intelligence, which can effectively improve the accuracy of image recognition.
A first aspect of the present application provides an image recognition method based on artificial intelligence, which can be applied to a system or a program containing an image recognition function in a terminal device, and specifically includes:
acquiring an input image;
inputting the input image into a preset recognition network in a target model to obtain an attention map, wherein the attention map includes an attention area;
performing image adjustment on the attention map based on the attention area to obtain an enhanced image, and training the preset recognition network according to the enhanced image to obtain a target recognition network;
inputting the input image into the target recognition network to obtain an image feature map;
inputting the image feature map into a hierarchical recognition network in the target model to obtain a first type label and a second type label, wherein the hierarchical recognition network includes a primary label branch and a secondary label branch, the primary label branch is used to determine the first type label of the input image, and the secondary label branch is used to recognize the second type label of the input image; the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label.
Optionally, in some possible implementation manners of the present application, the performing image adjustment on the attention map based on the attention area to obtain an enhanced image, and training the preset recognition network according to the enhanced image to obtain a target recognition network includes:
covering the attention area to update the attention map to obtain a first adjusted image, and adjusting the label corresponding to the first adjusted image;
strengthening the weight parameters corresponding to the attention area to update the attention map to obtain a second adjusted image, and keeping the label corresponding to the second adjusted image unchanged;
and training the preset recognition network according to the first adjusted image and the second adjusted image to obtain the target recognition network.
Optionally, in some possible implementation manners of the present application, the training the preset recognition network according to the first adjusted image and the second adjusted image to obtain the target recognition network includes:
performing region perturbation based on the first adjusted image to generate a negative sample sequence;
performing weight parameter perturbation based on the second adjusted image to generate a positive sample sequence;
and training the preset recognition network according to the negative sample sequence and the positive sample sequence to obtain the target recognition network.
Optionally, in some possible implementations of the present application, the method further includes:
determining a primary attention label and a secondary attention label corresponding to the attention area;
constraining the region corresponding to the primary attention label and the region corresponding to the secondary attention label to obtain attention loss information;
and adjusting the parameters of the target recognition network according to the attention loss information.
Optionally, in some possible implementations of the present application, the method further includes:
acquiring primary label training data;
determining a classification loss in the primary label training data to train the primary label branch;
acquiring secondary label training data;
inputting the secondary label training data into a binary classifier to obtain secondary label positive samples and secondary label negative samples;
and training the secondary label branch based on the secondary label positive samples and the secondary label negative samples.
Optionally, in some possible implementation manners of the present application, the inputting of the secondary label training data into a binary classifier to obtain secondary label positive samples and secondary label negative samples includes:
determining a target sample in the secondary label training data;
performing sliding mean calculation on the batch data corresponding to the target sample to obtain dynamic threshold information, wherein the dynamic threshold information comprises a positive sample threshold and a negative sample threshold;
inputting the target sample into the binary classifier to obtain a predicted value;
comparing the predicted value with the dynamic threshold information to determine the secondary label positive samples and the secondary label negative samples in the secondary label training data.
Optionally, in some possible implementations of the present application, the comparing of the predicted value with the dynamic threshold information to determine the secondary label positive sample and the secondary label negative sample in the secondary label training data includes:
comparing the predicted value to a positive sample threshold in the dynamic threshold information;
if the predicted value is greater than the positive sample threshold value, determining that the target sample is the secondary label positive sample;
comparing the predicted value to a negative sample threshold in the dynamic threshold information;
and if the predicted value is smaller than the negative sample threshold value, determining the target sample as the secondary label negative sample.
Optionally, in some possible implementations of the present application, the method further includes:
if the predicted value is greater than the negative sample threshold value and the predicted value is less than the positive sample threshold value, determining the target sample as a noise sample;
setting the noise sample to not participate in training of the secondary label branch.
Optionally, in some possible implementations of the present application, the acquiring the input image includes:
acquiring an instant media data stream;
and extracting images from the media data stream according to a target time sequence to obtain the input images, the input images being published according to the target time sequence after they are identified.
Optionally, in some possible implementations of the present application, the method further includes:
extracting first key information in the first type label;
extracting second key information in the second type label;
associating the first key information with the second key information to obtain description information of the input image;
the input image is labeled based on the description information.
Optionally, in some possible implementations of the present application, the method further includes:
triggering a calling process of the input image in response to a target operation;
caching the input image based on the calling process, and identifying the mark of the input image;
and if the mark of the input image meets a preset condition, displaying the input image.
Optionally, in some possible implementations of the present application, the target model is used for the identification of vulgar images, the first type label is used to indicate the individual type of the target object, and the second type label is used to indicate the part type of the target object.
A second aspect of the present application provides an apparatus for image recognition, comprising:
an acquisition unit configured to acquire an input image;
an input unit, configured to input the input image into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area;
the adjusting unit is configured to perform image adjustment on the attention map based on the attention area to obtain an enhanced image, and to train the preset recognition network according to the enhanced image to obtain a target recognition network;
the input unit is further used for inputting the input image into the target recognition network to obtain an image feature map;
the identification unit is configured to input the image feature map into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a primary label branch and a secondary label branch, the primary label branch is used to determine the first type label of the input image, and the secondary label branch is used to recognize the second type label of the input image; the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label.
Optionally, in some possible implementations of the present application, the adjusting unit is specifically configured to mask the attention area, update the attention map to obtain a first adjusted image, and adjust a label corresponding to the first adjusted image;
the adjusting unit is specifically configured to strengthen the weight parameter corresponding to the attention area, update the attention map to obtain a second adjusted image, and keep a label corresponding to the second adjusted image unchanged;
the adjusting unit is specifically configured to train the preset recognition network according to the first adjusted image and the second adjusted image to obtain the target recognition network.
Optionally, in some possible implementations of the present application, the adjusting unit is specifically configured to perform region perturbation based on the first adjusted image to generate a negative sample sequence;
the adjusting unit is specifically configured to perform weight parameter perturbation based on the second adjusted image to generate a positive sample sequence;
the adjusting unit is specifically configured to train the preset recognition network according to the negative sample sequence and the positive sample sequence to obtain the target recognition network.
Optionally, in some possible implementations of the present application, the adjusting unit is specifically configured to determine a primary attention label and a secondary attention label corresponding to the attention area;
the adjusting unit is specifically configured to constrain the region corresponding to the primary attention label and the region corresponding to the secondary attention label to obtain attention loss information;
the adjusting unit is specifically configured to adjust the parameters of the target recognition network according to the attention loss information.
Optionally, in some possible implementation manners of the present application, the identification unit is specifically configured to obtain primary label training data;
the identification unit is specifically configured to determine a classification loss in the primary label training data, so as to train the primary label branch;
the identification unit is specifically used for acquiring secondary label training data;
the identification unit is specifically used for inputting the secondary label training data into a binary classifier to obtain secondary label positive samples and secondary label negative samples;
the identification unit is specifically configured to train the secondary label branch based on the secondary label positive sample and the secondary label negative sample.
Optionally, in some possible implementations of the present application, the identification unit is specifically configured to determine a target sample in the secondary label training data;
the identification unit is specifically configured to perform sliding mean calculation based on batch data corresponding to the target sample to obtain dynamic threshold information, where the dynamic threshold information includes a positive sample threshold and a negative sample threshold;
the identification unit is specifically configured to input the target sample into the binary classifier to obtain a predicted value;
the identification unit is specifically configured to compare the predicted value with the dynamic threshold information to determine the secondary label positive sample and the secondary label negative sample in the secondary label training data.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to compare the predicted value with a positive sample threshold in the dynamic threshold information;
if the predicted value is greater than the positive sample threshold value, determining that the target sample is the secondary label positive sample;
the identification unit is specifically configured to compare the predicted value with a negative sample threshold in the dynamic threshold information;
the identification unit is specifically configured to determine that the target sample is the secondary label negative sample if the predicted value is smaller than the negative sample threshold.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to determine that the target sample is a noise sample if the predicted value is greater than the negative sample threshold and the predicted value is less than the positive sample threshold;
the identification unit is specifically configured to set the noise sample to not participate in training of the secondary label branch.
Optionally, in some possible implementation manners of the present application, the obtaining unit is specifically configured to obtain an instant media data stream;
the acquiring unit is specifically configured to extract images from the media data stream according to a target time sequence to obtain the input images, the input images being published according to the target time sequence after they are identified.
Optionally, in some possible implementation manners of the present application, the identification unit is specifically configured to extract first key information from the first type label;
the identification unit is specifically configured to extract second key information from the second type label;
the identification unit is specifically configured to associate the first key information with the second key information to obtain the description information of the input image;
the identification unit is specifically configured to mark the input image based on the description information.
Optionally, in some possible implementations of the present application, the identification unit is specifically configured to trigger a calling process of the input image in response to a target operation;
the identification unit is specifically configured to cache the input image based on the calling process and identify a tag of the input image;
the identification unit is specifically configured to display the input image if the mark of the input image meets a preset condition.
A third aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system; the memory is used to store program code; and the processor is configured to perform the image recognition method of the first aspect or any implementation manner of the first aspect according to instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the image recognition method of the first aspect or any implementation manner of the first aspect.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the image recognition method provided in the first aspect or its various optional implementations.
According to the technical scheme, the embodiment of the application has the following advantages:
An input image is acquired; the input image is then input into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area; the attention map is adjusted based on the attention area to obtain an enhanced image, and the preset recognition network is trained according to the enhanced image to obtain a target recognition network; the input image is further input into the target recognition network to obtain an image feature map; and the image feature map is then input into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a primary label branch used to determine the first type label of the input image and a secondary label branch used to recognize the second type label of the input image, the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label. This realizes an image hierarchical recognition process guided by the attention area: the enhanced image obtained from the attention area in the attention map focuses the model on learning the key parts of the data, the result is presented through hierarchical recognition, the detailed parts of the image are identified, and the accuracy of image recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of a network architecture in which a system for image recognition operates;
FIG. 2 is a flowchart of image recognition according to an embodiment of the present application;
FIG. 3 is a flowchart of an artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 4 is a scene diagram of an artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 5 is a scene diagram of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 6 is a scene diagram of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 7 is a scene diagram of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 8 is a scene diagram of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 9 is a flowchart of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 10 is a flowchart of another artificial intelligence based image recognition method according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an artificial intelligence based image recognition method and a related apparatus, which can be applied to a system or a program containing an image recognition function in a terminal device. An input image is acquired; the input image is input into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area; the attention map is adjusted based on the attention area to obtain an enhanced image, and the preset recognition network is trained according to the enhanced image to obtain a target recognition network; the input image is further input into the target recognition network to obtain an image feature map; and the image feature map is then input into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a primary label branch used to determine the first type label of the input image and a secondary label branch used to recognize the second type label of the input image, the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label. This realizes an image hierarchical recognition process guided by the attention area: the enhanced image obtained from the attention area in the attention map focuses the model on learning the key parts of the data, the result is presented through hierarchical recognition, the detailed parts of the image are identified, and the accuracy of image recognition is improved.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms that may appear in the embodiments of the present application are explained.
Attention mechanism: a classification model develops different weight preferences for different areas during image classification, which can be represented by a heat map. The attention mechanism is learned by the classification model from massive data.
Fine-grained recognition: a sub-field of image classification. The general image classification problem classifies images into different broad categories as desired, such as lions, dogs, or planes. Fine-grained recognition distinguishes within a category; for example, face recognition is a special fine-grained recognition problem, in which the face of a specific person must be found among a large number of faces. A currently predominant fine-grained recognition dataset is CUB-200, which distinguishes different categories of birds.
Data noise: cases where an image is wrongly labeled because the annotator cannot judge its category during labeling, or where some label information in the image is missed, resulting in poor performance during model training.
It should be understood that the image recognition method provided by the present application may be applied to a system or a program containing an image recognition function in a terminal device, such as a vulgar image detection tool. Specifically, the image recognition system may operate in the network architecture shown in fig. 1, which is a network architecture diagram of the image recognition system. As can be seen from the figure, the image recognition system can serve multiple information sources: multimedia data is obtained through a triggering operation on the terminal side, and the multimedia data is then recognized on the terminal side or the server side so that vulgar images in it can be found and processed. It can be understood that fig. 1 shows various terminal devices, which may be computer devices; in an actual scene there may be more or fewer types of terminal devices participating in image recognition, the specific number and types depending on the actual scene and not being limited herein. In addition, fig. 1 shows one server, but in an actual scene multiple servers may participate, the specific number of servers depending on the actual scene.
In this embodiment, the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and may be connected to form a blockchain network, which is not limited herein.
It is understood that the above image recognition system may run on a personal mobile terminal, for example as an application such as a vulgar image detection tool; it may also run on a server, or on a third-party device, to provide image recognition and obtain the image recognition result for the information source. The specific image recognition system may run on these devices in the form of a program, may run as a system component in the device, or may serve as a cloud service program, the specific operation mode depending on the actual scene and not being limited herein.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
Artificial intelligence technology includes Computer Vision (CV) technology. Computer vision is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to perform machine vision tasks such as identification, tracking, and measurement of targets, and further performs image processing so that the processed image becomes more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
In the schemes provided by the related art, an end-to-end model is usually constructed from massive labeled data; that is, all processing of the picture to be identified is completed by the end-to-end model, which outputs a vulgar or non-vulgar result.
However, acquiring massive annotation data is time-consuming and labor-intensive, and the details of a vulgar picture may be missed, which affects the accuracy of image recognition.
In order to solve the above problems, the present application provides an artificial intelligence based image recognition method, which is applied to the image recognition flow framework shown in fig. 2. In the flow framework provided in the embodiment of the present application, a user obtains multimedia data through an interactive operation at the interface layer; the multimedia data is converted into an image input on the server side for image recognition, so as to obtain the hierarchical labels corresponding to the multimedia data and perform a vulgarity determination, which decides whether the multimedia data is published or shown on the terminal.
In this method, attention guidance is adopted in the image recognition process to raise the model's attention to the finely distinguishing areas of the image, ensuring the model's accuracy in the vulgar image recognition scene. In determining the hierarchical labels, deep learning models trained on massive data acquire a rough attention area and weak localization information for the image. However, the attention a classification model learns during training is essentially passive parameter learning; the present application introduces attention guidance to actively assist the classification model in learning the attention area, helping the model achieve a better effect on data with fine violation areas. For noise labels in the data, the present application judges the reliability of the data using per-category adaptive thresholds, and selectively learns from the data according to the model's judgment.
It is understood that the method provided by the present application may be a program written as processing logic in a hardware system, or may be an image recognition apparatus implementing that processing logic in an integrated or external manner. As one implementation, the image recognition apparatus acquires an input image; inputs the input image into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area; adjusts the attention map based on the attention area to obtain an enhanced image, and trains the preset recognition network according to the enhanced image to obtain a target recognition network; further inputs the input image into the target recognition network to obtain an image feature map; and then inputs the image feature map into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a primary label branch used to determine the first type label of the input image and a secondary label branch used to recognize the second type label of the input image, the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label. This realizes an image hierarchical recognition process guided by the attention area: the enhanced image obtained from the attention area in the attention map focuses the model on learning the key parts of the data, the result is presented through hierarchical recognition, the detailed parts of the image are identified, and the accuracy of image recognition is improved.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence, and is specifically explained by the following embodiment:
With reference to the above flow architecture, the image recognition method in the present application is described below. Please refer to fig. 3, which is a flowchart of an artificial intelligence based image recognition method provided in an embodiment of the present application. The method may be executed by a terminal device, a server, or both, and the embodiment of the present application includes at least the following steps:
301. An input image is acquired.
In this embodiment, the input image may be an image used for training the target model, carrying a corresponding feature label to facilitate the target model's learning of image feature extraction.
In a possible scenario, when the target model has already been trained, the input image may be an image from an instant data stream, for example newly refreshed friend-circle data, which is screened for vulgar information to guide the online publishing process.
Specifically, the present application is described taking a target model used for identifying vulgar images as an example: the first type label obtained by subsequent recognition indicates the individual type of the target object, and the second type label indicates the part type of the target object, i.e., the result of hierarchical recognition; the specific recognition scene depends on the actual situation.
It can be understood that, in the vulgar image recognition scene, vulgar recognition differs from pornographic recognition: pornographic data is red-line content such as genital exposure and sexual behavior, whereas vulgar data is content such as revealing clothing and sexiness. Pornographic data is comparatively easy to discriminate, while vulgar data is easily confused with normal data and is harder to recognize; the present application therefore combines attention guidance with hierarchical recognition.
Specifically, the vulgar data is further divided into labels: sexual cues, bare children, bare animals, bare artwork, female sexiness (first-level label) - chest (second-level label), female sexiness - leg, female sexiness - hip, female sexiness - figure, male sexiness, ACG sexiness, and so on. The specific label types depend on the actual scene and are not limited herein.
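The label hierarchy described above can be represented as a simple mapping; the following data structure is purely illustrative and not part of the patent:

```python
# Primary (first-level) labels mapped to their secondary (second-level) labels.
# Labels without finer parts map to an empty list. Illustrative only.
VULGAR_LABELS = {
    "sexual cues": [],
    "bare children": [],
    "bare animals": [],
    "bare artwork": [],
    "female sexiness": ["chest", "leg", "hip", "figure"],
    "male sexiness": [],
    "ACG sexiness": [],
}
```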
302. The input image is input into a preset recognition network in the target model to obtain the attention map.
In this embodiment, the attention map includes an attention area. The attention map, also called an attention heat map, shows image features with different weights through the shade of color in the map; for example, a vulgar image area is given a heavy weight and a correspondingly dark color. The corresponding attention area, i.e., a part of the input image, is obtained from the color distribution (weight distribution) in the attention map.
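As a minimal illustration (not part of the patent text), an attention area could be derived from the weight distribution of the attention map as sketched below; the normalization, the 0.5 cutoff, and the bounding-box representation are assumptions made for this example.

```python
import numpy as np

def attention_area(attention_map: np.ndarray, threshold: float = 0.5):
    """Return the bounding box (x0, y0, x1, y1) of the high-weight region.

    attention_map: 2-D array of per-pixel attention weights.
    threshold: fraction of the maximum weight above which a pixel is
    treated as part of the attention area (assumed value).
    """
    norm = attention_map / (attention_map.max() + 1e-8)  # scale weights to [0, 1]
    ys, xs = np.where(norm >= threshold)                 # high-attention pixels
    if len(xs) == 0:
        return None                                      # no salient region found
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```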
Specifically, the architecture of the target model in the present application is shown in fig. 4, which is a scene schematic diagram of another artificial intelligence based image recognition method provided in the embodiment of the present application. The input image is fed into the preset recognition network to obtain an attention map, and the image is then adjusted based on the attention map to obtain an enhanced image with which the preset recognition network is further trained. For the label recognition part, image feature maps are extracted by the trained target recognition network, the features are fused, and different task branches perform recognition.
It can be understood that multi-task hierarchical recognition is a recognition process for the same object at different granularities, as shown in fig. 5, a scene schematic diagram of another artificial intelligence based image recognition method provided by the embodiment of the present application. The figure shows the recognition object a1 corresponding to the primary label and the recognition object a2 corresponding to the secondary label; a2 is a part of a1. For example, a1 is a female body and a2 is a hip, thereby realizing hierarchical recognition and vulgar-scene determination.
303. Image adjustment is performed on the attention map based on the attention area to obtain an enhanced image, and the preset recognition network is trained according to the enhanced image to obtain the target recognition network.
In this embodiment, training the preset recognition network on the enhanced image is the attention-guidance process. Specifically, attention guidance is a data enhancement technique, i.e., the process shown in fig. 6, a scene schematic diagram of another artificial intelligence based image recognition method provided in this embodiment of the present application: the original image is further enhanced through the attention area learned by the current model (for example, by hiding the attention area or intensifying it), and the enhanced image is further learned (when the attention area is hidden, the label is set to normal; when the attention area is intensified, the label is kept unchanged). In this way, the model is actively assisted in learning the regions it needs to attend to in order to distinguish the different classes.
Specifically, in the image adjustment process, the attention area may first be masked to update the attention map and obtain a first adjusted image, and the label corresponding to the first adjusted image is adjusted; then the weight parameters corresponding to the attention area are strengthened to update the attention map and obtain a second adjusted image, and the label corresponding to the second adjusted image is kept unchanged; the preset recognition network is trained according to the first adjusted image and the second adjusted image to obtain the target recognition network. For example, fig. 7 is a scene schematic diagram of another artificial intelligence based image recognition method provided in the embodiment of the present application; it shows the attention area of the original image being covered and enhanced respectively, with the labels adjusted accordingly, thereby improving the target model's recognition capability for the attention area.
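A hedged sketch of the two adjustments described above follows; the mean-value fill, the reinforcement gain of 1.5, and representing the attention area as a bounding box are illustrative assumptions, not specifics of the present application.

```python
import numpy as np

def attention_guided_augment(image: np.ndarray, box, label, normal_label=0, gain=1.5):
    """Build the two adjusted images used to train the recognition network.

    image: H x W x C float array in [0, 255]; box: (x0, y0, x1, y1) attention area.
    Returns (first_adjusted, label_1) and (second_adjusted, label_2).
    """
    x0, y0, x1, y1 = box

    # First adjusted image: cover the attention area and relabel as normal.
    masked = image.copy()
    masked[y0:y1, x0:x1] = image.mean()  # fill value is an assumption
    first = (masked, normal_label)

    # Second adjusted image: reinforce the attention area, keep the label.
    enhanced = image.copy()
    enhanced[y0:y1, x0:x1] = np.clip(enhanced[y0:y1, x0:x1] * gain, 0.0, 255.0)
    second = (enhanced, label)

    return first, second
```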
Optionally, in order to further improve the target model's recognition capability for the attention area, perturbation may be performed on the enhanced images to enlarge the data volume. Specifically, region perturbation is first performed based on the first adjusted image to generate a negative sample sequence, i.e., images that do not contain a vulgar region; then weight-parameter perturbation is performed based on the second adjusted image to generate a positive sample sequence, i.e., images that contain a vulgar region; and the preset recognition network is trained according to the negative sample sequence and the positive sample sequence to obtain the target recognition network.
Optionally, during training based on the enhanced image, the attention area needs to be constrained. First, the primary attention label and the secondary attention label corresponding to the attention area are determined; then a constraint is applied to the region corresponding to the primary attention label and the region corresponding to the secondary attention label to obtain attention loss information; and the parameters of the target recognition network are adjusted according to the attention loss information, improving the training effect of the target recognition network.
Specifically, the constraint on the attention area keeps the attention areas of the secondary labels as consistent as possible, strengthening the learning of the attention area. It may be computed with reference to the following formula:
(The constraint formula appears in the original publication only as an embedded image and is not reproduced here.) In the formula, (x, y) is an arbitrary point on the attention map; the second image placeholder denotes the value at (x, y) of the i-th channel of the attention map, written A_i(x, y) below; and 1(condition) means that the output is 1 when the condition is true and 0 when the condition is false.
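Because the exact constraint survives only as an image in the source, the following is a speculative sketch under an assumed form: an L2 penalty pulling each secondary label's attention channel A_i toward the consensus map, so that the attention areas stay consistent.

```python
import torch

def attention_consistency_loss(attn: torch.Tensor) -> torch.Tensor:
    """Assumed form of the attention constraint (not the patent's exact formula).

    attn: (N, C, H, W) tensor; channel i holds the values A_i(x, y) for the
    i-th secondary label. Penalizes deviation from the mean attention map.
    """
    mean_map = attn.mean(dim=1, keepdim=True)   # consensus over secondary labels
    return ((attn - mean_map) ** 2).mean()
```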
304. The input image is input into the target recognition network to obtain an image feature map.
In this embodiment, after the attention-guided training of the preset recognition network is completed and the target recognition network is obtained, image features may be extracted by the target recognition network.
Specifically, the target recognition network may be a ResNet-series network, for example a ResNet-18 recognition network; the specific network type depends on the actual scenario and is not limited herein.
305. The image feature map is input into the hierarchical recognition network in the target model to obtain the first type label and the second type label.
In this embodiment, as shown in the architecture of fig. 4, the hierarchical recognition network includes a primary label branch and a secondary label branch; the primary label branch is used to determine the first type label of the input image, and the secondary label branch is used to recognize the second type label of the input image. The first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label, realizing a hierarchical, detailed recognition process.
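A minimal sketch of such a dual-branch head is given below, assuming the ResNet-18 backbone mentioned in step 304; the layer sizes and the use of independent sigmoid outputs as per-label binary classifiers are assumptions for illustration, not the patent's specified implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HierarchicalRecognitionNet(nn.Module):
    """Shared backbone with a primary-label branch and a secondary-label branch."""

    def __init__(self, num_primary: int, num_secondary: int):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep everything up to and including global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.primary_head = nn.Linear(512, num_primary)      # first type label (multi-class)
        self.secondary_head = nn.Linear(512, num_secondary)  # second type label (multi-label)

    def forward(self, x: torch.Tensor):
        feat = self.features(x).flatten(1)                   # (N, 512) image features
        primary_logits = self.primary_head(feat)             # coarse-label scores
        secondary_scores = torch.sigmoid(self.secondary_head(feat))  # per-label binary scores
        return primary_logits, secondary_scores
```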
Optionally, the primary label branch and the secondary label branch may be trained separately: primary label training data is acquired, and the classification loss on the primary label training data is determined to train the primary label branch; secondary label training data is then acquired and input into a binary classifier to obtain secondary label positive samples and secondary label negative samples, and the secondary label branch is trained on these positive and negative samples.
It should be noted that, in the training of the secondary label branch, the secondary labels of a piece of vulgar data are often not fully annotated. An annotator often focuses only on the secondary label in the image that most draws attention and ignores other co-existing secondary labels. For example, an image may show both chest and leg sexiness; if the annotator only annotates chest sexiness, then under normal deep learning the chest label is positive and the leg label is negative when the model learns this data, which confuses the model's learning of the leg label. A sample that confuses the model in this way can be called a noise sample.
To prevent noise samples from participating in the training of the secondary label branch, a dynamic-threshold judgment process can be adopted: first, a target sample in the secondary label training data is determined; then a sliding mean calculation is performed on the batch data corresponding to the target sample to obtain dynamic threshold information, where the dynamic threshold information includes a positive sample threshold and a negative sample threshold; the target sample is input into a logistic-regression binary classifier to obtain a predicted value; and the secondary label positive samples and negative samples in the secondary label training data are determined based on the predicted value and the dynamic threshold information, ensuring the accuracy of sample labeling.
Specifically, the secondary label positive samples and negative samples are determined by comparing the predicted values with the dynamic thresholds. For example, the predicted value is compared with the positive sample threshold in the dynamic threshold information; if the predicted value is greater than the positive sample threshold, the target sample is determined to be a secondary label positive sample. Likewise, the predicted value is compared with the negative sample threshold; if the predicted value is smaller than the negative sample threshold, the target sample is determined to be a secondary label negative sample.
It can be understood that a noise sample is identified when the predicted value is greater than the negative sample threshold and less than the positive sample threshold; such a target sample, which may also be called an ignored sample, is then set not to participate in the training of the secondary label branch. Specifically, as shown in fig. 8, a scene schematic diagram of another artificial intelligence based image recognition method provided in the embodiment of the present application, the training data input into the secondary label branch undergoes a dynamic threshold update based on the batch data of each sample, yielding secondary label positive samples, secondary label negative samples, and ignored samples (noise samples); the noise samples are ignored, i.e., they do not participate in the calculation of the loss function, which improves the recognition accuracy of the secondary label branch.
In one possible scenario, the positive and negative parts of the loss function use per-category thresholds: the positive threshold of each secondary label is initialized to 1 and the negative threshold to 0. During each batch of training, the positive and negative thresholds of the corresponding secondary labels are adjusted by a sliding mean method according to the predicted scores of the secondary labels for different samples. In addition, missed-label samples and mislabeled samples are distinguished by comparing the model's output with the positive and negative thresholds of the corresponding secondary labels. For example, if the positive threshold of the female-chest label is 0.7 and the model predicts 0.9 for a sample, that sample is regarded as a correct sample and participates in model training; if the model predicts 0.3 for another sample whose true annotation is female chest, the model regards that sample as a mislabeled sample (noise sample), and it does not participate in model training.
Specifically, the loss function can be computed with reference to the following formulas:

L(x, y) = -1(p(x) > θ_p) · y · log(p(x)) - 1(p(x) ≤ θ_n) · (1 - y) · log(1 - p(x)),

θ_p = min(μ_p + α · σ_p, 1),

θ_n = max(μ_n - α · σ_n, 0),

where, for a given class of secondary labels, p(x) denotes the corresponding model prediction, θ_p denotes the positive threshold for the class, θ_n denotes the negative threshold for the class, and y denotes the true label (0 or 1) of the sample for that class. α is the threshold iteration rate, taken as 0.1 here; σ_p is computed from the model's predicted scores for positive samples, and σ_n from those for negative samples. The positive and negative thresholds are updated from the scores of the samples retained after each batch is screened by the model. The purpose of the formula is to train only on the samples that pass the model's screening; other samples do not participate in training.
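Under the formulas above, the selective loss and the threshold update for one secondary label might be sketched as follows; computing μ and σ directly from the current batch (rather than a true sliding mean) is a simplifying assumption.

```python
import torch

def selective_bce(p: torch.Tensor, y: torch.Tensor, theta_p: float, theta_n: float) -> torch.Tensor:
    """L(x, y): only confidently judged samples contribute to training.

    p: predicted scores in [0, 1] for one secondary label; y: 0/1 targets.
    Samples with theta_n < p <= theta_p are ignored as noise samples.
    """
    eps = 1e-7
    pos = (p > theta_p).float() * y * torch.log(p + eps)             # screened positives
    neg = (p <= theta_n).float() * (1 - y) * torch.log(1 - p + eps)  # screened negatives
    return -(pos + neg).mean()

def update_thresholds(p: torch.Tensor, y: torch.Tensor, theta_p: float, theta_n: float, alpha: float = 0.1):
    """Per-batch update: theta_p = min(mu_p + a*sigma_p, 1), theta_n = max(mu_n - a*sigma_n, 0)."""
    pos, neg = p[y == 1], p[y == 0]
    if pos.numel() > 1:
        theta_p = min(float(pos.mean() + alpha * pos.std(unbiased=False)), 1.0)
    if neg.numel() > 1:
        theta_n = max(float(neg.mean() - alpha * neg.std(unbiased=False)), 0.0)
    return theta_p, theta_n
```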
In another possible scenario, after the hierarchical labels are obtained, the image may be marked: first key information is extracted from the first type label; second key information is extracted from the second type label; the first key information is then associated with the second key information to obtain the description information of the input image; and the input image is marked based on the description information.
Specifically, for a marked image, the calling process of the input image can be triggered in response to a target operation; the input image is then cached based on the calling process, and the mark of the input image is identified; when the mark of the input image meets a preset condition (for example, it does not contain an exposed leg part), the input image is displayed.
Optionally, in the above embodiment, the target recognition network may be a lightweight network (e.g., a GhostNet network) deployed in the terminal, while the hierarchical recognition network is an image recognition network (e.g., ResNet-18) deployed on the server side; this cascading framework improves the performance of the algorithm deployed in the service, where performance includes the accuracy and speed of image recognition.
In summary, with reference to the above embodiments: an input image is acquired; the input image is input into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area; the attention map is adjusted based on the attention area to obtain an enhanced image, and the preset recognition network is trained according to the enhanced image to obtain a target recognition network; the input image is further input into the target recognition network to obtain an image feature map; and the image feature map is then input into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a primary label branch used to determine the first type label of the input image and a secondary label branch used to recognize the second type label of the input image, the first type label and the second type label indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label. This realizes an image hierarchical recognition process guided by the attention area: the enhanced image obtained from the attention area in the attention map focuses the model on learning the key parts of the data, the result is presented through hierarchical recognition, the detailed parts of the image are identified, and the accuracy of image recognition is improved.
Next, the image recognition process is described with reference to a scenario of publishing multimedia data online. As shown in fig. 9, fig. 9 is a flowchart of another artificial intelligence based image recognition method provided in an embodiment of the present application; this embodiment includes at least the following steps:
901. an instant data stream is obtained.
In this embodiment, the instant data stream may be a data stream collected by instant messaging software, such as friend-circle (Moments) feed data, short-video application feed data, or live video stream data.
Specifically, in the identification process of the video data, one or more frames of images in the video stream may be selected for identification, so as to realize identification of the video data.
902. And inputting the image data in the instant data stream into a target model for recognition to obtain a first type label and a second type label.
In this embodiment, the process of inputting the target model for identification refers to the description of the embodiment shown in fig. 3, which is not described herein again.
It can be understood that, for short-video or live-streaming scenes, the video data stream can be converted into image data for identification. Specifically, every frame may be taken as image data, the start or end frame of a video may be taken as image data, or video frames may be extracted at a fixed acquisition interval to obtain image data. Since the contents of adjacent video frames are correlated, the interval-based acquisition will not miss possible vulgar images, which improves recognition efficiency while maintaining recognition accuracy.
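For example, interval-based frame extraction might be sketched with OpenCV as follows (the interval of 30 frames is an assumed value):

```python
import cv2

def sample_frames(video_path: str, interval: int = 30):
    """Extract one frame every `interval` frames (interval assumed),
    relying on the correlation between adjacent video frames."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```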
903. And performing vulgar image judgment based on the first type label.
In this embodiment, the first type label can directly indicate whether the image contains vulgar content, such as a "sexy female" image, so that the image can be marked and a decision made on whether to push it.
904. And processing and issuing the vulgar parts based on the second type of labels.
In this embodiment, the second type label locates the specific identified part; if the first type label or the second type label indicates that the image contains vulgar content, the vulgar part can be processed based on the second type label, for example by mosaic (pixelation) processing, so that the processed image can be published while the compliance of online information is ensured.
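A minimal sketch of such mosaic processing, assuming the second type label yields an (x, y, w, h) box in pixel coordinates (the box format and block size are assumptions):

```python
import cv2

def mosaic(image, box, block: int = 16):
    """Pixelate the region box = (x, y, w, h) located by the second
    type label; `block` controls the mosaic cell size."""
    x, y, w, h = box
    roi = image[y:y + h, x:x + w]
    # shrink, then enlarge with nearest-neighbour to produce the mosaic
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_LINEAR)
    image[y:y + h, x:x + w] = cv2.resize(small, (w, h),
                                         interpolation=cv2.INTER_NEAREST)
    return image
```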
It can be understood that the order in which information goes online may follow the time sequence in which it was obtained: the instant media data stream is acquired, images in the media data stream are extracted according to a target time sequence to obtain the input images, and after identification the input images are published according to the target time sequence.
In this embodiment, by introducing the guided attention mechanism, category-based adaptive threshold learning, and dual-branch dual-task joint training, vulgar information is identified with high precision: millions of vulgar items can be accurately hit among the hundreds of millions of items passing through the instant messaging software every day, and new scenarios such as video channels and live streaming are likewise kept free of vulgar image information.
The above image recognition process is now described from the perspective of data collection and maintenance on the server side. Referring to fig. 10, fig. 10 is a flowchart of another artificial intelligence based image recognition method according to an embodiment of the present application; this embodiment includes at least the following steps:
1001. an instant data stream is obtained.
In this embodiment, the acquisition of the instant data stream is similar to step 901 in the embodiment shown in fig. 9, and is not described herein again.
1002. And inputting the image data in the instant data stream into a target model for recognition to obtain a first type label and a second type label.
In this embodiment, the process of inputting the target model for identification refers to the description of the embodiment shown in fig. 3, which is not described herein again.
1003. And determining and publishing the normal image in the instant data stream.
In this embodiment, data whose first type label indicates a normal image may be published online immediately.
1004. And marking the abnormal image based on the first type label and the second type label, and uploading the abnormal image to a server.
In this embodiment, since the second type label records the specific position of the vulgar part of the image, the image can be marked and corresponding description information generated, for example: "this image contains exposed legs".
Optionally, in the process of identifying similar pictures, the recorded attention areas of the vulgar parts of images can be consulted, which improves the efficiency of vulgar image identification and realizes a dynamic identification process: the continuously collected attention areas of vulgar images guide subsequent image identification.
Specifically, some images may be uploaded in error; model parameters can therefore be adjusted based on such erroneously uploaded images to improve the accuracy of the target model.
This embodiment makes the image recognition process highly interpretable: the attention area of the visualized model explains why the model predicted a given result for an image, laying the groundwork for further metric improvements of the model.
In order to better implement the above-mentioned aspects of the embodiments of the present application, the following also provides related apparatuses for implementing the above-mentioned aspects. Referring to fig. 11, fig. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application, in which the recognition apparatus 1100 includes:
an acquisition unit 1101 for acquiring an input image;
an input unit 1102, configured to input the input image into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area,
an adjusting unit 1103, configured to perform image adjustment on the attention map based on the attention area to obtain an enhanced image, and train the preset recognition network according to the enhanced image to obtain a target recognition network;
the input unit 1102 is further configured to input the input image into the target recognition network to obtain an image feature map;
an identifying unit 1104, configured to input the image feature map into a hierarchical identification network in the target model to obtain a first type tag and a second type tag, where the hierarchical identification network includes a first-level tag branch and a second-level tag branch, the first-level tag branch is used to determine the first type tag of the input image, the second-level tag branch is used to identify the second type tag of the input image, the first-type tag and the second-type tag are used to indicate the same target object, and a description granularity of the second type tag for the target object is smaller than a description granularity of the first-type tag for the target object.
Optionally, in some possible implementations of the present application, the adjusting unit 1103 is specifically configured to mask the attention area, update the attention map to obtain a first adjusted image, and adjust a label corresponding to the first adjusted image;
the adjusting unit 1103 is specifically configured to enhance the weight parameter corresponding to the attention area to update the attention map to obtain a second adjusted image, and keep a label corresponding to the second adjusted image unchanged;
the adjusting unit 1103 is specifically configured to train the preset recognition network according to the first adjusted image and the second adjusted image, so as to obtain the target recognition network.
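As a rough sketch (not the prescribed implementation) of the adjusting unit's two operations, the binarization threshold, the gain applied to the attention area, and the rule of setting the first adjusted image's label to negative are all assumptions:

```python
import torch

def attention_augment(image: torch.Tensor, attn: torch.Tensor,
                      label: torch.Tensor, tau: float = 0.5):
    """image: (C, H, W) in [0, 1]; attn: (H, W) normalized to [0, 1].
    Returns the two adjusted images built from one attention area."""
    region = (attn > tau).float()                       # binarized attention area
    # 1) cover the attention area -> first adjusted image, label adjusted
    erased = image * (1.0 - region).unsqueeze(0)
    erased_label = torch.zeros_like(label)              # assumption: flip to negative
    # 2) strengthen the attention area's weight -> second adjusted image, label kept
    boosted = (image * (1.0 + region).unsqueeze(0)).clamp(0.0, 1.0)
    return (erased, erased_label), (boosted, label)
```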
Optionally, in some possible implementations of the present application, the adjusting unit 1103 is specifically configured to perform region perturbation based on the first adjusted image to generate a negative sample sequence;
the adjusting unit 1103 is specifically configured to perform weight parameter perturbation based on the second adjusted image to generate a positive sample sequence;
the adjusting unit 1103 is specifically configured to train the preset identification network according to the negative sample sequence and the positive sample sequence, so as to obtain the target identification network.
Optionally, in some possible implementations of the present application, the adjusting unit 1103 is specifically configured to determine a primary attention label and a secondary attention label corresponding to the attention area;
the adjusting unit 1103 is specifically configured to constrain, based on the area corresponding to the attention primary label and the area corresponding to the attention secondary label, to obtain attention loss information;
the adjusting unit 1103 is specifically configured to perform parameter adjustment on the target identification network according to the attention loss information.
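One possible form of such an attention loss, sketched under the assumption that both attention maps are normalized to [0, 1] and that the secondary-label region should lie inside the primary-label region:

```python
import torch

def attention_loss(attn_primary: torch.Tensor,
                   attn_secondary: torch.Tensor) -> torch.Tensor:
    """Penalize secondary-label attention mass falling outside the
    primary-label attention region (one assumed form of the constraint)."""
    outside = attn_secondary * (1.0 - attn_primary)
    return outside.mean()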
Optionally, in some possible implementation manners of the present application, the identifying unit 1104 is specifically configured to obtain primary label training data;
the identifying unit 1104 is specifically configured to determine a classification loss in the primary label training data, so as to train the primary label branch;
the identification unit 1104 is specifically configured to obtain secondary label training data;
the identification unit 1104 is specifically configured to input the secondary label training data into a binary classifier to obtain a secondary label positive sample and a secondary label negative sample;
the identifying unit 1104 is specifically configured to train the secondary label branch based on the secondary label positive sample and the secondary label negative sample.
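A schematic sketch of such a dual-branch head and its joint loss is given below; the feature dimension, the class counts, and the use of a per-element mask to exclude screened-out noise samples are assumptions for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalHead(nn.Module):
    """Two label branches on a shared image feature map: a softmax
    branch for the first type label and per-class binary branches
    for the second type labels."""

    def __init__(self, feat_dim: int = 512, n_primary: int = 5, n_secondary: int = 12):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.primary = nn.Linear(feat_dim, n_primary)      # first-level label branch
        self.secondary = nn.Linear(feat_dim, n_secondary)  # second-level label branch

    def forward(self, feat_map: torch.Tensor):
        f = self.pool(feat_map).flatten(1)
        return self.primary(f), self.secondary(f)

def joint_loss(p_logits, s_logits, p_target, s_target, s_mask):
    """Classification loss for the primary branch plus binary loss for
    the secondary branch; s_mask zeroes out screened-out noise samples."""
    l_primary = F.cross_entropy(p_logits, p_target)
    l_secondary = F.binary_cross_entropy_with_logits(s_logits, s_target, weight=s_mask)
    return l_primary + l_secondary
```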
Optionally, in some possible implementations of the present application, the identifying unit 1104 is specifically configured to determine a target sample in the secondary label training data;
the identifying unit 1104 is specifically configured to perform sliding mean calculation based on the batch data corresponding to the target sample to obtain dynamic threshold information, where the dynamic threshold information includes a positive sample threshold and a negative sample threshold;
the identifying unit 1104 is specifically configured to input the target sample into the binary classifier to obtain a predicted value;
the identifying unit 1104 is specifically configured to compare the predicted value with the dynamic threshold information to determine the secondary label positive sample and the secondary label negative sample in the secondary label training data.
Optionally, in some possible implementations of the present application, the identifying unit 1104 is specifically configured to compare the predicted value with a positive sample threshold in the dynamic threshold information;
if the predicted value is greater than the positive sample threshold value, determining that the target sample is the secondary label positive sample;
the identifying unit 1104 is specifically configured to compare the predicted value with a negative sample threshold in the dynamic threshold information;
the identifying unit 1104 is specifically configured to determine that the target sample is the secondary label negative sample if the predicted value is smaller than the negative sample threshold.
Optionally, in some possible implementations of the present application, the identifying unit 1104 is specifically configured to determine that the target sample is a noise sample if the predicted value is greater than the negative sample threshold and the predicted value is less than the positive sample threshold;
the identifying unit 1104 is specifically configured to set the noise sample not to participate in the training of the secondary label branch.
Optionally, in some possible implementations of the present application, the obtaining unit 1101 is specifically configured to obtain an instant media data stream;
the obtaining unit 1101 is specifically configured to extract images in the media data stream according to a target time sequence to obtain the input image, and the input image is issued according to the target time sequence after being identified.
Optionally, in some possible implementations of the present application, the identifying unit 1104 is specifically configured to extract first key information in the first type tag;
the identification unit 1104 is specifically configured to extract second key information in the second type tag;
the identifying unit 1104 is specifically configured to associate the first key information with the second key information to obtain description information of the input image;
the identifying unit 1104 is specifically configured to mark the input image based on the description information.
Optionally, in some possible implementations of the present application, the identifying unit 1104 is specifically configured to trigger a calling process of the input image in response to a target operation;
the identifying unit 1104 is specifically configured to cache the input image based on the calling process, and identify a tag of the input image;
the identifying unit 1104 is specifically configured to display the input image if the mark of the input image meets a preset condition.
Through the above units: an input image is acquired; the input image is input into a preset recognition network in the target model to obtain an attention map containing an attention area; the attention map is image-adjusted based on the attention area to obtain an enhanced image, and the preset recognition network is trained with the enhanced image to obtain a target recognition network; the input image is further input into the target recognition network to obtain an image feature map; and the image feature map is input into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a first-level label branch that determines the first type label of the input image and a second-level label branch that identifies the second type label of the input image, the two labels indicate the same target object, and the second type label describes the target object at a finer granularity than the first type label. An attention-area-guided hierarchical image recognition process is thus realized: the attention area of the attention map is used to obtain the enhanced image, so that the model concentrates on learning the key parts of the data and presents its results hierarchically, recognizing the detailed parts of the image and improving the accuracy of image recognition.
An embodiment of the present application further provides a terminal device. As shown in fig. 12, which is a schematic structural diagram of another terminal device provided in an embodiment of the present application, for convenience of description only the portions related to this embodiment are shown; for specific technical details not disclosed here, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like. The mobile phone is taken as an example:
fig. 12 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 12, the cellular phone includes: radio Frequency (RF) circuitry 1210, memory 1220, input unit 1230, display unit 1240, sensors 1250, audio circuitry 1260, wireless fidelity (WiFi) module 1270, processor 1280, and power supply 1290. Those skilled in the art will appreciate that the handset configuration shown in fig. 12 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 12:
the RF circuit 1210 is configured to receive and transmit signals during information transmission and reception or during a call, and in particular, receive downlink information of a base station and then process the received downlink information to the processor 1280; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuit 1210 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1210 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.
The memory 1220 may be used to store software programs and modules, and the processor 1280 executes various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 1220. The memory 1220 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 1220 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 1230 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 1230 may include a touch panel 1231 and other input devices 1232. The touch panel 1231, also referred to as a touch screen, can collect touch operations of a user (e.g., operations of the user on or near the touch panel 1231 using any suitable object or accessory such as a finger, a stylus, etc., and a range of spaced touch operations on the touch panel 1231) and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 1231 may include two portions, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, and sends the touch point coordinates to the processor 1280, and can receive and execute commands sent by the processor 1280. In addition, the touch panel 1231 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 1230 may include other input devices 1232 in addition to the touch panel 1231. In particular, other input devices 1232 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 1240 may be used to display information input by the user or information provided to the user and various menus of the cellular phone. The display unit 1240 may include a display panel 1241, and optionally, the display panel 1241 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, touch panel 1231 can overlay display panel 1241, and when touch panel 1231 detects a touch operation thereon or nearby, the touch panel 1231 can transmit the touch operation to processor 1280 to determine the type of the touch event, and then processor 1280 can provide a corresponding visual output on display panel 1241 according to the type of the touch event. Although in fig. 12, the touch panel 1231 and the display panel 1241 are implemented as two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 1231 and the display panel 1241 may be integrated to implement the input and output functions of the mobile phone.
The cell phone may also include at least one sensor 1250, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1241 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1241 and/or the backlight when the mobile phone moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Audio circuitry 1260, speaker 1261, and microphone 1262 can provide an audio interface between a user and the mobile phone. The audio circuit 1260 can transmit the electrical signal converted from received audio data to the speaker 1261, where it is converted into a sound signal and output; on the other hand, the microphone 1262 converts collected sound signals into electrical signals, which are received by the audio circuit 1260 and converted into audio data; the audio data is processed by the processor 1280 and then either sent via the RF circuit 1210 to, for example, another mobile phone, or output to the memory 1220 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 1270, and provides wireless broadband internet access for the user. Although fig. 12 shows the WiFi module 1270, it is understood that it does not belong to the essential constitution of the handset, and may be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 1280 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 1220 and calling data stored in the memory 1220, thereby performing overall monitoring of the mobile phone. Optionally, processor 1280 may include one or more processing units; optionally, the processor 1280 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It is to be appreciated that the modem processor described above may not be integrated into the processor 1280.
The mobile phone further includes a power supply 1290 (e.g., a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the processor 1280 through a power management system, so that the power management system may manage functions such as charging, discharging, and power consumption management.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In this embodiment, the processor 1280 included in the terminal further has the function of executing each step of the image recognition method described above.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a server provided in this embodiment. The server 1300 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1322 (e.g., one or more processors), a memory 1332, and one or more storage media 1330 (e.g., one or more mass storage devices) storing an application 1342 or data 1344. The memory 1332 and the storage medium 1330 may be transitory or persistent storage. The program stored on the storage medium 1330 may include one or more modules (not shown), each of which may include a sequence of instructions operating on the server. Still further, the central processor 1322 may be arranged to communicate with the storage medium 1330, executing a sequence of instruction operations in the storage medium 1330 on the server 1300.
The server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input-output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the management apparatus in the above-described embodiment may be based on the server configuration shown in fig. 13.
Also provided in the embodiments of the present application is a computer-readable storage medium, which stores instructions for image recognition, and when the instructions are executed on a computer, the instructions cause the computer to perform the steps performed by the image recognition apparatus in the methods described in the foregoing embodiments shown in fig. 3 to 10.
Also provided in the embodiments of the present application is a computer program product including instructions for image recognition, which when run on a computer, causes the computer to perform the steps performed by the apparatus for image recognition in the method described in the foregoing embodiments shown in fig. 3 to 10.
The embodiment of the present application further provides an image recognition system, where the image recognition system may include the image recognition apparatus in the embodiment described in fig. 11, or the terminal device in the embodiment described in fig. 12, or the server described in fig. 13.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an image recognition device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A method for artificial intelligence based image recognition, comprising:
acquiring an input image;
inputting the input image into a preset recognition network in a target model to obtain an attention map, wherein the attention map comprises an attention area,
carrying out image adjustment on the attention diagram based on the attention area to obtain an enhanced image, and training the preset recognition network according to the enhanced image to obtain a target recognition network;
inputting the input image into the target recognition network to obtain an image feature map;
inputting the image feature map into a hierarchical recognition network in the target model to obtain a first type label and a second type label, where the hierarchical recognition network includes a first-level label branch and a second-level label branch, the first-level label branch is used to determine the first type label of the input image, the second-level label branch is used to recognize the second type label of the input image, the first-type label and the second-type label are used to indicate the same target object, and the description granularity of the second type label for the target object is smaller than the description granularity of the first-type label for the target object.
2. The method of claim 1, wherein the image adjusting the attention map based on the attention area to obtain an enhanced image, and training the preset recognition network according to the enhanced image to obtain a target recognition network comprises:
covering the attention area to update the attention map to obtain a first adjustment image, and adjusting a label corresponding to the first adjustment image;
strengthening the weight parameters corresponding to the attention area to update the attention map to obtain a second adjustment image, and keeping the label corresponding to the second adjustment image unchanged;
and training the preset recognition network according to the first adjustment image and the second adjustment image to obtain the target recognition network.
3. The method of claim 2, wherein training the pre-set recognition network according to the first adjusted image and the second adjusted image to obtain the target recognition network comprises:
performing region perturbation based on the first adjusted image to generate a negative sample sequence;
performing weight parameter perturbation based on the second adjusted image to generate a positive sample sequence;
and training the preset identification network according to the negative sample sequence and the positive sample sequence to obtain the target identification network.
4. The method of claim 2, further comprising:
determining a primary attention label and a secondary attention label corresponding to the attention area;
constraining based on the region corresponding to the attention primary label and the region corresponding to the attention secondary label to obtain attention loss information;
and adjusting parameters of the target identification network according to the attention loss information.
5. The method of claim 1, further comprising:
acquiring first-level label training data;
determining a classification loss in the primary label training data to train the primary label branch;
acquiring secondary label training data;
inputting the secondary label training data into a binary classifier to obtain a secondary label positive sample and a secondary label negative sample;
training the secondary label branch based on the secondary label positive examples and the secondary label negative examples.
6. The method of claim 5, wherein inputting the secondary label training data into a binary classifier to obtain secondary label positive samples and secondary label negative samples comprises:
determining a target sample in the secondary label training data;
performing sliding mean calculation on the batch data corresponding to the target sample to obtain dynamic threshold information, wherein the dynamic threshold information comprises a positive sample threshold and a negative sample threshold;
inputting the target sample into the binary classifier to obtain a predicted value;
comparing the predicted value with the dynamic threshold information to determine the secondary label positive samples and the secondary label negative samples in the secondary label training data.
7. The method of claim 6, wherein the comparing of the predicted value with the dynamic threshold information to determine the secondary label positive samples and secondary label negative samples in the secondary label training data comprises:
comparing the predicted value to a positive sample threshold in the dynamic threshold information;
if the predicted value is greater than the positive sample threshold value, determining that the target sample is the secondary label positive sample;
comparing the predicted value to a negative sample threshold in the dynamic threshold information;
and if the predicted value is smaller than the negative sample threshold value, determining the target sample as the secondary label negative sample.
8. The method of claim 7, further comprising:
if the predicted value is greater than the negative sample threshold value and the predicted value is less than the positive sample threshold value, determining the target sample as a noise sample;
setting the noise sample to not participate in training of the secondary label branch.
9. The method of claim 1, wherein the acquiring the input image comprises:
acquiring an instant media data stream;
and extracting images in the media data stream according to a target time sequence to obtain the input images, and issuing the input images according to the target time sequence after the input images are identified.
10. The method according to any one of claims 1-9, further comprising:
extracting first key information in the first type label;
extracting second key information in the second type label;
associating the first key information with the second key information to obtain description information of the input image;
the input image is labeled based on the description information.
11. The method of claim 10, further comprising:
triggering a calling process of the input image in response to a target operation;
caching the input image based on the calling process, and identifying the mark of the input image;
and if the mark of the input image meets a preset condition, displaying the input image.
12. The method of claim 1, wherein the target model is used for identification of vulgar images, wherein the first type tag is used to indicate an individual type of the target object, and wherein the second type tag is used to indicate a location type of the target object.
13. An apparatus for image recognition, comprising:
an acquisition unit configured to acquire an input image;
an input unit, configured to input the input image into a preset recognition network in a target model to obtain an attention map, where the attention map includes an attention area,
the adjusting unit is used for carrying out image adjustment on the attention diagram based on the attention area to obtain an enhanced image, and training the preset identification network according to the enhanced image to obtain a target identification network;
the input unit is further used for inputting the input image into the target recognition network to obtain an image feature map;
the identification unit is configured to input the image feature map into a hierarchical identification network in the target model to obtain a first type tag and a second type tag, where the hierarchical identification network includes a first-level tag branch and a second-level tag branch, the first-level tag branch is used to determine the first type tag of the input image, the second-level tag branch is used to identify the second type tag of the input image, the first-type tag and the second-type tag are used to indicate the same target object, and the description granularity of the second type tag for the target object is smaller than the description granularity of the first type tag for the target object.
14. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to perform the method of image recognition according to any one of claims 1 to 12 according to instructions in the program code.
15. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of image recognition of any of the preceding claims 1 to 12.
CN202110083832.4A 2021-01-21 2021-01-21 Image recognition method based on artificial intelligence and related device Pending CN113569889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110083832.4A CN113569889A (en) 2021-01-21 2021-01-21 Image recognition method based on artificial intelligence and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110083832.4A CN113569889A (en) 2021-01-21 2021-01-21 Image recognition method based on artificial intelligence and related device

Publications (1)

Publication Number Publication Date
CN113569889A true CN113569889A (en) 2021-10-29

Family

ID=78160931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110083832.4A Pending CN113569889A (en) 2021-01-21 2021-01-21 Image recognition method based on artificial intelligence and related device

Country Status (1)

Country Link
CN (1) CN113569889A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114268813A (en) * 2021-12-31 2022-04-01 广州方硅信息技术有限公司 Live broadcast picture adjusting method and device and computer equipment


Similar Documents

Publication Publication Date Title
CN108875781B (en) Label classification method and device, electronic equipment and storage medium
CN111260665B (en) Image segmentation model training method and device
CN108280458B (en) Group relation type identification method and device
CN110704661B (en) Image classification method and device
CN111813532B (en) Image management method and device based on multitask machine learning model
CN109670174B (en) Training method and device of event recognition model
CN111222493B (en) Video processing method and device
CN111582116B (en) Video erasing trace detection method, device, equipment and storage medium
CN107729815A (en) Image processing method, device, mobile terminal and computer-readable recording medium
CN110738211A (en) object detection method, related device and equipment
CN108289057B (en) Video editing method and device and intelligent mobile terminal
CN110766081B (en) Interface image detection method, model training method and related device
CN110765924A (en) Living body detection method and device and computer-readable storage medium
CN112990390A (en) Training method of image recognition model, and image recognition method and device
CN111737520B (en) Video classification method, video classification device, electronic equipment and storage medium
CN113723159A (en) Scene recognition model training method, scene recognition method and model training device
CN110347858A (en) A kind of generation method and relevant apparatus of picture
CN111265881B (en) Model training method, content generation method and related device
CN112862021A (en) Content labeling method and related device
CN113569889A (en) Image recognition method based on artificial intelligence and related device
CN113190646A (en) User name sample labeling method and device, electronic equipment and storage medium
CN116453005A (en) Video cover extraction method and related device
CN112256976B (en) Matching method and related device
CN116259083A (en) Image quality recognition model determining method and related device
CN114612830A (en) Method, device and equipment for identifying screen pattern image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination