CN111242230A - Image processing method and image classification model training method based on artificial intelligence

Info

Publication number: CN111242230A
Application number: CN202010051557.3A
Authority: CN (China)
Prior art keywords: image, sample, classification model, category, sub
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 郭梓铿
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by: Tencent Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2431 Multiple classes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention provides an artificial intelligence based image processing method, an image classification model training method and apparatus, an electronic device, and a storage medium; the method comprises the following steps: performing feature extraction processing on an image to be recognized to obtain a feature vector; dividing the feature vector into N sub-vectors to be recognized, wherein N is an integer greater than 1; performing classification processing on each of the N sub-vectors to be recognized to correspondingly obtain N category prediction results for the content included in the image to be recognized; and determining the category of the content included in the image to be recognized by combining the N category prediction results. The method and the device can improve the accuracy of image recognition.

Description

Image processing method and image classification model training method based on artificial intelligence
Technical Field
The invention relates to an artificial intelligence technology, in particular to an image processing method, an image classification model training method and device, electronic equipment and a storage medium based on artificial intelligence.
Background
Artificial Intelligence (AI) is the body of theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
Image recognition is an important application of artificial intelligence. In the solutions provided in the related art, feature extraction is usually performed on an image to obtain a feature vector, and the category of the content included in the image is predicted directly from the feature vector, for example, predicting whether the image includes a human face. However, the feature vector obtained by feature extraction usually contains a large amount of redundant information that is not conducive to recognition, so the category derived from the feature vector is inaccurate and the accuracy of image recognition is poor.
Disclosure of Invention
The embodiments of the invention provide an artificial intelligence based image processing method, an artificial intelligence based image classification model training method and apparatus, an electronic device, and a storage medium, which can improve the accuracy of image recognition and obtain more accurate categories.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image processing method based on artificial intelligence, which comprises the following steps:
carrying out feature extraction processing on an image to be recognized to obtain a feature vector;
dividing the feature vector into N sub-vectors to be identified; wherein N is an integer greater than 1;
classifying each sub-vector to be identified in the N sub-vectors to be identified respectively to obtain N category prediction results of the content included in the image to be identified correspondingly;
and determining the category of the content included in the image to be recognized by combining the N category prediction results.
The embodiment of the invention provides an image classification model training method based on artificial intelligence, which comprises the following steps:
acquiring a sample image and a sample category of content included in the sample image;
carrying out feature extraction processing on the sample image through an image classification model to obtain a sample feature vector;
segmenting the sample feature vector into N sample sub-vectors;
determining a loss value of the image classification model according to the N sample sub-vectors and a sample category of content included in the sample image;
performing back propagation in the image classification model according to the loss value, and updating the weight parameters of the image classification model along the gradient descent direction in the process of back propagation;
the image classification model is used for identifying the image to be identified.
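As a concrete illustration of the training procedure summarized above, the following is a minimal PyTorch sketch; the backbone, the value of N, the dimensions, and all names are illustrative assumptions rather than the patent's reference implementation. It extracts a sample feature vector, segments it into N sample sub-vectors, attaches one classifier head per sub-vector, sums the per-head losses into the loss value of the model, and updates the weight parameters by back propagation along the gradient descent direction.

```python
# Hypothetical sketch of the training method described above (PyTorch).
import torch
import torch.nn as nn

class SplitClassifier(nn.Module):
    def __init__(self, feat_dim=512, n_splits=4, num_classes=10):
        super().__init__()
        assert feat_dim % n_splits == 0
        # stand-in feature extractor; any CNN producing a feature vector works
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.n_splits = n_splits
        # one classifier head per sample sub-vector (one per low-dimensional space)
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim // n_splits, num_classes) for _ in range(n_splits)])

    def forward(self, x):
        feat = self.backbone(x)                          # sample feature vector
        subs = torch.chunk(feat, self.n_splits, dim=1)   # N sample sub-vectors
        return [head(sub) for head, sub in zip(self.heads, subs)]

model = SplitClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)      # dummy batch of sample images
labels = torch.randint(0, 10, (8,))     # dummy sample categories

logits_per_sub = model(images)
# loss value determined from the N sub-vectors and the sample category
loss = sum(criterion(logits, labels) for logits in logits_per_sub)
optimizer.zero_grad()
loss.backward()      # back propagation of the loss value
optimizer.step()     # update weight parameters along the gradient descent direction
```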
The embodiment of the invention provides an image processing device based on artificial intelligence, which comprises:
the extraction module is used for extracting the features of the image to be identified to obtain a feature vector;
the segmentation module is used for segmenting the feature vector into N sub-vectors to be identified; wherein N is an integer greater than 1;
the classification module is used for respectively performing classification processing according to each to-be-identified sub-vector in the N to-be-identified sub-vectors to correspondingly obtain N category prediction results of contents included in the to-be-identified image;
and the category determining module is used for determining the category of the content included in the image to be identified by combining the N category prediction results.
The embodiment of the invention provides an image classification model training device based on artificial intelligence, which comprises:
the system comprises a sample acquisition module, a content acquisition module and a content classification module, wherein the sample acquisition module is used for acquiring a sample image and a sample category of content included in the sample image;
the sample extraction module is used for carrying out feature extraction processing on the sample image through an image classification model to obtain a sample feature vector;
a sample segmentation module for segmenting the sample feature vector into N sample sub-vectors;
a loss determining module, configured to determine a loss value of the image classification model according to the N sample sub-vectors and a sample category of content included in the sample image;
the updating module is used for performing backward propagation in the image classification model according to the loss value and updating the weight parameter of the image classification model along the gradient descending direction in the process of backward propagation;
the image classification model is used for identifying the image to be identified.
An embodiment of the present invention provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the image processing method based on artificial intelligence or the image classification model training method based on artificial intelligence provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
Embodiments of the present invention provide a storage medium storing executable instructions for causing a processor to execute the method for processing an image based on artificial intelligence or the method for training an image classification model based on artificial intelligence according to the embodiments of the present invention.
The embodiment of the invention has the following beneficial effects:
according to the method and the device, the characteristic vector of the image to be recognized is divided into N sub-vectors to be recognized, the influence of redundant (error) information which is not beneficial to recognition is weakened by combining the category prediction results of the N sub-vectors to be recognized, for example, when the redundant information which is not beneficial to recognition exists in a certain sub-vector to be recognized in a centralized manner, the influence degree of the sub-vectors to be recognized including the redundant information on the final result can be reduced by combining the N category prediction results, a more accurate category is obtained, and the accuracy of image recognition is improved.
Drawings
FIG. 1 is a schematic diagram of an alternative architecture of an artificial intelligence based image processing system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative architecture of an artificial intelligence based image processing system based on a blockchain network according to an embodiment of the present invention;
FIG. 3A is an alternative architecture diagram of a server according to an embodiment of the present invention;
FIG. 3B is an alternative architecture diagram of a server according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative architecture of an artificial intelligence based image processing apparatus according to an embodiment of the present invention;
FIG. 5A is a schematic flow chart of an alternative artificial intelligence based image processing method according to an embodiment of the present invention;
FIG. 5B is a schematic flow chart of an alternative artificial intelligence based image processing method according to an embodiment of the present invention;
FIG. 5C is a schematic flow chart of an alternative artificial intelligence based image processing method according to an embodiment of the present invention;
FIG. 5D is a schematic flow chart of an alternative method for training an image classification model based on artificial intelligence according to an embodiment of the present invention;
FIG. 6 is an alternative schematic diagram of feature extraction processing by a convolutional neural network model according to an embodiment of the present invention;
FIG. 7 is an architecture diagram of model training provided by the related art;
FIG. 8 is an alternative schematic diagram of vector slicing provided by embodiments of the present invention;
FIG. 9 is an alternative architectural diagram of model training provided by embodiments of the present invention;
FIG. 10 is an alternative flow chart of model training provided by embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the description that follows, the terms "first", "second", and the like are intended only to distinguish similar objects and do not denote a particular order. It should be understood that "first", "second", and the like may be interchanged, where permitted, in a specific order or sequence, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Image recognition: refers to a process of identifying or verifying a category of subject content in an image, such as identifying a category of a face in an image, or identifying a category of a building in an image, and so on.
2) Feature extraction: the process of converting an original image into a feature vector. Feature extraction can reduce the redundancy of the data and uncover more meaningful latent variables, which helps to further understand the original image.
3) Neural Network (NN) model: a model that simulates the structure and function of a biological neural network (the central nervous system of animals, in particular the brain); it processes information by adjusting the interconnections among a large number of internal nodes.
4) Convolutional Neural Networks (CNN) model: a class of feed-forward image classification models that include convolution calculations and have depth structures, convolution can be used as a feature extractor.
5) Gradient descent method: an optimization method for an image classification model; when solving for the minimum value of a loss function, i.e., the minimum loss value, the solution can be obtained iteratively by gradient descent.
6) High-dimensional space: in the process of image recognition, the image is mapped to a feature space to obtain a feature vector. In this document, high-dimensional space and low-dimensional space are relative concepts; the high-dimensional space refers to the feature space corresponding to the undivided feature vector obtained by feature extraction processing.
7) Low-dimensional space: the feature space corresponding to the segmented sub-vectors.
8) Blockchain: an encrypted, chained transactional storage structure formed of blocks.
9) Blockchain Network: the set of nodes that incorporate new blocks into a blockchain by way of consensus.
Embodiments of the present invention provide an image processing method, an image classification model training method, an image processing apparatus, an electronic device, and a storage medium based on artificial intelligence, which can improve accuracy of image recognition and obtain a more accurate classification.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of an artificial intelligence based image processing system 100 according to an embodiment of the present invention, in which, to implement supporting an artificial intelligence based image processing application, a terminal device 400 (an exemplary terminal device 400-1 and a terminal device 400-2 are shown) is connected to a server 200 through a network 300, and the server 200 is connected to a database 500, where the network 300 may be a wide area network or a local area network, or a combination of both.
The terminal device 400 is configured to send an image to be recognized to the server 200, where the image to be recognized may be an image captured by the terminal device 400 or a network image; the server 200 is configured to obtain the sample image and the sample category of the content included in the sample image from the database 500, and update the image classification model according to the sample image and the sample category; performing feature extraction processing on the image to be recognized through the updated image classification model to obtain a feature vector; dividing the feature vector into N sub-vectors to be identified, wherein N is an integer greater than 1; classifying each sub-vector to be identified in the N sub-vectors to be identified respectively to obtain N category prediction results of the content included in the image to be identified correspondingly; determining the category of the content included in the image to be recognized by combining the N category prediction results, and sending the category to the terminal device 400; terminal device 400 is also configured to display the category on graphical interface 410 (graphical interface 410-1 and graphical interface 410-2 are shown as examples). Fig. 1 illustrates a scene of face recognition, which exemplarily shows that a category of content included in an image to be recognized is a user a.
The embodiment of the invention can also be realized by combining a block chain technology, and the block chain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. The blockchain is essentially a decentralized database, which is a string of data blocks associated by using cryptography, each data block contains information of a batch of network transactions, and the information is used for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform can comprise processing modules such as user management, basic service, smart contract, and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, including maintenance of public/private key generation (account management), key management, and maintenance of the correspondence between a user's real identity and blockchain address (authority management); where authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and record valid requests to storage after consensus is completed; for a new service request, the basic service first performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic through a programming language and issue it to the blockchain (contract registration), and keys or other events trigger execution according to the logic of the contract clauses to complete the contract logic; the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setting, cloud adaptation, and visual output of real-time status during product operation, such as alarms, monitoring network conditions, and monitoring node device health status.
Referring to fig. 2, fig. 2 is an alternative architecture diagram of the artificial intelligence based image processing system 110 according to an embodiment of the present invention, which includes a blockchain network 600 (exemplarily showing a node 610-1 to a node 610-3), an authentication center 700, a service system 800 (exemplarily showing an electronic device 810 belonging to the service system 800, where the electronic device 810 may be the server 200 or the terminal device 400 in fig. 1), which are respectively described below.
The type of the blockchain network 600 is flexible and may be, for example, any of a public chain, a private chain, or a federation chain. Taking a public chain as an example, electronic devices such as terminal devices and servers of any service system can access the blockchain network 600 without authorization; taking a federation chain as an example, an electronic device (e.g., a terminal device/server) under the management of a service system can access the blockchain network 600 after being authorized, at which point the service system becomes a special node, i.e., a client node, in the blockchain network 600.
Note that the client node may provide only the functions that support the service system in initiating transactions (e.g., for uplink storage of data or querying of data on the chain), and may implement the functions of the native nodes of the blockchain network 600, such as the sorting function, consensus service, and ledger function described below, by default or selectively (e.g., depending on the specific service requirements of the service system). Therefore, the data and service processing logic of the service system can be migrated to the blockchain network 600 to the maximum extent, and the credibility and traceability of the data and of the service processing process are realized through the blockchain network 600.
Blockchain network 600 receives a transaction submitted from a client node (e.g., electronic device 810 attributed to business system 800 shown in fig. 2) of a business system (e.g., business system 800 shown in fig. 2), executes the transaction to update the ledger or query the ledger.
An exemplary application of the blockchain network is described below, taking the example of the service system accessing the blockchain network to implement the uplink of the image classification model.
The electronic device 810 of the service system 800 accesses the blockchain network 600 to become a client node of the blockchain network 600. The electronic device 810 first trains an image classification model for identifying an image to be identified, and in the training process, the electronic device 810 acquires a sample image and a sample category of content included in the sample image, where the sample image and the sample category may be acquired by the electronic device 810 from a database or from the block chain network 600. Then, the electronic device 810 performs feature extraction processing on the sample image through an image classification model, divides the obtained sample feature vector into N sample sub-vectors, determines a loss value of the image classification model according to the N sample sub-vectors and the sample category of the content included in the sample image, and updates the image classification model according to the loss value, for example, by using a back propagation mechanism.
After the update of the image classification model is completed, the electronic device 810 generates an asymmetric key pair including a public key and a private key according to an asymmetric encryption algorithm, and encrypts the updated image classification model according to the public key. The electronic device 810 then generates a transaction that submits the encrypted image classification model, in which the smart contract that needs to be invoked to implement the submission and the parameters passed to the smart contract are specified, and the transaction also carries a digital signature signed by the business system 800 (e.g., a digest of the transaction is encrypted using a private key in a digital certificate of the business system 800), and broadcasts the transaction to the blockchain network 600. Wherein, the digital certificate can be obtained by the service system 800 registering with the authentication center 700.
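To make the encryption step above concrete, here is a minimal sketch assuming the Python cryptography package and a hybrid scheme (RSA-OAEP wrapping a symmetric key, since RSA alone cannot encrypt a payload as large as a serialized model); the hybrid scheme and all names are assumptions for illustration, not the patent's prescribed algorithm.

```python
# Hypothetical sketch: encrypt a serialized model with an asymmetric key pair.
# Assumed hybrid scheme: RSA-OAEP wraps a Fernet key; Fernet encrypts the model bytes.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.fernet import Fernet

# generate the asymmetric key pair (public key encrypts, private key decrypts)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

model_bytes = b"...serialized image classification model..."  # placeholder

# encrypt the model with a fresh symmetric key, then wrap that key with the public key
sym_key = Fernet.generate_key()
encrypted_model = Fernet(sym_key).encrypt(model_bytes)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(sym_key, oaep)

# later, the holder of the private key recovers the model
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(encrypted_model) == model_bytes
```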
When a node 610 in the blockchain network 600 receives a transaction, it verifies the digital signature carried by the transaction; after the digital signature is successfully verified, it determines whether the service system 800 has transaction authority according to the identity of the service system 800 carried in the transaction; failure of either the digital signature verification or the authority verification causes the transaction to fail. After successful verification, the node 610 appends its own digital signature and continues to broadcast the transaction in the blockchain network 600.
After the node 610 with the sorting function in the blockchain network 600 receives the transaction successfully verified, the transaction is filled into a new block and broadcasted to the nodes providing the consensus service in the blockchain network 600.
The node 610 providing the consensus service in the blockchain network 600 performs the consensus process on the new block to reach an agreement, the node providing the ledger function adds the new block to the tail of the blockchain, and performs the transaction in the new block: and for the transaction submitting the encrypted image classification model, storing the encrypted image classification model to a state database in a key value pair mode.
An exemplary application of the blockchain network is described below, taking a business system accessing the blockchain network to implement the query of the image classification model as an example.
The electronic device 810 generates a transaction for querying the encrypted image classification model according to a user instruction or preset logic, specifies in the transaction the smart contract that needs to be invoked to implement the query operation and the parameters passed to the smart contract, and the transaction also carries a digital signature signed by the service system 800. The electronic device 810 then broadcasts the transaction to the blockchain network 600; after the nodes 610 of the blockchain network verify the transaction, fill it into a block, and reach consensus, the node 610 providing the ledger function appends the formed new block to the tail of the blockchain and executes the transaction in the new block: for a transaction that queries the encrypted image classification model, the encrypted image classification model is queried from the state database and sent to the electronic device 810. The electronic device 810 can decrypt the encrypted image classification model with the private key of the asymmetric key pair, so that the decrypted image classification model can be applied to the recognition of the image to be recognized. It should be noted that the data stored in the state database is generally the same as the data stored in the blockchain; when responding to a query transaction, the data in the state database is used preferentially, so as to improve response efficiency.
The following continues to illustrate exemplary applications of the electronic device provided by embodiments of the present invention. The electronic device may be implemented as various types of terminal devices such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and the like, and may also be implemented as a server. Next, an electronic device will be described as an example of a server.
Referring to fig. 3A, fig. 3A is a schematic diagram of an architecture of a server 200 (for example, the server 200 shown in fig. 1) according to an embodiment of the present invention, where the server 200 shown in fig. 3A includes: at least one processor 210, memory 240, and at least one network interface 220. The various components in server 200 are coupled together by a bus system 230. It is understood that the bus system 230 is used to enable connected communication between these components. The bus system 230 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 230 in fig. 3A.
The processor 210 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The memory 240 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 240 optionally includes one or more storage devices physically located remote from processor 210.
The memory 240 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 240 described in connection with embodiments of the present invention is intended to comprise any suitable type of memory.
In some embodiments, memory 240 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, to support various operations, as exemplified below.
An operating system 241, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 242 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB), among others.
In some embodiments, the artificial intelligence based image processing apparatus provided by the embodiments of the present invention can be implemented in software, and fig. 3A illustrates an artificial intelligence based image processing apparatus 2431 stored in the memory 240, which can be software in the form of programs and plug-ins, and the like, and includes the following software modules: an extraction module 24311, a segmentation module 24312, a classification module 24313, and a category determination module 24314, which are logical and thus can be arbitrarily combined or further split depending on the functionality implemented. The functions of the respective modules will be explained below.
In some embodiments, the image classification model training device based on artificial intelligence provided in the embodiments of the present invention may also be implemented in a software manner, see fig. 3B, and except for the image classification model training device 2432 based on artificial intelligence shown in fig. 3B, the rest of the image classification model training device based on artificial intelligence may be the same as that shown in fig. 3A, and details are not repeated here. For the artificial intelligence based image classification model training device 2432 stored in the memory 240, which may be software in the form of programs and plug-ins, etc., the following software modules are included: sample acquisition module 24321, sample extraction module 24322, sample segmentation module 24323, loss determination module 24324, and update module 24325, which are logical and thus can be arbitrarily combined or further split depending on the functionality implemented. The functions of the respective modules will be explained below.
In other embodiments, the artificial intelligence based image processing apparatus and the artificial intelligence based image classification model training apparatus provided in the embodiments of the present invention may be implemented in hardware. As an example, the artificial intelligence based image processing apparatus may be a processor in the form of a hardware decoding processor, programmed to execute the artificial intelligence based image processing method provided in the embodiments of the present invention; likewise, the artificial intelligence based image classification model training apparatus may be a processor in the form of a hardware decoding processor, programmed to execute the artificial intelligence based image classification model training method provided in the embodiments of the present invention. For example, the processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
The image processing method based on artificial intelligence provided by the embodiment of the present invention may be executed by the server, or may be executed by a terminal device (for example, the terminal device 400-1 and the terminal device 400-2 shown in fig. 1), or may be executed by both the server and the terminal device.
In the following, a process of implementing the artificial intelligence based image processing method by the embedded artificial intelligence based image processing apparatus in the electronic device will be described in conjunction with the exemplary application and structure of the electronic device set forth above.
Referring to fig. 4 and 5A, fig. 4 is a schematic structural diagram of an artificial intelligence based image processing apparatus 2431 provided in an embodiment of the present invention, which shows a flow of implementing category prediction through a series of modules, and fig. 5A is a schematic flow diagram of an artificial intelligence based image processing method provided in an embodiment of the present invention, and the steps shown in fig. 5A will be described with reference to fig. 4.
In step 101, feature extraction processing is performed on an image to be recognized to obtain a feature vector.
As an example, referring to fig. 4, in the extracting module 24311, an image to be recognized is obtained, and the image to be recognized may be a network image or a local image of the terminal device. And performing feature extraction processing on the acquired image to be recognized to obtain a feature vector, wherein the feature extraction processing can be performed on the image to be recognized through an image classification model, and the image classification model can be acquired from a database or a block chain network.
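As a sketch of this feature extraction step, assuming PyTorch/torchvision and a pretrained backbone standing in for the image classification model (the backbone choice, file path, and dimensions are illustrative assumptions):

```python
# Hypothetical feature-extraction sketch using a pretrained backbone; any image
# classification model that outputs a feature vector would serve the same role.
import torch
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the final classifier: keep the 512-d feature vector
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image = Image.open("to_be_recognized.jpg").convert("RGB")   # placeholder path
with torch.no_grad():
    feature_vector = backbone(preprocess(image).unsqueeze(0)).squeeze(0)  # shape (512,)
```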
In step 102, the feature vector is divided into N sub-vectors to be identified; wherein N is an integer greater than 1.
As an example, referring to fig. 4, in the segmentation module 24312, the feature vector is sequentially segmented into N sub-vectors according to the segmentation value N, that is, the image to be recognized is mapped to N low-dimensional spaces, and the sub-vectors are vectors located in the low-dimensional spaces, and for convenience of differentiation, the sub-vectors are named as sub-vectors to be recognized, where N is an integer greater than 1 and can be set in advance.
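A minimal sketch of the segmentation step, assuming NumPy; the feature dimension and the segmentation value N are illustrative:

```python
# Hypothetical sketch: split a feature vector into N sub-vectors to be recognized.
import numpy as np

N = 4                                  # segmentation value, set in advance
feature_vector = np.random.rand(512)   # stand-in for the extracted feature vector

sub_vectors = np.split(feature_vector, N)   # N sub-vectors of dimension 512 / N
assert len(sub_vectors) == N and sub_vectors[0].shape == (128,)
```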
In step 103, the classification processing is performed according to each of the N to-be-identified sub-vectors, so as to correspondingly obtain N category prediction results of the content included in the to-be-identified image.
As an example, referring to fig. 4, in the classifying module 24313, each sub-vector to be recognized is classified, for example, by a softmax function, so as to obtain at least two classes and a confidence corresponding to each class. For each to-be-identified sub-vector, the class prediction result of the to-be-identified sub-vector may be set to include all classes obtained through classification processing and corresponding confidence degrees, or the class prediction result may be set to include only the class corresponding to the confidence degree with the largest value. For N to-be-identified subvectors, N category prediction results can be obtained.
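A sketch of this classification step under the assumption of one linear softmax classifier per low-dimensional space; the weights below are random stand-ins for trained parameters, and the structure of a category prediction result is an illustrative choice:

```python
# Hypothetical sketch: classify each of the N sub-vectors with its own softmax classifier.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
N, sub_dim, num_classes = 4, 128, 3
sub_vectors = np.split(rng.normal(size=N * sub_dim), N)   # stand-in sub-vectors

# one classifier per low-dimensional space (random stand-ins for trained weights)
classifiers = [rng.normal(size=(num_classes, sub_dim)) for _ in range(N)]

predictions = []   # the N category prediction results
for W, sub in zip(classifiers, sub_vectors):
    confidences = softmax(W @ sub)          # a confidence for each class
    predictions.append({"class": int(confidences.argmax()),
                        "confidences": confidences})
```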
In step 104, the category of the content included in the image to be recognized is determined in combination with the N category prediction results.
Here, the final category may be selected by voting. For example, in the case where each category prediction result includes only the category with the highest confidence, the category with the largest number of votes among the N category prediction results is determined as the category of the content included in the image to be recognized. As an example, if the value of N is 4, the categories included in the 1st and 2nd category prediction results are both category A, the category included in the 3rd category prediction result is category B, and the category included in the 4th category prediction result is category C, then, since category A receives 2 votes, more than any other category, category A is determined as the category of the content included in the image to be recognized.
On this basis, there may be at least two categories tied for the largest number of votes. In this case, one category may be randomly selected from the tied categories as the category of the content included in the image to be recognized. As an alternative to random selection, the selection may also be performed according to confidence: for each category with the largest number of votes, the confidences corresponding to that category are determined and then averaged to obtain the measured confidence of the category, where a confidence corresponding to a category refers to the confidence obtained for that category in the classification processing that produced the category prediction result including the category. The category with the largest number of votes and the highest measured confidence is then determined as the category of the content included in the image to be recognized. Continuing the example above, if the value of N is changed to 5 and the 5th category prediction result indicates category B, then categories A and B are tied for the largest number of votes. In this case, the confidence Confidence_A1 corresponding to category A obtained when the 1st sub-vector to be recognized (corresponding to the 1st category prediction result) was classified is determined; similarly, the confidence Confidence_A2 of category A corresponding to the 2nd sub-vector to be recognized, the confidence Confidence_B1 of category B corresponding to the 3rd sub-vector to be recognized, and the confidence Confidence_B2 of category B corresponding to the 5th sub-vector to be recognized are determined. The measured confidence of category A is then computed as (Confidence_A1 + Confidence_A2) / 2 and the measured confidence of category B as (Confidence_B1 + Confidence_B2) / 2, and the category with the higher measured confidence among category A and category B is determined as the category of the content included in the image to be recognized.
In addition, a weighted voting method may also be applied. For example, in the case where each category prediction result includes all the categories obtained by the classification processing and their corresponding confidences, different weights are set for the classifiers of the different low-dimensional spaces, and the confidences in each category prediction result are weighted accordingly. The weighted confidences in the N category prediction results are then summed or averaged per category, and the category with the largest resulting confidence is determined as the category of the content included in the image to be recognized.
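The combination logic of step 104 (majority voting, confidence-averaged tie-breaking, and weighted voting) can be sketched as follows; the prediction-result structure and the weights are illustrative assumptions:

```python
# Hypothetical sketch: combine N category prediction results.
from collections import Counter
import numpy as np

def majority_vote(predictions):
    """Voting with confidence-averaged tie-breaking, as described above."""
    votes = Counter(p["class"] for p in predictions)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    if len(tied) == 1:
        return tied[0]
    # tie: choose the tied class with the highest measured (mean) confidence
    def measured_confidence(c):
        vals = [p["confidences"][c] for p in predictions if p["class"] == c]
        return sum(vals) / len(vals)
    return max(tied, key=measured_confidence)

def weighted_vote(predictions, weights):
    """Weighted voting: scale each result's confidences, then sum per class."""
    total = sum(w * p["confidences"] for w, p in zip(weights, predictions))
    return int(np.argmax(total))

# toy run matching the text's example: votes A, A, B, C -> category A wins
toy = [{"class": c, "confidences": conf} for c, conf in [
    (0, np.array([0.7, 0.2, 0.1])), (0, np.array([0.6, 0.3, 0.1])),
    (1, np.array([0.2, 0.5, 0.3])), (2, np.array([0.3, 0.2, 0.5]))]]
assert majority_vote(toy) == 0
```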
The embodiment of the invention can be applied to different application scenes, such as the login verification scene shown in fig. 4, in which a user account with login authority can be predetermined, and the category of content contained in an image associated with the user account is determined as the authority category. When a login request initiated by a user is received, image acquisition processing is carried out within a set time period to obtain an image to be recognized, feature extraction processing, vector segmentation and classification processing are carried out on the image to be recognized, and the category included in the image to be recognized is determined. And when the category is the same as the authority category, determining that the verification result of the login request is successful, namely that the user initiating the login request corresponds to the user account with the login authority. Specifically, the login authentication may be access authentication or power-on authentication, and the like, which is not limited in the embodiment of the present invention.
The embodiment of the invention can also be applied to the scene of image clustering, for example, images in the user gallery are determined as images to be identified, after the category of the content included in the images to be identified is determined, the images corresponding to the same category in the user gallery are classified into one category, and the automatic arrangement of the user gallery is completed. It should be noted that the user gallery may be a local gallery or a network gallery.
As can be seen from the above exemplary implementation of fig. 5A, in the embodiment of the present invention, the image to be recognized is mapped to N low-dimensional spaces, and the category prediction results of the N low-dimensional spaces are integrated, so as to weaken the influence of the redundant information that is not beneficial to recognition.
In some embodiments, referring to fig. 5B, fig. 5B is an optional flowchart of the artificial intelligence based image processing method provided in the embodiment of the present invention, and based on fig. 5A, after step 102, in step 201, a feature extraction process may be further performed on the comparison image, and the obtained feature vector is divided into N comparison sub-vectors.
Besides the way of classifying vectors, the category of the content included in the image to be recognized can be determined by the way of vector comparison. Specifically, a contrast image of a known category is acquired, feature extraction processing is performed on the contrast image, and the obtained feature vector is divided into N contrast sub-vectors. It should be noted that the manner of performing the feature extraction processing on the comparison image is the same as the manner of performing the feature extraction processing on the image to be recognized, for example, the feature extraction processing is performed through an image classification model, and in addition, the acquisition route of the comparison image in the embodiment of the present invention is not limited, and for example, the comparison image may be acquired from a database or a block chain network.
In step 202, the vector distance between the nth sub-vector to be recognized and the nth comparison sub-vector is determined, until N vector distances are obtained; wherein n takes the values 1, 2, ..., N in sequence.
The vector distance between the nth sub-vector to be recognized and the nth comparison sub-vector is determined according to the vector segmentation order, iterating the value of n from 1 until N vector distances are obtained. The embodiment of the present invention does not limit the specific type of the vector distance; for example, the vector distance may be a Euclidean distance or a cosine distance, and the smaller the vector distance, the higher the similarity between the two corresponding sub-vectors.
In step 203, the N vector distances are averaged to obtain an image vector distance.
And averaging the N vector distances to obtain the image vector distance between the image to be identified and the comparison image. It should be noted that, under the condition of obtaining the feature vector corresponding to the comparison image, the vector distance between the feature vector corresponding to the image to be recognized and the feature vector corresponding to the comparison image may be directly determined as the image vector distance without performing vector segmentation.
In step 204, when the image vector distance is smaller than the distance threshold, it is determined that the image to be recognized and the comparison image include the same category of content.
Here, when the image vector distance is smaller than the distance threshold, it is determined that the image to be recognized and the comparison image include content of the same category. It should be noted that the vector distance in the above steps can be replaced by the vector similarity, in which case step 203 becomes: averaging the N vector similarities to obtain an image vector similarity; and step 204 becomes: when the image vector similarity is greater than a similarity threshold, determining that the image to be recognized and the comparison image include content of the same category. The vector similarity may be, for example, the cosine similarity.
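Steps 202 to 204, together with the cosine-similarity variant just mentioned, can be sketched as follows; the threshold value and the names are illustrative assumptions:

```python
# Hypothetical sketch: compare two images via their N sub-vector distances.
import numpy as np

def image_vector_distance(subs_a, subs_b):
    """Average of the N per-sub-vector Euclidean distances (steps 202-203)."""
    dists = [np.linalg.norm(a - b) for a, b in zip(subs_a, subs_b)]
    return sum(dists) / len(dists)

def same_category(subs_a, subs_b, distance_threshold=0.8):
    """Step 204: same category when the image vector distance is below threshold."""
    return image_vector_distance(subs_a, subs_b) < distance_threshold

def image_vector_similarity(subs_a, subs_b):
    """Cosine-similarity variant mentioned above: average of N sub-vector similarities."""
    sims = [float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            for a, b in zip(subs_a, subs_b)]
    return sum(sims) / len(sims)

rng = np.random.default_rng(1)
to_recognize = np.split(rng.normal(size=512), 4)   # stand-in sub-vectors
comparison = np.split(rng.normal(size=512), 4)
print(same_category(to_recognize, comparison))
```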
In some embodiments, before step 101, further comprising: acquiring and processing images within a set time period when a login request is received to obtain an image to be identified; determining at least one user account with login authority, and determining an image associated with the user account as a comparison image;
after step 204, further comprising: and loading the account resources of the user account corresponding to the comparison image so as to respond to the login request.
The embodiment of the invention can be applied to a login verification scenario. Specifically, image acquisition processing is performed within a set time period of receiving the login request, for example 5 seconds; performing the acquisition within a set time period increases the difficulty for a user initiating the login request to cheat with counterfeit images. During image acquisition processing, either a single frame or multiple frames of images may be acquired. Meanwhile, at least one user account with login authority is determined, for example, the user accounts with login authority on a desktop computer, and an image associated with each such user account is determined as a comparison image, where the image associated with a user account may be an image including the face of the user corresponding to that user account.
When the image to be recognized is a single frame and its image vector distance to a certain comparison image is smaller than the distance threshold, the account resources of the user account corresponding to that comparison image are loaded in response to the login request. When the images to be recognized are multiple frames and the proportion of frames whose image vector distance to a certain comparison image is smaller than the distance threshold exceeds the frame-number proportion, the account resources of the user account corresponding to that comparison image are loaded. The frame-number proportion can be set according to the actual application scenario; taking a frame-number proportion of 50% as an example, if there are 10 frames of images to be recognized and the image vector distances of 6 of them to a certain comparison image are smaller than the distance threshold, the verification is determined to be successful. The account resources may be at least one of image resources and sound resources. The method can be applied to login verification scenarios of terminal devices such as mobile devices or desktop computers, improving the precision of login verification and preventing illegal users from logging in successfully.
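The multi-frame decision described above reduces to a short check, sketched below; the helper name is hypothetical and the 50% proportion follows the example in the text:

```python
# Hypothetical sketch of the multi-frame login check described above.
def login_verified(frame_distances, distance_threshold, frame_ratio=0.5):
    """frame_distances: image vector distance of each captured frame to a
    comparison image. Verification succeeds when more than frame_ratio of
    the frames fall within the distance threshold."""
    matches = sum(1 for d in frame_distances if d < distance_threshold)
    return matches / len(frame_distances) > frame_ratio

# the text's example: 10 frames, 6 of them within the threshold -> verified
assert login_verified([0.3] * 6 + [1.5] * 4, distance_threshold=0.8)
```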
As can be seen from the above exemplary implementation of fig. 5B, the embodiment of the present invention realizes category recognition of the content included in the image to be recognized from another angle, by way of vector comparison, which is suitable for cases where the sample categories used to train the image classification model are few and cannot cover all the categories that need to be predicted.
In some embodiments, referring to fig. 5C, fig. 5C is an optional flowchart of the artificial intelligence based image processing method provided in the embodiment of the present invention, and based on fig. 5B, before step 101, at least one directory in the user gallery may also be determined in step 301.
In the embodiment of the invention, the image clustering of the user gallery can be realized in a vector comparison mode. First, at least one directory in the user gallery is determined, where the directory may be a file directory created by the user, for example, the user actively clusters (adds) images in the user gallery related to "zhangsan" to a new directory, and sets the directory name of the directory to "zhangsan". The user gallery can be a local gallery or a network gallery, such as a gallery located on a content interaction platform.
In step 302, a first traversal process is performed on the user gallery, the traversed image is determined as an image to be identified, in the first traversal process, a second traversal process is performed on a plurality of images except the image to be identified to obtain a contrast image, and a binary image group is constructed according to the image to be identified obtained by the first traversal process and the contrast image obtained by the second traversal process.
Here, the user gallery is subjected to first traversal processing, and the traversed image is determined as the image to be identified. Then, on the basis of determining the image to be recognized, performing second traversal processing on a plurality of images except the image to be recognized to obtain a comparison image, and constructing a binary image group according to the image to be recognized obtained through the first traversal processing and the comparison image obtained through the second traversal processing until all binary image groups possibly appearing in the user gallery are obtained. The binary image group in the embodiment of the present invention is a non-sequential image group, so after the binary image group is obtained, the binary image group including the same images but different image orders can be subjected to deduplication processing, thereby reducing the amount of data to be processed, for example, if the binary image group 1 includes an image a and an image B, and if the binary image group 2 includes an image B and an image a, only one of the binary image group 1 and the binary image group 2 is reserved.
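Because the binary image groups are unordered, the double traversal with deduplication is equivalent to enumerating unordered pairs, as the following sketch (with placeholder image identifiers) shows:

```python
# Hypothetical sketch: build all deduplicated binary image groups from a gallery.
from itertools import combinations

user_gallery = ["img_a.jpg", "img_b.jpg", "img_c.jpg"]   # placeholder image ids

# the first/second traversal with order-insensitive deduplication is equivalent
# to enumerating unordered pairs: (A, B) is kept, (B, A) is never regenerated
binary_groups = list(combinations(user_gallery, 2))
# [('img_a.jpg', 'img_b.jpg'), ('img_a.jpg', 'img_c.jpg'), ('img_b.jpg', 'img_c.jpg')]
```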
In fig. 5C, after step 203, in step 303, the images in the user gallery are added to at least one image class according to the image vector distance corresponding to the binary image group.
For each binary image group, an image vector distance between two images included in the binary image group is determined, and thus, an image vector distance between each two images in the user gallery can be obtained. The images in the user gallery are then added to at least one image class according to the image vector distance.
In some embodiments, the above-mentioned adding of images in the user gallery to at least one image class according to the image vector distance corresponding to the binary image group may be implemented in such a way that: according to the image vector distance corresponding to the binary image group, clustering the feature vectors corresponding to all the images in the user gallery to obtain at least one vector class; and converting the feature vectors included in each vector class into corresponding images, and combining the images obtained by converting each vector class into corresponding image classes.
Here, according to the image vector distance, clustering processing is performed on feature vectors corresponding to all images in the user gallery to obtain at least one vector class. And for each obtained vector class, converting the feature vectors contained in the vector class into corresponding images, and combining the converted images into the corresponding image classes. By the method, the images in the user gallery are clustered based on the image vector distance, and the clustering accuracy is improved.
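One way to realize this clustering is a simple single-link threshold rule: any two images whose image vector distance is below a threshold are merged into the same class. This scheme and the names below are an illustrative assumption, not the patent's prescribed clustering algorithm:

```python
# Hypothetical sketch: single-link threshold clustering over image vector distances.
def cluster_by_distance(images, pair_distance, threshold):
    """pair_distance maps an unordered image pair to its image vector distance.
    Returns a list of image classes (each a list of images)."""
    parent = {img: img for img in images}          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path halving
            x = parent[x]
        return x

    for (a, b), d in pair_distance.items():
        if d < threshold:
            parent[find(a)] = find(b)              # merge the two classes

    classes = {}
    for img in images:
        classes.setdefault(find(img), []).append(img)
    return list(classes.values())

imgs = ["a", "b", "c"]
dists = {("a", "b"): 0.2, ("a", "c"): 1.4, ("b", "c"): 1.3}
print(cluster_by_distance(imgs, dists, threshold=0.5))   # [['a', 'b'], ['c']]
```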
In step 304, when the images exceeding a set proportion in the image class belong to the same directory, that directory is determined as the target directory, and the images in the image class that do not belong to any directory are added to the target directory.
For each image class, when the images exceeding a set proportion in the image class belong to the same directory, the directory to which those images belong is determined as the target directory, where the set proportion can be set according to the actual application scenario, for example 50%. Because the images already belonging to directories were actively clustered by the user, in order to avoid affecting the user's own clustering result, the embodiment of the invention clusters only the images that do not belong to any directory; specifically, the images in the image class that do not belong to any directory are added to the target directory.
On the contrary, when the images in the image class belong to the same catalog and the number of the images in the image class is the largest does not exceed the set proportion, a new catalog can be automatically created, and the images in the image class which do not belong to any catalog are added to the new catalog.
As can be seen from the above exemplary implementation of fig. 5C, according to the embodiment of the present invention, the images in the user gallery are clustered according to the image vector distance, so that the clustering effect is improved, the automatic arrangement of the images is realized, and the user experience is improved.
The image classification model training method based on artificial intelligence provided by the embodiment of the present invention may be executed by the server, or may be executed by a terminal device (for example, the terminal device 400-1 and the terminal device 400-2 shown in fig. 1), or may be executed by both the server and the terminal device.
The following describes a process of implementing the artificial intelligence based image classification model training method by using the embedded artificial intelligence based image classification model training apparatus in the electronic device, in conjunction with the exemplary application and structure of the electronic device described above.
Referring to fig. 5D, fig. 5D is an optional flowchart of the artificial intelligence based image classification model training method according to the embodiment of the present invention, which will be described with reference to the steps shown.
In step 401, a sample image and a sample category of content included in the sample image are acquired.
In the embodiment of the present invention, the feature extraction processing can be performed on the image to be recognized through the image classification model, and the image classification model can be trained in order to improve its feature extraction effect. The embodiment of the present invention does not limit the type of the image classification model; for example, the image classification model may be a convolutional neural network model.
In the training process, a sample image and an annotated sample class are first obtained as training data for the image classification model. The embodiment of the present invention does not limit how the sample class is defined. Taking a face recognition scene as an example, the sample class can be set according to whether the sample image includes a face: a sample class of 0 indicates that the sample image does not include a face, and a sample class of 1 indicates that a face is included. In addition, different sample classes can be set for different faces: for example, a sample class of 0 indicates that the sample image includes face A, and a sample class of 1 indicates that face B is included.
In some embodiments, the above-described obtaining of the sample image and the sample category of the content included in the sample image may be implemented in such a way that: determining an interactive object of a target user and determining an interactive event participated by the target user; when the interactive event comprises an image, carrying out named entity recognition processing on a text in the interactive event to obtain a named entity in the text; when the named entities are matched with the interactive objects, determining the images in the interactive events as sample images, and creating sample categories according to the named entities so as to enable the sample categories corresponding to different named entities to be different.
Here, the image classification model may be trained for the target user. Firstly, an interactive object of a target user is determined, wherein the interactive object can be preset or obtained by extracting an address book of the interactive object, namely the interactive object can be an address object in the address book. Meanwhile, an interaction event in which the target user participates is determined, wherein the interaction event comprises at least one of text and images, and for example, the interaction event can be a user dynamic or a friend circle published by the target user in the content interaction platform.
When the interactive event only comprises the image or only comprises the text, the interactive event is not subjected to subsequent processing; and when the interactive event comprises an image and a text, carrying out named entity recognition processing on the text in the interactive event to obtain a named entity in the text. The embodiment of the invention does not limit the way of named entity identification processing, for example, named entities in the text can be identified by setting a named entity rule, and named entity identification processing can also be carried out by a related model, such as a Long Short-Term Memory network (LSTM) model.
And when the identified named entities are matched with the interactive objects, determining the images in the interactive events as sample images, and creating sample categories according to the named entities, wherein the sample categories corresponding to different named entities are different. For example, a target user issues a user dynamic including a named entity of "zhang san" and a face image, and an interactive object named "zhang san" also exists in the address book of the target user, the named entity is determined to be matched with the interactive object, an image in an interactive event is determined to be a sample image, and a sample category corresponding to "zhang san" is created. And if the sample category corresponding to the named entity is created, directly establishing the corresponding relation between the created sample category and the sample image. By the method, the sample image and the sample category are effectively analyzed from the interaction event of the target user, and the flexibility of acquiring training data is improved.
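The following sketch illustrates this flow under simplifying assumptions; in particular, named entity recognition is reduced to matching contact names inside the text, as a stand-in for the rule-based or LSTM-based recognition mentioned above:

```python
def collect_training_samples(events, contacts):
    """events: dicts like {"text": ..., "images": [...]}; contacts: the
    target user's address book. Returns (image, label) pairs plus the
    mapping from contact name to sample category."""
    samples, categories = [], {}
    for event in events:
        if not event.get("images") or not event.get("text"):
            continue  # only events with both an image and text are used
        for name in contacts:
            if name in event["text"]:  # named entity matches a contact
                label = categories.setdefault(name, len(categories))
                samples.extend((img, label) for img in event["images"])
    return samples, categories

samples, cats = collect_training_samples(
    [{"text": "Dinner with Zhang San", "images": ["p1.jpg"]}], ["Zhang San"])
```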
In some embodiments, the above-described obtaining of the sample image and the sample category of the content included in the sample image may be implemented in such a way that: determining an interactive object of a target user and determining at least one directory in a user gallery of the target user; and when a directory matches the interactive object, determining the images under the directory as sample images, and creating a sample category according to the directory, so that the sample categories corresponding to different directories are different.

In addition to parsing the training data from the interaction events of the target user, the user gallery may also be parsed. Specifically, at least one directory in the user gallery of the target user is determined, where the directory may be a file directory. Since a directory name is usually short, named entity recognition is not applied to it; instead, the directory name is directly text-matched against the interactive objects of the target user. When the directory name contains an interactive object, the directory name is determined to match the interactive object, all images under the directory corresponding to the directory name are determined as sample images, and a sample category is created according to the directory name, where the sample categories corresponding to different directory names are different. Likewise, if a sample category corresponding to the directory name has already been created, the created sample category is directly associated with the sample images. In this way, the training data is obtained from the user gallery, which further improves flexibility and applicability to different application scenarios.
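A corresponding sketch for this gallery-based variant, again with illustrative names:

```python
def samples_from_gallery(directories, contacts):
    """directories: dict mapping directory name -> list of images.
    A directory whose name contains a contact name contributes all of
    its images, with one sample category per matching directory."""
    samples, categories = [], {}
    for dir_name, images in directories.items():
        if any(contact in dir_name for contact in contacts):
            label = categories.setdefault(dir_name, len(categories))
            samples.extend((img, label) for img in images)
    return samples, categories
```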
In step 402, a sample image is subjected to feature extraction processing by an image classification model to obtain a sample feature vector.
Here, the feature extraction processing is performed on the sample image by the image classification model, and the obtained feature vector is named as a sample feature vector.
In step 403, the sample feature vector is sliced into N sample sub-vectors.
The sample feature vector is divided into N sub-vectors, where N is the same as N in step 102, and the sub-vectors obtained here are named sample sub-vectors for easy distinction. The image classification model is trained through the N divided sample sub-vectors, and the training efficiency of the image classification model can be improved.
In some embodiments, before step 403, the method further comprises: acquiring at least two preset complexity intervals and the corresponding segmentation values, wherein the segmentation values are in inverse relation to the numerical values of the complexity intervals; determining the complexity of the image classification model; and determining the complexity interval in which the complexity of the image classification model falls as a target complexity interval, and determining the segmentation value corresponding to the target complexity interval as N.

In the embodiment of the present invention, the gain in training efficiency of the image classification model is inversely related to the number of cut sub-vectors; that is, the larger the number of cut sub-vectors, the smaller the gain in training efficiency. Therefore, to obtain a reasonable trade-off, at least two preset complexity intervals and the segmentation value corresponding to each complexity interval are acquired, where the segmentation values are in inverse relation to the numerical values of the complexity intervals. For example, the complexity may be the model size of the image classification model in megabytes (MB): before training, a complexity interval A with the numerical range (0, 1 MB) and a corresponding segmentation value of 4 is acquired, and a complexity interval B with a numerical range above 1 MB and a corresponding segmentation value of 2 is acquired. When the model size of the image classification model is 10 MB, it falls within complexity interval B, and N is set to 2. Of course, this does not limit the embodiment of the present invention; for example, the complexity of the image classification model may also be the number of network layers in the model or another metric, and the complexity intervals and segmentation values may be set in other ways. In this way, when the image classification model is simple and training is fast, more sub-vectors are cut, improving the precision of the trained model; when the image classification model is complex, the number of cut sub-vectors is reduced, improving training efficiency while maintaining a certain precision and avoiding excessively long training.
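As an illustration, the interval lookup described above can be sketched as follows; the bounds and segmentation values mirror the example in the text and are not prescriptive:

```python
def choose_segmentation_value(model_size_mb,
                              intervals=((1.0, 4), (float("inf"), 2))):
    """intervals: (upper_bound_mb, N) pairs in ascending order; smaller
    (simpler) models get a larger segmentation value N."""
    for upper_bound, n in intervals:
        if model_size_mb < upper_bound:
            return n
    return 2  # fallback, unreachable with an open-ended last interval

assert choose_segmentation_value(0.5) == 4   # interval A: (0, 1 MB) -> N = 4
assert choose_segmentation_value(10.0) == 2  # interval B: > 1 MB   -> N = 2
```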
In step 404, a loss value of the image classification model is determined based on the N sample subvectors and the sample class of the content included in the sample image.
In fig. 5D, step 404 can be realized by steps 501 to 503, and will be described with reference to each step.
In step 501, a classification process is performed according to each sample sub-vector of the N sample sub-vectors, and N sample class prediction results of the content included in the sample image are obtained correspondingly.
Here, the classification processing is performed for each sample sub-vector to obtain a class prediction result corresponding to each sample sub-vector, and the class prediction result obtained here is named a sample class prediction result for the convenience of distinction.
In step 502, sub-loss values are determined according to the sample class and the ith sample class prediction result until N sub-loss values are obtained; wherein the value of i is 1, 2, …, N in sequence; the ith sub-loss value is used to represent the difference between the sample class and the ith sample class prediction result.
Here, the sample class and the ith sample class prediction result are processed according to the loss function to obtain an ith sub-loss value, and the value of i is iterated from 1 until N sub-loss values are obtained, wherein the ith sub-loss value is used for representing the difference between the sample class and the ith sample class prediction result. The embodiment of the present invention does not limit the type of the loss function, and may be, for example, a cross entropy loss function or other types of loss functions.
In step 503, the N sub-loss values are accumulated to obtain the loss value of the image classification model.
Here, the accumulation processing may be addition processing or weighted sum processing. And performing accumulation processing on the N sub-loss values to obtain the loss value of the image classification model on the whole.
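A minimal PyTorch-style sketch of steps 501 to 503; the per-sub-vector linear heads are an assumption for illustration, since the patent does not prescribe a classifier structure:

```python
import torch
import torch.nn.functional as F

def multi_subvector_loss(feature_vec, labels, classifiers):
    """feature_vec: (batch, d) sample feature vectors; classifiers:
    list of N heads, one per d/N-dimensional sub-vector. Each
    sub-vector yields a cross-entropy sub-loss, and the model loss
    is their accumulated sum."""
    sub_vectors = torch.chunk(feature_vec, len(classifiers), dim=1)
    sub_losses = [F.cross_entropy(head(sv), labels)
                  for head, sv in zip(classifiers, sub_vectors)]
    return torch.stack(sub_losses).sum()
```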
In fig. 5D, after step 404, in step 405, the image classification model is propagated backward according to the loss value, and the weight parameter of the image classification model is updated in the gradient descending direction during the backward propagation.
And according to the obtained loss value of the image classification model, performing backward propagation in the image classification model, calculating a gradient in the process of backward propagation to each network layer of the image classification model, and updating the weight parameter of the network layer along the gradient descending direction. And repeating the process of training the image classification model until a convergence condition is met, wherein the convergence condition comprises a set iteration number or a set accuracy threshold value and the like.
In some embodiments, the above-described obtaining of the sample image and the sample category of the content included in the sample image may be implemented in such a way that: obtaining a training set comprising a plurality of training samples; the training sample comprises a sample image and a sample category of content included in the sample image;
the above back propagation in the image classification model according to the loss values can be achieved in such a way that: adding at least one training sample in a training set to a processing batch; carrying out average processing on loss values corresponding to all training samples in a processing batch to obtain an average loss value; and performing back propagation in the image classification model according to the average loss value.
In an embodiment of the invention, a training set comprising a plurality of training samples may be obtained, and at least one training sample in the training set may be added to a batch (batch), wherein different batches comprise the same number of training samples. And training the image classification model according to each processing batch, specifically, after determining the loss value of the image classification model corresponding to each training sample in the processing batch, averaging the loss values corresponding to all the training samples in the processing batch to obtain an average loss value, and performing back propagation in the image classification model according to the average loss value. By the small-batch gradient descending mode, the training efficiency is improved, and meanwhile, the precision of model training is guaranteed, so that the image classification model can meet the convergence condition more quickly.
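Reusing the multi_subvector_loss sketch above, a mini-batch training loop might look like the following; the optimizer, learning rate, and dataset interface are illustrative assumptions:

```python
import torch
from torch.utils.data import DataLoader

def train(model, classifiers, dataset, epochs=10, batch_size=64, lr=1e-3):
    """Mini-batch gradient descent: for each processing batch, compute
    the loss (cross_entropy already averages over the batch), back
    propagate, and update the weights along the negative gradient."""
    params = list(model.parameters()) + [p for head in classifiers
                                         for p in head.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for images, labels in loader:          # dataset yields (image, label)
            loss = multi_subvector_loss(model(images), labels, classifiers)
            optimizer.zero_grad()
            loss.backward()                    # back propagation
            optimizer.step()                   # gradient-descent update
    return model
```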
In some embodiments, after step 405, further comprising: generating an asymmetric key pair comprising a public key and a private key; and encrypting the updated image classification model according to the public key, and sending the encrypted image classification model to the block chain network, so that the node of the block chain network fills the encrypted image classification model into a new block, and adds the new block to the tail of the block chain.
After the image classification model is updated, an asymmetric key pair comprising a public key and a private key can be generated through an asymmetric encryption algorithm, and the updated image classification model is encrypted according to the public key. The encrypted image classification model is then sent to the block chain network; after the nodes of the block chain network complete verification, block filling, and consensus, a new block including the encrypted image classification model is added to the tail of the block chain. Because the block chain cannot be tampered with, this on-chain storage improves the accuracy of the model data; and because the image classification model stored in the block chain is encrypted, the security of the model data is also improved.
When the image classification model needs to be used, a model request is sent to the block chain network to obtain the encrypted image classification model stored in the block chain. The encrypted image classification model is then decrypted according to the private key in the asymmetric key pair, so that the feature extraction processing can be performed on the image to be recognized through the decrypted image classification model. It should be noted that, when a state database exists, the nodes of the block chain network may also store the encrypted image classification model in the state database and preferentially respond to model requests according to the data in the state database, which speeds up feedback. Through this on-chain storage, the security and accuracy of the image classification model are effectively improved.
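Since public-key encryption alone is impractical for a large model file, a hybrid scheme is one plausible way to realize this step. The sketch below, based on the Python cryptography library (an illustrative choice, not named by the patent), encrypts the serialized model with a symmetric key and protects that key with the RSA public key:

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def encrypt_model(model_bytes: bytes):
    """Generate an asymmetric key pair, then hybrid-encrypt the model:
    a fresh symmetric key encrypts the weights, and the public key
    encrypts that symmetric key."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    sym_key = Fernet.generate_key()
    encrypted_model = Fernet(sym_key).encrypt(model_bytes)
    encrypted_key = private_key.public_key().encrypt(
        sym_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None))
    # encrypted_model and encrypted_key would be sent to the block chain
    # network; private_key stays with the owner for later decryption.
    return encrypted_model, encrypted_key, private_key
```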
As can be seen from the above exemplary implementation of fig. 5D, in the embodiment of the present invention, the loss value of the image classification model is determined according to the N cut sample sub-vectors, so that the model training efficiency is effectively improved, and the model training precision is also improved.
In the following, an exemplary application of the embodiment of the present invention in an actual face recognition application scenario will be described, that is, a case where an image to be recognized is a face image will be exemplified.
In fig. 6, after the face image 61 with dimensions of 227 × 227 × 3 is input to the convolutional neural network model, the convolutional neural network model performs convolution operations layer by layer through convolution kernels, and finally maps the face image 61 into a feature vector 62 of dimension 1000; that is, the convolution operation is equivalent to feature extraction, and the convolution kernel used in the convolution operation is equivalent to a feature filter. The convolutional neural network model here is the image classification model above.
FIG. 7 is a schematic diagram of the model training architecture provided in the related art. In the solution provided in the related art, the face image is usually mapped to a d-dimensional feature vector $f \in \mathbb{R}^d$, a face class prediction result of the face image (namely, which user's face it is) is determined according to the d-dimensional feature vector, and the convolutional neural network model is trained according to a supervision signal, where the supervision signal is a loss function. However, the d-dimensional feature vector has a high dimensionality; for example, in a face recognition scene, d is usually 512. The d-dimensional feature vector therefore contains more redundant information that is not conducive to classification, and the efficiency of training the convolutional neural network model according to the d-dimensional feature vector is low.
Therefore, the embodiment of the present invention provides the schematic diagram of vector segmentation shown in fig. 8. Compared with the related-art approach of determining a class prediction result directly from the d-dimensional feature vector, in fig. 8 the d-dimensional feature vector is segmented into N sub-vectors; that is, the image is mapped into N low-dimensional spaces, and fig. 8 exemplarily shows the case where N is 4. According to the class prediction results of the cut sub-vectors 1, 2, 3, and 4, the class of the face in the face image (corresponding to the class of the content included in the image to be recognized above) can be determined, for example by voting: if the class prediction results of sub-vector 1 and sub-vector 2 both indicate that the face included in the face image belongs to user A, the class prediction result of sub-vector 3 indicates user B, and the class prediction result of sub-vector 4 indicates user C, then user A, with the largest number of votes, is finally determined as the face class of the face included in the face image.
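By way of illustration, the majority vote can be sketched as follows (user names are illustrative):

```python
from collections import Counter

def vote(category_predictions):
    """Return the category with the most votes among the N
    per-sub-vector prediction results."""
    return Counter(category_predictions).most_common(1)[0][0]

# matches the example above: two votes for user A win
assert vote(["user_a", "user_a", "user_b", "user_c"]) == "user_a"
```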
The d-dimensional feature vector is divided into N parts according to the dimension, and each sub-vector can be expressed as $f_i^n \in \mathbb{R}^{d/N}$, $n = 1, \dots, N$.
The slicing process can be expressed as:

$$f_i = [f_i^1, f_i^2, \dots, f_i^N]$$

where $i$ represents the $i$-th sample.
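As a minimal numerical sketch of this split (assuming d is divisible by N):

```python
import numpy as np

d, N = 512, 4
f_i = np.random.rand(d)          # d-dimensional feature vector of sample i
sub_vectors = np.split(f_i, N)   # [f_i^1, ..., f_i^N], each of dimension d/N
assert all(sv.shape == (d // N,) for sv in sub_vectors)
```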
The embodiment of the present invention further provides an architecture schematic diagram of model training as shown in fig. 9. When training the convolutional neural network model, the same kind of sub-supervisory signal is added to each of the split sub-vectors, and the convolutional neural network model is trained jointly according to the N sub-supervisory signals (N = 4 in fig. 9), wherein the total supervisory signal is the sum of the sub-supervisory signals and can be represented as:

$$loss(f_i, y_i) = \sum_{n=1}^{N} loss(f_i^n, y_i)$$

where $loss(f_i^n, y_i)$ is obtained through the $n$-th sub-supervisory signal (corresponding to the sub-loss value above), $loss(f_i, y_i)$ is obtained through the master supervision signal (corresponding to the loss value above), and $y_i$ represents the sample face class of the $i$-th training sample.
For ease of understanding, the embodiment of the present invention provides a flow chart of model training as shown in fig. 10, which is illustrated in the form of steps:
the method comprises the following steps: and determining the relevant parameters. In this step, the convolutional neural network model for the feature extraction process is named CNNweightAnd determining a cut value N, an adopted loss function loss _ function and a training set (X, Y), wherein X corresponds to a sample face image (corresponding to the sample image above), and Y corresponds to a sample face category (corresponding to the sample category above) of the sample face image, wherein the sample face categories of different users are different, and the sample face categories can be embodied in a numerical form, for example, the sample face category of the user A is 1, and the sample face category of the user B is 2. In addition, a face image for which a face type needs to be predicted is determined.
Step two: initialize the weight parameters of the convolutional neural network model; the weight parameters can be initialized randomly or in other ways.

Step three: train the convolutional neural network model.
Repeat the following process s times, s being an integer greater than 0: {

1) extract a processing batch $(x_i, y_i)$ from the training set (X, Y), where $i \in \{1, \dots, b\}$ and b is an integer greater than 0;

2) input the sample face image $x_i$ into the convolutional neural network model to obtain a d-dimensional feature vector, i.e. $f_i \leftarrow CNN_{weight}(x_i)$;

3) slice the d-dimensional feature vector into N sub-vectors, i.e. $[f_i^1, f_i^2, \dots, f_i^N] \leftarrow f_i$;

4) obtain the master supervision signal

$$loss = \frac{1}{b} \sum_{i=1}^{b} \sum_{n=1}^{N} loss\_function(f_i^n, y_i)$$

5) update the weight parameters of the convolutional neural network model by gradient descent (mini-batch gradient descent), i.e.

$$weight \leftarrow weight - \eta \nabla_{weight}\, loss$$

where $\eta$ is the learning rate.

}
Step four: perform feature extraction processing on the face image according to the updated convolutional neural network model to obtain a feature vector, i.e. $f_{image} = CNN_{weight}(image)$.
Step five: determine the face class in the face image according to $f_{image}$. Here, since the sample face classes used for training may be few and cannot cover all face classes that need to be predicted, besides classifying $f_{image}$ directly, the face class in the face image can also be determined by vector comparison.

Specifically, at least one comparison image of a known face class is obtained, and feature extraction processing is performed on the comparison image through the updated convolutional neural network model to obtain a feature vector $f_{contrast}$. For example, the terminal device stores $f_{contrast}$, which corresponds to a comparison image contrast, and the face class in the comparison image contrast may be used to unlock the desktop of the terminal device; for example, the comparison image contrast is associated with a user account having login authority. When a login request initiated by a user is detected, the terminal device collects a face image and processes it through the updated convolutional neural network model to obtain the corresponding $f_{image}$. Then, an image vector distance is determined according to $f_{image}$ and $f_{contrast}$; when the image vector distance is smaller than a distance threshold, it is determined that the face image and the comparison image contrast correspond to the same face class, and the terminal device performs an unlocking operation in response to the login request. There are two ways to determine the image vector distance: the first is to directly determine the vector distance between $f_{image}$ and $f_{contrast}$ as the image vector distance; the second is to divide both $f_{image}$ and $f_{contrast}$ into N sub-vectors, determine the vector distance between the n-th sub-vector of $f_{image}$ and the n-th sub-vector of $f_{contrast}$ while iterating n from 1 until N vector distances are obtained, and average the N vector distances to obtain the image vector distance.
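Both ways of determining the image vector distance can be sketched as follows; the Euclidean metric and the threshold in the final line are illustrative assumptions:

```python
import numpy as np

def image_vector_distance(f_image, f_contrast, n_splits=None):
    """First way: distance between the full vectors (n_splits=None).
    Second way: split both vectors into n_splits sub-vectors (the
    dimension must divide evenly) and average the per-sub-vector
    distances."""
    if n_splits is None:
        return np.linalg.norm(f_image - f_contrast)
    subs_a = np.split(f_image, n_splits)
    subs_b = np.split(f_contrast, n_splits)
    return np.mean([np.linalg.norm(a - b) for a, b in zip(subs_a, subs_b)])

# unlock decision: same face class if the distance is below a threshold
same_face = image_vector_distance(np.random.rand(512), np.random.rand(512), 4) < 1.2
```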
The inventor has verified through experiments that the scheme provided by the embodiment of the present invention can effectively improve the accuracy of face recognition, specifically as follows:
1) The accuracy obtained by different convolutional neural network models on the open-source face recognition data sets LFW (Labeled Faces in the Wild), AgeDB-30, and CFP-FP is as follows:

[Table: accuracy of different convolutional neural network models on LFW, AgeDB-30, and CFP-FP]

The evaluation index of accuracy is the correct acceptance rate (TAR) when the false acceptance rate (FAR) is $10^{-4}$, i.e. TAR@FAR=$10^{-4}$. In addition, in the table, all models adopt the Arcface loss function; for the MobileFaceNet model into which the scheme provided by the embodiment of the present invention is introduced, the feature vector output by the model is divided into 4 sub-vectors, and for the ResNet18 and ResNet50 models, the feature vector output by the model is divided into 2 sub-vectors. From the table, it can be determined that the scheme provided by the embodiment of the present invention improves recognition accuracy both for the lightweight MobileFaceNet model with lower complexity and for the more expressive ResNet50 model with more parameters.
2) The accuracy obtained with different loss functions is as follows:

[Table: accuracy of different loss functions based on the MobileFaceNet model]

In the table, the MobileFaceNet model is used as the basis, and the evaluation index of accuracy is TAR@FAR=$10^{-4}$.
3) The face recognition accuracy obtained on the IJB-B and IJB-C data sets is shown in the following table:

[Table: face recognition accuracy on the IJB-B and IJB-C data sets]

In the table, the Arcface loss function and the ResNet100 model are used as the basis; for the ResNet100 model into which the scheme provided by the embodiment of the present invention is introduced, the feature vector output by the model is divided into 2 sub-vectors. In addition, the evaluation index of accuracy in the table is TAR@FAR=$10^{-4}$.
4) The face recognition accuracy obtained on the MegaFace data set is shown in the following table:

[Table: face recognition accuracy on the MegaFace data set]

In the table, the Arcface loss function and the ResNet100 model are used as the basis; for the ResNet100 model into which the scheme provided by the embodiment of the present invention is introduced, the feature vector output by the model is divided into 2 sub-vectors. Further, the evaluation indices of accuracy in the table include Rank-1 (1M) and TAR@FAR=$10^{-6}$, where Rank-1 refers to the probability that the top-1 result is the correct result.
Continuing with the exemplary structure in which the artificial intelligence based image processing apparatus 2431 provided by the embodiments of the present invention is implemented as software modules, in some embodiments, as shown in fig. 3A, the software modules stored in the artificial intelligence based image processing apparatus 2431 of the memory 240 may include: the extraction module 24311 is configured to perform feature extraction processing on the image to be recognized to obtain a feature vector; a segmentation module 24312, configured to segment the feature vector into N to-be-identified sub-vectors; wherein N is an integer greater than 1; the classification module 24313 is configured to perform classification processing respectively according to each to-be-identified sub-vector of the N to-be-identified sub-vectors, and correspondingly obtain N category prediction results of content included in the to-be-identified image; and the category determining module 24314 is configured to determine a category of content included in the image to be recognized, according to the N category prediction results.
In some embodiments, the artificial intelligence based image processing device 2431 further comprises: the comparison processing module is used for performing feature extraction processing on the comparison image and dividing the obtained feature vector into N comparison sub-vectors; the distance determining module is used for determining the vector distance between the nth to-be-identified sub-vector and the nth comparison sub-vector until N vector distances are obtained, wherein the value of n is 1, 2, …, N in sequence; the average processing module is used for averaging the N vector distances to obtain the image vector distance; and the same category determining module is used for determining that the image to be identified and the comparison image include the same category of content when the image vector distance is smaller than the distance threshold.
In some embodiments, the artificial intelligence based image processing device 2431 further comprises: the acquisition module is used for acquiring and processing images within a set time period when the login request is received to obtain an image to be identified; the comparison image determining module is used for determining at least one user account with login authority and determining an image associated with the user account as a comparison image;
the artificial intelligence based image processing apparatus 2431 further includes: and the loading module is used for loading the account resources of the user account corresponding to the comparison image so as to respond to the login request.
In some embodiments, the artificial intelligence based image processing device 2431 further comprises: the directory determining module is used for determining at least one directory in the user gallery; the traversal module is used for performing first traversal processing on the user gallery, determining the traversed images as images to be identified, performing second traversal processing on the plurality of images except the images to be identified in the process of the first traversal processing to obtain comparison images, and constructing binary image groups according to the images to be identified obtained by the first traversal processing and the comparison images obtained by the second traversal processing;

the artificial intelligence based image processing apparatus 2431 further includes: the image class determining module is used for adding the images in the user gallery to at least one image class according to the image vector distances corresponding to the binary image groups; and the first adding module is used for determining the directory to which the images exceeding the set proportion belong as the target directory when the images exceeding the set proportion in an image class belong to the same directory, and adding the images in the image class which do not belong to any directory to the target directory.
Continuing with the exemplary structure in which the artificial intelligence based image classification model training apparatus 2432 provided by the embodiments of the present invention is implemented as a software module, in some embodiments, as shown in fig. 3B, the software module stored in the artificial intelligence based image classification model training apparatus 2432 of the memory 240 may include: a sample obtaining module 24321, configured to obtain a sample image and a sample category of content included in the sample image; the sample extraction module 24322 is configured to perform feature extraction processing on the sample image through the image classification model to obtain a sample feature vector; a sample segmentation module 24323 for segmenting the sample feature vector into N sample sub-vectors; a loss determining module 24324, configured to determine a loss value of the image classification model according to the N sample sub-vectors and the sample class of the content included in the sample image; an updating module 24325, configured to perform backward propagation in the image classification model according to the loss value, and update the weight parameter of the image classification model along a gradient descending direction in the process of the backward propagation; the image classification model is used for identifying the image to be identified.
In some embodiments, the loss determination module 24324 is further configured to: perform classification processing according to each sample sub-vector of the N sample sub-vectors, correspondingly obtaining N sample category prediction results of the content included in the sample image; determine sub-loss values according to the sample class and the ith sample class prediction result until N sub-loss values are obtained, wherein the value of i is 1, 2, …, N in sequence and the ith sub-loss value is used to represent the difference between the sample class and the ith sample class prediction result; and accumulate the N sub-loss values to obtain the loss value of the image classification model.
In some embodiments, the sample acquisition module 24321 is further configured to: obtaining a training set comprising a plurality of training samples; the training sample comprises a sample image and a sample category of content included in the sample image;
an update module 24325, further to: adding at least one training sample in a training set to a processing batch; carrying out average processing on loss values corresponding to all training samples in a processing batch to obtain an average loss value; and performing back propagation in the image classification model according to the average loss value.
In some embodiments, the artificial intelligence based image classification model training device 2432 further includes: the interval acquisition module is used for acquiring at least two preset complexity intervals and the corresponding segmentation values, wherein the segmentation values are in inverse relation with the numerical values of the complexity intervals; the complexity determining module is used for determining the complexity of the image classification model; and the segmentation value determining module is used for determining the complexity interval in which the complexity of the image classification model falls as a target complexity interval, and determining the segmentation value corresponding to the target complexity interval as N.
In some embodiments, the sample acquisition module 24321 is further configured to: determining an interactive object of a target user and determining an interactive event participated by the target user; when the interactive event comprises an image, carrying out named entity recognition processing on a text in the interactive event to obtain a named entity in the text; when the named entities are matched with the interactive objects, determining the images in the interactive events as sample images, and creating sample categories according to the named entities so as to enable the sample categories corresponding to different named entities to be different.
In some embodiments, the sample acquisition module 24321 is further configured to: determine an interactive object of a target user and determine at least one directory in a user gallery of the target user; and when a directory matches the interactive object, determine the images under the directory as sample images, and create a sample category according to the directory so that the sample categories corresponding to different directories are different.
In some embodiments, the artificial intelligence based image classification model training device 2432 further includes: a key generation module for generating an asymmetric key pair comprising a public key and a private key; and the encryption module is used for encrypting the updated image classification model according to the public key and sending the encrypted image classification model to the block chain network, so that the node of the block chain network fills the encrypted image classification model into the new block and adds the new block to the tail of the block chain.
Embodiments of the present invention provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to execute an artificial intelligence based image processing method or an artificial intelligence based image classification model training method provided by embodiments of the present invention, for example, an artificial intelligence based image processing method as shown in fig. 5A, 5B or 5C, or an artificial intelligence based image classification model training method as shown in fig. 5D.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the embodiment of the present invention can achieve the following technical effects:
1) by dividing the feature vector of the image into N sub-vectors and determining the image category by combining the category prediction results of the N sub-vectors, the influence of redundant information which is not beneficial to identification on the identification result is reduced, and the accuracy and efficiency of image identification are improved.
2) Through the vector segmentation mode, the training time of the image classification model is shortened and the training efficiency is improved. The inventor's tests show that when the loss function is the cross-entropy loss function, the model training speed can be increased to 2 to 3 times that of the traditional training scheme; when the loss function is the Arcface or Cosface loss function, the embodiment of the present invention can increase the model training speed to 1.2 to 1.5 times that of the traditional training scheme. In addition, the embodiment of the present invention can improve the stability of training the image classification model.
3) By calculating the image vector distance between the image to be identified and the comparison image, the accuracy of user authentication (such as login authentication) can be improved, and the safety degree of a user account is enhanced.
4) The embodiment of the invention is also suitable for the scene of image clustering, can automatically cluster the images in the user gallery, and improves the user experience.
5) The embodiment of the invention has more excellent performance on a lightweight model, has higher accuracy improvement range and is beneficial to being deployed to mobile terminal equipment.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (15)

1. An image processing method based on artificial intelligence, comprising:
carrying out feature extraction processing on an image to be recognized to obtain a feature vector;
dividing the feature vector into N sub-vectors to be identified; wherein N is an integer greater than 1;
classifying each sub-vector to be identified in the N sub-vectors to be identified respectively to obtain N category prediction results of the content included in the image to be identified correspondingly;
and determining the category of the content included in the image to be recognized by combining the N category prediction results.
2. The image processing method according to claim 1, further comprising:
carrying out feature extraction processing on the comparison image, and dividing the obtained feature vector into N comparison sub-vectors;
determining the vector distance between the nth to-be-identified sub-vector and the nth contrast sub-vector until N vector distances are obtained; wherein the value of n is 1, 2, …, N in sequence;
averaging the N vector distances to obtain image vector distances;
when the image vector distance is smaller than a distance threshold value, determining that the image to be identified and the comparison image comprise the same category of content.
3. The image processing method according to claim 2,
before the feature extraction processing is performed on the image to be recognized, the method further comprises the following steps:
acquiring and processing images within a set time period when a login request is received to obtain the image to be identified;
determining at least one user account with login authority, and determining an image associated with the user account as a comparison image;
when the image vector distance is smaller than a distance threshold value, after determining that the image to be identified and the comparison image include the same category of content, the method further includes:
and loading account resources of the user account corresponding to the comparison image so as to respond to the login request.
4. The image processing method according to claim 2,
before the feature extraction processing is performed on the image to be recognized, the method further comprises the following steps:
determining at least one directory in a user gallery;
performing first traversal processing on the user gallery, determining the traversed image as an image to be identified, and
in the process of the first traversal processing, performing second traversal processing on a plurality of images except the image to be identified to obtain a contrast image, and constructing a binary image group according to the image to be identified obtained by the first traversal processing and the contrast image obtained by the second traversal processing;
after the averaging processing is performed on the N vector distances to obtain the image vector distance, the method further includes:
adding the images in the user gallery to at least one image class according to the image vector distance corresponding to the binary image group;
when the images exceeding the set proportion in the image class belong to the same directory, determining the directory to which the images exceeding the set proportion belong as a target directory, and
and adding the images which do not belong to any directory in the image classes to the target directory.
5. An image classification model training method based on artificial intelligence is characterized by comprising the following steps:
acquiring a sample image and a sample category of content included in the sample image;
carrying out feature extraction processing on the sample image through an image classification model to obtain a sample feature vector;
segmenting the sample feature vector into N sample sub-vectors;
determining a loss value of the image classification model according to the N sample sub-vectors and a sample category of content included in the sample image;
according to the loss value, performing back propagation in the image classification model, and
in the process of back propagation, updating the weight parameters of the image classification model along the gradient descending direction;
the image classification model is used for identifying the image to be identified.
6. The method for training an image classification model according to claim 5, wherein the determining a loss value of the image classification model according to the N sample subvectors and the sample classes of the content included in the sample images comprises:
classifying according to each sample sub-vector in the N sample sub-vectors to correspondingly obtain N sample category prediction results of the content included in the sample image;
determining sub-loss values according to the sample class and the ith sample class prediction result until N sub-loss values are obtained;
wherein the value of i is 1, 2, …, N in sequence; the ith said sub-loss value is used to represent the difference between said sample class and the ith said sample class prediction result;
and performing accumulation processing on the N sub-loss values to obtain the loss value of the image classification model.
7. The image classification model training method according to claim 5,
the obtaining of the sample image and the sample category of the content included in the sample image includes:
obtaining a training set comprising a plurality of training samples; wherein the training sample comprises a sample image and a sample category of content included in the sample image;
the back propagation in the image classification model according to the loss value comprises:
adding at least one training sample in the training set to a processing batch;
carrying out average processing on loss values corresponding to all training samples in the processing batch to obtain an average loss value;
and performing back propagation in the image classification model according to the average loss value.
8. The image classification model training method according to claim 5, further comprising:
acquiring at least two preset complexity intervals and corresponding segmentation values; wherein the segmentation values are in inverse proportion to the numerical values of the complexity intervals;
determining a complexity of the image classification model;
and determining a complexity interval where the complexity of the image classification model is located as a target complexity interval, and determining a segmentation value corresponding to the target complexity interval as the N.
9. The method for training the image classification model according to any one of claims 5 to 8, wherein the obtaining of the sample image and the sample category of the content included in the sample image comprises:
determining an interactive object of a target user and determining an interactive event participated by the target user;
when the interaction event comprises an image, carrying out named entity recognition processing on a text in the interaction event to obtain a named entity in the text;
and when the named entities are matched with the interactive objects, determining the images in the interactive events as sample images, and creating sample categories according to the named entities so as to enable the sample categories corresponding to different named entities to be different.
10. The method for training the image classification model according to any one of claims 5 to 8, wherein the obtaining of the sample image and the sample category of the content included in the sample image comprises:
determining an interactive object of a target user and determining at least one directory in a user gallery of the target user;
and when a directory matches the interactive object, determining the images under the directory as sample images, and creating a sample category according to the directory so that the sample categories corresponding to different directories are different.
11. The image classification model training method according to any one of claims 5 to 8, further comprising:
generating an asymmetric key pair comprising a public key and a private key;
encrypting the updated image classification model according to the public key, and sending the encrypted image classification model to a block chain network, so that the node of the block chain network fills the encrypted image classification model into a new block and adds the new block to the tail of the block chain.
12. An artificial intelligence-based image processing apparatus, comprising:
the extraction module is used for extracting the features of the image to be identified to obtain a feature vector;
the segmentation module is used for segmenting the feature vector into N sub-vectors to be identified; wherein N is an integer greater than 1;
the classification module is used for respectively performing classification processing according to each to-be-identified sub-vector in the N to-be-identified sub-vectors to correspondingly obtain N category prediction results of contents included in the to-be-identified image;
and the category determining module is used for determining the category of the content included in the image to be identified by combining the N category prediction results.
13. An image classification model training device based on artificial intelligence is characterized by comprising:
the system comprises a sample acquisition module, a content acquisition module and a content classification module, wherein the sample acquisition module is used for acquiring a sample image and a sample category of content included in the sample image;
the sample extraction module is used for carrying out feature extraction processing on the sample image through an image classification model to obtain a sample feature vector;
a sample segmentation module for segmenting the sample feature vector into N sample sub-vectors;
a loss determining module, configured to determine a loss value of the image classification model according to the N sample sub-vectors and a sample category of content included in the sample image;
the updating module is used for performing backward propagation in the image classification model according to the loss value and updating the weight parameter of the image classification model along the gradient descending direction in the process of backward propagation;
the image classification model is used for identifying the image to be identified.
14. An electronic device, comprising:
a memory for storing executable instructions;
a processor, configured to execute the executable instructions stored in the memory to implement the artificial intelligence based image processing method according to any one of claims 1 to 4 or the artificial intelligence based image classification model training method according to any one of claims 5 to 11.
15. A storage medium storing executable instructions for causing a processor to perform the artificial intelligence based image processing method of any one of claims 1 to 4 or the artificial intelligence based image classification model training method of any one of claims 5 to 11 when executed.
CN202010051557.3A 2020-01-17 2020-01-17 Image processing method and image classification model training method based on artificial intelligence Pending CN111242230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010051557.3A CN111242230A (en) 2020-01-17 2020-01-17 Image processing method and image classification model training method based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN111242230A true CN111242230A (en) 2020-06-05



Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036259A (en) * 2014-06-27 2014-09-10 Face similarity recognition method and system
CN104463091A (en) * 2014-09-11 2015-03-25 Face image recognition method based on LGBP feature sub-vectors of the image
CN106096547A (en) * 2016-06-11 2016-11-09 Recognition-oriented super-resolution reconstruction method for low-resolution face image features
CN106156755A (en) * 2016-07-29 2016-11-23 Similarity calculation method and system for face recognition
CN109241985A (en) * 2017-07-11 2019-01-18 Image recognition method and device
CN107506754A (en) * 2017-09-19 2017-12-22 Iris recognition method, device and terminal device
CN109711228A (en) * 2017-10-25 2019-05-03 Image processing method and device for realizing image recognition, and electronic device
CN108229357A (en) * 2017-12-24 2018-06-29 Face recognition method, apparatus, mobile terminal and storage medium
CN108228823A (en) * 2017-12-29 2018-06-29 Binary coding method and system for dimensionality reduction of high-dimensional images
CN108491794A (en) * 2018-03-22 2018-09-04 Face recognition method and apparatus
CN108629377A (en) * 2018-05-10 2018-10-09 Loss value acquisition method and device for a classification model
CN108681746A (en) * 2018-05-10 2018-10-19 Image recognition method, device, electronic device and computer-readable medium
CN109034219A (en) * 2018-07-12 2018-12-18 Multi-label category prediction method and device for images, electronic device and storage medium
CN109460777A (en) * 2018-10-11 2019-03-12 Image classification method, device and computer-readable storage medium
CN109740674A (en) * 2019-01-07 2019-05-10 Image processing method, device, equipment and storage medium
CN109919116A (en) * 2019-03-14 2019-06-21 Scene recognition method, device, electronic equipment and storage medium
CN110135591A (en) * 2019-05-16 2019-08-16 Loss value optimization method and device based on deep learning
CN110163301A (en) * 2019-05-31 2019-08-23 Image classification method and device
CN110458218A (en) * 2019-07-31 2019-11-15 Image classification method and device, and classification network training method and device
CN110659701A (en) * 2019-10-09 2020-01-07 Information processing method, information processing apparatus, electronic device, and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Xia Lixin et al. (eds.): "Introduction to Digital Libraries [M]", 31 May 2004, Hubei People's Press *
Wang Bihao: "Research and Optimization of Face Screening Algorithms", China Masters' Theses Full-text Database, Information Science and Technology Series *
Wei Xiushen: "Research on Visual Analysis of Fine-Grained Images with Deep Learning", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523621A (en) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN111553170A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Text processing method, text feature relation extraction method and device
CN111858500A (en) * 2020-08-05 2020-10-30 北京酷豹科技有限公司 Electronic image naming method, device, equipment and readable storage medium
CN112801134A (en) * 2020-12-30 2021-05-14 Gesture recognition model training and distribution method and device based on blockchain and images
CN112949701A (en) * 2021-02-22 2021-06-11 Sunspot group classification method based on deep learning
CN113220918A (en) * 2021-03-09 2021-08-06 E-commerce big data classification method and system based on blockchain technology
CN113672752A (en) * 2021-07-28 2021-11-19 Garment multimodal fusion search system and method based on deep learning
CN113792889A (en) * 2021-09-17 2021-12-14 支付宝(杭州)信息技术有限公司 Model updating method, device and equipment
CN113792889B (en) * 2021-09-17 2023-10-31 支付宝(杭州)信息技术有限公司 Model updating method, device and equipment
CN114338241A (en) * 2022-03-10 2022-04-12 Data encryption and decryption method and device, and network router adopting the device
CN114338241B (en) * 2022-03-10 2023-01-24 Data encryption and decryption method and device, and network router adopting the device
CN115082740A (en) * 2022-07-18 2022-09-20 北京百度网讯科技有限公司 Target detection model training method, target detection method, device and electronic equipment
CN115082740B (en) * 2022-07-18 2023-09-01 北京百度网讯科技有限公司 Target detection model training method, target detection device and electronic equipment

Similar Documents

Publication Publication Date Title
CN111242230A (en) Image processing method and image classification model training method based on artificial intelligence
US20220308942A1 (en) Systems and methods for censoring text inline
CN107766940B (en) Method and apparatus for generating a model
CN110569361B (en) Text recognition method and equipment
CN111310436B (en) Text processing method and device based on artificial intelligence and electronic equipment
Gadekallu et al. Blockchain-based attack detection on machine learning algorithms for IoT-based e-health applications
Hernández-Ramos et al. Sharing pandemic vaccination certificates through blockchain: Case study and performance evaluation
CN110855648B (en) Early warning control method and device for network attack
US11720825B2 (en) Framework for multi-tenant data science experiments at-scale
CN115668168A (en) Method and system for processing data records
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
CN114332984B (en) Training data processing method, device and storage medium
CN111125420A (en) Object recommendation method and device based on artificial intelligence and electronic equipment
US11563591B1 (en) Social media profile identification connected to cryptographic token
CN111191642A (en) Fingerprint anti-counterfeiting identification method and device based on multi-task classification and electronic equipment
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
US11663397B1 (en) Digital posting match recommendation apparatus and method
CN115222443A (en) Client group division method, device, equipment and storage medium
CN113570391B (en) Community division method, device, equipment and storage medium based on artificial intelligence
CN117396891A (en) Data distribution and security in a multi-tiered storage infrastructure
JP2023553676A (en) Delivering explainable machine learning model results using a distributed ledger
CN110597977B (en) Data processing method, data processing device, computer equipment and storage medium
CN112861009A (en) Artificial intelligence based media account recommendation method and device and electronic equipment
CN112348041B (en) Log classification and log classification training method and device, equipment and storage medium
US20230254139A1 (en) Apparatus and methods for mapping user-associated data to an identifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40024181
Country of ref document: HK