CN114911778A - Data processing method and device, computer equipment and storage medium

Data processing method and device, computer equipment and storage medium

Info

Publication number
CN114911778A
Authority
CN
China
Prior art keywords: modality, feature, characteristic, data, modalities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110172990.7A
Other languages
Chinese (zh)
Inventor
赵亮
张洁
黄平达
马佳骏
陈志奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Tencent Technology Shenzhen Co Ltd
Original Assignee
Dalian University of Technology
Tencent Technology Shenzhen Co Ltd
Application filed by Dalian University of Technology, Tencent Technology Shenzhen Co Ltd filed Critical Dalian University of Technology
Priority to CN202110172990.7A
Publication of CN114911778A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and apparatus, a computer device, and a storage medium, belonging to the field of computer technologies. In the method, missing feature values in incomplete multi-modal data are filled in with the average value of the feature data under each modality, so that the completed features and the original feature data can be used to align the feature data of different modalities. This makes it convenient to accurately reconstruct the shared feature that represents the information common to the feature data of the different modalities, thereby improving the processing accuracy of incomplete multi-modal data.

Description

Data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of computer technology and multimedia technology, feature data of the same entity can be collected from different data sources or channels, giving rise to multi-modal data, where each information source or form can be referred to as a modality. For example, a web page usually has feature data of two modalities, text and links; a news item usually has feature data of multiple modalities consisting of translated versions in different languages; and an image usually has feature data of multiple modalities such as texture, color, and histogram.
In applications of multi-modal data, data missing often occurs: for example, the feature data of a certain modality is missing entirely, or some feature values are missing from the feature data of a certain modality. Data to be processed therefore often appears in an incomplete multi-modal form with insufficient available information, so how to process incomplete multi-modal data has become a problem to be solved urgently.
Disclosure of Invention
The embodiments of the application provide a data processing method and apparatus, a computer device, and a storage medium, which can accurately extract the shared features among incomplete multi-modal data. The technical scheme is as follows:
in one aspect, a data processing method is provided, and the method includes:
acquiring a plurality of feature data belonging to a plurality of modalities, wherein the feature data corresponding to at least one modality in the plurality of modalities comprises a missing feature value, and any modality in the plurality of modalities corresponds to one or more feature data;
for the at least one modality with the missing characteristic value, completing the missing characteristic value in the characteristic data corresponding to the at least one modality based on the average value of the characteristic data corresponding to the at least one modality, and obtaining the completed characteristic of the at least one modality;
acquiring shared features of the plurality of feature data of the plurality of modalities based on the completed feature of the at least one modality and the feature data of the modalities other than the at least one modality, the shared features being used for representing common information among the plurality of feature data belonging respectively to the plurality of modalities.
In one aspect, a data processing apparatus is provided, the apparatus comprising:
a first obtaining module, configured to obtain a plurality of feature data belonging to a plurality of modalities, wherein the feature data corresponding to at least one modality of the plurality of modalities include a missing feature value, and any modality of the plurality of modalities corresponds to one or more feature data;
the completion module is used for completing the missing characteristic values in the characteristic data corresponding to the at least one modality based on the average value of the characteristic data corresponding to the at least one modality for the at least one modality containing the missing characteristic values, so as to obtain the completed characteristic of the at least one modality;
a second obtaining module, configured to obtain a shared feature of the plurality of feature data of the plurality of modalities based on a completion feature of the at least one modality and feature data of a modality other than the at least one modality, where the shared feature is used to represent common information between the plurality of feature data belonging to the plurality of modalities respectively.
In one possible implementation, the completion module includes:
the obtaining sub-module is used for obtaining an average value of feature data corresponding to any modality of the at least one modality containing the missing feature values;
and the filling sub-module is used for filling the average value of the characteristic data corresponding to any mode into the missing characteristic value in the characteristic data corresponding to any mode to obtain the completion characteristic of any mode.
In one possible embodiment, the apparatus further comprises:
the regularization module is used for regularizing the plurality of feature data to obtain a plurality of regularized feature data;
and the completion module is also used for executing the step of completing the missing characteristic values and obtaining the completion characteristics based on the regularized characteristic data.
In one possible embodiment, the feature data corresponding to any one of the plurality of modalities respectively corresponds to a plurality of categories;
the second acquisition module includes:
the linear combination submodule is used for linearly combining, for the feature data or completed feature of any modality of the plurality of modalities, the feature values (or filled-in average values) belonging to the same category, to obtain the self-expression features of the plurality of categories under that modality;
and the clustering submodule is used for clustering the self-expression characteristics of the categories under the multiple modes to obtain the shared characteristics.
In one possible embodiment, the clustering submodule includes:
a building unit, configured to build, based on the multiple self-expression features, multiple affinity graphs of the multiple categories in the multiple modalities, where any one of the multiple affinity graphs includes multiple nodes and multiple weighted edges, where any one of the multiple nodes is used to represent a self-expression feature of a category in a corresponding modality, and a weight carried by any one of the multiple weighted edges is used to represent a similarity between two self-expression features of two categories corresponding to two nodes of the any one weighted edge in a corresponding modality;
and the spectral clustering unit is used for performing spectral clustering operation on the self-expression characteristics based on the affinity graphs to obtain the shared characteristics.
In one possible embodiment, the spectral clustering unit includes:
the first obtaining subunit is configured to obtain, for any clustering process, a plurality of graph laplacian matrices of the plurality of affinity graphs;
a second obtaining subunit, configured to obtain, based on the multiple graph laplacian matrices, multiple cluster indication matrices of the multiple modalities;
a third obtaining subunit, configured to obtain a loss function value of any one of the clustering processes based on reconstruction errors of the multiple self-expression features, a clustering error of the any one of the clustering processes, and a similarity difference between the multiple clustering indication matrices and a feature to be solved;
and the iteration adjusting subunit is used for iteratively adjusting the parameters of the feature to be solved until the loss function value meets the stop condition, stopping iteration and acquiring the feature to be solved during the last iteration as the shared feature.
In one possible embodiment, the apparatus further comprises:
a third obtaining module, configured to obtain, for any one of the multiple modalities, a target weight matrix corresponding to the any modality, where a weight coefficient of a feature value in the target weight matrix that is not missing in the any modality is greater than a weight coefficient of a missing feature value;
the third obtaining subunit is further configured to, in a process of obtaining the loss function value, call the target weight matrix, and weight a reconstruction error of a self-expression feature corresponding to the any modality.
In a possible implementation manner, the target weight matrix is a diagonal weight matrix, the weight coefficient of a non-missing feature value is 1, and the weight coefficient of a missing feature value is the ratio of the number of samples with non-missing feature values to the total number of samples in that modality.
In one possible implementation, the first obtaining subunit is configured to:
based on any affinity map in the multiple affinity maps, acquiring a similarity matrix and a diagonal matrix of the any affinity map;
and obtaining the difference value between the diagonal matrix and the similarity matrix as a graph Laplace matrix of any affinity graph.
In one possible embodiment, the iterative adjustment subunit is configured to:
iteratively adjusting values of a plurality of intermediate variables in the loss function based on the alternating direction method of multipliers (ADMM) to obtain a plurality of adjusted intermediate variables, wherein, when the value of any intermediate variable is adjusted, the values of the intermediate variables other than that intermediate variable are kept unchanged;
and keeping the values of the adjusted intermediate variables unchanged, and executing the step of iteratively adjusting the parameters of the features to be solved.
In one possible embodiment, the stopping condition is that the difference between the loss function value of any one clustering process and the loss function value of the last clustering process is less than a loss threshold.
In one possible implementation, the feature data corresponding to any modality of the plurality of modalities respectively correspond to a plurality of categories, the plurality of categories are a plurality of target objects, and the plurality of modalities of the plurality of target objects at least include the visual information and the near-infrared information of the face images of the respective target objects.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories, wherein at least one computer program is stored in the one or more memories, and loaded and executed by the one or more processors to implement the data processing method as described above.
In one aspect, a storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the data processing method as described above.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer readable storage medium. The one or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes to enable the computer device to perform the above-described data processing method.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
the missing characteristic values in incomplete multi-modal data are supplemented by utilizing the average value of the characteristic data under each modality, so that the supplemented characteristic data of different modalities can be aligned by utilizing the supplemented characteristic and the original characteristic data obtained after completion, the shared characteristic used for expressing the common information among the characteristic data of different modalities can be conveniently and accurately reconstructed, and the processing accuracy of the incomplete multi-modal data can be improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a data processing method provided in an embodiment of the present application;
fig. 3 is a flowchart of a data processing method provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a data processing method provided by an embodiment of the present application;
FIGS. 6 to 12 respectively show the selection of the model parameter $\lambda_1$;
FIGS. 13 to 19 respectively show the selection of the model parameters $\lambda_3$ and $\lambda_4$;
fig. 20 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application means one or more, and the meaning of "a plurality" means two or more, for example, a plurality of first locations means two or more first locations.
Before the embodiments of the present application are described, some basic concepts in the cloud technology field need to be introduced, which are described below.
Cloud Technology (Cloud Technology): a management technology that unifies a series of resources such as hardware, software, and networks within a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data. It is a general term for the network technologies, information technologies, integration technologies, management platform technologies, and application technologies applied on the basis of the cloud computing business model; resources can form a pool and be used on demand, which is flexible and convenient. Cloud computing technology will become an important support in the field of cloud technology. Background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may have its own identification mark that needs to be transmitted to a background system for logic processing; data of different levels are processed separately, and all kinds of industry data need strong system background support, which can be achieved through cloud computing.
Cloud Computing (Cloud Computing): cloud computing is a computing model that distributes computing tasks over a resource pool of large numbers of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally referred to as an IaaS (Infrastructure as a Service) platform) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to use. The cloud computing resource pool mainly includes computing devices (virtualized machines, including operating systems), storage devices, and network devices.
According to logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; the SaaS layer can also be deployed directly on the IaaS layer. PaaS is a platform on which software runs, such as a database or a web (web page) container. SaaS is various kinds of business software, such as web portals and mass SMS senders. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
Cloud Storage (Cloud Storage): a distributed cloud storage system (hereinafter referred to as a storage system) that, through functions such as cluster application, grid technology, and distributed storage file systems, integrates a large number of storage devices of different types in a network (storage devices are also referred to as storage nodes) to work cooperatively via application software or application interfaces, and provides data storage and service access functions to the outside.
At present, the storage method of a storage system is to create logical volumes; when a logical volume is created, physical storage space, which may consist of the disks of one or several storage devices, is allocated to it. A client stores data on a certain logical volume, that is, the data is stored on a file system. The file system divides the data into many parts, each part being an object that contains not only the data but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can allow the client to access the data according to the storage location information of each object.
The process of allocating physical storage space for the logical volume by the storage system is specifically as follows: physical storage space is divided in advance into stripes according to estimates of the capacity of the objects to be stored in the logical volume (the estimates often have a large margin relative to the capacity of the actual objects to be stored) and according to the Redundant Array of Independent Disks (RAID) scheme; one logical volume can be understood as one stripe, whereby physical storage space is allocated to the logical volume.
Artificial intelligence cloud Service (AI as a Service, AIaaS): also called an AI (Artificial Intelligence) service, this is currently the mainstream service mode of artificial intelligence platforms. Specifically, an AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to opening an AI-themed mall: all developers can access, through an API (application programming interface), one or more of the artificial intelligence services provided by the platform, and some qualified developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services.
The embodiments of the present application relate to the aforementioned intersection of cloud computing, cloud storage, and AIaaS. A user may upload incomplete multi-modal data (that is, feature data of multiple modalities with missing feature values) to a cloud server, and the cloud server solves for the shared features of the overall multi-modal data based on the incomplete multi-modal data in a cloud computing mode, so that they can be put into various downstream subtasks. Depending on the type of multi-modal data, the subtasks include but are not limited to: audio recognition, multi-modal emotion analysis, terminal identity verification, image semantic recognition, image classification, text classification, and the like. In addition, the incomplete multi-modal data and the shared features obtained by analyzing them can be stored in a distributed cloud storage system for persistent storage and maintenance. Furthermore, providing independent or packaged services in the cloud for the shared-feature extraction of multi-modal data and for the various downstream subtasks realizes an artificial intelligence cloud service.
In view of this, before the embodiments of the present application are described, some terms in the AI field need to be explained:
artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include audio processing technology, computer vision technology, natural language processing technology, and machine learning/deep learning.
Making computers able to listen, see, speak, and feel is the development direction of future human-computer interaction. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
The embodiments of the present application relate to a data processing method that, using machine learning theory, can extract, from incomplete multi-modal data with missing feature values, shared features that accurately represent the information common to the overall multi-modal data, so that they can be put into various downstream subtasks. This ensures that the comprehensive information of the various modalities is fused into the shared features, reflecting the characteristics of each sample more comprehensively and in more detail.
Fig. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102.
The terminal 101 is installed and operated with applications supporting multimodal data collection services including, but not limited to: browser applications, social applications, ordering applications, payment applications, taxi-taking applications, image processing applications, short video applications, and the like.
The terminal 101 may be directly or indirectly connected to the server 102 through a wired or wireless communication manner, and the connection manner is not limited in this embodiment of the application.
The server 102 is configured to provide a background multi-modal data processing service for the application program, where the server 102 includes at least one of a server, multiple servers, a cloud computing platform, or a virtualization center. Optionally, the server 102 undertakes primary computational work and the terminal 101 undertakes secondary computational work; or, the server 102 undertakes the secondary computing work, and the terminal 101 undertakes the primary computing work; alternatively, the server 102 and the terminal 101 perform cooperative computing by using a distributed computing architecture.
In some embodiments, the server 102 is an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms, and the like.
In some embodiments, the terminal 101 is a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an e-book reader, or the like, but is not limited thereto.
Those skilled in the art will appreciate that the number of terminals 101 described above may be greater or fewer. For example, the number of the terminals 101 may be only one, or the number of the terminals 101 may be several tens or hundreds, or more. The number and the device type of the terminals 101 are not limited in the embodiment of the present application.
With the rapid development of computer technology and multimedia technology, feature data of the same entity (an entity can be regarded as a sample, that is, an object to be measured) can be collected from different data sources or channels, and multi-modal data arises accordingly. Each source or channel produces feature data of a different modality: for example, a web page typically has feature data of both the text and link modalities; each piece of international news has translated versions in multiple languages (the translated version in each language is also the feature data of one modality); and a picture has feature data of modalities such as texture, color, and histogram. Multi-modal data exhibit complementarity and consistency; effectively fusing the information of different modalities yields more comprehensive discriminative features (that is, shared features) of the data entities and provides more reliable feature support for various downstream tasks. In a large number of application scenarios, however, the problem of missing data is often unavoidable, that is, feature data is often presented in an incomplete multi-modal form, for example, some feature values are missing from the feature data of a certain modality. This not only results in insufficient available information but also destroys the instance-dimension consistency of the modalities, so that traditional multi-modal fusion algorithms are no longer applicable. Given the large amount of unlabeled feature data in real life, how to fuse incomplete multi-modal data has become a research focus in the multi-modal application field in recent years; therefore, a data processing method for incomplete multi-modal data is urgently needed.
The embodiments of the present application provide a data processing method, that is, a fusion algorithm for incomplete multi-modal data. If data samples (that is, data instances) can represent one another, one data sample can be expressed as a linear combination of other data samples in the same category; similarity between data samples can therefore be measured according to the strength of the correlation between them, rather than by a simple distance metric.
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present application. Referring to fig. 2, the embodiment is applied to a computer apparatus, and includes the steps of:
201. The computer device acquires a plurality of feature data belonging to a plurality of modalities, wherein at least one modality of the plurality of modalities corresponds to feature data containing a missing feature value, and any modality of the plurality of modalities corresponds to one or more feature data.
In some embodiments, the plurality of feature data includes at least the following description of dimensions:
1) mode: the data sources or channels used to represent the characteristic data may be, for example, different modalities for the characteristic data collected from different data sources, different modalities for the characteristic data collected from different channels, or different modalities for the characteristic data collected from different data sources or different channels.
2) Sample preparation: that is, in the example, the feature data of multiple modalities belonging to the sample can be obtained by collecting the same sample from different data sources or channels, and the missing phenomenon of the feature value may refer to that the feature data of a certain sample in a certain modality is entirely missing.
3) The category: that is, the target object to which the sample belongs, may include one or more samples under the same category, and each sample includes feature data belonging to a plurality of modalities.
In some embodiments, the feature data corresponding to any modality of the multiple modalities respectively correspond to multiple categories, the multiple categories are multiple target objects, the multiple modalities of the multiple target objects at least include visual information and near-infrared information of face images of the multiple target objects, and the face images are taken as an example for illustration below.
In an exemplary scenario, taking a multi-modal face data set as an example for illustration, assuming that the multi-modal face data set includes feature data of 2 modalities, where the 2 modalities are visual information of a face image and near-infrared information of the face image (which are derived from different acquisition channels), a total modality number m is 2, assuming that the multi-modal face data set includes a total of N face images, a total number of instances (i.e., a sample capacity of the multi-modal face data set) is N, and the N face images belong to actual k target objects, respectively, a total number of categories is k, and to sum up, the multi-modal face data set may be represented in the following form:
$$X = \left\{ X^{(1)}, X^{(2)}, \ldots, X^{(m)} \right\}, \qquad X^{(v)} \in \mathbb{R}^{d_v \times N}$$
where $X^{(v)}$ is the original matrix formed by the feature data of the $v$-th modality, which has $d_v$-dimensional attributes, and $\mathbb{R}$ indicates that the feature data belong to the real number field. For example, when $d_v$ is 100, $X^{(1)}$ represents a visual information matrix with 100-dimensional feature data and $X^{(2)}$ represents a near-infrared information matrix with 100-dimensional feature data. When some samples lack the visual information or the near-infrared information, an incomplete multi-modal face data set is formed.
It should be noted that, in other embodiments, besides the face image of the target object, a human body image of the target object, a video of the target object, an audio recording of the target object, descriptive texts of various information, and the like may also be selected as the sample; the embodiments of the present application do not specifically limit the types of the multiple modalities or the types of the feature data.
202. For the at least one modality containing the missing feature value, the computer device completes the missing feature value in the feature data corresponding to the at least one modality based on the average value of the feature data corresponding to the at least one modality, and obtains the completed feature of the at least one modality.
In some embodiments, because not every modality exhibits missing feature values, the computer device only needs to fill in, for each modality in which feature values are missing, the missing feature values with the average value of the existing feature data, so as to construct a complete multi-modal feature data set (for example, a multi-modal face data set); the feature data of the modalities without missing feature values are skipped and left unprocessed, which saves processing resources of the computer device.
203. The computer device obtains a shared feature of the plurality of feature data of the plurality of modalities based on the completed feature of the at least one modality and the feature data of the modalities other than the at least one modality, the shared feature being used for representing common information among the plurality of feature data belonging respectively to the plurality of modalities.
In some embodiments, the computer device may regard the completed features of the modalities that have been completed, together with the feature data of the modalities without missing feature values, as a completed, complete multi-modal feature data set. Since no missing feature values remain in the completed multi-modal feature data set, the feature data of different modalities can be aligned, which facilitates the subsequent accurate extraction of the shared features of the feature data of all modalities, thereby achieving a multi-modal data fusion effect.
In some embodiments, when the shared features are extracted based on the completed multi-modal feature data set, a low-rank sparse representation technique may be used to learn the global and local graph structures of each modality in a self-expression subspace and construct the self-expression features of each category in each modality; the self-expression features are then used as affinity graphs in a spectral clustering algorithm, and a spectral clustering operation is performed on them to construct more accurate shared features, as described in detail in the next embodiment.
According to the method provided by the embodiments of the present application, the missing feature values in incomplete multi-modal data are filled in with the average value of the feature data under each modality, so that the completed features obtained after completion and the original feature data can be used to align the feature data of different modalities. This makes it convenient to accurately reconstruct the shared features that express the information common to the feature data of the different modalities, thereby improving the processing accuracy of incomplete multi-modal data.
Fig. 3 is a flowchart of a data processing method provided in an embodiment of the present application, please refer to fig. 3, the embodiment is applied to a computer device, and the following description takes the computer device as a server as an example, and the embodiment includes the following steps:
301. The server acquires a plurality of feature data belonging to a plurality of modalities, wherein the feature data corresponding to at least one modality of the plurality of modalities include a missing feature value, and any modality of the plurality of modalities corresponds to one or more feature data.
In some embodiments, the plurality of feature data includes at least the following description of dimensions:
1) mode: the acquisition data sources or acquisition channels used to represent the characteristic data may be, for example, different modalities for characteristic data acquired from different data sources, different modalities for characteristic data acquired from different channels, or different modalities for characteristic data acquired from different data sources or different channels.
2) Sample preparation: that is, in the example, the feature data of multiple modalities belonging to the sample can be obtained by collecting the same sample from different data sources or channels, and the missing phenomenon of the feature value may refer to that the feature data of a certain sample in a certain modality is entirely missing.
3) The category: i.e. the target object to which the sample belongs, may comprise one or more samples under the same category, and each sample comprises feature data belonging to a plurality of modalities.
In some embodiments, the feature data corresponding to any modality of the multiple modalities respectively correspond to multiple categories, the multiple categories are multiple target objects, the multiple modalities of the multiple target objects at least include visual information and near-infrared information of face images of the multiple target objects, and the face images are taken as an example for illustration below.
In an exemplary scenario, taking a multi-modal face data set as an example for illustration, assuming that the multi-modal face data set includes feature data of 2 modalities, where the 2 modalities are visual information of a face image and near-infrared information of the face image (which are derived from different acquisition channels), a total modality number m is 2, assuming that the multi-modal face data set includes a total of N face images, a total number of instances (i.e., a sample capacity of the multi-modal face data set) is N, and the N face images belong to actual k target objects, respectively, a total number of categories is k, and to sum up, the multi-modal face data set may be represented in the following form:
$$X = \left\{ X^{(1)}, X^{(2)}, \ldots, X^{(m)} \right\}, \qquad X^{(v)} \in \mathbb{R}^{d_v \times N}$$
where $X^{(v)}$ is the original matrix formed by the feature data of the $v$-th modality, which has $d_v$-dimensional attributes, and $\mathbb{R}$ indicates that the feature data belong to the real number field. For example, when $d_v$ is 100, $X^{(1)}$ represents a visual information matrix with 100-dimensional feature data and $X^{(2)}$ represents a near-infrared information matrix with 100-dimensional feature data. When some samples lack the visual information or the near-infrared information, an incomplete multi-modal face data set is formed.
It should be noted that, in other embodiments, besides the face image of the target object, a human body image of the target object, a video of the target object, an audio recording of the target object, descriptive texts of various information, and the like may also be selected as the sample; the embodiments of the present application do not specifically limit the types of the multiple modalities or the types of the feature data.
Optionally, the server may read the plurality of feature data from a local database, receive, as a cloud computing platform or an AIaaS platform, the plurality of feature data uploaded by the terminal, or retrieve the plurality of feature data from a distributed cloud storage system.
302. The server carries out regularization processing on the plurality of feature data to obtain a plurality of regularized feature data.
In some embodiments, the server may perform regularization processing on the multiple pieces of feature data in multiple regularization manners to obtain the multiple pieces of regularized feature data, for example, a regularization manner based on an L0 norm is used, or a regularization manner based on an L1 norm is used, or a regularization manner based on an L2 norm is used, which is not specifically limited in the embodiment of the present application.
In an exemplary scenario, taking the incomplete multi-modal face data set as an example, assuming that the regularization based on the L2 norm is performed on each data sample (i.e., data instance) in the multi-modal face data set, the regularization principle is equivalent to performing the following operation shown in formula (1):
$$\text{s.t.} \quad \sum_{i=1}^{d_v} \left( X^{(v)}_{i,j} \right)^{2} = 1, \qquad j = 1, 2, \ldots, N \tag{1}$$
where the symbol s.t. denotes a constraint condition; $X^{(v)}_{i,j}$ denotes the element in row $i$ and column $j$ of the matrix $X^{(v)}$, that is, the $i$-th attribute value of the $j$-th data sample in the multi-modal face data set; $N$ denotes the sample capacity of the multi-modal face data set; and $d_v$ denotes the attribute dimension of one modality of the data set. For example, for the modality of the visual information of a face image, the attribute dimension $d_v$ may be 100: the visual information of each data sample is expressed as a 100-dimensional vector, and the visual information vectors of the $N$ data samples form the visual information matrix of the whole multi-modal face data set, a matrix of size $d_v \times N$.
In this process, regularizing each feature data is equivalent to preprocessing the whole multi-modal feature data set. This preprocessing, which performs a regularization operation on the feature data of each modality, eliminates the dimensional influence among the feature data of different modalities and accelerates convergence when the features are subsequently adjusted iteratively.
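As an illustrative aid (not part of the patent text), a minimal NumPy sketch of the per-sample L2 regularization described above is given below; the function name, the use of np.nan to mark missing feature values, and the column-per-sample layout $X^{(v)} \in \mathbb{R}^{d_v \times N}$ are assumptions for illustration.

```python
import numpy as np

def l2_normalize_samples(X_v):
    """Scale each data sample (column) of X_v to unit L2 norm.

    X_v: (d_v, N) array for modality v, one column per data sample.
    Missing feature values are assumed to be stored as np.nan and are
    ignored when computing the norm, so the constraint of formula (1)
    holds over the observed entries of each sample.
    """
    norms = np.sqrt(np.nansum(X_v ** 2, axis=0, keepdims=True))  # shape (1, N)
    norms[norms == 0] = 1.0   # leave all-missing samples unchanged
    return X_v / norms
```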
In some embodiments, the server may skip step 302: directly after the original feature data are obtained, it proceeds to step 303, that is, for each modality containing missing feature values, it obtains the average value of the non-regularized feature data and fills the average value into the missing feature values of that modality to obtain the intermediate features of each modality, which greatly simplifies the data processing flow.
303. The server obtains an average value of regularized feature data corresponding to any modality of the at least one modality containing the missing feature values.
In some embodiments, the server determines at least one modality with missing feature values from the plurality of modalities, obtains, for each modality of the at least one modality, an average value of the normalized feature data corresponding to each modality, and performs step 304 described below.
In some embodiments, if the individual feature data is not regularized, the server only needs to perform step 304 after obtaining the average of the feature data that is not regularized for each modality that contains missing feature values.
304. And the server fills the average value of the regularized feature data corresponding to any modality into the missing feature value in the regularized feature data corresponding to any modality to obtain the completion feature of any modality.
In some embodiments, the server may set the missing feature values (NULL or null) in the regularized feature data of each modality to the average value of the regularized feature data of that modality obtained in step 303, so as to obtain a complete, completed multi-modal feature data set, in which each modality without missing feature values keeps the original feature data obtained in step 301, and each modality with missing feature values is updated to the completed intermediate features obtained in step 304.
In some embodiments, if regularization is not performed on the feature data, the server only needs to set the feature values (NULL or null) missing from the non-regularized feature data of each modality to the average value of the non-regularized feature data of that modality acquired in step 303, so as to obtain a completed multi-modal feature data set; since the regularization step does not need to be performed, the data processing flow is simplified.
In an exemplary embodiment, after the server regularizes the incomplete multi-modal feature data set $X = \left\{ X^{(1)}, X^{(2)}, \ldots, X^{(m)} \right\}$, for each modality $v$ containing missing feature values it obtains the average value of the feature data corresponding to that modality and fills the missing feature values in that modality with the average value. The filled intermediate features successfully align the dimensions within each modality, which facilitates subsequent further multi-modal data fusion.
Through the above steps 303 and 304, for the at least one modality containing missing feature values, the server completes the missing feature values in the corresponding feature data based on the average value of the feature data corresponding to that modality, obtaining the completed feature of the at least one modality.
In this process, by acquiring the completed feature of each modality, the server can ensure that the missing feature values of every modality in the incomplete multi-modal feature data set have been completed after the filling operation, so that the final completed features of the modalities are dimensionally aligned, which facilitates the subsequent feature fusion operation.
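A minimal sketch of the mean-value completion of steps 303 and 304, under the same assumed conventions (np.nan marks missing values; each modality is a $d_v \times N$ array). Whether the average is taken per attribute or over all feature data of the modality is not spelled out above; the sketch uses the per-attribute mean.

```python
import numpy as np

def complete_with_mean(X_v):
    """Fill the missing feature values of one modality with the mean
    of the observed values, yielding the completed feature matrix.

    X_v: (d_v, N) array for modality v; np.nan marks missing values.
    """
    X_filled = X_v.copy()
    attr_mean = np.nanmean(X_v, axis=1, keepdims=True)   # (d_v, 1) mean per attribute
    missing = np.isnan(X_filled)
    X_filled[missing] = np.broadcast_to(attr_mean, X_v.shape)[missing]
    return X_filled
```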
305. The server acquires a target weight matrix corresponding to any modality of the plurality of modalities, wherein, in the target weight matrix, the weight coefficient of a non-missing feature value under that modality is larger than the weight coefficient of a missing feature value.
Optionally, the target weight matrix is a diagonal weight matrix, the weight coefficient of a non-missing feature value is 1, and the weight coefficient of a missing feature value is the ratio of the number of samples with non-missing feature values to the total number of samples in that modality.
It should be noted that the target weight matrix is not limited to the form of a diagonal weight matrix, as long as the weight coefficient of a non-missing feature value is guaranteed to be larger than the weight coefficient of a missing feature value (which has been filled with the average value); the embodiments of the present application do not specifically limit the form of the target weight matrix.
In some embodiments, the server constructs, for each modality of the plurality of modalities, a target weight matrix corresponding to that modality. The target weight matrix is used to weight the reconstruction error of the self-expression features of the corresponding modality when the shared features are learned subsequently, so as to reduce the weight of the filled average values in constructing the shared features, thereby improving the expressive capability of the shared features. Optionally, the weighting process multiplies each target weight matrix element-wise with the reconstruction error of the self-expression features of the corresponding modality.
In an exemplary embodiment, the server constructs a diagonal weight matrix $G^{(v)} \in \mathbb{R}^{N \times N}$ (i.e., the target weight matrix) for the feature data of each modality, defined as the following formula (2):
$$G^{(v)}_{i,i} = \begin{cases} 1, & \text{if the } i\text{-th data instance is not missing in modality } v \\[4pt] \dfrac{n_v}{N}, & \text{otherwise} \end{cases} \tag{2}$$
where $G^{(v)}_{i,i}$ denotes the element in row $i$ and column $i$ of the weight matrix $G^{(v)}$ of modality $v$ (i.e., the $i$-th diagonal element), $n_v$ denotes the total number of data instances that are not missing in modality $v$, and $N$ denotes the total number of data instances (i.e., the sample capacity) in modality $v$.
In the above process, for the at least one modality containing missing feature values, the server completes the missing feature values in the corresponding feature data based on the average value of the feature data corresponding to that modality, obtaining the completed feature of the at least one modality; by constructing the target weight matrix, a lower weight can be given to the missing feature values, ensuring that the non-missing data instances of each modality are represented, as far as possible, by real data rather than by the filled average values.
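A sketch of the diagonal target weight matrix of formula (2), assuming a boolean vector marking which data instances are missing in modality v (the representation of missingness is an assumption for illustration):

```python
import numpy as np

def target_weight_matrix(instance_missing):
    """Diagonal weight matrix G^(v) of formula (2).

    instance_missing: boolean array of length N, True where the i-th
    data instance is missing in modality v (i.e. was filled with the
    average). Non-missing instances get weight 1; filled instances
    get the lower weight n_v / N.
    """
    N = instance_missing.shape[0]
    n_v = N - int(instance_missing.sum())    # number of non-missing instances
    g = np.where(instance_missing, n_v / N, 1.0)
    return np.diag(g)
```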
306. For the feature data or completed features of any modality of the plurality of modalities, the server linearly combines the feature values (or filled-in average values) belonging to the same category to obtain the self-expression features of the plurality of categories under that modality.
In some embodiments, for each modality of the plurality of modalities, the server determines feature values (or a complementary average value) of all data samples belonging to each category under each modality, and linearly combines the feature values of all the data samples belonging to each category to obtain a self-expression feature corresponding to each category under each modality.
In some embodiments, in the above linear combination process, the self-expression features corresponding to each category may be gradually approximated by reducing the reconstruction error, and when the reconstruction error of each self-expression feature is obtained, the target weight matrix constructed in the above step 305 may be used for weighting, so as to reduce the contribution of the filled average value to the self-expression features.
In other words, for each category in each modality, the server linearly combines the feature values (or completed average values) belonging to that modality and category in a certain combination manner, obtains the linearly combined feature, and obtains the reconstruction error of that combination manner. If the reconstruction error is greater than a reconstruction threshold, the combination manner is iteratively adjusted (for example, the coefficient of each feature value in the linear combination is adjusted) until the reconstruction error is less than or equal to the reconstruction threshold, at which point the self-expression feature of the category in the modality is obtained.
In some embodiments, the server combines a low-rank sparse representation technique with the weighting mechanism, takes the feature data (or completed features) of each modality as a dictionary, and learns a low-rank sparse self-expression feature $Z^{(v)} \in \mathbb{R}^{N \times N}$ by reconstructing the feature data (or completed features) from themselves. For example, when the feature data are face images, the self-expression feature makes the facial features of the same person be linearly represented, as far as possible, by face images of that same person.
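To illustrate how the target weight matrix enters the self-expression reconstruction error, a sketch under the same assumed conventions; weighting by right-multiplication with the diagonal $G^{(v)}$ (which scales each instance's error column) is one reading of the element-wise weighting described in step 305.

```python
import numpy as np

def weighted_reconstruction_error(X_v, Z_v, G_v):
    """Weighted reconstruction error of the self-expression process.

    E^(v) = X^(v) - X^(v) Z^(v), with X_v of shape (d_v, N) and the
    self-expression feature Z_v of shape (N, N). Right-multiplying by
    the diagonal weight matrix G_v down-weights the error columns of
    instances whose values were filled with the average.
    """
    E_v = X_v - X_v @ Z_v          # (d_v, N) reconstruction error
    return E_v @ G_v               # weighted error used in the loss
```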
307. The server constructs a plurality of affinity graphs of the plurality of categories under the plurality of modalities based on a plurality of self-expression characteristics of the plurality of categories under the plurality of modalities.
Any affinity graph of the plurality of affinity graphs includes a plurality of nodes and a plurality of weighted edges. Any node of the plurality of nodes represents the self-expression feature of one category in the corresponding modality, and the weight carried by any weighted edge represents the similarity, in the corresponding modality, between the two self-expression features of the two categories corresponding to the two endpoints of that edge.
In some embodiments, the server regards the self-expression feature of each category as a node in the affinity graph and connects each node to at least one of its nearest surrounding nodes, forming at least one undirected edge; for each undirected edge, the similarity between the two self-expression features corresponding to its two endpoints is used as the weight of that edge, finally yielding at least one weighted edge. Repeating the above operations yields the affinity graph of the categories in each modality.
In the above process, by combining the target weight matrix obtained in step 305 with the low-rank sparse representation technique, the server can learn the global and local graph structures of the self-expression subspace of each modality. Under the limitation of the low weight values in the target weight matrix, the missing feature values in each modality barely participate in the feature representation of the other data instances, which prevents the information expressed by the feature representations of the other data instances from being diluted by the filled average values and thereby ensures their accuracy.
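A hedged sketch of the nearest-neighbour affinity-graph construction just described; the neighbour count k and the use of cosine similarity as the edge weight are illustrative assumptions, since the text above only requires connecting each node to its nearest nodes with similarity-valued weights.

```python
import numpy as np

def knn_affinity(features, k=5):
    """Build a weighted undirected affinity graph over self-expression
    features: connect each node to its k most similar nodes, with the
    similarity as the edge weight.

    features: (n, d) array, one row per node (one category's
    self-expression feature in one modality).
    """
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    F = features / norms
    sim = F @ F.T                       # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)      # exclude self-edges
    W = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nn = np.argsort(sim[i])[-k:]    # indices of the k nearest nodes
        W[i, nn] = sim[i, nn]
    return np.maximum(W, W.T)           # symmetrize: undirected graph
```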
308. The server performs spectral clustering operation on the self-expression characteristics based on the affinity graphs to obtain shared characteristics of the characteristic data of the modalities, wherein the shared characteristics are used for representing common information among the characteristic data respectively belonging to the modalities.
In some embodiments, the server applies a graph clustering technique to the low-rank sparse affinity graphs of the respective modalities to obtain the corresponding low-dimensional feature representations (i.e., the clustering indication matrices). Finally, for the clustering indication matrices of the modalities, a kernel alignment technique is used to fuse and learn the shared low-dimensional feature representation among the modalities and to output the final shared feature. In this way, an incomplete multi-modal data fusion model based on low-rank sparse graph learning can be constructed, as described in detail below.
3081. For any one clustering process, the server acquires a plurality of graph Laplacian matrices of the plurality of affinity graphs.
Optionally, for any one of the plurality of affinity graphs, the server obtains the similarity matrix and the diagonal matrix of that affinity graph, and obtains the difference between the diagonal matrix and the similarity matrix as the graph Laplacian matrix of that affinity graph.
In some embodiments, let $Z^{(v)}$ be the affinity graph constructed from the self-expression features obtained in step 307 above. The expression of the graph Laplacian matrix is then

$$L_Z^{(v)} = D^{(v)} - W^{(v)}$$

wherein $L_Z^{(v)}$ is the graph Laplacian matrix, $W^{(v)} = \frac{1}{2}\big(|Z^{(v)}| + |Z^{(v)}|^{\mathsf{T}}\big)$ is the similarity matrix based on the affinity graph $Z^{(v)}$, defined in this way to ensure its symmetry (the superscript $\mathsf{T}$ denotes the matrix transpose), and $D^{(v)}$ is the diagonal matrix of the affinity graph $Z^{(v)}$, whose diagonal elements are defined as the sums of the corresponding rows of the similarity matrix, i.e., $D^{(v)}_{ii} = \sum_j W^{(v)}_{ij}$.
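Expressed in code, this Laplacian is a direct transcription of the definitions above; the following Python sketch takes an affinity graph and returns its Laplacian (the function name is illustrative):

```python
import numpy as np

def graph_laplacian(Z):
    """Graph Laplacian of an affinity graph Z, as defined above:
    the similarity matrix W = (|Z| + |Z|^T) / 2 enforces symmetry,
    D is the diagonal matrix of row sums of W, and L = D - W."""
    W = 0.5 * (np.abs(Z) + np.abs(Z).T)
    D = np.diag(W.sum(axis=1))
    return D - W
```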
3082. The server obtains a plurality of clustering indication matrices of the plurality of modalities based on the plurality of graph Laplacian matrices.
In some embodiments, the server obtains each clustering indication matrix $F^{(v)}$ by continuously decreasing the value of an objective function. Optionally, the expression of the objective function is as the following equation (3):

$$\min_{Z^{(v)},\,E^{(v)},\,F^{(v)}}\ \|E^{(v)}G^{(v)}\|_F^2 + \lambda_1\|Z^{(v)}\|_* + \lambda_2\|Z^{(v)}\|_1 + \lambda_3\,\mathrm{Tr}\big(F^{(v)\mathsf{T}}L_Z^{(v)}F^{(v)}\big) \qquad (3)$$

$$\text{s.t.}\quad E^{(v)} = X^{(v)} - X^{(v)}Z^{(v)},\quad \mathrm{diag}\big(Z^{(v)}\big) = 0,\quad 0 \le Z^{(v)} \le 1,\quad Z^{(v)\mathsf{T}}\mathbf{1} = \mathbf{1},\quad F^{(v)\mathsf{T}}F^{(v)} = I$$
The first three terms of the objective function express the low-rank sparse representation under the weight mechanism, the fourth term is the spectral clustering formula, and the symbol s.t. introduces the constraint conditions.
In the constraint conditions, $E^{(v)} = X^{(v)} - X^{(v)}Z^{(v)}$ represents the reconstruction error of the self-expression process (i.e., the reconstruction error of the plurality of self-expression features in the $v$-th modality), and $E^{(v)}$ is the error matrix; the affinity graph $Z^{(v)}$ is used in the subsequent spectral clustering process. Three additional constraints are imposed on $Z^{(v)}$: the condition $\mathrm{diag}(Z^{(v)}) = 0$ aims to prevent data instances from being represented by themselves; the non-negativity constraint $0 \le Z^{(v)} \le 1$ gives the affinity graph $Z^{(v)}$ better interpretability, since positive values are more realistic in normal applications; finally, to ensure that all data participate in self-expression learning, the constraint $Z^{(v)\mathsf{T}}\mathbf{1} = \mathbf{1}$ is applied to $Z^{(v)}$, where $\mathbf{1}$ denotes a vector whose elements are all 1.
Optionally, for minimizing the objective function, $\|\cdot\|_F$ represents the Frobenius norm and $\|\cdot\|_F^2$ its square. Under the guidance of the information in the weight matrix $G^{(v)}$, minimizing the term $\|E^{(v)}G^{(v)}\|_F^2$ can effectively reduce the contribution of the missing feature values in the low-rank sparse representation (i.e., the self-expression features), so as to ensure that the non-missing feature values dominate the self-expression features. $\|\cdot\|_*$ represents the nuclear norm: imposing a low-rank constraint on the self-expression matrix $Z^{(v)}$ lets it learn the global structure (representing the overall relationships among different classes of face images), while $\|\cdot\|_1$ represents the 1-norm, which is intended to constrain $Z^{(v)}$ to capture its local structural information (representing the important feature items of each face representation). $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. Finally, the low-dimensional feature representation $F^{(v)}$ of each modality, i.e., the clustering indication matrix, is obtained through the spectral clustering technique. Here $\lambda_1$, $\lambda_2$ and $\lambda_3$ are all penalty parameters.
Optionally, as can be seen from equation (3) above, in the process of obtaining the loss function value the server invokes the target weight matrix to weight the reconstruction errors of the self-expression features corresponding to the modality, which effectively reduces the contribution of the filled-in average values to the overall self-expression features.
In some embodiments, by minimizing the above objective function, the server can successfully obtain the low-dimensional feature representations of all the modalities even in the case of data loss. Conventional multi-modal fusion techniques instead force the modalities to share the same low-dimensional feature matrix, with the following model:

$$\min_{F^{\mathsf{T}}F = I}\ \sum_{v=1}^{m}\mathrm{Tr}\big(F^{\mathsf{T}}L_Z^{(v)}F\big) \qquad (4)$$
3083. The server acquires the loss function value of any clustering process based on the reconstruction errors of the self-expression features, the clustering error of that clustering process, and the similarity difference between the clustering indication matrices and the feature to be solved.
In some embodiments, since equation (4) above is equivalent to minimizing the following model:

$$\min_{F^{\mathsf{T}}F = I}\ \mathrm{Tr}\Big(F^{\mathsf{T}}\Big(\sum_{v=1}^{m}L_Z^{(v)}\Big)F\Big) \qquad (5)$$

the determination of the clustering indication matrix $F$ depends on the sum of the similarity matrices of all modalities.
In the absence of some modalities, these similarity matrices contain non-true elements (filled-in average values), and adding them up aggravates the error, so that the finally learned shared features are represented inaccurately. In view of this, the embodiment of the present application uses a kernel alignment technique, which can effectively fuse incomplete multi-modal features; combining it with the objective function yields the fused model loss function:

$$\min\ \sum_{v=1}^{m}\Big(\|E^{(v)}G^{(v)}\|_F^2 + \lambda_1\|Z^{(v)}\|_* + \lambda_2\|Z^{(v)}\|_1 + \lambda_3\,\mathrm{Tr}\big(F^{(v)\mathsf{T}}L_Z^{(v)}F^{(v)}\big)\Big) - \lambda_4\sum_{v=1}^{m}\omega\big(F^{(v)},U\big)$$

wherein $\lambda_4$ is a trade-off parameter, $U$ represents the target fusion feature matrix, and $\omega(F^{(v)}, U)$ is a kernel alignment function that, based on the notion of alignment, measures the similarity between each modality's $F^{(v)}$ and $U$. Optionally, the alignment function is defined as follows:

$$\omega\big(F^{(v)},U\big) = \frac{\mathrm{Tr}\big(K_{F^{(v)}}K_U\big)}{\sqrt{\mathrm{Tr}\big(K_{F^{(v)}}K_{F^{(v)}}\big)\,\mathrm{Tr}\big(K_U K_U\big)}} \qquad (6)$$
in some embodiments, the server employs a linear kernel
Figure BDA0002939386560000202
K U =UU T Because of
Figure BDA0002939386560000203
So the above-mentioned check formula (6) can be rewritten as the following formula (7):
Figure BDA0002939386560000204
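In code, the simplified alignment of equation (7) can be computed directly, assuming column-orthonormal $F^{(v)}$ and $U$ as the constraints require (the function name is illustrative):

```python
import numpy as np

def kernel_alignment(F, U):
    """Kernel alignment of equation (7): (1/k) * Tr(F F^T U U^T) for
    column-orthonormal F (N x k) and U (N x k) with linear kernels
    K_F = F F^T and K_U = U U^T.  Tr(F F^T U U^T) equals ||F^T U||_F^2."""
    M = F.T @ U              # (k, k)
    k = F.shape[1]
    return np.sum(M * M) / k
```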
In some embodiments, for a given dataset the class number $k$ is typically a fixed constant, so ignoring the constant term yields the final objective function expression:

$$\min\ \sum_{v=1}^{m}\Big(\|E^{(v)}G^{(v)}\|_F^2 + \lambda_1\|Z^{(v)}\|_* + \lambda_2\|Z^{(v)}\|_1 + \lambda_3\,\mathrm{Tr}\big(F^{(v)\mathsf{T}}L_Z^{(v)}F^{(v)}\big) - \lambda_4\,\mathrm{Tr}\big(F^{(v)}F^{(v)\mathsf{T}}UU^{\mathsf{T}}\big)\Big) \qquad (8)$$

subject to the same constraints as in equation (3), together with $U^{\mathsf{T}}U = I$.
In the above process, the server designs the objective function according to the content of the model and jointly optimizes the various matrix variables in it to obtain the final iterative formulas for equation (8); after setting a convergence threshold (i.e., a loss threshold) and initializing all parameters and variables, the server iterates continuously to obtain the final accurate clustering indication matrix.
In some embodiments, the server may use either formula (5) or formula (8) above as the objective function for obtaining the clustering indication matrix. That is, the server may optimize the clustering indication matrix through formula (8), based on the kernel alignment technique, to improve its expressive capability, or it may directly optimize the clustering indication matrix through formula (5) without the kernel alignment technique, to simplify the data processing flow; this is not specifically limited in the embodiments of the present application.
3084. The server iteratively adjusts the parameters of the feature to be solved until the loss function value meets the stop condition, then stops the iteration and acquires the feature to be solved at the last iteration as the shared feature.
In some embodiments, the server iteratively adjusts the values of a plurality of intermediate variables in the loss function based on an alternating direction method of multipliers to obtain a plurality of adjusted intermediate variables, where while the value of any one intermediate variable is adjusted, the values of the other intermediate variables are kept unchanged; the values of the adjusted intermediate variables are then kept fixed while the step of iteratively adjusting the parameters of the feature to be solved is performed.
In some embodiments, the stop condition is that the difference between the loss function value of the current clustering process and the loss function value of the previous clustering process is less than a loss threshold. Alternatively, the stop condition may be that the loss function value of the current clustering process is smaller than a target threshold. Optionally, the stop condition may also be that the number of iterations is greater than a number threshold, which is not specifically limited in the embodiments of the present application.
In one exemplary scenario, when the variables $E^{(v)}$, $Z^{(v)}$, $F^{(v)}$ and $U$ are coupled, minimizing the above objective is a non-convex problem, and finding a global optimal solution is very difficult. In addition, the matrix variable $Z^{(v)}$ is subject to numerous constraints, which makes joint optimization nearly infeasible.
In view of this, the embodiments of the present application introduce the intermediate variables $S^{(v)}$, $P^{(v)}$ and $Q^{(v)}$ respectively to decouple the multiple constraints on the matrix variable $Z^{(v)}$. Optionally, based on the ADMM (Alternating Direction Method of Multipliers) technique, the original objective function is converted into a plurality of subproblems, and the variables in each subproblem are iteratively optimized, so that local optimal solutions of the model can be found respectively.
First, the Lagrangian form of the overall objective function is given as equation (9) (rendered as an image in the original), in which the Lagrange multipliers attached to modality $v$ and the penalty parameter $\mu$ appear.
In the ADMM algorithm, when one variable is updated, the other variables are kept unchanged. Optionally, the updating process is as follows:
(1) Fix all other variables and update $Z^{(v)}$.
The Lagrange sub-problem with respect to $Z^{(v)}$ is equation (10) (rendered as an image in the original). By taking the partial derivative of that sub-problem and setting it to 0, the closed-form update of $Z^{(v)}$, equation (11), can be solved (the equation and the auxiliary quantities it uses are images in the original).
(2) Fix all other variables and update $S^{(v)}$.
The Lagrange sub-problem with respect to $S^{(v)}$ is equation (12) (image in the original). To solve this sub-problem, the singular value thresholding operator $\Theta$ is applied, yielding the update of $S^{(v)}$ in equation (13) (image in the original).
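The singular value thresholding operator is the standard proximal operator of the nuclear norm; a minimal sketch is:

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink the singular values of A by tau.
    This is the proximal operator of the nuclear norm, used here to update S."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # shrink and clip the singular values
    return (U * s) @ Vt
```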
(3) Fix all other variables and update $Q^{(v)}$.
Its Lagrange problem is equation (14) (image in the original). Because the relevant trace term admits an element-wise expression, equation (14) can be rewritten as an element-wise optimization problem, equation (15) (image in the original). After taking the partial derivative of equation (15) with respect to $Q^{(v)}$ and setting it to 0, equation (15) can be turned into the closed-form update of equation (16) (image in the original). To ensure that $Q^{(v)}$ is non-negative, the embodiments of the present application additionally apply a non-negativity projection, after which $Q^{(v)}$ is updated according to equation (17) (image in the original).
(4) Fix all other variables and update $P^{(v)}$.
The Lagrange sub-problem is equation (18) (image in the original). This sparsity-constraint problem is solved using the soft-thresholding operation $\theta$, yielding the update of $P^{(v)}$ in equation (19) (image in the original).
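Likewise, the soft-thresholding operator is the standard proximal operator of the l1 norm; a minimal sketch is:

```python
import numpy as np

def soft_threshold(A, tau):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm,
    used here to update P."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
```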
(5) Fix all other variables and update $E^{(v)}$.
The Lagrange sub-problem is equation (20) (image in the original). Defining $M^{(v)}$ (given by an image in the original), the update formula of the matrix $E^{(v)}$ is

$$E^{(v)} = \mu M^{(v)}\big(2G^{(v)}G^{(v)\mathsf{T}} + \mu I\big)^{-1} \qquad (21)$$

where $(\cdot)^{-1}$ denotes the matrix inversion operation.
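A minimal sketch of this update, applying equation (21) directly (it assumes $M^{(v)}$ has already been formed; the original defines $M^{(v)}$ only in an image, so it is treated as an input here), is:

```python
import numpy as np

def update_E(M, G, mu):
    """Closed-form E update of equation (21): E = mu * M (2 G G^T + mu I)^{-1}.
    G is the (N x N) diagonal target weight matrix; M collects the residual
    terms of the sub-problem."""
    N = G.shape[0]
    return mu * M @ np.linalg.inv(2.0 * (G @ G.T) + mu * np.eye(N))
```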
(6) Fix all other variables; minimizing over $F^{(v)}$ can be converted into a maximization problem, equation (22) (image in the original). This problem can be optimized using the eigenvalue decomposition method: the updated $F^{(v)}$ is formed from the eigenvectors corresponding to the $k$ largest eigenvalues of the decomposed matrix (image in the original).
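A sketch of this eigen-decomposition step, assuming a symmetric input matrix, is as follows (the helper name is illustrative):

```python
import numpy as np

def top_k_eigenvectors(A, k):
    """Return the eigenvectors of the symmetric matrix A corresponding to
    its k largest eigenvalues, as used to update F in step (6)."""
    vals, vecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # indices of the k largest
    return vecs[:, idx]
```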
(7) Fix the other variables and update the Lagrange multipliers and $\mu$.
Optionally, the relevant parameters in the ADMM are updated according to equations (23) to (26) (images in the original) and

$$\mu = \min(\mu_0,\ \rho\mu) \qquad (27)$$

where the constants $\mu_0$ and $\rho$ are both preset parameters.
(8) Finally, fix all other variables and update the shared spectral clustering indicator matrix $U$ of the incomplete multi-modal feature data set using the same method as in step (6). The sub-problem with respect to $U$ is equation (28) (image in the original); after eigenvalue decomposition of the corresponding matrix, $U$ is updated from the eigenvectors associated with the largest eigenvalues taken in descending order.
Optionally, after the update formulas of all variables are obtained, the convergence threshold is preset to $10^{-4}$; all parameters are assigned values, optionally $\mu = 0.01$, $\mu_0 = 1\mathrm{e}8$ and $\rho = 1.1$; and all variable matrices are initialized randomly.
In some embodiments, the server alternately updates the variable matrices in turn according to the above optimization formulas and calculates the objective function value after each iteration. It then judges whether the difference between the objective function values of two adjacent iterations satisfies the stop condition: if so, iteration stops and the shared low-dimensional feature matrix obtained in this computation is output as the final objective of the algorithm; if the convergence condition is not satisfied, step 3084 is repeated.
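Putting the pieces together, the alternating loop of steps (1) to (8) with the stopping test just described can be sketched as follows; `objective` and the entries of `updates` are placeholders for the per-variable optimizers above and are assumptions of the sketch, not the patent's exact interfaces.

```python
def alternating_optimization(objective, updates, tol=1e-4, max_iters=500):
    """Skeleton of the alternating update loop: run every per-variable
    update once per iteration, then stop once the change in the objective
    value between two adjacent iterations falls below tol."""
    prev = float("inf")
    curr = prev
    for _ in range(max_iters):
        for update in updates:   # e.g. Z, S, Q, P, E, F, multipliers/mu, U
            update()
        curr = objective()
        if abs(prev - curr) < tol:
            break                # convergence threshold reached
        prev = curr
    return curr
```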
In the above steps 307 and 308, the server clusters the self-expression features of the categories under the modalities to obtain the shared feature. By imposing joint low-rank and sparse constraints on the self-expression affinity graph, a global graph structure reflecting the relationships among all data instances and a local graph structure reflecting the characteristics of individual data instances can be obtained simultaneously.
In the above steps 306 to 308, the server obtains the shared feature of the plurality of feature data of the plurality of modalities based on the completion features of the at least one modality and the feature data of the modalities other than the at least one modality; in other words, an effective model is constructed to address the problem of missing information in multi-modal data fusion. Extensive experiments verify that the shared features obtained by the embodiments of the present application perform well. In addition, a weighting mechanism based on the target weight matrix is provided; combined with the low-rank sparse representation technique, it can effectively learn non-centralized, distributed multi-modal data under missing conditions, and it is easy to extend to scenarios with an arbitrary number of modalities.
Fig. 4 is a schematic flow chart of a data processing method provided in an embodiment of the present application. Referring to Fig. 4, which takes a total modality number of 3 as an example: for an incomplete multi-modal feature data set, average values are first used to fill the missing feature values in each modality; the weighting mechanism of the target weight matrix is then combined to obtain the self-expression features of each category in each modality; a spectral clustering technique is used to obtain the clustering indication matrix of each modality; and finally the shared feature is iteratively adjusted in combination with the kernel alignment technique.
Fig. 5 is a schematic flowchart of a data processing method provided in an embodiment of the present application. Referring to Fig. 5: in step one, the incomplete multi-modal data are preprocessed, which is equivalent to regularizing each piece of feature data; in step two, the missing feature values are filled with average values and the corresponding target weight matrix is constructed; in step three, self-expression features are learned from the feature data of each modality and used as affinity graphs for spectral clustering to obtain the clustering indication matrices, which are fused using the kernel alignment technique; in step four, the optimization formula of each variable is solved by minimizing the objective function of the algorithm, a convergence threshold is set, and all parameters and variables are initialized; in step five, all variables are updated according to the optimization formulas and the current loss function value is calculated; in step six, the difference between the loss function values of the current and previous iterations is calculated, and if the difference is greater than or equal to the given loss threshold the flow returns to step five, otherwise step seven is executed; in step seven, iteration ends and the shared feature is obtained.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the present application, the missing feature values in incomplete multi-modal data are completed using the average value of the feature data under each modality, so that, by using the completion features obtained after completion together with the original feature data, the feature data of different modalities can be aligned. This makes it convenient to accurately reconstruct the shared features that represent the common information among the feature data of the different modalities, thereby improving the processing accuracy of incomplete multi-modal data.
Figs. 6 to 12 show parameter sensitivity experiments for the model parameter $\lambda_1$: for the seven multi-modal datasets BUAA, 3sources, Yale, SensIT, Wikipedia, Aloi and Webkb respectively, they plot how the normalized mutual information index varies with the value of $\lambda_1$ at a data missing rate of 30%, with the parameters $\lambda_3$ and $\lambda_4$ fixed.

Figs. 13 to 19 show parameter sensitivity experiments for the model parameters $\lambda_3$ and $\lambda_4$: for the same seven multi-modal datasets respectively, they plot how the normalized mutual information varies with $\lambda_3$ and $\lambda_4$ at a data missing rate of 30%, with the parameter $\lambda_1$ fixed.
In order to verify the effectiveness of the model provided by the embodiments of the present application, it is compared with seven currently representative multi-modal feature learning models, among them PVC (Partial multi-View Clustering), MIC (Multi-Incomplete-view Clustering), IMG (Incomplete Multi-modal Grouping), DAIMC (Doubly Aligned Incomplete Multi-view Clustering), MLRSSC (Multi-view Low-rank Sparse Subspace Clustering) and APMC (Anchor-based Partial Multi-view Clustering), and the performance of each model is verified on three indexes, namely accuracy, normalized mutual information and purity, using the K-means clustering technique. Besides the BUAA face image dataset, the experiments use the news dataset 3sources, the web dataset Webkb, the acoustic signal dataset SensIT, the Wikipedia dataset, and the image datasets Yale and Aloi; the specific dataset information is shown in Table 1.
TABLE 1 (dataset statistics; rendered as an image in the original, values not reproduced)
In the experiments, each model learns fusion features on the multi-modal datasets with data missing rates from 0.1 to 0.5, and the fusion features are then cluster-analyzed with the K-means technique. Tables 2 to 8 show the comparison results of clustering accuracy, normalized mutual information and purity between the model of the present application and the other comparison models on the seven datasets. The experiments show that the clustering performance of the proposed model is clearly superior to that of the other models on all datasets. Through low-rank sparse graph learning under a weight mechanism, the model effectively captures the intrinsic geometric structure among missing data, and the kernel-aligned fusion features are highly discriminative. Since the model performs best on all kinds of datasets, it has broad application value.
Table 2 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results of BUAA datasets on each model:
TABLE 2 (rendered as an image in the original; values not reproduced)
Table 3 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results for the 3sources dataset on each model:
TABLE 3 (rendered as an image in the original; values not reproduced)
Table 4 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results of the Yale dataset on each model:
TABLE 4 (rendered as an image in the original; values not reproduced)
Table 5 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results of the SensIT data set on each model:
TABLE 5 (rendered as an image in the original; values not reproduced)
Table 6 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results for the Wikipedia dataset on each model:
TABLE 6 (rendered as an image in the original; values not reproduced)
Table 7 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results for Aloi datasets on each model:
TABLE 7 (rendered as an image in the original; values not reproduced)
Table 8 shows the average clustering performance (accuracy, normalized mutual information, purity) comparison results for the Webkb dataset on each model:
TABLE 8 (rendered as an image in the original; values not reproduced)
In the comparative experiments, all comparison models use the parameter values recommended in their original papers. For the model parameters proposed in this application, the preferred value of $\lambda_1$ is selected within {0.1, 0.3, 0.5, 0.7, 0.9} with $\lambda_2 = 1 - \lambda_1$, and the values of $\lambda_3$ and $\lambda_4$ are selected within {1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1}. To find the best combination of these parameters, the following strategy is adopted: first fix $\lambda_1 = 0.3$ and search for the best values of $\lambda_3$ and $\lambda_4$; then fix those values and adjust the value of $\lambda_1$. Finally, taking the normalized mutual information value as the evaluation criterion, the parameter values with the best clustering performance are selected as the final parameter values for model computation.
The incomplete multi-modal data fusion algorithm based on low-rank sparse graph learning provided by the embodiment of the application is described in detail above.
Fig. 20 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, please refer to fig. 20, the apparatus including:
a first obtaining module 2001, configured to obtain a plurality of feature data belonging to a plurality of modalities, where the feature data corresponding to at least one of the plurality of modalities contains a missing feature value, and any one of the plurality of modalities corresponds to one or more feature data;
a completion module 2002, configured to, for the at least one modality with a missing feature value, complete the missing feature value in the feature data corresponding to the at least one modality based on an average value of the feature data corresponding to the at least one modality, so as to obtain a completed feature of the at least one modality;
a second obtaining module 2003, configured to obtain a shared feature of the plurality of feature data of the plurality of modalities based on the completion feature of the at least one modality and the feature data of the modalities other than the at least one modality, where the shared feature is used to represent common information among the plurality of feature data respectively belonging to the plurality of modalities.
According to the apparatus provided by the embodiment of the present application, the missing feature values in incomplete multi-modal data are completed using the average value of the feature data under each modality, so that, by using the completion features obtained after completion together with the original feature data, the feature data of different modalities can be aligned. This makes it convenient to accurately reconstruct the shared features that represent the common information among the feature data of the different modalities, and thereby improves the processing accuracy of incomplete multi-modal data.
In one possible implementation, based on the apparatus components of fig. 20, the completion module 2002 includes:
the obtaining submodule is used for obtaining an average value of characteristic data corresponding to any mode in the at least one mode containing the missing characteristic value;
and the filling submodule is used for filling the average value of the characteristic data corresponding to any mode into the missing characteristic value of the characteristic data corresponding to any mode to obtain the completion characteristic of any mode.
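As an illustration of the completion rule implemented by these submodules, the following sketch fills each missing entry with the per-feature average of the observed values of the same modality; treating rows as features, columns as samples and NaN as the missing-value marker are assumptions of the sketch:

```python
import numpy as np

def fill_with_modality_mean(X):
    """Mean completion for one modality: replace NaN entries (missing
    feature values) with the average of the observed values of the same
    feature.  X: (d, N) array whose columns are samples."""
    means = np.nanmean(X, axis=1, keepdims=True)  # average over observed samples
    return np.where(np.isnan(X), means, X)
```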
In a possible embodiment, based on the apparatus composition of fig. 20, the apparatus further comprises:
the regularization module is used for regularizing the plurality of feature data to obtain a plurality of regularized feature data;
the completion module 2002 is further configured to perform a step of completing the missing feature values and obtaining a completion feature based on the normalized feature data.
In one possible embodiment, the feature data corresponding to any of the plurality of modalities respectively corresponds to a plurality of categories;
based on the apparatus composition of fig. 20, the second acquisition module 2003 includes:
the linear combination submodule is used for linearly combining, for the feature data or completion feature of any one of the plurality of modalities, the feature values (or filled-in average values) belonging to the same category, to obtain the self-expression features of the plurality of categories under that modality;
and the clustering submodule is used for clustering a plurality of self-expression characteristics of the plurality of categories under the plurality of modes to obtain the shared characteristic.
In one possible implementation, based on the apparatus components of fig. 20, the clustering submodule includes:
the constructing unit is used for constructing a plurality of affinity graphs of the plurality of categories under the plurality of modalities based on the plurality of self-expression features, wherein any one of the affinity graphs comprises a plurality of nodes and a plurality of weighted edges, any one of the plurality of nodes is used for representing the self-expression features of one category under the corresponding modality, and the weight carried by any one of the weighted edges is used for representing the similarity between the two self-expression features of the two categories corresponding to the two nodes of the any one weighted edge under the corresponding modality;
and the spectral clustering unit is used for performing spectral clustering operation on the self-expression characteristics based on the affinity graphs to obtain the shared characteristics.
In one possible embodiment, based on the apparatus composition of fig. 20, the spectral clustering unit includes:
the first obtaining subunit is configured to obtain, for any one clustering process, a plurality of graph laplacian matrices of the plurality of affinity graphs;
a second obtaining subunit, configured to obtain, based on the multiple graph laplacian matrices, multiple cluster indication matrices of the multiple modalities;
a third obtaining subunit, configured to obtain a loss function value of the any clustering process based on reconstruction errors of the multiple self-expression features, a clustering error of the any clustering process, and a similarity difference between the multiple clustering indication matrices and the feature to be solved;
and the iteration adjusting subunit is used for iteratively adjusting the parameters of the feature to be solved until the loss function value meets the stop condition, stopping iteration and acquiring the feature to be solved during the last iteration as the shared feature.
In a possible embodiment, based on the apparatus composition of fig. 20, the apparatus further comprises:
a third obtaining module, configured to obtain, for any one of the multiple modalities, a target weight matrix corresponding to the any modality, where a weight coefficient of a feature value in the target weight matrix that is not missing in the any modality is greater than a weight coefficient of a missing feature value;
the third obtaining subunit is further configured to, in the process of obtaining the loss function value, call the target weight matrix, and weight the reconstruction error of the self-expression feature corresponding to the any modality.
In one possible embodiment, the target weight matrix is a diagonal weight matrix, the weight coefficient of a non-missing feature value is 1, and the weight coefficient of a missing feature value is the ratio of the number of samples with non-missing feature values to the total number of samples in the modality.
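A minimal sketch of constructing such a diagonal weight matrix, assuming missing samples are indicated by a boolean mask, is:

```python
import numpy as np

def target_weight_matrix(missing_mask):
    """Diagonal target weight matrix for one modality, following the rule
    above: weight 1 for samples whose feature values are present, and the
    fraction of non-missing samples for samples whose values were filled in.

    missing_mask: boolean array of length N, True where the value is missing.
    """
    N = missing_mask.size
    ratio = (N - missing_mask.sum()) / N
    return np.diag(np.where(missing_mask, ratio, 1.0))
```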
In one possible embodiment, the first obtaining subunit is configured to:
based on any affinity map in the multiple affinity maps, acquiring a similarity matrix and a diagonal matrix of the any affinity map;
and obtaining the difference value between the diagonal matrix and the similarity matrix as a graph Laplace matrix of any affinity graph.
In one possible embodiment, the iterative adjustment subunit is configured to:
iteratively adjusting values of a plurality of intermediate variables in the loss function based on an alternating direction multiplier method to obtain a plurality of adjusted intermediate variables, wherein when the value of any intermediate variable is adjusted, the values of the intermediate variables except for any intermediate variable in the plurality of intermediate variables are kept unchanged;
and keeping the values of the adjusted intermediate variables unchanged, and executing the step of iteratively adjusting the parameters of the characteristic to be solved.
In a possible embodiment, the stop condition is that the difference between the loss function value of any one clustering process and the loss function value of the last clustering process is smaller than a loss threshold.
In one possible implementation, the feature data corresponding to any of the plurality of modalities respectively corresponds to a plurality of categories, the plurality of categories are a plurality of target objects, and the plurality of modalities of the plurality of target objects at least include visual information and near-infrared information of face images of the respective plurality of target objects.
It should be noted that: in the data processing apparatus provided in the above embodiment, only the division of the above functional modules is taken as an example for data processing, and in practical applications, the above function allocation can be completed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the data processing apparatus and the data processing method provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the data processing method embodiments and are not described herein again.
Fig. 21 is a schematic structural diagram of a computer device 2100 provided in an embodiment of the present application. The computer device 2100 may vary considerably in configuration or performance; it includes one or more processors (CPUs) 2101 and one or more memories 2102, where the memories 2102 store at least one computer program that is loaded and executed by the one or more processors 2101 to implement the data processing methods provided by the embodiments above. Optionally, the computer device 2100 further has components such as a wired or wireless network interface, a keyboard and an input/output interface to facilitate input and output, and further includes other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including at least one computer program, which is executable by a processor in a terminal to perform the data processing method in the above-described embodiments, is also provided. For example, the computer readable storage medium includes a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute to complete the data processing method in the above-described embodiments.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be implemented by hardware, or can be implemented by a program instructing relevant hardware, and optionally, the program is stored in a computer readable storage medium, and optionally, the above mentioned storage medium is a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of data processing, the method comprising:
acquiring a plurality of characteristic data belonging to a plurality of modalities, wherein the characteristic data corresponding to at least one of the modalities comprises a missing characteristic value, and any one of the modalities corresponds to one or more characteristic data;
for the at least one modality with the missing characteristic value, completing the missing characteristic value in the characteristic data corresponding to the at least one modality based on the average value of the characteristic data corresponding to the at least one modality, and obtaining the completed characteristic of the at least one modality;
acquiring shared features of the plurality of feature data of the plurality of modalities based on the completion feature of the at least one modality and the feature data of the modalities other than the at least one modality, the shared features being used for representing common information among the plurality of feature data belonging to the plurality of modalities respectively.
2. The method according to claim 1, wherein for the at least one modality with missing feature values, completing the missing feature values in the feature data corresponding to the at least one modality based on an average value of the feature data corresponding to the at least one modality, and obtaining a completed feature of the at least one modality comprises:
for any one of the at least one modality containing the missing characteristic values, obtaining an average value of characteristic data corresponding to the any modality;
and filling the average value of the characteristic data corresponding to any mode into the missing characteristic value of the characteristic data corresponding to any mode to obtain the completion characteristic of any mode.
3. The method according to claim 1, wherein for the at least one modality with missing feature values, the method further comprises, before completing the missing feature values in the feature data corresponding to the at least one modality based on an average value of the feature data corresponding to the at least one modality and obtaining a completed feature of the at least one modality:
regularizing the plurality of feature data to obtain a plurality of regularized feature data;
and executing the step of completing the missing characteristic values and obtaining the completing characteristics based on the regularized characteristic data.
4. The method according to claim 1, wherein the feature data corresponding to any of the plurality of modalities respectively corresponds to a plurality of categories;
the obtaining shared features of the plurality of feature data of the plurality of modalities based on the complementing features of the at least one modality and the feature data of the modalities of the plurality of modalities other than the at least one modality comprises:
for feature data or completion features of any one of the plurality of modalities, linearly combining feature values or completion average values belonging to the same category to obtain self-expression features of the plurality of categories in the any modality;
and clustering the self-expression characteristics of the categories under the plurality of modes to obtain the shared characteristics.
5. The method according to claim 4, wherein clustering the plurality of self-expressed features of the plurality of categories under the plurality of modalities to obtain the shared feature comprises:
constructing a plurality of affinity graphs of the plurality of categories under the plurality of modalities based on the plurality of self-expression features, wherein any one of the plurality of affinity graphs comprises a plurality of nodes and a plurality of weighted edges, any one of the plurality of nodes is used for representing the self-expression features of one category under the corresponding modality, and a weight carried by any one of the plurality of weighted edges is used for representing the similarity between the two self-expression features of the two categories corresponding to the two nodes of the any one weighted edge under the corresponding modality;
and performing spectral clustering operation on the plurality of self-expression characteristics based on the plurality of affinity graphs to obtain the shared characteristics.
6. The method according to claim 5, wherein the performing spectral clustering operations on the plurality of self-expressed features based on the plurality of affinity maps to obtain the shared features comprises:
for any clustering process, obtaining a plurality of graph Laplacian matrices of the plurality of affinity graphs;
obtaining a plurality of clustering indication matrices of the plurality of modalities based on the plurality of graph Laplacian matrices;
obtaining a loss function value of any clustering process based on reconstruction errors of the self-expression features, clustering errors of any clustering process and the similarity difference between the clustering indication matrices and the features to be solved;
and iteratively adjusting the parameters of the features to be solved until the loss function value meets the stop condition, stopping iteration, and acquiring the features to be solved in the last iteration as the shared features.
7. The method according to claim 6, wherein for the at least one modality with missing feature values, the method further comprises, after completing the missing feature values in the feature data corresponding to the at least one modality based on an average value of the feature data corresponding to the at least one modality and obtaining a completed feature of the at least one modality:
for any modality in the plurality of modalities, acquiring a target weight matrix corresponding to the modality, wherein the weight coefficient of the characteristic value which is not missing in the target weight matrix under the modality is greater than the weight coefficient of the missing characteristic value;
and in the process of obtaining the loss function value, calling the target weight matrix to weight the reconstruction error of the self-expression characteristic corresponding to any modality.
8. The method according to claim 7, wherein the target weight matrix is a diagonal weight matrix, the weight coefficient of a non-missing feature value is 1, and the weight coefficient of a missing feature value is the ratio of the number of samples with non-missing feature values to the total number of samples in the any modality.
9. The method of claim 6, wherein obtaining the plurality of graph Laplacian matrices of the plurality of affinity graphs comprises:
acquiring a similarity matrix and a diagonal matrix of any affinity map based on any affinity map in the multiple affinity maps;
and obtaining the difference value between the diagonal matrix and the similarity matrix as a graph Laplace matrix of any affinity graph.
10. The method of claim 6, wherein iteratively adjusting the parameters of the feature to be solved comprises:
iteratively adjusting values of a plurality of intermediate variables in the loss function based on an alternating direction multiplier method to obtain a plurality of adjusted intermediate variables, wherein when the value of any intermediate variable is adjusted, the values of the intermediate variables except for any intermediate variable in the plurality of intermediate variables are kept unchanged;
and keeping the values of the adjusted intermediate variables unchanged, and executing the step of iteratively adjusting the parameters of the characteristics to be solved.
11. The method of claim 6, wherein the stopping condition is that a difference between the loss function value of any one clustering process and the loss function value of the previous clustering process is less than a loss threshold.
12. The method according to claim 1, wherein the feature data corresponding to any of the plurality of modalities respectively correspond to a plurality of categories, the plurality of categories are a plurality of target objects, and the plurality of modalities of the plurality of target objects include at least visual information and near-infrared information of face images of the respective plurality of target objects.
13. A data processing apparatus, characterized in that the apparatus comprises:
the device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a plurality of characteristic data belonging to a plurality of modalities, the characteristic data corresponding to at least one of the plurality of modalities comprises a missing characteristic value, and any modality of the plurality of modalities corresponds to one or more characteristic data;
the completion module is used for completing the missing characteristic values in the characteristic data corresponding to the at least one modality based on the average value of the characteristic data corresponding to the at least one modality, so as to obtain the completed characteristic of the at least one modality;
a second obtaining module, configured to obtain a shared feature of the plurality of feature data of the plurality of modalities based on a completion feature of the at least one modality and feature data of a modality other than the at least one modality, where the shared feature is used to represent common information between the plurality of feature data belonging to the plurality of modalities respectively.
14. A computer device, characterized in that the computer device comprises one or more processors and one or more memories in which at least one computer program is stored, the at least one computer program being loaded and executed by the one or more processors to implement the data processing method according to any one of claims 1 to 12.
15. A storage medium having stored therein at least one computer program which is loaded and executed by a processor to implement the data processing method of any one of claims 1 to 12.
CN202110172990.7A 2021-02-08 2021-02-08 Data processing method and device, computer equipment and storage medium Pending CN114911778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172990.7A CN114911778A (en) 2021-02-08 2021-02-08 Data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114911778A true CN114911778A (en) 2022-08-16

Family

ID=82761887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172990.7A Pending CN114911778A (en) 2021-02-08 2021-02-08 Data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114911778A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024087858A1 (en) * 2022-10-24 2024-05-02 腾讯科技(深圳)有限公司 Image processing model training method and apparatus, electronic device, computer program product, and computer storage medium
CN117155583A (en) * 2023-10-24 2023-12-01 清华大学 Multi-mode identity authentication method and system for incomplete information deep fusion
CN117155583B (en) * 2023-10-24 2024-01-23 清华大学 Multi-mode identity authentication method and system for incomplete information deep fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination