CN113254741A - Data processing method and system based on intra-modality fusion and inter-modality relation - Google Patents
Data processing method and system based on intra-modality fusion and inter-modality relation Download PDFInfo
- Publication number
- CN113254741A CN113254741A CN202110665991.5A CN202110665991A CN113254741A CN 113254741 A CN113254741 A CN 113254741A CN 202110665991 A CN202110665991 A CN 202110665991A CN 113254741 A CN113254741 A CN 113254741A
- Authority
- CN
- China
- Prior art keywords
- network
- data
- modal
- modality
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The application relates to a data processing method and system based on intra-modality fusion and inter-modality relationships, comprising the following steps: obtaining sample data for social network target-oriented classification, and dividing the sample data into a training set, a verification set and a test set; constructing a preset classification model, wherein the preset classification model comprises a feature extraction network and, connected to it, a target classification main task network and a multi-modal topic information auxiliary task network; and inputting the training set sample data into the preset classification model, training with a preset loss function, and fusing the outputs of the main task and the auxiliary tasks with a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying input data to be classified. The method and the device can effectively improve the performance of target-oriented classification of social network data.
Description
Technical Field
The present application relates to the field of data processing technologies, and more particularly, to a data processing method and system fusing intra-modality and inter-modality relationships.
Background
Human expressions and behaviors take many forms and reflect a person's mental condition. Identifying abnormal expressions and behaviors, and classifying the subjects who exhibit them, has become an essential task for social security and an important research subject in the field of psychological medicine. Such subjects are characterized by brain dysfunction that, under the influence of various biological, psychological and social environmental factors, results in varying degrees of impairment of mental activities such as cognition, emotion, will and behavior. Problematic expressions and behaviors are extremely varied, and many eventually develop into mental disorders such as autism, depression and delusional disorder. Among them, depression is one of the most common mental disorders and seriously threatens people's health. According to the World Health Organization, about 300 million people worldwide suffer from depression. In severe cases depression can lead to suicide, and it seriously affects the daily life of patients. However, in low- and middle-income countries, 76% to 85% of depression patients receive no effective treatment due to the lack of medical resources and trained health-care staff, and most patients ignore their own depressive symptoms and miss the appropriate time for treatment. Early detection of depression is therefore of great significance for preventing mental health diseases such as depression.
Traditional identification of mental disorders relies mainly on psychological expertise; for depression, for example, a subject fills out a depression scale and undergoes a professional manual interview to judge whether a depressive tendency exists. This approach has the following drawbacks: (1) high resource consumption: professional medical personnel are limited and manual detection is costly; (2) a long diagnosis cycle: diagnosis requires long-term follow-up by medical personnel and the process is slow; (3) a passive process: patients actively seek treatment only when symptoms are obvious, missing the optimal time for treatment.
With the rapid development of the internet, social platforms such as Twitter and microblogs have become indispensable social tools; hundreds of millions of users share their thoughts and moods on these platforms every day. Social network data containing multi-modal information (such as text, pictures and voice) provides a new method and channel for recognizing human expressions and behaviors, and more and more researchers use multi-modal social network data to study mental health diseases, including depression. However, in the face of massive social network data, effectively modeling multi-modal sequence information becomes the key problem for improving data processing performance. At present, modeling of text or picture modal sequence information is mostly realized with recurrent methods such as RNNs (recurrent neural networks), which suffer from the sequence-dependence problem and cannot model timing information well.
Disclosure of Invention
The object of the present application is to solve the above technical problems. The application provides a social network data processing method and system fusing intra-modality and inter-modality relationships: data to be identified and classified are processed with a new topic model that models multi-modal sequence information, which alleviates the sequence-dependence problem of methods such as RNNs (recurrent neural networks) and further improves the performance of target-oriented classification of social network data. The application provides the following technical scheme:
in a first aspect, a data processing method based on intra-modality and inter-modality relationships is provided, which includes:
obtaining sample data of a social network directed target classification, dividing the sample data into a training set, a verification set and a test set, and obtaining sample data of the training set, sample data of the verification set and sample data of the test set;
constructing a preset classification model, wherein the preset classification model comprises a feature extraction network, a target classification main task network and a multi-modal topic information auxiliary task network which are connected with the feature extraction network; the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-modal topic information auxiliary task network comprises a text modality network, a picture modality network and an inter-modality network, used for acquiring the topic information in the text modality network, the topic information in the picture modality network and the inter-modality relation topic information;
and inputting the sample data of the training set into the preset classification model, training by using a preset loss function, and fusing the output of the main task and the auxiliary task by using a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying the input data to be classified.
Optionally, the text feature extraction network is a BERT model, and the picture feature extraction network is a VGG model.
Optionally, the training with the preset loss function includes simultaneously training the main task and the auxiliary task through a main task loss function, an auxiliary task loss function, and a joint loss function.
Optionally, wherein the text modality network, the picture modality network, and the inter-modality network are constructed based on a variational auto-encoder framework.
Optionally, the method for obtaining the topic information in the text modality network, the topic information in the picture modality network, and the inter-modality relation topic information includes:
obtaining intra-modality topic information by using the text modality network and the picture modality network;
obtaining the relation information between the text modality and the picture modality by using the following formula, and inputting the relation information into the inter-modality network to obtain the inter-modality relation topic information:

$$ z_t = \sigma\!\left( x_t^{\top} W^{[1:m]} y_t + b \right), \qquad z_t[k] = x_t^{\top} W_k\, y_t $$

wherein $\sigma$ is a standard non-linear function, $x_t$ and $y_t$ are the $t$-th text and its corresponding picture representation, $x_t, y_t \in \mathbb{R}^d$, and $W^{[1:m]} \in \mathbb{R}^{d \times d \times m}$ is an order-3 transformation tensor; $d$ and $m$ denote vector dimension sizes; $W^{[1:m]}$ and $b$ are trainable parameters; each bilinear product $x_t^{\top} W_k\, y_t$ gives one element $z_t[k]$ of the vector $z_t$.
Optionally, wherein the network model of the primary task is constructed based on an LSTM model.
Optionally, the merging of the outputs of the main task and the auxiliary task using the gating mechanism comprises:
$$ g = \sigma\!\left( W h + U t + b \right), \qquad r = g \odot h + (1 - g) \odot t $$

wherein $r$ is the final representation of the social network user, $h$ is the output representation of the main task, $t$ is the combined representation of the three kinds of topic-information outputs, and $W$, $U$ and $b$ are trainable parameters.
Optionally, the main task loss function is:

$$ L_{main} = -\frac{1}{N} \sum_{i=1}^{N} \log p\!\left( y_i \mid u_i \right) + \lambda \lVert \theta \rVert_2^2 $$

wherein $N$ is the number of samples, $y_i$ is the true category label of the $i$-th user $u_i$, $\lambda$ is the coefficient of the L2 regularization, and $\theta$ denotes all training parameters in the model;
the auxiliary task loss function is:

$$ L_{topic} = D_{KL}\!\left( q_\phi(z \mid U) \,\Vert\, p(z) \right) - \mathbb{E}_{q_\phi(z \mid U)}\!\left[ \log p_\theta(U \mid z) \right] $$

wherein $U$ is the intermediate content matrix and $p(z)$ is a standard normal distribution; the first half of the formula measures, with the Kullback-Leibler divergence, the gap between the modeled distribution $q_\phi(z \mid U)$ and the prior distribution $p(z)$; the second part is the reconstruction loss of the model, in which the original input is reconstructed by the generation network; $\phi$ and $\theta$ represent training parameters;
the joint loss function is:

$$ L = L_{main} + \alpha \left( L_{text} + L_{pic} + L_{rel} \right) $$

wherein $\alpha$ is a weight balancing the loss functions of the main and auxiliary tasks, $L_{main}$ is the main loss function, $L_{text}$ is the text modality loss function, $L_{pic}$ is the picture modality loss function, and $L_{rel}$ is the inter-modality relation loss function.
In a second aspect, there is provided a data processing system that fuses intra-modality and inter-modality relationships, comprising:
a sample construction unit, used for acquiring an initial sample and dividing the sample into a training set, a verification set and a test set;
the model building unit is used for building a data classification model based on the relation topic information in the fusion modality and between the modalities;
and the model training unit is used for training a data classification model based on the intra-modal and inter-modal relation topic information.
The beneficial effects of this application include at least:
(1) compared with prior social network data processing and classification methods, which mainly train on text and mine only text-related information, the method uses multi-modal social network data based on both text and pictures, exploiting more useful information;
(2) text features are extracted with the recent BERT method and picture features with the VGG method, which captures the data information better and effectively improves the performance of the method;
(3) the invention provides a new topic model that can learn both sparse text topic features and continuous picture feature topics;
(4) the topic model learns the topic information within each modality and the relation topic information between modalities, which significantly improves the target-oriented classification performance on social network data.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application.
Drawings
The present application may be better understood by describing exemplary embodiments thereof in conjunction with the following drawings, wherein:
fig. 1 is a general schematic diagram of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application.
Fig. 2 is a flowchart of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application.
Fig. 3 is a schematic diagram of text feature extraction based on a BERT model according to an embodiment of the present application.
Fig. 4 is a schematic diagram of extracting features of a picture based on a VGG model according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a topic information model provided in an embodiment of the present application.
FIG. 6 is a data processing system diagram that merges intra-modality and inter-modality relationships provided in accordance with an embodiment of the present application.
Detailed Description
The following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings and examples, will enable those skilled in the art to practice the embodiments of the present application with reference to the description.
It is noted that in the detailed description of these embodiments, in order to provide a concise description, all features of an actual implementation may not be described. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Fig. 2 is a flowchart of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application. The method at least comprises the following steps:
step S201, sample data of a social network pointing to a target classification is obtained, the sample data is divided into a training set, a verification set and a test set, and training set sample data, verification set sample data and test set sample data are obtained.
Step S202, a preset classification model is constructed, wherein the preset classification model comprises a feature extraction network, a target classification main task network and a multi-modal topic information auxiliary task network which are connected with the feature extraction network; the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-modal topic information auxiliary task network comprises a text modality network, a picture modality network and an inter-modality network, used for acquiring the topic information in the text modality network, the topic information in the picture modality network and the inter-modality relation topic information.
In the present embodiment, text features are extracted using a BERT-base (uncased) model, whose structure is shown in fig. 3. Because each user has published many text sentences, the "[CLS]" token of the BERT model is used to represent the whole-sentence vector, and the sentence vectors are converted into features of the same dimension as the pictures. When all text sequences are BERT-encoded, a text content matrix (UGC, user-generated content) is obtained, with the formula:

$$ C_u = \left[ s_1, s_2, \ldots, s_n \right] $$

wherein $C_u$ denotes the text content matrix of user $u$ and $s_n$ denotes the feature extracted by BERT for the $n$-th sentence.
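The construction of the text content matrix can be sketched as follows. This is a minimal NumPy illustration: the 768-dimensional [CLS] vectors are simulated with random features (the patent does not publish trained weights), and the conversion to a shared dimension `d` is assumed to be a single linear projection.

```python
import numpy as np

def build_text_content_matrix(cls_vectors, d, rng):
    """Stack the per-sentence [CLS] vectors of one user and project them
    to a d-dimensional space shared with the picture features.
    cls_vectors: (n_sentences, 768) array of BERT [CLS] embeddings."""
    bert_dim = cls_vectors.shape[1]
    # Linear projection to the shared feature dimension (trainable in
    # practice; random here because the patent publishes no weights).
    W = rng.standard_normal((bert_dim, d)) / np.sqrt(bert_dim)
    return cls_vectors @ W  # the text content matrix C_u, shape (n_sentences, d)

rng = np.random.default_rng(0)
cls_vectors = rng.standard_normal((5, 768))  # 5 sentences from one user
C_u = build_text_content_matrix(cls_vectors, d=128, rng=rng)
print(C_u.shape)  # (5, 128)
```

The same shape convention would apply to the VGG picture vectors, so both modalities yield UGC matrices of matching width.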
In this embodiment, a VGG16 model is used to encode a picture, and an output vector of the first fully-connected layer of the VGG16 is used as an encoding vector of the picture, and is converted into a feature dimension with the same size as a text, where the VGG16 model is shown in fig. 4.
In the present embodiment, referring to fig. 5, the intra-modality topics of each modality and the inter-modality relation topic information are obtained as follows. Unlike traditional neural topic models, which focus on modeling sparse text feature topics, the topic model in this embodiment aims to model the intermediate content matrix of each modality (e.g., the text content matrix or the picture content matrix). Since both sparse and continuous features can be encoded into the same UGC matrix form, they can be input into the topic model proposed by the present invention, which adopts a variational auto-encoder framework comprising an inference network and a generation network. The loss function of the topic model is as follows:

$$ L_{topic} = D_{KL}\!\left( q_\phi(z \mid U) \,\Vert\, p(z) \right) - \mathbb{E}_{q_\phi(z \mid U)}\!\left[ \log p_\theta(U \mid z) \right] $$

wherein $U$ denotes the intermediate content matrix (superscripts omitted for convenience) and $p(z)$ is a standard normal distribution. The first half of the formula measures, with the Kullback-Leibler divergence, the gap between the modeled distribution $q_\phi(z \mid U)$ and the prior $p(z)$; the second part is the reconstruction loss of the model, i.e., the original input is reconstructed by the generation network. $\phi$ and $\theta$ represent the training parameters.
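The topic-model loss above can be sketched numerically. This sketch assumes a diagonal-Gaussian inference network, for which the KL term against the standard normal prior has a closed form; squared error stands in for the generation-network reconstruction likelihood, which the patent does not specify.

```python
import numpy as np

def topic_model_loss(U, U_recon, mu, log_var):
    """Variational topic-model loss: KL(q(z|U) || N(0, I)) plus a
    reconstruction term. q is assumed diagonal-Gaussian with parameters
    (mu, log_var) produced by the inference network."""
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    # Squared-error reconstruction of the content matrix by the generator.
    recon = np.sum((U - U_recon) ** 2)
    return kl + recon

rng = np.random.default_rng(0)
U = rng.standard_normal((4, 16))          # intermediate content matrix
mu, log_var = np.zeros(8), np.zeros(8)    # posterior equal to the prior
print(topic_model_loss(U, U, mu, log_var))  # 0.0 (zero KL, perfect reconstruction)
```

When the posterior matches the prior and the reconstruction is exact, both terms vanish, which is a quick sanity check on the formula.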
First, the topic model is used to model the topic within each modality; then the relation information between the modalities is obtained with the following formula and input into the topic model to obtain the inter-modality relation topic features:

$$ z_t = \sigma\!\left( x_t^{\top} W^{[1:m]} y_t + b \right), \qquad z_t[k] = x_t^{\top} W_k\, y_t $$

wherein $\sigma$ is a standard non-linear function, $x_t$ and $y_t$ denote the $t$-th text and its corresponding picture representation, $x_t, y_t \in \mathbb{R}^d$, $W^{[1:m]} \in \mathbb{R}^{d \times d \times m}$ is an order-3 transformation tensor, $d$ and $m$ denote vector dimension sizes, and $W^{[1:m]}$ and $b$ are trainable parameters. The vector multiplication $x_t^{\top} W_k\, y_t$ gives each element $z_t[k]$ of the vector $z_t$, which is equivalent to $m$ bilinear models simultaneously capturing multilinear mutual information between the vectors. Using vector factorization, each tensor slice is approximated by two low-rank matrices:

$$ W_k \approx U_k V_k^{\top} $$
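The low-rank bilinear interaction can be sketched as follows. The rank `r`, the choice of `tanh` as the standard non-linear function, and the random inputs are all assumptions for illustration; the patent fixes none of them.

```python
import numpy as np

def bilinear_relation(x, y, U_slices, V_slices, b):
    """z[k] = x^T W_k y + b[k], with each order-3 tensor slice W_k
    approximated by a rank-r factorization U_k V_k^T, followed by a
    non-linear function (tanh assumed).
    x, y: (d,) text and picture representations.
    U_slices, V_slices: (m, d, r) low-rank factors; b: (m,) bias."""
    m = U_slices.shape[0]
    z = np.empty(m)
    for k in range(m):
        W_k = U_slices[k] @ V_slices[k].T  # rank-r approximation of slice k
        z[k] = x @ W_k @ y + b[k]          # bilinear interaction for slice k
    return np.tanh(z)

rng = np.random.default_rng(1)
d, m, r = 6, 4, 2
z = bilinear_relation(rng.standard_normal(d), rng.standard_normal(d),
                      rng.standard_normal((m, d, r)),
                      rng.standard_normal((m, d, r)), np.zeros(m))
print(z.shape)  # (4,)
```

The factorization replaces the d×d parameters of each slice with 2·d·r, which is the usual motivation for the low-rank approximation.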
In this embodiment, an LSTM is utilized to construct the target-oriented classification network model; training this model is set as the main task, while modeling the topic within each modality and the relation topic between the modalities are set as auxiliary tasks.
Step S203, inputting the sample data of the training set into the preset classification model, training with a preset loss function, and fusing the outputs of the main task and the auxiliary tasks with a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying the input data to be classified.
First, the features of the text modality and the picture modality are fused and input into the LSTM network for training to obtain the representation of the main task. The loss function of the main task is defined as:

$$ L_{main} = -\frac{1}{N} \sum_{i=1}^{N} \log p\!\left( y_i \mid u_i \right) + \lambda \lVert \theta \rVert_2^2 $$

wherein $N$ indicates the number of samples, i.e., the number of Twitter users; $y_i$ is the true category label of the $i$-th user $u_i$; $\lambda$ is the coefficient of the L2 regularization; and $\theta$ represents all training parameters in the model.
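The main-task loss can be sketched as follows; the function name, the toy probabilities, and the flat parameter vector are illustrative, not from the patent.

```python
import numpy as np

def main_task_loss(probs, labels, params, lam):
    """Negative log-likelihood over N users plus an L2 penalty.
    probs: (N, 2) predicted class probabilities; labels: (N,) true labels;
    params: flat vector of all trainable parameters; lam: L2 coefficient."""
    N = len(labels)
    # Pick each user's probability of its true class and average the -log.
    nll = -np.mean(np.log(probs[np.arange(N), labels]))
    return nll + lam * np.sum(params ** 2)

probs = np.array([[0.9, 0.1],   # user 0: confident non-depression
                  [0.2, 0.8]])  # user 1: confident depression
labels = np.array([0, 1])
loss = main_task_loss(probs, labels, params=np.zeros(3), lam=0.01)
print(round(loss, 4))  # 0.1643
```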
Then, in order to distinguish the different influences of the topic information of the different auxiliary tasks on the target classification result, a gating mechanism is used to weigh the representations of the main task and the auxiliary tasks, defined as:

$$ g = \sigma\!\left( W h + U t + b \right), \qquad r = g \odot h + (1 - g) \odot t $$

wherein $r$ is the final representation of the user, $h$ is the output representation of the main task, and $t$ is the combined representation of the three kinds of topic-information outputs, namely the text topic and the picture topic within each modality and the relation topic between the modalities; $W$, $U$ and $b$ are trainable parameters.
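A minimal sketch of the gating fusion follows, assuming the common sigmoid-gate form; the patent names only the trainable parameters, not the exact gate equation, so this form is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h, t, W, U, b):
    """Gate between the main-task representation h and the combined
    topic representation t: g = sigmoid(W h + U t + b),
    r = g * h + (1 - g) * t, all element-wise."""
    g = sigmoid(W @ h + U @ t + b)
    return g * h + (1.0 - g) * t

d = 4
h, t = np.ones(d), np.zeros(d)
# With a large positive bias the gate saturates at 1, so r collapses to h:
# the main-task representation dominates and the topic signal is shut off.
r = gated_fusion(h, t, W=np.zeros((d, d)), U=np.zeros((d, d)), b=50.0 * np.ones(d))
print(np.allclose(r, h))  # True
```

Intermediate gate values blend the two representations, which is how the mechanism lets the model weigh main-task and topic evidence per dimension.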
The method uses a joint loss function to optimize the main task and the three auxiliary tasks simultaneously. The loss of the whole method comprises the three auxiliary tasks, namely text topic modeling, picture topic modeling and multi-modal relation topic modeling, whose corresponding loss functions are denoted $L_{text}$, $L_{pic}$ and $L_{rel}$. The joint loss function of the whole method is then expressed as:

$$ L = L_{main} + \alpha \left( L_{text} + L_{pic} + L_{rel} \right) $$

wherein $\alpha$ is a weight that balances the loss functions of the main and auxiliary tasks.
In this embodiment, data classification fuses the features of the main task and the auxiliary tasks simultaneously: the feature representation fused by the gating mechanism is input into a softmax layer to carry out binary classification of the target:

$$ \hat{y} = \mathrm{softmax}\!\left( W_s r + b_s \right) $$

wherein $W_s \in \mathbb{R}^{n \times d}$ and $b_s$ are trainable parameters, $n$ is the total number of categories to be classified, $r$ is the final user representation used for classification, and $\hat{y}$ is the probability distribution over the two classes predicted by the model. The method converts the text and picture information of a social network user into a probability distribution and outputs it as a target classification label; taking depression as an example, 0 denotes non-depression and 1 denotes depression.
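The final classification step can be sketched as follows, with random weights standing in for the trained parameters; only the shapes and the 0/1 label convention come from the description above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def classify_user(r, W_s, b_s):
    """Map the final user representation r to a probability distribution
    over the n target classes and return the predicted label
    (0 = non-depression, 1 = depression in the two-class example)."""
    probs = softmax(W_s @ r + b_s)
    return probs, int(np.argmax(probs))

rng = np.random.default_rng(2)
d, n = 8, 2
r = rng.standard_normal(d)  # gated fusion output from the previous step
probs, label = classify_user(r, W_s=rng.standard_normal((n, d)), b_s=np.zeros(n))
print(label in (0, 1), np.isclose(probs.sum(), 1.0))  # True True
```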
FIG. 6 is a data processing system for fusing multi-modal relationship topic information provided in one embodiment of the present application. The system at least comprises the following units: a sample construction unit 610, a model construction unit 620, and a model training unit 630.
The sample construction unit 610 acquires an initial sample, and divides the sample into a training set, a verification set and a test set;
a model construction unit 620 for constructing a data classification model based on the intra-modality fusion and inter-modality relationship topic information;
a model training unit 630 trains data classification models based on intra-modality and inter-modality relationship topic information.
For relevant details reference is made to the above-described method embodiments.
The basic principles of the present application have been described in connection with specific embodiments, but it should be noted that, for those skilled in the art, it can be understood that all or any of the steps or components of the method and apparatus of the present application can be implemented in hardware, firmware, software or their combination in any computing device (including processors, storage media, etc.) or network of computing devices, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present application.
The object of the present application can thus also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the application can thus also be achieved merely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present application, and a storage medium storing such a program product also constitutes the present application. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.
It is further noted that in the apparatus and method of the present application, it is apparent that the components or steps may be disassembled and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Unless otherwise defined, technical or scientific terms used in the claims and the specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The use of "first," "second," and similar terms in the description and claims of this patent application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The terms "a" or "an," and the like, do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprise" or "comprises", and the like, means that the element or item listed before "comprises" or "comprising" covers the element or item listed after "comprising" or "comprises" and its equivalent, and does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, nor are they restricted to direct or indirect connections.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (9)
1. A data processing method based on intra-modality fusion and inter-modality relations, comprising:
obtaining sample data for social-network-oriented target classification, and dividing the sample data into a training set, a verification set and a test set;
constructing a preset classification model, wherein the preset classification model comprises a feature extraction network and, connected to it, a target classification main task network and a multi-modal topic information auxiliary task network; the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-modal topic information auxiliary task network comprises a text modality network, a picture modality network and an inter-modality network, and is used for acquiring topic information in the text modality network, topic information in the picture modality network and relation topic information between the modalities;
and inputting the sample data of the training set into the preset classification model, training with a preset loss function, and fusing the outputs of the main task and the auxiliary task with a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying input data to be classified.
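The first step of claim 1, splitting the sample data into the three sets, can be sketched as follows; the 8:1:1 ratio and the shuffling seed are assumptions, since the claim does not fix them:

```python
import random

def split_samples(samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle labeled samples and partition them into training,
    verification and test sets (fractions are illustrative)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_samples(list(range(100)))
```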
2. The method of claim 1, wherein the text feature extraction network is a BERT model and the picture feature extraction network is a VGG model.
3. The method of claim 1, wherein the training with the preset loss function comprises simultaneously training the main task and the auxiliary task with a main task loss function, an auxiliary task loss function, and a joint loss function.
4. The method of claim 1, wherein the text modality network, picture modality network, and inter-modality network are constructed based on a variational auto-encoder framework.
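The variational auto-encoder framework of claim 4 centers on the reparameterization trick for sampling the latent topic vector; a minimal NumPy sketch with linear encoder heads (the layer shapes and the linear heads are assumptions, not the patent's architecture):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps with eps ~ N(0, I): sampling stays
    # differentiable with respect to the encoder outputs mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(0)
d_in, d_z = 8, 3
x = rng.standard_normal((4, d_in))                  # batch of modality features
W_mu = rng.standard_normal((d_in, d_z)) * 0.1       # encoder head for the mean
W_logvar = rng.standard_normal((d_in, d_z)) * 0.01  # head for the log-variance
mu, logvar = x @ W_mu, x @ W_logvar
z = reparameterize(mu, logvar, rng)                 # latent "topic" vectors
```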
5. The method according to claim 4, wherein obtaining the topic information of the text modality network, the topic information of the picture modality network and the inter-modality relation topic information comprises the following steps:
obtaining intra-modality topic information by using the text modality network and the picture modality network;
obtaining the relation information between the text modality and the picture modality by using the following formula, and inputting the relation information into the inter-modality network to obtain the relation topic information between the multiple modalities:
wherein, a standard non-linear function is applied; the inputs are the t-th text representation and its corresponding picture representation; the trainable parameter is a 3rd-order transformation tensor, with d and m denoting the vector dimension sizes; multiplying the two representations with the tensor yields a vector, each element of which corresponds to one slice of the tensor.
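The variable description in claim 5 (two modality representations, a 3rd-order trainable tensor, an output vector passed through a standard non-linearity) matches a neural-tensor-network style bilinear interaction; a sketch under that assumption, with tanh standing in for the unspecified non-linear function:

```python
import numpy as np

def bilinear_relation(x, y, W):
    """Each output component i is tanh(x^T W[i] y): one slice of the
    3rd-order tensor W (shape d x m x m) produces one element of the
    d-dimensional relation vector."""
    return np.tanh(np.array([x @ W_i @ y for W_i in W]))

rng = np.random.default_rng(1)
m, d = 5, 4
x = rng.standard_normal(m)                # t-th text representation
y = rng.standard_normal(m)                # corresponding picture representation
W = rng.standard_normal((d, m, m)) * 0.1  # trainable transformation tensor
r = bilinear_relation(x, y, W)            # inter-modality relation vector
```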
6. The method of claim 1, wherein the network model of the primary task is built based on an LSTM model.
7. The method of claim 1, wherein the outputs of the main task and the auxiliary task are fused using the gating mechanism as follows:
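The gating formula itself is not reproduced above; a common form of such a gate, sketched here as an assumption, computes a sigmoid gate from both outputs and interpolates between them per dimension:

```python
import numpy as np

def gated_fuse(h_main, h_aux, W_g, b_g):
    """Per-dimension convex combination of the main-task and
    auxiliary-task outputs, weighted by a learned sigmoid gate."""
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([h_main, h_aux]) @ W_g + b_g)))
    return g * h_main + (1.0 - g) * h_aux

rng = np.random.default_rng(2)
k = 6
h_main = rng.standard_normal(k)             # main task output
h_aux = rng.standard_normal(k)              # auxiliary task output
W_g = rng.standard_normal((2 * k, k)) * 0.1 # gate parameters (illustrative)
b_g = np.zeros(k)
fused = gated_fuse(h_main, h_aux, W_g, b_g)
```

Because the gate lies in (0, 1), each fused dimension stays between the corresponding main-task and auxiliary-task values.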
8. The method of claim 3, wherein:
main task loss function:
wherein, N is the number of samples, the label term is the true category label of the i-th user, the coefficient weighting the second term is the regularization coefficient, and the regularized parameters are all training parameters in the model;
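Read as described, the main-task loss is an averaged cross-entropy over the N samples plus a regularization term; a NumPy sketch assuming an L2 penalty and an illustrative coefficient value:

```python
import numpy as np

def main_task_loss(probs, labels, params, reg_coef=1e-4):
    """Mean negative log-likelihood of each user's true category label,
    plus reg_coef times the squared L2 norm of all training parameters."""
    n = len(labels)
    nll = -np.mean(np.log(probs[np.arange(n), labels]))
    l2 = reg_coef * sum(np.sum(p ** 2) for p in params)
    return nll + l2

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])  # predicted class distributions
labels = np.array([0, 1])            # true category labels
params = [np.ones((2, 2))]           # stand-in for model parameters
loss = main_task_loss(probs, labels, params)
```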
auxiliary task loss function:
wherein, U is an intermediate content matrix and the reference distribution is the standard normal distribution; the first half of the formula measures, with the Kullback-Leibler divergence, the gap between the modeled distribution and the true distribution, and the second part is the reconstruction loss of the model, which reconstructs the original input through the generating network; the remaining symbols represent training parameters;
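The auxiliary loss described above is the standard negative evidence lower bound; a sketch with the closed-form Gaussian KL term and a squared-error stand-in for the reconstruction term (the patent's actual reconstruction likelihood is not specified here):

```python
import numpy as np

def elbo_loss(x, x_recon, mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form, plus a
    squared-error reconstruction term, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    recon = np.sum((x - x_recon) ** 2, axis=1)
    return np.mean(kl + recon)

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 6))                  # original input
x_recon = x + 0.1 * rng.standard_normal((4, 6))  # generated reconstruction
mu = np.zeros((4, 3))                            # posterior mean
logvar = np.zeros((4, 3))                        # posterior log-variance
loss = elbo_loss(x, x_recon, mu, logvar)
```

With mu = 0 and logvar = 0 the KL term vanishes, so the loss reduces to the reconstruction error alone.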
joint loss function:
wherein, the weights balance the loss functions of the main task and the auxiliary tasks; the four terms are the main loss function, the text modality loss function, the picture modality loss function, and the inter-modality relation loss function.
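The joint objective then reduces to a weighted sum of the four losses; a trivial sketch in which the weight values are assumptions:

```python
def joint_loss(l_main, l_text, l_pic, l_inter,
               weights=(1.0, 0.3, 0.3, 0.3)):
    """Weighted sum balancing the main-task loss against the three
    auxiliary topic losses (text, picture, inter-modality relation)."""
    a0, a1, a2, a3 = weights
    return a0 * l_main + a1 * l_text + a2 * l_pic + a3 * l_inter

total = joint_loss(0.5, 0.2, 0.2, 0.1)
```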
9. A data processing system fusing intra-modality and inter-modality relations, comprising:
a sample construction unit, configured to obtain an initial sample and divide the sample into a training set, a verification set and a test set;
a model building unit, configured to build a data classification model that fuses intra-modality and inter-modality relation topic information; and
a model training unit, configured to train the data classification model based on the intra-modality and inter-modality relation topic information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110665991.5A CN113254741B (en) | 2021-06-16 | 2021-06-16 | Data processing method and system based on intra-modality fusion and inter-modality relation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113254741A true CN113254741A (en) | 2021-08-13 |
CN113254741B CN113254741B (en) | 2021-09-28 |
Family
ID=77188227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110665991.5A Active CN113254741B (en) | 2021-06-16 | 2021-06-16 | Data processing method and system based on intra-modality fusion and inter-modality relation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254741B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107072595A (en) * | 2013-12-31 | 2017-08-18 | The Medical College of Wisconsin, Inc. | Adaptive replanning based on multimodality imaging |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
WO2018219927A1 (en) * | 2017-05-30 | 2018-12-06 | Esteve Pharmaceuticals, S.A. | (hetero)arylalkylamino-pyrazolopyridazine derivatives having multimodal activity against pain |
CN109840287A (en) * | 2019-01-31 | 2019-06-04 | 中科人工智能创新技术研究院(青岛)有限公司 | Neural-network-based cross-modal information retrieval method and device |
CN110222827A (en) * | 2019-06-11 | 2019-09-10 | AISpeech Co., Ltd. (Suzhou) | Training method of a text-based depression judgment network model |
US20200364405A1 (en) * | 2017-05-19 | 2020-11-19 | Google Llc | Multi-task multi-modal machine learning system |
US20200372369A1 (en) * | 2019-05-22 | 2020-11-26 | Royal Bank Of Canada | System and method for machine learning architecture for partially-observed multimodal data |
CN112035669A (en) * | 2020-09-09 | 2020-12-04 | 中国科学技术大学 | Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling |
CN112364168A (en) * | 2020-11-24 | 2021-02-12 | 中国电子科技集团公司电子科学研究院 | Public opinion classification method based on multi-attribute information fusion |
CN112598067A (en) * | 2020-12-25 | 2021-04-02 | 中国联合网络通信集团有限公司 | Emotion classification method and device for event, electronic equipment and storage medium |
CN112612936A (en) * | 2020-12-28 | 2021-04-06 | 杭州电子科技大学 | Multi-modal emotion classification method based on dual conversion network |
CN112784801A (en) * | 2021-02-03 | 2021-05-11 | 紫东信息科技(苏州)有限公司 | Text and picture-based bimodal gastric disease classification method and device |
CN112819052A (en) * | 2021-01-25 | 2021-05-18 | Harbin Institute of Technology (Shenzhen) (HIT Shenzhen Institute of Science and Technology Innovation) | Multi-modal fine-grained mixing method, system, device and storage medium |
US20210150315A1 (en) * | 2019-11-14 | 2021-05-20 | International Business Machines Corporation | Fusing Multimodal Data Using Recurrent Neural Networks |
Non-Patent Citations (2)
Title |
---|
WU Liangqing et al.: "Multimodal Emotion Recognition Method Based on Multi-task Learning" *
ZHANG Sun et al.: "Temporal Multimodal Sentiment Analysis Model Based on Multi-task Learning", Journal of Computer Applications *
Also Published As
Publication number | Publication date |
---|---|
CN113254741B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Abdullah et al. | SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning | |
CN111444709B (en) | Text classification method, device, storage medium and equipment | |
Han et al. | A review on sentiment discovery and analysis of educational big‐data | |
Chen et al. | Sentiment analysis based on deep learning and its application in screening for perinatal depression | |
Li et al. | Weibo text sentiment analysis based on bert and deep learning | |
Almars | Attention-Based Bi-LSTM Model for Arabic Depression Classification. | |
Wu et al. | Kaicd: A knowledge attention-based deep learning framework for automatic icd coding | |
Zhang et al. | Multi-task learning for jointly detecting depression and emotion | |
Wani et al. | Depression screening in humans with AI and deep learning techniques | |
Vohra et al. | Deep learning based sentiment analysis of public perception of working from home through tweets | |
Zhou et al. | Tamfn: Time-aware attention multimodal fusion network for depression detection | |
Lim et al. | Subsentence extraction from text using coverage-based deep learning language models | |
Sirrianni et al. | Predicting stance polarity and intensity in cyber argumentation with deep bidirectional transformers | |
Shen et al. | Emotion analysis of ideological and political education using a gru deep neural network | |
Wang et al. | SCANET: Improving multimodal representation and fusion with sparse‐and cross‐attention for multimodal sentiment analysis | |
CN113254741B (en) | Data processing method and system based on intra-modality fusion and inter-modality relation | |
Ji et al. | LSTM based semi-supervised attention framework for sentiment analysis | |
Huang et al. | Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network | |
Song | Distilling knowledge from user information for document level sentiment classification | |
Feng et al. | SINN: A speaker influence aware neural network model for emotion detection in conversations | |
Bose et al. | Attention-based multimodal deep learning on vision-language data: models, datasets, tasks, evaluation metrics and applications | |
Nabiilah et al. | Personality Classification Based on Textual Data using Indonesian Pre-Trained Language Model and Ensemble Majority Voting. | |
Lin et al. | Adapting Static and Contextual Representations for Policy Gradient-Based Summarization | |
Dehghani et al. | Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model | |
dos Santos Júnior et al. | Learning and Semi-automatic Intention Labeling for Classification Models: A COVID-19 Study for Chatbots
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20231113 Address after: Room 401, 4th Floor, CCF Building, 600 Xiangrong Road, High Speed Rail New City, Xiangcheng District, Suzhou City, Jiangsu Province, 215133 Patentee after: Digital Suzhou Construction Co.,Ltd. Address before: No.1 Shizi street, Gusu District, Suzhou City, Jiangsu Province Patentee before: SOOCHOW University |
TR01 | Transfer of patent right |