CN113254741A - Data processing method and system based on intra-modality fusion and inter-modality relation - Google Patents

Data processing method and system based on intra-modality fusion and inter-modality relation

Info

Publication number
CN113254741A
CN113254741A (application CN202110665991.5A)
Authority
CN
China
Prior art keywords
network
data
modal
modality
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110665991.5A
Other languages
Chinese (zh)
Other versions
CN113254741B (en)
Inventor
李寿山
安明慧
王晶晶
周国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Suzhou Construction Co ltd
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202110665991.5A priority Critical patent/CN113254741B/en
Publication of CN113254741A publication Critical patent/CN113254741A/en
Application granted granted Critical
Publication of CN113254741B publication Critical patent/CN113254741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 16/9536 Search customisation based on social or collaborative filtering (G06F: electric digital data processing; G06F 16/00: information retrieval; G06F 16/953: querying, e.g. by the use of web search engines)
    • G06F 16/906 Clustering; Classification (G06F 16/90: details of database functions independent of the retrieved data types)
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G06F 18/00: pattern recognition)
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Combinations of networks (G06N 3/00: computing arrangements based on biological models; G06N 3/02: neural networks)
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • G06Q 50/01 Social networking (G06Q 50/00: systems or methods specially adapted for specific business sectors)

Abstract

The application relates to a data processing method and system that fuse intra-modality and inter-modality relationships, comprising the following steps: obtaining sample data for target-oriented classification of social network data, and dividing the sample data into a training set, a verification set and a test set to obtain training-set, verification-set and test-set samples; constructing a preset classification model, wherein the preset classification model comprises a feature extraction network together with a target classification main-task network and a multi-modal topic-information auxiliary-task network connected to the feature extraction network; and inputting the training-set samples into the preset classification model, training with a preset loss function, and fusing the outputs of the main task and the auxiliary task with a gating mechanism to obtain a social data classification model used to classify input data to be classified. The method and system can effectively improve the performance of target-oriented classification of social network data.

Description

Data processing method and system based on intra-modality fusion and inter-modality relation
Technical Field
The present application relates to the field of data processing technologies, and more particularly, to a data processing method and system fusing intra-modality and inter-modality relationships.
Background
Human expressions and behaviors are manifested in many ways and reflect a person's mental condition. Identifying such expressions and behaviors, and classifying the subjects who exhibit problems, has become essential for social safety and is an important research subject in psychological medicine. Such subjects are characterized by brain dysfunction that, under the influence of various biological, psychological and social environmental factors, impairs mental activities such as cognition, emotion, will and behavior to varying degrees. Problematic expressions and behaviors take a great many forms, and many eventually develop into mental disorders such as autism, depression and delusional disorder. Among these, depression is one of the most common mental disorders and seriously threatens people's health. According to the World Health Organization, about 300 million people worldwide suffer from depression. In severe cases depression can lead to suicide, and it seriously affects patients' daily lives. However, 76% to 85% of depression patients in low- and middle-income countries receive no effective treatment owing to a lack of medical resources and trained health care staff, and most patients, ignoring their own depressive symptoms, miss the appropriate time for treatment. Early detection of depression is therefore of great significance in preventing mental health diseases such as depression.
Traditional mental disorder identification relies mainly on psychological expertise. For depression, for example, a depression rating scale is filled out and a professional manual interview is conducted to judge whether a user has a depressive tendency. This approach has the following drawbacks: (1) high resource consumption, since professional medical personnel are limited and manual detection is costly; (2) a long diagnosis period, since the process requires long-term follow-up by medical personnel and is slow; (3) a passive process, since patients seek treatment only when symptoms are obvious and thus miss the optimal time for treatment.
With the rapid development of the internet, social platforms such as Twitter and Weibo have become indispensable social tools, and hundreds of millions of users share their thoughts and moods on these platforms every day. Such social network data, which contains multi-modal information (text, pictures, voice and so on), offers a new way to recognize human expressions and behaviors, and more and more researchers use multi-modal social network data to study mental health conditions including depression. However, faced with massive social network data, effectively modeling multi-modal sequence information becomes the key problem for improving data processing performance. At present, the sequence information of text or picture modalities is mostly modeled with RNN (recurrent neural network) variants, which suffer from the problem of sequence dependence and cannot model temporal information well.
Disclosure of Invention
The object of the present application is to solve the above technical problem. The application provides a social network data processing method and system that fuse intra-modality and inter-modality relationships: the data to be identified and classified are processed with a new topic model that models multi-modal sequence information, alleviating the sequence-dependence problem of methods such as RNN (recurrent neural network) and further improving the performance of target-oriented classification of social network data. The application provides the following technical scheme:
in a first aspect, a data processing method based on intra-modality and inter-modality relationships is provided, which includes:
obtaining sample data of a social network directed target classification, dividing the sample data into a training set, a verification set and a test set, and obtaining sample data of the training set, sample data of the verification set and sample data of the test set;
constructing a preset classification model, wherein the preset classification model comprises a feature extraction network, a target classification main task network and a multi-mode topic information auxiliary task network which are connected with the feature extraction network, the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-mode topic information auxiliary task network comprises a text modal network, a picture modal network and an inter-modal network and is used for acquiring topic information in the text modal network, topic information in the picture modal network and network relation topic information between modalities;
and inputting the sample data of the training set into the preset classification model, training by using a preset loss function, and fusing the output of the main task and the auxiliary task by using a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying the input data to be classified.
Optionally, the text feature extraction network is a BERT model, and the picture feature extraction network is a VGG model.
Optionally, the training with the preset loss function includes simultaneously training the main task and the auxiliary task through a main task loss function, an auxiliary task loss function, and a joint loss function.
Optionally, wherein the text modality network, the picture modality network, and the inter-modality network are constructed based on a variational auto-encoder framework.
Optionally, the method for obtaining the topic information in the text modal network, the topic information in the picture modal network, and the inter-modal relation topic information includes:
obtaining the topic information inside each modality with the text modal network and the picture modal network;
obtaining the relation information between the text modality and the picture modality with the following formula, and inputting the relation information into the inter-modal network to obtain the inter-modal relation topic information:
c_t = σ(x_t^T W^{[1:d_m]} v_t)
where σ is a standard nonlinear function, x_t and v_t are the t-th text representation and its corresponding picture representation, x_t, v_t ∈ R^{d_m}, W ∈ R^{d_m × d_m × d_m} is a third-order transformation tensor, d_m denotes the vector dimension, and W is a trainable parameter; the result of the vector multiplication x_t^T W^{[1:d_m]} v_t is a vector c_t ∈ R^{d_m}, each component of which is c_t^{(i)} = x_t^T W_i v_t.
Optionally, wherein the network model of the primary task is constructed based on an LSTM model.
Optionally, the merging of the outputs of the main task and the auxiliary task using the gating mechanism comprises:
g = σ(W_g h_m + W_s h_s + b),    r = g ⊙ h_m + (1 − g) ⊙ h_s
where r is the final representation of the social network user, h_m is the output representation of the main task, h_s is the output representation of the three kinds of topic information, and W_g, W_s and b are trainable parameters.
Optionally, the main task loss function is:
L_main = −(1/N) Σ_{i=1}^{N} y_i log ŷ_i + λ‖θ‖²
where N is the number of samples, y_i is the true category label of the i-th user u_i, λ is the L2 regularization coefficient, and θ denotes all training parameters in the model;
auxiliary task loss function:
L_topic = KL(q(z|U) ‖ p(z)) − E_{q(z|U)}[log p_θ(U|z)]
where U is the intermediate content matrix and p(z) is a standard normal distribution; the first half of the formula measures, with the Kullback-Leibler divergence, the gap between the modeled distribution q(z|U) and the true distribution p(z), and the second part is the reconstruction loss of the model, in which the original input is reconstructed by the generation network; θ represents the training parameters;
joint loss function:
L = L_main + λ_1 L_text + λ_2 L_img + λ_3 L_rel
where λ_1, λ_2 and λ_3 are weights that balance the loss functions of the main task and the auxiliary tasks, L_main is the main loss function, L_text is the text modal loss function, L_img is the picture modal loss function, and L_rel is the inter-modal relationship loss function.
In a second aspect, there is provided a data processing system that fuses intra-modality and inter-modality relationships, comprising:
the system comprises a sample construction unit, a data acquisition unit and a data analysis unit, wherein the sample construction unit is used for acquiring an initial sample and dividing the sample into a training set, a verification set and a test set;
the model building unit is used for building a data classification model based on the relation topic information in the fusion modality and between the modalities;
and the model training unit is used for training a data classification model based on the intra-modal and inter-modal relation topic information.
The beneficial effects of this application include at least:
(1) compared with the prior social network data processing and classification method which mainly uses texts for training and only mines the relevant information of the text data, the method uses multi-modal social network data based on texts and pictures, and uses more useful information;
(2) the text features are extracted by using the latest BERT method, the picture features are extracted by using the VGG method, data information can be captured better, and the performance of the method is improved effectively;
(3) the invention provides a new theme model, which can learn sparse text theme characteristics and continuous picture characteristic themes;
(4) the topic model is used to learn the topic information within each modality and the relationship topic information among the multiple modalities, significantly improving the target-oriented classification performance on social network data.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application.
Drawings
The present application may be better understood by describing exemplary embodiments thereof in conjunction with the following drawings, wherein:
fig. 1 is a general schematic diagram of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application.
Fig. 2 is a flowchart of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application.
Fig. 3 is a schematic diagram of text feature extraction based on a BERT model according to an embodiment of the present application.
Fig. 4 is a schematic diagram of extracting features of a picture based on a VGG model according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a topic information model provided in an embodiment of the present application.
FIG. 6 is a data processing system diagram that merges intra-modality and inter-modality relationships provided in accordance with an embodiment of the present application.
Detailed Description
The following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings and examples, will enable those skilled in the art to practice the embodiments of the present application with reference to the description.
It is noted that in the detailed description of these embodiments, in order to provide a concise description, all features of an actual implementation may not be described. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Fig. 2 is a flowchart of a data processing method for fusing intra-modality and inter-modality relationships according to an embodiment of the present application. The method at least comprises the following steps:
step S201, sample data of a social network pointing to a target classification is obtained, the sample data is divided into a training set, a verification set and a test set, and training set sample data, verification set sample data and test set sample data are obtained.
Step S202, a preset classification model is constructed, wherein the preset classification model comprises a feature extraction network together with a target classification main task network and a multi-modal topic information auxiliary task network which are connected with the feature extraction network; the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-modal topic information auxiliary task network comprises a text modal network, a picture modal network and an inter-modal network, used for acquiring the topic information in the text modal network, the topic information in the picture modal network, and the inter-modal relation topic information.
In the present embodiment, text features are extracted with a BERT-base (uncased) model, whose structure is shown in Fig. 3. Because each user has published many text sentences, the "[CLS]" token of the BERT model is used to represent the whole-sentence vector, and the sentence vectors are converted to the same feature dimension as the pictures. When all text sequences have been BERT-encoded, a text content matrix (UGC, user-generated content) is obtained as follows:
U^{text} = [s_1, s_2, …, s_n]
where U^{text} denotes the text content matrix of the user and s_n denotes the feature of the n-th sentence extracted by BERT.
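For illustration only (the helper name, the 768-to-128 projection and the random stand-ins for BERT outputs below are assumptions, not the patent's actual code), stacking per-sentence vectors into a UGC matrix and projecting them to the shared dimension d_m might look like:

```python
import numpy as np

def build_ugc_matrix(sentence_vectors, proj):
    """Stack per-sentence encoder vectors (e.g. BERT [CLS] outputs) into a
    user content matrix and linearly project it to the shared dimension d_m."""
    U = np.stack(sentence_vectors)   # shape: (n_sentences, d_encoder)
    return U @ proj                  # shape: (n_sentences, d_m)

# Toy stand-in for BERT outputs: 5 sentences, 768-d [CLS] vectors.
rng = np.random.default_rng(0)
cls_vectors = [rng.standard_normal(768) for _ in range(5)]
W_proj = rng.standard_normal((768, 128)) * 0.02   # 768 -> d_m = 128 (assumed sizes)
U_text = build_ugc_matrix(cls_vectors, W_proj)
print(U_text.shape)   # (5, 128)
```

The same shape of matrix can hold the VGG picture vectors, which is what lets both modalities share one topic model.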
In this embodiment, a VGG16 model is used to encode a picture, and an output vector of the first fully-connected layer of the VGG16 is used as an encoding vector of the picture, and is converted into a feature dimension with the same size as a text, where the VGG16 model is shown in fig. 4.
In the present embodiment, referring to Fig. 5, the topic information inside each modality and the relation topic information among the multiple modalities are obtained as follows. Unlike traditional neural topic models, which focus on modeling sparse text-feature topics, the topic model in this embodiment operates on the content matrix of each modality (e.g. the text content matrix U^{text} or the picture content matrix): since both sparse and continuous features can be encoded into the same UGC matrix form, either can be input into the topic model proposed by the present invention, which adopts a variational auto-encoder framework comprising an inference network and a generation network. The loss function of the topic model is as follows:
L_topic = KL(q(z|U) ‖ p(z)) − E_{q(z|U)}[log p_θ(U|z)]
where U denotes the intermediate content matrix (superscripts are omitted for convenience) and p(z) is a standard normal distribution. The first half of the formula measures, with the Kullback-Leibler divergence, the gap between the modeled distribution q(z|U) and the true distribution p(z); the second part is the reconstruction loss of the model, i.e. the original input is reconstructed by the generation network. θ represents the training parameters.
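A minimal numpy sketch of this loss, assuming a diagonal-Gaussian inference network and a squared-error reconstruction term (the patent does not specify the reconstruction likelihood, so that choice is an assumption):

```python
import numpy as np

def topic_model_loss(U, mu, log_var, U_recon):
    """VAE-style topic model loss: KL(q(z|U) || N(0, I)) plus a
    reconstruction term. mu/log_var are the inference network's outputs,
    U_recon is the generation network's reconstruction of U."""
    # Closed-form KL between a diagonal Gaussian and the standard normal.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    recon = np.sum((U - U_recon) ** 2)  # squared-error reconstruction (illustrative)
    return kl + recon

rng = np.random.default_rng(1)
U = rng.standard_normal((4, 8))
mu = np.zeros((4, 3))
log_var = np.zeros((4, 3))          # q == p  =>  KL term is 0
loss = topic_model_loss(U, mu, log_var, U)   # perfect reconstruction
print(loss)  # 0.0
```

With a matched posterior and perfect reconstruction the loss bottoms out at zero; any mismatch in either term raises it, which is what drives the topic representations during training.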
First, the topic model is used to model the topic inside each modality; the relation information among the multiple modalities is then obtained with the following formula and input into the topic model to obtain the inter-modal relation topic features:
c_t = σ(x_t^T W^{[1:d_m]} v_t)
where σ is a standard nonlinear function, x_t and v_t denote the t-th text representation and its corresponding picture representation, x_t, v_t ∈ R^{d_m}, W ∈ R^{d_m × d_m × d_m} is a third-order transformation tensor, d_m denotes the vector dimension, and W is a trainable parameter. We take the result of the vector multiplication x_t^T W^{[1:d_m]} v_t as the vector c_t, each component of which, c_t^{(i)} = x_t^T W_i v_t, is a bilinear model that captures the multilinear mutual information between the two vectors. Using a tensor factorization method, we approximate each tensor slice W_i with two low-rank matrices:
c_t^{(i)} = x_t^T U_i V_i^T v_t
where U_i ∈ R^{d_m × k}, V_i ∈ R^{d_m × k}, and k ≪ d_m.
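The low-rank bilinear fusion above can be sketched in numpy as follows (dimensions, rank and initialization are illustrative assumptions; sigmoid stands in for the unspecified "standard nonlinear function"):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lowrank_bilinear_relation(x, v, U_fac, V_fac):
    """Inter-modal relation vector: c[i] = sigma(x^T U_i V_i^T v), i.e.
    each slice W_i of the 3rd-order tensor is approximated by U_i V_i^T."""
    # U_fac, V_fac: shape (d_m, d_m, k) -- one (d_m, k) factor pair per slice i.
    c = np.einsum('d,idk,iek,e->i', x, U_fac, V_fac, v)
    return sigmoid(c)

d_m, k = 16, 4
rng = np.random.default_rng(2)
x = rng.standard_normal(d_m)            # text representation x_t
v = rng.standard_normal(d_m)            # picture representation v_t
U_fac = rng.standard_normal((d_m, d_m, k)) * 0.1
V_fac = rng.standard_normal((d_m, d_m, k)) * 0.1
c = lowrank_bilinear_relation(x, v, U_fac, V_fac)
print(c.shape)   # (16,)
```

The factorization cuts each slice from d_m × d_m parameters to 2 · d_m · k, which is the point of approximating W_i by U_i V_i^T.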
In this embodiment, an LSTM is used to construct the target-oriented classification network model; training this model is set as the main task, while modeling the topic inside each modality and the relation topic among the multiple modalities are set as auxiliary tasks.
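The main-task network is built on a standard LSTM; as a hedged sketch (weight shapes, initialization and the toy inputs below are assumptions, not the patent's configuration), one cell step over the fused per-post features looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell. W: (d_in + d_h, 4*d_h), b: (4*d_h,);
    gate order: input, forget, cell candidate, output."""
    z = np.concatenate([x, h_prev]) @ W + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)     # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

d_in, d_h = 8, 6
rng = np.random.default_rng(3)
W = rng.standard_normal((d_in + d_h, 4 * d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):   # fused text+picture features per step
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (6,)
```

The final hidden state h plays the role of the main-task output representation that is later gated against the topic representations.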
Step S203, inputting the sample data of the training set into the preset classification model, training with a preset loss function, and fusing the outputs of the main task and the auxiliary task with a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying the input data to be classified.
Firstly, the characteristics of a text mode and a picture mode are fused and input into an LSTM network for training to obtain the representation of a main task. The loss function of a specific main task is defined as:
L_main = −(1/N) Σ_{i=1}^{N} y_i log ŷ_i + λ‖θ‖²
where N denotes the number of samples, i.e. the number of Twitter users, y_i is the true category label of the i-th user u_i, λ is the L2 regularization coefficient, and θ represents all training parameters in the model.
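A numpy sketch of this main-task loss (cross-entropy plus L2 regularization; the toy labels, probabilities and λ below are illustrative):

```python
import numpy as np

def main_task_loss(y_true, y_prob, params, lam=1e-4):
    """Cross-entropy over N users plus L2 regularization:
    -(1/N) * sum_i log p(y_i)  +  lam * ||theta||^2."""
    N = len(y_true)
    eps = 1e-12                                   # numerical floor for log
    nll = -np.mean(np.log(y_prob[np.arange(N), y_true] + eps))
    l2 = lam * sum(np.sum(p ** 2) for p in params)
    return nll + l2

y_true = np.array([0, 1, 1])                      # true labels of 3 users
y_prob = np.array([[0.9, 0.1],                    # model's class probabilities
                   [0.2, 0.8],
                   [0.3, 0.7]])
theta = [np.ones((2, 2))]                         # stand-in parameter list
print(round(main_task_loss(y_true, y_prob, theta, lam=0.0), 4))  # 0.2284
```

With λ = 0 the loss reduces to the plain negative log-likelihood; the regularizer adds λ times the squared norm of every parameter tensor.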
Then, in order to distinguish the different influences that the topic information of the different auxiliary tasks has on the target-oriented classification result, a gating mechanism is used to weigh the representations of the main task and the auxiliary tasks, defined as follows:
g = σ(W_g h_m + W_s h_s + b),    r = g ⊙ h_m + (1 − g) ⊙ h_s
where r is the final representation of the user; h_m is the output representation of the main task; h_s is the representation of the three kinds of output topic information, i.e. the topic information inside each modality (the text topic and the picture topic) together with the relation topic information among the multiple modalities; and W_g, W_s and b are trainable parameters.
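The gate can be sketched directly from the formula above (dimensions and random inputs are illustrative; elementwise gating is an assumption consistent with the sigmoid/⊙ notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_fuse(h_main, h_topic, W_g, W_s, b):
    """Gated fusion of the main-task output h_main and the topic
    representation h_topic:  r = g * h_main + (1 - g) * h_topic."""
    g = sigmoid(W_g @ h_main + W_s @ h_topic + b)   # elementwise gate in (0, 1)
    return g * h_main + (1.0 - g) * h_topic

d = 8
rng = np.random.default_rng(4)
h_main = rng.standard_normal(d)
h_topic = rng.standard_normal(d)
W_g = rng.standard_normal((d, d)) * 0.1
W_s = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)
r = gate_fuse(h_main, h_topic, W_g, W_s, b)
print(r.shape)  # (8,)
```

Because g lies strictly between 0 and 1, each component of r is a convex combination of the corresponding components of h_main and h_topic, so neither signal can be dropped entirely.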
The method optimizes the main task and the three auxiliary tasks simultaneously with a joint loss function. The loss of the whole method covers the three auxiliary tasks, namely text topic modeling, picture topic modeling and multi-modal relation topic modeling, whose loss functions are denoted L_text, L_img and L_rel respectively. The joint loss of the whole method is then expressed as:
L = L_main + λ_1 L_text + λ_2 L_img + λ_3 L_rel
where λ_1, λ_2 and λ_3 are weights that balance the loss functions of the main task and the auxiliary tasks.
In this embodiment, the features of the main task and the auxiliary tasks are fused for data classification as follows: the feature representation fused by the gating mechanism is input into softmax for the binary target-oriented classification:
ŷ = softmax(W_o r + b_o)
where W_o ∈ R^{n × d_m} and b_o are trainable parameters, n is the total number of categories to be classified, and r is the final representation of the user used for classification; ŷ is the probability distribution over the two classes predicted by the model. The method converts the text and picture information in a social network user's data into a probability distribution and outputs it as the target-oriented classification label; taking depression as an example, 0 denotes non-depression and 1 denotes depression.
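A minimal sketch of this final classification layer (the fused representation and weights below are random stand-ins, not trained values):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(r, W_o, b_o):
    """Target-oriented classification: probability distribution over the
    n classes from the fused user representation r."""
    return softmax(W_o @ r + b_o)

d_m, n = 8, 2                    # binary case: 0 = non-depression, 1 = depression
rng = np.random.default_rng(5)
r = rng.standard_normal(d_m)     # gated fusion of main- and auxiliary-task outputs
W_o = rng.standard_normal((n, d_m)) * 0.1
b_o = np.zeros(n)
p = classify(r, W_o, b_o)
label = int(np.argmax(p))        # predicted class label
print(p.sum())                   # 1.0 (up to floating point)
```

The argmax of the probability vector gives the output label described in the text.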
FIG. 6 is a data processing system for fusing multi-modal relationship topic information provided in one embodiment of the present application. The system at least comprises the following units: a sample construction unit 610, a model construction unit 620, and a model training unit 630.
The sample construction unit 610 acquires an initial sample and divides it into a training set, a verification set and a test set;
the model construction unit 620 constructs a data classification model that fuses the intra-modality and inter-modality relation topic information;
the model training unit 630 trains the data classification model based on the intra-modality and inter-modality relation topic information.
For relevant details reference is made to the above-described method embodiments.
The basic principles of the present application have been described in connection with specific embodiments. It should be noted, however, that those skilled in the art will understand that all or any of the steps or components of the method and apparatus of the present application can be implemented in hardware, firmware, software or a combination thereof in any computing device (including processors, storage media, etc.) or network of computing devices, which those skilled in the art can accomplish with basic programming skills after reading the description of the present application.
The object of the present application can thus also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the application can thus also be achieved merely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present application, and a storage medium storing such a program product also constitutes the present application. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.
It is further noted that in the apparatus and method of the present application, it is apparent that the components or steps may be disassembled and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Unless otherwise defined, technical or scientific terms used in the claims and the specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The use of "first," "second," and similar terms in the description and claims of this patent application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The terms "a" or "an," and the like, do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprise" or "comprises", and the like, means that the element or item listed before "comprises" or "comprising" covers the element or item listed after "comprising" or "comprises" and its equivalent, and does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, nor are they restricted to direct or indirect connections.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. A data processing method fusing intra-modality and inter-modality relations, comprising:
obtaining sample data for social-network-oriented target classification, and dividing the sample data into a training set, a verification set and a test set to obtain training-set sample data, verification-set sample data and test-set sample data;
constructing a preset classification model, wherein the preset classification model comprises a feature extraction network and, connected to the feature extraction network, a target classification main task network and a multi-modal topic information auxiliary task network; the feature extraction network comprises a text feature extraction network and a picture feature extraction network, and the multi-modal topic information auxiliary task network comprises a text modality network, a picture modality network and an inter-modality network, and is used for obtaining the topic information in the text modality, the topic information in the picture modality and the inter-modality relation topic information;
and inputting the sample data of the training set into the preset classification model, training by using a preset loss function, and fusing the output of the main task and the auxiliary task by using a gating mechanism to obtain a social data classification model, wherein the social data classification model is used for classifying the input data to be classified.
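By way of a non-authoritative sketch, the data-splitting step of claim 1 could be implemented as follows; the 80/10/10 ratio, the fixed seed, and all function and variable names are illustrative assumptions, not values specified by the claim:

```python
import random

def split_samples(samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle labeled social-network samples and split them into
    training, verification, and test sets (remaining fraction -> test).
    The fractions here are assumptions; the claim does not fix a ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# each sample pairs a (text, picture) post with a hypothetical class label
data = [(f"post-{i}", f"img-{i}.jpg", i % 2) for i in range(100)]
train, val, test = split_samples(data)
```

A stratified split (preserving label proportions per subset) would be a natural refinement, but the claim only requires the three-way division shown here.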
2. The method of claim 1, wherein the text feature extraction network is a BERT model and the picture feature extraction network is a VGG model.
3. The method of claim 1, wherein the training with the preset loss function comprises simultaneously training the main task and the auxiliary task with a main task loss function, an auxiliary task loss function, and a joint loss function.
4. The method of claim 1, wherein the text modality network, picture modality network, and inter-modality network are constructed based on a variational self-encoder framework.
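Claim 4 states that the three topic networks follow a variational self-encoder framework. A minimal numpy sketch of one forward pass of such a network is shown below; the layer sizes, parameter names, and single-layer linear encoder/decoder are simplifying assumptions, since the patent does not publish the architecture details:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # simple affine layer standing in for the encoder/decoder networks
    return x @ w + b

def vae_topic_step(x, params):
    """One forward pass of a VAE-style topic network: encode a modality
    feature x into a Gaussian (mu, log_var), sample a latent topic vector
    z via the reparameterization trick, then decode a reconstruction."""
    mu = dense(x, params["w_mu"], params["b_mu"])
    log_var = dense(x, params["w_lv"], params["b_lv"])
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps          # latent topic representation
    x_rec = dense(z, params["w_dec"], params["b_dec"])  # reconstructed input
    return z, mu, log_var, x_rec

d_in, d_z = 8, 3  # assumed dimensions for illustration
params = {
    "w_mu": rng.standard_normal((d_in, d_z)) * 0.1, "b_mu": np.zeros(d_z),
    "w_lv": rng.standard_normal((d_in, d_z)) * 0.1, "b_lv": np.zeros(d_z),
    "w_dec": rng.standard_normal((d_z, d_in)) * 0.1, "b_dec": np.zeros(d_in),
}
x = rng.standard_normal((4, d_in))  # a batch of 4 modality feature vectors
z, mu, log_var, x_rec = vae_topic_step(x, params)
```

In the claimed system, one such network would run per modality (text, picture) plus one for the inter-modality relation features.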
5. The method according to claim 4, wherein obtaining the topic information of the text modality network, the topic information of the picture modality network and the inter-modality relation topic information comprises:

obtaining the intra-modality topic information by using the text modality network and the picture modality network;

obtaining the relation information between the text modality and the picture modality by using the following formula, and inputting the relation information into the inter-modality network to obtain the inter-modality relation topic information (the formula is rendered only as images in the original publication):

r_t = f(x_t^T W^[1:d_m] y_t)

wherein f is a standard nonlinear function, x_t and y_t are the representations of the t-th text and its corresponding picture, W^[1:d_m] is a 3rd-order transformation tensor of trainable parameters, d_m represents the vector dimension, and the vector multiplication x_t^T W^[1:d_m] y_t yields a vector r_t, each element of which is the bilinear form r_t^i = x_t^T W^[i] y_t.
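The bilinear tensor interaction described in claim 5 can be sketched in numpy as below. Because the claim's formula survives only as an image, the exact notation (a 3rd-order tensor W sliced along its first axis, with tanh as the nonlinearity) is an assumption reconstructed from the surrounding text:

```python
import numpy as np

def bilinear_relation(x_t, y_t, W, f=np.tanh):
    """Inter-modality relation vector r_t: the i-th element is the
    bilinear form x_t^T W[i] y_t, passed through a nonlinearity f.
    W has shape (d_m, d_x, d_y), i.e. a 3rd-order tensor."""
    # einsum computes x_t^T W[i] y_t for every tensor slice i at once
    return f(np.einsum("x,ixy,y->i", x_t, W, y_t))

rng = np.random.default_rng(1)
d_x, d_y, d_m = 6, 6, 4            # assumed dimensions for illustration
W = rng.standard_normal((d_m, d_x, d_y)) * 0.1
x_t = rng.standard_normal(d_x)     # text representation of the t-th post
y_t = rng.standard_normal(d_y)     # its corresponding picture representation
r_t = bilinear_relation(x_t, y_t, W)
```

The resulting d_m-dimensional relation vector r_t would then be the input to the inter-modality topic network.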
6. The method of claim 1, wherein the network model of the primary task is built based on an LSTM model.
7. The method of claim 1, wherein the fusing of the outputs of the main task and the auxiliary task using the gating mechanism is as follows (the formula is rendered only as images in the original publication):

h = g ⊙ o_m + (1 − g) ⊙ o_s, with gate g = σ(W o_m + U o_s + b)

wherein h is the final representation of the social-network user, o_m is the output representation of the main task, o_s is the output representation of the three kinds of topic information, and W, U and b are trainable parameters.
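The gated fusion of claim 7 can be sketched as follows. Since the gating formula appears only as an image in the original, the sigmoid gate g = σ(W o_m + U o_s + b) and the convex combination h = g ⊙ o_m + (1 − g) ⊙ o_s are assumptions based on the standard form of such gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(o_main, o_topic, W, U, b):
    """Fuse the main-task output with the topic-information output via a
    gate g in (0, 1): each element of h is a convex combination of the
    corresponding elements of o_main and o_topic."""
    g = sigmoid(W @ o_main + U @ o_topic + b)
    return g * o_main + (1.0 - g) * o_topic

rng = np.random.default_rng(2)
d = 5                                   # assumed representation size
W = rng.standard_normal((d, d)) * 0.1   # trainable parameters (illustrative)
U = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)
o_main = rng.standard_normal(d)    # main-task output representation
o_topic = rng.standard_normal(d)   # fused topic-information representation
h = gated_fusion(o_main, o_topic, W, U, b)
```

Because the gate is elementwise in (0, 1), every component of h lies between the corresponding components of the two inputs, which is what makes the mechanism a soft selector rather than a hard switch.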
8. The method of claim 3, wherein:
main task loss function (rendered only as images in the original publication):

L_c = −(1/N) Σ_{i=1}^{N} y_i log(ŷ_i) + λ‖θ‖²

wherein N is the number of samples, y_i is the true category label of the i-th user, ŷ_i is the predicted label distribution, λ is the coefficient of the L2 regularization, and θ denotes all training parameters in the model;

auxiliary task loss function (rendered only as images in the original publication):

L_a = KL(q_φ(z|U) ‖ p(z)) + L_rec

wherein U is the intermediate content matrix and p(z) is a standard normal distribution; the first half of the formula measures, with a Kullback-Leibler divergence, the gap between the modeled distribution q_φ(z|U) and the true distribution p(z), the second part L_rec is the reconstruction loss of the model, in which the generating network reconstructs the original input, and φ represents the training parameters;

joint loss function (rendered only as images in the original publication):

L = L_c + α (L_text + L_pic + L_inter)

wherein α is a weight that balances the loss functions of the main and auxiliary tasks, L_c is the main loss function, L_text is the text modality loss function, L_pic is the picture modality loss function, and L_inter is the inter-modality relation loss function.
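The three loss functions of claim 8 can be sketched numerically as follows. The exact formulas are images in the original, so the cross-entropy-plus-L2 main loss, the Gaussian KL term, the squared-error reconstruction, and the single weight α are assumptions consistent with the claim's descriptions:

```python
import numpy as np

def main_task_loss(y_true, y_prob, theta, lam=1e-4):
    """Cross-entropy over N samples plus an L2 regularization term
    over all trainable parameters theta."""
    n = len(y_true)
    ce = -np.mean(np.log(y_prob[np.arange(n), y_true] + 1e-12))
    l2 = lam * sum(np.sum(p ** 2) for p in theta)
    return ce + l2

def vae_loss(mu, log_var, x, x_rec):
    """KL divergence of N(mu, var) from the standard normal N(0, I),
    plus a squared-error reconstruction loss."""
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    rec = np.sum((x - x_rec) ** 2)
    return kl + rec

def joint_loss(l_main, l_text, l_pic, l_inter, alpha=0.1):
    """Weighted sum balancing the main task against the three auxiliary
    (text, picture, inter-modality) losses; alpha is illustrative."""
    return l_main + alpha * (l_text + l_pic + l_inter)

# toy usage with hypothetical values
l_main = main_task_loss(np.array([0, 1]),
                        np.array([[0.9, 0.1], [0.2, 0.8]]),
                        theta=[np.zeros(2)])
total = joint_loss(l_main, 2.0, 3.0, 5.0, alpha=0.1)
```

Note that when mu = 0 and log_var = 0 the KL term vanishes, and when the reconstruction matches the input exactly the auxiliary loss is zero, which matches the intuition that a perfect topic network contributes no penalty.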
9. A data processing system fusing intra-modality and inter-modality relations, comprising:
a sample construction unit, configured to obtain an initial sample and divide the sample into a training set, a verification set and a test set;
a model building unit, configured to build a data classification model based on the fused intra-modality and inter-modality relation topic information;
and a model training unit, configured to train the data classification model based on the intra-modality and inter-modality relation topic information.
CN202110665991.5A 2021-06-16 2021-06-16 Data processing method and system based on intra-modality fusion and inter-modality relation Active CN113254741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110665991.5A CN113254741B (en) 2021-06-16 2021-06-16 Data processing method and system based on intra-modality fusion and inter-modality relation


Publications (2)

Publication Number Publication Date
CN113254741A true CN113254741A (en) 2021-08-13
CN113254741B CN113254741B (en) 2021-09-28

Family

ID=77188227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110665991.5A Active CN113254741B (en) 2021-06-16 2021-06-16 Data processing method and system based on intra-modality fusion and inter-modality relation

Country Status (1)

Country Link
CN (1) CN113254741B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107072595A (en) * 2013-12-31 2017-08-18 Medical College of Wisconsin, Inc. Adaptive replanning based on multi-modality imaging
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
WO2018219927A1 (en) * 2017-05-30 2018-12-06 Esteve Pharmaceuticals, S.A. (hetero)arylalkylamino-pyrazolopyridazine derivatives having multimodal activity against pain
CN109840287A (en) * 2019-01-31 2019-06-04 Zhongke Artificial Intelligence Innovation Technology Research Institute (Qingdao) Co., Ltd. Neural-network-based cross-modal information retrieval method and device
CN110222827A (en) * 2019-06-11 2019-09-10 Suzhou AISpeech Information Technology Co., Ltd. Training method of a text-based depression judgment network model
US20200364405A1 (en) * 2017-05-19 2020-11-19 Google Llc Multi-task multi-modal machine learning system
US20200372369A1 (en) * 2019-05-22 2020-11-26 Royal Bank Of Canada System and method for machine learning architecture for partially-observed multimodal data
CN112035669A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112364168A (en) * 2020-11-24 2021-02-12 中国电子科技集团公司电子科学研究院 Public opinion classification method based on multi-attribute information fusion
CN112598067A (en) * 2020-12-25 2021-04-02 中国联合网络通信集团有限公司 Emotion classification method and device for event, electronic equipment and storage medium
CN112612936A (en) * 2020-12-28 2021-04-06 杭州电子科技大学 Multi-modal emotion classification method based on dual conversion network
CN112784801A (en) * 2021-02-03 2021-05-11 紫东信息科技(苏州)有限公司 Text and picture-based bimodal gastric disease classification method and device
CN112819052A (en) * 2021-01-25 2021-05-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-modal fine-grained mixing method, system, device and storage medium
US20210150315A1 (en) * 2019-11-14 2021-05-20 International Business Machines Corporation Fusing Multimodal Data Using Recurrent Neural Networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU, LIANGQING et al.: "Multi-modal emotion recognition method based on multi-task learning" *
ZHANG, SUN et al.: "Temporal multi-modal sentiment analysis model based on multi-task learning", Journal of Computer Applications *

Also Published As

Publication number Publication date
CN113254741B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
Abdullah et al. SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning
CN111444709B (en) Text classification method, device, storage medium and equipment
Han et al. A review on sentiment discovery and analysis of educational big‐data
Chen et al. Sentiment analysis based on deep learning and its application in screening for perinatal depression
Li et al. Weibo text sentiment analysis based on bert and deep learning
Almars Attention-Based Bi-LSTM Model for Arabic Depression Classification.
Wu et al. Kaicd: A knowledge attention-based deep learning framework for automatic icd coding
Zhang et al. Multi-task learning for jointly detecting depression and emotion
Wani et al. Depression screening in humans with AI and deep learning techniques
Vohra et al. Deep learning based sentiment analysis of public perception of working from home through tweets
Zhou et al. Tamfn: Time-aware attention multimodal fusion network for depression detection
Lim et al. Subsentence extraction from text using coverage-based deep learning language models
Sirrianni et al. Predicting stance polarity and intensity in cyber argumentation with deep bidirectional transformers
Shen et al. Emotion analysis of ideological and political education using a gru deep neural network
Wang et al. SCANET: Improving multimodal representation and fusion with sparse‐and cross‐attention for multimodal sentiment analysis
CN113254741B (en) Data processing method and system based on intra-modality fusion and inter-modality relation
Ji et al. LSTM based semi-supervised attention framework for sentiment analysis
Huang et al. Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network
Song Distilling knowledge from user information for document level sentiment classification
Feng et al. SINN: A speaker influence aware neural network model for emotion detection in conversations
Bose et al. Attention-based multimodal deep learning on vision-language data: models, datasets, tasks, evaluation metrics and applications
Nabiilah et al. Personality Classification Based on Textual Data using Indonesian Pre-Trained Language Model and Ensemble Majority Voting.
Lin et al. Adapting Static and Contextual Representations for Policy Gradient-Based Summarization
Dehghani et al. Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model
dos Santos Júnior et al. Learning and Semi-automatic Intention Labeling for Classification Models: ACOVID-19 Study for Chatbots

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231113

Address after: Room 401, 4th Floor, CCF Building, 600 Xiangrong Road, High Speed Rail New City, Xiangcheng District, Suzhou City, Jiangsu Province, 215133

Patentee after: Digital Suzhou Construction Co.,Ltd.

Address before: No.1 Shizi street, Gusu District, Suzhou City, Jiangsu Province

Patentee before: SOOCHOW University
