CN114418038A - Space-based information classification method and device based on multi-mode fusion and electronic equipment

Info

Publication number
CN114418038A
CN114418038A (application CN202210317228.8A)
Authority
CN
China
Prior art keywords: information, text, picture, fusion, space
Prior art date
Legal status
Pending
Application number
CN202210317228.8A
Other languages
Chinese (zh)
Inventor
刘禹汐
姜青涛
侯立旺
王慧静
Current Assignee
Beijing Daoda Tianji Technology Co ltd
Original Assignee
Beijing Daoda Tianji Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Daoda Tianji Technology Co., Ltd.
Priority to CN202210317228.8A
Publication of CN114418038A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/253 Fusion techniques of extracted features
    • G06F 16/353 Clustering; Classification into predefined classes
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks

Abstract

The embodiments of the disclosure provide a space-based information classification method and device based on multi-modal fusion, and an electronic device. The method comprises the following steps: respectively obtaining text information features and picture information features of the space-based information to be classified; extracting corresponding text feature vectors and picture feature vectors from the text information features and the picture information features; computing the correlation between the text feature vectors and the picture feature vectors, and jointly splicing the correlated text and picture feature vectors to obtain fusion features of the space-based information to be classified; and inputting the fusion features into a preset classification model for classification. In this way, multi-modal information can be reasonably processed to obtain rich feature information, classification can be performed according to the interactions between features, the efficiency of information classification is improved, and subsequent work such as fast information retrieval is facilitated.

Description

Space-based information classification method and device based on multi-mode fusion and electronic equipment
Technical Field
The present disclosure relates to data classification technology, and more particularly, to information classification technology.
Background
At present, the explosive growth and accessibility of aerospace open-source multi-modal data on the Internet provide wide opportunities: the intrinsic knowledge of heterogeneous information can be fused from multiple aspects, which challenges traditional text-only intelligence classification techniques. Existing schemes and research on the classification of open-source space-based intelligence are mostly text-based and classify the intelligence using natural language processing techniques.
The existing space-based open-source intelligence classification techniques are based on the single text modality and do not fuse the multi-source text and picture information contained in open-source intelligence. The reliability of single-source information is not high, and space-based intelligence is long and contains a large number of proper nouns, all of which degrade the classification effect.
Disclosure of Invention
The disclosure provides a space-based information classification method and device based on multi-mode fusion and electronic equipment.
According to a first aspect of the present disclosure, a space-based intelligence classification method based on multimodal fusion is provided. The method comprises the following steps:
respectively obtaining text information characteristics and picture information characteristics of the space-based information to be classified;
extracting corresponding text characteristic vectors and corresponding picture characteristic vectors according to the text information characteristics and the picture information characteristics;
calculating the correlation between the text characteristic vector and the picture characteristic vector, and performing joint splicing on the correlated text characteristic vector and picture characteristic vector to obtain fusion characteristics of the space-based information to be classified;
and inputting the fusion characteristics into a preset classification model for classification.
In some implementations of the first aspect, extracting the corresponding text feature vector from the text intelligence feature comprises:
the method comprises the steps of obtaining a text in space-based information to be classified, carrying out vectorization representation on the text, inputting the vectorization text into a pre-trained text information feature extraction model, and obtaining a corresponding text feature vector.
In some implementations of the first aspect, extracting the corresponding picture feature vector according to the picture intelligence feature comprises:
and obtaining a picture in the space-based information to be classified, and inputting the picture into a pre-trained picture information feature extraction model to obtain a corresponding picture feature vector.
In some implementations of the first aspect, the pre-trained text information feature extraction model is a Bi-GRU model;
the pre-trained picture information feature extraction model is a VGG-16 model comprising 13 convolutional layers, 5 pooling layers and 2 fully-connected layers.
In some implementations of the first aspect, calculating the text feature vector and the picture feature vector and jointly splicing the correlated text feature vector and picture feature vector to obtain the fusion feature of the space-based information to be classified includes:
carrying out similarity calculation on the text information characteristic and the picture information characteristic according to a Pearson correlation coefficient; if the similarity reaches a threshold value, performing joint splicing; and if the similarity does not reach the threshold value, the joint splicing is not carried out.
In some implementations of the first aspect, the classification model is an MLP model;
the MLP model comprises a hidden layer, and the hidden layer uses a dropout algorithm;
the output layer of the MLP model comprises a softmax classifier that employs a multi-class cross-entropy loss function for classification.
According to a second aspect of the present disclosure, a space-based intelligence classification apparatus based on multimodal fusion is provided. The device includes:
the acquisition module is used for respectively acquiring text information characteristics and picture information characteristics of the space-based information to be classified;
the feature extraction module is used for extracting corresponding text feature vectors and picture feature vectors according to the text information features and the picture information features;
the first fusion module is used for calculating the correlation between the text characteristic vector and the picture characteristic vector, and jointly splicing the correlated text characteristic vector and picture characteristic vector to obtain fusion characteristics of the space-based information to be classified;
and the second fusion module is used for inputting the fusion characteristics into a preset classification model for classification.
According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.
In the method, the text information features and the picture information features are fused to classify the information. Because different modalities express information in different ways, there is a certain degree of intersection and complementarity between them, and multiple different kinds of information interaction may exist across the modalities; fusing the modalities therefore yields richer and more reliable features for classification.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. The accompanying drawings are included to provide a further understanding of the present disclosure, and are not intended to limit the disclosure thereto, and the same or similar reference numerals will be used to indicate the same or similar elements, where:
FIG. 1 shows a flow diagram of a multi-modal fusion-based space-based intelligence classification method according to an embodiment of the present disclosure;
FIG. 2 illustrates a logic diagram of a multi-modal fusion based space-based intelligence classification method according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating the process of extracting text feature vectors;
FIG. 4 is a diagram illustrating the process of extracting feature vectors of a picture;
FIG. 5 shows a schematic diagram of a joint splicing process;
FIG. 6 shows a block diagram of a multi-modal fusion based space-based intelligence classification apparatus according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device for implementing a multi-modal fusion-based space-based intelligence classification method of an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
In the method, the text information characteristics and the picture information characteristics are fused to classify the information, so that the information classification efficiency is improved, and subsequent work such as quick information search is facilitated.
Fig. 1 shows a flow diagram of a multi-modal fusion-based space-based intelligence classification method 100 according to an embodiment of the present disclosure.
As shown in fig. 1, the multi-modal fusion-based space-based intelligence classification method 100 includes:
s101, respectively obtaining text information characteristics and picture information characteristics of sky-based information to be classified;
s102, extracting corresponding text characteristic vectors and corresponding picture characteristic vectors according to the text information characteristics and the picture information characteristics;
s103, calculating the text characteristic vector and the picture characteristic vector, and performing combined splicing on the text characteristic vector and the picture characteristic vector with correlation to obtain fusion characteristics of the sky-based information to be classified;
and S104, inputting the fusion characteristics into a preset classification model for classification.
FIG. 2 illustrates a logic diagram of a multi-modal fusion based space-based intelligence classification method according to an embodiment of the present disclosure.
As shown in fig. 2, the multi-modal fusion-based space-based intelligence classification method of the present disclosure takes multi-modal fusion intelligence classification and recognition as its core: features are extracted from the different modalities, and the extracted modality features are fused and then classified. The method performs one fusion at the feature layer and one at the decision layer. The first, splicing fusion makes the feature fusion more complete and lets the multi-modal features complement one another; the second fusion, through the dropout layer, filters out the crossing and redundant information among the modalities. Through these two fusions, the modal representations and cross-modal complementary correlations of the multi-modal data can be correctly captured, the multi-modal information is processed reasonably, rich feature information is obtained, and the space-based intelligence classification effect is further improved.
In step S102, extracting a corresponding text feature vector according to the text intelligence feature includes:
the method comprises the steps of obtaining a text in space-based information to be classified, carrying out vectorization representation on the text, inputting the vectorization text into a pre-trained text information feature extraction model, and obtaining a corresponding text feature vector.
Fig. 3 shows a process diagram of extracting text feature vectors.
As shown in fig. 3, vectorizing the text first requires training word vectors: a word vector model is trained on a large-scale corpus, and the trained model is then used to represent the text as vectors. The word vector model may be a CBOW model; in the embodiments of the disclosure the CBOW model in the gensim package is used to train the word vectors, with the window set to 5, min_count set to 3, and the word vector dimension set to 100.
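A minimal sketch of this word-vector training step, assuming the gensim library and a pre-tokenized corpus (the variable name corpus_tokens is illustrative, not from the patent):

```python
from gensim.models import Word2Vec

# corpus_tokens: list of tokenized intelligence texts, e.g. [["satellite", "launch", ...], ...]
cbow = Word2Vec(
    sentences=corpus_tokens,
    vector_size=100,   # 100-dimensional word vectors, as in the embodiment
    window=5,          # context window of 5
    min_count=3,       # ignore words occurring fewer than 3 times
    sg=0,              # sg=0 selects the CBOW training algorithm
)

# Vectorize one text as a sequence of word vectors (out-of-vocabulary words are skipped).
doc_vectors = [cbow.wv[w] for w in corpus_tokens[0] if w in cbow.wv]
```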
In some embodiments, the pre-trained text intelligence feature extraction model is a Bi-GRU model. The vectorized text is input into the Bi-GRU for feature extraction, and the text feature vector is obtained by training with conventional natural language processing techniques. Specifically, a word embedding layer is placed in front of the Bi-GRU layer. The GRU performs similarly to the LSTM but has fewer parameters and therefore converges more easily; each GRU unit controls the flow of information by means of a reset gate and an update gate. A Bi-GRU captures contextual information better than a unidirectional GRU, so the Bi-GRU is chosen to extract the text intelligence features in the embodiments of the disclosure. The number of Bi-GRU hidden units is set to 64, dropout with a rate of 0.5 is adopted to prevent overfitting, and the model is trained through a fully-connected layer and a softmax layer to obtain the text feature vector.
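The Bi-GRU text branch described above could be sketched as follows in Keras; the maximum sequence length, the width of the fully-connected layer and the number of classes are assumptions for illustration, not values given in the patent:

```python
from tensorflow.keras import layers, models

def build_text_branch(vocab_size: int, embed_dim: int = 100,
                      max_len: int = 200, num_classes: int = 3) -> models.Model:
    inp = layers.Input(shape=(max_len,))
    x = layers.Embedding(vocab_size, embed_dim)(inp)      # word embedding layer in front of the Bi-GRU
    x = layers.Bidirectional(layers.GRU(64))(x)            # 64 hidden units per direction
    x = layers.Dropout(0.5)(x)                             # dropout rate 0.5 against overfitting
    feat = layers.Dense(128, activation="relu",
                        name="text_feature")(x)            # fully-connected layer; output used as the text feature vector
    out = layers.Dense(num_classes, activation="softmax")(feat)
    return models.Model(inp, out)
```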
In step S102, extracting a corresponding picture feature vector according to the picture intelligence feature includes:
and obtaining a picture in the space-based information to be classified, and inputting the picture into a pre-trained picture information feature extraction model to obtain a corresponding picture feature vector.
Fig. 4 shows a process diagram of extracting a picture feature vector.
In some embodiments, the pre-trained picture intelligence feature extraction model is a VGG-16 model comprising 13 convolutional layers, 5 pooling layers, and 2 fully-connected layers.
A convolutional neural network model generally contains a large number of parameters to be learned and needs a large training set to learn them. Because computing resources are limited, transfer learning is used to fine-tune a pre-trained model; the embodiments of the disclosure therefore select the VGG-16 model pre-trained on ImageNet in Keras as the reference model.
As shown in fig. 4, according to the type of picture intelligence, the embodiments of the disclosure modify the final fully-connected output of the original model and replace the preceding fully-connected layer, setting the number of output neurons to 3; the model contains 13 convolutional layers, 5 pooling layers, and 2 fully-connected layers. The embodiments of the disclosure thus obtain the picture feature vector by training in a fine-tuning manner.
According to the embodiments of the disclosure, a fine-tuned CNN based on transfer learning is constructed to extract the picture intelligence features. The picture intelligence contains abundant information; representing the pictures as vectors extracts this rich content and makes it convenient to fuse with the text intelligence features.
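A sketch of the fine-tuning setup, assuming TensorFlow/Keras with the ImageNet-pretrained VGG-16; the 4096-unit widths of the two fully-connected layers and the 224x224 input size are standard VGG-16 choices assumed here rather than values stated in the patent:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_image_branch(num_classes: int = 3,
                       input_shape=(224, 224, 3)) -> models.Model:
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    for layer in base.layers:
        layer.trainable = False                       # freeze the 13 convolutional / 5 pooling layers
    x = layers.Flatten()(base.output)
    x = layers.Dense(4096, activation="relu")(x)      # first fully-connected layer
    feat = layers.Dense(4096, activation="relu",
                        name="image_feature")(x)      # second fully-connected layer; output used as the picture feature vector
    out = layers.Dense(num_classes, activation="softmax")(feat)  # replaced output layer with 3 neurons
    return models.Model(base.input, out)
```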
In step S103, calculating the text feature vector and the picture feature vector and jointly splicing the correlated text feature vector and picture feature vector to obtain the fusion feature of the space-based information to be classified includes:
carrying out similarity calculation on the text information characteristic and the picture information characteristic according to a Pearson correlation coefficient; if the similarity reaches a threshold value, performing joint splicing; and if the similarity does not reach the threshold value, the joint splicing is not carried out.
Wherein fig. 5 shows a schematic diagram of a joint splicing process.
After the text features and picture features extracted by the Bi-GRU model and the fine-tuned VGG-16 model are respectively obtained, the correlation between the two modalities needs to be judged, and the correlated text feature vector and picture feature vector are jointly spliced, so that the single modalities cooperate with one another under the given constraint conditions.
Because each modality contains different information, multi-modal cooperation needs to preserve the unique characteristics of each modality. The cooperation method used here is based on cross-modal similarity, which aims to learn a common subspace by directly measuring the distances between vectors of different modalities; cross-modal correlation based approaches, by contrast, aim to learn a shared subspace that maximizes the correlation between the representation sets of the different modalities. Under the constraint of the similarity measure, the cross-modal similarity method preserves the similarity structure between modalities, so that the cross-modal distance between the same semantics or related objects is as small as possible, and the distance between different semantics is as large as possible.
The Pearson correlation coefficient is adopted to compute the similarity between the feature vectors of the two modalities. Let Q and D denote the fixed-length feature vectors obtained from the text modality in the first step and the picture modality in the second step, respectively; the similarity is computed as

$$ r = \frac{\sum_{i}(Q_i - \bar{Q})(D_i - \bar{D})}{\sqrt{\sum_{i}(Q_i - \bar{Q})^2}\,\sqrt{\sum_{i}(D_i - \bar{D})^2}} $$

where $Q_i$ and $D_i$ denote the i-th components of the two vectors, and $\bar{Q}$ and $\bar{D}$ are the means of Q and D. The value of r ranges from -1 to +1. The larger the absolute value of the correlation coefficient, the stronger the correlation: the closer it is to 1 or -1, the stronger the correlation, and the closer it is to 0, the weaker the correlation.
The threshold is applied to r: if the absolute value of r is less than 0.7, the picture and the text are considered uncorrelated and are not fused, and the classification result is determined directly by the text feature vector; if the absolute value of r is greater than or equal to 0.7, the two feature vectors are spliced into one long vector for feature-layer fusion.
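A minimal NumPy sketch of this correlation-gated splicing, assuming the two branches already produce fixed-length feature vectors of equal dimension:

```python
import numpy as np

def fuse_features(q: np.ndarray, d: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Pearson-gated joint splicing of a text feature vector q and a picture feature vector d."""
    r = np.corrcoef(q, d)[0, 1]          # Pearson correlation coefficient between the two vectors
    if abs(r) >= threshold:
        return np.concatenate([q, d])    # correlated: splice into one long fused vector
    return q                             # uncorrelated: classification falls back to the text feature alone
```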
As shown in fig. 5, the embodiments of the disclosure perform the first fusion in this way, splicing the abstract features together. The spliced feature data form effective cross-modal features, so that the modal representations and cross-modal complementary correlations of the multi-modal data are accurately captured.
In step S104, the classification model is an MLP model;
the MLP model comprises a hidden layer, and the hidden layer uses a dropout algorithm;
the output layer of the MLP model comprises a softmax classifier that employs a multi-class cross-entropy loss function for classification.
Because the fusion feature vector extracted from space-based open-source intelligence is complex, and to better account for the influence of complex features on the classification effect, an MLP (multi-layer perceptron) neural network model with dropout is selected to classify the feature vector obtained from the first fusion. The dropout layer is added because the picture features and text feature vectors were fused by splicing in step S103; it prevents the model from overfitting and can be regarded as a second fusion of the picture and text features. The MLP adopted in the embodiments of the disclosure contains only one hidden layer, i.e., a three-layer neural network structure, and dropout is applied to the hidden layer, meaning that some outputs of the hidden layer are randomly set to 0 during training. The dropout rate is set to 0.2 here; it is usually set in the range 0.2-0.5, a larger value discards more features, and it is a hyper-parameter. The output-layer softmax classifier adopts a multi-class cross-entropy loss function. All parameters of the MLP are the connection weights and biases between layers, namely W1, b1, W2 and b2.
The parameters may be determined by stochastic gradient descent (SGD): all parameters are first initialized randomly and then trained iteratively, with gradients computed and parameters updated continually until the error is sufficiently small. In this way the model can fully fit the data features, making its classification of the fused feature vectors more reliable.
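The classifier described above could be sketched as a three-layer Keras network; the hidden-layer width and the SGD learning rate are assumptions for illustration:

```python
from tensorflow.keras import layers, models, optimizers

def build_classifier(fused_dim: int, num_classes: int) -> models.Model:
    inp = layers.Input(shape=(fused_dim,))
    h = layers.Dense(256, activation="relu")(inp)   # single hidden layer (width assumed)
    h = layers.Dropout(0.2)(h)                      # second fusion: randomly zero 20% of hidden outputs during training
    out = layers.Dense(num_classes, activation="softmax")(h)
    model = models.Model(inp, out)
    model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
                  loss="categorical_crossentropy",  # multi-class cross-entropy loss
                  metrics=["accuracy"])
    return model
```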
In some embodiments, the fusion features are divided into training set samples and test set samples; the training set samples are input into the MLP model for training, the test set samples are then input into the trained MLP model, and whether correct classification results are output is observed to determine whether training is complete.
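Assuming fused feature vectors X_fused and one-hot labels y_onehot have been assembled (both names are illustrative), the split-and-train step might use the build_classifier sketch above as follows:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_fused, y_onehot, test_size=0.2)
clf = build_classifier(fused_dim=X_train.shape[1], num_classes=y_onehot.shape[1])
clf.fit(X_train, y_train, epochs=20, batch_size=32)   # epoch count and batch size assumed
print(clf.evaluate(X_test, y_test))                   # check classification results on the held-out samples
```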
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 6 shows a block diagram of a multi-modal fusion based space-based intelligence classification apparatus 600 according to an embodiment of the present disclosure.
As shown in fig. 6, the space-based intelligence classification apparatus 600 based on multi-modal fusion includes:
the acquisition module 601 is used for respectively acquiring text information characteristics and picture information characteristics of the space-based information to be classified;
the feature extraction module 602 is configured to extract corresponding text feature vectors and picture feature vectors according to the text intelligence features and the picture intelligence features;
the first fusion module 603 is configured to calculate the correlation between the text feature vector and the picture feature vector, and jointly splice the correlated text feature vector and picture feature vector to obtain the fusion feature of the space-based information to be classified;
and a second fusing module 604, configured to input the fused features into a preset classification model for classification.
In some embodiments, the system further comprises a text feature vector extraction module for obtaining a text in the space-based intelligence to be classified, performing vectorization representation on the text, and inputting the vectorized text into a pre-trained text intelligence feature extraction model to obtain a corresponding text feature vector.
In some embodiments, the image feature vector extraction module is further included, and is configured to obtain an image in the space-based information to be classified, and input the image into a pre-trained image information feature extraction model to obtain a corresponding image feature vector.
In some embodiments, the system further comprises a joint splicing module for performing similarity calculation on the text intelligence features and the picture intelligence features according to a pearson correlation coefficient; if the similarity reaches a threshold value, performing joint splicing; and if the similarity does not reach the threshold value, the joint splicing is not carried out.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
The device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of various general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the method 100. For example, in some embodiments, the method 100 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method 100 described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method 100 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (8)

1. A space-based intelligence classification method based on multi-modal fusion is characterized by comprising the following steps:
respectively obtaining text information characteristics and picture information characteristics of the space-based information to be classified;
extracting corresponding text characteristic vectors and corresponding picture characteristic vectors according to the text information characteristics and the picture information characteristics;
calculating the correlation between the text characteristic vector and the picture characteristic vector, and performing joint splicing on the correlated text characteristic vector and picture characteristic vector to obtain fusion characteristics of the space-based information to be classified;
and inputting the fusion characteristics into a preset classification model for classification.
2. The multi-modal fusion-based space-based intelligence classification method of claim 1, wherein extracting corresponding text feature vectors from text intelligence features comprises:
the method comprises the steps of obtaining a text in space-based information to be classified, carrying out vectorization representation on the text, inputting the vectorization text into a pre-trained text information feature extraction model, and obtaining a corresponding text feature vector.
3. The multi-modal fusion-based space-based intelligence classification method of claim 2, wherein extracting corresponding picture feature vectors according to picture intelligence features comprises:
and obtaining a picture in the space-based information to be classified, and inputting the picture into a pre-trained picture information feature extraction model to obtain a corresponding picture feature vector.
4. The method for space-based intelligence taxonomy based on multimodal fusion of claim 3,
the pre-trained text information feature extraction model is a Bi-GRU model;
the pre-trained picture information feature extraction model is a VGG-16 model comprising 13 convolutional layers, 5 pooling layers and 2 fully-connected layers.
5. The multi-modal fusion-based space-based intelligence classification method of claim 1, wherein calculating the text feature vector and the picture feature vector and jointly splicing the correlated text feature vector and picture feature vector to obtain the fusion features of the space-based intelligence to be classified comprises:
carrying out similarity calculation on the text information characteristic and the picture information characteristic according to a Pearson correlation coefficient; if the similarity reaches a threshold value, performing joint splicing; and if the similarity does not reach the threshold value, the joint splicing is not carried out.
6. The multi-modal fusion based space-based intelligence classification method of claim 1, wherein the classification model is an MLP model;
the MLP model comprises a hidden layer, and the hidden layer uses a dropout algorithm;
the output layer of the MLP model comprises a softmax classifier that employs a multi-class cross-entropy loss function for classification.
7. A space-based information classification device based on multi-modal fusion, characterized by comprising:
the acquisition module is used for respectively acquiring text information characteristics and picture information characteristics of the space-based information to be classified;
the feature extraction module is used for extracting corresponding text feature vectors and picture feature vectors according to the text information features and the picture information features;
the first fusion module is used for calculating the correlation between the text characteristic vector and the picture characteristic vector, and jointly splicing the correlated text characteristic vector and picture characteristic vector to obtain fusion characteristics of the space-based information to be classified;
and the second fusion module is used for inputting the fusion characteristics into a preset classification model for classification.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
CN202210317228.8A 2022-03-29 2022-03-29 Space-based information classification method and device based on multi-mode fusion and electronic equipment Pending CN114418038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210317228.8A CN114418038A (en) 2022-03-29 2022-03-29 Space-based information classification method and device based on multi-mode fusion and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210317228.8A CN114418038A (en) 2022-03-29 2022-03-29 Space-based information classification method and device based on multi-mode fusion and electronic equipment

Publications (1)

Publication Number Publication Date
CN114418038A true CN114418038A (en) 2022-04-29

Family

ID=81263770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210317228.8A Pending CN114418038A (en) 2022-03-29 2022-03-29 Space-based information classification method and device based on multi-mode fusion and electronic equipment

Country Status (1)

Country Link
CN (1) CN114418038A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145974A (en) * 2018-08-13 2019-01-04 广东工业大学 One kind being based on the matched multi-level image Feature fusion of picture and text
US20210011941A1 (en) * 2019-07-14 2021-01-14 Alibaba Group Holding Limited Multimedia file categorizing, information processing, and model training method, system, and device
CN111275085A (en) * 2020-01-15 2020-06-12 重庆邮电大学 Online short video multi-modal emotion recognition method based on attention fusion
CN113360599A (en) * 2021-05-18 2021-09-07 苏州海赛人工智能有限公司 Multi-source heterogeneous information convergence cooperative processing platform based on content identification
CN113535949A (en) * 2021-06-15 2021-10-22 杭州电子科技大学 Multi-mode combined event detection method based on pictures and sentences
CN113822283A (en) * 2021-06-30 2021-12-21 腾讯科技(深圳)有限公司 Text content processing method and device, computer equipment and storage medium
CN114078474A (en) * 2021-11-09 2022-02-22 京东科技信息技术有限公司 Voice conversation processing method and device based on multi-modal characteristics and electronic equipment
CN114155270A (en) * 2021-11-10 2022-03-08 南方科技大学 Pedestrian trajectory prediction method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Guobiao et al.: "Social media fake news detection based on multi-modal feature fusion" (基于多模态特征融合的社交媒体虚假新闻检测), Information Science (《情报科学》), vol. 39, no. 10, 31 October 2021 (2021-10-31), pages 3-4 *
Li An: "Corpus Linguistics and Its Implementation with Python" (《语料库语言学及Python实现》), 31 December 2018, page 13 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970743A (en) * 2022-06-17 2022-08-30 中国科学院地理科学与资源研究所 Multi-source remote sensing rainfall data fusion method based on multi-modal deep learning
CN114970743B (en) * 2022-06-17 2022-11-08 中国科学院地理科学与资源研究所 Multi-source remote sensing rainfall data fusion method based on multi-modal deep learning
CN116846688A (en) * 2023-08-30 2023-10-03 南京理工大学 Interpretable flow intrusion detection method based on CNN
CN116846688B (en) * 2023-08-30 2023-11-21 南京理工大学 Interpretable flow intrusion detection method based on CNN

Similar Documents

Publication Publication Date Title
CN112966522B (en) Image classification method and device, electronic equipment and storage medium
CN113326764B (en) Method and device for training image recognition model and image recognition
CN111310672A (en) Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling
CN109284406B (en) Intention identification method based on difference cyclic neural network
CN114418038A (en) Space-based information classification method and device based on multi-mode fusion and electronic equipment
WO2022156561A1 (en) Method and device for natural language processing
CN116152833B (en) Training method of form restoration model based on image and form restoration method
CN111985525B (en) Text recognition method based on multi-mode information fusion processing
CN113837308B (en) Knowledge distillation-based model training method and device and electronic equipment
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
JP2022078310A (en) Image classification model generation method, device, electronic apparatus, storage medium, computer program, roadside device and cloud control platform
CN112862005A (en) Video classification method and device, electronic equipment and storage medium
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN114782722B (en) Image-text similarity determination method and device and electronic equipment
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN113705715B (en) Time sequence classification method based on LSTM and multi-scale FCN
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115170919A (en) Image processing model training method, image processing device, image processing equipment and storage medium
CN114611521A (en) Entity identification method, device, equipment and storage medium
CN114821063A (en) Semantic segmentation model generation method and device and image processing method
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN113094504A (en) Self-adaptive text classification method and device based on automatic machine learning
CN113378781B (en) Training method and device of video feature extraction model and electronic equipment
CN115879446B (en) Text processing method, deep learning model training method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
    Address after: 100085 room 703, 7 / F, block C, 8 malianwa North Road, Haidian District, Beijing
    Applicant after: Beijing daoda Tianji Technology Co.,Ltd.
    Address before: 100085 room 703, 7 / F, block C, 8 malianwa North Road, Haidian District, Beijing
    Applicant before: Beijing daoda Tianji Technology Co.,Ltd.
RJ01 Rejection of invention patent application after publication
    Application publication date: 20220429