CN116484217A - Intelligent decision method and system based on multi-modal pre-training large model - Google Patents

Intelligent decision method and system based on multi-modal pre-training large model

Info

Publication number
CN116484217A
Authority
CN
China
Prior art keywords
training
decision
data
model
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310407938.4A
Other languages
Chinese (zh)
Inventor
刘应波
杜宇
刘应玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Yuanmatrix Technology Co ltd
Original Assignee
Yunnan Yuanmatrix Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Yuanmatrix Technology Co ltd filed Critical Yunnan Yuanmatrix Technology Co ltd
Priority to CN202310407938.4A
Publication of CN116484217A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses an intelligent decision method and system based on a multi-modal pre-training large model. The method comprises: acquiring a decision problem, making an intelligent decision through a preset multi-modal pre-training model, generating a decision result, and storing a decision case; obtaining the decision cases of the multi-modal pre-training model and constructing decision tag data; and performing supervised training with the decision tag data to adjust the model parameters of the multi-modal pre-training model. The invention uses the cases generated by decisions as training tag data to fine-tune the model parameters, which helps improve the decision capability of the model for specific case types.

Description

Intelligent decision method and system based on multi-modal pre-training large model
Technical Field
The invention relates to the technical field of multi-modal data processing, and in particular to an intelligent decision method and system based on a multi-modal pre-training large model.
Background
In recent years, researchers have made great progress in both computer vision and natural language processing, so multi-modal deep learning, which combines the two, is receiving more and more attention. Existing multi-modal pre-training models perform deep learning on combined multi-modal data, which improves the model's understanding of the raw data and thereby the decision accuracy.
However, as society progresses, the decision problems to be handled gradually change. During the use of a pre-trained large model, the model cannot adapt to newly emerging decision problems, so it cannot improve in either the breadth of decision problems it covers or the accuracy of its decision results; meanwhile, the pre-trained large model cannot meet personalized requirements in specific scenes.
Therefore, how to enhance the decision capability of a multi-modal pre-training model while it is in use is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the invention provides an intelligent decision method and system based on a multi-modal pre-training large model, which use the cases generated by decisions as training tag data to fine-tune the model parameters, helping to improve the decision capability of the model for problems of a specific case type, so that a wider range of problems can be solved in the specific scene.
In order to achieve the above purpose, the present invention adopts the following technical solution:
acquiring a decision problem, making an intelligent decision through a preset multi-modal pre-training model, generating a decision result, and storing a decision case;
obtaining the decision cases of the multi-modal pre-training model and constructing decision tag data; and performing supervised training with the decision tag data to adjust the model parameters of the multi-modal pre-training model.
Further, the pre-training step of the multi-modal pre-training model includes:
acquiring training data of multiple modalities;
extracting training features of the training data corresponding to each modality, uniformly encoding the training features, generating a tuple sequence corresponding to each modality, and constructing a multi-modal data set;
and performing joint training on a pre-constructed multi-modal data processing model through the tuple sequences corresponding to the multiple modalities to generate the multi-modal pre-training model.
Further, the multi-modal training data includes one or more of image data, video data, and text data.
Further, extracting the training features of the training data corresponding to each modality, uniformly encoding the training features, and generating the tuple sequence corresponding to each modality is specifically:
for image data, the feature information is recorded as a tuple F1 = (C, O, P, R, …);
where C is the data modality type, O represents an object in the image, P is the position of the object in the image, and R represents other features, such as geometry, shape, amplitude, histogram, color, or local binary pattern;
for video data, images are extracted frame by frame to form an image set, and each image in the set is encoded as a tuple F2 = (C, O, P, R, T, …), where element T records the time information of the current frame;
for text data, features are extracted by natural language processing, and the text data tuple F3 can be encoded as (C, S, E, …), where S is the feature level and E is the environment information;
the multi-modal data set is Dstd = {D1, D2, D3, …, DN}.
Further, the joint training specifically includes:
acquiring training data of different data modality types in the multi-modal data set and merging them;
training a pre-constructed multi-modal data processing model with the merged data.
Further, the merging methods include modality embedding, attention mechanisms, multi-view learning, or multi-task learning. For example, using modality embedding (Modality Embedding): the different input modalities in Dstd are converted into vector representations in a shared space, which are then concatenated into a multi-modal vector Dem = [F1, F2, F3, …, FN]; for instance, images are encoded with a convolutional neural network (CNN) and text with a recurrent neural network (RNN), and the resulting vectors are concatenated to form Dem, which is then used to train the model.
Further, the step of constructing the decision tag data includes:
creating a decision problem text;
calculating, according to text similarity, the distance l_m between the text of each decision case e_m and the decision problem text, where m is the sequence number of the decision case;
constructing a data Tag vector Tag = {(e_1, R_1, l_1), (e_2, R_2, l_2), (e_3, R_3, l_3), …}, where R is the decision case result;
and obtaining the training data format mapping rule map of the multi-modal pre-training model, and converting the Tag vector into the pre-training data format according to the mapping rule to form the decision tag data.
An intelligent decision system based on a multi-modal pre-training large model, comprising:
a user data acquisition device for a user to input a decision problem;
a data processor configured with the multi-modal pre-training model, for making intelligent decisions through the multi-modal pre-training model according to the decision problem, generating decision results, and storing decision cases; and
an intelligent optimization module for acquiring the decision cases of the multi-modal pre-training model and constructing decision tag data, and for performing supervised training on the multi-modal pre-training model with the decision tag data to adjust the model parameters.
Further, the user data acquisition device is an electronic input device or a voice acquisition device.
Further, the system further includes a visual operation device for a user to perform visual model evaluation through human-machine interaction.
The invention has the following beneficial effects:
Compared with the prior art, the intelligent decision method and system based on a multi-modal pre-training large model provided by the invention use the cases generated during the decision process of the pre-trained large model as training tag data to fine-tune the model parameters, which improves the decision capability of the model for specific case types. Meanwhile, the pre-trained large model is trained with data from multiple scenes and therefore has decision capability across multiple scenes; combined with model fine-tuning, its capability can be improved in the specific scene corresponding to a given decision problem.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an intelligent decision method based on a multi-modal pre-training large model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an intelligent decision system based on a multi-modal pre-training large model according to another embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in FIG. 1, an embodiment of the invention discloses an intelligent decision method based on a multi-modal pre-training large model, which comprises the following steps:
S1: acquiring a decision problem, making an intelligent decision through a preset multi-modal pre-training model, generating a decision result, and storing a decision case.
In one embodiment, the pre-training steps of the multi-modal pre-training model are:
S11: acquiring training data of multiple modalities; the training data may include one or more of image data, video data, and text data. The data are also cleaned, labeled, and converted into a common format.
S12: extracting training features of the training data corresponding to each modality, uniformly encoding the training features, generating a tuple sequence corresponding to each modality, and constructing a multi-modal data set. This step is preprocessing; preprocessing is defined as a function f(x), where x is the multi-modal data, and processing by f yields a standardized data set, namely the multi-modal data set.
Specifically, for image data, feature information of an image can be extracted using a feature extraction technique, for example a CNN or ViT model, and the feature information is recorded as a tuple F1 = (C, O, P, R, …); C is the data modality type (1 for image, 2 for video, 3 for text); O represents an object in the image, P is the position of the object in the image, and R represents other features, such as geometry, shape, amplitude, histogram, color, or local binary pattern. For video data, images are extracted frame by frame to form an image set, and each image in the set is encoded as a tuple F2 = (C, O, P, R, T, …), where element T records the time information of the current frame. For text data, features are extracted by natural language processing, and the text data tuple F3 can be encoded as (C, S, E, …), where S is the feature level and E is the environment information. The multi-modal data set is Dstd = {D1, D2, D3, …, DN}. Data of other modality types, such as audio data, may also be included, with features extracted by a speech recognition model.
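By way of illustration only, the tuple encoding of step S12 might be sketched in Python as follows; the function names, field values, and the choice to return plain Python tuples are assumptions made for the sketch and are not part of the disclosed method. The feature values passed in would come from an image detector, frame extraction, and an NLP pipeline, which are not shown.

```python
# Illustrative sketch of the tuple encodings F1/F2/F3 and the data set Dstd.

def encode_image(objects, positions, other_features):
    # F1 = (C, O, P, R, ...): C = 1 marks the image modality
    return (1, objects, positions, other_features)

def encode_video_frame(objects, positions, other_features, frame_time):
    # F2 = (C, O, P, R, T, ...): C = 2, element T records the time of the current frame
    return (2, objects, positions, other_features, frame_time)

def encode_text(feature_level, environment_info):
    # F3 = (C, S, E, ...): C = 3, S = feature level, E = environment information
    return (3, feature_level, environment_info)

def build_dstd(encoded_tuples):
    """Assemble the standardized multi-modal data set Dstd = {D1, D2, ..., DN}."""
    return list(encoded_tuples)

# Usage sketch with dummy feature values:
dstd = build_dstd([
    encode_image(["vehicle"], [(12, 40, 80, 120)], {"color_hist": [0.2, 0.5, 0.3]}),
    encode_video_frame(["pedestrian"], [(5, 9, 30, 60)], {"shape": "upright"}, frame_time=0.04),
    encode_text(feature_level=2, environment_info="rainy intersection"),
])
```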
S13: performing joint training on the pre-constructed multi-modal data processing model through the tuple sequences corresponding to the multiple modalities to obtain the model parameters of the multi-modal pre-training model. For the construction of the multi-modal data processing model, a suitable multi-modal pre-training model can be selected and is not limited to a specific model, such as a Transformer or BERT, or more specifically models such as OpenAI's DALL-E or CLIP. These models are typically composed of multiple neural networks for processing image, video, and text input data.
In one embodiment, in S13, the specific steps of the joint training include:
acquiring training data of different data modality types in the multi-modal data set and merging them; the merging methods include modality embedding, attention mechanisms, multi-view learning, multi-task learning, and the like. The invention uses, for example, modality embedding (Modality Embedding): the different input modalities in Dstd are converted into vector representations in a shared space, which are then concatenated into a multi-modal vector Dem = [F1, F2, F3, …, FN]; for instance, images are encoded with a convolutional neural network (CNN) and text with a recurrent neural network (RNN), and the two vectors are concatenated to form Dem. A pre-constructed multi-modal data processing model is then trained with the merged data. Pre-training uses joint training: the model is trained on multiple unlabeled tasks so that it learns richer semantic information. Pre-training is typically performed with self-supervised learning, for example by predicting the rotation angle of an image, or by dividing an image into blocks, shuffling them, and predicting the original image. The goal of these tasks is to train different parts of the model by exploiting information from multiple modalities.
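A minimal Python (PyTorch) sketch of the modality-embedding step is given below; the encoder architectures, dimensions, and input shapes are illustrative assumptions rather than the disclosed design.

```python
import torch
import torch.nn as nn

class ModalityEmbedding(nn.Module):
    """Project each modality into a shared space and concatenate the vectors into Dem."""

    def __init__(self, dim=256):
        super().__init__()
        # CNN encoder for images (an assumed architecture, not fixed by the disclosure)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim),
        )
        # RNN (GRU) encoder for pre-embedded text tokens
        self.text_encoder = nn.GRU(input_size=128, hidden_size=dim, batch_first=True)

    def forward(self, images, text_embeddings):
        f_img = self.image_encoder(images)            # image vector in the shared space
        _, h_last = self.text_encoder(text_embeddings)
        f_txt = h_last[-1]                            # text vector in the shared space
        return torch.cat([f_img, f_txt], dim=-1)      # concatenated multi-modal vector Dem

# Usage sketch with random tensors standing in for a batch of 4 samples:
dem = ModalityEmbedding()(torch.randn(4, 3, 224, 224), torch.randn(4, 32, 128))
```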
S2: obtaining the decision cases of the multi-modal pre-training model and constructing decision tag data; and performing supervised training with the decision tag data to adjust the model parameters of the multi-modal pre-training model.
In one embodiment, the step of constructing decision tag data includes:
s21: creating a decision problem text;
S22: calculating, according to text similarity, the distance l_m between the text of each decision case e_m and the decision problem text; the text distance may be calculated by a natural language processing method for similarity, such as cosine similarity (a sketch is given after step S24 below).
S23: constructing a data Tag vector Tag = {(e_1, R_1, l_1), (e_2, R_2, l_2), (e_3, R_3, l_3), …}, where R is the decision case result;
S24: obtaining the training data format mapping rule map of the multi-modal pre-training model, and converting the Tag vector into the pre-training data format according to the mapping rule to form the decision tag data.
A complete case should contain: the decision context, the decision problem, and the final result. A decision case is illustrated as follows, for decision making in an intelligent vehicle control system. Case context: on a certain date, a large vehicle of model XXX performed avoidance at intersection XX; other parameters of this case include weather, traffic flow, scene pictures, and so on. On another date, a small vehicle of model XXX performed avoidance at a certain location; other parameters of this case include weather, traffic volume, and so on. Decision problem: while now crossing a road, is it necessary to avoid pedestrians? The decision result is "yes" or "no".
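A small sketch of how a stored decision case and the Tag vector of steps S23 and S24 might be represented; the field names and the distance_fn and map_rule parameters are illustrative assumptions, not the disclosed data format.

```python
from dataclasses import dataclass

@dataclass
class DecisionCase:
    context: str   # decision background: weather, traffic flow, scene pictures, ...
    problem: str   # the decision problem text of the case
    result: str    # final decision result R, e.g. "yes" / "no"

def build_decision_tag_data(problem_text, cases, distance_fn, map_rule):
    """Tag = {(e_1, R_1, l_1), (e_2, R_2, l_2), ...}; map_rule converts each entry
    into the pre-training data format of the multi-modal model (step S24)."""
    distances = distance_fn(problem_text, [c.problem for c in cases])
    tag = [(case, case.result, dist) for case, dist in zip(cases, distances)]
    return [map_rule(e, r, l) for (e, r, l) in tag]
```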
In this embodiment, the decision tag data are input into the model for supervised training, a back-propagation algorithm is used to update the large-model parameters, and accuracy, precision, recall, and F1 score are used to evaluate the training results.
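The supervised fine-tuning and the evaluation metrics described above could be sketched as follows; the model, data loader, optimizer choice, and label encoding are placeholders rather than the disclosed training configuration.

```python
import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def finetune(model, tag_loader, epochs=3, lr=1e-5):
    """Supervised fine-tuning on decision tag data, updating parameters by back-propagation."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, labels in tag_loader:        # decision tag data in the pre-training format
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()                      # back-propagation
            optimizer.step()

def training_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score used to evaluate the training results."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
    }
```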
Evaluating the performance of the fine-tuned model: the fine-tuned model is evaluated on the evaluation data set, using tools such as a confusion matrix or an ROC curve to visualize the results.
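A sketch of this visual evaluation step, assuming binary decision labels and scikit-learn 1.0 or later; the argument names are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

def visualize_evaluation(y_true, y_pred, y_score):
    """Confusion matrix and ROC curve for the fine-tuned decision model (binary labels)."""
    ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
    RocCurveDisplay.from_predictions(y_true, y_score)   # y_score: predicted probability of "yes"
    plt.show()
```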
Example 2
As shown in FIG. 2, the invention also discloses an intelligent decision system based on a multi-modal pre-training large model, comprising a user data acquisition device for a user to input a decision problem;
a data processor configured with the multi-modal pre-training model, for making intelligent decisions through the multi-modal pre-training model according to the decision problem, generating decision results, and storing decision cases; and
an intelligent optimization module for acquiring the decision cases of the multi-modal pre-training model and constructing decision tag data, and for performing supervised training on the multi-modal pre-training model with the decision tag data to adjust the model parameters.
In one embodiment, the user data acquisition device is an electronic input device or a voice acquisition device.
In one embodiment, the system further comprises a visual operation device for a user to perform visual model evaluation through human-machine interaction.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts, reference may be made between the embodiments. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and for the relevant points reference may be made to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An intelligent decision method based on a multi-modal pre-training large model, characterized by comprising the following steps:
acquiring a decision problem, making an intelligent decision through a preset multi-modal pre-training model, generating a decision result, and storing a decision case;
obtaining the decision cases of the multi-modal pre-training model and constructing decision tag data; and performing supervised training with the decision tag data to adjust the model parameters of the multi-modal pre-training model.
2. The intelligent decision method based on a multi-modal pre-training large model according to claim 1, wherein the pre-training step of the multi-modal pre-training model comprises:
acquiring training data of multiple modalities;
extracting training features of the training data corresponding to each modality, uniformly encoding the training features, generating a tuple sequence corresponding to each modality, and constructing a multi-modal data set;
and performing joint training on a pre-constructed multi-modal data processing model through the tuple sequences corresponding to the modalities to obtain the model parameters of the multi-modal pre-training model.
3. The intelligent decision method based on a multi-modal pre-training large model according to claim 2, wherein the multi-modal training data comprises one or more of image data, video data, and text data.
4. The intelligent decision method based on a multi-modal pre-training large model according to claim 3, wherein extracting the training features of the training data corresponding to each modality, uniformly encoding the training features, and generating the tuple sequence corresponding to each modality is specifically:
for image data, the feature information is recorded as a tuple F1 = (C, O, P, R, …);
wherein C is the data modality type, O represents an object in the image, P is the position of the object in the image, and R represents other features, such as geometry, shape, amplitude, histogram, color, or local binary pattern;
for video data, images are extracted frame by frame to form an image set, and each image in the set is encoded as a tuple F2 = (C, O, P, R, T, …), where element T records the time information of the current frame;
for text data, features are extracted by natural language processing, and the text data tuple F3 can be encoded as (C, S, E, …), where S is the feature level and E is the environment information;
the multi-modal data set is Dstd = {D1, D2, D3, …, DN}.
5. The intelligent decision method based on a multi-modal pre-training large model according to claim 2, wherein the joint training specifically comprises:
acquiring training data of different data modality types in the multi-modal data set and merging them;
training a pre-constructed multi-modal data processing model with the merged data.
6. The intelligent decision method based on a multi-modal pre-training large model according to claim 5, wherein the merging methods comprise modality embedding, attention mechanisms, multi-view learning, or multi-task learning.
7. The intelligent decision method based on a multi-modal pre-training large model according to claim 1, wherein the step of constructing the decision tag data comprises:
creating a decision problem text;
calculating, according to text similarity, the distance l_m between the text of each decision case e_m and the decision problem text;
constructing a data Tag vector Tag = {(e_1, R_1, l_1), (e_2, R_2, l_2), (e_3, R_3, l_3), …}, where R is the decision case result;
and obtaining the training data format mapping rule of the multi-modal pre-training model, and converting the Tag vector into the pre-training data format according to the mapping rule to form the decision tag data.
8. An intelligent decision system based on a multi-modal pre-training large model, characterized by comprising:
a user data acquisition device for a user to input a decision problem;
a data processor configured with the multi-modal pre-training model, for making intelligent decisions through the multi-modal pre-training model according to the decision problem, generating decision results, and storing decision cases; and
an intelligent optimization module for acquiring the decision cases of the multi-modal pre-training model and constructing decision tag data, and for performing supervised training on the multi-modal pre-training model with the decision tag data to adjust the model parameters.
9. The intelligent decision system based on a multi-modal pre-training large model according to claim 8, wherein the user data acquisition device is an electronic input device or a voice acquisition device.
10. The intelligent decision system based on a multi-modal pre-training large model according to claim 8, further comprising a visual operation device for a user to perform visual model evaluation through human-machine interaction.
CN202310407938.4A 2023-04-17 2023-04-17 Intelligent decision method and system based on multi-mode pre-training large model Pending CN116484217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310407938.4A CN116484217A (en) 2023-04-17 2023-04-17 Intelligent decision method and system based on multi-mode pre-training large model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310407938.4A CN116484217A (en) 2023-04-17 2023-04-17 Intelligent decision method and system based on multi-mode pre-training large model

Publications (1)

Publication Number Publication Date
CN116484217A true CN116484217A (en) 2023-07-25

Family

ID=87213167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310407938.4A Pending CN116484217A (en) 2023-04-17 2023-04-17 Intelligent decision method and system based on multi-mode pre-training large model

Country Status (1)

Country Link
CN (1) CN116484217A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117114250A (en) * 2023-10-24 2023-11-24 广州知韫科技有限公司 Intelligent decision-making system based on large model
CN117114250B (en) * 2023-10-24 2024-02-02 广州知韫科技有限公司 Intelligent decision-making system based on large model
CN117290462A (en) * 2023-11-27 2023-12-26 北京滴普科技有限公司 Intelligent decision system and method for large data model
CN117290462B (en) * 2023-11-27 2024-04-05 北京滴普科技有限公司 Intelligent decision system and method for large data model

Similar Documents

Publication Publication Date Title
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
CN116484217A (en) Intelligent decision method and system based on multi-mode pre-training large model
CN110929092B (en) Multi-event video description method based on dynamic attention mechanism
CN111563508A (en) Semantic segmentation method based on spatial information fusion
CN105787458A (en) Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
JP2016062610A (en) Feature model creation method and feature model creation device
CN113792177B (en) Scene character visual question-answering method based on knowledge-guided deep attention network
CN110569359B (en) Training and application method and device of recognition model, computing equipment and storage medium
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
EP3884426B1 (en) Action classification in video clips using attention-based neural networks
CN113627266B (en) Video pedestrian re-recognition method based on transform space-time modeling
CN110853656B (en) Audio tampering identification method based on improved neural network
Alam et al. Two dimensional convolutional neural network approach for real-time bangla sign language characters recognition and translation
CN113837290A (en) Unsupervised unpaired image translation method based on attention generator network
CN116564338B (en) Voice animation generation method, device, electronic equipment and medium
Jiang et al. Hadamard product perceptron attention for image captioning
CN117421591A (en) Multi-modal characterization learning method based on text-guided image block screening
CN116958700A (en) Image classification method based on prompt engineering and contrast learning
CN114357221B (en) Self-supervision active learning method based on image classification
CN110135253A (en) A kind of finger vena identification method based on long-term recursive convolution neural network
CN115270917A (en) Two-stage processing multi-mode garment image generation method
CN115470799A (en) Text transmission and semantic understanding integrated method for network edge equipment
CN108921911B (en) Method for automatically converting structured picture into source code
Lu et al. Automatic lipreading based on optimized OLSDA and HMM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination