CN115048996A - Quality assessment model training and using method, equipment and storage medium - Google Patents

Quality assessment model training and using method, equipment and storage medium

Info

Publication number
CN115048996A
CN115048996A
Authority
CN
China
Prior art keywords
data
vectors
modal
quality
quality evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210658095.0A
Other languages
Chinese (zh)
Inventor
杨晓婷
史忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
58tongcheng Information Technology Co ltd
Original Assignee
58tongcheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 58tongcheng Information Technology Co ltd filed Critical 58tongcheng Information Technology Co ltd
Priority to CN202210658095.0A priority Critical patent/CN115048996A/en
Publication of CN115048996A publication Critical patent/CN115048996A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1053Employment or hiring

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

The embodiment of the application provides a quality assessment model training and using method, equipment and a storage medium. The quality assessment model is trained on vectors corresponding to the data of different modalities in multi-modal data, so that the model can be used to perform quality assessment on target multi-modal data. Before model training with the vectors of the multiple modalities, the vectors are fused, the fused vectors are subjected to binary classification, and model training is performed on the binarily classified fusion vectors. As a result, when the quality assessment model evaluates target multi-modal data, it is not only able to assess the data of each individual modality, but can also assess the target multi-modal data as a whole by taking into account the association relations between the different modalities, making the resulting quality assessment more accurate.

Description

Quality assessment model training and using method, equipment and storage medium
Technical Field
The present application relates to the field of model training technologies, and in particular, to a quality assessment model training method, a quality assessment model using method, a quality assessment model training apparatus, and a storage medium.
Background
In Internet content-publishing scenarios, much of the published content is multi-modal data. In a recruitment scenario, for example, the content of a posting can be divided, from the perspective of data modality, into text data, user behavior data, and auxiliary data describing user behavior. For enterprises, from a data-security perspective, compliance detection needs to be performed on published data to determine whether the data has been tampered with or subjected to malicious attacks.
When performing compliance detection on multi-modal data in a target scenario, model training is usually carried out on a large amount of historical multi-modal data from that scenario, yielding an assessment model for judging whether any multi-modal data in the scenario is compliant. Although this approach can detect the compliance of the data of each individual modality, when the data of different modalities are correlated, it is difficult to detect cases where each modality is compliant on its own but the combination of modalities is not. Therefore, a solution for assessing the compliance of multi-modal data as a whole is needed.
Disclosure of Invention
The application provides, from multiple aspects, a quality assessment model training and using method, equipment and a storage medium, which are used for performing model training and quality assessment on multi-modal data and determining the compliance of the multi-modal data.
The embodiment of the application provides a quality assessment model training method for multi-modal data, which comprises the following steps: acquiring N multi-modal sample data corresponding to N training tasks, wherein each multi-modal sample data comprises data of at least two modalities; vectorizing and fusing the N multi-modal sample data to obtain N fusion vectors, and performing binary classification on the N fusion vectors by using an activation function; performing first batch training on the quality evaluation model based on initialized evaluation parameters and the N classified fusion vectors to obtain a plurality of intermediate-state evaluation parameters corresponding to each batch of training; performing second batch training on the quality evaluation model based on the plurality of intermediate-state evaluation parameters and the N classified fusion vectors to obtain a loss-function sum corresponding to each batch of training; and determining a general evaluation parameter corresponding to the quality evaluation model according to the plurality of loss-function sums obtained by the second batch training.
In an optional embodiment, acquiring the N multi-modal sample data corresponding to the N training tasks includes: acquiring T multi-modal data in multiple scenarios, and randomly sampling N multi-modal sample data from the T multi-modal data to serve as the N training tasks; wherein N is a positive integer greater than 1, and N is less than or equal to T.
In an optional embodiment, vectorizing the N multi-modal sample data comprises: performing vector calculation on the text data in the N multi-modal sample data in a word-vector calculation mode to obtain word vectors in the N multi-modal sample data; performing vector processing on the behavior data in the N multi-modal sample data by adopting a graph neural network to obtain behavior vectors in the N multi-modal sample data; and performing vector processing on the auxiliary data in the N multi-modal sample data by adopting one-hot encoding to obtain the coding vectors in the N multi-modal sample data.
In an optional embodiment, performing fusion processing on the N multi-modal sample data to obtain the N fusion vectors includes: taking the vectors obtained by vectorizing each multi-modal sample data in groups of two, and calculating the outer product of each pair of vectors to obtain an intermediate vector; and flattening the intermediate vectors to obtain the fusion vectors respectively corresponding to the N multi-modal sample data.
In an optional embodiment, the N fusion vectors include X support vectors and Y query vectors; performing first batch training on the quality evaluation model based on the initialized evaluation parameters and the N classified fusion vectors to obtain a plurality of intermediate-state evaluation parameters corresponding to each batch of training includes: acquiring a corresponding number of first fusion vectors in batches from the X support vectors according to a set single-sample number, and, based on the initialized evaluation parameters and a first loss function, sequentially performing first gradient-descent calculation on the quality evaluation model by using the first fusion vectors acquired in batches, to obtain, for each calculation, a number of intermediate-state evaluation parameters equal to the single-sample number.
In an optional embodiment, performing second batch training on the quality evaluation model based on the plurality of intermediate-state evaluation parameters and the N binarily classified fusion vectors to obtain a loss-function sum corresponding to each batch of training includes: obtaining a corresponding number of second fusion vectors in batches from the Y query vectors according to the set single-sample number; based on the intermediate-state evaluation parameters obtained each time and a second loss function, sequentially performing second gradient-descent calculation on the quality evaluation model by using the second fusion vectors obtained in batches, to obtain the loss corresponding to each calculation; and calculating the sum of the losses obtained by the second fusion vectors under the intermediate-state evaluation parameters respectively used at each second gradient descent.
In an optional embodiment, determining the general evaluation parameter corresponding to the quality evaluation model according to the plurality of loss-function sums obtained by the second batch training includes: determining the intermediate-state evaluation parameter corresponding to the minimum loss-function sum among the plurality of loss-function sums obtained by the second batch training; and taking the intermediate-state evaluation parameter corresponding to the minimum loss-function sum as the general evaluation parameter corresponding to the quality evaluation model.
In an optional embodiment, further comprising: acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into the quality evaluation model; and in the quality evaluation model, performing quality evaluation on the target multi-modal data according to the general evaluation parameters, wherein the quality evaluation result represents the quality of the target multi-modal data.
In an optional embodiment, in the case of acquiring T multi-modal data in multiple scenarios, the method further includes: randomly sampling M multi-modal sample data from the T multi-modal data as M test tasks, wherein M is a positive integer greater than 1, and (N + M) is less than or equal to T; and vectorizing and fusing the M multi-modal sample data to obtain M fusion vectors, and performing binary classification on the M fusion vectors by using an activation function.
In an optional embodiment, the M multi-modal sample data include the target multi-modal data, the M fusion vectors obtained by subjecting the M multi-modal sample data to vectorization processing and fusion processing include P support vectors and Q query vectors, and P is smaller than N; before inputting the target multi-modal data into the quality assessment model, the method further comprises: performing third batch training on the quality evaluation model by using the P support vectors, and fine-tuning the general evaluation parameters.
In an optional embodiment, performing quality assessment on the target multi-modal data according to the general evaluation parameters inside the quality assessment model includes: vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data; performing binary classification on the fusion vector corresponding to the target multi-modal data by using an activation function; and predicting the binarily classified fusion vector based on the fine-tuned general evaluation parameters and the Q query vectors to obtain the ratio between the two classes of the fusion vector, and outputting the ratio as the quality assessment result.
The embodiment of the present application further provides a method for using a quality evaluation model, including: acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into a quality evaluation model; and performing quality evaluation on the target multi-modal data according to the general evaluation parameters in the quality evaluation model, wherein the quality evaluation result represents the quality of the target multi-modal data.
In an optional embodiment, performing quality assessment on the target multi-modal data according to the general evaluation parameters inside the quality assessment model comprises: vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data, and performing binary classification on the fusion vector by using an activation function; and predicting the binarily classified fusion vector according to the evaluation model parameters to obtain the ratio between the two classes of the fusion vector, and outputting the ratio as the quality assessment result.
The embodiment of the present application further provides a quality assessment model training device for multimodal data, including: a memory having a computer program stored therein and a processor for executing the computer program for performing the steps of any of the methods.
The embodiment of the present application further provides a quality assessment model using device, including: a memory having a computer program stored therein and a processor for executing the computer program for performing the steps of any of the methods.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program or instructions which, when executed by a processor, cause the processor to implement the steps of any of the methods.
In the embodiment of the application, the quality evaluation model is trained on vectors corresponding to the data of different modalities in multi-modal data, so that the model can be used to perform quality assessment on target multi-modal data. Before model training with the vectors of the multiple modalities, the vectors are fused, the fused vectors are subjected to binary classification, and model training is performed on the binarily classified fusion vectors. Therefore, when the quality evaluation model is used to assess target multi-modal data, it is not only able to assess the data of each individual modality, but can also assess the target multi-modal data as a whole by taking into account the association relations between the different modalities, making the resulting quality assessment more accurate. In addition, the quality assessment model provided by the embodiment of the application is trained on multi-modal data from various scenarios, is more universal in use, and places no limitation on the scenario corresponding to the target multi-modal data to be assessed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1a is a flowchart of a quality assessment model training method according to an embodiment of the present application;
FIG. 1b is a flowchart of a method for using a quality assessment model according to an embodiment of the present application;
FIG. 1c is a schematic diagram of a process for performing quality assessment on target multi-modal data according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a quality assessment model training device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Multi-modal data is involved in many Internet scenarios. For example, a content-publishing scenario involves text data corresponding to a post, behavior data such as a user browsing, clicking on, or bookmarking the post, and auxiliary data that qualifies and explains the post content and the user behavior. As another example, a customer-service scenario may involve text data from the communication between a customer-service agent and a user; behavior data such as the user initiating a consultation, browsing the communication content, and rating the service, or the agent receiving the consultation and providing the corresponding service; and auxiliary data that qualifies and explains the identity information and behavior of the agent and the user. However, Internet scenarios frequently suffer malicious attacks, in which, for example, data is tampered with or published abnormally. Taking a recruitment scenario as an example: a job posting is usually published in one city, but under a malicious attack the same posting may be published in different cities. Similarly, when a position has not been filled, its information is usually updated once per specified period, for example once a week, but under a malicious attack the position information may be updated continuously. Likewise, a posting originally describing a cleaning service may be tampered with into other content, such as postings for maternity-care workers, engineers, or technicians, or into content containing harmful information involving fraud, gambling, obscenity, and the like.
Therefore, for cases similar to those described above, it is necessary to perform compliance verification on the data in the target scenario.
However, the data of different modalities in multi-modal data are correlated to a certain extent: even if the data of every individual modality meets the scenario requirements, the multi-modal data as a whole may still fail to meet them if the association relations between the modalities do not. For example, in a recruitment scenario, the content, publishing time, and publishing city of recruitment post A may all meet the requirements, yet if a large number of copies of post A are published in the target city within the corresponding publishing window, it can be determined that post A was maliciously attacked at publication and constitutes an abnormal posting. As another example, the content, publishing time, and number of postings issued in the target city may all meet the requirements, that is, only one recruitment post B is published in the target city, but if post B is also published in other cities within the corresponding publishing window, it can likewise be determined that post B was maliciously attacked at publication and constitutes an abnormal posting. Accordingly, in verifying the compliance of multi-modal data, the association relations between the data of different modalities must be verified together with the data itself. Therefore, to avoid considering data characteristics only from a single modality during model training, the model training method provided in the embodiment of the present application fuses the data of different modalities in multi-modal data and performs model training on the fused data, so as to train a general quality assessment model that comprehensively considers multi-modal data in different scenarios.
The quality assessment model training method provided by the embodiment of the application is described below by taking a recruitment scene as an example and combining the accompanying drawings.
Fig. 1a is a flowchart of a quality assessment model training method provided in an embodiment of the present application, and as shown in fig. 1a, the method includes:
s1a, acquiring N multi-mode sample data corresponding to N training tasks, wherein each multi-mode sample data comprises data of at least two modes;
s2a, vectorizing and fusing the N multi-modal sample data to obtain N fusion vectors, and performing binary classification on the N fusion vectors by using an activation function;
s3a, performing first batch training on the quality evaluation model based on the initialized evaluation parameters and the N classified fusion vectors to obtain a plurality of intermediate state evaluation parameters corresponding to each batch of training;
s4a, performing second batch training on the quality evaluation model based on the plurality of intermediate-state evaluation parameters and the N classified fusion vectors to obtain a loss-function sum corresponding to each batch of training;
and S5a, determining a general evaluation parameter corresponding to the quality evaluation model according to the plurality of loss-function sums obtained by the second batch training.
In the embodiment of the present application, model training may be performed by using multi-modal data from multiple scenarios to obtain the quality assessment model provided herein. The source of the multi-modal data is not limited. In an optional embodiment, in order to adapt to the data characteristics of the recruitment scenario, the multi-modal data used for model training may be online data corresponding to the recruitment scenario; for example, historical multi-modal data from approximately the last three months may be selected for model training. In another optional embodiment, the multi-modal data used for model training may be obtained from a business platform; for example, the data may be obtained from an anti-fraud platform and include labeled data identifying content unsuitable for the recruitment scenario, such as labels marking sensitive words that cannot be used in recruitment postings. In another optional embodiment, the online historical multi-modal data of the recruitment scenario and the multi-modal data provided by the business platform may be acquired simultaneously, and model training may be performed with both. By training on the online historical multi-modal data of the recruitment scenario and/or the multi-modal data provided by the business platform, the data characteristics of the recruitment scenario can be learned, and a model suitable for performing quality assessment on multi-modal data in the recruitment scenario can be obtained.
In the embodiment of the present application, each multi-modal data can serve as one training task. Suppose N training tasks are used. To ensure that the trained quality assessment model can not only assess the data of each modality in multi-modal data, but also assess the multi-modal data as a whole in combination with the association relations between the different modalities, after the N multi-modal sample data corresponding to the N training tasks are obtained, the data of each modality in the N multi-modal sample data can be vectorized and fused respectively to obtain N fusion vectors. Here, vectorizing the N multi-modal sample data means vectorizing the data of each modality in each of the N multi-modal sample data, and fusing the N multi-modal sample data means, for each of the N multi-modal sample data, fusing the vectors obtained by vectorizing the data of each modality it contains, thereby obtaining the N fusion vectors.
Further, the N fusion vectors may be binarily classified by using an activation function, so that model training is performed on the binarily classified fusion vectors. In the embodiment of the application, multi-modal data meeting the requirements of the recruitment scenario is called compliant data, and multi-modal data not meeting them is called non-compliant data. The purpose of the binary classification is to distinguish compliant data from non-compliant data in the multi-modal sample data; performing model training on the binarily classified fusion vectors allows the characteristics of compliant and non-compliant data to be learned, so that the trained quality assessment model can then assess target multi-modal data, with the assessment result representing the compliance of the target multi-modal data. The specific manner of binary classification of the fusion vectors is not limited. Optionally, the components of a fusion vector may be distinguished by the values 0 and 1, the ratio between 1-valued and 0-valued components is determined from their respective counts, and the compliance of the multi-modal data is then determined from that ratio. For example, a value of 1 indicates compliance and a value of 0 indicates non-compliance; if the proportion of 1-valued components in the fusion vector corresponding to the target multi-modal data is greater than 50%, the target multi-modal data is considered compliant.
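The 0/1 classification and ratio rule described above can be sketched as follows. This is a minimal illustration, not the embodiment's actual implementation: the sigmoid activation, 0.5 threshold, and sample component values are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify_fused_vector(fused, threshold=0.5):
    # Map each component of a fusion vector to 1 (compliant) or 0
    # (non-compliant) via a sigmoid activation function; the threshold
    # is an illustrative assumption.
    return [1 if sigmoid(v) > threshold else 0 for v in fused]

def compliance_ratio(labels):
    # Proportion of components classified as 1; per the example above,
    # data is judged compliant when this proportion exceeds 50%.
    return sum(labels) / len(labels)

labels = classify_fused_vector([2.0, -1.0, 0.5, 3.0])  # [1, 0, 1, 1]
ratio = compliance_ratio(labels)                       # 0.75 -> compliant
```

With three of four components classified as 1, the ratio 0.75 exceeds 50%, so this illustrative fusion vector would be judged compliant.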
In the embodiment of the application, an initial evaluation parameter can be set for the quality evaluation model. During model training, first batch training can be performed on the quality evaluation model based on the initial evaluation parameter and the N binarily classified fusion vectors to obtain the intermediate-state evaluation parameter corresponding to each batch of training. Further, with the intermediate-state evaluation parameters as the new evaluation parameters, second batch training is performed on the quality evaluation model by using the N binarily classified fusion vectors to obtain a plurality of loss-function sums, and the general evaluation parameter corresponding to the quality evaluation model is determined according to these loss-function sums, so that quality assessment can then be performed on target multi-modal data according to the general evaluation parameter. The specific processes of the first and second batch training are described in the following embodiments and are not detailed here.
In the embodiment of the present application, the specific implementation of the initial evaluation parameter is not limited, and it may differ depending on the quality evaluation model. Further, the specific type of the quality evaluation model is also not limited, and a suitable model can be selected for training according to the specific scenario requirements and the characteristics of the multi-modal data. Optionally, the embodiment of the present application is described by taking the optimization-based meta-learning framework MAML (Model-Agnostic Meta-Learning) as an example, in which case the initialized evaluation parameters are drawn from a normal distribution, but the application is not limited thereto.
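The two-stage batch training described above (first batch training producing intermediate-state evaluation parameters from support data, second batch training accumulating loss-function sums over query data, then selecting the parameter with the minimal sum) can be sketched numerically as follows. The scalar linear model, squared loss, and learning rate are illustrative assumptions, not the patent's actual quality evaluation model.

```python
def loss(w, batch):
    # Mean squared error of a 1-D linear model y = w * x over one batch.
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def grad(w, batch):
    # Gradient of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(support_batches, query_batches, w0=0.0, inner_lr=0.1):
    # First batch training: one gradient-descent step per support batch,
    # yielding one intermediate-state evaluation parameter per batch.
    intermediates = [w0 - inner_lr * grad(w0, b) for b in support_batches]
    # Second batch training: for each intermediate parameter, accumulate
    # the loss-function sum over the query batches.
    loss_sums = [sum(loss(w, b) for b in query_batches) for w in intermediates]
    # Select the intermediate parameter with the minimal loss-function sum
    # as the general evaluation parameter.
    best = min(range(len(loss_sums)), key=loss_sums.__getitem__)
    return intermediates[best], loss_sums[best]

# Toy support/query sets drawn from the relation y = 2x.
support = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 2.0), (3.0, 6.0)]]
query = [[(2.0, 4.0)], [(4.0, 8.0)]]
w_star, best_loss = train(support, query)  # w_star == 2.0, best_loss == 0.0
```

Note that this sketch follows the selection rule stated in the claims (argmin over the loss-function sums) rather than the classic MAML meta-gradient update; it is meant only to make the two-stage structure concrete.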
The following is a detailed description of the specific implementation of each step in the above method.
In the embodiment of the application, in order to ensure that the N multi-modal sample data used for model training are universal, when the N training tasks are obtained, T multi-modal data in multiple scenarios can first be acquired, and N multi-modal sample data are then randomly sampled from the T multi-modal data to serve as the N training tasks, where N is a positive integer greater than 1 and N is less than or equal to T. In this way, training on multi-modal data with overly homogeneous characteristics can be avoided. For example, if the selected N multi-modal sample data are all recruitment data for the same type of position, or mostly for the same or similar types of positions, a quality assessment model trained on them will struggle to meet the universality requirement. Further, once N multi-modal sample data meeting the requirement are obtained, vectorization and fusion processing can be performed on them. The specific manner of vectorizing the N multi-modal sample data is not limited; the types of data modalities contained in each multi-modal sample data differ, and the manner of vectorization may differ accordingly.
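The random sampling of N training tasks from the T collected multi-modal data can be sketched as below; the function name and placeholder data items are illustrative, not from the patent.

```python
import random

def sample_training_tasks(all_data, n, seed=None):
    # Randomly sample n multi-modal sample data from the T collected
    # multi-modal data to serve as n training tasks (n <= T). Sampling
    # is without replacement, so no task is selected twice, which helps
    # avoid an overly homogeneous training set.
    rng = random.Random(seed)
    return rng.sample(all_data, n)

# T = 100 collected multi-modal data items (stand-ins), N = 5 tasks.
collected = [f"multimodal_data_{i}" for i in range(100)]
tasks = sample_training_tasks(collected, 5, seed=42)
```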
In this embodiment of the present application, the modality type corresponding to each modality data is not limited. Optionally, taking the example that each multi-modal data includes the 3 modalities of text data, behavior data, and auxiliary data, when vectorizing the N multi-modal sample data, the text data, behavior data, and auxiliary data in each multi-modal sample data may each be vectorized in an adapted manner. Optionally, a word vector calculation mode may be adopted to perform vector calculation on the text data in the N multi-modal sample data to obtain word vectors; a graph neural network may be adopted to perform vector processing on the behavior data to obtain behavior vectors; and a one-hot encoding mode may be adopted to perform vector processing on the auxiliary data to obtain encoding vectors. For example, a fast text classification model (FastText) may be used to calculate the word vectors corresponding to the text data; for another example, a graph embedding model (LINE) may be used to determine the behavior vector corresponding to each behavior data and the similarity between behavior vectors; as another example, the auxiliary data may be vector-processed and classified using one-hot encoding. Of course, the above description is merely exemplary and not limiting.
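For illustration only, a minimal sketch of per-modality vectorization is given below; the FastText and LINE models themselves are not re-implemented here, so a toy token-hash embedding stands in for the word-vector step, while the auxiliary data uses genuine one-hot encoding (all names are illustrative):

```python
import zlib
import numpy as np

def one_hot(category, vocabulary):
    """One-hot encode an auxiliary field (e.g. a city or position-type code)."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(category)] = 1.0
    return vec

def text_vector(text, dim=8):
    """Toy stand-in for a FastText-style text vector: the mean of
    per-token pseudo-random embeddings seeded from the token bytes."""
    token_vecs = [
        np.random.default_rng(zlib.crc32(tok.encode())).normal(size=dim)
        for tok in text.lower().split()
    ]
    return np.mean(token_vecs, axis=0)

aux_vec = one_hot("beijing", ["beijing", "shanghai", "shenzhen"])
txt_vec = text_vector("senior backend engineer")
```

Seeding each token embedding from a checksum of the token bytes keeps the toy vectors deterministic across runs, which is the only property this sketch needs in place of a trained word-vector model.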
Accordingly, the embodiment of the present application does not limit the specific manner of fusing the N multi-modal sample data. Optionally, a method such as point-wise addition or vector concatenation (concatenate) may be adopted to fuse the vectors obtained after vectorizing the N multi-modal sample data. Taking an outer-product fusion method as an example, for each multi-modal sample data, every two vectors are taken as a group and the outer product of each pair is calculated to obtain an intermediate vector; the intermediate vectors are then flattened to obtain the fusion vectors respectively corresponding to the N multi-modal sample data.
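The outer-product-then-flatten fusion described above can be sketched as follows (illustrative, with hypothetical names; vector concatenation or point-wise addition would be drop-in alternatives):

```python
from itertools import combinations
import numpy as np

def fuse_outer(vectors):
    """Fuse modality vectors: take every two vectors as a group,
    compute the outer product of each pair as an intermediate vector,
    flatten it, and concatenate the flattened pieces into one fusion vector."""
    parts = [np.outer(a, b).ravel() for a, b in combinations(vectors, 2)]
    return np.concatenate(parts)

text_v = np.array([1.0, 2.0])
behavior_v = np.array([3.0, 4.0])
aux_v = np.array([5.0, 6.0])
fused = fuse_outer([text_v, behavior_v, aux_v])  # 3 pairs x 4 = 12 dims
```

Unlike plain concatenation, the outer product retains every cross term between two modalities, which is one way a fused vector can carry the association relations among modalities mentioned later in this embodiment.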
Based on the above, when the N fusion vectors corresponding to the N multi-modal sample data are obtained, model training can be performed based on the N fusion vectors to obtain a quality evaluation model capable of evaluating the compliance of target multi-modal data. In the embodiment of the present application, the number of training tasks used in each batch of training, that is, the number of fusion vectors, may be determined according to the set single-sample number (batch_size), and a corresponding number of fusion vectors are obtained per batch for model training. The embodiment of the present application does not limit the specific manner of batch training of the quality evaluation model. Optionally, the N fusion vectors may be divided into X support vectors (support set) and Y query vectors (query set); X and Y may be the same or different, which is not limited herein. Suppose the X support vectors are divided into K groups according to the set batch_size, and batch_size first fusion vectors are obtained from the X support vectors per batch, K times in total, that is, the first batch training is executed K times. Further, based on the initialized evaluation parameter and the first loss function, a first gradient descent calculation is sequentially performed on the quality evaluation model using the first fusion vectors obtained per batch, to obtain the batch_size intermediate state evaluation parameters corresponding to each calculation, where the first gradient descent calculation is as follows:
$$\theta_i = \theta - \eta \nabla_{\theta}\mathcal{L}_1\left(f_{\theta}\right)$$

wherein $\theta$ represents the evaluation parameter before each gradient descent calculation; $\eta$ represents the learning rate used for the first gradient descent calculation; $\nabla_{\theta}$ represents taking the derivative with respect to the evaluation parameter; $\mathcal{L}_1$ represents the first loss function; $\theta_i$ represents the evaluation parameter after each first gradient descent calculation. The intermediate state evaluation parameters respectively represent the quality evaluation results, under the initial evaluation parameter, of the multi-modal data corresponding to the batch_size fusion vectors used in each calculation.
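Under the assumption that the first gradient descent follows the MAML inner-loop form described above, a toy numeric sketch (hypothetical names, scalar parameters for brevity) is:

```python
def first_batch_training(theta0, support_batches, grad_fn, eta):
    """MAML-style inner loop: for each support batch, take one gradient
    step from the shared initial evaluation parameter theta0, producing
    one intermediate state evaluation parameter per batch:
    theta_i = theta0 - eta * grad_L1(theta0; batch)."""
    return [theta0 - eta * grad_fn(theta0, batch) for batch in support_batches]

# Toy first loss L1(theta; b) = 0.5 * (theta - b)**2, so grad = theta - b.
grad_l1 = lambda theta, b: theta - b
intermediates = first_batch_training(0.0, [1.0, 2.0, 3.0], grad_l1, eta=0.5)
# -> [0.5, 1.0, 1.5]
```

Each intermediate parameter starts from the same initial value, which is what later lets the second batch training compare their losses on equal footing.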
Further, suppose the Y query vectors are divided into L groups according to the set batch_size, and batch_size second fusion vectors are obtained from the Y query vectors per batch, L times in total, that is, the second batch training is executed L times. Further, based on the batch_size intermediate state evaluation parameters obtained from each first gradient descent calculation and the second loss function, a second gradient descent calculation is sequentially performed on the quality evaluation model using the second fusion vectors obtained per batch, to obtain the function loss sum corresponding to each calculation:
$$\sum_{i=1}^{batch\_size} f_i(\theta_i)$$

wherein $f_i(\theta_i)$ represents the function loss corresponding to 1 intermediate state evaluation parameter and 1 second fusion vector used in each second gradient descent calculation; $\sum_{i=1}^{batch\_size} f_i(\theta_i)$ represents the function loss sum over the batch_size intermediate state evaluation parameters and the batch_size second fusion vectors used in each second gradient descent calculation. The function loss sum corresponding to each second gradient descent calculation represents the quality evaluation result, under the respectively corresponding intermediate state evaluation parameters, of the batch_size second fusion vectors used in that calculation.
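Continuing the toy sketch (all names hypothetical), the function loss sum of each second batch pairs the intermediate state parameters from one first-batch step with one query batch, and the smallest sum identifies the best-performing parameters:

```python
def second_batch_loss_sums(intermediate_sets, query_batches, loss_fn):
    """For each second gradient descent step, sum the per-sample losses
    f_i(theta_i) of the batch_size intermediate state evaluation
    parameters over the matching query batch."""
    return [
        sum(loss_fn(theta, x) for theta, x in zip(thetas, batch))
        for thetas, batch in zip(intermediate_sets, query_batches)
    ]

loss = lambda theta, x: 0.5 * (theta - x) ** 2
sums = second_batch_loss_sums(
    [[0.5, 1.0], [1.5, 2.0]],      # intermediate parameters per step
    [[1.0, 1.0], [1.0, 1.0]],      # query batches
    loss,
)
best = min(range(len(sums)), key=sums.__getitem__)
```

Evaluating on held-out query vectors, rather than the support vectors the parameters were adapted on, is what makes the smallest loss sum a meaningful selection criterion.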
Further, when the L function loss sums corresponding to the second gradient descent calculations are obtained, the batch_size intermediate state evaluation parameters corresponding to the smallest of the L function loss sums may be determined and used as the general evaluation parameter corresponding to the quality evaluation model. Further optionally, in order to improve the accuracy of the general evaluation parameter, a third gradient descent calculation may be performed on the quality evaluation model based on the general evaluation parameter and a third loss function, to obtain an updated general evaluation parameter:
$$\omega = \mu - \lambda \nabla_{\mu}\mathcal{L}_3\left(f_{\mu}\right)$$

wherein $\mu$ represents the corresponding initial evaluation parameter before the third gradient descent calculation; $\lambda$ represents the learning rate used by the third gradient descent calculation; $\nabla_{\mu}$ represents taking the derivative with respect to the evaluation parameter; $\mathcal{L}_3$ represents the third loss function; $\omega$ represents the higher-precision general evaluation parameter after the third gradient descent calculation.
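Assuming the third gradient descent takes the outer-update form above, a one-line numeric sketch (hypothetical names) is:

```python
def third_gradient_step(mu, grad_l3, lam):
    """Refine the general evaluation parameter by one step of the third
    loss: omega = mu - lam * grad_L3(mu)."""
    return mu - lam * grad_l3(mu)

# Toy third loss L3(mu) = mu**2, so grad_L3(mu) = 2 * mu.
omega = third_gradient_step(1.0, lambda m: 2.0 * m, lam=0.25)  # -> 0.5
```

The same update form applies whether mu is a scalar, as here, or a full parameter vector with grad_l3 returning a gradient of matching shape.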
It should be noted that, in the embodiment of the present application, the specific types of the first loss function, the second loss function, and the third loss function are not limited, and include, but are not limited to, any one of mean squared error (MSE) loss, mean absolute error (MAE) loss, cross-entropy loss, and hinge loss. Accordingly, the embodiment of the present application also does not limit the relationship among the first, second, and third loss functions, which may be the same or different and may be determined according to actual requirements.
Based on the above, when the general evaluation parameter corresponding to the quality evaluation model is obtained, quality evaluation can be performed on target multi-modal data using the quality evaluation model and its corresponding general evaluation parameter, so as to predict the compliance of the target multi-modal data. Optionally, before performing the quality evaluation, target multi-modal data in a target scene may be acquired and input into the quality evaluation model; within the quality evaluation model, the target multi-modal data can be quality-evaluated according to the general evaluation parameter, and the quality evaluation result represents the compliance of the target multi-modal data. For example, if multi-modal data corresponding to a recruitment scene is to be evaluated, online data corresponding to the recruitment scene can be acquired and input into the quality evaluation model as the target multi-modal data, so as to predict, according to the quality evaluation result output by the quality evaluation model, whether the acquired online data meets the requirements of the recruitment scene. For example, the output quality evaluation result is the class ratio obtained after the fusion vector corresponding to the target multi-modal data is binary-classified: if the ratio of the vector value 1 is greater than 50%, it is determined that the target multi-modal data meets the requirements of the recruitment scene, and if the ratio of the vector value 0 is greater than 50%, it is determined that it does not.
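The 50% decision rule described above reduces to a simple threshold on the class-1 ratio (a sketch, with hypothetical names):

```python
def compliance_verdict(class1_ratio):
    """Map the binary-class output ratio to a compliance verdict: a
    class-1 ratio above 50% means the target multi-modal data meets the
    scene requirements; otherwise it does not."""
    return "compliant" if class1_ratio > 0.5 else "non-compliant"

verdicts = [compliance_verdict(r) for r in (0.73, 0.40)]
# -> ["compliant", "non-compliant"]
```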
It should be noted that the quality assessment model provided by the embodiment of the present application is trained based on multi-modal data in multiple scenes, including the target scene corresponding to the target multi-modal data to be evaluated, and a quality assessment model trained in this way is more universal. It should be further noted that, because the amount of data obtained online is relatively small, in order to improve the accuracy of quality evaluation on online data, before the quality evaluation, a batch of small sample data may be used to fine-tune the general evaluation parameter corresponding to the quality assessment model, so that quality evaluation is performed on the target multi-modal data based on the fine-tuned general evaluation parameter to obtain a more accurate quality evaluation result.
In the embodiment of the present application, the specific manner of acquiring the small sample data is not limited. In an optional embodiment, preset M multi-modal sample data may be used as the small sample data to fine-tune the general evaluation parameter; alternatively, in the case where T multi-modal data under multiple scenes have been acquired, M multi-modal sample data may be randomly sampled from the T multi-modal data as M test tasks to fine-tune the general evaluation parameter, where M is a positive integer greater than 1 and (N + M) is less than or equal to T. Further, when the M multi-modal sample data are obtained, vectorization processing and fusion processing may be performed on them to obtain M fusion vectors, the M fusion vectors are binary-classified using an activation function, and the general evaluation parameter is fine-tuned with the binary-classified M fusion vectors. Further optionally, in order to ensure that the fine-tuned quality assessment model can perform targeted quality evaluation on the target multi-modal data in the target scene, the obtained M multi-modal sample data may include the target multi-modal data or other multi-modal data in the target scene, so as to improve the adaptability of the quality evaluation parameter to the target scene.
In the embodiment of the present application, the specific manner of fine-tuning the quality evaluation model with small sample data is not limited. Optionally, the M fusion vectors obtained after the M multi-modal sample data are vectorized and fused may be divided into P support vectors and Q query vectors, where P is smaller than N. With the P support vectors as small sample data, before the target multi-modal data is input into the quality assessment model, a third batch training may be performed on the quality assessment model using the P support vectors to fine-tune the general evaluation parameter. The specific manner of the third batch training is not limited in the embodiment of the present application; it may be the same as or different from the first and second batch training manners and may be determined according to actual requirements.
Further, after the general evaluation parameter is fine-tuned with the small samples, the quality of the target multi-modal data can be evaluated based on the fine-tuned general evaluation parameter, and the compliance of the target multi-modal data can be predicted. Optionally, when performing quality evaluation on the target multi-modal data, vectorization processing and fusion processing can be performed on the target multi-modal data inside the quality evaluation model to obtain the fusion vector corresponding to the target multi-modal data, and this fusion vector is binary-classified using an activation function; further, the binary-classified fusion vector is predicted based on the fine-tuned general evaluation parameter and the Q query vectors to obtain the ratio of the target multi-modal data corresponding to the two classes, and this ratio is output as the quality evaluation result.
In the embodiment of the present application, the specific manner of adjusting the target multi-modal data according to the quality evaluation result is not limited. Optionally, when the quality evaluation result corresponding to the target multi-modal data is obtained, the quality evaluation result may be provided to a service terminal responsible for modifying the target multi-modal data, so that the service terminal adjusts the target multi-modal data accordingly. For example, in the recruitment scene, when the quality evaluation result corresponding to the recruitment data is obtained, the quality evaluation result can be provided to a corresponding audit terminal, and the audit terminal can determine from the quality evaluation result that the recruitment data has a problem and check and adjust the recruitment data so that it meets the requirements of the recruitment scene. For example, problem posts, such as a post published simultaneously in multiple cities or published repeatedly in a single city, can be identified and adjusted to meet the recruitment requirements.
It should be noted that, because the quality assessment model provided in the embodiment of the present application is trained based on multi-modal data in multiple scenes, in actual use the scene corresponding to the multi-modal data to be quality-evaluated is not limited. This embodiment describes the quality evaluation process for multi-modal data by taking the recruitment scene as an example; according to specific requirements, the process can also be applied to house renting/buying/selling scenes, vehicle renting/buying/selling scenes, e-commerce sales and service scenes, and the like, and can be used flexibly according to actual requirements. For the specific process of using the model training method in different scenes, reference may be made to this embodiment, which is not described here again.
Based on the above, an embodiment of the present application further provides a method for using a quality assessment model, and fig. 1b is a flowchart of the method for using the quality assessment model, as shown in fig. 1b, the method includes:
S1b, acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into a quality evaluation model;
S2b, performing quality evaluation on the target multi-modal data according to the general evaluation parameters in the quality evaluation model, wherein the quality evaluation result represents the quality of the target multi-modal data.
Optionally, when the quality of the target multi-modal data is evaluated according to the general evaluation parameters, vectorization processing and fusion processing can be performed on the target multi-modal data in the quality evaluation model to obtain the fusion vector corresponding to the target multi-modal data, and this fusion vector is binary-classified using an activation function; the binary-classified fusion vector is then predicted according to the model evaluation parameters to obtain the ratio corresponding to the two classes, and this ratio is output as the quality evaluation result.
It should be noted that, for the specific process of using the quality assessment model, reference may be made to the above embodiments, which is not described here again. In addition, the embodiment of the present application does not limit the type of the activation function used to binary-classify the fusion vector; for example, the activation function may be a Sigmoid function, a Softmax function, or another function.
FIG. 1c is a schematic diagram of a process of performing quality evaluation on target multi-modal data using the method for using a quality assessment model provided by the embodiment of the present application; the quality evaluation process is briefly described below with reference to FIG. 1c. The target multi-modal data includes data of three modalities, namely text data, behavior data, and auxiliary data; the vectorization processing models used are a FastText model, a LINE model, and a One-Hot model, respectively; the vector fusion mode used is the concatenate mode; and the activation function used is Softmax. As shown in FIG. 1c, when the target multi-modal data is obtained, the text data, behavior data, and auxiliary data in the target multi-modal data may be vectorized using the FastText model, the LINE model, and the One-Hot model, respectively, to obtain the corresponding text vector, behavior vector, and auxiliary vector. Further, the vectors of the three modalities are fused in the concatenate mode to obtain the fusion vector corresponding to the target multi-modal data, and the fusion vector is normalized using the Softmax function so as to binary-classify it. Finally, the binary-classified fusion vector is input into the quality evaluation model, and quality evaluation is performed on the target multi-modal data to obtain the corresponding quality evaluation result.
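The FIG. 1c pipeline, with a hypothetical linear head substituted for the trained evaluation parameters, might be sketched end-to-end as:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def evaluate(text_vec, behavior_vec, aux_vec, head_weights):
    """Sketch of FIG. 1c: concatenate the three modality vectors, project
    them to two logits with a (hypothetical) linear head, and normalize
    with softmax into a two-class compliance ratio."""
    fused = np.concatenate([text_vec, behavior_vec, aux_vec])
    return softmax(head_weights @ fused)  # head_weights: (2, len(fused))

probs = evaluate(np.ones(3), np.ones(2), np.ones(2), np.zeros((2, 7)))
# zero weights -> a uniform 50/50 two-class ratio
```

Subtracting the maximum logit before exponentiating is the standard numerically stable softmax; it changes nothing mathematically but avoids overflow for large logits.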
The quality evaluation model provided by the embodiment of the present application is trained based on the vectors corresponding to data of different modalities in multi-modal data, so the quality evaluation model can be used to perform quality evaluation on target multi-modal data. Before model training with the vectors of multiple modalities, the vectors are fused, the fused vectors are binary-classified, and model training is performed based on the binary-classified fusion vectors. Therefore, when the quality evaluation model is used to evaluate target multi-modal data, it is not only suitable for evaluating each modality of the target multi-modal data individually, but can also evaluate the target multi-modal data as a whole in combination with the association relations among different modalities, and the obtained quality evaluation result is more accurate. In addition, the quality assessment model provided by the embodiment of the present application is trained based on multi-modal data in multiple scenes, is more universal in use, and places no limitation on the scene corresponding to the target multi-modal data to be evaluated.
It should be noted that, for the specific implementation of each step of the method, reference may be made to the description of the corresponding parts in the foregoing embodiments, which is not described here again. The execution subjects of the steps of the method provided by the above embodiments may all be the same device, or different devices may serve as the execution subjects. For example, the execution subject of steps S1a to S5a may be device A; for another example, the execution subject of step S1a may be device A, and the execution subject of steps S2a to S5a may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations occurring in a specific order are included, but it should be clearly understood that the operations may be executed out of the order they appear herein or in parallel, and the order of the operations, such as S1a, S1b, etc., is merely used to distinguish between the various operations, and the order itself does not represent any order of execution. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Based on the above, an embodiment of the present application further provides a quality assessment model training device for multimodal data, fig. 2 is a schematic structural diagram of the quality assessment model training device, and as shown in fig. 2, the quality assessment model training device includes: a processor 21 and a memory 22 in which a computer program is stored; the processor 21 and the memory 22 may be one or more.
The memory 22 is mainly used for storing computer programs, and these computer programs can be executed by the processor 21, so that the processor 21 controls the quality assessment model training device to implement corresponding functions and complete corresponding actions or tasks. In addition to storing computer programs, memory 22 may also be configured to store other various data to support operations on the quality assessment model training apparatus. Examples of such data include instructions for any application or method operating on the quality assessment model training device.
The memory 22 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
In the embodiment of the present application, the implementation form of the processor 21 is not limited, and may be, for example but not limited to, a CPU, a GPU, or an MCU. The processor 21 may be regarded as the control system of the quality assessment model training device and may be configured to execute the computer program stored in the memory 22 to control the quality assessment model training device to implement corresponding functions and complete corresponding actions or tasks. It is worth noting that, depending on the implementation form and scene of the quality assessment model training device, the functions, actions, or tasks to be implemented may differ; accordingly, the computer programs stored in the memory 22 may also differ, and executing different computer programs causes the processor 21 to control the quality assessment model training device to implement different functions and complete different actions or tasks.
In some optional embodiments, as shown in fig. 2, the quality assessment model training apparatus may further include: a display 23, a power supply component 24, and a communication component 25. Only some of the components are schematically shown in fig. 2, which does not mean that the quality assessment model training apparatus only comprises the components shown in fig. 2, and the quality assessment model training apparatus may further comprise other components for different application requirements, for example, in the case of a voice interaction requirement, as shown in fig. 2, the quality assessment model training apparatus may further comprise an audio component 26. The components that the quality assessment model training apparatus may include may be determined according to the product form of the quality assessment model training apparatus, and are not limited herein.
In an embodiment of the present application, the processor, when executing the computer program in the memory, is configured to: acquire N multi-modal sample data corresponding to N training tasks, wherein each multi-modal sample data includes data of at least two modalities; perform vectorization processing and fusion processing on the N multi-modal sample data to obtain N fusion vectors, and binary-classify the N fusion vectors using an activation function; perform first batch training on the quality evaluation model based on the initialized evaluation parameter and the binary-classified N fusion vectors to obtain a plurality of intermediate state evaluation parameters corresponding to each batch of training; perform second batch training on the quality evaluation model based on the plurality of intermediate state evaluation parameters and the binary-classified N fusion vectors to obtain the function loss sum corresponding to each batch of training; and determine the general evaluation parameter corresponding to the quality evaluation model according to the plurality of function loss sums obtained by the second batch training.
In an optional embodiment, when acquiring N multi-modal sample data corresponding to N training tasks, the processor 21 is configured to: acquire T multi-modal data under multiple scenes, and randomly sample N multi-modal sample data from the T multi-modal data to serve as the N training tasks; wherein N is a positive integer greater than 1, and N is less than or equal to T.
In an alternative embodiment, the processor 21, when vectorizing the N multi-modal sample data, is configured to: perform vector calculation on the text data in the N multi-modal sample data in a word vector calculation mode to obtain word vectors; perform vector processing on the behavior data in the N multi-modal sample data using a graph neural network to obtain behavior vectors; and perform vector processing on the auxiliary data in the N multi-modal sample data in a one-hot encoding mode to obtain encoding vectors.
In an optional embodiment, when performing fusion processing on the N multi-modal sample data to obtain N fusion vectors, the processor 21 is configured to: take every two of the vectors obtained after vectorizing the N multi-modal sample data as a group and calculate the outer product of each pair to obtain an intermediate vector; and flatten the intermediate vectors to obtain the fusion vectors respectively corresponding to the N multi-modal sample data.
In an optional embodiment, the N fusion vectors include X support vectors and Y query vectors; the processor 21 is configured to, when performing first batch training on the quality evaluation model based on the initialized evaluation parameter and the binary-classified N fusion vectors to obtain a plurality of intermediate state evaluation parameters corresponding to each batch of training: obtain a corresponding number of first fusion vectors per batch from the X support vectors according to the set single-sample number, and, based on the initialized evaluation parameter and the first loss function, sequentially perform first gradient descent calculation on the quality evaluation model using the first fusion vectors obtained per batch, so as to obtain the corresponding number of intermediate state evaluation parameters for each batch of calculation.
In an optional embodiment, the processor 21, when performing second batch training on the quality evaluation model based on the plurality of intermediate state evaluation parameters and the binary-classified N fusion vectors to obtain the function loss sum corresponding to each batch of training, is configured to: obtain a corresponding number of second fusion vectors per batch from the Y query vectors according to the set single-sample number; based on the single-sample number of intermediate state evaluation parameters obtained each time and the second loss function, sequentially perform second gradient descent calculation on the quality evaluation model using the second fusion vectors obtained per batch, to obtain the function loss sum corresponding to each batch of calculation; wherein the function loss sum is the sum of the function losses of the second fusion vectors under the intermediate state evaluation parameters respectively used in each second gradient descent calculation.
In an alternative embodiment, when determining the general evaluation parameter corresponding to the quality evaluation model according to the plurality of function loss sums obtained by the second batch training, the processor 21 is configured to: determining an intermediate state evaluation parameter corresponding to the minimum function loss sum in the plurality of function loss sums obtained by the second batch training; and taking the intermediate state evaluation parameter corresponding to the minimum function loss sum as a general evaluation parameter corresponding to the quality evaluation model.
In an alternative embodiment, the processor 21 is further configured to: acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into a quality evaluation model; and in the quality evaluation model, performing quality evaluation on the target multi-modal data according to the general evaluation parameters, wherein the quality evaluation result represents the quality of the target multi-modal data.
In an optional embodiment, in the case of acquiring T multi-modal data under multiple scenes, the processor 21 is further configured to: randomly sample M multi-modal sample data from the T multi-modal data as M test tasks, wherein M is a positive integer greater than 1; perform vectorization processing and fusion processing on the M multi-modal sample data to obtain M fusion vectors, and binary-classify the M fusion vectors using an activation function; wherein (N + M) is less than or equal to T.
In an optional embodiment, the M multi-modal sample data include the target multi-modal data, and the M fusion vectors obtained by subjecting the M multi-modal sample data to vectorization processing and fusion processing include P support vectors and Q query vectors, where P is less than N; the processor 21 is further configured to, before inputting the target multi-modal data into the quality assessment model: perform third batch training on the quality assessment model using the P support vectors, and fine-tune the general evaluation parameter.
In an alternative embodiment, within the quality assessment model, the processor 21 is configured to, when performing the quality assessment on the target multi-modal data according to the general assessment parameters: vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data; performing binary classification on the fusion vector corresponding to the target multi-modal data by using an activation function; and predicting the binary-classified fusion vectors based on the fine-tuned general evaluation parameters and the Q query vectors to obtain the proportion of fusion vectors in each of the two classes, and outputting the proportion as the quality evaluation result.
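The binary classification and proportion output can be sketched as follows, with a sigmoid as the activation function; a simple linear scorer stands in for the model, and all names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def evaluate_quality(fused_vectors, weights, threshold=0.5):
    """Binary-classify each fusion vector with a sigmoid activation and
    return the proportion predicted positive as the quality score."""
    scores = [sigmoid(sum(w * x for w, x in zip(weights, v)))
              for v in fused_vectors]
    positive = sum(1 for s in scores if s >= threshold)
    return positive / len(fused_vectors)

# Four 1-dimensional fusion vectors; two score above the 0.5 threshold:
print(evaluate_quality([[2.0], [-2.0], [3.0], [-1.0]], [1.0]))  # 0.5
```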
It should be noted that, for specific functions of the processor in the quality assessment model training device, reference may be made to the above method embodiments, which are not described herein again.
Accordingly, the present application further provides a computer readable storage medium storing a computer program, where the computer program is capable of implementing the steps that can be performed by the quality assessment model training device in the foregoing method embodiments.
Based on the above, the embodiment of the present application further provides a quality assessment model using apparatus for multi-modal data, whose structure is similar to that shown in fig. 2 and can be seen in fig. 2. In the present embodiment, the quality assessment model using apparatus includes: a processor and a memory storing a computer program; there may be one or more of each.
The memory is mainly used for storing a computer program that can be executed by the processor, so that the processor controls the quality assessment model using apparatus to implement corresponding functions and complete corresponding actions or tasks. In addition to storing the computer program, the memory may be configured to store various other data to support operations on the quality assessment model using apparatus. Examples of such data include instructions for any application or method operating on the quality assessment model using apparatus.
The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
In the embodiment of the present application, the implementation form of the processor is not limited; for example, the processor may be, but is not limited to, a CPU, a GPU, or an MCU. The processor may be regarded as the control system of the quality assessment model using apparatus and may be configured to execute the computer program stored in the memory to control the apparatus to implement corresponding functions and complete corresponding actions or tasks. It should be noted that, depending on the implementation form of the quality assessment model using apparatus and the scene in which it is located, the functions, actions, or tasks to be implemented may differ; accordingly, the computer programs stored in the memory may also differ, and execution of different computer programs by the processor may control the quality assessment model using apparatus to implement different functions and complete different actions or tasks.
In some optional embodiments, the quality assessment model using apparatus may further comprise other components such as a display, a power component, and a communication component. Only some illustrative components are given here, which does not mean that the quality assessment model using apparatus includes only these components; for different application requirements, the apparatus may further include other components. For example, where voice interaction is required, the quality assessment model using apparatus may further include an audio component. The specific components included depend on the product form of the quality assessment model using apparatus and are not limited herein.
In an embodiment of the application, the processor, when executing the computer program in the memory, is configured to: acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into a quality evaluation model; and in the quality evaluation model, performing quality evaluation on the target multi-modal data according to the general evaluation parameters, wherein the quality evaluation result represents the quality of the target multi-modal data.
In an alternative embodiment, within the quality assessment model, the processor, when performing the quality assessment on the target multi-modal data according to the general assessment parameters, is configured to: vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data; performing binary classification on the fusion vector corresponding to the target multi-modal data by using an activation function; and predicting the binary-classified fusion vectors according to the evaluation model parameters to obtain the proportion of fusion vectors in each of the two classes, and outputting the proportion as the quality evaluation result.
It should be noted that, for specific functions of the processor in the quality assessment model using device, reference may be made to the above method embodiments, which are not described herein again.
Accordingly, the present application further provides a computer readable storage medium storing a computer program, which when executed, can implement the steps that can be performed by the quality assessment model using device in the above method embodiments.
The communication component in the above embodiments is configured to facilitate communication between the device in which the communication component is located and other devices in a wired or wireless manner. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, a mobile communication network such as 2G, 3G, 4G/LTE, or 5G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The display in the above embodiments includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply assembly of the above embodiments provides power to various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
The audio component in the above embodiments may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (16)

1. A quality assessment model training method for multi-modal data, comprising:
acquiring N multi-modal sample data corresponding to N training tasks, wherein each multi-modal sample data comprises data of at least two modalities;
vectorizing and fusing the N multi-modal sample data to obtain N fusion vectors, and performing binary classification on the N fusion vectors by using an activation function;
performing first batch training on the quality evaluation model based on the initialized evaluation parameters and the N classified fusion vectors to obtain a plurality of intermediate state evaluation parameters corresponding to each batch of training;
performing second batch training on the quality evaluation model based on the plurality of intermediate state evaluation parameters and the N classified fusion vectors to obtain a function loss sum corresponding to each batch of training;
and determining a general evaluation parameter corresponding to the quality evaluation model according to the sum of the loss of the plurality of functions obtained by the second batch training.
2. The method of claim 1, wherein obtaining N multi-modal sample data for N training tasks comprises:
acquiring T multi-modal data under multiple scenes, and randomly sampling N multi-modal sample data from the T multi-modal data to serve as N training tasks; wherein N is a positive integer greater than 1, and N is less than or equal to T.
3. The method of claim 1, wherein vectorizing the N multimodal sample data comprises:
performing vector calculation on the text data in the N multi-modal sample data in a word vector calculation mode to obtain word vectors in the N multi-modal sample data;
performing vector processing on the behavior data in the N pieces of multi-modal sample data by adopting a graph neural network to obtain behavior vectors in the N pieces of multi-modal sample data;
and performing vector processing on the auxiliary data in the N multi-modal sample data by adopting a one-hot encoding mode to obtain the encoding vectors in the N multi-modal sample data.
4. The method according to claim 3, wherein performing the fusion processing on the N multi-modal sample data to obtain the N fusion vectors comprises:
taking, as a group, every two of the N vectors obtained by subjecting the N multi-modal sample data to vectorization processing, and calculating the outer product of the two vectors in each group to obtain an intermediate vector;
and flattening the intermediate vectors to obtain the fusion vectors respectively corresponding to the N multi-modal sample data.
5. The method according to any one of claims 1-4, wherein the N fused vectors comprise X support vectors and Y query vectors;
performing first batch training on the quality evaluation model based on the initialized evaluation parameters and the N classified fusion vectors to obtain a plurality of intermediate state evaluation parameters corresponding to each batch of training, including:
obtaining a corresponding number of first fusion vectors in batches from the X support vectors according to a set per-batch sample number,
and based on the initialized evaluation parameters and a first loss function, sequentially performing first gradient descent calculation on the quality evaluation model by using the first fusion vectors obtained in batches, to obtain the intermediate state evaluation parameters corresponding to each calculation.
6. The method of claim 5, wherein performing the second batch training on the quality evaluation model based on the plurality of intermediate state evaluation parameters and the binary-classified N fusion vectors to obtain a function loss sum corresponding to each batch of training comprises:
obtaining a corresponding number of second fusion vectors in batches from the Q query vectors according to the set per-batch sample number;
based on the intermediate state evaluation parameters obtained for each batch and a second loss function, sequentially performing second gradient descent calculation on the quality evaluation model by using the second fusion vectors obtained in batches, to obtain a function loss corresponding to each calculation;
and calculating, for each second gradient descent, the sum of the function losses of the second fusion vectors under the intermediate state evaluation parameters respectively used.
7. The method of claim 6, wherein determining the general evaluation parameters corresponding to the quality evaluation model according to the sum of the loss of the plurality of functions obtained by the second batch training comprises:
determining an intermediate state evaluation parameter corresponding to the minimum function loss sum in the plurality of function loss sums obtained by the second batch training;
and taking the intermediate state evaluation parameter corresponding to the minimum function loss sum as a general evaluation parameter corresponding to the quality evaluation model.
8. The method of claim 7, further comprising:
acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into the quality evaluation model;
and in the quality evaluation model, performing quality evaluation on the target multi-modal data according to the general evaluation parameters, wherein the quality evaluation result represents the quality of the target multi-modal data.
9. The method of claim 8, wherein in the case of acquiring T multimodal data in a plurality of scenes, further comprising:
randomly sampling M multi-modal sample data from the T multi-modal data as M test tasks, wherein M is a positive integer greater than 1, and (N + M) is less than or equal to T;
vectorizing and fusing the M multi-modal sample data to obtain M fusion vectors, and performing binary classification on the M fusion vectors by using an activation function.
10. The method according to claim 9, wherein the M multi-modal sample data include the target multi-modal data, and the M fusion vectors obtained by subjecting the M multi-modal sample data to vectorization processing and fusion processing include P support vectors and Q query vectors, where P is smaller than N;
before inputting the target multi-modal data into the quality assessment model, further comprising:
and performing third batch training on the quality evaluation model by using the P support vectors, and finely adjusting the general evaluation parameters.
11. The method of claim 10, wherein performing the quality assessment on the target multi-modal data according to the general assessment parameters within the quality assessment model comprises:
vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data;
performing binary classification on the fusion vector corresponding to the target multi-modal data by using an activation function;
and predicting the binary-classified fusion vectors based on the fine-tuned general evaluation parameters and the Q query vectors to obtain the proportion of fusion vectors in each of the two classes, and outputting the proportion as the quality evaluation result.
12. A method of using a quality assessment model, comprising:
acquiring target multi-modal data in a target scene, and inputting the target multi-modal data into a quality evaluation model;
and performing quality evaluation on the target multi-modal data according to the general evaluation parameters in the quality evaluation model, wherein the quality evaluation result represents the quality of the target multi-modal data.
13. The method of claim 12, wherein performing a quality assessment on the targeted multi-modal data according to general assessment parameters within the quality assessment model comprises:
vectorizing and fusing the target multi-modal data in the quality evaluation model to obtain a fusion vector corresponding to the target multi-modal data, and performing binary classification on the fusion vector corresponding to the target multi-modal data by using an activation function;
and predicting the binary-classified fusion vectors according to the evaluation model parameters to obtain the proportion of fusion vectors in each of the two classes, and outputting the proportion as the quality evaluation result.
14. A quality assessment model training device for multimodal data, comprising: a memory having a computer program stored therein and a processor for executing the computer program for performing the steps of the method of any one of claims 1 to 11.
15. A quality assessment model using device for multimodal data, comprising: a memory having a computer program stored therein and a processor for executing the computer program for performing the steps of the method of any one of claims 12-13.
16. A computer-readable storage medium storing a computer program/instructions, which when executed by a processor causes the processor to carry out the steps of the method of any one of claims 1-13.
CN202210658095.0A 2022-06-10 2022-06-10 Quality assessment model training and using method, equipment and storage medium Pending CN115048996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210658095.0A CN115048996A (en) 2022-06-10 2022-06-10 Quality assessment model training and using method, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115048996A true CN115048996A (en) 2022-09-13

Family

ID=83162122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210658095.0A Pending CN115048996A (en) 2022-06-10 2022-06-10 Quality assessment model training and using method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115048996A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740525A (en) * 2023-08-16 2023-09-12 南京迅集科技有限公司 Intelligent manufacturing quality management method based on data fusion
CN116740525B (en) * 2023-08-16 2023-10-31 南京迅集科技有限公司 Intelligent manufacturing quality management method based on data fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination