CN116933896A - Super-parameter determination and semantic conversion method, device, equipment and medium

Info

Publication number
CN116933896A
Authority
CN
China
Prior art keywords
target
training
basic
model
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311189418.7A
Other languages
Chinese (zh)
Other versions
CN116933896B (en)
Inventor
廖金龙
许士芳
吴长平
姚建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Intelligent Technology Co ltd
Original Assignee
Shanghai Suiyuan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Intelligent Technology Co ltd filed Critical Shanghai Suiyuan Intelligent Technology Co ltd
Priority to CN202311189418.7A priority Critical patent/CN116933896B/en
Publication of CN116933896A publication Critical patent/CN116933896A/en
Application granted granted Critical
Publication of CN116933896B publication Critical patent/CN116933896B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/151 Transformation

Abstract

The invention discloses a super-parameter determination method, a semantic conversion method, and corresponding devices, equipment and media. The super-parameter determination method based on the double-tower model comprises: numerically comparing the target computing power of a target pre-training large model with the integrated computing power of the target hardware device; if the comparison result exceeds a preset threshold, training each target pre-training small model according to preset super-parameter combinations and judging the results, to generate a target training sample set; generating a basic feature sample set based on the preset training feature setting field and the preset convergence feature setting field; training a basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model; and traversing the basic super-parameter ranges corresponding to the target pre-training large model with the target double-tower model, to determine the target super-parameters of the target pre-training large model. With this technical scheme, the super-parameters of a pre-training large model can be obtained quickly, the success rate of large-model pre-training is improved, and the training cost is reduced.

Description

Super-parameter determination and semantic conversion method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for determining a super parameter and converting semantics.
Background
As computing power grows and data sets expand, pre-training large models keep increasing in size. Training such models therefore becomes increasingly difficult, and problems such as difficult convergence, slow convergence and unstable training are frequently encountered.
In the prior art, for a model with relatively few parameters, a parameter range can usually be set and traversed adaptively to search for a stable and reliable set of super-parameters for model training. If the super-parameters of a pre-training large model were determined in the same adaptive-traversal manner, however, the training time, training difficulty and training cost would all increase. How to improve the success rate of pre-training a large model while reducing the training cost is therefore a problem to be solved.
Disclosure of Invention
The invention provides a method, a device, equipment and a medium for determining super parameters and converting semantics, which can solve the problems of low success rate and high training cost of large model pre-training.
According to an aspect of the present invention, there is provided a super-parameter determination method based on a double-tower model, the method being applied to super-parameter determination for a pre-training large model deployed on a target hardware device, the method comprising:
acquiring the integrated computing power of the target hardware device and the basic computing power of each basic super-parameter corresponding to the target pre-training large model, calculating the target computing power of the target pre-training large model from the basic computing powers, and numerically comparing the target computing power with the integrated computing power to generate a comparison result;
if the comparison result exceeds a preset threshold, acquiring a pre-training small model set and preset super-parameter combinations, and training each target pre-training small model in the pre-training small model set according to the preset super-parameter combinations and judging the results, to generate a target training sample set;
determining the basic training features corresponding to each target pre-training small model in the target training sample set based on a preset training feature setting field, determining the basic convergence features corresponding to each target pre-training small model based on a preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set;
acquiring a basic double-tower model, and training the basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model;
acquiring each basic super-parameter corresponding to the target pre-training large model and the basic super-parameter range corresponding to each basic super-parameter, traversing the basic super-parameter range of each basic super-parameter with the target double-tower model, and determining the target super-parameters corresponding to the target pre-training large model.
According to another aspect of the present invention, there is provided a semantic conversion method, the method comprising:
obtaining a text to be converted, and inputting the text to be converted into a target semantic conversion model; the target semantic conversion model is a target pre-training large model obtained through training with the super-parameter determination method based on the double-tower model according to any embodiment of the invention;
and determining text characteristics corresponding to the text to be converted through the target semantic conversion model, and generating a corresponding target semantic conversion result based on the text characteristics.
According to another aspect of the present invention, there is provided a super-parameter determination apparatus based on a double-tower model, the apparatus being applied to super-parameter determination for a pre-training large model deployed on a target hardware device, the apparatus comprising:
a computing power judgment module, configured to acquire the integrated computing power of the target hardware device and the basic computing power of each basic super-parameter corresponding to the target pre-training large model, calculate the target computing power of the target pre-training large model from the basic computing powers, and numerically compare the target computing power with the integrated computing power to generate a comparison result;
a training sample generation module, configured to acquire a pre-training small model set and preset super-parameter combinations if the comparison result exceeds a preset threshold, and train each target pre-training small model in the pre-training small model set according to the preset super-parameter combinations and judge the results, to generate a target training sample set;
a feature sample generation module, configured to determine the basic training features corresponding to each target pre-training small model in the target training sample set based on the preset training feature setting field, determine the basic convergence features corresponding to each target pre-training small model based on the preset convergence feature setting field, and combine the basic training features and the basic convergence features to generate a basic feature sample set;
a double-tower model construction module, configured to acquire a basic double-tower model and train the basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model;
a super-parameter determination module, configured to acquire each basic super-parameter corresponding to the target pre-training large model and the basic super-parameter range corresponding to each basic super-parameter, traverse the basic super-parameter range of each basic super-parameter with the target double-tower model, and determine the target super-parameters corresponding to the target pre-training large model.
According to another aspect of the present invention, there is provided a semantic conversion apparatus, the apparatus comprising:
a data acquisition module, configured to acquire a text to be converted and input the text to be converted into a target semantic conversion model; the target semantic conversion model is a target pre-training large model obtained through training with the super-parameter determination method based on the double-tower model according to any embodiment of the invention;
a result generation module, configured to determine text features corresponding to the text to be converted through the target semantic conversion model, and generate a corresponding target semantic conversion result based on the text features.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, to enable the at least one processor to perform the super-parameter determination method based on the double-tower model according to any embodiment of the invention, or to perform the semantic conversion method according to any embodiment of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the super-parameter determination method based on the double-tower model according to any embodiment of the invention, or to perform the semantic conversion method according to any embodiment of the invention.
According to the technical scheme, the target computing power corresponding to the target pre-training large model is calculated from the basic computing powers and numerically compared with the integrated computing power of the target hardware device to generate a comparison result. If the comparison result exceeds a preset threshold, a pre-training small model set and preset super-parameter combinations are acquired, and each target pre-training small model in the pre-training small model set is trained according to the preset super-parameter combinations and the results are judged, to generate a target training sample set. The basic training features corresponding to each target pre-training small model are then determined based on the preset training feature setting field, the basic convergence features are determined based on the preset convergence feature setting field, and the two are combined to generate a basic feature sample set. A basic double-tower model is trained by back propagation on the basic feature sample set to generate a target double-tower model. Finally, each basic super-parameter corresponding to the target pre-training large model and its basic super-parameter range are acquired, the ranges are traversed with the target double-tower model, and the target super-parameters corresponding to the target pre-training large model are determined. In this way, the super-parameters of the pre-training large model can be obtained quickly, the success rate of large-model pre-training is improved, and the training cost is reduced.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a super-parameter determination method based on a double-tower model according to a first embodiment of the invention;
FIG. 2 is a flowchart of a super-parameter determination method based on a double-tower model according to a second embodiment of the invention;
FIG. 3 is a flowchart of a target super-parameter determination method according to the second embodiment of the invention;
FIG. 4 is a flowchart of a super-parameter determination method based on a double-tower model according to a third embodiment of the invention;
FIG. 5 is a flowchart of a fusion training feature vector generation method according to the third embodiment of the invention;
FIG. 6 is a flowchart of a fusion convergence feature vector generation method according to the third embodiment of the invention;
FIG. 7 is a flowchart of a target double-tower model generation method according to the third embodiment of the invention;
FIG. 8 is a flowchart of an alternative super-parameter determination method based on a double-tower model according to the third embodiment of the invention;
FIG. 9 is a flowchart of a semantic conversion method according to a fourth embodiment of the invention;
FIG. 10 is a schematic structural diagram of a super-parameter determination apparatus based on a double-tower model according to a fifth embodiment of the invention;
FIG. 11 is a schematic structural diagram of a semantic conversion apparatus according to a sixth embodiment of the invention;
FIG. 12 is a schematic structural diagram of an electronic device implementing the super-parameter determination method based on the double-tower model or the semantic conversion method according to an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a flowchart of a super-parameter determination method based on a double-tower model according to a first embodiment of the invention. This embodiment is applicable to determining the super-parameters of a pre-training large model deployed on a target hardware device. The method may be performed by a super-parameter determination apparatus based on the double-tower model, which may be implemented in hardware and/or software and may be configured in an electronic device, for example a computer device. As shown in FIG. 1, the method includes:
S110, acquiring the integrated computing power of the target hardware device and the basic computing power of each basic super-parameter corresponding to the target pre-training large model, calculating the target computing power of the target pre-training large model from the basic computing powers, and numerically comparing the target computing power with the integrated computing power to generate a comparison result.
The target hardware device may refer to the hardware device that carries the target pre-training large model, that is, the hardware device used when training the target pre-training large model. By way of example, the target hardware device may be an artificial intelligence (AI) chip, a distributed cluster, or the like. The integrated computing power may refer to the data-processing capability of the target hardware device.
The target pre-training large model may refer to a large model that needs to be trained. The superparameter may refer to a parameter that needs to be manually set in the deep learning model, including model structure superparameters and model training superparameters. By way of example, model structure superparameters may include the number of network layers, the size and length of each layer, the type of use of the activation function, and the paradigm of gradient clipping use, among others. Model training hyper-parameters may include learning rate, batch size, number of iterations, regularization parameters, amount of data used, training strategy related parameters, and the like. The basic super-parameters can refer to super-parameters which need to be set in the training process of the target pre-training large model.
The basic computing power may refer to the computing power required when processing each basic super-parameter; illustratively, it may be the processing time required for the data processing of the corresponding basic super-parameter. The target computing power may refer to the computing power required for processing all the basic super-parameters in the target pre-training large model; for example, it may be the sum of all the basic computing powers. The comparison result may refer to the difference between the target computing power and the integrated computing power.
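As an illustration of S110, a minimal sketch of this comparison in Python follows. The additive cost model and the ratio-style comparison are assumptions for illustration only, since the embodiment does not fix a concrete formula; all names and numbers are hypothetical.

```python
# Illustrative sketch of S110; the cost model and threshold semantics are
# assumptions, as the embodiment does not prescribe a concrete formula.
def compare_computing_power(base_powers, integrated_power, threshold):
    """base_powers: per-super-parameter compute costs (e.g., FLOPs);
    integrated_power: total compute the target hardware device can supply."""
    target_power = sum(base_powers)               # target computing power
    comparison = target_power / integrated_power  # numerical comparison result
    # If the result exceeds the preset threshold, conventional training is too
    # expensive and the super-parameters should be determined in advance.
    return comparison, comparison > threshold

comparison, needs_search = compare_computing_power(
    base_powers=[3.2e21, 1.1e21, 0.7e21],  # hypothetical per-parameter costs
    integrated_power=2.0e21,
    threshold=1.0,
)
```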
S120, if the comparison result exceeds a preset threshold, acquiring a pre-training small model set and preset super-parameter combinations, and training each target pre-training small model in the pre-training small model set according to the preset super-parameter combinations and judging the results, to generate a target training sample set.
The preset threshold may be a preset value for evaluating the comparison result. For example, if the comparison result exceeds the preset threshold, conventional training of the target pre-training large model on the target hardware device would take too long, so the super-parameters of the target pre-training large model need to be determined in advance.
Wherein the pre-trained small model may refer to a deep learning model with fewer model parameters than the pre-trained large model. The pre-training small model set may refer to a model space composed of pre-training small models of different weight parameter sizes and pre-training small models of different network structures. It is noted that the pre-training small model set includes a plurality of pre-training small models with the same parameter size but different model structures, different parameter sizes but the same model structure, and different parameter sizes and different model structures. The embodiment of the invention does not limit the specific model parameters and the structure of the pre-training small model.
The target pre-training small model may refer to a pre-training small model selected for subsequent operations in the target training sample set. The preset superparameter combinations may refer to preset superparameter combinations comprising different superparameter combination types. For example, the preset hyper-parameter combinations may include preset training hyper-parameters and preset structural hyper-parameters. The embodiment of the invention does not limit the types of the preset training super parameters and the preset structure super parameters in the preset super parameter combination. The target training sample set may refer to a sample set obtained after actual training of each target pre-training small model according to a preset super-parameter combination. For example, the set of target training samples may contain the resulting features corresponding to each target pre-training small model.
S130, determining basic training features corresponding to all target pre-training small models in a target training sample set based on preset training feature setting fields, determining basic convergence features corresponding to all target pre-training small models in the target training sample set based on preset convergence feature setting fields, and combining the basic training features and the basic convergence features to generate a basic feature sample set.
Wherein, the basic training features may refer to basic features in the target pre-training small model training results. Illustratively, the underlying training features may include the size of the model, relevant parameters of the model network structure, configuration of the super-parameters, and relevant training strategies, among others. The relevant parameters of the model network structure may specifically include the number and size of hidden layers, loss functions, etc. The configuration of the super-parameters may specifically include learning rate, batch size, iteration number, regularization parameters, amount of data used, activation function type, use of gradient clipping and its paradigm, etc. The preset training feature setting field may refer to a field for identifying and extracting basic training features corresponding to the target pre-training small model. For example, key fields of the features may be trained for each basis.
The basic convergence feature may refer to a feature representing convergence characteristics in the training result of the target pre-training small model. Illustratively, the base convergence feature may include a loss function (loss) curve, a norm result variation curve of the gradient, a smoothness of a dynamic scale curve of the mix training, the number and location of spikes, a fit to a standard reference curve, a final convergence value of the curve, a test accuracy on a validation set, and expert convergence decision results, among others. The preset convergence feature setting field may refer to a field for identifying and extracting a basic convergence feature corresponding to the target pre-training small model. For example, key fields for each base convergence feature may be provided.
The basic feature sample set may refer to a combination of basic training features and basic convergence features corresponding to each target pre-training small model.
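A minimal sketch of how the basic feature sample set might be assembled from per-model training records; the setting-field names below are invented for illustration and are not the patent's actual fields.

```python
# Hypothetical setting fields; the patent only requires that training and
# convergence features be identified by preset key fields.
TRAINING_FEATURE_FIELDS = ["model_size", "hidden_layers", "learning_rate",
                           "batch_size", "grad_clip_norm"]
CONVERGENCE_FEATURE_FIELDS = ["final_loss", "loss_spike_count",
                              "val_accuracy", "expert_converged"]

def build_feature_sample_set(target_training_samples):
    """Pair the two feature groups extracted from each small model's record."""
    feature_samples = []
    for record in target_training_samples:
        training_features = {k: record[k] for k in TRAINING_FEATURE_FIELDS}
        convergence_features = {k: record[k] for k in CONVERGENCE_FEATURE_FIELDS}
        feature_samples.append((training_features, convergence_features))
    return feature_samples
```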
S140, acquiring a basic double-tower model, and training the basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model.
The basic double-tower model may refer to a pre-constructed fusion feature extraction model. For example, the basic double-tower model may include a basic training feature extraction fusion device and a basic convergence feature extraction fusion device. The basic training feature extraction fusion device generates the corresponding fusion vector from the basic training features, and the basic convergence feature extraction fusion device generates the corresponding fusion vector from the basic convergence features, providing effective data support for subsequent operations. In the embodiment of the invention, both fusion devices may adopt a multi-layer perceptron model.
The target double-tower model may refer to the basic double-tower model after parameter training. For example, the target double-tower model may include a trained target training feature extraction fusion device and a trained target convergence feature extraction fusion device.
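Since the embodiment states that both fusion devices may adopt a multi-layer perceptron model, one plausible realization of the basic double-tower model in PyTorch is sketched below; the layer sizes and fused dimension are assumptions.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One feature extraction fusion device, realized as an MLP."""
    def __init__(self, in_dim: int, fused_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, fused_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class DoubleTowerModel(nn.Module):
    """Training-feature tower and convergence-feature tower; each maps its
    input vector into a common fused embedding space, without shared weights."""
    def __init__(self, train_dim: int, conv_dim: int, fused_dim: int = 64):
        super().__init__()
        self.training_tower = Tower(train_dim, fused_dim)
        self.convergence_tower = Tower(conv_dim, fused_dim)

    def forward(self, train_vec, conv_vec):
        return self.training_tower(train_vec), self.convergence_tower(conv_vec)
```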
S150, obtaining each basic super parameter corresponding to the target pre-training large model and a basic super parameter range corresponding to each basic super parameter, traversing the basic super parameter range corresponding to each basic super parameter according to the target double-tower model, and determining the target super parameter corresponding to the target pre-training large model.
The basic super-parameter range may refer to a preset value range of each basic super-parameter. The target hyper-parameters may refer to combinations of hyper-parameters corresponding to the final determined target pre-trained large model.
In this way, the target super-parameters can be used directly to train the target pre-training large model, avoiding the low training success rate and high training cost that would result from adaptively traversing for a stable and reliable set of super-parameters during the training of the target pre-training large model itself.
According to the technical scheme of this embodiment, the target computing power corresponding to the target pre-training large model is calculated from the basic computing powers and numerically compared with the integrated computing power of the target hardware device to generate a comparison result. If the comparison result exceeds a preset threshold, a pre-training small model set and preset super-parameter combinations are acquired, and each target pre-training small model in the pre-training small model set is trained according to the preset super-parameter combinations and the results are judged, to generate a target training sample set. The basic training features corresponding to each target pre-training small model are then determined based on the preset training feature setting field, the basic convergence features are determined based on the preset convergence feature setting field, and the two are combined to generate a basic feature sample set. A basic double-tower model is trained by back propagation on the basic feature sample set to generate a target double-tower model. Finally, each basic super-parameter corresponding to the target pre-training large model and its basic super-parameter range are acquired, the ranges are traversed with the target double-tower model, and the target super-parameters corresponding to the target pre-training large model are determined. In this way, the super-parameters of the pre-training large model can be obtained quickly, the success rate of large-model pre-training is improved, and the training cost is reduced.
Example two
FIG. 2 is a flowchart of a super-parameter determination method based on a double-tower model according to a second embodiment of the invention. On the basis of the above embodiment, this embodiment refines the operation of traversing the basic super-parameter range corresponding to each basic super-parameter according to the target double-tower model and determining the target super-parameters corresponding to the target pre-training large model, which may specifically include: acquiring the target convergence features and target model parameters corresponding to the target pre-training large model, and traversing the basic super-parameter range of each basic super-parameter with the target double-tower model, to obtain the preset training feature vectors corresponding to the basic super-parameter combinations under the target model parameters and the preset convergence feature vector corresponding to the target convergence features; calculating the basic cosine similarity values between the preset training feature vectors and the preset convergence feature vector, and determining the target cosine similarity value with the largest value among the basic cosine similarity values; and determining the target super-parameters corresponding to the target pre-training large model according to the target training feature vector corresponding to the target cosine similarity value. As shown in FIG. 2, the method includes:
S210, acquiring the integrated computing power of the target hardware device and the basic computing power of each basic super-parameter corresponding to the target pre-training large model, calculating the target computing power of the target pre-training large model from the basic computing powers, and numerically comparing the target computing power with the integrated computing power to generate a comparison result.
S220, if the comparison result exceeds a preset threshold, acquiring a pre-training small model set and preset super-parameter combinations.
S230, training each target pre-training small model in the pre-training small model set according to the preset super-parameter combinations to generate a training result.
The training result may be a result obtained after the actual training of each target pre-training small model according to the preset super-parameter combination. For example, the training results may include training sample results of the current training and corresponding base training feature results. It is noted that the basic training features may be preset during each training process to ensure the effectiveness of the target pre-training small model training process. The training sample result may be a sample result obtained from an input sample training. For example, if the input sample is a picture, the training sample result may be a result picture identifying the relevant feature.
Specifically, if the pre-training small model set includes m target pre-training small models and there are p preset super-parameter combinations, m×p training scenarios can be constructed, in which the m target pre-training small models are each actually trained with the p preset super-parameter combinations. The basic training features of each training process and the sample results generated after each training are then combined to generate the training results corresponding to the target pre-training small models.
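The m×p training scenarios can be enumerated directly; a sketch with hypothetical model names and super-parameter combinations (m = 3, p = 4):

```python
from itertools import product

# Hypothetical preset super-parameter combinations (p = 4).
preset_combinations = [
    {"learning_rate": 1e-3, "batch_size": 256},
    {"learning_rate": 1e-3, "batch_size": 512},
    {"learning_rate": 3e-4, "batch_size": 256},
    {"learning_rate": 3e-4, "batch_size": 512},
]
small_models = ["model_125m", "model_350m", "model_760m"]  # m = 3

# m * p = 12 training scenarios, one per (model, combination) pair.
scenarios = list(product(small_models, preset_combinations))
```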
S240, generating a convergence result corresponding to the training result according to a preset judgment strategy.
The preset determination policy may refer to a preset convergence condition determination policy. The preset judgment strategy may be to judge the training result according to the convergence value of the convergence curve, or judge the training result according to the expert judgment result. The embodiment of the present invention is not limited thereto.
The convergence result may refer to a relevant result of performing convergence judgment on each training result according to a preset judgment policy. Illustratively, the convergence result may include a determination result, e.g., convergence or non-convergence, and may include a base convergence feature corresponding to the training result.
Specifically, after training the target pre-training small models in the pre-training small model set according to the preset super-parameter combination, generating a training result, performing convergence judgment on a training sample result in the training result according to a preset judgment strategy, and further generating a convergence result comprising a judgment result and corresponding basic convergence characteristics. Thereby providing an effective basis for subsequent operations.
It should be noted that, in order to ensure the quality and the balance of the data set, in the embodiment of the present invention, the balance of the determination result in the total convergence result may be further determined, that is, whether the number of samples in the total convergence result, in which the determination result is converged, is similar to the number of samples in which the determination result is not converged, is counted, and if not, the non-converged samples may be additionally constructed, so as to ensure the accuracy of the super parameter of the final target pre-training large model.
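A sketch of one possible preset judgment strategy together with the balance check described above; the convergence criterion and the balance ratio are assumptions, since the embodiment leaves both open (expert judgment is equally admissible).

```python
def judge_convergence(loss_curve, tolerance=1e-3, window=100):
    """Assumed strategy: converged if the loss changes negligibly over the
    final window of the convergence curve."""
    tail = loss_curve[-window:]
    return max(tail) - min(tail) < tolerance

def is_balanced(convergence_flags, ratio=0.8):
    """Check that converged and non-converged samples are similar in number;
    if not, additional non-converged samples should be constructed."""
    converged = sum(convergence_flags)
    non_converged = len(convergence_flags) - converged
    if min(converged, non_converged) == 0:
        return False
    return min(converged, non_converged) / max(converged, non_converged) >= ratio
```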
S250, combining training results and convergence results corresponding to the target pre-training small models to generate a target training sample set.
S260, determining basic training features corresponding to each target pre-training small model in the target training sample set based on the preset training feature setting field, determining basic convergence features corresponding to each target pre-training small model in the target training sample set based on the preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set.
S270, acquiring a basic double-tower model, and training the basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model.
S280, acquiring each basic super parameter corresponding to the target pre-training large model and a basic super parameter range corresponding to each basic super parameter.
S290, acquiring target convergence characteristics and target model parameters corresponding to the target pre-training large model, and traversing and processing basic super-parameter ranges corresponding to basic super-parameters according to the target double-tower model to obtain preset training characteristic vectors corresponding to basic super-parameter combinations corresponding to the target model parameters and preset convergence characteristic vectors corresponding to the target convergence characteristics.
Where the target convergence feature may refer to a convergence feature that the target pre-trained large model is expected to achieve. The target model parameters may refer to model parameters that the target pre-trained large model expects to set. For example, the parameter number may be included, the weight parameter may be included, and the like.
Under the condition of fixed target model parameters, the preset training feature vector can refer to a fusion training feature vector generated after the target double-tower model traverses and processes the basic super-parameter range corresponding to each basic super-parameter. The preset convergence feature vector may refer to a fusion convergence feature vector obtained by processing the target convergence feature through the target double-tower model.
Specifically, after the target double-tower model is generated and the target convergence feature and the target model parameter corresponding to the target pre-training large model are determined, the basic super-parameter range can be input into a target training feature extraction fusion device of the target double-tower model to output a preset training feature vector, and the target convergence feature is input into the target convergence feature extraction fusion device of the target double-tower model to output the preset convergence feature vector.
And S2100, calculating a basic cosine similarity value between a preset training feature vector and a preset convergence feature vector, and determining a target cosine similarity value with the largest value in the basic cosine similarity values.
The basic cosine similarity value may refer to the cosine similarity between a preset training feature vector and the preset convergence feature vector. Illustratively, if a preset training feature vector is a and the preset convergence feature vector is b, the basic cosine similarity value can be calculated as cos(a, b) = (a·b) / (‖a‖·‖b‖).
The target cosine similarity value may refer to the largest cosine similarity among all the basic cosine similarity values.
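Numerically, the selection reduces to an argmax over cosine similarities; a toy sketch with numpy, where all vectors are hypothetical:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: pick the preset training feature vector most similar to the
# preset convergence feature vector.
conv = np.array([1.0, 0.0, 1.0])
training_vectors = [np.array([0.9, 0.1, 1.1]), np.array([-1.0, 0.5, 0.0])]
target_index = max(range(len(training_vectors)),
                   key=lambda i: cosine_similarity(training_vectors[i], conv))
```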
S2110, determining the target super-parameters corresponding to the target pre-training large model according to the target training feature vector corresponding to the target cosine similarity value.
The target training feature vector may refer to the preset training feature vector corresponding to the target cosine similarity value.
Specifically, after the target cosine similarity value with the largest value among the basic cosine similarity values is determined, the corresponding target training feature vector can be determined, and the super-parameter combination within the basic super-parameter ranges that generated the target training feature vector can then be taken as the target super-parameters.
FIG. 3 is a flowchart of a target super-parameter determination method according to an embodiment of the invention. Specifically, the target convergence features and target model parameters corresponding to the target pre-training large model are first fixed. With the target model parameters fixed, the target training feature extraction fusion device in the target double-tower model traverses the basic super-parameter range of each basic super-parameter and outputs the preset training feature vectors, while the target convergence features are input to the target convergence feature extraction fusion device to output the preset convergence feature vector. The basic cosine similarity value between each preset training feature vector and the preset convergence feature vector is then calculated, the target cosine similarity value with the largest value is determined, and the corresponding target training feature vector is identified. Finally, the super-parameter combination within the basic super-parameter ranges that generated the target training feature vector is taken as the target super-parameters, completing the determination of the target super-parameters corresponding to the target pre-training large model.
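The traversal-and-selection flow of FIG. 3 can be sketched as follows, assuming the two trained towers of the target double-tower model are available as callables (for example, the PyTorch towers sketched in Example 1); the candidate enumeration is left abstract.

```python
import torch
import torch.nn.functional as F

def select_target_super_parameters(candidates, training_tower,
                                   convergence_tower, target_convergence_vec):
    """candidates: (super_param_combo, basic_training_vector) pairs produced by
    traversing each basic super-parameter range with the target model
    parameters held fixed. Returns the combo whose fused training features are
    most cosine-similar to the fused target convergence features."""
    conv_embedding = convergence_tower(target_convergence_vec)
    best_combo, best_sim = None, float("-inf")
    for combo, train_vec in candidates:
        train_embedding = training_tower(train_vec)
        sim = F.cosine_similarity(train_embedding, conv_embedding, dim=-1).item()
        if sim > best_sim:  # keep the target cosine similarity value
            best_combo, best_sim = combo, sim
    return best_combo, best_sim
```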
According to the technical scheme of this embodiment, the target computing power corresponding to the target pre-training large model is calculated from the basic computing powers and numerically compared with the integrated computing power of the target hardware device to generate a comparison result. If the comparison result exceeds a preset threshold, a pre-training small model set and preset super-parameter combinations are acquired, each target pre-training small model is trained according to the preset super-parameter combinations to generate training results, convergence results corresponding to the training results are generated according to a preset judgment strategy, and the training results and convergence results are combined into a target training sample set. The basic training features and basic convergence features of each target pre-training small model are then determined based on the preset training feature setting field and the preset convergence feature setting field and combined into a basic feature sample set, on which a basic double-tower model is trained by back propagation to generate a target double-tower model. The target convergence features and target model parameters of the target pre-training large model are acquired, and the basic super-parameter range of each basic super-parameter is traversed with the target double-tower model to obtain the preset training feature vectors and the preset convergence feature vector. Finally, the basic cosine similarity values between the preset training feature vectors and the preset convergence feature vector are calculated, the target cosine similarity value with the largest value is determined, and the target super-parameters corresponding to the target pre-training large model are determined from the corresponding target training feature vector. In this way, the super-parameters of the pre-training large model can be obtained quickly, the success rate of large-model pre-training is improved, and the training cost is reduced.
Example III
FIG. 4 is a flowchart of a super-parameter determination method based on a double-tower model according to a third embodiment of the invention. On the basis of the above embodiments, this embodiment refines the operation of training the basic double-tower model by back propagation on the basic feature sample set to generate a target double-tower model, which may specifically include: preprocessing the data of the basic feature sample set to obtain a target sample set, wherein each target sample in the target sample set includes target training features and target convergence features; acquiring the target training features corresponding to a target sample, and splicing the first discrete features and first continuous features in the target training features to generate a basic training feature vector; acquiring the target convergence features corresponding to the target sample, and splicing the second discrete features and second continuous features in the target convergence features to generate a basic convergence feature vector; generating, with the basic double-tower model, a fusion training feature vector corresponding to the basic training feature vector and a fusion convergence feature vector corresponding to the basic convergence feature vector, and training the basic double-tower model by back propagation using the difference relationship between the fusion training feature vector and the fusion convergence feature vector to generate the target double-tower model. As shown in FIG. 4, the method includes:
S310, acquiring the integrated computing power of the target hardware device and the basic computing power of each basic super-parameter corresponding to the target pre-training large model, calculating the target computing power of the target pre-training large model from the basic computing powers, and numerically comparing the target computing power with the integrated computing power to generate a comparison result.
S320, if the comparison result exceeds a preset threshold, acquiring a pre-training small model set and preset super-parameter combinations, and training each target pre-training small model in the pre-training small model set according to the preset super-parameter combinations and judging the results, to generate a target training sample set.
S330, determining basic training features corresponding to each target pre-training small model in the target training sample set based on the preset training feature setting field, determining basic convergence features corresponding to each target pre-training small model in the target training sample set based on the preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set.
S340, acquiring a basic double-tower model.
S350, preprocessing the data of the basic feature sample set to obtain a target sample set; each target sample in the target sample set includes target training features and target convergence features.
The data preprocessing may refer to a normalization process or a nesting process for samples in the basic feature sample set. The target sample may refer to a base feature sample after data preprocessing. The set of target samples may refer to a set of a full number of target samples.
The target training feature may refer to a training feature corresponding to the target sample. The target convergence feature may refer to a convergence feature corresponding to the target sample.
In an alternative embodiment, preprocessing the data of the basic feature sample set to obtain a target sample set includes: normalizing the continuous feature values corresponding to the basic training features in each basic feature sample to generate first continuous features, and normalizing the continuous feature values corresponding to the basic convergence features to generate second continuous features; nesting the discrete feature values corresponding to the basic training features in each basic feature sample to generate first discrete features, and nesting the discrete feature values corresponding to the basic convergence features to generate second discrete features; and combining the first continuous features with the first discrete features to generate the target training features, and combining the second continuous features with the second discrete features to generate the target convergence features, to obtain a target sample set containing the target training features and the target convergence features.
The continuous feature values may refer to features in the basic training features or basic convergence features whose values carry a decimal point, for example the learning rate. The normalization processing may refer to the operation of normalizing the continuous feature values in the basic training features; illustratively, each continuous feature value is normalized to the [0, 1] interval. The first continuous features may refer to the features generated by normalizing the continuous feature values corresponding to the basic training features, and the second continuous features may refer to the features generated by normalizing the continuous feature values corresponding to the basic convergence features.
The discrete feature values may refer to features in the basic training features or basic convergence features without a decimal point, for example the number of iterations. Nesting processing (that is, embedding) may refer to the operation of mapping the discrete feature values in a basic feature sample into vectors of one common dimension. The first discrete features may refer to the features generated by nesting the discrete feature values corresponding to the basic training features, and the second discrete features may refer to the features generated by nesting the discrete feature values corresponding to the basic convergence features.
Therefore, standard training features or convergence features can be generated by carrying out standardization processing on continuous feature values corresponding to the basic training features and the basic convergence features and carrying out nesting processing on discrete feature values corresponding to the basic training features and the basic convergence features, so that an effective foundation is provided for subsequent operation.
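A sketch of both preprocessing steps, assuming min-max normalization for the continuous feature values and a random lookup-table embedding for the discrete feature values; the embodiment does not prescribe a specific embedding scheme.

```python
import numpy as np

def normalize_continuous(values):
    """Min-max normalize each continuous feature column to the [0, 1] interval."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(axis=0), values.max(axis=0)
    return (values - lo) / np.where(hi > lo, hi - lo, 1.0)

def embed_discrete(values, n_dim, vocab, seed=0):
    """'Nesting': map each discrete feature value to an n-dimensional vector
    via a lookup table, so all discrete features share one dimension n."""
    rng = np.random.default_rng(seed)
    table = {v: rng.standard_normal(n_dim) for v in vocab}
    return np.stack([table[v] for v in values])
```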
It should be noted that, in the embodiment of the present invention, the basic feature sample in which the missing data exists in the basic feature sample set may also be deleted before the data preprocessing.
S360, acquiring target training features corresponding to the target samples, and splicing and processing first discrete features and first continuous features in the target training features to generate basic training feature vectors.
The stitching may refer to a process of stitching the first discrete feature and the first continuous feature in the target training feature into a one-dimensional vector. The basic training feature vector may refer to a feature vector generated after the target training feature is stitched.
In an optional embodiment, the obtaining the target training feature corresponding to the target sample, and the stitching process the first discrete feature and the first continuous feature in the target training feature, to generate a basic training feature vector, includes: sequentially accumulating and processing first discrete features in the target training features to obtain a first accumulated vector; and splicing the first accumulated vector and the first continuous feature according to a preset splicing strategy to generate a basic training feature vector.
The sequential accumulation processing may refer to accumulating the first discrete features in the target training features in order, and the first accumulated vector may refer to the vector so obtained. The preset splicing strategy may refer to a preset splicing-order strategy: illustratively, the accumulated vector may be placed before or after the continuous features, which the embodiment of the invention does not limit. It should be noted that once a preset splicing strategy is selected, it is adopted uniformly in all subsequent operations.
Specifically, if the number of first continuous features in the target training features is s and the dimension of the first discrete features is n, the first discrete features in the target training features are sequentially accumulated to obtain a first accumulated vector of dimension n, which is spliced with the first continuous features to generate a basic training feature vector with total dimension s + n. The parameter n can be customized according to the actual situation, and is generally set according to the feature extraction result of the fusion device.
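The s + n construction can be written out directly; here the sequential accumulation is realized as a sum over the n-dimensional discrete embeddings, which is one plausible reading of the embodiment.

```python
import numpy as np

def build_training_feature_vector(discrete_embeddings, continuous_features):
    """discrete_embeddings: shape (k, n), one n-dimensional vector per first
    discrete feature; continuous_features: shape (s,). Returns a vector of
    total dimension n + s, with the accumulated vector placed first (one of
    the two orders allowed by the preset splicing strategy)."""
    accumulated = discrete_embeddings.sum(axis=0)  # first accumulated vector, dim n
    return np.concatenate([accumulated, continuous_features])

vec = build_training_feature_vector(
    np.ones((3, 8)),        # k = 3 discrete features embedded in n = 8 dims
    np.array([0.5, 0.1]),   # s = 2 continuous features
)
assert vec.shape == (10,)   # total dimension s + n = 10
```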
S370, acquiring target convergence characteristics corresponding to the target samples, and splicing and processing second discrete characteristics and second continuous characteristics in the target convergence characteristics to generate basic convergence characteristic vectors.
The base convergence feature vector may refer to a feature vector generated after the target convergence feature is spliced.
In an optional implementation manner, the obtaining the target convergence feature corresponding to the target sample, and the stitching the second discrete feature and the second continuous feature in the target convergence feature, to generate a basic convergence feature vector, includes: sequentially accumulating and processing second discrete features in the target convergence features to obtain a second accumulated vector; and splicing the second accumulated vector and the second continuous feature according to a preset splicing strategy to generate a basic convergence feature vector.
The second accumulated vector may be a vector obtained by sequentially accumulating the second discrete features in the target convergence feature.
Specifically, if the number of second continuous features in the target convergence features is l and the dimension of the second discrete features is n, the second discrete features in the target convergence features are sequentially accumulated to obtain a second accumulated vector, which is spliced with the second continuous features to generate a basic convergence feature vector with total dimension l + n.
And S380, generating a fusion training feature vector corresponding to the basic training feature vector and a fusion convergence feature vector corresponding to the basic convergence feature vector by using the basic double-tower model, and reversely propagating and training the basic double-tower model by using the difference relation between the fusion training feature vector and the fusion convergence feature vector to generate a target double-tower model.
The fusion training feature vector may refer to a fusion vector output after the basic training feature vector is input to a basic training feature extraction fusion device in a basic double-tower model. The fusion convergence feature vector may refer to a fusion vector that is output after the basic convergence feature vector is input to a basic convergence feature extraction fusion device in a basic double-tower model.
FIG. 5 is a flowchart of a fusion training feature vector generation method according to an embodiment of the invention. Specifically, the basic training feature vector generated after splicing the target training features is acquired and input to the basic training feature extraction fusion device in the basic double-tower model, which then outputs the fusion training feature vector corresponding to the basic training feature vector.
FIG. 6 is a flowchart of a fusion convergence feature vector generation method according to an embodiment of the invention. Specifically, the basic convergence feature vector generated after splicing the target convergence features is acquired and input to the basic convergence feature extraction fusion device in the basic double-tower model, which then outputs the fusion convergence feature vector corresponding to the basic convergence feature vector.
In an optional embodiment, the training the basic double-tower model through back propagation by using the difference relation between the fusion training feature vector and the fusion convergence feature vector to generate a target double-tower model includes: minimizing the difference value between the fusion training feature vector and the fusion convergence feature vector according to a mean square error method to generate a target loss function; and training the basic double-tower model through back propagation according to the target loss function to generate a target double-tower model.
The Mean-Square Error (MSE) method may refer to a measurement method that reflects the degree of difference between an estimator and the quantity being estimated. The target loss function may refer to the loss obtained by minimizing the difference between the fusion training feature vector and the fusion convergence feature vector; the training of the basic double-tower model is driven by this target loss function.
Fig. 7 is a flowchart of a target double-tower model generating method according to an embodiment of the present invention. Specifically, the basic feature sample set is first data-preprocessed to obtain a target sample set; the first discrete features and the first continuous features in the target training features of each target sample are spliced to generate a basic training feature vector, and the second discrete features and the second continuous features in the target convergence features are spliced to generate a basic convergence feature vector, thereby completing the data processing. The basic double-tower model then generates the fusion training feature vector corresponding to the basic training feature vector and the fusion convergence feature vector corresponding to the basic convergence feature vector. Finally, the difference value between the fusion training feature vector and the fusion convergence feature vector is minimized by the mean square error method to generate a target loss function, and the basic double-tower model is trained through back propagation according to the target loss function to generate the target double-tower model.
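Continuing the sketch above, a minimal back-propagation loop under the mean-square-error objective might look as follows; the synthetic data, learning rate, and choice of the Adam optimizer are illustrative, not prescribed by the embodiment:

```python
import torch

model = TwoTowerModel(train_dim=12, conv_dim=12)       # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

# synthetic target sample set: pairs of spliced (training, convergence) vectors
dataset = [(torch.randn(12), torch.randn(12)) for _ in range(256)]

for epoch in range(10):
    for train_vec, conv_vec in dataset:
        fused_train, fused_conv = model(train_vec, conv_vec)
        loss = loss_fn(fused_train, fused_conv)  # minimize the fused-vector difference
        optimizer.zero_grad()
        loss.backward()                          # back-propagate through both towers
        optimizer.step()
```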
S390, obtaining each basic super parameter corresponding to the target pre-training large model and a basic super parameter range corresponding to each basic super parameter, traversing the basic super parameter range corresponding to each basic super parameter according to the target double-tower model, and determining the target super parameter corresponding to the target pre-training large model.
In an optional embodiment, after the determining the target super parameter corresponding to the target pre-training large model, the method further includes: acquiring target model parameters corresponding to the target pre-training large model, and training the target pre-training large model according to the target super parameters.
Specifically, after the target super-parameters corresponding to the target pre-training large model are determined, the target super-parameters can be directly utilized, and under the condition of fixing the target model parameters, the training of the target pre-training large model is realized.
According to the technical scheme of the embodiment of the invention, the target computing power corresponding to the target pre-training large model is calculated from the basic computing power corresponding to each basic super parameter, and is numerically compared with the comprehensive computing power of the target hardware device to generate a comparison result. If the comparison result exceeds the preset threshold, a pre-training small model set and a preset super-parameter combination are obtained, and each target pre-training small model in the set is trained and judged according to the preset super-parameter combination to generate a target training sample set. The basic training features corresponding to each target pre-training small model are determined based on the preset training feature setting field, the basic convergence features are determined based on the preset convergence feature setting field, and the two are combined to generate a basic feature sample set. The basic feature sample set is then data-preprocessed to obtain a target sample set; the first discrete features and the first continuous features in the target training features are spliced to generate basic training feature vectors, and the second discrete features and the second continuous features in the target convergence features are spliced to generate basic convergence feature vectors. The basic double-tower model generates the fusion training feature vectors corresponding to the basic training feature vectors and the fusion convergence feature vectors corresponding to the basic convergence feature vectors, and is trained through back propagation using the difference relation between the two to generate the target double-tower model. Finally, each basic super parameter corresponding to the target pre-training large model and its basic super parameter range are obtained, the ranges are traversed according to the target double-tower model, and the target super parameters corresponding to the target pre-training large model are determined. In this way, the super parameters of the pre-training large model can be quickly obtained, the success rate of large-model pre-training is improved, and the training cost is reduced.
FIG. 8 is a flowchart of an alternative super parameter determination method based on a double-tower model according to an embodiment of the present invention. Specifically, it is first judged whether the comparison result between the comprehensive computing power of the target hardware device and the target computing power corresponding to the target pre-training large model exceeds the preset threshold; if so, a pre-training small model set and a preset super-parameter combination are obtained, and each target pre-training small model in the set is trained and judged according to the preset super-parameter combination to generate a target training sample set. The basic feature sample set is then data-preprocessed to obtain a target sample set. The target training features corresponding to the target samples are obtained, and the first discrete features and first continuous features in them are spliced to generate basic training feature vectors, completing the extraction of training features; the target convergence features are obtained, and the second discrete features and second continuous features in them are spliced to generate basic convergence feature vectors, completing the extraction of convergence features. The basic double-tower model generates the fusion training feature vectors corresponding to the basic training feature vectors and the fusion convergence feature vectors corresponding to the basic convergence feature vectors, and is trained through back propagation using the difference relation between the two to generate the target double-tower model. Finally, each basic super parameter corresponding to the target pre-training large model and its basic super parameter range are obtained and traversed according to the target double-tower model, yielding the preset training feature vectors corresponding to each basic super-parameter combination and the preset convergence feature vectors corresponding to the target convergence features; the basic cosine similarity values between the preset training feature vectors and the preset convergence feature vectors are calculated, the target cosine similarity value with the largest value is determined, and the target super parameters corresponding to the target pre-training large model are determined according to the target training feature vector corresponding to the target cosine similarity value, realizing the prediction of the target super parameters.
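A sketch of this traversal-and-selection step, continuing the `TwoTowerModel` example above; the hyper-parameter ranges shown and the `encode_candidate` helper (standing in for splicing a super-parameter combination together with the fixed target model parameters into a training feature vector) are hypothetical:

```python
import itertools
import torch
import torch.nn.functional as F

# hypothetical basic super-parameter ranges, not values from the patent
ranges = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [256, 512, 1024],
    "warmup_steps": [500, 1000],
}

def encode_candidate(combo: dict) -> torch.Tensor:
    """Hypothetical stand-in: splice a super-parameter combination (and the
    fixed target model parameters) into a 12-dim training feature vector."""
    return torch.tensor(list(combo.values()), dtype=torch.float32).repeat(4)[:12]

target_convergence_vec = torch.randn(12)  # stand-in for the target convergence features
with torch.no_grad():
    fused_conv = model.convergence_tower(target_convergence_vec)

    best_combo, best_sim = None, -1.0
    for values in itertools.product(*ranges.values()):
        combo = dict(zip(ranges.keys(), values))
        fused_train = model.training_tower(encode_candidate(combo))
        sim = F.cosine_similarity(fused_train, fused_conv, dim=0).item()
        if sim > best_sim:                 # keep the largest cosine similarity
            best_combo, best_sim = combo, sim

print(best_combo)  # predicted target super parameters
```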
Example IV
Fig. 9 is a flowchart of a semantic conversion method according to a fourth embodiment of the present invention. The method may be applied to converting a text to be converted into a corresponding conversion result, and may be performed by a semantic conversion device, where the semantic conversion device may be implemented in hardware and/or software and may be configured in an electronic device, for example, a computer device. As shown in fig. 9, the method includes:
S410, acquiring a text to be converted, and inputting the text to be converted into a target semantic conversion model; the target semantic conversion model is a target pre-training large model and is obtained through training by the super-parameter determining method based on the double-tower model described above.
The text to be converted may refer to text that needs to be semantically converted, for example, question text or text containing a written description.
S420, determining text features corresponding to the text to be converted through the target semantic conversion model, and generating a corresponding target semantic conversion result based on the text features.
The text features may refer to sentence-level features in the text to be converted. For example, if the text to be converted is question text, the text features may be the key question fields; if the text to be converted is descriptive text, the text features may be the key description fields.

The target semantic conversion result may refer to the conversion result corresponding to the text to be converted, generated through the target semantic conversion model. For example, if the text to be converted is question text, the corresponding target semantic conversion result may be answer text; if the text to be converted is descriptive text, the corresponding target semantic conversion result may be a picture matching the description. The embodiment of the present invention is not limited thereto.
According to the technical scheme, the obtained text to be converted is input into the target semantic conversion model, further, text characteristics corresponding to the text to be converted are determined through the target semantic conversion model, and a corresponding target semantic conversion result is generated based on the text characteristics, so that the semantic conversion result can be generated rapidly, and the semantic conversion efficiency is improved.
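As a rough usage illustration only, the inference step could be wrapped as below; Hugging Face `transformers` and the `t5-small` checkpoint are stand-ins, since the patent does not specify the underlying target semantic conversion model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")            # placeholder checkpoint
converter = AutoModelForSeq2SeqLM.from_pretrained("t5-small")    # stand-in conversion model

def semantic_convert(text_to_convert: str) -> str:
    inputs = tokenizer(text_to_convert, return_tensors="pt")
    output_ids = converter.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(semantic_convert("translate English to German: Hello world"))
```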
Example five
Fig. 10 is a schematic structural diagram of a super parameter determining device based on a double-tower model according to a fifth embodiment of the present invention. The device is applied to a super parameter determination scene of a pre-training large model, and the pre-training large model is carried in target hardware equipment. As shown in fig. 10, the apparatus includes: a computing power judging module 510, a training sample generating module 520, a feature sample generation module 530, a double-tower model construction module 540 and a super parameter determining module 550;
The computing power judging module 510 is configured to obtain the comprehensive computing power of the target hardware device and the basic computing power of each basic super parameter corresponding to the target pre-training large model, calculate the target computing power corresponding to the target pre-training large model according to each basic computing power, and numerically compare the target computing power with the comprehensive computing power to generate a comparison result;
the training sample generating module 520 is configured to obtain a pre-training small model set and a preset super-parameter combination if the comparison result exceeds a preset threshold, train and determine to process each target pre-training small model in the pre-training small model set according to the preset super-parameter combination, and generate a target training sample set;
the feature sample generation module 530 is configured to determine basic training features corresponding to each target pre-training small model in the target training sample set based on the preset training feature setting field, determine basic convergence features corresponding to each target pre-training small model in the target training sample set based on the preset convergence feature setting field, and combine the basic training features with the basic convergence features to generate a basic feature sample set;
the double-tower model construction module 540 is configured to obtain a basic double-tower model, train the basic double-tower model according to the back propagation of the basic feature sample set, and generate a target double-tower model;
The super parameter determining module 550 is configured to obtain each basic super parameter corresponding to the target pre-training large model, and a basic super parameter range corresponding to each basic super parameter, and process the basic super parameter range corresponding to each basic super parameter according to the target double-tower model traversal, so as to determine a target super parameter corresponding to the target pre-training large model.
According to the technical scheme, the target computing power corresponding to the target pre-training large model is calculated from the basic computing power corresponding to each basic super parameter, and is numerically compared with the comprehensive computing power of the target hardware device to generate a comparison result. If the comparison result exceeds the preset threshold, a pre-training small model set and a preset super-parameter combination are obtained, and each target pre-training small model in the set is trained and judged according to the preset super-parameter combination to generate a target training sample set. The basic training features corresponding to each target pre-training small model are determined based on the preset training feature setting field, the basic convergence features are determined based on the preset convergence feature setting field, and the two are combined to generate a basic feature sample set. The basic double-tower model is then trained through back propagation according to the basic feature sample set to generate a target double-tower model. Finally, each basic super parameter corresponding to the target pre-training large model and its basic super parameter range are obtained, the ranges are traversed according to the target double-tower model, and the target super parameters corresponding to the target pre-training large model are determined. In this way, the super parameters of the pre-training large model can be quickly obtained, the success rate of large-model pre-training is improved, and the training cost is reduced.
Optionally, the dual-tower model building module 540 may specifically include: the device comprises a data preprocessing unit, a first feature vector generating unit, a second feature vector generating unit and a double-tower model constructing unit;
the data preprocessing unit is used for preprocessing the basic characteristic sample set to obtain a target sample set; wherein each target sample in the target sample set comprises a target training feature and a target convergence feature;
the first feature vector generation unit is used for acquiring target training features corresponding to the target samples, and splicing and processing first discrete features and first continuous features in the target training features to generate basic training feature vectors;
the second feature vector generation unit is used for acquiring target convergence features corresponding to the target samples, and splicing and processing second discrete features and second continuous features in the target convergence features to generate basic convergence feature vectors;
the double-tower model construction unit is used for generating a fusion training feature vector corresponding to the basic training feature vector and a fusion convergence feature vector corresponding to the basic convergence feature vector by utilizing the basic double-tower model, and generating a target double-tower model by utilizing the difference relation between the fusion training feature vector and the fusion convergence feature vector to reversely propagate and train the basic double-tower model.
Optionally, the data preprocessing unit may specifically be configured to: normalize the continuous feature values corresponding to the basic training features in each basic feature sample to generate first continuous features, and normalize the continuous feature values corresponding to the basic convergence features to generate second continuous features; nest the discrete feature values corresponding to the basic training features in each basic feature sample to generate first discrete features, and nest the discrete feature values corresponding to the basic convergence features to generate second discrete features; and combine the first continuous features with the first discrete features to generate target training features, and combine the second continuous features with the second discrete features to generate target convergence features, obtaining a target sample set containing the target training features and the target convergence features.
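A minimal sketch of this preprocessing, reading "nesting" as an embedding lookup; the min-max normalization, the vocabulary size, and the embedding dimension n = 8 are assumptions:

```python
import torch
from torch import nn

def min_max_normalize(values: torch.Tensor) -> torch.Tensor:
    """Normalize continuous feature values into [0, 1]."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo + 1e-8)

# "nesting" the discrete feature values is read here as an embedding lookup;
# vocabulary size 100 and dimension n = 8 are illustrative
embedding = nn.Embedding(num_embeddings=100, embedding_dim=8)

raw_continuous = torch.tensor([0.5, 3.0, 120.0, 7.5])  # basic training feature values
discrete_ids = torch.tensor([3, 17, 42])               # categorical feature ids

first_continuous = min_max_normalize(raw_continuous)   # first continuous features
first_discrete = embedding(discrete_ids)               # first discrete features, shape (3, 8)
```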
Optionally, the first feature vector generating unit may specifically be configured to: sequentially accumulating and processing first discrete features in the target training features to obtain a first accumulated vector; splicing the first accumulated vector and the first continuous feature according to a preset splicing strategy to generate a basic training feature vector;
the second feature vector generating unit may specifically be configured to: sequentially accumulating and processing second discrete features in the target convergence features to obtain a second accumulated vector; and splicing the second accumulated vector and the second continuous feature according to a preset splicing strategy to generate a basic convergence feature vector.
Optionally, the double-tower model building unit may be specifically configured to: minimizing the difference value between the fusion training feature vector and the fusion convergence feature vector according to a mean square error method to generate a target loss function; and training the basic double-tower model according to the target loss function back propagation to generate a target double-tower model.
Optionally, the super parameter determining module 550 may specifically be configured to: acquiring target convergence characteristics and target model parameters corresponding to a target pre-training large model, traversing and processing basic super parameter ranges corresponding to basic super parameters according to a target double-tower model to obtain preset training characteristic vectors corresponding to basic super parameter combinations corresponding to target model parameters and preset convergence characteristic vectors corresponding to target convergence characteristics; calculating a basic cosine similarity value between a preset training feature vector and a preset convergence feature vector, and determining a target cosine similarity value with the largest value in the basic cosine similarity value; and determining a target hyper-parameter corresponding to the target pre-training large model according to the target training feature vector corresponding to the target cosine similarity value.
Optionally, the training sample generating module 520 may specifically be configured to: training and processing each target pre-training small model in the pre-training small model set according to the preset super-parameter combination to generate a training result; generating a convergence result corresponding to the training result according to a preset judgment strategy; and combining training results and convergence results corresponding to the target pre-training small models to generate a target training sample set.
Optionally, the super parameter determining device based on the dual-tower model may further include: and the model training module is used for acquiring the target model parameters corresponding to the target pre-training large model after the target super parameters corresponding to the target pre-training large model are determined, and training and processing the target pre-training large model according to the target super parameters.
The super parameter determining device based on the double-tower model provided by the embodiment of the invention can execute the super parameter determining method based on the double-tower model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Example six
Fig. 11 is a schematic structural diagram of a semantic conversion device according to a sixth embodiment of the present invention. As shown in fig. 11, the apparatus includes: a data acquisition module 610 and a result generation module 620;
the data obtaining module 610 is configured to obtain a text to be converted, and input the text to be converted to a target semantic conversion model; the target semantic conversion model is a target pre-training large model, and is obtained through training by the double-tower model-based hyper-parameter determination method according to any embodiment of the invention;
the result generating module 620 is configured to determine, according to the target semantic conversion model, a text feature corresponding to the text to be converted, and generate a corresponding target semantic conversion result based on the text feature.
According to the technical scheme, the obtained text to be converted is input into the target semantic conversion model, further, text characteristics corresponding to the text to be converted are determined through the target semantic conversion model, and a corresponding target semantic conversion result is generated based on the text characteristics, so that the semantic conversion result can be generated rapidly, and the semantic conversion efficiency is improved.
Example seven
Fig. 12 shows a schematic diagram of an electronic device 710 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 12, the electronic device 710 includes at least one processor 720, and a memory, such as a read-only memory (ROM) 730 and a random access memory (RAM) 740, communicatively coupled to the at least one processor 720. The memory stores computer programs executable by the at least one processor, and the processor 720 may perform various suitable actions and processes according to the computer programs stored in the ROM 730 or loaded from the storage unit 790 into the RAM 740. The RAM 740 may also store various programs and data required for the operation of the electronic device 710. The processor 720, the ROM 730, and the RAM 740 are connected to each other by a bus 750. An input/output (I/O) interface 760 is also connected to the bus 750.
Various components in the electronic device 710 are connected to the I/O interface 760, including: an input unit 770 such as a keyboard, a mouse, etc.; an output unit 780 such as various types of displays, speakers, and the like; a storage unit 790 such as a magnetic disk, an optical disk, or the like; and a communication unit 7100 such as a network card, modem, wireless communication transceiver, etc. The communication unit 7100 allows the electronic device 710 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunications networks.
Processor 720 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of processor 720 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. Processor 720 performs the various methods and processes described above, such as a hyper-parametric determination method or a semantic conversion method based on a two-tower model.
The super parameter determining method based on the double-tower model comprises the following steps:
Acquiring comprehensive calculation force of target hardware equipment and basic calculation force of each basic super parameter corresponding to a target pre-training large model, calculating according to each basic calculation force to obtain target calculation force corresponding to the target pre-training large model, and numerically comparing the target calculation force with the comprehensive calculation force to generate a comparison result;
if the comparison result exceeds a preset threshold value, a pre-training small model set and a preset super-parameter combination are obtained, and each target pre-training small model in the pre-training small model set is trained and judged according to the preset super-parameter combination, so that a target training sample set is generated;
determining basic training features corresponding to each target pre-training small model in a target training sample set based on a preset training feature setting field, determining basic convergence features corresponding to each target pre-training small model in the target training sample set based on a preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set;
acquiring a basic double-tower model, and training the basic double-tower model according to the back propagation of the basic characteristic sample set to generate a target double-tower model;
obtaining each basic super parameter corresponding to the target pre-training large model and a basic super parameter range corresponding to each basic super parameter, traversing the basic super parameter range corresponding to each basic super parameter according to the target double-tower model, and determining the target super parameter corresponding to the target pre-training large model.
The semantic conversion method comprises the following steps:
obtaining a text to be converted, and inputting the text to be converted into a target semantic conversion model; the target semantic conversion model is a target pre-training large model, and is obtained through training by the double-tower model-based hyper-parameter determination method according to any embodiment of the invention;
and determining text characteristics corresponding to the text to be converted through the target semantic conversion model, and generating a corresponding target semantic conversion result based on the text characteristics.
In some embodiments, the two-tower model-based hyper-parameter determination method or semantic conversion method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 790. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 710 via the ROM 730 and/or the communication unit 7100. When the computer program is loaded into RAM 740 and executed by processor 720, one or more steps of the above-described two-tower model-based hyper-parameter determination method or semantic conversion method may be performed. Alternatively, in other embodiments, processor 720 may be configured to perform a two-tower model-based hyper-parameter determination method or a semantic conversion method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (13)

1. A super parameter determination method based on a double-tower model, wherein the method is applied to a super parameter determination scene of a pre-trained large model, the pre-trained large model being carried in target hardware equipment, the method comprising:
acquiring comprehensive calculation force of target hardware equipment and basic calculation force of each basic super parameter corresponding to a target pre-training large model, calculating according to each basic calculation force to obtain target calculation force corresponding to the target pre-training large model, and numerically comparing the target calculation force with the comprehensive calculation force to generate a comparison result;
If the comparison result exceeds a preset threshold value, a pre-training small model set and a preset super-parameter combination are obtained, and each target pre-training small model in the pre-training small model set is trained and judged according to the preset super-parameter combination, so that a target training sample set is generated;
determining basic training features corresponding to each target pre-training small model in a target training sample set based on a preset training feature setting field, determining basic convergence features corresponding to each target pre-training small model in the target training sample set based on a preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set;
acquiring a basic double-tower model, and training the basic double-tower model according to the back propagation of the basic characteristic sample set to generate a target double-tower model;
obtaining each basic super parameter corresponding to the target pre-training large model and a basic super parameter range corresponding to each basic super parameter, traversing the basic super parameter range corresponding to each basic super parameter according to the target double-tower model, and determining the target super parameter corresponding to the target pre-training large model.
2. The method of claim 1, wherein the training the basic double-tower model through back propagation according to the basic feature sample set to generate a target double-tower model comprises:
Preprocessing the basic characteristic sample set by data to obtain a target sample set; wherein each target sample in the target sample set comprises a target training feature and a target convergence feature;
acquiring target training features corresponding to target samples, and splicing and processing first discrete features and first continuous features in the target training features to generate basic training feature vectors;
acquiring a target convergence feature corresponding to a target sample, and splicing and processing a second discrete feature and a second continuous feature in the target convergence feature to generate a basic convergence feature vector;
generating a fusion training feature vector corresponding to the basic training feature vector and a fusion convergence feature vector corresponding to the basic convergence feature vector by using the basic double-tower model, and reversely propagating and training the basic double-tower model by using the difference relation between the fusion training feature vector and the fusion convergence feature vector to generate a target double-tower model.
3. The method of claim 2, wherein the data preprocessing the basic feature sample set to obtain a target sample set comprises:
normalizing the continuous feature values corresponding to the basic training features in each basic feature sample to generate first continuous features, and normalizing the continuous feature values corresponding to the basic convergence features to generate second continuous features;

nesting the discrete feature values corresponding to the basic training features in each basic feature sample to generate first discrete features, and nesting the discrete feature values corresponding to the basic convergence features to generate second discrete features;
and combining the first continuous feature with the first discrete feature to generate a target training feature, and combining the second continuous feature with the second discrete feature to generate a target convergence feature to obtain a target sample set containing the target training feature and the target convergence feature.
4. The method according to claim 2, wherein the obtaining the target training feature corresponding to the target sample, and concatenating the first discrete feature and the first continuous feature in the target training feature, generates a basic training feature vector, includes:
sequentially accumulating and processing first discrete features in the target training features to obtain a first accumulated vector;
splicing the first accumulated vector and the first continuous feature according to a preset splicing strategy to generate a basic training feature vector;
the obtaining the target convergence feature corresponding to the target sample, and splicing and processing the second discrete feature and the second continuous feature in the target convergence feature to generate a basic convergence feature vector, which comprises the following steps:
Sequentially accumulating and processing second discrete features in the target convergence features to obtain a second accumulated vector;
and splicing the second accumulated vector and the second continuous feature according to a preset splicing strategy to generate a basic convergence feature vector.
5. The method of claim 2, wherein the training the basic double-tower model through back propagation by using the difference relation between the fusion training feature vector and the fusion convergence feature vector to generate a target double-tower model comprises:
minimizing the difference value between the fusion training feature vector and the fusion convergence feature vector according to a mean square error method to generate a target loss function;
and training the basic double-tower model according to the target loss function back propagation to generate a target double-tower model.
6. The method of claim 1, wherein the traversing the basic super parameter range corresponding to each basic super parameter according to the target double-tower model to determine the target super parameter corresponding to the target pre-training large model comprises:
acquiring target convergence characteristics and target model parameters corresponding to a target pre-training large model, traversing and processing basic super parameter ranges corresponding to basic super parameters according to a target double-tower model to obtain preset training characteristic vectors corresponding to basic super parameter combinations corresponding to target model parameters and preset convergence characteristic vectors corresponding to target convergence characteristics;
Calculating a basic cosine similarity value between a preset training feature vector and a preset convergence feature vector, and determining a target cosine similarity value with the largest value in the basic cosine similarity value;
and determining a target hyper-parameter corresponding to the target pre-training large model according to the target training feature vector corresponding to the target cosine similarity value.
7. The method of claim 1, wherein the training and determining to process each target pre-training small model in the set of pre-training small models according to the preset hyper-parameter combinations, generating a set of target training samples, comprises:
training and processing each target pre-training small model in the pre-training small model set according to the preset super-parameter combination to generate a training result;
generating a convergence result corresponding to the training result according to a preset judgment strategy;
and combining training results and convergence results corresponding to the target pre-training small models to generate a target training sample set.
8. The method of claim 1, further comprising, after said determining the target hyper-parameters corresponding to the target pre-trained large model:
and acquiring target model parameters corresponding to the target pre-training large model, and training and processing the target pre-training large model according to the target super-parameters.
9. A method of semantic conversion, the method comprising:
obtaining a text to be converted, and inputting the text to be converted into a target semantic conversion model; wherein the target semantic conversion model is a target pre-training large model, and is obtained by training by the method of any one of claims 1-8;
and determining text characteristics corresponding to the text to be converted through the target semantic conversion model, and generating a corresponding target semantic conversion result based on the text characteristics.
10. A super parameter determining device based on a double-tower model, wherein the device is applied to a super parameter determining scene of a pre-training large model, the pre-training large model is carried in target hardware equipment, and the device comprises:
the calculation force judging module is used for acquiring the comprehensive calculation force of the target hardware equipment and the basic calculation force of each basic super parameter corresponding to the target pre-training large model, calculating according to each basic calculation force to obtain the target calculation force corresponding to the target pre-training large model, and numerically comparing the target calculation force with the comprehensive calculation force to generate a comparison result;
the training sample generation module is used for acquiring a pre-training small model set and a preset super-parameter combination if the comparison result exceeds a preset threshold value, training and judging each target pre-training small model in the pre-training small model set according to the preset super-parameter combination, and generating a target training sample set;
The feature sample generation module is used for determining basic training features corresponding to each target pre-training small model in the target training sample set based on the preset training feature setting field, determining basic convergence features corresponding to each target pre-training small model in the target training sample set based on the preset convergence feature setting field, and combining the basic training features and the basic convergence features to generate a basic feature sample set;
the double-tower model construction module is used for acquiring a basic double-tower model, training the basic double-tower model according to the back propagation of the basic characteristic sample set, and generating a target double-tower model;
the super-parameter determining module is used for obtaining each basic super-parameter corresponding to the target pre-training large model and a basic super-parameter range corresponding to each basic super-parameter, traversing and processing the basic super-parameter range corresponding to each basic super-parameter according to the target double-tower model, and determining the target super-parameter corresponding to the target pre-training large model.
11. A semantic conversion device, the device comprising:
the data acquisition module is used for acquiring a text to be converted and inputting the text to be converted into the target semantic conversion model; wherein the target semantic conversion model is a target pre-training large model, and is obtained by training by the method of any one of claims 1-8;
And the result generation module is used for determining text characteristics corresponding to the text to be converted through the target semantic conversion model and generating a corresponding target semantic conversion result based on the text characteristics.
12. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the dual tower model based hyper-parameter determination method of any one of claims 1-8 or to perform the semantic conversion method of claim 9.
13. A computer readable storage medium, characterized in that it stores computer instructions for causing a processor to implement the two-tower model based hyper-parameter determination method according to any of the claims 1-8 or to perform the semantic conversion method according to claim 9 when executed.
CN202311189418.7A 2023-09-15 2023-09-15 Super-parameter determination and semantic conversion method, device, equipment and medium Active CN116933896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311189418.7A CN116933896B (en) 2023-09-15 2023-09-15 Super-parameter determination and semantic conversion method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311189418.7A CN116933896B (en) 2023-09-15 2023-09-15 Super-parameter determination and semantic conversion method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116933896A true CN116933896A (en) 2023-10-24
CN116933896B CN116933896B (en) 2023-12-15

Family

ID=88375688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311189418.7A Active CN116933896B (en) 2023-09-15 2023-09-15 Super-parameter determination and semantic conversion method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116933896B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104687A1 (en) * 2018-09-27 2020-04-02 Google Llc Hybrid neural architecture search
CN111008707A (en) * 2019-12-09 2020-04-14 第四范式(北京)技术有限公司 Automatic modeling method and device and electronic equipment
CN113051368A (en) * 2021-03-24 2021-06-29 北京百度网讯科技有限公司 Double-tower model training method, double-tower model searching device and electronic equipment
CN113609745A (en) * 2021-09-30 2021-11-05 苏州浪潮智能科技有限公司 Hyper-parameter optimization method and device, electronic equipment and storage medium
US20220027569A1 (en) * 2021-02-09 2022-01-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for semantic retrieval, device and storage medium
CN114444514A (en) * 2022-02-08 2022-05-06 北京百度网讯科技有限公司 Semantic matching model training method, semantic matching method and related device
CN114841172A (en) * 2022-05-27 2022-08-02 北京百度网讯科技有限公司 Knowledge distillation method, apparatus and program product for text matching double tower model
CN115203421A (en) * 2022-08-02 2022-10-18 中国平安人寿保险股份有限公司 Method, device and equipment for generating label of long text and storage medium
CN115309865A (en) * 2022-08-11 2022-11-08 平安科技(深圳)有限公司 Interactive retrieval method, device, equipment and storage medium based on double-tower model
US20230147550A1 (en) * 2021-11-05 2023-05-11 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for pre-training semantic representation model and electronic device
CN116186403A (en) * 2023-02-10 2023-05-30 网易(杭州)网络有限公司 Recommendation method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANTAO YU et al.: "A Dual Augmented Two-tower Model for Online Large-scale Recommendation", DLP-KDD 2021, pages 1-4 *
SHEN Junyu; LI Linyan; DAI Yongliang; HU Fuyuan: "Binocular fish school image detection based on improved SSD algorithm", Computer Engineering and Design, no. 02, pages 196-201 *

Also Published As

Publication number Publication date
CN116933896B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN112541124B (en) Method, apparatus, device, medium and program product for generating a multitasking model
CN113360711B (en) Model training and executing method, device, equipment and medium for video understanding task
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN116307215A (en) Load prediction method, device, equipment and storage medium of power system
CN115454706A (en) System abnormity determining method and device, electronic equipment and storage medium
CN116468112B (en) Training method and device of target detection model, electronic equipment and storage medium
CN116933896B (en) Super-parameter determination and semantic conversion method, device, equipment and medium
CN113361621B (en) Method and device for training model
CN114120180B (en) Time sequence nomination generation method, device, equipment and medium
CN115359322A (en) Target detection model training method, device, equipment and storage medium
CN117271098B (en) AI model calculation core scheduling method, device, equipment and storage medium
CN117934137A (en) Bad asset recovery prediction method, device and equipment based on model fusion
CN115034388B (en) Determination method and device for quantization parameters of ranking model and electronic equipment
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN115878783B (en) Text processing method, deep learning model training method and sample generation method
CN116071608B (en) Target detection method, device, equipment and storage medium
CN114037057B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN117933353A (en) Reinforced learning model training method and device, electronic equipment and storage medium
CN115222046A (en) Neural network structure searching method and device, electronic equipment and storage medium
CN113962149A (en) Health index-based health evaluation method, system and terminal for low-speed engine for ship
CN116316890A (en) Renewable energy source output scene generation method, device, equipment and medium
CN117635300A (en) Enterprise financing result prediction method and device and electronic equipment
CN117669585A (en) User search intention recognition method, device, electronic equipment and storage medium
CN116542362A (en) Load prediction method and device, electronic equipment and storage medium
CN116960957A (en) Vehicle charging load prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant