CN116756293A - Model training method and device, storage medium and electronic equipment - Google Patents

Model training method and device, storage medium and electronic equipment Download PDF

Info

Publication number
CN116756293A
CN116756293A (application CN202311010104.6A)
Authority
CN
China
Prior art keywords
text
training
model
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311010104.6A
Other languages
Chinese (zh)
Inventor
程稳
李勇
刘懿
黄章敏
吕波
常璟飞
陈�光
曾令仿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311010104.6A
Publication of CN116756293A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The specification discloses a model training method, device, storage medium and electronic device, in which an iterative training process of a text dialogue generation model is divided into a plurality of training stages in advance. For each training stage of the text dialogue generation model, each text feature used for completing the training stage is obtained as a current text feature; the current text features are clustered according to the current text features and the preset precision requirement of the training stage to obtain clustered text features; sparse processing is carried out on the clustered text features to obtain sparse text features; and training of the training stage is carried out according to the sparse text features. In this method, one iterative process of the model is divided into several stages, the text features are clustered according to the precision requirement of each stage, the clustered text features are sparsified, and the text dialogue generation model is trained according to the sparsified clustered text features.

Description

Model training method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method and apparatus for model training, a storage medium, and an electronic device.
Background
With the development of artificial intelligence (AI) models, AI models are applied in various fields. In the field of natural language processing, the text dialogue generation model is one such AI model, and how to enable it to generate reply sentences that are fluent, natural and accurate in content according to the text input by a user, so as to realize a dialogue between the model and the user, is a problem to be solved.
Based on this, the present specification provides a method of model training.
Disclosure of Invention
The present disclosure provides a method, apparatus, storage medium and electronic device for model training, so as to partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a method for model training, in which an iterative training process of a text dialogue generation model is divided into a plurality of training phases in advance, the method including:
for each training stage of the text dialogue generation model, acquiring each text feature used for completing the training stage as a current text feature, wherein the current text feature comprises word vectors, text lengths and subword information;
clustering each current text feature according to each current text feature and a preset accuracy requirement of the training stage to obtain clustered text features;
performing sparse processing on the clustered text features to obtain sparse text features, wherein the number of the sparse text features is not greater than that of the current text features;
according to the sparse text characteristics, training in the training stage is executed;
and responding to the dialogue text sent by the user, inputting the dialogue text into the trained text dialogue generating model, so that the trained text dialogue generating model outputs the reply content of the dialogue text.
Optionally, clustering each current text feature according to each current text feature and a preset accuracy requirement of the training stage to obtain clustered text features, which specifically includes:
for each current text feature, mapping the current text feature in the same space by adopting a preset method to obtain a standard text feature;
and clustering the standard text features according to the preset accuracy requirement of the training stage to obtain clustered text features.
Optionally, before the current text feature is mapped into the same standard space by adopting a preset method to obtain the standard text feature, the method further comprises:
acquiring resources for quantifying the current text feature;
and according to the preset precision requirement of the training stage, compressing the bit width of the resource to obtain compressed text features, and taking the compressed text features as the current text features.
Optionally, performing sparse processing on the clustered text features to obtain sparse text features, which specifically includes:
judging whether the number of text features in the cluster in which the clustered text feature is located is larger than a preset number threshold;
if yes, carrying out sparse processing on the clustered text features to obtain sparse text features;
if not, setting the numerical value of the clustered text feature to zero and taking the zeroed text feature as the sparse text feature.
Optionally, the method further comprises:
determining, for each iterative training process, a current output result of the text dialogue generating model as a first output result, wherein the output result comprises a reply sentence to the input text;
judging whether the first output result is matched with the preset model precision or not according to the first output result and the preset model precision;
if yes, stopping training;
if not, generating compensation data, and compensating the model precision according to the compensation data.
Optionally, according to the compensation data, performing model precision compensation on the text dialogue generating model specifically includes:
and correcting text characteristic parameters of the text dialogue generating model according to the compensation data.
The present specification provides a device for model training, in which an iterative training process of a text dialogue generation model is divided into a plurality of training phases in advance, the device including:
the text feature acquisition module is used for acquiring each text feature used for completing each training stage of the text dialogue generation model as a current text feature, wherein the current text feature comprises word vectors, text lengths and subword information;
the clustering module is used for clustering each current text feature according to each current text feature and the preset accuracy requirement of the training stage to obtain clustered text features;
the sparse module is used for carrying out sparse processing on the clustered text features to obtain sparse text features, wherein the number of the sparse text features is not greater than that of the current text features;
the training module is used for executing training of the training stage according to the sparse text characteristics;
and the application module is used for responding to the dialogue text sent by the user and inputting the dialogue text into the trained text dialogue generating model, so that the trained text dialogue generating model outputs the reply content of the dialogue text.
Optionally, the clustering module is specifically configured to map, for each current text feature, the current text feature in the same space by using a preset method, so as to obtain a standard text feature; and clustering the standard text features according to the preset accuracy requirement of the training stage to obtain clustered text features.
Optionally, the clustering module is specifically configured to map the current text feature to the same standard space by using a preset method, and acquire a resource for quantifying the current text feature before obtaining the standard text feature; and according to the preset precision requirement of the training stage, compressing the bit width of the resource to obtain compressed text features, and taking the compressed text features as the current text features.
Optionally, the sparse module is specifically configured to determine whether the number of text features in the clustered clusters in which the clustered text features are located is greater than a preset number threshold; if yes, carrying out sparse processing on the clustered text features to obtain sparse text features; if not, setting the numerical value of the clustered text feature to zero, and taking the numerical value as the sparse text feature.
Optionally, the apparatus further comprises:
the compensation module is used for determining an output result of the text dialogue generating model at present as a first output result aiming at each iteration training process, wherein the output result comprises a reply sentence of an input text; judging whether the first output result is matched with the preset model precision or not according to the first output result and the preset model precision; if yes, stopping training; and if not, generating compensation data, and compensating the model precision of the text dialogue generating model according to the compensation data.
Optionally, the compensation module is specifically configured to correct the text feature parameter of the text dialog generating model according to the compensation data.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of model training described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of model training as described above when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
according to the method for training the model, a one-time iterative training process of a text dialogue generation model is divided into a plurality of training stages in advance, each text feature used for completing the training stage is obtained for each training stage of the text dialogue generation model and serves as a current text feature, clustering is conducted on each current text feature according to each current text feature and the preset precision requirement of the training stage, clustered text features are obtained, sparse processing is conducted on the clustered text features, sparse text features are obtained, and training of the training stage is conducted according to the sparse text features.
According to the method, the text dialogue generating model is divided into a plurality of stages through one iteration training process, the text features are clustered according to the precision requirement of each stage, the clustered text features are sparse, the model is trained according to the clustered text features, and the text dialogue generating model capable of generating reply sentences with natural fluency and high accuracy is obtained.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow chart of a method of model training provided in the present specification;
FIG. 2 is a schematic flow chart of model training provided in the present specification;
FIG. 3 is a schematic diagram of a model training apparatus provided in the present specification;
fig. 4 is a schematic structural diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for model training provided in the present specification, including the following steps:
s100: and aiming at each training stage of the text dialogue generating model, acquiring each text feature used for completing the training stage as a current text feature, wherein the current text feature comprises word vectors, text lengths and subword information.
As the functions that models can achieve become more diverse, the computing resources and training time required for model training increase accordingly. This is especially true in the field of natural language processing, where a large amount of input text is required and the natural language processing model is trained to process that text. However, how to train a text dialogue generation model so that it can generate fluent and natural reply sentences according to the text input by a user remains a problem to be solved. Accordingly, the present specification provides a method of model training. The execution subject of the present specification may be a server used for model training, or another electronic device with computing capability. For convenience of explanation, the model training method provided in this specification is described below with only a server as the execution subject. The model may be a generative model, e.g., ChatGPT, VisualGPT, Phenaki, etc.; this specification takes the text dialogue generation model among generative models as an example.
FIG. 2 is a schematic flow chart of model training provided in the present specification. As shown in FIG. 2, model training may be divided into a plurality of stages, including data collection and storage, model selection and loading, parameter initialization, forward propagation, iterative optimization, loss function calculation, backward propagation, model evaluation, and the like. In order to accelerate model training according to the precision requirements of different stages, one iterative training process of the text dialogue generation model can be divided into a plurality of training stages, such as forward propagation, iterative optimization and backward propagation. The accuracy requirement refers to the accuracy that a training stage needs to reach within one iterative training process of the model. It should be noted that one iterative training process of the model does not include data collection; data collection belongs to the preparation before model training, and when the model is a text dialogue generation model among generative models, the data are texts and can be obtained from published text datasets such as large-scale text corpora, dialogue data and scientific papers.
In one or more embodiments of the present disclosure, the server trains the text dialogue generation model according to the accuracy requirements and the text feature parameters of each stage so as to accelerate training. Before doing so, the server first needs to acquire, for each training stage of the text dialogue generation model, each text feature used for completing that training stage as a current text feature. In general, the corresponding parameters can be determined according to the function the model is intended to implement before it is trained. For example, if the model is used to identify the class of objects in an image, the corresponding parameters include quantized image samples, quantized image class labels, the similarity between samples and image class labels, the loss function, and so on. Since the model here is a text dialogue generation model, the text features determined for completing training may include word vectors, text length, subword information, vocabulary size, text semantic information, emotion information, etc. For each training stage of the text dialogue generation model, the server can acquire each text feature corresponding to the training stage from the determined text feature parameters as a current text feature, wherein the current text features include word vectors, text lengths and subword information. Subword information mainly refers to the information related to further decomposing the words in a vocabulary into smaller subword units during text processing, and includes a subword list (the list of all subwords obtained from corpus statistics), subword frequencies (the frequency statistics of each subword in the corpus), a minimum subword frequency (a frequency threshold for filtering low-frequency subwords), and so on. It should be noted that the current text features may differ depending on the training stage.
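As a non-limiting illustration of how the current text features described above (word vectors, text lengths and subword information) might be assembled for one training stage, the following Python sketch can be considered; the whitespace tokenization, the character-bigram subwords and all function and parameter names are illustrative assumptions rather than requirements of this specification.

```python
from collections import Counter

def build_current_text_features(texts, embed, min_subword_frequency=2):
    """Assemble illustrative current text features for one training stage.

    texts: list of input strings used in this stage.
    embed: callable mapping a token to a word vector (assumed to exist).
    """
    features = []
    subword_freq = Counter()
    for text in texts:
        tokens = text.split()  # simplistic whitespace tokenization, for illustration only
        word_vectors = [embed(tok) for tok in tokens]
        # toy "subwords": character bigrams of every token
        subwords = [tok[i:i + 2] for tok in tokens for i in range(len(tok) - 1)]
        subword_freq.update(subwords)
        features.append({
            "word_vectors": word_vectors,
            "text_length": len(tokens),
            "subwords": subwords,
        })
    # subword information shared by the stage: subword list, frequencies, frequency threshold
    subword_info = {
        "subword_list": sorted(subword_freq),
        "subword_frequency": dict(subword_freq),
        "min_subword_frequency": min_subword_frequency,
    }
    return features, subword_info
```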
S102: and clustering each current text feature according to each current text feature and the preset precision requirement of the training stage to obtain clustered text features.
In order to reduce the calculation amount of each training stage and reduce the required calculation resources, so as to accelerate the training of the text dialogue generation model, the server can cluster each current text feature according to each current text feature and the preset precision requirement of the training stage to obtain clustered text features, and then accelerate the training of the text dialogue generation model according to the clustered text features.
Specifically, the accuracy requirement of each training stage can be preset according to prior experience. For each current text feature, the current text feature is mapped into the same space by adopting a preset method to obtain a standard text feature, and the standard text features are clustered according to the preset precision requirement of the training stage to obtain clustered text features. The preset method may be principal component analysis (PCA) or another mapping method, which is not limited in this specification. For the text dialogue generation model, the standard text features are obtained by mapping the current text features: the types of the standard text features are consistent with those of the current text features, but the values are standardized. For example, if a current text feature is a word vector with values (0.1, 0.3, 0.5), the standard text feature is still a word vector, but its values become (1, 3, 5).
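The mapping of current text features into one shared space can be sketched as follows; the zero-padding, the fixed target dimension and the use of scikit-learn's PCA are illustrative assumptions, and any other mapping method may equally be used.

```python
import numpy as np
from sklearn.decomposition import PCA

def map_to_standard_space(current_features, dim=8):
    """Map current text features of different lengths into one shared standard space.

    current_features: list of 1-D numeric sequences (e.g. a flattened word vector,
    [text_length], subword frequency counts); the padding and the target dimension
    are assumptions for illustration.
    """
    arrays = [np.asarray(f, dtype=float).ravel() for f in current_features]
    width = max(len(a) for a in arrays)
    # zero-pad so every feature lives in the same ambient space before projection
    padded = np.array([np.pad(a, (0, width - len(a))) for a in arrays])
    n_components = min(dim, padded.shape[0], padded.shape[1])
    standard = PCA(n_components=n_components).fit_transform(padded)
    return standard  # one standard text feature per current text feature
```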
When the standard text features are clustered, since the accuracy requirements of different training stages may differ, the clustering degree required by different training stages may also differ; generally, the higher the accuracy requirement, the weaker the clustering degree. Therefore, the server can use different clustering methods for the standard text features of different training stages, including the K-means clustering algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), hierarchical clustering, spectral clustering and so on; this specification does not limit the clustering method. The clustering degree refers to the ratio of the number of text feature categories before clustering to the number of categories after clustering. For example, if there are 10000 text feature categories before clustering and 2000 categories after clustering, the clustering degree is 5; the larger the value, the stronger the clustering.
It should be noted that different clustering methods use different clustering indexes. For example, the index used by the K-means clustering algorithm is the sum of squared errors (SSE), defined as the sum of the squared distances between each text feature and the cluster center it belongs to. When clustering the standard text features, the objective is to minimize the SSE, that is, to find the optimal cluster centers so that the total squared distance of all text features to their cluster centers is minimal. Clustering the standard text features of the text dialogue generation model does not change the type of the standard text features; each of them is simply given a cluster category label. The category labels may be numbers, such as category 1 and category 2, or specific topic words, such as "sports" and "entertainment".
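A minimal sketch of precision-driven clustering is shown below; the linear mapping from the accuracy requirement to the number of clusters is an illustrative assumption, and scikit-learn's inertia_ attribute corresponds to the SSE described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_standard_features(standard_features, accuracy_requirement):
    """Cluster standard text features; a higher accuracy requirement keeps more
    clusters, i.e. a weaker clustering degree.

    accuracy_requirement: float in (0, 1]; the rule mapping it to a cluster count
    is an assumption for illustration only.
    """
    X = np.asarray(standard_features, dtype=float)
    n = X.shape[0]
    n_clusters = max(1, min(n, int(round(n * accuracy_requirement))))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    sse = km.inertia_  # sum of squared distances of features to their cluster centers (SSE)
    return labels, sse
```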
In addition, since the text features required by different training stages of one iterative process may depend on each other, when the standard text features of different stages are clustered, the clustering must also follow the order of the training stages within the iterative training process of the model. For example, if one iterative process of the model is divided into a forward propagation stage and a backward propagation stage, the clustering of the forward propagation standard text features needs to be completed before the backward propagation standard text features are clustered.
S104: and carrying out sparse processing on the clustered text features to obtain sparse text features, wherein the number of the sparse text features is not greater than that of the current text features.
Different clustering methods produce different clustering results. For example, given text features A, B, C and D, if clustering method 1 is used, text features A and B belong to the first class and text features C and D belong to the second class; if clustering method 2 is used, text features A, C and D belong to the first class and text feature B belongs to the second class. The server can then process the clustered text features according to the different clustering results.
Specifically, the server judges whether the number of text features in the cluster in which a clustered text feature is located is larger than a preset number threshold. If so, sparse processing is performed on the clustered text features to obtain sparse text features; if not, the values of the clustered text features are set to zero and taken as the sparse text features. The number of sparse text features is not greater than the number of current text features. Sparse processing means selecting some of the clustered text features, either at random or according to a rule, and setting their values to zero.
For example, assuming that the preset number threshold is 1, if the number of text features in a cluster is greater than 1, the values of some text features in the cluster are randomly set to zero; if the number of text features in the cluster is not greater than 1, the values of all text features in the cluster are set to zero.
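The sparsification rule illustrated by this example can be sketched as follows; the number threshold of 1 and the randomly chosen drop fraction are illustrative assumptions.

```python
import numpy as np

def sparsify(clustered_features, labels, count_threshold=1, drop_fraction=0.5, seed=0):
    """Zero clustered text features cluster by cluster (step S104).

    Clusters with more members than count_threshold are thinned at random; clusters
    at or below the threshold are zeroed entirely.
    """
    rng = np.random.default_rng(seed)
    sparse = np.array(clustered_features, dtype=float, copy=True)
    labels = np.asarray(labels)
    for cluster_id in np.unique(labels):
        members = np.flatnonzero(labels == cluster_id)
        if len(members) > count_threshold:
            # randomly zero part of a "large" cluster
            chosen = rng.choice(members, size=int(len(members) * drop_fraction), replace=False)
            sparse[chosen] = 0.0
        else:
            sparse[members] = 0.0  # small cluster: zero all of its features
    return sparse
```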
S106: and executing training of the training stage according to the sparse text characteristics.
S108: and responding to the dialogue text sent by the user, inputting the dialogue text into the trained text dialogue generating model, so that the trained text dialogue generating model outputs the reply content of the dialogue text.
Based on the model training method shown in fig. 1, one iterative training process of the text dialogue generation model is divided into several stages, the text features are clustered according to the precision requirement of each stage, and the clustered text features are sparsified, so that the model is trained according to the sparsified clustered text features and a text dialogue generation model capable of generating fluent, natural and highly accurate reply sentences is obtained. In addition, because the text features are clustered and sparsified, the amount of calculation in the training process of the text dialogue generation model is reduced and its training speed is improved.
For step S100, the server may further divide the training of the text dialogue generation model into a pre-training stage and a post-training stage according to the number of iterations of the text dialogue generation model, with the accuracy requirement becoming higher as the number of iterations increases.
After determining the current text features, the server may perform training of the training stage based on the current text features and then reacquire each text feature used to complete the training stage as the current text feature. That is, before the current text features are sparsified, the text dialogue generation model is first iterated once; the text features of each training stage are then sparsified after that iteration, and the model is iterated again according to the sparsified text features. In this way, an iteration is performed first, the text features are sparsified, the next iteration uses the sparsified text features, the features are sparsified again, and the process repeats.
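Putting the above steps together, one possible shape of this alternating iterate-then-sparsify cycle is sketched below; get_stage_features and train_stage are assumed placeholders for the actual feature-collection and stage-training routines, and the other helpers are the sketches given earlier.

```python
def run_staged_training(model, stages, num_iterations):
    """Alternate one training iteration with per-stage feature sparsification.

    stages: ordered list of (stage_name, accuracy_requirement) pairs.
    get_stage_features / train_stage are assumed placeholders, not part of this
    specification.
    """
    for _ in range(num_iterations):
        for stage_name, accuracy_requirement in stages:
            current = get_stage_features(model, stage_name)            # step S100
            standard = map_to_standard_space(current)                  # step S102
            labels, _sse = cluster_standard_features(standard, accuracy_requirement)
            sparse = sparsify(standard, labels)                        # step S104
            train_stage(model, stage_name, sparse)                     # step S106
    return model
```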
For step S102, because the amount of data required for training the text dialogue generation model is large and the training speed is therefore slow, in order to further accelerate training, the server may, before mapping the current text feature into the same standard space by the preset method to obtain the standard text feature, acquire the resources used for quantizing the current text feature, compress the bit width of those resources according to the preset accuracy requirement of the training stage to obtain a compressed text feature, and take the compressed text feature as the current text feature. That is, when a current text feature is stored, its bit width is compressed. For example, if the bit width of current text feature A is 16 bits, the bit width after compression may be 8 bits. The resources used for quantizing the current text feature include the list of data representing the current text feature, other related text features used for determining the current text feature, and so on, which is not limited in this specification. For the text dialogue generation model, the list of data representing the current text feature includes the vector values of word vectors, the length value of the text, and the like. In addition, the subword information includes the subword list, the subword frequencies, the minimum subword frequency, etc.; that is, the subword list, subword frequencies and minimum subword frequency are the other related text features used for determining the current text feature (the subword information).
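The bit-width compression described here can be illustrated with a simple uniform quantization from floating point to 8-bit integers; the symmetric quantization scheme and the choice of int8 storage are assumptions for illustration and are not the prescribed compression method of this specification.

```python
import numpy as np

def compress_bit_width(values, num_bits=8):
    """Uniformly quantize a floating-point feature (e.g. stored at 16 bits) to num_bits.

    num_bits is assumed to be at most 8 here, since the result is stored as int8.
    """
    values = np.asarray(values, dtype=np.float32)
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = np.clip(np.round(values / scale), -qmax, qmax).astype(np.int8)
    return quantized, scale                    # scale is kept so the feature can be restored

def decompress_bit_width(quantized, scale):
    """Restore an approximate floating-point view of the compressed feature."""
    return quantized.astype(np.float32) * scale
```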
For step S104, since the server clusters the standard text features of the training stage according to the preset accuracy requirement of the training stage, if the accuracy requirement of the training stage is higher, the server may choose not to cluster the standard text features of the training stage, or reduce the clustering degree of the training stage, so as to ensure the accuracy of the text dialogue generation model training.
In one or more embodiments of the present disclosure, a cluster may contain only a small number of text features after clustering, and for ease of calculation the values of the text features in such a cluster are set to zero. However, the server does not know in advance how much those text features influence model training. For example, after their values are set to zero, the accuracy of the whole text dialogue generation model may drop from 85% to 60%, which means the influence of those text features on training is very large. Therefore, the text dialogue generation model can also be compensated for accuracy.
Specifically, in order to avoid an excessive influence of the zeroed text features on the training of the text dialogue generation model, for each iterative training process the current output result of the text dialogue generation model is determined as a first output result; the output result includes a reply sentence to the input text, a translation result of the input text, and so on. Whether the first output result matches the preset model precision is judged according to the first output result and the preset model precision. If so, training is stopped; otherwise, compensation data are generated and the model precision is compensated according to the compensation data, that is, the text feature parameters of the text dialogue generation model are corrected based on the compensation data.
When judging whether the first output result matches the preset model precision according to the first output result and the preset model precision, the judgment can be made from aspects such as the grammatical correctness, semantic relevance, logical consistency and knowledge correctness of the reply sentence. Grammatical correctness includes whether the reply sentence is fluent and complies with grammatical rules. Semantic relevance includes whether the reply sentence is semantically highly relevant to the input sentence and responds to the intent and key information points of the input sentence. Logical consistency includes whether the logic of the reply sentence is consistent, without contradictions. Knowledge correctness includes whether the factual knowledge in the sentence is accurate and free of obviously wrong information.
When compensating the model precision of the text dialogue generation model, the server can record the text features whose values have been set to zero and randomly generate compensation data; when training of the training stage is performed according to the sparse text features, the recorded zeroed text features are corrected according to the randomly generated compensation data so as to compensate the model precision. The type of the compensating text feature is consistent with the type of the current text feature; for example, if the current text feature is subword information, the compensation data can also be subword information.
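A sketch of this compensation step is given below; recording zeroed features by row index and drawing the compensation data from a small Gaussian are illustrative assumptions rather than the prescribed compensation scheme.

```python
import numpy as np

def record_zeroed_features(original, sparse):
    """Record the row indices of text features whose values were set to zero (2-D arrays assumed)."""
    original = np.asarray(original, dtype=float)
    sparse = np.asarray(sparse, dtype=float)
    zeroed_now = np.all(sparse == 0.0, axis=1)
    was_nonzero = ~np.all(original == 0.0, axis=1)
    return np.flatnonzero(zeroed_now & was_nonzero)

def compensate_precision(sparse, zeroed_idx, noise_scale=0.01, seed=0):
    """Correct the recorded zeroed features with randomly generated compensation data."""
    rng = np.random.default_rng(seed)
    compensated = np.array(sparse, dtype=float, copy=True)
    compensation = rng.normal(0.0, noise_scale, size=compensated[zeroed_idx].shape)
    compensated[zeroed_idx] += compensation
    return compensated
```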
However, if there are too many clustered categories, recording every text feature whose value has been set to zero would require a large amount of storage space. In that case, the zeroed text features may not be recorded; instead, randomly selected text features whose values have been set to zero can be corrected according to the randomly generated compensation data so as to compensate the model precision of the text dialogue generation model.
That is, with the model training method provided by this specification, not only can the text dialogue generation model be trained, but the consumption of computing and storage resources during training can also be greatly reduced without affecting model accuracy, and the model training speed is significantly improved.
The model training method provided by this specification is applicable to all generative models. A generative model may not only take text as input and output reply sentences, but may also generate images, videos and audio according to the input text, or take images and videos as input and output text, so the application range is wide. For example, a DreamFusion model can take text as input and output a three-dimensional model corresponding to the text, and VisualGPT can take an image as input and output text corresponding to the image.
The foregoing is a method of one or more implementations of the present disclosure, and based on the same concept, the present disclosure further provides a corresponding apparatus for model training, as shown in fig. 3.
Fig. 3 is a schematic diagram of a model training device provided in the present specification, in which an iterative training process of a text dialogue generation model is divided into a plurality of training phases in advance, the device including:
a text feature obtaining module 300, configured to obtain, for each training stage of the text dialog generation model, each text feature for completing the training stage as a current text feature, where the current text feature includes a word vector, a text length, and subword information;
the clustering module 302 is configured to cluster each current text feature according to each current text feature and a preset accuracy requirement of the training stage, so as to obtain clustered text features;
the sparse module 304 is configured to perform sparse processing on the clustered text features to obtain sparse text features, where the number of the sparse text features is not greater than the number of the current text features;
a training module 306, configured to perform training in the training stage according to the sparse text feature;
an application module 308, configured to input the dialog text into the trained text dialog generation model in response to the dialog text sent by the user, so that the trained text dialog generation model outputs reply content of the dialog text.
Optionally, the clustering module 302 is specifically configured to map, for each current text feature, the current text feature to the same space by using a preset method, so as to obtain a standard text feature; and clustering the standard text features according to the preset accuracy requirement of the training stage to obtain clustered text features.
Optionally, the clustering module 302 is specifically configured to map the current text feature to the same standard space by using a preset method, and acquire a resource for quantifying the current text feature before obtaining the standard text feature; and according to the preset precision requirement of the training stage, compressing the bit width of the resource to obtain compressed text features, and taking the compressed text features as the current text features.
Optionally, the sparse module 304 is specifically configured to determine whether the number of text features in the clustered clusters in which the clustered text features are located is greater than a preset number threshold; if yes, carrying out sparse processing on the clustered text features to obtain sparse text features; if not, setting the numerical value of the clustered text feature to zero, and taking the numerical value as the sparse text feature.
Optionally, the apparatus further comprises:
a compensation module 310, configured to determine, for each iterative training process, an output result of the text dialog generating model currently as a first output result, where the output result includes a reply sentence to the input text; judging whether the first output result is matched with the preset model precision or not according to the first output result and the preset model precision; if yes, stopping training; and if not, generating compensation data, and compensating the model precision of the text dialogue generating model according to the compensation data.
Optionally, the compensation module 310 is specifically configured to modify the text feature parameter of the text dialog generation model according to the compensation data.
The present specification also provides a computer readable storage medium having stored thereon a computer program operable to perform a method of model training as provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 4, which corresponds to fig. 1. As shown in fig. 4, at the hardware level, the electronic device includes an AI accelerator, a processor, an internal bus, a network interface, a memory, and a nonvolatile storage, and may of course include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs to implement a model training method as described above with respect to fig. 1.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to the process flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD) (e.g., a field programmable gate array (FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code before compilation must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can easily be obtained by merely programming the method flow slightly into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller or an embedded microcontroller; examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or even, the means for achieving the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of model training, characterized in that an iterative training process of a text dialog generation model is divided into a number of training phases in advance, the method comprising:
for each training stage of the text dialogue generation model, acquiring each text feature used for completing the training stage as a current text feature, wherein the current text feature comprises word vectors, text lengths and subword information;
clustering each current text feature according to each current text feature and a preset accuracy requirement of the training stage to obtain clustered text features;
performing sparse processing on the clustered text features to obtain sparse text features, wherein the number of the sparse text features is not greater than that of the current text features;
according to the sparse text characteristics, training in the training stage is executed;
and responding to the dialogue text sent by the user, inputting the dialogue text into the trained text dialogue generating model, so that the trained text dialogue generating model outputs the reply content of the dialogue text.
2. The method of claim 1, wherein clustering each current text feature according to each current text feature and a preset accuracy requirement of the training phase to obtain clustered text features, specifically comprises:
for each current text feature, mapping the current text feature in the same space by adopting a preset method to obtain a standard text feature;
and clustering the standard text features according to the preset accuracy requirement of the training stage to obtain clustered text features.
3. The method of claim 2, wherein before the current text feature is mapped into the same standard space by a predetermined method to obtain the standard text feature, the method further comprises:
acquiring resources for quantifying the current text feature;
and according to the preset precision requirement of the training stage, compressing the bit width of the resource to obtain compressed text features, and taking the compressed text features as the current text features.
4. The method of claim 1, wherein the performing sparse processing on the clustered text features to obtain sparse text features specifically comprises:
judging whether the number of the text features in the clustered clusters in which the clustered text features are located is larger than a preset number threshold;
if yes, carrying out sparse processing on the clustered text features to obtain sparse text features;
if not, setting the numerical value of the clustered text feature to zero and taking the zeroed text feature as the sparse text feature.
5. The method of claim 1, wherein the method further comprises:
determining an output result of the text dialogue generating model at present as a first output result aiming at each iteration training process, wherein the output result comprises a reply sentence of an input text;
judging whether the first output result is matched with the preset model precision or not according to the first output result and the preset model precision;
if yes, stopping training;
and if not, generating compensation data, and compensating the model precision of the text dialogue generating model according to the compensation data.
6. The method of claim 5, wherein compensating the text dialog generation model for model accuracy based on the compensation data, specifically comprises:
and correcting text characteristic parameters of the text dialogue generating model according to the compensation data.
7. An apparatus for model training, wherein an iterative training process of a text dialogue generation model is divided into a plurality of training phases in advance, the apparatus comprising:
the text feature acquisition module is used for acquiring each text feature used for completing each training stage of the text dialogue generation model as a current text feature, wherein the current text feature comprises word vectors, text lengths and subword information;
the clustering module is used for clustering each current text feature according to each current text feature and the preset accuracy requirement of the training stage to obtain clustered text features;
the sparse module is used for carrying out sparse processing on the clustered text features to obtain sparse text features, wherein the number of the sparse text features is not greater than that of the current text features;
the training module is used for executing training of the training stage according to the sparse text characteristics;
and the application module is used for responding to the dialogue text sent by the user, inputting the dialogue text into the trained text dialogue generating model, so that the trained text dialogue generating model outputs the reply content of the dialogue text.
8. The apparatus of claim 7, wherein the clustering module is specifically configured to map, for each current text feature, the current text feature in a same space by using a preset method to obtain a standard text feature; and clustering the standard text features according to the preset accuracy requirement of the training stage to obtain clustered text features.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-6.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-6 when executing the program.
CN202311010104.6A 2023-08-11 2023-08-11 Model training method and device, storage medium and electronic equipment Pending CN116756293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311010104.6A CN116756293A (en) 2023-08-11 2023-08-11 Model training method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311010104.6A CN116756293A (en) 2023-08-11 2023-08-11 Model training method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116756293A true CN116756293A (en) 2023-09-15

Family

ID=87959318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311010104.6A Pending CN116756293A (en) 2023-08-11 2023-08-11 Model training method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116756293A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189974A1 (en) * 2020-10-21 2021-09-30 平安科技(深圳)有限公司 Model training method and apparatus, text classification method and apparatus, computer device and medium
CN114741517A (en) * 2022-05-09 2022-07-12 北京百度网讯科技有限公司 Training method, device, equipment and medium of text classification model and text classification method, device and equipment
CN114911929A (en) * 2022-04-11 2022-08-16 北京捷通华声科技股份有限公司 Classification model training method, text mining equipment and storage medium
JP2022184827A (en) * 2021-06-01 2022-12-13 株式会社Nttドコモ Text processing apparatus, method, device, and computer-readable storage medium
CN116304012A (en) * 2022-12-02 2023-06-23 支付宝(杭州)信息技术有限公司 Large-scale text clustering method and device
CN116522143A (en) * 2023-05-08 2023-08-01 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN116521380A (en) * 2023-07-05 2023-08-01 之江实验室 Resource self-adaptive collaborative model training acceleration method, device and equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189974A1 (en) * 2020-10-21 2021-09-30 平安科技(深圳)有限公司 Model training method and apparatus, text classification method and apparatus, computer device and medium
JP2022184827A (en) * 2021-06-01 2022-12-13 株式会社Nttドコモ Text processing apparatus, method, device, and computer-readable storage medium
CN114911929A (en) * 2022-04-11 2022-08-16 北京捷通华声科技股份有限公司 Classification model training method, text mining equipment and storage medium
CN114741517A (en) * 2022-05-09 2022-07-12 北京百度网讯科技有限公司 Training method, device, equipment and medium of text classification model and text classification method, device and equipment
CN116304012A (en) * 2022-12-02 2023-06-23 支付宝(杭州)信息技术有限公司 Large-scale text clustering method and device
CN116522143A (en) * 2023-05-08 2023-08-01 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN116521380A (en) * 2023-07-05 2023-08-01 之江实验室 Resource self-adaptive collaborative model training acceleration method, device and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张正禄 et al.: "Deformation Monitoring Analysis and Prediction in Engineering", Surveying and Mapping Press, page 166 *
朱红灿; 唐毅: "A Two-Stage Chinese Text Clustering Method Based on DASOM", Journal of Intelligence, no. 09 *

Similar Documents

Publication Publication Date Title
CN111881973A (en) Sample selection method and device, storage medium and electronic equipment
CN113221555B (en) Keyword recognition method, device and equipment based on multitasking model
CN112417093B (en) Model training method and device
CN112308113A (en) Target identification method, device and medium based on semi-supervision
CN117332282B (en) Knowledge graph-based event matching method and device
CN116630480B (en) Interactive text-driven image editing method and device and electronic equipment
CN111091001B (en) Method, device and equipment for generating word vector of word
CN116150380B (en) Text matching method, device, storage medium and equipment
CN116127328B (en) Training method, training device, training medium and training equipment for dialogue state recognition model
CN117113174A (en) Model training method and device, storage medium and electronic equipment
CN116756293A (en) Model training method and device, storage medium and electronic equipment
CN109065016B (en) Speech synthesis method, speech synthesis device, electronic equipment and non-transient computer storage medium
CN117973544B (en) Text unit reasoning method device based on semantic distance, storage medium and terminal
CN117992600B (en) Service execution method and device, storage medium and electronic equipment
CN116795972B (en) Model training method and device, storage medium and electronic equipment
CN115017915B (en) Model training and task execution method and device
CN117034942B (en) Named entity recognition method, device, equipment and readable storage medium
CN116501852B (en) Controllable dialogue model training method and device, storage medium and electronic equipment
CN112445784B (en) Text structuring method, equipment and system
CN117973544A (en) Text unit reasoning method device based on semantic distance, storage medium and terminal
CN117787418A (en) Risk identification method and device, storage medium and electronic equipment
CN114996447A (en) Text hierarchy classification method and device based on center loss
CN117711403A (en) Text error correction model training method and device, storage medium and electronic equipment
CN117593003A (en) Model training method and device, storage medium and electronic equipment
CN117520850A (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination