CN115510932A - Model training method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115510932A
CN115510932A
Authority
CN
China
Prior art keywords
model
translation
dispersion
translation model
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110633482.4A
Other languages
Chinese (zh)
Inventor
刘宠
李峰刚
朱林波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Chengdu ICT Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110633482.4A
Publication of CN115510932A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a model training method and apparatus, an electronic device, and a computer storage medium. The method belongs to the technical field of knowledge graphs and comprises the following steps: acquiring an original data set and extracting a knowledge graph from it; preprocessing the knowledge graph to obtain a training set for a translation model; inputting the training set into the translation model for training, and determining the dispersion of the translation model during each training iteration; and determining a loss value of the translation model according to the dispersion and adjusting the parameters of the translation model based on that loss value. Because the loss value incorporates the dispersion, reducing the loss also reduces the dispersion of the translation model, which spreads the entity and relation vectors of the knowledge graph farther apart. This mitigates the reduced expressive power and low accuracy caused by a mismatched ratio of entity count to relation count, and ensures the performance of the translation model in deployed projects.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a model training method and apparatus, an electronic device, and a computer storage medium.
Background
With the development of the internet, people generate large amounts of data all the time, and the original storage forms of these data cannot be used effectively. The emergence of the knowledge graph has greatly improved the utilization efficiency of data and the working efficiency of search engines. In scientific research, data is usually converted into a language a computer can process: symbols were the early representation of data, while today vectors are the common representation, and text, images, sound, and the like are converted into vector form by algorithmic models to facilitate machine recognition and computation. Knowledge graph representation learning has wide application and is commonly used in information retrieval, intelligent recommendation, and intelligent question answering systems. Representative representation-learning models for knowledge graphs include the translation model, distance model, energy model, bilinear model, tensor neural network model, and matrix factorization model. Translation models are the most commonly used. They rely on translation invariance in vector space: a knowledge triple (h, r, t) in a knowledge base G = (E, R, S) is represented by translating the head entity h to the tail entity t through the relation r, i.e., by vector addition, so that as far as possible

h + r ≈ t
In the related art, for the various translation models and their variants, when the training data set is highly imbalanced, i.e., the ratio of the number of entities to the number of relations is high, the vectorized representations of the entities end up too close together in vector space after training, which reduces the expressive power of the model and lowers its accuracy.
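The translation-invariance idea above can be illustrated with a minimal scoring function. The sketch below is not taken from the patent; it is a standard TransE-style score under an L1 distance, with invented example vectors.

```python
# Minimal sketch of the translation assumption h + r ≈ t: a triple is
# plausible when the translated head h + r lands near the tail t.

def transe_score(h, r, t):
    """L1 distance ||h + r - t||; lower means a more plausible triple."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

# Hand-built vectors where the triple holds exactly:
h, r, t = [1, 2], [2, 0], [3, 2]
print(transe_score(h, r, t))  # 0
```

A trained model drives this score toward zero for true triples while keeping it large for corrupted ones.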
Disclosure of Invention
The application provides a model training method and apparatus, an electronic device, and a computer storage medium, which can solve the problems of reduced model expressive power and low accuracy caused by a high ratio of entity count to relation count in the related art.
The technical scheme of the application is realized as follows:
the application provides a model training method, which comprises the following steps:
acquiring an original data set, and extracting a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set of a translation model;
inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model;
and determining a loss value of the translation model according to the dispersion, and adjusting parameters of the translation model based on the loss value of the translation model.
In some embodiments, the training set includes a plurality of entities and relationships between the plurality of entities; the determining the dispersion of the translation model in each iteration process of the translation model comprises the following steps:
clustering the relationships among the entities in each iteration process of the translation model, and determining the spatial distance among different relationship classes and the spatial distance among the entities in the same relationship class;
and determining the dispersion of the translation model according to the spatial distance between different relation classes and the spatial distance between entities in the same relation class.
In some embodiments, the determining a loss value of the translation model according to the dispersion comprises:
determining a corresponding loss function according to the value of the dispersion of the translation model in each iteration process;
and determining a loss value of the translation model in each iteration process by using the loss function.
In some embodiments, the method further comprises:
if the dispersion of the translation model in the current iteration process is smaller than or equal to a set threshold value and the variation of the dispersion of the translation model between adjacent iteration processes is smaller than a set value, it is determined that the training of the translation model is finished; the adjacent iteration processes include the current iteration process.
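As a sketch, the stopping rule just described — dispersion at or below a set threshold and nearly unchanged across adjacent iterations — might look as follows. The threshold and tolerance values are illustrative assumptions, not values from the patent.

```python
def training_converged(dispersions, threshold=0.1, min_change=1e-3):
    """Return True when the latest dispersion is at or below `threshold`
    and has barely changed since the previous iteration. `threshold` and
    `min_change` are illustrative values, not fixed by the patent."""
    if len(dispersions) < 2:
        return False
    current, previous = dispersions[-1], dispersions[-2]
    return current <= threshold and abs(current - previous) < min_change

# Dispersion history per iteration: still falling fast, so not converged.
print(training_converged([0.9, 0.5, 0.2]))           # False
# Small and stable across adjacent iterations: converged.
print(training_converged([0.9, 0.2, 0.05, 0.0502]))  # True
```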
In some embodiments, the preprocessing the knowledge-graph comprises:
extracting semantic information of each knowledge triple in the knowledge graph, and determining a training set of the translation model according to the semantic information of each knowledge triple.
The present application provides a model training apparatus comprising a first determining module, a second determining module, and an adjusting module, wherein,
the system comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for acquiring an original data set and extracting a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set of a translation model;
the second determining module is used for inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model;
and the adjusting module is used for determining the loss value of the translation model according to the dispersion and adjusting the parameters of the translation model based on the loss value of the translation model.
The present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the electronic device implements the model training method provided by one or more of the above technical solutions.
The present application provides a computer storage medium having a computer program stored thereon; when executed, the computer program implements the model training method provided by one or more of the above technical solutions.
The embodiment of the application provides a model training method, a model training device, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring an original data set, and extracting a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set of a translation model; inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model; and determining a loss value of the translation model according to the dispersion, and adjusting parameters of the translation model based on the loss value of the translation model.
In the embodiment of the application, during each training iteration of the translation model, the loss value of the translation model is determined by calculating the dispersion of the translation model, and the model is optimized by continuously reducing that loss value. Since the loss value is determined from the dispersion, reducing the loss also reduces the dispersion of the translation model. The dispersion measures how concentrated the entity and relation vectors of the training set are in vector space, so reducing the dispersion spreads the entity and relation vectors farther apart. This solves the problems of reduced model expressive power and low accuracy caused by a mismatched ratio of entity count to relation count; the translation model therefore performs better in deployed projects, is no longer limited to data sets of a specific field, and better meets application requirements.
Drawings
FIG. 1a is a flow chart of a model training method according to an embodiment of the present application;
FIG. 1b is a schematic structural diagram of a BERT model in an embodiment of the present application;
FIG. 1c is a diagram illustrating an iterative process of a translation class model in an embodiment of the present application;
FIG. 1d is a schematic coordinate diagram of the variation of the dispersion of the translation class model with the iteration number in the embodiment of the present application;
FIG. 2a is a flow chart of another model training method according to an embodiment of the present application;
FIG. 2b is a flow chart of another model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a structure of a model training apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application.
The present application will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present application and are not intended to limit the present application. In addition, the following embodiments are provided as partial embodiments for implementing the present application, not all embodiments for implementing the present application, and the technical solutions described in the present application may be implemented in any combination without conflict.
It should be noted that, in the present application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a method or apparatus comprising a list of elements includes not only the elements explicitly recited but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other relevant elements (e.g., steps in a method or modules in an apparatus, such as part of a processor, part of a program or software, etc.) in the method or apparatus that comprises that element.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
For example, although the model training method provided in the present application includes a series of steps, it is not limited to the described steps; similarly, although the model training apparatus provided in the present application includes a series of modules, it is not limited to the modules explicitly described and may further include modules needed to acquire relevant information or to perform processing based on that information.
The present application may be implemented based on an electronic device, where the electronic device may be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, programmable consumer electronics, a network personal computer, a small computer system, or the like.
The electronic devices such as the terminal device and the server can realize corresponding functions through the execution of the program modules. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so forth. They perform specific tasks or implement specific abstract data types. The computer system may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
The main aim of a translation model is to represent the knowledge in knowledge triples in a vector space, adding the attributes of entities and relations into the vector space so that the vectors carry semantic information, which facilitates computation and the realization of Natural Language Processing (NLP) tasks.
However, in the related art, the following disadvantages exist for the training of the translation class model:
1) Large differences between data sets. The data sets used with translation models are mostly mature, labeled data sets. In a typical laboratory environment, the number of entities and the number of relation types contained in a training data set are both small, with no large imbalance between them; such data sets belong to the general domain. When a translation model is applied to an actual deployed project in the education domain, however, its practical performance is poor, because the ratio of entity count to relation count differs greatly there. For example, the numbers of students and teachers are large, each student or teacher forms an entity, and the entities have only a single dependency relation with a school: the head entity may be "classmate M", the relation "studies at", and the tail entity "school N". The ratio of entity count to relation count can thus reach 10000:1 or even higher, i.e., tens of thousands of students share a single tail entity through the corresponding relation. In standard data sets, by contrast, the ratio of entities to relations is only about 20:1. As a result, at the end of training, the vectorized representations of the entities are too close together and too dense in vector space, which reduces the expressive power of the model.
2) Vectorization results are too dense. In a translation model, the optimization procedure computes the loss function at each iteration and optimizes the model by reducing the loss value. After training, the model is evaluated with link prediction: the score ||h + r − t|| is computed over candidate entities to find the possible matching results t̂. Owing to problem 1) above, the vectorization results after training are also too dense, which severely affects model evaluation: the computed distances ||h + r − t|| for different candidates are likely to be close to one another, so the correct tail entity t̂ cannot be distinguished, and the training results of the model are less accurate.
3) The data preprocessing is too simple. Before a model is trained, data preprocessing is performed first, but the preprocessing used for translation models in the related art merely converts entities and relations into simple consecutive numbers for computation. This preprocessing is too crude: it cannot take the semantic information of entities and relations into account, and because the numbering depends on the order of the initial data, the results are inaccurate. The original semantic information of the entities therefore cannot be fed into the subsequent training model; the optimization iteration period becomes longer, the time cost of training increases, the iteration speed slows down, and errors appear in the training results.
In view of the above problems, the following embodiments are proposed.
In some embodiments of the present application, the model training method may be implemented by a processor in the model training apparatus, and the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor.
Fig. 1a is a flowchart of a model training method according to an embodiment of the present application, and as shown in fig. 1a, the flowchart may include:
step 100: acquiring an original data set, and extracting a knowledge graph from the original data set; and preprocessing the knowledge graph to obtain a training set of the translation model.
Illustratively, the application domain of the raw data set may be a specific domain, e.g., an educational domain, but also other general domains; the embodiments of the present application do not limit this.
Here, the source of the original data set may be structured, semi-structured, or unstructured data. Structured data refers to data that can be represented and stored in a relational database, laid out in two-dimensional form. Semi-structured data is a form of structured data that does not conform to the data model of a relational database or other data table but contains related tags for separating semantic elements and layering records and fields; common semi-structured formats include Extensible Markup Language (XML) and JavaScript Object Notation (JSON). Unstructured data is data with no fixed structure, such as documents, pictures, video, or audio.
In an embodiment, after the original data set is obtained, the knowledge graph may be extracted from it as follows: knowledge extraction is performed on the original data set, i.e., on data from different sources and with different structures, to form a set of knowledge items, and the knowledge graph corresponding to the original data set is determined from these items.
In the embodiment of the application, after the knowledge graph is extracted from the original data set, the knowledge graph needs to be preprocessed. Illustratively, preprocessing the knowledge graph may include the following steps: cleaning the data of the knowledge graph to remove invalid data, missing data, and the like, i.e., ensuring that each piece of knowledge in the knowledge graph can be represented as a complete knowledge triple (h, r, t), where h, t ∈ E and E denotes the entity set; for example, the entity set may contain 378326 entities with 53 relation types between them.
Further, the data in the knowledge graph are re-examined and verified by a script, including checking the data for consistency, deleting duplicates, and correcting existing errors, to obtain the final usable data; for example, 1205743 usable knowledge triple entries may be obtained. Here, the script may be a Python script or another type of script, which is not limited in this embodiment of the application.
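A minimal sketch of this cleaning step, assuming triples are simple (h, r, t) string tuples (the example data are invented):

```python
def clean_triples(raw_triples):
    """Keep only complete (h, r, t) triples: drop entries with missing
    fields and delete duplicates while preserving first-seen order."""
    seen, cleaned = set(), []
    for triple in raw_triples:
        if len(triple) != 3 or not all(triple):
            continue  # invalid or incomplete triple
        if triple in seen:
            continue  # duplicate entry
        seen.add(triple)
        cleaned.append(triple)
    return cleaned

raw = [
    ("classmate M", "studies at", "school N"),
    ("classmate M", "studies at", "school N"),  # duplicate
    ("teacher K", "", "school N"),              # missing relation
    ("teacher K", "works at", "school N"),
]
print(clean_triples(raw))
```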
In some embodiments, preprocessing the knowledge graph may further include: extracting semantic information of each knowledge triple in the knowledge graph, and determining a training set of the translation model according to the semantic information of each knowledge triple.
Here, the translation class model may include translation models of different relationship types and variant models thereof, and may be, for example, a TransE, a TransH, a TransR, a CTransR, a TransD, or a TransA model.
Illustratively, after the available knowledge triple entries are obtained according to the above steps, the semantic information of each knowledge triple in the knowledge graph is further extracted, and the training set of the translation model is determined from that semantic information. The training set thus includes, in addition to the available knowledge triple entries, the semantic information of the head entity, the tail entity, and the relation between them in each knowledge triple.
Illustratively, the semantic information of each knowledge triple in the knowledge graph can be extracted by a BERT model: the model outputs a vector for each entity and relation in the triple, and that vector carries the semantic information of the entity or relation. Here, the BERT model serves as a data-preprocessing tool; since the output vectors include semantic information, they also help to improve the accuracy of the model.
Fig. 1b is a schematic structural diagram of the BERT model in the embodiment of the present application. As shown in Fig. 1b, the input layer is an entity or a relation, the intermediate layer is composed of bidirectional Transformer feature extractors Trm, and the output layer is the vector corresponding to the entity or relation. Here, the output-layer vector can be used as the input to the translation model; that is, the input to the translation model is no longer a simple numbering of the data.
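The preprocessing interface can be sketched as follows. The hash-based embedding below is only a deterministic stand-in that fixes the input/output shape; an actual implementation would run each entity or relation string through a BERT encoder (e.g., a pretrained model), which is not reproduced here and whose vectors, unlike these, carry real semantics.

```python
import hashlib

def embed(text, dim=4):
    """Stand-in for the BERT encoding step: map an entity or relation
    string to a fixed-length vector. A real pipeline would feed `text`
    through a BERT encoder; the hash below only fixes the interface and
    carries no real semantics."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def embed_triple(h, r, t, dim=4):
    """Vectorize one knowledge triple (h, r, t)."""
    return embed(h, dim), embed(r, dim), embed(t, dim)

h_vec, r_vec, t_vec = embed_triple("classmate M", "studies at", "school N")
print(len(h_vec), len(r_vec), len(t_vec))  # 4 4 4
```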
Therefore, by preprocessing the knowledge graph, the original semantic information of the entity can be added into the subsequent translation model while the interference data is removed, and the accuracy of the training result can be improved in the training process of the translation model.
Step 101: inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model.
In the embodiment of the application, after the knowledge graph is preprocessed through the above steps to obtain the input of the translation model, that input can be divided into a training set, a validation set, and a test set according to a set proportion, or into only a training set and a test set. Here, the training set is used for training the model, i.e., determining the parameters of the translation model; the validation set is used for adjusting the hyper-parameters of the model and preliminarily evaluating its capability; and the test set is used for evaluating the generalization ability of the translation model. The value of the set proportion may be chosen according to the actual situation, which is not limited in the embodiments of the application.
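A sketch of the split, assuming an 8:1:1 proportion (the set proportion is left open in the text, so this ratio is an illustrative assumption):

```python
import random

def split_dataset(triples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split the preprocessed triples into training,
    validation, and test sets. The 8:1:1 proportion is an assumed
    example, not fixed by the patent."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

data = [("entity%d" % i, "rel", "entity%d" % (i + 1)) for i in range(100)]
train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```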
For example, before the translation model is trained, the iteration number of the translation model may be preset, where the value of the iteration number may be set according to an actual situation, for example, 40 times, 50 times, and the like, and this is not limited in this embodiment of the application.
In an embodiment of the present application, the training set may include a plurality of entities and the relationships between them; in each iteration process of the translation model, the dispersion of the translation model can be calculated according to the entities in the training set and the relationships among them.
In some embodiments, determining the dispersion of the translation class model during each iteration of the translation class model may include: clustering the relationships among a plurality of entities in each iteration process of the translation model, and determining the spatial distance among different relationship classes and the spatial distance among the entities in the same relationship class; and determining the dispersion of the translation model according to the spatial distance between different relation classes and the spatial distance between the entities in the same relation class.
In each iteration process of the translation model, the entities included in the training set can be clustered by a clustering algorithm according to the relevance of their relations; if, say, two relation classes of entities are obtained after clustering, the entities in the training set correspond to relations of two different classes. The type of clustering algorithm is not limited here.
For example, after clustering the relationships among the entities, n entities in each relationship class may be randomly extracted, and the dispersion G_entity between the n entities in each relationship class, i.e., the spatial distance between the n entities in that class, is calculated, n being an integer greater than zero; the dispersion F_rel between different relationship classes, i.e., the spatial distance between the classes, is then calculated. Combining G_entity and F_rel yields the dispersion D of the translation model. The original formula (1) is given only as an image; a form consistent with the surrounding description is:

D = (1/m) · Σ_{i<j} 1 / ( d(r_i, r_j) + G^{r_i} + G^{r_j} )        (1)

wherein m is a normalization coefficient (here, the number of relation-class pairs compared), d(r_i, r_j) represents the spatial distance between relation class r_i and relation class r_j, G^{r_i} represents the spatial distance between the n entities within relation class r_i, and G^{r_j} that within relation class r_j. Here, G^{r_i} and G^{r_j} are expressed by formula (2) and formula (3), respectively, as average pairwise distances:

G^{r_i} = (2 / (n(n−1))) · Σ_{x<y} ||e_x − e_y||_{L1/L2}        (2)

G^{r_j} = (2 / (n(n−1))) · Σ_{x<y} ||e_x − e_y||_{L1/L2}        (3)

wherein n represents the number of entities extracted from relation class r_i, e_x and e_y are entities within r_i (within r_j for formula (3)), and L1/L2 represents the Manhattan or Euclidean distance.
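The dispersion computation can be sketched as below. Since the patent's own formula appears only as an image, the exact combination used here — the inverse of the sum of the inter-class centroid distance and the two intra-class average pairwise distances, averaged over class pairs — is an assumption; it only preserves the stated property that the dispersion falls as the vectors spread apart.

```python
import itertools
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def intra_dispersion(entities):
    """Average pairwise distance between the entity vectors of one
    relation class (the G term)."""
    pairs = list(itertools.combinations(entities, 2))
    return sum(l2(a, b) for a, b in pairs) / len(pairs)

def centroid(entities):
    """Component-wise mean vector of a relation class."""
    return [sum(col) / len(entities) for col in zip(*entities)]

def model_dispersion(classes):
    """Assumed form of the dispersion D: for each pair of relation
    classes, take the inverse of (inter-class centroid distance +
    both intra-class spreads), then average. D shrinks as vectors
    spread apart, matching the behavior described in the text."""
    terms = [
        1.0 / (l2(centroid(a), centroid(b))
               + intra_dispersion(a) + intra_dispersion(b))
        for a, b in itertools.combinations(classes, 2)
    ]
    return sum(terms) / len(terms)

# Two relation classes with tightly packed vectors vs. spread-out ones:
tight = [[(0.0, 0.0), (0.1, 0.0)], [(0.2, 0.0), (0.3, 0.0)]]
spread = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 0.0), (9.0, 0.0)]]
print(model_dispersion(tight) > model_dispersion(spread))  # True
```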
Here, the expressions of L1 and L2 are shown in formula (4) and formula (5), respectively:

d_12 = Σ_k |x_1k − x_2k|        (4)

d_12 = √( Σ_k (x_1k − x_2k)² )        (5)

wherein x_1 and x_2 represent two points in the vector space, d_12 denotes the spatial distance between x_1 and x_2, and k runs over the dimensions of the vectors. The spatial distance d_12 may represent either the spatial distance between the n entities within a relationship class or the spatial distance between different relationship classes.
Illustratively, in calculating the spatial distance between different relationship classes and the spatial distance between entities in the same relationship class, besides determining the spatial distance by using the above-mentioned manhattan distance or euclidean distance, minkowski distance, chebyshev distance, included angle cosine, hamming distance, KL divergence (Kullback-Leibler divergence), and the like may be used; the embodiment of the present application is not particularly limited with respect to the type of distance algorithm.
The expression of the Minkowski distance is shown in equation (6):

d 12 = ( Σ k |x 1k − x 2k | p ) 1/p (6)

here, p is a constant; the most common values are 2 and 1, the former giving the Euclidean distance and the latter the Manhattan distance.
The expression of the Chebyshev distance is shown in equation (7):

d 12 = max k |x 1k − x 2k | (7)
The expression of the cosine of the included angle is shown in equation (8):

cos θ = (x 1 · x 2 ) / (||x 1 || ||x 2 ||) (8)

here, cos θ has the same meaning as d 12 and represents a spatial distance.
The Hamming distance between two equal-length vectors s1 and s2 is defined as the minimum number of substitutions required to change one into the other.
The expression of the KL divergence is shown in equation (9):

D KL (P||Q) = Σ x P(x) log( P(x) / Q(x) ) (9)
For example, any of the distance algorithms described above can be used to calculate the spatial distance between entities or relations. In the field of education, due to the particularity of the data, L1/L2 is superior to the other distance algorithms: L1 has strong robustness and is insensitive to outliers, making it a well-performing feature extractor, while L2 can alleviate the over-fitting problem.
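As a concrete illustration, the distance measures of equations (4) to (9) can be sketched in Python as follows; the function names, the list-based vectors, and the treatment of KL divergence over discrete probability vectors are assumptions for illustration, not part of the patent.

```python
import math

def manhattan(x1, x2):
    # L1 distance, equation (4)
    return sum(abs(a - b) for a, b in zip(x1, x2))

def euclidean(x1, x2):
    # L2 distance, equation (5)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def minkowski(x1, x2, p=2):
    # Equation (6); p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1.0 / p)

def chebyshev(x1, x2):
    # Equation (7): the largest coordinate-wise difference
    return max(abs(a - b) for a, b in zip(x1, x2))

def cosine(x1, x2):
    # Equation (8): cosine of the included angle between the two vectors
    dot = sum(a * b for a, b in zip(x1, x2))
    return dot / (math.sqrt(sum(a * a for a in x1)) * math.sqrt(sum(b * b for b in x2)))

def kl_divergence(p, q):
    # Equation (9), for discrete probability vectors p and q
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, manhattan([0, 0], [3, 4]) gives 7 while euclidean([0, 0], [3, 4]) gives 5, matching minkowski with p = 1 and p = 2 respectively.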
In the embodiment of the application, measuring the dispersion of the vectorized entities and relations during training of the translation class model allows the dispersion to influence the optimization direction of the translation class model and thereby change its learning result. That is, by optimizing the translation class model in this way, the optimized model can achieve a good effect on the problem of mismatched entity and relation quantities, solving this problem and enabling the project to be deployed in practice.
Step 102: and determining a loss value of the translation model according to the dispersion, and adjusting parameters of the translation model based on the loss value of the translation model.
In the embodiment of the application, after the dispersion D of the translation model is determined according to the above steps, the dispersion D may be combined with a basic loss function of the translation model, and a loss value of the translation model is determined based on the combined loss function; further, parameters of the translation model are adjusted based on the loss value.
In one embodiment, for the translation class model TransE, the base loss function is shown in equation (10):

f r (h, t) = ||l h + l r − l t || L1/L2 (10)

wherein l h represents the head entity vector, l r represents the relation vector, and l t represents the tail entity vector. The optimization goal of the translation class model is that adding the relation vector l r to the head entity vector l h yields the tail entity vector l t , i.e. the optimization objective is l h + l r ≈ l t ; other translation class model variants optimize the l h , l r , l t parts of the formula, so that the accuracy of the overall model can be improved after the vectors are changed.
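A minimal sketch of the base loss in equation (10): score a triple by the norm of l_h + l_r − l_t under L1 or L2. The function name and the list-based vectors are illustrative assumptions.

```python
import math

def transe_score(l_h, l_r, l_t, norm="L1"):
    # Equation (10): f_r(h, t) = ||l_h + l_r - l_t|| under the L1 or L2 norm
    diff = [h + r - t for h, r, t in zip(l_h, l_r, l_t)]
    if norm == "L1":
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))
```

A triple whose vectors satisfy l_h + l_r = l_t exactly scores 0, which is the optimization target l_h + l_r ≈ l_t described above.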
For example, the dispersion D can be combined with the base loss function as shown in equation (11), yielding a combined loss function L̂, wherein Δ represents the value of the base loss function ||l h + l r − l t || L1/L2 . Here, the handling of the case Δ < D can be determined adaptively according to the actual situation; for example, Δ < D may be deemed to hold when Δ ≤ 0.1·D.
In some embodiments, determining the penalty value for the translation class model from the dispersion may include: determining a corresponding loss function according to the value of the dispersion of the translation model in each iteration process; and determining the loss value of the translation model in each iteration process by using a loss function.
As can be seen from equation (11), in each iteration of the translation class model, the corresponding combination mode can be determined according to the value of the dispersion D, i.e. the combined loss function L̂ is obtained; further, the loss value of the translation class model in each iteration is determined according to the combined loss function L̂.
In the embodiment of the application, the multiple combination modes of the dispersion D and the base loss function better fit different application scenarios, as well as different data sets within the same scenario. The setting of the dispersion D ensures that the combined loss function L̂ is not less than 0, and when the values of the dispersion D differ, the overall loss function L̂ of the translation class model differs accordingly. The purpose of this arrangement is to limit the influence of the dispersion D on the translation class model so that it is neither too pronounced nor ineffective, because, depending on the data set, the value of the dispersion D may be much larger than the base loss value of the base loss function of the translation class model, or much smaller than it; by limiting or amplifying the influence of the dispersion D accordingly, the dispersion D can take effect under different data sets.
Combining equation (1) and equation (11), it can be seen that the dispersion G entity and the dispersion F rel are integrated with the base loss function to measure the overall dispersion D of the translation class model. When F rel is relatively large and G entity is relatively small, the dispersion D is large; this indicates that, in the vector space, the distances between the entities within the relations are relatively dense, which easily causes errors, that is, the loss value of the translation class model is relatively large. During model optimization, the loss value is reduced through backward updates and gradient descent; as the dispersion D decreases, G entity increases, enlarging the distances between the entities in the vector space, so that the entities maintain a higher degree of dispersion across the whole vector space and the error of the translation class model is reduced.
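The quantities G_entity and F_rel can be sketched as pairwise-distance sums over sampled entity and relation vectors; the combination into D in equation (1) is not reproduced in this excerpt, so the ratio F_rel / G_entity used below is only an illustrative assumption, chosen to be consistent with the qualitative behavior just described (D is large when F_rel is large and G_entity is small).

```python
def manhattan(x1, x2):
    # L1 distance between two vectors
    return sum(abs(a - b) for a, b in zip(x1, x2))

def pairwise_sum(vectors):
    # Sum of distances over all unordered pairs of vectors
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += manhattan(vectors[i], vectors[j])
    return total

def g_entity(entities_by_relation):
    # Equations (2)/(3): pairwise distances among the n entities within each relation class
    return sum(pairwise_sum(ents) for ents in entities_by_relation.values())

def f_rel(relation_vectors):
    # Pairwise distances between the vectors of the different relation classes
    return pairwise_sum(list(relation_vectors.values()))

def dispersion(entities_by_relation, relation_vectors):
    # Assumed combination: D grows when F_rel is large and G_entity is small
    return f_rel(relation_vectors) / max(g_entity(entities_by_relation), 1e-9)
```

With two relation classes whose entities sit close together but whose relation vectors lie far apart, D comes out large, matching the error-prone case described above.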
Illustratively, the calculation of the combined loss function L̂ causes each vector to be more dispersed in the vector space, which improves accuracy in model evaluation, because the evaluation mode used by the translation class model is to compute l h + l r , i.e. to add the relation vector to the head entity vector, and to look up the correct tail entity existing in the data set among the entity vectors with the closest distance (the minimum difference by vector calculation). In a vector space with low dispersion, the vectors obtained from l h + l r are too dense, so the result deviates severely and the accuracy is directly low; in a vector space with high dispersion, the obtained vectors tend to be discretized, which directly improves the model accuracy.
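The evaluation mode just described, adding the relation vector to the head entity vector and taking the nearest entity vector as the predicted tail, can be sketched as follows; the dictionary of named entity vectors and the use of the L1 distance are illustrative assumptions.

```python
def predict_tail(l_h, l_r, entity_vectors):
    # Compute l_h + l_r and return the name of the closest entity vector (L1 distance)
    target = [h + r for h, r in zip(l_h, l_r)]
    def l1(name):
        return sum(abs(a - b) for a, b in zip(entity_vectors[name], target))
    return min(entity_vectors, key=l1)
```

With well-dispersed entity vectors the correct tail is recovered; with overly dense vectors, several candidates sit at nearly the same distance and the result deviates, as the passage above explains.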
Here, the combination of the dispersion D and the basis loss function may include other combinations besides the combination shown in the formula (11), and the embodiment of the present application is not limited.
In some embodiments, the method may further include: if it is determined that the dispersion of the translation class model in the current iteration is less than or equal to a set threshold, and that the variation of the dispersion of the translation class model across adjacent iterations is less than a set value, it is determined that the training of the translation class model is finished; the adjacent iterations include the current iteration.
In the embodiment of the application, in each iteration of the translation class model, it is judged whether the dispersion of the translation class model is less than or equal to the set threshold; if so, it is further judged whether the variation of the dispersion across adjacent iterations (including the current iteration) is less than the set value; if so, the dispersion of the translation class model has stabilized, the training and learning of the translation class model are stopped, and the training of the translation class model is complete.
In one embodiment, the value of the set threshold may be chosen according to the actual situation, for example 0.3s, 0.2s, or the like; the value of the set value may likewise be chosen according to the actual situation, for example 0.03S, 0.02S, or the like; the embodiments of the present application do not limit this.
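The stopping test above can be sketched as below; the history list of per-iteration dispersion values and the specific threshold numbers are illustrative assumptions.

```python
def training_finished(dispersion_history, threshold, stable_delta):
    # Finished when the current dispersion is at or below the set threshold AND
    # its change across adjacent iterations (including the current one) is
    # below the set value, i.e. the dispersion has stabilized.
    if len(dispersion_history) < 2:
        return False
    current, previous = dispersion_history[-1], dispersion_history[-2]
    return current <= threshold and abs(current - previous) < stable_delta
```

For example, with a threshold of 0.3 and a set value of 0.03, a history ending in 0.31, 0.30 satisfies both conditions, while a history still at 0.40 does not.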
Fig. 1c is a schematic diagram of an iterative process of a translation class model in the embodiment of the present application, and as shown in fig. 1c, the process may include:
step A1: n entities are extracted from each relationship class.
Illustratively, after each iteration of the translation class model begins, n entities are extracted from each relationship class.
Step A2: and calculating the dispersion among the n entities in each relation class.
Illustratively, the spatial distance between n entities in each relationship class is calculated as the dispersion between the entities in each relationship class.
Step A3: and calculating the dispersion among different relation classes.
For example, the execution order of step A2 and step A3 is not limited in the embodiments of the present application; step A2 may be performed first, or step A3 may be performed first.
Step A4: and calculating the dispersion D of the translation model.
Illustratively, the dispersion D of the translation class model can be calculated from the dispersion between the n entities within each relation class calculated in step A2 and the dispersion between different relation classes calculated in step A3.
Step A5: and judging whether the dispersion D is less than or equal to a set threshold value.
Illustratively, after the dispersion D of the translation model in the current iteration process is obtained according to the step A4, whether the dispersion of the translation model is smaller than or equal to a set threshold value is judged, if yes, the step A6 is executed, and if not, the step A1 is returned to execute the next iteration operation.
Step A6: and judging whether the dispersion D is stable or not.
Illustratively, if it is determined in step A5 that the dispersion D in the current iteration is less than or equal to the set threshold, it is further judged whether the variation of the dispersion of the translation class model across adjacent iterations (including the current iteration) is less than the set value; if so, the dispersion D has stabilized and the model training is finished; if not, the flow returns to step A1 to execute the next iteration.
Fig. 1d is a schematic coordinate diagram of a variation of the dispersion of the translation model with the number of iterations in the embodiment of the present application, as shown in fig. 1d, an abscissa of the schematic coordinate diagram is the number of iterations of the translation model, and an ordinate of the schematic coordinate diagram is the dispersion of the translation model; it can be seen that the dispersion of the translation class model tends to be stable as the number of iterations increases.
For example, the translation class model may terminate through normal iteration, or may terminate early by checking whether the dispersion D has converged and is less than or equal to the set threshold, stopping the iteration once the condition is satisfied. When the dispersion D is less than or equal to the set threshold and has stabilized, the convergence direction of the translation class model is shown to be correct; at this point the loss value is small, the dispersion between entities is large, and the dispersion meets expectations. When the dispersion D satisfies these conditions, the iteration can be terminated early and the model training finished; this greatly reduces the model training time and increases the learning efficiency of the model.
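Steps A1 to A6 can be sketched as the following loop; the callbacks standing in for entity sampling, parameter updates, and dispersion calculation are placeholders, not the patent's implementation.

```python
def train(max_iters, threshold, stable_delta, step_fn, dispersion_fn):
    history = []
    for _ in range(max_iters):
        step_fn()                  # A1-A3 plus the parameter update of the iteration
        d = dispersion_fn()        # A4: overall dispersion D of the translation model
        history.append(d)
        # A5/A6: terminate early once D is below the threshold and stable
        if d <= threshold and len(history) >= 2 and abs(history[-1] - history[-2]) < stable_delta:
            break
    return history
```

With a dispersion sequence that drops below the threshold and then flattens out, the loop stops before max_iters is reached, which is the early-termination behavior described above.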
The embodiments of the application provide a model training method and apparatus, an electronic device, and a computer storage medium, wherein the method includes: acquiring an original data set and extracting a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set for a translation class model; inputting the training set into the translation class model for training, and determining the dispersion of the translation class model in each iteration of the translation class model; and determining a loss value of the translation class model according to the dispersion, and adjusting the parameters of the translation class model based on the loss value. By calculating the dispersion of the translation class model, determining its loss value, and continuously reducing the loss value, the translation class model is optimized; since the loss value is determined according to the dispersion of the translation class model, reducing the loss value also reduces the dispersion of the translation class model. The dispersion measures the degree of spread of the entity and relation vectors of the training set in the vector space, so reducing the dispersion of the translation class model increases the spread of the entity and relation vectors in the vector space. This solves the problems of reduced model expressiveness and low accuracy caused by a mismatched ratio of entity quantity to relation quantity, so that the translation class model performs better in practical deployment projects, is no longer limited to data sets in a specific field, and better meets application requirements.
In order to further embody the object of the present application, the present application will be further described based on the above-described embodiments.
Fig. 2a is a flowchart of another model training method according to an embodiment of the present application, and as shown in fig. 2a, the flowchart may include:
step B1: and extracting a knowledge graph from the original data set, and cleaning the data.
And step B2: inputting the cleaned data into a BERT model.
And step B3: the BERT model outputs a training set comprising semantic information, and the training set is input into a translation model.
And step B4: and determining the dispersion degree of the translation model in each iteration process.
Illustratively, since the loss value of the translation class model in each iteration process is determined based on various combinations of the dispersion D and the basic loss function, the iteration direction of the model can be changed by determining the dispersion.
And step B5: and judging whether the dispersion degree D is less than or equal to a set threshold value or not.
Illustratively, it is judged whether the dispersion of the translation class model is less than or equal to the set threshold; if so, step B6 is executed; if not, the flow returns to step B4 to execute the next iteration.
Step B6: and judging whether the dispersion D is stable or not.
Illustratively, if the dispersion in the current iteration is determined to be less than or equal to the set threshold, it is further judged whether the variation of the dispersion of the translation class model across adjacent iterations (including the current iteration) is less than the set value; if so, the dispersion D has stabilized and the model training is finished; if not, the flow returns to step B4 to execute the next iteration.
Here, compared with translation class models in the related art, the embodiment of the present application adds the semantic information of entities to the input data of the translation class model, enriching the meaning carried by the initialization data, and adds the dispersion D to measure the degree of dispersion between entities. This solves the decrease in model accuracy caused by mismatched entity and relation quantities in a domain, improves the performance of the model, shortens the model training time, improves the training efficiency, and provides a workable solution for deploying the project.
Fig. 2b is a flowchart of another model training method according to an embodiment of the present application, and as shown in fig. 2b, the flowchart may include:
step 200: and extracting a knowledge graph from an original database, and performing data preprocessing.
Illustratively, the data in the original database includes structured data, for example from a database in a domain-specific project, as well as unstructured and semi-structured data, for example crawled by a web crawler. The data in the original database are cleaned, which includes removing or completing missing data, removing or correcting data with format or content errors, removing or correcting data with logical errors, removing unnecessary and duplicate data, and finally performing relevance verification on the data. The cleaned data are then input into a BERT model: using the open-source pre-trained model "bert-base-uncased", the model and classifier are downloaded, the original data are serialized and tokenized, labels and attention masks are added, the sequence pairs are segmented, and the result is input into the model to obtain a vectorized representation of the data.
Step 201: and dividing the preprocessed data set according to a set proportion, and inputting the training set into the translation model.
Illustratively, the preprocessed data set may be divided into a training set and a test set according to a set proportion.
In the embodiment of the application, the hyper-parameters of the translation class model are set, including the learning rate λ with value range {0.001, 0.005, 0.01, 0.1}, the margin γ with value range {1.0, 2.0, 10.0}, the embedding dimension K with value range {20, 50, 100, 200}, the batch size B with value range {20, 75, 300, 1200, 4800}, and the similarity measure {L1/L2}. This part is not limited to any one translation class model, such as TransE, TransH, TransR, etc. The calculation method is described above and is not repeated here.
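The hyper-parameter ranges listed above can be written out as a search grid; this is only an illustrative sketch, and the key names are assumptions rather than identifiers from the patent.

```python
import itertools

grid = {
    "learning_rate": [0.001, 0.005, 0.01, 0.1],   # lambda
    "margin": [1.0, 2.0, 10.0],                   # gamma
    "embedding_dim": [20, 50, 100, 200],          # K
    "batch_size": [20, 75, 300, 1200, 4800],      # B
    "norm": ["L1", "L2"],                         # similarity measure
}

# Every combination of the ranges above, e.g. for a grid search over a translation model
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
```

Each config dictionary can then be handed to the training routine; the full grid here contains 480 combinations.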
Step 202: and determining the dispersion and the loss value of the translation model.
Illustratively, the dispersion of the translation class model is determined from the dispersion between the entities within each relation class and the dispersion between different relation classes, and the loss value is then determined from the dispersion; the calculation methods are described above and are not repeated here. After the loss value is determined, the model is optimized by stochastic gradient descent, so that the loss value of the model decreases and the accuracy of the model improves. Whether the dispersion has reached the threshold and is in a stable state is calculated to judge whether the iteration can be stopped early; otherwise, iteration continues until the training of the translation class model is finished.
Step 203: and adjusting the translation model.
Exemplarily, after the trained translation model is obtained, the test set is input into the translation model to obtain a test result of the translation model, and the hyper-parameters are adjusted and optimized according to the test result.
The model training method in the embodiment of the application improves the accuracy on data sets with disproportionate entity and relation quantities, making industrial deployment feasible. Because the dispersion D, which measures the degree of dispersion between entities, is added to the translation class model, it influences the spread of the entity and relation vectors in the space and increases their spatial distances; in data domains where the numbers of entities and relations differ greatly, this increases the accuracy of the knowledge representation learning model, shortens model iteration, and reduces the model training time. In addition, the degree of influence of the dispersion D is limited or amplified according to different conditions through different combination modes with the overall loss function of the model, so that the scheme can be applied to different data sets.
Fig. 3 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application, and as shown in fig. 3, the apparatus includes: a first determination module 300, a second determination module 301, and an adjustment module 302, wherein:
a first determining module 300, configured to obtain an original data set, and extract a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set of a translation model;
the second determining module 301 is configured to input the training set into the translation model for training, and determine the dispersion of the translation model in each iteration process of the translation model;
and the adjusting module 302 is configured to determine a loss value of the translation model according to the dispersion, and adjust a parameter of the translation model based on the loss value of the translation model.
In some embodiments, the training set includes a plurality of entities and relationships between the plurality of entities; the second determining module 301 is configured to determine the dispersion of the translation class model in each iteration of the translation class model, and includes:
clustering the relationships among a plurality of entities in each iteration process of the translation model, and determining the spatial distance among different relationship classes and the spatial distance among the entities in the same relationship class;
and determining the dispersion of the translation model according to the spatial distance between different relation classes and the spatial distance between the entities in the same relation class.
In some embodiments, the adjusting module 302 is configured to determine a loss value of the translation class model according to the dispersion, and includes:
determining a corresponding loss function according to the value of the dispersion of the translation model in each iteration process;
and determining the loss value of the translation model in each iteration process by using the loss function.
In some embodiments, the adjusting module 302 is further configured to:
if the dispersion of the translation class model in the current iteration is less than or equal to a set threshold and the variation of the dispersion of the translation class model across adjacent iterations is less than a set value, it is determined that the training of the translation class model is finished; the adjacent iterations include the current iteration.
In some embodiments, the first determination module 300 is configured to pre-process the knowledge-graph, including:
and extracting semantic information of each knowledge triple in the knowledge map, and determining a training set of the translation model according to the semantic information of each knowledge triple.
In practical applications, the first determining module 300, the second determining module 301 and the adjusting module 302 may be implemented by a processor in an electronic device, and the processor may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller and microprocessor.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on the understanding that the technical solution of the present embodiment essentially or a part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method of the present embodiment. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
Specifically, the computer program instructions corresponding to a model training method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disc, or a usb disk, and when the computer program instructions corresponding to a model training method in the storage medium are read or executed by an electronic device, any one of the model training methods of the foregoing embodiments is implemented.
Based on the same technical concept of the foregoing embodiment, referring to fig. 4, it shows an electronic device 400 provided in the embodiment of the present application, which may include: a memory 401 and a processor 402; wherein,
a memory 401 for storing computer programs and data;
a processor 402 for executing a computer program stored in a memory to implement any of the model training methods of the previous embodiments.
In practical applications, the memory 401 may be a volatile memory (RAM); or a non-volatile memory (non-volatile memory) such as a ROM, a flash memory (flash memory), a Hard Disk (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 402.
The processor 402 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above-mentioned processor functions may be other devices for different model training apparatuses, and the embodiments of the present application are not particularly limited.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present application may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the various product embodiments provided in the present application may be combined arbitrarily without conflict, to arrive at new product embodiments.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (10)

1. A method of model training, the method comprising:
acquiring an original data set, and extracting a knowledge graph from the original data set; preprocessing the knowledge graph to obtain a training set of a translation model;
inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model;
and determining a loss value of the translation model according to the dispersion, and adjusting parameters of the translation model based on the loss value of the translation model.
2. The method of claim 1, wherein the training set comprises a plurality of entities and relationships between the plurality of entities; the determining the dispersion of the translation class model in each iteration process of the translation class model comprises the following steps:
clustering the relationships among the entities in each iteration process of the translation model, and determining the spatial distance among different relationship classes and the spatial distance among the entities in the same relationship class;
and determining the dispersion of the translation model according to the spatial distance between different relation classes and the spatial distance between entities in the same relation class.
3. The method of claim 1, wherein determining a loss value of the translation class model based on the dispersion comprises:
determining a corresponding loss function according to the value of the dispersion of the translation model in each iteration process;
and determining the loss value of the translation class model in each iteration process by using the loss function.
4. The method of claim 1, further comprising:
if the dispersion of the translation model in the current iteration process is determined to be smaller than or equal to a set threshold value and the variation value of the dispersion of the translation model in the adjacent iteration process is determined to be smaller than a set value, the translation model is determined to be trained; the adjacent iteration process comprises the current iteration process.
5. The method of any one of claims 1 to 4, wherein the preprocessing the knowledge-graph comprises:
extracting semantic information of each knowledge triple in the knowledge graph, and determining a training set of the translation model according to the semantic information of each knowledge triple.
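The preprocessing of claim 5 can be sketched as indexing extracted (head, relation, tail) knowledge triples into an id-based training set; the input format and the `index` helper are illustrative assumptions.

```python
# Knowledge triples extracted from the knowledge graph (assumed format).
raw_triples = [
    ("Beijing", "capital_of", "China"),
    ("Chengdu", "located_in", "China"),
]

entity2id, relation2id = {}, {}

def index(item, table):
    # Assign the next free id the first time an item is seen.
    return table.setdefault(item, len(table))

training_set = [
    (index(h, entity2id), index(r, relation2id), index(t, entity2id))
    for h, r, t in raw_triples
]
# training_set is now a list of integer triples ready for embedding lookup.
```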
6. A model training apparatus, the apparatus comprising:
a first determining module, used for acquiring an original data set and extracting a knowledge graph from the original data set, and for preprocessing the knowledge graph to obtain a training set of a translation model;
the second determining module is used for inputting the training set into the translation model for training, and determining the dispersion of the translation model in each iteration process of the translation model;
and the adjusting module is used for determining the loss value of the translation model according to the dispersion and adjusting the parameters of the translation model based on the loss value of the translation model.
7. The apparatus of claim 6, wherein the training set comprises a plurality of entities and relationships between the plurality of entities; and the second determining module is configured to determine the dispersion of the translation model in each iteration of the translation model by:
clustering the relationships among the entities in each iteration of the translation model, and determining the spatial distance between different relation classes and the spatial distance between entities in the same relation class;
and determining the dispersion of the translation model according to the spatial distance between different relation classes and the spatial distance between entities in the same relation class.
8. The apparatus of claim 6, wherein the adjusting module is configured to determine the loss value of the translation model according to the dispersion by:
determining a corresponding loss function according to the value of the dispersion of the translation model in each iteration process;
and determining the loss value of the translation model in each iteration using the loss function.
9. An electronic device, characterized in that the device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 5.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 5.
CN202110633482.4A 2021-06-07 2021-06-07 Model training method and device, electronic equipment and storage medium Pending CN115510932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110633482.4A CN115510932A (en) 2021-06-07 2021-06-07 Model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110633482.4A CN115510932A (en) 2021-06-07 2021-06-07 Model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115510932A true CN115510932A (en) 2022-12-23

Family

ID=84499452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110633482.4A Pending CN115510932A (en) 2021-06-07 2021-06-07 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115510932A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186280A (en) * 2022-12-27 2023-05-30 北京海致星图科技有限公司 Knowledge-graph data compression and decompression method and system
CN116973855A (en) * 2023-06-21 2023-10-31 中国人民解放军空军预警学院 T/R component failure threshold determining method, system, terminal and storage medium
CN116973855B (en) * 2023-06-21 2024-04-16 中国人民解放军空军预警学院 T/R component failure threshold determining method, system, terminal and storage medium

Similar Documents

Publication Publication Date Title
Lan et al. Sparse factor analysis for learning and content analytics
US11847113B2 (en) Method and system for supporting inductive reasoning queries over multi-modal data from relational databases
US20140272914A1 (en) Sparse Factor Analysis for Learning Analytics and Content Analytics
CN109582956A (en) text representation method and device applied to sentence embedding
CN112966074A (en) Emotion analysis method and device, electronic equipment and storage medium
CN111881671B (en) Attribute word extraction method
CN115510932A (en) Model training method and device, electronic equipment and storage medium
CN110264311B (en) Business promotion information accurate recommendation method and system based on deep learning
Liu et al. Incorporating domain and sentiment supervision in representation learning for domain adaptation
CN108804544A (en) Internet video display multi-source data fusion method and device
US20230297553A1 (en) Relation-enhancement knowledge graph embedding method and system
CN113412492A (en) Quantum algorithm for supervised training of quantum Boltzmann machine
Xu et al. Transductive visual-semantic embedding for zero-shot learning
Ciaburro et al. Python Machine Learning Cookbook: Over 100 recipes to progress from smart data analytics to deep learning using real-world datasets
Le Cacheux et al. From classical to generalized zero-shot learning: A simple adaptation process
CN110929532B (en) Data processing method, device, equipment and storage medium
Williams et al. Limits of transfer learning
CN115599984A (en) Retrieval method
CN115600017A (en) Feature coding model training method and device and media object recommendation method and device
Valle Hands-On Generative Adversarial Networks with Keras: Your guide to implementing next-generation generative adversarial networks
Sarang Thinking Data Science: A Data Science Practitioner’s Guide
Joshi Python machine learning cookbook
CN112508130A (en) Clustering method and device, electronic equipment and storage medium
CN110457543A (en) One kind being based on the matched entity digestion procedure of end-to-end multi-angle of view and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination