CN117973522A - Knowledge data training technology-based application model construction method and system - Google Patents
Knowledge data training technology-based application model construction method and system
- Publication number
- CN117973522A; CN202410390624.2A
- Authority
- CN
- China
- Prior art keywords
- data
- training
- application
- sample
- sets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The application provides an application model construction method and system based on knowledge data training technology, and relates to the technical field of intelligent development. The method comprises: collecting a historical functional requirement information set and a historical application data set; classifying the data to obtain a plurality of sample functional requirement information sets; calculating a plurality of application information entropies; obtaining a plurality of data training amounts and a plurality of training accuracies; judging whether the number of distinct sample functional requirement information items in each sample functional requirement information set is greater than or equal to the corresponding data training amount, and obtaining training data sets accordingly; performing knowledge data training to obtain an application builder; and identifying functional requirement information provided by a user and outputting constructed application model data. The application addresses the low development efficiency and low degree of personalization caused by the limitations of the prior art, so as to meet users' diverse functional requirements and improve the personalization and efficiency of low-code development.
Description
Technical Field
The application relates to the technical field of intelligent development, in particular to an application model construction method and system based on knowledge data training technology.
Background
Knowledge modeling is a method for creating structured knowledge representations, which aims to support applications such as information management, knowledge discovery, decision support, and the like. This modeling approach may be top-down, starting with a high level of abstraction, and then progressively refining to a more concrete and detailed level; or from bottom to top, starting with the underlying actual data and information and then progressively organizing and abstracting into a higher level knowledge representation. In this process, the steps of data collection, pattern recognition, concept extraction, associated modeling, abstraction and generalization, verification and optimization are all of vital importance.
At present, in a low-code development platform, an application model required by a user is built by acquiring the user requirement, so that personalized development of the low code is realized.
In summary, the prior art is subject to significant limitations, which result in low development efficiency and a low degree of personalization.
Disclosure of Invention
The application aims to provide an application model construction method and system based on knowledge data training technology, in order to solve the technical problems of low development efficiency and low degree of personalization caused by the limitations of the prior art.
In view of the above problems, the present application provides a method and a system for constructing an application model based on knowledge data training technology.
In a first aspect, the present application provides an application model construction method based on knowledge data training technology, the method being implemented by an application model construction system based on knowledge data training technology, wherein the method includes: acquiring a historical functional requirement information set and a historical application data set based on historical construction data of an application model of a target application; grouping identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and classifying the historical functional requirement information set by mapping to obtain a plurality of sample functional requirement information sets; calculating a plurality of application information entropies of the plurality of sample application data categories according to the number of sample application data in each of the plurality of sample application data sets; analyzing the plurality of application information entropies to obtain a plurality of data training amounts and a plurality of training accuracies for knowledge data training of the plurality of sample application data categories; judging whether the number of distinct sample functional requirement information items in each of the plurality of sample functional requirement information sets is greater than or equal to the corresponding data training amount; if so, selecting distinct sample functional requirement information from the sample functional requirement information set according to the data training amount to obtain a training data set; otherwise, obtaining a compensation data amount by analyzing the application information entropy and generating compensation training data with a generative adversarial model to obtain the training data set; and performing knowledge data training according to the plurality of training data sets and the plurality of training accuracies to obtain an application builder, which identifies functional requirement information provided by a user and outputs constructed application model data.
In a second aspect, the present application further provides an application model construction system based on knowledge data training technology, for executing the application model construction method based on knowledge data training technology according to the first aspect, wherein the system includes: a historical data set acquisition module, used for acquiring a historical functional requirement information set and a historical application data set based on historical construction data of an application model of a target application; a demand information set acquisition module, used for grouping identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and for classifying the historical functional requirement information set by mapping to obtain a plurality of sample functional requirement information sets; an application information entropy calculation module, used for calculating a plurality of application information entropies of the plurality of sample application data categories according to the number of sample application data in each of the plurality of sample application data sets; an accuracy demand acquisition module, used for analyzing the plurality of application information entropies to obtain a plurality of data training amounts and a plurality of training accuracies for knowledge data training of the plurality of sample application data categories; a training data set acquisition module, used for judging whether the number of distinct sample functional requirement information items in each of the plurality of sample functional requirement information sets is greater than or equal to the corresponding data training amount, and, if so, selecting distinct sample functional requirement information from the sample functional requirement information set according to the data training amount to obtain a training data set, or otherwise obtaining a compensation data amount by analyzing the application information entropy and generating compensation training data with a generative adversarial model to obtain the training data set; and an application builder acquisition module, used for performing knowledge data training according to the plurality of training data sets and the plurality of training accuracies to obtain an application builder, which identifies functional requirement information provided by a user and outputs constructed application model data.
One or more technical schemes provided by the application have at least the following technical effects or advantages:
A historical functional requirement information set and a historical application data set are acquired based on historical construction data of an application model of a target application; identical historical application data in the historical application data set are grouped to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and the historical functional requirement information set is classified by mapping to obtain a plurality of sample functional requirement information sets; a plurality of application information entropies of the plurality of sample application data categories are calculated according to the number of sample application data in each of the plurality of sample application data sets; the plurality of application information entropies are analyzed to obtain a plurality of data training amounts and a plurality of training accuracies for knowledge data training of the plurality of sample application data categories; it is judged whether the number of distinct sample functional requirement information items in each of the plurality of sample functional requirement information sets is greater than or equal to the corresponding data training amount; if so, distinct sample functional requirement information is selected from the sample functional requirement information set according to the data training amount to obtain a training data set; otherwise, a compensation data amount is obtained by analyzing the application information entropy and compensation training data is generated with a generative adversarial model to obtain the training data set; and knowledge data training is performed according to the plurality of training data sets and the plurality of training accuracies to obtain an application builder, which identifies functional requirement information provided by a user and outputs constructed application model data. This effectively solves the technical problems of low development efficiency and low degree of personalization caused by the limitations of the prior art, meets users' diverse functional requirements, and improves the personalization and efficiency of low-code application development.
The foregoing is only an overview of the technical solution of the present application, given so that its technical means can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the application are more readily apparent. It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application, nor to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
In order to illustrate the technical solutions of the application or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely exemplary, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of an application model building method based on knowledge data training technology of the present application;
FIG. 2 is a schematic structural diagram of an application model building system based on knowledge data training technology of the present application.
Reference numerals illustrate:
the system comprises a historical data set acquisition module 11, a demand information set acquisition module 12, an application information entropy calculation module 13, an accuracy demand acquisition module 14, a training data set acquisition module 15 and an application builder acquisition module 16.
Detailed Description
By providing an application model construction method and system based on knowledge data training technology, the application solves the technical problems of low development efficiency and low degree of personalization caused by the limitations of the prior art, so as to meet users' diverse functional requirements and improve the personalization and efficiency of low-code application development.
In the following, the technical solutions of the present application will be clearly and completely described with reference to the accompanying drawings, and it should be understood that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application, and that the present application is not limited by the exemplary embodiments described herein. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. It should be further noted that, for convenience of description, only some, but not all of the drawings related to the present application are shown.
Example 1
Referring to fig. 1, the application provides an application model construction method based on a knowledge data training technology, wherein the method is applied to an application model construction system based on the knowledge data training technology, and the method specifically comprises the following steps:
Step one: acquiring a historical functional requirement information set and a historical application data set based on historical construction data of an application model of a target application;
Specifically, the historical construction data is all data generated during past development, deployment and updating of the target application. Such data includes, but is not limited to, code libraries, configuration files, test reports, user feedback, and version control information. The historical functional requirement information set refers to all functional requirements and related documents recorded during past development. Each functional requirement should include information such as a requirement description, priority, implementation difficulty, and expected completion time. Analyzing this information reveals how users' requirements for the application have trended and which functions have been popular with users in the past. The historical application data set refers to data generated while the application is actually running, including user behavior data, system performance data, and exception logs. These data reflect the actual use of the application, its performance bottlenecks, and potential problems. For example, analyzing user behavior data reveals users' usage habits, preferences, and reasons for churn; analyzing system performance data allows performance bottlenecks to be located and optimized; and analyzing exception logs allows potential problems to be found and repaired in time. The method for collecting the two sets can be chosen according to actual conditions. Functional requirement information can be collected by consulting historical documents, communicating with the relevant personnel, and the like; application data can be collected by log analysis, data mining, user surveys, and the like.
Step two: grouping identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and classifying the historical functional requirement information set by mapping to obtain a plurality of sample functional requirement information sets;
Specifically, the historical application data set is cleaned: duplicated, erroneous, or incomplete data is removed to ensure data quality and accuracy. Key features are extracted from the cleaned data, such as user behavior patterns, system performance metrics, and timestamps. Historical application data with similar features is grouped into one category using a clustering algorithm such as K-means or hierarchical clustering. Each sample application data set represents one category of historical application data with similar features. The historical functional requirement information set is then organized so that each requirement has a clear description and attributes such as priority, source, and implementation status. Keywords or phrases that reflect the nature and topic of each requirement are extracted from it. Topic modeling is applied to the extracted keywords to identify requirements that address the same function even when they are described in different terms. The historical functional requirement information is mapped into the corresponding categories according to the identified topics or patterns. These categories may be divided by business need, user group, functional module, and so on. Mapping and classifying in this way yields a plurality of sample functional requirement information sets, each containing one category of functional requirements that share a common theme or feature.
Step three: calculating a plurality of application information entropies of the plurality of sample application data categories according to the number of sample application data in the plurality of sample application data sets;
Specifically, information entropy is an index of the uncertainty in a data set and can be understood as its degree of disorder or its average information content. The number of samples in each sample application data category can be treated as describing a probability distribution over the categories: the number of sample application data in each category is counted, its proportion of the total number of samples is calculated and taken as that category's probability, and the information entropy of each category is then calculated from this probability.
Step four: analyzing the plurality of application information entropies to obtain a plurality of data training amounts and a plurality of training accuracies for knowledge data training of the plurality of sample application data categories;
Specifically, the plurality of data training amounts and the plurality of training accuracies required by the plurality of sample application data categories for knowledge data training can be obtained by analyzing the plurality of application information entropies. In general, the larger the application information entropy of a category, the lower the probability that users require it, and the smaller the data training amount and training accuracy it needs. The application information entropies of the plurality of sample application data categories are sorted in descending order. A category with lower information entropy may require a larger data training amount to capture its complexity, whereas a category with higher information entropy may require a smaller one. The optimal data training amount can be determined, for example, by gradually increasing the amount of training data and observing how model performance changes. Training accuracy generally depends on the complexity of the data set, the complexity of the model, and the amount of training data. For a given model complexity and amount of training data, a category with higher information entropy can be expected to reach lower training accuracy, and a category with lower information entropy higher training accuracy.
Step five: judging whether the number of distinct sample functional requirement information items in each of the plurality of sample functional requirement information sets is greater than or equal to the corresponding data training amount; if so, selecting distinct sample functional requirement information from the sample functional requirement information set according to the data training amount to obtain a training data set; otherwise, obtaining a compensation data amount by analyzing the application information entropy and generating compensation training data with a generative adversarial model to obtain the training data set;
Specifically, for each sample functional requirement information set, the number of distinct sample functional requirement information items it contains is counted and compared with the previously determined data training amount for that category. If the number is greater than or equal to the data training amount, enough sample functional requirement information is selected directly from the corresponding set to meet the data training amount. The selected data should be representative, covering the main functions and requirements of the category. This yields a preliminary training data set. If the number is less than the data training amount, the compensation data amount required for the category is determined from the analysis of the application information entropy. The required compensation training data is generated with a generative model or another suitable data generation technique; such a model can learn the distribution of the existing sample data and generate new, similar data. The generated compensation data is combined with the original sample data to form the complete training data set.
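The branch in step five can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, the compensation data amount is assumed to be simply the shortfall (the text derives it from the entropy analysis without giving a rule here), and `placeholder_generator` is a trivial stand-in for a trained generative adversarial model.

```python
import random


def placeholder_generator(samples, n):
    # Hypothetical stand-in for a trained generative model: it only
    # produces labelled variants of existing samples (needs >= 1 sample).
    return [f"{samples[i % len(samples)]} (synthetic {i})" for i in range(n)]


def build_training_set(requirements, training_amount, generate, seed=0):
    # Keep distinct requirement items, preserving first-seen order.
    distinct = list(dict.fromkeys(requirements))
    if len(distinct) >= training_amount:
        # Enough data: select the required number of distinct items.
        return random.Random(seed).sample(distinct, training_amount)
    # Not enough data. ASSUMPTION: compensation amount = shortfall.
    compensation_amount = training_amount - len(distinct)
    return distinct + generate(distinct, compensation_amount)


enough = build_training_set(["a", "b", "c", "a"], 2, placeholder_generator)
short = build_training_set(["a", "b"], 5, placeholder_generator)
```

In the second call the two original items are kept and three generated items are appended, mirroring the combination of compensation data with the original sample data.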
Step six: performing knowledge data training according to the plurality of training data sets and the plurality of training accuracies to obtain an application builder, identifying functional requirement information provided by a user, and outputting constructed application model data.
Specifically, the corresponding application models are trained with the plurality of training data sets. Suitable training targets and optimization strategies are set according to the determined training accuracy requirements. Training may use deep learning, machine learning, or similar methods. During training, model performance and overfitting are monitored through cross-validation, learning curves, and the like, so that the model structure and parameters can be adjusted in time. After training, the model is evaluated on a separate validation data set. The evaluation metrics, such as accuracy, recall, or F1 score, should be chosen according to the actual task. Based on the evaluation results, the best-performing model or combination of models is selected as the basis of the application builder and integrated into it. The builder should be able to receive user input, process it, invoke the corresponding model, and output the result. The builder is then tested and optimized to ensure that its stability and performance meet requirements. When a user provides functional requirement information, the application builder first preprocesses it, for example by cleaning, normalization, and feature extraction. The processed requirements are then fed to the corresponding model for identification and analysis. Based on the results, the application builder generates or selects application model data that meets the user's requirements and finally outputs it to the user in a suitable format.
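The evaluation metrics named above can be computed as in the following sketch. The function name, the label values, and the choice of a single positive class are illustrative assumptions and do not come from the application.

```python
def classification_metrics(y_true, y_pred, positive):
    # Count outcomes with respect to one chosen positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical validation labels for two application model categories.
y_true = ["form", "form", "report", "form"]
y_pred = ["form", "report", "report", "form"]
metrics = classification_metrics(y_true, y_pred, positive="form")
```

A model would then be accepted for integration into the builder only if the chosen metric reaches the training accuracy set for its category.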
Further, the second step of the present application further comprises:
extracting the non-repeated historical application data of the historical application data set as the plurality of sample application data categories;
classifying the historical application data set according to the plurality of sample application data categories to obtain the plurality of sample application data sets;
and classifying the historical functional requirement information set by mapping, according to the mapping relation between each sample application data and the historical functional requirement information, to obtain the plurality of sample functional requirement information sets.
Specifically, the data in the historical application data set is deduplicated so that each sample application data is unique. The plurality of sample application data categories are defined from the non-repeated historical application data. These categories may be divided by attributes such as the application's type, purpose, and technology stack. The extracted non-repeated sample application data is stored, and an index is created for each sample application data for the subsequent classification and mapping operations. The historical application data set is classified according to the categories defined above, and a sample application data set containing all historical application data items of a category is created for that category. A clear mapping relationship must exist between each sample application data and its associated historical functional requirement information; this can be achieved by maintaining a mapping table or by using database relationships. The historical functional requirement information set is classified based on this mapping relationship: each item of functional requirement information is assigned to the category of its associated sample application data, and a sample functional requirement information set containing all historical functional requirement information items associated with a category is created for that category.
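The deduplication, classification, and mapping described above can be sketched as follows. The record shape (pairs of application data and requirement text) and the function name are assumptions made for illustration; a real system might hold the mapping in a database table instead.

```python
from collections import defaultdict


def build_sample_sets(history):
    # history: list of (application_data, requirement_text) pairs.
    # This pairing is a hypothetical representation of the mapping table.
    app_sets = defaultdict(list)  # category -> historical application data items
    req_sets = defaultdict(list)  # category -> mapped requirement items
    for app_data, requirement in history:
        # Each non-repeated application data value acts as its own category.
        app_sets[app_data].append(app_data)
        req_sets[app_data].append(requirement)
    return dict(app_sets), dict(req_sets)


history = [
    ("order_form", "submit an order"),
    ("order_form", "edit an order"),
    ("report_view", "export monthly report"),
    ("order_form", "submit an order"),
]
app_sets, req_sets = build_sample_sets(history)
categories = list(app_sets)  # the non-repeated application data
```

The sizes of the lists in `app_sets` are the per-category sample counts used later for the application probabilities.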
Further, the third step of the present application further comprises:
acquiring the number of sample application data in each of the plurality of sample application data sets, and calculating the ratio of each number to the sum of all the numbers to obtain a plurality of application probabilities;
calculating, according to the plurality of application probabilities, the plurality of application information entropies of the plurality of sample application data categories by the following formula:
[formula not reproduced in this text];
where M is the number of sample application data categories, and the probability term in the formula is the application probability of the i-th sample application data category.
Specifically, the number of sample application data in each sample application data set is counted, and the numbers for all sets are summed. For each category, the number of sample application data in that category is divided by the total, which gives the application probability of the category. The application probabilities and the number of categories are then substituted into the formula to obtain the plurality of application information entropies.
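The probability step can be sketched directly from the description. The entropy step is only a stand-in: because the formula itself is not available in this text, the sketch assumes the self-information term -log2(p) for each category, chosen because the description states that a larger entropy corresponds to a smaller probability of user demand. The function names and counts are hypothetical.

```python
import math


def application_probabilities(counts):
    # Ratio of each category's sample count to the sum of all counts.
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def application_entropies(probabilities):
    # ASSUMPTION: per-category self-information -log2(p); the formula of
    # the application may differ (for example, it may also involve M).
    return {category: -math.log2(p) for category, p in probabilities.items()}


counts = {"order_form": 6, "report_view": 2}  # hypothetical sample counts
probs = application_probabilities(counts)
entropies = application_entropies(probs)
```

Under this assumption the rarer category (`report_view`) receives the larger entropy value.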
Further, the fourth step of the present application further comprises:
acquiring basic data training quantity and basic training precision of knowledge data training, wherein the basic training precision comprises basic accuracy;
calculating and assigning a plurality of adjustment coefficients according to the plurality of application information entropies, wherein the magnitude of the application information entropy is inversely related to the magnitude of the adjustment coefficient;
and adjusting the basic data training amount and the basic training precision with the plurality of adjustment coefficients to obtain the plurality of data training amounts and the plurality of training precisions.
Specifically, the basic data training amount is determined first. It refers to the minimum data set size that needs to be collected, processed, and labeled before training can begin, and it depends on several factors. Task complexity: more complex tasks, such as image recognition and natural language processing, typically require more data to learn features and patterns. Model complexity: more complex models, such as deep learning networks, tend to have more parameters and therefore require more data to avoid overfitting. Data diversity: the more varied the situations and variants contained in the data set, the more data is generally required. Labeling cost: supervised learning tasks require manual labeling, which increases the cost and time of data set preparation. The basic training precision is then determined. It refers to the performance index thresholds set for the model before training starts; these indexes are used to evaluate model performance during training and to verify whether the model reaches the predetermined standard after training ends. They include accuracy, which for classification problems is the ratio of the number of correctly classified samples to the total number of samples, as well as precision and recall: for a given class in a binary or multi-class problem, precision measures how many of the samples predicted as positive are truly positive, and recall measures how many of the truly positive samples are predicted as positive. For each sample application data category, an adjustment coefficient is then calculated based on its application information entropy. The adjustment coefficient is inversely related to the application information entropy: a category with higher information entropy obtains a smaller adjustment coefficient, while a category with lower information entropy obtains a larger one. The adjustment coefficient may be calculated using a linear or nonlinear function. 
The adjustment coefficients are applied to the basic data training amount and, in the same way, to the basic training precision, yielding the data training amount and training precision of each category.
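As a rough sketch of the adjustment step, the snippet below uses the nonlinear function 1 / (1 + H) as the adjustment coefficient. That function is an assumption picked only because it decreases as the entropy H grows; the text permits any linear or nonlinear function with this property. Rounding the training amount up and capping the precision at 1.0 are also illustrative choices, as are the function names and numbers.

```python
import math

def adjustment_coefficient(entropy):
    # ASSUMPTION: one possible inversely related function; not specified in the source.
    return 1.0 / (1.0 + entropy)

def adjust_targets(base_amount, base_accuracy, entropies):
    """Return category -> (data training amount, training precision)."""
    targets = {}
    for cat, h in entropies.items():
        k = adjustment_coefficient(h)
        amount = math.ceil(base_amount * k)      # adjusted data training amount
        accuracy = min(1.0, base_accuracy * k)   # adjusted training precision
        targets[cat] = (amount, accuracy)
    return targets

# Hypothetical base values and entropies.
targets = adjust_targets(1000, 0.9, {"crm": 0.5, "blog": 0.25})
```

With these inputs the higher-entropy category ("crm") receives the smaller coefficient and therefore the smaller training amount, which matches the inverse relationship stated above.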
Further, the fifth step of the present application further comprises:
Extracting the occurrence probability of different sample function requirement information according to the plurality of sample function requirement information sets to obtain a plurality of sample occurrence probability sets;
and according to the multiple sample occurrence probability sets, sequentially selecting sample function requirement information with high occurrence probability from the sample function requirement information sets until the corresponding data training quantity is met, and obtaining a training data set.
Specifically, for each sample function requirement information set, the number of occurrences of each distinct item of function requirement information is counted. The number of occurrences of each item is divided by the total number of items in the set to obtain its occurrence probability. The occurrence probabilities are collected into a plurality of sample occurrence probability sets, each corresponding to one sample function requirement information set. An empty training data set is created to store the selected sample function requirement information. The item with the largest occurrence probability is selected from the sample occurrence probability set; if several items share the same maximum probability, one of them may be selected randomly or according to other criteria. The selected item is added to the training data set, and the data volume of the training data set is updated after each addition. It is then judged whether the data volume has reached the corresponding data training amount. If so, selection stops and the training data set is output; if not, selection continues with the next most probable item.
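The selection procedure can be sketched in a few lines. The function name `select_training_set` and the sample data are hypothetical, and ties are resolved here by first-seen order, which is one of the "other criteria" the text allows rather than a rule taken from the source.

```python
from collections import Counter

def select_training_set(requirements, training_amount):
    """Pick distinct requirement items in descending order of occurrence probability."""
    counts = Counter(requirements)  # occurrences of each distinct item
    total = len(requirements)
    probs = {req: n / total for req, n in counts.items()}
    # sorted() is stable, so items with equal probability keep first-seen order.
    ranked = sorted(probs, key=probs.get, reverse=True)
    # Stop as soon as the data training amount is met.
    return ranked[:training_amount], probs

# Hypothetical sample function requirement information set for one category.
reqs = ["login", "search", "login", "export", "search", "login"]
training_set, probs = select_training_set(reqs, 2)
```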
Further, the fifth step of the present application further comprises:
extracting the sample function requirement information sets whose number of different sample function requirement information is smaller than the corresponding data training amount as a plurality of compensation sample function requirement information sets;
calculating a plurality of basic compensation amounts according to the number of different sample function requirement information in the plurality of compensation sample function requirement information sets and the plurality of data training amounts;
performing correction calculation on the plurality of basic compensation amounts by using the application information entropies corresponding to the plurality of compensation sample function requirement information sets to obtain a plurality of data compensation amounts, wherein the magnitude of the application information entropy is positively correlated with the magnitude of the corrected data compensation amount;
and generating compensation training data by a generative adversarial model according to the plurality of data compensation amounts and the plurality of compensation sample function requirement information sets to form a training data set.
Specifically, the sample function requirement information sets whose number of different sample function requirement information items is smaller than the corresponding data training amount are first identified and extracted as a plurality of compensation sample function requirement information sets. For each compensation sample function requirement information set, the amount of data to be compensated is calculated, typically as the data training amount minus the number of samples currently in the set. These basic compensation amounts represent the amount of data each set needs to supplement before any correction. For each compensation sample function requirement information set, the corresponding application information entropy is obtained and used to correct the basic compensation amount. The principle of the correction is that the magnitude of the application information entropy is positively correlated with the magnitude of the corrected data compensation amount: if the application information entropy of a set is high, its sample function requirements are more diverse and uncertain, so more compensation data is needed to capture that diversity; conversely, if the application information entropy is low, relatively less compensation data is required. Based on the task requirements and the data type, a suitable generative adversarial model is selected to generate the compensation training data. A generative adversarial model consists of a generator and a discriminator and can generate new data that follows a distribution similar to that of the original data. Using the generative adversarial model, compensation training data of the corresponding quantity is generated for each compensation sample function requirement information set according to the calculated data compensation amounts. Finally, the generated compensation training data is combined with the original training data to form a complete training data set that meets the data training amount requirement.
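The compensation amount calculation can be sketched as below. The basic compensation amount follows the text (data training amount minus the number of distinct samples available). The correction factor (1 + H) is an assumption: the source only requires that the corrected amount grow with the application information entropy H. The function name and the numbers are hypothetical.

```python
import math

def compensation_amounts(distinct_counts, training_amounts, entropies):
    """Return category -> corrected data compensation amount for under-filled sets."""
    result = {}
    for cat, needed in training_amounts.items():
        have = distinct_counts[cat]
        if have >= needed:
            continue  # enough real samples, no compensation required
        base = needed - have  # basic compensation amount
        # ASSUMPTION: positive correlation with entropy modeled as a factor (1 + H).
        result[cat] = math.ceil(base * (1.0 + entropies[cat]))
    return result

# Hypothetical inputs: "crm" is short of its data training amount, "shop" is not.
amounts = compensation_amounts(
    {"crm": 40, "shop": 120},
    {"crm": 100, "shop": 100},
    {"crm": 0.5, "shop": 0.5},
)
```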
Further, the application also comprises:
constructing a data compensation channel based on a generative adversarial network model, wherein the data compensation channel comprises a generator and a discriminator;
acquiring a training generation function requirement information set and a training discrimination function requirement information set, and performing supervised training on the data compensation channel until convergence;
and inputting the plurality of compensation sample function requirement information sets into the converged data compensation channel for generation, obtaining the generated compensation training data, and forming a training data set.
Specifically, a data compensation channel is constructed based on a generative adversarial network model and trained under supervision until convergence. The generator is responsible for generating new sample function requirement information; its input is a random noise vector or a condition vector, and its output is the generated sample function requirement information. The discriminator is responsible for judging whether the input sample function requirement information comes from the real data set or was produced by the generator; its output is typically a probability value indicating the likelihood that the input is real data. For the generator, the loss function is typically based on the output of the discriminator and encourages the generator to produce more realistic data. For the discriminator, the loss function encourages it to correctly distinguish real data from generated data. The training generation function requirement information set and the training discrimination function requirement information set are divided from the existing function requirement information, typically by random extraction from the same original data set. In discriminator training, the real data and the fake data produced by the generator are used to train the discriminator so that it can accurately distinguish between the two. In generator training, the feedback of the discriminator is used to train the generator so that it produces samples closer to the real data. Discriminator training and generator training alternate until a convergence condition is reached, such as a stable loss function or sufficiently high quality of the generated samples. The plurality of compensation sample function requirement information sets are then provided as input to the converged data compensation channel, that is, the trained generator. The generator produces the corresponding compensation training data according to the input compensation sample function requirement information sets. 
The generated compensation training data is combined with the original training data set to form a larger and more comprehensive training data set for subsequent model training.
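The alternating generator and discriminator training can be illustrated with a deliberately tiny sketch. Everything concrete here is an assumption made for illustration: real function requirement information would first need to be encoded as vectors, whereas this toy uses scalar samples, a linear generator g(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c), hand-derived gradients, and a fixed step count in place of a convergence check.

```python
import math
import random

def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(real_samples, steps=500, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: g(z) = a*z + b
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.choice(real_samples)
        z = rng.gauss(0.0, 1.0)  # random noise input
        x_fake = a * z + b
        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * (-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * (-(1.0 - d_real) + d_fake)
        # Generator step: minimize -log D(g(z)) using the updated discriminator.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w
        a -= lr * grad_x * z
        b -= lr * grad_x
    return (a, b), (w, c)

def generate(generator, n, seed=1):
    """Draw n compensation samples from the trained generator."""
    a, b = generator
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]

# Hypothetical encoded samples standing in for real function requirement data.
generator, discriminator = train_toy_gan([4.0, 5.0, 6.0])
compensation = generate(generator, 5)
```

A practical data compensation channel would replace the scalar models with neural networks and stop on a convergence criterion such as a stable loss, as described above.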
In summary, the application model construction method based on the knowledge data training technology provided by the application has the following technical effects:
A historical function requirement information set and a historical application data set are acquired based on historical construction data of an application model of a target application; identical historical application data in the historical application data set are classified to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and the historical function requirement information set is mapped and classified to obtain a plurality of sample function requirement information sets; a plurality of application information entropies of the plurality of sample application data categories are calculated according to the number of sample application data in the plurality of sample application data sets; a plurality of data training amounts and a plurality of training precisions for knowledge data training of the plurality of sample application data categories are obtained by analysis according to the plurality of application information entropies; it is judged whether the number of different sample function requirement information in the plurality of sample function requirement information sets is greater than or equal to the plurality of data training amounts; if so, different sample function requirement information is selected from the sample function requirement information set according to the data training amount to obtain a training data set; otherwise, a data compensation amount is obtained by analysis according to the application information entropy, and compensation training data is generated by a generative adversarial model to obtain the training data set; knowledge data training is performed according to the plurality of training data sets and the plurality of training precisions to obtain an application constructor, which identifies function requirement information provided by a user and outputs constructed application model data. 
In this way, the technical problems of low development efficiency and a low degree of personalization caused by the considerable limitations of the prior art are effectively solved, the diversified function requirements of users are met, and the personalization and development efficiency of low-code application development are improved.
Example two
Based on the application model construction method based on the knowledge data training technology in the foregoing embodiment, the application also provides an application model construction system based on the knowledge data training technology. Referring to Fig. 2, the system comprises:
A historical data set acquisition module 11, wherein the historical data set acquisition module 11 is used for acquiring a historical function demand information set and a historical application data set based on historical construction data of an application model of a target application;
The demand information set acquisition module 12 is configured to classify identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and to map and classify the historical function demand information set to obtain a plurality of sample function demand information sets;
an application information entropy calculation module 13, where the application information entropy calculation module 13 is configured to calculate a plurality of application information entropies of the plurality of sample application data categories according to the number of sample application data in the plurality of sample application data sets;
the precision requirement acquisition module 14 is configured to analyze and acquire a plurality of data training amounts and a plurality of training precision of knowledge data training performed by the plurality of sample application data categories according to the plurality of application information entropies;
The training data set obtaining module 15 is configured to determine whether the number of different sample function requirement information in the plurality of sample function requirement information sets is greater than or equal to the plurality of data training amounts; if so, to select different sample function requirement information from the sample function requirement information sets according to the data training amounts to obtain a training data set; otherwise, to obtain a compensation data amount according to application information entropy analysis and generate compensation training data by a generative adversarial model to obtain the training data set;
The application builder obtaining module 16 is configured to perform knowledge data training according to a plurality of training data sets and the plurality of training accuracies, obtain an application builder, identify function requirement information provided by a user, and output built application model data.
Further, the requirement information set acquisition module 12 in the system is further configured to:
extracting non-repeated historical application data of the historical application data set as the plurality of sample application data categories;
classifying the historical application data sets according to the plurality of sample application data categories to obtain a plurality of sample application data sets;
And mapping and classifying the historical function demand information sets according to the mapping relation between the application data of each sample and the historical function demand information to obtain a plurality of sample function demand information sets.
Further, the application information entropy calculation module 13 in the system is further configured to:
acquiring the number of sample application data in the plurality of sample application data sets, and calculating the ratio of each number to the sum of the plurality of numbers to obtain a plurality of application probabilities;
calculating, according to the plurality of application probabilities, a plurality of application information entropies of the plurality of sample application data categories, wherein the application information entropy is given by:
[formula not reproduced in the extracted text];
where M is the number of sample application data categories, and the probability term in the formula is the application probability of the i-th sample application data category.
Further, the accuracy requirement obtaining module 14 in the system is further configured to:
acquiring basic data training quantity and basic training precision of knowledge data training, wherein the basic training precision comprises basic accuracy;
calculating and assigning a plurality of adjustment coefficients according to the plurality of application information entropies, wherein the magnitude of the application information entropy is inversely related to the magnitude of the adjustment coefficient;
and adjusting the basic data training amount and the basic training precision with the plurality of adjustment coefficients to obtain the plurality of data training amounts and the plurality of training precisions.
Further, the training data set acquisition module 15 in the system is further configured to:
Extracting the occurrence probability of different sample function requirement information according to the plurality of sample function requirement information sets to obtain a plurality of sample occurrence probability sets;
and according to the multiple sample occurrence probability sets, sequentially selecting sample function requirement information with high occurrence probability from the sample function requirement information sets until the corresponding data training quantity is met, and obtaining a training data set.
Further, the training data set acquisition module 15 in the system is further configured to:
extracting the sample function requirement information sets whose number of different sample function requirement information is smaller than the corresponding data training amount as a plurality of compensation sample function requirement information sets;
calculating a plurality of basic compensation amounts according to the number of different sample function requirement information in the plurality of compensation sample function requirement information sets and the plurality of data training amounts;
performing correction calculation on the plurality of basic compensation amounts by using the application information entropies corresponding to the plurality of compensation sample function requirement information sets to obtain a plurality of data compensation amounts, wherein the magnitude of the application information entropy is positively correlated with the magnitude of the corrected data compensation amount;
and generating compensation training data by a generative adversarial model according to the plurality of data compensation amounts and the plurality of compensation sample function requirement information sets to form a training data set.
Further, the system also comprises a compensation training data acquisition module, wherein the compensation training data acquisition module is used for:
constructing a data compensation channel based on a generative adversarial network model, wherein the data compensation channel comprises a generator and a discriminator;
acquiring a training generation function requirement information set and a training discrimination function requirement information set, and performing supervised training on the data compensation channel until convergence;
and inputting the plurality of compensation sample function requirement information sets into the converged data compensation channel for generation, obtaining the generated compensation training data, and forming a training data set.
In this description, the embodiments are described in a progressive manner, with each embodiment focusing on its differences from the others. The application model construction method based on the knowledge data training technology of the first embodiment (Fig. 1) and its specific examples apply equally to the application model construction system based on the knowledge data training technology of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand the system of this embodiment, so it is not described in detail here for brevity. Because the system disclosed in this embodiment corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant points, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the present application and the equivalent techniques thereof, the present application is also intended to include such modifications and variations.
Claims (8)
1. The application model construction method based on the knowledge data training technology is characterized by comprising the following steps of:
Based on historical construction data of an application model of a target application, acquiring a historical functional demand information set and a historical application data set;
Classifying identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and mapping and classifying the historical function demand information set to obtain a plurality of sample function demand information sets;
calculating a plurality of application information entropies of the plurality of sample application data categories according to the number of sample application data in the plurality of sample application data sets;
According to the application information entropy, analyzing and acquiring a plurality of data training amounts and a plurality of training accuracies of knowledge data training of the sample application data categories;
Judging whether the number of different sample function requirement information in the plurality of sample function requirement information sets is larger than or equal to the plurality of data training amounts; if so, selecting different sample function requirement information from the sample function requirement information sets according to the data training amounts to obtain a training data set; otherwise, obtaining a compensation data amount according to application information entropy analysis, and generating compensation training data by a generative adversarial model to obtain the training data set;
And training knowledge data according to the training data sets and the training accuracies to obtain an application constructor, identifying functional requirement information provided by a user, and outputting constructed application model data.
2. The method of claim 1, wherein classifying identical historical application data in the historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data categories, and mapping and classifying the historical function demand information set to obtain a plurality of sample function demand information sets, comprises:
extracting non-repeated historical application data of the historical application data set as the plurality of sample application data categories;
classifying the historical application data sets according to the plurality of sample application data categories to obtain a plurality of sample application data sets;
And mapping and classifying the historical function demand information sets according to the mapping relation between the application data of each sample and the historical function demand information to obtain a plurality of sample function demand information sets.
3. The method of claim 1, wherein calculating a plurality of application information entropies for the plurality of sample application data categories based on the number of sample application data within the plurality of sample application data sets, comprises:
acquiring the number of sample application data in the plurality of sample application data sets, and calculating the ratio of each number to the sum of the plurality of numbers to obtain a plurality of application probabilities;
calculating, according to the plurality of application probabilities, a plurality of application information entropies of the plurality of sample application data categories, wherein the application information entropy is given by:
[formula not reproduced in the extracted text];
where M is the number of sample application data categories, and the probability term in the formula is the application probability of the i-th sample application data category.
4. The method of claim 1, wherein analyzing and acquiring, according to the plurality of application information entropies, a plurality of data training amounts and a plurality of training accuracies for knowledge data training of the plurality of sample application data categories comprises:
acquiring basic data training quantity and basic training precision of knowledge data training, wherein the basic training precision comprises basic accuracy;
calculating and assigning a plurality of adjustment coefficients according to the plurality of application information entropies, wherein the magnitude of the application information entropy is inversely related to the magnitude of the adjustment coefficient;
and adjusting the basic data training amount and the basic training precision with the plurality of adjustment coefficients to obtain the plurality of data training amounts and the plurality of training precisions.
5. The method of claim 1, wherein selecting different sample function requirement information from the set of sample function requirement information according to the amount of data training to obtain the training data set comprises:
Extracting the occurrence probability of different sample function requirement information according to the plurality of sample function requirement information sets to obtain a plurality of sample occurrence probability sets;
and according to the multiple sample occurrence probability sets, sequentially selecting sample function requirement information with high occurrence probability from the sample function requirement information sets until the corresponding data training quantity is met, and obtaining a training data set.
6. The method of claim 1, wherein obtaining the compensation data amount according to application information entropy analysis and generating compensation training data by a generative adversarial model to obtain the training data set comprises:
extracting the sample function requirement information sets whose number of different sample function requirement information is smaller than the corresponding data training amount as a plurality of compensation sample function requirement information sets;
calculating a plurality of basic compensation amounts according to the number of different sample function requirement information in the plurality of compensation sample function requirement information sets and the plurality of data training amounts;
performing correction calculation on the plurality of basic compensation amounts by using the application information entropies corresponding to the plurality of compensation sample function requirement information sets to obtain a plurality of data compensation amounts, wherein the magnitude of the application information entropy is positively correlated with the magnitude of the corrected data compensation amount;
and generating compensation training data by a generative adversarial model according to the plurality of data compensation amounts and the plurality of compensation sample function requirement information sets to form a training data set.
7. The method of claim 6, wherein generating compensation training data by a generative adversarial model according to the plurality of data compensation amounts and the plurality of compensation sample function requirement information sets comprises:
constructing a data compensation channel based on a generative adversarial network model, wherein the data compensation channel comprises a generator and a discriminator;
acquiring a training generation function requirement information set and a training discrimination function requirement information set, and performing supervised training on the data compensation channel until convergence;
and inputting the plurality of compensation sample function requirement information sets into the converged data compensation channel for generation, obtaining the generated compensation training data, and forming a training data set.
8. An application model construction system based on the knowledge data training technology, characterized in that the system is used for implementing the steps of the method of any of claims 1 to 7, the system comprising:
The historical data set acquisition module is used for acquiring a historical function demand information set and a historical application data set based on historical construction data of an application model of the target application;
The demand information set acquisition module is used for classifying the historical application data with the same historical application data set to obtain a plurality of sample application data sets corresponding to a plurality of sample application data types, and mapping and classifying the historical function demand information set to obtain a plurality of sample function demand information sets;
the application information entropy calculation module is used for calculating a plurality of application information entropies of the sample application data categories according to the number of sample application data in the sample application data sets;
The precision requirement acquisition module is used for analyzing and acquiring a plurality of data training amounts and a plurality of training precision of knowledge data training of the plurality of sample application data categories according to the plurality of application information entropies;
The training data set acquisition module is used for judging whether the number of different sample function requirement information in the plurality of sample function requirement information sets is larger than or equal to the plurality of data training amounts, if so, selecting different sample function requirement information from the sample function requirement information sets according to the data training amounts to obtain a training data set, otherwise, obtaining a compensation data amount according to application information entropy analysis to generate compensation training data by generating an countermeasure model to obtain the training data set;
the application builder acquisition module is used for training knowledge data according to a plurality of training data sets and a plurality of training accuracies, obtaining an application builder, identifying functional requirement information provided by a user and outputting built application model data.
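The decision made by the training data set acquisition module can be sketched as follows. This is a hedged illustration: `build_training_set` and the `compensate` callable are hypothetical names, the callable stands in for the generative adversarial model, and the raw shortfall is used as the compensation amount without the entropy-based correction.

```python
import random


def build_training_set(samples, training_amount, compensate=None, seed=0):
    """Select or compensate one category's training data.

    samples         -- sample function requirement items for the category
    training_amount -- data training amount required for the category
    compensate      -- hypothetical callable (seed_items, n) -> n generated items
    """
    # Keep distinct items only, preserving first-seen order.
    distinct = list(dict.fromkeys(samples))
    if len(distinct) >= training_amount:
        # Enough distinct items: draw exactly the required amount.
        return random.Random(seed).sample(distinct, training_amount)
    if compensate is None:
        raise ValueError("too few distinct samples and no compensation model given")
    # Too few items: fill the shortfall with generated compensation data.
    shortfall = training_amount - len(distinct)
    return distinct + list(compensate(distinct, shortfall))


# Stub generator used only for illustration.
stub = lambda seed_items, n: [f"generated-{i}" for i in range(n)]
training_set = build_training_set(["login", "login", "search"], 5, compensate=stub)
```

Deduplicating before the comparison mirrors the claim's wording, which counts distinct sample function requirement information rather than raw records.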
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410390624.2A CN117973522B (en) | 2024-04-02 | 2024-04-02 | Knowledge data training technology-based application model construction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410390624.2A CN117973522B (en) | 2024-04-02 | 2024-04-02 | Knowledge data training technology-based application model construction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117973522A (en) | 2024-05-03 |
CN117973522B (en) | 2024-06-04 |
Family
ID=90864737
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410390624.2A Active CN117973522B (en) | 2024-04-02 | 2024-04-02 | Knowledge data training technology-based application model construction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117973522B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020038307A1 (en) * | 2000-01-03 | 2002-03-28 | Zoran Obradovic | Systems and methods for knowledge discovery in spatial data |
CN107168255A (en) * | 2017-05-16 | 2017-09-15 | 浙江工业大学 | Polypropylene melt index mixed modeling method based on integrated neural network |
WO2020199591A1 (en) * | 2019-03-29 | 2020-10-08 | 平安科技(深圳)有限公司 | Text categorization model training method, apparatus, computer device, and storage medium |
CN115022038A (en) * | 2022-05-31 | 2022-09-06 | 广东电网有限责任公司 | Power grid network anomaly detection method, device, equipment and storage medium |
US11803815B1 (en) * | 2018-07-27 | 2023-10-31 | Vettery, Inc. | System for the computer matching of targets using machine learning |
CN117113169A (en) * | 2023-07-25 | 2023-11-24 | 华南理工大学 | Industrial flow type data on-line fault diagnosis method based on deep knowledge distillation network |
CN117495485A (en) * | 2023-10-30 | 2024-02-02 | 中国移动通信集团江苏有限公司 | Product recommendation method, device and readable storage medium |
CN117743719A (en) * | 2023-12-22 | 2024-03-22 | 北京京航计算通讯研究所 | Page element identification method |
Non-Patent Citations (1)
Title |
---|
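Leng Xiwu et al.: "Data specification and data processing for a big data analysis system for smart grid monitoring and operation" (智能电网监控运行大数据分析系统的数据规范和数据处理), Automation of Electric Power Systems (电力系统自动化), vol. 42, no. 19, 10 October 2018 (2018-10-10), pages 169-176 * |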
Also Published As
Publication number | Publication date |
---|---|
CN117973522B (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105740228B (en) | A kind of internet public feelings analysis method and system | |
CN103559504B (en) | Image target category identification method and device | |
CN112463971B (en) | E-commerce commodity classification method and system based on hierarchical combination model | |
CN108681742B (en) | Analysis method for analyzing sensitivity of driver driving behavior to vehicle energy consumption | |
CN112836509A (en) | Expert system knowledge base construction method and system | |
CN112329816A (en) | Data classification method and device, electronic equipment and readable storage medium | |
CN109685104B (en) | Determination method and device for recognition model | |
CN110222733B (en) | High-precision multi-order neural network classification method and system | |
CN101256631A (en) | Method, apparatus, program and readable storage medium for character recognition | |
CN115794803B (en) | Engineering audit problem monitoring method and system based on big data AI technology | |
CN117151870A (en) | Portrait behavior analysis method and system based on guest group | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
Zhang et al. | Reinforced adaptive knowledge learning for multimodal fake news detection | |
CN117351484B (en) | Tumor stem cell characteristic extraction and classification system based on AI | |
CN113837266A (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
CN117973522B (en) | Knowledge data training technology-based application model construction method and system | |
CN116955534A (en) | Intelligent complaint work order processing method, intelligent complaint work order processing device, intelligent complaint work order processing equipment and storage medium | |
CN111339258A (en) | University computer basic exercise recommendation method based on knowledge graph | |
CN115437960A (en) | Regression test case sequencing method, device, equipment and storage medium | |
CN115098674A (en) | Method for generating confrontation network generation data based on cloud ERP supply chain ecosphere | |
CN111127184B (en) | Distributed combined credit evaluation method | |
CN113792141A (en) | Feature selection method based on covariance measurement factor | |
CN115904920A (en) | Test case recommendation method and device, terminal and storage medium | |
CN117437976B (en) | Disease risk screening method and system based on gene detection | |
CN117667890B (en) | Knowledge base construction method and system for standard digitization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||