CN118113685A

CN118113685A - Big data-based model database management system and method

Info

Publication number: CN118113685A
Application number: CN202410480149.8A
Authority: CN
Inventors: 张大舜; 杜娟; 李姝博; 孙溯辉; 郭克; 苗立琴; 范悦; 王静; 乔欢; 王若平; 王怡琳
Original assignee: Changchun Equipment & Technology Research Institute
Current assignee: Changchun Equipment & Technology Research Institute
Priority date: 2024-04-22
Filing date: 2024-04-22
Publication date: 2024-05-31
Anticipated expiration: 2044-04-22
Also published as: CN118113685B

Abstract

The invention discloses a big data-based model database management system and method, belonging to the technical field of data management. The system comprises a model feature extraction module, a history monitoring data processing module, an association relation analysis module and a real-time monitoring data processing module. The model feature extraction module extracts feature data from a model database and classifies the models to construct model feature data tables; the history monitoring data processing module calculates user model call scores and determines user model calling grades; the association relation analysis module analyzes the historical monitoring data to obtain a call limit value and a historical call score for each model; and the real-time monitoring data processing module processes real-time monitoring data, analyzes the current model call situation in combination with the model feature data tables, calculates real-time call scores and recommends models.

Description

Big data-based model database management system and method

Technical Field

The invention relates to the technical field of data management, in particular to a model database management system and method based on big data.

Background

In today's information age, big data technology has become an important driving force for development across industries. With the rapid development of artificial intelligence, machine learning, deep learning and related technologies, model databases play an increasingly important role as infrastructure supporting these applications. However, as the number of users and the volume of data keep growing, model database management faces challenges including resource contention, performance degradation, security issues and data confusion.

In a traditional model database management system, when a user does not explicitly specify which model to call, the prior art generally recommends a model based on historical call records. As a result, the same model in the model database may be called by many different users at once, which leads to performance bottlenecks and low resource utilization. Moreover, because conventional systems lack comprehensive monitoring and analysis of user requests and system resource utilization, they cannot effectively handle highly concurrent environments, which reduces system management efficiency and operational performance.

Disclosure of Invention

The invention aims to provide a big data-based model database management system and method that solve the problems described in the Background.

To solve the above technical problems, the invention provides the following technical solution:

A model database management method based on big data comprises the following steps:

S100, acquiring feature data of all models from a model database, classifying all models in the model database based on their feature data, and constructing model feature data tables for all models;

S200, acquiring historical monitoring data of the model database using big data technology, the historical monitoring data comprising service condition (usage) data, user request data and model performance data of the model database; and analyzing the user request data in the historical monitoring data to obtain user model calling grades;

S300, analyzing the association relation among the service condition data, the model performance data and the model calling grade of the model database, and obtaining a calling limit value of each model according to the association relation; calculating historical call scores of each model in the model database according to the call limit value;

S400, acquiring real-time monitoring data, analyzing it and matching applicable models using the model feature data tables; and calculating the real-time call scores of all applicable models and recommending models according to these scores.

Further, step S100 includes:

S101, connecting to the model database, obtaining the names of all models in it, and assigning a model identifier to each model according to its name, each model having exactly one model identifier; extracting the feature data of all models from the model database based on the model identifiers and associating each model identifier with its feature data; and converting the feature data of each model identifier into a model feature vector $V_i = \{V_{i1}, V_{i2}, \ldots, V_{in}\}$, where i is the model identifier, $V_{i1}$ is the 1st feature data item of model i, $V_{i2}$ is the 2nd, and so on up to $V_{in}$, the n-th feature data item; n is the number of feature data items and is a positive integer;
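The identifier assignment and format conversion of S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the feature data have already been encoded as numbers, and the function name `build_feature_vectors`, the model names and the feature keys are hypothetical.

```python
from typing import Dict, List, Tuple


def build_feature_vectors(
    models: Dict[str, Dict[str, float]],
    feature_order: List[str],
) -> Tuple[Dict[str, int], Dict[int, List[float]]]:
    """Assign exactly one identifier per model name and convert its features to a vector."""
    name_to_id: Dict[str, int] = {}
    vectors: Dict[int, List[float]] = {}
    # Sorting the names makes identifier assignment deterministic between runs.
    for model_id, name in enumerate(sorted(models), start=1):
        name_to_id[name] = model_id
        features = models[name]
        # Format conversion: a fixed feature order gives every vector length n;
        # a missing feature is filled with 0.0 (an assumption, not from the source).
        vectors[model_id] = [float(features.get(key, 0.0)) for key in feature_order]
    return name_to_id, vectors


# Hypothetical feature records, already encoded as numbers.
models = {
    "resnet": {"input_size": 224, "layers": 50},
    "bert": {"input_size": 512, "layers": 12},
}
ids, vectors = build_feature_vectors(models, ["input_size", "layers"])
print(ids)      # {'bert': 1, 'resnet': 2}
print(vectors)  # {1: [512.0, 12.0], 2: [224.0, 50.0]}
```

A real system would also need an encoding step for non-numeric features such as application scenarios; that step is outside this sketch.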

S102, performing similarity calculation on all extracted model feature vectors in turn, classifying model feature vectors with equal similarity into the same class, traversing each model class, and creating a model feature data table for each class in the format: model name - model feature vector;
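The source does not define what "equal similarity" means operationally. One possible reading, assumed here, is that vectors whose cosine similarity reaches a threshold belong to the same class. The sketch below uses a greedy single-pass grouping; the threshold of 0.95 and the function names are hypothetical, and other clustering methods would fit the description equally well.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_feature_tables(
    vectors: Dict[str, List[float]], threshold: float = 0.95
) -> List[Dict[str, List[float]]]:
    """Greedy single-pass grouping into tables of 'model name -> feature vector'.

    A vector joins the first class whose first member is at least `threshold`
    similar to it; otherwise it starts a new class.
    """
    tables: List[Dict[str, List[float]]] = []
    for name, vec in vectors.items():
        for table in tables:
            representative = next(iter(table.values()))
            if cosine(vec, representative) >= threshold:
                table[name] = vec
                break
        else:
            tables.append({name: vec})
    return tables


tables = build_feature_tables({"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 1.0]})
print([list(t) for t in tables])  # [['a', 'b'], ['c']]
```

Each returned dictionary plays the role of one model feature data table. Because the grouping is greedy, the result can depend on input order; a production system might prefer a proper clustering algorithm.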

S103, obtaining the update log of the model database once per period and determining whether a new model has been added to the model database; if no new model exists, no operation is performed; if a new model exists, steps S101 and S102 are repeated to update the model feature vectors and model feature data tables.
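The periodic check in S103 reduces to a set difference between the models reported in the update log and the models already indexed. The sketch assumes the update log can be read as a list of model names; `check_for_updates` and the `rebuild` callback (standing in for re-running S101 and S102) are hypothetical names.

```python
from typing import Callable, Iterable, Set


def find_new_models(log_model_names: Iterable[str], known_models: Set[str]) -> Set[str]:
    """Model names that appear in the update log but are not yet indexed."""
    return set(log_model_names) - known_models


def check_for_updates(
    log_model_names: Iterable[str],
    known_models: Set[str],
    rebuild: Callable[[Set[str]], None],
) -> bool:
    """One periodic check: rebuild only when the log reports a new model."""
    new_models = find_new_models(log_model_names, known_models)
    if not new_models:
        return False  # no new model: no operation is performed
    rebuild(new_models)  # hypothetical hook that re-runs S101 and S102
    known_models.update(new_models)
    return True


known = {"bert"}
rebuilt = []
check_for_updates(["bert"], known, rebuilt.append)         # False, nothing rebuilt
check_for_updates(["bert", "gpt"], known, rebuilt.append)  # True, rebuilt == [{'gpt'}]
```

In practice this function would be called from a scheduler once per period.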

Further, step S200 includes:

S201, the service condition (usage) data of the model database refers to access information of the model database and model call records; the user request data refers to the requirement information submitted by a user for calling a model and information on the model finally called; the model performance data refers to model response time and model accuracy;

S202, acquiring the user request data within a selected period T from the historical monitoring data of the model database and dividing it by user number, one user number corresponding to multiple pieces of user request data; sorting the user request data of every user number in chronological order of receipt, and counting, for each user number, the number m of requirement information items in its user request data within the selected period T;
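A minimal sketch of S202, assuming each request record carries a user number, a receive timestamp and a list of requirement information items, and that m counts those items (the source does not say whether one request can hold several). The `UserRequest` layout and `group_requests` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UserRequest:
    user_id: str             # user number
    received_at: float       # time the request was received (e.g. Unix seconds)
    requirements: List[str]  # requirement information items in this request


def group_requests(
    requests: List[UserRequest], start: float, end: float
) -> Tuple[Dict[str, List[UserRequest]], Dict[str, int]]:
    """Split requests in [start, end) by user number, sort them by time, and count m."""
    by_user: Dict[str, List[UserRequest]] = defaultdict(list)
    for req in requests:
        if start <= req.received_at < end:  # keep only the selected period T
            by_user[req.user_id].append(req)
    m: Dict[str, int] = {}
    for user_id, reqs in by_user.items():
        reqs.sort(key=lambda r: r.received_at)  # chronological order of receipt
        m[user_id] = sum(len(r.requirements) for r in reqs)
    return dict(by_user), m


requests = [
    UserRequest("u1", 5.0, ["classification"]),
    UserRequest("u1", 1.0, ["detection", "segmentation"]),
    UserRequest("u2", 3.0, ["translation"]),
    UserRequest("u2", 99.0, ["outside period"]),
]
by_user, m = group_requests(requests, start=0.0, end=10.0)
print(m)  # {'u1': 3, 'u2': 1}
```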

S203, obtaining the final called model information from the user request data to obtain the final called model identifier; combining the final called model identifier with the model call records in the service condition data of the model database, and matching the corresponding entries in the model performance data to obtain the model response time t and model accuracy z of that model; and calculating a user model call score Q for the selected period T by the following formula:

[Formula for the user model call score Q not reproduced in the source text.]

where α and β are weight coefficients; m_j is the number of requirement information items in the user request data of the j-th user number within the selected period T, j being the user number index, a positive integer from 1 to N; t_j is the model response time of the final called model of the j-th user number within the selected period T; z_jx is the model accuracy of the x-th model finally called by the j-th user number within the selected period T, x being the index of the finally called models, a positive integer from 1 to M. The user model call scores Q within the selected period T are arranged in descending order and identical scores Q are grouped into one class, so that each distinct user model call score Q represents one user model calling grade; the higher the score Q, the higher the priority of the corresponding calling grade.
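The formula for Q is not available in this text, so the sketch below starts from already-computed scores and only illustrates the grading rule: sort in descending order and group identical scores into one grade. The rounding to `ndigits` before comparison is an assumption added to avoid floating-point noise; `call_levels` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def call_levels(scores: Dict[str, float], ndigits: int = 6) -> List[Tuple[float, List[str]]]:
    """Group users with identical Q into one grade, highest Q (highest priority) first.

    Scores are rounded to `ndigits` before comparison so floating-point noise does
    not split users that should share a grade (an assumption, not from the source).
    """
    levels: Dict[float, List[str]] = {}
    for user_id, q in scores.items():
        levels.setdefault(round(q, ndigits), []).append(user_id)
    return sorted(levels.items(), key=lambda item: item[0], reverse=True)


# Q values here are made up; the source formula for Q is not available.
print(call_levels({"u1": 0.8, "u2": 0.5, "u3": 0.8}))
# [(0.8, ['u1', 'u3']), (0.5, ['u2'])]
```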

Further, step S300 includes:

S301, acquiring the historical monitoring data of the model database within the selected period T, dividing the service condition data and model performance data in it by time point within T to form an association analysis set A, and arranging the elements of A in chronological order; each element of A comprises, for one time point in T, the number of users M, the average model response time t0, the average model accuracy z0 and the average user model call score Q0; M is obtained from the service condition data of the model database, and t0 and z0 are obtained from the model performance data;
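Building the association analysis set A can be sketched as a group-by on time point. The record layout (`CallRecord`) and function name are hypothetical; the sketch assumes M is the number of distinct users calling at that time point and Q0 is the mean of those users' precomputed call scores Q.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class CallRecord:
    time_point: int        # index of the time point within period T
    user_id: str           # user number
    response_time: float   # model response time t
    accuracy: float        # model accuracy z


def build_association_set(
    records: List[CallRecord], user_scores: Dict[str, float]
) -> List[Tuple[int, int, float, float, float]]:
    """Return (time_point, M, t0, z0, Q0) tuples in chronological order."""
    by_time: Dict[int, List[CallRecord]] = defaultdict(list)
    for rec in records:
        by_time[rec.time_point].append(rec)
    association_set = []
    for tp in sorted(by_time):
        recs = by_time[tp]
        users = sorted({r.user_id for r in recs})
        association_set.append((
            tp,
            len(users),                           # M: distinct users at this time point
            mean(r.response_time for r in recs),  # t0
            mean(r.accuracy for r in recs),       # z0
            mean(user_scores[u] for u in users),  # Q0, from precomputed Q values
        ))
    return association_set


records = [
    CallRecord(1, "u1", 0.25, 0.5),
    CallRecord(1, "u2", 0.75, 1.0),
    CallRecord(0, "u1", 0.125, 1.0),
]
A = build_association_set(records, {"u1": 0.5, "u2": 1.0})
print(A)  # [(0, 1, 0.125, 1.0, 0.5), (1, 2, 0.5, 0.75, 0.75)]
```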

S302, normalizing, for each element of the association analysis set A, the number of users M, the average model response time t0, the average model accuracy z0 and the average user model call score Q0, and plotting the normalized values of each element as data points on a radar chart in the order of the elements; the radar chart has four axes k1, k2, k3 and k4 sharing a common origin, where k1 represents M, k2 represents t0, k3 represents z0 and k4 represents Q0;
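The source does not name a normalization method. The sketch below assumes per-axis min-max scaling to [0, 1], which keeps all four radar axes k1 to k4 on a common scale; the function names are hypothetical.

```python
from typing import List, Sequence


def min_max(column: Sequence[float]) -> List[float]:
    """Scale one column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_rows(rows: List[Sequence[float]]) -> List[List[float]]:
    """rows: [(M, t0, z0, Q0), ...] in time order -> rows of radar values on k1..k4."""
    columns = list(zip(*rows))                      # one column per radar axis
    scaled = [min_max(col) for col in columns]
    return [list(point) for point in zip(*scaled)]  # back to one row per time point


rows = [(10, 1.0, 0.9, 0.5), (20, 2.0, 0.9, 1.0), (30, 3.0, 0.9, 0.75)]
print(normalize_rows(rows))
# [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 1.0], [1.0, 1.0, 0.0, 0.5]]
```

Z-score standardization would be an alternative, but it produces negative values, which are awkward to plot on a radar chart.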

S303, connecting, in turn, the points of the same time point on each pair of adjacent axes of the radar chart so that each time point forms a quadrilateral, and analyzing the positional relationship of the line segments between each pair of adjacent axes to obtain the association relation among the service condition data, the model performance data and the model calling grade of the model database; from this association relation, obtaining the relation between the number of calls Mc of each model at each time point in the selected period T and the corresponding model response time t and model accuracy z, expressed as Mc = f(t, z), where f(t, z) is a specific functional relationship determined by correlation analysis; and obtaining the call limit value Mc_max of each model from this expression, Mc_max being the value at which the model response time t reaches its maximum and the model accuracy z reaches its minimum;

S304, calculating the historical call score S of each model in the model database according to its call limit value Mc_max by the following formula:

[Formula for the historical call score S not reproduced in the source text.]

recording historical call scores for each model in the model database in chronological order.

Further, step S400 includes:

S401, acquiring real-time monitoring data, which comprises the service condition data, user request data and model performance data of the model database for the most recent model call before the current time point, together with the latest user request data at the current time point; the latest user request data contains only model requirement information, and no model has yet been called for it;

S402, extracting the model features to be matched from the latest user request data to form a model feature vector Vd to be matched; calculating the similarity between Vd and the model feature vectors Vi in the model feature data tables; taking the model class of the Vi with the largest similarity as the model class to be matched; and looking up the corresponding model feature data table to obtain the model identifiers of the matching applicable models;
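The matching rule in S402 (pick the class containing the single most similar Vi, then return every model in that class's table) can be sketched as below. Cosine similarity is assumed here, the tables are keyed by model identifier, and `match_applicable_models` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_applicable_models(
    vd: List[float], tables: List[Dict[str, List[float]]]
) -> List[str]:
    """Return the model identifiers of the table holding the Vi most similar to Vd."""
    best_table: Optional[Dict[str, List[float]]] = None
    best_similarity = -math.inf
    for table in tables:
        for vi in table.values():
            similarity = cosine(vd, vi)
            if similarity > best_similarity:
                best_table, best_similarity = table, similarity
    return list(best_table) if best_table is not None else []


tables = [{"m1": [1.0, 0.0], "m2": [0.9, 0.1]}, {"m3": [0.0, 1.0]}]
print(match_applicable_models([0.1, 1.0], tables))   # ['m3']
print(match_applicable_models([1.0, 0.05], tables))  # ['m1', 'm2']
```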

S403, calculating the real-time call scores of all matching applicable models based on the service condition data of the model database for the most recent model call before the current time point in the real-time monitoring data, and sorting the model identifiers of all matching applicable models in descending order of real-time call score, so that corresponding recommended model identifiers are output for each piece of latest user request information at the current time point.
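The real-time call score formula is not given in the source, so this sketch takes the scoring function as a parameter and shows only the ranking step of S403. The tie-break by identifier is an assumption added for reproducibility; `recommend` and the score values are hypothetical.

```python
from typing import Callable, Dict, List


def recommend(applicable_ids: List[str], realtime_score: Callable[[str], float]) -> List[str]:
    """Sort applicable model identifiers by real-time call score, highest first."""
    scored: Dict[str, float] = {mid: realtime_score(mid) for mid in applicable_ids}
    # Ties are broken by identifier so the recommendation order is reproducible.
    return sorted(scored, key=lambda mid: (-scored[mid], mid))


# Hypothetical precomputed real-time call scores.
scores = {"m1": 0.4, "m2": 0.9, "m3": 0.4}
print(recommend(["m1", "m2", "m3"], scores.__getitem__))  # ['m2', 'm1', 'm3']
```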

A big data-based model database management system, comprising a model feature extraction module, a history monitoring data processing module, an association relation analysis module and a real-time monitoring data processing module;

the model feature extraction module extracts feature data of a model from a model database, converts the feature data into model feature vectors, classifies the model, and constructs a model feature data table;

the history monitoring data processing module processes history monitoring data of the model database, including user request data, model service condition data and model performance data, and calculates a user model calling score in a selected period T so as to obtain a user model calling grade;

the association relation analysis module analyzes the association relation among the service condition data of the model database, the model performance data and the model calling grade, obtains the call limit value of each model according to the association relation, and calculates the historical call score of each model in the model database according to the call limit value;

the real-time monitoring data processing module processes the real-time monitoring data, analyzes the current model call situation in combination with the model feature data tables, calculates real-time call scores, and recommends models.

Further, the model feature extraction module comprises a feature data extraction unit, a model feature data table creation unit and a database update monitoring unit;

the feature data extraction unit is connected with the model database, acquires the names of all models, distributes corresponding model identifiers for each model, extracts the feature data of all models from the model database based on the model identifiers, associates the model identifiers with the corresponding feature data, and performs format conversion on the feature data of each model identifier so as to form model feature vectors;

the model feature data table creation unit performs similarity calculation on all extracted model feature vectors in turn, classifies vectors with equal similarity into one class, traverses each model class, creates a model feature data table for each class, and records the model names and corresponding model feature vectors in the table;

the database update monitoring unit acquires the model database update log once per period and determines whether a new model has been added to the model database; if so, it repeats the operations of the feature data extraction unit and the model feature data table creation unit to update the model feature vectors and model feature data tables.

Further, the history monitoring data processing module comprises a user request data processing unit, a model performance data matching unit and a user model calling score calculating unit;

The user request data processing unit acquires user request data in a selected period T from historical monitoring data of a model database, divides the user request data according to user numbers and sorts the user request data according to time sequence, and counts the number of required information pieces in the user request data corresponding to each user number in the selected period T;

The model performance data matching unit combines the final called model information in the user request data with the model call records in the service condition data of the model database, matches the model performance data of the final called model, and obtains the model response time and model accuracy;

The user model calling score calculating unit calculates the user model calling score in the selected period T, and classifies users with the same user model calling score into one class to form different user model calling grades.

Further, the association relation analysis module comprises a data processing unit, a visual display unit, an association relation analysis unit and a history call score calculation unit;

The data processing unit extracts service condition data and model performance data of the model database from historical monitoring data of the model database, divides the data according to time points in a selected period T to form a correlation analysis set, and performs normalization processing on the data in the correlation analysis set;

The visual display unit displays the data in the normalized association analysis set on the radar chart and connects points with the same sequence on adjacent axes to form a quadrangle;

the association relation analysis unit derives a relation among the service condition data of the model database, the model performance data and the number of model calls from the positional relationship of the data points on the radar chart, so as to obtain the call limit value of each model;

the historical call score calculating unit calculates and records the historical call score of each model in the model database according to the model call limit value.

Further, the real-time monitoring data processing module comprises a real-time monitoring data acquisition unit, a model feature matching unit and a real-time calling score calculating unit;

The real-time monitoring data acquisition unit acquires the service condition data, user request data and model performance data of the model database for the most recent model call before the current time point, together with the latest user request data, which contains only model requirement information;

The model feature matching unit extracts model features to be matched to form model feature vectors to be matched according to the latest user request data, performs similarity calculation with the model feature vectors in the model feature data table, selects a model category corresponding to the model feature vector with the largest similarity as the model category to be matched, searches the corresponding model feature data table, and obtains a model identifier of a matched applicable model;

The real-time call score calculation unit calculates the real-time call scores of all matching applicable models based on the service condition data of the model database for the most recent model call before the current time point in the real-time monitoring data, sorts the model identifiers of all matching applicable models in descending order of real-time call score, and outputs corresponding recommended model identifiers for each piece of latest user request information at the current time point.

Compared with the prior art, the invention has the following beneficial effects:

By connecting to the model database, extracting model feature data, associating model identifiers with feature data and updating regularly from the model database, the model management process is automated, manual intervention is reduced and management efficiency is improved; by performing similarity calculation on the extracted model feature vectors and classifying models with equal similarity into one class, models are classified more accurately, which helps users find the required model more quickly; and by periodically acquiring the model database update log and checking for new models, the model feature data tables are updated dynamically, keeping the model database timely and complete;

By combining the user request data, the service condition data of the model database and the model performance data to calculate user model call scores, model calls are evaluated per user; the scores are arranged in descending order and identical scores are grouped into one class to define model calling grades, which optimizes the model selection process and helps users find better-performing models more quickly;

By mining the association relation among the service condition data, the model performance data and the model calling grade of the model database, the method evaluates the historical call performance of each model more comprehensively and systematically, provides users with a more accurate reference for model selection, supplies more information to support model management, and helps optimize the selection and update strategies for models in the model library.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention. In the drawings:

FIG. 1 is a schematic diagram of a big data based model database management system module according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the protection scope of the invention.

Referring to fig. 1, the present invention provides the following technical solutions:

The model feature extraction module comprises a feature data extraction unit, a model feature data table creation unit and a database update monitoring unit;

The history monitoring data processing module comprises a user request data processing unit, a model performance data matching unit and a user model calling score calculating unit;

The association relation analysis module comprises a data processing unit, a visual display unit, an association relation analysis unit and a history call score calculation unit;

The real-time monitoring data processing module comprises a real-time monitoring data acquisition unit, a model feature matching unit and a real-time calling score calculating unit;

Step S100 includes:

In this embodiment, the feature data of all models are extracted from the model database; the feature data are assumed to include each model's input and output specifications, model structure features, model application scenarios and the like, all of which can be extracted directly from the model database;

Similarity calculation is then performed on all extracted model feature vectors in turn. Various methods can be used, the common ones being cosine similarity, Euclidean distance and Manhattan distance; the choice of method is left to the implementer.
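The three measures named above can be written as follows. Note that cosine similarity grows with similarity while the two distances shrink, so a conversion such as 1 / (1 + d), one common choice assumed here rather than taken from the source, is needed if distances are to be compared like similarities.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Higher means more similar; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences; lower means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance into (0, 1], with 1.0 meaning identical vectors."""
    return 1.0 / (1.0 + distance)


print(cosine_similarity([1, 0], [0, 1]))   # 0.0
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(manhattan_distance([0, 0], [3, 4]))  # 7
print(distance_to_similarity(5.0))         # 0.16666666666666666
```

Cosine similarity ignores vector magnitude, whereas the two distances are sensitive to feature scale, so the distances usually call for normalized features.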

Step S200 includes:

[Content of this step not reproduced in the source text.]

Step S300 includes:

[Content of this step not reproduced in the source text.]

In the present embodiment, the analysis of the association relationship is as follows:

By observing the shapes, relative positions and trends of the data points on the radar chart, several preliminary associations can be identified:

If the line segments between two sets of data points are parallel or follow similar trends, the two indicators may be positively correlated; for example, the model response time increases as the number of users increases;

If the line segments between two sets of data points cross or follow opposite trends, the two indicators may be negatively correlated; for example, the model accuracy decreases as the number of users increases;

If a data point is at a location where the angle between the connecting lines is large, this indicates that the correlation between the index and the other index is weak.
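The embodiment reads these associations visually from the radar chart. As a numerical stand-in, assumed here and not part of the source, the same three labels can be approximated with the Pearson correlation coefficient between two indicator series. The 0.3 cut-off for "weak" and the sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def describe(r: float, weak_below: float = 0.3) -> str:
    """Label a coefficient like the radar-chart rules; the 0.3 cut-off is hypothetical."""
    if abs(r) < weak_below:
        return "weak"
    return "positive" if r > 0 else "negative"


users = [10, 20, 30, 40]                # M per time point (made-up values)
response_time = [0.2, 0.4, 0.6, 0.8]    # t0 per time point
accuracy = [0.95, 0.93, 0.90, 0.88]     # z0 per time point
print(describe(pearson(users, response_time)))  # positive
print(describe(pearson(users, accuracy)))       # negative
```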

Based on this analysis, the relation between the number of calls Mc of each model at each time point in the selected period T and the corresponding model response time t and model accuracy z is determined; in this embodiment it is assumed to be $Mc = a t + b z + c$, where a and b are coefficients and c is a constant term that absorbs other factors that may affect the number of model calls. The call limit value Mc_max of each model is obtained from this expression as the value at which the model response time t is at its maximum and the model accuracy z is at its minimum;
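The source does not say how a, b and c are estimated. The sketch below assumes an ordinary least-squares fit via the normal equations, then evaluates Mc_max at a maximum response time and minimum accuracy supplied by the caller (for example, service-level bounds; `t_max` and `z_min` are hypothetical inputs). It uses only the standard library; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_call_model(
    t: Sequence[float], z: Sequence[float], mc: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of Mc = a*t + b*z + c via the normal equations."""
    rows = [(ti, zi, 1.0) for ti, zi in zip(t, z)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, mc)) for i in range(3)]
    a, b, c = solve_3x3(xtx, xty)
    return a, b, c


def call_limit(coeffs: Tuple[float, float, float], t_max: float, z_min: float) -> float:
    """Mc_max: evaluate the fitted relation at the largest t and smallest z."""
    a, b, c = coeffs
    return a * t_max + b * z_min + c


# Synthetic observations generated from Mc = 2t + 3z + 1 (made-up numbers).
t_obs = [1.0, 2.0, 3.0, 4.0]
z_obs = [1.0, 0.0, 2.0, 1.0]
mc_obs = [6.0, 5.0, 13.0, 12.0]
coeffs = fit_call_model(t_obs, z_obs, mc_obs)
print([round(v, 6) for v in coeffs])           # [2.0, 3.0, 1.0]
print(round(call_limit(coeffs, 5.0, 0.5), 6))  # 12.5
```

The fit needs at least three time points whose (t, z) pairs are not collinear; otherwise the normal equations are singular.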

Step S400 includes:

S401, acquiring real-time monitoring data, which comprises the service condition data, user request data and model performance data of the model database for the most recent model call before the current time point, together with the latest user request data at the current time point; the latest user request data contains only model requirement information, and no model has yet been called for it;

In this embodiment, the model features to be matched are extracted from the latest user request data to form the model feature vector Vd to be matched; the latest user request data expresses the user's requirements on the model, such as input and output specifications, model structure features and model application scenarios. Similarity is calculated between Vd and the model feature vectors in the model feature data tables to determine the model class to be matched, and the corresponding model feature data table is looked up to obtain the model identifiers of the matching applicable models.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and does not limit it. Although the invention has been described in detail with reference to these embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A model database management method based on big data is characterized in that: the method comprises the following steps:

The step S300 includes:

S301, acquiring the historical monitoring data of the model database within the selected period T, dividing the service condition data and model performance data in it by time point within T to form an association analysis set A, and arranging the elements of A in chronological order; each element of A comprises, for one time point in T, the number of users M, the average model response time t0, the average model accuracy z0 and the average user model call score Q0;

S303, connecting, in turn, the points of the same time point on each pair of adjacent axes of the radar chart so that each time point forms a quadrilateral, and analyzing the positional relationship of the line segments between each pair of adjacent axes to obtain the association relation among the service condition data, the model performance data and the model calling grade of the model database; from this association relation, obtaining the relation between the number of calls Mc of each model at each time point in the selected period T and the corresponding model response time t and model accuracy z, expressed as Mc = f(t, z), where f(t, z) is a specific functional relationship; and obtaining the call limit value Mc_max of each model from this expression, Mc_max being the value at which the model response time t reaches its maximum and the model accuracy z reaches its minimum;

[Formula for the historical call score S not reproduced in the source text.]

recording historical call scores of each model in the model database according to the time sequence;

2. The big data based model database management method according to claim 1, wherein: the step S100 includes:

3. The big data based model database management method according to claim 1, wherein: the step S200 includes:

[Formula for the user model call score Q not reproduced in the source text.]

where α and β are weight coefficients; m_j is the number of requirement information items in the user request data of the j-th user number within the selected period T, j being the user number index, a positive integer from 1 to N; t_j is the model response time of the final called model of the j-th user number within the selected period T; z_jx is the model accuracy of the x-th model finally called by the j-th user number within the selected period T, x being the index of the finally called models, a positive integer from 1 to M; the user model call scores Q within the selected period T are arranged in descending order and identical scores Q are grouped into one class, each distinct user model call score Q representing one user model calling grade.

4. The big data based model database management method according to claim 1, wherein: the step S400 includes:

S401, acquiring real-time monitoring data, which comprises the service condition data, user request data and model performance data of the model database for the most recent model call before the current time point, together with the latest user request data at the current time point; the latest user request data contains only model requirement information;

5. A big data-based model database management system using the big data-based model database management method according to any one of claims 1-4, characterized in that the system comprises a model feature extraction module, a history monitoring data processing module, an association relation analysis module and a real-time monitoring data processing module;

the history monitoring data processing module processes the historical monitoring data of the model database, including user request data, model use condition data and model performance data, and calculates the user model calling score in a selected period T to obtain the user model calling grade;

the association relation analysis module analyzes the association relation among the service condition data, the model performance data and the model calling grade of the model database, obtains a call limit value for each model according to the association relation, and calculates the historical call score of each model in the model database according to the call limit value;

and the real-time monitoring data processing module processes the real-time monitoring data, analyzes the current model calling condition in combination with the model feature data table, calculates the real-time calling score, and recommends models.

6. The big data based model database management system of claim 5, wherein: the model feature extraction module comprises a feature data extraction unit, a model feature data table creation unit and a database update monitoring unit;

the feature data extraction unit is connected to the model database, acquires the names of all models, assigns a model identifier to each model, extracts the feature data of all models from the model database based on the model identifiers, associates each model identifier with its feature data, and converts the format of the feature data of each model identifier to form a model feature vector;
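As a minimal Python sketch of identifier assignment and format conversion, assume each model's raw feature data arrive as a dictionary of numeric fields. The `M0001` identifier scheme, the sorted field order, and filling missing fields with 0.0 are hypothetical choices, not taken from the claim.

```python
from typing import Dict, List, Tuple


def build_feature_vectors(
    models: Dict[str, Dict[str, float]],
) -> Tuple[Dict[str, str], Dict[str, List[float]]]:
    """Assign an identifier to each model name and convert its raw
    feature record into a fixed-order numeric vector (assumed format)."""
    # One shared, sorted field order keeps vector positions comparable.
    keys = sorted({k for feats in models.values() for k in feats})
    ids: Dict[str, str] = {}
    vectors: Dict[str, List[float]] = {}
    # Iterating names in sorted order makes identifiers deterministic.
    for i, name in enumerate(sorted(models), start=1):
        model_id = f"M{i:04d}"  # hypothetical identifier scheme
        ids[name] = model_id
        # Fields a model lacks are filled with 0.0 (an assumption).
        vectors[model_id] = [float(models[name].get(k, 0.0)) for k in keys]
    return ids, vectors
```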

the model feature data table creation unit performs similarity calculation on all extracted model feature vectors in turn and groups similar models into one class; it then traverses each model class, creates a model feature data table for each class, and records the model names and corresponding model feature vectors in that table;
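The claim does not fix the similarity measure or the grouping rule. The sketch below assumes cosine similarity and a greedy single pass with a hypothetical threshold of 0.9: each model joins the first class whose representative (its first member) is similar enough, otherwise it starts a new class, and each class becomes one table of model name to feature vector.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_class_tables(
    vectors: Dict[str, List[float]], threshold: float = 0.9
) -> List[Dict[str, List[float]]]:
    """Greedy grouping into class tables (name -> feature vector)."""
    tables: List[Dict[str, List[float]]] = []
    reps: List[List[float]] = []  # first member of each class
    for name, vec in vectors.items():
        for rep, table in zip(reps, tables):
            if cosine(vec, rep) >= threshold:
                table[name] = vec
                break
        else:
            # No existing class is similar enough: open a new one.
            reps.append(vec)
            tables.append({name: vec})
    return tables
```

A single pass is order-dependent; a clustering method such as hierarchical clustering would avoid that at higher cost.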

and the database update monitoring unit acquires the model database update log once every period and determines whether a model has been newly added to the model database; if so, it repeats the operations of the feature data extraction unit and the model feature data table creation unit to update the model feature vectors and the model feature data tables.
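One polling cycle of the update monitoring unit might look like the Python sketch below. It assumes the update log can be reduced to a list of model names and that `rebuild` (a hypothetical callback) re-runs feature extraction and table creation for the new models.

```python
from typing import Callable, Iterable, Set


def check_for_new_models(
    log_model_names: Iterable[str],
    known_models: Set[str],
    rebuild: Callable[[Set[str]], None],
) -> Set[str]:
    """Compare model names in the update log with already-processed
    models; if any are new, trigger a rebuild and mark them processed."""
    new_models = set(log_model_names) - known_models
    if new_models:
        rebuild(new_models)          # re-run extraction + table creation
        known_models |= new_models   # in-place update of the caller's set
    return new_models
```

In a deployment this would run once per monitoring period, for example inside a loop that calls `time.sleep(period_seconds)` between checks.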

7. The big data based model database management system of claim 5, wherein: the history monitoring data processing module comprises a user request data processing unit, a model performance data matching unit and a user model calling score calculating unit;

the user request data processing unit acquires the user request data within the selected period T from the historical monitoring data of the model database, partitions the user request data by user number, sorts them in chronological order, and counts the number of items of required information in the user request data corresponding to each user number in the selected period T;
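A minimal Python sketch of this unit follows, assuming each request record carries a user number, a timestamp and a list of required-information items, and that period T is the half-open interval `[start, end)`. The `Request` layout is hypothetical. The per-user count corresponds to m_j in claim 3.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:  # hypothetical record layout
    user: str
    timestamp: float
    required_info: List[str]


def count_required_info(
    requests: List[Request], start: float, end: float
) -> Dict[str, int]:
    """Keep requests in period T, group them by user number, order each
    group chronologically, and count required-information items per user."""
    by_user: Dict[str, List[Request]] = defaultdict(list)
    for r in requests:
        if start <= r.timestamp < end:
            by_user[r.user].append(r)
    counts: Dict[str, int] = {}
    for user, reqs in by_user.items():
        # Chronological order, as described; the count itself is order-free.
        reqs.sort(key=lambda r: r.timestamp)
        counts[user] = sum(len(r.required_info) for r in reqs)
    return counts
```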

the model performance data matching unit matches the model performance data of the finally called model by combining the finally called model information in the user request data with the model call records in the service condition data of the model database, and obtains the model response time and model accuracy;
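The matching step is essentially a join. The Python sketch below assumes request IDs link the three data sources: request data give the finally called model per request, call records are `(request_id, model_id, call_id)` tuples, and performance data map `call_id` to `(response_time, accuracy)`. All of these layouts are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def match_performance(
    final_model: Dict[str, str],
    call_records: List[Tuple[str, str, str]],
    performance: Dict[str, Tuple[float, float]],
) -> Dict[str, Optional[Tuple[float, float]]]:
    """Return (response_time, accuracy) of each request's finally called
    model, or None when no matching call or performance record exists."""
    # Index call records by (request, model) for O(1) lookup.
    index = {(req, model): call for req, model, call in call_records}
    result: Dict[str, Optional[Tuple[float, float]]] = {}
    for req, model in final_model.items():
        call = index.get((req, model))
        result[req] = performance.get(call) if call is not None else None
    return result
```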

8. The big data based model database management system of claim 5, wherein: the association relation analysis module comprises a data processing unit, a visual display unit, an association analysis unit and a history call score calculation unit;

the visual display unit displays the data in the normalized association analysis set on a radar chart and connects the points with the same sequence number on adjacent axes to form a quadrilateral;
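The geometry behind the radar chart can be sketched in Python. It assumes min-max normalization (the claim does not name the method) and one column of values per axis; with four axes, the points of one record on adjacent axes form a quadrilateral. Which quantities sit on the axes is not specified here.

```python
import math
from typing import List, Tuple


def min_max_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Scale each axis (column) to [0, 1]; constant columns map to 0."""
    out = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        out.append([(v - lo) / span if span else 0.0 for v in col])
    return out


def radar_polygons(columns: List[List[float]]) -> List[List[Tuple[float, float]]]:
    """For each record index i, return the polygon vertices joining its
    normalized values on adjacent, evenly spaced axes."""
    norm = min_max_normalize(columns)
    n_axes = len(norm)
    angles = [2 * math.pi * k / n_axes for k in range(n_axes)]
    n_records = len(norm[0])
    return [
        [(norm[k][i] * math.cos(angles[k]), norm[k][i] * math.sin(angles[k]))
         for k in range(n_axes)]
        for i in range(n_records)
    ]
```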

and the history call score calculation unit calculates and records the historical call score of each model in the model database according to that model's call limit value.

9. The big data based model database management system of claim 5, wherein: the real-time monitoring data processing module comprises a real-time monitoring data acquisition unit, a model feature matching unit and a real-time calling score calculating unit;

the real-time monitoring data acquisition unit acquires the service condition data, user request data and model performance data of the model database at the most recent model call before the current time point, together with the latest user request data at the current time point, wherein the latest user request data contain only model demand information;

the model feature matching unit extracts the model features to be matched from the latest user request data to form a model feature vector to be matched, calculates its similarity with the model feature vectors in the model feature data tables, takes the model category of the most similar model feature vector as the model category to be matched, and searches the corresponding model feature data table to obtain the model identifiers of the matched applicable models;
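A minimal Python sketch of the matching step, assuming cosine similarity (the claim does not name the measure) and class tables keyed by model identifier: the most similar stored vector across all tables selects the category, and that table's identifiers are returned as the matched applicable models.

```python
import math
from typing import Dict, List, Optional

Table = Dict[str, List[float]]  # model identifier -> feature vector


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_applicable_models(query: List[float], tables: List[Table]) -> List[str]:
    """Return every model identifier in the class table that contains the
    stored vector most similar to the query vector."""
    best_sim, best_table = -math.inf, None  # type: float, Optional[Table]
    for table in tables:
        for vec in table.values():
            sim = cosine(query, vec)
            if sim > best_sim:
                best_sim, best_table = sim, table
    return list(best_table) if best_table is not None else []
```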

the real-time calling score calculating unit calculates the real-time calling score of every matched applicable model based on the service condition data, contained in the real-time monitoring data, of the model database at the most recent model call before the current time point; sorts the model identifiers of all matched applicable models in descending order of real-time calling score; and outputs the corresponding recommended model identifiers for each item of latest user request information at the current time point.
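The final ranking step can be sketched in Python as follows. It takes the real-time calling scores as already computed, since the scoring formula is not given in this claim; the request-ID-to-candidates mapping is a hypothetical input layout.

```python
from typing import Dict, List


def recommend(
    latest_requests: Dict[str, List[str]],
    realtime_scores: Dict[str, float],
) -> Dict[str, List[str]]:
    """For each latest request (request_id -> matched model identifiers),
    order its models by real-time calling score, highest first."""
    return {
        req: sorted(models, key=lambda m: realtime_scores[m], reverse=True)
        for req, models in latest_requests.items()
    }
```

Python's sort is stable, so models with equal scores keep their matched order.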