CN117788172A - Data asset assessment method, device, equipment and medium based on large model


Info

Publication number
CN117788172A
CN117788172A (Application CN202311871038.1A)
Authority
CN
China
Prior art keywords: task, information, data, processed, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311871038.1A
Other languages
Chinese (zh)
Inventor
张兴慧 (Zhang Xinghui)
刘小勃 (Liu Xiaobo)
Current Assignee
Beijing Xinliu Data Technology Co ltd
Original Assignee
Beijing Xinliu Data Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xinliu Data Technology Co ltd filed Critical Beijing Xinliu Data Technology Co ltd
Priority to CN202311871038.1A
Publication of CN117788172A


Abstract

The invention discloses a data asset assessment method, device, equipment and medium based on a large model. The method comprises the following steps: determining question information to be used based on evaluation question information comprising a data identifier; determining at least one task to be processed corresponding to the question information to be used, and task description information corresponding to each task to be processed; determining a program to be executed corresponding to each task to be processed based on the task description information; and sequentially performing task processing according to the execution order of the programs to be executed, to obtain an evaluation result for the data to be evaluated corresponding to the data identifier. The method solves the prior-art problems of high cost and poor effect that arise when data assets are analyzed manually or with general-purpose software; it improves the automation, accuracy and efficiency of data asset analysis while reducing its cost, improves the analysis effect, and thereby meets users' data asset analysis requirements.

Description

Data asset assessment method, device, equipment and medium based on large model
Technical Field
The present invention relates to the field of computer processing technologies, and in particular, to a data asset assessment method, apparatus, device, and medium based on a large model.
Background
With the development of big data and artificial intelligence, the need to analyze data assets is increasing. In one existing data asset analysis method, a user manually inspects and analyzes data assets, computing indexes such as data precision, record repetition rate and record filling rate, in order to understand the quality, cost and potential value of the data assets.
Such analysis depends heavily on the expertise and experience of the user and incurs substantial analysis cost. Alternatively, specialized data analysis software, such as Excel or a database management system, is used for data asset analysis. That approach requires the user to write and run custom scripts or queries to execute analysis tasks; whenever a new analysis requirement arises, the user must write new scripts or rewrite existing ones, so it likewise suffers from high cost and poor analysis effect.
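The manual indexes mentioned above (record repetition rate, record filling rate) can be computed mechanically; the following pandas sketch is an illustrative assumption about how such metrics might be defined, not part of the claimed method:

```python
import pandas as pd

def basic_quality_metrics(df: pd.DataFrame) -> dict:
    """Hypothetical helper computing the simple data-quality indicators
    described in the background section (illustrative definitions only)."""
    total = len(df)
    # Record repetition rate: share of rows that duplicate an earlier row.
    repetition_rate = df.duplicated().sum() / total if total else 0.0
    # Record filling rate: share of non-null cells across the whole table.
    filling_rate = df.notna().to_numpy().mean() if total else 0.0
    return {"repetition_rate": repetition_rate, "filling_rate": filling_rate}

# Toy data asset: the last two rows are exact duplicates of each other.
sample = pd.DataFrame({"id": [1, 2, 2], "value": [10.0, 20.0, 20.0]})
print(basic_quality_metrics(sample))
```

Even this small example shows why manual computation of such indexes scales poorly as the number of assets and indexes grows.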
Disclosure of Invention
The invention provides a data asset assessment method, device, equipment and medium based on a large model, so as to improve the accuracy and convenience of data asset analysis while reducing its cost, improve the analysis effect, and meet users' data asset analysis requirements.
According to an aspect of the present invention, there is provided a data asset assessment method based on a large model, the method comprising:
determining question information to be used based on evaluation question information comprising a data identifier;
determining at least one task to be processed corresponding to the question information to be used, and task description information corresponding to each task to be processed;
determining a program to be executed corresponding to each task to be processed based on the task description information; and
sequentially performing task processing according to the execution order of the programs to be executed, to obtain an evaluation result for the data to be evaluated corresponding to the data identifier.
According to another aspect of the present invention, there is provided a large model-based data asset assessment apparatus, the apparatus comprising:
a question-information-to-be-used determining module, used for determining question information to be used based on evaluation question information comprising a data identifier;
a task description information determining module, used for determining at least one task to be processed corresponding to the question information to be used, and task description information corresponding to each task to be processed;
a program-to-be-executed determining module, used for determining a program to be executed corresponding to each task to be processed based on the task description information; and
an evaluation result determining module, used for sequentially performing task processing according to the execution order of the programs to be executed, to obtain an evaluation result for the data to be evaluated corresponding to the data identifier.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the large model-based data asset assessment method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a large model-based data asset assessment method according to any of the embodiments of the present invention when executed.
According to the technical scheme of the embodiments, question information to be used is determined based on evaluation question information comprising a data identifier; at least one task to be processed corresponding to the question information to be used, and task description information corresponding to each task to be processed, are determined; a program to be executed corresponding to each task to be processed is determined based on the task description information; and task processing is performed sequentially according to the execution order of the programs to be executed, to obtain an evaluation result for the data to be evaluated corresponding to the data identifier. This solves the prior-art problems of high analysis cost and poor effect that arise when data assets are analyzed manually or by software. In an AI question-answering mode, evaluation question information input by a user is received and processed to obtain the question information to be used, which improves the accuracy and effectiveness of the evaluation processing. At least one task to be processed that matches the question information to be used is then selected, the task description information of each task to be processed is determined, and the program to be executed for each task is determined from that description information; task processing is performed sequentially according to the execution order of the programs to be executed to obtain the evaluation result of the data to be evaluated corresponding to the data identifier. In this way, the automation, accuracy and efficiency of data asset analysis are improved while its cost is reduced, the analysis effect is improved, and users' data asset analysis requirements are met.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for large model-based data asset assessment, provided in accordance with a first embodiment of the present invention;
FIG. 2 is a schematic diagram showing an evaluation result according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a data asset assessment device based on a large model according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of an electronic device implementing a large model-based data asset assessment method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data asset evaluation method based on a large model according to an embodiment of the present invention. The method may be performed by a large-model-based data asset evaluation device, which may be configured in a computing device. As shown in Fig. 1, the method includes:
S110, determining the questioning information to be used based on the assessment questioning information comprising the data identification.
The data identifier may be used to uniquely identify the data to be evaluated. The data to be evaluated may refer to a data asset that needs to be evaluated, which may be a physically or electronically recorded data asset; for example, it may be document data, electronic data, or the like. The evaluation question information includes at least one of text, image, voice and video.
In this embodiment, the user may input, in an edit box, question information corresponding to the data to be evaluated. For example, the input may be text entered during an AI dialogue, such as "please load data set 001, analyze its data precision", or it may be a piece of speech. When such question information is received from the user, it can be regarded as the evaluation question information; semantic analysis can then be performed on it to identify the evaluation intention, and the evaluation question information can be rewritten into text that expresses that intention, which serves as the question information to be used. By combining the user's evaluation intention, the technical scheme of this embodiment expresses the evaluation question more clearly and professionally, improving the accuracy and effectiveness of subsequent evaluation processing.
Illustratively, suppose the evaluation user provides the evaluation question information "please load data set 001, analyze its data precision, and display the data precision score with a histogram". The user's intention is recognized as: perform precision analysis on the data in data set 001 and display the analysis result as a histogram. The evaluation question information is then rewritten based on this intention, yielding the question information to be used: "Use data set number 001, perform data precision analysis, and draw the data precision score as a histogram".
In this embodiment, determining the question information to be used based on the evaluation question information comprising the data identifier includes: acquiring the evaluation question information comprising the data identifier; and determining the question information to be used according to a predetermined first prompt word template and the evaluation question information.
The first prompt word template may be a prompt string used to guide the model to generate question text conforming to the evaluation intention. The first prompt word template includes at least one phrase and an evaluation convention corresponding to the at least one phrase. For example, the evaluation convention for "data precision" is associated with data quality assessment.
Specifically, after the evaluation question information including the data identifier is obtained, the vocabulary in it can be understood using natural language analysis techniques, and evaluation key operations and the evaluation target can be recognized from the meaning of each word. Illustratively, taking the evaluation question information "please load data set 001, analyze its data precision, and show the data precision score with a histogram" as an example, the meaning of each word and phrase is analyzed: "load" indicates that data needs to be acquired, "data set 001" indicates a particular data identifier, "analyze" indicates that data analysis is required, "data precision" indicates the particular analysis task, and "histogram" indicates a particular visualization mode. The evaluation key operations are thus recognized as "load" and "analyze", and the evaluation target is recognized as analyzing the data precision of data set 001. Furthermore, the first prompt word template can be used to rewrite the evaluation question information according to the recognized key operations and evaluation target, generating the question information to be used and improving the clarity, accuracy and readability of the question. For example, prompt words in the evaluation question information can be replaced or modified according to the prompt words in the first prompt word template to generate new question information to be used; or prompt words can be extracted from the evaluation question information and filled into the first prompt word template to generate the question information to be used.
According to the technical scheme of this embodiment, common evaluation phrases and conventions are arranged in the first prompt word template, so that the template can guide the model to better understand and rewrite the evaluation user's question information, making it more specific and clearer, ensuring that subsequent evaluation processing steps are consistent with the instruction, and improving the accuracy and effectiveness of data asset evaluation.
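The template-based rewriting described above can be sketched as a simple slot-filling routine; the template text, slot names and function below are hypothetical illustrations under the "fill the prompt word into the first prompt word template" variant, not the claimed implementation:

```python
from typing import Optional

# Hypothetical first prompt word template; the slot names are assumptions.
FIRST_PROMPT_TEMPLATE = (
    "Use the data set number {dataset_id}, perform {analysis_task}"
    "{visualization_clause}."
)

def rewrite_question(dataset_id: str, analysis_task: str,
                     chart_type: Optional[str] = None) -> str:
    """Rewrite extracted prompt words into question information to be used."""
    clause = f", and draw the result as a {chart_type}" if chart_type else ""
    return FIRST_PROMPT_TEMPLATE.format(
        dataset_id=dataset_id,
        analysis_task=analysis_task,
        visualization_clause=clause,
    )

print(rewrite_question("001", "data precision analysis", "histogram"))
```

In practice the slot values would come from the semantic analysis step (key operation and evaluation target recognition) rather than being passed in directly.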
S120, determining at least one task to be processed corresponding to the question information to be used and task description information corresponding to each task to be processed.
In this embodiment, task decomposition may be performed on the question information to be used to obtain a task flow to be processed for executing the data evaluation, where the task flow includes at least one task to be processed. The tasks to be processed may include data acquisition, data query, data preprocessing, data analysis, visual drawing, and the like, and dependency relationships may exist between the tasks. Further, corresponding task description information can be determined according to the task attributes of each task to be processed. For example, the task description information may include a task name, a task type, the data identifier to be used, a data storage location, a task operation, and the like.
In this embodiment, determining at least one task to be processed corresponding to the question information to be used, and the task description information corresponding to each task to be processed, includes: retrieving task prompt information; and, according to the task prompt information, processing the question information to be used to determine at least one task to be processed, and determining the task description information corresponding to each task to be processed.
The task prompt information comprises at least one of description guidance information, an information generation format, a task set to be selected of at least one task type, task description word conventions, and description information examples. The description guidance information refers to information for guiding the generation of the description information; it may contain certain flow or sequence information, and by providing an orderly, structured guiding prompt it helps the model better understand the task requirements and generate the corresponding task description information according to the specified flow. The information generation format refers to the format of the generated description information. The task set to be selected comprises at least one task to be selected, and the task type is related to the operation of the task. The task description may be used to describe the operation performed by the task. The task description word conventions refer to the optional vocabulary agreed upon when generating task descriptions; for example, when multiple drawing indexes are involved in generating charts of time-series data, words such as "sequentially acquire" and "cyclically acquire" may be adopted. A description information example may refer to an example of task description information. The task description information includes, but is not limited to, a task number, a task type and a task description. The task number may refer to the number of the task to be processed; for example, the first task to be processed may be numbered task1, and the second task2.
Specifically, the pre-configured task prompt information may be invoked, and the question information to be used may be processed according to the description guidance information in the task prompt information. For example, natural language processing techniques (such as text classification and recognition) may be used to analyze the question information to be used and extract the relevant information of the evaluation task, which may include, but is not limited to, key information such as the target, input and output of the evaluation task. According to this information, a task to be selected that matches the evaluation task is chosen from the task set to be selected in the task prompt information as a task to be processed. For example, the matching degree between each step of the evaluation task and each task to be selected is calculated, and the task to be selected with the highest matching degree is taken as the task to be processed for that step. For each task to be processed, the corresponding task description information can then be determined according to the task description word conventions, the description information examples and the information generation format in the task prompt information, combined with the task's own attributes.
For example, the system may send the question information to be used as an instruction to a large model; the large model invokes a pre-configured Task prompt (i.e., the task prompt information) and generates the task description information of each task to be processed according to the instruction. For example, the Task prompt may be: "Please select the most suitable tasks (tasks to be processed) and generate their task description information according to the given instruction. The generation format of the description information is: task = {'task_type': 'task_instruction'}. There are four classes of selectable tasks (i.e., tasks to be selected): data_selection_task, for tasks such as selecting a data source, querying data and selecting data columns; data_processing_task, for tasks related to data extraction and processing such as data quality detection; data_analysis_task, for tasks such as analyzing data asset cost and future benefits; and data_visualization_task, for drawing one or more line graphs or trend graphs, or outputting statistics. When generating the task_instruction (i.e., the task description), multiple indexes may be acquired sequentially; time-series data are generally drawn as line graphs and cross-sectional data as bar graphs; for a prediction task, only a table is printed (the task description word conventions). task1 = {%s}, task2 = {%s}, ... {few-shot description information examples, which help the large model generate task description information more accurately and efficiently}. Instruction: perform data precision analysis using data set number 001." The large model may be a large language model such as GPT-3 (Generative Pre-trained Transformer 3), GPT-4, BERT (Bidirectional Encoder Representations from Transformers) or RoBERTa.
For example, the large model may return task description information that describes the task plan for the entire assessment task:
'task1 = {"data_selection_task": "Select the data source; determine the data storage location for data set number 001."},
task2 = {"data_selection_task": "Query data; query the relevant data from the database."},
task3 = {"data_processing_task": "Select data columns; select the data columns whose precision needs to be analyzed."},
task4 = {"data_processing_task": "Perform precision calculation according to the instruction."},
task5 = {"data_visualization_task": "Display the precision analysis result as a histogram."}'
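A plan returned in the task = {"task_type": "task_instruction"} format can be parsed into structured tasks before code generation; the following regex-based parser is an illustrative sketch under that format assumption, not the patented parser:

```python
import re

def parse_task_plan(plan: str) -> list:
    """Parse 'taskN = {"task_type": "instruction"}' entries into dicts.

    Assumes the generation format described in the task prompt information;
    the regular expression is an illustrative simplification.
    """
    tasks = []
    pattern = re.compile(r'task\s*(\d+)\s*=\s*\{\s*"([^"]+)"\s*:\s*"([^"]+)"')
    for number, task_type, instruction in pattern.findall(plan):
        tasks.append({
            "number": int(number),           # task number, e.g. 1 for task1
            "task_type": task_type,          # e.g. data_selection_task
            "instruction": instruction.strip(),
        })
    return tasks

plan = '''task1 = {"data_selection_task": "Select data source."},
task2 = {"data_processing_task": "Select data columns."}'''
print(parse_task_plan(plan))
```

The structured list preserves the task numbering, so downstream code generation can honor the dependency order between tasks.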
S130, determining a program to be executed corresponding to each task to be processed based on the task description information.
The program to be executed may refer to program code for implementing tasks.
In this embodiment, a code generation technique may be used to generate a program to be executed corresponding to each task to be processed according to task description information corresponding to each task to be processed, so as to perform task processing based on the program to be executed.
In order to improve the accuracy of task processing and the normalization of code generation, response prompt information can be invoked in the process of determining the program to be executed corresponding to each task to be processed based on the task description information; the program to be executed corresponding to each task to be processed is then determined based on the response prompt information and the task description information.
The response prompt information comprises task guidance information, correspondence examples between description information and interfaces, and interface configuration information. The task guidance information may refer to information for guiding the generation of task code; it can contain certain flow or sequence information, and by providing an orderly, structured guiding prompt it helps the model better understand the task requirements and generate the corresponding output according to the specified flow. The correspondence examples may be examples containing the correspondence between task description information and interfaces; they can help the model better understand that correspondence and thereby improve the accuracy of interface determination. The interface configuration information includes the interface name, interface annotation, input parameters and output parameters corresponding to at least one interface to be selected in the interface tool library, and may not include the function body of the interface. An interface to be selected may refer to a pre-packaged operation used to implement a certain function; an interface may include a function, a method, an attribute, and so on.
It should be noted that, the manner of determining the program to be executed corresponding to each task to be processed is the same, and the description may be given by taking the program to be executed for determining any task to be processed as an example.
In this embodiment, the pre-configured response prompt information may be invoked; combining the interface configuration information in the response prompt information with the task description in the task description information, interface information capable of executing the task description is selected from the interface configuration information, and the interface name is embedded into the code to be executed of the task to be processed. The input parameters of the task to be processed can be determined from the output result of the preceding task to be processed, and the code to be executed can be updated based on these input parameters. Likewise, the output result of the task to be processed can serve as input to the next task to be processed: the input position of that output result is determined, and the code to be executed is updated based on this input position, yielding the updated program to be executed corresponding to the task to be processed.
Optionally, determining the program to be executed corresponding to each task to be processed based on the response prompt information and the task description information includes: determining function parameters corresponding to the corresponding task to be processed based on the task description information and the interface configuration information and/or the correspondence examples in the response prompt information; and determining the program to be executed corresponding to the task to be processed based on the task description information, the task guidance information in the response prompt information, and the function parameters corresponding to the corresponding task to be processed.
Specifically, the task guidance information in the response prompt information can provide flow guidance: a function matching the task description information of the task to be processed is selected according to at least one of the interface configuration information and the correspondence examples in the response prompt information, and corresponding parameters are generated for the function. Further, under continued guidance from the task guidance information, corresponding code is generated according to the task description information, and the function parameters are added into the code to produce the program to be executed corresponding to the task to be processed.
For example, the large model may invoke pre-configured response prompt information, which may include a flow sample (the task guidance information): "Please complete the task descriptions step by step using the given functions. Each step may only select one or more functions without dependencies from the following function library, and generate corresponding parameters for the functions; the parameter format must strictly follow the function descriptions. Generate the corresponding multiple results_i (output results) in parallel; functions in later steps use the results output by earlier functions as parameter inputs; end with ##." Function library: {the interface configuration information, including only the annotation, input parameters and output parameters of each interface, not the function body}; {few-shot correspondence examples}. Instruction (the task descriptions): select the data source and determine the data storage location for data set number 001; query data, querying the relevant data from the database; select the data columns whose precision needs to be analyzed; calculate precision according to the instruction; and display the precision analysis result as a histogram.
For example, the large model may return the program to be executed, so that the evaluation task can be implemented according to it:
step1={
"arg1":["001"],
"function1":"select_data_source_and_location",
"output1":"result1",
"description1":"Selects the data source and determines the data storage location for a given dataset code."
},step2={
"arg1":["result1"],
"function1":"load_data_to_dataframe",
"output1":"result2",
"description1":"Loads data from a specified data storage location into a Pandas DataFrame."
},step3={
"arg1":["result2"],
"function1":"select_columns",
"output1":"result3",
"description1":"Selects specific columns from a DataFrame."
},step4={
"arg1":["result3"],
"function1":"calculate_column_precision",
"output1":"result4",
"description1":"Calculate the precision of each column in a DataFrame."
},step5={
"arg1":["result4","bar","accuracy score"],
"function1":"plot_dataframe_and_table",
"output1":"result5",
"description1":"Plot a DataFrame as a table and a specified type of graph."
}
and S140, sequentially performing task processing according to the execution sequence of each program to be executed to obtain an evaluation result of the data to be evaluated corresponding to the data identifier.
The evaluation result may be used to characterize an attribute of the data to be evaluated; for example, the evaluation result may be data value, data reliability, data integrity, a visual analysis result, or the like.
In this embodiment, each program to be executed corresponds to a task to be processed, and the execution sequence of the programs follows the dependency relationships between those tasks: the output produced by the program of a preceding task must be handed to the program of the current task before the latter can run. When performing task processing in this order, programs that are independent of one another may be placed into a thread pool for parallel execution. Using the reflection mechanism of the Python language, the tasks other than the visualization task are executed first; after the current program finishes, the next program that depends on it is placed into the thread pool, and by cycling task code through the pool in this way the evaluation result is finally output. Meanwhile, the task processing can be monitored for potential abnormal conditions, and if any exist, early-warning prompt information can be generated as feedback.
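The step-by-step execution with name-based (reflection) dispatch and result forwarding can be sketched as follows. The function names, the stand-in data, and the `run_step` helper are assumptions for illustration; only the pattern of resolving a function by its string name and threading `result_i` values between steps reflects the mechanism described above.

```python
# Minimal sketch (assumed names) of executing the model-generated step
# plan: each step names a function as a string, Python resolves it at
# runtime, and earlier outputs feed later steps.

def load_data(location: str) -> dict:
    # Stand-in loader for illustration; a real system would read from
    # the storage location resolved in an earlier step.
    return {"score": [0.9, 0.8, 1.0]}

def calculate_column_precision(table: dict) -> dict:
    # Toy "precision" metric: the mean of each column.
    return {col: sum(vals) / len(vals) for col, vals in table.items()}

FUNCTION_TABLE = {f.__name__: f for f in
                  (load_data, calculate_column_precision)}

def run_step(step: dict, results: dict) -> None:
    """Resolve step['function1'] by name (reflection) and call it with
    arguments taken from earlier results where available."""
    fn = FUNCTION_TABLE[step["function1"]]
    args = [results.get(a, a) for a in step["arg1"]]  # resolve result_i refs
    results[step["output1"]] = fn(*args)

results = {"result1": "/data/001"}
run_step({"arg1": ["result1"], "function1": "load_data",
          "output1": "result2"}, results)
run_step({"arg1": ["result2"], "function1": "calculate_column_precision",
          "output1": "result3"}, results)
```

Independent steps in such a plan could be dispatched to a thread pool instead of being called in sequence, which is the arrangement the paragraph above describes.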
For example, the tasks may be executed in a multithreaded manner using ThreadPoolExecutor, the thread pool executor in Python. First, the system submits the code of each task to the thread pool using executor.submit.
The executor.submit call takes three arguments: the function to be executed (such as a parameter_and_exe function), which represents the work performed by each parallel step; the input parameters required by the task (such as a call_subject argument), which differ for each parallel step; and a result buffer (such as result_buffer) used to store the outputs. The system then obtains an iterator using concurrent.futures.as_completed(futures), which yields each future as its task completes, and the results of the tasks to be processed are collected and handled in a loop. In the loop, the system first attempts to obtain the result of a task using future.result(). If the task executed successfully, the result is stored in the result variable. If an exception occurred while the task was executing, the system captures and handles it with a try-except block, printing the exception information to the console. The number of the current parallel step may also be printed for successfully executed tasks in order to track execution progress.
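The submit-and-collect loop described above can be sketched as follows. The task function, its inputs, and the buffer layout are illustrative assumptions; only `ThreadPoolExecutor`, `submit`, `as_completed`, and `Future.result` are the real `concurrent.futures` API.

```python
# Hedged sketch of the parallel task collection loop: submit each
# parallel step, then gather results as they complete, with a
# try-except block catching per-task failures.

from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(step_no: int, value: int) -> tuple:
    """Stand-in for one parallel step of the evaluation plan."""
    return step_no, value * value

result_buffer = {}
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(run_task, i, v)
               for i, v in enumerate([2, 3, 5])]
    for future in as_completed(futures):
        try:
            step_no, result = future.result()   # raises if the task failed
            result_buffer[step_no] = result
            print(f"step {step_no} finished")   # track execution progress
        except Exception as exc:                # capture and report failures
            print(f"task raised: {exc}")
```

Using `as_completed` rather than iterating the futures in submission order lets faster steps report first, which matches the progress-tracking behavior described above.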
In this embodiment, sequentially performing task processing according to the execution sequence of each program to be executed to obtain the evaluation result of the data to be evaluated corresponding to the data identifier includes: if a program among the programs to be executed contains a function parameter, acquiring the target interface corresponding to that function parameter from a pre-constructed interface tool library, so that task processing is performed based on the functions in the target interface.
In this embodiment, while running the programs to be executed, if function parameters are present in a program, the target interface corresponding to those function parameters may be called from the interface tool library, and each function in the target interface is executed in parallel to perform task processing, thereby improving task processing efficiency.
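The lookup of a target interface by function parameter can be sketched as a registry keyed by function name. The tool function, the registry name, and the `dispatch` helper are assumptions for illustration, not the patent's actual interface tool library.

```python
# Illustrative sketch (assumed names): a function parameter appearing in
# a program to be executed is mapped to its target interface in a
# pre-built interface tool library, then invoked.

def select_columns(rows: list, columns: list) -> list:
    """Toy tool: keep only the named columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]

INTERFACE_TOOL_LIBRARY = {"select_columns": select_columns}

def dispatch(function_name: str, *args):
    """Fetch the target interface for a function parameter and run it."""
    try:
        tool = INTERFACE_TOOL_LIBRARY[function_name]
    except KeyError:
        raise ValueError(f"no interface registered for {function_name!r}")
    return tool(*args)

rows = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.8}]
out = dispatch("select_columns", rows, ["score"])
```

A missing registration surfaces as an explicit error rather than a silent failure, which is one reasonable way to feed the early-warning mechanism mentioned earlier.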
In this embodiment, the evaluation result may also be displayed visually; for example, a graph corresponding to the evaluation result may be generated by a visualization tool to display the precision analysis result of the data, as shown in fig. 2. A specific implementation may be as follows:
First, the data visualization task among the previously defined tasks to be processed can be executed using the reflection mechanism of the Python language, i.e., calling the plot_dataframe_and_table method via the parameter_and_exe method and storing the result in result_buffer_viz. Specifically, the parameter_and_exe method may be invoked with call_subject and result_buffer_viz passed as parameters.
Next, the result in result_buffer_viz is extracted as final_output, which may include a graphic (plt.Axes), a table (pd.DataFrame), and a text description. Here, plt.Axes is a class in the Matplotlib library for creating and managing graphics; a plt.Axes object represents the coordinate axes of a subplot in a figure. The axes form the drawing area of the figure, contain the coordinate system of the data, and are used for drawing charts and the graphical elements of the data.
Then, the extracted results are processed separately. For each element in final_output, the following is performed: if the element is a plt.Axes object, it is presented as a graphic, which includes displaying the graphic on screen for the user to view; if the element is a pd.DataFrame object, it is presented as a table, which typically contains detailed information about the data; if the element is neither a plt.Axes nor a pd.DataFrame object, it is presented as a text description, which may include data analysis results or other relevant information.
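The per-element rendering decision can be sketched with a simple type dispatch. To keep the sketch self-contained it checks the class name and uses stand-in classes instead of importing Matplotlib and pandas; in a real system the checks would be `isinstance(elem, plt.Axes)` and `isinstance(elem, pd.DataFrame)`, and the render targets would be actual display calls.

```python
# Sketch of dispatching each element of final_output to a graphic,
# table, or text presentation. Stand-in classes replace the real
# Matplotlib/pandas types so the example runs anywhere.

def render_element(elem) -> str:
    kind = type(elem).__name__
    if kind == "Axes":        # Matplotlib graphic -> show on screen
        return "graphic"
    if kind == "DataFrame":   # pandas table -> render as a table
        return "table"
    return f"text: {elem}"    # anything else -> textual description

class Axes:        # stand-in for matplotlib.axes.Axes
    pass

class DataFrame:   # stand-in for pandas.DataFrame
    pass

final_output = [Axes(), DataFrame(), "precision = 0.93"]
rendered = [render_element(e) for e in final_output]
```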
The benefit of such an arrangement is that presenting the evaluation results in multiple forms helps the evaluating user better understand and assess the data and analysis results of the evaluation process.
In this embodiment, a task summary corresponding to the evaluation question information may also be generated from the evaluation process data, so that the evaluating user can check, based on the task summary, whether the execution of the task matches their intent.
For example, a task summary may be generated from the evaluation process data, including the initial instruction set by the evaluating user, the task decomposition, the task plan, a summary of the task flow, and the execution result of each step. The task summary may be presented in natural language to describe the logical flow of the overall evaluation task. Further, the task summary can be shown on a display interface, for example presented to the evaluating user as text, so that the user can view it on the interface, understand the overall execution of the evaluation task, conveniently review the summary, verify that the large model's task plan is consistent with the original plan, and confirm that it meets the user's intent and requirements. This helps the evaluating user complete the data asset assessment task more quickly and accurately, further improving assessment accuracy and efficiency.
According to the technical scheme provided by this embodiment, the steps of understanding the question instruction, task decomposition, task plan generation, task execution and result visualization are integrated, so that data asset assessment tasks can be executed quickly and efficiently. Meanwhile, various types of data asset tasks are supported, including data accuracy analysis, record filling-rate analysis, cost composition analysis, sales revenue prediction, and the like. The evaluating user can issue different instructions as needed, and the system executes the corresponding tasks according to those instructions and returns assessment results, which improves the convenience and automation of assessment and meets the user's requirements for assessing different data assets.
In this embodiment, the interface tool library may also be constructed in advance. The implementation manner of constructing the interface tool library can be as follows: generating at least one operation instruction based on at least one predetermined seed instruction; analyzing the at least one operation instruction and determining at least one interface tool to be configured; configuring associated information corresponding to each interface tool to be configured; generating a function body corresponding to each interface tool to be configured according to each associated information so as to obtain at least one interface to be selected; and constructing and obtaining an interface tool library based on the at least one interface to be selected.
The operation instruction comprises at least one of a data acquisition attribute, a data processing attribute and a visualization attribute.
In this embodiment, some seed instructions, such as "load data set 001, analyze its record filling rate", may be preconfigured, and more operation instructions about the data set are generated through a large model, resulting in richer instructions. The operation instructions may include instructions for data acquisition, data processing, analysis, and visualization. Further, these operational instructions may be analyzed using a large model to determine which interface tools need to be used to fulfill the requests for these operational instructions. The associated information of interface function names, input parameters, output parameters, interface comments and the like of the interface tool to be configured can be configured in a natural language mode. Further, the ability of the large model to generate code may be utilized to generate specific code for implementing the functions of each interface tool to be configured as a function body. The function bodies can be respectively packaged to obtain at least one interface to be selected, the interfaces to be selected form an interface tool library, so that target interfaces corresponding to the function parameters are obtained from the interface tool library, and task processing is performed based on functions in the target interfaces.
For example, the operation instructions include: "load dataset 001, analyze its data accuracy"; "load dataset 001, check repetition rate of data records"; "load dataset 001, calculate the miss rate of the data records"; "load dataset 001, detect outliers in the data"; "load dataset 001, forecast future sales revenue for the dataset"; "load dataset 001, analyze the cost composition of the dataset"; "load datasets 001 and 002, compare the historical sales records of dataset 001 and dataset 002"; "load dataset 001, visualize the data in the dataset as a table"; and the like. The large model generalizes and summarizes these instructions, determining the interface tools to be configured and configuring the interface association information. For example, the interface tools to be configured include: a data acquisition interface for extracting data from a data source, whose parameters include the data source name, date range, and the like; a data processing interface for performing calculation, screening, cleaning and other processing on the data, whose parameters include the data processing operations to be executed; a data display interface for visually presenting the processed data, whose parameters include the chart type, data columns, and the like; and a file output interface. Further, a specific code implementation is generated for each interface tool by utilizing the code generation capability of the large model, thereby obtaining the interface tool library.
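The tool-library construction pipeline above can be sketched as follows: each interface tool to be configured is described by its associated information (name, annotation, input parameters, output parameter), and a function body is attached afterwards. The `InterfaceSpec` class, `attach_body` helper, and stand-in body are all assumptions for illustration; in particular, the real function bodies would be produced by the large model's code generation, not the placeholder used here.

```python
# Hedged sketch (assumed names): configure associated information for
# each interface tool, then attach a generated function body to obtain
# a candidate interface, and collect candidates into a tool library.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class InterfaceSpec:
    name: str
    annotation: str        # natural-language interface comment
    in_params: list        # input parameter names
    out_param: str         # output parameter name
    body: Optional[Callable] = None   # attached by code generation

def attach_body(spec: InterfaceSpec) -> InterfaceSpec:
    # Stand-in for the large model's code generation step; a real
    # system would synthesize code from the spec's annotation.
    spec.body = lambda *args: {"interface": spec.name, "args": args}
    return spec

specs = [
    InterfaceSpec("data_acquisition", "Extract data from a data source",
                  ["source_name", "date_range"], "rows"),
    InterfaceSpec("data_processing", "Compute, filter and clean data",
                  ["rows", "operation"], "rows"),
]
tool_library = {s.name: attach_body(s) for s in specs}
result = tool_library["data_acquisition"].body("001", "2023")
```

Keeping the annotation and parameter lists in the spec also supplies exactly the interface configuration information (name, annotation, in/out parameters, no body) that the response prompt exposes to the large model.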
According to the technical scheme of this embodiment, question information to be used is determined based on evaluation question information that includes a data identifier; at least one task to be processed corresponding to the question information to be used is determined, together with the task description information corresponding to each task to be processed; a program to be executed corresponding to each task to be processed is determined based on the task description information; and task processing is performed sequentially according to the execution order of the programs to be executed, obtaining the evaluation result of the data to be evaluated corresponding to the data identifier. This solves the problems of high analysis cost and poor effect caused by analyzing data assets manually or with conventional software in the prior art. By receiving and processing the evaluation question information input by a user in an AI question-answering manner to obtain the question information to be used, the accuracy and effectiveness of evaluation processing are improved; by selecting at least one task to be processed that matches the question information, determining the task description information of each task, deriving the program to be executed from that description, and performing task processing in execution order, the scheme reduces the cost of data asset analysis while improving its automation, accuracy and convenience, improves the analysis effect, and meets the user's requirements for data asset analysis.
Example two
Fig. 3 is a schematic structural diagram of a data asset assessment device based on a large model according to a second embodiment of the present invention. As shown in fig. 3, the apparatus includes: a questioning information to be used determination module 210, a task description information determination module 220, a program to be executed determination module 230, and an evaluation result determination module 240.
The questioning information to be used determining module 210 is configured to determine questioning information to be used based on the evaluation questioning information including the data identifier; a task description information determining module 220, configured to determine at least one task to be processed corresponding to the question to be used and task description information corresponding to each task to be processed; a program to be executed determining module 230, configured to determine a program to be executed corresponding to each task to be processed based on each task description information; the evaluation result determining module 240 is configured to sequentially perform task processing according to an execution sequence of each program to be executed, so as to obtain an evaluation result of the data to be evaluated corresponding to the data identifier.
According to the technical scheme of this embodiment, question information to be used is determined based on evaluation question information that includes a data identifier; at least one task to be processed corresponding to the question information to be used is determined, together with the task description information corresponding to each task to be processed; a program to be executed corresponding to each task to be processed is determined based on the task description information; and task processing is performed sequentially according to the execution order of the programs to be executed, obtaining the evaluation result of the data to be evaluated corresponding to the data identifier. This solves the problems of high analysis cost and poor effect caused by analyzing data assets manually or with conventional software in the prior art. By receiving and processing the evaluation question information input by a user in an AI question-answering manner to obtain the question information to be used, the accuracy and effectiveness of evaluation processing are improved; by selecting at least one task to be processed that matches the question information, determining the task description information of each task, deriving the program to be executed from that description, and performing task processing in execution order, the device reduces the cost of data asset analysis while improving its automation, accuracy and convenience, improves the analysis effect, and meets the user's requirements for data asset analysis.
On the basis of the above device, optionally, the to-be-used questioning information determining module 210 includes an estimated questioning information obtaining unit and a to-be-used questioning information determining unit.
The assessment questioning information acquisition unit is used for acquiring assessment questioning information comprising a data identifier;
the to-be-used questioning information determining unit is used for determining to-be-used questioning information according to a first predetermined prompting word template and the evaluation questioning information; wherein the first prompt word template includes at least one phrase and an evaluation convention corresponding to the at least one phrase.
On the basis of the above device, optionally, the task description information determining module 220 includes a task prompt information determining unit and a task description information determining unit.
The task prompt information determining unit is used for calling the task prompt information; the task prompt information comprises at least one of description guide information, an information generation format, a task set to be selected of at least one task type, task description word conventions and description information examples;
the task description information determining unit is used for processing the to-be-used question information according to the task prompt information, determining at least one to-be-processed task and determining task description information corresponding to each to-be-processed task; the task description information comprises a task number, a task type and a task description.
On the basis of the above apparatus, optionally, the program to be executed determining module 230 includes a response prompting information determining unit and a program to be executed determining unit.
The response prompt information determining unit is used for calling the response prompt information; the response prompt information comprises task guide information, corresponding examples between description information and interfaces and interface configuration information; the interface configuration information comprises an interface name, an interface annotation, an input parameter and an output parameter corresponding to at least one interface to be selected in the interface tool library;
and the program to be executed determining unit is used for determining the program to be executed corresponding to each task to be processed based on the response prompt information and the task description information.
On the basis of the device, optionally, the program to be executed determining unit comprises a function parameter determining unit and a program to be executed determining subunit.
The function parameter determining unit is used for determining function parameters corresponding to corresponding tasks to be processed based on the task description information, interface configuration information in the response prompt information and/or corresponding examples;
and the program to be executed determining unit is used for determining a program to be executed corresponding to the task to be processed based on the task description information, the task guide information in the response prompt information and the function parameters corresponding to the corresponding task to be processed.
On the basis of the above device, optionally, the evaluation result determining module 240 is configured to obtain, if a program in the program to be executed is a function parameter, a target interface corresponding to the function parameter from a pre-built interface tool library, so as to perform task processing based on a function in the target interface.
On the basis of the device, the device optionally further comprises an interface tool library construction module, wherein the interface tool library construction module comprises an operation instruction determination unit, an interface tool determination unit to be configured, an associated information configuration unit, an interface determination unit to be selected and an interface tool library determination unit.
An operation instruction determining unit for generating at least one operation instruction based on at least one seed instruction determined in advance; the operation instruction comprises at least one of a data acquisition attribute, a data processing attribute and a visualization attribute;
the interface tool to be configured determining unit is used for analyzing the at least one operation instruction and determining at least one interface tool to be configured;
the associated information configuration unit is used for configuring associated information corresponding to each interface tool to be configured;
The interface to be selected determining unit is used for generating a function body corresponding to each interface tool to be configured according to each piece of associated information so as to obtain at least one interface to be selected;
and the interface tool library determining unit is used for constructing and obtaining an interface tool library based on the at least one interface to be selected.
The data asset evaluation device based on the large model provided by the embodiment of the invention can execute the data asset evaluation method based on the large model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
FIG. 4 is a schematic diagram of an electronic device implementing a large model-based data asset assessment method according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, and the like. The processor 11 performs the various methods and processes described above, such as the large model-based data asset assessment method.
In some embodiments, the large model-based data asset assessment method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the large model-based data asset assessment method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the large model-based data asset assessment method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A data asset assessment method based on a large model, comprising:
determining question information to be used based on evaluation question information comprising a data identifier;
determining at least one task to be processed corresponding to the question information to be used, and task description information corresponding to each task to be processed;
determining a program to be executed corresponding to each task to be processed based on the task description information; and
sequentially performing task processing according to an execution order of the programs to be executed, to obtain an evaluation result of data to be evaluated corresponding to the data identifier.
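Read as a pipeline, claim 1 chains four steps: build the question information, decompose it into tasks, map each task to a program, then execute the programs in order. A minimal sketch of that flow follows; every function name and the stubbed task decomposition are illustrative assumptions only — the claim leaves the large-model calls unspecified.

```python
# Illustrative sketch only: names and the stubbed decomposition are
# assumptions, not the patented implementation.

def build_question(evaluation_question: str, data_id: str) -> str:
    """Step 1: derive the question information to be used from
    evaluation question information carrying a data identifier."""
    return f"[data_id={data_id}] {evaluation_question}"

def decompose_into_tasks(question: str) -> list[dict]:
    """Step 2: split the question into tasks to be processed, each
    with a task number, type, and description (large model stubbed)."""
    return [
        {"task_no": 2, "task_type": "analyze", "description": "score data quality"},
        {"task_no": 1, "task_type": "fetch", "description": "load the asset data"},
    ]

def task_to_program(task: dict) -> str:
    """Step 3: map each task description to a program to be executed."""
    return {"fetch": "load_data()", "analyze": "score_quality()"}[task["task_type"]]

def run_pipeline(evaluation_question: str, data_id: str) -> list[str]:
    """Step 4: process the tasks in execution order and collect results."""
    question = build_question(evaluation_question, data_id)
    tasks = sorted(decompose_into_tasks(question), key=lambda t: t["task_no"])
    return [task_to_program(t) for t in tasks]

print(run_pipeline("How valuable is this data asset?", "asset-42"))
# → ['load_data()', 'score_quality()']
```

Note how sorting by `task_no` realizes the "execution order" of the claim even when the decomposition step emits tasks out of order.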
2. The method of claim 1, wherein determining the question information to be used based on the evaluation question information comprising the data identifier comprises:
acquiring the evaluation question information comprising the data identifier;
determining the question information to be used according to a predetermined first prompt word template and the evaluation question information; wherein the first prompt word template comprises at least one phrase and an evaluation convention corresponding to the at least one phrase.
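One hypothetical rendering of the "first prompt word template" of claim 2 is a mapping from phrases to their evaluation conventions, concatenated onto the raw evaluation question. The template contents and function name below are assumptions for illustration.

```python
# Hypothetical first prompt word template: phrases paired with the
# evaluation conventions that govern them (contents are assumed).
FIRST_PROMPT_TEMPLATE = {
    "phrases": ["data quality", "data value"],
    "conventions": {
        "data quality": "rate completeness and accuracy on a 1-5 scale",
        "data value": "estimate business value qualitatively",
    },
}

def build_question_to_use(evaluation_question: str, template: dict) -> str:
    """Combine the evaluation question information with the template's
    phrase-level evaluation conventions."""
    rules = "; ".join(
        f"for '{p}': {template['conventions'][p]}" for p in template["phrases"]
    )
    return f"{evaluation_question}\nEvaluation conventions: {rules}"

print(build_question_to_use("[data_id=asset-42] Assess this asset.", FIRST_PROMPT_TEMPLATE))
```

The resulting string is what a downstream large model would receive as the question information to be used.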
3. The method of claim 1, wherein determining the at least one task to be processed corresponding to the question information to be used and the task description information corresponding to each task to be processed comprises:
retrieving task prompt information, wherein the task prompt information comprises at least one of description guide information, an information generation format, a candidate task set of at least one task type, a task description word convention, and a description information example;
processing the question information to be used according to the task prompt information to determine the at least one task to be processed, and determining the task description information corresponding to each task to be processed, wherein the task description information comprises a task number, a task type, and a task description.
4. The method of claim 1, wherein determining the program to be executed corresponding to each task to be processed based on the task description information comprises:
calling response prompt information, wherein the response prompt information comprises task guide information, correspondence examples between description information and interfaces, and interface configuration information; the interface configuration information comprises an interface name, an interface annotation, an input parameter, and an output parameter corresponding to at least one candidate interface in an interface tool library; and
determining the program to be executed corresponding to each task to be processed based on the response prompt information and the task description information.
5. The method of claim 4, wherein determining the program to be executed corresponding to each task to be processed based on the response prompt information and the task description information comprises:
determining function parameters corresponding to the respective task to be processed based on the task description information and the interface configuration information and/or the correspondence examples in the response prompt information; and
determining the program to be executed corresponding to the task to be processed based on the task description information, the task guide information in the response prompt information, and the function parameters corresponding to the respective task to be processed.
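Claims 4–5 together describe matching a task description against interface configuration information (name, annotation, input/output parameters) to derive function parameters and the program to execute. The sketch below stands in for that matching with naive keyword overlap; the interface entries and all names are assumptions, not the patent's actual tool library.

```python
# Illustrative interface configuration information (entries are assumed):
# each candidate interface has a name, an annotation, and input/output
# parameters, mirroring claim 4.
INTERFACE_CONFIG = [
    {"name": "load_table", "annotation": "load a table by identifier",
     "inputs": ["table_id"], "outputs": ["dataframe"]},
    {"name": "profile_quality", "annotation": "profile data quality",
     "inputs": ["dataframe"], "outputs": ["quality_report"]},
]

def pick_interface(description: str) -> dict:
    """Match a task description to a candidate interface by word overlap
    with the interface annotation (a stand-in for the model's selection)."""
    words = set(description.split())
    for iface in INTERFACE_CONFIG:
        if words & set(iface["annotation"].split()):
            return iface
    raise LookupError(f"no interface matches: {description}")

def to_program(task_description: str, args: dict) -> str:
    """Derive the function parameters accepted by the chosen interface
    and render the program to be executed as a call string."""
    iface = pick_interface(task_description)
    params = ", ".join(f"{k}={v!r}" for k, v in args.items() if k in iface["inputs"])
    return f"{iface['name']}({params})"

print(to_program("load the asset table", {"table_id": "asset-42"}))
# → load_table(table_id='asset-42')
```

Filtering `args` through `iface["inputs"]` is one way to realize claim 5's step of deriving function parameters from the interface configuration.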
6. The method of claim 1, wherein sequentially performing task processing according to the execution order of the programs to be executed to obtain the evaluation result of the data to be evaluated corresponding to the data identifier comprises:
if a program among the programs to be executed is a function parameter, acquiring a target interface corresponding to the function parameter from a pre-constructed interface tool library, so as to perform task processing based on a function in the target interface.
7. The method of claim 6, further comprising:
constructing the interface tool library, wherein constructing the interface tool library comprises:
generating at least one operation instruction based on at least one predetermined seed instruction, wherein the operation instruction comprises at least one of a data acquisition attribute, a data processing attribute, and a visualization attribute;
parsing the at least one operation instruction and determining at least one interface tool to be configured;
configuring associated information corresponding to each interface tool to be configured;
generating a function body corresponding to each interface tool to be configured according to the associated information, to obtain at least one candidate interface; and
constructing the interface tool library based on the at least one candidate interface.
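Claim 7's construction flow — seed instructions expanded into attributed operation instructions, each yielding a configured tool with a generated function body — could be sketched as follows. The seed instructions, attribute tagging, and generated bodies are all assumptions for illustration.

```python
# Illustrative seed instructions (assumed, not from the patent).
SEED_INSTRUCTIONS = ["acquire sales data", "visualize trend"]

def expand(seed: str) -> dict:
    """Generate an operation instruction from a seed instruction and tag
    it with one of the three attributes named in claim 7."""
    attr = ("data acquisition" if "acquire" in seed
            else "visualization" if "visualize" in seed
            else "data processing")
    return {"instruction": seed, "attribute": attr}

def build_library(seeds: list[str]) -> dict:
    """Parse the operation instructions, configure associated information
    for each tool, generate a function body, and collect the candidate
    interfaces into the library."""
    library = {}
    for op in map(expand, seeds):
        tool_name = op["instruction"].replace(" ", "_")
        library[tool_name] = {
            "attribute": op["attribute"],           # associated information
            "body": f"def {tool_name}():\n    pass\n",  # trivial generated body
        }
    return library

lib = build_library(SEED_INSTRUCTIONS)
print(sorted(lib))
# → ['acquire_sales_data', 'visualize_trend']
```

In a real system the function bodies would presumably be generated by the large model rather than stubbed with `pass`.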
8. A data asset assessment device based on a large model, comprising:
a question information determining module, configured to determine question information to be used based on evaluation question information comprising a data identifier;
a task description information determining module, configured to determine at least one task to be processed corresponding to the question information to be used and task description information corresponding to each task to be processed;
a program determining module, configured to determine a program to be executed corresponding to each task to be processed based on the task description information; and
an evaluation result determining module, configured to sequentially perform task processing according to an execution order of the programs to be executed, to obtain an evaluation result of data to be evaluated corresponding to the data identifier.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program enabling the at least one processor to perform the large model-based data asset assessment method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the large model-based data asset assessment method of any one of claims 1-7.
CN202311871038.1A 2023-12-29 2023-12-29 Data asset assessment method, device, equipment and medium based on large model Pending CN117788172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311871038.1A CN117788172A (en) 2023-12-29 2023-12-29 Data asset assessment method, device, equipment and medium based on large model


Publications (1)

Publication Number Publication Date
CN117788172A (en) 2024-03-29

Family

ID=90379625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311871038.1A Pending CN117788172A (en) 2023-12-29 2023-12-29 Data asset assessment method, device, equipment and medium based on large model

Country Status (1)

Country Link
CN (1) CN117788172A (en)

Similar Documents

Publication Publication Date Title
CN113220836B (en) Training method and device for sequence annotation model, electronic equipment and storage medium
US10042932B2 (en) Analytics based on pipes programming model
CN111709247A (en) Data set processing method and device, electronic equipment and storage medium
US20220405479A1 (en) Conversational data analysis
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN112270168A (en) Dialogue emotion style prediction method and device, electronic equipment and storage medium
CN114861677A (en) Information extraction method, information extraction device, electronic equipment and storage medium
CN114445047A (en) Workflow generation method and device, electronic equipment and storage medium
CN111460810A (en) Crowd-sourced task spot check method and device, computer equipment and storage medium
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN113361240A (en) Method, device, equipment and readable storage medium for generating target article
CN117788172A (en) Data asset assessment method, device, equipment and medium based on large model
CN115910266A (en) Report reading method and device, electronic equipment and storage medium
CN113050933B (en) Brain graph data processing method, device, equipment and storage medium
CN114201376A (en) Log analysis method and device based on artificial intelligence, terminal equipment and medium
CN113791860A (en) Information conversion method, device and storage medium
CN113110782A (en) Image recognition method and device, computer equipment and storage medium
CN113221566A (en) Entity relationship extraction method and device, electronic equipment and storage medium
CN114547231A (en) Data tracing method and system
CN110990445B (en) Data processing method, device, equipment and medium
CN116340831B (en) Information classification method and device, electronic equipment and storage medium
CN114492409B (en) Method and device for evaluating file content, electronic equipment and program product
CN113901094B (en) Data processing method, device, equipment and storage medium
CN116644724B (en) Method, device, equipment and storage medium for generating bid
CN113934845A (en) Report analysis method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination