CN115146865A - Task optimization method based on artificial intelligence and related equipment - Google Patents

Task optimization method based on artificial intelligence and related equipment Download PDF

Info

Publication number: CN115146865A
Application number: CN202210871767.6A
Authority: CN (China)
Prior art keywords: data, task, sample, evaluated, classification model
Legal status: Pending (assumed; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: 邓雪昭
Current Assignee: Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee: Ping An Property and Casualty Insurance Company of China Ltd
Applicant: Ping An Property and Casualty Insurance Company of China Ltd
Priority: CN202210871767.6A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 - Operations research, analysis or management
    • G06Q10/0631 - Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 - Scheduling, planning or task assignment for a person or group
    • G06Q10/06316 - Sequencing of tasks or work

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a task optimization method and apparatus based on artificial intelligence, an electronic device, and a storage medium. The task optimization method based on artificial intelligence comprises the following steps: collecting a plurality of items of data related to historical tasks as raw data; performing dimensionality reduction on the raw data to obtain dimensionality-reduced data and a plurality of target dimensions; preprocessing the dimensionality-reduced data to obtain sample data; clustering the sample data into a plurality of sample groups, and labeling the sample data in each sample group to obtain label data; training a task classification model according to the sample data and the label data; and acquiring data to be evaluated related to a task to be evaluated according to the target dimensions, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result. Because the method continuously refines the task classification model according to its classification results, the accuracy of task optimization is continuously enhanced.

Description

Task optimization method based on artificial intelligence and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a task optimization method and apparatus based on artificial intelligence, an electronic device, and a storage medium.
Background
As enterprises' business scenarios grow richer, the number of their big data tasks keeps increasing, and large numbers of repeated computing tasks and development tasks begin to appear, so the cost of data analysis rises continuously. How to quickly locate unreasonable or redundant tasks among a vast number of tasks and optimize them is a problem of wide current concern.
At present, enterprises generally formulate, based on experience, indexes that reflect the quality of data analysis tasks, such as complexity, link redundancy, and code efficiency. This approach struggles to guarantee the comprehensiveness of the analysis, and because only a small number of indexes measure task quality, the accuracy of the analysis results is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a task optimization method based on artificial intelligence and related devices, so as to solve the technical problem of how to improve the accuracy of task optimization, where the related devices include a task optimization apparatus based on artificial intelligence, an electronic device and a storage medium.
The embodiment of the application provides a task optimization method based on artificial intelligence, which comprises the following steps:
collecting a plurality of items of data related to historical tasks as raw data, wherein the raw data comprises a plurality of dimensions;
performing dimensionality reduction processing on the original data to obtain dimensionality reduction data and a plurality of target dimensions;
preprocessing the dimensionality reduction data to obtain sample data;
clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data;
training a task classification model according to the sample data and the label data;
and acquiring data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result.
According to the task optimization method based on artificial intelligence, raw data containing multiple dimensions is constructed by collecting multiple items of data related to historical tasks. Dimensionality reduction and preprocessing of the raw data yield sample data of improved quality. The sample data is then labeled to obtain corresponding label data, and a task classification model is trained on the sample data and label data. The model is used to classify tasks to be evaluated and is continuously optimized according to the classification results, so the performance of the task classification model, and with it the accuracy of task optimization, is continuously enhanced.
In some embodiments, the performing dimensionality reduction on the raw data to obtain dimensionality reduction data and a plurality of target dimensions includes:
calculating the importance of each dimension in the original data according to a preset dimension reduction algorithm;
selecting a preset number of dimensions as target dimensions according to the importance;
and taking the target dimension in the original data as dimension reduction data corresponding to the original data.
Therefore, by calculating the importance of each dimension in the original data and selecting a plurality of dimensions with higher dimension importance as dimension reduction data, the data dimension is reduced, the time complexity of subsequent data analysis can be reduced, and the data analysis efficiency is improved.
In some embodiments, the pre-processing the dimension reduced data to obtain sample data comprises:
deleting the dimension reduction data with missing values, and taking the remaining dimension reduction data as first candidate data;
deleting first candidate data with abnormal values, and taking the remaining first candidate data as second candidate data;
and normalizing the value of each dimension in the second candidate data to obtain the sample data.
In this way, preprocessing removes missing values and abnormal values from the data to obtain the second candidate data, and normalizing the value of each dimension in the second candidate data eliminates inconsistencies of scale, which improves the quality of the sample data and thereby the accuracy of subsequent data analysis.
In some embodiments, the clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data includes:
clustering the sample data to obtain a plurality of sample groups, and respectively calculating the mean value of the sample data in each sample group;
respectively calculating Euclidean distance between each sample data and the mean value of the sample data aiming at each sample group, and taking the sample data corresponding to the minimum Euclidean distance as representative data to obtain the representative data of each sample group, wherein each representative data corresponds to a historical task;
and querying, according to a preset historical optimization record, the optimization result of the historical task corresponding to the representative data, and labeling all sample data in the sample group to which the representative data belongs according to that result to obtain the label data, wherein the optimization result is either that optimization is required or that it is not.
Therefore, all sample data are marked by inquiring the optimization results of the historical tasks corresponding to a small amount of representative data to obtain the label data, so that the waste of human resources caused by repeated inquiry and marking is avoided, and the efficiency of data marking is improved.
In some embodiments, said clustering said sample data to obtain a plurality of sample groupings comprises:
a, selecting a plurality of target data from all sample data according to a preset grouping parameter, wherein the number of the target data is the same as that of the grouping parameter;
b, selecting one unselected sample data as current data, respectively calculating the cosine similarity between the current data and each target data, and classifying the current data and the target data corresponding to the maximum cosine similarity into the same alternative group;
c, repeatedly executing the step b until all sample data are classified into alternative groups to obtain a plurality of alternative groups;
d, calculating the mean value of all sample data in each alternative grouping, and calculating the difference between each mean value and the target data of the same grouping; if every difference is smaller than a preset termination threshold, taking the plurality of alternative groupings as the plurality of sample groups; otherwise, taking the mean values as the new target data and repeating steps b to d until the plurality of sample groups is obtained.
In this way, clustering all the sample data yields a plurality of sample groups whose members are highly similar to one another, which provides guidance for data labeling.
In some embodiments, said training a task classification model from said sample data and said label data comprises:
a, pairing the sample data with the label data one-to-one as training data, dividing the training data into a plurality of subsets according to a preset division ratio, marking all subsets as not visited, selecting one subset as the verification set and marking it as visited, and taking all sample data in the remaining subsets as the training set;
b, training a preset initial classification model with all sample data in the training set to obtain a candidate classification model;
c, calculating the classification accuracy of the candidate classification model on the sample data and label data in the verification set;
d, selecting any one subset still marked as not visited as the verification set, taking the remaining subsets as the training set, and repeating steps b to d until all subsets are marked as visited, thereby obtaining a plurality of candidate classification models and the classification accuracy corresponding to each;
e, selecting the candidate classification model with the highest classification accuracy as the task classification model, which receives sample data corresponding to a historical task and outputs the category of that historical task.
In this way, a task classification model with better performance is obtained through multiple rounds of cross validation, which reduces the risk of over-fitting and improves the performance of the task classification model.
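The cross-validation loop in steps a to e can be sketched as follows. The patent leaves the initial classification model open, so a simple nearest-centroid classifier stands in for it here; all function and variable names are illustrative, not taken from the patent.

```python
import random

def nearest_centroid_train(rows, labels):
    """Hypothetical stand-in for the unspecified initial classification
    model: predicts the label of the nearest class centroid."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        dim = len(members[0])
        centroids[lab] = [sum(m[j] for m in members) / len(members)
                          for j in range(dim)]
    def predict(row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, c))
        return min(centroids, key=lambda lab: sq_dist(centroids[lab]))
    return predict

def cross_validate(rows, labels, k=3, seed=0):
    """k-fold cross-validation: each subset serves once as the verification
    set; the candidate model with the highest accuracy is kept."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # disjoint subsets
    best_model, best_acc = None, -1.0
    for fold in folds:                             # fold = verification set
        train_idx = [i for i in idx if i not in fold]
        model = nearest_centroid_train([rows[i] for i in train_idx],
                                       [labels[i] for i in train_idx])
        acc = sum(model(rows[i]) == labels[i] for i in fold) / len(fold)
        if acc > best_acc:                         # keep the best candidate
            best_model, best_acc = model, acc
    return best_model, best_acc
```

On cleanly separable data every fold reaches full accuracy and the first candidate is retained.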
In some embodiments, the acquiring, according to the target dimension, data to be evaluated related to a task to be evaluated, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result includes:
acquiring multiple items of data related to the task to be evaluated as original data to be evaluated according to the target dimension;
normalizing the original data to be evaluated to obtain the data to be evaluated;
inputting the data to be evaluated into the task classification model to obtain the category of the task to be evaluated;
calculating the cosine similarity between the data to be evaluated and each sample data, and taking the category of the sample data with the highest cosine similarity as a reference category; if the reference category matches the category output for the task to be evaluated, pushing that category to the user as the classification result; otherwise, using the data to be evaluated as additional training data to retrain the task classification model.
In this way, the target dimensions select the data to be evaluated for the task to be evaluated, the task classification model assigns the task a category, and the sample data and label data are used to check whether the classification is correct. Incorrectly classified data feeds back into training, so the task classification model is continuously optimized and its performance improves.
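A minimal sketch of this evaluation step, assuming a trained model callable and the cosine-similarity consistency check described above; the function and variable names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def evaluate_task(candidate, sample_rows, sample_labels, classify):
    """Classify a task, then sanity-check the result against the label of
    the most cosine-similar training sample. A matching result would be
    pushed to the user; a mismatch is flagged for retraining. `classify`
    is any trained model callable."""
    predicted = classify(candidate)
    nearest = max(range(len(sample_rows)),
                  key=lambda i: cosine(candidate, sample_rows[i]))
    reference = sample_labels[nearest]
    if reference == predicted:
        return predicted, False   # push result; no retraining needed
    return predicted, True        # queue the sample for retraining
```

The boolean flag models the patent's branch between pushing the result and retraining the model.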
The embodiment of the present application further provides a task optimization device based on artificial intelligence, the device includes:
the collecting unit is used for collecting a plurality of items of data related to historical tasks as original data, wherein the original data comprises a plurality of dimensions;
the dimensionality reduction unit is used for performing dimensionality reduction on the original data to obtain dimensionality reduction data and a plurality of target dimensionalities;
the preprocessing unit is used for preprocessing the dimensionality reduction data to obtain sample data;
the marking unit is used for clustering the sample data to obtain a plurality of sample groups and marking the sample data in each sample group to obtain label data;
the training unit is used for training a task classification model according to the sample data and the label data;
and the classification unit is used for acquiring data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based task optimization method.
The embodiment of the application also provides a computer readable storage medium, and computer readable instructions are stored in the computer readable storage medium and executed by a processor in an electronic device to implement the artificial intelligence based task optimization method.
According to the task optimization method based on artificial intelligence, raw data containing multiple dimensions is constructed by collecting multiple items of data related to historical tasks, and the raw data is dimensionality-reduced and preprocessed to obtain sample data of improved quality. The sample data is then labeled to obtain corresponding label data, a task classification model is trained on the sample data and label data, and the model is used to classify tasks to be evaluated. The task classification model is continuously optimized according to the classification results, so its performance, and with it the accuracy of task optimization, is continuously enhanced.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of an artificial intelligence based task optimization method to which the present application relates.
FIG. 2 is a functional block diagram of a preferred embodiment of an artificial intelligence based task optimization apparatus according to the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the artificial intelligence based task optimization method.
Detailed Description
In order that the objects, technical solutions, and advantages of the present application may be more clearly understood, the present application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other without conflict. In the following description, numerous specific details are set forth to provide a thorough understanding of the present application; the described embodiments are merely some, not all, of the embodiments of the present application.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiment of the present application provides a task optimization method based on artificial intelligence, which can be applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
FIG. 1 is a flow chart of a preferred embodiment of the task optimization method based on artificial intelligence according to the present application. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
And S10, collecting a plurality of items of data related to the historical tasks as original data, wherein the original data comprises a plurality of dimensions.
In this alternative embodiment, the historical tasks refer to a plurality of data processing tasks that are executed in the server by the enterprise to improve the business efficiency, and functions of the plurality of data processing tasks include data query, data cleaning, data aggregation, and the like.
In this optional embodiment, the multiple items of data related to the historical task include a category of the historical task, an operation duration, an operation result, a resource occupancy rate, a data amount analyzed by the historical task, and a program interface called by the historical task, where the resource occupancy rate refers to a proportion of the historical task occupying the server resource, and the resource occupancy rate includes a CPU occupancy rate, a memory occupancy rate, and an I/O occupancy rate.
In this alternative embodiment, multiple items of data related to each historical task may be queried from a preset task running log as raw data corresponding to each historical task, where the raw data is a vector of 1 row and n columns, where n represents the number of the multiple items of data, and each column in the raw data represents one dimension of the raw data.
In this way, extensive collection of data related to the historical tasks yields a large amount of raw data that covers a wide range of information about those tasks, providing data support for subsequent dimension extraction.
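As a rough illustration, assembling each historical task's log record into a 1 x n raw-data row vector might look like the sketch below. The log field names are assumptions for illustration, not taken from the patent.

```python
# Hypothetical task-run-log fields; names are illustrative only.
FIELDS = ["category", "run_seconds", "succeeded", "cpu_pct",
          "mem_pct", "io_pct", "rows_scanned", "api_calls"]

def to_raw_vector(log_entry):
    """Flatten one task-log record into a 1 x n row vector; each column
    corresponds to one dimension of the raw data."""
    return [log_entry[f] for f in FIELDS]

# One example record queried from the (hypothetical) task running log.
log = [
    {"category": 1, "run_seconds": 420, "succeeded": 1, "cpu_pct": 0.35,
     "mem_pct": 0.50, "io_pct": 0.10, "rows_scanned": 1_200_000,
     "api_calls": 4},
]
raw_data = [to_raw_vector(e) for e in log]
```

Each row of `raw_data` then corresponds to one historical task, with n = 8 dimensions in this sketch.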
And S11, performing dimensionality reduction on the original data to obtain dimensionality reduction data and a plurality of target dimensionalities.
In an optional embodiment, the performing dimensionality reduction processing on the raw data to obtain dimensionality reduction data and a plurality of target dimensions includes:
calculating the importance of each dimension in the original data according to a preset dimension reduction algorithm;
selecting a preset number of dimensions as target dimensions according to the importance;
and taking the target dimension in the original data as dimension reduction data corresponding to the original data.
In this optional embodiment, the preset dimension reduction algorithm may be an existing data dimension reduction algorithm such as a Principal Component Analysis (PCA) algorithm, a Singular Value Decomposition (SVD) algorithm, and the like, which is not limited in this application, an output of the preset dimension reduction algorithm is an importance of each dimension in the original data, and a higher importance indicates that a dimension corresponding to the importance is more important for a representation of the original data.
In this alternative embodiment, the preset number may be 0.8n, 0.6n, 0.4n, and the like, which is not limited in this application, where n represents the number of dimensions of the original data. For example, when the preset number is 0.8n, the dimensions may be sorted according to the order of the importance from high to low, and the dimension with the order of the top 0.8n is selected as the target dimension.
In this alternative embodiment, the target dimension in the original data may be used as dimension reduction data.
Therefore, by calculating the importance of each dimension in the original data and selecting a plurality of dimensions with higher dimension importance as dimension reduction data, the data dimension is reduced, the time complexity of subsequent data analysis can be reduced, and the data analysis efficiency is improved.
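The importance-based selection can be sketched as follows. Per-dimension variance stands in here for the importance score that the patent's PCA or SVD algorithm would produce, and the preset number (e.g. 0.8n) appears as a `keep_ratio` parameter; both simplifications are assumptions for illustration.

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def reduce_dimensions(raw, keep_ratio=0.8):
    """Score each column by an importance measure (variance here, standing
    in for a PCA/SVD importance), keep the top keep_ratio * n columns as
    target dimensions, and project the raw data onto them."""
    n = len(raw[0])
    cols = list(zip(*raw))
    importance = [variance(list(c)) for c in cols]
    k = max(1, int(keep_ratio * n))                 # preset number of dims
    target_dims = sorted(range(n), key=lambda j: importance[j],
                         reverse=True)[:k]
    target_dims.sort()                              # keep original order
    reduced = [[row[j] for j in target_dims] for row in raw]
    return reduced, target_dims
```

A constant column (zero variance) is the first to be dropped, since it carries no information about the tasks.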
And S12, preprocessing the dimension reduction data to obtain sample data.
In an optional embodiment, the preprocessing the dimension reduction data to obtain sample data includes:
deleting the dimension reduction data with missing values, and taking the remaining dimension reduction data as first candidate data;
deleting first candidate data with abnormal values, and taking the remaining first candidate data as second candidate data;
and normalizing the value of each dimension in the second candidate data to obtain the sample data.
In this optional embodiment, in order to avoid that data missing negatively affects the accuracy of subsequent data analysis, it may be queried whether each dimension reduction data has a missing value, and if a numerical value of any one dimension in a certain dimension reduction data is missing, the dimension reduction data is deleted until all dimension reduction data having the missing value are deleted, so as to obtain the first candidate data.
In this optional embodiment, when the deviation between the value of a certain dimension in the first candidate data and the mean of the values of the dimension in all the first candidate data is greater than two times the standard deviation, the value of the dimension is an abnormal value, and all the first candidate data with the abnormal value may be deleted to obtain the second candidate data.
In this optional embodiment, in order to eliminate the dimension difference of each dimension in the second candidate data, the numerical value of each dimension in the second candidate data may be normalized according to a preset normalization algorithm to obtain sample data, where the preset normalization algorithm may be an existing normalization algorithm such as a maximization algorithm, a minimization algorithm, an arc tangent function algorithm, an S-shaped growth curve algorithm, and the like, and this is not limited in the present application.
The dimension of the sample data is the same as the dimension of the dimensionality reduction data, and the value range of the numerical value of each dimension in the sample data is [0,1].
In this way, preprocessing removes missing values and abnormal values from the data to obtain the second candidate data, and normalizing the value of each dimension in the second candidate data eliminates inconsistencies of scale, which improves the quality of the sample data and thereby the accuracy of subsequent data analysis.
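A minimal sketch of the three preprocessing steps: missing-value removal, the two-standard-deviation outlier filter, and per-dimension normalization into [0, 1]. Min-max scaling is used as one of the normalization choices the patent allows, and missing values are represented as `None`; both are assumptions of this sketch.

```python
def preprocess(reduced_rows):
    """Drop rows with missing values, drop rows with any value more than
    two standard deviations from its column mean, then min-max normalize
    each column into [0, 1]."""
    # Step 1: delete rows containing a missing value (None).
    first = [r for r in reduced_rows if all(v is not None for v in r)]
    # Step 2: two-sigma outlier filter, column by column.
    n = len(first[0])
    means, stds = [], []
    for j in range(n):
        col = [r[j] for r in first]
        m = sum(col) / len(col)
        s = (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5
        means.append(m)
        stds.append(s)
    second = [r for r in first
              if all(abs(r[j] - means[j]) <= 2 * stds[j] for j in range(n))]
    # Step 3: min-max normalization per column.
    mins = [min(r[j] for r in second) for j in range(n)]
    maxs = [max(r[j] for r in second) for j in range(n)]
    return [[(r[j] - mins[j]) / (maxs[j] - mins[j])
             if maxs[j] > mins[j] else 0.0
             for j in range(n)]
            for r in second]
```

The resulting sample data keeps the dimension count of the reduced data, with every value in [0, 1].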
And S13, clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data.
In an optional embodiment, the clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data includes:
clustering the sample data to obtain a plurality of sample groups, and respectively calculating the mean value of the sample data in each sample group;
respectively calculating Euclidean distance between each sample data and the mean value of the sample data aiming at each sample group, and taking the sample data corresponding to the minimum Euclidean distance as representative data to obtain the representative data of each sample group, wherein each representative data corresponds to a historical task;
and querying, according to a preset historical optimization record, the optimization result of the historical task corresponding to the representative data, and labeling all sample data in the sample group to which the representative data belongs according to that result to obtain the label data, wherein the optimization result is either that optimization is required or that it is not.
In an optional embodiment, clustering the sample data to obtain a plurality of sample groups includes:
a. selecting a number of target data from the sample data according to a preset grouping parameter, the number of target data being equal to the grouping parameter (the preset grouping parameter may, for example, be 2);
b. selecting one not-yet-selected sample data as the current data, calculating the cosine similarity between the current data and each target data, and assigning the current data to the same candidate group as the target data with the maximum cosine similarity;
c. repeating step b until every sample data has been assigned, obtaining a plurality of candidate groups, the number of candidate groups being equal to the preset grouping parameter;
d. calculating the mean value of all sample data in each candidate group and the difference between that mean value and the corresponding target data; if the difference is smaller than a preset termination threshold, taking the plurality of candidate groups as the plurality of sample groups; otherwise, taking the mean values as the new target data and repeating steps b to d until the plurality of sample groups are obtained.
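Steps a to d above can be sketched as a k-means-style loop in Python; this is a minimal illustration assuming the initial target data are simply the first k sample vectors (the embodiment does not fix how they are selected) and that every candidate group stays non-empty:

```python
import numpy as np

def cluster_samples(samples, k=2, tol=1e-4, max_iter=100):
    """Steps a-d: cosine-similarity assignment with mean-based target
    updates. `samples` is an (m, d) array; assumes non-empty groups."""
    targets = samples[:k].copy()  # step a: pick k target data (here: first k rows)
    for _ in range(max_iter):
        # steps b-c: assign every sample to the most cosine-similar target
        norms = np.linalg.norm(samples, axis=1, keepdims=True) \
              * np.linalg.norm(targets, axis=1)
        sims = (samples @ targets.T) / np.clip(norms, 1e-12, None)
        groups = sims.argmax(axis=1)
        # step d: recompute each group's mean and test the termination threshold
        means = np.stack([samples[groups == j].mean(axis=0) for j in range(k)])
        if np.abs(means - targets).max() < tol:
            break
        targets = means
    return groups, means
```

Cosine similarity drives the assignment, while the termination test compares each group mean with the previous target data, as in step d.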
In this optional embodiment, since the sample data in each sample group are similar to one another, the mean value of all sample data in a sample group may be calculated, and the sample data with the minimum Euclidean distance from that mean value may be selected as the representative data of the group, so that the representative data represents the characteristics of all sample data in the sample group.
In this optional embodiment, the preset historical optimization record stores the optimization results of all historical tasks, each result being either that optimization is required or that optimization is not required, and the optimization result of the historical task corresponding to the representative data may be used to label all sample data in the sample group to which the representative data belongs.
In this way, all sample data are labeled by querying the optimization results of the historical tasks corresponding to only a small amount of representative data, which avoids the human effort wasted on repeated querying and labeling and improves the efficiency of data labeling.
S14: train a task classification model according to the sample data and the label data.
In an optional embodiment, training the task classification model according to the sample data and the label data includes:
a. pairing the sample data with the label data one-to-one as training data, dividing the training data into a plurality of subsets according to a preset division ratio, marking all subsets as not visited, selecting one subset as the validation set, marking it as visited, and using all sample data in the remaining subsets as the training set; for example, when the preset division ratio is 10%, the training data may be divided into 10 subsets, each containing 10% of the sample data;
b. training a preset initial classification model with all sample data in the training set to obtain a candidate classification model, wherein the preset initial classification model may be an existing classification model such as GBDT (Gradient Boosting Decision Tree), XGBoost (Extreme Gradient Boosting), or LightGBM (Light Gradient Boosting Machine), which is not limited in this application;
the initial classification model may be trained with Bayesian hyperparameter optimization;
c. calculating the classification accuracy of the candidate classification model from the sample data and label data in the validation set:
the sample data in the validation set are input into the candidate classification model to obtain classification results, and the ratio of the number of correctly classified samples to the total number of sample data in the validation set is taken as the classification accuracy of the candidate classification model;
d. selecting any subset still marked as not visited as the new validation set, using the remaining subsets as the training set, and repeating steps b to d until all subsets are marked as visited, thereby obtaining a plurality of candidate classification models and the classification accuracy of each candidate classification model;
e. selecting the candidate classification model with the maximum classification accuracy as the task classification model, which receives the sample data corresponding to a historical task and outputs the category of that historical task.
In this way, a well-performing task classification model is obtained through multiple rounds of cross-validation, which reduces the risk of over-fitting and improves the performance of the task classification model.
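Steps a to e amount to k-fold cross-validation with selection of the most accurate candidate. The sketch below substitutes a dependency-free nearest-centroid classifier for the GBDT/XGBoost/LightGBM models named above, so the loop structure, not the model, is the point:

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in for the preset initial classification model."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def cross_validate_select(X, y, n_splits=5):
    """Steps a-e: rotate a held-out validation subset, train a candidate
    model on the rest, and keep the candidate with the best accuracy."""
    best_model, best_acc = None, -1.0
    for fold in np.array_split(np.arange(len(X)), n_splits):  # step d: rotate
        train = np.ones(len(X), dtype=bool)
        train[fold] = False
        model = fit_centroids(X[train], y[train])                    # step b
        acc = (predict_centroids(model, X[fold]) == y[fold]).mean()  # step c
        if acc > best_acc:                                           # step e
            best_model, best_acc = model, acc
    return best_model, best_acc
```

With a real gradient-boosting model, only `fit_centroids`/`predict_centroids` would change; the subset rotation and best-accuracy selection stay the same.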
S15: collect data to be evaluated related to the task to be evaluated according to the target dimensions, input the data to be evaluated into the task classification model to obtain a classification result, and optimize the task to be evaluated according to the classification result.
In an optional embodiment, collecting data to be evaluated related to the task to be evaluated according to the target dimensions, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result includes:
collecting a plurality of items of data related to the task to be evaluated according to the target dimensions as the original data to be evaluated;
normalizing the original data to be evaluated to obtain the data to be evaluated;
inputting the data to be evaluated into the task classification model to obtain the category of the task to be evaluated;
and calculating the cosine similarity between the data to be evaluated and each sample data, taking the category of the sample data with the maximum cosine similarity as the benchmark category; if the benchmark category is the same as the category of the task to be evaluated, pushing that category to the user as the classification result; otherwise, using the data to be evaluated as training data to retrain the task classification model.
The task to be evaluated refers to a data processing task which needs to be classified.
In this optional embodiment, the data related to the task to be evaluated may be collected according to the target dimensions as the original data to be evaluated; for example, when the target dimensions include the running duration and the CPU occupancy rate, the running duration and CPU occupancy rate of the task to be evaluated may be collected as the original data to be evaluated.
In this optional embodiment, the original data to be evaluated may be normalized according to the preset normalization algorithm to obtain the data to be evaluated, which has the same number of dimensions as the dimension-reduced data, the value of each dimension lying in the range [0,1].
In this optional embodiment, the data to be evaluated may be input into the task classification model to obtain the category of the task to be evaluated, which is either that optimization is required or that optimization is not required.
In this optional embodiment, the cosine similarity between the data to be evaluated and each sample data may be calculated, the sample data with the maximum cosine similarity is selected as the benchmark data, and the label data of the benchmark data is taken as the benchmark category; if the benchmark category is the same as the category of the task to be evaluated, the classification is correct, and the category of the task to be evaluated may be pushed to the user as the classification result.
If the benchmark category differs from the category of the task to be evaluated, the classification has failed; the benchmark category may then be used as the label data of the data to be evaluated, and the data to be evaluated may be used as training data to retrain the task classification model and improve its performance.
In this way, the target dimensions are used to select the data to be evaluated for the task to be evaluated, the data to be evaluated are input into the task classification model to obtain the category of the task, and whether the task has been correctly classified is checked against the sample data and label data; the incorrectly classified data are then used to continuously optimize the task classification model, thereby improving its performance.
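The evaluation and feedback loop of S15 can be sketched as follows, assuming the task classification model is available as a callable `predict` and the data to be evaluated has already been normalized like the sample data:

```python
import numpy as np

def evaluate_task(task_vec, predict, samples, tags):
    """Classify the data to be evaluated, then check the prediction
    against the label of the most cosine-similar sample (the benchmark
    category). Returns (category, needs_retraining)."""
    pred = predict(task_vec)
    sims = (samples @ task_vec) / (
        np.linalg.norm(samples, axis=1) * np.linalg.norm(task_vec) + 1e-12)
    benchmark = tags[sims.argmax()]
    if benchmark == pred:
        return pred, False       # correct: push the category to the user
    return benchmark, True       # mismatch: queue the vector for retraining
```

The second return value marks the data to be evaluated, relabeled with the benchmark category, as new training data for retraining the model.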
According to the artificial-intelligence-based task optimization method described above, a plurality of items of data related to historical tasks are collected to construct original data containing a plurality of dimensions, and the original data are subjected to dimension reduction and preprocessing to obtain sample data, which improves data quality. The sample data are then labeled to obtain the corresponding label data, a task classification model is trained with the sample data and the label data, the model is used to classify tasks to be evaluated, and the task classification model is continuously optimized according to the classification results, so that the performance of the task classification model and the accuracy of task optimization are continuously improved.
FIG. 2 is a functional block diagram of a preferred embodiment of the artificial-intelligence-based task optimization apparatus of the present application. The artificial-intelligence-based task optimization apparatus 11 comprises a collection unit 110, a dimension reduction unit 111, a preprocessing unit 112, a labeling unit 113, a training unit 114, and a classification unit 115. The modules/units referred to in this application are a series of computer program segments stored in the memory 12 that can be executed by the processor 13 to perform fixed functions; in the present embodiment, their functions are described in detail in the following embodiments.
In an alternative embodiment, the collection unit 110 is configured to collect a plurality of items of data related to historical tasks as raw data, the raw data comprising a plurality of dimensions.
In this optional embodiment, the historical tasks are data processing tasks executed on the server by an enterprise to improve business efficiency, and their functions include data query, data cleaning, data aggregation, and the like.
In this optional embodiment, the plurality of items of data related to a historical task include the category of the task, the running duration, the running result, the resource occupancy rate, the amount of data analyzed by the task, and the program interfaces called by the task, wherein the resource occupancy rate is the proportion of server resources occupied by the task and includes the CPU occupancy rate, the memory occupancy rate, and the I/O occupancy rate.
In this optional embodiment, the plurality of items of data related to each historical task may be queried from a preset task running log as the raw data of that task, wherein the raw data is a vector of 1 row and n columns, n being the number of items of data, and each column representing one dimension of the raw data.
In an optional embodiment, the dimension reduction unit 111 is configured to perform dimension reduction on the raw data to obtain dimension reduction data and a plurality of target dimensions.
In an optional embodiment, the performing dimension reduction processing on the raw data to obtain dimension reduction data and a plurality of target dimensions includes:
calculating the importance of each dimension in the original data according to a preset dimension reduction algorithm;
selecting a preset number of dimensions as target dimensions according to the importance;
and taking the target dimensionality in the original data as the dimensionality reduction data corresponding to the original data.
In this optional embodiment, the preset dimension reduction algorithm may be an existing data dimension reduction algorithm such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD), which is not limited in this application; its output is the importance of each dimension of the original data, and a higher importance indicates that the corresponding dimension contributes more to representing the original data.
In this optional embodiment, the preset number may be, for example, 0.8n, 0.6n, or 0.4n, where n is the number of dimensions of the original data, which is not limited in this application; for example, when the preset number is 0.8n, the dimensions may be sorted by importance from high to low and the top 0.8n dimensions selected as the target dimensions.
In this alternative embodiment, the target dimension in the original data may be used as dimension reduction data.
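The embodiment only requires that the dimension reduction algorithm output an importance per dimension. One common way to obtain such a score from PCA, sketched here as an assumption rather than the prescribed method, is to weight the absolute loadings of each dimension by the explained-variance ratio of each principal component:

```python
import numpy as np

def pca_dimension_importance(X):
    """Score each original dimension by its absolute PCA loadings
    weighted by each component's explained-variance ratio. (This
    particular scoring rule is an assumption, not the patent's.)"""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    ratio = eigvals / eigvals.sum()
    return (np.abs(eigvecs) * ratio).sum(axis=1)

def select_target_dimensions(X, keep_ratio=0.8):
    """Keep the top keep_ratio * n dimensions by importance (e.g. 0.8n)."""
    importance = pca_dimension_importance(X)
    n_keep = max(1, int(keep_ratio * X.shape[1]))
    target_dims = sorted(np.argsort(importance)[::-1][:n_keep])
    return X[:, target_dims], target_dims
```

The returned `target_dims` are the target dimensions, and the column-sliced array is the dimension-reduced data.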
In an optional embodiment, the preprocessing unit 112 is configured to preprocess the dimension reduced data to obtain sample data.
In an optional embodiment, the preprocessing the dimension reduction data to obtain sample data includes:
deleting the dimensionality reduction data with missing values, and taking the remaining dimensionality reduction data as the first candidate data;
deleting first candidate data with abnormal values from the first candidate data, and taking the remaining first candidate data as second candidate data;
and respectively carrying out normalization processing on the numerical value of each dimension in the second alternative data to obtain sample data.
In this optional embodiment, to prevent missing data from degrading the accuracy of subsequent data analysis, each dimensionality reduction data may be checked for missing values; if the value of any dimension of a dimensionality reduction data is missing, that dimensionality reduction data is deleted, until all dimensionality reduction data with missing values have been deleted, obtaining the first candidate data.
In this optional embodiment, when the value of a dimension of a first candidate data deviates from the mean of that dimension over all first candidate data by more than twice the standard deviation, that value is an abnormal value, and all first candidate data with abnormal values may be deleted to obtain the second candidate data.
In this optional embodiment, to eliminate the scale differences between the dimensions of the second candidate data, the value of each dimension of the second candidate data may be normalized according to a preset normalization algorithm to obtain the sample data, wherein the preset normalization algorithm may be an existing normalization algorithm such as min-max normalization, the arctangent function, or the sigmoid (S-shaped growth curve) function, which is not limited in this application.
The sample data has the same dimensions as the dimension-reduced data, and the value of each dimension of the sample data lies in the range [0,1].
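The preprocessing chain of the unit 112 (missing-value removal, two-standard-deviation outlier removal, then normalization to [0,1]) can be sketched as follows, using min-max normalization as the concrete normalization algorithm (one of the options the embodiment allows):

```python
import numpy as np

def preprocess(reduced):
    """Drop rows with missing values, drop rows where any dimension
    deviates from its column mean by more than two standard deviations,
    then min-max normalize each dimension to [0, 1]."""
    first = reduced[~np.isnan(reduced).any(axis=1)]                # first candidate data
    mean, std = first.mean(axis=0), first.std(axis=0)
    second = first[(np.abs(first - mean) <= 2 * std).all(axis=1)]  # second candidate data
    span = second.max(axis=0) - second.min(axis=0)
    return (second - second.min(axis=0)) / np.where(span == 0, 1, span)
```

The `np.where` guard keeps a constant dimension from causing a division by zero; the two intermediate arrays correspond to the first and second candidate data above.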
In an optional embodiment, the labeling unit 113 is configured to cluster the sample data to obtain a plurality of sample groups, and label the sample data in each sample group to obtain label data.
In an optional embodiment, clustering the sample data to obtain a plurality of sample groups and labeling the sample data in each sample group to obtain label data includes:
clustering the sample data to obtain a plurality of sample groups, and calculating the mean value of the sample data in each sample group;
for each sample group, calculating the Euclidean distance between each sample data and the mean value, and taking the sample data with the minimum Euclidean distance as the representative data of that sample group, wherein each representative data corresponds to a historical task;
and querying the optimization result of the historical task corresponding to the representative data in a preset historical optimization record, and labeling all sample data in the sample group to which the representative data belongs according to that optimization result to obtain the label data, wherein the optimization result is either that optimization is required or that optimization is not required.
In an optional embodiment, clustering the sample data to obtain a plurality of sample groups includes:
a. selecting a number of target data from the sample data according to a preset grouping parameter, the number of target data being equal to the grouping parameter (the preset grouping parameter may, for example, be 2);
b. selecting one not-yet-selected sample data as the current data, calculating the cosine similarity between the current data and each target data, and assigning the current data to the same candidate group as the target data with the maximum cosine similarity;
c. repeating step b until every sample data has been assigned, obtaining a plurality of candidate groups, the number of candidate groups being equal to the preset grouping parameter;
d. calculating the mean value of all sample data in each candidate group and the difference between that mean value and the corresponding target data; if the difference is smaller than a preset termination threshold, taking the plurality of candidate groups as the plurality of sample groups; otherwise, taking the mean values as the new target data and repeating steps b to d until the plurality of sample groups are obtained.
In this optional embodiment, since the sample data in each sample group are similar to one another, the mean value of all sample data in a sample group may be calculated, and the sample data with the minimum Euclidean distance from that mean value may be selected as the representative data of the group, so that the representative data represents the characteristics of all sample data in the sample group.
In this optional embodiment, the preset historical optimization record stores the optimization results of all historical tasks, each result being either that optimization is required or that optimization is not required, and the optimization result of the historical task corresponding to the representative data may be used to label all sample data in the sample group to which the representative data belongs.
In an optional embodiment, the training unit 114 is configured to train a task classification model according to the sample data and the label data.
In an optional embodiment, training the task classification model according to the sample data and the label data includes:
a. pairing the sample data with the label data one-to-one as training data, dividing the training data into a plurality of subsets according to a preset division ratio, marking all subsets as not visited, selecting one subset as the validation set, marking it as visited, and using all sample data in the remaining subsets as the training set; for example, when the preset division ratio is 10%, the training data may be divided into 10 subsets, each containing 10% of the sample data;
b. training a preset initial classification model with all sample data in the training set to obtain a candidate classification model, wherein the preset initial classification model may be an existing classification model such as GBDT (Gradient Boosting Decision Tree), XGBoost (Extreme Gradient Boosting), or LightGBM (Light Gradient Boosting Machine), which is not limited in this application;
the initial classification model may be trained with Bayesian hyperparameter optimization;
c. calculating the classification accuracy of the candidate classification model from the sample data and label data in the validation set:
the sample data in the validation set are input into the candidate classification model to obtain classification results, and the ratio of the number of correctly classified samples to the total number of sample data in the validation set is taken as the classification accuracy of the candidate classification model;
d. selecting any subset still marked as not visited as the new validation set, using the remaining subsets as the training set, and repeating steps b to d until all subsets are marked as visited, thereby obtaining a plurality of candidate classification models and the classification accuracy of each candidate classification model;
e. selecting the candidate classification model with the maximum classification accuracy as the task classification model, which receives the sample data corresponding to a historical task and outputs the category of that historical task.
In an optional embodiment, the classification unit 115 is configured to collect data to be evaluated related to the task to be evaluated according to the target dimension, input the data to be evaluated into the task classification model to obtain a classification result, and optimize the task to be evaluated according to the classification result.
In an optional embodiment, collecting data to be evaluated related to the task to be evaluated according to the target dimensions, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result includes:
collecting a plurality of items of data related to the task to be evaluated according to the target dimensions as the original data to be evaluated;
normalizing the original data to be evaluated to obtain the data to be evaluated;
inputting the data to be evaluated into the task classification model to obtain the category of the task to be evaluated;
and calculating the cosine similarity between the data to be evaluated and each sample data, taking the category of the sample data with the maximum cosine similarity as the benchmark category; if the benchmark category is the same as the category of the task to be evaluated, pushing that category to the user as the classification result; otherwise, using the data to be evaluated as training data to retrain the task classification model.
The task to be evaluated refers to a data processing task which needs to be classified.
In this optional embodiment, the data related to the task to be evaluated may be collected according to the target dimensions as the original data to be evaluated; for example, when the target dimensions include the running duration and the CPU occupancy rate, the running duration and CPU occupancy rate of the task to be evaluated may be collected as the original data to be evaluated.
In this optional embodiment, the original data to be evaluated may be normalized according to the preset normalization algorithm to obtain the data to be evaluated, which has the same number of dimensions as the dimension-reduced data, the value of each dimension lying in the range [0,1].
In this optional embodiment, the data to be evaluated may be input into the task classification model to obtain the category of the task to be evaluated, which is either that optimization is required or that optimization is not required.
In this optional embodiment, the cosine similarity between the data to be evaluated and each sample data may be calculated, the sample data with the maximum cosine similarity is selected as the benchmark data, and the label data of the benchmark data is taken as the benchmark category; if the benchmark category is the same as the category of the task to be evaluated, the classification is correct, and the category of the task to be evaluated may be pushed to the user as the classification result.
If the benchmark category differs from the category of the task to be evaluated, the classification has failed; the benchmark category may then be used as the label data of the data to be evaluated, and the data to be evaluated may be used as training data to retrain the task classification model and improve its performance.
According to the artificial-intelligence-based task optimization method described above, a plurality of items of data related to historical tasks are collected to construct original data containing a plurality of dimensions, and the original data are subjected to dimension reduction and preprocessing to obtain sample data, which improves data quality. The sample data are then labeled to obtain the corresponding label data, a task classification model is trained with the sample data and the label data, the model is used to classify tasks to be evaluated, and the task classification model is continuously optimized according to the classification results, so that the performance of the task classification model and the accuracy of task optimization are continuously improved.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is used for storing computer readable instructions, and the processor 13 is used for executing the computer readable instructions stored in the memory to implement the artificial intelligence based task optimization method of any one of the above embodiments.
In an alternative embodiment, the electronic device 1 further comprises a bus, a computer program stored in the memory 12 and executable on the processor 13, such as an artificial intelligence based task optimization program.
Fig. 3 shows the electronic device 1 with components 12-13 only; those skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may comprise fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In conjunction with fig. 1, memory 12 in electronic device 1 stores a plurality of computer-readable instructions to implement an artificial intelligence based task optimization method, and processor 13 may execute the plurality of instructions to implement:
collecting a plurality of items of data related to historical tasks as raw data, wherein the raw data comprises a plurality of dimensions;
performing dimensionality reduction processing on the original data to obtain dimensionality reduction data and a plurality of target dimensions;
preprocessing the dimensionality reduction data to obtain sample data;
clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data;
training a task classification model according to the sample data and the label data;
and acquiring data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result.
Specifically, the specific implementation method of the instruction by the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Those skilled in the art will appreciate that the schematic diagram is merely an example of the electronic device 1 and does not limit it; the electronic device 1 may have a bus-type or star-type structure, and may include more or less hardware or software than shown, or differently arranged components; for example, the electronic device 1 may further include input/output devices, network access devices, and the like.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that can be adapted to the present application should also fall within the scope of protection of the present application and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the memory 12 may be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1. In other embodiments the memory 12 may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 1. Further, the memory 12 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the artificial-intelligence-based task optimization program, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 13 may be composed of integrated circuits, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control unit of the electronic device 1; it connects the components of the electronic device 1 through various interfaces and lines, and implements the functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 12 (for example, the artificial-intelligence-based task optimization program) and calling the data stored in the memory 12.
The processor 13 executes the operating system of the electronic device 1 and the various application programs installed on it. By executing the application programs, the processor 13 implements the steps of the various artificial intelligence based task optimization method embodiments described above, such as the steps shown in FIG. 1.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and these instruction segments describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into an acquisition unit 110, a dimension reduction unit 111, a pre-processing unit 112, an annotation unit 113, a training unit 114, and a classification unit 115.
If implemented in the form of a software functional module, the integrated unit may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute parts of the artificial intelligence based task optimization method described in the embodiments of the present application.
If the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to implement the steps of the above method embodiments.
The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), and other memory.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one arrow is shown in FIG. 3, but this does not mean there is only one bus or one type of bus. The bus is arranged to enable communication between the memory 12, the at least one processor 13, and the other components.
Although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 13 through a power management device, so that functions such as charge management, discharge management, and power-consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power-failure detection circuits, power converters or inverters, power status indicators, and other components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here again.
Further, the electronic device 1 may include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may include a display and an input unit (such as a keyboard); optionally, the user interface may also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
The embodiment of the present application further provides a computer-readable storage medium (not shown), in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the artificial intelligence based task optimization method according to any of the above embodiments.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one kind of logical functional division, and other divisions may be used in actual implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in the specification may also be implemented by a single unit or apparatus through software or hardware. Terms such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from their spirit and scope.

Claims (10)

1. A task optimization method based on artificial intelligence is characterized by comprising the following steps:
collecting a plurality of items of data related to historical tasks as raw data, wherein the raw data comprises a plurality of dimensions;
performing dimensionality reduction processing on the original data to obtain dimensionality reduction data and a plurality of target dimensions;
preprocessing the dimensionality reduction data to obtain sample data;
clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each sample group to obtain label data;
training a task classification model according to the sample data and the label data;
and acquiring data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result.
2. The artificial intelligence based task optimization method of claim 1, wherein the performing dimensionality reduction processing on the raw data to obtain dimensionality reduction data and a plurality of target dimensions comprises:
calculating the importance of each dimension in the original data according to a preset dimension reduction algorithm;
selecting a preset number of dimensions as target dimensions according to the importance;
and taking the target dimension in the original data as dimension reduction data corresponding to the original data.
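Outside the claim language, the selection step of claim 2 can be sketched as follows. The claim leaves the "preset dimension reduction algorithm" unspecified, so this sketch scores each dimension by its variance purely as an assumed stand-in for the importance measure; any per-dimension score could be substituted.

```python
import numpy as np

def reduce_dimensions(raw_data: np.ndarray, n_target: int):
    """Score each dimension (column) of the raw data and keep the
    n_target most important ones as the target dimensions.

    Importance here is the column variance -- an assumed stand-in for
    the claim's unspecified "preset dimension reduction algorithm".
    """
    importance = raw_data.var(axis=0)                      # one score per dimension
    target_dims = np.argsort(importance)[::-1][:n_target]  # indices of the top-k scores
    target_dims = np.sort(target_dims)                     # keep the original column order
    return raw_data[:, target_dims], target_dims
```

The returned index array plays the role of the "plurality of target dimensions" that claim 7 later reuses when collecting data for a task to be evaluated.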
3. The artificial intelligence based task optimization method of claim 1, wherein the pre-processing the dimension reduced data to obtain sample data comprises:
deleting the dimensionality reduction data with missing values from the dimensionality reduction data, and taking the remaining dimensionality reduction data as first candidate data;
deleting first candidate data with abnormal values from the first candidate data, and taking the remaining first candidate data as second candidate data;
and respectively normalizing the value of each dimension in the second candidate data to obtain the sample data.
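The three preprocessing steps of claim 3 map naturally onto array filtering. The claim does not define the abnormal-value test or the normalization, so this sketch assumes a |z-score| > 3 outlier rule and per-dimension min-max scaling; both are illustrative choices, not the claimed method.

```python
import numpy as np

def preprocess(reduced: np.ndarray) -> np.ndarray:
    # Step 1: drop rows containing missing values -> first candidate data.
    first = reduced[~np.isnan(reduced).any(axis=1)]
    # Step 2: drop rows containing abnormal values -> second candidate data.
    # The |z-score| > 3 rule is an assumption; the claim leaves the test open.
    std = first.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    z = np.abs((first - first.mean(axis=0)) / std)
    second = first[(z <= 3).all(axis=1)]
    # Step 3: min-max normalize each dimension to [0, 1] -> sample data.
    lo, hi = second.min(axis=0), second.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero
    return (second - lo) / span
```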
4. The artificial intelligence based task optimization method of claim 1, wherein the clustering the sample data to obtain a plurality of sample groups, and labeling the sample data in each of the sample groups to obtain labeled data comprises:
clustering the sample data to obtain a plurality of sample groups, and respectively calculating the mean value of the sample data in each sample group;
respectively calculating Euclidean distance between each sample data and the mean value of the sample data aiming at each sample group, and taking the sample data corresponding to the minimum Euclidean distance as representative data to obtain the representative data of each sample group, wherein each representative data corresponds to a historical task;
and querying the optimization result of the historical task corresponding to the representative data according to a preset historical optimization record, and labeling all sample data in the sample group to which the representative data belongs according to the optimization result to obtain the label data, wherein the optimization result is either "optimization required" or "optimization not required".
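The labeling scheme of claim 4 (pick the sample nearest the group mean as the representative, then propagate its recorded result to the whole group) can be sketched as below. The `history` dictionary keyed by the sample's byte representation is an assumed stand-in for the "preset historical optimization record"; in practice the lookup would use a task identifier.

```python
import numpy as np

def label_groups(groups, history):
    """For each sample group, find the sample closest (Euclidean distance)
    to the group mean, look up its historical optimization result, and
    label every sample in the group with that result."""
    labels = []
    for group in groups:
        mean = group.mean(axis=0)
        dists = np.linalg.norm(group - mean, axis=1)  # Euclidean distances to the mean
        rep = group[np.argmin(dists)]                 # representative sample of the group
        result = history[rep.tobytes()]               # assumed lookup into the record
        labels.append([result] * len(group))          # one label per sample in the group
    return labels
```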
5. The artificial intelligence based task optimization method of claim 4, wherein said clustering the sample data to obtain a plurality of sample groupings comprises:
a, selecting a plurality of target data from all sample data according to a preset grouping parameter, wherein the number of the target data is the same as that of the grouping parameter;
b, selecting one unselected sample data as current data, respectively calculating the cosine similarity between the current data and each target data, and classifying the current data and the target data corresponding to the maximum cosine similarity into the same alternative group;
c, repeatedly executing the step b until all sample data are classified into alternative groups to obtain a plurality of alternative groups;
d, respectively calculating the mean value of all sample data in each alternative group, and calculating the difference between the mean value and the target data belonging to the same group; if the difference is smaller than a preset termination threshold, taking the plurality of alternative groups as the plurality of sample groups; if the difference is not smaller than the preset termination threshold, taking the mean value as the new target data and repeatedly executing steps b to d to obtain the plurality of sample groups.
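Steps a-d of claim 5 amount to a k-means-style loop that assigns by cosine similarity and updates centers by the mean. The sketch below follows that shape; the random initial selection and the convergence tolerance are assumptions standing in for the "preset grouping parameter" and "preset termination threshold".

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def cluster(samples: np.ndarray, k: int, tol: float = 1e-4, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Step a: pick k samples as the initial target data.
    centers = samples[rng.choice(len(samples), k, replace=False)]
    while True:
        # Steps b-c: put each sample in the group of its most similar target.
        assign = np.array([
            int(np.argmax([cosine_sim(x, c) for c in centers])) for x in samples
        ])
        # Step d: recompute the mean of each alternative group; an empty
        # group keeps its old center.
        new_centers = np.array([
            samples[assign == j].mean(axis=0) if (assign == j).any() else centers[j]
            for j in range(k)
        ])
        if np.abs(new_centers - centers).max() < tol:   # termination threshold
            return [samples[assign == j] for j in range(k)]
        centers = new_centers                           # means become target data
```

Note the mismatch baked into the claim itself: assignment uses cosine similarity while the update uses the arithmetic mean, so the "centers" are not unit vectors; the sketch reproduces that behavior rather than normalizing.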
6. The artificial intelligence based task optimization method of claim 1, wherein said training a task classification model from said sample data and said label data comprises:
a, the sample data and the label data are in one-to-one correspondence to be used as training data, the training data are divided into a plurality of subsets according to a preset division ratio, all the subsets are marked as not-accessed, one subset is selected as a verification set, the subset is marked as accessed, and all the sample data in the rest subsets are used as a training set;
b, training a preset initial classification model by using all sample data in the training set to obtain an alternative classification model;
c, calculating the classification accuracy of the alternative classification model according to the sample data and the label data in the verification set;
d, optionally selecting one subset marked as unvisited as the verification set, taking the remaining subsets as the training set, and repeatedly executing steps b to d until all subsets are marked as visited, thereby obtaining a plurality of alternative classification models and the classification accuracy corresponding to each alternative classification model;
and e, selecting an alternative classification model corresponding to the maximum value in the classification accuracy as a task classification model, wherein the task classification model has the functions of receiving sample data corresponding to the historical task and outputting the category corresponding to the historical task.
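Steps a-e of claim 6 describe cross-validated model selection: each subset serves once as the verification set, and the candidate with the highest accuracy wins. The sketch below assumes equal-sized folds for the "preset division ratio" and a nearest-centroid classifier for the unspecified "preset initial classification model"; both are illustrative substitutions.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Assumed stand-in for the claim's "preset initial classification model".
    classes = sorted(set(y))
    return {c: X[np.array(y) == c].mean(axis=0) for c in classes}

def nearest_centroid_predict(model, X):
    classes = list(model)
    cents = np.array([model[c] for c in classes])
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return [classes[i] for i in d.argmin(axis=1)]

def select_model(X, y, n_folds=3):
    """Each fold serves once as the verification set (marking it "visited");
    keep the alternative model with the highest classification accuracy."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    best_model, best_acc = None, -1.0
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = nearest_centroid_fit(X[train_idx], [y[t] for t in train_idx])
        preds = nearest_centroid_predict(model, X[val_idx])
        acc = np.mean([p == y[v] for p, v in zip(preds, val_idx)])
        if acc > best_acc:                     # step e: keep the best candidate
            best_model, best_acc = model, acc
    return best_model, best_acc
```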
7. The artificial intelligence based task optimization method of claim 1, wherein the collecting data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result comprises:
acquiring multiple items of data related to the task to be evaluated as original data to be evaluated according to the target dimension;
normalizing the original data to be evaluated to obtain the data to be evaluated;
inputting the data to be evaluated into the task classification model to obtain the category of the task to be evaluated;
and respectively calculating the cosine similarity between the data to be evaluated and each sample data, and taking the class of the sample data corresponding to the maximum cosine similarity as a reference class; if the reference class is the same as the class corresponding to the task to be evaluated, pushing the class corresponding to the task to be evaluated to a user as the classification result; otherwise, taking the data to be evaluated as training data to retrain the task classification model.
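The inference-time sanity check of claim 7 (compare the model's prediction with the class of the most cosine-similar training sample, and only push the result when they agree) can be sketched as below. `classify` stands for the trained task classification model; its interface here is an assumption.

```python
import numpy as np

def evaluate_task(to_eval, samples, sample_classes, classify):
    """to_eval: normalized data of the task to be evaluated.
    Returns ("push", cls) when the model agrees with the reference class of
    the most cosine-similar sample, else ("retrain", cls) to queue the data
    as new training data."""
    sims = samples @ to_eval / (
        np.linalg.norm(samples, axis=1) * np.linalg.norm(to_eval) + 1e-12
    )
    reference = sample_classes[int(np.argmax(sims))]  # reference class
    predicted = classify(to_eval)                     # model's class
    action = "push" if predicted == reference else "retrain"
    return action, predicted
```

The disagreement branch gives the system a cheap drift signal: inputs the model classifies differently from their nearest neighbor are exactly the ones worth feeding back into training.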
8. An artificial intelligence based task optimization apparatus, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a plurality of items of data related to historical tasks as original data, and the original data comprises a plurality of dimensions;
the dimensionality reduction unit is used for performing dimensionality reduction on the original data to obtain dimensionality reduction data and a plurality of target dimensionalities;
the preprocessing unit is used for preprocessing the dimensionality reduction data to obtain sample data;
the marking unit is used for clustering the sample data to obtain a plurality of sample groups and marking the sample data in each sample group to obtain label data;
the training unit is used for training a task classification model according to the sample data and the label data;
and the classification unit is used for acquiring data to be evaluated related to the task to be evaluated according to the target dimension, inputting the data to be evaluated into the task classification model to obtain a classification result, and optimizing the task to be evaluated according to the classification result.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based task optimization method of any one of claims 1 to 7.
10. A computer-readable storage medium characterized by: the computer-readable storage medium has stored therein computer-readable instructions that are executed by a processor in an electronic device to implement the artificial intelligence based task optimization method of any one of claims 1 to 7.
CN202210871767.6A 2022-07-22 2022-07-22 Task optimization method based on artificial intelligence and related equipment Pending CN115146865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210871767.6A CN115146865A (en) 2022-07-22 2022-07-22 Task optimization method based on artificial intelligence and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210871767.6A CN115146865A (en) 2022-07-22 2022-07-22 Task optimization method based on artificial intelligence and related equipment

Publications (1)

Publication Number Publication Date
CN115146865A true CN115146865A (en) 2022-10-04

Family

ID=83414124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210871767.6A Pending CN115146865A (en) 2022-07-22 2022-07-22 Task optimization method based on artificial intelligence and related equipment

Country Status (1)

Country Link
CN (1) CN115146865A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116010602A (en) * 2023-01-10 2023-04-25 孔祥山 Data optimization method and system based on big data
CN116010602B (en) * 2023-01-10 2023-09-29 湖北华中电力科技开发有限责任公司 Data optimization method and system based on big data
CN116402113A (en) * 2023-06-08 2023-07-07 之江实验室 Task execution method and device, storage medium and electronic equipment
CN116402113B (en) * 2023-06-08 2023-10-03 之江实验室 Task execution method and device, storage medium and electronic equipment
CN117315445A (en) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 Target identification method, device, electronic equipment and readable storage medium
CN117315445B (en) * 2023-11-28 2024-03-22 苏州元脑智能科技有限公司 Target identification method, device, electronic equipment and readable storage medium
CN117666971A (en) * 2024-01-31 2024-03-08 之江实验室 Industrial data storage method, device and equipment
CN117666971B (en) * 2024-01-31 2024-04-30 之江实验室 Industrial data storage method, device and equipment
CN117707747A (en) * 2024-02-06 2024-03-15 山东省计算中心(国家超级计算济南中心) Resource utilization rate prediction-based job excessive allocation scheduling method and system
CN117707747B (en) * 2024-02-06 2024-05-24 山东省计算中心(国家超级计算济南中心) Resource utilization rate prediction-based job excessive allocation scheduling method and system
CN117806833A (en) * 2024-02-28 2024-04-02 苏州元脑智能科技有限公司 Data processing system, method and medium
CN117806833B (en) * 2024-02-28 2024-04-30 苏州元脑智能科技有限公司 Data processing system, method and medium

Similar Documents

Publication Publication Date Title
CN115146865A (en) Task optimization method based on artificial intelligence and related equipment
CN112801718B (en) User behavior prediction method, device, equipment and medium
CN112883190A (en) Text classification method and device, electronic equipment and storage medium
CN112541745A (en) User behavior data analysis method and device, electronic equipment and readable storage medium
CN112288337B (en) Behavior recommendation method, behavior recommendation device, behavior recommendation equipment and behavior recommendation medium
CN112883730B (en) Similar text matching method and device, electronic equipment and storage medium
CN113157927A (en) Text classification method and device, electronic equipment and readable storage medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN112560465A (en) Method and device for monitoring batch abnormal events, electronic equipment and storage medium
WO2023040145A1 (en) Artificial intelligence-based text classification method and apparatus, electronic device, and medium
CN113658002B (en) Transaction result generation method and device based on decision tree, electronic equipment and medium
CN114756669A (en) Intelligent analysis method and device for problem intention, electronic equipment and storage medium
CN113344125A (en) Long text matching identification method and device, electronic equipment and storage medium
CN112801222A (en) Multi-classification method and device based on two-classification model, electronic equipment and medium
CN113313211A (en) Text classification method and device, electronic equipment and storage medium
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN111429085A (en) Contract data generation method and device, electronic equipment and storage medium
CN115146064A (en) Intention recognition model optimization method, device, equipment and storage medium
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN113312482A (en) Question classification method and device, electronic equipment and readable storage medium
CN114841165A (en) User data analysis and display method and device, electronic equipment and storage medium
CN114708073A (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium
CN114722146A (en) Supply chain asset checking method, device, equipment and medium based on artificial intelligence
CN114610854A (en) Intelligent question and answer method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination