WO2023123851A1 - Data visualization method and apparatus, electronic device, storage medium, and program - Google Patents
Data visualization method and apparatus, electronic device, storage medium, and program
- Publication number
- WO2023123851A1 (PCT/CN2022/095486)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- type
- visualization
- sub
- visualized
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/904—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the embodiment of the present application relates to the technical field of artificial intelligence, involving but not limited to a data visualization method, device, electronic equipment, computer storage medium and computer program.
- In automatic machine learning (Auto Machine Learning), for a model training task configured by the user, a meta-knowledge transfer module guides the extraction of matching meta-knowledge from a meta-knowledge repository; the hyperparameters of the machine learning model are then searched automatically in the hyperparameter search space, so that the training of the machine learning model is completed and the hyperparameters are optimized automatically.
- Embodiments of the present application provide a data visualization method, device, electronic device, computer storage medium, and computer program, which can present the influence of artificial intelligence hyperparameters on the training results of machine learning model training tasks from multiple dimensions.
- A data visualization method provided by an embodiment of the present application is applied to an electronic device, and the method includes: obtaining, according to multiple parameter sets used in a training task of a machine learning model, a training result of the training task under each parameter set, where each parameter set includes the value of at least one type of parameter; obtaining the correspondence between the data type of each type of data in the data to be visualized and a visualization sub-area, where the data to be visualized includes the multiple parameter sets and the training result corresponding to each parameter set; and visualizing each parameter set and its corresponding training result in a preset visualization area according to the correspondence, where the preset visualization area includes the visualization sub-area corresponding to each type of data.
- Visualizing each parameter set and its corresponding training result in the preset visualization area according to the correspondence includes: establishing the coordinate system of the visualization sub-area corresponding to each type of data according to the correspondence; marking the coordinate points of each type of data in the data to be visualized in the preset visualization area according to that coordinate system; and generating a characteristic curve corresponding to each parameter set according to the coordinate points of each type of data in the preset visualization area.
- the embodiment of the present application can generate the characteristic curve corresponding to each parameter set in the coordinate system of the corresponding visualization sub-region for each type of data in the visualization data, and the visualization data includes the training tasks of the machine learning model.
- The method further includes: traversing the training results corresponding to each parameter set and determining a target parameter set among the multiple parameter sets; and displaying the characteristic curve corresponding to the target parameter set in a preset style. The target parameter set includes at least one of the following: the parameter set corresponding to the maximum value of the reward function of the training task, and the parameter set corresponding to the minimum value of the reward function of the training task.
- the embodiment of the present application can use a preset style to display the characteristic curve corresponding to the target parameter set, which is convenient for analyzing the content displayed in the preset style, that is, it is convenient to analyze the characteristic curve corresponding to the target parameter set.
- Establishing the coordinate system of the visualization sub-area corresponding to each type of data according to the correspondence includes: determining the numerical range of each type of data in the data to be visualized; and determining, according to the numerical range of each type of data and the correspondence, the coordinate range of the coordinate axis of the visualization sub-area corresponding to each type of data. Determining the coordinate range of each axis in this way helps present the corresponding type of data in its visualization sub-area more accurately and comprehensively.
- The coordinate systems of the visualization sub-areas corresponding to the types of data together form a parallel coordinate system, and each vertical axis in the parallel coordinate system corresponds to one type of data in the data to be visualized.
- Based on each vertical axis of the parallel coordinate system, the embodiment of the present application can present one type of the data to be visualized in its visualization sub-area; that is, different types of data to be visualized are displayed in different visualization sub-areas, which facilitates classifying and analyzing them.
- A data visualization device includes: a processing part configured to obtain, according to multiple parameter sets used in a training task of a machine learning model, a training result of the training task under each parameter set, where each parameter set includes the value of at least one type of parameter; an acquisition part configured to obtain the correspondence between the data type of each type of data in the data to be visualized and a visualization sub-area, where the data to be visualized includes the multiple parameter sets and the training result corresponding to each parameter set; and a loading part configured to visualize each parameter set and its corresponding training result in a preset visualization area according to the correspondence, where the preset visualization area includes a visualization sub-area corresponding to each type of data.
- The loading part being configured to visualize each parameter set and its corresponding training result in the preset visualization area according to the correspondence includes: establishing the coordinate system of the visualization sub-area corresponding to each type of data according to the correspondence; marking the coordinate points of each type of data in the data to be visualized in the preset visualization area according to that coordinate system; and generating a characteristic curve corresponding to each parameter set according to the coordinate points of each type of data in the preset visualization area.
- the embodiment of the present application can generate the characteristic curve corresponding to each parameter set in the coordinate system of the corresponding visualization sub-region for each type of data in the visualization data, and the visualization data includes the training tasks of the machine learning model.
- The loading part is further configured to: traverse the training results corresponding to each parameter set and determine a target parameter set among the multiple parameter sets; and display the characteristic curve corresponding to the target parameter set in a preset style. The target parameter set includes at least one of the following: the parameter set corresponding to the maximum value of the reward function of the training task, and the parameter set corresponding to the minimum value of the reward function of the training task.
- a preset style can be used to display the characteristic curve corresponding to the target parameter set, which is convenient for analyzing the content displayed in the preset style, that is, it is convenient for analyzing the characteristic curve corresponding to the target parameter set.
- The loading part being configured to establish the coordinate system of the visualization sub-area corresponding to each type of data according to the correspondence includes: determining the numerical range of each type of data in the data to be visualized; and determining, according to the numerical range of each type of data and the correspondence, the coordinate range of the coordinate axis of the visualization sub-area corresponding to each type of data. Determining the coordinate range of each axis in this way helps present the corresponding type of data in its visualization sub-area more accurately and comprehensively.
- The coordinate systems of the visualization sub-areas corresponding to the types of data together form a parallel coordinate system, and each vertical axis in the parallel coordinate system corresponds to one type of data in the data to be visualized.
- Based on each vertical axis of the parallel coordinate system, the embodiment of the present application can present one type of the data to be visualized in its visualization sub-area; that is, different types of data to be visualized are displayed in different visualization sub-areas, which facilitates classifying and analyzing them.
- An embodiment of the present application provides an electronic device. The electronic device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor; when executing the program, the processor implements the data visualization method provided by one or more of the foregoing technical solutions.
- An embodiment of the present application provides a computer storage medium, and the computer storage medium stores a computer program; after the computer program is executed, the data visualization method provided by one or more of the foregoing technical solutions can be implemented.
- the embodiment of the present application also provides a computer program, including computer readable code, when the computer readable code is run in the electronic device, the processor in the electronic device executes any one of the above data visualization method.
- In the embodiments of the present application, the training result of the training task under each parameter set is obtained according to the multiple parameter sets used in the training task of the machine learning model, where each parameter set includes the value of at least one type of parameter; the correspondence between the data type of each type of data in the data to be visualized and the visualization sub-area is obtained, where the data to be visualized includes the multiple parameter sets and the training result corresponding to each parameter set; and each parameter set and its corresponding training result are visualized in the preset visualization area according to the correspondence. Since the preset visualization area includes a visualization sub-area corresponding to each type of data, each visualization sub-area corresponds to one data dimension of the data to be visualized. Therefore, the influence of the hyperparameters of artificial intelligence on the training results of the training task of the machine learning model can be presented from multiple dimensions.
- FIG. 1 is a schematic flow diagram of a data visualization method provided in an embodiment of the present application
- FIG. 2 is a schematic diagram of a scene of a data visualization method provided by an embodiment of the present application
- FIG. 3 is a schematic flow diagram 1 for visualizing each parameter set and the training result corresponding to each parameter set provided by the embodiment of the present application;
- FIG. 4 is a second schematic flow diagram for visualizing each parameter set and the training result corresponding to each parameter set provided by the embodiment of the present application;
- FIG. 5 is a second schematic diagram of a data visualization method provided by the embodiment of the present application.
- FIG. 6 is a schematic flowchart of establishing a coordinate system of a visualized sub-region corresponding to each type of data provided by the embodiment of the present application;
- FIG. 7 is a schematic diagram of a data visualization device provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- Hyperparameter search based on automatic optimization algorithms is comparable to hyperparameter configurations set by human experts based on experience. The field of automatic machine learning has therefore risen rapidly, covering areas including automatic data cleaning, automatic feature engineering, hyperparameter optimization, meta-learning, neural architecture search, and early stopping algorithms.
- The hyperparameters include $\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_i, \ldots, \lambda_n\}$, and the space of each hyperparameter is a specific discrete space or continuous space; the combination of all hyperparameter spaces constitutes the total search space $A$.
- A set of hyperparameter configurations sampled from the search space $A$ is denoted $A_\lambda$; configurations are continuously sampled from the search space, and the sampling process is continuously optimized according to the accuracy fed back for each configuration.
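The sampling of configurations from the total search space can be sketched as follows. This is a minimal illustration only: the dictionary layout, the function name `sample_configuration`, and the use of plain random sampling are assumptions, and the feedback-driven optimization of the sampling process described above is deliberately left out.

```python
import random

# Hypothetical search space built from the example value ranges in this document.
# A tuple (low, high) stands for a continuous space, a list for a discrete space.
SEARCH_SPACE = {
    "gamma": (0.6, 0.8),
    "batchSize": [512, 1024],
    "epochs": [3, 6],
    "stepSize": [1, 2],
}


def sample_configuration(space, rng):
    """Draw one hyperparameter configuration from the total search space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            config[name] = rng.uniform(low, high)  # continuous space
        else:
            config[name] = rng.choice(domain)      # discrete space
    return config


rng = random.Random(0)  # fixed seed so the sketch is reproducible
configs = [sample_configuration(SEARCH_SPACE, rng) for _ in range(3)]
```

In a real search, each sampled configuration would be trained and its reward fed back to bias later samples; that loop is not shown here.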
- FIG. 1 shows a schematic flowchart of a data visualization method provided by an embodiment of the present application.
- the data visualization method provided by the embodiment of the present application may include the following steps:
- Step A101 Obtain the training results of the training task under each parameter set according to multiple parameter sets used in the training task of the machine learning model; each parameter set includes the value of at least one type of parameter.
- The type of machine learning model includes, without limitation, a convolutional neural network, a deep neural network, a recurrent neural network, or any combination thereof.
- the machine learning model may be a model formed by combining a convolutional neural network and a deep neural network.
- the parameter types of the machine learning model may include common parameters and hyperparameters.
- common parameters include but are not limited to activation functions, optimization algorithms, regularization parameters, and the number of neural network layer nodes.
- the hyperparameters are parameters in a preset generalized regression neural network model.
- the parameter type of hyperparameters includes any of the following: learning rate (gamma), number of samples used in one iteration (batchSize), number of iterations (epochs), and step size (stepSize).
- One epoch of samples is the full set of training samples.
- One epoch of samples can be divided into multiple batches for training, and the batchSize of each batch can be, for example, 512 or 1024.
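As a small worked example of the relationship between one epoch and its batches, the number of batches per epoch is the sample count divided by batchSize, rounded up. The function name and the sample count of 10240 are assumptions chosen for illustration.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover all training samples once."""
    return math.ceil(num_samples / batch_size)


# Hypothetical training set of 10240 samples.
small_batches = batches_per_epoch(10240, 512)
large_batches = batches_per_epoch(10240, 1024)
```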
- the training result of the training task under each parameter set may be the return value of the reward function of the subtask corresponding to each parameter set.
- the test indicators reflected by the reward function of the training task include any of the following: accuracy rate, precision rate, recall rate, F1 value, and classification probability value.
- The classification probability value can be an AUC (Area Under Curve) value. The AUC value is defined as the area enclosed by the ROC curve and the coordinate axes, and this area does not exceed 1.
- The value range of AUC is [0.5, 1]. The closer the AUC value is to 1.0, the higher the authenticity of the classification probability; an AUC value of 0.5 corresponds to the lowest authenticity, which is equivalent to random guessing.
- the F1 value is an indicator used in statistics to measure the accuracy of the binary classification model.
- the F1 value takes into account the precision and recall of the classification model at the same time, and can be regarded as a harmonic average of the precision and recall of the model.
- the value range of the F1 value is [0, 1].
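The harmonic average mentioned above can be written out as F1 = 2 * precision * recall / (precision + recall). The sketch below assumes precision and recall are already available as numbers in [0, 1]; the function name and the convention of returning 0 when both inputs are 0 are illustrative choices.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)   # equal precision and recall give the same F1
skewed = f1_score(1.0, 0.5)     # the harmonic mean is pulled toward the lower value
```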
- Step A102 Obtain the corresponding relationship between the data type of each type of data in the data to be visualized and the visualization sub-region; the data to be visualized includes any of the following items: multiple parameter sets and training results corresponding to each parameter set.
- The multiple parameter sets of the machine learning model are {0.8, 512, 3, 1}, {0.8, 512, 6, 2}, {0.6, 512, 3, 1}, {0.8, 512, 3, 1}, {0.6, 1024, 3, 1}, {0.8, 512, 6, 2}.
- the subtasks of the training task include: #871, #872, #873, #874, #875, #876.
- Subtask #871 adopts parameter set {0.8, 512, 3, 1}.
- Subtask #872 adopts parameter set {0.8, 512, 6, 2}.
- Subtask #873 adopts parameter set {0.6, 1024, 3, 1}.
- Subtask #874 adopts parameter set {0.8, 512, 3, 1}.
- Subtask #875 adopts parameter set {0.6, 512, 3, 1}.
- Subtask #876 adopts parameter set {0.8, 512, 6, 2}.
- the preset visualization area 201 includes a first visualization subarea 202, a second visualization subarea 203, a third visualization subarea 204, and a fourth visualization subarea 205.
- Each visualization sub-area may include an ordinate axis in the parallel coordinate system of the preset visualization area.
- Each axis index corresponds to a type of visualization data.
- a parallel coordinate graph can be formed in a preset visualization area.
- the index of the coordinate axis includes any of the following: gamma, batchSize, epochs, stepSize, reward, where reward represents the value of the reward function.
- the data type can include any of the following: gamma, batchSize, epochs, stepSize, value of the reward function.
- Correspondence between data types and visualization sub-areas:
  Data type: gamma, batchSize, epochs, stepSize, value of the reward function
  Visualization sub-area: sub-area 1, sub-area 2, sub-area 3, sub-area 4, sub-area 5
- Data type "gamma" corresponds to sub-area 1.
- Data type "batchSize" corresponds to sub-area 2.
- Data type "epochs" corresponds to sub-area 3.
- Data type "stepSize" corresponds to sub-area 4.
- Data type "value of the reward function" corresponds to sub-area 5.
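One way to hold this correspondence in code is a plain lookup table from data type to sub-area number. The sketch below is illustrative: the key `reward` stands for the value of the reward function, and the names `DATA_TYPE_TO_SUBAREA` and `split_record` are assumptions. The record uses the parameter set and reward value given in this document for subtask #871.

```python
# Correspondence between data type and visualization sub-area (sub-areas 1 to 5).
DATA_TYPE_TO_SUBAREA = {
    "gamma": 1,
    "batchSize": 2,
    "epochs": 3,
    "stepSize": 4,
    "reward": 5,  # value of the reward function
}


def split_record(record):
    """Route each value of one subtask record to its visualization sub-area."""
    return {DATA_TYPE_TO_SUBAREA[data_type]: value for data_type, value in record.items()}


# Subtask #871: parameter set {0.8, 512, 3, 1} with reward value 0.615.
record_871 = {"gamma": 0.8, "batchSize": 512, "epochs": 3, "stepSize": 1, "reward": 0.615}
by_subarea = split_record(record_871)
```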
- Step A103 Visualize each parameter set and the training result corresponding to each parameter set in a preset visualization area according to the corresponding relationship; the preset visualization area includes a visualization sub-area corresponding to each type of data.
- the coordinate axes in each visualization sub-area form a parallel coordinate system in the preset visualization area. Therefore, the influence of hyperparameters of artificial intelligence on the training results of the training task of the machine learning model can be presented from multiple dimensions.
- the subtasks of the training task include: #871, #872, #873, #874, #875, #876.
- the values of multiple dimensions in the training result data of each subtask can be displayed.
- the preset visualization area contains information in 5 dimensions, and the 5 dimensions correspond to the values of gamma, batchSize, epochs, stepSize, and reward function respectively.
- Each broken line represents the parameter set adopted by a subtask and the value of the reward function corresponding to the parameter set.
- the value dimension of the reward function may include any of the following: accuracy rate, precision rate, recall rate, F1 value, and AUC value.
- A parallel coordinate system composed of the coordinate axes of the visualization sub-areas is created in the preset visualization area, and each parameter set and its corresponding training result are visualized in the preset visualization area according to the correspondence.
- Echarts is an open source visual chart library based on JavaScript.
- Its underlying layer relies on the lightweight Canvas library ZRender. It offers features such as drag-and-drop recalculation, data view, and value range roaming, and provides interactive and personalized data visualization charts.
- A two-dimensional coordinate system is often used to compare attributes of the same dimension across multiple sets of training result data. However, training result data has more than one dimension, so multiple line charts are required to visualize the training result data in the different dimensions; a two-dimensional coordinate system cannot show the comparison of multiple sets of data across these dimensions in the same graph.
- The data to be visualized includes the multiple parameter sets and the training result corresponding to each parameter set, and each parameter set and its corresponding training result are visualized in the preset visualization area according to the correspondence. Since the preset visualization area includes a visualization sub-area corresponding to each type of data, each visualization sub-area corresponds to one data dimension of the data to be visualized. Therefore, the influence of the hyperparameters of artificial intelligence on the training results of the training task of the machine learning model can be presented from multiple dimensions.
- Steps A101 to A103 above can be implemented by a processor, and the processor can be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor.
- each parameter set and the training result corresponding to each parameter set are visualized in the preset visualization area according to the corresponding relationship, as shown in FIG. 3 , which may include the following steps:
- Step A301 Establish the coordinate system of the visualized sub-region corresponding to each type of data according to the corresponding relationship.
- the Echarts open source visualization chart library is called, and the parallel coordinate system of the visualization sub-area corresponding to each type of data is established according to the corresponding relationship.
- the parallel coordinate system includes a horizontal axis (X axis) and a vertical axis (Y axis).
- The minimum value of the Y-axis scale is the minimum value of the data to be visualized corresponding to the visualization sub-area, and the maximum value of the Y-axis scale is the maximum value of the data to be visualized corresponding to the visualization sub-area.
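A configuration of this kind could be assembled as an Echarts option object, sketched here as a Python dictionary serialized to JSON. The option fields `parallelAxis` (with `dim`, `name`, `min`, `max`) and a series of type `parallel` are part of the Echarts option format; the helper name `build_parallel_option` and the three data rows are assumptions for illustration, with row values picked from the example ranges in this document.

```python
import json

AXES = ["gamma", "batchSize", "epochs", "stepSize", "reward"]

# Illustrative rows: one list of values per parameter set, ordered like AXES.
ROWS = [
    [0.8, 512, 3, 1, 0.615],
    [0.6, 1024, 3, 1, 0.988],
    [0.8, 512, 3, 1, 0.009],
]


def build_parallel_option(axes, rows):
    """Build an Echarts-style option with one vertical axis per data type."""
    parallel_axis = []
    for dim, name in enumerate(axes):
        column = [row[dim] for row in rows]
        # Axis scale runs from the minimum to the maximum of the data on that axis.
        parallel_axis.append({"dim": dim, "name": name, "min": min(column), "max": max(column)})
    return {"parallelAxis": parallel_axis, "series": [{"type": "parallel", "data": rows}]}


option = build_parallel_option(AXES, ROWS)
option_json = json.dumps(option)  # could be handed to a front end that calls setOption
```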
- Step A302 Mark the coordinate points of each type of data in the data to be visualized in the preset visualization area according to the coordinate system of the visualization sub-area corresponding to each type of data.
- The value on the Y axis represents the value of the data to be visualized. For example, referring to Table 5, when the type of the data to be visualized is gamma, the value on the Y axis indicates the value of the hyperparameter gamma of subtask #871 of the training task.
- Taking subtask #871 of the training task as an example, the coordinate points of each type of data to be visualized in subtask #871 are marked in the preset visualization area according to the coordinate system of the visualization sub-area corresponding to each type of data.
- The reward function value 0.615 represents the training performance of subtask #871 of the training task of the machine learning model under the hyperparameter values in the parameter set {0.8, 512, 3, 1}.
- The coordinate points of each type of data to be visualized in subtask #871 include: (0, 0.8), (0, 512), (0, 3), (0, 1), (0, 0.615).
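The marking step for subtask #871 can be sketched as below. It assumes that each point is expressed in the local coordinate system of its own sub-area, with x = 0 on that sub-area's vertical axis, which is how the coordinate points above read; the names `SUBAREA_ORDER` and `mark_points` are hypothetical.

```python
# Data types in sub-area order (sub-area 1 to sub-area 5).
SUBAREA_ORDER = ["gamma", "batchSize", "epochs", "stepSize", "reward"]


def mark_points(record):
    """Return one (x, y) point per sub-area, with x = 0 on the sub-area's own axis."""
    return [(0, record[data_type]) for data_type in SUBAREA_ORDER]


record_871 = {"gamma": 0.8, "batchSize": 512, "epochs": 3, "stepSize": 1, "reward": 0.615}
points_871 = mark_points(record_871)
```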
- Step A303 Generate a characteristic curve corresponding to each parameter set according to the coordinate points of each type of data in the preset visualization area.
- For every two adjacent visualization sub-areas, a connecting line between the two coordinate points of the same parameter set in those two sub-areas is generated, so as to obtain the characteristic curve corresponding to each parameter set.
- the two visualized sub-areas with adjacent relationship include: sub-area 1 and sub-area 2, sub-area 2 and sub-area 3, sub-area 3 and sub-area 4, sub-area 4 and sub-area 5.
- the parameter set of the subtask is represented by a broken line in the parallel coordinates plot, the vertical direction corresponds to the value of the data to be visualized, and the horizontal direction corresponds to the data type of the data to be visualized.
- variable values correspond to the coordinate points of each type of data to be visualized in the preset visualization area.
- the embodiment of the present application can generate the characteristic curve corresponding to each parameter set in the coordinate system of the corresponding visualization sub-region for each type of data in the visualization data, and the visualization data includes the training tasks of the machine learning model.
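Generating a characteristic curve can be sketched as connecting the points of one parameter set across adjacent sub-areas. In this sketch the horizontal position is the sub-area index and the vertical position is the value normalized to its axis range; the normalization, the function names, and the `RANGES` table (filled with the example value ranges from this document) are assumptions made so that all axes share one drawing area.

```python
AXES = ["gamma", "batchSize", "epochs", "stepSize", "reward"]

# Example value ranges per axis, as (minimum, maximum).
RANGES = {
    "gamma": (0.6, 0.8),
    "batchSize": (512, 1024),
    "epochs": (3, 6),
    "stepSize": (1, 2),
    "reward": (0.009, 0.988),
}


def normalize(value, low, high):
    """Map a value to [0, 1] along its axis; a degenerate axis maps to the middle."""
    return 0.5 if high == low else (value - low) / (high - low)


def characteristic_curve(record, axes, ranges):
    """Return the polyline points and the segments joining adjacent sub-areas."""
    points = [(i, normalize(record[name], *ranges[name])) for i, name in enumerate(axes)]
    segments = list(zip(points, points[1:]))  # one segment per pair of adjacent axes
    return points, segments


record_871 = {"gamma": 0.8, "batchSize": 512, "epochs": 3, "stepSize": 1, "reward": 0.615}
points, segments = characteristic_curve(record_871, AXES, RANGES)
```

With five axes, each parameter set yields four segments, matching the four adjacent sub-area pairs listed above.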
- the above data visualization method may further include the following steps:
- Step A401 traverse the training results corresponding to each parameter set, and determine a target parameter set among multiple parameter sets.
- The target parameter set includes at least one of the following: the parameter set corresponding to the maximum value of the reward function of the training task, and the parameter set corresponding to the minimum value of the reward function of the training task.
- Each set of training task data is traversed to identify at least one of the maximum value and the minimum value of the reward function among the multiple sets of training task data, and the parameter set corresponding to at least one of the maximum value and the minimum value of the reward function is determined as the target parameter set.
- the training task data may include parameter sets adopted by subtasks of the training task and training results corresponding to each parameter set.
- the training result corresponding to the parameter set can be the return value of the reward function of the machine learning model.
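The traversal in step A401 can be sketched as a search for the subtasks with the largest and smallest reward values. The reward value for #871 is the one given in this document; the values for #873 and #874 are assumed to be the endpoints of the stated reward range, and the values for #872, #875, and #876 are invented purely for illustration.

```python
# Reward value per subtask. Only 0.615 for #871 is stated in the text;
# the other values are assumptions for this sketch.
REWARDS = {
    "#871": 0.615,
    "#872": 0.420,  # hypothetical
    "#873": 0.988,  # assumed: upper end of the stated reward range
    "#874": 0.009,  # assumed: lower end of the stated reward range
    "#875": 0.530,  # hypothetical
    "#876": 0.700,  # hypothetical
}


def find_target_sets(rewards):
    """Return the subtasks whose parameter sets give the max and min reward."""
    best = max(rewards, key=rewards.get)
    worst = min(rewards, key=rewards.get)
    return {"max": best, "min": worst}


targets = find_target_sets(REWARDS)
```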
- Step A402 Display the characteristic curve corresponding to the target parameter set in a preset style.
- the characteristic curve corresponding to the target parameter set is displayed in a preset style. For example, if the preset style is highlight style, the width of the characteristic curve becomes larger and the transparency becomes lower.
- the identifier of the subtask corresponding to the minimum value of the reward function is #874, and the identifier of the subtask corresponding to the maximum value of the reward function is #873.
- A preset-style switch is provided in the display interface, and the preset-style switch includes at least one of a show max switch 501 and a show min switch 502.
- The show max switch 501 is used to highlight the characteristic curve corresponding to the maximum value of the reward function, and the show min switch 502 is used to highlight the characteristic curve corresponding to the minimum value of the reward function.
- Toggling the preset-style switch in the display interface highlights or unhighlights the characteristic curve corresponding to the target parameter set.
- When the show max switch 501 is in an open state, the first characteristic curve 503 can be displayed by highlighting; when the show min switch 502 is in an open state, the second characteristic curve 504 can be displayed by highlighting.
- The highlighted display of the characteristic curves can be achieved by using bold lines.
- The first characteristic curve 503 represents the characteristic curve corresponding to the maximum value of the reward function, and the second characteristic curve 504 represents the characteristic curve corresponding to the minimum value of the reward function.
- the parallel coordinate system is provided in the TensorBoard product, and the characteristic curve corresponding to the target parameter set is highlighted, or at least one of the following is highlighted: the characteristic curve corresponding to the maximum value of the reward function, and the characteristic curve corresponding to the minimum value of the reward function.
- TensorBoard is the visualization tool that ships with TensorFlow; it visualizes the information in the log files output by a TensorFlow program, making it easier and more efficient to understand, debug, and optimize the program.
- displaying the characteristic curve corresponding to the target parameter set in a preset style makes that curve easier to analyze.
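The switch behavior described above can be illustrated with a short sketch. The function names, the default width and opacity values, and the tripled width for highlighted curves are assumptions chosen for the example; the text only states that a highlighted curve is wider and less transparent. The identifiers 873 and 874 are the subtask identifiers from the example above.

```python
def highlighted_ids(max_id: int, min_id: int, show_max: bool, show_min: bool) -> set[int]:
    """Return the subtask identifiers whose curves should be highlighted,
    given the state of the show max and show min switches."""
    ids = set()
    if show_max:
        ids.add(max_id)
    if show_min:
        ids.add(min_id)
    return ids


def curve_style(subtask_id: int, ids: set[int],
                base_width: float = 1.0, base_alpha: float = 0.3) -> dict[str, float]:
    """Return a line style; alpha is opacity, so a higher alpha means lower transparency."""
    if subtask_id in ids:
        return {"width": base_width * 3, "alpha": 1.0}  # wider and fully opaque
    return {"width": base_width, "alpha": base_alpha}


# Show max switch on, show min switch off.
ids = highlighted_ids(max_id=873, min_id=874, show_max=True, show_min=False)
style_max = curve_style(873, ids)
style_min = curve_style(874, ids)
```

Turning a switch off simply removes the corresponding identifier from the set, which returns that curve to the default style.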
- in step A303, the coordinate system of the visualization sub-region corresponding to each type of data is established according to the correspondence. Referring to FIG. 6, this may include the following steps:
- Step A601: Determine the numerical range of each type of data in the data to be visualized.
- the value range of the hyperparameter gamma is [0.6, 0.8]
- the value range of the hyperparameter batchSize is [512, 1024]
- the value range of the hyperparameter epochs is [3, 6]
- the value range of the hyperparameter stepSize is [1, 2].
- the value range of the reward function of the training task, accuracy, is [0.009, 0.988].
- the search space of each hyperparameter may be a discrete space or a continuous space, and the search spaces of all hyperparameters combine to form the total search space, which is not limited in the present application.
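Step A601 can be sketched as a single pass over the data to be visualized. This is a minimal illustration: the function name `value_ranges` and the three sample rows are hypothetical, chosen so that the resulting ranges match the ones listed above.

```python
def value_ranges(rows: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Return the (minimum, maximum) observed for each type of data."""
    ranges: dict[str, tuple[float, float]] = {}
    for row in rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


# Hypothetical data to be visualized: one row per subtask (parameter set plus result).
rows = [
    {"gamma": 0.6, "batchSize": 512, "epochs": 3, "stepSize": 1, "accuracy": 0.009},
    {"gamma": 0.8, "batchSize": 1024, "epochs": 6, "stepSize": 2, "accuracy": 0.988},
    {"gamma": 0.7, "batchSize": 768, "epochs": 4, "stepSize": 1, "accuracy": 0.5},
]
ranges = value_ranges(rows)
```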
- Step A602: According to the numerical range of each type of data and the correspondence, determine the coordinate range of the coordinate axis of the visualization sub-region corresponding to each type of data.
- each vertical axis in the parallel coordinate system corresponds to a type of data in the data to be visualized.
- as shown in Table 6, the coordinate range of the coordinate axis of the visualization sub-region corresponding to each type of data is determined according to the numerical range of each type of data and the correspondence.
- Table 6: Coordinate range of the coordinate axis of the visualization sub-area corresponding to each type of data
- the Y-axis coordinate range of the first sub-area corresponds to the hyperparameter gamma
- the Y-axis coordinate range of the second sub-area corresponds to the hyper-parameter batchSize
- the Y-axis coordinate range of the third sub-area corresponds to the hyperparameter epochs
- the Y-axis coordinate range of the fourth sub-area corresponds to the hyperparameter stepSize.
- the value on the Y-axis of the i-th sub-area represents the parameter value of the sub-task of the training task on a certain type of parameter in the parameter set;
- the value on the Y axis of the i sub-region corresponds to the value of the reward function.
- the value of the reward function can reflect the performance of the training results of the subtasks of the training task under the parameter values in the parameter set.
- because the Y-axis coordinate range in each visualization sub-area is kept within a reasonable range, the parameter sets corresponding to the multiple sub-tasks of the training task can be compared at the same time.
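One way to place values with very different magnitudes on comparable Y axes is linear min-max scaling. The sketch below assumes that mapping; the text does not specify the scaling, and the function name `to_axis_coordinate` and the default axis extent of 0.0 to 1.0 are hypothetical.

```python
def to_axis_coordinate(value: float, value_range: tuple[float, float],
                       axis_bottom: float = 0.0, axis_top: float = 1.0) -> float:
    """Linearly map a value from its numerical range onto the Y axis of its sub-area."""
    low, high = value_range
    if high == low:
        # Degenerate range: place the point in the middle of the axis.
        return (axis_bottom + axis_top) / 2
    fraction = (value - low) / (high - low)
    return axis_bottom + fraction * (axis_top - axis_bottom)


# batchSize 768 lies halfway along the range [512, 1024].
y_batch = to_axis_coordinate(768, (512, 1024))
# epochs 3 and 6 are the two ends of the range [3, 6].
y_epochs_low = to_axis_coordinate(3, (3, 6))
y_epochs_high = to_axis_coordinate(6, (3, 6))
```

With this mapping every axis spans the same extent, so a batchSize in the hundreds and a gamma below 1 can be read side by side.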
- the coordinate system of the visualization sub-region corresponding to each type of data forms a parallel coordinate system, and each vertical axis in the parallel coordinate system corresponds to a type of data in the data to be visualized.
- the coordinate system of the visualization sub-area corresponding to each type of data forms a parallel coordinate system, which can generate multiple parallel and equidistant axes and represent the objects in the multi-dimensional space as polylines with vertices on the parallel axes.
- the objects in the multi-dimensional space may include: each type of parameter in the parameter set and the training result corresponding to each parameter set.
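The polyline representation can be sketched as follows: each parameter set, together with its training result, becomes one polyline with one vertex per axis. The function name, the drawing width and height, and the linear scaling are assumptions for illustration; the value ranges are the ones listed earlier in this description.

```python
def polyline_vertices(record: dict[str, float], axes: list[str],
                      ranges: dict[str, tuple[float, float]],
                      width: float = 400.0, height: float = 100.0) -> list[tuple[float, float]]:
    """Return one (x, y) vertex per axis; axes are parallel and equally spaced along x."""
    spacing = width / (len(axes) - 1) if len(axes) > 1 else 0.0
    vertices = []
    for index, name in enumerate(axes):
        low, high = ranges[name]
        fraction = 0.5 if high == low else (record[name] - low) / (high - low)
        vertices.append((index * spacing, fraction * height))
    return vertices


axes = ["gamma", "batchSize", "epochs", "stepSize", "accuracy"]
ranges = {"gamma": (0.6, 0.8), "batchSize": (512, 1024), "epochs": (3, 6),
          "stepSize": (1, 2), "accuracy": (0.009, 0.988)}
# Hypothetical subtask: one parameter set plus its reward-function value.
record = {"gamma": 0.6, "batchSize": 1024, "epochs": 6, "stepSize": 1, "accuracy": 0.988}
vertices = polyline_vertices(record, axes, ranges)
```

Connecting the vertices in order yields the characteristic curve for that parameter set.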
- any two visualization sub-areas in the preset visualization area have the same shape and size; the visualization sub-areas are rectangular and can be arranged in the horizontal direction, and the coordinate systems of the visualization sub-areas form a parallel coordinate system.
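The rectangular, horizontally arranged layout can be sketched by dividing the preset visualization area into equal-width rectangles. The function name and the `(x, y, width, height)` tuple layout are hypothetical choices for the example.

```python
def horizontal_sub_areas(count: int, area_width: float,
                         area_height: float) -> list[tuple[float, float, float, float]]:
    """Split the preset visualization area into `count` equal rectangles,
    arranged left to right, each given as (x, y, width, height)."""
    if count <= 0:
        raise ValueError("count must be positive")
    sub_width = area_width / count
    return [(index * sub_width, 0.0, sub_width, area_height) for index in range(count)]


# Five sub-areas: four hyperparameters plus the training result.
sub_areas = horizontal_sub_areas(5, area_width=500.0, area_height=200.0)
```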
- any two visualization sub-areas in the preset visualization area have the same shape and size; the shape of the visualization sub-areas is fan-shaped, and the visualization sub-areas may be arranged along a circumferential direction. Therefore, a radar chart can be formed in a preset visualization area, and each sector in the radar chart corresponds to a visualization sub-area.
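For the fan-shaped arrangement, equal sectors around a circle can be computed as angular intervals. This is a sketch under the assumption that the sectors are equal and together cover the full circle; the function name is hypothetical.

```python
import math


def sector_angles(count: int) -> list[tuple[float, float]]:
    """Return (start, end) angles in radians for `count` equal sectors around a circle."""
    if count <= 0:
        raise ValueError("count must be positive")
    step = 2 * math.pi / count
    return [(index * step, (index + 1) * step) for index in range(count)]


# Five sectors: one visualization sub-area per type of data.
sectors = sector_angles(5)
```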
- the embodiment of the present application can present, in each visualization sub-area, the corresponding type of data in the data to be visualized based on each vertical axis of the parallel coordinate system; that is, different types of data in the data to be visualized can be classified and displayed in different visualization sub-areas, which facilitates classification and analysis of the different types of data.
- the data visualization device provided by the embodiment of the present application may include:
- the processing part 701 is configured to obtain the training results of the training task under each parameter set according to multiple parameter sets adopted by the training task of the machine learning model; each parameter set includes the value of at least one type of parameter;
- the acquiring part 702 is configured to acquire the corresponding relationship between the data type of each type of data in the data to be visualized and the visualization sub-region; the data to be visualized includes: the plurality of parameter sets and the training results corresponding to each of the parameter sets;
- the loading part 703 is configured to visualize each of the parameter sets and the training results corresponding to each of the parameter sets in a preset visualization area according to the correspondence; the preset visualization area includes the visualization sub-area corresponding to each type of data.
- each visualization sub-area corresponds to a data dimension of the data to be visualized. Therefore, the influence of hyperparameters of artificial intelligence on the training results of the training task of the machine learning model can be presented from multiple dimensions.
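The division of work among the three parts can be sketched as a small class. This is an illustrative outline only: the class and method names are hypothetical, the `train` callable stands in for actually running a training subtask, and `load` merely assigns each value to the index of its sub-area instead of drawing anything.

```python
from typing import Callable


class DataVisualizationDevice:
    def __init__(self, train: Callable[[dict], float]) -> None:
        # Hypothetical stand-in for running one subtask of the training task.
        self.train = train

    def process(self, parameter_sets: list[dict]) -> list[dict]:
        """Processing part 701: obtain the training result under each parameter set."""
        return [{**params, "result": self.train(params)} for params in parameter_sets]

    def acquire(self, records: list[dict]) -> dict[str, int]:
        """Acquiring part 702: map each data type to the index of a visualization sub-area."""
        return {name: index for index, name in enumerate(records[0])}

    def load(self, records: list[dict], correspondence: dict[str, int]) -> list[dict[int, float]]:
        """Loading part 703: place every value in the sub-area assigned to its data type."""
        return [{correspondence[name]: value for name, value in record.items()}
                for record in records]


device = DataVisualizationDevice(train=lambda params: params["epochs"] / 10)
records = device.process([{"gamma": 0.6, "epochs": 3}, {"gamma": 0.8, "epochs": 6}])
correspondence = device.acquire(records)
placed = device.load(records, correspondence)
```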
- the loading part 703 is configured to visualize each of the parameter sets and the training results corresponding to each of the parameter sets in a preset visualization area according to the correspondence, including:
- in the visualization sub-area corresponding to each type of data, marking the coordinate points of each type of data in the data to be visualized in the preset visualization area;
- a characteristic curve corresponding to each of the parameter sets is generated.
- the embodiment of the present application can generate the characteristic curve corresponding to each parameter set in the coordinate system of the corresponding visualization sub-region for each type of data in the visualization data, and the visualization data includes the training tasks of the machine learning model.
- the loading part 703 is also used to:
- the characteristic curve corresponding to the target parameter set is displayed in a preset style; the target parameter set includes at least one of the following: a parameter set corresponding to the maximum value of the reward function of the training task, and a parameter set corresponding to the minimum value of the reward function of the training task.
- displaying the characteristic curve corresponding to the target parameter set in a preset style makes that curve easier to analyze.
- the loading part 703 is configured to establish the coordinate system of the visualization sub-region corresponding to each type of data according to the correspondence, including:
- according to the numerical range of each type of data and the correspondence, determining the coordinate range of the coordinate axis of the visualization sub-region corresponding to each type of data.
- the coordinate system of the visualization sub-region corresponding to each type of data forms a parallel coordinate system, and each vertical axis in the parallel coordinate system corresponds to a type of data in the data to be visualized.
- the embodiment of the present application can present, in each visualization sub-area, the corresponding type of data in the data to be visualized based on each vertical axis of the parallel coordinate system; that is, different types of data in the data to be visualized can be classified and displayed in different visualization sub-areas, which facilitates classification and analysis of the different types of data.
- the functions or parts included in the apparatus provided in the embodiments of the present application can be used to execute the methods described in the above method embodiments; for their specific implementation, refer to the descriptions of the above method embodiments, which are not repeated here for brevity.
- the processing part 701, the acquiring part 702 and the loading part 703 can all be realized by the processor of the electronic device, and the above-mentioned processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor, which is not limited in this embodiment of the present application.
- the electronic device 800 provided by the embodiment of the present application may include: a memory 810 and a processor 820; wherein,
- the memory 810 is configured to store computer programs and data;
- the processor 820 is configured to execute the computer program stored in the memory, so as to implement any data visualization method in the foregoing embodiments.
- the above-mentioned memory 810 can be a volatile memory, for example RAM; or a non-volatile memory, for example ROM, flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory.
- the aforementioned memory 810 may provide instructions and data to the processor 820.
- the embodiment of the present application also provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes any one of the above data visualization methods.
- the disclosed devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of units is only a logical function division.
- the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
- the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the functional units in each embodiment of the present application can all be integrated into one processing module, or each unit can serve as a separate unit, or two or more units can be integrated into one unit; the above-mentioned integrated unit can be realized in the form of hardware or in the form of hardware plus software functional units.
- the embodiment of the present application provides a data visualization method, device, electronic equipment, storage medium and program. The method includes: obtaining the training result of the training task under each parameter set according to multiple parameter sets used in the training task of the machine learning model, where each parameter set includes the value of at least one type of parameter; obtaining the corresponding relationship between the data type of each type of data in the data to be visualized and the visualization sub-area, where the data to be visualized includes the multiple parameter sets and the training result corresponding to each parameter set; and visualizing each parameter set and the training result corresponding to each parameter set in the preset visualization area according to the corresponding relationship, where the preset visualization area includes a visualization sub-area corresponding to each type of data.
- each visualization sub-area corresponds to a data dimension of the data to be visualized. Therefore, the influence of hyperparameters of artificial intelligence on the training results of the training task of the machine learning model can be presented from multiple dimensions.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- User Interface Of Digital Computer (AREA)
- Image Generation (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to a data visualization method and apparatus, an electronic device, a storage medium, and a program. The method comprises: obtaining a training result of a training task under each parameter set according to a plurality of parameter sets used by the training task of a machine learning model, each parameter set comprising a numerical value of at least one parameter (A101); obtaining the correspondence between the data type of each type of data in the data to be visualized and a visualization sub-region, the data to be visualized comprising the plurality of parameter sets and the training result corresponding to each parameter set (A102); and visualizing each parameter set and the training result corresponding to each parameter set in a preset visualization region according to the correspondence, the preset visualization region comprising the visualization sub-region corresponding to each type of data (A103).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111619175.7 | 2021-12-27 | ||
CN202111619175.7A CN114357253A (zh) | 2021-12-27 | 2021-12-27 | Data visualization method, apparatus, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023123851A1 (fr) | 2023-07-06 |
Family
ID=81103648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/095486 WO2023123851A1 (fr) | Data visualization method and apparatus, electronic device, storage medium, and program | 2021-12-27 | 2022-05-27 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114357253A (fr) |
WO (1) | WO2023123851A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357253A (zh) * | 2021-12-27 | 2022-04-15 | 上海商汤科技开发有限公司 | Data visualization method, apparatus, electronic device, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991658A (zh) * | 2019-11-28 | 2020-04-10 | 重庆紫光华山智安科技有限公司 | Model training method and apparatus, electronic device, and computer-readable storage medium |
CN111797294A (zh) * | 2020-07-09 | 2020-10-20 | 上海商汤智能科技有限公司 | Visualization method and related device |
CN112101522A (zh) * | 2020-08-20 | 2020-12-18 | 四川大学 | Interactive machine learning method based on visualization |
US11151480B1 (en) * | 2020-06-22 | 2021-10-19 | Sas Institute Inc. | Hyperparameter tuning system results viewer |
CN113673174A (zh) * | 2021-09-08 | 2021-11-19 | 中国平安人寿保险股份有限公司 | Hyperparameter determination method, apparatus, device, and storage medium |
CN114357253A (zh) * | 2021-12-27 | 2022-04-15 | 上海商汤科技开发有限公司 | Data visualization method, apparatus, electronic device, and storage medium |
- 2021-12-27 CN CN202111619175.7A patent/CN114357253A/zh active Pending
- 2022-05-27 WO PCT/CN2022/095486 patent/WO2023123851A1/fr unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991658A (zh) * | 2019-11-28 | 2020-04-10 | 重庆紫光华山智安科技有限公司 | Model training method and apparatus, electronic device, and computer-readable storage medium |
US11151480B1 (en) * | 2020-06-22 | 2021-10-19 | Sas Institute Inc. | Hyperparameter tuning system results viewer |
CN111797294A (zh) * | 2020-07-09 | 2020-10-20 | 上海商汤智能科技有限公司 | Visualization method and related device |
CN112101522A (zh) * | 2020-08-20 | 2020-12-18 | 四川大学 | Interactive machine learning method based on visualization |
CN113673174A (zh) * | 2021-09-08 | 2021-11-19 | 中国平安人寿保险股份有限公司 | Hyperparameter determination method, apparatus, device, and storage medium |
CN114357253A (zh) * | 2021-12-27 | 2022-04-15 | 上海商汤科技开发有限公司 | Data visualization method, apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114357253A (zh) | 2022-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen | A tutorial on kernel density estimation and recent advances | |
Korenevskii et al. | Generation of fuzzy network models taught on basis of data structure for medical expert systems | |
Mühlbacher et al. | A partition-based framework for building and validating regression models | |
Pflüger et al. | Spatially adaptive sparse grids for high-dimensional data-driven problems | |
Ziamtsov et al. | Machine learning approaches to improve three basic plant phenotyping tasks using three-dimensional point clouds | |
CN106537422A (zh) | Systems and methods for capturing relationships within information | |
Tercan et al. | Improving the laser cutting process design by machine learning techniques | |
Servant et al. | Fuzzy fine-grained code-history analysis | |
RU2689818C1 (ru) | Method of interpreting artificial neural networks | |
CN111797998A (zh) | Method and system for generating combined features of machine learning samples | |
CN113614778A (zh) | Image analysis system and method of using the image analysis system | |
WO2023123851A1 (fr) | Data visualization method and apparatus, electronic device, storage medium, and program | |
WO2021058867A1 (fr) | Image analysis in pathology | |
Zhang et al. | A view-reduction based multi-view TSK fuzzy system and its application for textile color classification | |
Lytvyn et al. | Information technologies for decision support in industry-specific geographic information systems based on swarm intelligence | |
JP6178023B2 (ja) | Module division support apparatus, method, and program | |
US20240104804A1 (en) | System for clustering data points | |
Gajamannage et al. | Dimensionality reduction of collective motion by principal manifolds | |
US20210150078A1 (en) | Reconstructing an object | |
de Sousa et al. | Evolved explainable classifications for lymph node metastases | |
Geng et al. | Automated variance modeling for three-dimensional point cloud data via Bayesian neural networks | |
Rasal et al. | Deep structural causal shape models | |
Resti et al. | Performance improvement of decision tree model using fuzzy membership function for classification of corn plant diseases and pests | |
Levi et al. | Fast and Simple Explainability for Point Cloud Networks | |
Stegmaier et al. | Fuzzy-based propagation of prior knowledge to improve large-scale image analysis pipelines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22913143; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |