WO2021136365A1 - Application development method and apparatus based on machine learning model, and electronic device - Google Patents

Application development method and apparatus based on machine learning model, and electronic device

Info

Publication number
WO2021136365A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
application
model
training
user
Prior art date
Application number
PCT/CN2020/141344
Other languages
English (en)
French (fr)
Inventor
黄缨宁
南雨含
郭朕
张宇
许立鹏
孙佳维
Original Assignee
第四范式(北京)技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 第四范式(北京)技术有限公司
Publication of WO2021136365A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the embodiments of the present disclosure relate to the technical field of application development, and more specifically, to an application development method based on a machine learning model, an application development device based on a machine learning model, an electronic device, and a computer-readable storage medium.
  • One purpose of the embodiments of the present disclosure is to provide a new technical solution for application development based on a machine learning model.
  • according to a first aspect of the present disclosure, an application development method based on a machine learning model is provided, including: acquiring the type of the machine learning model set by a user; according to a machine learning strategy corresponding to the type, obtaining one or more machine learning models through one or more experiments of automatically training a model, wherein the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training; and generating the application according to the obtained machine learning model.
  • according to a second aspect of the present disclosure, an application development apparatus is provided, including: a model type acquisition module configured to acquire the type of a machine learning model set by a user; a model training module configured to obtain one or more machine learning models through one or more experiments of automatically training a model according to a machine learning strategy corresponding to the type, wherein the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training; and an application generation module configured to generate the application according to the obtained machine learning model.
  • according to a third aspect of the present disclosure, an electronic device is provided, including: the apparatus according to the second aspect of the present disclosure; or, a processor and a memory, where the memory is used to store instructions, and the instructions are used to control the processor to execute the application development method based on the machine learning model according to the first aspect of the present disclosure.
  • according to a fourth aspect of the present disclosure, a computer-readable storage medium is provided, storing executable commands that, when executed by a processor, implement the application development method based on the machine learning model according to the first aspect of the present disclosure.
  • the application development method based on the machine learning model provided in this embodiment enables users to independently build artificial intelligence services, especially vision application services. It provides a one-stop workflow covering the access and storage of labeled data along a standard path, the construction and optimization of models, and the deployment of models online to provide online services for actual business scenarios. Supplemented by monitoring and management suites for data, services, and applications, it realizes integrated, automated, and intelligent artificial intelligence development and management. It simplifies the complex application construction process through low-threshold interface operations and solves the problem of high labor costs in the development of artificial intelligence applications.
  • Figure 1 shows a schematic diagram of an electronic device that can be used to implement embodiments of the present disclosure.
  • Fig. 2 shows a flowchart of an application development method based on a machine learning model according to an embodiment of the present disclosure.
  • Fig. 3 shows a schematic diagram of a layout picture in an example of an embodiment of the present disclosure.
  • Fig. 4 shows a schematic diagram of an application development apparatus according to an embodiment of the present disclosure.
  • Fig. 5 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.
  • Figure 1 shows a schematic diagram of an electronic device that can be used to implement embodiments of the present disclosure.
  • the electronic device 1000 includes a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, an output device 1500, and an input device 1600.
  • the processor 1100 is, for example, a central processing unit (CPU), a microcontroller (MCU), or the like.
  • the memory 1200 is, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, or the like.
  • the interface device 1300 is, for example, a USB interface, a headphone interface, or the like.
  • the communication device 1400 can perform wired or wireless communication, for example.
  • the output device 1500 is, for example, a liquid crystal display, a touch display, a speaker, and the like.
  • the input device 1600 is, for example, a touch screen, a keyboard, a mouse, a microphone, and the like.
  • the memory 1200 of the electronic device 1000 is used to store instructions, and the instructions are used to control the processor 1100 to execute the application development method based on the machine learning model provided by the embodiments of the present disclosure.
  • those skilled in the art can design the instructions according to the solutions disclosed in the present disclosure. How instructions control the processor to operate is well known in the art, so it will not be described in detail here.
  • although FIG. 1 shows multiple devices of the electronic device 1000, the electronic device 1000 of the embodiments of the present disclosure may involve only some of them; for example, the electronic device 1000 may involve only the memory 1200, the processor 1100, the output device 1500, and the input device 1600.
  • the electronic device 1000 shown in FIG. 1 is merely explanatory, and is by no means intended to limit the present invention, its application or use.
  • This embodiment provides an application development method based on a machine learning model.
  • the method is implemented by, for example, the electronic device 1000 in FIG. 1. As shown in FIG. 2, the method includes the following steps S1100 to S1300.
  • in step S1100, the type of the machine learning model set by the user is acquired.
  • the types of machine learning models can be divided according to their application scenarios.
  • the machine learning model is a machine learning model related to computer vision, and its type includes at least one of an image classification type, an object recognition type, a text positioning type, and a text recognition type.
  • Image classification is to distinguish different image categories based on the semantic information of the image.
  • the semantic information of an image includes visual layer information, object layer information, and conceptual layer information.
  • the visual layer is the bottom layer of the image and includes, for example, the color, texture, and shape of the image; these features are called bottom-layer feature semantics;
  • the object layer, that is, the middle layer, includes the attribute features of the image, such as the state of an object in the image at a certain moment;
  • the conceptual layer is the high layer: it is the information expressed by the image that is closest to human understanding and reflects the image content.
  • distinguishing different image categories according to semantic information means classifying images according to their content. For example, if the semantic information of an image reflects that its content is an invoice, the image category of that image is the invoice category.
  • Image classification is an important basic problem in computer vision: different image categories are distinguished according to the semantic information of the image, and images are labeled with different category labels. Image classification is the basis of other high-level vision tasks such as image detection, instance segmentation, and object tracking, and can be applied to face recognition and intelligent video analysis in the security field, traffic scene recognition in the transportation field, and so on.
  • Object recognition is to locate the target first and then classify the image content.
  • Object recognition is the process of detecting and framing the different objects that may exist in an image based on its semantic information, and then classifying and labeling them. For example, if a picture contains multiple objects, the objects in the picture can first be determined, that is, target positioning is performed, and then the type of each object is identified to determine its label. Considering that picture data in real life usually describes scenes in which multiple objects coexist, image classification alone is often insufficient. Object recognition therefore uses the idea of divide and conquer, first positioning and then classifying, which can greatly improve the accuracy of recognition results, and can be applied to the aerospace, medical, communications, industrial automation, robotics, and military fields.
  • Text positioning is to identify the position of text information in the picture. Text positioning uses computer vision to intelligently recognize and locate text information in pictures to generate target candidate frames with category information, which can be used in the recognition of bills and certificates with various text information.
  • Text recognition is to intelligently recognize the text content on a picture as computer editable text.
  • in text recognition, the input is an image fragment whose main content is text, and the corresponding computer-editable text is generated. This significantly speeds up business processes and provides valuable information, and can be applied to industries such as finance, insurance, and consulting.
  • the step of acquiring the type of the machine learning model set by the user includes the following process:
  • first, candidate types of machine learning models corresponding to various machine learning tasks are displayed to the user, for example, candidate types such as "image classification", "object recognition", "text positioning", and "text recognition". Then, the type of machine learning model selected by the user from the candidate types is received. For example, in response to the user's submission operation, the type selected by the user is obtained based on the state of the selection control.
  • in step S1200, according to the machine learning strategy corresponding to the type, one or more machine learning models are obtained through one or more experiments of automatically training a model, where the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training.
  • for different types of machine learning models, the data, algorithms, and resources involved in model training usually differ. In this embodiment, machine learning strategies are used to control the data, algorithms, and resources related to model training.
  • the data related to model training is, for example, the training data set used in the training task.
  • The algorithms related to model training are, for example, the calculation model, training parameters, and training indicators used in the training task.
  • The resources related to model training are, for example, the CPU resources, GPU resources, memory resources, and the like allocated to training tasks.
  • for the image classification type, networks such as Resnet, Inception, or Mobilenet can be used to build the machine learning model.
  • the Resnet network is proposed to solve the difficult training problem of the deep network. It can greatly accelerate the training of the deep neural network while ensuring the amount of parameters, and the accuracy is also greatly improved.
  • the Inception network introduces the inception structure, which increases the width of the network and can extract richer features. It also uses 1×1 convolution kernels to reduce the number of network parameters, and uses Batch Normalization to accelerate network training while reducing overfitting.
  • the Mobilenet network uses a separable convolution method to reduce model parameters and calculations, which greatly improves the cost-effectiveness of the network.
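To make the Resnet idea above concrete, the following is a minimal sketch of a residual block in PyTorch; the layer sizes and names are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = activation(F(x) + x).

    The skip connection lets gradients bypass the convolutions,
    which is what makes very deep networks trainable.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)  # skip connection

# Example: a 3-channel 32x32 image batch passes through with its shape preserved.
block = ResidualBlock(channels=3)
print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 3, 32, 32])
```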
  • for the object recognition type, the Faster Region Convolutional Neural Network (Faster-rcnn) method can be used to build the machine learning model.
  • Faster-rcnn is a two-stage object recognition method that unifies the four basic steps of object recognition (candidate region generation, feature extraction, classification, and location refinement) into a single deep network framework; computation is not repeated, which improves the running speed.
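As an illustration only, a pretrained Faster-rcnn detector can be run in a few lines with torchvision; the library choice and the weights string are assumptions (the patent does not prescribe a library), and the string form of the weights argument follows recent torchvision versions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a detector pretrained on COCO (downloads weights on first use).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# One dummy RGB image with values in [0, 1]; replace with a real image tensor.
image = torch.rand(3, 480, 640)

with torch.no_grad():
    predictions = model([image])  # list with one dict per input image

# Each prediction contains candidate boxes, class labels, and confidence scores.
print(predictions[0]["boxes"].shape)
print(predictions[0]["labels"][:5], predictions[0]["scores"][:5])
```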
  • for the text positioning type, the DeepText model can be used to build the machine learning model.
  • DeepText is a two-stage model for text positioning improved on the basis of Faster-rcnn, and its structure is the same as that of Faster-rcnn: first, the feature layer uses VGG-16; second, the algorithm consists of a Region Proposal Network (RPN) used to extract candidate regions and a Faster-rcnn used to detect objects.
  • for the text recognition type, algorithms such as Densenet and Connectionist Temporal Classification (CTC) can be used to build the machine learning model. The Densenet algorithm uses an architecture composed of a backbone network and the CTC loss function, and the network structure can be chosen flexibly: for example, densenet or simplenet can be selected as the backbone network, and a recurrent neural network (RNN) can optionally be included.
  • The CTC algorithm uses a six-layer Convolutional Neural Network (CNN) as the model skeleton to extract features, uses a two-layer bidirectional Recurrent Neural Network (RNN) to combine the temporal features, and finally uses the CTC decoding method to calculate the loss and decode sentences.
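A minimal sketch of how a CTC loss is wired up in PyTorch follows; the sequence lengths and alphabet size are made-up assumptions for illustration.

```python
import torch
import torch.nn as nn

# Alphabet of 10 characters plus the CTC "blank" symbol at index 0.
num_classes = 11
T, N = 50, 4  # input sequence length (time steps), batch size

ctc_loss = nn.CTCLoss(blank=0)

# Log-probabilities over classes for each time step, shaped (T, N, C).
log_probs = torch.randn(T, N, num_classes, requires_grad=True).log_softmax(dim=2)

# Target character sequences (no blanks), one row per batch element.
targets = torch.randint(low=1, high=num_classes, size=(N, 20), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # trainable end-to-end without frame-level alignment labels
print(loss.item())
```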
  • the step of obtaining one or more machine learning models through one or more experiments of automatically training a model according to the machine learning strategy corresponding to the type further includes the following process: first, the user is provided with a modeling creation interface for setting automatic model training tasks according to the machine learning strategy corresponding to the type; second, the setting operations performed by the user in the modeling creation interface are received to obtain the setting items required for automatically training the model.
  • The setting items include at least one of annotation data upload, data preprocessing strategy, algorithm configuration, and resource configuration.
  • Annotated data upload refers to uploading annotated data for model training.
  • annotation refers to the use, in a supervised learning mode, of sample data carrying result information; such sample data carrying result information is called annotation data and is used for model training.
  • In the computer vision field, there are several common annotation approaches for typical image understanding and recognition scenarios.
  • For example, in the image classification scenario, the annotation refers to the classification result of the image data; if the classification result of an image is the invoice category, the annotation of the image is "invoice". In the object recognition scenario, the annotation refers to the process of framing the target area (Region of Interest, ROI) in the image and then distinguishing and determining the framed image area; the annotation here is therefore a composite label carrying the coordinate range of the target area and the final classification result. For example, if the framed image area is a person, the annotation of that image area includes the coordinate range of the area within the image and the classification result "person".
  • a set of data composed of image data and annotation data is called a data set.
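For illustration, a single entry of such a data set could look like the following; the field names are hypothetical, since the patent does not fix a schema.

```python
# Hypothetical annotation records for the two scenarios described above.

# Image classification: the annotation is a single class label.
classification_sample = {
    "image_path": "data/images/0001.jpg",
    "annotation": "invoice",
}

# Object recognition: a composite label with ROI coordinates plus a class.
object_recognition_sample = {
    "image_path": "data/images/0002.jpg",
    "annotations": [
        {"bbox": [120, 80, 360, 240], "label": "person"},  # [x1, y1, x2, y2]
        {"bbox": [400, 60, 520, 200], "label": "invoice"},
    ],
}

dataset = [classification_sample, object_recognition_sample]
print(len(dataset), "samples")
```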
  • the annotation data is obtained by publishing an annotation task, and the obtained annotation data is uploaded for use in training the model.
  • the annotation task includes, for example, the following requirements: the pictures used in the training process should be consistent with the picture environment in the application scenario; the data should be kept complete and free of contamination; and the format of the annotation data should conform to a preset format.
  • During the upload of annotation data, at least one of the following processing options can be performed according to the user's settings: discarding abnormal files, ignoring abnormal annotations, failing the import, using the recommended configuration, and using a custom configuration.
  • during the upload of annotation data, the method may also include displaying, in response to user input, a graphical interface about the uploaded annotation data, where at least one of the following items is provided on the graphical interface: the details of the upload log or an entry to it, a shortcut for copying the path of the annotation data, and a button for viewing the annotation data.
  • the data preprocessing strategy refers to the preprocessing strategy of transforming and enhancing the labeled data.
  • Data preprocessing generally includes two parts. The first part is data splitting: according to certain splitting rules, the data can be split into two data sets, a training data set and a validation data set. A platform that uses annotation data for model training can support two splitting methods: random splitting, and designating one or several data sets as the validation set.
  • the training set is used for model training, and the validation set is used to evaluate the effect of the model.
  • the second part is data enhancement, which performs transformation and scaling operations on the training data set, including processing methods such as cropping, segmentation, and noise, making the model more adaptable and robust to the various sample pictures encountered in real environments.
  • the common data enhancement methods in the computer vision field include cropping, rotation, and noise.
  • Cropping refers to selecting a part of the image, cutting out this part of the image, and then adjusting it to the size of the original image.
  • Rotation refers to rotating the picture clockwise or counterclockwise. Note that when rotating, it is best to rotate by 90 to 180 degrees; otherwise scale problems will occur.
  • The main purpose of adding noise is to blur the image: through a series of calculations based on mathematical distributions over the picture set, the observable information of the image is perturbed.
  • the technical logic of other data enhancement methods is similar to the method described above.
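The cropping, rotation, and noise methods above can be sketched with torchvision transforms as follows; the particular sizes and noise scale are illustrative assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms

def add_gaussian_noise(img: torch.Tensor, std: float = 0.05) -> torch.Tensor:
    """Perturb the observable information of the image with Gaussian noise."""
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),             # crop a region, resize back to a fixed size
    transforms.RandomRotation(degrees=(90, 180)),  # rotate within the 90-180 degree range
    transforms.ToTensor(),
    transforms.Lambda(add_gaussian_noise),
])

# Usage: call augment(...) inside a Dataset's __getitem__, so each epoch
# sees a differently transformed version of every training picture.
example = Image.new("RGB", (256, 256))
print(augment(example).shape)  # torch.Size([3, 224, 224])
```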
  • Algorithm configuration is used to finely tune the model's training algorithm, including the core structure that affects the training process and the related hyperparameters.
  • In the training of a deep learning network, hyperparameters are parameters set before learning starts, rather than data obtained through training. Usually, hyperparameters need to be optimized; better hyperparameters improve the performance and effect of learning.
  • Resource configuration is used to control the resources allocated to training, including the GPU, CPU, and memory configuration.
  • At least one of the data preprocessing strategy, the algorithm configuration, and the resource configuration provides different levels of configuration strategies.
  • the "smart” mode is the black box mode, and a variety of data preprocessing methods are built in according to different data types, without the user's choice.
  • the "fine adjustment” mode allows users to make fine adjustments.
  • the data preprocessing configuration includes a variety of methods that can be performed in the preprocessing stage. The user can choose the preprocessing method according to the training needs.
  • the "Expert” mode opens all the adjustable parameters of the preprocessing method, and the user can adjust the relevant parameters according to the training needs.
  • the "smart” mode is the black box mode, which automatically provides users with the best model and algorithm hyperparameters according to different project types and data types.
  • the "fine adjustment” mode is the fine adjustment mode. Different models can be selected for the project type and whether transfer learning is required.
  • "Expert” mode opens all parameters of hyperparameter configuration, and users can adjust related parameters according to training needs.
  • the "smart” mode is the black box mode. According to different data types and model types, combined with the user's resource configuration and current resource usage, the optimal resource configuration and scheduling is performed without the user's own consideration.
  • the "fine adjustment” mode is the fine adjustment mode. Since image training occupies the most GPU compression resources, users will be provided here to modify GPU resources and whether to consider opening the elastic scaling of the estimated service.
  • the "Expert” mode opens up the GPU, CPU, and memory configuration entries, as well as the range of the number of flexible instances. Among them, the number of instances refers to the number of services running at the same time.
  • in this embodiment, a default level for at least one of the preprocessing strategy, the algorithm configuration, and the resource configuration can be provided according to the annotation data and the type of the machine learning model.
  • For example, the optimal value of each configuration item is automatically determined according to the annotation data and the type of the machine learning model and used as the default level of that configuration, that is, as the default configuration of the "smart" mode.
  • it should be noted that this embodiment introduces a model training method based on transfer learning. Transfer learning refers to transferring knowledge from one domain (the source domain) to another domain (the target domain), so that better learning results can be achieved in the target domain; it enables artificial intelligence in small-data fields and breaks artificial intelligence's dependence on big data.
  • Transfer learning for computer vision algorithms means first training a general model on a data set different from the target domain to serve as a backbone network model, and then using the data of the target scenario to further train and optimize the backbone network model. For example, in an object detection scenario, a backbone network is first used to extract image features, and subsequent processing is then performed based on the extracted features.
  • in this embodiment, when the user chooses to perform transfer learning, candidate backbone network models are provided to the user, including the mobilenet, resnet, and inception networks, and the subsequent transfer learning process is performed based on the backbone network model selected by the user.
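A minimal sketch of the optimize-the-backbone step follows, assuming torchvision provides the pretrained resnet backbone (the patent itself does not name a library, and the target class count is a made-up assumption).

```python
import torch.nn as nn
from torchvision import models

# Source domain: a resnet backbone pretrained on a large generic data set.
# The weights string follows recent torchvision versions.
backbone = models.resnet18(weights="DEFAULT")

# Freeze the generic feature extractor so only target-domain layers train.
for param in backbone.parameters():
    param.requires_grad = False

# Target domain: replace the classification head, e.g. 4 bill categories.
num_target_classes = 4  # assumption for illustration
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

# Only the new head's parameters would be passed to the optimizer.
trainable = [p for p in backbone.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```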
  • the application development method in this embodiment also supports distributed training of the machine learning model, and automatically controls the training strategy in the "black-box" mode of the algorithm configuration. Based on the automatically controlled training strategy, the training task of each model can be divided into multiple subtasks; correspondingly, the training strategy of each model can be divided into multiple sub-training strategies.
  • In this embodiment, the sub-training strategies are distributed and scheduled according to the data, the parameters, and the training tasks.
  • the training of the machine learning model can be performed by a trainer, an evaluator and an optimizer.
  • the trainer is used to perform a training task and obtain a training result; the evaluator is used to evaluate the quality of the training result to obtain an evaluation result; and the optimizer is used to determine, according to the evaluation result, the hyperparameters used in the next round of training.
  • In distributed training, a corresponding trainer, evaluator, and optimizer can be configured for each subtask: each trainer executes its training subtask and hands the output weights to a dedicated evaluator for evaluation; the optimizer produces hyperparameters based on the evaluation results, and those hyperparameters guide each trainer in the next round of training, finally producing a model.
  • the above multiple subtasks can be executed in parallel in a distributed system, thereby realizing distributed training of machine learning models and improving model training efficiency.
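The trainer/evaluator/optimizer loop can be sketched as plain Python; all function bodies here are stand-ins for the platform components the text describes, not the patent's actual implementation.

```python
import random

def trainer(hyperparams: dict) -> dict:
    """Stand-in training subtask: returns 'weights' for the given hyperparameters."""
    return {"weights": [random.random() for _ in range(4)], "lr": hyperparams["lr"]}

def evaluator(result: dict) -> float:
    """Stand-in evaluation: scores the training result (higher is better)."""
    return sum(result["weights"]) - abs(result["lr"] - 0.01) * 10

def optimizer(history: list) -> dict:
    """Propose the next round's hyperparameters from past evaluation results."""
    if not history:
        return {"lr": 0.1}
    best_lr = max(history, key=lambda h: h["score"])["lr"]
    return {"lr": best_lr * random.uniform(0.5, 1.5)}  # explore around the best

history, hyperparams = [], optimizer([])
for round_idx in range(5):  # each round could fan out to parallel subtasks
    result = trainer(hyperparams)
    score = evaluator(result)
    history.append({"lr": hyperparams["lr"], "score": score})
    hyperparams = optimizer(history)

print("best lr so far:", max(history, key=lambda h: h["score"])["lr"])
```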
  • finally, according to the obtained setting items, one or more experiments of automatically training the model are carried out based on the annotation data uploaded by the user to obtain one or more machine learning models.
  • this embodiment introduces the concept of a project: a project refers to a combination of a series of tasks oriented toward a certain result, and each project generates one corresponding application.
  • An experiment represents one model training job in a project, and one model can be obtained through one successful experiment.
  • An experiment can be divided into three stages, preprocessing, training, and postprocessing, and has five states: waiting to start, queuing, run failed, run terminated, and run succeeded.
  • At least one of the following is shown to the user: the experiment version, experiment status, experiment progress, indicators such as accuracy, creation time, basic experiment information, experiment log, detailed training indicators, and experiment evaluation.
  • the experimental version refers to the serial number of the current experiment in the project experiment list.
  • the default experiment version is 1; if further experiments are created, the experiment version increases in the sequence of natural numbers.
  • the experiment status refers to the running status of the current experiment, including "queuing", “running”, “running successfully”, etc.
  • Experimental progress refers to the progress of model training, which can be displayed through a progress bar, etc.
  • Accuracy refers to the accuracy of the model generated by the current experiment on the validation set, expressed as a percentage. Creation time is the time when the current experiment was created.
  • Basic information of the experiment includes the name of the project to which the experiment belongs, the type of that project, the experiment version, the experiment status, the experiment progress, the creation time, and other information.
  • Experiment log refers to the operation information record of the current experiment.
  • the training detail index refers to the accuracy index during the experiment, including training loss (loss on the training set) and so on.
  • Experiment evaluation refers to the evaluation of the model after the experiment is completed, including validation accuracy, validation precision, validation recall, validation F-score, and the like.
  • showing the user the training detailed indicators of the experiment includes: obtaining the indicators of multiple training iterations, and showing the indicator evolution process between the multiple training iterations. For example, a line graph is established with the number of iterations as the horizontal axis and training loss as the vertical axis, and the indicator evolution process between multiple training iterations is displayed through the line graph.
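A minimal matplotlib sketch of the line graph described above (iteration count on the horizontal axis, training loss on the vertical axis); the loss values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical per-iteration training losses collected during an experiment.
iterations = list(range(1, 11))
train_loss = [2.3, 1.8, 1.4, 1.1, 0.9, 0.75, 0.66, 0.6, 0.57, 0.55]

plt.plot(iterations, train_loss, marker="o")
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.title("Indicator evolution across training iterations")
plt.grid(True)
plt.show()
```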
  • after the experiment is completed, the method further includes a step of creating an experiment evaluation task to evaluate the model produced by the experiment.
  • In addition, displaying the experiment evaluation to the user includes displaying at least one of the evaluation index statistics, resource configuration, real-time log, and error case data under the experiment evaluation task.
  • Evaluation index statistics, that is, an overall overview of each evaluation index.
  • Resource configuration, that is, the running resources allocated for the experiment evaluation task.
  • Real-time log, that is, a real-time record of the running status of the evaluation task.
  • Error case data, that is, cases in which the model made recognition errors during the evaluation task, such as pictures of error cases.
  • in the above example, creating an experiment evaluation task includes selecting an evaluation data set and configuring resources for the evaluation task. Based on the user's check operation, the data set used for evaluation can be determined, and an evaluation result reflecting the corresponding model's effect is obtained by evaluating on this data set. The resource configuration can adopt the "smart" mode, that is, the default configuration.
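The validation metrics listed earlier (accuracy, precision, recall, F-score) can be computed as in the following sketch, assuming scikit-learn is available and using made-up labels.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical ground-truth labels and model predictions on a validation set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)

print(f"validation accuracy:  {accuracy:.0%}")
print(f"validation precision: {precision:.0%}")
print(f"validation recall:    {recall:.0%}")
print(f"validation F-score:   {f_score:.2f}")
```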
  • in step S1300, an application is generated according to the obtained machine learning model.
  • In this embodiment, step S1300 further includes: generating the application based on a single trained machine learning model; or generating the application based on a template process, where the template process is used to define the orchestration of multiple trained machine learning models within the application.
  • in some application scenarios, such as an Optical Character Recognition (OCR) scenario, multiple models need to cooperate with each other to complete the task. Therefore, this embodiment provides a way of generating an application from multiple machine learning models based on a template process.
  • Taking the OCR recognition scenario as an example, the target process in this scenario is the OCR recognition process.
  • the OCR process can be divided into two parts: text positioning and text recognition.
  • the process of generating an application based on the OCR recognition process includes the following steps.
  • First, an OCR layout corresponding to the OCR layout picture is created. This step includes: displaying the OCR sample picture selected by the user in the canvas area; providing, in or around the canvas area, a control for setting OCR recognition areas; and, in response to the user's operation on the control, setting one or more OCR recognition areas corresponding to the picture content on the displayed OCR sample picture to obtain the OCR layout picture.
  • This step also includes: providing, in or around the canvas area, a control for editing the OCR sample picture; and editing the OCR sample picture in response to the user's operation on the control, where the editing includes at least one of changing the picture, selecting, moving, cropping, zooming in, and zooming out.
  • Next, the user is provided with an operation interface that displays the OCR layout picture and is used to configure the one or more models to be applied to each recognition area in the OCR layout picture.
  • Finally, the configuration operations performed by the user in the operation interface are received to generate an application that applies the one or more models to each recognition area.
  • the above process is illustrated below using the invoice picture shown in Fig. 3. First, the electronic device 1000 receives a picture uploaded by the user to serve as the OCR layout. After that, the electronic device 1000 displays the picture in the canvas area and provides editing controls in or around the canvas area to support operations such as changing the picture, selecting, moving, cropping, zooming in, and zooming out.
  • In addition, a recognition selection box is provided in the operation interface, and the user locates the recognition areas through the selection box.
  • For example, the dashed boxes in Fig. 3 show four recognition areas, namely recognition area 1, recognition area 2, recognition area 3, and recognition area 4. The user can name the four recognition areas, in order, "bill name", "invoicing date", "uppercase amount", and "lowercase amount".
  • Finally, for each recognition area, the corresponding positioning model and recognition model can be trained; for example, for the "uppercase amount" recognition area, the corresponding "uppercase amount" positioning model and "uppercase amount" recognition model are trained.
  • when generating the application, the user can select the application type "application template" to generate the application based on the template process. In the example of Fig. 3, based on the trained positioning and recognition models for each recognition area, an application capable of recognizing multiple areas of the same picture can be generated automatically according to the preset template process.
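The template process for this example can be sketched as a locate-then-recognize loop over the configured regions; the model objects and region names below are placeholders standing in for the trained models, not an interface defined by the patent.

```python
# Hypothetical registry: one positioning model and one recognition model
# per recognition area, as configured through the operation interface.
def make_stub(name):
    def locate(image):
        return (0, 0, 10, 10)       # placeholder bounding box
    def recognize(image, box):
        return f"<text of {name}>"  # placeholder transcription
    return {"locate": locate, "recognize": recognize}

template = {
    "bill name": make_stub("bill name"),
    "invoicing date": make_stub("invoicing date"),
    "uppercase amount": make_stub("uppercase amount"),
    "lowercase amount": make_stub("lowercase amount"),
}

def run_ocr_application(image):
    """Orchestrate the per-region models: first position, then recognize."""
    results = {}
    for region, models in template.items():
        box = models["locate"](image)                       # text positioning stage
        results[region] = models["recognize"](image, box)   # text recognition stage
    return results

print(run_ocr_application(image=None))
```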
  • in an example, the modeling creation interface described above is also used to display the layout picture and to receive the user's selection of a recognition area in the layout picture, where the layout picture is used by the user to specify recognition areas; the related process also includes: receiving the user's selection of a recognition area in the layout picture, and cropping the annotated images so that the cropped annotated images are consistent with the selected recognition area.
  • the application development method further includes: launching the application online, and visually presenting to the user at least one of the application information, resources and instances, resource monitoring, API call monitoring, and application logs.
  • Here, launching online refers to deploying the application on the relevant devices to provide the corresponding services.
  • after the application is launched online, a sample picture uploaded by the user is received, and the prediction result of the online application for the sample picture is displayed. This helps the user check the recognition effect of the generated application.
  • the application development method based on the machine learning model provided in this embodiment enables users to independently build artificial intelligence services, especially vision application services. It provides a one-stop workflow covering the access and storage of labeled data along a standard path, the construction and optimization of models, and the deployment of models online to provide online services for actual business scenarios. Supplemented by monitoring and management suites for data, services, and applications, it realizes integrated, automated, and intelligent artificial intelligence development and management. It simplifies the complex application construction process through low-threshold interface operations and solves the problem of high labor costs in the development of artificial intelligence applications.
  • the application development apparatus 400 includes a model type acquisition module 410, a model training module 420, and an application generation module 430.
  • the model type obtaining module 410 is configured to obtain the type of the machine learning model set by the user.
  • the model training module 420 is configured to obtain one or more machine learning models through one or more experiments of automatically training a model according to the machine learning strategy corresponding to the type, wherein the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training.
  • the application generating module 430 is configured to generate an application according to the obtained machine learning model.
  • the machine learning model is a machine learning model related to computer vision.
  • the type of the machine learning model includes at least one of an image classification type, an object recognition type, a text localization type, and a text recognition type.
  • the model training module 420 is configured to: provide the user with a modeling creation interface for setting automatic model training tasks according to the machine learning strategy corresponding to the type; receive the setting operations performed by the user in the modeling creation interface to obtain the setting items required for automatically training the model; and, according to the obtained setting items, perform one or more experiments of automatically training the model based on the annotation data uploaded by the user to obtain one or more machine learning models.
  • the setting items include at least one of annotation data upload, data preprocessing strategy, algorithm configuration, and resource configuration.
  • At least one of the data preprocessing strategy, the algorithm configuration, and the resource configuration provides different levels of configuration strategies.
  • the model training module 420 is configured to provide a default level for at least one of the preprocessing strategy, the algorithm configuration, and the resource configuration according to the annotation data and the type of the machine learning model.
  • the modeling creation interface is also used to display the layout picture and to receive the user's selection of a recognition area in the layout picture, where the layout picture is used by the user to specify recognition areas, and the model training module 420 is further configured to: receive the user's selection of a recognition area in the layout picture, and crop the annotated images so that the cropped annotated images are consistent with the selected recognition area.
  • the model training module 420 is also configured to show the user at least one of the experiment version, experiment status, experiment progress, indicators such as accuracy, creation time, basic experiment information, experiment log, detailed training indicators, and experiment evaluation.
  • the model training module 420 is configured to obtain indicators of multiple training iterations, and display the evolution process of the indicators between the multiple training iterations.
  • the model training module 420 is further configured to create an experiment evaluation task to evaluate the model produced by the experiment, and presenting the experiment evaluation to the user includes displaying at least one of the evaluation index statistics, resource configuration, real-time log, and error case data under the experiment evaluation task.
  • one or more experiments of automatically training the model belong to the same project, where each project generates a corresponding application.
  • the model training module 420 is configured to select an evaluation data set and configure resources for the evaluation task.
  • the application generation module 430 is configured to: generate the application based on a single trained machine learning model; or generate the application based on a template process, where the template process is used to define the orchestration of multiple trained machine learning models within the application.
  • the application generation module 430 is configured to: provide the user with the application parameters involved in the template process; and, according to the user's settings for the application parameters, generate an application that uses multiple machine learning models in accordance with the template process.
  • the template process includes an OCR recognition process.
  • The application generation module 430 is configured to: provide the user with an operation interface that displays the OCR layout picture and is used to configure the one or more models to be applied to each recognition area in the OCR layout picture; and receive the configuration operations performed by the user in the operation interface to generate an application that applies the one or more models to each recognition area.
  • the multiple models include a positioning model and a recognition model for the recognition area.
  • the application generation module 430 is further configured to create an OCR layout corresponding to the OCR layout picture.
  • The application generation module 430 is also configured to: display the OCR sample picture selected by the user in the canvas area; provide, in or around the canvas area, a control for setting OCR recognition areas; and, in response to the user's operation on the control, set one or more OCR recognition areas corresponding to the picture content on the displayed OCR sample picture to obtain the OCR layout picture.
  • The application generation module 430 is further configured to: provide, in or around the canvas area, a control for editing the OCR sample picture; and edit the OCR sample picture in response to the user's operation on the control, where the editing includes at least one of changing the picture, selecting, moving, cropping, zooming in, and zooming out.
  • the application generation module 430 is further configured to launch the application and visually display at least one of application information, resources and instances, resource monitoring, API call monitoring, and application logs to the user.
  • the application generation module 430 is further configured to receive sample pictures uploaded by the user, and display the prediction results of the online application for the sample pictures.
  • the device further includes an annotation data acquisition module configured to obtain annotation data by publishing an annotation task, and upload the acquired annotation data for use in training the model.
  • the annotation data acquisition module is configured to perform at least one of the following processing options according to the user's settings: discarding abnormal files, ignoring abnormal annotations, failing the import, using the recommended configuration, and using a custom configuration.
  • the annotation data acquisition module is further configured to display, in response to user input, a graphical interface about the uploaded annotation data, where at least one of the following items is provided on the graphical interface: the details of the upload log or an entry to it, a shortcut for copying the path of the annotation data, and a button for viewing the annotation data.
  • This embodiment provides an electronic device, which includes the application development apparatus 400 shown in FIG. 4.
  • the electronic device is the electronic device 500 shown in FIG. 5, and includes a processor 510 and a memory 520.
  • the memory 520 is used to store instructions, which are used to control the processor 510 to execute the application development method based on the machine learning model described in the method embodiments of the present disclosure.
  • This embodiment provides a computer-readable storage medium.
  • the computer-readable storage medium stores executable commands, and when the executable commands are executed by the processor, the application development method based on the machine learning model described in the method embodiments of the present disclosure is implemented.
  • the present disclosure may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the embodiments of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punch cards or raised structures in grooves on which instructions are stored, as well as any suitable combination of the foregoing.
  • the computer-readable storage medium used here is not to be interpreted as the transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the embodiments of the present disclosure.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, thereby producing a machine, such that when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram can represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for implementing the specified logical functions.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings; for example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation through hardware, implementation through software, and implementation through a combination of software and hardware are all equivalent.
  • according to the present disclosure, resource usage is dynamically adjusted while the working nodes execute their assigned tasks, thereby realizing efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization. Therefore, the present disclosure has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

An application development method and apparatus based on a machine learning model, and an electronic device. The method includes: acquiring the type of a machine learning model set by a user (S1100); according to a machine learning strategy corresponding to the type, obtaining one or more machine learning models through one or more experiments of automatically training a model, wherein the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training (S1200); and generating an application according to the obtained machine learning model (S1300). The method simplifies the complex application construction process and solves the problem of high labor costs in artificial intelligence application development.

Description

Application development method and apparatus based on machine learning model, and electronic device

This application claims priority to Chinese patent application No. 201911395248.1, filed on December 30, 2019 and entitled "基于机器学习模型的应用开发方法、装置及电子设备" (Application development method and apparatus based on machine learning model, and electronic device), the entire contents of which are incorporated into the present disclosure by reference.

Technical Field

Embodiments of the present disclosure relate to the technical field of application development, and more specifically, to an application development method based on a machine learning model, an application development apparatus based on a machine learning model, an electronic device, and a computer-readable storage medium.

Background Art

With the rapid development of artificial intelligence technology, the application scenarios of artificial intelligence are becoming increasingly broad and diverse. For example, artificial intelligence techniques related to computer vision can be applied to face recognition, license plate recognition, bill recognition, bacteria recognition, and the like.

The development of an artificial intelligence application usually involves many steps and a complex process, and places high demands on developers' capabilities. Adopting an existing artificial intelligence application can significantly reduce the cost of solving a business problem. However, existing mature applications are usually concentrated in a few mainstream application scenarios; for example, most computer-vision applications focus on fields such as face recognition, vehicle recognition, and license plate recognition. For personalized needs outside the mainstream scenarios (i.e., long-tail needs), such as bill recognition and bacteria recognition, dedicated artificial intelligence applications still need to be developed.

Therefore, it is necessary to propose a new method for developing artificial intelligence applications, so as to reduce the difficulty of development and meet diverse business needs.
发明内容
本公开实施例的一个目的是提供一种基于机器学习模型的应用开发的新技术方案。
根据本公开的第一方面,提供了一种基于机器学习模型的应用开发方法,包括:获取用户设置的机器学习模型的类型;根据对应所述类型的机器学习策略,通过一次或多次自动训练模型的实验,获得一个或多个机器学习模型,其中,所述机器学习策略用于控制与模型训练相关的数据、算法、资源之中的至少一项;根据获得的所述机器学习模型,生成所述应用。
根据本公开的第二方面,提供了一种应用开发装置,包括:模型类型获取模块,用于获取用户设置的机器学习模型的类型;模型训练模块,被配置为根据对应所述类型的机器学习策略,通过一次或多次自动训练模型的实验,获得一个或多个机器学习模型,其中,所述机器学习策略用于控制与模型训练相关的数据、算法、资源之中的至少一项;应用生成模块,被配置为根据获得的所述机器学习模型,生成所述应用。
根据本公开的第三方面,提供了一种电子设备,包括:如本公开第二方面所述的装置;或者,处理器和存储器,所述存储器用于存储指令,所述指令用于控制所述处理器执行根据本公开第一方面所述的基于机器学习模型的应用开发方法。
根据本公开的第四方面,提供了一种计算机可读存储介质,存储有可执行命令,所述可执行命令被处理器执行时,实现根据本公开第一方面所述的基于机器学习模型的应用开发方法。
本实施例提供的基于机器学习模型的应用开发方法,能够自主构建人工智能尤其是视觉类应用服务,一站式满足从标准路径下的标注数据的接入、存储到模型的构建、优化,直至将模型应用于线上,为实际业务场景提供在线服务。辅以数据、服务、应用的监控管理套件,实现一体化、自动化、智能化的人工智能开发管理。通过低门槛的界面化操作将复杂的应用构建流程简化,解决人工智能应用开发中人力成本高的问题。
通过以下参照附图对本公开的示例性实施例的详细描述,本公开的其它特征及其优点将会变得清楚。
附图说明
被结合在说明书中并构成说明书的一部分的附图示出了本公开的实施例,并且连同其说明一起用于解释本公开实施例的原理。
图1示出了可用于实现本公开实施例的电子设备的示意图。
图2示出了根据本公开实施例的基于机器学习模型的应用开发方法的流程图。
图3示出了本公开实施例的一个例子中版式图片的示意图。
图4示出了根据本公开实施例的应用开发装置的示意图。
图5示出了根据本公开实施例的电子设备的示意图。
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,技术、方法和设备应当被视为说明书的一部分。
在这里示出和讨论的所有例子中,任何具体值应被解释为仅仅是示例性的,而不是作为限制。因此,示例性实施例的其它例子可以具有不同的值。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
<硬件配置>
图1示出了可用于实现本公开实施例的电子设备的示意图。
如图1所示,电子设备1000包括处理器1100、存储器1200、接口装置1300、通信装置1400、输出装置1500、输入装置1600。其中,处理器1100例如是中央处理器CPU、微处理器MCU等。存储器1200例如是ROM(只读存储器)、RAM(随机存取存储器)、诸如硬盘的非易失性存储器等。接口装置1300例如是USB接口、耳机接口等。通信装置1400例如能够进行有线或无线通信。输出装置1500例如是液晶显示屏、触摸显示屏、扬声器等。输入装置1600例如是触摸屏、键盘、鼠标、麦克风等。
应用于本公开的实施例中,电子设备1000的存储器1200用于存储指令,指令用于控制处理器1100执行本公开实施例提供的基于机器学习模型的应用开发方法。在上述描述中,技术人员可以根据本公开所公开方案设计指令。指令如何控制处理器进行操作,这是本领域公知,故在此不再详细描述。
尽管在图1中示出了电子设备1000的多个装置,但是,本公开实施例的电子设备1000可以仅涉及其中的部分装置,例如,电子设备1000只涉及存储器1200、处理器1100、输出装置1500和输入装置1600。
图1所示的电子设备1000仅是解释性的,并且决不是为了限制本发明、其应用或用途。
<方法实施例>
本实施例提供了一种基于机器学习模型的应用开发方法,该方法例如由图1中的电子设备1000实施。如图2所示,该方法包括以下步骤S1100-S1200。
在步骤S1100中,获取用户设置的机器学习模型的类型。
本实施例中,机器学习模型的类型可以按照其应用场景划分。在一个例子中,机器学习模型为计算机视觉相关的机器学习模型,其类型包括图像分类类型、物体识别类型、文本定位类型、文本识别类型中的至少一种。
图像分类,是根据图像的语义信息区分不同的图像类别。图像的语义信息包括视觉层信息、对象层信息和概念层信息,视觉层指图像的底层,例如,包括图像的颜色、纹理和形状等等,这些特征都被称为底层特征语义;对象层也即中间层,包括图像的属性特征,该属性特征例如包括图像中对象在某一时刻的状态等;概念层是高层,是图像表达出的最接近人类理解的信息,反映图像内容。根据语义信息区分不同的图像类别,也即为按照对不同图像内容的划分所进行的分类。例如,图像的语义信息反映图像内容为发票,则该图像的图像类别即为发票类图像。图像分类是计算机视觉中重要的基本问题,根据图像的语义信息区分不同的图像类别,为图像打上不同的类别标签。图像分类是图像检测、实体分割、物体跟踪等其他高层视觉任务的基础,可应用于安防领域的人脸识别、智能视频分析、交通领域的交通场景识别等。
物体识别,是对图片内容先进行目标定位再进行目标分类。物体识别是根据图像的语义信息,将其中可能存在的不同物体经过检测定位框定后再进行分类和打标的过程。例如,图片中包含多个物体,这可以先确定图片中存在的物体,也即进行目标定位,再分别识别各物体的类型,以确定各物体的标签。考虑到现实生活中的图片数据通常描述的是多种物体符合并存的场景,所以单独的图像分类往往难以有效应对。这时,物体识别借助分治的思路,先定位后分类,能够极大提升识别结果的准确性,可应用于航天、医学、通信、工业自动化、机器人及军事领域。
文本定位,是识别图片中文本信息的位置。文本定位利用计算机视觉智能识别图片中的文本信息并进行定位,生成带有类别信息的目标候选框,可应用于带有多种文本信息的票据、证件识别中。
文本识别,是将图片上的文字内容智能识别为计算机可编辑的文本。在文本识别中,输入文字为主体的图像碎片,生成计算机可编辑的对应文本。显著加快业务流程,提供有价值的信息,能够应用于金融、保险、咨询等行业。
在一个例子中,获取用户设置的机器学习模型的类型的步骤,包括以下过程:
首先,向用户展示与各种机器学习任务分别对应的机器学习模型的候选类型。例如,向用户展示“图像分类”、“物体识别”、“文本定位”、“文本识别”等候选类型。
其次,接收用户从候选类型中选择的机器学习模型的类型。例如,响应于用户的提交操作,基于选择控件的状态获得用户选择的类型。
在步骤S1200中,根据对应类型的机器学习策略,通过一次或者多次自动训练模型的实验,获得一个或者多个机器学习模型,其中,机器学习策略用于控制与模型训练相关的数据、算法、资源之中的至少一项。
对于不同类型的机器学习模型,模型训练过程中涉及的数据、算法、资源通常也 不同。本实施例中,通过机器学习策略来控制模型训练相关的数据、算法和资源。
与模型训练相关的数据,例如是训练任务所使用的训练数据集。与模型训练有关的算法,例如是训练任务所使用的计算模型、训练参数以及训练指标。与模型训练相关的资源,例如是训练任务分配的CPU资源、GPU资源、内存资源等。
在一个例子中,对于图像分类类型,可以采用Resnet、Inception或者Mobilenet等网络来建立机器学习模型。其中,Resnet网络是为了解决深层网络存在的难以训练的问题而提出的,可以在保证参数量的情况下,极大加快深度神经网络的训练,在精度上也有很大提升。Inception网络引入了inception结构,增加了网络的宽度,可以提取更丰富的特征,同时使用了1×1卷积核来降低网络参数,使用了Batch Normalization(批量规范化)来加速网络训练,同时降低过拟合。Mobilenet网络使用可分离的卷积方式来减少模型参数和计算量,极大地提高了网络的性价比。
在一个例子中,对于物体识别类型,可以采用快速区域卷积神经网络(Faster region convolution neural network,Faster-rcnn)方法来建立机器学习模型。其中,Faster-rcnn是一种两阶段的物体识别方法,它将物体识别的四个基本步骤(候选区域生成、特征提取、分类、位置精修)统一到一个深度网络框架中,计算没有重复,提高了运行速度。
在一个例子中,对于文本定位类型,可以采用DeepText模型来建立机器学习模型。其中,DeepText是基于Faster-rcnn针对文本定位进行改进的两阶段模型,其结构和Faster-rcnn如出一辙:首先特征层使用的是VGG-16,其次是算法由用于提取候选区域的区域选取网络(Region Proposal Network,RPN)和用于检测物体的Faster-rcnn组成。
在一个例子中,对于文本识别类型,可以采用Densenet、基于神经网络的时序类分类(Connectionist temporal classification,CTC)等算法来建立机器学习模型。其中,Densenet算法使用主干网络和CTC损失函数的架构,可以灵活选择网络的结构,例如可以选择densenet或者simplenet作为主干网络,并且可以选择是否选择循环神经网络RNN。连接主义时间分类(Connectionist Temporal Classification,CTC)算法采用六层卷积神经网络(Convolutional Neural Networks,CNN)为模型骨架提取特征,用双层双向循环神经网络(Recurrent Neural Network,RNN)组合时序特征,最后使用CTC解码方法计算损失以及解码句子。
在一个例子中,根据对应类型的机器学习策略,通过一次或多次自动训练模型的实验,获得一个或多个机器学习模型的步骤,进一步包括以下过程:
首先,向用户提供用于根据对应类型的机器学习策略来设置模型自动训练任务的建模创建界面。
其次,接收用户在建模创建界面中执行的设置操作,以获取自动训练模型所需的设置项。设置项包括标注数据上传、数据预处理策略、算法配置、资源配置中的至少一个。
标注数据上传,是指上传用于进行模型训练的标注数据。其中,标注是指在有监督的学习模式中,采用带有结果信息的样本数据,该带有结果信息的样本数据称为标注数据,进行模型训练。在计算机视觉领域,存在几种常用的标注手段来应对常见的图像理解、识别场景。例如,在图像分类场景中,标注指代对图像数据的分类结果,在一图像的分类结果为发票类图像的情况下,该图像的标注即为发票;在物体识别场景中,标注指代对图像中的目标区域(Region of Interest,ROI)进行框定,并且对框定的图像区域加以区分和判定的过程,因而这里的标注是一个带有目标区域的坐标范围以及最终分类结果的复合标签,例如,在框定的图像区域为人物图像的情况下,该图像区域的标注包括该图像区域在图像中的坐标范围及人物的分类结果。由图像数据和标注数据组成的一组数据的集合,称为一个数据集。
在一个例子中,通过发布标注任务来获取标注数据,并上传获取到的标注数据以用于训练模型。标注任务例如包括以下要求:训练流程的图片应与应用场景中的图片环境一致;保持数据完整、无污染;标注数据的格式符合预设格式。
在一个例子中,在标注数据的上传过程中,可以根据用户的设置执行以下处理之中的至少一项:抛弃异常文件、忽略异常标注、引入失败、使用推荐配置和使用自定义配置。
在一个例子中,在标注数据的上传过程中,还可以包括响应于用户的输入,展示关于上传的标注数据的图形界面,其中,在图形界面上提供以下各项中的至少一项:上传日志的详情或其入口、用于复制标注数据路径的快捷方式、用于查看标注数据的按钮。
数据预处理策略,是指对标注数据进行变换、增强等预处理的策略。数据预处理一般包括两部分内容,一部分是数据拆分,按照一定的拆分规则可以将数据拆分为训练数据集和验证数据集两份数据集,在通过标注数据进行模型训练的平台中可以支持两种拆分方式,随机拆分和指定某一个/几个数据集为验证集。其中训练集用来模型训练,验证集用来评估模型效果。第二部分是数据增强,对于训练数据集进行一定的变换放缩等操作,其中包括一定的裁剪、切分、噪声等等处理方法,使得模型对真实环境各种样本图片的适应性和鲁棒性更强。计算机视觉领域常见到的数据增强方法有裁剪、旋转、噪声。裁剪指的是从图像中选择一部分,将这部分的图像裁剪出来,然后调整为原图像的大小。旋转是指对图片顺时针或者逆时针的旋转,注意在旋转的时候,最好旋转90-180度否则会出现尺度的问题。最后,增加噪声目的主要是让图像不清楚,通过对图片集的一系列数学分布的操作计算,扰乱图像的可观测信息。其它数据增强 方法的技术逻辑都与以上描述的方法相似。
算法配置,用于对模型的训练算法进行精细调优,其中包括影响训练过程的核心结构以及相关超参数。其中,在深度学习网络的训练过程中,超参数是开始学习之前设置的参数,而不是通过训练得到的数据。通常情况下,需要对超参数进行优化,通过较佳超参数提高学习的性能和效果。
资源配置,用于控制训练所分配的对应资源,包括GPU、CPU以及内存配置。
在一个例子中,数据预处理策略、算法配置和资源配置之中的至少一个提供不同级别的配置策略。
例如,对于数据预处理策略,提供“智能”、“精调”、“专家”三个级别的配置策略。其中,“智能”模式即为黑箱模式,根据不同的数据类型内置了多种数据预处理方法,无需用户选择。“精调”模式允许用户进行精细化调整,数据预处理配置包含了在预处理阶段可以进行的多种方法,用户可以根据训练需求选择预处理方法。“专家”模式开放了预处理方法的所有可调参数,用户可以根据训练需求调整相关参数。
例如,对于算法配置,提供“智能”、“精调”、“专家”三个级别的配置策略。其中,“智能”模式即为黑箱模式,根据不同的项目类型和数据类型,为用户自动提供最佳的模型以及算法的超参数。“精调”模式即为精细化调整模式,可以针对项目类型选择不同的模型以及是否需要迁移学习等。“专家”模式开放了超参数配置的所有参数,用户可以根据训练需求调整相关参数。
例如,对于资源配置,提供“智能”、“精调”、“专家”三个级别的配置策略。其中,“智能”模式即为黑箱模式,根据不同的数据类型和模型种类,结合用户的资源配置和当前资源的使用情况,进行最佳的资源配置调度,无需用户自行考虑。“精调”模式即精细化调整模式,由于图像训练最占用的是GPU压缩资源,这里会提供给用户进行GPU资源的修改和是否考虑开启预估服务的弹性伸缩。“专家”模式开放了GPU、CPU以及内存配置入口,以及弹性伸缩的实例数范围。其中,实例数是指同时运行的服务数量。
本实施例中,可以根据标注数据和机器学习模型的类型来提供预处理策略、算法配置和资源配置之中的至少一个的默认级别。例如,根据标注数据和机器学习模型的类型确定自动确定各项配置的最优值,以此作为各项配置的默认级别,也就是“智能模型”的默认配置。
需要说明的是,本实施例引入了基于迁移学习的模型训练方法。迁移学习是指把一个领域(即源领域)的知识,迁移到另一个领域(即目标领域),使得目标领域能够取得更好的学习效果,能够让小数据领域产生人工智能,打破人工智能对大数据的依赖。计算机视觉算法的迁移学习是指首先利用与目标领域不同的数据集训练通用模 型,作为骨干网络模型,再利用目标场景的数据对骨干网络模型进行优化训练。例如,在物体检测场景里,先用一个骨干网络提取图像特征,再基于提取到的特征进行后续处理。
本实施例中,在用户选择进行迁移学习的情况下,向用户提供候选的骨干网络模型,包括mobilenet网络、resnet网络、inception网络等,并基于用户选择的骨干网络模型进行后续迁移学习过程。
本实施例中的应用开发方法还支持机器学习模型的分布式训练,并且在算法配置的“黑箱”模式中自动控制训练策略。基于自动控制训练策略,每个模型的训练任务可以划分为多个子任务,相应地,每个模型的训练策略可以划分为多个子训练策略。本实施例中,根据数据、参数以及训练任务来分发调度子训练策略。机器学习模型的训练可以通过训练器、评估器和优化器执行,其中,训练器可用于执行训练任务,得到训练结果;评估器用于评估该训练结果优劣,得到评估结果;而优化器则用于根据评估结果确定下一轮训练使用的超参数。对应到分布式训练中,可以针对各子任务配置对应的训练器、评估器和优化器,每个训练器执行对应的训练子任务,产出权重交予专门的评估器评估,优化器则根据评估结果生产超参,超参会指导每个训练器进行下一轮训练,最终产生模型。上述多个子任务可以在分布式系统中并行执行,从而实现机器学习模型的分布式训练,提高模型训练效率。
Finally, according to the obtained setting items, one or more experiments of automatically training the model are carried out on the basis of the annotated data uploaded by the user, obtaining one or more machine learning models.
This embodiment introduces the concept of a project. A project is a combination of a series of tasks oriented toward a certain result, and each project generates one corresponding application. An experiment represents one model training job within a project; one successful experiment yields one model. An experiment can be divided into three stages (preprocessing, training, and post-processing) and five states: to be started, queued, run failed, run terminated, and run succeeded.
In one example, at least one of the following is displayed to the user: the experiment's experiment version, experiment status, experiment progress, metrics such as accuracy, creation time, basic experiment information, experiment log, training detail metrics, and experiment evaluation.
The experiment version is the sequence number of the current experiment in the project's experiment list; the default experiment version is 1, and if further experiments are created the version increments through the natural numbers. The experiment status is the running state of the current experiment, including "queued", "running", "run succeeded", and so on. The experiment progress is the progress of model training, which can be shown by a progress bar or the like. The accuracy is the accuracy, expressed as a percentage, of the model produced by the current experiment on the validation set. The creation time is the time at which the current experiment was created. The basic experiment information includes the name of the project the experiment belongs to, the project type, the experiment version, the experiment status, the experiment progress, the creation time, and so on. The experiment log is the record of operation information for the current experiment. The training detail metrics are accuracy-related metrics during the experiment, including the training loss (the loss on the training set) and the like. The experiment evaluation is the evaluation of the model after the experiment completes, including validation accuracy, validation precision, validation recall, validation F-score, and so on.
In one example, displaying the training detail metrics of the experiment to the user includes: obtaining the metrics of multiple training iterations, and displaying the evolution of the metrics across the multiple training iterations. For example, a line chart is built with the number of iterations on the horizontal axis and the training loss on the vertical axis, and the evolution of the metrics across training iterations is displayed through the line chart.
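As an illustrative sketch of such a line chart (the loss values below are placeholder numbers, not results from the disclosure):

```python
import matplotlib.pyplot as plt

iterations = list(range(1, 11))
train_loss = [2.3, 1.8, 1.4, 1.1, 0.9, 0.75, 0.65, 0.58, 0.54, 0.51]  # placeholder values

plt.plot(iterations, train_loss, marker="o")
plt.xlabel("iteration")        # horizontal axis: number of iterations
plt.ylabel("training loss")    # vertical axis: loss on the training set
plt.title("Metric evolution across training iterations")
plt.show()
```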
In one example, after the experiment completes, the method further includes a step of creating an experiment evaluation task to evaluate the model produced by the experiment. In addition, displaying the experiment evaluation of the experiment to the user includes: displaying at least one of evaluation metric statistics, resource configuration, real-time logs, and error-case data under the experiment evaluation task. The evaluation metric statistics are the overall overview of each evaluation metric. The resource configuration is the running resources allocated to the experiment evaluation task. The real-time logs are the real-time record of how the evaluation task is running. The error-case data are the cases in which the model made recognition errors during the evaluation task, for example error-case images.
In the above example, creating the experiment evaluation task includes: selecting an evaluation dataset, and configuring resources for the evaluation task. Based on the user's check-box selections, the dataset used for evaluation can be determined, and evaluating the model on that dataset yields an evaluation result of the model's performance. The resource configuration can adopt the "smart" mode, that is, the default configuration.
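For illustration, the validation metrics and error cases named above could be computed as in the following sketch (the labels and predictions are placeholder values):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 0]   # placeholder validation labels
y_pred = [0, 1, 0, 0, 1, 1]   # placeholder model predictions

print("validation accuracy :", accuracy_score(y_true, y_pred))
print("validation precision:", precision_score(y_true, y_pred))
print("validation recall   :", recall_score(y_true, y_pred))
print("validation F-score  :", f1_score(y_true, y_pred))

# Error-case data: the samples the model got wrong.
error_indices = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
```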
In step S1300, the application is generated according to the obtained machine learning model.
In this embodiment, step S1300 further includes: generating the application based on a single trained machine learning model; or generating the application based on a template flow, wherein the template flow is used to define the orchestration of multiple trained machine learning models during application.
In some application scenarios, for example optical character recognition (OCR) scenarios, multiple models must cooperate with one another to complete the task. This embodiment therefore provides a way of generating an application from multiple machine learning models based on a template flow.
Taking the OCR recognition scenario as an example, the target flow in this scenario is the OCR recognition flow. Generally speaking, an OCR flow can be divided into two stages: text localization and text recognition. The process of generating an application based on the OCR recognition flow includes the following steps.
First, an OCR layout corresponding to an OCR layout image is created. This step includes: displaying, in a canvas area, an OCR sample image selected by the user; providing, within or around the canvas area, controls for setting OCR recognition regions; and setting, in response to the user's operation of the controls, one or more OCR recognition regions corresponding to the image content on the displayed OCR sample image, to obtain the OCR layout image. This step further includes: providing, within or around the canvas area, controls for editing the OCR sample image; and editing the OCR sample image in response to the user's operation of the controls, wherein the editing includes at least one of replacing the image, selecting, moving, cropping, zooming in, and zooming out.
Next, the user is provided with an operation interface that displays the OCR layout image and is used to configure one or more models to be applied, respectively, to each recognition region in the OCR layout image.
Finally, the configuration operations performed by the user in the operation interface are received, to generate an application that applies the one or more models to each recognition region.
The above process is explained below taking the invoice image shown in Fig. 3 as an example. First, the electronic device 1000 receives an image uploaded by the user to serve as the OCR layout. The electronic device 1000 then displays the image in the canvas area and provides editing controls within or around the canvas area, supporting user operations such as replacing the image, selecting, moving, cropping, zooming in, and zooming out. In addition, recognition selection boxes are provided in the operation interface, and the user locates the recognition regions with them; for example, the dashed boxes in Fig. 3 show four recognition regions: recognition region 1, recognition region 2, recognition region 3, and recognition region 4. The user can name the four recognition regions "invoice title", "issue date", "amount in words", and "amount in figures" in turn. Finally, for each recognition region, a corresponding localization model and recognition model can be trained; for example, for the "amount in words" recognition region, a corresponding "amount in words" localization model and "amount in words" recognition model are trained.
When generating the application, the user can select the application type "application template" to generate the application based on a template flow. In the example shown in Fig. 3, based on the trained localization and recognition models for each recognition region, an application capable of recognizing multiple regions of the same image can be generated automatically according to a preset template flow.
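The disclosure does not specify a data format for the template flow. As a purely hypothetical sketch, the orchestration could be expressed as a mapping from each recognition region to its localization and recognition models, which the generated application then walks; every name below is illustrative.

```python
# Hypothetical template-flow description: every model name here is illustrative.
template_flow = {
    "invoice title":     {"localizer": "title_loc_model",  "recognizer": "title_rec_model"},
    "issue date":        {"localizer": "date_loc_model",   "recognizer": "date_rec_model"},
    "amount in words":   {"localizer": "words_loc_model",  "recognizer": "words_rec_model"},
    "amount in figures": {"localizer": "figure_loc_model", "recognizer": "figure_rec_model"},
}

def run_application(image, flow, load_model):
    """Apply each region's localization model, then its recognition model."""
    results = {}
    for region, models in flow.items():
        box = load_model(models["localizer"]).predict(image)      # locate the region
        results[region] = load_model(models["recognizer"]).predict(box)  # read its text
    return results
```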
In one example, the modeling creation interface described above is further used to display a layout image and to receive the user's selection of a recognition region in the layout image, wherein the layout image is used for the user to designate recognition regions; the related process further includes: receiving the user's selection of a recognition region in the layout image, to crop the annotated images so that the cropped annotated images match the selected recognition region.
In one example, the application development method further includes: bringing the application online, and visually displaying to the user at least one of application information, resources and instances, resource monitoring, API call monitoring, and application logs. Here, bringing online refers to deploying the application on the relevant devices to provide the corresponding service.
In one example, after the application goes online, a sample image uploaded by the user is received, and the online application's prediction result for the sample image is displayed. This helps the user check the recognition performance of the generated application.
The machine learning model-based application development method provided in this embodiment makes it possible to build artificial intelligence services, especially vision applications, autonomously: it covers, in one stop, the ingestion and storage of annotated data along a standard path, the building and optimization of models, and finally putting the models online to provide online services for real business scenarios. Complemented by a monitoring and management suite for data, services, and applications, it realizes integrated, automated, and intelligent AI development management. Low-threshold interface operations simplify the complex application building process, solving the problem of high labor cost in AI application development.
<Apparatus Embodiment>
This embodiment provides an application development apparatus. As shown in Fig. 4, the application development apparatus 400 includes a model type acquisition module 410, a model training module 420, and an application generation module 430.
The model type acquisition module 410 is configured to acquire the type of machine learning model set by the user.
The model training module 420 is configured to obtain one or more machine learning models through one or more experiments of automatically training the model, according to the machine learning strategy corresponding to the type, wherein the machine learning strategy is used to control at least one of the data, algorithms, and resources related to model training.
The application generation module 430 is configured to generate the application according to the obtained machine learning model.
In one example, the machine learning model is a computer vision-related machine learning model.
In one example, the type of the machine learning model includes at least one of an image classification type, an object recognition type, a text localization type, and a text recognition type.
In one example, the model training module 420 is configured to: provide the user with a modeling creation interface for setting an automatic model training task according to the machine learning strategy corresponding to the type; receive the setting operations performed by the user in the modeling creation interface, to obtain the setting items required for automatically training the model; and carry out, according to the obtained setting items, one or more experiments of automatically training the model on the basis of the annotated data uploaded by the user, to obtain one or more machine learning models.
In one example, the setting items include at least one of annotated data upload, data preprocessing strategy, algorithm configuration, and resource configuration.
In one example, at least one of the data preprocessing strategy, the algorithm configuration, and the resource configuration provides configuration strategies at different levels.
In one example, the model training module 420 is configured to provide a default level for at least one of the preprocessing strategy, the algorithm configuration, and the resource configuration according to the annotated data and the type of the machine learning model.
In one example, the modeling creation interface is further used to display a layout image and to receive the user's selection of a recognition region in the layout image, wherein the layout image is used for the user to designate recognition regions; the model training module 420 is further configured to: receive the user's selection of a recognition region in the layout image, to crop the annotated images so that the cropped annotated images match the selected recognition region.
In one example, the model training module 420 is further configured to display to the user at least one of the experiment's experiment version, experiment status, experiment progress, metrics such as accuracy, creation time, basic experiment information, experiment log, training detail metrics, and experiment evaluation.
In one example, the model training module 420 is configured to: obtain the metrics of multiple training iterations, and display the evolution of the metrics across the multiple training iterations.
In one example, the model training module 420 is further configured to create an experiment evaluation task to evaluate the model produced by an experiment, and displaying the experiment evaluation of the experiment to the user includes: displaying at least one of evaluation metric statistics, resource configuration, real-time logs, and error-case data under the experiment evaluation task.
In one example, the one or more experiments of automatically training the model belong to the same project, wherein each project generates one corresponding application.
In one example, the model training module 420 is configured to: select an evaluation dataset, and configure resources for the evaluation task.
In one example, the application generation module 430 is configured to: generate the application based on a single trained machine learning model; or generate the application based on a template flow, wherein the template flow is used to define the orchestration of multiple trained machine learning models during application.
In one example, the application generation module 430 is configured to: provide the user with the application parameters involved in the template flow; and generate, according to the user's settings of the application parameters, an application utilizing the multiple machine learning models in accordance with the template flow.
In one example, the template flow includes an OCR recognition flow, and the application generation module 430 is used to: provide the user with an operation interface that displays the OCR layout image and is used to configure one or more models to be applied respectively to each recognition region in the OCR layout image; and receive the configuration operations performed by the user in the operation interface, to generate an application that applies the one or more models to each recognition region.
In one example, the multiple models include a localization model and a recognition model for a recognition region.
In one example, the application generation module 430 is further configured to create an OCR layout corresponding to the OCR layout image.
In one example, the application generation module 430 is further configured to: display, in a canvas area, an OCR sample image selected by the user; provide, within or around the canvas area, controls for setting OCR recognition regions; and set, in response to the user's operation of the controls, one or more OCR recognition regions corresponding to the image content on the displayed OCR sample image, to obtain the OCR layout image.
In one example, the application generation module 430 is further configured to: provide, within or around the canvas area, controls for editing the OCR sample image; and edit the OCR sample image in response to the user's operation of the controls, wherein the editing includes at least one of replacing the image, selecting, moving, cropping, zooming in, and zooming out.
In one example, the application generation module 430 is further configured to: bring the application online, and visually display to the user at least one of application information, resources and instances, resource monitoring, API call monitoring, and application logs.
In one example, the application generation module 430 is further configured to: receive a sample image uploaded by the user, and display the online application's prediction result for the sample image.
In one example, the apparatus further includes an annotated data acquisition module configured to: obtain annotated data by publishing an annotation task, and upload the obtained annotated data for model training.
In one example, the annotated data acquisition module is configured to perform, according to the user's settings, at least one of the following processes: discarding abnormal files, ignoring abnormal annotations, failing the import, using a recommended configuration, and using a custom configuration.
In one example, the annotated data acquisition module is further configured to display, in response to user input, a graphical interface about the uploaded annotated data, wherein at least one of the following is provided on the graphical interface: details of the upload log or an entry to it, a shortcut for copying the path of the annotated data, and a button for viewing the annotated data.
<Electronic Device Embodiment>
This embodiment provides an electronic device that includes the application development apparatus 400 shown in Fig. 4. Alternatively, the electronic device is the electronic device 500 shown in Fig. 5, which includes a processor 510 and a memory 520. The memory 520 stores instructions, and the instructions are used to control the processor 510 to execute the machine learning model-based application development method described in the method embodiments of the present disclosure.
<Computer-Readable Storage Medium Embodiment>
This embodiment provides a computer-readable storage medium. The computer-readable storage medium stores executable commands which, when executed by a processor, implement the machine learning model-based application development method described in the method embodiments of the present disclosure.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having thereon computer-readable program instructions for causing a processor to implement aspects of the embodiments of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used here, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission media (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out the operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, for example programmable logic circuitry, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions, thereby implementing aspects of the embodiments of the present disclosure.
Aspects of the embodiments of the present disclosure are described here with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to work in a particular manner, so that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two successive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here. The scope of the present invention is defined by the appended claims.
Industrial Applicability
According to the embodiments of the present disclosure, dynamic adjustment is performed with respect to resource usage while worker nodes execute the assigned tasks, thereby achieving efficient task allocation and resource scheduling and improving task execution efficiency and resource utilization. The present disclosure therefore has strong industrial applicability.

Claims (54)

  1. A machine learning model-based application development method, comprising:
    acquiring a type of a machine learning model set by a user;
    obtaining one or more machine learning models through one or more experiments of automatically training the model, according to a machine learning strategy corresponding to the type, wherein the machine learning strategy is used to control at least one of data, algorithms, and resources related to model training; and
    generating the application according to the obtained machine learning model.
  2. The method according to claim 1, wherein the machine learning model is a computer vision-related machine learning model.
  3. The method according to claim 1 or 2, wherein the type of the machine learning model includes at least one of an image classification type, an object recognition type, a text localization type, and a text recognition type.
  4. The method according to any one of claims 1 to 3, wherein acquiring the type of the machine learning model set by the user comprises:
    displaying to the user candidate types of machine learning models respectively corresponding to various machine learning tasks;
    receiving the type of machine learning model selected by the user from the candidate types.
  5. The method according to any one of claims 1 to 4, wherein obtaining one or more machine learning models through one or more experiments of automatically training the model, according to the machine learning strategy corresponding to the type, comprises:
    providing the user with a modeling creation interface for setting an automatic model training task according to the machine learning strategy corresponding to the type;
    receiving setting operations performed by the user in the modeling creation interface, to obtain setting items required for automatically training the model; and
    carrying out, according to the obtained setting items, one or more experiments of automatically training the model on the basis of annotated data uploaded by the user, to obtain one or more machine learning models.
  6. The method according to claim 5, wherein the setting items include at least one of annotated data upload, data preprocessing strategy, algorithm configuration, and resource configuration.
  7. The method according to claim 6, wherein at least one of the data preprocessing strategy, the algorithm configuration, and the resource configuration provides configuration strategies at different levels.
  8. The method according to claim 7, wherein a default level for at least one of the preprocessing strategy, the algorithm configuration, and the resource configuration is provided according to the annotated data and the type of the machine learning model.
  9. The method according to claim 5, wherein the modeling creation interface is further used to display a layout image and to receive the user's selection of a recognition region in the layout image, wherein the layout image is used for the user to designate recognition regions, and the method further comprises: receiving the user's selection of a recognition region in the layout image, to crop the annotated images so that the cropped annotated images match the selected recognition region.
  10. The method according to any one of claims 1 to 9, further comprising:
    displaying to the user at least one of the experiment's experiment version, experiment status, experiment progress, accuracy, creation time, basic experiment information, experiment log, training detail metrics, and experiment evaluation.
  11. The method according to claim 10, wherein displaying the training detail metrics of the experiment to the user comprises: obtaining metrics of multiple training iterations, and displaying the evolution of the metrics across the multiple training iterations.
  12. The method according to any one of claims 1 to 11, further comprising: creating an experiment evaluation task to evaluate a model produced by an experiment, wherein displaying the experiment evaluation of the experiment to the user comprises: displaying at least one of evaluation metric statistics, resource configuration, real-time logs, and error-case data under the experiment evaluation task.
  13. The method according to claim 12, wherein creating the experiment evaluation task comprises: selecting an evaluation dataset, and configuring resources for the evaluation task.
  14. The method according to any one of claims 1 to 13, wherein the one or more experiments of automatically training the model belong to the same project, wherein each project generates one corresponding application.
  15. The method according to any one of claims 1 to 14, wherein generating the application according to the obtained machine learning model comprises: generating the application based on a single trained machine learning model; or generating the application based on a template flow, wherein the template flow is used to define the orchestration of multiple trained machine learning models during application.
  16. The method according to claim 15, wherein generating the application based on the template flow comprises:
    providing the user with application parameters involved in the template flow;
    generating, according to the user's settings of the application parameters, an application utilizing the multiple machine learning models in accordance with the template flow.
  17. The method according to claim 15, wherein the template flow includes an OCR recognition flow, and generating the application based on the OCR recognition flow comprises:
    providing the user with an operation interface that displays an OCR layout image and is used to configure one or more models to be applied respectively to each recognition region in the OCR layout image; and
    receiving configuration operations performed by the user in the operation interface, to generate an application that applies the one or more models to each recognition region.
  18. The method according to claim 17, wherein the multiple models include a localization model and a recognition model for a recognition region.
  19. The method according to claim 17, further comprising: creating an OCR layout corresponding to the OCR layout image.
  20. The method according to claim 19, wherein creating the OCR layout corresponding to the OCR layout image comprises:
    displaying, in a canvas area, an OCR sample image selected by the user;
    providing, within or around the canvas area, a control for setting OCR recognition regions; and
    setting, in response to the user's operation of the control, one or more OCR recognition regions corresponding to the image content on the displayed OCR sample image, to obtain the OCR layout image.
  21. The method according to claim 19, wherein the step of creating the OCR layout corresponding to the OCR layout image further comprises:
    providing, within or around the canvas area, a control for editing the OCR sample image; and
    editing the OCR sample image in response to the user's operation of the control, wherein the editing includes at least one of replacing the image, selecting, moving, cropping, zooming in, and zooming out.
  22. The method according to any one of claims 1 to 21, further comprising: bringing the application online, and visually displaying to the user at least one of application information, resources and instances, resource monitoring, API call monitoring, and application logs.
  23. The method according to claim 22, further comprising: receiving a sample image uploaded by the user, and displaying the online application's prediction result for the sample image.
  24. The method according to any one of claims 5 to 23, further comprising: obtaining annotated data by publishing an annotation task, and uploading the obtained annotated data for model training.
  25. The method according to claim 24, wherein during the upload, at least one of the following processes is performed according to the user's settings: discarding abnormal files, ignoring abnormal annotations, failing the import, using a recommended configuration, and using a custom configuration.
  26. The method according to claim 24, further comprising displaying, in response to user input, a graphical interface about the uploaded annotated data, wherein at least one of the following is provided on the graphical interface: details of the upload log or an entry to it, a shortcut for copying the path of the annotated data, and a button for viewing the annotated data.
  27. An application development apparatus, comprising:
    a model type acquisition module configured to acquire a type of a machine learning model set by a user;
    a model training module configured to obtain one or more machine learning models through one or more experiments of automatically training the model, according to a machine learning strategy corresponding to the type, wherein the machine learning strategy is used to control at least one of data, algorithms, and resources related to model training; and
    an application generation module configured to generate the application according to the obtained machine learning model.
  28. The apparatus according to claim 27, wherein the machine learning model is a computer vision-related machine learning model.
  29. The apparatus according to claim 27 or 28, wherein the type of the machine learning model includes at least one of an image classification type, an object recognition type, a text localization type, and a text recognition type.
  30. The apparatus according to any one of claims 27 to 29, wherein the model type acquisition module is further configured to:
    display to the user candidate types of machine learning models respectively corresponding to various machine learning tasks;
    receive the type of machine learning model selected by the user from the candidate types.
  31. The apparatus according to any one of claims 27 to 30, wherein the model training module is further configured to:
    provide the user with a modeling creation interface for setting an automatic model training task according to the machine learning strategy corresponding to the type;
    receive setting operations performed by the user in the modeling creation interface, to obtain setting items required for automatically training the model; and
    carry out, according to the obtained setting items, one or more experiments of automatically training the model on the basis of annotated data uploaded by the user, to obtain one or more machine learning models.
  32. The apparatus according to claim 31, wherein the setting items include at least one of annotated data upload, data preprocessing strategy, algorithm configuration, and resource configuration.
  33. The apparatus according to claim 32, wherein at least one of the data preprocessing strategy, the algorithm configuration, and the resource configuration provides configuration strategies at different levels.
  34. The apparatus according to claim 33, wherein the model training module is further configured to provide a default level for at least one of the preprocessing strategy, the algorithm configuration, and the resource configuration according to the annotated data and the type of the machine learning model.
  35. The apparatus according to claim 31, wherein the modeling creation interface is further used to display a layout image and to receive the user's selection of a recognition region in the layout image, wherein the layout image is used for the user to designate recognition regions, and the model training module is further configured to: receive the user's selection of a recognition region in the layout image, to crop the annotated images so that the cropped annotated images match the selected recognition region.
  36. The apparatus according to any one of claims 27 to 35, wherein the model training module is further configured to:
    display to the user at least one of the experiment's experiment version, experiment status, experiment progress, accuracy, creation time, basic experiment information, experiment log, training detail metrics, and experiment evaluation.
  37. The apparatus according to claim 36, wherein the model training module is further configured to: obtain metrics of multiple training iterations, and display the evolution of the metrics across the multiple training iterations.
  38. The apparatus according to any one of claims 27 to 37, wherein the model training module is further configured to: create an experiment evaluation task to evaluate a model produced by an experiment, wherein displaying the experiment evaluation of the experiment to the user comprises: displaying at least one of evaluation metric statistics, resource configuration, real-time logs, and error-case data under the experiment evaluation task.
  39. The apparatus according to claim 38, wherein the model training module is further configured to: select an evaluation dataset, and configure resources for the evaluation task.
  40. The apparatus according to any one of claims 27 to 39, wherein the one or more experiments of automatically training the model belong to the same project, wherein each project generates one corresponding application.
  41. The apparatus according to any one of claims 27 to 40, wherein the application generation module is configured to: generate the application based on a single trained machine learning model; or generate the application based on a template flow, wherein the template flow is used to define the orchestration of multiple trained machine learning models during application.
  42. The apparatus according to claim 41, wherein the application generation module is configured to:
    provide the user with application parameters involved in the template flow;
    generate, according to the user's settings of the application parameters, an application utilizing the multiple machine learning models in accordance with the template flow.
  43. The apparatus according to claim 41, wherein the template flow includes an OCR recognition flow, and the application generation module is configured to:
    provide the user with an operation interface that displays an OCR layout image and is used to configure one or more models to be applied respectively to each recognition region in the OCR layout image; and
    receive configuration operations performed by the user in the operation interface, to generate an application that applies the one or more models to each recognition region.
  44. The apparatus according to claim 43, wherein the multiple models include a localization model and a recognition model for a recognition region.
  45. The apparatus according to claim 43, wherein the application generation module is further configured to: create an OCR layout corresponding to the OCR layout image.
  46. The apparatus according to claim 45, wherein the application generation module is configured to:
    display, in a canvas area, an OCR sample image selected by the user;
    provide, within or around the canvas area, a control for setting OCR recognition regions; and
    set, in response to the user's operation of the control, one or more OCR recognition regions corresponding to the image content on the displayed OCR sample image, to obtain the OCR layout image.
  47. The apparatus according to claim 45, wherein the application generation module is further configured to:
    provide, within or around the canvas area, a control for editing the OCR sample image; and
    edit the OCR sample image in response to the user's operation of the control, wherein the editing includes at least one of replacing the image, selecting, moving, cropping, zooming in, and zooming out.
  48. The apparatus according to any one of claims 27 to 47, wherein the application generation module is further configured to: bring the application online, and visually display to the user at least one of application information, resources and instances, resource monitoring, API call monitoring, and application logs.
  49. The apparatus according to claim 48, wherein the application generation module is further configured to: receive a sample image uploaded by the user, and display the online application's prediction result for the sample image.
  50. The apparatus according to any one of claims 31 to 49, wherein the apparatus further comprises an annotated data acquisition module configured to: obtain annotated data by publishing an annotation task, and upload the obtained annotated data for model training.
  51. The apparatus according to claim 50, wherein the annotated data acquisition module is further configured to perform, according to the user's settings, at least one of the following processes: discarding abnormal files, ignoring abnormal annotations, failing the import, using a recommended configuration, and using a custom configuration.
  52. The apparatus according to claim 50, wherein the annotated data acquisition module is further configured to display, in response to user input, a graphical interface about the uploaded annotated data, wherein at least one of the following is provided on the graphical interface: details of the upload log or an entry to it, a shortcut for copying the path of the annotated data, and a button for viewing the annotated data.
  53. An electronic device, comprising:
    the apparatus according to any one of claims 27 to 52; or,
    a processor and a memory, wherein the memory is used to store instructions, and the instructions are used to control the processor to execute the method according to any one of claims 1 to 26.
  54. A computer-readable storage medium storing executable commands which, when executed by a processor, implement the machine learning model-based application development method according to any one of claims 1 to 26.
PCT/CN2020/141344 2019-12-30 2020-12-30 Machine learning model-based application development method and apparatus, and electronic device WO2021136365A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911395248.1 2019-12-30
CN201911395248.1A CN111160569A (zh) 2019-12-30 2019-12-30 Machine learning model-based application development method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2021136365A1 true WO2021136365A1 (zh) 2021-07-08

Family

ID=70559199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141344 WO (zh) 2019-12-30 2020-12-30 Machine learning model-based application development method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN111160569A (zh)
WO (1) WO2021136365A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160569A (zh) * 2019-12-30 2020-05-15 第四范式(北京)技术有限公司 基于机器学习模型的应用开发方法、装置及电子设备
CN111695443B (zh) * 2020-05-21 2023-01-24 平安科技(深圳)有限公司 智能交通人工智能开放平台、方法、介质及电子设备
CN111723746A (zh) * 2020-06-22 2020-09-29 江苏云从曦和人工智能有限公司 场景识别模型生成方法、系统、平台、设备及介质
CN111813084B (zh) * 2020-07-10 2022-10-28 重庆大学 一种基于深度学习的机械装备故障诊断方法
CN114154641A (zh) * 2020-09-07 2022-03-08 华为云计算技术有限公司 Ai模型的训练方法、装置、计算设备和存储介质
CN112101567A (zh) * 2020-09-15 2020-12-18 厦门渊亭信息科技有限公司 基于人工智能的自动化建模方法及装置
CN112364883B (zh) * 2020-09-17 2022-06-10 福州大学 基于单阶段目标检测和deeptext识别网络的美式车牌识别方法
CN112508769A (zh) * 2020-12-28 2021-03-16 浪潮云信息技术股份公司 一种基于深度学习构建多任务计算机视觉应用服务的方法
CN112734911A (zh) * 2021-01-07 2021-04-30 北京联合大学 基于卷积神经网络的单幅图像三维人脸重建方法及系统
CN112767205A (zh) * 2021-01-26 2021-05-07 深圳市恩孚电子科技有限公司 机器学习教学方法、装置、电子设备和存储介质
CN112966439A (zh) * 2021-03-05 2021-06-15 北京金山云网络技术有限公司 机器学习模型训练方法、装置以及虚拟实验箱
CN113470448A (zh) * 2021-06-30 2021-10-01 上海松鼠课堂人工智能科技有限公司 基于生成对抗网络的科学实验的模拟方法、系统及设备
CN113850186A (zh) * 2021-09-24 2021-12-28 中国劳动关系学院 基于卷积神经网络的智能流媒体视频大数据分析方法
CN115660064B (zh) * 2022-11-10 2023-09-29 北京百度网讯科技有限公司 基于深度学习平台的模型训练方法、数据处理方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190090774A1 (en) * 2017-09-27 2019-03-28 Regents Of The University Of Minnesota System and method for localization of origins of cardiac arrhythmia using electrocardiography and neural networks
CN110009174A (zh) * 2018-12-13 2019-07-12 阿里巴巴集团控股有限公司 Risk identification model training method, apparatus and server
CN110058922A (zh) * 2019-03-19 2019-07-26 华为技术有限公司 Method and apparatus for extracting metadata of a machine learning task
CN110210626A (zh) * 2019-05-31 2019-09-06 京东城市(北京)数字科技有限公司 Data processing method, apparatus and computer-readable storage medium
CN111160569A (zh) * 2019-12-30 2020-05-15 第四范式(北京)技术有限公司 Machine learning model-based application development method and apparatus, and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163233A (zh) * 2018-02-11 2019-08-23 陕西爱尚物联科技有限公司 Method for enabling a machine to perform more complex work
CN108881446B (zh) * 2018-06-22 2021-09-21 深源恒际科技有限公司 Deep learning-based artificial intelligence platform system
CN109815991B (zh) * 2018-12-29 2021-02-19 北京城市网邻信息技术有限公司 Machine learning model training method and apparatus, electronic device and storage medium
CN110378463B (zh) * 2019-07-15 2021-05-14 北京智能工场科技有限公司 Artificial intelligence model standardized training platform and automation system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064157A (zh) * 2021-11-09 2022-02-18 中国电力科学研究院有限公司 Automated process implementation method, system, device and medium based on page element recognition
CN114064157B (zh) * 2021-11-09 2023-09-15 中国电力科学研究院有限公司 Automated process implementation method, system, device and medium based on page element recognition
CN115618239A (zh) * 2022-12-16 2023-01-17 四川金信石信息技术有限公司 Management method, system, terminal and medium for deep learning framework training
CN115618239B (zh) * 2022-12-16 2023-04-11 四川金信石信息技术有限公司 Management method, system, terminal and medium for deep learning framework training
CN117474125A (zh) * 2023-12-21 2024-01-30 环球数科集团有限公司 System for automatically training machine learning models
CN117474125B (zh) * 2023-12-21 2024-03-01 环球数科集团有限公司 System for automatically training machine learning models

Also Published As

Publication number Publication date
CN111160569A (zh) 2020-05-15

Similar Documents

Publication Publication Date Title
WO2021136365A1 (zh) Machine learning model-based application development method and apparatus, and electronic device
EP3692438B1 (en) Automatic generation of a graphic user interface (gui) based on a gui screen image
US10719301B1 (en) Development environment for machine learning media models
US11367271B2 (en) Similarity propagation for one-shot and few-shot image segmentation
AU2017254848B2 (en) Image matting using deep learning
US11176423B2 (en) Edge-based adaptive machine learning for object recognition
US11783227B2 (en) Method, apparatus, device and readable medium for transfer learning in machine learning
US11631234B2 (en) Automatically detecting user-requested objects in images
JP6182242B1 (ja) データのラベリングモデルに係る機械学習方法、コンピュータおよびプログラム
US11537506B1 (en) System for visually diagnosing machine learning models
CN113994384A (zh) 使用机器学习的图像着色
WO2021129181A1 (en) Portrait segmentation method, model training method and electronic device
CN115082740B (zh) 目标检测模型训练方法、目标检测方法、装置、电子设备
CN114330588A (zh) 一种图片分类方法、图片分类模型训练方法及相关装置
CN111310837A (zh) 车辆改装识别方法、装置、系统、介质和设备
KR101700030B1 (ko) 사전 정보를 이용한 영상 물체 탐색 방법 및 이를 수행하는 장치
US20230021551A1 (en) Using training images and scaled training images to train an image segmentation model
Kazangirler et al. UIBee: An improved deep instance segmentation and classification of UI elements in wireframes
US20240163393A1 (en) Predicting video edits from text-based conversations using neural networks
WO2024035416A1 (en) Machine-learned models for multimodal searching and retrieval of images
WO2021251960A1 (en) Subtask adaptable neural network
Amrutha Raj et al. GAMNet: A deep learning approach for precise gesture identification
WO2023149888A1 (en) Training systems for surface anomaly detection
KR20210096367A (ko) 제품의 개발을 보조하기 위한 전자 장치, 방법, 및 컴퓨터 판독가능 매체
CN117058739A (zh) 一种人脸聚类更新方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910580

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910580

Country of ref document: EP

Kind code of ref document: A1