WO2020158954A1 - Service building device, service building method, and service building program - Google Patents

Service building device, service building method, and service building program Download PDF

Info

Publication number
WO2020158954A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
learning
unit
model
function
Prior art date
Application number
PCT/JP2020/003915
Other languages
French (fr)
Japanese (ja)
Inventor
常人 萱沼
古賀 直樹
Original Assignee
株式会社コンピュータマインド
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社コンピュータマインド filed Critical 株式会社コンピュータマインド
Priority to JP2020539874A priority Critical patent/JPWO2020158954A1/en
Publication of WO2020158954A1 publication Critical patent/WO2020158954A1/en
Priority to JP2021198561A priority patent/JP2022033153A/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to an information processing device.
  • Devices having a learning function using technologies such as deep learning are known (see Patent Document 1).
  • However, a conventional apparatus having a learning function is not versatile because it is manufactured as a dedicated machine that performs only specific learning.
  • the present invention has been made in view of such a situation, and an object of the present invention is to improve the efficiency of sales of devices having a learning function.
  • An information processing device of one embodiment of the present invention comprises: annotation means that adds, to an image, predetermined information that can serve as an annotation of the image; learning means that performs learning using the image provided with the predetermined information as a teacher image and generates a model; and deployment means that makes the generated model usable under a predetermined environment.
  • FIG. 3 is a functional block diagram showing an example of the functional configuration required for the various processes executed by the information processing device of FIG. 2. FIG. 4 is a diagram showing the flow of processing in the modeling function. FIG. 5 is a diagram showing an outline of the inference library function.
  • FIG. 1 is a flowchart showing an outline of the functions of an embodiment of an information processing apparatus of the present invention. Hereinafter, the processing of the information processing apparatus 1 targets image data; unless otherwise specified, “data” is omitted and the data is simply referred to as an “image”. An “image” is a broad concept that includes both still images and moving images.
  • the functions of the information processing device 1 include a “modeling function” and an “inference library function”.
  • The “modeling function” refers to a function that performs learning, using deep learning or the like, on the image BF serving as the material of teacher data, with data annotated with correct answers and the like used as the teacher data, and thereby generates an image identification/object detection/segmentation model as a file MF in a predetermined format (hereinafter referred to as “model file MF”).
  • the model file MF is a file generated according to a file format used as a model in an inference process described later.
  • The information processing apparatus 1 exercises the modeling function to sequentially execute the image acquisition process (step S1), the annotation process (step S2), the learning process (step S3), and the deployment process (step S4), thereby generating the model file MF.
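The four-step flow above (image acquisition → annotation → learning → deployment) can be sketched as a minimal pipeline. This is an illustrative stand-in, not the patent's actual implementation: the function names (`acquire_images`, `annotate`, `learn`, `deploy`) are invented, and the "model" is a trivial placeholder for a trained network.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Image:
    pixels: bytes
    metadata: dict = field(default_factory=dict)  # annotations are stored here

def acquire_images():                      # step S1: gather raw images BF
    return [Image(pixels=b"\x00" * 16) for _ in range(3)]

def annotate(images, label):               # step S2: attach teacher metadata
    for img in images:
        img.metadata["label"] = label      # e.g. the correct answer for the image
    return images                          # these now play the role of teacher images TF

def learn(teacher_images):                 # step S3: "train" a toy model
    labels = {img.metadata["label"] for img in teacher_images}
    return {"classes": sorted(labels)}     # stand-in for a trained network

def deploy(model, path="model.deep"):      # step S4: serialize to a model file MF
    with open(path, "w") as f:
        json.dump(model, f)
    return path

if __name__ == "__main__":
    imgs = annotate(acquire_images(), "wine_label")
    print(deploy(learn(imgs)))
```

In the patent's terms, the file written by `deploy` corresponds to the model file MF that the inference library later reads back.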
  • the "image acquisition process” refers to a process of acquiring an image BF which is a material of teacher data.
  • the “annotation process” refers to a process of adding information (annotations such as correct answers) used as teacher data as metadata to the acquired image BF.
  • the image BF to which the metadata is added is stored and managed in the teacher DB 401 as an image TF as teacher data (hereinafter referred to as “teacher image TF”).
  • the “learning process” refers to a process of generating or updating a model of image identification/object detection/segmentation by performing learning using a technique such as deep learning using the teacher image TF.
  • The “deployment process” refers to a process of converting the generated model into a model file MF so that it can be used under a predetermined environment.
  • the model file MF generated by exhibiting the modeling function in the information processing device 1 is stored and managed in the model DB 402 described later.
  • The model file MF stored in the model DB 402 is managed for each version; for example, as shown in FIG. 1, each of Ver. (version) 1 to n (n is an integer of 1 or more) is managed.
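The per-version management of model files described above can be illustrated with a small in-memory store. `ModelDB` and its methods are hypothetical names chosen for this sketch; the patent only specifies that model files are kept per version Ver.1 to Ver.n.

```python
class ModelDB:
    """Toy stand-in for the model DB 402: keeps every version Ver.1 to Ver.n."""
    def __init__(self):
        self._versions = []                 # index i holds Ver.(i+1)

    def save(self, model_file):
        self._versions.append(model_file)   # registering a file creates the next version
        return len(self._versions)          # the new version number n

    def get(self, version):
        return self._versions[version - 1]  # Ver. numbers are 1-based

    def latest(self):
        return self._versions[-1]

db = ModelDB()
db.save("wine_label_v1.deep")
db.save("wine_label_v2.deep")
print(db.latest())
```

Keeping every version, rather than overwriting, is what lets a deployment roll back to an earlier model file if a newly trained model regresses.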
  • The “inference library function” is a function of reading the model file MF generated by the modeling function to turn components of a program capable of executing inference processing into a library.
  • the library created by the inference library function is referred to as "deep learning package solution”.
  • As described above, the modeling function and the inference library function of the information processing device 1 can provide a package solution for the learning function that facilitates the promotion and sales activities of those who develop devices having a learning function.
  • According to the present embodiment, it is possible to solve business problems concerning devices having a deep learning function.
  • Specifically, the following problems have existed in the sales of devices having a conventional deep learning function. That is, there is no established package solution for devices having a deep learning function. For this reason, sales activities had to rely on the manpower of sales staff. There is also the problem that it is difficult to explain the basis of the price to the customer.
  • the feature of this embodiment is that "anyone can easily challenge the development of AI (deep learning)".
  • Sufficient knowledge of Linux (registered trademark) and software is required to build, on one's own, an environment in which the deep learning function is exercised. For this reason, a person who lacks such knowledge faces many hurdles when attempting to build such an environment alone.
  • the modeling function includes annotation processing (step S2), learning processing (step S3), and deployment processing (step S4).
  • The inference library function provides program components that read the model file generated through the deployment process of the modeling function and actually perform inference.
  • The following services can be realized. That is, a browser version of "DeepEye Machine Vision" can be provided as educational content for universities; in this case, it can be provided at a low price as an academic version. It can also be provided as an OEM (Original Equipment Manufacturer) product for manufacturers; in this case, the GUI (Graphical User Interface) can be customized for the customer and provided as a service or an option of the product.
  • FIG. 2 is a block diagram showing the hardware configuration of the information processing device 1 of FIG.
  • The information processing device 1 includes a GPU (Graphics Processing Unit) 10, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a bus 14, an input/output interface 15, an output unit 16, an input unit 17, a storage unit 18, a communication unit 19, and a drive 20.
  • the GPU 10 executes routine arithmetic processing according to a program recorded in the ROM 12 or a program loaded from the storage unit 18 into the RAM 13. Specifically, the GPU 10 speeds up deep learning operations by repeatedly executing parallel processing of enormous operations required for learning processing and inference processing. In addition, the GPU 10 performs arithmetic processing required when performing image depiction.
  • the RAM 13 also appropriately stores data and the like necessary for the GPU 10 to execute arithmetic processing.
  • the CPU 11 executes various processes according to a program recorded in the ROM 12 or a program loaded from the storage unit 18 into the RAM 13.
  • the RAM 13 also appropriately stores data and the like necessary for the CPU 11 to execute various processes.
  • the GPU 10, the CPU 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14.
  • An input/output interface 15 is also connected to the bus 14.
  • An output unit 16, an input unit 17, a storage unit 18, a communication unit 19, and a drive 20 are connected to the input/output interface 15.
  • The output unit 16 is composed of a display such as a liquid crystal display, and outputs various types of information.
  • The input unit 17 is composed of various types of input hardware, and inputs various types of information.
  • the storage unit 18 is configured by a DRAM (Dynamic Random Access Memory) or the like, and stores various data.
  • the communication unit 19 controls communication with other devices via the network N including the Internet.
  • the drive 20 is provided as needed.
  • a removable medium 30, which is composed of a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is appropriately mounted on the drive 20.
  • the program read from the removable medium 30 by the drive 20 is installed in the storage unit 18 as needed.
  • the removable medium 30 can also store various data stored in the storage unit 18 in the same manner as the storage unit 18.
  • As an example, the following hardware specifications are shown: the OS is "Ubuntu 16.04 LTS"; a CPU (e.g., the CPU 11 in FIG. 2) and memory (e.g., the RAM 13 in FIG. 2) are provided; the SSD (Solid State Drive) is a 500 GB SATA SSD; the HDD (Hard Disk Drive) is 3 TB (TeraByte); the ODD (Optical Disk Drive) is a DVD Super Multi drive; the power supply is 1000 W; and the GPU (e.g., the GPU 10 in FIG. 2) is a "Geforce RTX 20 Ti".
  • FIG. 3 is a functional block diagram showing an example of a functional configuration required for various processes executed by the information processing device 1 of FIG.
  • the modeling unit 101 functions when the modeling process is executed. Further, when the inference library processing is executed, the inference library unit 102 functions.
  • a teacher DB 401, a model DB 402, and a library DB 403 are provided in one area of the storage unit 18 of the information processing device 1.
  • The “modeling process” means a series of processes executed by the information processing apparatus 1 when the above-described modeling function of FIG. 1 is exercised, that is, a process in which the image acquisition process (for example, step S1 of FIG. 1), the annotation process (for example, step S2 of FIG. 1), the learning process (for example, step S3 of FIG. 1), and the deployment process (for example, step S4 of FIG. 1) are sequentially executed.
  • The "inference library process" refers to a series of processes executed by the information processing device 1 when the above-described inference library function of FIG. 1 is exercised.
  • the modeling unit 101 has an image acquisition unit 111, an annotation unit 112, a learning unit 113, and a deployment unit 114.
  • the image acquisition unit 111 executes an image acquisition process (for example, step S1 in FIG. 1). Specifically, the image acquisition unit 111 acquires the image BF that is the material of the teacher data.
  • the annotation unit 112 executes annotation processing (for example, step S2 in FIG. 1). Specifically, the annotation unit 112 attaches, to the acquired image BF, information used as teacher data (annotations such as the contents of the correct answer) as metadata.
  • the learning unit 113 executes a learning process (eg, step S3 in FIG. 1). Specifically, the learning unit 113 generates or updates a model of image identification/object detection/segmentation by performing learning using a technique such as deep learning using the teacher image TF.
  • The deploy unit 114 executes the deploy process (e.g., step S4 in FIG. 1). Specifically, the deploy unit 114 converts the generated model into the model file MF so that it can be used in a predetermined environment.
  • the inference library unit 102 executes inference library processing.
  • the library forming unit 121 reads the model file MF generated by the above-described modeling function to make a library of parts that constitute a program that enables inference processing.
  • a deep learning package solution is generated by the information processing device 1 having the above functional configuration executing the modeling process and the inference library process described above.
  • a user's own system having a deep learning function can be easily constructed by simply installing the deep learning package solution in the application program developed by the customer.
  • FIG. 4 is a diagram showing the flow of processing in the modeling function.
  • In the modeling function, acquisition of an image BF, annotation (for example, the above-mentioned annotation processing), learning (for example, the above-mentioned learning processing) based on teacher data (for example, the above-mentioned teacher image TF) generated by the annotation, and deployment of the model generated by the learning (for example, the above-mentioned deploy process of generating the model file MF) are performed.
  • the learning result can be output as a report.
  • The information processing apparatus 1 exercises the modeling function and, as the image acquisition process (step S1), acquires an image BF of a label attached to a wine bottle (hereinafter referred to as a “wine label”), for example.
  • As the annotation process (step S2), the information processing device 1 adds metadata (for example, information on the wine specified by the wine label, specifically, the brand, the place of origin, the year of production, and the like) to the image BF, thereby generating the teacher image TF.
  • As the learning process (step S3), the information processing device 1 performs learning using the teacher image TF to generate a model that infers, from an image including a wine label as a subject, the wine brand and the like indicated by the wine label.
  • As the deploy process (step S4), the information processing apparatus 1 generates a “.DEEP file” for the wine label as the model file MF of the model.
  • FIG. 5 is a diagram showing an outline of the inference library function.
  • the user can install the “DeepEye Predictor” (the deep learning package solution described above) in the application program developed by the user.
  • This allows the user to build a system using deep learning. It can also be compatible with OSs (Operating Systems) such as Windows (registered trademark) and Ubuntu.
  • DeepEye Predictor is provided as a library that supports various languages such as C++, C#, and Python. For example, it is provided in DLL (Dynamic Link Library) or the like.
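The patent does not disclose the actual API of "DeepEye Predictor", so the following is a hypothetical Python-flavored sketch of the calling pattern such an inference library implies: the library loads a model file and exposes a predict entry point to the host application. The `Predictor` class and the JSON ".DEEP" stand-in are inventions for illustration only.

```python
import json

class Predictor:
    """Hypothetical sketch of an inference-library entry point that loads a
    model file (here: a toy JSON stand-in for a '.DEEP' file) and exposes
    a predict() call to the host application."""
    def __init__(self, model_file):
        with open(model_file) as f:
            self._model = json.load(f)

    def predict(self, image_bytes):
        # A real library would run the network on the image; this stand-in
        # returns the first known class just to show the calling pattern.
        return self._model["classes"][0]

# Host-application side: the library is consumed like any other package.
with open("wine.deep", "w") as f:
    json.dump({"classes": ["chateau_example"]}, f)

p = Predictor("wine.deep")
print(p.predict(b"raw image bytes"))
```

The point of the pattern is that the host application never touches the training environment; it only links against the library and hands it a model file.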
  • FIG. 6 is a diagram showing a method of issuing “DeepEye Predictor”.
  • “DeepEye Predictor” is license-managed by the information processing device 1 functioning as a license management server. Therefore, when the library is created from the created model (.DEEP file), “DeepEye Predictor” is issued from the license management information processing apparatus (information processing apparatus 1) via the Internet or the like.
  • FIG. 7 is a diagram showing “DeepEye Predictor” after issuance.
  • The "wine brand application" is an application program that enables extraction of information such as the brand of a wine from an image obtained by capturing a wine label.
  • A system using deep learning can be provided by installing "DeepEye Predictor" (the deep learning package solution) in the wine brand application. Then, by reading the ".DEEP file" (model file MF) of the wine label and turning it into a library, the customer can develop a wine brand application that can be used standalone.
  • According to the present embodiment, an AI environment can be realized without the user being aware of the deep learning library (environment construction). That is, in conventional deep learning development, it has been essential for an engineer to build the environment, which is a burden.
  • In the deep learning according to the present embodiment, since a deep learning tool integrated with hardware is used, the trouble of constructing an environment is unnecessary. In other words, once the deep learning tool is obtained, development of deep learning suited to one's purpose can begin immediately. As a result, the convenience of users who develop deep learning can be improved.
  • the processes from annotation, learning, test, and deployment can be performed in a series of flows.
  • Since annotation, learning, and testing can be performed in a single flow in deep learning development, a significant time-saving effect can be expected compared to deep learning development processes using conventional methods.
  • the created inference model can be executed from another application.
  • The deep learning function can be easily incorporated via the above-mentioned "DeepEye Predictor" to which the present embodiment is applied. This makes it possible to add a deep learning function to an existing application program at low cost. In addition, the fact that the user can perform annotation for existing application programs in-house means that deep learning can be used without leaking confidential data owned by a company to the outside.
  • A person who does not have knowledge of AI development can try deep learning on a GUI basis. Specifically, it becomes possible to set hyperparameters, select a network, and display the visualization function (Grad-CAM). Image classification annotation can also be performed easily by drag-and-drop operations. That is, various networks can be evaluated on a GUI basis simply by adjusting parameters. As described above, according to the present embodiment, even a person without knowledge of AI development need only select from given options, which is convenient for the user and can be expected to greatly expand the user base.
  • the wine label has been described as the subject of the image, but this is only an example, and any object can be the subject.
  • the hardware configuration shown in FIG. 2 is merely an example for achieving the object of the present invention, and is not particularly limited.
  • The functional block diagram shown in FIG. 3 is merely an example and is not particularly limiting. That is, it is sufficient if the information processing system has a function capable of executing the above-described series of processes as a whole, and what kind of functional blocks are used to realize this function is not particularly limited to the example of FIG. 3.
  • the location of the functional block is not limited to that shown in FIG. 3 and may be arbitrary. Further, one functional block may be configured by hardware alone, software alone, or a combination thereof.
  • the program forming the software is installed in a computer or the like from a network or a recording medium.
  • the computer may be a computer embedded in dedicated hardware.
  • the computer may be a computer capable of executing various functions by installing various programs, for example, a general-purpose smartphone or a personal computer other than an information processing device.
  • The recording medium containing such a program is constituted not only by a removable medium distributed separately from the apparatus main body in order to provide the program to each user, but also by a recording medium provided to each user in a state of being pre-installed in the apparatus main body.
  • The steps describing the program recorded on the recording medium include not only processing performed in time series according to the described order, but also processing that is executed in parallel or individually and not necessarily in time series.
  • The term "system" means an entire apparatus including a plurality of devices, a plurality of means, and the like.
  • The information processing apparatus to which the present invention is applied may take various embodiments as long as it has the following configuration. That is, the information processing apparatus (for example, the information processing apparatus 1) to which the present invention is applied comprises: annotation means (for example, the annotation unit 112 in FIG. 3) that adds, to an image (for example, the image BF in FIG. 1), predetermined information (for example, metadata) that can serve as an annotation of the image; learning means (for example, the learning unit 113 in FIG. 3) that performs learning using the image to which the predetermined information has been added as a teacher image (for example, the teacher image TF) and generates a model; and deployment means (for example, the deployment unit 114 in FIG. 3) that puts the generated model into a usable state (for example, a model file MF) under a predetermined environment (for example, a system constructed by a user).
  • The apparatus may further comprise library means (for example, the library forming unit 121) that reads the model and turns components of a program that executes inference processing into a library.
  • The user can install the deep learning package solution in an application program developed by the user, so that a system using deep learning can be easily constructed.
  • Non-browser applications can perform image classification, object detection, and segmentation tasks using deep learning. Not only the learning function but also an annotation function for each task is included.
  • Data Augmentation can be set, and a data replication function is provided. The verification results of deep learning can be managed in project units, like Visual Studio. Data, trained data, architectures, and models are managed consistently. Annotation is possible with simple operations such as drag and drop (from Explorer to the app). Deployment is possible not only on a PC but also on various devices such as Jetson, FPGA, iPhone (registered trademark), MOVIDIUS, and ARM. The status of the current work can be easily grasped: the status of annotation > learning > inference > deployment can be visualized.
  • The deep learning model visualization function (Grad-CAM) is packaged. By visualizing the weight values and intermediate calculation results of each layer, it is possible to feed the results back into the learning data and to refine the learning data. Classification results can be visually displayed as a confusion matrix.
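The confusion-matrix display mentioned above can be sketched in a few lines of plain Python. The label names below are illustrative only; the convention of rows for true classes and columns for predicted classes is the common one, not something the patent specifies.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

labels = ["brand_a", "brand_b"]
y_true = ["brand_a", "brand_a", "brand_b", "brand_b"]
y_pred = ["brand_a", "brand_b", "brand_b", "brand_b"]
m = confusion_matrix(y_true, y_pred, labels)
# row 0 counts the true brand_a samples; off-diagonal cells are the
# misclassifications that the GUI would highlight for the user
print(m)
```

Reading the off-diagonal cells tells the user which classes the model confuses, which is exactly the feedback loop into the learning data that the visualization function is described as enabling.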

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Stored Programmes (AREA)
  • Image Analysis (AREA)

Abstract

The present invention addresses the problem of improving the efficiency of business relating to devices and the like having a learning function. An image acquiring unit 111 executes an image acquisition process. An annotation unit 112 assigns, to an image BF, prescribed information that may serve as an annotation for the image BF. A learning unit 113 generates a model by performing learning using the image BF as a teacher image TF. A deployment unit 114 enables the generated model to be used in a prescribed environment. An inference library unit 102 executes an inference library process. A model file MF is generated as a usable state. The abovementioned problem is thus resolved.

Description

[Invention name determined by the ISA based on Rule 37.2] Service construction device, service construction method, and service construction program
 The present invention relates to an information processing device.
 Devices having a learning function using technologies such as deep learning have existed in the past (see Patent Document 1).
[Patent Document 1] JP 2018-206262 A
 However, a conventional apparatus having a learning function is not versatile because it is manufactured as a dedicated machine that performs only specific learning.
 In other words, there is no package solution for learning functions that use technologies such as deep learning. For this reason, when a person who undertakes the development and provision of devices having various learning functions presents an estimate to a customer, it is often difficult to explain the basis of the calculation.
 The present invention has been made in view of such a situation, and an object of the present invention is to improve the efficiency of sales of devices and the like having a learning function.
 In order to achieve the above object, an information processing device of one aspect of the present invention comprises:
 annotation means that adds, to an image, predetermined information that can serve as an annotation of the image;
 learning means that performs learning using the image provided with the predetermined information as a teacher image and generates a model; and
 deployment means that makes the generated model usable under a predetermined environment.
 According to the present invention, it is possible to improve the efficiency of sales of devices and the like having a learning function.
 FIG. 1 is a flow diagram showing an outline of the functions of an information processing device according to an embodiment of the information processing device of the present invention. FIG. 2 is a block diagram showing the hardware configuration of the information processing device of FIG. 1. FIG. 3 is a functional block diagram showing an example of the functional configuration required for the various processes executed by the information processing device of FIG. 2. FIG. 4 is a diagram showing the flow of processing in the modeling function. FIG. 5 is a diagram showing an outline of the inference library function. FIG. 6 is a diagram showing a method of issuing "DeepEye Predictor". FIG. 7 is a diagram showing "DeepEye Predictor" after issuance.
 Embodiments of the present invention will be described below with reference to the drawings.
 FIG. 1 is a flowchart showing an outline of the functions of an embodiment of an information processing apparatus of the present invention.
 Hereinafter, the processing of the information processing apparatus 1 targets image data; unless otherwise specified, "data" is omitted and the data is simply referred to as an "image".
 An "image" is a broad concept that includes both still images and moving images.
 As shown in FIG. 1, the functions of the information processing device 1 include a "modeling function" and an "inference library function".
 Among the functions of the information processing apparatus 1, the "modeling function" refers to a function that performs learning, using deep learning or the like, on the image BF serving as the material of teacher data, with data annotated with correct answers and the like used as the teacher data, and thereby generates an image identification/object detection/segmentation model as a file MF in a predetermined format (hereinafter referred to as "model file MF").
 Here, the model file MF is a file generated according to a file format used as a model in the inference process described later.
 Specifically, for example, the information processing apparatus 1 exercises the modeling function to sequentially execute the image acquisition process (step S1), the annotation process (step S2), the learning process (step S3), and the deployment process (step S4), thereby generating the model file MF.
 Here, the "image acquisition process" refers to a process of acquiring an image BF serving as the material of teacher data.
 The "annotation process" refers to a process of adding information used as teacher data (annotations such as the content of the correct answer) to the acquired image BF as metadata. The image BF to which the metadata has been added is stored and managed in the teacher DB 401 as an image TF serving as teacher data (hereinafter referred to as "teacher image TF").
 The "learning process" refers to a process of generating or updating an image identification/object detection/segmentation model by performing learning using a technique such as deep learning with the teacher image TF.
 The "deployment process" refers to a process of converting the generated model into a model file MF so that it can be used under a predetermined environment.
 The model file MF generated by exercising the modeling function in the information processing device 1 is stored and managed in the model DB 402 described later. The model file MF stored in the model DB 402 is managed for each version; for example, as shown in FIG. 1, each of Ver. (version) 1 to n (n is an integer of 1 or more) is managed.
Among the functions of the information processing device 1, the "inference library function" is a function that, by reading the model file MF generated by the modeling function, packages as a library the components of a program capable of executing inference processing. Hereinafter, what is packaged by the inference library function is referred to as the "deep learning package solution".
By doing so, a unique system with deep learning functionality can easily be built simply by incorporating the deep learning package solution into an application program developed by a user (not shown).
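From the application's point of view, using such a library amounts to loading a model file MF and calling a predict entry point. The sketch below is a hypothetical illustration of that shape; the class name, the model-file structure, and the stand-in inference logic are all assumptions, not the documented library interface.

```python
# Hypothetical sketch of how an application could use an inference
# library built around a model file MF (not the actual DeepEye API).

class InferenceLibrary:
    """Loads a model file MF and exposes a predict() entry point."""

    def __init__(self, model_file):
        # A real library would deserialize network weights here;
        # in this sketch the "model" is just a list of known classes.
        self.classes = model_file["model"]["classes"]

    def predict(self, image):
        # Stand-in inference: derive an index from the input size.
        # A real implementation would run a forward pass on the image.
        return self.classes[len(image) % len(self.classes)]

model_file_mf = {"version": 1, "model": {"classes": ["cat", "dog"]}}
lib = InferenceLibrary(model_file_mf)
result = lib.predict("some_image_bytes")
```

The point of the sketch is the division of labor: the application supplies the image, and everything model-specific is hidden behind the loaded model file MF.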
As described above, the modeling function and the inference library function of the information processing device 1 make it possible to provide a package solution for a learning function that facilitates promotion and sales by those who develop devices having learning functions.
That is, integrated software that creates models for image classification, object detection, and segmentation using deep learning (for example, the deep learning package solution described above) can be developed and sold.
The software can also be sold as a set with hardware rather than as standalone software, and the deep learning package solution can be shipped with the software environment already built.
This allows the user to start using the system as soon as the hardware is installed. It also improves the efficiency of deep learning sales, and enables expansion into contract development and into a license business.
Further, according to the present embodiment, the sales problems of devices having deep learning functions can be solved. Conventional devices with deep learning functions have had the following sales problems: there is no standard package solution, so sales inevitably depend on the manpower of sales staff, and it is difficult to explain the basis of the price to customers.
Against these problems, the present embodiment makes the following possible: (1) advertising and selling the package becomes easier; (2) a deal can be completed with the package sale alone; and (3) if requested by the customer, the sale can lead to the conclusion of a consulting contract or the like.
In other words, the feature of this embodiment is that "anyone can easily take on the development of AI (deep learning)".
Here, an environment for exercising deep learning functions can also be built using only conventional technology. For example, by preparing one's own personal computer and installing some open source software on it, one can build a deep learning environment oneself.
However, building such an environment oneself requires sufficient knowledge of Linux (registered trademark) and software. A person without such knowledge therefore faces many hurdles when trying to build a deep learning environment on their own. Even if the environment can be built, the software for data management, annotation, learning, and inference is generally different for each task, and data management and management of trained models must also be done oneself. Furthermore, since much open source software is designed on the assumption that it will be used by system engineers, the user must not only understand technical terms but also configure many complicated parameters. "DeepEye", the product to which this embodiment is applied, is a product that removes many of these hurdles.
As described above, the modeling function includes the annotation process (step S2), the learning process (step S3), and the deployment process (step S4). The inference library function provides a program component that reads the model file generated by the deployment process of the modeling function and actually performs inference.
Further, according to this embodiment, for example, the following services can be realized. A browser version of "DeepEye Machine Vision" can be provided as educational content for universities; in this case, it can also be offered inexpensively as an Academic edition. It can also be provided as an OEM (Original Equipment Manufacturer) product for manufacturers; in this case, the GUI (Graphical User Interface) can be customized for the customer and offered as a service or option of the customer's product.
Next, the hardware configuration of the information processing device 1, which executes the various processes for exercising the modeling function and the inference library function, will be described.
FIG. 2 is a block diagram showing the hardware configuration of the information processing device 1 of FIG. 1.
The information processing device 1 includes a GPU (Graphics Processing Unit) 10, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a bus 14, an input/output interface 15, an output unit 16, an input unit 17, a storage unit 18, a communication unit 19, and a drive 20.
The GPU 10 executes routine arithmetic processing according to a program recorded in the ROM 12 or loaded from the storage unit 18 into the RAM 13. Specifically, the GPU 10 accelerates deep learning computation by repeatedly executing, in parallel, the enormous number of operations required for the learning process and the inference process. The GPU 10 also performs the arithmetic processing required for image rendering.
The RAM 13 also stores, as appropriate, data and the like needed by the GPU 10 to execute arithmetic processing.
The CPU 11 executes various processes according to a program recorded in the ROM 12 or loaded from the storage unit 18 into the RAM 13.
The RAM 13 also stores, as appropriate, data and the like needed by the CPU 11 to execute these processes.
The GPU 10, CPU 11, ROM 12, and RAM 13 are connected to one another via a bus 14. The input/output interface 15 is also connected to this bus 14. The output unit 16, the input unit 17, the storage unit 18, the communication unit 19, and the drive 20 are connected to the input/output interface 15.
The output unit 16 is composed of various liquid crystal displays or the like, and outputs various information.
The input unit 17 is composed of various hardware buttons or the like, and inputs various information.
The storage unit 18 is composed of DRAM (Dynamic Random Access Memory) or the like, and stores various data.
The communication unit 19 controls communication with other devices via a network N including the Internet.
The drive 20 is provided as needed. A removable medium 30 composed of a magnetic disk, optical disk, magneto-optical disk, semiconductor memory, or the like is mounted on the drive 20 as appropriate. A program read from the removable medium 30 by the drive 20 is installed in the storage unit 18 as needed. The removable medium 30 can also store the various data stored in the storage unit 18, in the same manner as the storage unit 18.
Specifically, for example, the following hardware configuration can be used: the OS is "Ubuntu 16.04 LTS", the CPU (for example, the CPU 11 in FIG. 2) is a "Core i7-8700K", the memory (for example, the RAM 13 in FIG. 2) has a capacity of 32 GB, the SSD (Solid State Drive) is a 500 GB SATA SSD, the HDD (Hard Disk Drive) is 3 TB (TeraByte), the ODD (Optical Disk Drive) is a "DVD Super Multi" drive, the power supply is 1000 W, and the GPU (for example, the GPU 10 in FIG. 2) is a "GeForce RTX 2080Ti".
Next, the functional configuration of the information processing device 1 having the hardware configuration of FIG. 2 will be described.
FIG. 3 is a functional block diagram showing an example of the functional configuration required for the various processes executed by the information processing device 1 of FIG. 2.
As shown in FIG. 3, in the GPU 10 (and, although not shown, the CPU 11) of the information processing device 1, the modeling unit 101 functions when the modeling process is executed, and the inference library unit 102 functions when the inference library process is executed.
A teacher DB 401, a model DB 402, and a library DB 403 are provided in one area of the storage unit 18 of the information processing device 1.
Here, the "modeling process" refers to the series of processes executed by the information processing device 1 when the modeling function of FIG. 1 described above is exercised, that is, a process in which the image acquisition process (for example, step S1 in FIG. 1), the annotation process (for example, step S2 in FIG. 1), the learning process (for example, step S3 in FIG. 1), and the deployment process (for example, step S4 in FIG. 1) are executed in sequence.
The "inference library process" refers to the series of processes executed by the information processing device 1 when the inference library function of FIG. 1 described above is exercised.
The modeling unit 101 has an image acquisition unit 111, an annotation unit 112, a learning unit 113, and a deployment unit 114.
The image acquisition unit 111 executes the image acquisition process (for example, step S1 in FIG. 1). Specifically, the image acquisition unit 111 acquires the image BF that serves as material for the teacher data.
The annotation unit 112 executes the annotation process (for example, step S2 in FIG. 1). Specifically, the annotation unit 112 adds information used as teacher data (annotations such as correct answers) to the acquired image BF as metadata.
The learning unit 113 executes the learning process (for example, step S3 in FIG. 1). Specifically, the learning unit 113 generates or updates a model for image classification, object detection, or segmentation by performing learning with techniques such as deep learning using the teacher image TF.
The deployment unit 114 executes the deployment process (for example, step S4 in FIG. 1). Specifically, the deployment unit 114 converts the generated model into a model file MF so that it can be used in a predetermined environment.
The inference library unit 102 executes the inference library process.
That is, the library forming unit 121 reads the model file MF generated by the modeling function described above and packages as a library the components of a program that makes inference processing executable.
A deep learning package solution is generated when the information processing device 1 having the above functional configuration executes the modeling process and the inference library process described above. As a result, simply by incorporating the deep learning package solution into an application program developed by the customer, a user's own system with deep learning functionality can easily be built.
FIG. 4 is a diagram showing the flow of processing in the modeling function.
As shown in FIG. 4, the modeling function performs acquisition of the image BF, annotation (for example, the annotation process described above), learning based on the teacher data generated by the annotation (for example, the learning process using the teacher image TF described above), and deployment of the model generated by the learning (for example, the deployment process that generates the model file MF described above). The learning result can also be output as a report.
Specifically, for example, in the image acquisition process (step S1), the information processing device 1 exercises the modeling function to acquire an image BF of a label attached to a wine bottle (hereinafter referred to as a "wine label").
In the annotation process (step S2), the information processing device 1 generates the teacher image TF by adding metadata to the image BF (for example, information on the wine identified by the wine label, such as the brand, region of origin, and year of manufacture).
In the learning process (step S3), the information processing device 1 performs learning using the teacher image TF to generate a model that infers, from a wine label (an image containing it as a subject), the brand and other attributes of the wine indicated by that label.
In the deployment process (step S4), the information processing device 1 generates a wine label ".DEEP file" as the model file MF for that model.
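For illustration, the metadata attached in the annotation step for the wine-label example could be structured as below. The field names and values are assumptions made for this sketch, not a documented format of the ".DEEP" pipeline.

```python
# Illustrative structure of a teacher image TF for the wine-label example.
# Field names and values are assumptions, not a documented format.

image_bf = {"file": "label_001.png"}       # raw image BF (step S1)

wine_metadata = {                          # annotation metadata (step S2)
    "brand": "Example Winery",             # brand identified by the label
    "origin": "Bordeaux",                  # region of origin
    "vintage": 2015,                       # year of manufacture
}

teacher_image_tf = {**image_bf, "metadata": wine_metadata}

# After learning (step S3), deployment (step S4) would emit the model
# as a wine-label ".DEEP" model file MF, e.g.:
model_file_name = "wine_label_v1.DEEP"
```

A model trained on many such teacher images TF can then map a new wine-label photograph back to the brand, origin, and vintage fields.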
FIG. 5 is a diagram showing an overview of the inference library function.
As shown in FIG. 5, with the inference library function, the user can incorporate "DeepEye Predictor" (the deep learning package solution described above) into an application program developed by the user, and can thereby build a system using deep learning. It can also support OSs (Operating Systems) such as Windows (registered trademark) and Ubuntu.
"DeepEye Predictor" is provided as a library supporting various languages such as C++, C#, and Python, for example as a DLL (Dynamic Link Library) or the like.
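A library shipped as a DLL or shared object would typically be consumed from Python via ctypes. The sketch below is purely hypothetical: the library file name and the exported symbol `predict_image` are assumptions for illustration, not the documented "DeepEye Predictor" interface.

```python
# Hypothetical sketch of calling a predictor shipped as a native library
# (DLL / shared object) from Python via ctypes. The library file name and
# the exported symbol "predict_image" are assumptions, not documented API.
import ctypes
import os

def load_predictor(path="deepeye_predictor.dll"):
    """Load the native predictor library if it is present on disk."""
    if not os.path.exists(path):
        return None  # library not installed in this environment
    lib = ctypes.CDLL(path)
    # Declare the assumed signature: int predict_image(const char *path)
    lib.predict_image.argtypes = [ctypes.c_char_p]
    lib.predict_image.restype = ctypes.c_int
    return lib

predictor = load_predictor()
if predictor is not None:
    class_id = predictor.predict_image(b"wine_label.png")
```

A C# or C++ application would bind to the same exported functions through P/Invoke or a header file, which is what makes a single DLL usable from several languages.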
FIG. 6 is a diagram showing how "DeepEye Predictor" is issued.
As shown in FIG. 6, licenses for "DeepEye Predictor" are managed by the information processing device 1 functioning as a license management server. For this reason, when a library is created from a completed model (.DEEP file), "DeepEye Predictor" is issued from the license management information processing device (information processing device 1) via the Internet or the like.
FIG. 7 is a diagram showing "DeepEye Predictor" after issuance.
As shown in FIG. 7, once packaged as a library it can be used standalone, and can therefore be incorporated into various devices as part of an application program.
Specifically, suppose, for example, that a user has developed an application program that extracts information such as the wine brand from a captured image of a wine label (hereinafter referred to as the "wine brand app").
In this case, incorporating "DeepEye Predictor" (the deep learning package solution) into the wine brand app turns it into a system using deep learning.
Then, by reading the wine label ".DEEP file" (model file MF) and packaging it as a library, the customer can develop a wine brand app that can be used standalone.
To summarize the above, the following effects can be expected from this embodiment.
That is, according to this embodiment, the user can realize an AI environment without being aware of the deep learning libraries (environment construction). In conventional deep learning development, it has been essential for an engineer to build the environment, which is laborious. In contrast, deep learning according to this embodiment uses a deep learning tool with integrated hardware, so the labor of environment construction becomes unnecessary: once the deep learning tool is obtained, deep learning development suited to the purpose becomes immediately possible. This improves convenience for users who develop deep learning.
Further, for example, according to this embodiment, the processes from annotation through learning, testing, and deployment can be performed in a single flow. Being able to carry out annotation, learning, and testing as one continuous flow can be expected to save considerable time compared to deep learning development processes using conventional methods, and also improves efficiency in that there is no need to master complicated procedures and know-how.
Further, for example, according to this embodiment, the created inference model can be executed from other applications. Specifically, deep learning functionality can easily be incorporated via the above "DeepEye Predictor" to which this embodiment is applied. This makes it possible to add deep learning functionality to existing application programs at low cost. Moreover, the fact that users can carry out everything from annotation to incorporation into existing application programs themselves means that companies can utilize deep learning without leaking confidential data to the outside.
Further, for example, according to this embodiment, even a person with no knowledge of AI development can try deep learning on a GUI basis. Specifically, hyperparameter setting, network selection, and display of a visualization function (Gradcam) become possible, and image classification annotation can easily be performed with drag-and-drop operations. That is, various networks can be evaluated on a GUI basis simply by adjusting parameters. Thus, according to this embodiment, even a person with no knowledge of AI development need only choose from the given options, which promises large benefits in user convenience and in broadening the user base.
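The "choose from given options" style of GUI configuration amounts to validating the user's selections against fixed choice lists and parameter ranges. The sketch below illustrates that idea; the network names and the parameter ranges are illustrative assumptions, not DeepEye's actual option set.

```python
# Sketch of GUI-style configuration: the user only picks from given
# options. Network names and parameter ranges are illustrative only.

NETWORKS = ["ResNet50", "VGG16", "MobileNetV2"]  # selectable networks

def validate_config(network, learning_rate, epochs):
    """Check a selection the way a GUI form would, returning a config."""
    if network not in NETWORKS:
        raise ValueError(f"unknown network: {network}")
    if not (0.0 < learning_rate <= 1.0):
        raise ValueError("learning rate out of range")
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return {"network": network, "lr": learning_rate, "epochs": epochs}

config = validate_config("ResNet50", learning_rate=0.001, epochs=30)
```

Because invalid combinations are rejected before training starts, a user without AI expertise cannot produce an inconsistent setup, which is the usability property the text describes.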
Further, for example, according to this embodiment, deployment to a plurality of edge devices becomes possible. The same application program as above can also be used from a browser, and multi-platform support (Windows/Ubuntu) becomes possible.
Although one embodiment of the present invention has been described above, the present invention is not limited to the above embodiment, and modifications, improvements, and the like within a scope that can achieve the object of the present invention are included in the present invention.
For example, in the above embodiment the wine label was described as the subject of the images, but this is merely an example, and any object can be the subject.
The hardware configuration shown in FIG. 2 is also merely an example for achieving the object of the present invention, and is not particularly limited.
The functional block diagram shown in FIG. 3 is likewise merely an example and is not particularly limited. That is, it suffices that the information processing system has functions capable of executing the above series of processes as a whole, and which functional blocks are used to realize these functions is not limited to the example of FIG. 3.
The locations of the functional blocks are also not limited to those in FIG. 3 and may be arbitrary.
One functional block may be configured by hardware alone, by software alone, or by a combination of the two.
When the processing of each functional block is executed by software, the programs constituting that software are installed on a computer or the like from a network or a recording medium.
The computer may be a computer embedded in dedicated hardware, or it may be a computer capable of executing various functions by installing various programs, for example a general-purpose smartphone or personal computer in addition to the information processing device.
A recording medium containing such programs is constituted not only by removable media distributed separately from the device body in order to provide the programs to each user, but also by recording media and the like provided to each user in a state pre-installed in the device body.
In this specification, the steps describing the programs recorded on a recording medium include not only processing performed chronologically in the stated order, but also processing executed in parallel or individually, not necessarily chronologically.
Further, in this specification, the term "system" means an overall apparatus composed of a plurality of devices, a plurality of means, and the like.
In summary, an information processing device to which the present invention is applied need only have the following configuration, and can take various embodiments.
That is, an information processing device to which the present invention is applied (for example, the information processing device 1) includes:
an annotation means (for example, the annotation unit 112 in FIG. 3) that adds, to an image (for example, the image BF in FIG. 1), predetermined information (for example, metadata) that can serve as an annotation of the image;
a learning means (for example, the learning unit 113 in FIG. 3) that performs learning using the image to which the predetermined information has been added as a teacher image (for example, the teacher image TF), and generates a model; and
a deployment means (for example, the deployment unit 114 in FIG. 3) that puts the generated model into a state (for example, the model file MF) usable under a predetermined environment (for example, a system built by the user).
This makes it possible to provide a deep learning package solution that facilitates promotion and sales by those who develop devices having learning functions.
A library means (for example, the library forming unit 121) that packages, as a library, the components of a program that executes inference processing by reading the model can further be provided.
This allows the user to incorporate the deep learning package solution into an application program developed by the user, so a system using deep learning can easily be built.
In addition, the following features are provided.
Image classification, object detection, and segmentation tasks using deep learning can be executed in non-browser applications.
Each task has an annotation function in addition to learning; that is, the annotation function itself is included in the package.
Data augmentation can be configured, and a data replication function is provided.
Deep learning verification results and the like can be managed in project units, as in Visual Studio; data, trained data, network architectures, and models can be managed consistently.
Annotation is possible with simple operations such as drag and drop (from Explorer to the app).
Deployment is possible not only to PCs but also to a variety of devices such as Jetson, FPGA, iPhone (registered trademark), MOVIDIUS, and ARM.
The status of the work currently in progress can easily be checked; states such as annotation > learning > inference > deployment can be visualized.
A visualization function for deep learning models (Gradcam) is included in the package.
By visualizing the weight values and intermediate computation results in each layer, feedback can be given to the learning data, making it possible to refine the learning data.
Classification results can be displayed visually as a confusion matrix.
 1: information processing device, 10: GPU, 11: CPU, 12: ROM, 13: RAM, 14: bus, 15: input/output interface, 16: output unit, 17: input unit, 18: storage unit, 19: communication unit, 20: drive, 30: removable medium, 101: modeling unit, 102: inference library unit, 111: image acquisition unit, 112: annotation unit, 113: learning unit, 114: deployment unit, 121: library forming unit, 401: teacher DB, 402: model DB, BF: image, TF: teacher image, MF: model file

Claims (2)

  1.  An information processing apparatus comprising:
      annotation means for attaching, to an image, predetermined information that can serve as an annotation of the image;
      learning means for generating a model by performing learning using, as a teacher image, the image to which the predetermined information has been attached; and
      deployment means for making the generated model usable under a predetermined environment.
  2.  The information processing apparatus according to claim 1, further comprising library means for converting, into a library, components of a program that executes inference processing by reading the model.
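The claimed structure — annotation means, learning means, deployment means, and the optional library means of claim 2 — can be read as cooperating components. The skeleton below is purely illustrative; every class, method, and return value is an assumption for explanation, not disclosed by the patent:

```python
# Hypothetical skeleton of the claimed apparatus; all names illustrative.

class Annotator:
    def annotate(self, image, info):
        """Attach predetermined information (an annotation) to an image."""
        return {"image": image, "annotation": info}

class Learner:
    def train(self, teacher_images):
        """Generate a model from the annotated (teacher) images."""
        return {"model": "trained", "n_samples": len(teacher_images)}

class Deployer:
    def deploy(self, model, environment):
        """Make the generated model usable under a given environment."""
        return {"model": model, "env": environment, "usable": True}

class LibraryBuilder:
    def build(self, model):
        """Claim 2: package inference components that read the model."""
        return {"library": True, "loads": model}

annot = Annotator()
teacher = [annot.annotate("img0.png", "defect"),
           annot.annotate("img1.png", "ok")]
model = Learner().train(teacher)
deployed = Deployer().deploy(model, "Jetson")
```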
PCT/JP2020/003915 2019-02-01 2020-02-03 Service building device, service building method, and service building program WO2020158954A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020539874A JPWO2020158954A1 (en) 2019-02-01 2020-02-03 Service construction device, service construction method and service construction program
JP2021198561A JP2022033153A (en) 2019-02-01 2021-12-07 Information processing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-017194 2019-02-01
JP2019017194 2019-02-01

Publications (1)

Publication Number Publication Date
WO2020158954A1 true WO2020158954A1 (en) 2020-08-06

Family

ID=71841049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/003915 WO2020158954A1 (en) 2019-02-01 2020-02-03 Service building device, service building method, and service building program

Country Status (2)

Country Link
JP (2) JPWO2020158954A1 (en)
WO (1) WO2020158954A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018097671A (en) * 2016-12-14 2018-06-21 株式会社グルーヴノーツ Service construction device, service construction method, and service construction program
WO2019003485A1 (en) * 2017-06-30 2019-01-03 株式会社Abeja Computer system and method for machine learning or inference

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008191889A (en) * 2007-02-05 2008-08-21 Hitachi Software Eng Co Ltd Program license check system
JP2011170465A (en) * 2010-02-16 2011-09-01 Ricoh Co Ltd System, method, and program for software distribution
JP5821720B2 (en) * 2012-03-14 2015-11-24 株式会社ソシオネクスト Startup control program and startup control method
JP2014016659A (en) * 2012-07-05 2014-01-30 Nec Soft Ltd Electronic authentication system, terminal, server, and electronic authentication program
WO2018142766A1 (en) * 2017-02-03 2018-08-09 パナソニックIpマネジメント株式会社 Learned model provision method and learned model provision device
WO2018173121A1 (en) * 2017-03-21 2018-09-27 株式会社Preferred Networks Server device, trained model providing program, trained model providing method, and trained model providing system
WO2018216379A1 (en) * 2017-05-26 2018-11-29 株式会社日立国際電気 Machine learning model illicitness sensing system and illicitness sensing method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018097671A (en) * 2016-12-14 2018-06-21 株式会社グルーヴノーツ Service construction device, service construction method, and service construction program
WO2019003485A1 (en) * 2017-06-30 2019-01-03 株式会社Abeja Computer system and method for machine learning or inference

Also Published As

Publication number Publication date
JP2022033153A (en) 2022-02-28
JPWO2020158954A1 (en) 2021-02-18

Similar Documents

Publication Publication Date Title
Davenport From analytics to artificial intelligence
US20240281364A1 (en) Test case reduction for code regression testing
US10255081B2 (en) Method and system for intelligent cloud planning and decommissioning
KR102443654B1 (en) Automatically create machine learning models for software tools that operate on source code
CN107220172B (en) Method and system for automated User Interface (UI) testing via model-driven techniques
WO2021109928A1 (en) Creation method, usage method and apparatus for machine learning scheme template
US10929490B2 (en) Network search query
Silva et al. Using vistrails and provenance for teaching scientific visualization
US20050228767A1 (en) Method, system and program product for developing a data model in a data mining system
Turetken et al. The effect of modularity representation and presentation medium on the understandability of business process models in BPMN
US11080655B2 (en) Machine learning technical support selection
CN106095767A (en) Automatically generate the method and system of user's form interface
US9922136B2 (en) Non-intrusive, semantics-driven impact analysis for business applications
US10572247B2 (en) Prototype management system
EP3819760A1 (en) Methods and apparatus for generating a platform-agnostic mobile application configuration data structure with a dynamic quiz
Ihrig A new research architecture for the simulation era
JP2020505710A (en) Methods and systems for verifying software programs
Ahmed-Nacer et al. Model-Driven Simulation of Elastic OCCI Cloud Resources
WO2020158954A1 (en) Service building device, service building method, and service building program
US10740070B2 (en) Locating features in a layered software application
US10521751B2 (en) Using customer profiling and analytics to understand, rank, score, and visualize best practices
US20070088562A1 (en) Method and program product for identifying educational content for a business initiative
Velozo et al. Evaluation of a Mobile Software Development Company
Bourgeois et al. Information systems development
US20140244677A1 (en) Dynamic generation of demonstrative aids for a meeting

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020539874

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20749008

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20749008

Country of ref document: EP

Kind code of ref document: A1