WO2020008502A1

WO2020008502A1 - Information processing system, information processing device, server device, program, or method

Info

Publication number: WO2020008502A1
Application number: PCT/JP2018/025066
Authority: WO
Inventors: 幸輝島田
Original assignee: シンセティックゲシュタルトエルティーディー
Priority date: 2018-07-02
Filing date: 2018-07-02
Publication date: 2020-01-09

Abstract

The present invention provides an information processing system capable of more appropriately using data acquired from an analysis device. The information processing system is provided with: an acquisition unit for acquiring a data set including a plurality of values corresponding to physical index values, and a feature of the data set; a neural network which is caused to learn the association between the data set and the feature; and a first identifying unit for identifying the index value which influences the feature by using a structure in the neural network. The information processing system is further provided with: a database in which the index value and substance information are stored in association with each other; and a second identifying unit for applying the index value identified by the first identifying unit to the database and identifying the substance information corresponding to the index value.

Description

Information processing system, information processing device, server device, program, or method

The technology disclosed in the present application relates to an information processing system, an information processing device, a server device, a program, or a method.

In recent years, in fields such as biology and chemistry, it has become possible to obtain much information from one sample by improving measurement technology.

JP-A-2017-211762, JP 2018-041434A, JP-A-2017-045341

However, due to improvements in the sensitivity and speed of the analyzer, the amount of data obtained from one sample has become high-dimensional and enormous, and it has become difficult to classify data and analyze data.
On the other hand, deep learning, which is one type of machine learning, can deal with high-dimensional and enormous data groups in the learning process. There was a problem that it could not be used for
Therefore, various embodiments of the present invention provide an information processing system, an information processing device, a server device, a program, or a method for solving the above-described problem.

One embodiment of the present application is an information processing system comprising: an acquisition unit that acquires a data set including a plurality of values corresponding to physical index values, and a characteristic of the data set; a neural network that has been made to learn the association between the data set and the characteristic; and a first specifying unit that specifies the index value that affects the characteristic by using a structure in the neural network.

One embodiment of the present application is an information processing system further comprising: a database in which the index value and substance information are stored in association with each other; and a second specifying unit that applies the index value specified by the first specifying unit to the database and specifies the substance information corresponding to the index value.

One embodiment of the present application is an information processing system, wherein the structure of the neural network used by the first specifying unit to specify the index value is at least one parameter of at least one layer included in the neural network.

One embodiment of the present application is an information processing device comprising a specifying unit that specifies the index value that affects the characteristic by using a structure in a neural network that has learned the association between a data set including a plurality of values corresponding to physical index values and a characteristic of the data set.

One embodiment of the present application is an information processing device comprising a second specifying unit that applies the index value specified by the specifying unit to a database that stores the index value and substance information in association with each other, and specifies the substance information corresponding to the index value.

One embodiment of the present application is an information processing device in which the data set is a biochemical component, the index value is an m/z value, and the characteristic is information indicating that the subject is a healthy person or a person affected by a specific disease.

One embodiment of the present application is an information processing device in which the data set is a microbiome, the index value is a base sequence, and the characteristic is information indicating that the subject is a healthy person or a person affected by a specific disease.

One embodiment of the present application is an apparatus comprising: a database in which the index value and substance information are stored in association with each other; and a second specifying unit that specifies the substance information corresponding to the index value by using the index value that affects the characteristic, the index value having been specified based on a structure in a neural network that has been made to learn the association between a data set including a plurality of values corresponding to physical index values and a characteristic of the data set.

One embodiment of the present application is an information processing method in which a computer executes: an acquiring step of acquiring a data set including a plurality of values corresponding to physical index values, and a characteristic of the data set; a step of causing a neural network to learn the association between the data set and the characteristic; and a first specifying step of specifying the index value that affects the characteristic by using a structure in the neural network.

One embodiment of the present application is an information processing method further comprising a second specifying step of applying the index value specified in the first specifying step to a database in which the index value and substance information are stored in association with each other, and specifying the substance information corresponding to the index value.

One embodiment of the present application is an information processing method in which the data set is a biochemical component, the index value is an m/z value, and the characteristic is information indicating that the subject is a healthy person or a person affected by a specific disease.

One embodiment of the present application is an information processing method in which the data set is a microbiome, the index value is a base sequence, and the characteristic is information indicating that the subject is a healthy person or a person affected by a specific disease.

One embodiment of the present application is an information processing system comprising: an acquisition unit that acquires a first data set including a plurality of values corresponding to physical index values, and a characteristic of the first data set; a neural network that has been made to learn the association between the first data set and the characteristic; and a first specifying unit that applies a second data set including a plurality of values corresponding to physical index values to the neural network and specifies an index value based on a calculated value at the time of application of the neural network.

One embodiment of the present application is an information processing system in which the calculated value at the time of application of the neural network is a numerical value for each layer in the neural network that is used to derive the characteristic of the second data set when the second data set is applied to the neural network.

One embodiment of the present application is a program that causes a computer to perform some of the operations described above.

The data set may be a biochemical component or a microbiome. The biochemical component may be a biological component. The data set may be in a vector format or a tensor format. The data set does not need to include the physical index value in the information output from the analyzer, but may include the physical index value.
Physical index values include, but are not limited to, m/z values, base sequences, wavelengths, wave numbers, angles, times, or measurement locations.

The characteristic may be information indicating a cell characteristic. Cell characteristics may include differentiation ability (information such as whether the differentiation ability is high or low), cell activity (information such as whether the cell activity is high or low), cell proliferation (information such as whether cell proliferation is high or low), cytokine production in the cell (information such as whether production is high or low), cytotoxic activity (information such as whether cytotoxicity is high or low), and the degree of differentiation (information on whether the cell has differentiated and, if so, to what degree). Further, the level of each characteristic may be based on the result measured by a measuring device that measures that characteristic.

The characteristic may also be an evaluation based on a measurement result, such as information indicating that the subject is a healthy person or a person affected by a specific disease, or an evaluation of a cell, such as a good cell or a bad cell.

Microbiomes include, for example, intestinal flora.
The index value may be a genome, a metagenome, or a presence distribution of a bacterial species.
The substance information may be a substance name, a chemical formula of the substance, a composition formula of the substance, a molecular formula of the substance, an ionic formula of the substance, a structural formula of the substance, or the like.

According to one embodiment of the present invention, data obtained from an analyzer can be more appropriately utilized.

FIG. 1 is a block diagram illustrating the configuration of one information processing apparatus according to one embodiment.
FIG. 2 is a block diagram illustrating a configuration of another information processing apparatus according to the embodiment.
FIG. 3 is a block diagram illustrating a specific example of the function of the information processing apparatus according to the embodiment.
FIG. 4 is a block diagram illustrating a flow example of the information processing apparatus according to the embodiment.
FIG. 5 is a diagram illustrating a concept according to one embodiment.
FIG. 6 is a diagram illustrating how to read a diagram according to an embodiment.
FIG. 7 is a block diagram illustrating a configuration of the information processing apparatus according to the embodiment.
FIG. 8 is a block diagram illustrating a flow example of the information processing apparatus according to the embodiment.
FIG. 9 is a diagram illustrating a display example of the information processing apparatus according to the embodiment.
FIG. 10 is a diagram illustrating a display example of the information processing apparatus according to the embodiment.

1. Each Configuration of Information Processing Apparatus 10
The information processing apparatus 10 according to an embodiment of the present invention may include a bus 11, an arithmetic device 12, a storage device 13, and a communication IF 16, as shown in FIG. 1. Further, the information processing apparatus 10 may include an input device 14 and a display device 15. Further, the information processing apparatus 10 is directly or indirectly connected to the network 19.

The bus 11 may have a function of transmitting information among the arithmetic device 12, the storage device 13, the input device 14, the display device 15, and the communication IF 16.

An example of the arithmetic device 12 is a processor. This may be a CPU or an MPU. Further, it may have a graphics processing unit, a digital signal processor, or the like. In short, the arithmetic device 12 may be any device that can execute the instructions of a program.

The storage device 13 is a device for recording information. This may be either an external memory or an internal memory, and may be either a main storage device or an auxiliary storage device. Further, a magnetic disk (hard disk), an optical disk, a magnetic tape, a semiconductor memory, or the like may be used. Further, a storage device via a network or a storage device on a cloud via a network may be provided.

Note that a register, an L1 cache, an L2 cache, and the like, which store information at a position physically close to the arithmetic device, may be regarded as included in the arithmetic device 12 in the block diagram of FIG. 1, or may be regarded as included in the storage device 13 as devices for recording information. In short, it is only necessary that the arithmetic device 12, the storage device 13, and the bus 11 are configured to cooperate and execute information processing.

The storage device 13 can include a program for executing a service related to the present invention. Further, data necessary for executing a service related to the present invention can be recorded as appropriate. Further, the storage device 13 may include a database.

In the above description, the case where the arithmetic device 12 executes processing based on a program provided in the storage device 13 has been described. However, as one form in which the bus 11, the arithmetic device 12, and the storage device 13 are combined, the information processing according to the present system may be realized by a programmable logic device whose hardware circuit itself can be changed, or by a dedicated circuit in which the information processing to be executed is fixed.

The input device 14 is for inputting information, but may have other functions. Examples of the input device 14 include input devices such as a keyboard, a mouse, a touch panel, and a pen-type pointing device.

The display device 15 has a function of displaying information. Examples include a liquid crystal display, a plasma display, and an organic EL display. In short, any device that can display information may be used. Further, the display device 15 may partly serve as the input device 14, as in a touch panel.

The network 19 transmits information together with the communication IF 16. That is, it has a function of transmitting information of the information processing apparatus 10 to another information terminal (not shown) via the network 19. The communication IF 16 may be of any connection type, such as USB, IEEE 1394, Ethernet (registered trademark), PCI, or SCSI. The network 19 may be either wired or wireless, and may use an optical fiber, a coaxial cable, or the like.

The hardware constituting the information processing apparatus according to one embodiment of the present invention may be a general-purpose computer or a dedicated computer. Further, the hardware may be a workstation, a desktop personal computer, a laptop personal computer, a notebook personal computer, a PDA, a mobile phone, a smartphone, or the like.

Although FIG. 1 illustrates one information processing apparatus 10, the information processing apparatus 10 may include a plurality of information processing apparatuses. The plurality of information processing devices may be internally connected or may be externally connected. Further, when the information processing device 10 includes a plurality of information processing devices, the owners thereof may be different. Also, the person who operates the information processing device 10 as the system according to the present invention may be different from the owner of the information processing device 10.
Further, the information processing apparatus 10 may be a physical entity or a virtual entity. For example, the information processing apparatus 10 may be virtually realized using cloud computing.

2. An embodiment of the system
FIG. 2 schematically shows an example of the system of the present embodiment. A feature amount is extracted from the learning data 201 using a learned neural network, and a neural network 202 having a classification model is constructed. By applying this model to the unknown data 203, the determination result 204 for the unknown data is obtained. Further, a feature amount 205 serving as the basis for the determination result is extracted. The database 206 is queried with the feature amount 205, and related substance information 207 is provided.

FIG. 7 shows another embodiment of the system of this embodiment. The user terminals 71a and 71b are terminal devices assumed to be used by users. The user terminals 71a and 71b are connected to a network 72 so that information can be transmitted.

The management device 73a is a server that manages the system of this example. The management device 73a can connect to the user terminals 71a and 71b via the network 72. Further, the management device 73a can be connected to the administrator terminals 73b and 73c.

The analyzer 75 may be configured to be connected to the network 72. Alternatively, for use by the user, the analyzer 75 may be configured to be connected to the user terminal 71a or 71b, or to the management device 73a. Since the system of this example uses samples obtained by the analyzer, the analyzer itself may be configured separately from the system.

The neural network system 76 is connected to the network 72 and can be connected to the management device 73a.

The above description assumes that the system of the present example includes the neural network system 76. However, the neural network system 76 may exist independently of the system of the present example, and the system of the present example need not include it. In this case, for example, the system of this example may be configured to receive a feature amount (or an explanatory element described later) from a neural network that has learned analysis results (samples such as the detection intensity vector described later) and their characteristics. Further, the system of the present example may be configured to transmit a target sample to the neural network that has learned the samples and characteristics, and to receive from the neural network system 76 the feature amount (or an explanatory element described later) relating to the internal structure of the neural network in the state where the target sample has been applied to the learned neural network. The learning may be deep learning.

The database 74a and/or the database 74b include, for example, a database that associates analyzer data with substance information. For example, a database that associates a feature amount (or an explanatory element described later) with substance information may be used. Each database need not be a single database and may be divided into a plurality of databases. The management device 73a may be configured to acquire substance information from a database by a query using a physical index value.

The system of this example may or may not include the database 74a and/or the database 74b. For example, when the system of the present example includes the databases 74a and 74b, it may be configured to apply a feature amount (or an explanatory element described later) acquired from the neural network system 76 to a database that holds associations between feature amounts (or explanatory elements) and substance information, and the database may be configured to return the substance information corresponding to that feature amount (or explanatory element). When the system of the present example does not include the database 74a and/or 74b, the system may be configured to transmit the feature amount (or explanatory element) acquired from the neural network system 76 to the database 74a and/or 74b, and to receive from the database 74a and/or 74b the substance information corresponding to that feature amount (or explanatory element).

Databases and neural networks may be implemented in a server-client format or in a cloud format. Further, the information processing device may be formed by one information processing device, or may be formed by a plurality of information processing devices. Further, in the case of a plurality of information processing apparatuses, the present invention is not limited to the diagram of FIG. 7 and may be realized by various network configurations.

3. Functions
Next, the functions of the system of the present example will be described with reference to FIG. 3. FIG. 3 is a block diagram showing a specific example of the functions of the system of the present example. As described above, the neural network unit 32 and the database unit 33 may be outside the system of the present example.

3.1. Acquisition unit 31
The acquisition unit 31 has a function of acquiring information. The information includes a sample for causing the neural network unit to learn, characteristic information corresponding to the sample, and the like.

3.2. Neural network unit 32
The neural network unit 32 has a function of learning using data. Although not essential, the neural network unit 32 may also have a function of returning, for input data, the corresponding information using the learned neural network.

3.3. Database unit 33
The database unit 33 holds associated data. Specifically, it has a function of associating elements related to the neural network with substance information and returning the substance information corresponding to a queried element. For example, if the database associates m/z values with substance information, it has a function of returning the substance information corresponding to one or more m/z values; if the database associates base sequences with substance information, it may have a function of returning the substance information corresponding to one or more base sequences. The database is not limited to these, however, and may store physical index values together with the corresponding substance information and have a function of returning the substance information corresponding to a queried index value.
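As an illustration only, a lookup of this kind can be sketched in Python as follows. The reference table, the placeholder substance names, the function name `lookup_substances`, and the matching tolerance are all hypothetical; the present application does not specify how a queried m/z value is matched against stored values.

```python
from bisect import bisect_left, bisect_right

# Hypothetical reference table of (m/z value, substance name), sorted by m/z.
# The numbers and names are placeholders, not measured data.
REFERENCE = sorted([
    (1046.54, "substance A"),
    (1296.68, "substance B"),
    (1347.74, "substance C"),
])
MZ_KEYS = [mz for mz, _ in REFERENCE]


def lookup_substances(mz_values, tolerance=0.05):
    """Return {queried m/z: [names of substances within +/- tolerance]}."""
    result = {}
    for mz in mz_values:
        lo = bisect_left(MZ_KEYS, mz - tolerance)
        hi = bisect_right(MZ_KEYS, mz + tolerance)
        result[mz] = [name for _, name in REFERENCE[lo:hi]]
    return result


hits = lookup_substances([1296.70, 2000.00])
```

A query for one or more m/z values returns, for each value, the stored substance information that lies within the assumed tolerance, or an empty list when nothing matches.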

3.4. Specifying unit 34
The specifying unit 34 has a function of specifying a feature amount in the neural network unit 32 or an explanatory element described later. The feature amount or the explanatory element may be specified from the learned neural network itself, or may be specified from information inside the neural network when specific data is applied to it.
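The second option, specifying from information inside the network when specific data is applied, can be sketched as follows. This is a minimal illustration under stated assumptions: a two-layer network without bias terms and with a ReLU activation, and a contribution score defined as the product of the weights along active paths and the input value. None of these choices is taken from the present application.

```python
def forward_with_contributions(x, hidden_w, output_w):
    """Apply one sample and return per-layer values and per-input contributions.

    hidden_w[k][i] is the weight from input i to hidden unit k;
    output_w[k] is the weight from hidden unit k to the single output.
    """
    # Numerical value of each hidden unit for this sample (ReLU activation).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in hidden_w]
    output = sum(w * h for w, h in zip(output_w, hidden))
    # Contribution of input i: sum over active hidden units of the path product.
    contributions = [
        sum(output_w[k] * hidden_w[k][i] * x[i]
            for k in range(len(hidden_w)) if hidden[k] > 0.0)
        for i in range(len(x))
    ]
    return hidden, output, contributions


hidden_w = [[0.5, -1.0, 0.0], [0.0, 2.0, 1.0]]   # illustrative weights
output_w = [1.0, 0.5]
sample = [1.0, 0.2, 0.4]                         # illustrative input values
hidden, output, contributions = forward_with_contributions(sample, hidden_w, output_w)
```

Under these assumptions the contributions add up to the output value, so the input with the largest contribution for this particular sample can be reported as a candidate explanatory element.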

3.5. Storage unit 35
The storage unit 35 may have a function of storing a program related to each of the above functions and / or corresponding data.

4. Example
4.1. Example 1
Next, an example of the overall flow using the system of the present example will be described. First, the user measures a plurality of samples under the same conditions using the analyzer and acquires data corresponding to each sample (401). Next, the data is preprocessed (402). Next, using the data, the neural network is trained, the data is classified, and the feature amounts involved in the classification are specified (403). Next, the feature amounts are displayed (404). Next, using a database indexed by feature amount and the specified feature amounts, information related to the feature amounts is identified and displayed (405).

Next, each step will be described more specifically. First, a user measures a sample using an analyzer. Various analyzers may be used as the analyzer used at this time. In the present embodiment, a description will be given of a mass spectrometer.

It is preferable that the plurality of samples be measured under the same conditions. Although it is preferable that the conditions be strictly identical, measurement errors may occur due to various factors, and any measurement result within a reasonably expected range may be used. In this embodiment, a biological component is used as the sample.

Next, the data is preprocessed. As the preprocessing, for example, baseline correction is performed. This is because the baseline may be affected by, for example, a magnetic substance contained in the sample. Baseline correction may be performed manually or automatically. Further, it may be performed as a process within the analyzer, or as one function of the system of the present example after the data has been input to the system.
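The present application does not state which baseline correction method is used. Purely as an assumed example of an automatic correction, the sketch below estimates the baseline at each point as the minimum intensity in a small surrounding window and subtracts it. The function name `correct_baseline`, the window size, and the sample values are hypothetical.

```python
def correct_baseline(intensities, half_window=2):
    """Subtract a rolling-minimum baseline estimate from each intensity."""
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline = min(intensities[lo:hi])   # local baseline estimate
        corrected.append(intensities[i] - baseline)
    return corrected


# Illustrative spectrum: two peaks sitting on a slowly rising baseline.
raw = [5.0, 5.0, 9.0, 5.0, 6.0, 6.0, 12.0, 6.0]
corrected = correct_baseline(raw)
```

With this choice every corrected intensity is non-negative and the peaks keep their height above the local baseline.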

As another preprocessing step, the measurement result may be regarded as a vector and a two-dimensional image may be created from it. For example, for a mass spectrometer, a two-dimensional image is created by sequentially folding the pairs of m/z value and detection intensity acquired as data. Here, the m/z values can be shared across samples as, for example, the values 1000.00 to 2999.99, and can therefore be omitted. For example, when the pairs of m/z value and detection intensity are given as (1000.00: a), (1000.01: b), (1000.02: c), and so on, this information is input and a vector (a, b, c, ...) consisting of the detection intensity values corresponding to the respective m/z values is formed (the detection intensity vector). Information on whether the subject is a healthy person or an affected person may be attached to each detection intensity vector.
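The construction of the detection intensity vector and its folding into a two-dimensional image can be sketched as follows. The m/z range of 1000.00 to 2999.99 comes from the text above; the step of 0.01, the image width of 500, the zero filling of missing values, and the function names are assumptions made for this illustration.

```python
def intensity_vector(pairs, mz_start=1000.00, mz_end=2999.99, step=0.01):
    """Place each detection intensity at the index of its m/z value."""
    size = int(round((mz_end - mz_start) / step)) + 1
    vector = [0.0] * size                     # missing m/z values stay at 0.0
    for mz, intensity in pairs:
        index = int(round((mz - mz_start) / step))
        if 0 <= index < size:
            vector[index] = intensity
    return vector


def fold(vector, width):
    """Fold a vector row by row into a 2D image, zero-padding the last row."""
    padded = vector + [0.0] * (-len(vector) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


pairs = [(1000.00, 3.0), (1000.01, 7.0), (1000.02, 1.5)]   # (m/z, intensity)
vec = intensity_vector(pairs)
image = fold(vec, width=500)
```

Because the m/z grid is shared by all samples, only the intensities need to be stored; the position in the vector or image identifies the m/z value.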

Next, using the data, the neural network is trained, the data is classified, and the features involved in the classification are specified. The learning algorithm may be a supervised learning algorithm or an unsupervised learning algorithm.

In the case of a supervised learning algorithm, the neural network is made to learn the relationship between the above-described data vectors (or tensors described later) corresponding to physical index values measured by an analyzer and their characteristics, which serve as teacher data. When a biological component is analyzed as a sample using a mass spectrometer, whether the subject is a healthy person or a person affected by a specific disease is given as teacher information. In this case, the system of the present example may train the neural network by associating the information of the biological component with the healthy person or affected person label. The system is thereby configured so that information of biological components can be classified into healthy and affected. In this case, the sample need not be annotated manually in advance, and a sample without annotation may be used. Further, although the neural network is described above as learning the relationship between the data vectors (or tensors described later) corresponding to physical index values and their characteristics as teacher data, learning may also be performed with input values of other information added.
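A minimal sketch of such supervised learning is shown below. For brevity it uses a single-layer network (logistic regression) trained by stochastic gradient descent as a stand-in for the neural network; the three-element intensity vectors, the labels (0 for healthy, 1 for affected), and the hyperparameters are invented for illustration and are not data from the present application.

```python
import math


def train_classifier(samples, labels, epochs=500, lr=0.5):
    """Train a single-layer classifier on (intensity vector, label) pairs."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))        # predicted probability of "affected"
            error = p - y                         # gradient of the log loss w.r.t. z
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (affected) or 0 (healthy) for one intensity vector."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Illustrative detection intensity vectors; only the third value differs by class.
samples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.9, 0.0, 0.7], [0.8, 0.1, 0.8]]
labels = [0, 0, 1, 1]
weights, bias = train_classifier(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

After training, the weight attached to the third input is positive and the training samples are separated, which is the kind of learned structure that the later specifying step inspects.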

In the case of an unsupervised learning algorithm, unknown information can be acquired. In this case, biological components can be classified without giving information on whether each component comes from a healthy person or an affected person. In particular, when samples without annotation are used, there is an advantage that the burden of manual annotation is reduced.

As an unsupervised learning algorithm, an autoencoder, a restricted Boltzmann machine, or a multilayered version of these may be used. For example, when an autoencoder is used, an encoder is applied to the input data to reduce its dimension, and a decoder is applied to the dimension-reduced data to restore the dimension. Learning is then performed by changing the weight values so that the output reproduces the input data. Here, classification is realized by changing the weights, but the present invention is not limited to this method, and another method may be used.
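The encode, decode, and weight update cycle of an autoencoder can be sketched as follows. This is a deliberately tiny linear autoencoder (two inputs, one hidden value, no activation function) trained by full-batch gradient descent on the squared reconstruction error; the data, the initial weights, and the learning rate are assumptions chosen only to keep the example short.

```python
def reconstruct(x, enc, dec):
    """Encode a 2-value input into 1 value, then decode back to 2 values."""
    code = sum(e * v for e, v in zip(enc, x))
    return code, [d * code for d in dec]


def total_loss(data, enc, dec):
    """Sum of squared differences between inputs and their reconstructions."""
    loss = 0.0
    for x in data:
        _, x_hat = reconstruct(x, enc, dec)
        loss += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return loss


def train_autoencoder(data, epochs=2000, lr=0.1):
    enc, dec = [0.5, 0.5], [0.5, 0.5]            # assumed initial weights
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            code, x_hat = reconstruct(x, enc, dec)
            residual = [a - b for a, b in zip(x_hat, x)]
            g_code = 2.0 * sum(r * d for r, d in zip(residual, dec))
            for j in range(2):
                g_dec[j] += 2.0 * residual[j] * code
                g_enc[j] += g_code * x[j]
        # Change the weights so that the output moves closer to the input.
        enc = [e - lr * g for e, g in zip(enc, g_enc)]
        dec = [d - lr * g for d, g in zip(dec, g_dec)]
    return enc, dec


data = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]]      # illustrative inputs on one line
initial_loss = total_loss(data, [0.5, 0.5], [0.5, 0.5])
enc, dec = train_autoencoder(data)
final_loss = total_loss(data, enc, dec)
```

The reconstruction error after training is lower than before training, showing that the single hidden value has learned to summarize the two inputs.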

Next, the feature amounts involved in the classification are specified. A feature amount is something that affects the classification. More specifically, an explanatory element described later may be specified as the feature amount. For example, in a neural network trained to classify using, as input data, the spectrum of m/z values and detection intensities acquired by the above-described mass spectrometer, or a vector based on that spectrum, an m/z value can be identified as a feature amount.

The m/z value may be specified using, for example, the parameters of each layer. Here, a parameter of a layer is a value indicating how much an input value affects an output value. Therefore, by tracing the parameters that affect the output value from the output layer back toward the input layer, it is possible to specify the input values that affect the final output value.
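One simple way to trace parameters from the output layer back to the input layer is sketched below. It propagates the absolute values of the weights backward through the layers and ignores activation functions and bias terms; this scoring rule, the function name `input_influence`, and the weights are assumptions for illustration, since the present application does not fix a particular calculation.

```python
def input_influence(layers):
    """Score each network input by tracing weights from output to input.

    layers is ordered from input to output; each layer is a weight matrix
    with rows indexed by that layer's output unit and columns by its input.
    Returns one list of input scores per final output unit.
    """
    influence = [[abs(w) for w in row] for row in layers[-1]]
    for weights in reversed(layers[:-1]):
        influence = [
            [sum(row[k] * abs(weights[k][i]) for k in range(len(weights)))
             for i in range(len(weights[0]))]
            for row in influence
        ]
    return influence


hidden = [[0.1, 2.0, 0.0],    # weights into hidden unit 0
          [0.3, 0.1, 0.2]]    # weights into hidden unit 1
output = [[1.5, 0.2]]         # weights into the single output unit
scores = input_influence([hidden, output])[0]
```

Here the second input receives by far the largest score, because it feeds strongly into the hidden unit that the output layer weights most heavily. If the inputs are detection intensities on a shared m/z grid, the index of that input identifies the m/z value.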

In the specification of the present application, an input value that affects the final output value, and/or information processed based on such an input value, is referred to as an explanatory element. There may be one or more such input values. As a specific example, if a sample is a spectrum of m/z values, the explanatory element may be one or more m/z values. If the sample is an image, the processed information includes, for example, an edge that indicates a cell wall or the contour of a cell morphology. Since such edges are obtained by applying a filter to the input values, the explanatory elements may include them. Note that the degree of influence of explanatory elements on the output varies: some have a large influence, while others have a small one. Note also that the feature amount in the specification of the present application is a superset of the explanatory elements and includes forms that generally cannot be interpreted by humans.

That is, for a first parameter and a second parameter in the same layer, when the first parameter is larger than the second parameter, the system of the present example may be configured to set the input value to which the first parameter is applied as an explanatory element. Here, the first parameter and the second parameter may have a common input value.

Also, for a first parameter and second to N-th parameters in the same layer, when the first parameter is larger than each of the second to N-th parameters, the system of the present example may be configured to set the input value to which the first parameter is applied as an explanatory element. Here, the first parameter and the second to N-th parameters may have a common input value.

In the above description, the parameter may be a weight value in a calculation formula for calculating an inner product in a neural network.
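The comparison of parameters within one layer can be sketched as follows. The example treats the parameters as the weights of one inner product and selects the input whose weight is largest; comparing signed values rather than magnitudes, the function name `explanatory_inputs`, and the numbers are assumptions of this sketch.

```python
def explanatory_inputs(weights, input_names, top_n=1):
    """Return the inputs whose weights are the largest within one layer."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [input_names[i] for i in ranked[:top_n]]


# Illustrative weights of one unit, one per m/z value on the shared grid.
layer_weights = [0.2, 1.7, -0.4]
mz_grid = [1000.00, 1000.01, 1000.02]
selected = explanatory_inputs(layer_weights, mz_grid)
```

With `top_n=1` this corresponds to the case where the first parameter is larger than all of the second to N-th parameters; a larger `top_n` returns several candidate explanatory elements.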

Because the internal structure of a machine learning device is complicated, there is a problem that the user cannot understand the basis of the judgment made by the machine, such as which feature amounts contributed to the data classification. When a configuration capable of specifying the feature amount is provided, it becomes clear that the classification was made on the basis of that feature amount, so there is an advantage that the user can easily understand the basis of the judgment made by the machine. For example, when a neural network is trained on target data that associates the detection intensity data for each m/z value, acquired by a mass spectrometer for a biological component, with information on whether the subject is a healthy person or an affected person, the specific detection intensity (or the corresponding m/z value) that separates healthy persons from affected persons can be specified, and the user can understand the m/z value on which the neural network bases its judgment.

The system of the present example may be configured to display the specified feature amount or explanatory element, and may process it before display so that the user can easily understand it. For example, the system of the present example may be configured to display the specified plurality of m/z values as a list, or to display them in a ranking format that reflects the order of their influence. FIG. 9 is an example in which a plurality of m/z values are ranked in order of their influence on the output value. Here, the degree of influence may be determined by evaluating the magnitude of the effect of each input value on the output value based on the magnitude of the above-described parameter. However, the method is not limited to this, and other methods may be used.
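
As a non-limiting illustration, a ranking display of this kind can be sketched as follows. The sketch assumes that one weight per m/z value is available and that the degree of influence is taken to be the absolute value of that weight, which is only one of the possible methods; the function name `rank_mz_by_influence` and all numerical values are hypothetical.

```python
from typing import List, Tuple


def rank_mz_by_influence(
    mz_values: List[float], weights: List[float], top_k: int = 3
) -> List[Tuple[int, float, float]]:
    """Return (rank, m/z value, weight) tuples ordered by influence.

    Assumption: influence is measured by the absolute value of the weight.
    """
    ordered = sorted(zip(mz_values, weights), key=lambda p: abs(p[1]), reverse=True)
    return [(rank, mz, w) for rank, (mz, w) in enumerate(ordered[:top_k], start=1)]


# Hypothetical m/z values and the weights applied to their intensities.
mz = [101.2, 250.7, 333.1, 412.9]
w = [0.05, -0.80, 0.40, 0.10]
ranking = rank_mz_by_influence(mz, w)
for rank, mz_value, weight in ranking:
    print(f"{rank}. m/z {mz_value} (weight {weight})")
```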

FIG. 5 is an example showing a method for specifying an m/z value serving as a feature amount. In the calculation of each layer from the input layer to the output layer, an input value that has a greater effect on the output layer is presented as a feature amount. FIG. 6 shows an example of how to read the display of FIG. That is, the system of the present example may display each layer of the neural network from the input layer to the output layer in association with the input values to that layer, and may display an input value whose influence on the output layer is larger than a predetermined value in a manner that distinguishes it from other input values.

The m/z value obtained as a feature amount when a supervised learning algorithm is used is one that affects the classification of a biological component as belonging to a healthy person or an affected patient. That is, by using the system of this example, it is possible to specify the m/z value that affects the determination of a healthy person or an affected patient. In addition, even an m/z value whose detection intensity is very small can be specified as long as it influences that determination. In the prior art, research and determination using a mass spectrometer for biological components have mainly been based on peaks in the detection intensity, whereas the above-described configuration has the advantage that an m/z value whose detection intensity is very small can also be specified as a target.

Next, the system of the present example may use the specified feature amount and a database that includes the feature amount as an index to display information related to the feature amount. For example, assume that one or a plurality of m/z values have been specified as the feature amount, and that the database contains associations between m/z values and substances. In this case, one or a plurality of substances associated with the one or plurality of m/z values specified as the feature amount may be specified from the database.

When configured in this way, the system of this example can identify substances that affect the determination of a healthy person or an affected patient.

The data used by the system of this example is not limited to the m/z values corresponding to so-called peaks with a high detection intensity, and may be data indicating the relationship between m/z values other than peaks and their detection intensity. When a neural network is trained by deep learning, it may be possible to specify even a substance present in a small amount, whose m/z value has a very small detection intensity, as long as it affects the discrimination between a healthy person and an affected patient. As a result, it is possible to specify a substance present in a small amount that has not previously been examined in determining the presence or absence of a disease. This is particularly advantageous when a substance present in a trace amount is toxic, because it can be identified effectively.

Various methods may be used for specifying one or more substances from one or more m/z values. For example, when one m/z value is specified, a plurality of substances associated with that m/z value in the database may be specified. The plurality of substances may be all of the substances associated with the m/z value in the database, or may be a subset of those substances selected by a specific criterion.

When a plurality of m/z values are specified, substances related to the plurality of m/z values may be specified using the database. In this case, the system of the present example may specify one or more related substances from the database for each of the plurality of m/z values, and may specify a plurality of substances by applying this to all of the plurality of m/z values. Further, the system of the present example may be configured to specify one or a plurality of substances that are associated with all of the specified plurality of m/z values.
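
As a non-limiting illustration, the two ways of specifying substances from a plurality of m/z values can be sketched as follows. The database contents, the substance names, and the function names `substances_for_any` and `substances_for_all` are hypothetical. The sketch also assumes exact matching of m/z values, whereas a practical database lookup would likely allow a tolerance.

```python
from typing import Dict, Iterable, Set

# Hypothetical database: m/z value -> substances associated with it.
DATABASE: Dict[float, Set[str]] = {
    250.7: {"substance A", "substance B"},
    333.1: {"substance B", "substance C"},
    412.9: {"substance B"},
}


def substances_for_any(mz_values: Iterable[float], db: Dict[float, Set[str]]) -> Set[str]:
    """Substances associated with at least one of the specified m/z values."""
    found: Set[str] = set()
    for mz in mz_values:
        found |= db.get(mz, set())
    return found


def substances_for_all(mz_values: Iterable[float], db: Dict[float, Set[str]]) -> Set[str]:
    """Substances associated with every one of the specified m/z values."""
    sets = [db.get(mz, set()) for mz in mz_values]
    return set.intersection(*sets) if sets else set()


any_match = substances_for_any([250.7, 333.1], DATABASE)
all_match = substances_for_all([250.7, 333.1, 412.9], DATABASE)
print(sorted(any_match), sorted(all_match))
```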

In addition, when specifying the plurality of substances, a degree of influence may be calculated for each substance in consideration of the degree to which each of the plurality of feature amounts or explanatory elements influences the pattern classification. FIG. 10 is an example of displaying related substances ranked in relation to m/z values. Related substances A to E are displayed in the order of influence of the corresponding m/z values. Although FIG. 10 shows one substance for each m/z value, information on a plurality of substances may be displayed for one m/z value (for example, when a plurality of substances are found in the database), or information on one substance may be displayed for a plurality of m/z values (for example, when one substance is specified for a plurality of m/z values).

4.2. Example 2
In the second embodiment, as in the first embodiment, the sample is a biological component and the analyzer is a mass spectrometer. The description focuses on the differences from the first embodiment. In the second embodiment, unknown data is used to obtain an explanatory element.

Specifically, the system of this example may train the neural network on the association between the detection intensity vector and the information on whether the subject is a healthy person or an affected patient. Thereafter, a detection intensity vector may be generated from a biological component of a specific patient for whom it is unknown whether they are healthy or affected, and the vector may be applied to the neural network. Thereby, information indicating whether the patient is a healthy person or an affected patient may be specified. Further, among the numerical values in the patient's detection intensity vector, the m/z value corresponding to a numerical value that influenced this determination may be specified. Note that the information on whether the patient is a healthy person or an affected patient need not be used.

The m/z value is determined by tracing the numerical values of each layer of the neural network from the output layer back to the input values, in the calculation performed when the patient's detection intensity vector is applied to the trained neural network. In this way, among the numerical values in the patient's detection intensity vector, the numerical value that influenced the identification of the patient as a healthy person or an affected patient is specified, and the corresponding m/z value is identified. Here, the calculation of the explanatory element may use the calculation process in the neural network, based on the structure of the neural network, at the time when the sample information relating to the patient is applied to the trained neural network.
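
As a non-limiting illustration, tracing from the output layer back to the input values for one unknown sample can be sketched as follows. The sketch assumes a small network with one hidden layer and a ReLU activation, and it assumes that at each layer the term with the largest product of weight and value is followed; the embodiments do not prescribe this rule. The names `forward` and `trace_back` and all weights and intensities are hypothetical.

```python
from typing import List, Tuple


def forward(x: List[float], w1: List[List[float]], w2: List[List[float]]):
    """Compute the hidden values (ReLU) and the output values for one sample."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return hidden, output


def trace_back(x: List[float], w1: List[List[float]], w2: List[List[float]]) -> Tuple[int, int]:
    """Follow the largest contribution from the output layer back to an input.

    Returns (index of the predicted class, index of the most influential input).
    """
    hidden, output = forward(x, w1, w2)
    out_idx = max(range(len(output)), key=lambda k: output[k])
    hid_idx = max(range(len(hidden)), key=lambda j: w2[out_idx][j] * hidden[j])
    in_idx = max(range(len(x)), key=lambda i: w1[hid_idx][i] * x[i])
    return out_idx, in_idx


# Hypothetical trained weights: 3 inputs -> 2 hidden nodes -> 2 classes.
w1 = [[0.5, 0.0, 0.1],
      [0.0, 0.8, 0.2]]
w2 = [[1.0, 0.0],   # class 0: healthy person
      [0.0, 1.0]]   # class 1: affected patient
mz_values = [101.2, 250.7, 333.1]
sample = [1.0, 2.0, 0.5]  # detection intensities of the unknown sample

predicted_class, input_index = trace_back(sample, w1, w2)
print(predicted_class, mz_values[input_index])
```

Unlike a ranking based only on the trained weights, the result here depends on the values of the particular sample that is applied.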

As in the first embodiment, substance information related to the specified m/z value is specified using a database in which m/z values are associated with substance information.

FIG. 8 shows an example of the flow of the second embodiment. Acquisition 801 from the analyzer, preprocessing 802, and specification and display 806 of the information related to the feature amount are the same as in the first embodiment. On the other hand, performing deep learning 803, applying the neural network to unknown data after the deep learning 804, and specifying the feature amount 805 differ from the first embodiment.

When the system of the present example has such a configuration, the m/z value that influenced the determination can be specified for a subject for whom it is unknown whether they are a healthy person or an affected patient, and the corresponding substance information can be specified. Thus, the substance information that serves as the basis for determining that the patient is affected can be specified.

The difference between the first embodiment and the second embodiment will now be described more conceptually. For example, assume that in the first embodiment there are three input values (for example, m/z values) A, B, and C for distinguishing a healthy person from an affected patient. These A, B, and C are m/z values specified based on the structure of the neural network trained on the association between the detection intensity vector and the information on a healthy person or an affected patient. These specified m/z values may mean that a subject is determined to be a healthy person or an affected patient when all three of A, B, and C satisfy predetermined requirements, or they may mean that the subject is so determined when any one (or two) of A, B, and C satisfies the predetermined requirements, without all three needing to do so. When the requirements can be satisfied selectively in this way, the second embodiment has the advantage that the m/z value whose requirement was actually satisfied can be specified, providing evidence for determining that the target patient is a healthy person or an affected patient.

The system of the present example may be an information processing system comprising:
an acquisition unit configured to acquire a first data set including a plurality of values corresponding to physical index values, and a characteristic of the first data set;
a neural network that has learned the association between the first data set and the characteristic; and
a specifying unit that applies a second data set including a plurality of values corresponding to physical index values to the neural network, and specifies the index value based on the values calculated when the neural network is applied.

Here, the index value specified by the specifying unit may be an index value corresponding to information in the second data set that has influenced the derivation of characteristics for the second data set.

In addition, the values calculated when the second data set is applied to the neural network may be the numerical values of each layer in the neural network that are used to derive the characteristic of the second data set.

4.3. Example 3
Example 3 describes an example in which intestinal microflora is used as a sample and a next-generation sequencer is used as an analyzer. Also in this case, samples obtained from healthy persons and affected patients are used. A base sequence can be obtained from the next-generation sequencer. A trained neural network is constructed using data in which the base sequence is associated with information on a healthy person or an affected patient. Then, the sequence serving as a feature amount when classifying healthy persons and affected patients can be specified. More specifically, a sequence can be specified as an explanatory element.

By using a database in which sequences are associated with intestinal flora, the intestinal flora can be identified from the sequence derived as a feature amount. This makes it possible to specify the intestinal microflora that influences the discrimination between a healthy person and an affected patient for a specific disease. That is, intestinal flora related to the disease can be specified. Thereby, for example, the intestinal flora to be targeted when studying the disease in more detail can be specified.

Note that the base sequence serving as a feature amount may be specified from the structure of the neural network trained using data labeled with information on a healthy person or an affected patient. Alternatively, after deep learning, the base sequence serving as a feature amount may be specified from the structure of the neural network at the time when a sample for which it is unknown whether the subject is healthy or affected is applied to the trained neural network.

4.4. Example 4
In the fourth embodiment, an analyzer, a sample, and a characteristic other than the targets described in the first to third embodiments will be described. The description of the parts overlapping with the first to third embodiments will be omitted.

In the first to third embodiments, examples using a mass spectrometer and a next-generation sequencer have been described. However, various other analyzers may be used. For example, there are optical analyzers, electromagnetic analyzers, separation analyzers, thermal analyzers, and the like. Examples of the optical analyzer include an ultraviolet/visible spectrophotometer, an infrared spectrophotometer, an atomic absorption analyzer, a fluorimeter, and a Raman spectrophotometer. Examples of the electromagnetic analyzer include an X-ray analyzer, an X-ray absorption analyzer, a mass analyzer, a nuclear magnetic resonance apparatus, and the like. Examples of the separation analyzer include a gas chromatograph, a high-performance liquid chromatograph, and an electrophoresis apparatus. In short, the analyzer may be any device, including those not listed above, that can output as an analysis result a spectrum that can be converted into a vector format, as described later.

In addition, in Examples 1 to 3, a biological component and a base sequence were used as samples, but the present invention is not limited thereto. Examples of the sample include components inside and outside an organism. The component inside an organism may be an animal component, a plant component, or a bacterial component. Further, as an animal component, a component of the human body may be used. In addition, although they are not components that form the body of an animal, bacteria that exist inside the body of the animal, such as in the digestive tract, respiratory system, or oral cavity, may be used. Likewise, since humans are a kind of animal, bacteria present inside the human body, such as in the human digestive tract, respiratory system, or oral cavity, may be used. For example, the sample may be an intestinal flora. The component outside the organism may be a component that has been excreted outside the organism, a component that has been excreted extracellularly, or a component that has been used or produced in a cell factory.

In the first to third embodiments, the detection intensity with respect to the m/z value and the base sequence have been described as examples of a vector. However, the data used as a vector may be other data according to each analyzer. For example, when an optical analyzer is used as the analyzer, the vector may be a vector of numerical values for each wavelength. More specifically, if a spectrophotometer such as an ultraviolet/visible spectrophotometer, an infrared spectrophotometer, or a Raman spectrometer is used as the analyzer, the vector may be a vector of the transmittance (or reflectance or absorbance) for each wavelength (or wave number). When a fluorimeter is used as the analyzer, a vector of the fluorescence intensity for each wavelength may be used. When an atomic absorption spectrometer is used as the analyzer, a vector of the concentration for each element may be used. When an X-ray analyzer is used as the analyzer, a vector of the intensity for each angle may be used. In the case of an analyzer such as a gas chromatograph, a liquid chromatograph, or an electrophoresis apparatus, a vector of the detection frequency with respect to time may be used. In these data, the m/z value, the base sequence, the wavelength, the wave number, the element, the angle, the time, the measurement location, and the like are physical index values.

The system of the present example may be configured to acquire a plurality of data sets, each including a plurality of values corresponding to physical index values, and to train a neural network on the plurality of data sets. Using a plurality of data sets as inputs has the advantage that the neural network can be trained by deep learning on diversified information that includes different viewpoints while sharing the same target data. For example, when a mass spectrometer is used with a biological component as the sample, a vector of the detection intensity for each m/z value and image data of the sample may be obtained. The image data is data on the appearance of the sample. As described above, the physical index in the image data is the measurement location (Cartesian coordinates, polar coordinates, etc.), and the value corresponding to it is the input value. In this case, tensor data is obtained from the image data and the spectrum data. Further, color data may be included for each position. The color data may be RGB or CMYK.

That is, the system of the present example may be an information processing system comprising: an acquisition unit that acquires a first data set including a plurality of values corresponding to a physical first index value, a second data set including a plurality of values corresponding to a physical second index value, and a characteristic of these data sets; a neural network that has learned the association among the first data set, the second data set, and the characteristic; a first specifying unit that specifies, using a structure in the neural network, the first index value that affects the characteristic; a database that stores the first index value and substance information in association with each other; and a second specifying unit that applies the index value specified by the first specifying unit to the database and specifies substance information corresponding to the index value.

In the first to third embodiments, the characteristic is information on a healthy person or an affected patient, but it may be other information. For example, it may be a result of measuring a sample such as a cell or a biological component with another measurement device, an objective evaluation based on such results, or a subjective evaluation based on such results or on other circumstances. The information on a healthy person or an affected patient is a result of measurement by another device or a comprehensive medical finding based on such results, and is an example of an evaluation. The quality of cells is also an example of an evaluation. By configuring the characteristic in this manner, the data set measured by the analyzer can be classified from the viewpoint of the characteristic (for example, from the viewpoint of healthy or affected, of good or bad cells, or of the results measured by another measurement device), and the index value of the data set that influenced the classification can be specified.

In particular, when the characteristic is another measurement result or an evaluation based on such a result, the sample can be classified based on the characteristic, and the physical index value that influenced the characteristic can be specified. When that index value is applied to the above-described database, it is possible to specify the substance information that gives rise to the difference between samples classified from the viewpoint of the other measurement result or the evaluation based on it.

The system of this example may acquire a plurality of data as characteristics and use the plurality of data as teacher data when training a neural network. For example, both information on a healthy person or an affected patient and information on the quality of cells may be acquired, and both may be used as teacher data for the neural network. Because this increases the types and viewpoints of the characteristics used when the neural network learns the samples, there is an advantage that classifications can be set more finely and samples can be classified from more diverse viewpoints.

In Examples 1 to 3, the databases used were a database of m/z values and substance information and a database of base sequences and substance information. However, a database adapted to the nature of each physical index value may be used. That is, a database may be used in which substance information measurable by each analyzer is associated with the index values measured by that analyzer. This makes it possible to specify, for each index value measured by each analyzer, the substance information related to or indicated by that index value. Then, by configuring the system to display the substance information, the user can identify the substance information in the sample that affects the classification when the sample is classified from a predetermined viewpoint of the characteristic.

Also, in the specification of the present application, a vector has been described as the data format of the data acquired from the analyzer, but a tensor may be used instead of or in addition to the vector. Since a vector is a first-order tensor, a second-order tensor, a third-order tensor, ..., or an N-th order tensor can be learned in the same way as an extension of the vector. For example, when a fluorimeter is used as the analyzer and a plurality of excitation wavelengths are selected, the fluorescence intensity data for each fluorescence wavelength at each excitation wavelength, that is, a second-order tensor, may be used. Also, for two-dimensional image data or three-dimensional volume data of the object itself, or for each part thereof, a tensor of the second, third, or higher order may be formed by adding dimensions for measured values such as appearance information, composition information, numerical information, or temperature information related to the constituent material. Examples of the two-dimensional or three-dimensional data include data acquired by a microscope or by X-ray analysis. The microscope may be an optical microscope, an electron microscope, an X-ray microscope, an ultrasonic microscope, a scanning probe microscope, or the like. For example, three-dimensional data obtained by measurement with a cryo-electron microscope or three-dimensional data obtained by X-ray analysis may be used.
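
As a non-limiting illustration, one possible way of handling a second-order tensor is sketched below. The sketch assumes that the two axes of the fluorimeter data are the excitation wavelength and the fluorescence (emission) wavelength, and that the tensor is flattened into a vector while the pair of index values for each element is retained, so that an element specified later can be traced back to its physical index values. Flattening is only one option and is not prescribed by the embodiments; the function name `flatten_second_order` and all numerical values are hypothetical.

```python
from typing import List, Tuple


def flatten_second_order(
    intensity: List[List[float]],
    excitation_nm: List[float],
    emission_nm: List[float],
) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Flatten an excitation x emission matrix into a vector and keep,
    for each element, the (excitation, emission) index values."""
    values: List[float] = []
    index: List[Tuple[float, float]] = []
    for i, ex in enumerate(excitation_nm):
        for j, em in enumerate(emission_nm):
            values.append(intensity[i][j])
            index.append((ex, em))
    return values, index


# Hypothetical measurement: 2 excitation wavelengths x 3 emission wavelengths.
excitation_nm = [280.0, 350.0]
emission_nm = [400.0, 450.0, 500.0]
intensity = [[0.1, 0.4, 0.2],
             [0.0, 0.9, 0.3]]

values, index = flatten_second_order(intensity, excitation_nm, emission_nm)
strongest = index[max(range(len(values)), key=lambda k: values[k])]
print(strongest)
```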

In the above description, the configuration implemented by the system of the present example has been described, but these may be configured by one or a plurality of information processing apparatuses in the system.

The examples of the invention described in the embodiments of the present application are not limited to those described herein, and may be applied to various examples within the scope of the technical idea. For example, the system may be configured so that the information presented on the screen of an information processing device in the embodiments of the present application is transmitted to another information processing device and displayed on the screen of that other device.

The processes and procedures described in the present application may be realized not only by the means explicitly described in the embodiments but also by software, hardware, or a combination thereof. They may also be implemented as a computer program and executed by a computer.

Claims

1. An information processing system comprising:
an acquisition unit that acquires a data set including a plurality of values corresponding to physical index values, and a characteristic of the data set;
a neural network that has learned the association between the data set and the characteristic; and
a first specifying unit that specifies the index value that affects the characteristic using a structure in the neural network.
2. The information processing system according to claim 1, comprising:
a database that stores the index value and substance information in association with each other; and
a second specifying unit that applies the index value specified by the first specifying unit to the database and specifies substance information corresponding to the index value.
3. The information processing system according to claim 1, wherein
the data set is a biochemical component,
the index value is an m/z value, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
4. The information processing system according to claim 1, wherein
the data set is a microbiome,
the index value is a base sequence, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
5. The information processing system according to claim 1, wherein the structure in the neural network used by the first specifying unit to specify the index value is at least one parameter of at least one layer included in the neural network.
6. An information processing device comprising a specifying unit that specifies an index value that affects a characteristic by using a structure in a neural network that has learned the association between a data set including a plurality of values corresponding to physical index values and the characteristic of the data set.
7. The information processing device according to claim 6, further comprising a second specifying unit configured to apply the index value specified by the specifying unit to a database storing the index value and substance information in association with each other, and to specify the substance information corresponding to the index value.
8. The information processing device according to claim 6, wherein
the data set is a biochemical component,
the index value is an m/z value, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
9. The information processing device according to claim 6, wherein
the data set is a microbiome,
the index value is a base sequence, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
10. An information processing device comprising:
a database that stores an index value and substance information in association with each other; and
a second specifying unit that specifies the substance information corresponding to the index value, using the index value that affects a characteristic, the index value being specified based on a structure in a neural network that has learned the association between a data set including a plurality of values corresponding to physical index values and the characteristic of the data set.
11. An information processing method in which a computer performs:
acquiring a data set including a plurality of values corresponding to physical index values, and a characteristic of the data set;
training a neural network on the association between the data set and the characteristic; and
a first specifying step of specifying, using a structure in the neural network, the index value that affects the characteristic.
12. The information processing method according to claim 11, comprising a second specifying step of applying the index value specified in the first specifying step to a database that stores the index value and substance information in association with each other, and specifying substance information corresponding to the index value.
13. The information processing method according to claim 11, wherein
the data set is a biochemical component,
the index value is an m/z value, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
14. The information processing method according to claim 11, wherein
the data set is a microbiome,
the index value is a base sequence, and
the characteristic is information indicating that the subject is a healthy person or an affected person for a specific disease.
15. A program for causing a computer to perform the method according to any one of claims 11 to 14.