CN111599431A - Report sheet-based data coding model generation method, system and equipment - Google Patents

Report sheet-based data coding model generation method, system and equipment

Info

Publication number
CN111599431A
CN111599431A CN202010242017.3A
Authority
CN
China
Prior art keywords
report
initial training
data
training model
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010242017.3A
Other languages
Chinese (zh)
Inventor
陶然
赵利伟
杨苗
刘敏
吴佳丽
续静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan Kingmed Clinic Examination Co., Ltd.
Original Assignee
Taiyuan Kingmed Clinic Examination Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan Kingmed Clinic Examination Co., Ltd.
Priority to CN202010242017.3A
Publication of CN111599431A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a report-sheet-based data coding model generation method, which comprises the following steps: initializing network parameters in a pre-constructed initial training model, wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters; causing the initial training model to enter a first loop iteration according to a first preset number of loops; calculating a loss value of a preset loss function; using the loss value to correct the network parameters through a back propagation algorithm; causing the initial training model to enter a second loop iteration according to a second preset number of loops; and splitting the initial training model, so as to split off the encoder from the initial training model as a data coding model. The invention also discloses a report-sheet-based data coding model generation system, device and storage medium. The data coding model generated by the embodiments of the invention can learn nonlinear feature representations, which helps improve the effect of subsequent task algorithms.

Description

Report sheet-based data coding model generation method, system and equipment
Technical Field
The invention relates to the field of data coding, in particular to a report-sheet-based data coding model generation method, system and device.
Background
Currently, result analysis for a medical detection report mainly analyzes the result values of the detection items in a certain type of report, and compares the detected result values with statistical reference values to obtain a final report result. Most report results are documented through extensive testing and clinical performance during patient treatment, but there is still considerable research and mining space for examining report results. Detecting an examinee with multiple detection methods at one specific time point can improve the accuracy of the detection result, give a more comprehensive view of the current state of the organism, and provide more detailed physical data of the patient for clinical treatment. However, as the number of detection items and accumulated reports increases, the challenge grows. The main reason is that human biological state information is projected into a high-dimensional data space through the detection results; it is increasingly difficult to analyze the correlations between detection items and clinical manifestations with conventional statistical methods, and feature engineering on the detection items is inefficient, making the whole detection item data analysis process long and expensive. There is therefore an urgent need for a data coding model that codes the detection item data to extract the data characteristics of a detection report.
Disclosure of Invention
The embodiments of the invention aim to provide a report-sheet-based data coding model generation method, system, device and storage medium, wherein the generated data coding model can learn nonlinear feature representations, which helps improve the effect of subsequent task algorithms.
In order to achieve the above object, an embodiment of the present invention provides a method for generating a data coding model based on a report, including:
initializing network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
causing the initial training model to enter a first loop iteration according to a first preset number of loops;
calculating a loss value of a preset loss function;
using the loss value to correct the network parameters through a back propagation algorithm;
causing the initial training model to enter a second loop iteration according to a second preset number of loops;
splitting the initial training model, so as to split off the encoder from the initial training model as a data coding model.
Compared with the prior art, the report-sheet-based data coding model generation method disclosed by the embodiment of the invention proceeds as follows: first, network parameters in a pre-constructed initial training model are initialized; then the initial training model enters a first loop iteration according to a first preset number of loops, the loss value of a preset loss function is calculated, the loss value is used to correct the network parameters through a back propagation algorithm, and the initial training model enters a second loop iteration according to a second preset number of loops; finally, the initial training model is split, and the encoder is split off from the initial training model as the data coding model. The data coding model generated by this method can learn nonlinear feature representations, helps improve the effect of subsequent task algorithms, adopts an unsupervised algorithm, is convenient to operate, and can save a large amount of manual labeling cost.
As an improvement of the above solution, the encoder is configured to input result list data obtained by encoding the data in a report in advance, so as to output a mean and a variance of the result list data; the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process.
As an improvement of the above scheme, the calculating a loss value of the preset loss function specifically includes:
sampling a group of random numbers in a preset standard normal distribution;
adding the mean value and the variance to the random number respectively to obtain a latent variable;
and inputting the latent variable into the decoder, and calculating a loss value through a preset loss function.
As an improvement of the above, the decoder is configured to input the latent variable to output regenerated result list data.
As an improvement of the above, the method further comprises:
and adjusting network parameters of the initial training model by using a random gradient descent algorithm.
In order to achieve the above object, an embodiment of the present invention further provides a system for generating a data coding model based on a report, including:
the network parameter initialization module is used for initializing network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
a first loop iteration module, configured to cause the initial training model to enter a first loop iteration according to a first preset number of loops;
a loss value calculation module, configured to calculate the loss value of a preset loss function;
a network parameter correction module, configured to use the loss value to correct the network parameters through a back propagation algorithm;
a second loop iteration module, configured to cause the initial training model to enter a second loop iteration according to a second preset number of loops;
and a data coding model generation module, configured to split the initial training model, so as to split off the encoder from the initial training model as a data coding model.
Compared with the prior art, the report-sheet-based data coding model generation system disclosed by the embodiment of the invention works as follows: first, the network parameter initialization module initializes network parameters in a pre-constructed initial training model; then the first loop iteration module causes the initial training model to enter a first loop iteration according to a first preset number of loops, the loss value calculation module calculates the loss value of a preset loss function, the network parameter correction module uses the loss value to correct the network parameters through a back propagation algorithm, and the second loop iteration module causes the initial training model to enter a second loop iteration according to a second preset number of loops; finally, the data coding model generation module splits the initial training model and splits off the encoder from the initial training model as the data coding model. The data coding model generated by this system can learn nonlinear feature representations, helps improve the effect of subsequent report task algorithms, adopts an unsupervised algorithm, is convenient to operate, and can save a large amount of manual labeling cost.
As an improvement of the above solution, the encoder is configured to input result list data obtained by encoding the data in a report in advance, so as to output a mean and a variance of the result list data; the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process.
As an improvement of the above scheme, the loss value calculation module is specifically configured to:
sampling a group of random numbers in a preset standard normal distribution;
adding the mean value and the variance to the random number respectively to obtain a latent variable;
and inputting the latent variable into the decoder, and calculating a loss value through a preset loss function.
In order to achieve the above object, an embodiment of the present invention further provides a report-based data coding model generation device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; when the processor executes the computer program, it implements the report-based data coding model generation method according to any one of the above embodiments.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program; when the computer program runs, the device where the computer-readable storage medium is located is controlled to execute the report-sheet-based data coding model generation method according to any one of the above embodiments.
Drawings
FIG. 1 is a flow chart of a report-based data coding model generation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main framework of a variational autoencoder according to an embodiment of the present invention;
FIG. 3 is a block diagram of an initial training model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a report-based data coding model generation system according to an embodiment of the present invention;
fig. 5 is a block diagram of a data coding model generating device based on a report according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a report-based data coding model generation method according to an embodiment of the present invention; the report-sheet-based data coding model generation method comprises the following steps:
s1, initializing network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
s2, enabling the initial training model to enter one cycle iteration according to the first preset cycle times;
s3, calculating the loss value of the preset loss function;
s4, using the loss value for correcting the network parameter through a back propagation algorithm;
s5, enabling the initial training model to enter secondary cycle iteration according to a second preset cycle number;
and S6, splitting the initial training model to split the encoder as a data coding model from the initial training model.
It should be noted that the report-sheet-based data coding model generation method according to the embodiment of the present invention is used to generate a data coding model, and the data coding model can code the data in a report sheet so as to complete the analysis of the feature information in the report sheet. Illustratively, the report sheet is a detection report of a patient; it can be an electronic report sheet, or an electronic report sheet generated after a paper report sheet (handwritten by doctors/patients) is automatically recognized by a machine, so that the information in the report sheet can be automatically extracted and the detailed data in the report sheet determined. The process of recognizing/extracting information from the report sheet may follow data processing in the prior art, and the present invention is not limited in this respect.
It should be noted that the initial training model evolves from a variational autoencoder (VAE), one of the most promising methods for unsupervised learning. The overall VAE architecture is shown in fig. 2; it inherits the architecture of a conventional autoencoder, which consists of two parts, an encoder and a decoder. The encoder is an inference model that mainly completes data encoding and feature extraction, and the decoder is a generation model that completes data sampling and generation. A variational autoencoder (VAE) learns a data-generating distribution and allows samples to be drawn at random from the latent space; these samples can then be decoded using the decoder network to generate new data with features similar to those of the training data.
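For concreteness, the following is a minimal sketch of such an encoder/decoder pair in Python, assuming PyTorch and fully connected layers; the input width, hidden width and latent dimension are illustrative assumptions, not values fixed by this embodiment.

    # Minimal VAE encoder/decoder sketch, assuming PyTorch; all layer sizes
    # are illustrative assumptions, not values fixed by this embodiment.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Inference model: encodes result list data into a mean and a log-variance."""
        def __init__(self, input_dim=256, hidden_dim=128, latent_dim=32):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            self.fc_mean = nn.Linear(hidden_dim, latent_dim)    # mean vector
            self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance vector

        def forward(self, x):
            h = self.body(x)
            return self.fc_mean(h), self.fc_logvar(h)

    class Decoder(nn.Module):
        """Generation model: decodes a latent variable back into result list data."""
        def __init__(self, latent_dim=32, hidden_dim=128, output_dim=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, output_dim), nn.Sigmoid())  # no BN, per the text

        def forward(self, z):
            return self.body(z)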
To solve the variational inference problem, the main approaches are Markov chain Monte Carlo (MCMC) and variational inference (VI). In variational inference, a distribution q(z) is used to approximate the posterior distribution p(z|x), and the model is optimized by minimizing the KL divergence between the two distributions q(z) and p(z|x). The formula for the KL divergence is shown below:
KL(q(z) || p(z|x)) = E_{z~q(z)}[log q(z) - log p(z|x)] = log p(x) - E_{z~q(z)}[log p(x, z) - log q(z)]    (1)

where the last expectation, E_{z~q(z)}[log p(x, z) - log q(z)], is the evidence lower bound (ELBO).
since p (x) is an unknown constant, ELBO can be indirectly maximized to optimize the entire network. Since the VAE optimizes the model by optimizing the ELBO, its encoding mode and decoding model can be trained simultaneously. The variational encoder uses a re-parameterization technique to solve the problems of calculation and gradient back-transmission of the KL divergence. Unlike conventional self-coder models, instead of generating one implicit vector at a time, two vectors are generated, one representing the mean and one representing the standard deviation, and the implicit vector is then synthesized from the two statistics and a random noise that follows a standard normal distribution.
Referring to fig. 3, fig. 3 is a block diagram of an initial training model according to an embodiment of the present invention. The initial training model consists of an encoder and a decoder, both of which process data using deep neural networks. Illustratively, in fig. 3 the encoder is on the left side of the dashed box below N(0, 1) and the decoder is on the right.
The encoder employs conventional 2D convolution operations or fully connected operators. The encoder is used to input result list data obtained by encoding the data in a report in advance, so as to output the mean and variance of the result list data; the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process. The nominal variables and the detection result data are encoded in advance; when the encoder takes the encoded nominal variables and the encoded detection result data as input, it encodes them again to generate the mean and variance of the result list data.
Specifically, the process of encoding the nominal variables and the detection result data includes steps S11 to S15.
And S11, acquiring the nominal variables in the detection items, and coding each nominal variable according to its number of values.
The number of values of each nominal variable is determined according to a preset value rule; whether the number of values of the current nominal variable is greater than or equal to a preset threshold is judged; if so, the nominal variable is encoded with hash encoding; if not, the nominal variable is encoded with one-hot encoding.
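A sketch of this branching rule follows (Python; the threshold and the hash bucket count are hypothetical values, since the preset threshold is not fixed in the text):

    import hashlib

    def encode_nominal(value, vocab, threshold=50, hash_buckets=64):
        # vocab maps each known value of this nominal variable to an index.
        if len(vocab) >= threshold:
            # Hash encoding: stable bucket index for high-cardinality variables.
            digest = hashlib.md5(value.encode("utf-8")).hexdigest()
            bucket = int(digest, 16) % hash_buckets
            vec = [0] * hash_buckets
            vec[bucket] = 1
            return vec
        # One-hot encoding for low-cardinality variables.
        vec = [0] * len(vocab)
        vec[vocab[value]] = 1
        return vec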
And S12, acquiring detection result data in the detection items, and preprocessing the detection result data according to the type of the detection result data.
When the detection result data is continuous, it is normalized; when the detection result data is discrete, it is given spatially equidistant codes within a preset range.
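A minimal sketch of this preprocessing (Python; min-max normalization and the [0, preset_max] code range are assumptions, since the text only specifies "normalization" and "spatially equidistant coding within a preset value"):

    def normalize_continuous(x, x_min, x_max):
        # Min-max normalization of a continuous detection result into [0, 1].
        return (x - x_min) / (x_max - x_min)

    def encode_discrete(level, num_levels, preset_max=1.0):
        # Spatially equidistant coding: spread the discrete levels evenly
        # within [0, preset_max]; `level` is the 0-based index of the value.
        return preset_max * level / (num_levels - 1)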
And S13, encoding the preprocessed detection result data. There are four coding modes: vector dimension coding, time dimension coding, matrix dimension coding, and tensor dimension coding.
Scheme 1: vector dimension coding, i.e., arranging the detection result data transversely according to preset detection items. The detection result data corresponding to detection items not currently detected are left empty, and their positions in the arrangement are reserved. The detection items, i.e. the unique identifiers of the detection items in the laboratory, are generally arranged in order, which makes writing and reading back the program coding results convenient.
Scheme 2: time dimension coding, i.e., ordering the detection result data according to the time at which they were generated. Items without detection results must be eliminated. For example, if a barcode covers 7 of 2000 detection items, then the vector contains only the 7 detection result data that have undergone normalization/spatially equidistant coding.
Scheme 3: matrix dimension coding, i.e., arranging the detection result data according to a preset arrangement rule, where the preset arrangement rule divides hierarchically according to the category, department and/or subject of the detection item corresponding to each detection result datum. Specifically, the detection result data of the master barcode are arranged as a two-dimensional table. Because the results of the detection items are correlated, an unreasonable arrangement of the detection items in the two-dimensional table may hinder the neural network from extracting the relevant information, so the arrangement rule of the detection items needs to be specially designed.
Scheme 4: tensor dimension coding, i.e., ordering the detection result data according to a preset three-dimensional model. The three-dimensional model is presented in the form of a three-dimensional table (tensor) and comprises a number of slices (channels) representing different test packages, each slice containing a number of the detection result data.
And S14, randomly scrambling the coded detection result data.
The analysis result of the report sheet for the same master barcode should not be influenced by the arrangement of the detection items; that is, the arrangement order in schemes 1 to 4 should not influence the overall analysis result. The encoded data is therefore allowed to be randomly scrambled along different dimensions before being fed to the deep learning model. For example, in scheme 2 the order of the detection items should be randomly adjustable, in scheme 3 the subjects may be randomly scrambled left and right, and in scheme 4 the subjects are randomly scrambled along the slice dimension (channel); the analysis values before and after scrambling remain self-consistent.
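A sketch of this scrambling step (Python, assuming NumPy; only the arrangement along the shuffled axis changes, the values themselves are untouched):

    import numpy as np

    rng = np.random.default_rng()

    def shuffle_items(vec):
        # Scheme 2: randomly reorder the detection result vector.
        return rng.permutation(vec)

    def shuffle_channels(tensor):
        # Scheme 4: shuffle along the slice (channel) dimension only.
        idx = rng.permutation(tensor.shape[0])
        return tensor[idx]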
And S15, combining the coded nominal variable, the coded detection result data and the randomly scrambled coded detection result data to output the coding result of the detection item.
The decoder network employs conventional 2D convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The decoder is used to input the latent variable so as to output regenerated result list data.
Illustratively, the loss function of the whole network model comprises two parts, a reconstruction loss function and a regularization loss function. The reconstruction loss function mainly ensures that the distribution of the data generated by the decoder is as consistent as possible with the distribution of the real data, and is calculated using cross entropy; the regularization loss function mainly constrains the distribution of the latent variables sampled by the encoder to be consistent with the standard normal distribution. The overall loss function is expressed by the following formula:
L(θ, φ) = E_{x~p_data}[ E_{z~q_φ(z|x)}[-log p_θ(x|z)] + β · KL(q_φ(z|x) || p(z)) ]    (2)
where x represents the input data, i.e. the encoded result list; z represents the encoded latent variable, i.e. the result of the normal-distribution sampling step in the structure diagram; θ represents the parameters of the decoder; φ represents the parameters of the encoder; p_θ represents the decoder, or generation network; q_φ represents the encoder; p(z) represents the standard normal distribution from which samples are drawn; x ~ p_data represents data sampled from the result list dataset to be trained; z ~ q_φ(z|x) represents the latent variable z sampled when the input data is x; and β is a hyper-parameter, mainly used to adjust the weight of the KL divergence loss in the overall loss, which can be used to control the disentanglement strength between different dimensions of the result list latent variables.
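A sketch of equation (2) in code (assuming PyTorch, cross entropy for the reconstruction term, and the closed-form KL divergence between the encoder's Gaussian and N(0, I)):

    import torch
    import torch.nn.functional as F

    def vae_loss(x, x_recon, mean, logvar, beta=1.0):
        # Reconstruction loss: cross entropy between regenerated and real data.
        recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
        # Regularization loss: KL(q_phi(z|x) || N(0, I)) in closed form.
        kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
        # beta weights the KL term and controls latent disentanglement strength.
        return recon + beta * kl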
Specifically, in step S1, the encoder parameters and decoder parameters are initialized using a truncated random Gaussian distribution.
Specifically, in step S2, an iterative loop is entered; the first preset number of loops is n epochs, where the specific value of n is an empirical parameter.
Specifically, in step S3, the result list data of one batch in the training set is read into memory, and the loss value of the preset loss function is calculated. The preset loss function is the loss function in equation (2) above.
Preferably, the calculating the loss value of the preset loss function specifically includes: sampling a group of random numbers in a preset standard normal distribution; adding the mean value and the variance to the random number respectively to obtain a latent variable; and inputting the latent variable into the decoder, and calculating a loss value through a preset loss function.
Specifically, in step S4, the loss value is used to correct the encoder parameter and the decoder parameter by a back propagation algorithm.
Specifically, in step S5, after the back propagation algorithm is completed, the loop continues until the number of iterations reaches the second preset number of loops.
Specifically, in step S6, the trained initial training model is frozen and pruned. The split-off encoder is the finished data encoder model: it takes result list data as input and outputs a dense, dimensionality-reduced feature vector. The split-off decoder is a result list generation model: it takes multivariate Gaussian random noise as input and outputs a generated result list.
Further, a stochastic gradient descent algorithm is used to optimize the network parameters of the initial training model. Illustratively, the stochastic gradient descent algorithm is SGD with a learning rate of 0.0001.
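Putting steps S1 to S6 together, a hedged end-to-end sketch follows (PyTorch, reusing the Encoder/Decoder, reparameterize and vae_loss sketches above; the epoch counts and the `loader` iterable are assumptions, while the truncated Gaussian initialization and the SGD learning rate of 0.0001 come from the text):

    import torch
    import torch.nn as nn

    def init_truncated(module):
        # S1: initialize weights with a truncated random Gaussian.
        if isinstance(module, nn.Linear):
            nn.init.trunc_normal_(module.weight, std=0.02)

    def train(encoder, decoder, loader, first_epochs=100, second_epochs=100, beta=1.0):
        encoder.apply(init_truncated)
        decoder.apply(init_truncated)
        params = list(encoder.parameters()) + list(decoder.parameters())
        optimizer = torch.optim.SGD(params, lr=0.0001)
        for epoch in range(first_epochs + second_epochs):      # S2/S5: loop iterations
            for x in loader:                                   # one batch of result lists
                mean, logvar = encoder(x)
                z = reparameterize(mean, logvar)
                x_recon = decoder(z)
                loss = vae_loss(x, x_recon, mean, logvar, beta)  # S3: loss value
                optimizer.zero_grad()
                loss.backward()                                # S4: back propagation
                optimizer.step()
        return encoder  # S6: split off the encoder as the data coding model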
Compared with the prior art, the report-sheet-based data coding model generation method disclosed by the embodiment of the invention proceeds as follows: first, network parameters in a pre-constructed initial training model are initialized; then the initial training model enters a first loop iteration according to a first preset number of loops, the loss value of a preset loss function is calculated, the loss value is used to correct the network parameters through a back propagation algorithm, and the initial training model enters a second loop iteration according to a second preset number of loops; finally, the initial training model is split, and the encoder is split off from the initial training model as the data coding model.
The data coding model generated by this method can learn nonlinear feature representations, helps improve the effect of subsequent task algorithms, adopts an unsupervised algorithm, is convenient to operate, and can save a large amount of manual labeling cost. Compared with earlier self-coding feature learning methods, it can extract more feature information, and the spatial disentangling of latent variables is superior to that of a conventional autoencoder. Compared with GAN-based methods, the variational autoencoder method has a more stable training process and requires less time. It can also reduce the dimensionality of the data, and the length of the learned feature variable can be adjusted according to actual requirements.
Referring to fig. 4, fig. 4 is a block diagram of a report-sheet-based data coding model generation system 10 according to an embodiment of the present invention, where the system 10 includes:
a network parameter initialization module 11, configured to initialize network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
a first loop iteration module 12, configured to cause the initial training model to enter a first loop iteration according to a first preset number of loops;
a loss value calculation module 13, configured to calculate a loss value of a preset loss function;
a network parameter correction module 14, configured to use the loss value to correct the network parameters through a back propagation algorithm;
a second loop iteration module 15, configured to cause the initial training model to enter a second loop iteration according to a second preset number of loops;
and a data coding model generation module 16, configured to split the initial training model, so as to split off the encoder from the initial training model as a data coding model.
It should be noted that the report-sheet-based data coding model generation system 10 according to the embodiment of the present invention is used to generate a data coding model capable of coding the data in a report sheet so as to complete the analysis of the feature information in the report sheet. Illustratively, the report sheet is a detection report of a patient; it can be an electronic report sheet, or an electronic report sheet generated after a paper report sheet (handwritten by doctors/patients) is automatically recognized by a machine, so that the information in the report sheet can be automatically extracted and the detailed data in the report sheet determined. The process of recognizing/extracting information from the report sheet may follow data processing in the prior art, and the present invention is not limited in this respect.
The initial training model includes an encoder and a decoder, both of which process data using deep neural networks. The encoder employs conventional 2D convolution operations or fully connected operators. The encoder is used to input result list data obtained by encoding the data in a report in advance, so as to output the mean and variance of the result list data; the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process. The nominal variables and the detection result data are encoded in advance; when the encoder takes the encoded nominal variables and the encoded detection result data as input, it encodes them again to generate the mean and variance of the result list data.
Specifically, the process of encoding the nominal variables and the detection result data includes steps S11 to S15.
And S11, acquiring the nominal variables in the detection items, and coding each nominal variable according to its number of values.
The number of values of each nominal variable is determined according to a preset value rule; whether the number of values of the current nominal variable is greater than or equal to a preset threshold is judged; if so, the nominal variable is encoded with hash encoding; if not, the nominal variable is encoded with one-hot encoding.
And S12, acquiring detection result data in the detection items, and preprocessing the detection result data according to the type of the detection result data.
When the detection result data is continuous, it is normalized; when the detection result data is discrete, it is given spatially equidistant codes within a preset range.
And S13, encoding the preprocessed detection result data. There are four coding modes: vector dimension coding, time dimension coding, matrix dimension coding, and tensor dimension coding.
Scheme 1: vector dimension coding, i.e., arranging the detection result data transversely according to preset detection items. The detection result data corresponding to detection items not currently detected are left empty, and their positions in the arrangement are reserved. The detection items, i.e. the unique identifiers of the detection items in the laboratory, are generally arranged in order, which makes writing and reading back the program coding results convenient.
Scheme 2: time dimension coding, i.e., ordering the detection result data according to the time at which they were generated. Items without detection results must be eliminated. For example, if a barcode covers 7 of 2000 detection items, then the vector contains only the 7 detection result data that have undergone normalization/spatially equidistant coding.
Scheme 3: matrix dimension coding, i.e., arranging the detection result data according to a preset arrangement rule, where the preset arrangement rule divides hierarchically according to the category, department and/or subject of the detection item corresponding to each detection result datum. Specifically, the detection result data of the master barcode are arranged as a two-dimensional table. Because the results of the detection items are correlated, an unreasonable arrangement of the detection items in the two-dimensional table may hinder the neural network from extracting the relevant information, so the arrangement rule of the detection items needs to be specially designed.
Scheme 4: tensor dimension coding, i.e., ordering the detection result data according to a preset three-dimensional model. The three-dimensional model is presented in the form of a three-dimensional table (tensor) and comprises a number of slices (channels) representing different test packages, each slice containing a number of the detection result data.
And S14, randomly scrambling the coded detection result data.
The analysis result of the report sheet for the same master barcode should not be influenced by the arrangement of the detection items; that is, the arrangement order in schemes 1 to 4 should not influence the overall analysis result. The encoded data is therefore allowed to be randomly scrambled along different dimensions before being fed to the deep learning model. For example, in scheme 2 the order of the detection items should be randomly adjustable, in scheme 3 the subjects may be randomly scrambled left and right, and in scheme 4 the subjects are randomly scrambled along the slice dimension (channel); the analysis values before and after scrambling remain self-consistent.
And S15, combining the coded nominal variable, the coded detection result data and the randomly scrambled coded detection result data to output the coding result of the detection item.
The decoder network employs conventional 2D convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The decoder is used to input the latent variable so as to output regenerated result list data.
Illustratively, the loss function of the whole network model comprises two parts, a reconstruction loss function and a regularization loss function. The reconstruction loss function mainly ensures that the distribution of the data generated by the decoder is as consistent as possible with the distribution of the real data, and is calculated using cross entropy; the regularization loss function mainly constrains the distribution of the latent variables sampled by the encoder to be consistent with the standard normal distribution. The overall loss function is expressed by the following formula:
L(θ, φ) = E_{x~p_data}[ E_{z~q_φ(z|x)}[-log p_θ(x|z)] + β · KL(q_φ(z|x) || p(z)) ]    (2)
where x represents the input data, i.e. the encoded result list; z represents the encoded latent variable, i.e. the result of the normal-distribution sampling step in the structure diagram; θ represents the parameters of the decoder; φ represents the parameters of the encoder; p_θ represents the decoder, or generation network; q_φ represents the encoder; p(z) represents the standard normal distribution from which samples are drawn; x ~ p_data represents data sampled from the result list dataset to be trained; z ~ q_φ(z|x) represents the latent variable z sampled when the input data is x; and β is a hyper-parameter, mainly used to adjust the weight of the KL divergence loss in the overall loss, which can be used to control the disentanglement strength between different dimensions of the result list latent variables.
Specifically, the network parameter initialization module 11 initializes the encoder parameters and decoder parameters using a truncated random Gaussian distribution. The first loop iteration module 12 causes the initial training model to enter a first loop iteration according to a first preset number of loops; the first preset number of loops is n epochs, where the specific value of n is an empirical parameter. The loss value calculation module 13 reads the result list data of one batch in the training set into memory and calculates the loss value of the preset loss function: it first samples a group of random numbers from a preset standard normal distribution, then adds the mean and the variance to the random numbers respectively to obtain a latent variable, and finally inputs the latent variable into the decoder to calculate the loss value through the preset loss function.
The network parameter correction module 14 uses the loss value to correct the encoder parameters and the decoder parameters through a back propagation algorithm. The second loop iteration module 15 causes the initial training model to enter a second loop iteration according to a second preset number of loops. The data coding model generation module 16 freezes and prunes the trained initial training model. The split-off encoder is the finished data encoder model: it takes result list data as input and outputs a dense, dimensionality-reduced feature vector. The split-off decoder is a result list generation model: it takes multivariate Gaussian random noise as input and outputs a generated result list.
Further, the network parameter correction module 14 is further configured to optimize the network parameters of the initial training model using a stochastic gradient descent algorithm. Illustratively, the stochastic gradient descent algorithm is SGD with a learning rate of 0.0001.
Compared with the prior art, the report-sheet-based data coding model generation system 10 disclosed by the embodiment of the invention works as follows: first, the network parameter initialization module 11 initializes network parameters in a pre-constructed initial training model; then the first loop iteration module 12 causes the initial training model to enter a first loop iteration according to a first preset number of loops, the loss value calculation module 13 calculates the loss value of a preset loss function, the network parameter correction module 14 uses the loss value to correct the network parameters through a back propagation algorithm, and the second loop iteration module 15 causes the initial training model to enter a second loop iteration according to a second preset number of loops; finally, the data coding model generation module 16 splits the initial training model and splits off the encoder from the initial training model as the data coding model.
The data coding model generated by the report-based data coding model generation system 10 can learn nonlinear feature representations, helps improve the effect of subsequent task algorithms, adopts an unsupervised algorithm, is convenient to operate, and can save a large amount of manual labeling cost. Compared with earlier self-coding feature learning methods, it can extract more feature information, and the spatial disentangling of latent variables is superior to that of a conventional autoencoder. Compared with GAN-based methods, the variational autoencoder method has a more stable training process and requires less time. It can also reduce the dimensionality of the data, and the length of the learned feature variable can be adjusted according to actual requirements.
Referring to fig. 5, fig. 5 is a block diagram of a data coding model generating device 20 based on a report according to an embodiment of the present invention. The report sheet-based data encoding model generation device 20 of this embodiment includes: a processor 21, a memory 22 and a computer program stored in said memory 22 and executable on said processor 21. The processor 21, when executing the computer program, implements the steps of the above-mentioned report-based data coding model generation method embodiment, such as steps S1 to S6 shown in fig. 1. Alternatively, the processor 21, when executing the computer program, implements the functions of the modules/units in the above-mentioned device embodiments, such as the network parameter initialization module 11.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution process of the computer program in the report-based data coding model generation apparatus 20. For example, the computer program may be divided into the network parameter initialization module 11, the first loop iteration module 12, the loss value calculation module 13, the network parameter correction module 14, the second loop iteration module 15, and the data coding model generation module 16; for the specific functions of each module, refer to the working process of the report-based data coding model generation system 10 described in the foregoing embodiment, which is not repeated here.
The report-based data encoding model generating device 20 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. It may include, but is not limited to, a processor 21 and a memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the report-based data coding model generating device 20 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or use different components; for example, it may further include input/output devices, network access devices, a bus, etc.
The processor 21 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor, or the processor 21 may be any conventional processor; the processor 21 is the control center of the report-based data coding model generating device 20, and various interfaces and lines connect the various parts of the entire device.
The memory 22 may be used to store the computer programs and/or modules, and the processor 21 implements the various functions of the report-based data coding model generation apparatus 20 by running or executing the computer programs and/or modules stored in the memory 22 and calling the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
Wherein, the modules/units integrated by the report-based data coding model generation device 20 can be stored in a computer readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by the processor 21 to implement the steps of the above embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A report-based data coding model generation method is characterized by comprising the following steps:
initializing network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
causing the initial training model to enter a first loop iteration according to a first preset number of loops;
calculating a loss value of a preset loss function;
using the loss value to correct the network parameters through a back propagation algorithm;
causing the initial training model to enter a second loop iteration according to a second preset number of loops; and
splitting the initial training model, so as to split off the encoder from the initial training model as a data coding model.
2. The report-based data coding model generation method of claim 1, wherein the encoder is configured to input result list data obtained by encoding the data in a report in advance, so as to output a mean and a variance of the result list data; wherein the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process.
3. The report-sheet-based data coding model generation method according to claim 2, wherein the calculating a loss value of the preset loss function specifically includes:
sampling a group of random numbers in a preset standard normal distribution;
adding the mean value and the variance to the random number respectively to obtain a latent variable;
and inputting the latent variable into the decoder, and calculating a loss value through a preset loss function.
4. The report-based data coding model generation method of claim 3, wherein the decoder is configured to input the latent variable to output regenerated result manifest data.
5. The report-sheet based data coding model generation method of claim 1, wherein the method further comprises:
and adjusting network parameters of the initial training model by using a random gradient descent algorithm.
6. A report-based data coding model generation system, comprising:
the network parameter initialization module is used for initializing network parameters in a pre-constructed initial training model; wherein the initial training model comprises an encoder and a decoder, and the network parameters comprise encoder parameters and decoder parameters;
a first loop iteration module, configured to cause the initial training model to enter a first loop iteration according to a first preset number of loops;
a loss value calculation module, configured to calculate the loss value of a preset loss function;
a network parameter correction module, configured to use the loss value to correct the network parameters through a back propagation algorithm;
a second loop iteration module, configured to cause the initial training model to enter a second loop iteration according to a second preset number of loops; and
a data coding model generation module, configured to split the initial training model, so as to split off the encoder from the initial training model as a data coding model.
7. The report-based data coding model generation system of claim 6, wherein the encoder is configured to input result list data obtained by encoding the data in a report in advance, so as to output a mean and a variance of the result list data; wherein the data in the report sheet comprise nominal variables of the detection items and detection result data, and the nominal variables comprise at least one of the units of the detection items, the names of the reagents adopted, and the names of the detection equipment used in the detection process.
8. The report-based data coding model generation system of claim 7, wherein the loss value calculation module is specifically configured to:
sampling a group of random numbers in a preset standard normal distribution;
adding the mean value and the variance to the random number respectively to obtain a latent variable;
and inputting the latent variable into the decoder, and calculating a loss value through a preset loss function.
9. A report-based data coding model generation device, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the report-based data coding model generation method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the report-based data coding model generation method according to any one of claims 1 to 5.
CN202010242017.3A 2020-03-31 2020-03-31 Report sheet-based data coding model generation method, system and equipment Pending CN111599431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242017.3A CN111599431A (en) 2020-03-31 2020-03-31 Report sheet-based data coding model generation method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010242017.3A CN111599431A (en) 2020-03-31 2020-03-31 Report sheet-based data coding model generation method, system and equipment

Publications (1)

Publication Number Publication Date
CN111599431A true CN111599431A (en) 2020-08-28

Family

ID=72181612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242017.3A Pending CN111599431A (en) 2020-03-31 2020-03-31 Report sheet-based data coding model generation method, system and equipment

Country Status (1)

Country Link
CN (1) CN111599431A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274029A (en) * 2017-06-23 2017-10-20 深圳市唯特视科技有限公司 A kind of future anticipation method of interaction medium in utilization dynamic scene
CN109543943A (en) * 2018-10-17 2019-03-29 国网辽宁省电力有限公司电力科学研究院 A kind of electricity price inspection execution method based on big data deep learning
CN109784249A (en) * 2019-01-04 2019-05-21 华南理工大学 A kind of scramble face identification method based on variation cascaded message bottleneck
CN109886388A (en) * 2019-01-09 2019-06-14 平安科技(深圳)有限公司 A kind of training sample data extending method and device based on variation self-encoding encoder
CN110910982A (en) * 2019-11-04 2020-03-24 广州金域医学检验中心有限公司 Self-coding model training method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735544A (en) * 2020-12-30 2021-04-30 杭州依图医疗技术有限公司 Medical record data processing method and device and storage medium
CN112837739A (en) * 2021-01-29 2021-05-25 西北大学 Hierarchical feature phylogenetic model based on self-encoder and Monte Carlo tree
CN112837739B (en) * 2021-01-29 2022-12-02 西北大学 Hierarchical feature phylogenetic model based on self-encoder and Monte Carlo tree
CN112988854A (en) * 2021-05-20 2021-06-18 创新奇智(成都)科技有限公司 Complaint data mining method and device, electronic equipment and storage medium
CN113642716A (en) * 2021-08-31 2021-11-12 南方电网数字电网研究院有限公司 Depth variation autoencoder model training method, device, equipment and storage medium
CN117312161A (en) * 2023-10-07 2023-12-29 中国通信建设集团有限公司数智科创分公司 Intelligent detection system and method based on automatic login technology
CN117155403A (en) * 2023-10-31 2023-12-01 广东鑫钻节能科技股份有限公司 Data coding method of digital energy air compression station
CN117155403B (en) * 2023-10-31 2023-12-29 广东鑫钻节能科技股份有限公司 Data coding method of digital energy air compression station


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
    Address after: 030000 floor 3-6, block a, No.2, Longsheng street, Tanghuai Park, Taiyuan comprehensive reform demonstration area, Shanxi Province
    Applicant after: Taiyuan Jinyu clinical laboratory Co.,Ltd.
    Address before: 030000 floor 3-6, block a, No.2, Longsheng street, Tanghuai Park, Taiyuan comprehensive reform demonstration area, Shanxi Province
    Applicant before: TAIYUAN KINGMED CLINIC EXAMINATION Co.,Ltd.
RJ01 Rejection of invention patent application after publication
    Application publication date: 20200828