CN111489802A - Report coding model generation method, system, device and storage medium - Google Patents

Report coding model generation method, system, device and storage medium Download PDF

Info

Publication number
CN111489802A
Authority
CN
China
Prior art keywords
initial training
coding
report
model
training model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010242585.3A
Other languages
Chinese (zh)
Other versions
CN111489802B (en)
Inventor
陶然
宋洪平
靳俊锐
易守艳
刘圣艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Kingmed Diagnostics Co ltd
Original Assignee
Chongqing Kingmed Diagnostics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Kingmed Diagnostics Co ltd filed Critical Chongqing Kingmed Diagnostics Co ltd
Priority to CN202010242585.3A priority Critical patent/CN111489802B/en
Publication of CN111489802A publication Critical patent/CN111489802A/en
Application granted granted Critical
Publication of CN111489802B publication Critical patent/CN111489802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a report coding model generation method, which comprises the following steps: initializing network parameters in a pre-constructed initial training model, wherein the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters; making the initial training model enter a primary cycle iteration according to a first preset cycle number; calculating the loss value of a preset loss function; using the loss value to correct the network parameters through a back propagation algorithm; making the initial training model enter a secondary cycle iteration according to a second preset cycle number; and splitting the initial training model so as to split the encoder from the initial training model as a coding model. The invention also discloses a report coding model generation system, device and storage medium. The coding model generated by the embodiments of the invention can learn nonlinear feature representations, which helps improve the effect of subsequent task algorithms.

Description

Report coding model generation method, system, device and storage medium
Technical Field
The present invention relates to the field of data coding, and in particular, to a report coding model generation method, system, device, and storage medium.
Background
Currently, result analysis for a medical detection report mainly analyzes the result values of the detection items in a certain type of report, comparing the detected result values with statistical reference values to obtain the final report result. Most report results are documented through extensive testing and clinical performance during patient treatment, but there is still much room for research and mining in examining report results. Detecting an examinee with multiple detection methods at a specific time point can improve the accuracy of the detection result, give a more comprehensive picture of the organism's current state, and provide more detailed physical data of the patient for clinical treatment. But as the number of detection items and accumulated reports increases, the challenge grows. The main reason is that human biological status information is projected into a high-dimensional data space through the detection results; it is increasingly difficult to analyze correlations between detection items and clinical manifestations through conventional statistical methods, and the feature engineering of detection items is inefficient, making the whole detection item data analysis process long and expensive. There is therefore an urgent need for a coding model that can encode detection item data to extract the data features of a detection report.
Disclosure of Invention
Embodiments of the present invention aim to provide a report coding model generation method, system, device and storage medium, in which the generated coding model can learn nonlinear feature representations, helping improve the effect of subsequent task algorithms, and in which the adopted generator network can extract rich information features with individual styles.
In order to achieve the above object, an embodiment of the present invention provides a report coding model generating method, including:
initializing network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
enabling the initial training model to enter a primary cycle iteration according to a first preset cycle number;
calculating a loss value of a preset loss function by using the initial training model;
using the loss value to modify the network parameter by a back propagation algorithm;
enabling the initial training model to enter secondary cycle iteration according to a second preset cycle number;
splitting the initial training model to split the encoder from the initial training model as an encoding model;
the encoder is used for inputting the nominal variable and the detection result data in the report sheet so as to output a latent variable; wherein the nominal variable comprises at least one of a unit of the test item, a name of a reagent used, and a name of a test device used in a test process;
the generator is used for inputting the latent variable and the condition variable so as to output result list data; wherein the condition variable comprises user information and corresponding detection items in the report.
Compared with the prior art, in the report coding model generation method disclosed by the embodiments of the present invention, the network parameters in a pre-constructed initial training model are first initialized; then the initial training model enters a primary cycle iteration according to a first preset cycle number, the loss value of a preset loss function is calculated using the initial training model, the loss value is used to correct the network parameters through a back propagation algorithm, and the initial training model enters a secondary cycle iteration according to a second preset cycle number; finally, the initial training model is split, and the encoder is split from the initial training model as the coding model. The coding model generated by the report coding model generation method of the embodiments of the present invention can learn nonlinear feature representations, which helps improve the effect of subsequent task algorithms, and the adopted generator network can extract rich information features with individual styles.
As an improvement of the above solution, the feature discriminator is configured to discriminate between the reconstructed result list data and the real result list data, and to pass the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
As an improvement of the above, the coding discriminator is configured to make the data distribution of the latent variable consistent with a Gaussian distribution.
As an improvement of the above, the method further comprises:
and adjusting network parameters of the initial training model by using a random gradient descent algorithm.
In order to achieve the above object, an embodiment of the present invention further provides a report encoding model generating system, including:
the network parameter initialization module is used for initializing network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
the primary cycle iteration module is used for enabling the initial training model to enter primary cycle iteration according to a first preset cycle number;
the loss value calculation module is used for calculating the loss value of a preset loss function by using the initial training model;
a network parameter correction module for using the loss value to correct the network parameter through a back propagation algorithm;
the secondary cycle iteration module is used for enabling the initial training model to enter secondary cycle iteration according to a second preset cycle number;
the coding model generation module is used for splitting the initial training model so as to split the encoder from the initial training model as a coding model;
the encoder is used for inputting the nominal variable and the detection result data in the report sheet so as to output a latent variable; wherein the nominal variable comprises at least one of a unit of the test item, a name of a reagent used, and a name of a test device used in a test process;
the generator is used for inputting the latent variable and the condition variable so as to output result list data; wherein the condition variable comprises user information and corresponding detection items in the report.
Compared with the prior art, in the report coding model generation system disclosed by the embodiments of the present invention, first the network parameter initialization module initializes the network parameters in a pre-constructed initial training model; then the primary cycle iteration module makes the initial training model enter a primary cycle iteration according to a first preset cycle number, the loss value calculation module calculates the loss value of a preset loss function using the initial training model, the network parameter correction module uses the loss value to correct the network parameters through a back propagation algorithm, and the secondary cycle iteration module makes the initial training model enter a secondary cycle iteration according to a second preset cycle number; finally, the coding model generation module splits the initial training model and splits the encoder from the initial training model as the coding model. The coding model generated by the report coding model generation system of the embodiments of the present invention can learn nonlinear feature representations, which helps improve the effect of subsequent task algorithms, and the adopted generator network can extract rich information features with individual styles.
As an improvement of the above solution, the feature discriminator is configured to discriminate between the reconstructed result list data and the real result list data, and to pass the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
As an improvement of the above, the coding discriminator is configured to make the data distribution of the latent variable consistent with a Gaussian distribution.
In order to achieve the above object, an embodiment of the present invention further provides a report coding model generation device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the report coding model generation method according to any one of the above embodiments.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the report coding model generation method according to any one of the above embodiments.
Drawings
FIG. 1 is a flow chart of a report coding model generation method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an initial training model according to an embodiment of the present invention;
FIG. 3 is a block diagram of a report encoding model generation system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a report coding model generation device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a report coding model generation method according to an embodiment of the present invention; the report coding model generation method comprises the following steps:
s1, initializing network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
s2, enabling the initial training model to enter one cycle iteration according to the first preset cycle times;
s3, calculating a loss value of a preset loss function by using the initial training model;
s4, using the loss value for correcting the network parameter through a back propagation algorithm;
s5, enabling the initial training model to enter secondary cycle iteration according to a second preset cycle number;
and S6, splitting the initial training model so as to split the encoder from the initial training model as a coding model.
It should be noted that the report coding model generation method of the embodiments of the present invention is used for generating a coding model, and the coding model can encode the data in a report sheet to complete the analysis of the characteristic information in the report sheet. Illustratively, the report sheet is a patient's detection report; it can be an electronic report sheet, or an electronic report sheet generated after a paper report sheet (handwritten by a doctor/patient) is automatically recognized by a machine, so that the information in the report sheet can be automatically extracted and the detailed data in the report sheet determined. It should be noted that the process of recognizing/extracting information from the report sheet may follow data processing procedures in the prior art, which the present invention does not limit.
It is worth noting that the initial training model evolved from the generative adversarial network, GAN for short, which consists of two networks: a generator network and a discriminator network. Both networks may be neural networks (from convolutional neural networks and recurrent neural networks to autoencoders). In this arrangement, the two networks play a competitive game, trying to outdo each other while at the same time driving each other to improve at their tasks. After thousands of iterations, if all goes well, the generator network becomes perfect at generating realistic fake images, while the discriminator network becomes perfect at judging whether an image shown to it is fake or real (i.e., the discrimination process). In other words, the generator network converts a random noise vector from a latent space (not all GANs sample from a latent space) into samples of the real data set.
Referring to fig. 2, fig. 2 is a block diagram of an initial training model according to an embodiment of the present invention. The initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator.
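To make this four-part structure concrete, the following is a minimal PyTorch sketch of the initial training model; the layer sizes, latent dimension and module names are illustrative assumptions, not the patent's actual configuration. As noted below, the encoder and generator use plain fully connected operators without BN.

```python
# Minimal sketch of the four-part initial training model (assumed sizes).
import torch
import torch.nn as nn

LATENT_DIM = 32   # assumed latent variable length
INPUT_DIM = 128   # assumed size of (nominal variables + result data) encoding
COND_DIM = 16     # assumed size of the condition variable encoding

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())   # no BN operators, per the text
    return nn.Sequential(*layers)

encoder = mlp([INPUT_DIM, 256, LATENT_DIM])               # x -> z
generator = mlp([LATENT_DIM + COND_DIM, 256, INPUT_DIM])  # (z, c) -> x'
feature_discriminator = mlp([INPUT_DIM, 128, 3])          # real/recon/fake logits
coding_discriminator = mlp([LATENT_DIM, 64, 1])           # z vs Gaussian noise
```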
The encoder adopts conventional convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The encoder is used for inputting the non-precoded or precoded nominal variables and detection result data in the report sheet so as to output latent variables; wherein the nominal variables comprise at least one of the unit of the detection item, the name of the reagent used, and the name of the detection device used in the detection process. When the encoder inputs non-precoded nominal variables and detection result data, they are not processed in advance, and the encoder directly encodes the nominal variables and the detection result data to generate the latent variables; when the encoder inputs precoded nominal variables and detection result data, they are precoded in advance, and the encoder encodes the precoded nominal variables and detection result data again to generate the latent variables.
Specifically, the process of pre-coding the nominal variable and the detection result data includes steps S111 to S115.
And S111, acquiring nominal variables in the detection items, and coding the nominal variables according to the value quantity of the nominal variables.
Determining the number of values of each nominal variable according to a preset value rule; judging whether the number of values of the current nominal variable is greater than or equal to a preset value-quantity threshold; if so, encoding the nominal variable using hash encoding; if not, encoding the nominal variable using one-hot encoding.
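As an illustration of this branching rule, the following is a small Python sketch; the threshold of 50 values and the 16-bucket hash width are assumed constants, since the text does not specify them.

```python
# Sketch: choose hash encoding vs one-hot encoding by value cardinality.
import hashlib

VALUE_COUNT_THRESHOLD = 50   # assumed preset value-quantity threshold
HASH_DIM = 16                # assumed width of the hash encoding

def encode_nominal(value: str, vocabulary: list[str]) -> list[int]:
    if len(vocabulary) >= VALUE_COUNT_THRESHOLD:
        # hash encoding: map the value into a fixed-width indicator vector
        bucket = int(hashlib.md5(value.encode()).hexdigest(), 16) % HASH_DIM
        return [1 if i == bucket else 0 for i in range(HASH_DIM)]
    # one-hot encoding over the known vocabulary
    return [1 if v == value else 0 for v in vocabulary]

# e.g. a reagent-name variable with few distinct values -> one-hot
print(encode_nominal("reagent-A", ["reagent-A", "reagent-B", "reagent-C"]))
```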
And S112, acquiring detection result data in the detection items, and preprocessing the detection result data according to the type of the detection result data.
When the type of the detection result data is continuous data, normalization is performed on the detection result data; when the type of the detection result data is discrete data, equidistant spatial encoding is performed on the detection result data within a preset range.
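A possible reading of this preprocessing step is sketched below; min-max normalization and the number of discrete levels are assumptions, as the text only names "normalization" and "equidistant spatial encoding".

```python
# Sketch: type-dependent preprocessing of detection result values.
def normalize_continuous(x: float, lo: float, hi: float) -> float:
    # min-max normalization into [0, 1] (assumed normalization scheme)
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def encode_discrete(level: int, num_levels: int) -> float:
    # place the discrete levels at equal distances inside [0, 1]
    return level / (num_levels - 1)

print(normalize_continuous(5.4, lo=3.9, hi=6.1))  # e.g. a glucose value
print(encode_discrete(2, num_levels=4))           # e.g. "++" out of -, +, ++, +++
```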
And S113, encoding the preprocessed detection result data. There are four coding modes: vector dimension encoding, time dimension encoding, matrix dimension encoding, and tensor dimension encoding.
Scheme I: vector dimension coding, namely arranging the detection result data horizontally according to preset detection items; the detection result data corresponding to detection items that have not currently been detected are empty, and their positions in the arrangement are reserved. The detection items, i.e., the unique identifiers of the detection items in the laboratory, are generally arranged in order, which makes writing and reading back the program's encoding results convenient.
Scheme II: time dimension coding, namely ordering the detection result data according to the time at which they were generated, while eliminating items without detection results. For example, if, out of 2000 detection items, only 7 are detected for a barcode, then the vector contains only those 7 detection result data after normalization/equidistant spatial encoding.
Scheme III: matrix dimension coding, namely arranging the detection result data according to a preset arrangement rule, where the preset arrangement rule performs hierarchical division according to the category, department and/or subject of the detection items corresponding to the detection result data. Specifically, the detection result data of the master barcode are arranged as a two-dimensional table. Because the results of the detection items are correlated, an unreasonable arrangement of the detection items in the two-dimensional table may hinder the neural network from extracting the relevant information, so the arrangement rule of the detection items needs to be specially designed.
Scheme IV: tensor dimension coding, namely ordering the detection result data according to a preset three-dimensional model, where the three-dimensional model is presented in the form of a three-dimensional table (tensor) comprising a number of slices (channels) representing different test packages, each slice comprising a number of the detection result data.
And S114, randomly scrambling the coded detection result data.
The analysis result of a report sheet with the same master barcode should not be influenced by the arrangement of the detection items; that is, the arrangement order in Schemes I to IV should not influence the overall analysis result. The encoded data are therefore allowed to be randomly shuffled along different dimensions before being fed into the deep learning model. For example, in Scheme II the order of the detection items should be randomly adjustable, in Scheme III the subjects may be randomly shuffled left and right, and in Scheme IV the subjects are randomly shuffled along the slice dimension (channel); the analysis values before and after shuffling remain self-consistent.
And S115, combining the coded nominal variable, the coded detection result data and the randomly scrambled coded detection result data to output a coding result of the detection item.
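The following sketch illustrates Scheme I (vector dimension coding) together with the random scrambling of step S114; the item identifiers and values are invented for illustration only, and None marks an undetected item whose slot is reserved.

```python
# Sketch: vector dimension coding with reserved empty slots, then scrambling.
import random

ITEM_ORDER = ["ALT", "AST", "GLU", "TC", "TG"]   # assumed laboratory item IDs

def vector_dim_encode(results: dict[str, float]) -> list[float | None]:
    # horizontal arrangement by preset item order; missing items stay empty
    return [results.get(item) for item in ITEM_ORDER]

def scramble(encoded: list, seed: int | None = None) -> list:
    # random shuffling along the item dimension; the analysis result is
    # required to be invariant to this ordering
    out = list(encoded)
    random.Random(seed).shuffle(out)
    return out

encoded = vector_dim_encode({"ALT": 0.31, "GLU": 0.55})
print(encoded)            # [0.31, None, 0.55, None, None]
print(scramble(encoded))  # same values, randomized positions
```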
The network of the generator adopts conventional convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The generator is used for inputting the latent variable and the non-precoded or precoded condition variable so as to output result list data, wherein the condition variable comprises the user information and the corresponding detection items in the report sheet. When the generator inputs the latent variable and a non-precoded condition variable, the condition variable is not processed in advance, and the generator directly regenerates the latent variable and the condition variable to obtain the result list data. When the generator inputs the latent variable and a precoded condition variable, the condition variable is precoded in advance, and the generator regenerates the latent variable and the precoded condition variable to obtain the result list data.
Specifically, the process of pre-coding the condition variables includes steps S121 to S124.
And S121, carrying out hidden variable assignment on the detection items and the corresponding user information in the report sheet to generate corresponding item hidden variables and user hidden variables.
The hidden variables to be initialized are divided into two groups: one group expresses the patient and is characterized as the user hidden variables, and the other group expresses the detection items and is characterized as the item hidden variables. The vector length of the two groups of variables is empirically set to 10 for now, and can be adjusted later according to the scale of the actual data, the training time of the model, and the size of the final loss function. Illustratively, random numbers generated from a truncated standard Gaussian distribution are used to assign the hidden variables for the detection items and the corresponding user information in the report sheet.
S122, calculating the inner product predicted value of the item hidden variables and the user hidden variables, satisfying the following formula:

$R_{UI} = \sum_{k=1}^{K} P_{U,k} \, Q_{k,I}$ (formula 1);

wherein $R_{UI}$ is the inner product predicted value; $P_U$ is the user hidden-variable matrix; $Q_I$ is the item hidden-variable matrix; $K$ is the number of rows; $P_{U,k}$ is the k-th row of the user hidden-variable matrix $P_U$; and $Q_{k,I}$ is the k-th row of the item hidden-variable matrix $Q_I$.
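Formula (1) amounts to a matrix-factorization-style prediction, as the following sketch illustrates; the user and item counts are assumptions, while K = 10 follows the empirically chosen vector length mentioned above.

```python
# Sketch: inner product prediction from user and item hidden variables.
import torch

K = 10                      # hidden-variable vector length from the text
num_users, num_items = 100, 2000   # assumed counts

# truncated-Gaussian-style initialization (clamping is an approximation)
P = torch.randn(num_users, K).clamp(-2, 2)   # user hidden-variable matrix P_U
Q = torch.randn(K, num_items).clamp(-2, 2)   # item hidden-variable matrix Q_I

R_pred = P @ Q              # R_UI = sum_k P_{U,k} * Q_{k,I} for all pairs
print(R_pred.shape)         # (100, 2000)
```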
S123, adopting the degree of deviation between the inner product predicted value and the actual value of the detection item as the loss value of the coding model, satisfying the following formula:

$C = \sum_{U,I} \big( \hat{R}_{UI} - R_{UI} \big)^2 + \lambda \big( \|P_U\|^2 + \|Q_I\|^2 \big)$ (formula 2);

wherein $C$ is the loss value, used to measure the degree of deviation between the inner product predicted value and the actual value $\hat{R}_{UI}$; and $\lambda$ is a regularization hyper-parameter of the model, a constant used to prevent the model from overfitting, obtained through repeated experiments for the specific application scenario.
S124, judging whether the loss value remains stable within a preset value range (i.e., when the loss value no longer decreases significantly); when the loss value is stable within the preset value range, outputting the coding model; when the loss value does not remain stable within the preset value range, optimizing the parameters of the coding model until the loss value remains stable within the preset value range, and outputting the coding model with the optimized parameters.
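One way to realize the stop criterion of step S124 is sketched below, with formula (2) as the loss; the stability tolerance, learning rate and placeholder data are assumptions.

```python
# Sketch: optimize the hidden variables and stop once the loss stabilizes.
import torch

lam = 0.01                                   # regularization hyper-parameter
R_actual = torch.randn(100, 2000)            # placeholder actual values
P = torch.randn(100, 10, requires_grad=True)
Q = torch.randn(10, 2000, requires_grad=True)
opt = torch.optim.SGD([P, Q], lr=0.01)

prev_loss, tol = None, 1e-4                  # "no longer decreases significantly"
for step in range(10000):
    loss = ((R_actual - P @ Q) ** 2).sum() + lam * (P.norm() ** 2 + Q.norm() ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
    if prev_loss is not None and abs(prev_loss - loss.item()) < tol:
        break                                # loss stable: output the coding model
    prev_loss = loss.item()
```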
The feature discriminator is used to discriminate between the reconstructed result list data and the real result list data, and to pass the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
The coding discriminator is used to make the data distribution of the latent variable consistent with a Gaussian distribution. It mainly completes the manifold constraint on the latent variable, keeping the latent variable's data distribution as consistent with the Gaussian distribution as possible, which facilitates later data sampling and study of the latent variable.
Further, Raw_data in FIG. 2 represents the original medical detection report result list data in the database, and Clean & Transform represents the cleaning and transform coding of the original data. The cleaned and transformed data can be divided by dimension into detection result value data and condition variables (user information, detection item information). Code embedding is the process of precoding the (user information, detection item information). C represents the condition variable, Z represents the latent variables of different outputs, n represents the Gaussian noise input, x represents the input, real x represents the real result list values, recon xr represents the result list values reconstructed by the network, fake xg represents the result list values generated by the network, and the circles on the right represent the five types of loss functions of the model.
Illustratively, the overall network model loss function mainly consists of five parts, shown as the circular icons in FIG. 2: the reconstruction loss function $L_{Recon}$ (circle 1 in FIG. 2), the constant loss function $L_{Const}$ (circle 5 in FIG. 2), the classification loss function $L_{Cate}$ (circle 3 in FIG. 2), the characteristic loss function $L_{feature}$ (circle 2 in FIG. 2), and the latent variable loss function $L_{latent}$ (circle 4 in FIG. 2).
Reconstruction loss function: its main function is to ensure the reconstruction of the result list data between the encoder and the generator module, ensuring that the reconstructed result list data differ little from the input result list data. The loss function adopts L1 Loss, with the specific formula:

$L_{Recon}(x, x_r) = \|x - x_r\|_1$, with $x_r = g_\theta(z)$ (formula 3);

wherein $x$ represents the input result list data, $x_r$ represents the reconstructed result list data, $g_\theta$ represents the generator network, and $z$ represents the encoder's output sample for the real result list data $x$.
The constant loss function is mainly used to constrain the difference between the encoding result of the real result list and the encoding result of the reconstructed result list data, ensuring that the encoding results of a result list before and after reconstruction are consistent. The loss function adopts L2 Loss, with the specific formula:

$L_{Const}(z, z') = \|z - z'\|_2$ (formula 4);

wherein $z$ represents the encoding result of the real result list and $z'$ represents the encoding result of the reconstructed result list.
Classification loss function: its main functions are to provide self-supervision information, improve the authenticity of the result lists output by the generator network, and ensure that the reconstructed result list data, the generated result list data and the real result list data are consistent in data distribution. The loss function adopts a cross-entropy function, with the specific formula:

$L_{Cate}(c, x') = c \times (-\log D_c(C = c \mid x')) + (1 - c) \times (-\log(1 - D_c(C = c \mid x')))$ (formula 5);

wherein $c$ represents the one-hot encoding of the category, the categories comprising three classes: real data, reconstructed data and generated data; $D_c$ represents the feature discriminator; and $x'$ represents the result list data input to the discriminator, which may be real data, reconstructed data or generated data.
Characteristic loss function: its main function is to capture the individual characteristic information of the result list, compensating for the reconstruction loss function's loss of detail information. The loss function adopts an adversarial loss, with the specific formula:

$L_{feature} = \log D_c(x_t) + \log(1 - D_c(x_f)) + \log(1 - D_c(x_r))$ (formula 6);

wherein $D_c$ represents the feature discriminator, $x_f$ represents the result list data generated by the generator network, $x_r$ represents the reconstructed result list data, and $x_t$ represents the real result list data.
Latent variable loss function (coding discriminator): it mainly completes the manifold constraint on the latent variable $z$, keeping the data distribution of the latent variable as consistent with a Gaussian distribution as possible. The loss function adopts an adversarial loss, with the specific formula:

$L_{latent} = \log D_w(n) + \log(1 - D_w(z))$ (formula 7);

wherein $D_w$ is the coding discriminator and $n$ represents a sample from a multivariate Gaussian distribution with mean 0 and covariance matrix trace 1.
The overall loss function is formed by the weighted sum of the five loss functions, where $\lambda_1$ to $\lambda_4$ are preset weights chosen so that the loss values of all the loss functions stay on the same scale as far as possible, as shown in the following formula:

$L_{Tohybrid} = L_{Recon} + \lambda_1 L_{Const} + \lambda_2 L_{Cate} + \lambda_3 L_{feature} + \lambda_4 L_{latent}$ (formula 8).
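A compact sketch of assembling formula (8) from the five loss terms follows; the weight values are assumptions, chosen only to illustrate keeping the terms on a comparable scale, and the two adversarial terms are passed in precomputed.

```python
# Sketch: weighted sum of the five loss terms per formula (8).
import torch
import torch.nn.functional as F

lam1, lam2, lam3, lam4 = 1.0, 1.0, 0.1, 0.1   # assumed preset weights

def total_loss(x, x_recon, z, z_recon, cate_logits, cate_target,
               feat_loss, latent_loss):
    l_recon = F.l1_loss(x_recon, x)                     # formula (3), L1 loss
    l_const = torch.norm(z - z_recon, p=2)              # formula (4), L2 loss
    l_cate = F.cross_entropy(cate_logits, cate_target)  # formula (5)
    return (l_recon + lam1 * l_const + lam2 * l_cate
            + lam3 * feat_loss + lam4 * latent_loss)    # formula (8)
```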
Specifically, in step S1, the encoder parameters, the generator parameters, the feature discriminator parameters and the coding discriminator parameters are initialized using truncated random Gaussian distributions.
Specifically, in step S2, an iterative loop is entered, where the number of the first preset loop is n epochs, and a specific value of n is an empirical parameter.
Specifically, in step S3, the result list data of a batch in the training set is read into the memory, and the initial training model is used to calculate the loss value of the predetermined loss function. The predetermined loss function is the loss function in the above equations (3) to (8).
Specifically, in step S4, using the loss value to correct the network parameters through a back propagation algorithm specifically includes:
the loss value of $L_{Tohybrid}$, calculated using the back propagation algorithm, is used to correct the encoder parameters and the generator parameters; the loss value of $L_{feature}$, calculated using the back propagation algorithm, is used to modify the feature discriminator parameters; and the loss value of $L_{latent}$, calculated using the back propagation algorithm, is used to modify the coding discriminator parameters.
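The three passes can be realized with one optimizer per module group, as sketched below; running a fresh forward pass per update is the usual GAN idiom and an assumption about the patent's exact schedule. The module names refer to the initial-training-model sketch above, and the SGD learning rate of 0.0001 is the one given later in the text.

```python
# Sketch: three back propagation passes, one optimizer per module group.
import torch

opt_eg = torch.optim.SGD(
    list(encoder.parameters()) + list(generator.parameters()), lr=0.0001)
opt_fd = torch.optim.SGD(feature_discriminator.parameters(), lr=0.0001)
opt_cd = torch.optim.SGD(coding_discriminator.parameters(), lr=0.0001)

def train_step(compute_losses):
    # compute_losses() runs a fresh forward pass over the current batch and
    # returns (L_Tohybrid, L_feature, L_latent); one pass per update avoids
    # reusing a graph whose weights an earlier step already changed.
    l_tohybrid, _, _ = compute_losses()
    opt_eg.zero_grad(); l_tohybrid.backward(); opt_eg.step()  # encoder + generator
    _, l_feature, _ = compute_losses()
    opt_fd.zero_grad(); l_feature.backward(); opt_fd.step()   # feature discriminator
    _, _, l_latent = compute_losses()
    opt_cd.zero_grad(); l_latent.backward(); opt_cd.step()    # coding discriminator
```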
Specifically, in step S5, after the three back propagation passes are completed, the method returns to the above steps and loops until the number of iterations reaches the second preset cycle number.
Specifically, in step S6, the trained initial training model is frozen and pruned. The split-off encoder is the best available encoder model: it takes result list data as input and outputs dense feature vectors after dimensionality reduction. The split-off generator network is a result list generation model: multivariate Gaussian random noise is input into the model, and the output is generated result list data.
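A minimal sketch of this freezing and splitting step follows, reusing the module names from the sketch above; the file names and the usage example are illustrative assumptions.

```python
# Sketch: freeze the trained model and split off the encoder and generator.
import torch

for p in encoder.parameters():
    p.requires_grad = False                  # freeze: no further training
encoder.eval()

torch.save(encoder, "coding_model.pt")       # the report coding model
torch.save(generator, "generator_model.pt")  # the result list generator

# usage: result list data in, dense reduced-dimension feature vector out
with torch.no_grad():
    z = encoder(torch.randn(1, INPUT_DIM))   # placeholder result list input
print(z.shape)                               # (1, LATENT_DIM)
```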
Further, a stochastic gradient descent algorithm is used to optimize and adjust the network parameters of the initial training model. Illustratively, the stochastic gradient descent algorithm is SGD, with a learning rate of 0.0001.
Compared with the prior art, in the report coding model generation method disclosed by the embodiments of the present invention, the network parameters in a pre-constructed initial training model are first initialized; then the initial training model enters a primary cycle iteration according to a first preset cycle number, the loss value of a preset loss function is calculated using the initial training model, the loss value is used to correct the network parameters through a back propagation algorithm, and the initial training model enters a secondary cycle iteration according to a second preset cycle number; finally, the initial training model is split, and the encoder is split from the initial training model as the coding model. The coding model generated by the report coding model generation method can learn nonlinear feature representations, helping improve the effect of subsequent task algorithms; the method is an unsupervised algorithm that needs no data labeling, is convenient to operate, and can save a large amount of manual labeling cost; the adopted generator network can extract rich information features with individual styles; and the data can be dimensionality-reduced, with the length of the learned feature variable adjustable according to actual requirements.
Referring to fig. 3, fig. 3 is a block diagram of a structure of a report coding model generation system 10 according to an embodiment of the present invention, where the report coding model generation system 10 includes:
a network parameter initialization module 11, configured to initialize network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
a first iteration-by-loop module 12, configured to make the initial training model enter a first iteration-by-loop according to a first preset number of cycles;
a loss value calculating module 13, configured to calculate a loss value of a preset loss function by using the initial training model;
a network parameter modification module 14, configured to use the loss value to modify the network parameter through a back propagation algorithm;
the secondary cycle iteration module 15 is configured to enable the initial training model to enter secondary cycle iteration according to a second preset cycle number;
and the coding model generation module 16 is configured to split the initial training model, so that the encoder is split from the initial training model as a coding model.
It should be noted that the report coding model generation system 10 of the embodiments of the present invention is used for generating a coding model, and the coding model can encode the data in a report sheet to complete the analysis of the characteristic information in the report sheet. Illustratively, the report sheet is a patient's detection report; it can be an electronic report sheet, or an electronic report sheet generated after a paper report sheet (handwritten by a doctor/patient) is automatically recognized by a machine, so that the information in the report sheet can be automatically extracted and the detailed data in the report sheet determined. It should be noted that the process of recognizing/extracting information from the report sheet may follow data processing procedures in the prior art, which the present invention does not limit.
Referring to fig. 2, fig. 2 is a block diagram of an initial training model according to an embodiment of the present invention. The initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator.
The encoder adopts conventional convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The encoder is used for inputting the non-precoded or precoded nominal variables and detection result data in the report sheet so as to output latent variables; wherein the nominal variables comprise at least one of the unit of the detection item, the name of the reagent used, and the name of the detection device used in the detection process. When the encoder inputs non-precoded nominal variables and detection result data, they are not processed in advance, and the encoder directly encodes the nominal variables and the detection result data to generate the latent variables; when the encoder inputs precoded nominal variables and detection result data, they are precoded in advance, and the encoder encodes the precoded nominal variables and detection result data again to generate the latent variables.
Specifically, the process of pre-coding the nominal variable and the detection result data includes steps S111 to S115.
And S111, acquiring nominal variables in the detection items, and coding the nominal variables according to the value quantity of the nominal variables.
Determining the number of values of each nominal variable according to a preset value rule; judging whether the number of values of the current nominal variable is greater than or equal to a preset value-quantity threshold; if so, encoding the nominal variable using hash encoding; if not, encoding the nominal variable using one-hot encoding.
And S112, acquiring detection result data in the detection items, and preprocessing the detection result data according to the type of the detection result data.
When the type of the detection result data is continuous data, normalization is performed on the detection result data; when the type of the detection result data is discrete data, equidistant spatial encoding is performed on the detection result data within a preset range.
And S113, encoding the preprocessed detection result data. There are four coding modes: vector dimension encoding, time dimension encoding, matrix dimension encoding, and tensor dimension encoding.
Scheme I: vector dimension coding, namely arranging the detection result data horizontally according to preset detection items; the detection result data corresponding to detection items that have not currently been detected are empty, and their positions in the arrangement are reserved. The detection items, i.e., the unique identifiers of the detection items in the laboratory, are generally arranged in order, which makes writing and reading back the program's encoding results convenient.
Scheme II: time dimension coding, namely ordering the detection result data according to the time at which they were generated, while eliminating items without detection results. For example, if, out of 2000 detection items, only 7 are detected for a barcode, then the vector contains only those 7 detection result data after normalization/equidistant spatial encoding.
Scheme III: matrix dimension coding, namely arranging the detection result data according to a preset arrangement rule, where the preset arrangement rule performs hierarchical division according to the category, department and/or subject of the detection items corresponding to the detection result data. Specifically, the detection result data of the master barcode are arranged as a two-dimensional table. Because the results of the detection items are correlated, an unreasonable arrangement of the detection items in the two-dimensional table may hinder the neural network from extracting the relevant information, so the arrangement rule of the detection items needs to be specially designed.
Scheme IV: tensor dimension coding, namely ordering the detection result data according to a preset three-dimensional model, where the three-dimensional model is presented in the form of a three-dimensional table (tensor) comprising a number of slices (channels) representing different test packages, each slice comprising a number of the detection result data.
And S114, randomly scrambling the coded detection result data.
The analysis result of a report sheet with the same master barcode should not be influenced by the arrangement of the detection items; that is, the arrangement order in Schemes I to IV should not influence the overall analysis result. The encoded data are therefore allowed to be randomly shuffled along different dimensions before being fed into the deep learning model. For example, in Scheme II the order of the detection items should be randomly adjustable, in Scheme III the subjects may be randomly shuffled left and right, and in Scheme IV the subjects are randomly shuffled along the slice dimension (channel); the analysis values before and after shuffling remain self-consistent.
And S115, combining the coded nominal variable, the coded detection result data and the randomly scrambled coded detection result data to output a coding result of the detection item.
The network of the generator adopts conventional convolution operations or fully connected operators, and contains no batch normalization (BN) operators. The generator is used for inputting the latent variable and the non-precoded or precoded condition variable so as to output result list data, wherein the condition variable comprises the user information and the corresponding detection items in the report sheet. When the generator inputs the latent variable and a non-precoded condition variable, the condition variable is not processed in advance, and the generator directly regenerates the latent variable and the condition variable to obtain the result list data. When the generator inputs the latent variable and a precoded condition variable, the condition variable is precoded in advance, and the generator regenerates the latent variable and the precoded condition variable to obtain the result list data.
Specifically, the process of pre-coding the condition variables includes steps S121 to S124.
And S121, carrying out hidden variable assignment on the detection items and the corresponding user information in the report sheet to generate corresponding item hidden variables and user hidden variables.
The hidden variables to be initialized are divided into two groups: one group expresses the patient and is characterized as the user hidden variables, and the other group expresses the detection items and is characterized as the item hidden variables. The vector length of the two groups of variables is empirically set to 10 for now, and can be adjusted later according to the scale of the actual data, the training time of the model, and the size of the final loss function. Illustratively, random numbers generated from a truncated standard Gaussian distribution are used to assign the hidden variables for the detection items and the corresponding user information in the report sheet.
S122, calculating the inner product predicted value of the item hidden variables and the user hidden variables, satisfying the following formula:

$R_{UI} = \sum_{k=1}^{K} P_{U,k} \, Q_{k,I}$ (formula 1);

wherein $R_{UI}$ is the inner product predicted value; $P_U$ is the user hidden-variable matrix; $Q_I$ is the item hidden-variable matrix; $K$ is the number of rows; $P_{U,k}$ is the k-th row of the user hidden-variable matrix $P_U$; and $Q_{k,I}$ is the k-th row of the item hidden-variable matrix $Q_I$.
S123, adopting the degree of deviation between the inner product predicted value and the actual value of the detection item as the loss value of the coding model, satisfying the following formula:

$C = \sum_{U,I} \big( \hat{R}_{UI} - R_{UI} \big)^2 + \lambda \big( \|P_U\|^2 + \|Q_I\|^2 \big)$ (formula 2);

wherein $C$ is the loss value, used to measure the degree of deviation between the inner product predicted value and the actual value $\hat{R}_{UI}$; and $\lambda$ is a regularization hyper-parameter of the model, a constant used to prevent the model from overfitting, obtained through repeated experiments for the specific application scenario.
S124, judging whether the loss value remains stable within a preset value range (i.e., when the loss value no longer decreases significantly); when the loss value is stable within the preset value range, outputting the coding model; when the loss value does not remain stable within the preset value range, optimizing the parameters of the coding model until the loss value remains stable within the preset value range, and outputting the coding model with the optimized parameters.
The feature discriminator is used to discriminate between the reconstructed result list data and the real result list data, and to pass the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
The coding discriminator is used to make the data distribution of the latent variable consistent with a Gaussian distribution. It mainly completes the manifold constraint on the latent variable, keeping the latent variable's data distribution as consistent with the Gaussian distribution as possible, which facilitates later data sampling and study of the latent variable.
Further, Raw_data in FIG. 2 represents the original medical detection report result list data in the database, and Clean & Transform represents the cleaning and transform coding of the original data. The cleaned and transformed data can be divided by dimension into detection result value data and condition variables (user information, detection item information). Code embedding is the process of precoding the (user information, detection item information). C represents the condition variable, Z represents the latent variables of different outputs, n represents the Gaussian noise input, x represents the input, real x represents the real result list values, recon xr represents the result list values reconstructed by the network, fake xg represents the result list values generated by the network, and the circles on the right represent the five types of loss functions of the model.
Illustratively, the overall network model loss function mainly consists of five parts, shown as the circular icons in FIG. 2: the reconstruction loss function $L_{Recon}$ (circle 1 in FIG. 2), the constant loss function $L_{Const}$ (circle 5 in FIG. 2), the classification loss function $L_{Cate}$ (circle 3 in FIG. 2), the characteristic loss function $L_{feature}$ (circle 2 in FIG. 2), and the latent variable loss function $L_{latent}$ (circle 4 in FIG. 2).
Reconstruction loss function: its main function is to ensure the reconstruction of the result list data between the encoder and the generator module, ensuring that the reconstructed result list data differ little from the input result list data. The loss function adopts L1 Loss, with the specific formula:

$L_{Recon}(x, x_r) = \|x - x_r\|_1$, with $x_r = g_\theta(z)$ (formula 3);

wherein $x$ represents the input result list data, $x_r$ represents the reconstructed result list data, $g_\theta$ represents the generator network, and $z$ represents the encoder's output sample for the real result list data $x$.
The constant loss function is mainly used to constrain the difference between the encoding result of the real result list and the encoding result of the reconstructed result list data, ensuring that the encoding results of a result list before and after reconstruction are consistent. The loss function adopts L2 Loss, with the specific formula:

$L_{Const}(z, z') = \|z - z'\|_2$ (formula 4);

wherein $z$ represents the encoding result of the real result list and $z'$ represents the encoding result of the reconstructed result list.
Classification loss function: its main functions are to provide self-supervision information, improve the authenticity of the result lists output by the generator network, and ensure that the reconstructed result list data, the generated result list data and the real result list data are consistent in data distribution. The loss function adopts a cross-entropy function, with the specific formula:

$L_{Cate}(c, x') = c \times (-\log D_c(C = c \mid x')) + (1 - c) \times (-\log(1 - D_c(C = c \mid x')))$ (formula 5);

wherein $c$ represents the one-hot encoding of the category, the categories comprising three classes: real data, reconstructed data and generated data; $D_c$ represents the feature discriminator; and $x'$ represents the result list data input to the discriminator, which may be real data, reconstructed data or generated data.
Characteristic loss function: its main function is to capture the individual characteristic information of the result list, compensating for the reconstruction loss function's loss of detail information. The loss function adopts an adversarial loss, with the specific formula:

$L_{feature} = \log D_c(x_t) + \log(1 - D_c(x_f)) + \log(1 - D_c(x_r))$ (formula 6);

wherein $D_c$ represents the feature discriminator, $x_f$ represents the result list data generated by the generator network, $x_r$ represents the reconstructed result list data, and $x_t$ represents the real result list data.
Latent variable loss function (coding discriminator): it mainly completes the manifold constraint on the latent variable $z$, keeping the data distribution of the latent variable as consistent with a Gaussian distribution as possible. The loss function adopts an adversarial loss, with the specific formula:

$L_{latent} = \log D_w(n) + \log(1 - D_w(z))$ (formula 7);

wherein $D_w$ is the coding discriminator and $n$ represents a sample from a multivariate Gaussian distribution with mean 0 and covariance matrix trace 1.
The overall loss function is formed by the weighted sum of the five loss functions, where $\lambda_1$ to $\lambda_4$ are preset weights chosen so that the loss values of all the loss functions stay on the same scale as far as possible, as shown in the following formula:

$L_{Tohybrid} = L_{Recon} + \lambda_1 L_{Const} + \lambda_2 L_{Cate} + \lambda_3 L_{feature} + \lambda_4 L_{latent}$ (formula 8).
The network parameter initialization module 11 initializes the encoder parameters, the generator parameters, the feature discriminator parameters and the coding discriminator parameters using truncated random Gaussian distributions. The primary cycle iteration module 12 makes the initial training model enter the primary cycle iteration according to the first preset cycle number, which is n epochs, the specific value of n being an empirical parameter. The loss value calculation module 13 reads a batch of result list data from the training set into memory and uses the initial training model to calculate the loss value of the preset loss function. The network parameter correction module 14 uses the loss value of $L_{Tohybrid}$, calculated using the back propagation algorithm, to correct the encoder parameters and the generator parameters; the loss value of $L_{feature}$ to modify the feature discriminator parameters; and the loss value of $L_{latent}$ to modify the coding discriminator parameters. The secondary cycle iteration module 15 makes the initial training model enter the secondary cycle iteration according to the second preset cycle number. The coding model generation module 16 freezes and prunes the trained initial training model; the split-off encoder is the best available encoder model, which takes result list data as input and outputs dense feature vectors after dimensionality reduction; the split-off generator network is a result list generation model, which takes multivariate Gaussian random noise as input and outputs generated result list data.
Further, the network parameter correction module 14 is also configured to perform optimization adjustment of the network parameters of the initial training model using a stochastic gradient descent algorithm. Illustratively, the stochastic gradient descent algorithm is SGD with a learning rate of 0.0001.
Compared with the prior art, the report coding model generation system 10 disclosed by the embodiment of the invention works as follows: first, the network parameter initialization module 11 initializes the network parameters in a pre-constructed initial training model; then the primary loop iteration module 12 makes the initial training model enter the primary loop iteration according to a first preset number of loops, the loss value calculation module 13 calculates the loss value of a preset loss function using the initial training model, the network parameter correction module 14 uses the loss value to modify the network parameters through a back propagation algorithm, and the secondary loop iteration module 15 makes the initial training model enter the secondary loop iteration according to a second preset number of loops; finally, the coding model generation module 16 splits the initial training model and splits the encoder out of the initial training model as the coding model. The coding model generated by the report coding model generation system 10 of the embodiment of the invention can learn nonlinear feature representations, which benefits the effect of subsequent task algorithms; it is an unsupervised algorithm that needs no data labeling, is convenient to operate, and can save a large amount of manual labeling cost; the adopted generating network can extract rich characteristic information with individual styles; and the data can be dimension-reduced, with the length of the learned feature variable adjustable according to actual requirements.
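For completeness, a sketch of how the split-off encoder might then be used as the report coding model (the function name and usage are illustrative, not taken from the patent):

```python
import torch

@torch.no_grad()
def encode_reports(encoder, x_reports: torch.Tensor) -> torch.Tensor:
    """Apply the frozen, split-off encoder: result list data in,
    dimension-reduced dense feature vectors (latent variables z) out."""
    encoder.eval()
    return encoder(x_reports)
```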
Referring to fig. 4, fig. 4 is a block diagram of a report coding model generation device 20 according to an embodiment of the present invention. The report coding model generation device 20 of this embodiment includes: a processor 21, a memory 22, and a computer program stored in the memory 22 and executable on the processor 21. The processor 21, when executing the computer program, implements the steps of the report coding model generation method embodiments described above, such as steps S1-S6 shown in fig. 1. Alternatively, the processor 21, when executing the computer program, implements the functions of the modules/units in the above-mentioned system embodiments, such as the network parameter initialization module 11.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program in the report coding model generation device 20. For example, the computer program may be divided into a network parameter initialization module 11, a primary loop iteration module 12, a loss value calculation module 13, a network parameter correction module 14, a secondary loop iteration module 15, and a coding model generation module 16; for the specific functions of each module, reference is made to the working process of the report coding model generation system 10 described in the foregoing embodiment, which is not repeated here.
The report coding model generation device 20 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The report coding model generation device 20 may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the report coding model generation device 20 and does not constitute a limitation of the report coding model generation device 20, which may include more or fewer components than those shown, combine some components, or use different components; for example, the report coding model generation device 20 may further include input-output devices, network access devices, buses, etc.
The processor 21 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor 21 may be any conventional processor. The processor 21 is the control center of the report coding model generation device 20 and connects the various parts of the entire report coding model generation device 20 by using various interfaces and lines.
The memory 22 may be used to store the computer programs and/or modules, and the processor 21 implements the various functions of the report coding model generation device 20 by running or executing the computer programs and/or modules stored in the memory 22 and calling the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory 22 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated in the report coding model generation device 20 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by the processor 21, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (9)

1. A report coding model generation method, comprising:
initializing network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
enabling the initial training model to enter a primary loop iteration according to a first preset number of loops;
calculating a loss value of a preset loss function by using the initial training model;
using the loss value to modify the network parameter by a back propagation algorithm;
enabling the initial training model to enter a secondary loop iteration according to a second preset number of loops;
splitting the initial training model so as to split out the encoder from the initial training model as a coding model;
the encoder is used for inputting the nominal variable and the detection result data in the report sheet so as to output a latent variable; wherein the nominal variable comprises at least one of a unit of the test item, a name of a reagent used, and a name of a test device used in a test process;
the generator is used for inputting the latent variable and the condition variable so as to output result list data; wherein the condition variable comprises user information and corresponding detection items in the report.
2. The report coding model generation method of claim 1, wherein the feature discriminator is configured to discriminate between the reconstructed result list data and the real result list data, and to transmit the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
3. The report coding model generation method of claim 2, wherein the coding discriminator is configured to make the data distribution of the latent variable consistent with a Gaussian distribution.
4. The report coding model generation method of claim 1, wherein the method further comprises:
performing optimization adjustment on the network parameters of the initial training model by using a stochastic gradient descent algorithm.
5. A report coding model generation system, comprising:
the network parameter initialization module is used for initializing network parameters in a pre-constructed initial training model; the initial training model comprises an encoder, a generator, a feature discriminator and a coding discriminator, and the network parameters comprise encoder parameters, generator parameters, feature discriminator parameters and coding discriminator parameters;
the primary loop iteration module is used for enabling the initial training model to enter a primary loop iteration according to a first preset number of loops;
the loss value calculation module is used for calculating the loss value of a preset loss function by using the initial training model;
a network parameter correction module for using the loss value to correct the network parameter through a back propagation algorithm;
the secondary loop iteration module is used for enabling the initial training model to enter a secondary loop iteration according to a second preset number of loops;
the coding model generation module is used for splitting the initial training model so as to split out the encoder from the initial training model as a coding model;
the encoder is used for inputting the nominal variable and the detection result data in the report sheet so as to output a latent variable; wherein the nominal variable comprises at least one of a unit of the test item, a name of a reagent used, and a name of a test device used in a test process;
the generator is used for inputting the latent variable and the condition variable so as to output result list data; wherein the condition variable comprises user information and corresponding detection items in the report.
6. The report coding model generation system of claim 5, wherein the feature discriminator is configured to discriminate between the reconstructed result list data and the real result list data, and to transmit the discrimination gradient information back to the encoder and the generator, so that the encoder and the generator modify their own network parameters.
7. The report coding model generation system of claim 5, wherein the coding discriminator is configured to make the data distribution of the latent variable consistent with a Gaussian distribution.
8. A report coding model generation device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the report coding model generation method of any one of claims 1 to 4 when executing the computer program.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls a device in which the computer-readable storage medium is located to perform the report coding model generation method according to any one of claims 1 to 4.
CN202010242585.3A 2020-03-31 2020-03-31 Report coding model generation method, system, equipment and storage medium Active CN111489802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242585.3A CN111489802B (en) 2020-03-31 2020-03-31 Report coding model generation method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111489802A true CN111489802A (en) 2020-08-04
CN111489802B CN111489802B (en) 2023-07-25

Family

ID=71794533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242585.3A Active CN111489802B (en) 2020-03-31 2020-03-31 Report coding model generation method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111489802B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170199963A1 (en) * 2016-01-13 2017-07-13 Nuance Communications, Inc. Medical report coding with acronym/abbreviation disambiguation
WO2018211140A1 (en) * 2017-05-19 2018-11-22 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
CN109598671A (en) * 2018-11-29 2019-04-09 北京市商汤科技开发有限公司 Image generating method, device, equipment and medium
CN109784249A (en) * 2019-01-04 2019-05-21 华南理工大学 A kind of scramble face identification method based on variation cascaded message bottleneck
CN110111864A (en) * 2019-04-15 2019-08-09 中山大学 A kind of medical report generation model and its generation method based on relational model
CN110163267A (en) * 2019-05-09 2019-08-23 厦门美图之家科技有限公司 A kind of method that image generates the training method of model and generates image
CN110222140A (en) * 2019-04-22 2019-09-10 中国科学院信息工程研究所 A kind of cross-module state search method based on confrontation study and asymmetric Hash
EP3557584A1 (en) * 2018-04-19 2019-10-23 Siemens Healthcare GmbH Artificial intelligence querying for radiology reports in medical imaging
US10460235B1 (en) * 2018-07-06 2019-10-29 Capital One Services, Llc Data model generation using generative adversarial networks
CN110458904A (en) * 2019-08-06 2019-11-15 苏州瑞派宁科技有限公司 Generation method, device and the computer storage medium of capsule endoscopic image
CN110544275A (en) * 2019-08-19 2019-12-06 中山大学 Methods, systems, and media for generating registered multi-modality MRI with lesion segmentation tags
CN110619347A (en) * 2019-07-31 2019-12-27 广东工业大学 Image generation method based on machine learning and method thereof
CN110689937A (en) * 2019-09-05 2020-01-14 郑州金域临床检验中心有限公司 Coding model training method, system and equipment and detection item coding method
JP2020013543A (en) * 2018-07-20 2020-01-23 哈爾濱工業大学(深セン) Model clothing recommendation method based upon generative adversarial network
CN110910982A (en) * 2019-11-04 2020-03-24 广州金域医学检验中心有限公司 Self-coding model training method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOXIN HE ET AL.: "Data Augmentation for Monaural Singing Voice Separation Based on Variational Autoencoder-Generative Adversarial Network", 2019 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6 *
QIN ZHENG: "Image Hash Learning Based on Generative Adversarial Networks", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, no. 1, pages 1-65 *
HU CONG: "Research on Image Recognition Methods Based on Autoencoders and Generative Adversarial Networks", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, pages 1-107 *

Also Published As

Publication number Publication date
CN111489802B (en) 2023-07-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant