US20210073649A1 - Automated data ingestion using an autoencoder - Google Patents
- Publication number
- US20210073649A1 (U.S. application Ser. No. 17/101,517)
- Authority
- US
- United States
- Prior art keywords
- autoencoder
- subset
- values
- numeric
- data
- Prior art date
- Legal status: Pending
Classifications
- G06N 3/084: Backpropagation, e.g. using gradient descent
- G06F 16/258: Data format conversion from or to a database
- G06F 18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06N 3/088: Non-supervised learning, e.g. competitive learning
- G06N 3/045: Combinations of networks
Definitions
- Embodiments disclosed herein generally relate to deep learning, and more specifically, to training an autoencoder to perform automated data ingestion.
- Input data is often received in different formats.
- Data engineering involves converting the format of input data to a desired format.
- Data engineering is conventionally a manual process that requires significant time and resources.
- Furthermore, conventional data engineering solutions are not portable, such that a new solution must be manually designed for each type of input data and/or desired output format.
- Embodiments disclosed herein provide systems, methods, articles of manufacture, and computer-readable media for training an autoencoder to perform automated data ingestion.
- The autoencoder may receive streaming data comprising numeric values during a first time interval.
- The autoencoder may determine, during the first time interval, a maximum value and a minimum value of a first subset of the numeric values.
- The autoencoder may then process, during the first time interval, a second subset of the numeric values based on the determined maximum and minimum values.
- FIG. 1 illustrates an embodiment of a system that uses an autoencoder to perform automated data ingestion.
- FIG. 2 illustrates an embodiment of training an autoencoder to perform automated data ingestion.
- FIG. 3 illustrates an embodiment of a processing pipeline.
- FIG. 4 illustrates an embodiment of a first logic flow.
- FIG. 5 illustrates an embodiment of a second logic flow.
- FIG. 6 illustrates an embodiment of a computing architecture.
- Embodiments disclosed herein provide techniques to use an autoencoder to automatically format input data according to a desired output format.
- A statistical model (or other machine learning (ML) model) may format the data sampled from the dataset, thereby generating a formatted output dataset.
- A training dataset may then be used to train the autoencoder to format data.
- The training dataset may include the data sampled from the dataset as an input dataset and the formatted output dataset generated by the statistical model as an output dataset.
- The training dataset may include overlapping “chunks” such that the same data may appear in two or more chunks.
- During training, the autoencoder attempts to format the input dataset, thereby generating an output.
- The statistical model may analyze the output of the autoencoder to determine an accuracy of the autoencoder.
- The determined accuracy of the autoencoder may then be used to train the values of a latent vector of the autoencoder.
- The training of the autoencoder may be repeated until the accuracy of the autoencoder exceeds a threshold.
- The trained autoencoder may then be used for data ingestion, e.g., by attaching the trained autoencoder to all new models and/or datasets.
- Advantageously, embodiments disclosed herein provide techniques to automatically format data using an autoencoder.
- The autoencoder may be trained to appropriately format all data, even if the data has not been previously analyzed.
- Further, embodiments disclosed herein provide scalable solutions that can be ported to any type of data processing pipeline, regardless of any particular input and/or output data formats. Further still, embodiments disclosed herein may train the autoencoder using only the training dataset and/or a portion thereof.
- FIG. 1 depicts an exemplary system 100 , consistent with disclosed embodiments.
- The system 100 includes a computing system 101.
- The computing system 101 is representative of any type of computing system, such as servers, compute clusters, desktop computers, smartphones, tablet computers, wearable devices, laptop computers, workstations, portable gaming devices, virtualized computing systems, and the like.
- The computing system 101 includes a processor 102 and a memory 103, and may further include a storage, network interface, and/or other components not pictured for the sake of clarity.
- The memory 103 includes an autoencoder 104, a machine learning (ML) model 105, a statistical model 106, and data stores of training data 107 and formatted data 108.
- The autoencoder 104 is representative of any type of autoencoder, including variational autoencoders, denoising autoencoders, sparse autoencoders, and contractive autoencoders.
- Generally, an autoencoder is a type of artificial neural network that learns data codings (e.g., the latent vector 109) in an unsupervised manner.
- Values of the latent vector 109 may be learned (or refined) during training of the autoencoder 104 , thereby training the autoencoder 104 to format input data according to a desired output format (which may include formatting according to a desired operation).
- The trained autoencoder 104 may approximate any function and/or operation applied to input data.
- For example, the autoencoder 104 may convert input data comprising integer values to floating point values.
- More generally, the autoencoder 104 may perform any encoding operation, which may include, but is not limited to, normalizing values of input data, computing a z-score (e.g., a signed value reflecting the number of standard deviations a value of input data is from a mean value) for values of input data, standardizing values of input data, recasting values of input data, filtering the input data according to one or more filtering criteria, fuzzing the values of input data, applying statistical filters to the input data, and the like.
- The use of any particular type of encoding operation as a reference example herein should not be considered limiting of the disclosure, as the disclosure is equally applicable to all types of encoding operations.
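As a concrete illustration of one such encoding operation, the z-score transform can be sketched in a few lines of Python. This is an illustrative sketch, not code from the disclosure; a population standard deviation is assumed:

```python
import math

def z_scores(values):
    """Return the signed number of standard deviations each value
    lies from the mean of the input (population standard deviation)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

scores = z_scores([1.0, 2.0, 3.0])  # symmetric inputs straddle the mean
```

For the sample input, the middle value sits at the mean (score 0) and the outer values receive equal and opposite scores.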
- Similarly, the use of the term “vector” to describe the latent vector 109 should not be considered limiting of the disclosure, as the latent vector 109 is also representative of a matrix.
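To make the role of the learned latent values concrete, the following deliberately tiny sketch trains a one-weight encoder and a one-weight decoder by gradient descent to approximate a target formatting operation (here simply reproducing the input). Every name and hyperparameter is an illustrative assumption, not part of the disclosure:

```python
# Illustrative one-dimensional "autoencoder": a single encode weight and
# a single decode weight (stand-ins for learned latent values) are
# refined by backpropagating a squared reconstruction error.

def train_tiny_autoencoder(samples, epochs=2000, lr=0.1):
    w_enc, w_dec = 0.5, 0.5
    for _ in range(epochs):
        for x in samples:
            z = w_enc * x            # encode to the latent representation
            y = w_dec * z            # decode back toward the target format
            err = y - x              # target: reproduce the input value
            w_dec -= lr * 2 * err * z
            w_enc -= lr * 2 * err * w_dec * x
    return w_enc, w_dec

w_enc, w_dec = train_tiny_autoencoder([0.2, 0.5, 0.8, 1.0])
# After training, decoding an encoded value approximates the input,
# i.e. the product w_enc * w_dec is close to 1.
```

A real autoencoder replaces the two scalar weights with encoder and decoder networks around a latent vector, but the training signal (reconstruction error driven backward through the weights) has the same shape.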
- The training data 107 comprises columnar and/or row-based data, e.g., one or more columns of integer values, one or more columns of floating point values, etc.
- The training data 107 may be representative of multiple datasets of any size.
- For example, the training data 107 may include 50 column-based datasets, where each dataset has thousands of records (or more).
- The training data 107 may be segmented (e.g., the training data 107 may comprise a plurality of segments of one or more datasets). In one embodiment, the segmented datasets of training data 107 are overlapping, such that at least one value of the training data 107 appears in at least two segments.
- For example, a first dataset may include rows 0-1000 of the training data 107, and a second dataset may include rows 900-2000 of the training data 107, such that rows 900-1000 appear in both the first and second datasets.
- The size of the datasets may be learned based on hyperparameter tuning.
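A minimal sketch of this overlapping segmentation, with illustrative (not disclosed) segment size and overlap values:

```python
def overlapping_segments(rows, size, overlap):
    """Split rows into segments of length `size`, each sharing `overlap`
    trailing rows with the next segment, so some rows appear twice."""
    step = size - overlap
    return [rows[i:i + size] for i in range(0, len(rows) - overlap, step)]

segments = overlapping_segments(list(range(2000)), size=1000, overlap=100)
# Rows 900-999 appear in both the first and the second segment.
```

The segment size and overlap here play the role of the hyperparameters that, per the disclosure, may be learned through tuning.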
- The ML model 105 and the statistical model 106 are representative of any type of computing model, such as deep learning models, machine learning models, neural networks, classifiers, clustering algorithms, support vector machines, and the like.
- In some embodiments, the ML model 105 and the statistical model 106 comprise the same model.
- The ML model 105 (and/or the statistical model 106) may be configured to transform (or encode) input data to a target format, thereby generating an output dataset.
- For example, the ML model 105 may be configured to normalize integer values of input data to floating point values, and the output dataset may comprise the floating point values.
- The ML model 105 may compute an output dataset for each input dataset of training data 107.
- An input dataset and corresponding formatted output dataset generated by the ML model 105 may be referred to as a “training sample” herein.
- The autoencoder 104 may then be trained using the input dataset of one or more training samples. Generally, the autoencoder 104 may receive the input dataset as input, convert the dataset to an encoded format using the values of the latent vector 109, and decode the converted dataset. In some embodiments, the converted dataset generated by the autoencoder 104 may then be compared to the formatted data of the training sample generated by the ML model 105. The comparison may include determining a difference and/or least squared error between the converted dataset generated by the autoencoder 104 and the formatted data of the training sample generated by the ML model 105. Doing so generates one or more values reflecting an accuracy of the autoencoder 104. In some embodiments, the accuracy may comprise a loss of the autoencoder 104.
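The least-squared-error comparison might be sketched as follows; the mapping from loss to an accuracy value is an assumption for illustration:

```python
def mse(autoencoder_out, model_out):
    """Mean least-squared error between the autoencoder output and the
    formatted data produced by the reference model."""
    pairs = list(zip(autoencoder_out, model_out))
    return sum((a - m) ** 2 for a, m in pairs) / len(pairs)

def accuracy(autoencoder_out, model_out):
    # Illustrative mapping: zero loss yields accuracy 1.0
    return 1.0 / (1.0 + mse(autoencoder_out, model_out))
```

Identical outputs give zero loss and accuracy 1.0; larger disagreements push the accuracy toward zero.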
- The ML model 105 and/or the statistical model 106 may receive the converted data generated by the autoencoder 104 to determine the accuracy of the autoencoder 104 relative to the data of the training sample generated by the ML model 105.
- For example, the ML model 105 may process the converted data generated by the autoencoder 104 and compare the output to the formatted data of the training sample.
- As another example, the statistical model 106 may classify the converted data generated by the autoencoder 104 and compare the classification to a classification of the input dataset of the training sample.
- For instance, the statistical model 106 may classify the formatted output generated by the autoencoder 104 as a dataset of credit card data.
- If the classification for the corresponding input dataset is also credit card data, the statistical model 106 may compute a relatively high accuracy value for the autoencoder 104. If, however, the classification for the input dataset is for purchase order amounts, the statistical model 106 may compute a relatively low accuracy value for the autoencoder 104. In one embodiment, the statistical model 106 may compute the accuracy value for the autoencoder 104 based on a distance between the classifications in a data space, where the accuracy increases as the distance between the classifications decreases.
- The determined accuracy of the autoencoder 104 may then be used to refine the values of the latent vector 109 and/or other components of the autoencoder 104 via a backpropagation operation.
- The backpropagation may be performed using any feasible backpropagation algorithm.
- Generally, the values of the latent vector 109 and/or the other components of the autoencoder 104 are refined based on the accuracy of the formatted output generated by the autoencoder 104. Doing so may result in a latent vector 109 that most accurately maps the input data to the desired output format.
- The training of the autoencoder 104 may be repeated any number of times until the accuracy of the autoencoder 104 exceeds a threshold (and/or the loss of the autoencoder 104 is below a threshold).
- The autoencoder 104 may then be configured to ingest (e.g., format) data to be processed in any processing platform, such as a streaming data platform, thereby generating the formatted data 108.
- In some embodiments, the autoencoder 104 may perform estimated ingestion operations. For example, the autoencoder 104 may receive streaming data over a time interval. If the streaming data is of a reasonable size, the autoencoder 104 may perform a predictive formatting operation on the streaming data.
- As the streaming data is received, the autoencoder 104 may determine the minimum and maximum values therein. Doing so may allow the autoencoder 104 to normalize the streaming data in a predictive fashion in a single pass. Stated differently, the autoencoder 104 may normalize the streaming data in a single processing phase, rather than having to process the streaming data twice (e.g., once to discover the minimum/maximum values, and again to normalize the data based on the identified minimum/maximum values).
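The single-pass behavior can be sketched as follows, assuming the minimum and maximum are estimated from a leading calibration subset and later values outside the estimated range are clipped (both assumptions for illustration, not claim language):

```python
def normalize_stream(stream, calibration):
    """Estimate min/max from the first `calibration` values, then emit
    every value normalized to [0, 1] in a single pass over the stream."""
    lo = hi = None
    out, buffered = [], []
    for i, v in enumerate(stream):
        if i < calibration:
            lo = v if lo is None else min(lo, v)
            hi = v if hi is None else max(hi, v)
            buffered.append(v)
            if i == calibration - 1:       # calibration window complete
                span = (hi - lo) or 1.0
                out.extend((b - lo) / span for b in buffered)
        else:
            # Clip values outside the estimated range rather than rescanning
            span = (hi - lo) or 1.0
            out.append(min(1.0, max(0.0, (v - lo) / span)))
    return out

normalized = normalize_stream([0.0, 10.0, 5.0, 20.0], calibration=2)
# normalized == [0.0, 1.0, 0.5, 1.0]; 20.0 is clipped to the estimated max
```

The trade-off is visible in the example: the value 20.0, which arrives after calibration, is clipped to the estimated range instead of triggering a second pass over the data.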
- FIG. 2 is a schematic 200 illustrating an embodiment of training the autoencoder 104 to perform automated data ingestion.
- One or more datasets of training data 107 may be segmented.
- The training data 107 may include row-based data and/or column-based data.
- The segments may have a minimum size (e.g., 10,000 rows and/or columns of data).
- One or more of the segments may be modified, for example, by dropping one or more columns of data, formatting one or more columns of data, and the like.
- Doing so may produce varying segments of training data 107 , e.g., where a first segment has had a column dropped, a second segment has had a column formatted, a third segment has had one column dropped and one column formatted, and a fourth segment has not been modified.
- At block 202, the ML model 105 may process the segmented training data 107 to format the segmented training data 107 according to one or more formatting rules and/or operations. For example, the ML model 105 may normalize, convert, and/or filter the segmented training data 107.
- One or more output datasets generated by the ML model 105 at block 202 may be stored. The output datasets may include each segment of training data 204 and the corresponding formatted data 205 generated by the ML model 105 at block 202.
- For example, if the training data 107 is segmented into 1,000 segments, the segmented training data 204 may include the 1,000 segments, and the formatted data 205 may include 1,000 formatted datasets generated by the ML model 105 by processing each segment at block 202.
- In such an example, 1,000 training samples may comprise the segmented training data as input data and the corresponding formatted data 205 generated by the ML model 105.
- Overlapping datasets may be generated using the training samples of segmented training data 204 and formatted data 205.
- For example, the 1,000 training samples may be modified to include overlapping values.
- The autoencoder 104 may be trained using the overlapping datasets generated at block 206.
- Generally, the autoencoder 104 may process each input dataset (e.g., the segmented training data 204) of each training sample, e.g., to convert each of the input datasets of the training samples to a desired output format and/or based on a predefined operation.
- At block 208, the accuracy of the autoencoder 104 is determined based on the output generated by the autoencoder 104 at block 207.
- For example, a difference and/or a least squared error may be computed between the output of the autoencoder 104 for the segmented training data 204 and the corresponding formatted data 205 generated by the ML model 105.
- The difference and/or least squared error may be used as accuracy values for the autoencoder 104.
- As another example, the statistical model 106 may classify the output generated by the autoencoder 104 at block 207 and compare the generated classification to a classification of the corresponding segmented training data 204. For example, if the classification of the output generated by the autoencoder 104 at block 207 for a first overlapping segment of training data 204 matches the classification generated for the formatted data 205 corresponding to the first overlapping segment, the statistical model 106 may compute a relatively high accuracy value for the autoencoder 104 for the first training sample.
- The determined accuracy may be used to train the autoencoder 104 via a backpropagation operation. Doing so refines the values of the autoencoder 104, including the latent vector 109, based on the determined accuracy values for the autoencoder 104 and/or a loss of the autoencoder 104.
- The accuracy at block 208 may be determined for each training sample. Therefore, continuing with the previous example, the accuracy for each of the 1,000 training samples processed by the autoencoder 104 may be determined at block 208.
- Each of the 1,000 accuracy values may be provided to the autoencoder 104 to update the weights of the autoencoder 104 , e.g., via 1,000 (or fewer) backpropagation operations.
- FIG. 3 illustrates an embodiment of a processing pipeline 300 .
- Streaming input data is received in the processing pipeline 300.
- The streaming input data may be any type of data, such as transaction data, stock ticker data, financial data, sensor data, and the like.
- Generally, the streaming input data includes numeric values in one or more rows and/or columns.
- The streaming input data may have varying types and/or formats, which may need to be modified to be compatible with various components of the processing pipeline. Therefore, at block 302, the trained autoencoder 104 may process the streaming input data.
- For example, the trained autoencoder 104 may format the streaming input data according to a desired output format, normalize the values of the streaming input data, compute a z-score for the streaming input data, standardize values of the streaming input data, recast values of the streaming input data, filter the streaming input data according to one or more filtering criteria, fuzz the values of the streaming input data, and the like.
- One or more components of the processing pipeline then process the output generated by the autoencoder 104 at block 302, e.g., the formatted and/or converted streaming input data.
- The autoencoder 104 may process the streaming data in a single pass, e.g., by providing estimated normalization, recasting, etc., and without having to process the streaming data in two or more passes.
- FIG. 4 illustrates an embodiment of a logic flow 400 .
- The logic flow 400 may be representative of some or all of the operations executed by one or more embodiments described herein.
- For example, the logic flow 400 may include some or all of the operations to provide automated data ingestion using an autoencoder. Embodiments are not limited in this context.
- The logic flow 400 begins at block 410, where a target data format is determined for data.
- For example, the target format may specify a datatype (e.g., integers, floating points, etc.), a data space (e.g., a range of values), etc. More generally, any type of operation may be determined for the data at block 410, e.g., normalization, filtering, score computation, etc.
- The autoencoder 104 is then trained to format data according to the target formats and/or operations defined at block 410. Generally, the training of the autoencoder 104 is guided by the ML model 105 and/or the statistical model 106 as described in greater detail herein.
- The accuracy of the autoencoder 104 may then be determined to exceed a threshold accuracy level. For example, if the threshold is 90% accuracy and the accuracy of the autoencoder 104 is 95%, the accuracy of the autoencoder exceeds the threshold.
- The autoencoder 104 is then configured to format data in a processing pipeline.
- FIG. 5 illustrates an embodiment of a logic flow 500 .
- The logic flow 500 may be representative of some or all of the operations executed by one or more embodiments described herein.
- For example, the logic flow 500 may include some or all of the operations performed to train the autoencoder 104.
- Embodiments are not limited in this context.
- The logic flow 500 begins at block 510, where the training data 107, which may comprise one or more datasets, is segmented into overlapping training data subsets.
- As stated, the training data 107 may include row- and/or column-based numerical values. By generating overlapping subsets, one or more values of the training data 107 may appear in two or more subsets.
- The ML model 105 then transforms the training data subsets according to the format defined at block 410.
- Generally, the ML model 105 may be configured to transform the training data from a first format to a second format. More generally, the ML model 105 may perform any operation on the training data as described above. Doing so may generate a respective transformed output dataset for each of the training data subsets.
- Each training dataset and corresponding transformed output dataset pair may comprise a training sample for the autoencoder.
- One or more of the training samples may be selected at block 530 .
- At block 540, the autoencoder 104 may process the input dataset of the training sample selected at block 530. Generally, the autoencoder 104 may transform the input dataset of the training sample (or perform any other operation) based at least in part on the current weights of the latent vector 109. Doing so may generate a transformed output. At block 550, the accuracy of the autoencoder 104 is determined based at least in part on the transformed output generated by the autoencoder 104. As stated, the ML model 105 and/or the statistical model 106 may be used to determine the accuracy of the autoencoder 104.
- For example, a difference and/or a least squared error may be computed between the transformed output dataset of the training sample (e.g., the output of the ML model 105) and the output generated by the autoencoder 104 at block 540.
- The difference and/or least squared error may be used as accuracy values for the autoencoder 104.
- As another example, the statistical model 106 may classify the output generated by the autoencoder 104 at block 540 and compare the generated classification to a classification of the training data of the input sample selected at block 530.
- The accuracy of the autoencoder 104 may then be determined based on a similarity of the classifications, where more similar classifications result in higher accuracy values for the autoencoder 104.
- The accuracy determined at block 550 may be provided to the autoencoder 104.
- The values of the latent vector 109 and any other values of the autoencoder 104 may then be refined during a backpropagation operation. Doing so may allow the values of the latent vector 109 to more accurately reflect a mapping required to perform the desired operation on data (e.g., filtering, formatting, recasting, etc.).
- If the accuracy does not exceed a threshold, the logic flow 500 may return to block 530, where another training sample is selected, thereby repeating the training process until the accuracy of the autoencoder 104 exceeds the threshold. Once the accuracy of the autoencoder 104 exceeds the threshold and/or all training samples have been used to train the autoencoder 104, the logic flow 500 may end.
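The repeat-until-threshold structure of logic flow 500 can be sketched with a single trainable weight standing in for the latent vector 109, a fixed scaling function standing in for the reference ML model, and accuracy mapped from mean squared error as 1/(1 + MSE). All of these choices are illustrative assumptions, not the disclosed implementation:

```python
def train_until_threshold(samples, reference, lr=0.1, threshold=0.9999,
                          max_epochs=10_000):
    """Refine a single weight via backpropagation passes until the
    accuracy derived from the loss exceeds the threshold."""
    w = 0.0                               # stand-in for the latent values
    acc = 0.0
    for _ in range(max_epochs):
        for x in samples:                 # process each training sample
            err = w * x - reference(x)    # compare against the reference
            w -= lr * 2 * err * x         # backpropagation step
        loss = sum((w * x - reference(x)) ** 2 for x in samples) / len(samples)
        acc = 1.0 / (1.0 + loss)          # illustrative accuracy mapping
        if acc > threshold:               # stop once the threshold is met
            break
    return w, acc

w, acc = train_until_threshold([0.2, 0.5, 1.0], reference=lambda x: 0.5 * x)
```

Because the stopping rule checks accuracy rather than a fixed epoch count, the loop runs only as long as the autoencoder stand-in remains below the desired quality, mirroring the flow's return to sample selection.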
- FIG. 6 illustrates an embodiment of an exemplary computing architecture 600 comprising a computing system 602 that may be suitable for implementing various embodiments as previously described.
- The computing architecture 600 may comprise or be implemented as part of an electronic device.
- The computing architecture 600 may be representative, for example, of a system that implements one or more components of the system 100.
- The computing system 602 may be representative, for example, of the computing system 101 of the system 100.
- The embodiments are not limited in this context. More generally, the computing architecture 600 is configured to implement all logic, applications, systems, methods, apparatuses, and functionality described herein with reference to FIGS. 1-5.
- A component can be, but is not limited to being, a process running on a computer processor, a computer processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer.
- By way of illustration, both an application running on a server and the server can be a component.
- One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
- The computing system 602 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth.
- The embodiments, however, are not limited to implementation by the computing system 602.
- The computing system 602 comprises a processor 604, a system memory 606, and a system bus 608.
- The processor 604 can be any of various commercially available computer processors, including without limitation AMD® Athlon®, Duron®, and Opteron® processors; ARM® application, embedded, and secure processors; IBM® and Motorola® DragonBall® and PowerPC® processors; IBM and Sony® Cell processors; Intel® Celeron®, Core®, Core(2) Duo®, Itanium®, Pentium®, Xeon®, and XScale® processors; and similar processors. Dual microprocessors, multi-core processors, and other multiprocessor architectures may also be employed as the processor 604.
- The system bus 608 provides an interface for system components including, but not limited to, the system memory 606 to the processor 604.
- The system bus 608 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures.
- Interface adapters may connect to the system bus 608 via a slot architecture.
- Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.
- the system memory 606 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory (e.g., one or more flash arrays), polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD) and any other type of storage media suitable for storing information.
- the system memory 606 can include non-volatile memory, such as EEPROM and flash memory.
- the computing system 602 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 614 , a magnetic floppy disk drive (FDD) 616 to read from or write to a removable magnetic disk 618 , and an optical disk drive 620 to read from or write to a removable optical disk 622 (e.g., a CD-ROM or DVD).
- the HDD 614 , FDD 616 and optical disk drive 620 can be connected to the system bus 608 by a HDD interface 624 , an FDD interface 626 and an optical drive interface 628 , respectively.
- the HDD interface 624 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
- the computing system 602 is generally configured to implement all logic, systems, methods, apparatuses, and functionality described herein with reference to FIGS. 1-5 .
- the drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth.
- a number of program modules can be stored in the drives and memory units 610 , 612 , including an operating system 630 , one or more application programs 632 , other program modules 634 , and program data 636 .
- the one or more application programs 632 , other program modules 634 , and program data 636 can include, for example, the various applications and/or components of the system 100 , e.g., the autoencoder 104 , ML model 105 , statistical model 106 , training data 107 , formatted data 108 , and latent vector 109 .
- a user can enter commands and information into the computing system 602 through one or more wire/wireless input devices, for example, a keyboard 638 and a pointing device, such as a mouse 640 .
- Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, finger print readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors, styluses, and the like.
- input devices are often connected to the processor 604 through an input device interface 642 that is coupled to the system bus 608 , but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.
- a monitor 644 or other type of display device is also connected to the system bus 608 via an interface, such as a video adaptor 646 .
- the monitor 644 may be internal or external to the computing system 602 .
- a computer typically includes other peripheral output devices, such as speakers, printers, and so forth.
- the computing system 602 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 648 .
- the remote computer 648 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computing system 602 , although, for purposes of brevity, only a memory/storage device 650 is illustrated.
- the logical connections depicted include wire/wireless connectivity to a local area network (LAN) 652 and/or larger networks, for example, a wide area network (WAN) 654 .
- LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
- When used in a LAN networking environment, the computing system 602 is connected to the LAN 652 through a wire and/or wireless communication network interface or adaptor 656 .
- the adaptor 656 can facilitate wire and/or wireless communications to the LAN 652 , which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 656 .
- the computing system 602 can include a modem 658 , be connected to a communications server on the WAN 654 , or have other means for establishing communications over the WAN 654 , such as by way of the Internet.
- the modem 658 , which can be internal or external and a wire and/or wireless device, connects to the system bus 608 via the input device interface 642 .
- program modules depicted relative to the computing system 602 can be stored in the remote memory/storage device 650 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
- the computing system 602 is operable to communicate with wired and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.16 over-the-air modulation techniques).
- the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
- Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity.
- a Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
- Various embodiments may be implemented using hardware elements, software elements, or a combination of both.
- hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASICs), programmable logic devices (PLDs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth.
- Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
- One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein.
- Such representations known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that make the logic or processor.
- Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments.
- Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software.
- the machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like.
- the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
Description
- This application is a continuation of U.S. patent application Ser. No. 16/549,465, titled “AUTOMATED DATA INGESTION USING AN AUTOENCODER” filed on Aug. 23, 2019. The contents of the aforementioned application are incorporated herein by reference in their entirety.
- Embodiments disclosed herein generally relate to deep learning, and more specifically, to training an autoencoder to perform automated data ingestion.
- Input data is often received in different formats. Data engineering involves converting the format of input data to a desired format. However, data engineering is conventionally a manual process which requires significant time and resources. Furthermore, data engineering solutions are not portable, such that a new solution needs to be manually designed for different types of input data and/or desired output formats.
- Embodiments disclosed herein provide systems, methods, articles of manufacture, and computer-readable media for training an autoencoder to perform automated data ingestion. In one example, the autoencoder may receive streaming data comprising numeric values during a first time interval. The autoencoder may determine, during the first time interval, a maximum value and a minimum value of a first subset of the numeric values. The autoencoder may then process, during the first time interval, a second subset of the numeric values based on the determined maximum and minimum values.
- FIG. 1 illustrates an embodiment of a system that uses an autoencoder to perform automated data ingestion.
- FIG. 2 illustrates an embodiment of training an autoencoder to perform automated data ingestion.
- FIG. 3 illustrates an embodiment of a processing pipeline.
- FIG. 4 illustrates an embodiment of a first logic flow.
- FIG. 5 illustrates an embodiment of a second logic flow.
- FIG. 6 illustrates an embodiment of a computing architecture.
- Embodiments disclosed herein provide techniques to use an autoencoder to automatically format input data according to a desired output format. Generally, embodiments disclosed herein may sample a dataset. A statistical model (or other machine learning (ML) model) may format the data sampled from the dataset, thereby generating a formatted output dataset. A training dataset may then be used to train the autoencoder to format data. The training dataset may include the data sampled from the dataset as an input dataset and the formatted output dataset generated by the statistical model as an output dataset. The training dataset may include overlapping “chunks” such that the same data may appear in two or more chunks. Generally, during training, the autoencoder attempts to format the input dataset, thereby generating an output. The statistical model (or other ML model) may analyze the output of the autoencoder to determine an accuracy of the autoencoder. The determined accuracy of the autoencoder may then be used to train the values of a latent vector of the autoencoder. The training of the autoencoder may be repeated until the accuracy of the autoencoder exceeds a threshold. The trained autoencoder may then be used for data ingestion, e.g., by attaching the trained autoencoder to all new models and/or datasets.
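The training loop summarized above can be sketched end to end. In this minimal sketch, the "statistical model" is stood in for by min-max normalization and the autoencoder by a single affine map trained with gradient descent; the names and both models are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

# Hypothetical end-to-end sketch of the training loop described above. The
# "statistical model" here is min-max normalization, and the "autoencoder"
# is a single affine map (w, b) standing in for the latent vector; both are
# simplifications for illustration.

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 100.0, size=1000)       # sampled input dataset

def statistical_model(x):
    # Teacher model: format values into [0, 1] via min-max normalization.
    return (x - x.min()) / (x.max() - x.min())

target = statistical_model(raw)                # formatted output dataset

w, b = 0.0, 0.0                                # trainable stand-in parameters
lr = 1e-4
loss = float("inf")
for _ in range(10_000):
    pred = w * raw + b                         # autoencoder's formatting attempt
    err = pred - target
    loss = float(np.mean(err ** 2))            # accuracy signal (MSE loss)
    if loss < 1e-6:                            # stop once the threshold is met
        break
    # Backpropagation for this one-layer sketch: plain gradient descent.
    w -= lr * 2.0 * float(np.mean(err * raw))
    b -= lr * 2.0 * float(np.mean(err))
```

Once the loop exits, `w` approximates 1/(max − min), i.e., the stand-in autoencoder has learned to mimic the formatting the statistical model applied.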
- Advantageously, embodiments disclosed herein provide techniques to automatically format data using an autoencoder. Advantageously, the autoencoder may be trained to appropriately format all data, even if the data has not been previously analyzed. Furthermore, embodiments disclosed herein provide scalable solutions that can be ported to any type of data processing pipeline, regardless of any particular input and/or output data formats. Further still, embodiments disclosed herein may train the autoencoder using only the training dataset and/or a portion thereof.
- With general reference to notations and nomenclature used herein, one or more portions of the detailed description which follows may be presented in terms of program procedures executed on a computer or network of computers. These procedural descriptions and representations are used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. A procedure is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. These operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be noted, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities.
- Further, these manipulations are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. However, no such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein that form part of one or more embodiments. Rather, these operations are machine operations. Useful machines for performing operations of various embodiments include digital computers as selectively activated or configured by a computer program stored within that is written in accordance with the teachings herein, and/or include apparatus specially constructed for the required purpose or a digital computer. Various embodiments also relate to apparatus or systems for performing these operations. These apparatuses may be specially constructed for the required purpose. The required structure for a variety of these machines will be apparent from the description given.
- Reference is now made to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for the purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding thereof. It may be evident, however, that the novel embodiments can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate a description thereof. The intention is to cover all modifications, equivalents, and alternatives within the scope of the claims.
- FIG. 1 depicts an exemplary system 100, consistent with disclosed embodiments. As shown, the system 100 includes a computing system 101. The computing system 101 is representative of any type of computing system, such as servers, compute clusters, desktop computers, smartphones, tablet computers, wearable devices, laptop computers, workstations, portable gaming devices, virtualized computing systems, and the like. The computing system 101 includes a processor 102, a memory 103, and may further include a storage, network interface, and/or other components not pictured for the sake of clarity.
- As shown, the memory 103 includes an autoencoder 104, a machine learning (ML) model 105, a statistical model 106, and data stores of training data 107 and formatted data 108. The autoencoder 104 is representative of any type of autoencoder, including variational autoencoders, denoising autoencoders, sparse autoencoders, and contractive autoencoders. Generally, an autoencoder is a type of artificial neural network that learns data codings (e.g., the latent vector 109) in an unsupervised manner. Values of the latent vector 109 (also referred to as a code, coding, latent variables, and/or latent representation) may be learned (or refined) during training of the autoencoder 104, thereby training the autoencoder 104 to format input data according to a desired output format (which may include formatting according to a desired operation). Stated differently, the trained autoencoder 104 may approximate any function and/or operation applied to input data. As one example, the autoencoder 104 may convert input data comprising integer values to floating point values. More generally, the autoencoder 104 may perform any encoding operation, which may include, but is not limited to, normalizing values of input data, computing a z-score (e.g., a signed value reflecting a number of standard deviations the value of input data is from a mean value) for values of input data, standardizing values of input data, recasting values of input data, filtering the input data according to one or more filtering criteria, fuzzing of the values of input data, applying statistical filters to the input data, and the like. The use of any particular type of encoding operation as a reference example herein should not be considered limiting of the disclosure, as the disclosure is equally applicable to all types of encoding operations.
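Two of the encoding operations named above (z-scoring and recasting) can be written out directly. The helper names below are illustrative, and in the disclosed system the trained autoencoder 104 would approximate such operations rather than compute them explicitly.

```python
import statistics

def z_scores(values):
    # z-score: the signed number of standard deviations each value of the
    # input lies from the mean, as described above.
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def recast_to_float(values):
    # Recasting: convert integer input values to floating point values.
    return [float(v) for v in values]
```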
Similarly, the use of the term “vector” to describe the latent vector 109 should not be considered limiting of the disclosure, as the latent vector 109 is also representative of a matrix having multiple dimensions (e.g., a vector of vectors).
- To train the autoencoder 104, one or more datasets of training data 107 may be generated. In one embodiment, the training data 107 comprises columnar and/or row-based data, e.g., one or more columns of integer values, one or more columns of floating point values, etc. Generally, the training data 107 may be representative of multiple datasets of any size. For example, the training data 107 may include 50 column-based datasets, where each dataset has thousands of records (or more). Furthermore, the training data 107 may be segmented (e.g., the training data 107 may comprise a plurality of segments of one or more datasets). In one embodiment, each segmented dataset of training data 107 is overlapping, such that at least one value of the training data 107 appears in at least two segments. For example, a first dataset may include rows 0-1000 of the training data 107, while a second dataset may include rows 900-2000 of the training data 107, such that rows 900-1000 appear in the first and second datasets. In one embodiment, the size of the datasets may be learned based on hyperparameter tuning.
- The ML model 105 and the statistical model 106 are representative of any type of computing model, such as deep learning models, machine learning models, neural networks, classifiers, clustering algorithms, support vector machines, and the like. In one embodiment, the ML model 105 and the statistical model 106 comprise the same model. Generally, the ML model 105 (and/or the statistical model 106) may be configured to transform (or encode) input data to a target format, thereby generating an output dataset. For example, the ML model 105 may be configured to normalize integer values of input data to floating point values, and the output dataset may comprise the floating point values. Generally, the ML model 105 may compute an output dataset for each input dataset of training data 107. An input dataset and corresponding formatted output dataset generated by the ML model 105 may be referred to as a “training sample” herein.
- The autoencoder 104 may then be trained using the input dataset of one or more training samples. Generally, the autoencoder 104 may receive the input dataset as input, convert the dataset to an encoded format using the values of the latent vector 109, and decode the converted dataset. In some embodiments, the converted dataset generated by the autoencoder 104 may then be compared to the formatted data of the training sample generated by the ML model 105. The comparison may include determining a difference and/or least squared error of the converted dataset generated by the autoencoder 104 and the formatted data of the training sample generated by the ML model 105. Doing so generates one or more values reflecting an accuracy of the autoencoder 104. In some embodiments, the accuracy may comprise a loss of the autoencoder 104.
- In some embodiments, the ML model 105 and/or the statistical model 106 may receive the converted data generated by the autoencoder 104 to determine the accuracy of the autoencoder 104 relative to the data of the training sample generated by the ML model 105. For example, the ML model 105 may process the converted data generated by the autoencoder 104 and compare the output to the formatted data of the training sample. In another embodiment, the statistical model 106 may classify the converted data generated by the autoencoder 104 and compare the classification to a classification of the input dataset of the training sample. For example, the statistical model 106 may classify the formatted output generated by the autoencoder 104 as a dataset of credit card data. If the statistical model classifies the input dataset of the training sample as being credit card data, the statistical model 106 may compute a relatively high accuracy value for the autoencoder 104. If, however, the classification for the input dataset is for purchase order amounts, the statistical model 106 may compute a relatively low accuracy value for the autoencoder 104. In one embodiment, the statistical model 106 may compute the accuracy value for the autoencoder 104 based on a distance between the classifications in a data space, where the accuracy increases as the distance between the classifications decreases.
- The determined accuracy of the autoencoder 104 may then be used to refine the values of the latent vector 109 and/or other components of the autoencoder 104 via a backpropagation operation. The backpropagation may be performed using any feasible backpropagation algorithm. Generally, during backpropagation, the values of the latent vector 109 and/or the other components of the autoencoder 104 are refined based on the accuracy of the formatted output generated by the autoencoder 104. Doing so may result in a latent vector 109 that most accurately maps the input data to the desired output format.
- The training of the autoencoder 104 may be repeated any number of times until the accuracy of the autoencoder 104 exceeds a threshold (and/or the loss of the autoencoder 104 is below a threshold). The autoencoder 104 may then be configured to ingest (e.g., format) data to be processed in any processing platform, such as a streaming data platform, thereby generating the formatted data 108. In some embodiments, the autoencoder 104 may perform estimated ingestion operations. For example, the autoencoder 104 may receive streaming data over a time interval. If the streaming data is of a reasonable size, the autoencoder 104 may perform a predictive formatting operation on the streaming data. For example, by ingesting enough streaming data during the time interval, the autoencoder 104 may determine the minimum and maximum values therein. Doing so may allow the autoencoder 104 to normalize the streaming data in a predictive fashion in a single pass. Stated differently, the autoencoder 104 may normalize the streaming data in a single processing phase, rather than having to process the streaming data twice (e.g., to discover the minimum/maximum values, then normalize the data based on the identified minimum/maximum values).
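The single-pass, predictive normalization described above can be sketched as a generator: the minimum and maximum observed in a first subset of the stream are used to scale every later value without a second pass. The function name, the calibration size, and the choice to clamp out-of-range values are all assumptions for illustration.

```python
def stream_normalize(stream, calibration_size=100):
    # Buffer an initial subset, estimate min/max from it, then normalize the
    # rest of the stream on the fly (a single pass), clamping values that
    # fall outside the estimated range.
    buffered = []
    lo = hi = None
    for value in stream:
        if lo is None:
            buffered.append(value)
            if len(buffered) < calibration_size:
                continue
            lo, hi = min(buffered), max(buffered)
            # Emit the calibration subset itself, exactly normalized.
            for v in buffered:
                yield (v - lo) / (hi - lo)
            continue
        yield min(max((value - lo) / (hi - lo), 0.0), 1.0)
```

In a real pipeline the estimates could instead be widened as larger values arrive, but clamping keeps the sketch to a strict single pass.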
- FIG. 2 is a schematic 200 illustrating an embodiment of training the autoencoder 104 to perform automated data ingestion. As shown, at block 201, one or more datasets of training data 107 may be segmented. The training data 107 may include row-based data and/or column-based data. The segments may have a minimum size (e.g., 10,000 rows and/or columns of data). In some embodiments, one or more of the segments may be modified, for example, by dropping one or more columns of data, formatting one or more columns of data, and the like. Doing so may produce varying segments of training data 107, e.g., where a first segment has had a column dropped, a second segment has had a column formatted, a third segment has had one column dropped and one column formatted, and a fourth segment has not been modified.
- At block 202, the ML model 105 may process the segmented training data 107 to format the segmented training data 107 according to one or more formatting rules and/or operations. For example, the ML model 105 may normalize, convert, and/or filter the segmented training data 107. At block 203, one or more output datasets generated by the ML model 105 at block 202 may be stored. The output datasets may include each segment of training data 204 and the corresponding formatted data 205 generated by the ML model 105 at block 202. For example, if 1,000 segments of training data were generated at block 201, the segmented training data 204 may include the 1,000 segments, while the formatted data 205 may include 1,000 formatted datasets generated by the ML model 105 by processing each segment at block 202. In such an example, 1,000 training samples may comprise the segmented training data as input data and the corresponding formatted data 205 generated by the ML model 105.
- At block 206, overlapping datasets may be generated using the training samples of segmented training data 204 and formatted data 205. Continuing with the previous example, the 1,000 training samples may be modified to include overlapping values. At block 207, the autoencoder 104 may be trained using the overlapping datasets generated at block 206. For example, the autoencoder 104 may process each input dataset (e.g., the segmented training data 204) of each training sample, e.g., to convert each of the input datasets of the training samples to a desired output format and/or based on a predefined operation. At block 208, the accuracy of the autoencoder 104 is determined based on the output generated by the autoencoder 104 at block 207. For example, a difference and/or a least squared error may be computed between the output of the autoencoder 104 based on the segmented training data 204 and the corresponding formatted data 205 generated by the ML model 105. The difference and/or least squared error may be used as accuracy values for the autoencoder 104.
- As another example, the statistical model 106 may classify the output generated by the autoencoder 104 at block 207 and compare the generated classification to a classification of the corresponding segmented training data 204. For example, if the output generated by the autoencoder 104 at block 207 for a first overlapping segment of training data 204 matches a classification generated for the formatted data 205 corresponding to the first overlapping segment of training data 204, the statistical model 106 may compute a relatively high accuracy value for the autoencoder 104 for the first training sample.
- The determined accuracy may be used to train the autoencoder 104 via a backpropagation operation. Doing so refines the values of the autoencoder 104, including the latent vector 109, based on the determined accuracy values for the autoencoder 104 and/or a loss of the autoencoder 104. Generally, the accuracy at block 208 may be determined for each training sample. Therefore, continuing with the previous example, the accuracy for each of the 1,000 training samples processed by the autoencoder 104 may be determined at block 208. Each of the 1,000 accuracy values may be provided to the autoencoder 104 to update the weights of the autoencoder 104, e.g., via 1,000 (or fewer) backpropagation operations.
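The overlapping datasets generated at block 206 (and the rows 0-1000 / 900-2000 example given with FIG. 1) can be sketched as a simple chunking helper; the function name and parameter defaults are illustrative assumptions.

```python
def overlapping_segments(rows, size=1000, overlap=100):
    # Produce segments of `size` rows in which consecutive segments share
    # `overlap` rows, so the same values appear in two or more segments.
    step = size - overlap
    return [rows[i:i + size] for i in range(0, max(len(rows) - overlap, 1), step)]
```

With 2,000 rows and the defaults, the first segment covers rows 0-999 and the second rows 900-1899, so rows 900-999 appear in both, mirroring the example above.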
- FIG. 3 illustrates an embodiment of a processing pipeline 300. At block 301, streaming input data is received in the processing pipeline 300. The streaming input data may be any type of data, such as transaction data, stock ticker data, financial data, sensor data, and the like. In some embodiments, the streaming input data includes numeric values in one or more rows and/or columns. However, the streaming input data may have varying types and/or formats which may need to be modified to be compatible with various components of the processing pipeline. Therefore, at block 302, the trained autoencoder 104 may process the streaming input data. For example, the trained autoencoder 104 may format the streaming input data according to a desired output format, normalize the values of the streaming input data, compute a z-score for the streaming input data, standardize values of the streaming input data, recast values of the streaming input data, filter the streaming input data according to one or more filtering criteria, fuzz the values of the streaming input data, and the like. At block 303, one or more components of the processing pipeline process the output generated by the autoencoder 104 at block 302, e.g., the formatted and/or converted streaming input data. Advantageously, the autoencoder 104 may process the streaming data in a single pass, e.g., by providing estimated normalization, recasting, etc., and without having to process the streaming data in two or more passes.
- FIG. 4 illustrates an embodiment of a logic flow 400. The logic flow 400 may be representative of some or all of the operations executed by one or more embodiments described herein. For example, the logic flow 400 may include some or all of the operations to provide automated data ingestion using an autoencoder. Embodiments are not limited in this context.
- As shown, the logic flow 400 begins at block 410, where a target data format is determined for data. For example, the target format may specify a datatype (e.g., integers, floating points, etc.), a data space (e.g., a range of values), etc. More generally, any type of operation may be determined for the data at block 410, e.g., normalization, filtering, score computation, etc. At block 420, the autoencoder 104 is trained to format data according to the target formats and/or operations defined at block 410. Generally, the training of the autoencoder 104 is guided by the ML model 105 and/or the statistical model 106 as described in greater detail herein. At block 430, the accuracy of the autoencoder 104 may be determined to exceed a threshold accuracy level. For example, if the threshold is 90% accuracy, and the accuracy of the autoencoder 104 is 95%, the accuracy of the autoencoder exceeds the threshold. At block 440, the autoencoder 104 is configured to format data in a processing pipeline.
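The comparison at block 430 presupposes a scalar accuracy value. One way the classification-distance comparison described with FIG. 1 could yield such a scalar is sketched below; the embedding of classifications as points in a data space and the 1/(1 + d) mapping are illustrative assumptions, not the disclosed method.

```python
import math

def classification_accuracy(output_class, input_class):
    # Accuracy increases as the distance between the two classifications
    # (represented as points in a data space) decreases: 1.0 when they
    # coincide, approaching 0.0 as they diverge.
    return 1.0 / (1.0 + math.dist(output_class, input_class))

def exceeds_threshold(accuracy, threshold=0.9):
    # The threshold check performed at block 430.
    return accuracy > threshold
```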
FIG. 5 illustrates an embodiment of a logic flow 500. The logic flow 500 may be representative of some or all of the operations executed by one or more embodiments described herein. For example, the logic flow 500 may include some or all of the operations performed to train the autoencoder 104. Embodiments are not limited in this context. - As shown, the logic flow 500 begins at
block 510, where the training data 107, which may comprise one or more datasets, is segmented into overlapping training data subsets. As stated, the training data 107 may include row- and/or column-based numerical values. By generating overlapping subsets, one or more values of the training data 107 may appear in two or more subsets. At block 520, the ML model 105 transforms the training data subsets according to the format defined at block 410. For example, the ML model 105 may be configured to transform the training data from a first format to a second format. More generally, the ML model 105 may perform any operation on the training data as described above. Doing so may generate a respective transformed output dataset for each of the training data subsets. Each training dataset and corresponding transformed output dataset pair may comprise a training sample for the autoencoder. One or more of the training samples may be selected at block 530. - At
block 540, the autoencoder 104 may process the input dataset of the training sample selected at block 530. Generally, the autoencoder 104 may transform the input dataset of the training sample (or perform any other operation) based at least in part on the current weights of the latent vector 109. Doing so may generate a transformed output. At block 550, the accuracy of the autoencoder 104 is determined based at least in part on the transformed output generated by the autoencoder 104. As stated, the ML model 105 and/or the statistical model 106 may be used to determine the accuracy of the autoencoder 104. For example, a difference and/or a least squared error may be computed for the output of the autoencoder 104 based on the transformed output dataset of the training sample (e.g., the output of the ML model 105) and the output generated by the autoencoder 104 at block 540. The difference and/or least squared error may be used as accuracy values for the autoencoder 104. As another example, the statistical model 106 may classify the output generated by the autoencoder 104 at block 540 and compare the generated classification to a classification of the training data of the input sample selected at block 530. The accuracy of the autoencoder 104 may then be determined based on a similarity of the classifications, where more similar classifications result in higher accuracy values for the autoencoder 104. - At
block 560, the accuracy determined at block 550 may be provided to the autoencoder 104. At block 570, the values of the latent vector 109 and any other values of the autoencoder 104 may be refined during a backpropagation operation. Doing so may allow the values of the latent vector 109 to more accurately reflect a mapping required to perform the desired operation on data (e.g., filtering, formatting, recasting, etc.). If the accuracy of the autoencoder 104 determined at block 550 is lower than a threshold accuracy, the logic flow 500 may return to block 530, where another training sample is selected, thereby repeating the training process until the accuracy of the autoencoder 104 exceeds the threshold. Once the accuracy of the autoencoder 104 exceeds a threshold and/or all training samples have been used to train the autoencoder 104, the logic flow 500 may end. -
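The training loop of blocks 510-570 can be sketched end to end. This is a deliberately tiny stand-in, not the claimed network: the "autoencoder" is a single scalar weight `w` playing the role of the latent vector 109, the ML model 105 is a hypothetical doubling recast, and the error-to-accuracy mapping `1/(1 + MSE)` is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Block 510: segment row-based training data 107 into *overlapping* subsets;
# with stride < size, each value appears in two or more subsets.
def overlapping_subsets(rows, size=4, stride=2):
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, stride)]

# Block 520: stand-in for the ML model 105 -- a hypothetical recast that
# doubles each value. Each (subset, transformed subset) pair is one sample.
reference_transform = lambda s: 2.0 * s

rows = rng.normal(size=(40,))
samples = [(s, reference_transform(s)) for s in overlapping_subsets(rows)]

# Blocks 530-570: refine the scalar weight until accuracy exceeds threshold.
w, lr, threshold = 0.0, 0.05, 0.95
for _ in range(200):
    errs = []
    for x, y in samples:                        # block 530: select a sample
        out = w * x                             # block 540: autoencoder output
        w -= lr * np.mean((out - y) * x)        # block 570: gradient (backprop) step
        errs.append(np.mean((w * x - y) ** 2))  # block 550: least-squares error
    accuracy = 1.0 / (1.0 + np.mean(errs))      # assumed error-to-accuracy map
    if accuracy > threshold:                    # block 560 feedback: stop at threshold
        break
```

The loop converges toward `w = 2.0`, i.e., the latent parameter comes to encode the reference transform; a real autoencoder refines a full weight vector the same way, one training sample at a time.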
FIG. 6 illustrates an embodiment of an exemplary computing architecture 600 comprising a computing system 602 that may be suitable for implementing various embodiments as previously described. In various embodiments, the computing architecture 600 may comprise or be implemented as part of an electronic device. In some embodiments, the computing architecture 600 may be representative, for example, of a system that implements one or more components of the system 100. In some embodiments, computing system 602 may be representative, for example, of the computing system 101 of the system 100. The embodiments are not limited in this context. More generally, the computing architecture 600 is configured to implement all logic, applications, systems, methods, apparatuses, and functionality described herein with reference to FIGS. 1-5. - As used in this application, the terms "system" and "component" and "module" are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the
exemplary computing architecture 600. For example, a component can be, but is not limited to being, a process running on a computer processor, a computer processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces. - The
computing system 602 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth. The embodiments, however, are not limited to implementation by the computing system 602. - As shown in
FIG. 6, the computing system 602 comprises a processor 604, a system memory 606 and a system bus 608. The processor 604 can be any of various commercially available computer processors, including without limitation an AMD® Athlon®, Duron® and Opteron® processors; ARM® application, embedded and secure processors; IBM® and Motorola® DragonBall® and PowerPC® processors; IBM and Sony® Cell processors; Intel® Celeron®, Core®, Core (2) Duo®, Itanium®, Pentium®, Xeon®, and XScale® processors; and similar processors. Dual microprocessors, multi-core processors, and other multiprocessor architectures may also be employed as the processor 604. - The
system bus 608 provides an interface for system components including, but not limited to, the system memory 606 to the processor 604. The system bus 608 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. Interface adapters may connect to the system bus 608 via a slot architecture. Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like. - The
system memory 606 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory (e.g., one or more flash arrays), polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information. In the illustrated embodiment shown in FIG. 6, the system memory 606 can include non-volatile memory 610 and/or volatile memory 612. A basic input/output system (BIOS) can be stored in the non-volatile memory 610. - The
computing system 602 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 614, a magnetic floppy disk drive (FDD) 616 to read from or write to a removable magnetic disk 618, and an optical disk drive 620 to read from or write to a removable optical disk 622 (e.g., a CD-ROM or DVD). The HDD 614, FDD 616 and optical disk drive 620 can be connected to the system bus 608 by a HDD interface 624, an FDD interface 626 and an optical drive interface 628, respectively. The HDD interface 624 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. The computing system 602 is generally configured to implement all logic, systems, methods, apparatuses, and functionality described herein with reference to FIGS. 1-5. - The drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For example, a number of program modules can be stored in the drives and
memory units, including an operating system 630, one or more application programs 632, other program modules 634, and program data 636. In one embodiment, the one or more application programs 632, other program modules 634, and program data 636 can include, for example, the various applications and/or components of the system 100, e.g., the autoencoder 104, ML model 105, statistical model 106, training data 107, formatted data 108, and latent vector 109. - A user can enter commands and information into the
computing system 602 through one or more wire/wireless input devices, for example, a keyboard 638 and a pointing device, such as a mouse 640. Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, fingerprint readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors, styluses, and the like. These and other input devices are often connected to the processor 604 through an input device interface 642 that is coupled to the system bus 608, but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth. - A
monitor 644 or other type of display device is also connected to the system bus 608 via an interface, such as a video adaptor 646. The monitor 644 may be internal or external to the computing system 602. In addition to the monitor 644, a computer typically includes other peripheral output devices, such as speakers, printers, and so forth. - The
computing system 602 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 648. The remote computer 648 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computing system 602, although, for purposes of brevity, only a memory/storage device 650 is illustrated. The logical connections depicted include wire/wireless connectivity to a local area network (LAN) 652 and/or larger networks, for example, a wide area network (WAN) 654. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet. - When used in a LAN networking environment, the
computing system 602 is connected to the LAN 652 through a wire and/or wireless communication network interface or adaptor 656. The adaptor 656 can facilitate wire and/or wireless communications to the LAN 652, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 656. - When used in a WAN networking environment, the
computing system 602 can include a modem 658, or is connected to a communications server on the WAN 654, or has other means for establishing communications over the WAN 654, such as by way of the Internet. The modem 658, which can be internal or external and a wire and/or wireless device, connects to the system bus 608 via the input device interface 642. In a networked environment, program modules depicted relative to the computing system 602, or portions thereof, can be stored in the remote memory/storage device 650. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used. - The
computing system 602 is operable to communicate with wired and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.16 over-the-air modulation techniques). This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies, among others. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions). - Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. 
Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
- One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores," may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that make the logic or processor. Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like.
The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
- The foregoing description of example embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Many modifications and variations are possible in light of this disclosure. It is intended that the scope of the present disclosure be limited not by this detailed description, but rather by the claims appended hereto. Future filed applications claiming priority to this application may claim the disclosed subject matter in a different manner, and may generally include any set of one or more limitations as variously disclosed or otherwise demonstrated herein.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/101,517 US20210073649A1 (en) | 2019-08-23 | 2020-11-23 | Automated data ingestion using an autoencoder |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/549,465 US10853728B1 (en) | 2019-08-23 | 2019-08-23 | Automated data ingestion using an autoencoder |
US17/101,517 US20210073649A1 (en) | 2019-08-23 | 2020-11-23 | Automated data ingestion using an autoencoder |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/549,465 Continuation US10853728B1 (en) | 2019-08-23 | 2019-08-23 | Automated data ingestion using an autoencoder |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210073649A1 true US20210073649A1 (en) | 2021-03-11 |
Family
ID=73554888
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/549,465 Active US10853728B1 (en) | 2019-08-23 | 2019-08-23 | Automated data ingestion using an autoencoder |
US17/101,517 Pending US20210073649A1 (en) | 2019-08-23 | 2020-11-23 | Automated data ingestion using an autoencoder |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/549,465 Active US10853728B1 (en) | 2019-08-23 | 2019-08-23 | Automated data ingestion using an autoencoder |
Country Status (1)
Country | Link |
---|---|
US (2) | US10853728B1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11561948B1 (en) | 2021-03-01 | 2023-01-24 | Era Software, Inc. | Database indexing using structure-preserving dimensionality reduction to accelerate database operations |
EP4083858A1 (en) * | 2021-04-29 | 2022-11-02 | Siemens Aktiengesellschaft | Training data set reduction and image classification |
US11734318B1 (en) * | 2021-11-08 | 2023-08-22 | Servicenow, Inc. | Superindexing systems and methods |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180222407A1 (en) * | 2017-02-06 | 2018-08-09 | Korea University Research And Business Foundation | Apparatus, control method thereof and recording media |
US20180262525A1 (en) * | 2017-03-09 | 2018-09-13 | General Electric Company | Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid |
US20190050465A1 (en) * | 2017-08-10 | 2019-02-14 | International Business Machines Corporation | Methods and systems for feature engineering |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015130928A1 (en) * | 2014-02-26 | 2015-09-03 | Nancy Packes, Inc. | Real estate evaluating platform methods, apparatuses, and media |
US10460251B2 (en) * | 2015-06-19 | 2019-10-29 | Preferred Networks Inc. | Cross-domain time series data conversion apparatus, methods, and systems |
US10832168B2 (en) | 2017-01-10 | 2020-11-10 | Crowdstrike, Inc. | Computational modeling and classification of data streams |
EP3599575B1 (en) | 2017-04-27 | 2023-05-24 | Dassault Systèmes | Learning an autoencoder |
US10417556B1 (en) * | 2017-12-07 | 2019-09-17 | HatchB Labs, Inc. | Simulation-based controls optimization using time series data forecast |
US11586915B2 (en) | 2017-12-14 | 2023-02-21 | D-Wave Systems Inc. | Systems and methods for collaborative filtering with variational autoencoders |
Non-Patent Citations (2)
Title |
---|
Lee, Doyup. "Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis" 9 October 2017 [ONLINE] Downloaded 10/15/2024 https://arxiv.org/pdf/1708.02635 (Year: 2017) * |
Sun, Haonan et al. "Stacked Denoising Autoencoder Based Stock Market Trend Prediction via K-Nearest Neighbor Data Selection" 2017 [ONLINE] Downloaded 5/18/2023 https://link.springer.com/chapter/10.1007/978-3-319-70096-0_90 (Year: 2017) *
Also Published As
Publication number | Publication date |
---|---|
US10853728B1 (en) | 2020-12-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: CAPITAL ONE SERVICES, LLC, VIRGINIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WALTERS, AUSTIN GRANT; GOODSITT, JEREMY EDWARD; REEL/FRAME: 054447/0080. Effective date: 20190822 |
| STPP | Information on status: patent application and granting procedure in general | APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | PRE-INTERVIEW COMMUNICATION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STCB | Information on status: application discontinuation | ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| STCC | Information on status: application revival | WITHDRAWN ABANDONMENT, AWAITING EXAMINER ACTION |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |