CN117272035A

CN117272035A - Generating test data using principal component analysis

Info

Publication number: CN117272035A
Application number: CN202310746036.3A
Authority: CN
Inventors: J·E·帕特森; 谈侃
Original assignee: Tektronix Inc
Current assignee: Tektronix Inc
Priority date: 2022-06-21
Filing date: 2023-06-21
Publication date: 2023-12-22

Abstract

Test data is generated using principal component analysis. A system includes an input for accepting a dataset comprising at least two sets of data, and one or more processors configured to: derive at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other; map the dataset to a principal component domain derived from the at least two principal components; generate additional data in the principal component domain; and remap the additional data in the principal component domain back to the dataset domain as a newly generated dataset. Methods of operation, and storage media storing instructions that perform the above-described operations, are also described.

Description

Generating test data using principal component analysis

Cross Reference to Related Applications

The present disclosure claims the benefit of U.S. provisional application No. 63/353,956, entitled "PRINCIPAL COMPONENT ANALYSIS FOR SIGNAL GENERATION," filed on June 21, 2022, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates to test and measurement instruments, and more particularly to using principal component analysis (PCA) for signal generation.

Background

Test data is generated for a variety of purposes. It may be generated to train a Machine Learning (ML) workflow, which requires a large or very large amount of training data. Data may also be generated to model specific behavior of a device. Furthermore, the data may be used as a preliminary step in generating specific test signals, such as those generated by an Arbitrary Waveform Generator (AWG). In all these cases, modeling high-dimensional data is a complex task, because the generated data must be accurate yet still include variability for testing. Common approaches either treat the observed measurements that form the basis of the generated data as independent, or use complex mathematics in the form of interpolation, line fitting, perturbation, etc., to address the relationships between specific measurements. Neither approach is ideal: they either produce inaccurate results or produce accurate results only after extensive processing.

Embodiments according to the present disclosure address these and other limitations found in conventional instruments.

Drawings

Fig. 1 is a diagram illustrating how a system using principal component analysis applies such analysis to a collection of data according to an embodiment of the present disclosure.

FIG. 2 is a diagram of a collection of data on which principal component analysis may be performed, according to an embodiment of the present disclosure.

Fig. 3A and 3B are diagrams of individual data of a dataset, and fig. 3C and 3D illustrate a conventional method of synthesizing data based on such data.

Fig. 4A and 4B are diagrams of first and second principal components of the dataset of fig. 3A and 3B, and fig. 4C and 4D illustrate a method of synthesizing data using a data generation method in a principal component domain according to an embodiment of the present disclosure.

Figs. 5A, 5B, and 5C illustrate examples of a conventionally generated dataset compared against datasets synthesized in the principal component domain using a data generation method according to an embodiment of the present disclosure.

Fig. 6A and 6C illustrate examples of orthogonal measurement histograms, and fig. 6B and 6D show principal component histograms for a data generation method in a principal component domain according to an embodiment of the present disclosure.

Fig. 7 is a diagram illustrating a data generation method in a principal component domain with a small number of device samples according to an embodiment of the present disclosure.

Fig. 8 is an example flowchart illustrating operations of data and signal generation using a principal component domain according to an embodiment of the present disclosure.

Fig. 9 is a functional block diagram of a data generation system including principal component analysis according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present invention include data generation or signal generation devices that generate an output based on Principal Component Analysis (PCA) of an original dataset. PCA typically operates on large datasets, such as those generated from measurement data from a Device Under Test (DUT) or from other sources. The dataset for the PCA may also be retrieved from a database storing previously collected data. As a first step, performing PCA on these datasets provides insight as to which variables contain the most information about the data, such as the measurements included in the data. In general, PCA is a matrix decomposition of data that allows a user to analyze and extract insight from measurements in a Principal Component (PC) domain, which may be a different domain than the measurement domain that produced the measurement data, or a different domain than the original domain of the original dataset. In some aspects, the ability of PCA to remap data from the original domain to the PC domain resembles a Fourier transform, which re-characterizes data collected, for example, in the time domain as measurements in the frequency domain. With PCA tools, a user may be able to discern relationships among particular measurements or related data that are not identifiable without PCA. PCA is particularly powerful in analyzing multiple variables and determining which variables are related to each other. Then, as a second step after PCA, the data is modified in some way within the PC domain itself to create modified data. Typically, but not always, the modified data is a larger dataset than the original data. Finally, the modified data is mapped from the PC domain back to the original domain, where it becomes a new set of data for testing, training, or various other uses. Details and variations of all of these steps and processes are described below.

As mentioned above, PCA operates on a dataset. Fig. 1 is a chart 10 of data illustrating one of the basic principles of PCA. The data on chart 10 is assumed to have an X component and a Y component, and is plotted on chart 10 according to its X-Y components. PCA maps data from the measurement domain to the principal component domain by a coordinate transformation. To find the principal component axes, singular value decomposition is performed on the measurement data. The principal component axis (axis 20 in this case) is always the axis along which the data, when projected onto that axis, has the greatest variance. Conceptually, singular value decomposition can be pictured as follows: an axis is generated at an arbitrary orientation relative to the raw data, and the dataset is projected onto that axis. The variance of the projected data for the current axis is recorded, and the process is then repeated by projecting the original data onto a new arbitrary axis. This is repeated for all possible axis orientations. When all possible axes have been generated, the variance recorded for each axis is analyzed to determine which axis yields the greatest variance. That axis is the principal component axis. In other words, the principal component axis points in the direction in which the measured data has the greatest variance. In the dataset of Fig. 1, the principal axis is labeled as axis 20 and is oriented in the direction in which the data varies the most. Other axes may also be generated, one axis for each variable or measurement in the data. In PCA, the principal axes are all orthogonal to each other, so the minor component axis 30 is orthogonal to the major component axis 20, as illustrated in Fig. 1.

PCA is particularly useful when the measurements are linearly related, for example, for feedforward equalizer taps and for data carried on signals having multiple levels. It should be noted that when the number of measurements is three or more, visualization of the measurement data, including visualization using PCA, becomes increasingly difficult, although it remains a useful tool for analyzing a more modest number of measurements. One of the reasons that PCA is useful in analyzing measurement data is that PCA produces a hierarchical ordering of the principal components. Then, when modifying the data in the PC domain to generate additional datasets for training or for signal generation purposes, the user may systematically select which principal components to use.
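The axis search described above can be sketched numerically. The following is a minimal illustration, not the disclosed implementation: it assumes two-dimensional data, scans a finite grid of axis orientations (rather than "all possible" orientations), and compares the result with the first right singular vector returned by NumPy's singular value decomposition. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def scan_principal_axis(data, n_angles=3600):
    """Brute-force search over axis orientations for the largest projected variance."""
    centered = data - data.mean(axis=0)
    # Orientations in [0, pi): an axis and its opposite give the same variance.
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    axes = np.column_stack([np.cos(angles), np.sin(angles)])  # one unit vector per row
    variances = (centered @ axes.T).var(axis=0)               # variance of each projection
    return axes[np.argmax(variances)]

def svd_principal_axes(data):
    """Principal axes from the SVD of the mean-centered data (rows of vt)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

# Hypothetical linearly related X-Y data, standing in for chart 10.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

scanned = scan_principal_axis(data)
pc_axes = svd_principal_axes(data)
print("scanned axis:", scanned)
print("SVD PC1 axis:", pc_axes[0], "PC2 axis:", pc_axes[1])
```

Up to an overall sign, the scanned axis and the first SVD axis should agree to within the angular resolution of the scan, and the two SVD axes are orthogonal.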

Although many of the examples used herein refer to measurement data as the raw data, any type of data may be used as raw data according to embodiments of the disclosure, and the raw data is not limited to measurement data. However, it should be noted that in order to use PCA on the raw data, the raw data needs to include at least two sets of data.

Fig. 2 illustrates measurement data collected from a system using non-return-to-zero encoding, with level measurements in which the levels are linearly related; this data will be used as an example for PCA. The level 1 (lvl1) value is plotted on the X-axis and the level 0 (lvl0) value is plotted on the Y-axis. The data consists of 1000 pairs of linearly correlated values. In other words, according to Equation 1, the recorded measurements shift from their averages simultaneously and in opposite directions.

Equation (1):

$$\mathrm{lvl1} = x_1 + x_n$$

$$\mathrm{lvl0} = x_0 + x_n$$

where $x_1$ is uniformly distributed on the interval $[0.8, 1]$, $x_0 = -2x_1 + 1$, and $x_n$ is zero-mean Gaussian noise with a standard deviation of 0.1.
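As a rough illustration of Equation 1, the sketch below generates 1000 level pairs with NumPy. Equation 1 writes the same symbol x_n in both lines; the sketch assumes by default that the noise is drawn separately for each level, which is an interpretation and not something the text states, and exposes a hypothetical `shared_noise` flag for the other reading. The function name, flag, and seed are illustrative only.

```python
import numpy as np

def generate_levels(n_pairs=1000, noise_std=0.1, shared_noise=False, seed=0):
    """Generate (lvl1, lvl0) pairs following Equation 1."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.8, 1.0, size=n_pairs)  # x_1 uniform on [0.8, 1]
    x0 = -2.0 * x1 + 1.0                      # x_0 = -2 x_1 + 1
    if shared_noise:
        # Literal reading: one noise value x_n added to both levels.
        noise = np.repeat(rng.normal(0.0, noise_std, (n_pairs, 1)), 2, axis=1)
    else:
        # Assumption: an independent zero-mean Gaussian draw per level.
        noise = rng.normal(0.0, noise_std, (n_pairs, 2))
    return np.column_stack([x1, x0]) + noise  # columns: lvl1, lvl0

levels = generate_levels()
print("mean lvl1, lvl0:", levels.mean(axis=0))
print("correlation:", np.corrcoef(levels.T)[0, 1])
```

With independent noise the two levels come out negatively correlated, which matches the description of the measurements shifting from their averages in opposite directions.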

Conventional data analysis in the measurement domain of the measurement data of Fig. 2 is illustrated in Figs. 3A and 3B, where the graph in Fig. 3A shows the binned data of the measurement lvl1, and the graph in Fig. 3B shows the binned data of the measurement lvl0. Figs. 3C and 3D illustrate that a larger dataset may be generated from the measurement data using conventional methods, such as by drawing from the distribution. In particular, Fig. 3C illustrates data conventionally synthesized from the data of Fig. 3A, and Fig. 3D illustrates data conventionally synthesized from the data of Fig. 3B. In general, conventional methods of generating this synthetic data for lvl1 and lvl0 include drawing from the distribution, interpolation, line fitting, perturbation, etc., as is known in the art. Generating data in this conventional manner assumes that each of the measurements is independent of the others, since each of the synthesized datasets in Figs. 3C and 3D is synthesized using only the variation of a single variable, lvl1 or lvl0.

However, using PCA on recorded data may reveal linear relationships in the data that are not identifiable using conventional tools. These relationships can then be exploited to generate large synthetic datasets that reflect the relationships in the original data more accurately than conventionally synthesized data.

First, to perform PCA, a principal component is extracted from the raw measurement data using singular value decomposition to determine a principal component axis, as described above. Then, after deriving the principal component, the measurements initially collected in the measurement domain are projected into the Principal Component (PC) domain, where each PC is a linear combination of levels.

Equation (2)

For example, using Equation 2, the levels [1, -1] V are mapped to [-0.223, 0.0002].

The principal component 1 axis (PC1) and the principal component 2 axis (PC2) are illustrated in Fig. 2; they are determined by performing PCA on the measurement data in Fig. 2 according to Equation 2. Note that the PC2 axis 30 in Fig. 2 is orthogonal to the PC1 axis 20.
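The projection into the PC domain can be sketched as follows. This is an illustration under stated assumptions, not the disclosed Equation 2: it assumes the mapping has the common PCA form of subtracting the per-measurement mean and multiplying by a matrix V whose columns are the principal axes, which is consistent with the later statement that remapping uses the inverse matrix and adds back the average. The names `fit_pca` and `to_pc_domain`, and the stand-in data, are hypothetical.

```python
import numpy as np

def fit_pca(data):
    """Return the per-column mean and V, whose columns are the principal axes."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt.T

def to_pc_domain(data, mean, v):
    """Assumed form of the mapping: PC scores = (data - mean) @ V."""
    return (data - mean) @ v

# Stand-in level data in the spirit of Equation 1 (independent noise assumed).
rng = np.random.default_rng(1)
x1 = rng.uniform(0.8, 1.0, 1000)
levels = np.column_stack([x1, -2.0 * x1 + 1.0]) + rng.normal(0.0, 0.1, (1000, 2))

mean, v = fit_pca(levels)
pcs = to_pc_domain(levels, mean, v)

print("PC1 axis:", v[:, 0], "PC2 axis:", v[:, 1])
print("variance along PC1, PC2:", pcs.var(axis=0))
```

In this form each PC score is a linear combination of the mean-removed levels, the two axes are orthonormal, and the scores along PC1 and PC2 are uncorrelated.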

After deriving the principal components and thus the PC domain, and after the raw measurement data has also been projected into the PC domain, data analysis may continue in ways that are not possible with the raw data alone. For example, histograms may be generated from the PC domain data. Fig. 4A shows a histogram of the raw measurement data after it has been projected onto the first principal component PC1, and Fig. 4B shows a histogram of the measurement data after it has been projected onto the second principal component PC2. Recall from above that PC2 is orthogonal to PC1.

Unlike the plots of Figs. 3A and 3B, which provide little information about the raw measurement data, the histograms illustrated in Figs. 4A and 4B provide useful information about the measurement data, such as the patterns revealed when the data is binned. The binned data of PC1 plotted in Fig. 4A shows that most of the center bins hold approximately the same number of measurements, meaning that the data has a relatively uniform distribution along PC1. The binned data of PC2, plotted in Fig. 4B, is clearly different. That data looks much like a Gaussian distribution, with many more values falling in the center bins than in the end bins. Importantly, both the relatively uniform data seen in Fig. 4A and the Gaussian-like data seen in Fig. 4B are recognized as "standard" types of distribution. When the data in the PC domain follows a standard type of distribution, a new dataset can be generated directly in the principal component domain by generating new data within the range of that distribution. For example, the PC1 synthesized data illustrated in Fig. 4C contains far more data than the raw data of Fig. 4A while keeping a relatively uniform distribution across the bins. Similarly, the PC2 synthesized data illustrated in Fig. 4D shows a histogram whose shape approximates that of the original PC2 data illustrated in Fig. 4B, but which contains additional generated data. The method of generating a new dataset from raw data in each PC domain having a standard distribution includes drawing directly from the distribution in the PC domain. Furthermore, it is possible to generate data from raw data in individual PC domains that do not have a standard distribution. For non-standard distributions, methods of generating additional datasets include interpolation, line fitting, perturbation, etc., in the respective PC domains.
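Drawing new data directly from standard distributions in the PC domain might look like the sketch below. It assumes, following the description of Figs. 4A and 4B, that PC1 is treated as uniform over its observed range and PC2 as Gaussian with the observed mean and standard deviation; how the distribution parameters are actually chosen is not specified in the text, so these choices are illustrative. The PC scores used here are hypothetical stand-ins rather than real measurement data.

```python
import numpy as np

def synthesize_in_pc_domain(pcs, n_new, rng):
    """Draw new PC-domain samples: PC1 as uniform, PC2 as Gaussian (assumed fits)."""
    pc1 = rng.uniform(pcs[:, 0].min(), pcs[:, 0].max(), n_new)  # uniform over observed range
    pc2 = rng.normal(pcs[:, 1].mean(), pcs[:, 1].std(), n_new)  # Gaussian with observed moments
    return np.column_stack([pc1, pc2])

rng = np.random.default_rng(2)
# Hypothetical PC scores: roughly uniform along PC1, roughly Gaussian along PC2.
pcs = np.column_stack([rng.uniform(-0.2, 0.2, 1000), rng.normal(0.0, 0.05, 1000)])

new_pcs = synthesize_in_pc_domain(pcs, 10_000, rng)
counts_pc1, _ = np.histogram(new_pcs[:, 0], bins=10)
counts_pc2, _ = np.histogram(new_pcs[:, 1], bins=10)
print("PC1 bin counts:", counts_pc1)
print("PC2 bin counts:", counts_pc2)
```

The PC1 histogram should come out roughly flat and the PC2 histogram peaked in the center, with ten times as many samples as the stand-in input.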

After the new data sets have been generated in the PC domain, they can be mapped back into the original domain of the source of the data.

Figs. 5A, 5B, and 5C illustrate the benefits of using PCA to synthesize a dataset and then remapping the synthesized data back into the original domain. Fig. 5A shows synthesized data drawn from the measurement distributions, as described above with reference to Figs. 3C and 3D. Note how the data does not follow the original dataset illustrated in Fig. 2, but varies greatly and appears almost random in nature. This occurs because the measurements of lvl1 and lvl0 are assumed to be independent and are synthesized independently, when in reality they are correlated with each other, as shown by PCA. Fig. 5B shows synthesized data drawn in the PC domain, in particular from both the PC1 and PC2 distributions of Figs. 4C and 4D. Finally, Fig. 5C shows data also synthesized in the PC domain but drawn from only PC1, which is the principal component of the original dataset, as described above; PC2 is set to 0. The datasets illustrated in Figs. 5B and 5C, i.e., the datasets drawn in the PC domain and remapped back to the original domain, follow the original dataset illustrated in Fig. 2 much more closely than the dataset of Fig. 5A, which was drawn from the measurement distributions. These new datasets generated using embodiments of the present disclosure may be used for a variety of purposes, such as training a machine learning system, or modeling multiple devices from observations of only a subset of the devices. Further uses follow from how closely data generated in accordance with embodiments of the present disclosure tracks the original data. In some cases, modifying data from a first device using the above-described method may have the purpose of generating a digital twin for a second device, even if no measurements are made from the second device.
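The comparison among the three synthesis approaches can be illustrated numerically. The sketch below is an assumption-laden stand-in, not the data behind Figs. 5A to 5C: it regenerates level data in the spirit of Equation 1 (independent noise assumed), draws by resampling the empirical distributions with replacement (one possible way of "drawing from the distribution"), and uses the correlation between lvl1 and lvl0 as a simple measure of how well each synthesized set follows the original.

```python
import numpy as np

rng = np.random.default_rng(5)
n_orig, n_new = 1000, 5000

x1 = rng.uniform(0.8, 1.0, n_orig)
levels = np.column_stack([x1, -2.0 * x1 + 1.0]) + rng.normal(0.0, 0.1, (n_orig, 2))

def resample(column, n, rng):
    """Draw n values from the empirical distribution of one column."""
    return rng.choice(column, size=n, replace=True)

def corr(xy):
    return float(np.corrcoef(xy.T)[0, 1])

# Conventional approach: each measurement is drawn independently.
conventional = np.column_stack([resample(levels[:, k], n_new, rng) for k in range(2)])

# PC-domain approach: map, draw each PC independently, remap.
mean = levels.mean(axis=0)
_, _, vt = np.linalg.svd(levels - mean, full_matrices=False)
pcs = (levels - mean) @ vt.T
pc_draws = np.column_stack([resample(pcs[:, k], n_new, rng) for k in range(2)])
from_both_pcs = pc_draws @ vt + mean

# PC1 only, with PC2 set to 0.
pc1_only = pc_draws.copy()
pc1_only[:, 1] = 0.0
from_pc1_only = pc1_only @ vt + mean

print("original      :", corr(levels))
print("conventional  :", corr(conventional))
print("PC1 and PC2   :", corr(from_both_pcs))
print("PC1 only      :", corr(from_pc1_only))
```

Under these assumptions the conventionally drawn set loses the correlation between the levels, the set drawn from both PCs keeps a correlation close to the original, and the PC1-only set collapses onto a single line.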

Another use for generating a new dataset that is very similar to the original dataset is signal generation, for example using an Arbitrary Waveform Generator (AWG). Just as it is useful to generate a dataset that is close to but different from the original dataset, it is useful to generate a signal that is close to but different from the original signal. For example, a signal from a first device may be measured and converted into an original dataset describing the signal. Using the PCA technique described above, embodiments in accordance with the present disclosure may then be used to generate a different dataset that is very similar to, but different from, the original dataset. The synthesized dataset is then converted back to a signal, so that a device such as an AWG can generate a plurality of different signals that differ from, but are based on, the original signal. Thus, such an AWG may be used to generate a plurality of different signals based on the original signal from the device, for margin testing or for testing different parameters of the device.

Figs. 6A-6D illustrate another example of measurements and how they relate to each other in the PC domain. Figs. 6A and 6C show histograms in the measurement domain of Random Jitter (RJ) and Sinusoidal Jitter (SJ), which are measurements orthogonal to each other. When PCA is computed on RJ and SJ, PC1 equals SJ and PC2 equals RJ. In this example SJ is assigned to PC1 because it has more variation. Thus, when the measurements are orthogonal, drawing from the PC distributions to generate a new dataset is equivalent to drawing from the measurement distributions to form a new dataset.
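The orthogonal case can be checked with a small sketch. The jitter values below are hypothetical stand-ins in arbitrary units, not the data of Figs. 6A to 6D; the only property assumed is that RJ and SJ are generated independently and that SJ has the larger variance.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
rj = rng.normal(0.0, 0.5, n)     # hypothetical random jitter, small spread
sj = rng.uniform(0.0, 10.0, n)   # hypothetical sinusoidal jitter, large spread
jitter = np.column_stack([rj, sj])  # columns: RJ, SJ

mean = jitter.mean(axis=0)
_, _, vt = np.linalg.svd(jitter - mean, full_matrices=False)
pc1_axis, pc2_axis = vt[0], vt[1]

print("PC1 axis (RJ, SJ components):", pc1_axis)
print("PC2 axis (RJ, SJ components):", pc2_axis)
```

Because the two measurements are uncorrelated, PC1 lines up (to within sampling error and an arbitrary sign) with the SJ axis and PC2 with the RJ axis, so the PC histograms are essentially the measurement histograms.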

Fig. 7 illustrates yet another example of generating a dataset using operations in the PC domain. The figure shows the first two principal components PC1 and PC2 of four devices. In this example, there are only nine samples per device, which is a limited number of samples on which to base statistical analysis for generating a dataset. With limited samples, the distribution is typically not "standard" because it has an arbitrary shape. As an example, if the goal is to synthesize a dataset of approximately 200 devices from the four measured devices, then 50 perturbations may be generated for each device to obtain a total of 200 devices according to embodiments of the present disclosure. As in the previous example, the perturbation is performed in the principal component domain.
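One possible way to perturb a small set of devices in the PC domain is sketched below. The text does not specify the perturbation scheme, so everything here is an assumption: each synthetic device is a copy of a measured device shifted in the PC domain by a Gaussian offset scaled to that device's own spread, with three hypothetical measurements per sample. The `perturb_device` helper and the `scale` value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_devices, n_samples, n_meas = 4, 9, 3

# Hypothetical stand-in measurements: each device has its own offset plus scatter.
device_offsets = rng.normal(0.0, 1.0, (n_devices, 1, n_meas))
measured = device_offsets + rng.normal(0.0, 0.2, (n_devices, n_samples, n_meas))

# Map all samples to the PC domain.
flat = measured.reshape(-1, n_meas)
mean = flat.mean(axis=0)
_, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
pcs = ((flat - mean) @ vt.T).reshape(n_devices, n_samples, n_meas)

def perturb_device(device_pcs, n_copies, scale, rng):
    """Assumed scheme: shift a device's PC scores by random per-PC offsets."""
    spread = device_pcs.std(axis=0)  # per-PC spread of this device
    shifts = rng.normal(0.0, scale * spread, (n_copies, 1, device_pcs.shape[1]))
    return device_pcs[None, :, :] + shifts

# 50 perturbed copies of each of the 4 devices gives 200 synthetic devices.
synthetic_pcs = np.concatenate(
    [perturb_device(pcs[d], 50, 0.5, rng) for d in range(n_devices)]
)
synthetic = synthetic_pcs @ vt + mean  # remap to the measurement domain
print(synthetic.shape)
```

Other perturbation schemes (for example, perturbing individual samples rather than whole devices) would fit the same structure; only the `perturb_device` helper would change.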

FIG. 8 is an example flowchart illustrating operations for generating test data using principal component analysis according to an embodiment of the present disclosure. The process 800 begins at operation 802 with obtaining a set of raw data comprising at least two sets of data or measurements. The raw data may be test data, measurement data, or any other type of data. The data may be received from another device, such as a DUT, or may be retrieved from a database 801 of previously stored data. Once the data is collected, PCA is performed on the data in operation 804, converting the raw data from its original domain to a principal component domain, as described above. The number of principal components that can be generated is limited by the number of independent sets or measurements in the raw data. For example, if the raw data includes three measurements, up to three principal components may be generated using the above-described process.

After the original data is mapped to the PC domain in operation 804, the data is analyzed in the PC domain in operation 806 to determine whether the data in a particular PC domain follows a standard distribution. As noted above, standard distributions include a uniform distribution or a Gaussian distribution, or a distribution approximating one of these. If the data in the particular domain follows a standard distribution, then new data is generated in the PC domain in operation 808. Examples are provided above with reference to Figs. 4A-4D. If instead the data in the particular domain does not follow a standard distribution, the data is augmented in the PC domain in operation 807. Such augmentation may include perturbing variables, or modifying data in the PC domain using interpolation or line fitting. An example of perturbation in the PC domain is provided above with reference to Fig. 7. Once the data has been augmented in operation 807, new data may be generated in the PC domain, as described above.

Next, regardless of how the new data is generated in the PC domain, i.e., using operation 807 or 808, the newly generated data is then mapped back to its original domain in operation 810. This operation is performed by applying the inverse of the matrix used to map the data into the PC domain and then adding back the average value used in Equation 2.
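The remapping step can be sketched as a round trip. The sketch assumes the forward mapping subtracts the per-measurement mean and multiplies by a matrix V of principal axes obtained from singular value decomposition; the data is a hypothetical stand-in. Because V is orthonormal in that case, its inverse is simply its transpose.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical correlated data with three measurements per sample.
data = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))

mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
v = vt.T                                   # columns are the principal axes

pcs = (data - mean) @ v                    # forward: original domain -> PC domain
recovered = pcs @ np.linalg.inv(v) + mean  # inverse matrix, then add back the mean

print("max round-trip error:", np.abs(recovered - data).max())
```

The same two lines apply unchanged to newly generated PC-domain data: multiply by the inverse (or transpose) of V and add the mean of the original dataset.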

In some embodiments, the process 800 stops with the newly generated data, which may be used for the variety of purposes described herein. In other embodiments, the newly generated data is used to generate a signal. In these embodiments, a new signal is generated in operation 812. Signals are typically generated in operation 812 when measurements of signals were used to produce the raw dataset collected in operation 802. Finally, the signals generated in operation 812, which may be referred to as synthesized waveforms, are validated once they have been synthesized using embodiments of the present disclosure. The validation may include ensuring that the synthesized waveform meets certain requirements, such as maximum voltage, minimum or maximum timing, etc. Although not illustrated in Fig. 8, the synthesized waveform, or the data used to generate the synthesized waveform in operation 812, may be stored in the database 801 for reuse.
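A validation step of this kind could be as simple as the sketch below. The specific checks, limits, and the `validate_waveform` name are hypothetical; the text only names maximum voltage and timing limits as examples of requirements.

```python
import numpy as np

def validate_waveform(samples, sample_rate_hz, max_abs_voltage, max_duration_s):
    """Check a synthesized waveform against illustrative voltage and duration limits."""
    samples = np.asarray(samples, dtype=float)
    checks = {
        "voltage": bool(np.max(np.abs(samples)) <= max_abs_voltage),
        "duration": bool(samples.size / sample_rate_hz <= max_duration_s),
    }
    return all(checks.values()), checks

# Hypothetical synthesized waveform: 10 kHz sine sampled at 1 MHz for 1 ms.
t = np.arange(1000) / 1e6
good = 0.4 * np.sin(2 * np.pi * 1e4 * t)

ok, detail = validate_waveform(good, 1e6, max_abs_voltage=0.5, max_duration_s=2e-3)
bad_ok, bad_detail = validate_waveform(3.0 * good, 1e6, max_abs_voltage=0.5, max_duration_s=2e-3)
print(ok, detail)
print(bad_ok, bad_detail)
```

Returning the individual check results alongside the overall verdict makes it easy to see which requirement a rejected waveform violated.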

Thus, using the techniques described above, any desired amount of test data may be generated using PCA from the original dataset. The generated dataset accurately reflects the original dataset, which means that the generated dataset retains the dependencies in the data of the original dataset.

Embodiments of the present disclosure operate on specific hardware and/or software to achieve the PCA operation described above. FIG. 9 is a block diagram of an example system 900 for generating a data set from an original data set. The system 900 may be a test and measurement instrument such as an oscilloscope or a spectrum analyzer. The system may instead be an arbitrary waveform generator. The system 900 may instead be implemented using cloud processing. Depending on implementation details, system 900 may take many forms. The system 900 may include one or more ports 902 for receiving a data set. In some embodiments, the data set is imported directly into the system 900 through an input port 902. In other embodiments, the data set is in the form of an input signal, in which case the port 902 may include a receiver and/or transceiver.

The port 902 is coupled to one or more processors 916 to process data sets and/or signals received at the port 902. Although only one processor 916 is shown in fig. 9 for ease of illustration, multiple processors 916 of different types may be used in combination in the instrument 900 instead of a single processor 916, as will be appreciated by those skilled in the art.

The port 902 may be connected to a measurement unit 908 in the test instrument 900. Measurement unit 908 may include any component capable of measuring aspects (e.g., voltage, amperage, amplitude, power, energy, etc.) of a signal received via port 902. The test and measurement instrument 900 may include additional hardware and/or processors, such as conditioning circuitry, analog-to-digital converters, and/or other circuitry, to convert the received signals into waveforms for further analysis. The measurement unit 908 generates a data set comprising two or more sets of data for use by the one or more processors 916 and/or for use by a principal component processor 930 described below.

In some embodiments, the data set is neither retrieved through the input port 902 nor measured from a signal received through the input port 902, but is instead retrieved from the data set storage 920, which may be within the system 900 or may be an external database.

The one or more processors 916 may be configured to execute instructions from the memory 910 and may perform any methods and/or associated steps indicated by such instructions, such as displaying and modifying input signals received by the instrument. Memory 910 may be implemented as processor cache, Random Access Memory (RAM), Read Only Memory (ROM), solid state memory, hard disk drive(s), or any other memory type. Memory 910 acts as a medium for storing data such as acquired sample waveforms, computer program products, and other instructions.

User input 914 is coupled to processor 916. User input 914 may include a keyboard, mouse, touch screen, and/or any other control that may be used by a user to set up and control the instrument 900. User input 914 may include a graphical user interface or a text/character interface that operates in conjunction with display 912. User input 914 may receive remote commands, or commands in programmatic form, either from a program on the instrument 900 itself or from a remote device. Display 912 may be a digital screen, a cathode ray tube based display, or any other monitor to display waveforms, measurements, and other data to a user. Although the components of the test instrument 900 are depicted as being integrated within the test and measurement instrument 900, one of ordinary skill in the art will appreciate that any of these components may be external to the test instrument 900 and may be coupled to the test instrument 900 in any conventional manner (e.g., wired and/or wireless communication media and/or mechanisms). For example, in some embodiments, the display 912 may be remote from the test and measurement instrument 900, or the instrument may be configured to send output to a remote device in addition to displaying the output on the instrument 900. In further embodiments, the output from the measurement instrument 900 may be sent to or stored in a remote device (such as a cloud device) that is accessible from other machines coupled to the cloud device.

The instrument 900 may include a principal component processor 930, which may be a separate processor from the one or more processors 916 described above, or the functionality of the principal component processor 930 may be integrated into the one or more processors 916. In addition, the principal component processor 930 may include separate memory, may use the memory 910 described above, or may use any other memory accessible by the instrument 900. The principal component processor 930 may include a dedicated processor or operations to perform the functions described above. For example, the principal component processor 930 may include a principal component extractor 932 for performing principal component analysis on a dataset, which may include measurement data. The principal component extractor 932 may perform a singular value decomposition process on the raw dataset. The principal domain mapper 934 then maps the raw dataset from the dataset domain to the principal component domain derived by the principal component extractor 932. The dataset domain means the domain in which the data is initially located. For example, the domain may be the measurement domain of measurement data. As described above, once the dataset has been mapped to the principal component domain, the data generator 936 generates further data in the principal component domain. Then, after the new data has been generated, the original domain remapper 938 remaps the data from the principal component domain (including the new data generated by the data generator 936) back to the original domain of the original dataset. Thus, the principal component processor 930 generates a synthesized dataset that closely follows the original dataset, thereby preserving the relationships in the data.

Any or all of the components of the principal component processor 930, including the principal component extractor 932, the principal domain mapper 934, the data generator 936, and the original domain remapper 938, may be implemented in one or more separate processors, and the separate functions described herein may be implemented as specific preprogrammed operations for dedicated or general-purpose processors. Further, as described above, any or all of the components or functions of the principal component processor 930 may be integrated into one or more processors 916 operating the system 900.

Aspects of the disclosure may operate on specially created hardware, on firmware, on digital signal processors, or on specially programmed general-purpose computers, including processors operating according to programmed instructions. The term controller or processor as used herein is intended to include microprocessors, microcomputers, Application Specific Integrated Circuits (ASICs), and special purpose hardware controllers. One or more aspects of the present disclosure may be implemented in computer-usable data and computer-executable instructions (such as in one or more program modules) executed by one or more computers (including a monitoring module) or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. Computer-executable instructions may be stored on non-transitory computer-readable media such as hard disks, optical disks, removable storage media, solid state memory, Random Access Memory (RAM), and the like. As will be appreciated by those skilled in the art, the functionality of the program modules may be combined or distributed as desired in various aspects. Furthermore, the functions may be implemented in whole or in part in firmware or hardware equivalents such as integrated circuits, FPGAs, and the like. Particular data structures may be used to more effectively implement one or more aspects of the present disclosure, and such data structures are contemplated to be within the scope of the computer-executable instructions and computer-usable data described herein.

In some cases, the disclosed aspects may be implemented in hardware, firmware, software, or any combination thereof. The disclosed aspects may also be implemented as instructions carried by or stored on one or more non-transitory computer-readable media, which may be read and executed by one or more processors. Such instructions may be referred to as a computer program product. Computer-readable media as discussed herein means any medium that can be accessed by a computing device. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.

Computer storage media means any medium that can be used to store computer readable information. By way of example, and not limitation, computer storage media may include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), Digital Video Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, and any other volatile or non-volatile, removable or non-removable media implemented in any technology. Computer storage media excludes signals themselves and transitory forms of signal transmission.

Communication media means any medium that can be used for communication of computer readable information. By way of example, and not limitation, communication media may include coaxial cables, fiber-optic cables, air, or any other medium suitable for communication of electrical, optical, Radio Frequency (RF), infrared, acoustic, or other types of signals.

Examples

Illustrative examples of the disclosed technology are provided below. Embodiments of the technology may include one or more of the examples described below, and any combination thereof.

Example 1 is a system, comprising: an input for accepting a dataset comprising at least two sets of data; and one or more processors configured to: derive at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other; map the dataset to a principal component domain derived from the at least two principal components; generate additional data in the principal component domain; and remap the additional data in the principal component domain back to the dataset domain as a newly generated dataset.
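
As an illustration only, the operations recited in Example 1 might be sketched in Python with NumPy as follows. The function name `generate_pca_dataset`, the layout of the dataset (one column per set of data, one row per observation), the use of an eigendecomposition of the covariance matrix, and the choice of independent normal draws for each principal component are all assumptions made for this sketch; the disclosure does not prescribe them.

```python
import numpy as np


def generate_pca_dataset(data, n_new, rng=None):
    """Hypothetical helper: return n_new synthetic rows shaped like `data`.

    data: array of shape (n_observations, n_sets), n_sets >= 2 (assumed layout).
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)

    # Center the dataset so the principal components pass through the mean.
    mean = data.mean(axis=0)
    centered = data - mean

    # Derive the principal components: eigenvectors of the covariance
    # matrix. eigh returns orthonormal columns for a symmetric matrix.
    cov = np.cov(centered, rowvar=False)
    _, components = np.linalg.eigh(cov)

    # Map the dataset to the principal component domain.
    scores = centered @ components

    # Generate additional data in the principal component domain.
    # Assumption: each component is drawn independently from a normal
    # distribution whose spread matches the observed scores.
    spread = scores.std(axis=0, ddof=1)
    new_scores = rng.normal(0.0, spread, size=(n_new, scores.shape[1]))

    # Remap the additional data back to the dataset domain.
    return new_scores @ components.T + mean
```

Because the principal components are orthogonal, the scores are uncorrelated in the principal component domain, so drawing each component separately still reproduces the correlation between the original sets of data after remapping.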

Example 2 is the system of example 1, wherein the additional data generated in the principal component domain is generated from data having a standard distribution in the principal component domain.

Example 3 is the system of any of the preceding examples, wherein the additional data generated in the principal component domain is generated from data having a non-standard distribution in the principal component domain.
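
The difference between Examples 2 and 3 can be illustrated with a short Python sketch. It assumes, purely for illustration, that a "standard distribution" is a normal distribution fitted to the scores of each principal component, and that a "non-standard distribution" is the empirical distribution of those scores, sampled by inverse-transform sampling over the observed quantiles. The function name `sample_scores` and the `mode` argument are hypothetical.

```python
import numpy as np


def sample_scores(scores, n_new, mode="standard", rng=None):
    """Hypothetical helper: draw n_new rows in the principal component domain.

    scores: array of shape (n_observations, n_components) holding the
    original dataset after mapping to the principal component domain.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    n_components = scores.shape[1]

    if mode == "standard":
        # Assumed meaning of "standard": a normal fit per component.
        return rng.normal(
            scores.mean(axis=0),
            scores.std(axis=0, ddof=1),
            size=(n_new, n_components),
        )

    if mode == "empirical":
        # Assumed meaning of "non-standard": follow the observed shape of
        # each component by evaluating its quantile function at uniform
        # random probabilities.
        u = rng.uniform(size=(n_new, n_components))
        columns = [np.quantile(scores[:, j], u[:, j]) for j in range(n_components)]
        return np.column_stack(columns)

    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch the empirical mode never produces values outside the observed range of a component, whereas the normal fit can, which is one reason a user might prefer one mode over the other.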

Example 4 is the system of any one of the preceding examples, further comprising a signal generator.

Example 5 is the system of example 4, wherein the signal generator is configured to generate a signal from the newly generated dataset.

Example 6 is the system of example 5, wherein the dataset comprising at least two sets of data is generated from an original signal received at the input.

Example 7 is the system of example 6, further comprising a measurement unit configured to measure a signal received at the input.

Example 8 is the system of example 5, wherein the system further comprises a signal validator configured to ensure that the generated signal conforms to one or more signal definitions.
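
A signal validator of the kind recited in Example 8 could take many forms. The following Python sketch assumes, hypothetically, that each signal definition is a pair of lower and upper limits on one parameter of the newly generated dataset, and that non-conforming rows are simply discarded. The function name `validate_dataset` and the limit-based definitions are illustrative assumptions, not part of the disclosure.

```python
import numpy as np


def validate_dataset(dataset, lower, upper):
    """Hypothetical validator: keep rows whose parameters are all in range.

    dataset: array of shape (n_rows, n_parameters).
    lower, upper: per-parameter limits standing in for signal definitions.
    Returns the conforming rows and a boolean mask over the input rows.
    """
    dataset = np.asarray(dataset, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # A row conforms only if every parameter lies within its limits.
    conforms = np.all((dataset >= lower) & (dataset <= upper), axis=1)
    return dataset[conforms], conforms
```

For instance, with limits of `[0.0, 0.0]` and `[1.5, 10.0]`, a row `[2.0, 50.0]` would be rejected while `[1.0, 5.0]` would be kept.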

Example 9 is a method, comprising: accepting a dataset comprising at least two sets of data in the dataset; deriving at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other; mapping the dataset to a principal component domain derived from the at least two principal components; generating additional data in the principal component domain; and remapping the additional data in the principal component domain back to the dataset domain as a newly generated dataset.

Example 10 is the method of example 9, wherein the additional data generated in the principal component domain is generated from data having a standard distribution in the principal component domain.

Example 11 is a method according to any one of the preceding example methods, wherein the additional data generated in the principal component domain is generated from data having a non-standard distribution in the principal component domain.

Example 12 is the method of any of the previous example methods, further comprising generating a signal from the newly generated dataset.

Example 13 is the method of any of the previous example methods, further comprising generating the dataset comprising at least two sets of data from an input signal.

Example 14 is the method of any one of the preceding example methods, further comprising: receiving an input signal; performing one or more measurements on the input signal; and generating the dataset comprising at least two sets of data from the one or more measurements of the input signal.

Example 15 is the method of example 12, further comprising validating the generated signal against one or more signal definitions.

Example 16 is a non-transitory computer-readable storage medium storing one or more instructions that, when executed by one or more processors of a computing device, cause the computing device to: accept a dataset comprising at least two sets of data; derive at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other; map the dataset to a principal component domain derived from the at least two principal components; generate additional data in the principal component domain; and remap the additional data in the principal component domain back to the dataset domain as a newly generated dataset.

Example 17 is the non-transitory computer-readable storage medium of example 16, wherein execution of the one or more instructions causes the computing device to generate the additional data in the principal component domain using data having a standard distribution in the principal component domain.

Example 18 is a non-transitory computer-readable storage medium according to any preceding storage medium example, wherein execution of the one or more instructions causes the computing device to generate additional data in the principal component domain using data having a non-standard distribution in the principal component domain.

Example 19 is a non-transitory computer-readable storage medium according to any preceding storage medium example, wherein execution of the one or more instructions causes the computing device to generate a signal from the newly generated dataset.

Example 20 is the non-transitory computer-readable storage medium of example 19, wherein execution of the one or more instructions causes the computing device to verify the generated signal to ensure that the generated signal conforms to one or more signal definitions.

The previously described versions of the disclosed subject matter have many advantages that are described or will be apparent to one of ordinary skill. Even so, such advantages or features are not required in all versions of the disclosed apparatus, systems or methods.

Furthermore, the written description references specific features. It is to be understood that the disclosure in this specification includes all possible combinations of those particular features. Where a particular feature is disclosed in the context of a particular aspect or example, that feature may also be used in the context of other aspects and examples to the extent possible.

Furthermore, when reference is made in this application to a method having two or more defined steps or operations, the defined steps or operations may be performed in any order or simultaneously unless the context excludes those possibilities.

Although specific examples of the invention have been illustrated and described for purposes of description, it will be appreciated that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, the invention should not be limited except as by the appended claims.

Claims

1. A system, comprising:

an input for accepting a dataset comprising at least two sets of data in the dataset; and

one or more processors configured to:

derive at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other,

map the dataset to a principal component domain derived from the at least two principal components,

generate additional data in the principal component domain, and

remap the additional data in the principal component domain back to the dataset domain as a newly generated dataset.

2. The system of claim 1, wherein the additional data generated in the principal component domain is generated from data having a standard distribution in the principal component domain.

3. The system of claim 1, wherein the additional data generated in the principal component domain is generated from data having a non-standard distribution in the principal component domain.

4. The system of claim 1, further comprising a signal generator.

5. The system of claim 4, wherein the signal generator is configured to generate a signal from the newly generated dataset.

6. The system of claim 5, wherein the dataset comprising at least two sets of data is generated from an original signal received at the input.

7. The system of claim 6, further comprising a measurement unit configured to measure a signal received at the input.

8. The system of claim 5, wherein the system further comprises a signal validator configured to ensure that the generated signal conforms to one or more signal definitions.

9. A method, comprising:

accepting a dataset comprising at least two sets of data in the dataset;

deriving at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other;

mapping the dataset to a principal component domain derived from the at least two principal components;

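generating additional data in the principal component domain; and

remapping the additional data in the principal component domain back to the dataset domain as a newly generated dataset.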

10. The method of claim 9, wherein the additional data generated in the principal component domain is generated from data having a standard distribution in the principal component domain.

11. The method of claim 9, wherein the additional data generated in the principal component domain is generated from data having a non-standard distribution in the principal component domain.

12. The method of claim 9, further comprising generating a signal from the newly generated dataset.

13. The method of claim 9, further comprising generating the dataset comprising at least two sets of data from an input signal.

14. The method of claim 9, further comprising:

receiving an input signal;

performing one or more measurements on the input signal; and

generating the dataset comprising at least two sets of data from the one or more measurements of the input signal.

15. The method of claim 12, further comprising validating the generated signal against one or more signal definitions.

16. A non-transitory computer-readable storage medium storing one or more instructions that, when executed by one or more processors of a computing device, cause the computing device to:

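accept a dataset comprising at least two sets of data in the dataset;

derive at least two principal components from the dataset using principal component analysis, the at least two principal components being orthogonal to each other;

map the dataset to a principal component domain derived from the at least two principal components;

generate additional data in the principal component domain; and

remap the additional data in the principal component domain back to the dataset domain as a newly generated dataset.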

17. The non-transitory computer-readable storage medium of claim 16, wherein execution of the one or more instructions causes the computing device to generate the additional data in the principal component domain using data having a standard distribution in the principal component domain.

18. The non-transitory computer-readable storage medium of claim 16, wherein execution of the one or more instructions causes the computing device to generate the additional data in the principal component domain using data having a non-standard distribution in the principal component domain.

19. The non-transitory computer-readable storage medium of claim 16, wherein execution of the one or more instructions causes the computing device to generate a signal from the newly generated dataset.

20. The non-transitory computer-readable storage medium of claim 19, wherein execution of the one or more instructions causes the computing device to verify the generated signal to ensure that the generated signal conforms to one or more signal definitions.