WO2023050490A1 - Data association feature analysis method and apparatus, device, and medium
- Publication number: WO2023050490A1
- Application number: PCT/CN2021/124577
- Authority: WO, WIPO (PCT)
- Prior art keywords: sample, matrix, column, feature, data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Definitions
- the present application relates to the technical field of big data analysis, and in particular to a data association feature analysis method, device, equipment and medium.
- the relationship between cause and result can be obtained through big data analysis, for example association analysis between the genome and a disease, so as to determine which genes are related to the disease.
- the inventors found that, because genes contain a huge amount of information, the gene sequences to be analyzed also contain a very large amount of data. As the number of samples increases, existing association feature analysis methods are inefficient when analyzing massive genetic data and cannot accurately obtain the positions of the genes associated with a disease. Therefore, the prior art cannot quickly analyze massive data information to accurately obtain associated features.
- the embodiment of the present application provides a data association feature analysis method, device, equipment and medium, aiming to solve the problem existing in the prior art methods that it is impossible to quickly analyze massive data information to accurately obtain association features.
- the embodiment of the present application provides a data association feature analysis method, which includes:
- the associated column information corresponding to the preset associated screening coefficient is screened out from the sample feature matrix according to the composite test value.
- the embodiment of the present application provides a data association feature analysis device, which includes:
- a data conversion unit configured to convert the initial sample data according to a preset data conversion rule to obtain a corresponding sample feature matrix and a sample detection result matrix if the input initial sample data is received;
- a feature distribution value acquisition unit configured to perform feature analysis on each column of sample data in the sample feature matrix according to preset sample feature analysis rules and the sample detection result matrix to obtain feature distribution values corresponding to each column of sample data ;
- a composite inspection value acquisition unit configured to perform distribution statistics on the characteristic distribution value to obtain a composite inspection value corresponding to the sample data in each column;
- the association column information acquisition unit is configured to filter out the association column information corresponding to the preset association screening coefficients from the sample feature matrix according to the composite test value.
- the embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the data association feature analysis method described in the first aspect above.
- the embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the processor performs the data association feature analysis method described in the first aspect above.
- Embodiments of the present application provide a data association feature analysis method, device, computer equipment, and readable storage medium. The initial sample data is converted to obtain a sample feature matrix and a sample detection result matrix; feature analysis is performed on each column of sample data in the sample feature matrix according to the sample feature analysis rules and the sample detection result matrix to obtain the corresponding feature distribution values; distribution statistics are performed on the feature distribution values of each column of sample data to obtain the corresponding composite test values; and the associated column information corresponding to the association screening coefficient is screened out from the sample feature matrix according to the composite test values.
- with the above method, the feature distribution values are obtained according to the sample feature analysis rules and subjected to distribution statistics, and the associated column information is screened out from the sample feature matrix according to the resulting composite test values, which enables rapid analysis of massive data information to accurately obtain associated features.
- FIG. 1 is a schematic flow diagram of a data association feature analysis method provided in an embodiment of the present application.
- FIG. 2 is a schematic diagram of the sub-flow of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 3 is a schematic diagram of another sub-flow of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 4 is a schematic diagram of another sub-flow of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 5 is a schematic diagram of another sub-flow of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 6 is another schematic flowchart of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 7 is a schematic diagram of another sub-flow of the data association feature analysis method provided by the embodiment of the present application.
- FIG. 8 is a schematic block diagram of a data association feature analysis device provided in an embodiment of the present application.
- FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the present application.
- FIG. 1 is a schematic flow diagram of the data association feature analysis method provided by the embodiment of the present application. The method is applied to a user terminal or a management server and is executed by application software installed on the user terminal or the management server.
- the management server is a server that can execute the data association feature analysis method to perform association feature analysis on the initial sample data.
- the management server can be a server built inside an enterprise or a government department. The user terminal is a terminal device, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone, that can execute the data association feature analysis method to perform association feature analysis on the initial sample data. As shown in FIG. 1, the method includes steps S110-S140.
- the initial sample data is converted according to a preset data conversion rule to obtain a corresponding sample feature matrix and a sample detection result matrix.
- the user can input the initial sample data to the user terminal or the management server.
- the initial sample data can be the genetic data and test results of the sample.
- the genetic data can be all or part of the gene sequence contained in a pair of chromosomes.
- the test result is the detection information indicating whether the sample has the disease. Through data association analysis, this technical solution can screen out from the genetic data the gene points that are strongly correlated with the detection results.
- the initial sample data can be converted according to data conversion rules, wherein the data conversion rules include sample data mapping information and detection result mapping information.
- step S110 includes sub-steps S111 and S112.
- the sample feature data of each sample in the initial sample data can be mapped according to the sample data mapping information. The sample feature data is the genetic data of each sample in the initial sample data; the various types of genetic data are converted by the mapping process into the sample feature matrix, which contains the sample data corresponding to the genetic data of each sample.
- for example, if the genetic data takes the types AA, AG and GG, the sample data mapping information correspondingly maps AA to 0, AG to 1, and GG to 2.
- the initial sample data contains 1963 samples, and the genetic data of each sample contains 317503 gene points, so a sample feature matrix with 1963 rows and 317503 columns can be obtained correspondingly.
- the detection results in the initial sample data may be mapped according to the detection result mapping information.
- the detection results may include the detection results of one or more diseases. If the detection result covers only one disease, the result "diseased" is mapped to "1" and the result "not diseased" is mapped to "0"; if it covers multiple diseases, the result of suffering from multiple diseases is mapped to "1" and all other results are mapped to "0".
- the detection results of 1963 samples in the initial sample data are mapped to obtain a sample detection result matrix with 1963 rows and 1 column.
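- the conversion described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `convert_samples`, the dictionary names, and the toy inputs are assumptions introduced here; only the AA/AG/GG codes and the 1/0 result codes come from the text.

```python
import numpy as np

# Hypothetical mapping tables; the codes follow the examples given in the text.
GENOTYPE_CODES = {"AA": 0, "AG": 1, "GG": 2}
RESULT_CODES = {"diseased": 1, "not diseased": 0}

def convert_samples(genotypes, results):
    """Return (sample feature matrix, sample detection result matrix)."""
    # One row per sample, one column per gene point.
    features = np.array(
        [[GENOTYPE_CODES[g] for g in row] for row in genotypes], dtype=np.int8
    )
    # One row per sample, a single column holding the mapped detection result.
    detection = np.array([[RESULT_CODES[r]] for r in results], dtype=np.int8)
    return features, detection

# Toy input: 2 samples with 3 gene points each (illustrative values only).
genotypes = [["AA", "AG", "GG"], ["GG", "GG", "AA"]]
results = ["diseased", "not diseased"]
X, Y = convert_samples(genotypes, results)
print(X.shape, Y.shape)
```

With the 1963 samples and 317503 gene points mentioned above, the same function would return matrices of shape 1963 x 317503 and 1963 x 1.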
- the sample feature analysis rules are the specific rules for analyzing the sample feature matrix. Based on the sample feature analysis rules and the sample detection result matrix, the feature analysis of each column of sample data in the sample feature matrix can be performed to obtain the feature distribution value corresponding to each column of sample data.
- the feature distribution value is the distribution value of the feature of each gene point in each sample in a specific distribution state.
- the sample feature analysis rules include latent variable calculation formulas and feature calculation formulas.
- step S120 includes sub-steps S121 and S122.
- the sample feature matrix can be calculated according to the hidden variable calculation formula to obtain the corresponding hidden variable matrix, which includes the hidden correlation between each column of sample data and the corresponding detection results.
- sample feature matrix X can be decomposed according to the latent variable calculation formula, then the sample feature matrix X can be expressed by formula (1):
- the characteristic distribution value of each column of sample data can be calculated separately according to the characteristic calculation formula.
- the feature calculation formula includes a degree of freedom value calculation formula, a block matrix formula and a distribution value calculation formula.
- step S122 includes sub-steps S1221 , S1222 and S1223 .
- n is the number of rows of the sample feature matrix X
- d is the number of columns of the hidden variable matrix G.
- the hidden variable matrix, the sample detection result matrix and the sample feature matrix can be inversely operated according to the block matrix formula to obtain an estimated value corresponding to each column of sample data.
- $X_i = Y B_i + G \Gamma_i + E_i$
- where $X_i$ is the sample data of column $i$ in the sample feature matrix, $Y$ is the sample detection result matrix, $B_i$ is the coefficient applied to $Y$ for column $i$, $G$ is the hidden variable matrix, $\Gamma_i$ is the coefficient corresponding to the hidden variable matrix, and $E_i$ is the residual.
- the residuals corresponding to any column of sample data are independent of each other.
- T is the matrix transpose symbol.
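- as a rough illustration of the estimation step, the sketch below fits one column of sample data as a linear combination of the detection results and the hidden variables plus a residual. It uses ordinary least squares as a stand-in: the block matrix formula itself is not reproduced in this text, so the function `fit_column`, the toy values, and the choice of `numpy.linalg.lstsq` are all assumptions rather than the application's formula.

```python
import numpy as np

def fit_column(x_col, y, g):
    """Fit x_col ~ y * b + g @ gamma by ordinary least squares (assumed stand-in).

    Returns the estimated coefficient on y and the residual vector.
    """
    design = np.column_stack([y, g])                      # n x (1 + d)
    coef, *_ = np.linalg.lstsq(design, x_col, rcond=None)
    residuals = x_col - design @ coef
    return float(coef[0]), residuals

# Toy data: n = 6 samples, d = 2 hidden-variable columns (hypothetical values).
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
g = np.array([[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [1, 2]], dtype=float)
x_col = 2.0 * y + g @ np.array([1.0, -1.0])               # noise-free by construction
b_hat, residuals = fit_column(x_col, y, g)
```

Because the toy column is built without noise, the fit recovers the coefficient on `y` and leaves residuals that are zero up to floating-point error. Real genetic data would leave non-zero residuals, which the distribution value calculation then uses.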
- the feature distribution value of each column of sample data can then be calculated by the distribution value calculation formula. Since each column of sample data contains the feature data of every sample at the same gene point, the feature distribution value of a column contains the distribution values of all samples at that gene point; that is, the number of distribution values contained in the feature distribution value of each column of sample data equals the number of samples.
- the distribution value calculation formula can be expressed by formula (4):
- z is the degree of freedom value
- t i is the calculated characteristic distribution value
- Distribution statistics can be performed on the characteristic distribution values to obtain the corresponding composite inspection value, and each column of sample data can correspondingly obtain a composite inspection value.
- step S130 includes sub-steps S131 and S132.
- extreme value distribution statistics can be performed on the feature distribution value of each column of sample data. Specifically, as the sample size tends to infinity, the distribution of the feature distribution value t of any column of sample data is approximately normal. Using the extreme value distribution theorem, the distribution form with the largest absolute value can be determined as the target distribution form corresponding to the feature distribution value t of the current column of sample data, and the distribution parameters of the target distribution form are then obtained as the corresponding feature distribution value statistical information.
- a normal distribution can be represented by expression (5):
- $\mu$ and $\sigma$ in the above expression are the corresponding distribution parameters.
- the user terminal or management server also pre-stores a test value data table, which contains the test value corresponding to each statistical form. After the feature distribution value statistical information is obtained, the test value matching the statistical form of that information can be looked up in the test value data table and used as the composite test value.
- the associated column information corresponding to the preset associated screening coefficient is screened out from the sample feature matrix according to the composite test value.
- the sample feature matrix can be screened according to the compound test value and the correlation screening coefficient to obtain the corresponding correlation column information.
- the associated column information can contain at least one column code value, and the column code values it contains indicate the gene points in the gene sequence that are strongly correlated with the disease.
- step S1401 is further included before step S140 .
- the corresponding correlation screening coefficient can also be calculated according to the calculation formula of the screening coefficient and the column number of the sample feature matrix.
- the calculation formula of the screening coefficient can be expressed by formula (6) :
- e is the preset parameter value in the formula
- m is the column number of the sample feature matrix
- step S140 includes sub-steps S141 and S142.
- it is judged whether the composite test value of each column of sample data is less than the association screening coefficient. If it is, the gene point corresponding to that composite test value has a significant correlation; if it is not, the gene point corresponding to that composite test value has no significant correlation. According to the judgment result, the composite test values smaller than the association screening coefficient are taken as the target test values.
- each target test value corresponds to a column of sample data in the sample feature matrix, so the column code value corresponding to each target test value can be obtained from the sample feature matrix and combined to give the corresponding associated column information.
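- the screening step can be sketched as below. The screening coefficient formula is not reproduced in this text, so the sketch assumes a Bonferroni-style threshold `e / m`; that form, the function names, and the toy values are assumptions made for illustration only. The comparison itself (keep columns whose composite test value is less than the coefficient) follows the description above.

```python
import numpy as np

def screening_coefficient(e, m):
    """ASSUMPTION: preset parameter e divided by the column count m."""
    return e / m

def select_associated_columns(test_values, coefficient):
    """Return the column code values whose composite test value is below the coefficient."""
    test_values = np.asarray(test_values, dtype=float)
    return np.flatnonzero(test_values < coefficient).tolist()

# Toy composite test values for 4 columns (hypothetical numbers).
values = [0.2, 0.001, 0.04, 0.009]
coef = screening_coefficient(0.05, len(values))
cols = select_associated_columns(values, coef)
print(coef, cols)
```

Here the column code values are zero-based column indices; a different coding scheme would only change the final lookup.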
- in the data association feature analysis method provided by the embodiment of the present application, the initial sample data is converted according to the data conversion rules to obtain the sample feature matrix and the sample detection result matrix; feature analysis is performed on each column of sample data in the sample feature matrix according to the sample feature analysis rules and the sample detection result matrix to obtain the corresponding feature distribution values; distribution statistics are performed on the feature distribution values to obtain the corresponding composite test values; and the associated column information corresponding to the association screening coefficient is screened out from the sample feature matrix according to the composite test values.
- with the above method, the feature distribution values are obtained according to the sample feature analysis rules and subjected to distribution statistics, and the associated column information is screened out from the sample feature matrix according to the resulting composite test values, which enables rapid analysis of massive data information to accurately obtain associated features.
- the embodiment of the present application also provides a data association feature analysis device, which can be configured in a user terminal or a management server and is used to implement any embodiment of the aforementioned data association feature analysis method.
- FIG. 8 is a schematic block diagram of an apparatus for analyzing data association features provided by an embodiment of the present application.
- the data correlation feature analysis device 100 includes a data conversion unit 110 , a feature distribution value acquisition unit 120 , a composite test value acquisition unit 130 and an association column information acquisition unit 140 .
- the data conversion unit 110 is configured to convert the initial sample data according to preset data conversion rules to obtain a corresponding sample feature matrix and sample detection result matrix if the input initial sample data is received.
- the data conversion unit 110 includes subunits: a sample feature matrix acquisition unit, configured to map the sample feature data of each sample in the initial sample data according to the sample data mapping information to obtain a corresponding sample feature matrix; and a sample detection result matrix acquisition unit, configured to map the detection result of each sample in the initial sample data according to the detection result mapping information to obtain a corresponding sample detection result matrix.
- the feature distribution value acquisition unit 120 is configured to perform feature analysis on each column of sample data in the sample feature matrix according to preset sample feature analysis rules and the sample detection result matrix to obtain a feature distribution corresponding to each column of sample data value.
- the feature distribution value acquisition unit 120 includes subunits: a hidden variable matrix acquisition unit, configured to calculate the sample feature matrix according to the hidden variable calculation formula to obtain a corresponding hidden variable matrix; and a feature calculation unit, configured to calculate the hidden variable matrix, the sample detection result matrix, and each column of sample data in the sample feature matrix according to the feature calculation formula to obtain the feature distribution value corresponding to each column of sample data.
- the feature calculation unit includes subunits: a degree of freedom value calculation unit, configured to calculate the corresponding degree of freedom value from the number of rows of the sample feature matrix and the number of columns of the hidden variable matrix according to the degree of freedom value calculation formula in the feature calculation formula; an estimated value calculation unit, configured to perform an inverse operation on the hidden variable matrix, the sample detection result matrix and the sample feature matrix according to the block matrix formula in the feature calculation formula to obtain the estimated value corresponding to each column of sample data; and a distribution value calculation unit, configured to calculate, according to the distribution value calculation formula in the feature calculation formula, the degree of freedom value, the estimated value corresponding to each column of sample data, the hidden variable matrix, the sample detection result matrix and each column of sample data in the sample feature matrix to obtain the feature distribution value corresponding to each column of sample data.
- the composite inspection value acquisition unit 130 is configured to perform distribution statistics on the characteristic distribution values to obtain a composite inspection value corresponding to each column of the sample data.
- the composite test value acquisition unit 130 includes subunits: a feature distribution value statistics unit, configured to perform extreme value distribution statistics on the feature distribution values corresponding to each column of sample data to obtain the feature distribution value statistical information of each column of sample data; and a test value acquisition unit, configured to obtain, according to a preset test value data table, a composite test value corresponding to the statistical form of each piece of feature distribution value statistical information.
- the association column information acquisition unit 140 is configured to filter out the association column information corresponding to the preset association screening coefficients from the sample feature matrix according to the composite test value.
- the data association feature analysis device 100 further includes a subunit: an association screening coefficient calculation unit, configured to calculate the association screening coefficient from the number of columns of the sample feature matrix according to a preset screening coefficient calculation formula.
- the association column information acquisition unit 140 includes subunits: a target test value determination unit, configured to judge whether the composite test value of each column of sample data is smaller than the association screening coefficient and, according to the judgment result, determine the composite test values smaller than the association screening coefficient as the target test values; and a column code value combination unit, configured to obtain the column code values corresponding to the target test values in the sample feature matrix and combine them as the associated column information corresponding to the association screening coefficient.
- the data association feature analysis device applies the above-mentioned data association feature analysis method: it converts the initial sample data according to the data conversion rules to obtain a sample feature matrix and a sample detection result matrix, performs feature analysis on each column of sample data in the sample feature matrix according to the sample feature analysis rules and the sample detection result matrix to obtain the corresponding feature distribution values, performs distribution statistics on the feature distribution values of each column of sample data to obtain the corresponding composite test values, and screens out from the sample feature matrix, according to the composite test values, the associated column information corresponding to the association screening coefficient.
- with the above method, the feature distribution values are obtained according to the sample feature analysis rules and subjected to distribution statistics, and the associated column information is screened out from the sample feature matrix according to the resulting composite test values, which enables rapid analysis of massive data information to accurately obtain associated features.
- the above-mentioned device for analyzing data association features can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 9 .
- FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the present application.
- the computer device may be a user terminal or a management server for performing the data correlation feature analysis method to perform correlation feature analysis on the initial sample data.
- the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, wherein the memory may include a storage medium 503 and an internal memory 504.
- the storage medium 503 can store an operating system 5031 and a computer program 5032 .
- when the computer program 5032 is executed, the processor 502 can carry out the data association feature analysis method; the storage medium 503 may be a volatile storage medium or a non-volatile storage medium.
- the processor 502 is used to provide calculation and control capabilities and support the operation of the entire computer device 500 .
- the internal memory 504 provides an environment for running the computer program 5032 stored in the storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can carry out the data association feature analysis method.
- the network interface 505 is used for network communication, such as providing data transmission and the like.
- FIG. 9 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer device 500 on which the solution of this application is applied.
- the specific computer device 500 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
- the processor 502 is configured to run the computer program 5032 stored in the memory, so as to realize the corresponding functions in the above-mentioned data association feature analysis method.
- the embodiment of the computer device shown in FIG. 9 does not constitute a limitation on the specific composition of the computer device.
- the computer device may include more or fewer components than those shown in the illustration, combine certain components, or use a different arrangement of components.
- the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in FIG. 9 , and will not be repeated here.
- the processor 502 may be a central processing unit (CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- a computer readable storage medium may be a volatile or non-volatile computer-readable storage medium.
- the computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, the steps included in the above-mentioned data association feature analysis method are implemented.
- the disclosed devices, apparatuses and methods can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only logical function division.
- there may be other division methods: units with the same function may be combined into one unit; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present application.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the readable storage medium includes several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned computer-readable storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), a magnetic disk, an optical disk, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Disclosed in the present application are a data association feature analysis method and apparatus, a device, and a medium. The method comprises: performing conversion processing on initial sample data according to a data conversion rule to obtain a sample feature matrix and a sample test result matrix; performing feature analysis on each column of sample data in the sample feature matrix according to a sample feature analysis rule and the sample test result matrix to obtain a corresponding feature distribution value; compiling distribution statistics on the feature distribution value corresponding to each column of sample data to obtain a corresponding composite test value; and, according to the composite test value, screening the sample feature matrix for associated column information that matches an associated screening coefficient.
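The four steps named in the abstract (conversion, per-column feature analysis, distribution statistics, screening) can be sketched as follows. This is only an illustrative sketch, not the claimed method: the abstract does not state the concrete rules, so the sketch assumes a binary test result, uses the absolute difference of group means as the per-column "feature distribution value", uses a z-score across columns as the "composite test value", and treats the screening coefficient as a simple lower threshold. All function names and the record layout are hypothetical.

```python
from statistics import mean, pstdev


def convert(records):
    """Split raw records into a feature matrix X (one row per sample)
    and a test result vector y. The record layout is an assumption."""
    X = [list(r["features"]) for r in records]
    y = [r["result"] for r in records]
    return X, y


def column_distribution_value(column, y):
    """Assumed statistic: absolute difference between the column mean of
    the positive group (result == 1) and the negative group (result == 0).
    Both groups must be non-empty."""
    pos = [v for v, label in zip(column, y) if label == 1]
    neg = [v for v, label in zip(column, y) if label == 0]
    return abs(mean(pos) - mean(neg))


def composite_values(dist_values):
    """Assumed composite value: z-score of each column's distribution
    value relative to the values of all columns."""
    mu = mean(dist_values)
    sigma = pstdev(dist_values)
    if sigma == 0:
        return [0.0 for _ in dist_values]
    return [(v - mu) / sigma for v in dist_values]


def screen_columns(X, y, coefficient):
    """Return the indices of columns whose composite value reaches the
    (assumed) screening threshold."""
    columns = list(zip(*X))  # transpose: one tuple per feature column
    dist = [column_distribution_value(c, y) for c in columns]
    comp = composite_values(dist)
    return [j for j, c in enumerate(comp) if c >= coefficient]


# Toy data: column 0 tracks the result exactly, columns 1 and 2 do not.
records = [
    {"features": [1, 0, 1], "result": 1},
    {"features": [1, 1, 0], "result": 1},
    {"features": [0, 0, 1], "result": 0},
    {"features": [0, 1, 0], "result": 0},
]
X, y = convert(records)
selected = screen_columns(X, y, coefficient=1.0)
```

With the toy data, only column 0 separates the two result groups, so it is the only column index left in `selected`. A real implementation would substitute whatever analysis rule and screening coefficient the application actually specifies.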
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111164594.6 | 2021-09-30 | ||
CN202111164594.6A CN113609204B (zh) | 2021-09-30 | 2021-09-30 | Data association feature analysis method, apparatus, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023050490A1 (fr) | 2023-04-06 |
Family
ID=78343317
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/124577 WO2023050490A1 (fr) | 2021-09-30 | 2021-10-19 | Data association feature analysis method and apparatus, device, and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113609204B (fr) |
WO (1) | WO2023050490A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120215458A1 (en) * | 2009-07-14 | 2012-08-23 | Board Of Regents, The University Of Texas System | Orthologous Phenotypes and Non-Obvious Human Disease Models |
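CN108567418A (zh) * | 2018-05-17 | 2018-09-25 | 陕西师范大学 | PCANet-based pulse signal sub-health detection method and detection system |
CN110674104A (zh) * | 2019-08-15 | 2020-01-10 | 中国平安人寿保险股份有限公司 | Feature combination screening method and apparatus, computer device, and storage medium |
CN113035275A (zh) * | 2021-04-22 | 2021-06-25 | 广东技术师范大学 | Feature extraction method for tumor gene point mutations combining the silhouette coefficient and the RJMCMC algorithm |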
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2963421A1 (fr) * | 2014-07-01 | 2016-01-06 | SeNostic GmbH | Method for non-invasive diagnosis of neurodegenerative diseases |
CN106354794A (zh) * | 2016-08-26 | 2017-01-25 | 成都汉康信息产业有限公司 | Data analysis and processing system |
CN111383717B (zh) * | 2018-12-29 | 2024-10-18 | 北京安诺优达医学检验实验室有限公司 | Method and system for constructing a reference data set for bioinformatics analysis |
2021
- 2021-09-30: CN application CN202111164594.6A, patent CN113609204B (zh), status: Active
- 2021-10-19: WO application PCT/CN2021/124577, patent WO2023050490A1 (fr), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN113609204A (zh) | 2021-11-05 |
CN113609204B (zh) | 2021-12-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
McManus et al. | Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans | |
Niroula et al. | PON-P2: prediction method for fast and reliable identification of harmful variants | |
Sun et al. | Differential expression analysis for RNAseq using Poisson mixed models | |
Geniza et al. | Tools for building de novo transcriptome assembly | |
Murray et al. | kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity | |
Verbist et al. | VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering | |
WO2021098615A1 (fr) | Method and device for filling in missing genotype data, and server | |
Cheung et al. | Prediction of biogeographical ancestry from genotype: a comparison of classifiers | |
US20210225456A1 (en) | Method for detecting genetic variation in highly homologous sequences by independent alignment and pairing of sequence reads | |
WO2022127075A1 (fr) | Feature discretization method for remote sensing images based on a fuzzy rough model | |
Jia et al. | Thousands of missing variants in the UK Biobank are recoverable by genome realignment | |
De Marino et al. | A comparative analysis of current phasing and imputation software | |
KR20220073732A (ko) | Method, apparatus and computer-readable medium for adaptive normalization of analyte levels | |
Glusman et al. | Ultrafast comparison of personal genomes via precomputed genome fingerprints | |
Makowski et al. | Mutational analysis of SARS-CoV-2 variants of concern reveals key tradeoffs between receptor affinity and antibody escape | |
Liao et al. | ROC curve analysis in the presence of imperfect reference standards | |
CN116525108A (zh) | SNP data-based prediction method, apparatus, device and storage medium | |
Rapti et al. | CoverageMaster: comprehensive CNV detection and visualization from NGS short reads for genetic medicine applications | |
König et al. | Computational assessment of feature combinations for pathogenic variant prediction | |
WO2023050490A1 (fr) | Data association feature analysis method and apparatus, device, and medium | |
He et al. | Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences | |
Naseri et al. | RAFFI: accurate and fast familial relationship inference in large scale biobank studies using RaPID | |
CN116955735A (zh) | Quality control method, apparatus, device and storage medium for high-throughput sequencing data | |
WO2022258077A2 (fr) | Remote sensing image feature discretization method and apparatus based on a type-II fuzzy rough model, storage medium, and computer device | |
US20180239866A1 (en) | Prediction of genetic trait expression using data analytics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21959054; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 05/07/2024) |