KR101632073B1 - Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis - Google Patents

Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis

Info

Publication number
KR101632073B1
Authority
KR
South Korea
Prior art keywords
attribute
data
weight
value
profiling
Prior art date
Application number
KR1020150143390A
Other languages
Korean (ko)
Inventor
장원중
Original Assignee
장원중
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 장원중
Priority to PCT/KR2016/005920 (WO2016195421A1)
Application granted
Publication of KR101632073B1

Classifications

    • G06F17/30318
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30598
    • G06F17/30699

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to an embodiment of the present invention, provided is a method for providing statistical analysis-based data profiling, which enables data profiling ensuring reliability and having high efficiency to be performed. The method comprises the steps of: calculating at least one statistical value for each attribute based on data included in each of attributes defined in a data set; determining a weight to be assigned to each of the attributes using the calculated at least one statistical value; and determining at least one attribute, having a weight equal to or greater than a preset level, among the attributes defined in the data set to be a target attribute for data profiling.

Description

METHOD, DEVICE, SYSTEM AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM FOR PROVIDING DATA PROFILING BASED ON STATISTICAL ANALYSIS

The present invention relates to a method, device, system, and non-transitory computer-readable recording medium for providing statistical analysis-based data profiling.

Recently, the volume of data generated through e-mail, social network services (SNS), multimedia, mobile devices, and the Internet of Things (IoT) has been increasing rapidly, reaching the zettabyte (10^21 bytes) scale.

In addition, technology for analyzing and utilizing Big Data has been actively researched across all industries and has become a global issue. In Korea, the Government 3.0 initiative likewise aims to share data by actively opening public information. A variety of technologies have also been developed to provide useful services to users by utilizing the flood of data described above.

In this situation, the reliability of data quality must be ensured. One data quality diagnosis (data profiling) technique introduced in the past performs data profiling only on the attributes that a manager judges to be important. With this related art, however, the data profiling result can vary greatly depending on the manager's subjective judgment. For example, suppose a data set is used to mail promotional materials to customers. In this case, the address may be judged to be the most important attribute, and data profiling may be performed only on the data included in the address attribute. However, it is difficult to conclude that the data set will be used for mailing only, and the judgment that the address is the most important attribute is merely the manager's subjective judgment. Therefore, although the data quality diagnosis can be performed efficiently, the reliability of the data profiling result is inevitably lowered.

As another example of the data quality diagnosis technology introduced in the past, there is a technique of performing data profiling on all the attributes defined in the data set. This conventional technology yields accurate data profiling results, but it requires excessive time and effort because data profiling must be performed on all the data included in the data set. For example, suppose that the data set defines 100 attributes and contains 100 million transaction records. In this case, the total number of data values subject to data profiling is 10 billion (number of attributes × number of records = 100 × 100,000,000).

Therefore, there is a demand for a data profiling technique which can secure reliability and is highly efficient.

It is an object of the present invention to solve all the problems described above.

Another object of the present invention is to enable highly efficient data profiling while ensuring reliability, by calculating at least one statistical value for each attribute based on the data included in each attribute defined in a data set, determining a weight to be given to each attribute with reference to the calculated at least one statistical value, and determining at least one attribute whose weight is equal to or higher than a predetermined level as an attribute to be subjected to data profiling.

In order to accomplish the above object, a representative structure of the present invention is as follows.

According to one aspect of the present invention, there is provided a method for providing statistical analysis-based data profiling, comprising the steps of: calculating at least one statistical value for each attribute based on data included in each attribute defined in a data set; determining a weight to be given to each attribute with reference to the calculated at least one statistical value; and determining at least one attribute whose weight is equal to or higher than a predetermined level, among the attributes defined in the data set, as an attribute to be subjected to data profiling.

According to another aspect of the present invention, there is provided a system for providing statistical analysis-based data profiling, comprising: a statistical value calculation unit for calculating at least one statistical value for each attribute based on data included in each attribute defined in a data set; a weight assignment unit for determining a weight to be given to each attribute with reference to the calculated at least one statistical value; and a target attribute determination unit for determining at least one attribute whose weight is equal to or higher than a predetermined level, among the attributes defined in the data set, as an attribute to be subjected to data profiling.

In addition, there are further provided other methods, systems, and user devices for implementing the present invention, as well as a non-transitory computer-readable recording medium on which a computer program for executing the method is recorded.

According to the present invention, data profiling is performed on the data included in those attributes which, among the various attributes defined in the data set, are determined through statistical analysis to have a high probability of containing errors. Reliability can therefore be greatly increased compared with the prior art, in which data profiling is performed on attributes selected arbitrarily according to the subjective judgment of an administrator.

In addition, according to the present invention, efficiency can be remarkably improved compared with the prior art in which data profiling is performed on the data of all attributes defined in a data set.

Further, according to the present invention, the attributes to be subjected to data profiling can be determined by additionally reflecting the business rules (code values, work rules, etc.) applied to the data set together with the statistical analysis result, so that the effect of the data profiling can be further enhanced.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing a schematic configuration of an overall system for providing statistical analysis-based data profiling according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating an internal configuration of a data profiling system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an exemplary internal configuration of an attribute extraction unit according to an embodiment of the present invention.
FIG. 4 is a diagram conceptually showing a configuration for determining an attribute to be subjected to data profiling among attributes defined in a data set according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which illustrate, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention are different, but need not be mutually exclusive. For example, certain features, structures, and characteristics described herein may be implemented in other embodiments without departing from the spirit and scope of the invention in connection with an embodiment. It is also to be understood that the position or arrangement of the individual components within each disclosed embodiment may be varied without departing from the spirit and scope of the invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is to be limited only by the appended claims, along with the full scope of equivalents to which such claims are entitled, if properly explained. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that those skilled in the art can easily carry out the present invention.

Configuration of the entire system

FIG. 1 is a diagram showing a schematic configuration of an overall system for providing statistical analysis-based data profiling according to an embodiment of the present invention.

Referring to FIG. 1, the overall system according to an embodiment of the present invention may include a communication network 100, a data profiling system 200, a user device 300, and an external server 400.

First, the communication network 100 according to an embodiment of the present invention can be configured regardless of communication mode, such as wired or wireless communication. The communication network 100 can be a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), or the like. Preferably, the communication network 100 referred to herein may be the well-known Internet or World Wide Web (WWW). However, the communication network 100 is not limited thereto and may include, at least in part, a known wired/wireless data communication network, a known telephone network, or a known wired/wireless television communication network.

Next, in accordance with one embodiment of the present invention, the data profiling system 200 may be a digital device having memory means and equipped with a microprocessor and capable of computing. This data profiling system 200 may be a server system.

Specifically, in accordance with one embodiment of the present invention, the data profiling system 200 may calculate at least one statistical value for each attribute based on the data included in each attribute defined in a data set, determine a weight to be given to each attribute with reference to the calculated at least one statistical value, and determine at least one attribute whose weight is equal to or higher than a predetermined level as an attribute to be subjected to data profiling, thereby enabling highly efficient data profiling while ensuring reliability.

The function of the data profiling system 200 will be described in more detail below. Although the data profiling system 200 has been described as above, this description is exemplary, and it will be apparent to those skilled in the art that at least some of the functions or components required for the data profiling system 200 may be implemented in, or included in, the user device 300 or the external server 400 as needed.

Next, in accordance with an embodiment of the present invention, the user device 300 is a digital device that includes a function of accessing the data profiling system 200 through the communication network 100 and communicating with it. Any digital device having memory means and equipped with a microprocessor for computing capability can be employed as the user device 300 according to the present invention.

According to an embodiment of the present invention, the external server 400 is a server that includes a function of accessing the data profiling system 200 through the communication network 100 and communicating with it, and it can provide raw data or a data set to be subjected to data profiling in the form of a file or a database. For example, the external server 400 can provide reference information, transaction information, aggregation information, and the like as structured data; HTML, XML, GIS, and the like as semi-structured data; and images, sounds, documents, and the like as unstructured data.

Configuration of data profiling system

Hereinafter, the internal configuration of the data profiling system that performs an important function for the implementation of the present invention and the functions of the respective components will be described.

FIG. 2 is an exemplary diagram illustrating an internal configuration of a data profiling system according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating an exemplary internal configuration of an attribute extraction unit according to an embodiment of the present invention.

FIG. 4 is a diagram conceptually showing a configuration for determining an attribute to be subjected to data profiling among attributes defined in a data set according to an embodiment of the present invention.

Referring to FIGS. 2 and 3, a data profiling system 200 according to an embodiment of the present invention includes a data set management unit 210, an attribute extraction unit 220, a data profiling unit 230, a database 240, a communication unit 250, and a control unit 260. Here, the attribute extraction unit 220 may include a statistical value calculation unit 221, a weight assignment unit 222, and a target attribute determination unit 223. According to an embodiment of the present invention, at least some of the data set management unit 210, the attribute extraction unit 220, the data profiling unit 230, the database 240, the communication unit 250, and the control unit 260 may be program modules that communicate with an external system (not shown). These program modules may be included in the data profiling system 200 in the form of an operating system, application program modules, and other program modules, and may be physically stored on various known storage devices. These program modules may also be stored in a remote storage device capable of communicating with the data profiling system 200. These program modules include, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types as described below in accordance with the present invention.

First, according to an embodiment of the present invention, the data set management unit 210 may perform a function of acquiring raw data or a data set to be subjected to data profiling from the external server 400 (see FIG. 4(a)). In addition, according to an embodiment of the present invention, the data set management unit 210 may perform a function of converting the various types of raw data collected as described above into a data set in a format suitable for data profiling (see FIG. 4(b)).

Next, in accordance with an embodiment of the present invention, the attribute extraction unit 220 (specifically, the statistical value calculation unit 221) may calculate at least one statistical value for each attribute based on the data included in each attribute defined in the data set to be subjected to data profiling.

Here, an attribute defined in the data set refers to an item that serves as a criterion for classifying the large amount of data (i.e., records) included in the data set. For example, in a data set on bike rental status according to weather conditions (Bike Sharing Demand), attributes such as date, year, month, day, hour, season, holiday, working day, weather, humidity, casual, registered, count, temp, atemp, and windspeed can be defined.

Specifically, according to an embodiment of the present invention, the attribute extraction unit 220 may calculate statistical values that can be used as measures of the probability of an error occurring in the data included in each attribute defined in the data set. For example, statistical values such as the number of missing values, minimum, maximum, mode, mean, variance, standard deviation, five-number summary, outliers, and near zero variance can be calculated.
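The statistics listed above can be sketched for a single numeric attribute as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation. The function name `profile_attribute`, the use of `None` for missing values, the 1.5 × IQR rule for outliers, and the near zero variance heuristic with its `freq_cut` and `unique_cut` parameters are all assumptions made for this example; the tables in this document refer to a Bonferroni p value for outliers, which this sketch does not compute.

```python
import statistics
from collections import Counter
from typing import Optional, Sequence


def profile_attribute(values: Sequence[Optional[float]],
                      freq_cut: float = 19.0,
                      unique_cut: float = 0.10) -> dict:
    """Return summary statistics for one numeric attribute (None = missing)."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")

    # Quartiles for the five-number summary.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences

    # Assumed near zero variance heuristic: few distinct values AND the most
    # common value occurs more than freq_cut times as often as the runner-up.
    counts = Counter(present).most_common(2)
    top = counts[0][1]
    second = counts[1][1] if len(counts) > 1 else 0
    freq_ratio = float("inf") if second == 0 else top / second
    unique_share = len(set(present)) / len(present)

    return {
        "missing": len(values) - len(present),
        "min": present[0],
        "max": present[-1],
        "mode": counts[0][0],
        "mean": statistics.fmean(present),
        "variance": statistics.variance(present),
        "stdev": statistics.stdev(present),
        "five_number": (present[0], q1, median, q3, present[-1]),
        "outliers": [v for v in present if v < low or v > high],
        "near_zero_variance": freq_ratio > freq_cut and unique_share <= unique_cut,
    }
```

Calling `profile_attribute` once per attribute yields the per-attribute statistics that the weighting step described next can consume.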

According to an embodiment of the present invention, the attribute extraction unit 220 (specifically, the weight assignment unit 222) may determine a weight to be given to each attribute defined in the data set with reference to the at least one statistical value calculated as described above for each attribute (see FIG. 4(c)).

Specifically, if the at least one statistical value calculated for a first attribute defined in the data set satisfies a preset criterion, the attribute extraction unit 220 according to an embodiment of the present invention may determine that a preset weight is to be given to the first attribute. More specifically, the attribute extraction unit 220 according to an embodiment of the present invention may give the first attribute a higher weight as the probability of an error occurring in the data included in the first attribute increases.

Here, according to an embodiment of the present invention, the weight that can be given to each attribute defined in the data set may include a first weight and a second weight, and the first weight and the second weight may be determined independently of each other. In more detail, the attribute extraction unit 220 may determine that a first weight is given to the first attribute when the probability of an error occurring in the data included in the first attribute corresponds to a predetermined level, and that a second weight is given to the first attribute when that probability exceeds the predetermined level.

For example, the criteria for assigning the first weight and the second weight to attributes defined in the data set may be set as shown in Tables 1 and 2 below, respectively.

[Table 1]

| First weighting criterion | First weight |
|---|---|
| At least one missing value (NA) exists | 0.1 |
| Near zero variance exists | 0.1 |
| Standard deviation is greater than or equal to a | 0.1 |
| Number of empty strings ("") exceeds b | 0.1 |
| Outlier Bonferroni p value is less than c | 0.1 |
| Data time interval (last day - first day) is greater than the current time interval (current day - first day) | 0.1 |

[Table 2]

| Second weighting criterion | Second weight |
|---|---|
| Number of missing values (NA) is more than d% of the total number of data | 0.1 |
| Outlier Bonferroni p value is less than or equal to e (e is less than c in Table 1) | 0.1 |

However, the first and second weighting criteria according to the present invention are not necessarily limited to those listed in Table 1 or Table 2 above, and may be changed as needed within the scope of achieving the object of the present invention.
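The criteria of Tables 1 and 2 can be sketched as a small scoring function. This is an illustrative reading rather than the patented implementation: the source leaves the thresholds a, b, c, d, and e unspecified, so the defaults below are placeholders, and the dictionary keys (`missing`, `missing_pct`, `stdev`, `blank_count`, `bonferroni_p`, `data_span_days`, `current_span_days`) are hypothetical names. Adding 0.1 per satisfied criterion is also an assumption, although it is consistent with the weights of 0.1, 0.2, and 0.3 reported in Table 4.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Placeholder values only; the source does not specify a, b, c, d, or e.
    a: float = 100.0  # standard deviation level (Table 1)
    b: int = 0        # allowed number of empty strings (Table 1)
    c: float = 0.05   # Bonferroni p cutoff (Table 1)
    d: float = 10.0   # missing-value percentage (Table 2)
    e: float = 0.01   # stricter Bonferroni p cutoff, e < c (Table 2)


def assign_weights(stats: dict, t: Thresholds = Thresholds(), step: float = 0.1):
    """Return (first_weight, second_weight) for one attribute."""
    p = stats.get("bonferroni_p")
    first = [
        stats.get("missing", 0) >= 1,
        stats.get("near_zero_variance", False),
        stats.get("stdev") is not None and stats["stdev"] >= t.a,
        stats.get("blank_count", 0) > t.b,
        p is not None and p < t.c,
        stats.get("data_span_days", 0) > stats.get("current_span_days", float("inf")),
    ]
    second = [
        stats.get("missing_pct", 0.0) > t.d,
        p is not None and p <= t.e,
    ]
    # Assumed additive scoring: one step per satisfied criterion.
    return round(step * sum(first), 1), round(step * sum(second), 1)


# Statistics reported for the "registered" attribute in Table 3.
registered = {"missing": 6493, "missing_pct": 37.36,
              "stdev": 151.039, "bonferroni_p": 0.0}
print(assign_weights(registered))
```

With these placeholder thresholds, the statistics reported for the registered attribute satisfy three first-weight criteria and two second-weight criteria, matching the 0.3 and 0.2 shown for that attribute in Table 4.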

According to an embodiment of the present invention, the attribute extraction unit 220 (more specifically, the target attribute determination unit 223) may determine at least one attribute whose weight is equal to or higher than a predetermined level, among the attributes defined in the data set, as an attribute to be subjected to data profiling.
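As a minimal sketch of this selection step, the snippet below keeps every attribute whose weight meets a chosen level. The function name `select_targets` is hypothetical; the weights are the first weights reported for seven attributes in Table 4 of the experimental example, keyed by the attribute names used there.

```python
def select_targets(weights: dict, level: float) -> list:
    """Return the attributes whose weight is at or above the given level."""
    return [name for name, w in weights.items() if w >= level]


# First weights from Table 4 (attributes not listed there received no weight).
TABLE4_FIRST_WEIGHT = {
    "weather": 0.1, "temp": 0.1, "atemp": 0.1, "windspeed": 0.1,
    "casual": 0.2, "registered": 0.3, "count": 0.3,
}

print(select_targets(TABLE4_FIRST_WEIGHT, 0.3))  # two attributes
print(select_targets(TABLE4_FIRST_WEIGHT, 0.1))  # all seven attributes
```

Raising the level narrows profiling to fewer attributes, which trades coverage for efficiency.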

In addition, according to an embodiment of the present invention, the attribute extraction unit 220 may determine the attribute to be subjected to data profiling with further reference to at least one business rule applied to the data set. Here, the business rules may include code values or work rules applied to the data set.

For example, the attribute extraction unit 220 according to an embodiment of the present invention may calculate the geometric mean (GM) of the sums of the first weight and the second weight over at least two attributes among the attributes defined in the data set, and may determine the at least two attributes constituting a combination whose geometric mean is equal to or higher than a predetermined level as attributes to be subjected to data profiling. Here, the geometric mean (GM) of the sums of the first weight and the second weight over at least two attributes can be expressed by the following Equation (1).

Figure 112015099238326-pat00001

In Equation (1) above, S is the set of attributes (a_1, a_2, ..., a_i, ..., a_n), n is the number of attributes selected from S, a_i is the i-th attribute, a_{i14} is the first weight given to the i-th attribute, and a_{i15} is the second weight given to the i-th attribute.
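The image for Equation (1) is not reproduced in this text. The sketch below assumes the usual definition of a geometric mean applied to the per-attribute sums, that is, the n-th root of the product of (first weight + second weight) over the n selected attributes. The function names and the 0.45 level are hypothetical, and the weights are taken from Table 4 of the experimental example for four of its attributes.

```python
import math
from itertools import combinations


def gm_of_weight_sums(attrs, w1: dict, w2: dict) -> float:
    """Geometric mean of (first weight + second weight) over the given attributes."""
    sums = [w1[a] + w2[a] for a in attrs]
    return math.prod(sums) ** (1.0 / len(sums))


def combos_at_or_above(w1: dict, w2: dict, size: int, level: float) -> list:
    """All attribute combinations of the given size whose GM meets the level."""
    return [c for c in combinations(sorted(w1), size)
            if gm_of_weight_sums(c, w1, w2) >= level]


# First and second weights from Table 4 for four attributes.
w1 = {"weather": 0.1, "casual": 0.2, "registered": 0.3, "count": 0.3}
w2 = {"weather": 0.1, "casual": 0.2, "registered": 0.2, "count": 0.2}

# Hypothetical level of 0.45: only pairs with a high combined weight survive.
print(combos_at_or_above(w1, w2, size=2, level=0.45))
```

Because a geometric mean is pulled down sharply by any small factor, a combination qualifies only when every attribute in it carries a substantial combined weight.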

As another example, the attribute extraction unit 220 may classify the plurality of attributes defined in the data set into at least one group based on the first weight and the second weight given to each attribute, and may determine at least one attribute belonging to at least one of those groups as an attribute to be subjected to data profiling.

Next, according to an embodiment of the present invention, the data profiling unit 230 performs data profiling only on the at least one attribute determined as an attribute to be subjected to data profiling.

Meanwhile, according to an embodiment of the present invention, the database 240 may store raw data, data sets, statistical values calculated for the attributes defined in a data set, weights given to the attributes defined in a data set, attributes determined as targets of data profiling, data profiling results, and the like. Such a database 240 is a concept that includes a computer-readable recording medium, and may be a database in a broad sense, including not only a database in the narrow sense but also data records based on a file system.

Next, in accordance with an embodiment of the present invention, the communication unit 250 performs a function of allowing the data profiling system 200 to communicate with the user device 300 or the external server 400.

The control unit 260 according to an embodiment of the present invention controls the flow of data among the data set management unit 210, the attribute extraction unit 220, the data profiling unit 230, the database 240, and the communication unit 250. That is, the control unit 260 controls the flow of data from the outside or between the components of the data profiling system 200, so that the data set management unit 210, the attribute extraction unit 220, the data profiling unit 230, the database 240, and the communication unit 250 each perform their own functions.

Experimental Example

Hereinafter, experimental results of data profiling according to the statistical analysis-based data profiling method provided by the data profiling system 200 according to the present invention will be described.

In this experiment, the "Bike Sharing Demand" data set registered on Kaggle was used, and a data quality efficiency measure (DQEM) was calculated to evaluate the performance of data profiling. Here, the data quality efficiency measure can be expressed by the following Equation (2).

Figure 112015099238326-pat00002

In Equation (2), S is the product of the total number of attributes and the number of records (i.e., the total number of data included in the data set), and m is the product of the number of attributes and the number of records subject to data profiling.
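The image for Equation (2) is not reproduced in this text. One form consistent with the definitions of S and m above, and with the 87.5%, 56.25%, and 0% values reported for this experiment, is the share of data values that do not have to be profiled. This is an inferred reading, not a transcription of the original equation:

```latex
\mathrm{DQEM} = \frac{S - m}{S} \times 100\ (\%)
```

Under this reading the number of records cancels whenever every profiled attribute has the same number of records, so the measure depends only on the fraction of attributes profiled.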

In this experiment, the statistical analysis-based data profiling method according to the present invention calculated statistical values indicating a high probability of error occurrence for seven of the 16 attributes defined in the data set, and a first weight or a second weight was given to those seven attributes according to predetermined conditions.

[Table 3]

| Serial number | Attribute name | Statistics related to the first weight | Statistics related to the second weight |
|---|---|---|---|
| 1 | Weather | Bonferroni p: 0 | Bonferroni p: 0 |
| 2 | Temperature (temp) | Bonferroni p: 0 | Bonferroni p: 0 |
| 3 | Discomfort Index (atemp) | Bonferroni p: 0 | Bonferroni p: 0 |
| 4 | Windspeed | Bonferroni p: 0 | Bonferroni p: 0 |
| 5 | Casual | Bonferroni p: 0; Missing value (NA): 6,493 | Bonferroni p: 0; Missing value (NA): 37.36% |
| 6 | Registered rental (registered) | Bonferroni p: 0; Standard deviation (sd): 151.039; Missing value (NA): 6,493 | Bonferroni p: 0; -; Missing value (NA): 37.36% |
| 7 | Number of rental (count) | Bonferroni p: 0; Standard deviation (sd): 181.144; Missing value (NA): 6,493 | Bonferroni p: 0; -; Missing value (NA): 37.36% |

[Table 4]

| Serial number | Attribute name | First weight | Second weight |
|---|---|---|---|
| 1 | Weather | 0.1 | 0.1 |
| 2 | Temperature (temp) | 0.1 | 0.1 |
| 3 | Discomfort Index (atemp) | 0.1 | 0.1 |
| 4 | Windspeed | 0.1 | 0.1 |
| 5 | Casual | 0.2 | 0.2 |
| 6 | Registered rental (registered) | 0.3 | 0.2 |
| 7 | Number of rental (count) | 0.3 | 0.2 |

Referring to Table 3 and Table 4, it can be confirmed that, among the 16 attributes defined in the data set, a first weight or a second weight is given to seven attributes (weather, temperature, discomfort index, windspeed, casual rental, registered rental, and number of rentals), indicating a high probability of error occurrence in those attributes.

In this experiment, (i) when data profiling is performed on only the two attributes having a first weight of 0.3 or more among the 16 attributes defined in the data set, the data quality efficiency measure (DQEM) was calculated to be 87.5%, and (ii) when data profiling is performed on only the seven attributes having a first weight of 0.1 or more among the 16 attributes, the DQEM was calculated to be 56.25%. These values are significantly higher than the DQEM (0%) calculated for the prior art, which performs data profiling on all 16 attributes defined in the data set.

Therefore, according to the present invention, it is confirmed that the efficiency of data profiling can be remarkably improved. In addition, according to the present invention, the reliability can be enhanced as compared with the prior art in which data profiling is performed on data included in an arbitrarily selected attribute in accordance with subjective judgment of an administrator.

The embodiments of the present invention described above can be implemented in the form of program instructions that can be executed through various computer components and recorded in a non-transitory computer-readable recording medium. The non-transitory computer-readable recording medium may include program instructions, data files, data structures, and the like, either alone or in combination. The program instructions recorded on the non-transitory computer-readable recording medium may be those specially designed and constructed for the present invention or may be those known and available to those skilled in the computer software arts. Examples of non-transitory computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules for performing the processing according to the present invention, and vice versa.

While the present invention has been shown and described with reference to specific details such as particular components, exemplary embodiments, and drawings, it is to be understood that the invention is not limited to the disclosed embodiments. Those skilled in the art will appreciate that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Therefore, the spirit of the present invention should not be construed as being limited to the above-described embodiments, and not only the following claims but also all equivalents or equivalent modifications thereof fall within the scope of the spirit of the present invention.

100: Communication network
200: Data profiling system
210: Data set management unit
220: Attribute extraction unit
221: Statistical value calculation unit
222: Weight assignment unit
223: Target attribute determination unit
230: Data profiling unit
240: Database
250: Communication unit
260: Control unit
300: User device
400: External server

Claims (10)

CLAIMS 1. A method for providing statistical analysis based data profiling,
Calculating at least one statistical value for each attribute based on data contained in each attribute defined in the data set,
Determining a weight assigned to each attribute with reference to the calculated at least one statistical value, and
Determining at least one attribute whose weight is equal to or higher than a predetermined level among attributes defined in the data set as an attribute to be subjected to data profiling
Lt; / RTI >
In the weight determination step,
And if at least one statistical value calculated with respect to the first attribute satisfies a predetermined criterion, a predetermined weight is given to the first attribute.
The method according to claim 1,
Wherein the at least one statistical value includes at least one of a missing value, a minimum value, a maximum value, a mode value, an average value, a variance, a standard deviation, a five value summation, an outlier, and a Near Zero Variance .
3. (Deleted)
4. The method according to claim 1, wherein the weight is determined to be higher as the probability that an error occurs in the data included in the attribute is greater.
5. The method according to claim 1, wherein the weight includes at least one of a first weight and a second weight that are determined independently of each other.
6. The method according to claim 5, wherein, in the attribute determination step, with reference to a geometric mean of the sums of the first weight and the second weight across at least two attributes, the at least two attributes forming a combination whose geometric mean is equal to or higher than a predetermined level are determined as attributes to be subjected to data profiling.
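One possible reading of the geometric-mean selection in claim 6 is sketched below. The claim wording is ambiguous, so the interpretation is an assumption: each attribute's first and second weights are summed, and the geometric mean of those sums is taken over the attributes in a combination. The function name, the weight values, and the level are hypothetical.

```python
from itertools import combinations
from math import prod


def select_attribute_combinations(weights, level, size=2):
    """Return attribute combinations whose geometric mean reaches `level`.

    `weights` maps attribute name -> (first_weight, second_weight).
    Assumed reading of claim 6: sum the two weights per attribute, then
    take the geometric mean of those sums over each combination.
    """
    combined = {name: w1 + w2 for name, (w1, w2) in weights.items()}
    selected = []
    for combo in combinations(combined, size):
        geo_mean = prod(combined[a] for a in combo) ** (1 / size)
        if geo_mean >= level:
            selected.append(combo)
    return selected


weights = {"age": (2.0, 2.0), "zip": (1.0, 0.0), "name": (4.0, 0.0)}
result = select_attribute_combinations(weights, level=3.0)
```

Here the summed weights are 4, 1, and 4, so only the pair (`age`, `name`) has a geometric mean (4.0) at or above the level of 3.0. The other two pairs each have a geometric mean of 2.0.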
7. The method according to claim 1, wherein, in the attribute determination step, the at least one attribute to be subjected to data profiling is determined with further reference to business rules applied to the data set.
8. The method according to claim 1, further comprising performing data profiling on the data set only for the data included in the determined at least one attribute.
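Restricting profiling to the determined attributes, as in claim 8, could look like the following minimal sketch. The contents of the profile (row count, missing count, distinct count, most frequent values) are assumptions for illustration, since the claim does not specify what the profiling step computes. The function name is hypothetical.

```python
from collections import Counter


def profile_selected(data_set, target_attributes):
    """Profile only the target attributes; other columns are never scanned."""
    report = {}
    for name in target_attributes:
        column = data_set[name]
        present = [v for v in column if v is not None]
        report[name] = {
            "rows": len(column),
            "missing": len(column) - len(present),
            "distinct": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    return report


data_set = {
    "age": [34, 51, None, None, 29],
    "country": ["KR", "KR", "US", "KR", "JP"],
}
report = profile_selected(data_set, ["age"])
```

Because only `age` is passed as a target attribute, the `country` column is skipped entirely. Skipping unselected columns is what makes the earlier attribute selection reduce the cost of profiling.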
9. A non-transitory computer-readable recording medium having recorded thereon a computer program for carrying out the method according to any one of claims 1, 2, and 4 to 8.
10. A system for providing statistical analysis-based data profiling, the system comprising:
a statistical value calculating unit for calculating at least one statistical value for each attribute defined in a data set, based on the data included in that attribute;
a weighting unit for determining a weight to be given to each attribute with reference to the calculated at least one statistical value; and
a target attribute determination unit for determining, as an attribute to be subjected to data profiling, at least one attribute whose weight is equal to or higher than a predetermined level among the attributes defined in the data set,
wherein the weighting unit gives a predetermined weight to a first attribute if at least one statistical value calculated with respect to the first attribute satisfies a predetermined criterion.
KR1020150143390A 2015-06-04 2015-10-14 Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis KR101632073B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2016/005920 WO2016195421A1 (en) 2015-06-04 2016-06-03 Method, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020150079056 2015-06-04
KR20150079056 2015-06-04

Publications (1)

Publication Number Publication Date
KR101632073B1 (en) 2016-06-20

Family

ID=56354579

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150143390A KR101632073B1 (en) 2015-06-04 2015-10-14 Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis

Country Status (2)

Country Link
KR (1) KR101632073B1 (en)
WO (1) WO2016195421A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110389295B (en) * 2019-06-14 2022-03-25 福建省福联集成电路有限公司 VBA language-based electrical data processing method and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150015029A (en) * 2008-10-23 2015-02-09 아브 이니티오 테크놀로지 엘엘시 A method, a system, and a computer-readable medium storing a computer program for performing a data operation, measuring data quality, or joining data elements

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101033179B1 (en) * 2003-09-15 2011-05-11 아브 이니티오 테크놀로지 엘엘시 Data profiling
US8869208B2 (en) * 2011-10-30 2014-10-21 Google Inc. Computing similarity between media programs
KR101530848B1 (en) * 2012-09-20 2015-06-24 국립대학법인 울산과학기술대학교 산학협력단 Apparatus and method for quality control using datamining in manufacturing process
KR101448228B1 (en) * 2013-02-12 2014-10-10 이주양 Apparatus and Method for social data analysis

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210085886A (en) * 2019-12-31 2021-07-08 가톨릭관동대학교산학협력단 Data profiling method and data profiling system using attribute value quality index
KR102365910B1 (en) * 2019-12-31 2022-02-22 가톨릭관동대학교산학협력단 Data profiling method and data profiling system using attribute value quality index
KR102240496B1 (en) * 2020-04-17 2021-04-15 주식회사 한국정보기술단 Data quality management system and method

Also Published As

Publication number Publication date
WO2016195421A1 (en) 2016-12-08

Similar Documents

Publication Publication Date Title
JP5917719B2 (en) Method, apparatus and computer readable recording medium for image management in an image database
CA2985028C (en) Gating decision system and methods for determining whether to allow material implications to result from online activities
CN109271420B (en) Information pushing method, device, computer equipment and storage medium
US9836517B2 (en) Systems and methods for mapping and routing based on clustering
US20140317756A1 (en) Anonymization apparatus, anonymization method, and computer program
CN109522190B (en) Abnormal user behavior identification method and device, electronic equipment and storage medium
KR101632073B1 (en) Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis
WO2020211146A1 (en) Identifier association method and device, and electronic apparatus
CN108763956A (en) A kind of stream data difference secret protection dissemination method based on fractal dimension
CN108470195A (en) Video identity management method and device
CN110503566B (en) Wind control model building method and device, computer equipment and storage medium
KR101163196B1 (en) Method of managing customized social network map in application server providing customized content and computer readable medium thereof
CN105376223A (en) Network identity relationship reliability calculation method
CN112101692B (en) Identification method and device for mobile internet bad quality users
US20150302302A1 (en) Method and device for predicting number of suicides using social information
CN106961441B (en) User dynamic access control method for Hadoop cloud platform
Cai et al. Tropical cyclone risk assessment for China at the provincial level based on clustering analysis
JP5847122B2 (en) Evaluation apparatus, information providing system, evaluation method, and evaluation program
KR101959213B1 (en) Method for predicting cyber incident and Apparatus thereof
Cheng et al. Toward quantitative measures for the semantic quality of polygon generalization
Basik et al. Slim: Scalable linkage of mobility data
CN111460796A (en) Accidental sensitive word discovery method based on word network
KR102387284B1 (en) Apparatus and method for forecasting heatwave Impact considering severity of health impacts and socio-economic vulnerability
JP5665685B2 (en) Importance determination device, importance determination method, and program
JP6142617B2 (en) Information processing apparatus, information processing method, and information processing program

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190613

Year of fee payment: 4