CN110442620A - Big data exploration and cognition method, device, equipment and computer storage medium - Google Patents

Info

Publication number
CN110442620A
Authority
CN
China
Prior art keywords
data
field
information
task
exploration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910718388.1A
Other languages
Chinese (zh)
Other versions
CN110442620B (en)
Inventor
赵玉德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Letter Telepresence Polytron Technologies Inc
Original Assignee
Zhejiang Letter Telepresence Polytron Technologies Inc
Application filed by Zhejiang Letter Telepresence Polytron Technologies Inc
Priority to CN201910718388.1A
Publication of CN110442620A
Application granted
Publication of CN110442620B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the technical field of data processing and analysis, and discloses a big data exploration and cognition method, device, equipment and computer storage medium. In big data application projects, ETL development, data processing, cleaning, integration, analysis modeling, and the like, when mass data from various data sources must be handled, the invention explores the data to obtain overview information covering every data table and even every field, and automatically presents this overview information in visual views and tables, so that the user gains a comprehensive understanding of the data. At present, many data warehouse, ETL, BI, data mining, machine learning, and big data analysis projects end unsatisfactorily or in failure, largely because the full data set was understood insufficiently, with bias, or by taking a part for the whole at the outset. The invention therefore helps guide users to plan and develop such projects in a targeted manner, and to avoid such outcomes as far as possible.

Description

Big data exploration and cognition method, device, equipment and computer storage medium
Technical field
The invention belongs to the technical field of data processing and analysis, and in particular relates to a big data exploration and cognition method, device, equipment and computer storage medium.
Background
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight discovery, and process optimization capability. In recent years, with the rapid development of Internet, computer, and database technologies, large amounts of data have appeared in daily work and life, and this data keeps growing at a geometric rate. As a result, the amount of information grows faster than people can comprehend it and floods into human life from all directions. Faced with ever-growing mass data, it is currently very difficult to manually sort through it and mine the useful or valuable information hidden in it. Consequently, many data warehouse, ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination), BI (Business Intelligence, a complete set of solutions that effectively integrates an organization's existing data, delivers reports quickly and accurately, and provides a basis for decisions, helping the organization make sound business decisions), data mining, machine learning, and big data analysis projects often end unsatisfactorily or in failure. One of the main reasons is that the full set of available data was understood insufficiently, with bias, or by taking a part for the whole at the outset.
Take feature engineering as an example. Feature engineering means selecting a representative set of features for building a machine learning model. It is an extremely important problem and arguably the most time-consuming step in a data analysis project. Its purpose is to narrow the range as far as possible when a large number of features (variables) are available, selecting the features that are most valuable, or most influential, for the analysis target. In the real world, however, data is usually complex, redundant, and incomplete, which results in poor data quality. The raw data therefore has to be processed and screened, but manual screening is not only time-consuming and laborious, it also depends heavily on manpower and professional knowledge.
Summary of the invention
To solve the current difficulty of sorting out or mining the useful or valuable data hidden in mass data, the present invention aims to provide a big data exploration and cognition method, device, equipment and computer storage medium.
The technical solution adopted by the invention is as follows:
A big data exploration and cognition method, comprising the following steps:
S101. Data source creation and management: create the target data source to be explored and understood, including the data source type, the IP address of the database server, the login user name, the login password, and the database name;
S102. Data set creation and management: based on the created data source, create a data set name to contain the target data tables and fields to be explored and understood;
S103. Data set configuration: based on the created data set, the user freely selects and specifies data tables and fields in the corresponding database;
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task;
S105. Exploration task execution: sort the multiple exploration tasks selected by the user, then dispatch and execute them in sequence;
S106. Automatically output and visually present the overview information exploration results obtained by executing the exploration tasks.
Preferably, in step S104, selecting all or some of the exploration functions means configuring any combination of the following exploration tasks: a table basic information exploration task, a field value distribution information exploration task, a field feature information exploration task, and an inter-field hierarchical relationship information exploration task;
In step S105, an exploration task is executed as follows:
(A) if the exploration task is a table basic information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following basic information exploration results for all target data tables: table name, creation time, modification time, total number of records, and/or total number of fields;
(B) if the exploration task is a field value distribution information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following value distribution information exploration results for all target fields: each distinct non-null value and the number of occurrences of each distinct non-null value;
(C) if the exploration task is a field feature information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following feature information exploration results for all target fields: data type, null rate, number of distinct values, value density, storage length, actual length, minimum value, maximum value, and/or whether the field values are unique;
(D) if the exploration task is an inter-field hierarchical relationship information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following hierarchical relationship information exploration results between any two target fields: one-to-one relationship, one-to-many relationship, and/or many-to-one relationship.
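Several of the mode (C) features can be gathered with one aggregate query. The following is a minimal sketch only: it uses Python with the standard-library sqlite3 module as a stand-in for the target database server (the source describes access programs written in Java and SQL), and the function name `field_features`, the sample table `guests`, and the rule that uniqueness is judged over non-null values are assumptions made for illustration.

```python
import sqlite3

def field_features(conn, table, field):
    """Collect a few of the mode (C) features for one field with one query."""
    # Identifiers cannot be bound as parameters, so they are quoted inline;
    # a real tool should first validate them against the database catalog.
    sql = (
        f'SELECT COUNT("{field}"), COUNT(DISTINCT "{field}"), '
        f'MIN("{field}"), MAX("{field}"), MAX(LENGTH("{field}")) '
        f'FROM "{table}"'
    )
    non_null, distinct, lowest, highest, max_len = conn.execute(sql).fetchone()
    return {
        "distinct_values": distinct,
        "min": lowest,
        "max": highest,
        "max_actual_length": max_len,
        # Assumption: a field counts as unique when no non-null value repeats.
        "is_unique": non_null > 0 and non_null == distinct,
    }

# Hypothetical sample data, used only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guests (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO guests VALUES (?, ?)",
    [(1, "Hangzhou"), (2, "Hangzhou"), (3, None), (4, "Ningbo")],
)
city = field_features(conn, "guests", "city")
guest_id = field_features(conn, "guests", "id")
```

Features such as the declared data type and storage length come from the database catalog rather than from the rows, and how to read them differs between database products, so they are left out of this sketch.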
Further preferably, in mode (C), the access program queries the null rate $R_{null}$ and/or the value density $D_{distinct}$ of a target field according to the following formulas:

$$R_{null} = \frac{n_{null}}{N}, \qquad D_{distinct} = \frac{C_{distinct}}{N}$$

where $n_{null}$ is the total number of null values in the target field, $C_{distinct}$ is the number of distinct non-null values in the target field, and $N$ is the total number of records in the data table to which the target field belongs.
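As an illustration of how these two quantities could be queried, the sketch below assumes that the null rate is the ratio of n_null to N and that the value density is the ratio of C_distinct to N, which is what the symbol definitions suggest. It is written in Python against an in-memory SQLite database rather than in Java against a production server; the function name `null_rate_and_density` and the sample table are hypothetical.

```python
import sqlite3

def null_rate_and_density(conn, table, field):
    """Return (null rate, value density) for one field as ratios in [0, 1]."""
    # One pass over the table yields N, n_null and C_distinct.
    # COUNT(DISTINCT x) ignores NULLs, so it counts distinct non-null values.
    sql = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{field}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{field}") '
        f'FROM "{table}"'
    )
    total, nulls, distinct = conn.execute(sql).fetchone()
    if total == 0:
        return 0.0, 0.0  # assumption: an empty table reports zero for both
    return nulls / total, distinct / total

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guests (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO guests VALUES (?, ?)",
    [(1, "Hangzhou"), (2, "Hangzhou"), (3, None), (4, "Ningbo")],
)
r_null, d_distinct = null_rate_and_density(conn, "guests", "city")
```

With four records, one null, and two distinct cities, the `city` field has a null rate of 0.25 and a value density of 0.5, while the `id` field has a value density of 1.0, the signature of a unique field.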
Further preferably, in mode (D), the access program queries the hierarchical relationship information between a first target field and a second target field according to the following steps:
DS101. Check whether any non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, execute step DS102; if so, execute step DS103;
DS102. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-one relationship; otherwise, determine that it is a many-to-one relationship;
DS103. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-many relationship; otherwise, determine that it is a many-to-many relationship.
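Steps DS101 to DS103 amount to two symmetric checks followed by a four-way decision. A minimal sketch under stated assumptions: Python with sqlite3 stands in for the Java and SQL access program, both fields are taken to live in the same table, and the helper names `_fans_out` and `relationship` and the sample table `places` are hypothetical.

```python
import sqlite3

def _fans_out(conn, table, src, dst):
    """True if some non-null src value maps to zero or several non-null dst values."""
    sql = (
        f'SELECT 1 FROM "{table}" WHERE "{src}" IS NOT NULL '
        f'GROUP BY "{src}" HAVING COUNT(DISTINCT "{dst}") <> 1 LIMIT 1'
    )
    return conn.execute(sql).fetchone() is not None

def relationship(conn, table, first, second):
    first_fans_out = _fans_out(conn, table, first, second)   # check of DS101
    second_fans_out = _fans_out(conn, table, second, first)  # check of DS102/DS103
    if not first_fans_out:                                   # DS102 branch
        return "many-to-one" if second_fans_out else "one-to-one"
    return "many-to-many" if second_fans_out else "one-to-many"  # DS103 branch

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER, city TEXT, province TEXT, season TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?)",
    [
        (1, "Hangzhou", "Zhejiang", "spring"),
        (2, "Ningbo", "Zhejiang", "summer"),
        (3, "Nanjing", "Jiangsu", "spring"),
    ],
)
city_to_province = relationship(conn, "places", "city", "province")
province_to_city = relationship(conn, "places", "province", "city")
```

Here several cities share one province, so `city` to `province` is many-to-one and the reverse direction is one-to-many. Running both checks up front costs one extra query in the one-to-one case but keeps the decision table easy to read.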
Further preferably, in step S106:
if the obtained overview information exploration results include a basic information exploration result, the basic information exploration result is presented in table form;
if the obtained overview information exploration results include a value distribution information exploration result, the value distribution information exploration result is presented in table form or bar chart form;
if the obtained overview information exploration results include a feature information exploration result, the feature information exploration result is presented in table form or bar chart form;
if the obtained overview information exploration results include a hierarchical relationship information exploration result, the hierarchical relationship information exploration result is presented in table form or tree diagram form.
More specifically, in step S106, when a value distribution information exploration result is presented in bar chart form, the M distinct non-null values with the highest number of occurrences in the selected field are shown in the bar chart, where M is a natural number between 10 and 100.
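The data behind such a bar chart is a frequency query limited to the top M values. The sketch below shows one way to obtain it; it is an illustration in Python with sqlite3, not the patented Java implementation, and the function name `top_values`, the tie-breaking order, and the sample table `bookings` are assumptions.

```python
import sqlite3

def top_values(conn, table, field, m=10):
    """Return the m most frequent distinct non-null values as (value, count) pairs."""
    if not 10 <= m <= 100:
        raise ValueError("M is expected to be a natural number between 10 and 100")
    sql = (
        f'SELECT "{field}", COUNT(*) AS freq FROM "{table}" '
        f'WHERE "{field}" IS NOT NULL GROUP BY "{field}" '
        f'ORDER BY freq DESC, "{field}" LIMIT ?'  # ties broken by value (assumption)
    )
    return conn.execute(sql, (m,)).fetchall()

# Hypothetical sample data: city00 occurs once, city01 twice, ..., city11 twelve times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (city TEXT)")
rows = [(f"city{i:02d}",) for i in range(12) for _ in range(i + 1)]
rows.append((None,))  # null values are excluded from the distribution
conn.executemany("INSERT INTO bookings VALUES (?)", rows)
top10 = top_values(conn, "bookings", "city", m=10)
```

The returned pairs can be passed directly to any charting library, with values on one axis and occurrence counts on the other.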
Preferably, in step S106, the overview information exploration results that are output and visualized correspond to the data table selected by the user, or to the data table and field selected by the user.
Another technical solution adopted by the invention is as follows:
A big data exploration and cognition device, comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module, and an exploration result visualization module;
the data source creation and management module is used for data source creation and management: creating the target data source to be explored and understood, including the data source type, the IP address of the database server, the login user name, the login password, and the database name;
the data set creation and management module is communicatively connected to the data source creation and management module and is used for data set creation and management: based on the created data source, creating a data set name to contain the target data tables and fields to be explored and understood;
the data table and field configuration module is communicatively connected to the data set creation and management module and is used for data set configuration: based on the created data set, the user freely selects and specifies data tables and fields in the corresponding database;
the exploration task configuration module is communicatively connected to the data table and field configuration module and is used for exploration task creation and management: based on the created and configured data set, selecting all or some of the exploration functions to form a specific exploration task;
the exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: sorting the multiple exploration tasks selected by the user, then dispatching and executing them in sequence;
the exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically outputting and visually presenting the overview information exploration results obtained by executing the exploration tasks.
Another technical solution adopted by the invention is as follows:
Big data exploration and cognition equipment, comprising a memory and a processor that are communicatively connected, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the big data exploration and cognition method described above.
Another technical solution adopted by the invention is as follows:
A computer storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the big data exploration and cognition method described above.
The beneficial effects of the invention are as follows:
(1) The invention provides a general method, device, equipment and computer storage medium that suits all industries and makes it convenient to summarize all kinds of data. In big data application projects, ETL development, data processing, cleaning, integration, analysis modeling, and the like, when facing mass data from various data sources, the data source and data set are created first; exploration targets and exploration tasks are then flexibly configured for the target data set, and in-depth exploration is carried out to obtain overview information, including the basic and statistical information of each data table and the content distribution, statistics, profile, hierarchical relationships, null rate, and value density of each field. This overview information is recorded, stored in a local database on the client, and presented automatically in visual views and tables, so that the user gains a comprehensive understanding of the data and avoids understanding it insufficiently, with bias, or by taking a part for the whole. This helps users plan and build data warehouse, ETL, BI, data mining, machine learning, and big data analysis projects in a targeted manner, and safeguards the normal progress of these projects;
(2) Visualizing the data exploration results lets the user see at a glance the quality of each field (that is, each feature variable) and the associations and hierarchical relationships between fields. This helps eliminate irrelevant or redundant features, thereby reducing the number of features, improving the accuracy of the machine learning model, and shortening run time. For example, if a field's null rate is too high (that is, its quality is poor), or its value density reaches 100% (for example, a unique field) or is close to 0 (for example, a constant-valued field), the field is unsuitable as a feature variable. As another example, if exploration shows that field A has a one-to-many relationship with field B, a hierarchy of A and B can be constructed in multidimensional online analysis; and in feature engineering, if field B has been selected as a feature, field A is usually not selected as well, which further reduces the number of features;
(3) The present embodiment makes no assumptions about the business meaning or business domain of the explored data, so it can serve as a general-purpose product for all kinds of data across all industries, and no similarly systematic, general-purpose exploration and cognition technology currently exists on the domestic or international market. In addition, the present embodiment can support all relational databases, including MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL, Teradata, and so on;
(4) The big data exploration and cognition method also offers automated exploration and diversified presentation of results, which facilitates practical adoption and use.
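The screening rule mentioned in effect (2) can be expressed as a small predicate over the null rate and value density of a field. This is a hedged sketch: the source gives no numeric cut-offs, so the function name `is_poor_feature` and the thresholds `max_null_rate` and `min_density` are illustrative assumptions that would need tuning per project.

```python
def is_poor_feature(null_rate, value_density, max_null_rate=0.5, min_density=0.01):
    """Flag a field as unsuitable for use as a feature variable.

    Both inputs are ratios in [0, 1]. The default thresholds are illustrative
    assumptions and do not come from the source.
    """
    if null_rate > max_null_rate:
        return True   # too many missing values: poor data quality
    if value_density >= 1.0:
        return True   # every record has its own value, e.g. a unique key
    return value_density <= min_density  # nearly constant field

# Hypothetical exploration results: field name -> (null rate, value density).
explored = {
    "booking_id": (0.0, 1.0),
    "country": (0.0, 0.001),
    "fax_number": (0.9, 0.05),
    "room_type": (0.02, 0.2),
}
candidates = [name for name, (r, d) in explored.items() if not is_poor_feature(r, d)]
```

Only `room_type` survives this example screen; the other three fields are dropped for being unique, nearly constant, or mostly null.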
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the big data exploration and cognition method provided by the invention.
Fig. 2 is an example of presenting a basic information exploration result in table form.
Fig. 3 is an example of presenting a value distribution information exploration result in table form.
Fig. 4 is an example of presenting a value distribution information exploration result in bar chart form.
Fig. 5 is an example of presenting a feature information exploration result in table form.
Fig. 6 is an example of presenting the null rate from a feature information exploration result in bar chart form.
Fig. 7 is an example of presenting the value density from a feature information exploration result in bar chart form.
Fig. 8 is an example of presenting the space waste statistics from a feature information exploration result in bar chart form.
Fig. 9 is an example of presenting a hierarchical relationship information exploration result in table form.
Fig. 10 is an example of presenting a one-to-many hierarchical relationship information exploration result in tree diagram form.
Fig. 11 is an example of presenting a one-to-one hierarchical relationship information exploration result in tree diagram form.
Fig. 12 is a structural diagram of the big data exploration and cognition device provided by the invention.
Fig. 13 is a structural diagram of the big data exploration and cognition equipment provided by the invention.
Fig. 14 is a diagram of an application scenario of the big data exploration and cognition device provided by the invention.
In the drawings: 1 - big data exploration and cognition device; 201 - CRM database server; 202 - ERP database server; 203 - SCM database server; 204 - Legacy database server; 205 - External database server.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that the description of these embodiments is intended to aid understanding of the invention and does not limit it. The specific structural and functional details disclosed herein are used only to describe example embodiments of the invention. The invention may, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.
It should be understood that some of the processes described herein contain multiple operations that appear in a particular order, but these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as S101 and S102 serve only to distinguish the operations from one another; the numbers themselves do not imply any execution order. In addition, these processes may include more or fewer operations, and the operations may be executed sequentially or in parallel.
It should be understood that although the terms first, second, and so on may be used herein to describe various units, these units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, a first unit could be called a second unit, and similarly a second unit could be called a first unit, without departing from the scope of the example embodiments of the invention.
It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, B alone, or both A and B. The term "/and" herein describes another association and indicates that two relationships are possible; for example, "A/and B" can mean: A alone, or both A and B. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that when a unit is referred to as being "connected", "linked", or "coupled" to another unit, it may be directly connected or coupled to the other unit, or intermediate units may be present. In contrast, when a unit is referred to as being "directly connected" or "directly coupled" to another unit, no intermediate units are present. Other words used to describe relationships between units should be interpreted in the same way (for example, "between" versus "directly between", and "adjacent" versus "directly adjacent").
The terms used herein serve only to describe specific embodiments and are not intended to limit the example embodiments of the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises", "comprising", "includes", and/or "including", when used herein, specify the presence of stated features, integers, steps, operations, units, and/or components, but do not preclude the presence or addition of one or more other features, quantities, steps, operations, units, components, and/or combinations thereof.
It should further be noted that, in some alternative embodiments, the functions or acts may occur in an order different from that shown in the drawings. For example, depending on the functions or acts involved, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order.
Specific details are provided in the following description so that the example embodiments can be fully understood. A person of ordinary skill in the art will nevertheless understand that the example embodiments can be practiced without these specific details. For example, systems may be shown in block diagrams to avoid obscuring the examples with unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail to avoid obscuring the example embodiments.
Embodiment one
As shown in Figs. 1 to 11, the big data exploration and cognition method provided in this embodiment may include, but is not limited to, the following steps S101 to S106.
S101. Data source creation and management: create the target data source to be explored and understood, including but not limited to the data source type, the IP address of the database server, the login user name, the login password, and the database name.
In step S101, the data source is the origin of the data set and may hold data from various business domains, such as CRM (Customer Relationship Management), ERP (Enterprise Resource Planning), e-commerce, SCM (Supply Chain Management), and legacy and external data. The data may be stored in various databases, such as MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL, and Teradata; this embodiment supports all relational databases. Before exploration, a data source must first be created to specify the data source information. The data source information includes everything needed to connect to the data source successfully, such as the IP address or name of the database server, the type, the database name, the login user name, and the login password. Complete and valid data source information must therefore include: the data source name, the data source type (such as Oracle, MySQL, DB2, Microsoft SQL Server, Microsoft Access, Sybase, Hive, PostgreSQL, or Teradata), the IP address or name of the database server, the login user name, the login password, and the corresponding database name. Specifically, the data source is preferably created on a human-computer interaction interface to generate the data source information, which is then added to a data source list. The following data source management can subsequently be performed: adding data source information to, or deleting it from, the data source list. Once created, data source information cannot be modified; the only available operations are testing the validity of the connection and deleting the data source.
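The data source record and list described for step S101 can be modeled as an immutable value object plus a small registry. The sketch below is one possible shape in Python, not the patented implementation; the class names `DataSourceInfo` and `DataSourceList`, the attribute names, and the sample connection values are hypothetical, and connection testing is omitted because it depends on the database driver in use.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)  # frozen: the information cannot be changed once created
class DataSourceInfo:
    name: str
    source_type: str
    host: str        # IP address or name of the database server
    username: str
    password: str
    database: str

    def __post_init__(self):
        # Complete and valid information requires every item to be present.
        missing = [key for key, value in vars(self).items() if not value]
        if missing:
            raise ValueError(f"incomplete data source information: {missing}")

class DataSourceList:
    def __init__(self):
        self._sources = {}

    def add(self, info):
        if info.name in self._sources:
            raise ValueError(f"data source {info.name!r} already exists")
        self._sources[info.name] = info

    def delete(self, name):
        del self._sources[name]

    def names(self):
        return sorted(self._sources)

sources = DataSourceList()
crm = DataSourceInfo("crm", "MySQL", "192.0.2.10", "explorer", "secret", "crm_db")
sources.add(crm)
try:
    crm.host = "192.0.2.11"  # any attempt to edit an existing record is rejected
    modified = True
except FrozenInstanceError:
    modified = False
```

Making the record frozen mirrors the rule that a data source can only be tested or deleted after creation; changing a connection detail means deleting the entry and creating a new one.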
S102. Data set creation and management: based on the created data source, create a data set name to contain the target data tables and fields to be explored and understood.
In step S102, a data set is a collection of data tables and fields obtained from a data source that can be connected successfully. Each data set has its own description information and a corresponding data source that can be connected successfully, so data sets can be created and managed according to the data source. For example, a data set may contain five data tables built around a single theme: a user characteristics table, a hotel information table, a date table, a hotel reservation table, and a reservation transaction table. These tables are used to manage information such as user information and hotel prices in the hotel booking process; they are created independently, linked to one another, and saved in the data set. After creation, the data set can be configured and managed, and the time of the last configuration is recorded as the data set update time. Because each data set corresponds to a data source that can be connected successfully, after the configuration is created or updated, the names of all data tables in the database server, or the table names together with all field names in each table, can be retrieved from the database server of the corresponding data source to obtain the data set information.
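Retrieving all table names and the field names in each table is a catalog query. How the catalog is read differs between database products; the sketch below uses SQLite's `sqlite_master` table and `PRAGMA table_info` purely as a stand-in, and the function name `load_dataset_info` and the two sample tables are hypothetical.

```python
import sqlite3

def load_dataset_info(conn):
    """Map every table name in the connected database to its list of field names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }

# Hypothetical hotel booking schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (hotel_id INTEGER, name TEXT, price REAL)")
conn.execute("CREATE TABLE booking (booking_id INTEGER, hotel_id INTEGER, booked_on TEXT)")
dataset_info = load_dataset_info(conn)
```

On other relational databases the same information is typically exposed through a system catalog or information schema, so only this one function would need a per-product variant.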
S103. Data set configuration: based on the created data set, the user freely selects the data tables and fields to be explored in the corresponding database.
In step S103, the specified data tables and fields are the specific objects to be explored. All fields of a data table may be explored, or only some of them: by configuring the target data tables and the target fields within them, exploration can be restricted to a specific set of target fields. A data table (Table) is one of the most important components of a database (the database is only a framework; the data tables are its substantive content). A row in a table is called a "record"; it contains all the information in that row, much like all the information about one person in an address book data set. A record has no dedicated name in the data set and is usually identified by its row number. A column in a table is called a "field"; it contains information on one particular topic. In an address book data set, for example, "name" and "telephone number" are attributes shared by all rows of the table, so these columns are called the "name" field and the "telephone number" field. In addition, the selection in step S103 may be made by, but is not limited to, adding the data tables and/or fields to be explored to the data set, or deleting the data tables and/or fields that do not need to be explored.
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task.
In step S104, as an optimization, all or some of the exploration functions may be selected by, but not limited to, configuring any combination of the following exploration tasks: a table basic information exploration task, a field value distribution information exploration task, a field characteristic information exploration task, and an inter-field hierarchical relationship information exploration task. Each of these four tasks is a basic exploration unit custom-written in Java and SQL (Structured Query Language). Different exploration tasks can be configured as needed for different data sets, so that when the access program of a task is started, it accesses the target database server and queries the target information exploration results for the target data tables or fields.
S105. Execute exploration tasks: sort the multiple exploration tasks selected by the user, and dispatch and execute them in order.
Specifically, in step S105, an exploration task may be executed in, but is not limited to, the following modes (A) to (D).
(A) If the exploration task is a table basic information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query basic information exploration results for all target data tables, including but not limited to: table name, creation time, modification time, total number of records, and/or total number of fields.
In mode (A), the table name, creation time, modification time, total number of records, and total number of fields can be queried directly with SQL against the data tables and their metadata.
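As a non-limiting illustration of mode (A), the following minimal sketch queries the total number of records and the total number of fields of one table. It uses Python with SQLite instead of the Java access program described above, purely for brevity; the function name `explore_table_basics` and the result keys are hypothetical. Creation and modification times are left out because SQLite does not record them in its catalog.

```python
import sqlite3


def explore_table_basics(conn, table):
    """Return the table name, total record count, and total field count."""
    record_total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    field_total = len(conn.execute(f'PRAGMA table_info("{table}")').fetchall())
    return {"table_name": table, "record_total": record_total, "field_total": field_total}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (hotel_id INTEGER, city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO hotel_info VALUES (?, ?, ?)",
    [(1, "Hangzhou", 320.0), (2, "Ningbo", 280.0)],
)
basics = explore_table_basics(conn, "hotel_info")
print(basics)
```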
(B) If the exploration task is a field value distribution information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query value distribution information exploration results for all target fields, including but not limited to: each distinct non-null value and the occurrence frequency corresponding to each distinct non-null value.
In mode (B), the distinct non-null values of a target field can be obtained by database queries and comparisons using standard SQL together with conventional value matching in a Java program. Their occurrence frequency (which covers the occurrence count and/or the relative frequency) can likewise be obtained by database queries and statistics using standard SQL together with conventional counting in a Java program.
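As a non-limiting illustration of mode (B), the following minimal sketch counts each distinct non-null value of a field with a SQL `GROUP BY` query. Python with SQLite is used only to keep the example self-contained, and the function name `explore_value_distribution` is hypothetical. The relative frequency is computed here against the number of non-null records, which is an assumption; the text above does not state the denominator.

```python
import sqlite3


def explore_value_distribution(conn, table, field):
    """Return (value, occurrence count, relative frequency) per distinct non-null value."""
    # COUNT(column) counts only the rows in which the column is not NULL.
    non_null_total = conn.execute(
        f'SELECT COUNT("{field}") FROM "{table}"').fetchone()[0]
    rows = conn.execute(
        f'SELECT "{field}", COUNT(*) AS n FROM "{table}" '
        f'WHERE "{field}" IS NOT NULL '
        f'GROUP BY "{field}" ORDER BY n DESC, "{field}"'
    ).fetchall()
    return [(value, n, n / non_null_total) for value, n in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (city TEXT)")
conn.executemany(
    "INSERT INTO hotel_info VALUES (?)",
    [("Hangzhou",), ("Hangzhou",), ("Ningbo",), (None,), ("Hangzhou",)],
)
distribution = explore_value_distribution(conn, "hotel_info", "city")
print(distribution)  # [('Hangzhou', 3, 0.75), ('Ningbo', 1, 0.25)]
```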
(C) If the exploration task is a field characteristic information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query characteristic information exploration results for all target fields, including but not limited to: data type, null rate, number of distinct values, value density, storage space length, actual length, minimum value, maximum value, and/or whether the field values are unique.
In mode (C), the data type, storage space length, actual length, number of distinct values, minimum value, and maximum value of a target field, and whether its values are unique, can be obtained by data and metadata queries and processing using SQL and a Java program, while the null rate R_null and value density D_distinct of the target field can be obtained by an access program written according to the following formulas:
$$R_{null} = \frac{n_{null}}{N}, \qquad D_{distinct} = \frac{C_{distinct}}{N}$$
In the formulas, n_null is the total number of null values in the target field, C_distinct is the number of distinct non-null values in the target field, and N is the total number of records (that is, the total number of rows) in the target data table to which the target field belongs. It follows from the formulas that R_null and D_distinct each lie between 0 and 1. The higher the null rate, the more values are missing from the field and the lower its data quality; when the quality is low enough, the field must be ignored or its missing values processed, for example by substituting a default value. If the value density equals 0, the field consists entirely of null values; if the value density equals 1, the field contains no duplicate values. Both extremes make the field unsuitable as a feature variable for data mining or machine learning. The smaller the value density, the coarser the granularity of the field.
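As a non-limiting illustration of the null rate and value density in mode (C), the following minimal sketch assumes that the null rate is the number of null values divided by the total record count N, and that the value density is the number of distinct non-null values divided by N, which is consistent with the properties stated above (both lie between 0 and 1, a density of 0 means all values are null, and a density of 1 means no duplicates). Python with SQLite stands in for the Java access program, and the function name `explore_null_rate_and_density` is hypothetical.

```python
import sqlite3


def explore_null_rate_and_density(conn, table, field):
    """Return (null rate, value density) of a field, or None for an empty table."""
    # COUNT(*) counts all rows; COUNT(col) and COUNT(DISTINCT col) skip NULLs.
    n_total, n_non_null, c_distinct = conn.execute(
        f'SELECT COUNT(*), COUNT("{field}"), COUNT(DISTINCT "{field}") FROM "{table}"'
    ).fetchone()
    if n_total == 0:
        return None  # both ratios are undefined without records
    n_null = n_total - n_non_null
    return n_null / n_total, c_distinct / n_total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (star_level TEXT)")
conn.executemany(
    "INSERT INTO hotel_info VALUES (?)", [("a",), ("a",), (None,), ("b",)]
)
null_rate, value_density = explore_null_rate_and_density(conn, "hotel_info", "star_level")
print(null_rate, value_density)  # 0.25 0.5
```

In this example one of four records is null, and two distinct non-null values occur among the four records.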
(D) If the exploration task is an inter-field hierarchical relationship information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query hierarchical relationship information exploration results between any two target fields, including but not limited to: one-to-one relationship, one-to-many relationship, and/or many-to-one relationship.
In mode (D), the hierarchical relationship information between a first target field and a second target field may be obtained by, but is not limited to, an access program written according to the following steps. DS101. Check whether some non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, execute step DS102; if so, execute step DS103. DS102. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-one relationship; otherwise, determine that it is a many-to-one relationship. DS103. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-many relationship; otherwise, determine that it is a many-to-many relationship. In steps DS101 to DS103, the checks can be carried out using the GROUP BY clause of SQL, which is not described further here. The first target field and the second target field may be located in the same data table or in different data tables. Because a many-to-many relationship means that the relationship is undetermined, it need not be reported as a hierarchical relationship information exploration result.
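As a non-limiting illustration of steps DS101 to DS103, the following minimal sketch performs each check with a SQL `GROUP BY ... HAVING` query. It assumes that both fields are in the same table, and it reads the fallback outcome of DS102 as many-to-one and the first outcome of DS103 as one-to-many. Python with SQLite stands in for the Java access program; the function names `has_fanout` and `classify_relationship` are hypothetical.

```python
import sqlite3


def has_fanout(conn, table, src, dst):
    """True if some non-null src value maps to multiple or zero non-null dst values."""
    row = conn.execute(
        f'SELECT 1 FROM "{table}" WHERE "{src}" IS NOT NULL '
        f'GROUP BY "{src}" HAVING COUNT(DISTINCT "{dst}") <> 1 LIMIT 1'
    ).fetchone()
    return row is not None


def classify_relationship(conn, table, first, second):
    """Classify the relationship of the first field to the second field."""
    first_fans_out = has_fanout(conn, table, first, second)   # DS101
    second_fans_out = has_fanout(conn, table, second, first)  # DS102 or DS103
    if not first_fans_out:
        return "many-to-one" if second_fans_out else "one-to-one"   # DS102
    return "many-to-many" if second_fans_out else "one-to-many"     # DS103


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (hotel_id INTEGER, code TEXT, city TEXT, province TEXT)")
conn.executemany(
    "INSERT INTO hotel VALUES (?, ?, ?, ?)",
    [
        (1, "H1", "Hangzhou", "Zhejiang"),
        (2, "H2", "Ningbo", "Zhejiang"),
        (3, "H3", "Nanjing", "Jiangsu"),
    ],
)
print(classify_relationship(conn, "hotel", "hotel_id", "code"))   # one-to-one
print(classify_relationship(conn, "hotel", "city", "province"))   # many-to-one
print(classify_relationship(conn, "hotel", "province", "city"))   # one-to-many
```

For fields in different tables, the same checks could be run against a join of the two tables; that variant is not shown here.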
Modes (A) to (D) can be started with one click. After a task is executed, the exploration results are obtained and the task status information is updated accordingly. Once a task is executed, its start time, end time, and elapsed time are recorded automatically and stored, together with the task status, in the execution result summary. The task status may be one of the following: never run (the initial state of a newly created task), running (after the task starts and before it ends), run succeeded (the task executed to completion successfully), run failed (task execution failed, for example because the data source was closed or another unexpected condition occurred during execution), run completed (some tables or fields raised exceptions during execution, after which the task is shown as completed), and manually stopped (the user clicked the stop button to stop a running task). After a task runs successfully, the user can view the exploration result summary and the various display pages. In addition, after step S104, the user can sort the pending exploration tasks in the task list and then execute them in order: if multiple exploration tasks in the task list are selected and started, step S105 is performed for each of them one by one in their sorted order to obtain and save the corresponding overview information exploration results, thereby achieving ordered dispatch and execution of multiple tasks.
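As a non-limiting illustration of ordered task execution with status and timing records, the following minimal sketch runs a list of tasks one by one. All names are hypothetical. The rule used to tell "run completed" from "run failed" (some targets raised errors versus every target raised an error) is an assumption made for the example, and the manually stopped state is declared but not exercised.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    NEVER_RUN = "never run"
    RUNNING = "running"
    SUCCEEDED = "run succeeded"
    FAILED = "run failed"
    COMPLETED = "run completed"  # finished, but some tables or fields raised errors
    STOPPED = "manually stopped"


@dataclass
class ExplorationTask:
    name: str
    targets: dict  # hypothetical: target name -> zero-argument exploration callable
    status: TaskStatus = TaskStatus.NEVER_RUN
    results: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)
    started_at: Optional[float] = None
    finished_at: Optional[float] = None

    @property
    def elapsed(self):
        if self.started_at is None or self.finished_at is None:
            return None
        return self.finished_at - self.started_at


def run_in_order(tasks):
    """Execute the tasks one by one in list order, recording status and times."""
    for task in tasks:
        task.status = TaskStatus.RUNNING
        task.started_at = time.time()
        for target, explore in task.targets.items():
            try:
                task.results[target] = explore()
            except Exception as exc:  # keep going so other targets are still explored
                task.errors[target] = str(exc)
        task.finished_at = time.time()
        if not task.errors:
            task.status = TaskStatus.SUCCEEDED
        elif task.results:
            task.status = TaskStatus.COMPLETED
        else:
            task.status = TaskStatus.FAILED


executed = []


def ok(label):
    def run():
        executed.append(label)
        return label
    return run


def broken():
    raise RuntimeError("table not reachable")


tasks = [
    ExplorationTask("basic info", {"hotel_info": ok("basic")}),
    ExplorationTask("value distribution", {"city": ok("dist"), "price": broken}),
    ExplorationTask("characteristics", {"price": broken}),
]
run_in_order(tasks)
print([task.status.value for task in tasks])
```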
S106. Automatically output and visualize the overview information exploration results obtained by executing the exploration tasks.
In step S106, so that the user can flexibly view the exploration results of a target object, or choose the results of a specific executed task to view, as an optimization, after the user selects a data table and/or a field, the overview information exploration results corresponding to the selected data table and/or field are output and visualized, where the selected data table and the selected field must both be in the target data set.
In step S106, the overview information exploration results may specifically be output and visualized as follows:
If the obtained overview information exploration results include basic information exploration results, the basic information exploration results are displayed in tabular form, as shown by way of example in Fig. 2;
If the obtained overview information exploration results include value distribution information exploration results, the value distribution information exploration results are displayed in tabular form or bar chart form, as shown by way of example in Fig. 3 and Fig. 4;
If the obtained overview information exploration results include characteristic information exploration results, the characteristic information exploration results are displayed in tabular form or bar chart form, as shown by way of example in Figs. 5, 6, 7 and 8, where "field null rate" in Fig. 6 is the null rate R_null, "field density" in Fig. 7 is the value density D_distinct, "character length" in Fig. 8 is the storage space length, and "percentage" is the percentage of the storage space length taken up by the actual length;
If the obtained overview information exploration results include hierarchical relationship information exploration results, the hierarchical relationship information exploration results are displayed in tabular form or tree diagram form, as shown by way of example in Figs. 9, 10 and 11.
More specifically, in step S106, when the value distribution information exploration results are displayed in bar chart form, the M distinct non-null values with the highest occurrence counts in the selected field are shown in the bar chart, where M is a natural number between 10 and 100. As shown in Fig. 4, M may be, for example, 20. To make it easier to drill down into further details of the relevant field, as an optimization, in step S104, when an input signal of clicking the bar chart is received, the distribution of the main content and/or sparse content in the field corresponding to the bar chart is output and displayed, where the main content refers to the non-null values with the highest occurrence counts in the field, and the sparse content refers to the non-null values with the lowest occurrence counts in the field.
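As a non-limiting illustration of selecting the values shown in the bar chart and of the main and sparse content, the following minimal sketch works on a plain list of field values. The function names are hypothetical, and treating every value that ties for the highest or lowest count as main or sparse content is an assumption.

```python
from collections import Counter


def top_m_values(values, m=20):
    """Return up to m distinct non-null values with the highest occurrence counts."""
    if not 10 <= m <= 100:
        raise ValueError("M is expected to be a natural number between 10 and 100")
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(m)


def main_and_sparse_content(values):
    """Return (values with the highest count, values with the lowest count)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return [], []
    highest, lowest = max(counts.values()), min(counts.values())
    main = sorted(v for v, n in counts.items() if n == highest)
    sparse = sorted(v for v, n in counts.items() if n == lowest)
    return main, sparse


cities = ["Hangzhou"] * 5 + ["Ningbo"] * 3 + ["Wenzhou", None]
print(top_m_values(cities, m=10))       # [('Hangzhou', 5), ('Ningbo', 3), ('Wenzhou', 1)]
print(main_and_sparse_content(cities))  # (['Hangzhou'], ['Wenzhou'])
```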
In summary, the big data exploration and cognition method provided by this embodiment has the following technical effects:
(1) This embodiment provides a general method that is suitable for various industries and conveniently abstracts various kinds of data. In big data application projects, ETL development, data processing, cleaning, integration, and similar work, when faced with massive data from many different data sources, the user first creates data sources and data sets, then flexibly configures exploration targets and exploration tasks for the target data set and performs in-depth exploration. This yields overview information including the basic and statistical information of each data table and the content distribution, statistics, profile, hierarchical relationships, null rate, and value density of the fields in each data table. The overview information is recorded, stored in the client's local database, and displayed automatically in visual views and tables, giving the user a comprehensive understanding of the data and avoiding problems such as insufficient or biased understanding and generalizing from partial information. This in turn helps guide the user to plan and build projects such as data warehouses, ETL, BI, data mining, machine learning, and big data analysis in a targeted way, safeguarding the normal development of these projects;
(2) Visualizing the data exploration results lets the user understand at a glance the quality of each field (that is, each feature variable) and the associations and hierarchical relationships between fields. This helps eliminate irrelevant or redundant features, reducing the number of features, improving the accuracy of machine learning models, and shortening run time. For example, if the null rate of a field is too high (that is, its quality is poor), or its value density reaches 100% (as for a unique field) or is close to 0 (as for a constant-valued field), the field is unsuitable as a feature variable. As another example, if exploration shows that field A has a one-to-many relationship to field B, a hierarchy of A and B can be constructed in multidimensional online analysis, and in feature engineering, if field B has been selected as a feature, field A would usually not also be selected, which further reduces the number of features;
(3) This embodiment makes no assumptions about the business meaning or business domain of the explored data, so it is applicable as a general-purpose product for data of all kinds across industries; no similarly systematic, general-purpose exploration and cognition technology currently exists on the domestic or international market. In addition, this embodiment supports all relational databases, including MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL, and Teradata;
(4) The big data exploration and cognition method also has advantages such as automated exploration and diversified presentation of results, which facilitates practical promotion and use.
Embodiment two
As shown in Fig. 12, this embodiment provides a hardware device that implements the big data exploration and cognition method of embodiment one, comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module, and an exploration result visualization module;
The data source creation and management module is used for data source creation and management: creating the target data source to be explored and understood, including the data source type, the IP address of the database server, the login username, the login password, and the database name;
The data set creation and management module is communicatively connected to the data source creation and management module and is used for data set creation and management: based on the created data source, creating a named data set to contain the target data tables and fields that need to be explored and understood;
The data table and field configuration module is communicatively connected to the data set creation and management module and is used for data set configuration: based on the created data set, the user freely selects the data tables and fields to be explored in the corresponding database;
The exploration task configuration module is communicatively connected to the data table and field configuration module and is used for exploration task creation and management: based on the created and configured data set, selecting all or some of the exploration functions to form a specific exploration task;
The exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: sorting the multiple exploration tasks selected by the user, and dispatching and executing them in order;
The exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically outputting and visualizing the overview information exploration results obtained by executing the exploration tasks.
For the working process, working details, and technical effects of the big data exploration and cognition device provided by this embodiment, refer to embodiment one; they are not repeated here.
Embodiment three
As shown in Figs. 13 and 14, this embodiment provides hardware equipment that implements the big data exploration and cognition method of embodiment one, comprising a memory and a processor that are communicatively connected, where computer-readable instructions are stored in the memory and, when executed by the processor, cause the processor to perform the steps of the big data exploration and cognition method of embodiment one. As shown in Fig. 13, the big data exploration and cognition equipment comprises a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and the computer-readable instructions, and the internal memory provides the environment in which the operating system and the computer-readable instructions in the non-volatile storage medium run. The network interface is used for network communication with external database servers. Fig. 14 shows an exemplary application scenario in which big data exploration and cognition equipment 1 is communicatively connected, through the Internet or a corporate intranet, to a CRM database server 201, an ERP database server 202, an SCM database server 203, a Legacy database server 204, and an External database server 205. Data source creation and management and data set creation and management can thus be performed for these five kinds of data sources, and the big data exploration and cognition tasks of steps S103 to S106 described in embodiment one can be carried out to obtain the corresponding overview information exploration results.
For the working process, working details, and technical effects of the big data exploration and cognition equipment provided by this embodiment, refer to embodiment one; they are not repeated here.
Embodiment four
This embodiment provides a computer storage medium storing a computer program that implements the big data exploration and cognition method of embodiment one. That is, a computer program is stored on the computer storage medium, and when the computer program is executed by a processor, it causes the processor to perform the steps of the big data exploration and cognition method of embodiment one. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device, or it may be a mobile smart device (such as a smartphone, a tablet, or an iPad).
For the working process, working details, and technical effects of the computer storage medium provided by this embodiment, refer to embodiment one; they are not repeated here.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art will clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the essence of the above technical solution, or the part of it that contributes over the prior art, can be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes instructions that cause a computer device to perform the methods described in the embodiments or in parts of the embodiments.
The above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Finally, it should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention. The scope of protection of this patent shall therefore be determined by the appended claims, and the description may be used to interpret the claims.

Claims (10)

1. A big data exploration and cognition method, characterized by comprising the following steps:
S101. Data source creation and management: create the target data source to be explored and understood, including the data source type, the IP address of the database server, the login username, the login password, and the database name;
S102. Data set creation and management: based on the created data source, create a named data set to contain the target data tables and fields that need to be explored and understood;
S103. Data set configuration: based on the created data set, the user freely selects the data tables and fields to be explored in the corresponding database;
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task;
S105. Execute exploration tasks: sort the multiple exploration tasks selected by the user, and dispatch and execute them in order;
S106. Automatically output and visualize the overview information exploration results obtained by executing the exploration tasks.
2. The big data exploration and cognition method according to claim 1, characterized in that:
In step S104, all or some of the exploration functions are selected by configuring any combination of the following exploration tasks: a table basic information exploration task, a field value distribution information exploration task, a field characteristic information exploration task, and an inter-field hierarchical relationship information exploration task;
In step S105, an exploration task is executed as follows:
(A) If the exploration task is a table basic information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query the following basic information exploration results for all target data tables: table name, creation time, modification time, total number of records, and/or total number of fields;
(B) If the exploration task is a field value distribution information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query the following value distribution information exploration results for all target fields: each distinct non-null value and the occurrence frequency corresponding to each distinct non-null value;
(C) If the exploration task is a field characteristic information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query the following characteristic information exploration results for all target fields: data type, null rate, number of distinct values, value density, storage space length, actual length, minimum value, maximum value, and/or whether the field values are unique;
(D) If the exploration task is an inter-field hierarchical relationship information exploration task, execute the access program that corresponds to it and is written in Java and SQL, access the target database server, and query the following hierarchical relationship information exploration results between any two target fields: one-to-one relationship, one-to-many relationship, and/or many-to-one relationship.
3. The big data exploration and cognition method according to claim 2, characterized in that, in mode (C), the null rate R_null and/or the value density D_distinct of a target field is obtained by an access program written according to the following formulas:
$$R_{null} = \frac{n_{null}}{N}, \qquad D_{distinct} = \frac{C_{distinct}}{N}$$
In the formulas, n_null is the total number of null values in the target field, C_distinct is the number of distinct non-null values in the target field, and N is the total number of records in the target data table to which the target field belongs.
4. The big data exploration and cognition method according to claim 2, characterized in that, in mode (D), the hierarchical relationship information between a first target field and a second target field is obtained by an access program written according to the following steps:
DS101. Check whether some non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, execute step DS102; if so, execute step DS103;
DS102. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-one relationship; otherwise, determine that it is a many-to-one relationship;
DS103. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is a one-to-many relationship; otherwise, determine that it is a many-to-many relationship.
5. The big data exploration and cognition method according to claim 2, characterized in that, in step S106:
If the obtained overview information exploration results include basic information exploration results, the basic information exploration results are displayed in tabular form;
If the obtained overview information exploration results include value distribution information exploration results, the value distribution information exploration results are displayed in tabular form or bar chart form;
If the obtained overview information exploration results include characteristic information exploration results, the characteristic information exploration results are displayed in tabular form or bar chart form;
If the obtained overview information exploration results include hierarchical relationship information exploration results, the hierarchical relationship information exploration results are displayed in tabular form or tree diagram form.
6. The big data exploration and cognition method according to claim 5, characterized in that: in step S106, when the value distribution information exploration results are displayed in bar chart form, the M distinct non-null values with the highest occurrence counts in the selected field are shown in the bar chart, where M is a natural number between 10 and 100.
7. The big data exploration and cognition method according to claim 1, characterized in that: in step S106, the overview information exploration result corresponding to the data table selected by the user and/or the field selected by the user is exported and visually displayed.
8. A big data exploration and cognition device, characterized by comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module and an exploration result visualization module;
the data source creation and management module is used for data source creation and management: creating the target data source to be explored, including the data source type, the IP address of the database server, the login user name, the login password and the database name;
the data set creation and management module is communicatively connected to the data source creation and management module and is used for data set creation and management: based on the created data source, creating a data set name that contains the target data tables and fields to be explored;
the data table and field configuration module is communicatively connected to the data set creation and management module and is used for data set configuration: based on the created data set, the user freely screens and specifies data tables and fields in the corresponding database;
the exploration task configuration module is communicatively connected to the data table and field configuration module and is used for exploration task creation and management: based on the created and configured data set, all or some of the exploration functions are selected to form a specific exploration task;
the exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: the multiple exploration tasks screened by the user are sorted, then distributed and executed in sequence;
the exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically exporting and visually displaying the overview information exploration result obtained by executing the exploration task.
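The chain of modules in claim 8 can be pictured as a small set of data structures plus a sequential executor. The sketch below is a loose illustration under several assumptions and is not the claimed device. All class and function names are hypothetical. The claim does not state the sorting criterion, so a `priority` field is assumed here. The database access itself is replaced by a caller-supplied `run` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataSource:
    # Connection details listed for the data source creation and management module.
    source_type: str
    host_ip: str
    username: str
    password: str
    database: str


@dataclass
class DataSet:
    # A named data set built on a data source; maps table name -> selected fields.
    name: str
    source: DataSource
    tables: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class ExplorationTask:
    # All or some of the exploration functions applied to one configured data set.
    name: str
    dataset: DataSet
    functions: List[str]
    priority: int = 0  # assumed sort key; not specified in the claim


def execute_tasks(tasks: List[ExplorationTask],
                  run: Callable[[ExplorationTask], object]) -> Dict[str, object]:
    """Sort the selected tasks, then run them one after another."""
    results = {}
    for task in sorted(tasks, key=lambda t: t.priority):
        results[task.name] = run(task)
    return results
```

A caller would construct a `DataSource`, wrap it in a `DataSet` with the chosen tables and fields, define one or more `ExplorationTask` objects, and pass them to `execute_tasks` together with a function that performs the actual exploration. The returned dictionary is what a visualization layer would consume.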
9. Big data exploration and cognition equipment, characterized by comprising a memory and a processor that are communicatively connected, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor is caused to execute the steps of the big data exploration and cognition method according to any one of claims 1 to 7.
10. A computer storage medium, characterized in that a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the processor is caused to execute the steps of the big data exploration and cognition method according to any one of claims 1 to 7.
CN201910718388.1A 2019-08-05 2019-08-05 Big data exploration and cognition method, device, equipment and computer storage medium Active CN110442620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718388.1A CN110442620B (en) 2019-08-05 2019-08-05 Big data exploration and cognition method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910718388.1A CN110442620B (en) 2019-08-05 2019-08-05 Big data exploration and cognition method, device, equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN110442620A CN110442620A (en) 2019-11-12
CN110442620B CN110442620B (en) 2023-08-29

Family

ID=68433296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718388.1A Active CN110442620B (en) 2019-08-05 2019-08-05 Big data exploration and cognition method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN110442620B (en)

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5870739A (en) * 1996-09-20 1999-02-09 Novell, Inc. Hybrid query apparatus and method
US20060136382A1 (en) * 2004-12-17 2006-06-22 International Business Machines Corporation Well organized query result sets
US20090083277A1 (en) * 2007-09-26 2009-03-26 Barsness Eric L Nodal data normalization
CN101556606A (en) * 2009-05-20 2009-10-14 同方知网(北京)技术有限公司 Data mining method based on extraction of Web numerical value tables
US20110022581A1 (en) * 2009-07-27 2011-01-27 Rama Krishna Korlapati Derived statistics for query optimization
US20110145210A1 (en) * 2009-12-10 2011-06-16 Negti Systems, Inc. System and Method for Managing One or More Databases
CN103930887A (en) * 2011-11-18 2014-07-16 惠普发展公司,有限责任合伙企业 Query summary generation using row-column data storage
US9633076B1 (en) * 2012-10-15 2017-04-25 Tableau Software Inc. Blending and visualizing data from multiple data sources
US20140280280A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Estimating error propagation for database optimizers
US20150066930A1 (en) * 2013-08-28 2015-03-05 Intelati, Inc. Generation of metadata and computational model for visual exploration system
US20170103104A1 (en) * 2015-10-07 2017-04-13 International Business Machines Corporation Query plan based on a data storage relationship
CN106775997A (en) * 2015-11-23 2017-05-31 阿里巴巴集团控股有限公司 A kind of task processing method and equipment
CN106295983A (en) * 2016-08-08 2017-01-04 烟台海颐软件股份有限公司 Power marketing data visualization statistical analysis technique and system
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107391739A (en) * 2017-08-07 2017-11-24 北京奇艺世纪科技有限公司 A kind of query statement generation method, device and electronic equipment
CN108717786A (en) * 2018-07-17 2018-10-30 南京航空航天大学 A kind of traffic accident causation method for digging based on universality meta-rule
CN109189846A (en) * 2018-09-11 2019-01-11 北京易华录信息技术股份有限公司 A kind of public security traffic control visual modeling system and method based on big data technology
CN109918389A (en) * 2019-03-13 2019-06-21 试金石信用服务有限公司 Data air control method, apparatus and storage medium based on message flow and chart database
CN110069559A (en) * 2019-03-21 2019-07-30 中国人民解放军陆军工程大学 A kind of analysis of Heterogeneous Information System data and integrated approach with height automatic control
CN110008232A (en) * 2019-04-11 2019-07-12 北京启迪区块链科技发展有限公司 Generation method, device, server and the medium of structured query sentence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FARRE, CARLES; NUTT, WERNER; TENIENTE, ERNEST; URPI, TONI: "Containment of conjunctive queries over databases with null values", 《11TH INTERNATIONAL CONFERENCE ON DATABASE THEORY》 *
JAGDEV BHOGAL; IMRAN CHOKSI: "Handling Big Data Using NoSQL", 《IEEE》 *
Wei Xiaoning (魏小宁): "Technical analysis of building a data warehouse system" (构建数据仓库系统的技术分析), 华南金融电脑, no. 08 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990447A (en) * 2019-12-19 2020-04-10 北京锐安科技有限公司 Data probing method, device, equipment and storage medium
CN110990447B (en) * 2019-12-19 2023-09-15 北京锐安科技有限公司 Data exploration method, device, equipment and storage medium
CN111190597A (en) * 2019-12-27 2020-05-22 天津浪淘科技股份有限公司 Data UE visual design system
CN111180083A (en) * 2019-12-31 2020-05-19 北京零研科技有限公司 Clinical scientific research data management method and system
CN112434009A (en) * 2020-11-19 2021-03-02 浙江大华技术股份有限公司 End-to-end data probing method and device, computer equipment and storage medium
CN115017136A (en) * 2022-06-29 2022-09-06 江苏重行信息科技有限公司 Monitoring data analysis, storage and management system based on big data application
CN115017136B (en) * 2022-06-29 2024-02-13 广州市橙鑫网络有限公司 Monitoring data analysis storage management system based on big data application
CN116109121A (en) * 2023-04-17 2023-05-12 西昌学院 User demand mining method and system based on big data analysis
CN116109121B (en) * 2023-04-17 2023-06-30 西昌学院 User demand mining method and system based on big data analysis

Also Published As

Publication number Publication date
CN110442620B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN110442620A (en) Big data exploration and cognition method, device, equipment and computer storage medium
US10564622B1 (en) Control interface for metric definition specification for assets and asset groups driven by search-derived asset tree hierarchy
US20200167350A1 (en) Loading queries using search points
US20200334237A1 (en) Systems, methods, user interfaces and algorithms for performing database analysis and search of information involving structured and/or semi-structured data
US11681694B2 (en) Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface
US10459938B1 (en) Punchcard chart visualization for machine data search and analysis system
CN110168515A (en) System for analyzing data relationship to support query execution
US10585892B2 (en) Hierarchical dimension analysis in multi-dimensional pivot grids
CN110300963A (en) Data management system in large-scale data repository
US10459939B1 (en) Parallel coordinates chart visualization for machine data search and analysis system
CN101971165B (en) Graphic representations of data relationships
CN109997125A (en) System for importing data to data storage bank
Lum et al. 1978 New Orleans data base design workshop report
US11809439B1 (en) Updating client dashboarding component of an asset monitoring and reporting system
US11422869B2 (en) Presenting collaboration activity
US11461350B1 (en) Control interface for dynamic elements of asset monitoring and reporting system
US20070150562A1 (en) System and method for data quality management and control of heterogeneous data sources
EP1585036A2 (en) Management of parameterized database queries
Hobbs et al. Oracle 10g data warehousing
CA3179300C (en) Domain-specific language interpreter and interactive visual interface for rapid screening
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
US20150058363A1 (en) Cloud-based enterprise content management system
Postina et al. An ea-approach to develop soa viewpoints
US20130218893A1 (en) Executing in-database data mining processes
WO2021195285A1 (en) Systems and methods for tracking features in a development environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant