CN110442620A - Big data exploration and cognition method, apparatus, device and computer storage medium
- Publication number
- CN110442620A (application CN201910718388.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- field
- information
- task
- exploration
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the technical field of data processing and analysis and discloses a big data exploration and cognition method, apparatus, device and computer storage medium. When big data application projects, ETL development, data processing, cleaning, integration and similar work face massive data from many different data sources, the invention explores the data to obtain overview information covering every data table and even every field, and automatically presents this overview information in visual views and tables, giving the user a comprehensive understanding of the data. At present, many data warehouse, ETL, BI, data mining, machine learning and big data analysis projects end unsatisfactorily or in failure, and an important reason is that the full data set was insufficiently understood at the outset, was understood with bias, or was generalized from a partial view. The invention therefore helps guide users to plan and develop such projects in a targeted way, so that these outcomes are avoided as far as possible.
Description
Technical field
The invention belongs to the technical field of data processing and analysis, and in particular relates to a big data exploration and cognition method, apparatus, device and computer storage medium.
Background art
Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a given time frame. It is a massive, fast-growing and diversified information asset that requires new processing models to deliver stronger decision-making power, insight discovery and process optimization capability. In recent years, with the rapid development of Internet technology, computer technology, database technology and so on, a large amount of data has appeared in daily work and life, and it keeps growing at a geometric rate. Information now accumulates faster than people can comprehend it and floods into human life from all directions. Faced with this growing mass of data, it is currently very difficult to comb through it manually and mine the useful or valuable information hidden in it. As a result, many data warehouse, ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination), BI (Business Intelligence, a complete set of solutions for effectively integrating an organization's existing data, providing reports quickly and accurately, and supplying a basis for decisions so that the organization can make sound business decisions), data mining, machine learning and big data analysis projects often end unsatisfactorily or in failure. One of the main reasons is that the full set of available data was insufficiently understood at the outset, was understood with bias, or was generalized from a partial view.
Take feature engineering as an example. Feature engineering means selecting a set of representative features for building a machine learning model. It is an extremely important problem and arguably the most time-consuming part of a data analysis project. Its purpose is to narrow down, from a large number of candidate features (variables), those that are most valuable for, or have the greatest influence on, the analysis target. In the real world, however, data are usually complex, redundant and incomplete, which results in poor data quality. The raw data therefore have to be processed and screened, but manual screening is time-consuming and laborious and depends heavily on people and their professional knowledge.
Summary of the invention
To solve the current difficulty of combing through massive data and mining the useful or valuable data hidden in it, the object of the present invention is to provide a big data exploration and cognition method, apparatus, device and computer storage medium.
The technical solution adopted by the invention is as follows:
A big data exploration and cognition method, comprising the following steps:
S101. Data source creation and management: create the target data source to be explored, including the data source type, the IP address of the database server, the login user name, the login password and the database name;
S102. Data set creation and management: based on the created data source, create a named data set that contains the target data tables and fields to be explored;
S103. Data set configuration: based on the created data set, the user freely selects the data tables and fields in the corresponding database;
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task;
S105. Exploration task execution: sort the exploration tasks selected by the user and dispatch them for execution in that order;
S106. Automatically output and visualize the overview information exploration results obtained by executing the exploration tasks.
Preferably, in step S104, selecting all or some of the exploration functions means configuring any combination of the following exploration tasks: a table basic information exploration task, a field value distribution exploration task, a field feature information exploration task, and an inter-field hierarchical relationship exploration task;
In step S105, an exploration task is executed as follows:
(A) if the exploration task is a table basic information exploration task, the access program corresponding to that task, written in Java and SQL, is executed; it accesses the target database server and then queries the following basic information for all target data tables: table name, creation time, modification time, total number of records and/or total number of fields;
(B) if the exploration task is a field value distribution exploration task, the access program corresponding to that task, written in Java and SQL, is executed; it accesses the target database server and then queries the following value distribution information for all target fields: each distinct non-null value and the number of occurrences of each distinct non-null value;
(C) if the exploration task is a field feature information exploration task, the access program corresponding to that task, written in Java and SQL, is executed; it accesses the target database server and then queries the following feature information for all target fields: data type, null rate, number of distinct values, value density, storage length, actual length, minimum value, maximum value and/or whether the field values are unique;
(D) if the exploration task is an inter-field hierarchical relationship exploration task, the access program corresponding to that task, written in Java and SQL, is executed; it accesses the target database server and then queries the following hierarchical relationship information between any two target fields: one-to-one, one-to-many and/or many-to-one.
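Task (B) above can be illustrated with a single grouped query. The patent's access programs are written in Java and SQL; the following is only a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The function name `value_distribution`, the `hotel` table and its rows are hypothetical.

```python
import sqlite3


def value_distribution(conn, table, field):
    """Return (value, occurrences) for each distinct non-null value of one field."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here.
    # A real tool would first validate them against the database catalog.
    sql = (
        f'SELECT "{field}", COUNT(*) AS freq FROM "{table}" '
        f'WHERE "{field}" IS NOT NULL '
        f'GROUP BY "{field}" ORDER BY freq DESC, "{field}"'
    )
    return conn.execute(sql).fetchall()


# Hypothetical sample data: one target table with one target field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (city TEXT)")
conn.executemany(
    "INSERT INTO hotel VALUES (?)",
    [("Paris",), ("Paris",), ("Rome",), (None,)],
)
distribution = value_distribution(conn, "hotel", "city")
```

The null row is excluded by the `WHERE` clause, so only distinct non-null values and their counts are returned, most frequent first.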
Further preferably, in mode (C), the access program obtains the null rate $R_{null}$ and/or the value density $D_{distinct}$ of a target field according to the following formulas:
$$R_{null} = \frac{n_{null}}{N} \times 100\%, \qquad D_{distinct} = \frac{C_{distinct}}{N} \times 100\%$$
where $n_{null}$ is the total number of null values in the target field, $C_{distinct}$ is the number of distinct non-null values in the target field, and $N$ is the total number of records in the data table to which the target field belongs.
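As a rough illustration, both quantities can be obtained from one aggregate query. This sketch assumes the null rate is the null count divided by N and the value density is the distinct non-null count divided by N, which is how the symbol definitions above read. It uses Python's sqlite3 module as a stand-in for the Java and SQL access program, and the function and table names are hypothetical.

```python
import sqlite3


def null_rate_and_density(conn, table, field):
    """Return (null rate, value density) of one field as fractions of N."""
    # COUNT(*) counts all records, COUNT(field) only the non-null ones,
    # and COUNT(DISTINCT field) only the distinct non-null values.
    sql = (
        f'SELECT COUNT(*), COUNT(*) - COUNT("{field}"), '
        f'COUNT(DISTINCT "{field}") FROM "{table}"'
    )
    total, n_null, c_distinct = conn.execute(sql).fetchone()
    if total == 0:
        return 0.0, 0.0  # assumption: an empty table reports zero for both
    return n_null / total, c_distinct / total


# Hypothetical sample: 4 records, 1 null, 2 distinct non-null values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (grade TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?)", [("a",), ("a",), ("b",), (None,)]
)
rate, density = null_rate_and_density(conn, "sample", "grade")
```

Multiplying the returned fractions by 100 gives the percentages used elsewhere in the text.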
Further preferably, in mode (D), the access program obtains the hierarchical relationship information between a first target field and a second target field through the following steps:
DS101. Check whether some non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, go to step DS102; if so, go to step DS103;
DS102. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, the hierarchical relationship between the first target field and the second target field is determined to be one-to-one; otherwise it is determined to be many-to-one;
DS103. Check whether some non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, the hierarchical relationship between the first target field and the second target field is determined to be one-to-many; otherwise it is determined to be many-to-many.
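Steps DS101 to DS103 amount to two directional checks followed by a four-way decision. The sketch below is one possible reading, not the patent's Java implementation: it uses Python's sqlite3 module, and the helper names, the `place` table and its rows are hypothetical.

```python
import sqlite3


def _fans_out(conn, table, src, dst):
    """True if some non-null src value maps to zero or several distinct non-null dst values."""
    sql = (
        f'SELECT 1 FROM "{table}" WHERE "{src}" IS NOT NULL '
        f'GROUP BY "{src}" HAVING COUNT(DISTINCT "{dst}") <> 1 LIMIT 1'
    )
    return conn.execute(sql).fetchone() is not None


def relationship(conn, table, first, second):
    """Classify the relationship of the first field to the second field."""
    first_fans_out = _fans_out(conn, table, first, second)    # DS101
    second_fans_out = _fans_out(conn, table, second, first)   # DS102 or DS103
    if not first_fans_out:                                     # DS102 branch
        return "many-to-one" if second_fans_out else "one-to-one"
    return "many-to-many" if second_fans_out else "one-to-many"  # DS103 branch


# Hypothetical sample: several cities per country, one ISO code per country.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (city TEXT, country TEXT, iso TEXT)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [("Paris", "France", "FR"), ("Lyon", "France", "FR"), ("Rome", "Italy", "IT")],
)
city_to_country = relationship(conn, "place", "city", "country")
country_to_city = relationship(conn, "place", "country", "city")
country_to_iso = relationship(conn, "place", "country", "iso")
```

Running both directional checks up front keeps the decision table easy to read; an implementation concerned with query cost could instead issue the second check only after the first, exactly as the steps are ordered.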
Further preferably, in step S106:
if the overview information exploration results include basic information exploration results, the basic information exploration results are shown as a table;
if the overview information exploration results include value distribution exploration results, the value distribution exploration results are shown as a table or a bar chart;
if the overview information exploration results include feature information exploration results, the feature information exploration results are shown as a table or a bar chart;
if the overview information exploration results include hierarchical relationship exploration results, the hierarchical relationship exploration results are shown as a table or a tree diagram.
More specifically, in step S106, when value distribution exploration results are shown as a bar chart, the chart shows the M distinct non-null values with the highest number of occurrences in the selected field, where M is a natural number between 10 and 100.
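Selecting the bars to draw is a small top-M problem. The following sketch assumes the value distribution is already available as (value, occurrences) pairs; the function name `top_m` and the sample data are hypothetical, and the range check simply mirrors the stated bounds on M.

```python
import heapq


def top_m(distribution, m=10):
    """Return the m (value, occurrences) pairs with the highest occurrences."""
    if not 10 <= m <= 100:
        raise ValueError("M is expected to be a natural number between 10 and 100")
    # nlargest returns the pairs sorted from most to least frequent.
    return heapq.nlargest(m, distribution, key=lambda pair: pair[1])


# Hypothetical distribution: 12 distinct values with occurrences 1..12.
sample = [(f"value_{i}", i) for i in range(1, 13)]
bars = top_m(sample, m=10)
```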
Preferably, in step S106, the overview information exploration results corresponding to the data table/and field selected by the user are output and visualized.
Another technical solution of the present invention is:
A big data exploration and cognition apparatus, comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module and an exploration result visualization module;
The data source creation and management module is used for data source creation and management: creating the target data source to be explored, including the data source type, the IP address of the database server, the login user name, the login password and the database name;
The data set creation and management module is communicatively connected to the data source creation and management module and is used for data set creation and management: based on the created data source, creating a named data set that contains the target data tables and fields to be explored;
The data table and field configuration module is communicatively connected to the data set creation and management module and is used for data set configuration: based on the created data set, the user freely selects the data tables and fields in the corresponding database;
The exploration task configuration module is communicatively connected to the data table and field configuration module and is used for exploration task creation and management: based on the created and configured data set, selecting all or some of the exploration functions to form a specific exploration task;
The exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: sorting the exploration tasks selected by the user and dispatching them for execution in that order;
The exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically outputting and visualizing the overview information exploration results obtained by executing the exploration tasks.
Another technical solution of the present invention is:
A big data exploration and cognition device, comprising a memory and a processor that are communicatively connected, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the big data exploration and cognition method described above.
Another technical solution of the present invention is:
A computer storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the big data exploration and cognition method described above.
The beneficial effects of the invention are:
(1) The invention provides a general-purpose method, apparatus, device and computer storage medium that is applicable to all industries and abstracts over all kinds of data. When big data application projects, ETL development, data processing, cleaning, integration, analysis modeling and similar work face massive data from many different data sources, a data source and a data set are created first; exploration targets and exploration tasks are then flexibly configured for the target data set and an in-depth exploration is carried out. This yields overview information including the basic and statistical information of each data table and the content distribution, statistics, profile, hierarchical relationships, null rate and value density of each field. The overview information is recorded, stored in a local database on the client, and shown automatically in visual views and tables, so that the user gains a comprehensive understanding of the data and avoids insufficient, biased or overgeneralized understanding. This helps guide users to plan and build data warehouse, ETL, BI, data mining, machine learning and big data analysis projects in a targeted way and safeguards the normal progress of these projects;
(2) Visualizing the data exploration results lets the user see at a glance the quality of each field (that is, each feature variable) and the associations and hierarchical relationships between fields, which helps eliminate irrelevant or redundant features and thereby reduces the number of features, improves the accuracy of the machine learning model and shortens run time. For example, if the null rate of a field is too high (poor quality), or its value density reaches 100% (such as a unique field) or is close to 0 (such as a constant field), the field is not suitable as a feature variable. As another example, if exploration finds that the relationship of field A to field B is one-to-many, a hierarchy of A and B can be built in multidimensional online analysis; and in feature engineering, if field B has been selected as a feature, field A is usually not selected as well, which further reduces the number of features;
(3) The embodiment makes no assumptions about the business meaning or business domain of the data being explored, so it can serve as a general-purpose product for data of all trades and professions, and no comparably systematic, general-purpose exploration and cognition technology currently exists on the domestic or international market; in addition, the embodiment can support all relational databases, including MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL, Teradata and so on;
(4) The big data exploration and cognition method also offers automated exploration and diverse result presentation, which makes it easy to promote and use in practice.
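The screening rule mentioned in effect (2) can be written as a small predicate. The text gives no numeric cut-offs, so the thresholds `max_null_rate` and `density_margin` below are assumptions chosen purely for illustration, as is the function name `screen_field`.

```python
def screen_field(null_rate, value_density, max_null_rate=0.5, density_margin=0.01):
    """Return a reason if a field looks unsuitable as a feature variable, else None.

    Both inputs are fractions between 0 and 1. The thresholds are illustrative
    assumptions, not values taken from the patent.
    """
    if null_rate > max_null_rate:
        return "null rate too high"
    if value_density >= 1.0 - density_margin:
        return "nearly unique values"
    if value_density <= density_margin:
        return "nearly constant values"
    return None


# Hypothetical field statistics: (null rate, value density).
verdicts = {
    "customer_id": screen_field(0.0, 1.0),
    "country_code": screen_field(0.02, 0.001),
    "fax_number": screen_field(0.8, 0.15),
    "room_type": screen_field(0.05, 0.2),
}
```

In practice the thresholds would be tuned per project, since a field with a moderately high null rate can still carry signal.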
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for the description are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the big data exploration and cognition method provided by the invention.
Fig. 2 is an example of showing basic information exploration results as a table.
Fig. 3 is an example of showing value distribution exploration results as a table.
Fig. 4 is an example of showing value distribution exploration results as a bar chart.
Fig. 5 is an example of showing feature information exploration results as a table.
Fig. 6 is an example of showing the null rate from the feature information exploration results as a bar chart.
Fig. 7 is an example of showing the value density from the feature information exploration results as a bar chart.
Fig. 8 is an example of showing the space waste statistics from the feature information exploration results as a bar chart.
Fig. 9 is an example of showing hierarchical relationship exploration results as a table.
Fig. 10 is an example of showing one-to-many hierarchical relationship exploration results as a tree diagram.
Fig. 11 is an example of showing one-to-one hierarchical relationship exploration results as a tree diagram.
Fig. 12 is a structural schematic diagram of the big data exploration and cognition apparatus provided by the invention.
Fig. 13 is a structural schematic diagram of the big data exploration and cognition apparatus provided by the invention.
Fig. 14 is a schematic diagram of an application scenario of the big data exploration and cognition apparatus provided by the invention.
In the drawings: 1 - big data exploration and cognition apparatus; 201 - CRM database server; 202 - ERP database server; 203 - SCM database server; 204 - Legacy database server; 205 - External database server.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments. It should be noted that the description of these embodiments is intended to aid understanding of the invention and does not limit it. The specific structural and functional details disclosed herein serve only to describe example embodiments of the invention. The invention may be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.
It should be understood that some of the processes described herein contain multiple operations that appear in a particular order, but these operations may be executed out of the order in which they appear or in parallel. Operation numbers such as S101 and S102 are used only to distinguish the operations from one another and do not in themselves imply any execution order. In addition, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel.
It should be understood that although the terms first, second and so on may be used herein to describe various units, the units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, a first unit could be called a second unit, and similarly a second unit could be called a first unit, without departing from the scope of the example embodiments of the invention.
It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible: for example, "A and/or B" can mean A alone, B alone, or both A and B. The term "/and" herein describes another association and indicates that two relationships are possible: for example, "A/and B" can mean A alone, or both A and B. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that when a unit is referred to as being "connected", "linked" or "coupled" to another unit, it may be directly connected or coupled to the other unit, or intermediate units may be present. In contrast, when a unit is referred to as being "directly connected" or "directly coupled" to another unit, no intermediate units are present. Other words used to describe relationships between units should be interpreted in a similar way (for example, "between" versus "directly between", "adjacent" versus "directly adjacent").
The terminology used herein is for describing specific embodiments only and is not intended to limit the example embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises", "comprising", "includes" and/or "including", when used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, and do not preclude the presence or addition of one or more other features, quantities, steps, operations, units, components and/or combinations thereof.
It should further be noted that in some alternative embodiments the functions/acts may occur out of the order shown in the drawings. For example, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functions/acts involved.
Specific details are provided in the following description to give a thorough understanding of the example embodiments. However, a person of ordinary skill in the art will understand that the example embodiments can be practiced without these specific details. For example, systems may be shown in block diagrams so as not to obscure the examples with unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail to avoid obscuring the example embodiments.
Embodiment one
As shown in Figs. 1 to 11, the big data exploration and cognition method provided in this embodiment may include, but is not limited to, the following steps S101 to S106.
S101. Data source creation and management: create the target data source to be explored, including but not limited to the data source type, the IP address of the database server, the login user name, the login password and the database name.
In step S101, the data source is where the data set comes from. It can hold data from any business domain, such as CRM (Customer Relationship Management), ERP (Enterprise Resource Planning), e-commerce, SCM (Supply Chain Management), and legacy and external data. These data can be stored in various databases, such as MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL and Teradata; this embodiment supports all relational databases. Before exploration, a data source must first be created to specify the data source information, which comprises everything needed to connect to the data source successfully, such as the IP address or name of the database server, the type, the database name, the login user name and the login password. A complete and valid set of data source information must therefore include: the data source name, the data source type (such as Oracle, MySQL, DB2, Microsoft SQL Server, Microsoft Access, Sybase, Hive, PostgreSQL or Teradata), the IP address or name of the database server, the login user name, the login password and the corresponding database name. Specifically, the data source is preferably created on a human-computer interaction interface to generate the data source information, which is then added to a data source list. The data sources are subsequently managed as follows: data source information can be added to or deleted from the data source list; once created, data source information cannot be changed, and the only operations available are testing the validity of the connection and deleting the data source.
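The rule that data source information cannot be changed after creation maps naturally onto an immutable record kept in a list that only supports adding and deleting. The sketch below is an illustration under that reading; the class names, field names and sample values are hypothetical, and connection testing is left out because it depends on the database driver in use.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class DataSource:
    """Connection details for one target data source; immutable once created."""
    name: str
    source_type: str   # for example "MySQL" or "Oracle"
    host: str          # IP address or name of the database server
    username: str
    password: str
    database: str


class DataSourceList:
    """Supports only adding and deleting entries, never editing them."""

    def __init__(self):
        self._sources = {}

    def add(self, source):
        if source.name in self._sources:
            raise ValueError(f"data source {source.name!r} already exists")
        self._sources[source.name] = source

    def delete(self, name):
        del self._sources[name]

    def names(self):
        return sorted(self._sources)


# Hypothetical entries.
sources = DataSourceList()
crm = DataSource("crm", "MySQL", "192.0.2.10", "explorer", "secret", "crm_db")
sources.add(crm)
sources.add(DataSource("erp", "Oracle", "192.0.2.11", "explorer", "secret", "erp_db"))
sources.delete("erp")

try:
    crm.host = "192.0.2.99"   # editing is rejected by the frozen dataclass
    edit_rejected = False
except FrozenInstanceError:
    edit_rejected = True
```

To change a connection detail under this scheme, the entry is deleted and a new one is created, which matches the add-or-delete management described above.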
S102. Data set creation and management: based on the created data source, create a named data set that contains the target data tables and fields to be explored.
In step S102, a data set depends on a data source that can be connected successfully and is a collection of many data tables and fields. Each data set has its own description information and corresponds one-to-one to a connectable data source, so data sets can be created and managed according to the data source. For example, a data set may contain five data tables built around one theme: a user characteristics table, a hotel information table, a date table, a hotel reservation table and a reservation transaction table, which are used to manage information such as user information and hotel prices in the hotel booking process; these independent data tables are linked at creation and saved in the data set. After creation, the data set can be configured, and the time of the last configuration is recorded as the data set update time. Because each data set corresponds to a connectable data source, after the data set is created or its configuration is updated, the names of all data tables on the database server of the corresponding data source /and the names of all fields in each data table can be retrieved from that server, yielding the data set information.
S103. Data set configuration: based on the created data set, the user freely selects the data tables and fields in the corresponding database.
In step S103, the specified data tables and fields are the specific objects to be explored. Either all fields of a data table or only some of its fields can be set as exploration targets (by configuring the target data tables and the target fields within them, exploration can be limited to a few specific target fields). A data table, or table (Table), is one of the most important components of a database (the database is only a framework; the data tables are its substantive content). A row in a table is called a "record" (a record contains all the information in that row, like all the information about one person in an address book data set; records in a data set have no special record names and are usually identified by their row numbers). A column in a table is called a "field" (a field holds information on one particular topic; for example, in an address book data set, "name" and "telephone number" are attributes shared by all rows in the table, so these columns are called the "name" field and the "telephone number" field). In addition, the selection in step S103 can be performed by, but is not limited to, adding data tables and/or fields to be explored to the data set, or deleting data tables and/or fields that do not need to be explored.
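The add and delete operations of step S103 can be pictured with a small in-memory structure. This is a minimal sketch, not the patent's implementation: the class name `DataSetConfig`, its methods and the sample table names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSetConfig:
    """Exploration targets of one data set: table name -> set of field names."""
    name: str
    targets: dict = field(default_factory=dict)

    def add_fields(self, table, fields):
        # Adding a field implicitly adds its table as a target table.
        self.targets.setdefault(table, set()).update(fields)

    def remove_fields(self, table, fields):
        remaining = self.targets.get(table, set()) - set(fields)
        if remaining:
            self.targets[table] = remaining
        else:
            # A table with no target fields left is dropped from the targets.
            self.targets.pop(table, None)

    def remove_table(self, table):
        self.targets.pop(table, None)


config = DataSetConfig("hotel_booking")
config.add_fields("hotel_info", ["hotel_id", "hotel_name", "price"])
config.add_fields("user_profile", ["user_id", "age"])
config.remove_fields("hotel_info", ["hotel_name"])  # field not to be explored
config.remove_table("user_profile")                 # table not to be explored
```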
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task.
In step S104, preferably, selecting all or some of the exploration functions can be done by, but is not limited to, configuring an exploration task made up of any combination of the following: a table basic information exploration task, a field value distribution information exploration task, a field feature information exploration task and an inter-field hierarchical relationship information exploration task. Each of these four tasks is a basic exploration unit custom-written in Java and SQL (Structured Query Language), so that different desired exploration tasks can be configured for different data sets. When a task is started, the corresponding access program is executed, the target database server is accessed, and the exploration results for the target data tables or fields are obtained by query.
S105. Execute exploration tasks: sort the multiple exploration tasks selected by the user, and dispatch and execute them in that order.
In detail, in step S105, an individual exploration task can be executed in, but is not limited to, the following modes (A) to (D).
(A) If the exploration task is a table basic information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query, for all target data tables, basic information exploration results including but not limited to: table name, creation time, modification time, total number of records and/or total number of fields.
In mode (A), the table name, creation time, modification time, total number of records and total number of fields can be obtained directly with SQL queries against the data tables and their metadata.
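A mode (A) query can be sketched as below. The assumptions are significant: the patent's access program is Java plus SQL against the target server, while this sketch uses Python's `sqlite3` with an in-memory database. SQLite keeps no creation or modification time for tables, so only the record total and field total are shown; on MySQL, for instance, the `information_schema.TABLES` view exposes `CREATE_TIME` and `UPDATE_TIME`. The function name `table_basic_info` and the sample table are hypothetical.

```python
import sqlite3


def table_basic_info(conn, table):
    """Collect basic information for one target table."""
    total_records = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # One metadata row per column, so the row count is the number of fields.
    total_fields = len(conn.execute(f'PRAGMA table_info("{table}")').fetchall())
    return {
        "table_name": table,
        "total_records": total_records,
        "total_fields": total_fields,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (hotel_id INTEGER, hotel_name TEXT, price REAL)")
conn.executemany("INSERT INTO hotel_info VALUES (?, ?, ?)",
                 [(1, "Garden Inn", 320.0), (2, "Lake Hotel", 450.0)])
basic_info = table_basic_info(conn, "hotel_info")
```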
(B) If the exploration task is a field value distribution information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query, for all target fields, value distribution information exploration results including but not limited to: each distinct non-null value and the occurrence frequency of each distinct non-null value.
In mode (B), the distinct non-null values of a target field can be obtained by database queries and comparison, using existing SQL and the value matching of a conventional Java program. Their occurrence frequencies (which include the number of occurrences and/or the occurrence rate) can likewise be obtained by database queries and statistics, using the counting facilities of existing SQL and a conventional Java program.
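The value distribution query of mode (B) maps naturally onto `GROUP BY` with `COUNT`. The sketch below is an assumption-laden illustration in Python with `sqlite3`, not the patent's Java program. In particular, it computes the occurrence rate relative to the number of non-null records, which is one possible reading since the text does not fix the denominator. The function name `value_distribution` and the sample data are hypothetical.

```python
import sqlite3


def value_distribution(conn, table, column):
    """Return [(value, occurrences, rate)] for each distinct non-null value."""
    rows = conn.execute(
        f'SELECT "{column}", COUNT(*) AS cnt FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" ORDER BY cnt DESC, "{column}"').fetchall()
    non_null_total = sum(cnt for _, cnt in rows)
    # Assumption: rate = occurrences / number of non-null records.
    return [(value, cnt, cnt / non_null_total) for value, cnt in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (city TEXT)")
cities = ["Beijing"] * 3 + ["Shanghai"] * 2 + ["Chengdu"] + [None]
conn.executemany("INSERT INTO hotel_info VALUES (?)", [(c,) for c in cities])
distribution = value_distribution(conn, "hotel_info", "city")
```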
(C) If the exploration task is a field feature information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query, for all target fields, feature information exploration results including but not limited to: data type, null rate, number of distinct values, value density, storage length, actual length, minimum value, maximum value and/or whether the field values are unique.
In mode (C), the data type, storage length, actual length, number of distinct values, minimum value, maximum value and uniqueness of the field values of a target field can be obtained by querying and processing data and metadata with SQL and a Java program, while the null rate $R_{null}$ and the value density $D_{distinct}$ of the target field can be obtained by an access program written according to the following formulas:

$$R_{null} = \frac{n_{null}}{N}, \qquad D_{distinct} = \frac{C_{distinct}}{N}$$

where $n_{null}$ is the total number of null values in the target field, $C_{distinct}$ is the number of distinct non-null values in the target field, and $N$ is the total number of records in the target data table to which the target field belongs (i.e. the total number of rows in the target data table). From these formulas, $R_{null}$ and $D_{distinct}$ each lie between 0 and 1. The higher the null rate, the more values are missing from the field and the lower the data quality; when the quality falls low enough, the field must be ignored or its missing values must be handled, for example by substituting a default value. If the value density equals 0, the field contains only null values; if the value density equals 1, the field contains no repeated values. Either extreme makes the field unsuitable as a feature variable for data mining or machine learning. A smaller value density can also be described as a coarser granularity.
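The two ratios of mode (C) can be computed in a single SQL query. The sketch below assumes that the null rate is the null count divided by the total record count and that the value density is the distinct non-null count divided by the total record count, which is the reading consistent with the symbol definitions and with the stated extremes (density 0 for an all-null field, density 1 for a field without repeats). It uses Python with `sqlite3` in place of the patent's Java program; `null_rate_and_density` and the sample data are hypothetical.

```python
import sqlite3


def null_rate_and_density(conn, table, column):
    """Return (null_rate, value_density) for one target field."""
    n_total, n_null, c_distinct = conn.execute(
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}") '  # COUNT(DISTINCT ...) skips NULLs
        f'FROM "{table}"').fetchone()
    if n_total == 0:
        return None, None  # both ratios are undefined for an empty table
    return n_null / n_total, c_distinct / n_total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_info (city TEXT)")
cities = ["Beijing"] * 3 + ["Shanghai"] * 2 + ["Chengdu"] + [None] * 2
conn.executemany("INSERT INTO hotel_info VALUES (?)", [(c,) for c in cities])
null_rate, value_density = null_rate_and_density(conn, "hotel_info", "city")
```

With eight records, two nulls and three distinct non-null values, the null rate is 2/8 and the value density is 3/8.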
(D) If the exploration task is an inter-field hierarchical relationship information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query hierarchical relationship information exploration results between any two target fields, including but not limited to: one-to-one, one-to-many and/or many-to-one relationships.
In mode (D), the hierarchical relationship information between a first target field and a second target field can be obtained by, but is not limited to, an access program written according to the following steps:
DS101. Check whether any non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, go to step DS102; if so, go to step DS103.
DS102. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is one-to-one; otherwise, determine that it is many-to-one.
DS103. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship between the first target field and the second target field is one-to-many; otherwise, determine that it is many-to-many.
In steps DS101 to DS103, the checks can be carried out with the GROUP BY clause of SQL, which is not described further here. In addition, the first target field and the second target field may be located in the same data table or in different data tables. Because a many-to-many relationship means that no definite relationship exists, it may be omitted from the hierarchical relationship information exploration results.
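Steps DS101 to DS103 can be sketched with `GROUP BY` and `HAVING`, as the text suggests. The sketch makes several assumptions: both fields sit in the same table (the patent also allows different tables, which would need a join), "multiple or zero" is tested as `COUNT(DISTINCT ...) <> 1`, and Python with `sqlite3` replaces the Java access program. The names `has_fan_out` and `hierarchy_relation` and the sample rows are hypothetical.

```python
import sqlite3


def has_fan_out(conn, table, src, dst):
    """True if some non-null src value maps to multiple or zero non-null dst values."""
    row = conn.execute(
        f'SELECT 1 FROM "{table}" WHERE "{src}" IS NOT NULL '
        f'GROUP BY "{src}" HAVING COUNT(DISTINCT "{dst}") <> 1 LIMIT 1').fetchone()
    return row is not None


def hierarchy_relation(conn, table, first, second):
    """Classify the relationship from the first field to the second field."""
    if not has_fan_out(conn, table, first, second):           # DS101
        if not has_fan_out(conn, table, second, first):       # DS102
            return "one-to-one"
        return "many-to-one"
    if not has_fan_out(conn, table, second, first):           # DS103
        return "one-to-many"
    return "many-to-many"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (city TEXT, province TEXT, province_code TEXT, tag TEXT)")
conn.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", [
    ("Guangzhou", "Guangdong", "GD", "a"),
    ("Shenzhen", "Guangdong", "GD", "b"),
    ("Chengdu", "Sichuan", "SC", "a"),
    ("Chengdu", "Sichuan", "SC", "b"),
])
city_to_province = hierarchy_relation(conn, "region", "city", "province")
province_to_city = hierarchy_relation(conn, "region", "province", "city")
```

Here each city belongs to one province while a province holds several cities, so the city to province direction is many-to-one and the reverse direction is one-to-many.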
Modes (A) to (D) can each be started with one click. After a task has been executed and its exploration results obtained, the task status information is updated accordingly. Once a task is executed, its start time, end time and elapsed time are recorded automatically and stored, together with the task status, in the summary of the execution results. The task status can be one of the following: never run (the initial state of a newly created task), running (the state after the task starts and before it finishes), run succeeded (the task executed successfully to completion), run failed (the task execution failed; for example, the data source being closed during execution, or another unexpected condition, leads to failure), run completed (some tables or fields raised exceptions during execution, and the task is shown as completed afterwards) and manually stopped (the user clicked the stop button to halt a running task). After a task has run successfully, the user can view the summary of the exploration results and the various display pages. In addition, after step S104, the user can sort the pending exploration tasks in the task list and then execute them in order; that is, if several exploration tasks in the task list are selected and started, they pass through step S105 one by one in their listed order, and the corresponding overview information exploration results are obtained and saved, which achieves sequential dispatch and execution of multiple tasks.
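The task statuses and the sequential execution described above can be sketched as a small runner. This is an illustration only: the enum members mirror the six statuses named in the text, but the runner, its return convention (a task returns the list of tables or fields that raised exceptions) and the sample tasks are hypothetical, and manual stopping is not modelled.

```python
import time
from enum import Enum


class TaskStatus(Enum):
    NEVER_RUN = "never run"
    RUNNING = "running"
    SUCCEEDED = "run succeeded"
    FAILED = "run failed"
    COMPLETED = "run completed"      # finished, but some tables or fields failed
    STOPPED = "manually stopped"


def run_tasks_in_order(tasks):
    """Run (name, action) pairs in list order and return one summary per task."""
    summaries = []
    for name, action in tasks:
        start_time = time.time()
        started = time.perf_counter()
        try:
            failed_items = action()  # assumed to return the items that failed
            status = TaskStatus.COMPLETED if failed_items else TaskStatus.SUCCEEDED
        except Exception:
            status = TaskStatus.FAILED  # e.g. the data source was closed
        elapsed = time.perf_counter() - started
        summaries.append({
            "task": name, "status": status, "start_time": start_time,
            "end_time": start_time + elapsed, "elapsed_seconds": elapsed,
        })
    return summaries


def broken_task():
    raise ConnectionError("data source closed")


summaries = run_tasks_in_order([
    ("table basic information", lambda: []),
    ("field value distribution", lambda: ["hotel_info.price"]),
    ("field feature information", broken_task),
])
```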
S106. Automatically output and visually display the overview information exploration results obtained by executing the exploration tasks.
In step S106, so that the user can flexibly view the exploration results for a target object, or choose a specific executed task and view its results, preferably, after the user selects a data table and/or a field, the overview information exploration results corresponding to the selected data table and/or field are output and visually displayed, where the selected data table and the selected field must both be located in the target data set.
In step S106, the obtained overview information exploration results can also be output and visually displayed as follows:
If the obtained overview information exploration results include basic information exploration results, these are displayed in table form, as illustrated in Fig. 2;
If the obtained overview information exploration results include value distribution information exploration results, these are displayed in table form or bar chart form, as illustrated in Figs. 3 and 4;
If the obtained overview information exploration results include feature information exploration results, these are displayed in table form or bar chart form, as illustrated in Figs. 5, 6, 7 and 8, where "field null rate" in Fig. 6 is the null rate $R_{null}$, "field density" in Fig. 7 is the value density $D_{distinct}$, "character length" in Fig. 8 is the storage length, and "percentage" is the actual length as a percentage of the storage length;
If the obtained overview information exploration results include hierarchical relationship information exploration results, these are displayed in table form or tree diagram form, as illustrated in Figs. 9, 10 and 11.
More specifically, when value distribution information exploration results are displayed in bar chart form in step S106, the bar chart shows the M distinct non-null values with the highest numbers of occurrences in the selected field, where M is a natural number between 10 and 100. As shown in Fig. 4, M can be, for example, 20. To make it easier to drill down into further details of the relevant field, preferably, in step S104 and upon receiving an input signal indicating a click on the bar chart, the distribution of the main content and/or the sparse content of the field corresponding to the bar chart is output and displayed, where the main content refers to the non-null values with the highest numbers of occurrences in the field, and the sparse content refers to the non-null values with the lowest numbers of occurrences in the field.
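Selecting the top M values for the bar chart, and the main and sparse content for the drill-down, can be sketched as follows. This is a hedged illustration that works on values already fetched into memory; the function names `top_m_values` and `main_and_sparse` and the sample values are hypothetical, and no charting is performed.

```python
from collections import Counter


def top_m_values(values, m=20):
    """Return the m most frequent distinct non-null values with their counts."""
    if not 10 <= m <= 100:
        raise ValueError("M is expected to lie between 10 and 100")
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(m)


def main_and_sparse(values):
    """Return (values with the highest count, values with the lowest count)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return [], []
    highest, lowest = max(counts.values()), min(counts.values())
    main = [v for v, c in counts.items() if c == highest]
    sparse = [v for v, c in counts.items() if c == lowest]
    return main, sparse


field_values = ["a"] * 5 + ["b"] * 3 + ["c"] + [None] * 2
bars = top_m_values(field_values, m=10)
main_content, sparse_content = main_and_sparse(field_values)
```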
In summary, the big data exploration and cognition method provided by this embodiment has the following technical effects:
(1) This embodiment provides a general method that applies to all industries and conveniently abstracts all kinds of data. In big data application projects, ETL development, data processing, cleaning, integration and similar work, and when facing massive data from a variety of data sources, the data source and data set are created first; exploration targets and exploration tasks are then flexibly configured for the target data set, and in-depth exploration is performed to obtain overview information such as the basic and statistical information of each data table and the content distribution, statistics, profile, hierarchical relationships, null rate and value density of each data table field. This overview information is recorded, stored in the client's local database, and displayed automatically in visual views and tables, so that the user gains a comprehensive understanding of the data and avoids problems such as an insufficient or biased understanding of it. This in turn helps guide the user to plan and build projects such as data warehouses, ETL, BI, data mining, machine learning and big data analysis in a targeted way, and safeguards the normal progress of these projects;
(2) Visualizing the data exploration results lets the user see at a glance the quality of each field (that is, each feature variable) and the associations and hierarchical relationships between fields. This helps remove irrelevant or redundant features, which reduces the number of features, improves the accuracy of machine learning models and shortens run time. For example, if a field's null rate is too high (i.e. its quality is poor), its value density reaches 100% (such as a unique field) or its value density is close to 0 (such as a constant field), the field is unsuitable as a feature variable. As another example, if the explored relationship from field A to field B is one-to-many, a hierarchy (Hierarchy) of A and B can be built in multidimensional online analysis; and in feature engineering, if field B has been selected as a feature, field A is usually not selected as well, which further reduces the number of features;
(3) This embodiment makes no assumptions about the business meaning or business domain of the data being explored, so it can serve as a general-purpose product for all kinds of data in every industry, and no similarly systematic, general-purpose exploration and cognition technology currently exists on the domestic or international market. In addition, this embodiment can support all relational databases, including MySQL, Oracle, SQL Server, DB2, Sybase, Hive, PostgreSQL, Teradata and so on;
(4) The big data exploration and cognition method also has advantages such as automated exploration and diversified presentation of results, which makes it easy to promote and use in practice.
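The feature screening mentioned in effect (2) can be illustrated with a simple filter over per-field null rates and value densities. Everything concrete here is an assumption made for illustration: the thresholds `max_null_rate` and `min_density`, the function name `is_usable_feature` and the sample field profiles do not come from the text, which only states that a very high null rate, a value density of 100% or a value density close to 0 makes a field unsuitable.

```python
def is_usable_feature(null_rate, value_density, max_null_rate=0.5, min_density=0.01):
    """Apply the three exclusion rules to one field profile."""
    if null_rate > max_null_rate:      # too many missing values
        return False
    if value_density >= 1.0:           # every value is unique, e.g. an ID field
        return False
    if value_density <= min_density:   # nearly constant field
        return False
    return True


# Hypothetical profiles: field name -> (null rate, value density).
profiles = {
    "user_id": (0.0, 1.0),
    "city": (0.02, 0.15),
    "remark": (0.9, 0.05),
    "country": (0.0, 0.0001),
}
selected = [name for name, (rate, density) in profiles.items()
            if is_usable_feature(rate, density)]
```

Suitable thresholds depend on the data set and the model, so they would normally be tuned rather than fixed.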
Embodiment two
As shown in Fig. 12, this embodiment provides a hardware device that implements the big data exploration and cognition method of embodiment one, comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module and an exploration result visualization module;
The data source creation and management module is used for data source creation and management: creating the target data source that needs to be explored and understood, including the data source type, the IP address of the database server, the login username, the login password and the database name;
The data set creation and management module is communicatively connected to the data source creation and management module and is used for data set creation and management: based on the created data source, creating a data set name to contain the target data tables and fields that need to be explored and understood;
The data table and field configuration module is communicatively connected to the data set creation and management module and is used for data set configuration: based on the created data set, the user freely selects the data tables and fields in the corresponding database;
The exploration task configuration module is communicatively connected to the data table and field configuration module and is used for exploration task creation and management: based on the created and configured data set, selecting all or some of the exploration functions to form a specific exploration task;
The exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: sorting the multiple exploration tasks selected by the user, and dispatching and executing them in that order;
The exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically outputting and visually displaying the overview information exploration results obtained by executing the exploration tasks.
For the working process, working details and technical effects of the big data exploration and cognition device provided by this embodiment, refer to embodiment one; they are not repeated here.
Embodiment three
As shown in Figs. 13 and 14, this embodiment provides a hardware device that implements the big data exploration and cognition method of embodiment one, comprising a memory and a processor in communication with each other, where computer-readable instructions are stored in the memory and, when executed by the processor, cause the processor to perform the steps of the big data exploration and cognition method of embodiment one. As shown in Fig. 13, the big data exploration and cognition device comprises a processor, a memory and a network interface connected by a system bus. The processor provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and the computer-readable instructions, and the internal memory provides the environment in which the operating system and the computer-readable instructions in the non-volatile storage medium run. The network interface is used for network communication with external database servers. Fig. 14 shows an exemplary application scenario: the big data exploration and cognition device 1 is connected through the Internet or a corporate intranet to a CRM database server 201, an ERP database server 202, an SCM database server 203, a Legacy database server 204 and an External database server 205. For these five data sources, data sources can be created and managed, data sets can be created and managed, and the big data exploration and cognition tasks described in steps S103 to S106 of embodiment one can be performed to obtain the corresponding overview information exploration results.
For the working process, working details and technical effects of the big data exploration and cognition device provided by this embodiment, refer to embodiment one; they are not repeated here.
Embodiment four
This embodiment provides a computer storage medium that stores a computer program implementing the big data exploration and cognition method of embodiment one. That is, a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the processor performs the steps of the big data exploration and cognition method of embodiment one. The computer can be a general-purpose computer, a special-purpose computer, a computer network or another programmable device, or a mobile smart device (such as a smartphone, PAD or iPad).
For the working process, working details and technical effects of the computer storage medium provided by this embodiment, refer to embodiment one; they are not repeated here.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the essence of the technical solution above, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device to perform the methods described in the embodiments or in parts of the embodiments.
The embodiments above only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in those embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Finally, it should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent is determined by the appended claims, and the specification can be used to interpret the claims.
Claims (10)
1. A big data exploration and cognition method, characterized by comprising the following steps:
S101. Data source creation and management: create the target data source that needs to be explored and understood, including the data source type, the IP address of the database server, the login username, the login password and the database name;
S102. Data set creation and management: based on the created data source, create a data set name to contain the target data tables and fields that need to be explored and understood;
S103. Data set configuration: based on the created data set, the user freely selects the data tables and fields in the corresponding database;
S104. Exploration task creation and management: based on the created and configured data set, select all or some of the exploration functions to form a specific exploration task;
S105. Execute exploration tasks: sort the multiple exploration tasks selected by the user, and dispatch and execute them in that order;
S106. Automatically output and visually display the overview information exploration results obtained by executing the exploration tasks.
2. The big data exploration and cognition method of claim 1, characterized in that:
In step S104, selecting all or some of the exploration functions consists of configuring an exploration task made up of any combination of the following: a table basic information exploration task, a field value distribution information exploration task, a field feature information exploration task and an inter-field hierarchical relationship information exploration task;
In step S105, an individual exploration task is executed as follows:
(A) If the exploration task is a table basic information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following basic information exploration results for all target data tables: table name, creation time, modification time, total number of records and/or total number of fields;
(B) If the exploration task is a field value distribution information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following value distribution information exploration results for all target fields: each distinct non-null value and the occurrence frequency of each distinct non-null value;
(C) If the exploration task is a field feature information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following feature information exploration results for all target fields: data type, null rate, number of distinct values, value density, storage length, actual length, minimum value, maximum value and/or whether the field values are unique;
(D) If the exploration task is an inter-field hierarchical relationship information exploration task, execute the access program that corresponds to this task and is written in Java and SQL, access the target database server, and then query the following hierarchical relationship information exploration results between any two target fields: one-to-one, one-to-many and/or many-to-one relationships.
3. The big data exploration and cognition method of claim 2, characterized in that, in mode (C), the null rate $R_{null}$ and/or the value density $D_{distinct}$ of a target field is obtained by an access program written according to the following formulas:

$$R_{null} = \frac{n_{null}}{N}, \qquad D_{distinct} = \frac{C_{distinct}}{N}$$

where $n_{null}$ is the total number of null values in the target field, $C_{distinct}$ is the number of distinct non-null values in the target field, and $N$ is the total number of records in the target data table to which the target field belongs.
4. The big data exploration and cognition method of claim 2, characterized in that, in mode (D), the hierarchical relationship information between a first target field and a second target field is obtained by an access program written according to the following steps:
DS101. Check whether any non-null value in the first target field corresponds to multiple or zero non-null values in the second target field; if not, go to step DS102; if so, go to step DS103;
DS102. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship information between the first target field and the second target field is a one-to-one relationship; otherwise, determine that it is a many-to-one relationship;
DS103. Check whether any non-null value in the second target field corresponds to multiple or zero non-null values in the first target field; if not, determine that the hierarchical relationship information between the first target field and the second target field is a one-to-many relationship; otherwise, determine that it is a many-to-many relationship.
5. The big data exploration and cognition method of claim 2, characterized in that, in step S106:
If the obtained overview information exploration results include basic information exploration results, these are displayed in table form;
If the obtained overview information exploration results include value distribution information exploration results, these are displayed in table form or bar chart form;
If the obtained overview information exploration results include feature information exploration results, these are displayed in table form or bar chart form;
If the obtained overview information exploration results include hierarchical relationship information exploration results, these are displayed in table form or tree diagram form.
6. The big data exploration and cognition method of claim 5, characterized in that: when value distribution information exploration results are displayed in bar chart form in step S106, the bar chart shows the M distinct non-null values with the highest numbers of occurrences in the selected field, where M is a natural number between 10 and 100.
7. The big data exploration and cognition method of claim 1, characterized in that: in step S106, the overview information exploration results corresponding to the data table and/or field selected by the user are output and visually displayed.
8. A big data exploration and cognition device, characterized by comprising a data source creation and management module, a data set creation and management module, a data table and field configuration module, an exploration task configuration module, an exploration task execution module and an exploration result visualization module;
The data source creation and management module is used for creating and managing data sources: creating the target data source to be explored and cognized, including the data source type, the IP address of the database server, the login username, the login password and the database name;
The data set creation and management module is communicatively connected to the data source creation and management module and is used for creating and managing data sets: based on the created data source, a data set name is created to contain the target data tables and fields to be explored and cognized;
The data table and field configuration module is communicatively connected to the data set creation and management module and is used for configuring data sets: based on the created data set, the user freely screens and specifies the data tables and fields in the corresponding database;
The exploration task configuration module is communicatively connected to the data table and field configuration module and is used for creating and managing exploration tasks: based on the created and configured data set, all or some of the exploration functions are selected to form a specific exploration task;
The exploration task execution module is communicatively connected to the exploration task configuration module and is used for executing exploration tasks: the multiple exploration tasks screened by the user are sorted and then dispatched and executed in sequence;
The exploration result visualization module is communicatively connected to the exploration task execution module and is used for automatically outputting and visually displaying the overview information exploration result obtained by executing the exploration tasks.
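The chain of modules in this claim can be sketched as plain data structures plus a sequential executor. Everything below is an assumption made for illustration: the class names, the `priority` field used as the sort key (the claim does not state the ordering criterion), the placeholder connection values, and the stub `explore` callable standing in for the real exploration functions.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Connection details named in the claim for a target data source."""
    source_type: str
    host: str
    username: str
    password: str
    database: str


@dataclass
class DataSet:
    """A named data set built on a data source."""
    name: str
    source: DataSource
    # Maps each selected table name to its selected field names.
    tables: dict = field(default_factory=dict)


@dataclass
class ExplorationTask:
    """All or some exploration functions applied to one configured data set."""
    dataset: DataSet
    functions: list
    priority: int = 0  # hypothetical ordering key, not specified by the claim


def run_tasks(tasks, explore):
    """Sort the selected tasks, then execute them one after another."""
    results = []
    for task in sorted(tasks, key=lambda t: t.priority):
        results.append((task.dataset.name, explore(task)))
    return results


src = DataSource("mysql", "192.0.2.10", "reader", "secret", "sales")
orders = DataSet("orders_ds", src, {"orders": ["status", "region"]})
users = DataSet("users_ds", src, {"users": ["city"]})
tasks = [
    ExplorationTask(orders, ["value_distribution"], priority=2),
    ExplorationTask(users, ["feature"], priority=1),
]
# The stub simply echoes the requested functions instead of querying a database.
results = run_tasks(tasks, explore=lambda t: t.functions)
```

In this sketch the visualization module would consume `results`, which lists the tasks in execution order together with whatever each exploration returned.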
9. A big data exploration and cognition device, characterized by comprising a memory and a processor that are communicatively connected, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor is caused to perform the steps of the big data exploration and cognition method as described in any one of claims 1 to 7.
10. A computer storage medium, characterized in that a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the processor is caused to perform the steps of the big data exploration and cognition method as described in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910718388.1A CN110442620B (en) | 2019-08-05 | 2019-08-05 | Big data exploration and cognition method, device, equipment and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910718388.1A CN110442620B (en) | 2019-08-05 | 2019-08-05 | Big data exploration and cognition method, device, equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442620A (en) | 2019-11-12 |
CN110442620B (en) | 2023-08-29 |
Family
ID=68433296
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910718388.1A | Active, CN110442620B (en) | 2019-08-05 | 2019-08-05 | Big data exploration and cognition method, device, equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442620B (en) |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5870739A (en) * | 1996-09-20 | 1999-02-09 | Novell, Inc. | Hybrid query apparatus and method |
US20060136382A1 (en) * | 2004-12-17 | 2006-06-22 | International Business Machines Corporation | Well organized query result sets |
US20090083277A1 (en) * | 2007-09-26 | 2009-03-26 | Barsness Eric L | Nodal data normalization |
CN101556606A (en) * | 2009-05-20 | 2009-10-14 | 同方知网(北京)技术有限公司 | Data mining method based on extraction of Web numerical value tables |
US20110022581A1 (en) * | 2009-07-27 | 2011-01-27 | Rama Krishna Korlapati | Derived statistics for query optimization |
US20110145210A1 (en) * | 2009-12-10 | 2011-06-16 | Negti Systems, Inc. | System and Method for Managing One or More Databases |
CN103930887A (en) * | 2011-11-18 | 2014-07-16 | 惠普发展公司,有限责任合伙企业 | Query summary generation using row-column data storage |
US20140280280A1 (en) * | 2013-03-15 | 2014-09-18 | International Business Machines Corporation | Estimating error propagation for database optimizers |
US20150066930A1 (en) * | 2013-08-28 | 2015-03-05 | Intelati, Inc. | Generation of metadata and computational model for visual exploration system |
CN106295983A (en) * | 2016-08-08 | 2017-01-04 | 烟台海颐软件股份有限公司 | Power marketing data visualization statistical analysis technique and system |
US20170103104A1 (en) * | 2015-10-07 | 2017-04-13 | International Business Machines Corporation | Query plan based on a data storage relationship |
US9633076B1 (en) * | 2012-10-15 | 2017-04-25 | Tableau Software Inc. | Blending and visualizing data from multiple data sources |
CN106775997A (en) * | 2015-11-23 | 2017-05-31 | 阿里巴巴集团控股有限公司 | A kind of task processing method and equipment |
CN107103050A (en) * | 2017-03-31 | 2017-08-29 | 海通安恒(大连)大数据科技有限公司 | A kind of big data Modeling Platform and method |
CN107391739A (en) * | 2017-08-07 | 2017-11-24 | 北京奇艺世纪科技有限公司 | A kind of query statement generation method, device and electronic equipment |
CN108717786A (en) * | 2018-07-17 | 2018-10-30 | 南京航空航天大学 | A kind of traffic accident causation method for digging based on universality meta-rule |
CN109189846A (en) * | 2018-09-11 | 2019-01-11 | 北京易华录信息技术股份有限公司 | A kind of public security traffic control visual modeling system and method based on big data technology |
CN109918389A (en) * | 2019-03-13 | 2019-06-21 | 试金石信用服务有限公司 | Data air control method, apparatus and storage medium based on message flow and chart database |
CN110008232A (en) * | 2019-04-11 | 2019-07-12 | 北京启迪区块链科技发展有限公司 | Generation method, device, server and the medium of structured query sentence |
CN110069559A (en) * | 2019-03-21 | 2019-07-30 | 中国人民解放军陆军工程大学 | A kind of analysis of Heterogeneous Information System data and integrated approach with height automatic control |
Non-Patent Citations (3)
Title |
---|
FARRE, CARLES; NUTT, WERNER; TENIENTE, ERNEST; URPI, TONI: "Containment of conjunctive queries over databases with null values", 《11TH INTERNATIONAL CONFERENCE ON DATABASE THEORY》 * |
JAGDEV BHOGAL; IMRAN CHOKSI: "Handling Big Data Using NoSQL", 《IEEE》 * |
魏小宁 (Wei Xiaoning): "构建数据仓库系统的技术分析" [Technical analysis of building a data warehouse system], 华南金融电脑, no. 08 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990447A (en) * | 2019-12-19 | 2020-04-10 | 北京锐安科技有限公司 | Data probing method, device, equipment and storage medium |
CN110990447B (en) * | 2019-12-19 | 2023-09-15 | 北京锐安科技有限公司 | Data exploration method, device, equipment and storage medium |
CN111190597A (en) * | 2019-12-27 | 2020-05-22 | 天津浪淘科技股份有限公司 | Data UE visual design system |
CN111180083A (en) * | 2019-12-31 | 2020-05-19 | 北京零研科技有限公司 | Clinical scientific research data management method and system |
CN112434009A (en) * | 2020-11-19 | 2021-03-02 | 浙江大华技术股份有限公司 | End-to-end data probing method and device, computer equipment and storage medium |
CN115017136A (en) * | 2022-06-29 | 2022-09-06 | 江苏重行信息科技有限公司 | Monitoring data analysis, storage and management system based on big data application |
CN115017136B (en) * | 2022-06-29 | 2024-02-13 | 广州市橙鑫网络有限公司 | Monitoring data analysis storage management system based on big data application |
CN116109121A (en) * | 2023-04-17 | 2023-05-12 | 西昌学院 | User demand mining method and system based on big data analysis |
CN116109121B (en) * | 2023-04-17 | 2023-06-30 | 西昌学院 | User demand mining method and system based on big data analysis |
Also Published As
Publication number | Publication date |
---|---|
CN110442620B (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442620A (en) | A kind of big data is explored and cognitive approach, device, equipment and computer storage medium | |
US10564622B1 (en) | Control interface for metric definition specification for assets and asset groups driven by search-derived asset tree hierarchy | |
US20200167350A1 (en) | Loading queries using search points | |
US20200334237A1 (en) | Systems, methods, user interfaces and algorithms for performing database analysis and search of information involving structured and/or semi-structured data | |
US11681694B2 (en) | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface | |
US10459938B1 (en) | Punchcard chart visualization for machine data search and analysis system | |
CN110168515A (en) | System for analyzing data relationship to support query execution | |
US10585892B2 (en) | Hierarchical dimension analysis in multi-dimensional pivot grids | |
CN110300963A (en) | Data management system in large-scale data repository | |
US10459939B1 (en) | Parallel coordinates chart visualization for machine data search and analysis system | |
CN101971165B (en) | Graphic representations of data relationships | |
CN109997125A (en) | System for importing data to data storage bank | |
Lum et al. | 1978 New Orleans data base design workshop report | |
US11809439B1 (en) | Updating client dashboarding component of an asset monitoring and reporting system | |
US11422869B2 (en) | Presenting collaboration activity | |
US11461350B1 (en) | Control interface for dynamic elements of asset monitoring and reporting system | |
US20070150562A1 (en) | System and method for data quality management and control of heterogeneous data sources | |
EP1585036A2 (en) | Management of parameterized database queries | |
Hobbs et al. | Oracle 10g data warehousing | |
CA3179300C (en) | Domain-specific language interpreter and interactive visual interface for rapid screening | |
CN114218218A (en) | Data processing method, device and equipment based on data warehouse and storage medium | |
US20150058363A1 (en) | Cloud-based enterprise content management system | |
Postina et al. | An ea-approach to develop soa viewpoints | |
US20130218893A1 (en) | Executing in-database data mining processes | |
WO2021195285A1 (en) | Systems and methods for tracking features in a development environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||