WO2017024966A1 - Method and apparatus for classifying data tables - Google Patents

Method and apparatus for classifying data tables

Info

Publication number
WO2017024966A1
Authority
WO
WIPO (PCT)
Prior art keywords
data table
parameter
data
identifier
condition
Prior art date
Application number
PCT/CN2016/092819
Other languages
English (en)
French (fr)
Inventor
李晓菲
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017024966A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of data processing, and in particular, to a method and apparatus for classifying data tables.
  • the data stored in the data table may be log data, transaction data, user data, etc. collected from various systems, and the data table may be uploaded by a user or provided to a cloud computing platform for sharing. Users can find the required data tables on the cloud computing platform.
  • The present invention provides a method and apparatus for classifying data tables, which identify high quality data tables among the stored data tables and preferentially display them in query results, thereby improving query efficiency.
  • a method for classifying data tables comprising:
  • the server updates the identifier of the first data table to a first identifier
  • the server receives a query request, and the query request includes a query condition
  • if the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, wherein the display position of the first data table is better than the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
  • the table parameter further includes a combination of any one or more of a category parameter, a change frequency parameter, and a data quality control (DQC) parameter, where the category parameter is used to identify the category to which the first data table belongs.
  • the change frequency parameter is used to identify a change frequency of fields in the first data table and/or a change frequency of the first data table, and the DQC parameter is used to identify parameters of the DQC monitoring applied to the first data table.
  • the server displays the multiple data tables to the query request according to the identifier of the data table, and further includes:
  • the server classifies the plurality of data tables according to category parameters, wherein the first data table is displayed under the category to which the first data table belongs.
  • the server determines whether the table parameter meets the first determining condition, and further includes:
  • the server updates the identifier of the first data table to a second identifier
  • the server displays the plurality of data tables to the query request according to the identifier of the data table, including:
  • the display position of a third data table is better than the display position of the first data table, the third data table is one of the plurality of data tables, and the identifier of the third data table is the first identifier.
  • the method further includes:
  • the server updates the identifier of the first data table to a third identifier
  • the server displays the plurality of data tables to the query request according to the identifier of the data table, including:
  • the server masks the first data table in the process of presenting the plurality of data tables.
  • the first determining condition includes a combination of any one or more of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period;
  • the first determining condition further includes the update parameter having an update record that is continuously updated.
  • an apparatus for classifying data tables, comprising:
  • an obtaining unit configured to obtain a table parameter of the first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table;
  • a determining unit configured to determine whether the table parameter meets the first determining condition; if the table parameter meets the first determining condition, triggering the first updating unit;
  • the first update unit is configured to update an identifier of the first data table to a first identifier
  • a receiving unit configured to receive a query request, where the query request includes a query condition
  • a display unit configured to, if the first data table is among a plurality of data tables that meet the query condition, display the plurality of data tables in response to the query request according to the identifiers of the data tables, wherein the display position of the first data table is better than the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
  • the table parameter further includes a combination of any one or more of a category parameter, a change frequency parameter, and a data quality control (DQC) parameter, where the category parameter is used to identify the category to which the first data table belongs.
  • the change frequency parameter is used to identify a change frequency of fields in the first data table and/or a change frequency of the first data table, and the DQC parameter is used to identify parameters of the DQC monitoring applied to the first data table.
  • the display unit is further configured to display the plurality of data tables classified according to the category parameter, wherein the first data table is displayed under the category to which it belongs.
  • the determining unit is further configured to trigger the second updating unit if the table parameter does not meet the first determining condition
  • the second update unit is configured to update the identifier of the first data table to a second identifier
  • the display unit displays the plurality of data tables to the query request according to the identifier of the data table, wherein the display position of the third data table is better than the display position of the first data table, and the third data table is One of the plurality of data tables, the identifier of the third data table being the first identifier.
  • the determining unit is further configured to determine whether the table parameter meets a second determining condition, where the second determining condition is easier to meet than the first determining condition; if the table parameter does not meet the second determining condition, the third update unit is triggered;
  • the third update unit is configured to update the identifier of the first data table to a third identifier
  • the display unit displays the plurality of data tables to the query request according to an identifier of the data table, wherein the first data table is masked in the process of displaying the plurality of data tables.
  • the first determining condition includes a combination of any one or more of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period;
  • the first determining condition further includes the update parameter having an update record that is continuously updated.
  • The server determines whether the table parameter of the first data table meets the first determining condition, and updates the identifier of the first data table to the first identifier when the condition is met.
  • The server receives a query request including a query condition, and when the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables,
  • so that the display position of the first data table, whose identifier is the first identifier, is better than the display position of a second data table whose identifier is not the first identifier.
  • Through this classification, the identifier of a data table that meets the first determining condition, that is, a data table of better quality, is updated to the first identifier, so that data tables carrying the first identifier are preferentially displayed to the querying user in the query result.
  • High quality data tables are better able to meet the user's query requirements than low quality data tables.
  • The user generally only needs to browse these high quality data tables to find one that meets their needs, which largely eliminates the time spent looking at poor quality data tables, saves query time, and improves query efficiency.
  • FIG. 1 is a flowchart of a method for classifying a data table according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for classifying a data table according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for classifying a data table according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a data table classification structure according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of a device for classifying a data table according to an embodiment of the present invention.
  • FIG. 6 is a structural diagram of a device for classifying a data table according to an embodiment of the present invention.
  • a large number of data tables are stored on the cloud computing platform.
  • Users can query the cloud computing platform according to their own needs to see whether a data table that meets those needs has already been saved. If one is found, a great deal of development time and effort can be saved.
  • The number of data tables saved on the cloud computing platform has increased exponentially.
  • Delayed data output from a data table may delay the output of projects that use the data table, resulting in a poor user experience.
  • an embodiment of the present invention provides a method and apparatus for classifying a data table.
  • The server determines whether the table parameter of the first data table meets the first determining condition, and when the condition is met, updates the identifier of the first data table to the first identifier. The server then receives a query request including a query condition.
  • When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, so that the display position of the first data table, whose identifier is the first identifier, is better than the display position of a second data table whose identifier is not the first identifier. Through this classification, the identifier of a data table that meets the first determining condition, that is, a data table of better quality, is updated to the first identifier, so that such tables are preferentially displayed to the querying user in the query result. These high quality data tables are better able to meet the user's query requirements than low quality data tables.
  • The user generally only needs to browse these high quality data tables to find one that meets their needs, which largely eliminates the time spent looking at poor quality data tables, saves query time, and improves query efficiency. Moreover, if the user selects from the search result a data table whose identifier is the first identifier, then because the metadata of that table is complete and its update record is complete, delays caused by late updates of the data table are less likely to occur while the user is using it, which improves the user experience.
  • FIG. 1 is a flowchart of a method for classifying a data table according to an embodiment of the present invention, where the method includes:
  • S101 The server acquires a table parameter of the first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table.
  • the server serves a cloud computing platform.
  • the server may be a data table storage server that includes the first data table, or may be a server that is only used for processing table classification and query, which is not limited by the present invention.
  • The integrity parameter makes it possible to determine whether the metadata of the first data table is complete, with nothing missing.
  • The first determining condition includes a combination of any one or more of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period.
  • The first determining condition further includes the update parameter having an update record that shows continuous updating. That is, the integrity parameter may include at least: whether there is a table comment (integrity is better if there is one); whether the table has a person in charge (integrity is better if it does); the percentage of fields that have comments (the higher the percentage, the better the integrity); and whether there is a data hierarchy (integrity is better if there is one).
  • the update record identified by the update parameter can determine whether the first data table is continuously updated.
  • S102 The server determines whether the table parameter meets the first determining condition. If the table parameter meets the first determination condition, S103 is performed.
  • the first determining condition may be understood as a determining condition for determining whether the table parameter reaches a certain standard.
  • For example, the first determining condition may be: determining, according to the integrity parameter in the table parameter, whether there is a table comment, whether there is a data hierarchy, and whether the proportion of commented fields reaches a certain percentage; and determining, according to the update parameter, whether the first data table is continuously updated. When there is a table comment and a data hierarchy, the proportion of commented fields reaches 100%, and the table is continuously updated, the table parameter meets the first determining condition; otherwise it does not.
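  • The example condition above can be sketched as a small predicate. This is only an illustration under stated assumptions: the `TableParams` class, its field names, and the `ratio_threshold` argument are hypothetical and do not appear in the source; only the four checks and the 100% figure come from the example.

```python
from dataclasses import dataclass


@dataclass
class TableParams:
    """Hypothetical container for the table parameters in the example."""
    has_table_comment: bool
    has_data_hierarchy: bool
    commented_field_ratio: float  # fraction of fields with a comment, 0.0 to 1.0
    continuously_updated: bool    # derived from the update record


def meets_first_condition(p: TableParams, ratio_threshold: float = 1.0) -> bool:
    """Return True only when every check of the example condition passes."""
    return (
        p.has_table_comment
        and p.has_data_hierarchy
        and p.commented_field_ratio >= ratio_threshold
        and p.continuously_updated
    )


fully_documented = TableParams(True, True, 1.0, True)
partly_documented = TableParams(True, True, 0.8, True)

print(meets_first_condition(fully_documented))   # True
print(meets_first_condition(partly_documented))  # False
```

  • Keeping the threshold as an argument reflects the "preset threshold" wording used elsewhere in the text; a deployment could choose a value below 100%.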
  • S103 The server updates the identifier of the first data table to a first identifier.
  • the first identifier is an identifier corresponding to the high quality data table, and the server may identify which category the data table belongs to by reading the identifier of the data table, and the category described herein may be high quality, normal, or the like.
  • the first determining condition may determine that the first data table belongs to a high quality data table. Updating the identifier of the first data table determined to be high quality to the first identifier can be understood as dividing the first data table into categories of high quality data tables.
  • the identifier of the first data table uniquely corresponds to the first data table.
  • The first data table has one and only one identifier, which indicates the category in which the first data table is located. For example, when the identifier of the first data table is the first identifier, the first data table is in the high quality data table category corresponding to the first identifier. If the first data table later can no longer meet the first determining condition for some reason, its identifier is updated from the first identifier to another identifier, such as the second identifier or the third identifier mentioned later, and the first data table is then in the category corresponding to the second identifier or the third identifier.
  • S104 The server receives a query request, where the query request includes a query condition.
  • the query condition may include a query keyword and the like, and details are not described herein again.
  • S105 If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, wherein the display position of the first data table is better than the display position of the second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
  • A query condition generally matches a very large number of data tables.
  • the number of the plurality of data tables that meet the query condition is at least two, one is the first data table, and the other is the second data table.
  • the first data table has been determined as a high quality data table in S103, and the identifier of the first data table is updated to correspond to the first identifier of the high quality data table category.
  • The identifier of the second data table is not the first identifier, that is, the second data table is not in the high quality data table category corresponding to the first identifier, or in other words, the second data table is not a high quality data table.
  • When presenting the query result for the query request, the server determines the display position of each data table based on its identifier.
  • the high quality data table category is determined from the data table, and the identifiers of the data tables in the high quality data table category are all updated to the first identifier.
  • By reading the identifier of a data table, the server can identify data tables in the high quality data table category (that is, data tables whose identifier is the first identifier) as well as data tables that are not in the high quality data table category (that is, data tables whose identifier is not the first identifier).
  • When determining the display positions, the server places the first data table, whose identifier is the first identifier, at a position better than that of the second data table, so that when the user views the query result, the first data table is seen more easily and the second data table is seen less easily than the first data table.
  • The display position is illustrated by example. When multiple pages are required to display the query result, the first data table may be located on an earlier page and the second data table on a later page than the first data table.
  • Alternatively, the first data table may be displayed higher on the page, so that the user sees it first, while the second data table is displayed lower than the first data table and may only be seen after scrolling.
  • The display position should not be understood only as a location.
  • For example, a data table with the first identifier may also be displayed in color, enlarged, and so on. Further examples are not given here.
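  • One way to picture the position-based display is a stable sort that moves tables carrying the first identifier ahead of all others and then splits the list into pages. This is a minimal sketch, not the patented implementation: the identifier labels, the dictionary layout, the table names, and the `paginate` helper are all assumptions made for illustration.

```python
FIRST_ID = "first"    # hypothetical label for the high quality category
SECOND_ID = "second"  # hypothetical label for any other category


def order_results(tables):
    """Stable sort: tables with the first identifier come before the rest."""
    # The key is False (0) for first-identifier tables, so they sort first;
    # sorted() is stable, so the original relative order is otherwise kept.
    return sorted(tables, key=lambda t: t["identifier"] != FIRST_ID)


def paginate(tables, page_size):
    """Split the ordered list into consecutive pages."""
    return [tables[i:i + page_size] for i in range(0, len(tables), page_size)]


results = [
    {"name": "orders_tmp", "identifier": SECOND_ID},
    {"name": "orders_daily", "identifier": FIRST_ID},
    {"name": "orders_raw", "identifier": SECOND_ID},
    {"name": "orders_summary", "identifier": FIRST_ID},
]

pages = paginate(order_results(results), page_size=2)
print([t["name"] for t in pages[0]])  # ['orders_daily', 'orders_summary']
print([t["name"] for t in pages[1]])  # ['orders_tmp', 'orders_raw']
```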
  • The server receives the query request including the query condition.
  • When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, so that the display position of the first data table, whose identifier is the first identifier, is better than the display position of the second data table, whose identifier is not the first identifier.
  • Through this classification, the identifier of a data table that meets the first determining condition, that is, a data table of better quality, is updated to the first identifier, so that data tables carrying the first identifier are preferentially displayed to the querying user in the query result.
  • High quality data tables are better able to meet the user's query requirements than low quality data tables. Users generally only need to browse these high quality data tables to find one that meets their needs, which largely eliminates the time spent looking at poor quality data tables, saves query time, and improves query efficiency.
  • the table parameter may further include other parameters for identifying related content of the data table.
  • The more types of parameters the table parameter includes, the more judgment criteria the corresponding first determining condition contains, and the higher the accuracy of the data table classification.
  • The embodiment of the present invention provides a combination of parameters that can be included in the table parameter.
  • The table parameter further includes a combination of any one or more of a category parameter, a change frequency parameter, and a data quality control (DQC) parameter.
  • the category parameter is used to identify the category to which the first data table belongs, as shown in Table 1:
  • Ant Gold Service (1001)
  • Table 1 shows the results of a query that includes multiple categories for a single query request.
  • Table 1 specifically includes 10 different categories, and may actually include more categories or different category names.
  • The numbers shown in parentheses in Table 1 are the numbers of data tables in each category that meet the condition of this query. For example, a total of 1711 data tables in the query result belong to the common middle layer category, and a total of 378 data tables belong to the security department category.
  • the server displays the plurality of data tables to the query request according to the identifier of the data table, and further includes:
  • the server classifies the plurality of data tables according to category parameters, wherein the first data table is displayed under the category to which the first data table belongs.
  • Classification helps users search with a clearer purpose. Since a category can be related to a work area, the user can go directly to the category in the search result that corresponds to his or her own field, where the data tables have a greater chance of matching the query. This further improves query efficiency and saves query time.
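  • The category display, with a count in parentheses after each category name, could be produced along the following lines. This is a sketch under assumptions: the dictionary layout, the table names, and the helper functions are hypothetical; only the category names "common middle layer" and "security department" and the "name (count)" label format come from the text.

```python
from collections import defaultdict


def group_by_category(tables):
    """Map each category to the names of the matching tables it contains."""
    groups = defaultdict(list)
    for table in tables:
        groups[table["category"]].append(table["name"])
    return dict(groups)


def category_labels(groups):
    """Build 'category (count)' labels in the style of Table 1."""
    return [f"{category} ({len(names)})" for category, names in groups.items()]


results = [
    {"name": "t_user_profile", "category": "common middle layer"},
    {"name": "t_risk_events", "category": "security department"},
    {"name": "t_order_wide", "category": "common middle layer"},
]

groups = group_by_category(results)
print(category_labels(groups))
# ['common middle layer (2)', 'security department (1)']
```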
  • the change frequency parameter is used to identify a change frequency of a field in the first data table and/or a change frequency of the first data table.
  • the frequency of change here can be understood as whether it is changed frequently.
  • The criteria may include: the number of days in the past 90 days on which a table or field was renamed (threshold: 2 days), the proportion of fields renamed in the past 90 days (threshold: 5%), the number of days in the past 90 days on which the table was rebuilt (threshold: 2 days), and whether the average running time over 7 days is less than 2 hours.
  • If some or all of the above criteria are met, the change can be determined to be frequent; otherwise it is not.
  • The DQC parameter is used to identify parameters of the DQC monitoring applied to the first data table. These can include whether there is strong monitoring, whether the number of monitoring types exceeds three, and whether there is strong unique value monitoring. When the answer to any of these is yes, the DQC parameter can be understood to meet the first determining condition.
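  • The DQC check described above amounts to an "any of" test over three questions. The following is a minimal sketch; the function name and argument names are hypothetical, and what counts as "strong" monitoring is left to the caller because the text does not define it.

```python
def dqc_meets_condition(
    has_strong_monitoring: bool,
    monitoring_type_count: int,
    has_strong_unique_value_monitoring: bool,
) -> bool:
    """Return True when any one of the three DQC checks is satisfied."""
    return (
        has_strong_monitoring
        or monitoring_type_count > 3  # "exceeds three" monitoring types
        or has_strong_unique_value_monitoring
    )


print(dqc_meets_condition(False, 4, False))  # True: more than three monitoring types
print(dqc_meets_condition(False, 3, False))  # False: none of the checks passes
```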
  • FIG. 2 is a flowchart of a method for classifying a data table according to an embodiment of the present invention, where the method includes:
  • S201 The server acquires a table parameter of the first data table.
  • S202 The server determines whether the table parameter meets the first determining condition; if the table parameter does not meet the first determining condition, executing S203.
  • S203 The server updates the identifier of the first data table to a second identifier.
  • If the first determining condition is not met, the first data table does not qualify as a high quality data table and will not be classified into the high quality data table category.
  • the second identifier can be understood to correspond to a common data table category.
  • the data table classification proposed by the embodiment of the present invention is not a one-time classification, but will be correspondingly changed according to the change of the table parameters.
  • The table parameter is analyzed periodically or in some other manner.
  • For example, when the table parameter of the first data table meets the first determining condition, the identifier of the first data table may be updated to the first identifier. If the table parameter later fails to meet the first determining condition, the identifier is updated from the first identifier to the second identifier. If the table parameter again meets the first determining condition, the identifier may be updated from the second identifier back to the first identifier.
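  • The back-and-forth movement of the identifier can be illustrated by re-running the classification on each periodic check. This is a sketch of the two-category case only; the identifier labels, the `reclassify` function, and the sequence of check results are assumptions for illustration.

```python
FIRST_ID = "first"    # hypothetical label for the high quality category
SECOND_ID = "second"  # hypothetical label for the common category


def reclassify(meets_first_condition: bool) -> str:
    """Return the identifier a table should carry after one periodic check."""
    return FIRST_ID if meets_first_condition else SECOND_ID


# Outcome of the first determining condition on three successive checks
# (illustrative values, not taken from the source).
check_results = [True, False, True]

history = [reclassify(result) for result in check_results]
print(history)  # ['first', 'second', 'first']
```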
  • S204 The server receives a query request, where the query request includes a query condition.
  • S205 If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, wherein the display position of a third data table is better than the display position of the first data table, the third data table is one of the plurality of data tables, and the identifier of the third data table is the first identifier.
  • The display position of the first data table, which belongs to the common data table category, is inferior to that of the third data table, which belongs to the high quality data table category. For what an inferior display position means, refer to the description of S105, where the first data table in S205 corresponds to the second data table in S105 and the third data table in S205 corresponds to the first data table in S105.
  • The data tables saved on the cloud computing platform can thus be divided into two categories using the first determining condition: a high quality data table category that meets the first determining condition, and a common data table category that does not.
  • FIG. 3 is a flowchart of a method for classifying a data table according to an embodiment of the present invention, where the method includes:
  • S301 The server acquires a table parameter of the first data table.
  • S302 The server determines whether the table parameter meets the first determining condition; if the table parameter does not meet the first determining condition, executing S303.
  • S303 The server determines whether the table parameter meets a second determining condition, where the second determining condition is easier to meet than the first determining condition. If the table parameter does not meet the second determining condition, S304 is performed; if the table parameter meets the second determining condition, S203 is performed.
  • S304 The server updates the identifier of the first data table to a third identifier.
  • The second determining condition can be understood as a condition that is easier to meet than the first determining condition.
  • The second determining condition may be used as the basis for determining whether a data table belongs to the common data table category. If the first data table does not meet the first determining condition but meets the second determining condition, the server may update its identifier to the second identifier. If the first data table cannot meet the second determining condition, it is considered not to qualify as a common data table and is not classified into the common data table category corresponding to the second identifier.
  • the server updates the identifier of the first data table to a third identifier, which is equivalent to assigning the first data table to a category corresponding to the third identifier.
  • the measure taken by the embodiment of the present invention is to shield the data table identified as the third identifier from being displayed.
  • Data tables that do not meet the second determining condition are not necessarily of poor quality. For example, the project that generates a data table may be confidential or private, and its owner may not want the table to be found by others on the cloud computing platform. Or a developer may use data tables in an online testing process and not want to expose them, to prevent them from being overwritten by others. Such requirements can also be made part of the second determining condition.
  • The cloud computing platform divides data tables into production tables used in production scheduling and development tables (dev tables) and temporary tables (tmp tables) used in the development process, so in the embodiment of the present invention the development tables and temporary tables may be treated directly as data tables that do not meet the second determining condition, so that they cannot be found by others.
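  • The three-way decision of S302 to S304 can be sketched as two checks applied in order. This is an illustration under assumptions: the `Identifier` enum, the `table_kind` strings, and the rule that only dev and tmp tables fail the second determining condition are simplifications chosen for the example; the text allows other requirements to be part of that condition.

```python
from enum import Enum


class Identifier(Enum):
    FIRST = "first"    # high quality data table
    SECOND = "second"  # common data table
    THIRD = "third"    # masked in query results


def meets_second_condition(table_kind: str) -> bool:
    """Assumed rule: development and temporary tables fail the condition."""
    return table_kind not in {"dev", "tmp"}


def classify(meets_first: bool, table_kind: str) -> Identifier:
    """Apply the first condition, then the second, as in S302 and S303."""
    if meets_first:
        return Identifier.FIRST
    if meets_second_condition(table_kind):
        return Identifier.SECOND
    return Identifier.THIRD


print(classify(True, "production"))   # Identifier.FIRST
print(classify(False, "production"))  # Identifier.SECOND
print(classify(False, "tmp"))         # Identifier.THIRD
```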
  • S305 The server receives a query request, where the query request includes a query condition.
  • If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, wherein the server masks the first data table when displaying the plurality of data tables.
  • That is, when the server displays the query result to the user, it masks the data tables in the query result whose identifier is the third identifier, so that they are not presented to the user. This eliminates the time the user would otherwise waste on low quality data tables in the query result.
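  • Masking can be pictured as a filter applied to the matching tables before they are shown. A minimal sketch, assuming a hypothetical `"third"` label for the third identifier and a hypothetical dictionary layout and table names:

```python
THIRD_ID = "third"  # hypothetical label for the masked category


def visible_results(tables):
    """Drop every table whose identifier is the third identifier."""
    return [t for t in tables if t["identifier"] != THIRD_ID]


results = [
    {"name": "sales_daily", "identifier": "first"},
    {"name": "sales_dev_copy", "identifier": THIRD_ID},
    {"name": "sales_archive", "identifier": "second"},
]

shown = visible_results(results)
print([t["name"] for t in shown])  # ['sales_daily', 'sales_archive']
```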
  • FIG. 4 is a schematic diagram of a data table classification structure according to an embodiment of the present invention. As shown in FIG. 4:
  • Data tables that meet the first determining condition are classified as fine tables (the aforementioned high quality data tables), data tables that do not meet the first determining condition but meet the second determining condition are classified as common tables, and data tables that do not meet the second determining condition are classified as private tables.
  • Fine tables and common tables in the query result can be displayed to the user, while private tables in the query result are not displayed and are invisible to the user.
  • With the data tables of the cloud computing platform organized in this pyramid form, a Service-Level Agreement (SLA) guarantee can be added for the fine tables.
  • the embodiment of the present invention is not limited to classifying only the data table into two or three categories, and may be further divided into more categories according to specific application scenarios.
  • FIG. 5 is a structural diagram of an apparatus for classifying data tables according to an embodiment of the present invention, where the apparatus includes:
  • the obtaining unit 501 is configured to obtain a table parameter of the first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify the metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table.
  • For example, the server serves a cloud computing platform.
  • The server may be a data table storage server that stores data tables including the first data table, or a server used only for table classification and query processing; the present invention does not limit this.
  • The integrity parameter makes it possible to determine whether the metadata of the first data table is complete, with nothing missing.
  • The integrity parameter may specifically include at least the following items: whether there is a table comment (integrity is better with one); whether there is a table owner (integrity is better with one); the proportion of fields that have comments (the higher the percentage, the better the integrity); whether there is a data hierarchy (integrity is better with one); whether there is a data storage type, such as full partition table, incremental partition table, or non-partition table (integrity is better with one); and whether there is a scheduling period, such as day, hour, week, or minute (integrity is better with one).
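As an illustration of how such an integrity parameter might be derived from table metadata, the sketch below summarizes the listed items for one table. The metadata layout (a dict with `comment`, `owner`, `fields`, and so on) and all key names are assumptions made for the example; the source does not define a storage format.

```python
def commented_field_ratio(fields):
    """Fraction of fields that carry a non-empty comment (0.0 if there are no fields)."""
    if not fields:
        return 0.0
    commented = sum(1 for f in fields if f.get("comment"))
    return commented / len(fields)


def integrity_parameter(meta):
    """Summarize which of the listed metadata items are present for one data table."""
    return {
        "has_table_comment": bool(meta.get("comment")),
        "has_owner": bool(meta.get("owner")),
        "commented_field_ratio": commented_field_ratio(meta.get("fields", [])),
        "has_data_hierarchy": bool(meta.get("data_hierarchy")),
        "has_storage_type": bool(meta.get("storage_type")),
        "has_scheduling_period": bool(meta.get("scheduling_period")),
    }


# Hypothetical metadata: one of two fields lacks a comment, and no hierarchy is set.
meta = {
    "comment": "Daily order facts",
    "owner": "data-team",
    "fields": [
        {"name": "order_id", "comment": "Primary key"},
        {"name": "amount", "comment": ""},
    ],
    "storage_type": "incremental partition table",
    "scheduling_period": "day",
}
params = integrity_parameter(meta)
```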
  • From the update record identified by the update parameter, it can be determined whether the first data table is continuously updated.
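The source does not say how "continuously updated" is decided. One possible reading, shown here purely as an assumption, is that no two consecutive updates are further apart than the scheduling period and that the latest update is still recent. The function name, the `max_gap` parameter, and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def is_continuously_updated(update_dates, as_of, max_gap=timedelta(days=1)):
    """Return True if the update record shows no gap longer than max_gap.

    Assumption: "continuous" means every pair of consecutive updates is at
    most max_gap apart, and the latest update is at most max_gap before as_of.
    """
    if not update_dates:
        return False
    ordered = sorted(update_dates)
    no_internal_gap = all(later - earlier <= max_gap
                          for earlier, later in zip(ordered, ordered[1:]))
    still_fresh = as_of - ordered[-1] <= max_gap
    return no_internal_gap and still_fresh


daily = [date(2015, 8, 9), date(2015, 8, 10), date(2015, 8, 11)]
stalled = [date(2015, 8, 1), date(2015, 8, 2)]
gappy = [date(2015, 8, 1), date(2015, 8, 5)]
```

For a table scheduled weekly rather than daily, `max_gap` would be set to seven days in this sketch.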
  • the determining unit 502 is configured to determine whether the table parameter meets the first determination condition, and to trigger the first update unit 503 if the table parameter meets the first determination condition.
  • For example, the first determination condition may be understood as a condition for determining whether the table parameter reaches a certain standard.
  • Optionally, the first determination condition includes any one or a combination of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period.
  • The first determination condition further includes the update parameter having an update record that shows continuous updating.
  • Taking the example given in the description of the obtaining unit 501, the first determination condition may check whether the integrity parameter in the table parameter has a table comment, whether it has a data hierarchy, and whether the proportion of commented fields reaches a certain percentage; whether the first data table is continuously updated may be determined from the update parameter. When there is a table comment, there is a data hierarchy, the proportion of commented fields reaches 100%, and the table is continuously updated, the table parameter meets the first determination condition; otherwise it does not.
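The example condition just described can be sketched as a single predicate. The `TableParams` class and its field names are hypothetical; the 100% threshold comes from the example in the text and is exposed as a parameter because the text also speaks of a preset threshold.

```python
from dataclasses import dataclass


@dataclass
class TableParams:
    """Hypothetical container for the table parameters used in the example."""
    has_table_comment: bool
    has_data_hierarchy: bool
    commented_field_ratio: float  # fraction of fields with comments, 0.0 to 1.0
    continuously_updated: bool    # derived from the update record


def meets_first_condition(p: TableParams, ratio_threshold: float = 1.0) -> bool:
    """Return True only when every check of the example condition passes."""
    return (
        p.has_table_comment
        and p.has_data_hierarchy
        and p.commented_field_ratio >= ratio_threshold
        and p.continuously_updated
    )


complete = TableParams(True, True, 1.0, True)
partial = TableParams(True, True, 0.8, True)
```

Here `meets_first_condition(complete)` holds, while `partial` fails the default 100% threshold and passes only if the threshold is lowered to 0.8 or below.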
  • the first update unit 503 is configured to update the identifier of the first data table to the first identifier.
  • The first identifier is the identifier corresponding to high-quality data tables. The category to which a data table belongs can be identified by reading its identifier; the categories here may be, for example, high quality and common.
  • When the table parameter meets the first determination condition, it can be determined that the first data table is a high-quality data table. Updating the identifier of the first data table, which has been determined to be of high quality, to the first identifier can be understood as assigning the first data table to the category of high-quality data tables.
  • It should be noted that the identifier of the first data table uniquely corresponds to the first data table.
  • The first data table has one and only one identifier, which indicates the category to which the first data table belongs. For example, when the identifier of the first data table is the first identifier, the first data table is in the high-quality category corresponding to the first identifier. If the first data table later fails to meet the first determination condition for some reason, its identifier is updated from the first identifier to another identifier, such as the second identifier or the third identifier mentioned later, and the first data table is then in the category corresponding to the second identifier or the third identifier.
  • the receiving unit 504 is configured to receive a query request, where the query request includes a query condition.
  • the query condition may include a query keyword and the like, and details are not described herein again.
  • the display unit 505 is configured to: if the first data table is among the plurality of data tables that meet the query condition, display the plurality of data tables in response to the query request according to the identifiers of the data tables; where the display position of the first data table is superior to the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
  • For example, especially in the context of big data, the number of data tables that meet a given query condition can be enormous.
  • In this embodiment of the present invention, the search returns at least two data tables that meet the query condition: one is the first data table and the other is the second data table.
  • The first data table has been determined by the determining unit 502 to be a high-quality data table, and its identifier has been updated to the first identifier corresponding to the high-quality data table category.
  • The identifier of the second data table is not the first identifier; that is, the second data table is not in the high-quality data table category corresponding to the first identifier, or in other words, the second data table is not a high-quality data table.
  • When presenting the query result for the query request, the display unit 505 determines the display position of each data table according to its identifier.
  • According to the data table classification in the embodiments of the present invention, a high-quality data table category is determined, and the identifiers of all data tables in that category are updated to the first identifier.
  • When displaying the query result, the display unit 505 can use the identifiers to recognize the data tables that are in the high-quality category (i.e., those whose identifier is the first identifier) as well as the data tables that are not (i.e., those whose identifier is not the first identifier).
  • When determining display positions, the display unit 505 places the first data table, whose identifier is the first identifier, at a position superior to that of the second data table, so that when the user views the query result, the first data table is easier to see and the second data table is less easy to see than the first data table.
  • As examples of display positions: when the query result spans multiple pages, the first data table may be on an earlier page and the second data table on a later page than the first data table.
  • When the query result is shown on a single page, the first data table may be placed higher on the page so that the user sees it immediately, while the second data table is placed lower than the first data table and may be visible only after scrolling.
  • There are many ways of displaying, and the display position need not be understood purely as a "location".
  • For example, data tables with the first identifier may also be displayed in color or enlarged. Further examples are not listed here.
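The multi-page and single-page examples can both be read as an ordering problem: tables carrying the first identifier are sorted ahead of the others, and the ordered list is then cut into pages. The sketch below assumes a hypothetical `Identifier` enum and made-up table names; the source does not prescribe a sorting method.

```python
from enum import IntEnum


class Identifier(IntEnum):
    """Hypothetical identifier values; lower values are displayed first."""
    FIRST = 1   # high-quality data table
    SECOND = 2  # common data table


def order_for_display(tables):
    """Sort (name, identifier) pairs so that first-identifier tables come first.

    sorted() is stable, so tables sharing an identifier keep the order in
    which the search returned them.
    """
    return sorted(tables, key=lambda t: t[1])


def paginate(tables, page_size):
    """Split the ordered results into pages of at most page_size entries."""
    return [tables[i:i + page_size] for i in range(0, len(tables), page_size)]


results = [
    ("orders_tmp_copy", Identifier.SECOND),
    ("orders_daily", Identifier.FIRST),
    ("orders_backup", Identifier.SECOND),
    ("orders_summary", Identifier.FIRST),
]
ordered = order_for_display(results)
pages = paginate(ordered, page_size=2)
```

With a page size of two, both first-identifier tables land on the first page and the remaining tables on the second.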
  • It can be seen that the server determines whether the table parameter of the first data table meets the first determination condition and, if so, updates the identifier of the first data table to the first identifier. The server then receives a query request that includes a query condition. When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to their identifiers, so that the display position of the first data table, whose identifier is the first identifier, is superior to that of the second data table, whose identifier is not the first identifier. Through this classification, the identifiers of data tables that meet the first determination condition, that is, the better-quality data tables, are updated to the first identifier, so that when data tables are queried, the better-quality data tables carrying the first identifier are displayed to the querying user first. These high-quality data tables are more likely than lower-quality ones to satisfy the user's query needs, and the user generally only needs to browse them to find a suitable data table. This largely eliminates the time spent searching through lower-quality data tables, saving query time and improving query efficiency.
  • In addition to the integrity parameter and the update parameter, the table parameter may further include other parameters that identify information related to the data table.
  • In general, the more kinds of parameters the table parameter includes, the more criteria the first determination condition contains, and the more precise the classification of data tables becomes.
  • The embodiments of the present invention provide combinations of parameters that the table parameter may include.
  • Optionally, the table parameter further includes any one or a combination of a category parameter, a change frequency parameter, and a Data Quality Control (DQC) parameter.
  • The category parameter is used to identify the category to which the first data table belongs, for example as shown in Table 1.
  • The change frequency parameter is used to identify the change frequency of fields in the first data table and/or the change frequency of the first data table.
  • The change frequency here can be understood as whether changes occur frequently.
  • The parameters on which this is based may include: the number of days on which a table or field was renamed in the last 90 days is less than 2; the proportion of fields renamed in the last 90 days is less than 5%; the number of days on which the table was rebuilt in the last 90 days is less than 2; and the average running time over the last 7 days is less than 2 hours. If some or all of the above requirements are met, it can be determined on that basis that changes are frequent; otherwise they are not.
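The four thresholds listed above can be evaluated one by one. The sketch below only reports which thresholds a table meets and leaves the "some or all" policy to the caller, since the source does not fix it. The `ChangeStats` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChangeStats:
    """Hypothetical change statistics for one data table."""
    rename_days_90d: int            # days with a table or field rename, last 90 days
    renamed_field_ratio_90d: float  # fraction of fields renamed, last 90 days
    rebuild_days_90d: int           # days with a table rebuild, last 90 days
    avg_runtime_hours_7d: float     # average running time, last 7 days


def threshold_checks(s: ChangeStats) -> dict:
    """Evaluate each listed threshold separately."""
    return {
        "rename_days": s.rename_days_90d < 2,
        "renamed_field_ratio": s.renamed_field_ratio_90d < 0.05,
        "rebuild_days": s.rebuild_days_90d < 2,
        "avg_runtime": s.avg_runtime_hours_7d < 2.0,
    }


stats = ChangeStats(rename_days_90d=1, renamed_field_ratio_90d=0.02,
                    rebuild_days_90d=0, avg_runtime_hours_7d=3.5)
checks = threshold_checks(stats)
met = sum(checks.values())  # number of thresholds met
```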
  • The DQC parameter is used to identify parameters of the monitoring of the first data table by DQC. These may include whether there is strong monitoring, whether the number of monitoring types exceeds three, and whether there is strong unique-value monitoring. When any of these conditions is true, the DQC parameter can be understood as meeting the first determination condition.
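Because any one of the three DQC conditions is sufficient, the check reduces to a logical OR. A minimal sketch follows, with a hypothetical function name and parameter names:

```python
def dqc_meets_first_condition(has_strong_monitoring: bool,
                              monitoring_type_count: int,
                              has_strong_unique_monitoring: bool) -> bool:
    """Return True if at least one of the three DQC conditions holds."""
    return any((
        has_strong_monitoring,
        monitoring_type_count > 3,  # "exceeds three" is a strict comparison
        has_strong_unique_monitoring,
    ))
```

Note that a table with exactly three monitoring types and no strong monitoring does not pass in this sketch, because the text requires the count to exceed three.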
  • Optionally, if the table parameter includes the category parameter, the display unit 505 is further configured to display the plurality of data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.
  • For example, displaying results by category makes the user's search more targeted. Because categories can be related to areas of work, users can go directly to the category in the search result that corresponds to their own area, where the data tables are more likely to match the query needs. This further improves query efficiency and saves query time.
  • On the basis of the embodiment corresponding to FIG. 5, FIG. 6 is a structural diagram of an apparatus for classifying data tables according to an embodiment of the present invention. If the table parameter does not meet the first determination condition, the determining unit 502 is further configured to trigger the second update unit 601;
  • the second update unit 601 is configured to update the identifier of the first data table to a second identifier.
  • For example, if the first determination condition is not met, it can be understood that the first data table does not qualify as a high-quality data table, and it will not be assigned to the high-quality data table category.
  • The second identifier can be understood as corresponding to the common data table category.
  • The data table classification proposed by the embodiments of the present invention is not a one-time classification; it changes as the table parameters change.
  • The table parameter is analyzed periodically or in some other manner. When the table parameter of the first data table reaches the standard of the first determination condition, the identifier of the first data table may be updated to the first identifier. If the table parameter later fails to reach that standard, the identifier is updated from the first identifier to the second identifier; and if the table parameter subsequently reaches the standard again, the identifier may be updated from the second identifier back to the first identifier.
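The back-and-forth reclassification can be illustrated with a small registry that holds exactly one current identifier per table and overwrites it on every re-evaluation. The `TableRegistry` class, the string identifier values, and the table name are hypothetical; how and when re-evaluation is scheduled is left open, as in the source.

```python
class TableRegistry:
    """Keeps exactly one current identifier per data table."""

    def __init__(self):
        self._identifiers = {}  # table name -> current identifier

    def reevaluate(self, table_name, meets_first_condition):
        """Overwrite the table's identifier with the result of the latest evaluation."""
        new_identifier = "first" if meets_first_condition else "second"
        self._identifiers[table_name] = new_identifier  # replace, never append
        return new_identifier

    def identifier_of(self, table_name):
        return self._identifiers[table_name]


registry = TableRegistry()
# Three successive evaluations: standard reached, then missed, then reached again.
history = [registry.reevaluate("orders_daily", outcome)
           for outcome in (True, False, True)]
```

Storing the identifier in a mapping keyed by table name is one simple way to guarantee that a table never carries two identifiers at once.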
  • The display unit 505 displays the plurality of data tables in response to the query request according to the identifiers of the data tables, where the display position of a third data table is superior to the display position of the first data table, the third data table is one of the plurality of data tables, and the identifier of the third data table is the first identifier.
  • That is, when the identifier of the first data table is the second identifier and the query result is displayed, the display position of the first data table, which belongs to the common data table category, is inferior to that of the third data table, which belongs to the high-quality data table category. For what "inferior" means specifically, refer to the related description of the embodiment corresponding to FIG. 5.
  • With the first determination condition, the data tables stored on the cloud computing platform can be divided into two categories: a high-quality data table category for tables that meet the first determination condition, and a common data table category for tables that do not.
  • To classify data tables more finely, a second determination condition may also be added, thereby separating out a third category of data tables.
  • Optionally, if the table parameter does not meet the first determination condition, the determining unit 502 is further configured to determine whether the table parameter meets the second determination condition, where the requirements of the second determination condition are lower than those of the first determination condition; if the table parameter does not meet the second determination condition, the third update unit 602 is triggered.
  • The third update unit 602 is configured to update the identifier of the first data table to a third identifier.
  • For example, the second determination condition can be understood as a condition that is easier to meet than the first determination condition.
  • In this embodiment of the present invention, the second determination condition may serve as the benchmark for determining whether a data table belongs to the common data table category. If the first data table does not meet the first determination condition but meets the second determination condition, its identifier may be updated to the second identifier. If the first data table cannot meet the second determination condition, it can be considered not to qualify as a common data table and cannot be assigned to the common data table category corresponding to the second identifier.
  • In that case the third update unit 602 updates the identifier of the first data table to the third identifier, which is equivalent to assigning the first data table to the category corresponding to the third identifier.
  • For such low-quality data tables, the measure taken by the embodiments of the present invention is to mask data tables whose identifier is the third identifier when results are displayed.
  • It should be noted that, in the embodiments of the present invention, a data table that does not meet the second determination condition is not necessarily a poor-quality data table.
  • For example, some projects that generate data tables are relatively confidential or private, and their owners do not want the tables to be found by others on the cloud computing platform.
  • Likewise, some developers do not want to disclose data tables used during online testing, to prevent failures caused by too many references from other users. Such requirements can also serve as part of the basis for the second determination condition.
  • When storing data tables, the cloud computing platform divides them into production tables that are put into production scheduling, and development tables (dev tables) and temporary tables (tmp tables) used during development. In the solution of the embodiments of the present invention, development tables and temporary tables may therefore be treated directly as data tables that do not meet the second determination condition, so that they cannot be found by others.
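Putting the two conditions together gives a three-way decision. In the sketch below, the first condition is checked first, and the second condition fails outright for development and temporary tables. The `TableKind` and `Identifier` enums and the `baseline_ok` flag (standing in for whatever lower-bar checks an implementation chooses) are assumptions for illustration only.

```python
from enum import Enum


class TableKind(Enum):
    PRODUCTION = "production"
    DEVELOPMENT = "dev"
    TEMPORARY = "tmp"


class Identifier(Enum):
    FIRST = 1   # high-quality data table
    SECOND = 2  # common data table
    THIRD = 3   # masked data table


def meets_second_condition(kind: TableKind, baseline_ok: bool) -> bool:
    """Development and temporary tables fail outright; others need the baseline checks."""
    if kind in (TableKind.DEVELOPMENT, TableKind.TEMPORARY):
        return False
    return baseline_ok


def classify(kind: TableKind, meets_first: bool, baseline_ok: bool) -> Identifier:
    """Check the first condition, then fall back to the second condition."""
    if meets_first:
        return Identifier.FIRST
    if meets_second_condition(kind, baseline_ok):
        return Identifier.SECOND
    return Identifier.THIRD
```

For instance, `classify(TableKind.DEVELOPMENT, False, True)` yields the third identifier even though the hypothetical baseline checks pass, which mirrors the treatment of dev and tmp tables described above.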
  • The display unit 505 displays the plurality of data tables in response to the query request according to the identifiers of the data tables, where the first data table is masked in the process of displaying the plurality of data tables.
  • For example, when the display unit 505 displays the query result to the user, it masks the data tables in the query result whose identifier is the third identifier; that is, those data tables are not displayed to the user. This saves the user the time that would otherwise be wasted on low-quality data tables in the query result.
  • With the first determination condition and the second determination condition, the data tables on the cloud computing platform can be divided into three categories. As shown in FIG. 4, data tables that meet the first determination condition are classified as premium tables (i.e., the aforementioned high-quality data tables); data tables that do not meet the first determination condition but meet the second determination condition are classified as common tables; and data tables that do not meet the second determination condition are classified as private tables.
  • Premium tables and common tables in the query result can be displayed to the user, whereas private tables in the query result are not displayed and remain invisible to the user.
  • Organizing the data tables of the cloud computing platform in the form of a pyramid makes it possible to add SLA guarantees for the premium tables.
  • The embodiments of the present invention are not limited to classifying data tables into only two or three categories; more categories may be used depending on the specific application scenario.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

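A method and apparatus for classifying data tables. The method includes: a server obtains a table parameter of a first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify the metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table (S101); the server determines whether the table parameter meets a first determination condition (S102); if the table parameter meets the first determination condition, the identifier of the first data table is updated to a first identifier (S103); the server receives a query request that includes a query condition (S104); and if the first data table is among a plurality of data tables that meet the query condition, the plurality of data tables are displayed in response to the query request according to the identifiers of the data tables, where the display position of the first data table is superior to the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier (S105). With data tables classified in this way, the better-quality data tables carrying the first identifier in a query result can be displayed to the querying user first, which eliminates time spent searching through lower-quality data tables and improves query efficiency.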

Description

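Method and Apparatus for Classifying Data Tables
This application claims priority to Chinese Patent Application No. 201510490712.0, filed on August 11, 2015 and entitled "Method and Apparatus for Classifying Data Tables", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of data processing, and in particular to a method and apparatus for classifying data tables.
Background
A large number of data tables are stored on a cloud computing platform. The data stored in a data table may be log data, transaction data, user data, and so on, collected from various systems. Data tables may be uploaded or provided to the cloud computing platform by users for purposes such as sharing, and users can search the cloud computing platform for the data tables they need.
The widespread use of big data has caused the number of data tables stored on cloud computing platforms to grow exponentially. When a user searches for data tables on a cloud computing platform, many data tables match the query keywords, and the quality of the data tables presented for those keywords may vary widely. Even experienced users need a great deal of time to find the required data table and assess its quality, so a user may need 3 to 5 hours, or even several days, to find a data table that truly suits their needs among the massive query results.
Summary of the Invention
To solve the above technical problem, the present invention provides a method and apparatus for classifying data tables, which identify high-quality data tables among the data tables and, during a query, display the high-quality data tables in the query result first, thereby improving query efficiency.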
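The embodiments of the present invention disclose the following technical solutions:
A method for classifying data tables, the method including:
a server obtains a table parameter of a first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify the metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table;
the server determines whether the table parameter meets a first determination condition;
if the table parameter meets the first determination condition, the server updates the identifier of the first data table to a first identifier;
the server receives a query request, where the query request includes a query condition;
if the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables; where the display position of the first data table is superior to the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
Optionally, the table parameter further includes any one or a combination of a category parameter, a change frequency parameter, and a data quality control (DQC) parameter, where the category parameter is used to identify the category to which the first data table belongs, the change frequency parameter is used to identify the change frequency of fields in the first data table and/or the change frequency of the first data table, and the DQC parameter is used to identify parameters of the monitoring of the first data table by DQC.
Optionally, if the table parameter includes the category parameter, the server displaying the plurality of data tables in response to the query request according to the identifiers of the data tables further includes:
the server displays the plurality of data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.
Optionally, the server determining whether the table parameter meets the first determination condition further includes:
if the table parameter does not meet the first determination condition, the server updates the identifier of the first data table to a second identifier;
and, if the first data table is among the plurality of data tables that meet the query condition, the server displaying the plurality of data tables in response to the query request according to the identifiers of the data tables includes:
the display position of a third data table is superior to the display position of the first data table, where the third data table is one of the plurality of data tables and the identifier of the third data table is the first identifier.
Optionally, if the table parameter does not meet the first determination condition, the method further includes:
the server determines whether the table parameter meets a second determination condition, where the requirements of the second determination condition are lower than those of the first determination condition;
if the table parameter does not meet the second determination condition, the server updates the identifier of the first data table to a third identifier;
and, if the first data table is among the plurality of data tables that meet the query condition, the server displaying the plurality of data tables in response to the query request according to the identifiers of the data tables includes:
the server masks the first data table in the process of displaying the plurality of data tables.
Optionally,
the first determination condition includes any one or a combination of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period;
the first determination condition further includes the update parameter having an update record that shows continuous updating.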
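An apparatus for classifying data tables, the apparatus including:
an obtaining unit, configured to obtain a table parameter of a first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify the metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table;
a determining unit, configured to determine whether the table parameter meets a first determination condition, and to trigger a first update unit if the table parameter meets the first determination condition;
the first update unit, configured to update the identifier of the first data table to a first identifier;
a receiving unit, configured to receive a query request, where the query request includes a query condition;
a display unit, configured to: if the first data table is among a plurality of data tables that meet the query condition, display the plurality of data tables in response to the query request according to the identifiers of the data tables; where the display position of the first data table is superior to the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
Optionally, the table parameter further includes any one or a combination of a category parameter, a change frequency parameter, and a data quality control (DQC) parameter, where the category parameter is used to identify the category to which the first data table belongs, the change frequency parameter is used to identify the change frequency of fields in the first data table and/or the change frequency of the first data table, and the DQC parameter is used to identify parameters of the monitoring of the first data table by DQC.
Optionally, if the table parameter includes the category parameter, the display unit is further configured to display the plurality of data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.
Optionally, if the table parameter does not meet the first determination condition, the determining unit is further configured to trigger a second update unit;
the second update unit is configured to update the identifier of the first data table to a second identifier;
the display unit displays the plurality of data tables in response to the query request according to the identifiers of the data tables, where the display position of a third data table is superior to the display position of the first data table, the third data table is one of the plurality of data tables, and the identifier of the third data table is the first identifier.
Optionally, if the table parameter does not meet the first determination condition, the determining unit is further configured to determine whether the table parameter meets a second determination condition, where the requirements of the second determination condition are lower than those of the first determination condition; if the table parameter does not meet the second determination condition, a third update unit is triggered;
the third update unit is configured to update the identifier of the first data table to a third identifier;
the display unit displays the plurality of data tables in response to the query request according to the identifiers of the data tables, where the first data table is masked in the process of displaying the plurality of data tables.
Optionally,
the first determination condition includes any one or a combination of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period;
the first determination condition further includes the update parameter having an update record that shows continuous updating.
It can be seen from the above technical solutions that the server determines whether the table parameter of the first data table meets the first determination condition and, if so, updates the identifier of the first data table to the first identifier. The server then receives a query request that includes a query condition. When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to their identifiers, so that the display position of the first data table, whose identifier is the first identifier, is superior to that of the second data table, whose identifier is not the first identifier. Through this classification, the identifiers of data tables that meet the first determination condition, that is, the better-quality data tables, are updated to the first identifier, so that when data tables are queried, the better-quality data tables carrying the first identifier are displayed to the querying user first. These high-quality data tables are more likely than lower-quality ones to satisfy the user's query needs, and the user generally only needs to browse them to find a suitable data table. This largely eliminates the time spent searching through lower-quality data tables, saving query time and improving query efficiency.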
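Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for classifying data tables according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for classifying data tables according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for classifying data tables according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data table classification structure according to an embodiment of the present invention;
FIG. 5 is a structural diagram of an apparatus for classifying data tables according to an embodiment of the present invention;
FIG. 6 is a structural diagram of an apparatus for classifying data tables according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
A large number of data tables are stored on a cloud computing platform. During development or use, users can query the cloud computing platform according to their needs to see whether a suitable data table has already been stored; if one is found, a great deal of development time and effort can be saved. However, the widespread use of big data has caused the number of data tables stored on cloud computing platforms to grow exponentially. When a user searches for data tables on a cloud computing platform, many data tables match the query keywords, or query condition, and the quality of the data tables presented for those keywords may vary widely. Even experienced users need a great deal of time to find the required data table and assess its quality, so a user may need 3 to 5 hours, or even several days, to find a data table that truly suits their needs among the massive query results. Moreover, if the data table found is of low quality, for example if its update frequency is not guaranteed, delays in updating its data may delay the output of the projects that use it, resulting in a poor user experience.
To this end, the embodiments of the present invention provide a method and apparatus for classifying data tables. The server determines whether the table parameter of the first data table meets the first determination condition and, if so, updates the identifier of the first data table to the first identifier. The server then receives a query request that includes a query condition. When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to their identifiers, so that the display position of the first data table, whose identifier is the first identifier, is superior to that of the second data table, whose identifier is not the first identifier. Through this classification, the identifiers of data tables that meet the first determination condition, that is, the better-quality data tables, are updated to the first identifier, so that when data tables are queried, the better-quality data tables carrying the first identifier are displayed to the querying user first. These high-quality data tables are more likely than lower-quality ones to satisfy the user's query needs, and the user generally only needs to browse them to find a suitable data table. This largely eliminates the time spent searching through lower-quality data tables, saving query time and improving query efficiency. In addition, if the user ultimately selects a data table carrying the first identifier from the search result as the one that truly meets the query needs, then because such a data table has good metadata integrity and a relatively complete update record, the user is less likely to encounter delayed output in projects that use the data table as a result of delayed updates to it, which improves the user experience.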
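Embodiment 1
FIG. 1 is a flowchart of a method for classifying data tables according to an embodiment of the present invention. The method includes:
S101: A server obtains a table parameter of a first data table, where the table parameter includes an integrity parameter and an update parameter, the integrity parameter is used to identify the metadata integrity of the first data table, and the update parameter is used to identify an update record of the first data table.
For example, the server serves a cloud computing platform. The server may be a data table storage server that stores data tables including the first data table, or a server used only for table classification and query processing; the present invention does not limit this.
The integrity parameter makes it possible to determine whether the metadata of the first data table is complete, with nothing missing. Optionally, the first determination condition includes any one or a combination of the following for the integrity parameter: having a table comment, the proportion of commented fields reaching a preset threshold, having a data hierarchy, having a data storage type, and having a scheduling period; the first determination condition further includes the update parameter having an update record that shows continuous updating. In other words, the integrity parameter may specifically include at least the following items: whether there is a table comment (integrity is better with one); whether there is a table owner (integrity is better with one); the proportion of fields that have comments (the higher the percentage, the better the integrity); whether there is a data hierarchy (integrity is better with one); whether there is a data storage type, such as full partition table, incremental partition table, or non-partition table (integrity is better with one); and whether there is a scheduling period, such as day, hour, week, or minute (integrity is better with one). From the update record identified by the update parameter, it can be determined whether the first data table is continuously updated.
S102: The server determines whether the table parameter meets a first determination condition. If the table parameter meets the first determination condition, S103 is performed.
For example, the first determination condition may be understood as a condition for determining whether the table parameter reaches a certain standard. Taking the example given in S101, the first determination condition may check whether the integrity parameter in the table parameter has a table comment, whether it has a data hierarchy, and whether the proportion of commented fields reaches a certain percentage; whether the first data table is continuously updated may be determined from the update parameter. When there is a table comment, there is a data hierarchy, the proportion of commented fields reaches 100%, and the table is continuously updated, the table parameter meets the first determination condition; otherwise it does not.
S103: The server updates the identifier of the first data table to a first identifier.
The first identifier is the identifier corresponding to high-quality data tables. The server can identify the category to which a data table belongs by reading its identifier; the categories here may be, for example, high quality and common. When the table parameter meets the first determination condition, it can be determined that the first data table is a high-quality data table. Updating the identifier of the first data table, which has been determined to be of high quality, to the first identifier can be understood as assigning the first data table to the category of high-quality data tables.
It should be noted that the identifier of the first data table uniquely corresponds to the first data table. The first data table has one and only one identifier, which indicates the category to which the first data table belongs. For example, when the identifier of the first data table is the first identifier, the first data table is in the high-quality category corresponding to the first identifier. If the first data table later fails to meet the first determination condition for some reason, its identifier is updated from the first identifier to another identifier, such as the second identifier or the third identifier mentioned later, and the first data table is then in the category corresponding to the second identifier or the third identifier.
S104: The server receives a query request, where the query request includes a query condition.
For example, the query condition may include query keywords and the like; details are not repeated here.
S105: If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables; where the display position of the first data table is superior to the display position of a second data table, the second data table is one of the plurality of data tables, and the identifier of the second data table is not the first identifier.
For example, especially in the context of big data, the number of data tables that meet a given query condition can be enormous. In this embodiment of the present invention, the search returns at least two data tables that meet the query condition: one is the first data table and the other is the second data table. The first data table has been determined in S103 to be a high-quality data table, and its identifier has been updated to the first identifier corresponding to the high-quality data table category. The identifier of the second data table is not the first identifier; that is, the second data table is not in the high-quality data table category corresponding to the first identifier, or in other words, the second data table is not a high-quality data table.
When presenting the query result for the query request, the server determines the display position of each data table according to its identifier. According to the data table classification in the embodiments of the present invention, a high-quality data table category is determined, and the identifiers of all data tables in that category are updated to the first identifier. When displaying the query result, the server can use the identifiers to recognize the data tables that are in the high-quality category (i.e., those whose identifier is the first identifier) as well as the data tables that are not (i.e., those whose identifier is not the first identifier). When determining display positions, the server places the first data table, whose identifier is the first identifier, at a position superior to that of the second data table, so that when the user views the query result, the first data table is easier to see and the second data table is less easy to see than the first data table. As examples of display positions: when the query result spans multiple pages, the first data table may be on an earlier page and the second data table on a later page than the first data table. When the query result is shown on a single page, the first data table may be placed higher on the page so that the user sees it immediately, while the second data table is placed lower than the first data table and may be visible only after scrolling. There are many ways of displaying, and the display position need not be understood purely as a "location"; for example, data tables with the first identifier may also be displayed in color or enlarged. Further examples are not listed here.
It can be seen that the server determines whether the table parameter of the first data table meets the first determination condition and, if so, updates the identifier of the first data table to the first identifier. The server then receives a query request that includes a query condition. When the first data table is among the plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to their identifiers, so that the display position of the first data table, whose identifier is the first identifier, is superior to that of the second data table, whose identifier is not the first identifier. Through this classification, the identifiers of data tables that meet the first determination condition, that is, the better-quality data tables, are updated to the first identifier, so that when data tables are queried, the better-quality data tables carrying the first identifier are displayed to the querying user first. These high-quality data tables are more likely than lower-quality ones to satisfy the user's query needs, and the user generally only needs to browse them to find a suitable data table. This largely eliminates the time spent searching through lower-quality data tables, saving query time and improving query efficiency.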
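Embodiment 2
In addition to the integrity parameter and the update parameter, the table parameter may further include other parameters that identify information related to the data table. In general, the more kinds of parameters the table parameter includes, the more criteria the first determination condition contains, and the more precise the classification of data tables becomes. The embodiments of the present invention provide combinations of parameters that the table parameter may include. Optionally, the table parameter further includes any one or a combination of a category parameter, a change frequency parameter, and a Data Quality Control (DQC) parameter. The category parameter is used to identify the category to which the first data table belongs, as shown in Table 1:
Category navigation
Public middle layer (1711)
Ant Financial (1001)
Security Department (378)
Alibaba Cloud (859)
Shared Business (6)
Metadata (10)
Alimama (16)
Tmall (129)
Search Business Unit (39)
Travel Business Group (5)
Table 1
Table 1 shows a query result for one query request that spans multiple categories. Table 1 includes 10 different categories; in practice there may be more categories or different category names. The number in parentheses in Table 1 is the number of data tables matching the query condition that belong to each category; for example, 1711 data tables in the query result belong to the Public middle layer category and 378 belong to the Security Department category.
That is, if the table parameter includes the category parameter, the server displaying the plurality of data tables in response to the query request according to the identifiers of the data tables further includes:
the server displays the plurality of data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.
For example, displaying results by category makes the user's search more targeted. Because categories can be related to areas of work, users can go directly to the category in the search result that corresponds to their own area, where the data tables are more likely to match the query needs. This further improves query efficiency and saves query time.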
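A category navigation like the one in Table 1 can be produced by grouping the matching tables by their category parameter and counting each group. The following is a minimal sketch; the function names, the "name (count)" label format, and the sample tables are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_category(tables):
    """Map each category to the names of the matching tables in it."""
    groups = defaultdict(list)
    for name, category in tables:
        groups[category].append(name)
    return dict(groups)


def category_navigation(tables):
    """Build 'Category (count)' labels, largest category first."""
    counts = Counter(category for _, category in tables)
    return [f"{category} ({count})" for category, count in counts.most_common()]


# Illustrative (table name, category) pairs for one query result.
matches = [
    ("meta_tables", "Metadata"),
    ("tmall_orders", "Tmall"),
    ("tmall_refunds", "Tmall"),
    ("shared_users", "Shared Business"),
]
navigation = category_navigation(matches)
groups = group_by_category(matches)
```

The user would then pick a label from `navigation` and browse only the tables listed under that category in `groups`.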
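The change frequency parameter is used to identify the change frequency of fields in the first data table and/or the change frequency of the first data table. The change frequency here can be understood as whether changes occur frequently. The parameters on which this is based may include: the number of days on which a table or field was renamed in the last 90 days is less than 2; the proportion of fields renamed in the last 90 days is less than 5%; the number of days on which the table was rebuilt in the last 90 days is less than 2; and the average running time over the last 7 days is less than 2 hours. If some or all of the above requirements are met, it can be determined on that basis that changes are frequent; otherwise they are not.
The DQC parameter is used to identify parameters of the monitoring of the first data table by DQC. These may include whether there is strong monitoring, whether the number of monitoring types exceeds three, and whether there is strong unique-value monitoring. When any of these conditions is true, the DQC parameter can be understood as meeting the first determination condition.
It should be noted that, in the embodiment corresponding to FIG. 1, if the server determines that the table parameter does not meet the first determination condition, the identifier of the first data table is not updated to the first identifier. On the basis of the embodiment corresponding to FIG. 1, FIG. 2 is a flowchart of a method for classifying data tables according to an embodiment of the present invention. The method includes:
S201: A server obtains a table parameter of a first data table.
S202: The server determines whether the table parameter meets a first determination condition; if the table parameter does not meet the first determination condition, S203 is performed.
S203: The server updates the identifier of the first data table to a second identifier.
For example, if the first determination condition is not met, it can be understood that the first data table does not qualify as a high-quality data table, and it will not be assigned to the high-quality data table category. The second identifier can be understood as corresponding to the common data table category. The data table classification proposed by the embodiments of the present invention is not a one-time classification; it changes as the table parameters change. The table parameter is analyzed periodically or in some other manner. When the table parameter of the first data table reaches the standard of the first determination condition, the identifier of the first data table may be updated to the first identifier. If the table parameter later fails to reach that standard, the identifier is updated from the first identifier to the second identifier; and if the table parameter subsequently reaches the standard again, the identifier may be updated from the second identifier back to the first identifier.
S204: The server receives a query request, where the query request includes a query condition.
S205: If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables; where the display position of a third data table is superior to the display position of the first data table, the third data table is one of the plurality of data tables, and the identifier of the third data table is the first identifier.
That is, when the identifier of the first data table is the second identifier and the query result is displayed, the display position of the first data table, which belongs to the common data table category, is inferior to that of the third data table, which belongs to the high-quality data table category. For what "inferior" means specifically, refer to the description of S105, where the first data table in S205 can be regarded as the second data table in S105, and the third data table in S205 can be regarded as the first data table in S105.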
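With the first determination condition, the data tables stored on the cloud computing platform can be divided into two categories: a high-quality data table category for tables that meet the first determination condition, and a common data table category for tables that do not. To classify data tables more finely, a second determination condition may also be added, thereby separating out a third category of data tables. On the basis of the embodiment corresponding to FIG. 1, FIG. 3 is a flowchart of a method for classifying data tables according to an embodiment of the present invention. The method includes:
S301: A server obtains a table parameter of a first data table.
S302: The server determines whether the table parameter meets a first determination condition; if the table parameter does not meet the first determination condition, S303 is performed.
S303: The server determines whether the table parameter meets a second determination condition, where the requirements of the second determination condition are lower than those of the first determination condition. If the table parameter does not meet the second determination condition, S304 is performed; if the table parameter meets the second determination condition, S203 is performed.
S304: The server updates the identifier of the first data table to a third identifier.
For example, the second determination condition can be understood as a condition that is easier to meet than the first determination condition. In this embodiment of the present invention, the second determination condition may serve as the benchmark for determining whether a data table belongs to the common data table category. If the first data table does not meet the first determination condition but meets the second determination condition, the server may update its identifier to the second identifier. If the first data table cannot meet the second determination condition, it can be considered not to qualify as a common data table and cannot be assigned to the common data table category corresponding to the second identifier; the server then updates the identifier of the first data table to the third identifier, which is equivalent to assigning the first data table to the category corresponding to the third identifier. For such low-quality data tables, the measure taken by the embodiments of the present invention is to mask data tables whose identifier is the third identifier when results are displayed.
It should be noted that, in the embodiments of the present invention, a data table that does not meet the second determination condition is not necessarily a poor-quality data table. For example, some projects that generate data tables are relatively confidential or private, and their owners do not want the tables to be found by others on the cloud computing platform. Likewise, some developers do not want to disclose data tables used during online testing, to prevent failures caused by too many references from other users. Such requirements can also serve as part of the basis for the second determination condition. When storing data tables, the cloud computing platform divides them into production tables that are put into production scheduling, and development tables (dev tables) and temporary tables (tmp tables) used during development. In the solution of the embodiments of the present invention, development tables and temporary tables may therefore be treated directly as data tables that do not meet the second determination condition, so that they cannot be found by others.
S305: The server receives a query request, where the query request includes a query condition.
S306: If the first data table is among a plurality of data tables that meet the query condition, the server displays the plurality of data tables in response to the query request according to the identifiers of the data tables, where the server masks the first data table in the process of displaying the plurality of data tables.
For example, when the server displays the query result to the user, it masks the data tables in the query result whose identifier is the third identifier; that is, those data tables are not displayed to the user. This saves the user the time that would otherwise be wasted on low-quality data tables in the query result.
With the first determination condition and the second determination condition, the data tables on the cloud computing platform can be divided into three categories. FIG. 4 is a schematic diagram of a data table classification structure according to an embodiment of the present invention. As shown in FIG. 4, data tables that meet the first determination condition are classified as premium tables (i.e., the aforementioned high-quality data tables); data tables that do not meet the first determination condition but meet the second determination condition are classified as common tables; and data tables that do not meet the second determination condition are classified as private tables. Premium tables and common tables in the query result can be displayed to the user, whereas private tables in the query result are not displayed and remain invisible to the user. Organizing the data tables of the cloud computing platform in the form of a pyramid makes it possible to add a Service-Level Agreement (SLA) guarantee for the premium tables.
The embodiments of the present invention are not limited to classifying data tables into only two or three categories; more categories may be used depending on the specific application scenario.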
实施例三
图5为本发明实施例提供的一种数据表的分类装置的装置结构图,所述装置包括:
获取单元501,用于获取第一数据表的表参数,所述表参数包括完整性参数和更新参数,所述完整性参数用于标识所述第一数据表的元数据完整性,所述更新参数用于标识所述第一数据表的更新记录。
举例说明,所述服务器服务于云计算平台。所述服务器可以为保存包括所述第一数据表的数据表存储服务器,也可以是仅用于处理表分类和查询的服务器,本发明对此不进行限定。
通过所述完整性参数可以判定所述第一数据表的元数据是否完整,是否无缺失。所述完整性参数具体可以至少包括:是否有表注释,有表注释的完整性更好。是否有表负责人,有表负责人的完整性更好。有注释的字段占比,百分比越高完整性越好。是否有数据层次,有数据层次的完整性更好。是否有数据的存储类型(全量分区表/增量分区表/非分区表等),有数据的存储类型的完整性更好。是否有调度周期(天/小时/周/分钟等),有调度周期的完整性更好。通过所述更新参数所标识的更新记录,可以判断所述第一数据表是否是持续更新的。
判断单元502,用于判断所述表参数是否符合第一判断条件;若所述表参数符合所述第一判断条件,触发第一更新单元503。
举例说明,所述第一判断条件可以理解为用于判断所述表参数是否达到一定标准的判断条件。可选的,所述第一判断条件包括所述完整性参数具有表注释、注释字段占比达到预设阈值、具有数据层次、具有数据的存储类型和具有调度周期中的任意一项或多项的组合;所述第一判断条件还包括所述更新参数具有持续更新的更新记录。以在获取单元501的相关描述中所举的例子为例,所述第一判断条件可以是判断所述表参数中的完整性参数是否有表注释、是否有数据层次、有注释的字段占比是否达到一定百分比;可以根据所述更新参数判断所述第一数据表是否持续更新。当有表注释、有数据层次、有注释字段占比达到100%且持续更新时所述表参数符合所述第一判断条件,反之则不符合所述第一判断条件。
所述第一更新单元503,用于将所述第一数据表的标识更新为第一标识。
所述第一标识是对应于高质量数据表的标识,可以通过读取数据表的标识来识别数据表属于哪一类别,这里所述的类别可以为高质量、普通等。当所述表参数符合所述第一判断条件,可以判断出所述第一数据表属于高质量的数据表。将被判定为高质量的所述第一数据表的标识更新为第一标识,可以理解为将所述第一数据表分到了高质量数据表的类别中。
对于所述标识需要说明的是,所述第一数据表的标识与所述第一数据表唯一对应。所述第一数据表的标识有且只有一个,可以用于明确所述第一数据表所处的类别。例如当所述第一数据表的标识为第一标识时,所述第一数据表处于所述第一标识对应的高质量数据表的类别,若所述第一数据表的标识之后因为原因导致无法符合所述第一判断条件时,所述第一数据表的标识将会从第一标识被更新为其他标识例如稍后会提到的第二标识或第三标识,则所述第一数据表处于所述第二标识或第三标识对应的类别。
接收单元504,用于接收查询请求,所述查询请求包括查询条件。
举例说明,所述查询条件可以包括查询关键字等,这里不再赘述。
展示单元505,用于若符合所述查询条件的多个数据表中包括所述第一数据表,根据数据表的标识向所述查询请求展示所述多个数据表;其中,所述第一数据表的展示位置优于第二数据表的展示位置,所述第二数据表为所述多个数据表中的一个数据表,所述第二数据表的标识不是所述第一标识。
举例说明,尤其在大数据的背景下,依据一个查询条件获取到的符合该查询条件的数据表是海量的。在本发明实施例中,通过检索,符合所述查询条件的多个数据表的数量至少为两个,一个是所述第一数据表,另一个是所述第二数据表。所述第一数据表已 由所述判断单元502判断为高质量数据表,所述第一数据表的标识被更新为对应所述高质量数据表类别的第一标识。所述第二数据表的标识不是所述第一标识,也就是说所述第二数据表没有处于所述第一标识所对应的高质量数据表的类别,或者说所述第二数据表不是高质量数据表。
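When displaying the query result for the query request, the display unit 505 determines the display position of each data table according to its identifier. With the data table classification in this embodiment, a high-quality data table category has been determined, and the identifiers of all data tables in that category have been updated to the first identifier. When displaying the query result, the display unit 505 can use the identifiers to recognize both the data tables in the high-quality category (those whose identifier is the first identifier) and the data tables not in it (those whose identifier is not the first identifier). When determining display positions, the display unit 505 places the first data table, whose identifier is the first identifier, in a better position than the second data table, so that a user viewing the query result sees the first data table more easily than the second data table. As examples of display positions: when the query result is displayed across multiple pages, the first data table may be on an earlier page and the second data table on a later page; when the query result is displayed on a single page, the first data table may be placed higher so that the user sees it immediately, while the second data table is placed lower and may be visible only after scrolling. There are many ways to display results, and the display position need not be understood purely as a position; for example, data tables with the first identifier may also be displayed in color or enlarged. Further examples are not given here.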
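One simple way to realize "a better display position" is to sort the matching tables so that those carrying the first identifier come first. This is only a sketch under that assumption; the identifier value `"first"` and the function name are hypothetical, and the description also allows other display treatments such as color or enlargement.

```python
from typing import List, Tuple

FIRST_ID = "first"  # hypothetical value standing for the first identifier


def order_results(tables: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (table_name, identifier) pairs so first-identifier tables lead.

    sorted() is stable, so tables within the same group keep their
    original relative order (for example, their relevance order).
    """
    return sorted(tables, key=lambda table: table[1] != FIRST_ID)
```

With this ordering, paging or top-to-bottom rendering automatically puts high-quality tables on earlier pages or higher on the page.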
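It can be seen that the server judges whether the table parameters of the first data table meet the first judgment condition and, if they do, updates the identifier of the first data table to the first identifier. The server receives a query request that includes a query condition; when the multiple data tables that meet the query condition include the first data table, the server displays the multiple data tables in response to the query request according to their identifiers, so that the display position of the first data table, whose identifier is the first identifier, is better than that of the second data table, whose identifier is not the first identifier. In this way, classification updates the identifier of each data table that meets the first judgment condition, that is, each higher-quality data table, to the first identifier, and when data tables are queried, the higher-quality data tables with the first identifier in the query result are displayed to the user first. These high-quality data tables are more likely than lower-quality ones to satisfy the user's query needs, and the user can generally find a suitable data table by browsing only the high-quality ones. This largely avoids the time spent searching among lower-quality data tables, which saves query time and improves query efficiency.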
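Embodiment 4
In addition to the integrity parameter and the update parameter, the table parameters may further include other parameters that identify information about the data table. In general, the more kinds of parameters the table parameters include, the more criteria the first judgment condition contains, and the more precise the classification of data tables becomes. This embodiment of the present invention provides combinations of parameters that the table parameters may include. Optionally, the table parameters further include any one or a combination of a category parameter, a change frequency parameter, and a DQC parameter. The category parameter identifies the category to which the first data table belongs, for example as shown in Table 1.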
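The change frequency parameter identifies the change frequency of the fields in the first data table and/or the change frequency of the first data table itself. The change frequency here can be understood as whether changes are frequent. The criteria used may include: fewer than 2 days with a table or field rename in the last 90 days; less than 5% of fields renamed in the last 90 days; fewer than 2 days with a table rebuild in the last 90 days; and an average running time of less than 2 hours over the last 7 days. Because each of these criteria is an upper bound, a table that meets some or all of them can be regarded as not changing frequently; otherwise it is regarded as changing frequently.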
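The four thresholds listed above can be checked mechanically. The sketch below assumes that all four must hold, although the text also allows a partial match; the function and argument names are hypothetical.

```python
def meets_change_thresholds(
    rename_days_90d: int,            # days with a table or field rename in the last 90 days
    renamed_field_ratio_90d: float,  # share of fields renamed in the last 90 days, 0.0 to 1.0
    rebuild_days_90d: int,           # days with a table rebuild in the last 90 days
    avg_runtime_hours_7d: float,     # average running time over the last 7 days, in hours
) -> bool:
    """Return True if every listed threshold is satisfied (assumption: all are required)."""
    return (
        rename_days_90d < 2
        and renamed_field_ratio_90d < 0.05
        and rebuild_days_90d < 2
        and avg_runtime_hours_7d < 2
    )
```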
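The DQC parameter identifies the parameters under which the first data table is monitored by DQC. It may include whether there is strong monitoring, whether the number of monitoring types exceeds three, whether there is strong unique-value monitoring, and so on. When any one of these conditions is true, the DQC parameter can be understood as meeting the first judgment condition.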
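Since any single DQC condition is enough, the check reduces to a logical OR. A minimal sketch, with hypothetical names for the three inputs:

```python
def dqc_meets_condition(
    has_strong_monitoring: bool,
    monitoring_type_count: int,
    has_strong_unique_check: bool,
) -> bool:
    """Return True if at least one of the three DQC conditions from the text holds."""
    return (
        has_strong_monitoring
        or monitoring_type_count > 3  # "exceeds three" monitoring types
        or has_strong_unique_check
    )
```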
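Optionally, if the table parameters include the category parameter, the display unit 505 is further configured to display the multiple data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.
For example, displaying results by category lets the user search with a clearer purpose. Because categories can relate to areas of work and the like, users can go straight to the category in the search result that corresponds to their own area, and the data tables in that category are more likely to match the query need. This further improves query efficiency and saves query time.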
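Grouping the matching tables by their category parameter before display could look like the following sketch. The category values and the function name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_category(tables: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (table_name, category) pairs into {category: [table names]}.

    Table names keep their incoming order inside each category.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, category in tables:
        groups[category].append(name)
    return dict(groups)
```

A user interested in one work area would then open only that category's list.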
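On the basis of the embodiment corresponding to FIG. 5, FIG. 6 is a structural diagram of an apparatus for classifying data tables according to an embodiment of the present invention. If the table parameters do not meet the first judgment condition, the judging unit 502 is further configured to trigger a second updating unit 601.
The second updating unit 601, configured to update the identifier of the first data table to a second identifier.
For example, if the first judgment condition is not met, the first data table does not qualify as a high-quality data table and is not placed in the high-quality data table category. The second identifier can be understood as corresponding to the ordinary data table category. The data table classification proposed in this embodiment is not a one-time classification; it changes as the table parameters change. The table parameters are analyzed periodically or in other ways. When the table parameters of the first data table reach the standard of the first judgment condition, the identifier of the first data table can be updated to the first identifier. If the table parameters later fail to reach that standard, the identifier is updated from the first identifier to the second identifier; and if they later reach the standard again, the identifier can be updated from the second identifier back to the first identifier.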
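The back-and-forth between the first and second identifiers under periodic re-evaluation can be traced with a few lines. This sketch considers only the two-category case described above; the identifier values and function names are hypothetical.

```python
from typing import Iterable, List

FIRST_ID, SECOND_ID = "first", "second"  # hypothetical identifier values


def updated_identifier(meets_first_condition: bool) -> str:
    """Pick the identifier for one evaluation round (two-category case)."""
    return FIRST_ID if meets_first_condition else SECOND_ID


def identifier_history(evaluations: Iterable[bool]) -> List[str]:
    """Return the identifier after each periodic evaluation of the table parameters."""
    return [updated_identifier(ok) for ok in evaluations]
```

For a table that passes, then fails, then passes again, the identifier moves from first to second and back to first.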
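The display unit 505 displays the multiple data tables in response to the query request according to the identifiers of the data tables, where the display position of a third data table is better than the display position of the first data table, the third data table is one of the multiple data tables, and the identifier of the third data table is the first identifier.
That is, when the identifier of the first data table is the second identifier and the table is displayed as part of a query result, the display position of the first data table, which belongs to the ordinary data table category, is worse than that of the third data table, which belongs to the high-quality data table category. For how it is worse, refer to the description of the embodiment corresponding to FIG. 5.
With the first judgment condition, the data tables stored on the cloud computing platform can be divided into two categories: a high-quality data table category for tables that meet the first judgment condition, and an ordinary data table category for tables that do not. To classify data tables more finely, a second judgment condition can be added to separate out a third category, as shown in FIG. 6: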
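If the table parameters do not meet the first judgment condition, the judging unit 502 is further configured to judge whether the table parameters meet a second judgment condition, where the requirements of the second judgment condition are lower than those of the first judgment condition, and to trigger a third updating unit 602 if the table parameters do not meet the second judgment condition.
The third updating unit 602, configured to update the identifier of the first data table to a third identifier.
For example, the second judgment condition can be understood as a condition that is easier to meet than the first judgment condition. In this embodiment, the second judgment condition can serve as the benchmark for judging whether a data table belongs to the ordinary data table category. If the first data table does not meet the first judgment condition but meets the second judgment condition, its identifier can be updated to the second identifier. If the first data table cannot meet the second judgment condition, it is considered not to qualify as an ordinary data table and cannot be placed in the ordinary data table category corresponding to the second identifier; the third updating unit 602 updates its identifier to the third identifier, which amounts to assigning the first data table to the category corresponding to the third identifier. For such low-quality data tables, the measure taken in this embodiment is to shield data tables whose identifier is the third identifier when displaying results.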
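Note that in this embodiment, data tables that do not meet the second judgment condition are not necessarily poor-quality data tables. For example, some projects that produce data tables are highly confidential or private, and their owners do not want the tables to be found by others on the cloud computing platform. Or some developers do not want to expose data tables used during online testing, to avoid failures caused by too many references from others. Such needs can also be made part of the basis for the second judgment condition. When storing data tables, the cloud computing platform divides them into production tables, which are scheduled in production, and development tables (dev tables) and temporary tables (tmp tables) used during development. Therefore, in the solution of this embodiment, development tables and temporary tables can be treated directly as data tables that do not meet the second judgment condition, so that they cannot be found by others.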
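Putting the two judgment conditions and the dev/tmp rule together gives a three-way decision. The sketch below is one possible reading: it checks the table kind before anything else, which is an assumption, and the identifier values, kind labels, and function name are all hypothetical.

```python
def classify(meets_first: bool, meets_second: bool, table_kind: str = "production") -> str:
    """Return a hypothetical identifier: "first", "second", or "third".

    Assumption: dev and tmp tables are treated as failing the second
    judgment condition regardless of their other parameters.
    """
    if table_kind in ("dev", "tmp"):
        return "third"
    if meets_first:
        return "first"
    if meets_second:
        return "second"
    return "third"
```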
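The display unit 505 displays the multiple data tables in response to the query request according to the identifiers of the data tables, where the first data table is shielded while the multiple data tables are displayed.
For example, when displaying the query result to the user, the display unit 505 shields the data tables in the query result whose identifier is the third identifier; that is, it does not display them to the user. This spares the user the time that would otherwise be wasted on low-quality data tables in the query result.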
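Shielding third-identifier tables and ranking the rest can be combined in one pass over the query result. This is a sketch under the assumption that position is expressed by list order; the identifier values and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical display rank per identifier; tables with the third
# identifier have no rank because they are never shown.
DISPLAY_RANK = {"first": 0, "second": 1}


def visible_results(tables: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop third-identifier tables, then show first-identifier tables ahead of the rest."""
    shown = [table for table in tables if table[1] in DISPLAY_RANK]
    return sorted(shown, key=lambda table: DISPLAY_RANK[table[1]])
```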
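With the first and second judgment conditions, the data tables on the cloud computing platform can be divided into three categories. As shown in FIG. 4, data tables that meet the first judgment condition are classified as premium tables (that is, the high-quality data tables described above), data tables that do not meet the first judgment condition but meet the second judgment condition are classified as ordinary tables, and data tables that do not meet the second judgment condition are classified as private tables. Premium tables and ordinary tables in a query result can be displayed to the user, whereas private tables in a query result are not displayed and remain invisible to the user. Organizing the data tables of the cloud computing platform as a pyramid makes it possible to add an SLA guarantee for premium tables.
The embodiments of the present invention are not limited to classifying data tables into only two or three categories; depending on the specific application scenario, more categories may be used.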
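A person of ordinary skill in the art can understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium may be at least one of the following media that can store program code: read-only memory (ROM), RAM, a magnetic disk, or an optical disc.
It should be noted that the embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are basically similar to the method embodiments; for related details, refer to the corresponding descriptions of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The foregoing is merely a preferred specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.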

Claims (12)

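  1. A method for classifying data tables, characterized in that the method comprises:
    a server obtains table parameters of a first data table, where the table parameters include an integrity parameter and an update parameter, the integrity parameter identifies the metadata integrity of the first data table, and the update parameter identifies the update record of the first data table;
    the server judges whether the table parameters meet a first judgment condition;
    if the table parameters meet the first judgment condition, the server updates the identifier of the first data table to a first identifier;
    the server receives a query request, where the query request includes a query condition;
    if the multiple data tables that meet the query condition include the first data table, the server displays the multiple data tables in response to the query request according to the identifiers of the data tables, where the display position of the first data table is better than the display position of a second data table, the second data table is one of the multiple data tables, and the identifier of the second data table is not the first identifier.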
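  2. The method according to claim 1, characterized in that the table parameters further include any one or a combination of a category parameter, a change frequency parameter, and a data instruction control (DQC) parameter, where the category parameter identifies the category to which the first data table belongs, the change frequency parameter identifies the change frequency of the fields in the first data table and/or the change frequency of the first data table, and the DQC parameter identifies the parameters under which the first data table is monitored by DQC.
  3. The method according to claim 2, characterized in that, if the table parameters include the category parameter, the server displaying the multiple data tables in response to the query request according to the identifiers of the data tables further comprises:
    the server displays the multiple data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.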
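  4. The method according to any one of claims 1 to 3, characterized in that the server judging whether the table parameters meet the first judgment condition further comprises:
    if the table parameters do not meet the first judgment condition, the server updates the identifier of the first data table to a second identifier;
    and the server displaying the multiple data tables in response to the query request according to the identifiers of the data tables, if the multiple data tables that meet the query condition include the first data table, comprises:
    the display position of a third data table is better than the display position of the first data table, the third data table is one of the multiple data tables, and the identifier of the third data table is the first identifier.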
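  5. The method according to claim 4, characterized in that, if the table parameters do not meet the first judgment condition, the method further comprises:
    the server judges whether the table parameters meet a second judgment condition, where the requirements of the second judgment condition are lower than those of the first judgment condition;
    if the table parameters do not meet the second judgment condition, the server updates the identifier of the first data table to a third identifier;
    and the server displaying the multiple data tables in response to the query request according to the identifiers of the data tables, if the multiple data tables that meet the query condition include the first data table, comprises:
    the server shields the first data table while displaying the multiple data tables.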
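  6. The method according to claim 1, characterized in that
    the first judgment condition includes the integrity parameter having any one or a combination of the following: a table comment, a commented-field proportion that reaches a preset threshold, a data layer, a data storage type, and a scheduling cycle;
    the first judgment condition further includes the update parameter having an update record of continuous updating.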
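  7. An apparatus for classifying data tables, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain table parameters of a first data table, where the table parameters include an integrity parameter and an update parameter, the integrity parameter identifies the metadata integrity of the first data table, and the update parameter identifies the update record of the first data table;
    a judging unit, configured to judge whether the table parameters meet a first judgment condition, and to trigger a first updating unit if the table parameters meet the first judgment condition;
    the first updating unit, configured to update the identifier of the first data table to a first identifier;
    a receiving unit, configured to receive a query request, where the query request includes a query condition;
    a display unit, configured to: if the multiple data tables that meet the query condition include the first data table, display the multiple data tables in response to the query request according to the identifiers of the data tables, where the display position of the first data table is better than the display position of a second data table, the second data table is one of the multiple data tables, and the identifier of the second data table is not the first identifier.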
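  8. The apparatus according to claim 7, characterized in that the table parameters further include any one or a combination of a category parameter, a change frequency parameter, and a data instruction control (DQC) parameter, where the category parameter identifies the category to which the first data table belongs, the change frequency parameter identifies the change frequency of the fields in the first data table and/or the change frequency of the first data table, and the DQC parameter identifies the parameters under which the first data table is monitored by DQC.
  9. The apparatus according to claim 8, characterized in that, if the table parameters include the category parameter, the display unit is further configured to display the multiple data tables grouped by category parameter, where the first data table is displayed under the category to which it belongs.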
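  10. The apparatus according to any one of claims 7 to 9, characterized in that, if the table parameters do not meet the first judgment condition, the judging unit is further configured to trigger a second updating unit;
    the second updating unit, configured to update the identifier of the first data table to a second identifier;
    the display unit displays the multiple data tables in response to the query request according to the identifiers of the data tables, where the display position of a third data table is better than the display position of the first data table, the third data table is one of the multiple data tables, and the identifier of the third data table is the first identifier.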
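  11. The apparatus according to claim 10, characterized in that, if the table parameters do not meet the first judgment condition, the judging unit is further configured to judge whether the table parameters meet a second judgment condition, where the requirements of the second judgment condition are lower than those of the first judgment condition, and to trigger a third updating unit if the table parameters do not meet the second judgment condition;
    the third updating unit, configured to update the identifier of the first data table to a third identifier;
    the display unit displays the multiple data tables in response to the query request according to the identifiers of the data tables, where the first data table is shielded while the multiple data tables are displayed.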
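  12. The apparatus according to claim 7, characterized in that
    the first judgment condition includes the integrity parameter having any one or a combination of the following: a table comment, a commented-field proportion that reaches a preset threshold, a data layer, a data storage type, and a scheduling cycle;
    the first judgment condition further includes the update parameter having an update record of continuous updating.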
PCT/CN2016/092819 2015-08-11 2016-08-02 Method and apparatus for classifying data tables WO2017024966A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510490712.0 2015-08-11
CN201510490712.0A CN106708835A (zh) 2015-08-11 2015-08-11 Method and apparatus for classifying data tables

Publications (1)

Publication Number Publication Date
WO2017024966A1 true WO2017024966A1 (zh) 2017-02-16

Family

ID=57984386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/092819 WO2017024966A1 (zh) 2015-08-11 2016-08-02 Method and apparatus for classifying data tables

Country Status (2)

Country Link
CN (1) CN106708835A (zh)
WO (1) WO2017024966A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144999A (zh) * 2018-08-02 2019-01-04 东软集团股份有限公司 Data positioning method and apparatus, storage medium, and program product

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
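CN107357902B (zh) * 2017-07-14 2021-05-28 电子科技大学 Data table classification system and method based on association rules
CN113032494A (zh) * 2021-03-08 2021-06-25 浙江大华技术股份有限公司 Data table classification and model training method, apparatus, device, and medium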

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201689418U (zh) * 2010-04-14 2010-12-29 北京喻龙恒瑞科技有限公司 Management data table updating apparatus and management system
CN103092839A (zh) * 2011-10-28 2013-05-08 腾讯科技(深圳)有限公司 Method and apparatus for managing recorded history information
US20130232139A1 (en) * 2012-03-02 2013-09-05 Yu-Kai Xiong Electronic device and method for generating recommendation content
CN103970747A (zh) * 2013-01-24 2014-08-06 爱帮聚信(北京)科技有限公司 Data processing method for a network-side computer to rank search results
CN104123346A (zh) * 2014-07-02 2014-10-29 广东电网公司信息中心 Structured data search method
CN104699771A (zh) * 2015-03-02 2015-06-10 北京京东尚科信息技术有限公司 Data synchronization method and cluster node

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339296A (zh) * 2010-07-26 2012-02-01 阿里巴巴集团控股有限公司 Method and apparatus for ranking query results
CN102117320B (zh) * 2011-01-11 2012-07-25 百度在线网络技术(北京)有限公司 Method and apparatus for structured data search
CN103067618A (zh) * 2012-12-21 2013-04-24 上海即略网络信息科技有限公司 Caller ID display method and system
US20140337361A1 (en) * 2013-05-09 2014-11-13 Piazza Technologies, Inc. User-specific feed generation system

Also Published As

Publication number Publication date
CN106708835A (zh) 2017-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16834590

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16834590

Country of ref document: EP

Kind code of ref document: A1