CN116955326A - Method and device for controlling life cycle of data product and electronic equipment

Publication number
CN116955326A
CN116955326A CN202210397362.3A CN202210397362A CN116955326A CN 116955326 A CN116955326 A CN 116955326A CN 202210397362 A CN202210397362 A CN 202210397362A CN 116955326 A CN116955326 A CN 116955326A
Authority
CN
China
Prior art keywords
data
product
information
target
auditing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210397362.3A
Other languages
Chinese (zh)
Inventor
崔金涛
叶玮彬
刘涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210397362.3A
Publication of CN116955326A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/217 Database tuning
    • G06F16/219 Managing data history or versioning

Abstract

The disclosure provides a method, a device and electronic equipment for controlling the life cycle of a data product, and relates to the technical field of data processing, in particular to the field of big data. The specific implementation scheme is as follows: acquiring data product attributes corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises a plurality of stages and the target data product performs state circulation among the different stages in the data life cycle; determining data application ranges corresponding to the different stages in the data life cycle according to the data product attributes; performing data constraint auditing on the target data product, based on the data states and the data application ranges corresponding to the different stages of the target data product in the data life cycle, to obtain an auditing result; and readjusting the state circulation of the target data product among the different stages in the data life cycle by using the auditing result.

Description

Method and device for controlling life cycle of data product and electronic equipment
Technical Field
The disclosure relates to the technical field of data processing, and further relates to the field of big data, in particular to a method and a device for controlling the life cycle of a data product and electronic equipment.
Background
Life cycle management of data products refers to technology for managing the key information and states of data products in a big data processing link. In the current era of Internet big data, every enterprise produces and processes large amounts of high-value data. This data is large in scale, passes through long processing links, and involves many participating roles. As enterprise big data grows explosively, practical problems of data tracking, data management and data security inevitably arise, so data management has become important work that enterprises must carry out.
When the related art performs data management, it generally adopts either a data product management mechanism based on data mounting or a data product life cycle management mechanism based on automatic exit and information synchronization. The mechanism based on data mounting provides simple management and retrieval of data product information, but cannot manage the life cycle of the data product, and the reliability of the data product information becomes poor after long-term operation. The mechanism based on automatic exit and information synchronization keeps data product information accurate and up to date, but cannot accurately manage the data product life cycle in a complex data link scenario.
Disclosure of Invention
The disclosure provides a method, a device and electronic equipment for controlling a life cycle of a data product, so as to at least solve the technical problem of low reliability when the life cycle of the data product is managed by related technologies.
According to an aspect of the present disclosure, there is provided a method of controlling a lifecycle of a data product, comprising: acquiring data product attributes corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises a plurality of stages and the target data product performs state circulation among the different stages in the data life cycle; determining data application ranges corresponding to the different stages in the data life cycle according to the data product attributes; performing data constraint auditing on the target data product, based on the data states and the data application ranges corresponding to the different stages of the target data product in the data life cycle, to obtain an auditing result; and readjusting the state circulation of the target data product among the different stages in the data life cycle by using the auditing result.
According to yet another aspect of the present disclosure, there is provided an apparatus for controlling a lifecycle of a data product, comprising: an acquisition module for acquiring data product attributes corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises a plurality of stages and the target data product performs state circulation among the different stages in the data life cycle; a determining module for determining the data application ranges corresponding to the different stages in the data life cycle according to the data product attributes; an auditing module for performing data constraint auditing on the target data product, based on the data states and the data application ranges corresponding to the different stages of the target data product in the data life cycle, to obtain an auditing result; and a control module for readjusting the state circulation of the target data product among the different stages in the data life cycle by using the auditing result.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of controlling the lifecycle of the data product as set forth in the present disclosure.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of controlling the lifecycle of a data product set forth in the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method of controlling the lifecycle of a data product as set forth in the present disclosure.
In the present disclosure, the data product attributes corresponding to different stages of the target data product in the data life cycle are acquired, and the data application ranges corresponding to the different stages are determined according to those attributes. Data constraint auditing is then performed on the target data product based on the data states and the data application ranges corresponding to the different stages, yielding an auditing result. Finally, the auditing result is used to readjust the state circulation of the target data product among the different stages in the data life cycle. This achieves efficient management of the data life cycle of the target data product and improves the accuracy of data product life cycle management, thereby solving the technical problem of low reliability when the related art manages the life cycle of a data product.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a block diagram of a hardware architecture of a computer terminal (or mobile device) for implementing a method of controlling a lifecycle of a data product, according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of controlling a lifecycle of a data product, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data lifecycle flow, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a method of controlling a lifecycle of a data product, according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a method of controlling a lifecycle of a data product, according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an apparatus for controlling a lifecycle of a data product, according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The life cycle management of the data products refers to a technology for managing key information and states of the data products in a big data processing link.
The related art generally adopts the following two schemes for managing the life cycle of the data product:
Scheme one: a data product management mechanism based on data mounting. Common data product management systems are implemented with a data-source mounting model. Specifically, a user acting as a producer mounts the data it produces, such as a data warehouse (Hive), a distributed file system (Hadoop Distributed File System, HDFS) or a relational database, on a platform for other users. During mounting, the corresponding data product information must be labeled manually or generated by the system; this information includes fields (Schema), output timeliness, output cycle, retention days, and the like. A user acting as a consumer can filter the data product information in the platform's data mart to find the data product information needed.
Scheme one implements data product management and provides simple management and retrieval of data product information. However, because it has no concept of a data product life cycle, a large amount of data may become outdated or erroneous after long-term operation, and the reliability of the data product information deteriorates. For example, after data A is mounted in the platform management system, it may change: its output cycle or output timeliness may change, or data A may stop being produced altogether. Such changes reach the platform only if the contact person synchronizes them manually; if the contact person does nothing, the information on the platform becomes inaccurate. In addition, because the management mechanism of scheme one has no data product life cycle, a user only knows that data A is mounted on the platform, but cannot tell whether data A is usable or whether its timeliness is guaranteed.
Scheme two: a data product lifecycle management mechanism based on automatic exit and information synchronization. Building on scheme one, scheme two completes the data product life cycle and adds a synchronization mechanism to keep data product information accurate and up to date. The life cycle of a data product in this scheme is: production -> mounting -> offline. To manage this life cycle, the following functions are added: 1) A data exit mechanism. Specifically, a function for taking data offline is provided: after a user takes the data computing task offline, the corresponding data product is set to the offline state through data exit, and the life cycle of the data product on the platform ends. 2) A data synchronization mechanism. Specifically, basic information such as the Schema and the separator is synchronized for data products in the mounted state, which keeps the data product information up to date.
Although scheme two implements data life cycle management, the state circulation of the data life cycle is not constrained by data lineage, so the information reliability of the data product is poor. For example, data A is produced routinely after being mounted on the platform and depends on three data sets: data B, data C and data D. If the computing task corresponding to data A is stopped after a period of time but data A is not taken offline, data A remains in the mounted state on the platform and can still be referenced by other data, which may cause all downstream computations of data A to fail or produce errors. Scheme two therefore manages the data product life cycle with low accuracy in complex data link scenarios.
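The failure mode described for scheme two can be illustrated with a minimal sketch. It assumes a simple in-memory lineage map; the downstream data sets E and F, the `downstream_of` helper and the state dictionaries are hypothetical and are not part of the disclosure.

```python
from collections import deque

# Upstream dependencies from the example: data A is computed from B, C and D.
# E and F are hypothetical downstream data sets added for illustration only.
UPSTREAMS = {
    "A": ["B", "C", "D"],
    "E": ["A"],
    "F": ["E"],
}


def downstream_of(dataset, upstreams):
    """Return every data set that depends, directly or transitively, on `dataset`."""
    # Invert the upstream map so that each parent lists its children.
    children = {}
    for child, parents in upstreams.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    affected, queue = set(), deque([dataset])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# State as scheme two would record it: the task producing A has stopped,
# but nothing has moved A out of the mounted state.
state = {"A": "mounted", "E": "mounted", "F": "mounted"}
task_running = {"A": False, "E": True, "F": True}

# Data sets that are still mounted although their producing task has stopped.
stale = [name for name, running in task_running.items()
         if not running and state[name] == "mounted"]
# For each stale data set, the downstream data sets that may fail.
at_risk = {name: sorted(downstream_of(name, UPSTREAMS)) for name in stale}
print(at_risk)  # {'A': ['E', 'F']}
```

A lineage-aware check of this kind is what scheme two lacks: without it, the platform has no way to notice that E and F still read from a data set that is no longer produced.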
In the related art, there is a technical problem of low reliability when managing the life cycle of the data product, and no effective solution has been proposed at present for the above problem.
In accordance with an embodiment of the present disclosure, a method of controlling the lifecycle of a data product is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The method embodiments provided by the embodiments of the present disclosure may be performed in a mobile terminal, a computer terminal, or a similar electronic device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein. Fig. 1 shows a block diagram of the hardware architecture of a computer terminal (or mobile device) for implementing a method of controlling the lifecycle of a data product.
As shown in fig. 1, the computer terminal 100 includes a computing unit 101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 102 or a computer program loaded from a storage unit 108 into a Random Access Memory (RAM) 103. In the RAM 103, various programs and data required for the operation of the computer terminal 100 can also be stored. The computing unit 101, ROM 102, and RAM 103 are connected to each other by a bus 104. An input/output (I/O) interface 105 is also connected to bus 104.
Various components in computer terminal 100 are connected to I/O interface 105, including: an input unit 106 such as a keyboard, a mouse, etc.; an output unit 107 such as various types of displays, speakers, and the like; a storage unit 108 such as a magnetic disk, an optical disk, or the like; and a communication unit 109 such as a network card, modem, wireless communication transceiver, etc. The communication unit 109 allows the computer terminal 100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 101 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 101 performs the method of controlling the lifecycle of the data product described herein. For example, in some embodiments, the method of controlling the lifecycle of the data product may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computer terminal 100 via the ROM 102 and/or the communication unit 109. One or more steps of the methods of controlling the lifecycle of a data product described herein may be performed when the computer program is loaded into RAM 103 and executed by computing unit 101. Alternatively, in other embodiments, the computing unit 101 may be configured to perform the method of controlling the lifecycle of the data product in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
It should be noted here that, in some alternative embodiments, the electronic device shown in fig. 1 described above may include hardware elements (including circuits), software elements (including computer code stored on a computer readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example, and is intended to illustrate the types of components that may be present in the above-described electronic devices.
In the above operating environment, the present disclosure provides a method of controlling the lifecycle of a data product as shown in fig. 2, which may be performed by a computer terminal or similar electronic device as shown in fig. 1. Fig. 2 is a flow chart of a method of controlling a lifecycle of a data product, provided in accordance with an embodiment of the present disclosure. As shown in fig. 2, the method may include the steps of:
Step S21, obtaining data product attributes respectively corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises a plurality of stages and the target data product performs state circulation among the different stages in the data life cycle;
The data product attributes include: basic information (Basic), meta information (Meta), responsible person information (Owner), output period information (Period), aging information (timeline), and priority information (Priority). The basic information includes the name identification (Name) and type information (Type) of the target data product; for example, the type information may indicate that the target data product is a Hive table or an HDFS file. The meta information includes the field information (Schema) and partition information (Partition) of the target data product. The responsible person information includes the responsible person and the department of the target data product. The output period information indicates how often the target data product is produced, in days or hours. The aging information indicates the time by which the target data product is promised to be produced in each cycle. The priority information indicates the importance of the target data product.
The above-mentioned multiple stages describe the whole process of data production, data use and data offline of the target data product, and include: an initialization phase (Init), an in-creation phase (Creating), a created phase (Created), a to-be-issued phase (version), a verified phase (verify), a deployed phase (release), an issued phase (Published), and an offline phase (offline).
Specifically, the process of obtaining the data product attributes corresponding to different stages of the target data product in the data lifecycle is described further in the embodiments of the present disclosure below and is not repeated here.
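The stages and attribute groups described for step S21 can be pictured as a small data model. The following is only a sketch: the enum member names, the field names and the `filled_groups` helper are illustrative assumptions, and the disclosure does not prescribe any particular representation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The eight lifecycle stages, in the order the description lists them.

    Member names are illustrative, not identifiers from the disclosure.
    """
    INIT = 1
    CREATING = 2
    CREATED = 3
    TO_BE_ISSUED = 4
    VERIFIED = 5
    DEPLOYED = 6
    PUBLISHED = 7
    OFFLINE = 8


@dataclass
class DataProduct:
    """One data product; every attribute starts empty and is filled in later."""
    stage: Stage = Stage.INIT
    # Basic information
    name: Optional[str] = None
    type: Optional[str] = None          # e.g. "hive" or "hdfs"
    # Meta information
    schema: Optional[dict] = None
    partition: Optional[str] = None
    # Responsible person information
    owner: Optional[str] = None
    department: Optional[str] = None
    # Output period, aging and priority information
    period: Optional[str] = None        # e.g. "daily" or "hourly"
    timeliness: Optional[str] = None    # promised output time per cycle
    priority: Optional[int] = None

    def filled_groups(self) -> set:
        """Return the attribute groups whose fields have all been set."""
        groups = {
            "basic": (self.name, self.type),
            "meta": (self.schema, self.partition),
            "owner": (self.owner, self.department),
            "period": (self.period,),
            "timeliness": (self.timeliness,),
            "priority": (self.priority,),
        }
        return {group for group, values in groups.items()
                if all(value is not None for value in values)}


table = DataProduct(name="user_visits", type="hive")
print(sorted(table.filled_groups()))  # ['basic']
```

Keeping the attributes grouped this way makes it straightforward to ask, at any stage, which groups a product already carries.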
Step S22, determining the application ranges of the data corresponding to different stages in the data life cycle according to the data product attributes;
Different sets of data product attributes correspond to different levels of data quality. Specifically, the more data product attributes a data product has, the higher its data quality. Determining the data application ranges corresponding to different stages in the data life cycle according to the data product attributes therefore meets the data quality requirements of the different stages.
Specifically, the process of determining the data application ranges corresponding to different stages in the data lifecycle according to the data product attributes is described further in the embodiments of the present disclosure below and is not repeated here.
Step S23, based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle, performing data constraint auditing on the target data product to obtain an auditing result;
Specifically, the process of performing data constraint auditing on the target data product, based on the data states and the data application ranges corresponding to different stages of the target data product in the data lifecycle, to obtain the auditing result is described further in the embodiments of the present disclosure below and is not repeated here.
Step S24, readjusting the state circulation of the target data product among different stages in the data life cycle by using the auditing result.
The state circulation of the target data product in different stages of the data life cycle can be readjusted by utilizing the auditing result, so that the target data product can be maintained, and the accuracy and the reliability of the target data product in the data life cycle are further ensured.
According to steps S21 to S24 of the present disclosure, the data product attributes corresponding to different stages of the target data product in the data lifecycle are acquired, and the data application ranges corresponding to the different stages are determined according to those attributes. Data constraint auditing is then performed on the target data product based on the data states and the data application ranges corresponding to the different stages, yielding an auditing result. Finally, the auditing result is used to readjust the state circulation of the target data product among the different stages in the data lifecycle. This achieves efficient management of the data lifecycle of the target data product and improves the accuracy of data product lifecycle management, thereby solving the technical problem of low reliability when the related art manages the lifecycle of a data product.
The method of controlling the lifecycle of the data product of the above embodiments is further described below.
As an optional implementation manner, in step S21, acquiring the data product attributes respectively corresponding to different phases of the target data product in the data lifecycle includes:
Step S211, acquiring the basic information in the initialization phase;
Step S212, acquiring the meta information in the created phase;
Step S213, acquiring the output period information, the aging information and the priority information in the deployed phase and subsequent phases;
Step S214, acquiring the responsible person information in the issued phase.
Fig. 3 is a schematic diagram of a data lifecycle flow according to an embodiment of the present disclosure. As shown in fig. 3, the data lifecycle includes a plurality of phases, specifically an initialization phase, an in-creation phase, a created phase, a to-be-issued phase, a checked phase, a deployed phase, an issued phase, and an offline phase. The data product attributes corresponding to the different phases are acquired as follows: the name identification (Name) and type information (Type) of the target data product are acquired in the initialization phase; the field information (Schema) and partition information (Partition) of the target data product are acquired in the created phase; the output period information, aging information and priority information are acquired in the deployed phase and subsequent phases; and the responsible person information is acquired in the issued phase.
Based on the steps S211 to S214, the data product attributes corresponding to the different phases of the target data product in the data lifecycle are obtained, so as to determine the data application ranges corresponding to the different phases of the data lifecycle according to the data product attributes, thereby guaranteeing the discoverability of the target data product.
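Steps S211 to S214 amount to a mapping from lifecycle phase to the attribute groups acquired there. The sketch below encodes that mapping under two assumptions not stated in the text: attributes acquired in an earlier phase are retained in later phases, and the phases are ordered as listed for fig. 3. The names `Stage`, `ACQUIRED_AT` and `attributes_available` are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle phases in the order listed for fig. 3 (names are illustrative)."""
    INIT = 1
    CREATING = 2
    CREATED = 3
    TO_BE_ISSUED = 4
    VERIFIED = 5
    DEPLOYED = 6
    PUBLISHED = 7
    OFFLINE = 8


# Attribute groups first acquired in each phase, following steps S211 to S214.
ACQUIRED_AT = {
    Stage.INIT: {"basic"},                                   # S211
    Stage.CREATED: {"meta"},                                 # S212
    Stage.DEPLOYED: {"period", "timeliness", "priority"},    # S213
    Stage.PUBLISHED: {"owner"},                              # S214
}


def attributes_available(stage):
    """Attribute groups a product carries once it has reached `stage`.

    Assumes that groups acquired in an earlier phase are kept afterwards.
    """
    available = set()
    for acquired_stage, groups in ACQUIRED_AT.items():
        if acquired_stage <= stage:
            available |= groups
    return available


print(sorted(attributes_available(Stage.VERIFIED)))  # ['basic', 'meta']
```

Under these assumptions, a product in the deployed phase carries five of the six groups, and only the issued phase adds the responsible person information.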
As an optional implementation manner, in step S22, determining, according to the data product attribute, the data application ranges corresponding to different phases in the data lifecycle includes:
Step S221, determining a first application range of a first part of data in the data lifecycle according to the basic information and the meta information, where the first application range indicates that the first part of data is visible to debug jobs, and the first part of data includes: data created by the target tenant and data published by tenants other than the target tenant;
Specifically, the first part of data includes the tenant's own created data and the published data of external tenants, and the first application range is the data range visible to the tenant in debug jobs.
Step S222, determining a second application range of a second part of data in the data lifecycle according to the basic information, the meta information, the output period information, the aging information and the priority information, where the second application range indicates that the second part of data is visible to the first routine job, and the second part of data includes: data deployed by the target tenant and data published by tenants other than the target tenant;
Specifically, the second part of data includes the tenant's own deployed data and the published data of external tenants, and the second application range is the data range visible to the tenant when running tenant routine jobs.
After the development is completed, the target data product enters a routine output flow, and a piece of data corresponding to the time partition is output every day or every hour, wherein the state is a routine state, and the data operation in the routine state is called routine operation. The data range visible to the tenant routine refers to the data range visible to the tenant in the routine job for which the tenant is responsible.
Step S223, determining a third application range of a third part of data in the data lifecycle according to the basic information, the meta information, the output period information, the aging information, the priority information and the responsible person information, where the third application range is used to indicate that the third part of data is in a visible state for the second routine job, and the third part of data includes: data published by all tenants.
Specifically, the second routine job is a system routine job, and the data range visible to a system routine job refers to the data range visible to all tenants of the whole system in routine jobs. The third application range is the data range visible to the tenant when performing system routine jobs.
The data product attributes required by the first application range, the second application range and the third application range increase in sequence, and the corresponding data quality increases in sequence. By determining the data application ranges corresponding to different stages in the data lifecycle according to the data product attributes, it can be ensured that high-quality data is used for routine jobs and is visible to the whole system, while low-quality data is visible only to debug jobs or to the target tenant itself.
Based on the steps S221 to S223, the data application ranges corresponding to different stages in the data lifecycle are determined according to the data product attributes, so that corresponding data quality control is performed in each link of the data product production, so as to ensure discoverability of the target data product.
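As an illustrative, non-limiting sketch, the three application ranges of steps S221 to S223 may be expressed as a visibility check. The class names, the stage labels and the rule that a product counts as "created" or "deployed" once it has reached or passed that stage are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    DEBUG = "debug"                    # debug job (first application range)
    TENANT_ROUTINE = "tenant_routine"  # first routine job (second application range)
    SYSTEM_ROUTINE = "system_routine"  # second routine job (third application range)


@dataclass(frozen=True)
class DataProduct:
    name: str
    tenant: str
    stage: str  # hypothetical stage label, see STAGE_ORDER


# Forward stages in the order listed in the text; the labels are assumptions.
STAGE_ORDER = ["initialization", "in_creation", "created", "to_be_issued",
               "verified", "deployed", "issued"]


def reached(product, stage):
    """True if the product is at or past `stage`; offline products never qualify."""
    if product.stage not in STAGE_ORDER:
        return False
    return STAGE_ORDER.index(product.stage) >= STAGE_ORDER.index(stage)


def visible(product, job, target_tenant):
    """Decide whether `product` is visible to a job run by `target_tenant`."""
    issued = reached(product, "issued")
    own = product.tenant == target_tenant
    if job is JobKind.DEBUG:
        # Own created data plus data issued by other tenants.
        return (own and reached(product, "created")) or (not own and issued)
    if job is JobKind.TENANT_ROUTINE:
        # Own deployed data plus data issued by other tenants.
        return (own and reached(product, "deployed")) or (not own and issued)
    # System routine job: only data issued by any tenant.
    return issued
```

Under these assumptions, a product of tenant A that has only reached the created stage is visible to tenant A's debug jobs but not to its routine jobs, which matches the intent that lower-quality data stays out of routine production.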
As an optional implementation manner, in step S23, based on the data states and the data application ranges corresponding to different phases of the target data product in the data lifecycle, performing data constraint audit on the target data product, and obtaining an audit result includes:
Step S231, based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, performing first constraint audit on the target data product to obtain a first audit result, wherein the first audit result is used for auditing the aging information;
The first constraint audit is an aging constraint audit at the data entry stage, and the accuracy of the aging information of the target data product can be verified based on the first audit result. The data entry stage is a stage in which the data lifecycle flows forward. For example, in the embodiment of the present application, the data entry stage includes: an initialization phase, an in-creation phase, a created phase, a to-be-issued phase, a verified phase, a deployed phase, and an issued phase.
Specifically, it is detected whether the target data product meets its current aging information over a period of time. Different priority information imposes different limits on how well the aging information must be met. For example, the higher the priority of the target data product, the higher its importance, the higher the corresponding data quality requirement, and the higher the aging requirement on the target data product. Conversely, the lower the priority of the target data product, the lower its importance, the lower the corresponding data quality requirement, and the lower the aging requirement on the target data product.
Step S232, based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, performing second constraint audit on the target data product to obtain a second audit result, wherein the second audit result is used for auditing the influence of the changed aging information on upstream data and downstream data in a blood-margin link;
The second constraint audit is a blood-margin (i.e., data lineage) constraint audit at the data entry stage, and the influence of the changed aging information on the upstream data and the downstream data in the blood-margin link can be verified based on the second audit result.
Specifically, the influence of the changed aging information of the target data product on the upstream data and the downstream data is detected. The aging range required by the upstream data and the downstream data is calculated from the changed aging information and the output period information of the target data product, and it is then judged whether the changed aging information of the target data product falls within this aging range.
Step S233, performing third constraint audit on the target data product based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, so as to obtain a third audit result, wherein the third audit result is used for auditing the operation success rate of the target data product.
The third constraint audit is a task constraint audit at the data entry stage, and the correctness and stability of the output of the target data product can be verified based on the third audit result.
Specifically, the routine running of the task that produces the target data product and the stability of its output are checked. For example, the routine job that produces the target data product is checked; if the operation success rate of the routine job is lower than a preset threshold, or the job fails on several consecutive days, the accuracy of producing the target data product is low and its stability is poor, and the target data product is not allowed to enter.
Based on steps S231 to S233, data constraint auditing is performed on the target data product according to the data states and the data application ranges corresponding to the data entry stage of the data lifecycle, and an audit result is obtained. Quality control of the target data product can thus be performed through visibility constraints in the data entry stage, which ensures the discoverability and maintainability of the target data product and, in turn, the reliability of the data information in the data lifecycle.
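As an illustrative, non-limiting sketch, the aging, blood-margin and task constraint audits of steps S231 to S233 may be written as three checks. Several simplifications are assumptions made for illustration only: the aging information is modelled as a daily deadline expressed as an hour of the day, the output period is taken to be one day, and the per-priority rates, the success-rate threshold and the failure-streak limit are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class DailyRun:
    succeeded: bool
    finish_hour: float  # hour of day at which the daily partition was produced


# Hypothetical policy: share of runs that must meet the deadline, per priority.
REQUIRED_ON_TIME_RATE = {"high": 0.99, "medium": 0.95, "low": 0.80}


def aging_audit(runs, deadline_hour, priority):
    """First audit: was the declared deadline met often enough for this priority?"""
    if not runs:
        return False
    on_time = sum(1 for r in runs if r.succeeded and r.finish_hour <= deadline_hour)
    return on_time / len(runs) >= REQUIRED_ON_TIME_RATE[priority]


def lineage_audit(new_deadline, upstream_deadlines, downstream_deadlines):
    """Second audit: the changed deadline must lie between the latest upstream
    deadline and the earliest downstream deadline."""
    earliest = max(upstream_deadlines, default=0.0)
    latest = min(downstream_deadlines, default=24.0)
    return earliest <= new_deadline <= latest


def task_audit(runs, min_success_rate=0.9, max_consecutive_failures=3):
    """Third audit: success rate at or above a threshold and no long failure streak."""
    if not runs:
        return False
    streak = longest = 0
    for r in runs:
        streak = 0 if r.succeeded else streak + 1
        longest = max(longest, streak)
    rate = sum(1 for r in runs if r.succeeded) / len(runs)
    return rate >= min_success_rate and longest < max_consecutive_failures
```

In this sketch a product is allowed to enter only if all three functions return True; a stricter or looser combination could equally be chosen.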
As an optional implementation manner, in step S23, based on the data states and the data application ranges corresponding to different phases of the target data product in the data lifecycle, performing data constraint audit on the target data product, and obtaining an audit result includes:
Step S234, based on the data state and the data application range corresponding to the target data product in the offline stage, performing fourth constraint audit on the target data product to obtain a fourth audit result, wherein the fourth audit result is used for auditing whether the target data product is referenced by downstream data in a blood-margin link;
The fourth constraint audit is a blood-margin constraint audit at the data exit stage, and whether the target data product is referenced by downstream data in the blood-margin link can be checked based on the fourth audit result. The data exit stage is a stage in which the data product pauses or stops routine jobs in the data lifecycle. For example, in the embodiment of the present application, the offline stage of the data lifecycle is the data exit stage, and the fourth constraint audit is required for the target data product in the offline stage, so as to verify whether the target data product is referenced by downstream data in the blood-margin link.
If the fourth constraint audit is not performed, routine data jobs may still depend on data that has gone offline. When the offline data is upstream in the blood-margin link, the data jobs in the link that depend on the offline data may suffer large-scale routine failures.
Step S235, performing fifth constraint audit on the target data product based on the data state and the data application range corresponding to the target data product in the offline stage to obtain a fifth audit result, wherein the fifth audit result is used for auditing whether the target data product no longer generates data.
The fifth constraint audit is a task constraint audit at the data exit stage. Based on the fifth audit result, it can be checked whether the routine jobs of the target data product have stopped and whether the target data product still has an instance in the running state. The fifth constraint audit can ensure that the target data product no longer generates data after going offline, thereby reducing the waste of storage resources.
Based on steps S234 to S235, data constraint auditing is performed on the target data product according to the data state and the data application range corresponding to the data exit stage of the data lifecycle, and an audit result is obtained. Quality control of the target data product can thus be performed through visibility constraints in the data exit stage, which ensures the discoverability and maintainability of the target data product and, in turn, the reliability of the data information in the data lifecycle.
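As an illustrative, non-limiting sketch, the two offline-stage audits of steps S234 to S235 may be combined as follows. The representation of the blood-margin link as a dictionary from a product name to its direct downstream product names, and all function and key names, are assumptions made for illustration.

```python
def downstream_consumers(product, lineage):
    """Return every product that depends, directly or transitively, on `product`.

    `lineage` maps a product name to the names of its direct downstream products.
    """
    seen, stack = set(), list(lineage.get(product, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    seen.discard(product)  # guard against cycles that lead back to the product
    return seen


def offline_audit(product, lineage, active_products, running_instances):
    """Fourth and fifth audits: no active downstream reference, no running instance."""
    blockers = downstream_consumers(product, lineage) & set(active_products)
    still_running = running_instances.get(product, 0) > 0
    return {
        "referenced_downstream": sorted(blockers),
        "still_producing": still_running,
        "may_go_offline": not blockers and not still_running,
    }
```

Returning the list of blocking downstream products, rather than a bare boolean, lets the responsible person see which routine jobs would fail if the product went offline.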
Fig. 4 is a schematic diagram of a method of controlling a lifecycle of a data product according to an embodiment of the present disclosure. As shown in fig. 4, tenant A may perform data production of a target data product: the tenant data source is created at the created stage of the data lifecycle, and after the created stage, multiple versions of the data can be produced, e.g., version A, version B, version C, version D, etc. In the deployed stage of each data version, the data generated by the target data product can be deployed to the local data set of tenant A, and in the issued stage of each data version, the data generated by the target data product can be published to the system data set. The blood-margin link includes a plurality of tenants, for example, tenant B, tenant C, etc., in addition to tenant A; the process by which tenant B and tenant C control the lifecycle of a data product may refer to that of tenant A and is not described again. The task management channel of the system includes an initialization phase (Init), a debug phase (Debug), a test phase (Test) and an online phase (Online).
Fig. 5 is a schematic diagram of a method for controlling a data product lifecycle according to an embodiment of the present disclosure. As shown in fig. 5, after the to-be-issued stage of the data product lifecycle, data constraint auditing can be performed on the target data product based on the data states and the data application ranges corresponding to different stages of the target data product in the data lifecycle, so as to obtain an audit result, and the state circulation of the target data product in the verified stage is then readjusted by using the audit result. Specifically, the constraint audit for the target data product comprises a system audit and a manual audit, wherein the system audit comprises an aging audit, a blood-margin audit and a task audit, and the manual audit comprises an audit by the business responsible person, an audit by the team responsible person and an audit by the data producer. The state circulation of the target data product among different stages in the data lifecycle is adjusted based on the system audit conclusion and the manual audit conclusion. For example, when the audit is not passed, the target data product returns to the previous stage of the data product lifecycle and goes through the flow again.
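As an illustrative, non-limiting sketch, the readjustment of the state circulation by the audit conclusions may be modelled as a single transition function. The strictly linear stage order and the stage labels are assumptions made for illustration; the flow shown in fig. 5 may contain branches that this sketch does not capture.

```python
# Stage labels in the order listed in the text; the labels are assumptions.
STAGES = ["initialization", "in_creation", "created", "to_be_issued",
          "verified", "deployed", "issued", "offline"]


def next_stage(current, system_audit_passed, manual_audit_passed):
    """Advance one stage when both audits pass, otherwise return to the previous stage."""
    i = STAGES.index(current)
    if system_audit_passed and manual_audit_passed:
        return STAGES[min(i + 1, len(STAGES) - 1)]
    return STAGES[max(i - 1, 0)]
```

For example, a product in the to-be-issued stage moves on to the verified stage only when both the system audit and the manual audit pass; if either fails, it returns to the created stage.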
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good morals.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present disclosure may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the various embodiments of the present disclosure.
The present disclosure also provides an apparatus for controlling the lifecycle of a data product, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a block diagram of an apparatus for controlling a lifecycle of a data product according to one embodiment of the present disclosure, as shown in fig. 6, an apparatus 600 for controlling a lifecycle of a data product includes:
the obtaining module 601 is configured to obtain data product attributes corresponding to different stages of a data lifecycle of a target data product, where the data lifecycle includes: a plurality of stages, wherein the target data product performs state circulation among different stages in the data life cycle;
a determining module 602, configured to determine data application ranges corresponding to different phases in a data lifecycle according to data product attributes;
the auditing module 603 is configured to perform data constraint auditing on the target data product based on the data states and the data application ranges corresponding to different stages of the target data product in the data lifecycle, so as to obtain an auditing result;
the control module 604 is configured to readjust the state flow of the target data product between different phases in the data lifecycle according to the auditing result.
Optionally, the plurality of stages are used to describe the whole process of data production, data use and data offline of the target data product, the plurality of stages including: an initialization phase, an in-creation phase, a created phase, a to-be-issued phase, a verified phase, a deployed phase, an issued phase, and an offline phase.
Optionally, the data product attributes include: basic information, meta information, responsible person information, output period information, aging information and priority information.
Optionally, the obtaining module 601 is further configured to: acquire the basic information in the initialization stage; acquire the meta information in the created stage; acquire the output period information, the aging information and the priority information in the deployed stage and the subsequent stages; and acquire the responsible person information in the issued stage.
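As an illustrative, non-limiting sketch, the staged acquisition performed by the obtaining module may be expressed as a table of the attributes that first become required at each stage. The attribute keys, the stage labels and the rule that requirements accumulate as the product moves forward are assumptions made for illustration.

```python
# Stage labels in the order listed in the text; the labels are assumptions.
STAGE_ORDER = ["initialization", "in_creation", "created", "to_be_issued",
               "verified", "deployed", "issued", "offline"]

# Attributes that first become required at each stage (hypothetical key names).
NEWLY_REQUIRED = {
    "initialization": {"basic_info"},
    "created": {"meta_info"},
    "deployed": {"output_period", "aging_info", "priority"},
    "issued": {"responsible_person"},
}


def required_attributes(stage):
    """Union of all attributes required at `stage` and at every earlier stage."""
    required = set()
    for s in STAGE_ORDER[: STAGE_ORDER.index(stage) + 1]:
        required |= NEWLY_REQUIRED.get(s, set())
    return required


def missing_attributes(stage, attributes):
    """Attributes still to be acquired before the product can stay in `stage`."""
    provided = {key for key, value in attributes.items() if value is not None}
    return required_attributes(stage) - provided
```

A caller could refuse a transition into a stage while `missing_attributes` is non-empty, which keeps the attribute set growing in step with the lifecycle.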
Optionally, the determining module 602 is further configured to: determine, according to the basic information and the meta information, a first application range of a first part of data in the data lifecycle, wherein the first application range is used for indicating that the first part of data is in a visible state for the debug job, and the first part of data comprises: data created by the target tenant and data published by the remaining tenants other than the target tenant; determine a second application range of a second part of data in the data lifecycle according to the basic information, the meta information, the output period information, the aging information and the priority information, wherein the second application range is used for indicating that the second part of data is in a visible state for the first routine job, and the second part of data comprises: data deployed by the target tenant and data published by the remaining tenants other than the target tenant; and determine a third application range of a third part of data in the data lifecycle according to the basic information, the meta information, the output period information, the aging information, the priority information and the responsible person information, wherein the third application range is used for indicating that the third part of data is in a visible state for the second routine job, and the third part of data comprises: data published by all tenants.
Optionally, the auditing module 603 is further configured to: perform, based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, first constraint audit on the target data product to obtain a first audit result, wherein the first audit result is used for auditing the aging information; perform, based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, second constraint audit on the target data product to obtain a second audit result, wherein the second audit result is used for auditing the influence of the changed aging information on upstream data and downstream data in a blood-margin link; and perform, based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, third constraint audit on the target data product to obtain a third audit result, wherein the third audit result is used for auditing the operation success rate of the target data product.
Optionally, the auditing module 603 is further configured to: based on the data state and the data application range corresponding to the target data product in the offline stage, performing fourth constraint auditing on the target data product to obtain a fourth auditing result, wherein the fourth auditing result is used for auditing whether the target data product is referenced by downstream data in a blood margin link or not; and carrying out fifth constraint auditing on the target data product based on the data state and the data application range corresponding to the target data product in the offline stage to obtain a fifth auditing result, wherein the fifth auditing result is used for auditing whether the target data product does not generate data any more.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
According to an embodiment of the present disclosure, there is also provided an electronic device comprising a memory having stored therein computer instructions and at least one processor arranged to execute the computer instructions to perform the steps of the above-described method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in the present disclosure, the above processor may be configured to perform the following steps by a computer program:
S1, acquiring data product attributes respectively corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises: a plurality of stages, wherein the target data product performs state circulation among different stages in the data life cycle;
S2, determining data application ranges corresponding to different stages in a data life cycle according to the data product attributes;
S3, based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle, performing data constraint auditing on the target data product to obtain an auditing result;
S4, readjusting state circulation of the target data product among different stages in the data life cycle by using the auditing result.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
According to an embodiment of the present disclosure, the present disclosure also provides a non-transitory computer readable storage medium having stored therein computer instructions, wherein the computer instructions are arranged to perform the steps of the above-described method embodiments when run.
Optionally, in the present embodiment, the above-described non-transitory computer-readable storage medium may be configured to store a computer program for performing the steps of:
S1, acquiring data product attributes respectively corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises: a plurality of stages, wherein the target data product performs state circulation among different stages in the data life cycle;
S2, determining data application ranges corresponding to different stages in a data life cycle according to the data product attributes;
S3, based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle, performing data constraint auditing on the target data product to obtain an auditing result;
S4, readjusting state circulation of the target data product among different stages in the data life cycle by using the auditing result.
Optionally, in the present embodiment, the non-transitory computer readable storage medium described above may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product. Program code for carrying out embodiments of the disclosed methods may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the foregoing embodiments of the present disclosure, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed technical content may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present disclosure, in essence, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random-access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present disclosure. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present disclosure, and such modifications and adaptations are also intended to fall within the scope of protection of the present disclosure.

Claims (17)

1. A method of controlling a lifecycle of a data product, comprising:
obtaining data product attributes corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises: a plurality of stages, wherein the target data product performs state circulation among different stages in the data life cycle;
determining data application ranges corresponding to different stages in the data lifecycle according to the data product attributes;
based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle, carrying out data constraint auditing on the target data product to obtain an auditing result;
and readjusting the state circulation of the target data product among different stages in the data life cycle by using the auditing result.
2. The method of claim 1, wherein the plurality of phases are used to describe the overall process of data production, data usage, and data offline for the target data product, the plurality of phases comprising: an initialization phase, an in-creation phase, a created phase, a to-be-issued phase, a verified phase, a deployed phase, an issued phase, and an offline phase.
3. The method of claim 2, wherein the data product attributes comprise: basic information, meta information, responsible person information, output period information, aging information and priority information.
4. A method according to claim 3, wherein obtaining data product attributes for the target data product at different stages of the data lifecycle respectively comprises:
acquiring the basic information in the initialization stage;
acquiring the meta information in the created stage;
acquiring the output period information, the aging information and the priority information in the deployed stage and the subsequent stage;
and acquiring the responsible person information in the issued stage.
5. A method according to claim 3, wherein determining data application ranges for different phases of the data lifecycle according to the data product attributes comprises:
according to the basic information and the meta information, determining a first application range of first part of data in the data life cycle, wherein the first application range is used for indicating that the first part of data is in a visible state for a debugging operation, and the first part of data comprises: a target tenant has created data and other tenants except the target tenant have published data;
determining a second application range of second part of data in the data life cycle according to the basic information, the meta information, the output period information, the aging information and the priority information, wherein the second application range is used for indicating that the second part of data is in a visible state for a first routine operation, and the second part of data comprises: data deployed by the target tenant and data published by other tenants except the target tenant;
determining a third application range of third part of data in the data life cycle according to the basic information, the meta information, the output period information, the aging information, the priority information and the responsible person information, wherein the third application range is used for indicating that the third part of data is in a visible state for a second routine operation, and the third part of data comprises: all tenants have published data.
6. The method of claim 3, wherein performing data constraint auditing on the target data product based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle to obtain the auditing result comprises:
Based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, performing first constraint audit on the target data product to obtain a first audit result, wherein the first audit result is used for auditing the aging information;
based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, performing second constraint audit on the target data product to obtain a second audit result, wherein the second audit result is used for auditing the influence of the changed aging information on upstream data and downstream data in a blood margin link;
and based on the data states and the data application ranges corresponding to different stages of the target data product before the offline stage, performing third constraint audit on the target data product to obtain a third audit result, wherein the third audit result is used for auditing the operation success rate of the target data product.
7. The method of claim 3, wherein performing data constraint auditing on the target data product based on the data states and the data application ranges corresponding to different stages of the target data product in the data life cycle to obtain the auditing result comprises:
Based on the data state and the data application range corresponding to the target data product in the offline stage, performing fourth constraint auditing on the target data product to obtain a fourth auditing result, wherein the fourth auditing result is used for auditing whether the target data product is referenced by downstream data in a blood-margin link or not;
and carrying out fifth constraint auditing on the target data product based on the data state and the data application range corresponding to the target data product in the offline stage to obtain a fifth auditing result, wherein the fifth auditing result is used for auditing whether the target data product does not generate data any more.
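By way of illustration only, the two audits of the offline stage could be sketched as follows. Representing downstream references as a dictionary, treating "no longer generates data" as "no output within a quiet period", the seven-day default, and all function names are assumptions made for this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set


def audit_not_referenced(name: str, downstream_of: Dict[str, Set[str]]) -> bool:
    """Fourth audit (assumed rule): passes when no downstream data
    references the product."""
    return not downstream_of.get(name)


def audit_no_longer_producing(
    last_output_at: Optional[datetime],
    now: datetime,
    quiet_period: timedelta = timedelta(days=7),  # hypothetical default
) -> bool:
    """Fifth audit (assumed rule): passes when the product has produced
    no data for at least `quiet_period`."""
    return last_output_at is None or now - last_output_at >= quiet_period


def may_go_offline(
    name: str,
    downstream_of: Dict[str, Set[str]],
    last_output_at: Optional[datetime],
    now: datetime,
) -> bool:
    """The product may be taken offline only if both audits pass."""
    return audit_not_referenced(name, downstream_of) and audit_no_longer_producing(
        last_output_at, now
    )
```

Requiring both checks to pass reflects one possible policy: a product that is still referenced downstream, or that is still producing data, stays in its current stage.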
8. An apparatus for controlling a lifecycle of a data product, comprising:
an acquisition module, configured to acquire data product attributes respectively corresponding to different stages of a target data product in a data life cycle, wherein the data life cycle comprises a plurality of stages, and the target data product performs state transitions among the different stages in the data life cycle;
a determining module, configured to determine data application ranges corresponding to the different stages in the data life cycle according to the data product attributes;
an auditing module, configured to perform a data constraint audit on the target data product based on the data states and the data application ranges corresponding to the different stages of the target data product in the data life cycle, to obtain an audit result;
and a control module, configured to readjust the state transitions of the target data product among the different stages in the data life cycle by using the audit result.
9. The apparatus of claim 8, wherein the plurality of stages are used to describe the overall process of data production, data use, and data retirement of the target data product, the plurality of stages comprising: an initialization stage, an in-creation stage, a created stage, a to-be-issued stage, a verified stage, a deployed stage, an issued stage, and an offline stage.
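By way of illustration only, the eight stages and an audit-gated movement between them could be sketched as follows. The strictly linear transition table (each stage advancing only to the next one listed) and the names `Stage`, `ALLOWED` and `transition` are assumptions made for this sketch; the claims do not specify which transitions are permitted.

```python
from enum import Enum


class Stage(Enum):
    INITIALIZATION = 1
    IN_CREATION = 2
    CREATED = 3
    TO_BE_ISSUED = 4
    VERIFIED = 5
    DEPLOYED = 6
    ISSUED = 7
    OFFLINE = 8


# Assumed transition table: every stage may only advance to the stage
# listed directly after it; OFFLINE is terminal.
ORDER = list(Stage)
ALLOWED = {s: {ORDER[i + 1]} for i, s in enumerate(ORDER[:-1])}
ALLOWED[Stage.OFFLINE] = set()


def transition(current: Stage, target: Stage, audit_passed: bool) -> Stage:
    """Return the stage the product is in after an attempted transition.

    An illegal target raises ValueError; a failed audit leaves the
    product in its current stage.
    """
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target if audit_passed else current
```

Here the audit result acts as a gate: a product that fails its audit simply remains where it is, which is one simple way to let an audit result adjust the state flow.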
10. The apparatus of claim 9, wherein the data product attributes comprise: basic information, meta information, responsible person information, production cycle information, aging information and priority information.
11. The apparatus of claim 10, wherein the acquisition module is further configured to perform:
acquiring the basic information in the initialization stage;
acquiring the meta information in the created stage;
acquiring the production cycle information, the aging information and the priority information in the deployed stage and subsequent stages;
and acquiring the responsible person information in the issued stage.
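By way of illustration only, the stage at which each attribute is first acquired could be recorded in a lookup table as follows. The assumption that an attribute, once acquired, remains available in all later stages, as well as the attribute keys and the names `ACQUIRED_AT` and `available_attributes`, are introduced for this sketch only.

```python
from enum import IntEnum


class Stage(IntEnum):
    INITIALIZATION = 1
    IN_CREATION = 2
    CREATED = 3
    TO_BE_ISSUED = 4
    VERIFIED = 5
    DEPLOYED = 6
    ISSUED = 7
    OFFLINE = 8


# Stage at which each group of attributes is first acquired.
ACQUIRED_AT = {
    Stage.INITIALIZATION: {"basic"},
    Stage.CREATED: {"meta"},
    Stage.DEPLOYED: {"production_cycle", "aging", "priority"},
    Stage.ISSUED: {"responsible_person"},
}


def available_attributes(stage: Stage) -> set:
    """Attributes acquired at or before `stage` (assumed to be cumulative)."""
    attrs = set()
    for first_stage, names in ACQUIRED_AT.items():
        if stage >= first_stage:
            attrs |= names
    return attrs
```

Under these assumptions a product in the in-creation stage carries only basic information, while a product in the issued stage carries all six attributes.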
12. The apparatus of claim 10, wherein the determining module is further configured to perform:
determining, according to the basic information and the meta information, a first application range of a first part of data in the data life cycle, wherein the first application range is used for indicating that the first part of data is in a visible state for a debugging operation, and the first part of data comprises: data created by a target tenant and data published by tenants other than the target tenant;
determining a second application range of a second part of data in the data life cycle according to the basic information, the meta information, the production cycle information, the aging information and the priority information, wherein the second application range is used for indicating that the second part of data is in a visible state for a first routine operation, and the second part of data comprises: data deployed by the target tenant and data published by tenants other than the target tenant;
determining a third application range of a third part of data in the data life cycle according to the basic information, the meta information, the production cycle information, the aging information, the priority information and the responsible person information, wherein the third application range is used for indicating that the third part of data is in a visible state for a second routine operation, and the third part of data comprises: data published by all tenants.
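By way of illustration only, the three application ranges can be read as visibility filters over data products. In the sketch below, the operation keys, the treatment of "published" data as data in the issued stage, the reading of "created" and "deployed" as "has reached at least that stage", the exclusion of offline data, and all names are assumptions that are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    INITIALIZATION = 1
    IN_CREATION = 2
    CREATED = 3
    TO_BE_ISSUED = 4
    VERIFIED = 5
    DEPLOYED = 6
    ISSUED = 7
    OFFLINE = 8


@dataclass(frozen=True)
class Product:
    name: str
    tenant: str
    stage: Stage


# Minimum stage the target tenant's own data must have reached to be
# visible to each kind of operation (hypothetical operation keys).
OWN_MIN_STAGE = {
    "debug": Stage.CREATED,       # first application range
    "routine_1": Stage.DEPLOYED,  # second application range
    "routine_2": Stage.ISSUED,    # third application range
}


def visible(products: List[Product], tenant: str, operation: str) -> List[str]:
    """Names of the products visible to `tenant` for `operation`."""
    own_min = OWN_MIN_STAGE[operation]
    result = []
    for p in products:
        if p.stage == Stage.OFFLINE:
            continue  # assumed: offline data is never visible
        # Data of other tenants is visible only once published (issued).
        needed = own_min if p.tenant == tenant else Stage.ISSUED
        if p.stage >= needed:
            result.append(p.name)
    return result
```

The ranges narrow from debugging to the second routine operation: the same tenant sees less of its own unfinished data as the operation moves closer to production use, while data of other tenants is visible only once published.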
13. The apparatus of claim 10, wherein the auditing module is further configured to perform:
performing, based on the data states and the data application ranges corresponding to the different stages of the target data product before the offline stage, a first constraint audit on the target data product to obtain a first audit result, wherein the first audit result is used for auditing the aging information;
performing, based on the data states and the data application ranges corresponding to the different stages of the target data product before the offline stage, a second constraint audit on the target data product to obtain a second audit result, wherein the second audit result is used for auditing the influence of changed aging information on upstream data and downstream data in a data lineage link;
and performing, based on the data states and the data application ranges corresponding to the different stages of the target data product before the offline stage, a third constraint audit on the target data product to obtain a third audit result, wherein the third audit result is used for auditing the operation success rate of the target data product.
14. The apparatus of claim 10, wherein the auditing module is further configured to perform:
performing, based on the data state and the data application range corresponding to the target data product in the offline stage, a fourth constraint audit on the target data product to obtain a fourth audit result, wherein the fourth audit result is used for auditing whether the target data product is referenced by downstream data in a data lineage link;
and performing, based on the data state and the data application range corresponding to the target data product in the offline stage, a fifth constraint audit on the target data product to obtain a fifth audit result, wherein the fifth audit result is used for auditing whether the target data product no longer generates data.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions, when executed by the at least one processor, enabling the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202210397362.3A 2022-04-15 2022-04-15 Method and device for controlling life cycle of data product and electronic equipment Pending CN116955326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210397362.3A CN116955326A (en) 2022-04-15 2022-04-15 Method and device for controlling life cycle of data product and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210397362.3A CN116955326A (en) 2022-04-15 2022-04-15 Method and device for controlling life cycle of data product and electronic equipment

Publications (1)

Publication Number Publication Date
CN116955326A (en) 2023-10-27

Family

ID=88444839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210397362.3A Pending CN116955326A (en) 2022-04-15 2022-04-15 Method and device for controlling life cycle of data product and electronic equipment

Country Status (1)

Country Link
CN (1) CN116955326A (en)

Similar Documents

Publication Publication Date Title
CN108874558B (en) Message subscription method of distributed transaction, electronic device and readable storage medium
CN109344170B (en) Stream data processing method, system, electronic device and readable storage medium
US9020949B2 (en) Method and system for centralized issue tracking
CN112612768B (en) Model training method and device
CN111427748B (en) Task alarm method, system, equipment and storage medium
US10701213B2 (en) Dynamically generating an aggregation routine
CN112559475B (en) Data real-time capturing and transmitting method and system
CN111190892B (en) Method and device for processing abnormal data in data backfilling
CN109933509A (en) A kind of method and apparatus for realizing automatic test defect management
CN111429241A (en) Accounting processing method and device
US20210191921A1 (en) Method, apparatus, device and storage medium for data aggregation
CN114816393B (en) Information generation method, device, equipment and storage medium
CN112860343A (en) Configuration changing method, system, device, electronic equipment and storage medium
US11954123B2 (en) Data processing method and device for data integration, computing device and medium
CN110716804A (en) Method and device for automatically deleting useless resources, storage medium and electronic equipment
CN111831536A (en) Automatic testing method and device
CN113220907A (en) Business knowledge graph construction method and device, medium and electronic equipment
CN116955326A (en) Method and device for controlling life cycle of data product and electronic equipment
CN112148762A (en) Statistical method and device for real-time data stream
US20230259994A1 (en) Reference-based software application blueprint creation
CN115563310A (en) Method, device, equipment and medium for determining key service node
CN114549097A (en) Enterprise invoice system version management method and device, electronic equipment and storage medium
CN109597819B (en) Method and apparatus for updating a database
CN112130849A (en) Automatic code generation method and device
CN113220501A (en) Method, apparatus and computer program product for data backup

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination