WO2023226461A1 - Multi-domain data fusion method, device and storage medium - Google Patents


Info

Publication number
WO2023226461A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
domain
data
execution engine
fusion
Application number
PCT/CN2023/072949
Other languages
English (en)
French (fr)
Inventor
林文楷
周成祖
魏超
吴文
朱海勇
Original Assignee
厦门市美亚柏科信息股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-05-25
Filing date
2023-01-18
Publication date
2023-11-30
Application filed by 厦门市美亚柏科信息股份有限公司
Publication of WO2023226461A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases

Definitions

  • The present invention relates to the field of big data processing, and in particular to a method, device and storage medium for multi-domain data fusion.
  • Embodiments of the present invention provide a multi-domain data fusion method, device and storage medium. They achieve efficient cross-domain data fusion by setting up a multi-domain scheduling fusion area, task normalization and task scheduling.
  • A multi-domain data fusion method that uses data from multiple domains to perform processing tasks includes:
  • The task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain.
  • Step S2 includes:
  • Traversing Tn, aggregating the records in Tn according to the execution engine, and merging the tasks of the same data domain to obtain the task set related to that data domain.
  • Step S3 includes:
  • Sorting the tasks in the task set related to the same data domain by priority and taking, in turn, the m tasks with the highest priority;
  • In S3, the tasks that take part in task scheduling are the tasks in the task set related to the same data domain whose status is available.
  • The step of verifying a task includes:
  • Calling the execution engine corresponding to the task. If the execution engine returns 0, the task is retried and the retry count is increased by 1. If the retry count reaches a predetermined threshold and the returned result is still 0, the task status is set to unavailable. If the execution engine returns 1, verification passes and the task status is set to available.
  • The format of the returned task results is dynamically defined by the corresponding execution engine.
  • The retention period of the fusion result is set according to a preset data classification.
  • The method further includes filtering the fusion result according to the task source and task grading, and distributing the filtered result to the task source.
  • The method further includes the step of destroying the corresponding task after distribution is completed.
  • A multi-domain data fusion device includes a memory and a processor.
  • The memory stores at least one program, and the at least one program is executed by the processor to implement the multi-domain data fusion method described above.
  • A computer-readable storage medium stores at least one program, and the at least one program is executed by a processor to implement the multi-domain data fusion method described above.
  • The multi-domain data fusion solution of the embodiments uses a preset multi-domain scheduling fusion area. Through task normalization and task scheduling, it standardizes the data scheduling tasks of each business scenario.
  • This standardization forms a unified task pool. The corresponding execution engine is run for each data domain, and the task execution results are effectively fused and precisely distributed. The result is a physically dispersed but logically unified cross-domain data fusion model. It supports the big data application needs of various business scenarios in real time and widens the coverage of big data dividend sharing.
  • FIG. 1 is a schematic flowchart of a multi-domain data fusion method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the overall flow of a multi-domain data fusion method according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a multi-domain data fusion device according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a multi-domain data fusion method according to an embodiment of the present invention. As shown in FIG. 1, the multi-domain data fusion method of this embodiment uses data from multiple domains to perform processing tasks and includes the following steps:
  • The task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain. The task rule library can be used to standardize the data processing or data scheduling tasks of various business scenarios;
  • In many big data business scenarios, a task may correspond to multiple different data domains.
  • The data domain group described above may include each data domain involved in the corresponding data source, and these data domains may belong to different owners. With the multi-domain data fusion method of the embodiments, an intermediate multi-domain scheduling fusion area can be built between the different data domains. A corresponding execution engine or processing engine is then matched according to the conditions of each data domain.
  • The execution engine group described above contains the execution engines matched with, or corresponding to, the different data domains related to a task. The execution engines can also be preset according to the task type;
  • S3 schedules the tasks in the task set related to the same data domain according to the preset task priority and calls the execution engine corresponding to the higher-priority task first;
  • The same data domain may involve multiple tasks. Calling the execution engines in task-priority order enables task scheduling that takes different business scenarios into account, and it improves data processing efficiency;
  • FIG. 2 is a schematic diagram of the overall flow of a multi-domain data fusion method according to another embodiment of the present invention. As shown in FIG. 2, in this embodiment the multi-domain data fusion method can be executed in a preset multi-domain scheduling fusion area.
  • A task rule library is defined, which stores the attributes related to the tasks to be processed.
  • Table 1 is an example of a feature table in the rule library.
  • Preferably, the task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain.
  • As needed, the task library can contain other task-related feature information, such as required time limits, associated features and associated proportions, as shown in Table 1.
  • The task rule library can be used to obtain information such as the features of each task to be processed and its matching execution engine.
  • The attribute LWLC is the task type label. For example, 1 denotes personnel information supplementation, 2 denotes mobile phone profile creation, and 3 denotes modeling. This labeling is only illustrative.
  • The task content matching each task type label can be set according to the actual business scenario.
  • Table 2 is a task scheduling table that can be used to obtain the scheduling information of each task to be processed. Both Table 1 and Table 2 are stored in the multi-domain scheduling fusion area. The attribute names in Table 1 and Table 2 are only examples; other names can be used as needed.
  • The correspondence between data domains and execution engines, or between task types and execution engines, is established in advance. The tasks are normalized (for example, by normalizing their subject identifiers), and a task set is then formed for each data domain. This greatly improves the efficiency of scheduled task execution and reduces the use of computing resources.
  • Tn is traversed, the records are aggregated by data domain, and the tasks of the same domain are merged. This forms the final task set related to each data domain, that is, the execution task list Tn{data domain, List(task type label, task identifier, execution engine)}.
  • Task scheduling in the embodiments jointly considers task scheduling, permission control and data fusion. It supports full life-cycle management from task verification through scheduling to final destruction. It also meets the on-demand data application and data security requirements of specific business scenarios, which makes a multi-domain scheduling fusion area a better model for integrating different business scenarios.
  • The records of Tn whose status is 1 are saved as Pn, that is, the set of tasks whose status is available.
  • Task scheduling is performed based on preset priorities. The maximum number of processing threads m is obtained from the hardware resources allocated to the multi-domain scheduling fusion area. Pn is traversed and sorted in descending order of task priority, and the first m records of Pn are taken in turn. With the task identifier array in List as the parameter, the execution engine in the List of [Tn].data domain is called to perform formal scheduling and return the analysis result Rn. m records are processed each time until the execution engines of all tasks in the same data domain have been traversed.
  • The priority can be set by considering the task type, the data routing and the permissions of the data user.
  • The method of the embodiments allows the execution engine of each domain to dynamically define the attributes of the returned data. It also supports returning data according to the preset security level of each domain's data items.
  • For example, data can be returned according to preset permissions.
  • Results for the same object or the same task returned by different domains can be stored in the multi-domain scheduling fusion area and fused there, for example by attribute merging, to obtain the final fusion result. This adapts to flexible and changeable data fusion across business scenarios.
  • For example, for a mobile phone retrieval task, a mobile phone profile can be established in the multi-domain scheduling fusion area.
  • The execution engines of different domains each describe dimensions of the phone according to their own data characteristics and supply data separately. The data are finally merged into a complete profile in the multi-domain scheduling fusion area, which completes the retrieval task for that phone.
  • Domain 1: mobile phone identifier; mobile application information: APP package name, APP software name, APP version number, APP installation time, operating system type, application information;
  • Domain 2: mobile phone identifier; address book information: address book friend's name (nickname), friend's mobile phone number, number's home region, friend notes, group name, data source, person label, number of calls, call duration, most recent call time;
  • Domain 3: mobile phone identifier; phone-associated address information: account type, account number, mobile phone number, name, ID number, authenticated account, contact address, data origin, data source.
  • The retention period of returned results can also be determined according to preset business grading and classification principles. For example, in personnel profiles, data on low-risk groups such as the elderly and children can be kept for a shorter time. High-risk data, such as data on persons with criminal records and key controlled persons, can be kept longer.
  • The fusion result Rn is filtered according to the task source and the task grading and classification, and data items preset as not viewable are removed. The filtered Rn is distributed to the task source, and logging and auditing are performed to ensure that data use is reasonable, compliant, safe and reliable. For example, non-viewable data items can be set according to permissions and security requirements.
  • The technical solution of the embodiments constructs a multi-domain scheduling fusion area, establishes a task rule library, and performs task normalization and task scheduling. The data tasks of each business scenario are thereby standardized into a unified task pool. The solution runs the corresponding execution engine for each data domain and effectively fuses and precisely distributes the results of each execution engine. This forms a physically dispersed but logically unified cross-domain data fusion model that meets the big data fusion needs of various business scenarios. It addresses the long-standing problems of massive data integration in the big data era, supports big data applications in various business scenarios effectively and in real time, and widens the coverage of big data dividend sharing.
  • The present invention also provides a multi-domain data fusion device.
  • The device includes a processor 301, a memory 302, a bus 303, and a computer program stored in the memory 302 and executable on the processor 301.
  • The processor 301 includes one or more processing cores.
  • The memory 302 is connected to the processor 301 through the bus 303.
  • The memory 302 stores program instructions. When the processor executes the computer program, it implements the steps of the method of Embodiment 1 of the present invention.
  • The multi-domain data fusion device can be a computer unit, which can be a computing device such as a desktop computer, notebook computer, palmtop computer or cloud server.
  • The computer unit may include, but is not limited to, a processor and a memory.
  • The above composition of the computer unit is only an example and does not limit the computer unit. It may include more or fewer components than described, combine certain components, or use different components.
  • The computer unit may also include input/output devices, network access devices, a bus and the like, which the embodiments do not limit.
  • The processor may be a central processing unit (Central Processing Unit, CPU). It may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • The general-purpose processor may be a microprocessor or any conventional processor.
  • The processor is the control center of the computer unit and connects all parts of the computer unit through various interfaces and lines.
  • The memory can store computer programs and/or modules. The processor implements the various functions of the computer unit by running or executing the computer programs and/or modules stored in the memory and by calling the data stored in the memory.
  • The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application program required by at least one function. The data storage area may store data created through use of the mobile phone, and so on.
  • The memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, smart media card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash card (Flash Card), at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • Embodiment 4:
  • The present invention also provides a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, the steps of the above method of the embodiments of the present invention are implemented.
  • If the modules/units integrated in the computer unit are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes of the methods in the above embodiments can also be completed by a computer program instructing the relevant hardware.
  • The computer program can be stored in a computer-readable storage medium. When executed by a processor, it can implement the steps of each of the above method embodiments.
  • The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on.
  • Computer-readable media may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a software distribution medium, and so on. It should be noted that what the computer-readable medium may contain can be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

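The present invention provides a multi-domain data fusion method, device and storage medium. The method includes the following steps:
S1, establishing a task rule library in a preset multi-domain scheduling fusion area;
S2, extracting all selected tasks from the task rule library, determining for each selected task the corresponding data domain group and execution engine group, and obtaining the task sets related to the same data domain;
S3, scheduling the tasks in each task set related to the same data domain according to preset task priorities, with the execution engines of higher-priority tasks called first;
S4, for each task, storing into the multi-domain scheduling fusion area the task results that each execution engine in the corresponding execution engine group returns from its data domain, and fusing them there to obtain a fusion result.
The technical solution achieves efficient cross-domain data fusion.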

Description

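Multi-domain data fusion method, device and storage medium
This PCT application claims priority to the earlier Chinese application No. CN 202210573368.1, filed on May 25, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of big data processing, and in particular to a multi-domain data fusion method, device and storage medium.
Background
Today, big data has become a valuable resource. Existing big data projects often adopt the traditional "standard + centralized" fusion approach. This approach requires the data of every domain to be converted into a unified standard format and stored centrally in one specific domain. For big data projects with large data volumes and complex structures, it has two shortcomings. First, centralized storage causes large volumes of data to be repeatedly converted and stored, which leads to high project construction costs. Second, the business of each domain changes frequently, so the business data it produces also change frequently. Centralized storage according to a fixed standard often cannot promptly match changed data formats or stay compatible with old ones, which weakens the ability to serve external users.
Summary
Embodiments of the present invention provide a multi-domain data fusion method, device and storage medium. They achieve efficient cross-domain data fusion by setting up a multi-domain scheduling fusion area, task normalization and task scheduling.
In one aspect, a multi-domain data fusion method is provided for using data from multiple domains to perform processing tasks, including:
S1, establishing a task rule library in a preset multi-domain scheduling fusion area. The task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain;
S2, extracting all selected tasks from the task rule library, determining for each selected task the corresponding data domain group and execution engine group, and obtaining the task set related to the same data domain;
S3, scheduling the tasks in the task set related to the same data domain according to preset task priorities, and calling first the execution engine corresponding to the higher-priority task;
S4, for each selected task, storing into the multi-domain scheduling fusion area the task results that each execution engine in the corresponding execution engine group returns from its data domain, and fusing them there to obtain a fusion result.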
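Further, in the method, step S2 includes:
extracting all selected tasks in the task rule library to form a to-be-processed task data set Sn, and creating a new normalized task data set Tn;
traversing Sn, obtaining the data domain group corresponding to the data source related to each task and the corresponding execution engine group list(clyc), and splitting Sn into Tn according to the execution engine group list(clyc), where Tn={Sn, list(clyc)};
traversing Tn, aggregating the records in Tn according to the execution engine, and merging the tasks of the same data domain to obtain the task set related to that data domain.
Further, in the method, S3 includes:
determining the maximum number of processing threads m according to the hardware resources allocated to the multi-domain scheduling fusion area, m being a natural number greater than 0;
sorting the tasks in the task set related to the same data domain according to the preset priority, and taking in turn the m tasks with the highest priority;
calling the execution engines corresponding to the m tasks to perform data processing.
Further, after S2 and before S3, the method also includes:
verifying the tasks in the task set related to the same data domain: if verification passes, the status of the task is set to available; otherwise, it is set to unavailable;
in S3, the tasks participating in task scheduling are the tasks in the task set related to the same data domain whose status is available.
Further, in the method, the step of verifying a task includes:
calling the execution engine corresponding to the task. If the execution engine returns 0, the task is retried and the retry count is increased by 1. If the retry count reaches a predetermined threshold and the returned result is still 0, the task status is set to unavailable. If the execution engine returns 1, verification passes and the task status is set to available.
Further, in the method, the format of the returned task results is dynamically defined by the corresponding execution engine.
Further, in the method, the retention period of the fusion result is set according to a preset data classification.
Further, the method also includes filtering the fusion result according to the task source and task grading, and distributing the filtered result to the task source.
Further, the method also includes the step of destroying the corresponding task after distribution is completed.
In another aspect, a multi-domain data fusion device is provided, including a memory and a processor. The memory stores at least one program, which is executed by the processor to implement the multi-domain data fusion method described above.
In yet another aspect, a computer-readable storage medium is provided. It stores at least one program, which is executed by a processor to implement the multi-domain data fusion method described above.
The above technical solution has the following technical effects:
The solution targets application scenarios that fuse massive multi-domain data. The multi-domain data fusion solution of the embodiments uses a preset multi-domain scheduling fusion area and, through task normalization and task scheduling, standardizes the data scheduling tasks of each business scenario into a unified task pool. It runs the corresponding execution engine for each data domain and effectively fuses and precisely distributes the task execution results. This forms a physically dispersed but logically unified cross-domain data fusion model. The model supports the big data application needs of various business scenarios in real time and widens the coverage of big data dividend sharing.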
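Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a multi-domain data fusion method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall flow of a multi-domain data fusion method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-domain data fusion device according to an embodiment of the present invention.
Detailed Description
To further illustrate the embodiments, the present invention provides drawings. These drawings are part of the disclosure of the present invention. They mainly illustrate the embodiments and, together with the related description, explain how the embodiments operate. With reference to them, a person of ordinary skill in the art should understand other possible implementations and the advantages of the present invention. Components in the drawings are not drawn to scale, and similar reference symbols generally denote similar components.
The embodiments of the present invention are further described below with reference to the drawings and specific implementations.
Embodiment 1:
FIG. 1 is a schematic flowchart of a multi-domain data fusion method according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment uses data from multiple domains to perform processing tasks and includes the following steps:
S1: A task rule library is established in a preset multi-domain scheduling fusion area. The task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain. The task rule library can be used to standardize the data processing or data scheduling tasks of various business scenarios;
S2: All selected tasks in the task rule library are extracted. For each selected task, the corresponding data domain group and execution engine group are determined, and the task set related to each data domain is obtained. In many big data business scenarios, one task may correspond to multiple different data domains. The data domain group may include each data domain involved in the corresponding data source, and these data domains may belong to different owners. With the multi-domain data fusion method of the embodiments, an intermediate multi-domain scheduling fusion area can be built between the different data domains. A matching execution engine or processing engine is then assigned according to the conditions of each data domain. The execution engine group contains the execution engines matched with, or corresponding to, the different data domains related to a task. The execution engines can also be preset according to the task type;
S3: The tasks in the task set related to the same data domain are scheduled according to preset task priorities, and the execution engine corresponding to a higher-priority task is called first. The same data domain may involve multiple tasks. Calling the execution engines in task-priority order enables task scheduling that takes different business scenarios into account, and it improves data processing efficiency;
S4: For each selected task, the task results that each execution engine in the task's execution engine group returns from its data domain are stored in the multi-domain scheduling fusion area. They are fused there to obtain a fusion result. The multi-domain scheduling fusion area makes it convenient to fuse the execution results that different domains return for the same task.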
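Embodiment 2:
FIG. 2 is a schematic diagram of the overall flow of a multi-domain data fusion method according to another embodiment of the present invention. As shown in FIG. 2, in this embodiment the multi-domain data fusion method can be executed in a preset multi-domain scheduling fusion area.
The multi-domain data fusion method of this embodiment defines a task rule library, which stores the attributes related to the tasks to be processed. Table 1 is an example of a feature table in the rule library. Preferably, the task rule library includes: the identifier of each task to be processed, the data source related to that task, the multiple data domains related to the data source, and the execution engine corresponding to each data domain. As needed, the task library can also contain other task-related feature information, such as required time limits, associated features and associated proportions, as shown in Table 1. The task rule library can be used to obtain information such as the features of each task to be processed and its matching execution engine. The attribute LWLC is the task type label. For example, 1 denotes personnel information supplementation, 2 denotes mobile phone profile creation, and 3 denotes modeling. This labeling is only illustrative; the task content matching each task type label can be set according to the actual business scenario.
Table 2 is a task scheduling table from which the scheduling information of each task to be processed can be obtained. Both Table 1 and Table 2 are stored in the multi-domain scheduling fusion area. The attribute names in Table 1 and Table 2 are only examples; other names can be used as needed.
Table 1
Table 2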
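A rule-library record like the one described above can be sketched as a small data structure. This is an illustration only, not the patent's schema: Tables 1 and 2 are not reproduced here, so the field names (`task_id`, `task_type`, `data_source`, `domain_engines`, `time_limit_s`) and the engine names are hypothetical. Only the kinds of fields come from the text: task identifier, LWLC task type label, data source, data domains with their execution engines, and an optional time limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskRule:
    """One record of the task rule library (hypothetical field names)."""
    task_id: str                        # identifier of the task to be processed
    task_type: int                      # LWLC label, e.g. 2 = mobile phone profile creation
    data_source: str                    # data source the task needs
    domain_engines: Dict[str, str] = field(default_factory=dict)  # data domain -> execution engine
    time_limit_s: Optional[int] = None  # optional extra feature, e.g. a required time limit

    def engine_group(self) -> List[str]:
        """Return the task's execution engine group, ordered by domain name."""
        return [self.domain_engines[d] for d in sorted(self.domain_engines)]


# Illustrative library keyed by task identifier; all values are made up.
rule_library: Dict[str, TaskRule] = {
    "task-1": TaskRule("task-1", 2, "hotel_stays",
                       {"domain3": "engine-d3", "domain1": "engine-d1"}),
}

print(rule_library["task-1"].engine_group())  # ['engine-d1', 'engine-d3']
```

A dictionary keyed by task identifier is chosen so that the later steps can look up a task's engines directly. The patent does not say how the library is stored.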
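Task normalization:
In this embodiment, the correspondence between data domains and execution engines, or between task types and execution engines, is established in advance. The tasks are normalized (for example, by normalizing their subject identifiers) and then grouped into a task set for each data domain. This greatly improves the efficiency of scheduled task execution and reduces the use of computing resources.
All tasks in the task rule library are extracted to form the to-be-processed task data set Sn, and a new normalized task data set Tn is created. Sn is traversed, and analysis yields the execution engine group list(clyc) corresponding to the data sources Sn.CJLY involved in each task. Sn is split by execution engine and stored in Tn, e.g. Tn={Sn, list(clyc)}. Consider a data processing task 1 that needs the resource "hotel accommodation records". If this resource is stored in both domain 1 and domain 3, then Tn = {task 1, domain 1 engine}, {task 1, domain 3 engine}.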
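The splitting step, which produces one Tn record per task and execution engine, can be sketched with the hotel-accommodation example above. The resource catalog `RESOURCE_DOMAINS`, the engine map `DOMAIN_ENGINE` and the function `split_tasks` are hypothetical stand-ins for the rule-library lookups. The sketch only shows the shape of the transformation.

```python
from typing import Dict, List, NamedTuple


class TnRecord(NamedTuple):
    task_id: str
    domain: str
    engine: str


# Hypothetical catalog: which data domains hold each resource.
RESOURCE_DOMAINS: Dict[str, List[str]] = {"hotel_stays": ["domain1", "domain3"]}
# Hypothetical, pre-established correspondence between data domains and engines.
DOMAIN_ENGINE: Dict[str, str] = {"domain1": "engine-d1", "domain3": "engine-d3"}


def split_tasks(sn: Dict[str, str]) -> List[TnRecord]:
    """Split each task of Sn (task_id -> data source) into one Tn record per engine."""
    tn: List[TnRecord] = []
    for task_id, source in sn.items():
        for domain in RESOURCE_DOMAINS.get(source, []):
            tn.append(TnRecord(task_id, domain, DOMAIN_ENGINE[domain]))
    return tn


# Task 1 needs "hotel accommodation records", held in domain 1 and domain 3,
# so it yields two records: {task1, engine-d1} and {task1, engine-d3}.
tn = split_tasks({"task1": "hotel_stays"})
```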
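Next, Tn is traversed, the records are aggregated by data domain, and tasks of the same domain are merged. This forms the final task set related to each data domain, i.e. the execution task list Tn{data domain, List(task type label, task identifier, execution engine)}.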
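The aggregation into Tn{data domain, List(task type label, task identifier, execution engine)} amounts to a group-by on the data domain. A minimal sketch follows, assuming each Tn record is a plain tuple. The tuple layout and the function name `aggregate_by_domain` are illustrative, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A Tn record: (task type label, task identifier, data domain, execution engine).
TnRow = Tuple[int, str, str, str]
# An entry of List(task type label, task identifier, execution engine).
ListEntry = Tuple[int, str, str]


def aggregate_by_domain(tn: List[TnRow]) -> Dict[str, List[ListEntry]]:
    """Merge Tn records of the same data domain into {domain: List(...)}."""
    grouped: Dict[str, List[ListEntry]] = defaultdict(list)
    for label, task_id, domain, engine in tn:
        grouped[domain].append((label, task_id, engine))
    return dict(grouped)


tn = [
    (1, "task1", "domain1", "engine-d1"),
    (1, "task1", "domain3", "engine-d3"),
    (2, "task2", "domain1", "engine-d1"),
]
execution_list = aggregate_by_domain(tn)
# domain1 -> [(1, 'task1', 'engine-d1'), (2, 'task2', 'engine-d1')]
# domain3 -> [(1, 'task1', 'engine-d3')]
```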
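Traditional task scheduling usually considers only the execution of the tasks themselves and cannot meet the data fusion requirements of different business scenarios. Task scheduling in the embodiments jointly considers task scheduling, permission control and data fusion. It supports full life-cycle management from task verification through scheduling to final destruction. It also meets the on-demand data application and data security requirements of specific business scenarios, which makes a multi-domain scheduling fusion area a better model for integrating different business scenarios.
Task verification:
In a further implementation, tasks can be verified after the task rule library is established and before actual task scheduling. Specifically, Tn is traversed. With the first record of the task identifier array in List as the parameter, the execution engine in the List of [Tn].data domain is called. If the execution engine returns 0, the task status is set to unavailable and the task is retried, with the retry count increased by 1, i.e. Tn.zt=0 and Tn.ZXCC=Tn.ZXCC+1. If the execution engine returns 1, the status is available, i.e. Tn.zt=1. When the retry count exceeds 5 and the status is still 0, no further retries are made and the task remains unavailable. Finally, the records of Tn with status 1 are saved as Pn, the set of available tasks. Task verification can greatly reduce the proportion of anomalies in the production environment. For example, a big data platform pushes a newly built dynamic model to the fusion area for testing and verification. Only after the model runs successfully is it deployed to the designated domain for computation and analysis, which keeps the production environment running stably.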
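The verification loop above can be sketched as follows. The sketch assumes an engine call that returns 1 (pass) or 0 (fail) for a task identifier. `verify_task`, `available_tasks` and the `call_engine` parameter are hypothetical names. `zt` (status), `ZXCC` (retry count) and the limit of 5 come from the text, which is read here as at most 5 retries after the first call.

```python
from typing import Callable, Dict, List


def verify_task(call_engine: Callable[[str], int], task_id: str,
                max_retries: int = 5) -> Dict[str, int]:
    """Verify one task; return its status zt (1 available, 0 unavailable) and retry count ZXCC."""
    state = {"zt": 0, "ZXCC": 0}
    while True:
        if call_engine(task_id) == 1:
            state["zt"] = 1          # engine returned 1: verification passed
            return state
        state["zt"] = 0              # engine returned 0: task is unavailable for now
        if state["ZXCC"] >= max_retries:
            return state             # retry limit reached: stop, task stays unavailable
        state["ZXCC"] += 1           # retry, increasing the retry count by 1


def available_tasks(statuses: Dict[str, Dict[str, int]]) -> List[str]:
    """Pn: identifiers of the tasks whose status is 1 (available)."""
    return [task_id for task_id, s in statuses.items() if s["zt"] == 1]


replies = iter([0, 0, 1])                                  # flaky engine: fails twice, then passes
ok = verify_task(lambda _task_id: next(replies), "task1")  # {'zt': 1, 'ZXCC': 2}
bad = verify_task(lambda _task_id: 0, "task2")             # {'zt': 0, 'ZXCC': 5}
pn = available_tasks({"task1": ok, "task2": bad})          # ['task1']
```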
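Task scheduling:
For the task set of each data domain, tasks are scheduled based on preset priorities. The maximum number of processing threads m is obtained from the hardware resources allocated to the multi-domain scheduling fusion area. Pn is traversed and sorted in descending order of task priority, and the first m records of Pn are taken in turn. With the task identifier array in List as the parameter, the execution engine in the List of [Tn].data domain is called to perform formal scheduling, and the analysis result Rn is returned. m records are processed each time until the execution engines of all tasks for that data domain have been traversed. For example, priorities can be set by jointly considering the task type, the data routing and the permissions of the data user.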
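Priority-ordered scheduling in batches of m can be sketched with a thread pool of size m. This is an illustration under several assumptions. A larger priority number is taken to mean higher priority. The engine is called once per task identifier instead of once with the whole identifier array. `os.cpu_count()` stands in for "threads allowed by the allocated hardware". `schedule` and `run_engine` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# (task identifier, priority); a larger number is assumed to mean a higher priority.
Task = Tuple[str, int]


def schedule(pn: List[Task], run_engine: Callable[[str], str],
             m: Optional[int] = None) -> List[str]:
    """Run the available tasks of one data domain in descending priority, m at a time."""
    if m is None:
        m = os.cpu_count() or 1   # stand-in for the thread budget of the fusion area
    if m < 1:
        raise ValueError("m must be a natural number greater than 0")
    ordered = sorted(pn, key=lambda t: t[1], reverse=True)
    rn: List[str] = []
    with ThreadPoolExecutor(max_workers=m) as pool:
        for start in range(0, len(ordered), m):
            batch_ids = [task_id for task_id, _ in ordered[start:start + m]]
            # map() yields results in input order; extend() consumes them all,
            # so each batch finishes before the next one is submitted
            rn.extend(pool.map(run_engine, batch_ids))
    return rn


rn = schedule([("a", 1), ("b", 3), ("c", 2)], lambda task_id: f"R({task_id})", m=2)
# rn == ['R(b)', 'R(c)', 'R(a)']
```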
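Storage of scheduling results:
The method of the embodiments allows the execution engine of each domain to dynamically define the attributes of the returned data. It also supports returning data according to preset security levels of each domain's data items, for example according to preset permissions. Results for the same object or the same task returned by different domains can be stored in the multi-domain scheduling fusion area and fused there, for example by attribute merging, to obtain the final fusion result. This adapts to flexible and changeable data fusion across business scenarios. For example, for a mobile phone retrieval task, a mobile phone profile can be established in the multi-domain scheduling fusion area. The execution engines of different domains each describe dimensions of the phone according to their own data characteristics and supply data separately. The data are finally merged into a complete profile in the multi-domain scheduling fusion area, which completes the retrieval task for that phone.
For this example, the data of each domain are as follows:
Domain 1: mobile phone identifier; mobile application information: APP package name, APP software name, APP version number, APP installation time, operating system type, application information;
Domain 2: mobile phone identifier; address book information: address book friend's name (nickname), friend's mobile phone number, number's home region, friend notes, group name, data source, person label, number of calls, call duration, most recent call time;
Domain 3: mobile phone identifier; phone-associated address information: account type, account number, mobile phone number, name, ID number, authenticated account, contact address, data origin, data source.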
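Attribute merging of the per-domain results into one phone profile can be sketched as follows. The per-domain records are heavily shortened and invented for illustration, and the key name `phone_id` is hypothetical. The text does not say how conflicting values from different domains are resolved. This sketch assumes the first value seen for an attribute is kept.

```python
from typing import Any, Dict, List


def fuse_profiles(results: List[Dict[str, Any]], key: str = "phone_id") -> Dict[str, Dict[str, Any]]:
    """Attribute-merge per-domain results that describe the same object."""
    profiles: Dict[str, Dict[str, Any]] = {}
    for record in results:
        profile = profiles.setdefault(record[key], {})
        for attr, value in record.items():
            profile.setdefault(attr, value)  # keep the first value seen for an attribute
    return profiles


# Hypothetical, heavily shortened per-domain results for one phone.
domain_results = [
    {"phone_id": "p1", "app_package": "com.example.chat"},      # domain 1
    {"phone_id": "p1", "friend_name": "Li", "call_count": 12},  # domain 2
    {"phone_id": "p1", "account_type": "email"},                # domain 3
]
profile = fuse_profiles(domain_results)["p1"]
```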
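In a further implementation, the retention period of the returned results can also be determined according to preset business grading and classification principles. For example, in personnel profiles, data on low-risk groups such as the elderly and children can be kept for a shorter time. High-risk data, such as data on persons with criminal records and key controlled persons, can be kept longer.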
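Retention by classification reduces to a lookup from class to retention period, followed by date arithmetic. The class names and the 30/365-day periods below are hypothetical values chosen only to illustrate the shorter/longer distinction in the text. `expires_at` and `is_expired` are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical retention policy in days, keyed by business classification.
RETENTION_DAYS: Dict[str, int] = {"low_risk": 30, "high_risk": 365}


def expires_at(category: str, stored_at: datetime) -> datetime:
    """When a fused result of this classification should be purged."""
    return stored_at + timedelta(days=RETENTION_DAYS[category])


def is_expired(category: str, stored_at: datetime, now: datetime) -> bool:
    """True once the retention period for the classification has elapsed."""
    return now >= expires_at(category, stored_at)


stored = datetime(2023, 1, 1)
print(expires_at("low_risk", stored))   # 2023-01-31 00:00:00
print(expires_at("high_risk", stored))  # 2024-01-01 00:00:00
```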
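Distribution of scheduling results:
In a further implementation, after task execution is completed, the fusion result Rn is filtered according to the task source and the task grading and classification, and data items preset as not viewable are removed. The filtered Rn is distributed to the task source, and logging and auditing are performed to ensure that data use is reasonable, compliant, safe and reliable. For example, the non-viewable data items can be set according to permissions and security requirements.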
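Filtering before distribution can be sketched as removing the items blocked for a given (task source, task grade) pair and writing an audit log entry. The `BLOCKED_ITEMS` rule table, the source and grade names, and `filter_for_source` are all hypothetical. The patent only says that non-viewable items are preset according to permissions and security requirements.

```python
import logging
from typing import Any, Dict, FrozenSet, Tuple

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("fusion.audit")

# Hypothetical rules: data items a (task source, task grade) pair may not view.
BLOCKED_ITEMS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("source-A", "grade-2"): frozenset({"id_number", "contact_address"}),
}


def filter_for_source(rn: Dict[str, Any], source: str, grade: str) -> Dict[str, Any]:
    """Drop non-viewable items from the fusion result Rn and log the distribution."""
    blocked = BLOCKED_ITEMS.get((source, grade), frozenset())
    filtered = {item: value for item, value in rn.items() if item not in blocked}
    audit_log.info("distributing %d of %d items to %s (grade %s)",
                   len(filtered), len(rn), source, grade)
    return filtered


rn = {"phone_id": "p1", "name": "Li", "id_number": "redacted"}
out = filter_for_source(rn, "source-A", "grade-2")  # {'phone_id': 'p1', 'name': 'Li'}
```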
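Task destruction:
In a further implementation, after result distribution is completed, the task status is set to unavailable, i.e. Tn.Zt=0, and no further task scheduling is performed for that task record.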
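As described, destruction is a status change rather than a deletion: the record stays, but the scheduler no longer picks it up. A minimal sketch follows. `TaskRecord`, `destroy` and `schedulable` are hypothetical names, and only the `zt` status convention comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    task_id: str
    zt: int = 1  # status: 1 = available, 0 = unavailable


def destroy(task: TaskRecord) -> None:
    """After distribution, mark the task unavailable (Tn.Zt = 0)."""
    task.zt = 0


def schedulable(tasks: List[TaskRecord]) -> List[str]:
    """Only tasks whose status is still available are picked up by the scheduler."""
    return [t.task_id for t in tasks if t.zt == 1]


tasks = [TaskRecord("task1"), TaskRecord("task2")]
destroy(tasks[0])
print(schedulable(tasks))  # ['task2']
```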
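The technical solution of the embodiments constructs a multi-domain scheduling fusion area, establishes a task rule library, and performs task normalization and task scheduling. The data tasks of each business scenario are thereby standardized into a unified task pool. The solution runs the corresponding execution engine for each data domain and effectively fuses and precisely distributes the results of each execution engine. This forms a physically dispersed but logically unified cross-domain data fusion model that meets the big data fusion needs of various business scenarios. It addresses long-standing problems of massive data fusion in the big data era, supports big data applications in various business scenarios effectively and in real time, and widens the coverage of big data dividend sharing.
Embodiment 3:
The present invention also provides a multi-domain data fusion device. As shown in FIG. 3, the device includes a processor 301, a memory 302, a bus 303, and a computer program stored in the memory 302 and executable on the processor 301. The processor 301 includes one or more processing cores. The memory 302 is connected to the processor 301 through the bus 303 and stores program instructions. When the processor executes the computer program, it implements the steps of the method of Embodiment 1 of the present invention.
Further, as one implementation, the multi-domain data fusion device can be a computer unit, such as a desktop computer, notebook computer, palmtop computer, cloud server or other computing device. The computer unit may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that this composition is only an example of the computer unit and does not limit it. The computer unit may include more or fewer components than described, combine certain components, or use different components. For example, it may also include input/output devices, network access devices, a bus and the like, which the embodiments do not limit.
Further, as one implementation, the processor may be a central processing unit (Central Processing Unit, CPU). It may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer unit and connects all parts of the computer unit through various interfaces and lines.
The memory can store computer programs and/or modules. The processor implements the various functions of the computer unit by running or executing the computer programs and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application program required by at least one function. The data storage area may store data created through use of the mobile phone, and so on. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, smart media card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash card (Flash Card), at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
Embodiment 4:
The present invention also provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, it implements the steps of the above method of the embodiments of the present invention.
If the modules/units integrated in the computer unit are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the methods in the above embodiments can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium. When executed by a processor, it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a software distribution medium, and so on. It should be noted that what the computer-readable medium may contain can be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction.
Although the present invention has been specifically shown and described with reference to preferred embodiments, those skilled in the art should understand the following. Various changes in form and detail may be made to the present invention without departing from the spirit and scope of the invention as defined by the appended claims. All such changes fall within the protection scope of the present invention.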

Claims (11)

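  1. A multi-domain data fusion method for using data from multiple domains to perform processing tasks, characterized by comprising:
    S1, establishing a task rule library in a preset multi-domain scheduling fusion area, the task rule library comprising: an identifier of a task to be processed, a data source related to the task to be processed, multiple data domains related to each data source, and an execution engine corresponding to each data domain;
    S2, extracting all selected tasks in the task rule library, determining, for each selected task, a corresponding data domain group and a corresponding execution engine group, and obtaining a task set related to the same data domain;
    S3, scheduling the tasks in the task set related to the same data domain according to preset task priorities, and preferentially calling the execution engine corresponding to a task with a higher priority;
    S4, for each selected task, storing into the multi-domain scheduling fusion area the task results returned from the corresponding data domains by each execution engine in the corresponding execution engine group, and performing fusion in the multi-domain scheduling fusion area to obtain a fusion result.
  2. The method according to claim 1, characterized in that step S2 comprises:
    extracting all the selected tasks in the task rule library to form a to-be-processed task data set Sn, and creating a new normalized task data set Tn;
    traversing Sn, obtaining the data domain group corresponding to the data source related to each task and the corresponding execution engine group list(clyc), and splitting and storing Sn into Tn according to the execution engine group list(clyc), where Tn={Sn, list(clyc)};
    traversing Tn, aggregating the records in Tn according to the execution engine, and merging the tasks of the same data domain to obtain the task set related to the same data domain.
  3. The method according to claim 1, characterized in that S3 comprises:
    determining a maximum number of processing threads m according to the hardware resources allocated to the multi-domain scheduling fusion area, m being a natural number greater than 0;
    sorting the tasks in the task set related to the same data domain according to the preset priorities, and taking in turn the m tasks with higher priorities;
    calling the execution engines corresponding to the m tasks to perform data processing.
  4. The method according to claim 1, characterized by further comprising, after S2 and before S3:
    verifying the tasks in the task set related to the same data domain; if verification passes, setting the task status of the corresponding task to available; otherwise, setting the task status of the corresponding task to unavailable;
    in S3, the tasks participating in task scheduling are the tasks in the task set related to the same data domain whose task status is available.
  5. The method according to claim 4, characterized in that the step of verifying a task comprises:
    calling the execution engine corresponding to the task to be verified; if the execution engine returns 0, retrying and increasing the retry count by 1; if the returned result is still 0 when the retry count reaches a predetermined threshold, setting the task status to unavailable; if the execution engine returns 1, the verification passes and the task status is set to available.
  6. The method according to claim 1, characterized in that the format of the returned task results is dynamically defined by the corresponding execution engine.
  7. The method according to claim 1, characterized in that a retention period of the fusion result is set according to a preset data classification.
  8. The method according to claim 1, characterized by further comprising filtering the fusion result according to the task source and task grading, and distributing the filtered result to the task source.
  9. The method according to claim 1, characterized by further comprising a step of destroying the corresponding task after distribution is completed.
  10. A multi-domain data fusion device, characterized by comprising a memory and a processor, wherein the memory stores at least one program, and the at least one program is executed by the processor to implement the multi-domain data fusion method according to any one of claims 1 to 9.
  11. A computer-readable storage medium, characterized in that the storage medium stores at least one program, and the at least one program is executed by a processor to implement the multi-domain data fusion method according to any one of claims 1 to 9.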
PCT/CN2023/072949 2022-05-25 2023-01-18 Multi-domain data fusion method, device and storage medium WO2023226461A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210573368.1 2022-05-25
CN202210573368.1A CN115033590A (zh) 2022-05-25 2022-05-25 Multi-domain data fusion method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023226461A1 true WO2023226461A1 (zh) 2023-11-30

Family

ID=83121680

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072949 WO2023226461A1 (zh) 2022-05-25 2023-01-18 Multi-domain data fusion method, device and storage medium

Country Status (3)

Country Link
CN (1) CN115033590A (zh)
WO (1) WO2023226461A1 (zh)
ZA (1) ZA202305627B (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033590A (zh) * 2022-05-25 2022-09-09 厦门市美亚柏科信息股份有限公司 Multi-domain data fusion method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190215338A1 (en) * 2018-01-05 2019-07-11 Goodrich Corporation Multi-domain operational environment utilizing a common information layer
US20190213263A1 (en) * 2018-01-05 2019-07-11 Goodrich Corporation Automated multi-domain operational services
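CN110347878A (zh) * 2019-06-14 2019-10-18 中电科大数据研究院有限公司 Rule-engine-driven data fusion method
CN113886457A (zh) * 2021-09-14 2022-01-04 浪潮软件科技有限公司 Method for joint retrieval of cross-domain heterogeneous data
CN114371933A (zh) * 2021-12-28 2022-04-19 深度数智科技(珠海)有限公司 Method and system for dynamically scheduling a multi-core fusion computing processor
CN115033590A (zh) * 2022-05-25 2022-09-09 厦门市美亚柏科信息股份有限公司 Multi-domain data fusion method, device and storage medium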

Also Published As

Publication number Publication date
CN115033590A (zh) 2022-09-09
ZA202305627B (en) 2023-12-20

Similar Documents

Publication Publication Date Title
US20220159041A1 (en) Data processing and scanning systems for generating and populating a data inventory
CN107015853B (zh) Method and device for implementing multi-stage tasks
US9740468B2 (en) Cloud-based application resource files
RU2586866C2 (ru) Differentiation of a set of features by a tenant and a user
US9727577B2 (en) System and method to store third-party metadata in a cloud storage system
US20130311597A1 (en) Locally backed cloud-based storage
US9813450B1 (en) Metadata-based verification of artifact quality policy compliance
US9471665B2 (en) Unified system for real-time coordination of content-object action items across devices
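WO2023226461A1 (zh) Multi-domain data fusion method, device and storage medium
CN106021566A (zh) Method, device and system for improving the concurrent processing capability of a single database
CN111190936A (zh) Blockchain-based method for querying associations between trusted identifiers, and corresponding storage medium and electronic device
CN114971827A (zh) Blockchain-based reconciliation method and device, electronic equipment and storage medium
TW202032466A (zh) User age prediction method, device and equipment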
US11151088B2 (en) Systems and methods for verifying performance of a modification request in a database system
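CN112328592A (zh) Data storage method, electronic device and computer-readable storage medium
CN115392501A (zh) Data acquisition method and device, electronic equipment and storage medium
CN108846755A (zh) Smart-contract-based permission management method and device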
US11481377B2 (en) Compute-efficient effective tag determination for data assets
CN115543428A (zh) Policy-template-based simulated data generation method and device
US20230050048A1 (en) Isolating And Reinstating Nodes In A Distributed Ledger Using Proof Of Innocence
US11688027B2 (en) Generating actionable information from documents
CN113946872B (zh) Database operation method, system and device, and computer-readable medium
CN111813842B (zh) Data processing method, device, system, equipment and storage medium
US20220188295A1 (en) Dynamic management of blockchain resources
US20240070319A1 (en) Dynamically updating classifier priority of a classifier model in digital data discovery

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23810537

Country of ref document: EP

Kind code of ref document: A1