CN117573777A - Data synchronization method, device and electronic equipment - Google Patents
- Publication number
- CN117573777A (application CN202311605112.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- program
- preset
- processing
- time
- Legal status (as listed by Google Patents; not a legal conclusion)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The present application relates to the technical field of data synchronization processing, and in particular to a data synchronization method, device and electronic equipment.
Background
In current development of large-scale systems and Internet projects, the system environment usually comprises a development environment, a test environment and a production environment. To reduce repeated configuration and lower the error rate of data configuration, basic data and configuration-item data are usually configured in the development environment; once development is complete, the data is synchronized to the test environment for testing, and the tested data is finally synchronized to the production environment. In addition, some configuration data in the system must be obtained from third parties and processed before being stored in the database for use.
Existing data synchronization methods and systems rely mainly on change detection in the source database, so they cannot acquire data from multiple types of data source at the same time. Moreover, when changed data in the source database is captured and a message is sent to notify the target database, an SQL script is executed on the target database to update its data so that the data stored in the source and target databases is consistent; this mechanism makes it impossible to set the synchronization time flexibly or to control the synchronization frequency.
Summary of the invention
In view of this, the present application provides a data synchronization method, device and electronic equipment, whose main purpose is to solve the technical problems that existing technology cannot acquire data from multiple data sources at the same time and cannot flexibly set the synchronization time or control the synchronization frequency.
According to a first aspect of the present disclosure, a data synchronization method is provided, comprising:
periodically obtaining preset configuration information, the preset configuration information comprising a periodic task start time and preset processing rules;
when the current time equals the periodic task start time, using a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters; and
using a data processing program to process the initial data according to the preset processing rules to obtain processed target data.
According to a second aspect of the present disclosure, a data synchronization device is provided, comprising:
a first acquisition module, configured to periodically obtain preset configuration information, the preset configuration information comprising a periodic task start time and preset processing rules;
a collection module, configured to use a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters when the current time equals the periodic task start time; and
a processing module, configured to use a data processing program to process the initial data according to the preset processing rules to obtain processed target data.
According to a third aspect of the present disclosure, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method of the first aspect.
Compared with the prior art, the data synchronization method, device and electronic equipment provided by the present disclosure periodically obtain preset configuration information comprising a periodic task start time and preset processing rules; when the current time equals the periodic task start time, use a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters; and use a data processing program to process the initial data according to the preset processing rules to obtain processed target data. In this way, data can be acquired from multiple data sources at the same time, and the synchronization time can be set, or the synchronization frequency controlled, flexibly.
The above description is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the present application become more readily apparent, specific embodiments of the present application are set out below.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Figure 1 is a schematic flowchart of a data synchronization method provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of a data synchronization method provided by an embodiment of the present disclosure;
Figure 3 is an overall flowchart of a data synchronization method provided by an embodiment of the present disclosure;
Figure 4 is a periodic task diagram provided by an embodiment of the present disclosure;
Figure 5 is a schematic structural diagram of a data synchronization device provided by an embodiment of the present disclosure.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in them can be combined with one another.
The data synchronization method, device and electronic equipment according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
The present disclosure provides a data synchronization method, device and electronic equipment that support acquiring data from multiple data sources at the same time and allow the synchronization time to be set, or the synchronization frequency controlled, flexibly.
As shown in Figure 1, an embodiment of the present disclosure provides a data synchronization method, which may include the following steps.
Step 101: periodically obtain preset configuration information, the preset configuration information comprising a periodic task start time and preset processing rules.
The preset configuration information may be a set of parameters and settings configured for a task on the scheduled task platform; these determine the task's execution time, execution cycle, execution mode and so on.
The preset configuration information may include settings such as the registration of data collection programs, the execution parameters of each data collection program, the preset processing rules for data processing, the periodic task start time, whether to retry on failure, and the maximum number of retries.
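As an illustration only, the configuration items listed above could be modeled as a small data structure. The class and field names (`SyncConfig`, `CollectorRegistration`, `schedule`, `retry_on_failure`, and so on) are hypothetical; the patent does not specify a schema, and the cron-style schedule string is an assumed representation of the periodic task start time.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CollectorRegistration:
    """A registered data collection program and its execution parameters (hypothetical)."""
    name: str
    execution_params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SyncConfig:
    """Preset configuration information for one synchronization task (hypothetical schema)."""
    schedule: str                          # periodic task start time as a cron-style string (parsing not shown)
    processing_rules: Dict[str, Any]       # preset processing rules for the data processing program
    collectors: List[CollectorRegistration] = field(default_factory=list)
    retry_on_failure: bool = True          # whether to retry a failed execution
    max_retries: int = 3                   # maximum number of retries after a failure

    def __post_init__(self) -> None:
        if self.max_retries < 0:
            raise ValueError("max_retries must be non-negative")


# Example: run at the top of every hour with two independent collectors.
config = SyncConfig(
    schedule="0 * * * *",
    processing_rules={"dedupe_key": "id"},
    collectors=[
        CollectorRegistration("mysql_orders", {"mode": "incremental", "table": "orders"}),
        CollectorRegistration("rates_api", {"mode": "full", "url": "https://example.invalid/rates"}),
    ],
)
```

Keeping the configuration as plain data lets the platform store it in a database and reload it on each scheduled read, which matches the idea of obtaining the configuration periodically rather than hard-coding it.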
In the embodiments of the present disclosure, the executing entity may be a data synchronization apparatus or device. The technical solutions are described taking a scheduled task platform as the data synchronization apparatus or device, but this does not constitute a specific limitation on the technical solutions of the present disclosure.
The scheduled task platform is mainly responsible for timed monitoring, interacting with the data collection programs, providing visual configuration, registering task programs, and monitoring and displaying the whole process, so that users can flexibly set the synchronization time and control the synchronization frequency to suit different scenarios and requirements.
Step 102: when the current time equals the periodic task start time, use a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters.
In the embodiments of the present disclosure, one or more data collection programs can be registered on the scheduled task platform for each data synchronization task, and corresponding execution parameters configured, so that the platform can identify and schedule the corresponding data collection tasks. The data collection programs are independent of one another and can handle different types of data sources, such as relational databases, non-relational databases, or data obtained by calling third-party programs.
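The independence of the collection programs, combined with the retry settings from the configuration, might be sketched as follows: each collector runs in isolation, a failure in one does not stop the others, and a failed collector is retried up to a maximum count. For simplicity the collectors run sequentially here; the function and parameter names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Collector = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def run_collectors(
    collectors: Dict[str, Collector],
    params: Dict[str, Dict[str, Any]],
    retry_on_failure: bool = True,
    max_retries: int = 3,
) -> Dict[str, Any]:
    """Run each collector in isolation; map its name to its rows or to its last exception."""
    results: Dict[str, Any] = {}
    attempts = 1 + (max_retries if retry_on_failure else 0)
    for name, collect in collectors.items():
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            try:
                results[name] = collect(params.get(name, {}))
                break
            except Exception as exc:  # a failing source must not stop the others
                last_error = exc
        else:  # no attempt succeeded
            results[name] = last_error
    return results


calls = {"flaky": 0}


def flaky(p: Dict[str, Any]) -> List[Dict[str, Any]]:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return [{"id": 1}]


def broken(p: Dict[str, Any]) -> List[Dict[str, Any]]:
    raise RuntimeError("bad credentials")


out = run_collectors(
    {"flaky": flaky, "broken": broken, "ok": lambda p: [{"id": p["x"]}]},
    {"ok": {"x": 7}},
    max_retries=2,
)
```

Here `flaky` succeeds on its third attempt, `broken` exhausts its retries and is reported as an exception, and `ok` is unaffected by either.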
Once all configuration is complete and the periodic task start time set by the user is reached, the scheduled task platform starts the relevant data collection programs, which execute their collection tasks according to the corresponding execution parameters and collect initial data from the respective data sources. The initial data may be the initial version of the data set collected at a given periodic task start time or in a given state.
The periodic task start time may be the point in time at which a scheduled task starts executing. For example, if a scheduled task starts on the hour every hour, its periodic start time is the top of every hour; this ensures that data is synchronized at precise points in time.
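For the hourly example, deciding whether the periodic start time has been reached amounts to computing the next top of the hour. A minimal sketch, assuming an hourly period and ignoring time zones and clock skew; `next_hourly_start` and `is_due` are hypothetical helpers, not part of any scheduler API.

```python
from datetime import datetime, timedelta


def next_hourly_start(now: datetime) -> datetime:
    """Return the next top of the hour strictly after `now`."""
    top = now.replace(minute=0, second=0, microsecond=0)
    return top + timedelta(hours=1)


def is_due(now: datetime, scheduled: datetime) -> bool:
    """A scheduler tick fires once the current time has reached the scheduled start time."""
    return now >= scheduled


now = datetime(2023, 11, 28, 9, 59, 30)
start = next_hourly_start(now)  # 2023-11-28 10:00:00
```

The check uses `>=` rather than strict equality because a polling scheduler rarely observes the exact instant; treating "the current time equals the start time" as "has reached it" avoids missed runs.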
The execution parameters tell a data collection program how to collect data, process data and interact with other components when it executes a data synchronization task. They may include the collection scope, the collection frequency, the address of the data source, and the conditions that the data to be collected must satisfy, so as to meet the collection needs of different data sources.
A data collection program reads the execution parameters stored in the database and, based on them, runs the relevant program code to fetch the qualifying data from the data source.
Step 103: use a data processing program to process the initial data according to the preset processing rules to obtain processed target data.
The preset processing rules may be pre-configured execution rules for the data processing program, i.e. guidelines specifying how data is cleaned, transformed and integrated. They may cover operations such as cleaning, conversion and merging, so that the collected data is preprocessed to ensure its consistency, accuracy and completeness and therefore retains its value during storage and analysis.
The data processing program may be a computer program that cleans, transforms, integrates and analyzes raw data to meet specific needs or goals.
The target data may be the data output by the data processing program, whose quality, format and content meet the needs of subsequent analysis, modeling and application.
In summary, compared with the prior art, the data synchronization method provided by the present disclosure periodically obtains preset configuration information comprising a periodic task start time and preset processing rules; when the current time equals the periodic task start time, uses a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters; and uses a data processing program to process the initial data according to the preset processing rules to obtain processed target data. In this way, data can be acquired from multiple data sources at the same time, and the synchronization time can be set, or the synchronization frequency controlled, flexibly.
Further, as a refinement and extension of the above embodiment, and to fully explain the specific implementation of the method of the present disclosure, the present disclosure provides the specific method shown in Figure 2, which includes the following steps.
Step 201: periodically obtain preset configuration information, the preset configuration information comprising a periodic task start time and preset processing rules.
In the embodiments of the present disclosure, the existing problems are solved mainly by introducing a scheduled task platform, message queue middleware and a database.
As shown in Figure 3, the data synchronization system consists of four parts: the data collection programs, the data processing program, the data storage program and the scheduled task platform. This division of labor enables the system to acquire data from multiple data sources at the same time and improves the flexibility and compatibility of data synchronization.
Message queue middleware is introduced for asynchronous communication between the programs, decoupling data processing from data collection and data storage. Data processing can thus run independently of collection and storage, which improves the system's concurrency and performance.
A database is introduced to store information such as the execution parameters of the data collection programs, the preset processing rules and program execution results, facilitating unified management and monitoring.
The data collection programs collect data from the data sources, store it in a cache and notify the data processing program; the data processing program processes the cached data according to the preset processing rules and sends the processed data to the data storage program for storage. This mode keeps data synchronization timely while reducing system resource consumption when the data changes little.
Step 202: when the current time equals the periodic task start time, use a plurality of data collection programs to collect initial data from the respective data sources according to their corresponding execution parameters.
Programs such as the data collection programs may run on the scheduled task platform or on other servers or devices, communicating and collaborating with the platform. The scheduled task platform schedules and monitors their execution to ensure that data synchronization tasks proceed smoothly. The runtime environment and processing capacity of each data collection program can be chosen and configured according to actual needs, to meet data synchronization requirements in different scenarios.
In the embodiments of the present disclosure, under the scheduling of the scheduled task platform, the data collection programs execute collection tasks according to the parameters and rules set by the user, and different programs may collect different types of data. A data source may be a relational database such as MySQL or Oracle, a non-relational database such as MongoDB or Redis, or a third-party program whose data is obtained by calling it.
Each data collection program requires a corresponding data source configuration, which may include the address of the data source, its access method, the data type and other information. Through its execution parameters, a data collection program can be configured to collect either full data or incremental data: full data may be all of the data, whereas incremental data may be the data added since the last collection.
A data collection program can also set collection conditions through its execution parameters in order to collect data that meets specific needs, for example data in a particular field or data within a certain time period.
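A sketch of how execution parameters might select full versus incremental collection and apply collection conditions, over an in-memory list standing in for a data source. The parameter keys (`mode`, `watermark_field`, `last_watermark`, `time_field`, `time_from`, `time_to`, `fields`) are hypothetical; a real collector would push these conditions down into the source query instead.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def collect(rows: List[Row], params: Dict[str, Any]) -> List[Row]:
    """Select rows from an in-memory source according to hypothetical execution parameters."""
    selected = rows
    if params.get("mode") == "incremental":
        # Incremental: keep only rows added after the previous run's watermark.
        wm_field, last = params["watermark_field"], params["last_watermark"]
        selected = [r for r in selected if r[wm_field] > last]
    if "time_field" in params:
        # Time-window condition: time_from <= value < time_to.
        tf = params["time_field"]
        start = params.get("time_from", datetime.min)
        end = params.get("time_to", datetime.max)
        selected = [r for r in selected if start <= r[tf] < end]
    if "fields" in params:
        # Field condition: project each row onto the requested fields only.
        selected = [{k: r[k] for k in params["fields"]} for r in selected]
    return selected


rows = [
    {"id": 1, "created": datetime(2023, 11, 1), "name": "a"},
    {"id": 2, "created": datetime(2023, 11, 15), "name": "b"},
    {"id": 3, "created": datetime(2023, 11, 20), "name": "c"},
]
full = collect(rows, {"mode": "full"})
incremental = collect(rows, {"mode": "incremental", "watermark_field": "id",
                             "last_watermark": 1, "fields": ["id"]})
window = collect(rows, {"mode": "full", "time_field": "created",
                        "time_from": datetime(2023, 11, 10), "time_to": datetime(2023, 11, 18)})
```

The watermark approach assumes a monotonically increasing field (an auto-increment ID or creation timestamp); after each run the collector would persist the largest value it saw as the next `last_watermark`.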
Step 203: calculate the data volume of the initial data; if the data volume is greater than 0, store the initial data in the cache system to obtain cached data; and use the message middleware to pass the cached data to the data processing program in the form of a message.
In the embodiments of the present disclosure, if the data collection program obtains no data (i.e. the data volume of the initial data equals 0), the program's flow ends.
If the data collection program obtains data, the data may be stored in the cache system, which holds data temporarily for subsequent processing. After caching the data, the data collection program sends a message; the message middleware receives it and forwards it to the data processing program, waking that program to start a new task on the cached data. The data processing program processes the cached data according to the information contained in the message. Meanwhile, the data collection program also stores information about its execution, such as the collection duration and the total volume collected, in the database.
The message may contain information about the cached data, such as its data source, data type and data volume. Message middleware (Message Queue, MQ) is a technology for handling message passing and communication in distributed systems; as an asynchronous communication mechanism, it decouples applications by sending messages to a queue.
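Step 203 can be sketched in-process, with a dictionary standing in for the cache system and `queue.Queue` standing in for the message middleware; a real deployment would use a distributed cache and an MQ broker instead. The message fields follow the examples in the text (data source, data type, data volume), while the cache-key format, `run_log`, and the function name are hypothetical.

```python
import queue
import uuid
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

cache: Dict[str, List[Row]] = {}        # stand-in for the cache system
mq: queue.Queue = queue.Queue()          # stand-in for the message middleware
run_log: List[Dict[str, Any]] = []       # stand-in for the execution-info table in the database


def publish_collected(source: str, data_type: str, rows: List[Row]) -> Optional[str]:
    """Cache non-empty results and notify the processing program; return the cache key."""
    volume = len(rows)                                     # data volume of the initial data
    run_log.append({"source": source, "volume": volume})  # record execution information
    if volume == 0:
        return None                                        # nothing collected: the flow ends
    key = f"{source}:{uuid.uuid4().hex}"
    cache[key] = rows
    mq.put({"cache_key": key, "source": source, "data_type": data_type, "volume": volume})
    return key
```

The message carries a cache key rather than the rows themselves, which keeps messages small; the processing program uses the key to fetch the data from the cache.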
Step 204: determine the number of threads in the thread pool; when the data processing program receives the message, use the threads in the thread pool to process the cached data with the first preset processing algorithm according to the preset processing rules, obtaining the processed target data.
A thread pool may be a tool for allocating, scheduling and managing threads. In the data processing program, the thread pool distributes tasks among multiple threads so that several data tasks can be processed concurrently rather than one after another. This makes full use of multi-core processors, shortening the total processing time and increasing processing speed.
In the embodiments of the present disclosure, the data processing program wakes up on receiving the message sent by the message middleware, reads the preset processing rules from the database, and uses a predefined algorithm (the first preset processing algorithm) to clean and process the raw data in the cache (the cached data), obtaining the processed target data (data in the storage format of the target database). The first preset processing algorithm may be an algorithm set in the data processing program for converting or processing data; it may be stored in the data processing program and may perform a simple conversion from one data model to another, or convert the original data model into another through complex computation. The target data may be data that meets the storage requirements of the target database after cleaning and processing.
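When the processing program is woken by a message, it can split the cached rows into batches and hand them to a thread pool. A minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor`; the message shape, the batch size, and `double_value` (a placeholder for the first preset processing algorithm) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def handle_message(
    message: Dict[str, Any],
    cache: Dict[str, List[Row]],
    transform: Callable[[List[Row]], List[Row]],
    n_threads: int = 4,
    batch_size: int = 100,
) -> List[Row]:
    """Process the cached rows named in `message` in batches on a thread pool."""
    rows = cache.pop(message["cache_key"], [])
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in submission order, so output order matches input order.
        processed = list(pool.map(transform, batches))
    return [row for batch in processed for row in batch]


def double_value(batch: List[Row]) -> List[Row]:
    """Placeholder for the first preset processing algorithm."""
    return [{"v": r["v"] * 2} for r in batch]
```

In CPython, threads speed up I/O-bound work (database or network calls) but not pure computation, because of the global interpreter lock. This sketch also removes the cache entry before processing; a production version might delete it only after the data storage program confirms the write.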
The preset processing rules can be used to identify and remove outliers, erroneous values and duplicates in the cached data, improving data quality; to convert between data formats so that the data meets the requirements of the target database or analysis tool; and to guide the merging, aggregation and correlation of data from different data sources for further analysis.
In the embodiments of the present disclosure, a preset processing rule may be an operating guideline that describes how cached data is to be processed, whereas the first preset processing algorithm may be a concrete implementation that cleans, transforms and integrates the cached data according to the preset processing rules. The rules guide the algorithm, and the algorithm carries out the specific operations they prescribe. In practice, the two together ensure the quality and accuracy of the cached data during processing.
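The split between rules (what to do) and algorithm (how to do it) can be illustrated by keeping the rules as data and writing one function that interprets them. The rule keys (`required`, `ranges`, `dedupe_key`, `rename`) are hypothetical examples of cleaning, outlier removal, de-duplication and format conversion; merging across sources is not shown.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_rules(rows: List[Row], rules: Dict[str, Any]) -> List[Row]:
    """Interpret declarative processing rules (hypothetical keys) over a list of rows."""
    required = rules.get("required", [])
    ranges = rules.get("ranges", {})
    rename = rules.get("rename", {})
    dedupe_key = rules.get("dedupe_key")
    seen = set()
    out: List[Row] = []
    for row in rows:
        # Cleaning: drop rows with missing required values.
        if any(row.get(f) is None for f in required):
            continue
        # Outliers: drop rows whose values fall outside the configured [low, high] range.
        if any(row.get(f) is not None and not (low <= row[f] <= high)
               for f, (low, high) in ranges.items()):
            continue
        # De-duplication on a key field; the first occurrence wins.
        if dedupe_key is not None and row.get(dedupe_key) is not None:
            if row[dedupe_key] in seen:
                continue
            seen.add(row[dedupe_key])
        # Format conversion: rename source fields to the target schema.
        out.append({rename.get(k, k): v for k, v in row.items()})
    return out


rules = {"required": ["id"], "ranges": {"price": (0, 10_000)},
         "dedupe_key": "id", "rename": {"id": "order_id"}}
rows = [{"id": 1, "price": 5}, {"id": 1, "price": 5}, {"id": None, "price": 3},
        {"id": 2, "price": -1}, {"id": 3, "price": 9}]
target = apply_rules(rows, rules)
```

Because the rules are plain data, they can be stored in the database and changed without redeploying the processing program.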
In the embodiments of the present disclosure, determining a reasonably sized thread pool (i.e. determining the number of threads in the thread pool) may mean configuring the optimal number of threads for a specific application scenario based on system resources, task load and performance requirements. A reasonably sized pool makes full use of system resources while avoiding the problems caused by too many or too few threads.
Specifically, the data processing algorithm also includes an important algorithm for generating a reasonably sized thread pool; its formula (the second preset processing algorithm) is as follows:
where N_threads is the number of threads in the thread pool, N_cpu is the number of processor cores, U_cpu is the processor utilization, W is the program waiting time and C is the program processing time.
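The formula itself did not survive in this copy of the text. Given the symbols defined above and the later reference to the ratio of waiting time to processing time, it very likely is the widely used thread-pool sizing rule from Java Concurrency in Practice (Goetz et al.), in which U_cpu is the target processor utilization (0 < U_cpu ≤ 1). This reconstruction is an assumption:

$$N_{threads} = N_{cpu} \times U_{cpu} \times \left(1 + \frac{W}{C}\right)$$

For example, with 8 cores, a target utilization of 0.8, and a program that waits 90 ms for every 10 ms of computation, N_threads = 8 × 0.8 × (1 + 90/10) = 64.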
其中,处理器(可为CPU)核心数越高表示计算能力越高;当处理器资源资源使用率较高时,可能需要增加线程数以提高数据处理速度;而当处理器资源资源使用率较低时,可以适当减少线程数以降低系统资源争抢程度;程序等待时长(可为程序花费在等待(例如等待IO操作结果)上的时长),线程池算法需要考虑程序在等待(如IO操作结果)上的时间,等待时间越长,说明程序在等待IO操作上的开销越大,可能需要增加线程数以提高数据处理速度;程序处理时长(即程序实际占用处理器计算的时长),线程池算法需要考虑程序实际占用处理器计算的时间,计算时间越长,说明程序在计算方面的开销越大,可能需要增加线程数以提高数据处理速度。Among them, the higher the number of cores of the processor (which can be a CPU), the higher the computing power; when the processor resource usage is high, the number of threads may need to be increased to increase the data processing speed; and when the processor resource usage is high When it is low, the number of threads can be appropriately reduced to reduce the degree of contention for system resources; the program waiting time (can be the length of time the program spends waiting (such as waiting for IO operation results)), and the thread pool algorithm needs to consider the time the program is waiting (such as IO operations). result), the longer the waiting time, the greater the program's overhead in waiting for IO operations, and the number of threads may need to be increased to increase the data processing speed; the program processing time (that is, the time the program actually takes up the processor for calculations), the thread The pool algorithm needs to consider the time that the program actually takes up the processor for calculation. The longer the calculation time, the greater the computational overhead of the program. The number of threads may need to be increased to increase the data processing speed.
The JVisualVM tool can be used to collect monitoring data for the program and to compute the ratio of program waiting time to program processing time. This ratio shows how the program's I/O waiting overhead compares with its actual computing overhead, which guides the sizing of the thread pool.
With this algorithm, the data processing program creates a thread pool of appropriate size and processes data in batches, improving data processing capacity and speed.
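Batch processing on a fixed-size pool might look like the sketch below, using Python's `concurrent.futures.ThreadPoolExecutor`. The batch size and the per-batch rule (`process_batch`, which trims and upper-cases strings) are hypothetical placeholders for the preset processing rules:

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def process_batch(batch):
    """Hypothetical processing rule: trim whitespace and upper-case each record."""
    return [record.strip().upper() for record in batch]


def process_in_batches(records, pool_size, batch_size=100):
    """Process batches concurrently on a pool of pool_size threads, preserving order."""
    batches = split_into_batches(records, batch_size)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Executor.map yields results in submission order, not completion order.
        results = pool.map(process_batch, batches)
        return [record for batch in results for record in batch]
```

Here `pool_size` would be the thread count produced by the sizing step, so at most that many batches are processed at once.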
Accordingly, determining the number of threads in the thread pool may include: obtaining processor data and program data of the data storage program, where the processor data includes the number of processor cores and the processor resource utilization, and the program data includes the program waiting time and the program processing time;
and substituting the number of processor cores, the processor resource utilization, the program waiting time, and the program processing time into the second preset processing algorithm to determine the number of threads in the thread pool.
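The two steps can be sketched in Python, assuming the sizing rule $N_{threads} = N_{cpu} \times U_{cpu} \times (1 + W/C)$ (an assumption, since the patent's equation is not reproduced). The core count comes from `os.cpu_count()`; $U_{cpu}$, $W$ and $C$ are passed in, for example from JVisualVM measurements. Function names are hypothetical, and the result is rounded down and kept at least 1:

```python
import math
import os


def thread_pool_size(n_cpu, u_cpu, wait_time, compute_time):
    """Assumed rule: N_threads = N_cpu * U_cpu * (1 + W / C), floored, at least 1."""
    if not 0 < u_cpu <= 1:
        raise ValueError("u_cpu must be a utilization in (0, 1]")
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    size = n_cpu * u_cpu * (1 + wait_time / compute_time)
    return max(1, math.floor(size))


def determine_pool_size(u_cpu, wait_time, compute_time):
    """Step 1: read the processor core count. Step 2: substitute into the rule."""
    n_cpu = os.cpu_count() or 1  # os.cpu_count() can return None
    return thread_pool_size(n_cpu, u_cpu, wait_time, compute_time)
```

For instance, `thread_pool_size(4, 0.5, 30, 10)` returns 8, while the same machine with no waiting (`wait_time=0`) gets 2 threads.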
Step 205: Send the processed target data to the data storage program for storage. The data storage program receives the target data and performs an update or insert operation on it to store the data.
Here, data storage is the process of saving the target data in a computer system, including storing, managing, and maintaining it; the data storage program is the program responsible for writing the target data to a database or other storage system.
The data processing program sends the processed target data to the data storage program, which receives it and updates or inserts it into the corresponding data storage structure, achieving efficient data storage and management.
After each thread task completes its storage operation, its execution time can be written to the database. This records how tasks executed and supports later analysis, monitoring, and optimization of system performance.
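A minimal sketch of the update-or-insert step plus the per-task duration log, using Python's built-in `sqlite3` and SQLite's `INSERT ... ON CONFLICT DO UPDATE` upsert (available since SQLite 3.24). The table names `target` and `task_log` and their columns are hypothetical:

```python
import sqlite3
import time


def create_schema(conn):
    """Hypothetical tables: target rows plus a log of storage-task durations."""
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_log "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, duration_ms REAL)"
    )


def store_targets(conn, rows):
    """Update-or-insert (id, value) rows, then record how long the task took."""
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO target (id, value) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
        rows,
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    conn.execute("INSERT INTO task_log (duration_ms) VALUES (?)", (elapsed_ms,))
    conn.commit()
    return elapsed_ms
```

An existing `id` is updated in place and a new one is inserted, so repeated synchronization runs do not create duplicate rows.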
Step 206: Obtain the preset task execution period configured in the data collection program; if the preset task execution period is judged unreasonable, issue an early warning for it and optimize it.
Here, the preset task execution period is the interval at which a scheduled task runs, i.e., the time from the end of one execution to the start of the next. For example, a scheduled task that runs once every minute has a period of 1 minute. In practice, the preset task execution period can be adjusted as needed for more flexible task scheduling.
In the embodiments of the present disclosure, each data synchronization task collects from a different data source, uses different collection and processing rules, and collects a different total amount of data each time. By analyzing and computing over the data synchronization tasks, the preset task execution period can be continually adjusted and optimized so that every kind of data synchronization task performs as well as possible on each run.
In the embodiments of the present disclosure, determining that the preset task execution period is unreasonable may include: obtaining the number of tasks, the number of executions, and the execution time within the preset task execution period;
computing the hyperperiod of the preset task execution periods of all data collection programs, and computing the deadline of each data collection program from the number of executions and the execution time;
substituting the preset task execution periods and the hyperperiod into the task-count formula to obtain the maximum number of tasks within the hyperperiod, and substituting the preset task execution periods and the execution times into the load formula to obtain the system load;
and, if at least one of the following holds: the number of tasks exceeds the maximum number of tasks, the system load exceeds 1, or the execution time exceeds the deadline, determining that the preset task execution period is unreasonable.
In a specific application scenario, suppose the scheduled task platform has registered $n$ data collection programs $T_1, T_2, \ldots, T_n$, and the preset task execution period configured for program $T_i$ is $P_i$. The $j$-th run of this program then starts at $(j-1)P_i$ ($j = 1, 2, 3, \ldots$), with deadline $D_{ij} = jP_i$. The least common multiple of the execution periods of all data collection programs is the hyperperiod of the task set, denoted $S_{LCM}$. Because the situation is identical in every hyperperiod, only the interval $[0, S_{LCM}]$ needs to be analyzed. The maximum number of tasks within the period is:
Here, the hyperperiod is the least common multiple of the preset task execution periods of all data collection programs; the least common multiple of two or more integers is the smallest positive integer that is a multiple of each of them.
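The task-count formula that should follow "The maximum number of tasks within the period is:" is not reproduced here. Since each program $T_i$ releases one run every $P_i$ within $[0, S_{LCM}]$, a natural reading, stated as an assumption, is $N_{max} = \sum_{i=1}^{n} S_{LCM} / P_i$. The Python sketch below computes $S_{LCM}$, this count, and the (release, deadline) window $((j-1)P_i, jP_i)$ of each run; the function names are illustrative:

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """S_LCM: least common multiple of all integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def max_tasks_in_hyperperiod(periods):
    """Assumed task-count formula: sum over i of S_LCM / P_i."""
    s_lcm = hyperperiod(periods)
    return sum(s_lcm // p for p in periods)


def job_windows(period, s_lcm):
    """(release, deadline) of run j in [0, S_LCM]: ((j - 1) * P_i, j * P_i)."""
    return [((j - 1) * period, j * period) for j in range(1, s_lcm // period + 1)]
```

For the example periods used later in this section (20, 40 and 50 ms), $S_{LCM} = 200$ ms and $N_{max} = 10 + 5 + 4 = 19$.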
In a specific application scenario, suppose the collection tasks $T_i$ of $n$ data collection programs are executed, where task $T_i$ has execution time $C_i$ and period $P_i$. Ignoring other system overhead, under the earliest-deadline-first algorithm the collection programs can be executed by the processor if the following condition is satisfied:
Here, the execution time is the time needed to complete a specific task (in this case, a data collection program), i.e., the time elapsed from the start of the task to its completion; it can be expressed as execution time = total actual time taken to complete the task / number of task executions. $\rho$ denotes the system load. If the computed system load is, for example, 1.275 > 1, the task set is unschedulable (overloaded), and the system cannot guarantee that all tasks finish before their deadlines.
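The load condition referred to above is not reproduced in the text either. For independent, preemptive periodic tasks whose deadlines equal their periods on a single processor, the classic result of Liu and Layland is that EDF meets every deadline if and only if $\rho = \sum_{i=1}^{n} C_i / P_i \le 1$. Assuming that is the intended formula, it reproduces the figure quoted above: with the execution times and periods of programs A, B and C given later (10/20, 15/40 and 20/50 ms), $\rho = 0.5 + 0.375 + 0.4 = 1.275$. A minimal Python sketch, with hypothetical function names and exact fractions to avoid rounding at the boundary $\rho = 1$:

```python
from fractions import Fraction


def system_load(tasks):
    """rho = sum of C_i / P_i over (C_i, P_i) pairs given in the same time unit."""
    return sum((Fraction(c, p) for c, p in tasks), Fraction(0))


def edf_schedulable(tasks):
    """EDF utilization test for independent periodic tasks with deadline = period."""
    return system_load(tasks) <= 1
```

`system_load([(10, 20), (15, 40), (20, 50)])` evaluates to `Fraction(51, 40)`, i.e. 1.275, so `edf_schedulable` returns `False` for that set.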
In a specific application scenario, let $t$ be the current system time, $C_r$ the estimated execution time of data collection program $T_i$, and $D_{ij}$ the deadline of the $j$-th run of $T_i$. If $d = D_{ij} - (t + C_r) \ge 0$, the deadline of that collection run is currently reachable. During scheduling, therefore, $d$ is computed for each ready task: the task is executed if $d \ge 0$ and is not executed otherwise.
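The dispatch check reduces to a sign test on the slack $d$. A minimal sketch, with hypothetical function names and all times in the same unit (e.g., ms):

```python
def slack(deadline, now, estimated_exec):
    """d = D_ij - (t + C_r): time to spare if the run starts now."""
    return deadline - (now + estimated_exec)


def should_run(deadline, now, estimated_exec):
    """Dispatch the run only if its deadline is still reachable (d >= 0)."""
    return slack(deadline, now, estimated_exec) >= 0
```

For a run with deadline 40 ms and estimated execution time 15 ms, starting at $t = 20$ leaves $d = 5$ and the run is dispatched; at $t = 30$, $d = -5$ and it is skipped.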
Here, the Earliest Deadline First (EDF) algorithm is a real-time scheduling algorithm for tasks with deadlines. Task priorities are assigned dynamically according to deadlines: the earlier the deadline, the higher the priority. Ignoring other system overhead, a collection program can be executed by the processor if it satisfies the following conditions:
its execution time does not exceed its deadline, and it finishes executing before the deadline, thereby meeting the real-time requirement.
When processing real-time tasks, the EDF algorithm favors tasks whose deadlines are approaching so that they finish within the required time. This scheduling strategy improves the system's real-time performance and keeps tasks executing in the expected time order.
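An EDF ready queue is commonly built on a min-heap keyed by absolute deadline. The sketch below uses Python's `heapq`; the class name `EdfQueue` is hypothetical, and a sequence counter breaks ties so that jobs with equal deadlines leave in arrival order:

```python
import heapq


class EdfQueue:
    """Ready queue that always hands out the job with the earliest deadline."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal deadlines keep FIFO order

    def push(self, deadline, name):
        heapq.heappush(self._heap, (deadline, self._seq, name))
        self._seq += 1

    def pop(self):
        """Remove and return (name, deadline) of the most urgent job."""
        deadline, _, name = heapq.heappop(self._heap)
        return name, deadline

    def __len__(self):
        return len(self._heap)
```

Pushing runs with deadlines 50 (C), 20 (A) and 40 (B) and then popping three times yields A, B, C, regardless of insertion order.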
In a specific application scenario, for example, the scheduled task platform has registered three data collection programs A, B, and C, with configured task criticality $K_i$ of medium, low, and high and configured execution periods $P_i$ of 20 ms, 40 ms, and 50 ms, respectively. From the execution records stored in the database, the longest durations $C_i$ among all previous runs of A, B, and C are 10 ms, 15 ms, and 20 ms, respectively, as shown in Figure 4.
For the embodiments of the present disclosure, it follows from the above that, from the number of data collection programs registered on the scheduled task platform, the preset task execution period and task criticality configured for each program, and the durations of previously executed collection tasks stored in the database, the system can compute the maximum number of tasks in the hyperperiod, whether the programs can be executed by the processor, and whether a given collection run can be executed. If at least one of the following holds: the number of tasks exceeds the maximum number of tasks, the system load exceeds 1, or the execution time exceeds the deadline, the configured preset task execution period is determined to be unreasonable, and the system sends a reminder to modify it. After receiving the reminder, the period can be modified, so that the execution periods of scheduled tasks are continually optimized.
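The three checks can be combined into one routine. This is a sketch under stated assumptions: `programs` maps each program name to its period $P_i$ and longest recorded execution time $C_i$, `observed_tasks` is the task count obtained for the hyperperiod, the task-count and load formulas are the assumed $\sum S_{LCM}/P_i$ and $\sum C_i/P_i$, and the deadline of each run is taken to be its period, so "execution time exceeds the deadline" becomes $C_i > P_i$. The function name is hypothetical; it returns the reasons for a warning, and an empty list means the periods look reasonable:

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def period_warnings(programs, observed_tasks):
    """programs: {name: (P_i, C_i)}. Returns reasons the periods look unreasonable."""
    periods = [period for period, _ in programs.values()]
    s_lcm = reduce(lambda a, b: a * b // gcd(a, b), periods)  # hyperperiod S_LCM
    max_tasks = sum(s_lcm // period for period in periods)
    load = sum((Fraction(c, p) for p, c in programs.values()), Fraction(0))
    reasons = []
    if observed_tasks > max_tasks:
        reasons.append(f"task count {observed_tasks} exceeds maximum {max_tasks}")
    if load > 1:
        reasons.append(f"system load {float(load):.3f} exceeds 1")
    for name, (period, exec_time) in programs.items():
        if exec_time > period:  # assumes each run's relative deadline equals its period
            reasons.append(f"{name}: execution time {exec_time} exceeds deadline {period}")
    return reasons
```

With the Figure 4 values and a hypothetical observed count of 19, only the load check fires: `period_warnings({"A": (20, 10), "B": (40, 15), "C": (50, 20)}, 19)` returns `["system load 1.275 exceeds 1"]`, which would trigger the reminder to lengthen one or more periods.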
In the embodiments of the present disclosure, this application introduces support for multiple data sources. This is essentially a set of different data collection, processing, and storage programs that supports diverse collection requirements; a data source may be a relational database, a cache database, the response to a request to some interface, and so on.
This application integrates data synchronization with scheduled tasks and manages the data synchronization programs centrally through a scheduled task management platform. The platform offers visual configuration of the execution parameters of the collection programs, the processing rules of the data processing programs, and the execution periods of the programs.
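To make the configuration concrete, here is a hypothetical per-program record the platform might hold, together with a check of which programs are due at a given instant. The field names, the `source_type` values, and the rule "due when the elapsed seconds since an epoch are a multiple of the period" are all assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CollectorConfig:
    """Hypothetical configuration of one registered data collection program."""
    name: str
    source_type: str  # e.g. "relational", "cache" or "http" (illustrative values)
    period_seconds: int  # preset task execution period
    exec_params: dict = field(default_factory=dict)  # execution parameters
    processing_rules: list = field(default_factory=list)  # preset processing rules


def due_programs(configs, now: datetime, epoch: datetime):
    """Names of programs whose periodic start time coincides with `now`."""
    elapsed = int((now - epoch).total_seconds())
    return [c.name for c in configs if elapsed % c.period_seconds == 0]
```

With periods of 60 s and 90 s, two minutes after the epoch only the 60-second program is due; at three minutes both are.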
In this application, a data processing algorithm and a thread pool sizing algorithm are added to the data processing step, so that data with different requirements and of different volumes can be processed.
In this application, an algorithm determines whether the execution time of the scheduled data synchronization tasks is reasonable; if it is not, a modification reminder is sent and the time is then modified to optimize it.
In traditional data synchronization systems, a single system can serve only a single data source collection requirement. This application improves on that by introducing a scheduled task management platform on which programs for different data source collection requirements can be registered. This resolves the single-data-source limitation, allows different collection programs to be managed uniformly, and lets an algorithm determine whether the execution time of scheduled data synchronization tasks is reasonable.
In summary, compared with the prior art, the data synchronization method provided by the present disclosure periodically obtains preset configuration information, which includes the periodic task start time and preset processing rules; when the current time equals the periodic task start time, multiple data collection programs collect initial data from their respective data sources according to the corresponding execution parameters; and a data processing program processes the initial data according to the preset processing rules to obtain the processed target data. In this way, data can be obtained from multiple data sources at the same time, and the synchronization time or synchronization frequency can be set flexibly.
Based on the specific implementation of the methods shown in Figures 1 and 2, this embodiment provides a data synchronization device. As shown in Figure 5, the device includes a first acquisition module 31, a collection module 32, and a processing module 33;
the first acquisition module 31 is configured to periodically obtain preset configuration information, the preset configuration information including a periodic task start time and preset processing rules;
the collection module 32 is configured to, when the current time equals the periodic task start time, use multiple data collection programs to collect initial data from the respective data sources according to the corresponding execution parameters;
the processing module 33 is configured to use a data processing program to process the initial data according to the preset processing rules to obtain processed target data.
In a specific application scenario, as shown in Figure 5, the device further includes a calculation module 34, a storage module 35, a transfer module 36, and a termination module 37;
the calculation module 34 is configured to calculate the data volume of the initial data;
the storage module 35 is configured to, when the data volume is greater than 0, store the initial data in a cache system to obtain cached data;
the transfer module 36 is configured to use message middleware to pass the cached data to the data processing program in the form of a message;
the termination module 37 is configured to terminate the processing flow when the data volume of the initial data equals 0.
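The flow through modules 34 to 37 can be sketched as follows. As assumptions, a plain `dict` stands in for the cache system and `queue.Queue` stands in for the message middleware; a real deployment would use an external cache and broker, and the function names and message fields shown are hypothetical:

```python
import queue


def collect_and_forward(initial_data, cache: dict, message_queue: queue.Queue, key: str) -> bool:
    """Modules 34-37: measure volume, stop if empty, else cache and publish a message."""
    volume = len(initial_data)  # calculation module 34
    if volume == 0:
        return False  # termination module 37: nothing to process
    cache[key] = list(initial_data)  # storage module 35 (dict stands in for a cache)
    message_queue.put({"cache_key": key, "size": volume})  # transfer module 36
    return True


def handle_message(cache: dict, message_queue: queue.Queue, process):
    """Data processing side: take one message and process the records it points to."""
    message = message_queue.get()
    return [process(record) for record in cache[message["cache_key"]]]
```

Passing only a cache key in the message keeps the message small, while the bulk data stays in the cache for the processing side to read.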
In a specific application scenario, the processing module 33 may be configured to determine the number of threads in a thread pool and, when the data processing program receives the message, use the threads in the thread pool to process the cached data with the first preset processing algorithm according to the preset processing rules, obtaining the processed target data.
In a specific application scenario, the processing module 33 may be configured to obtain processor data and program data of the data storage program, where the processor data includes the number of processor cores and the processor resource utilization, and the program data includes the program waiting time and the program processing time;
and to substitute the number of processor cores, the processor resource utilization, the program waiting time, and the program processing time into the second preset processing algorithm to determine the number of threads in the thread pool.
In a specific application scenario, as shown in Figure 5, the device further includes a storage module 38, a second acquisition module 39, and a reminder module 40;
the storage module 38 is configured to send the processed target data to the data storage program for storage, the data storage program being configured to receive the target data and perform an update or insert operation on it to store the data;
the second acquisition module 39 is configured to obtain the preset task execution period configured in the data collection program;
the reminder module 40 is configured to, when the preset task execution period is judged unreasonable, issue an early warning for the preset task execution period and optimize it.
In a specific application scenario, the reminder module 40 may be configured to obtain the number of tasks, the number of executions, and the execution time within the preset task execution period;
compute the hyperperiod of the preset task execution periods of all the data collection programs, and compute the deadline of each data collection program from the number of executions and the execution time;
substitute the preset task execution periods and the hyperperiod into the task-count formula to obtain the maximum number of tasks within the hyperperiod, and substitute the preset task execution periods and the execution times into the load formula to obtain the system load;
and, if at least one of the following holds: the number of tasks exceeds the maximum number of tasks, the system load exceeds 1, or the execution time exceeds the deadline, determine that the preset task execution period is unreasonable.
It should be noted that, for other corresponding descriptions of the functional units of the data synchronization device provided in this embodiment, reference may be made to the corresponding descriptions of the methods in Figures 1 and 2; they are not repeated here.
Based on the methods shown in Figures 1 and 2, the present disclosure correspondingly also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the methods shown in Figures 1 and 2.
Based on this understanding, the technical solution of the present disclosure can be embodied as a software product stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk), including instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods of the implementation scenarios of the present disclosure.
Based on the methods shown in Figures 1 and 2 and the virtual device embodiment shown in Figure 5, and to achieve the above objectives, the embodiments of the present disclosure further provide an electronic device, which can be deployed on the vehicle side (for example, in an electric vehicle). The device includes a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the methods shown in Figures 1 and 2.
Optionally, the physical device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Wi-Fi interface), and the like.
Those skilled in the art will understand that the physical device structure provided by the present disclosure does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device and supports running the information processing program and other software and/or programs. The network communication module enables communication among the components within the storage medium and with other hardware and software in the information processing physical device.
From the above description of the embodiments, those skilled in the art will clearly understand that the present disclosure may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Compared with the prior art, the data synchronization method, device, and electronic device provided by the present disclosure periodically obtain preset configuration information, which includes the periodic task start time and preset processing rules; when the current time equals the periodic task start time, multiple data collection programs collect initial data from their respective data sources according to the corresponding execution parameters; and a data processing program processes the initial data according to the preset processing rules to obtain the processed target data. In this way, data can be obtained from multiple data sources at the same time, and the synchronization time or synchronization frequency can be set flexibly.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "comprise", "include", and their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are only specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311605112.5A CN117573777A (en) | 2023-11-28 | 2023-11-28 | Data synchronization method, device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311605112.5A CN117573777A (en) | 2023-11-28 | 2023-11-28 | Data synchronization method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117573777A true CN117573777A (en) | 2024-02-20 |
Family ID: 89893461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311605112.5A Pending CN117573777A (en) | 2023-11-28 | 2023-11-28 | Data synchronization method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117573777A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119202083A (en) * | 2024-09-10 | 2024-12-27 | 中电信人工智能科技(北京)有限公司 (China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd.) | Multimodal data synchronization method, device, collection system and electronic equipment |
2023-11-28: CN202311605112.5A filed in CN; CN117573777A active, pending
Similar Documents
Publication | Title |
---|---|
CN102130950B (en) | Distributed Monitoring Method Based on Hadoop Cluster | |
CN110532078A (en) | A kind of edge calculations method for optimizing scheduling and system | |
CN115373835A (en) | Task resource adjusting method and device for Flink cluster and electronic equipment | |
CN109766194B (en) | Method and system for realizing low-coupling plan task component based on message | |
CN106325984B (en) | Big data task scheduling device | |
CN112445598A (en) | Task scheduling method and device based on quartz, electronic equipment and medium | |
CN113760638B (en) | A log service method and device based on kubernetes cluster | |
CN113918288B (en) | Task processing method, device, server and storage medium | |
CN117573777A (en) | Data synchronization method, device and electronic equipment | |
KR101770736B1 (en) | Method for reducing power consumption of system software using query scheduling of application and apparatus for reducing power consumption using said method | |
US9733997B2 (en) | Event management method and distributed system | |
CN110704851A (en) | Public cloud data processing method and device | |
CN111611479B (en) | Data processing method and related device for network resource recommendation | |
CN110990227B (en) | Numerical pool application characteristic performance acquisition and monitoring system and operation method thereof | |
CN113641472A (en) | Method and device for realizing heterogeneity, transformation and synchronization of distributed applications | |
CN118093126A (en) | Task scheduling method, device, system, server and storage medium | |
CN115185683A (en) | Cloud platform stream processing resource allocation method based on dynamic optimization model | |
CN114428671A (en) | Data processing method, data processing device, electronic device and storage medium | |
Ouyang et al. | An approach for modeling and ranking node-level stragglers in cloud datacenters | |
CN111861012A (en) | A test task execution time prediction method and optimal execution node selection method | |
CN112486683B (en) | Processor control method, control device, and computer-readable storage medium | |
CN112783613B (en) | Method and device for scheduling units | |
US20230004322A1 (en) | Managing provenance information for data processing pipelines | |
CN108920722B (en) | Parameter configuration method and device and computer storage medium | |
CN111782482B (en) | Interface pressure testing method and related equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |