CN114610278A - Layered streaming processing framework system and processing method - Google Patents


Info

Publication number
CN114610278A
Authority
CN
China
Prior art keywords
service
site
station
pipeline
data
Prior art date
Legal status
Pending
Application number
CN202210155443.2A
Other languages
Chinese (zh)
Inventor
陈昌桂 (Chen Changgui)
Current Assignee
Suning Consumer Finance Co., Ltd.
Original Assignee
Suning Consumer Finance Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Suning Consumer Finance Co., Ltd.
Priority to CN202210155443.2A
Publication of CN114610278A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/20 Software design

Abstract

A layered streaming processing framework system and processing method. The system comprises, from bottom to top, a database, a data access layer, a business logic layer, and a presentation layer: the data access layer accesses and operates on the database, the business logic layer operates on the data access layer and performs the logical processing of the data business, and the presentation layer is responsible for content display. The business logic layer is divided into a main service and a plurality of sub-services; the main service is a pipeline that expresses the processing flow of the sub-services, and each sub-service is a station on the pipeline that handles a specific piece of business. The invention provides a service layering and streaming processing framework and method that improve service performance, reduce time consumption, and effectively manage thread-level shared data and distributed shared data.

Description

Layered streaming processing framework system and processing method
Technical Field
The invention belongs to the field of service processing, and particularly relates to a layered streaming processing framework system and a processing method.
Background
Today, a financial system typically handles millions of transactions per day, and its services receive a comparable volume of calls. Because services such as credit granting are complex and time-consuming, they usually adopt an asynchronous processing mode: the request is first accepted, then processed by an asynchronous task, and the result is finally delivered by callback or obtained by polling. During task processing, timeouts or exceptions may occur, such as network communication timeouts, long service execution times, the need for manual review, or network jitter. In those cases the task must be rerun, that is, scheduled again.
However, when a task is repeatedly scheduled, the following problems may occur:
1) Asynchronous task code is complex and time-consuming; it easily grows bloated as it is written, and its logic becomes unclear.
2) Some code needs to execute successfully only once within a task; subsequent rescheduling should not execute it again. If such code is nevertheless re-executed, several problems can arise: the service takes longer; network communication, database access, and the like incur needless performance cost; and if the same serial number is submitted again to a system that should be idempotent but is not, the whole link fails. These problems must be avoided.
3) How long the expiration time of a distributed cache entry created within a task should be is also a matter of careful consideration. If it is set too short, the task may fail when rerun; if too long, resources are occupied, and it is hard to know in advance how long the service will take, so there is no reference for the value. Yet if no timeout is set at all, the entry becomes permanent garbage, which is equally unacceptable.
4) Data needs to be transferred among the sub-services; how to transfer it effectively while minimizing the damage to encapsulation is a problem to be solved.
5) How to make log query and link tracing of the sub-services non-invasive and more intuitive is likewise an aspect of the prior art in urgent need of improvement.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a layered streaming processing framework system and processing method. Its purpose is a service layering and streaming processing approach that improves service performance, reduces time consumption, and manages thread-level shared data and distributed shared data.
To achieve this purpose, the invention adopts the following technical scheme:
A layered streaming framework system, characterized in that: the system comprises, from bottom to top, a database, a data access layer, a business logic layer, and a presentation layer; the business logic layer is divided into a main service and a plurality of sub-services, where the main service is a pipeline that expresses the processing flow of the sub-services and each sub-service is a station on the pipeline that handles a specific piece of business.
To optimize this technical scheme, the specific measures further adopted comprise:
the system further comprises a registration center and a distributed cache, wherein the registration center stores pipeline information and site information, and the distributed cache is used for caching the service data generated in the site.
Further, the pipeline information comprises the pipeline code, the pipeline's timeout, and the set of stations that have executed successfully; the station information comprises the station's execution state, the station's timeout, and the keys the station has cached in the distributed cache.
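For illustration only, the two record types could be modeled as follows; PipelineInfo, StationInfo, and all field names are assumptions, since the patent gives the structure only in prose and in Fig. 3:

import java.util.List;

// Sketch of one pipeline record and one station record in the registry.
class PipelineInfo {
    String code;                    // unique pipeline code (the business serial number)
    long expireTimeSeconds;         // timeout of the whole pipeline
    List<String> succeededStations; // stations that have already executed successfully
}

class StationInfo {
    String state;                   // execution state of the station
    long expireTimeSeconds;         // timeout of the station's cached data
    List<String> keysInStation;     // keys the station wrote to the distributed cache,
                                    // handed to the framework for timeout control
}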
Further, the registry is either Redis or Zookeeper.
Further, the business data cached within a station falls into two categories: thread-local shared data and distributed shared data.
Further, thread-local shared data is passed through thread-local variables. If the pipeline may need to be rerun, the thread-local shared data rises from the thread into the distributed cache; on a rerun, it falls back from the distributed cache into the new thread's local storage; when no rerun is needed, no rise or fall-back is triggered.
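A minimal sketch of the rise/fall-back mechanism, assuming a Redis-like cache behind an invented DistributedCache interface; SharedContext and its methods are likewise invented for illustration:

import java.util.HashMap;
import java.util.Map;

class SharedContext {
    private static final ThreadLocal<Map<String, Object>> LOCAL =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, Object value) { LOCAL.get().put(key, value); }
    static Object get(String key) { return LOCAL.get().get(key); }

    // "Rise": when a rerun is possible, copy thread-local data into the distributed cache.
    static void riseTo(DistributedCache cache, String pipelineCode) {
        LOCAL.get().forEach((k, v) -> cache.put(pipelineCode + ":" + k, v));
    }

    // "Fall back": on a rerun, restore the distributed copies into the new thread.
    static void fallBackFrom(DistributedCache cache, String pipelineCode) {
        cache.entriesWithPrefix(pipelineCode + ":")
             .forEach((k, v) -> put(k.substring(pipelineCode.length() + 1), v));
    }
}

interface DistributedCache {
    void put(String key, Object value);
    Map<String, Object> entriesWithPrefix(String prefix);
}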
Further, distributed shared data is stored in the distributed cache, and the cached keys are sent to the registry for timeout control. The timeout is set uniformly for the whole pipeline and is extended automatically according to the actual time the service consumes: while the service is running, the distributed cache never expires; once the service succeeds or fails, the cache is invalidated.
The invention also provides a layered streaming processing method based on the framework system, characterized by the following steps:
A task is scheduled: the unique pipeline number is determined from the business serial number given in the main service's annotation, and the station name that uniquely identifies each station is determined from the annotated method of the station in the sub-service;
A station starts processing: it is first judged whether pipeline information can be obtained. If not, the station is a free station and attaches to the current thread. If so, it is judged whether the station has already executed the business: if it has, execution is skipped and the station's timeout is refreshed; if it has not, the business executes normally. On normal execution, the produced data is stored by thread-local sharing and distributed sharing, and a business processing result is returned.
Further, the business processing result is one of four types: SUCCESS, SKIP, AGAIN, and QUIT:
When the result is SUCCESS, the station executed successfully: a timeout is set for the cache data the station generated, and the station is recorded in the pipeline; when the pipeline reruns and finds the station recorded, the station is skipped;
When the result is SKIP, unlimited retries are indicated: the station's cache data is removed, the station finishes, and the task waits to be scheduled again;
When the result is AGAIN, limited retries are indicated: the station's cache data is removed, the station finishes, and the task waits to be scheduled again;
When the result is QUIT, the station failed: the station is recorded in the pipeline, and when the pipeline reruns and finds the station recorded, the station is skipped.
Further, when the result is SUCCESS or QUIT, after the station is recorded in the pipeline, it is judged whether more than half of the total service processing time has been consumed; if so, the pipeline's timeout is extended and the cache data of both the pipeline and the station are refreshed; if not, only the station's cache data is refreshed.
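This over-half rule, together with the default growth factor of 1.5 named among the beneficial effects below, could reduce to something like the following; the class and method names are assumptions:

class TimeoutControl {
    static final double GROWTH_FACTOR = 1.5; // default factor, per the description below

    // Called after a station finishes: configuredSeconds is the pipeline timeout
    // from @WorkLine, elapsedSeconds the time this run has already consumed.
    static long nextTimeoutSeconds(long configuredSeconds, long elapsedSeconds) {
        if (elapsedSeconds * 2 > configuredSeconds) {            // over half consumed
            return (long) Math.ceil(configuredSeconds * GROWTH_FACTOR);
        }
        return configuredSeconds;                                // keep the uniform timeout
    }
}

For example, a pipeline configured with a 600-second timeout that has already run 301 seconds would have its timeout extended to 900 seconds under this rule.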
The invention has the following beneficial effects:
1) The invention layers the business layer, reducing coupling between code blocks; the code is clear and easy to maintain. The lengthy code of the main service is divided by business into relatively isolated segments, and each segment is treated as a sub-service, yielding the service layering.
2) The stations defined by the invention can be cached and skipped on rerun. A station's execution result is stored in the registry, and stations that have already executed successfully are skipped when the pipeline reruns. This greatly shortens the processing time of the whole pipeline, improves service processing performance, and relieves load on the server.
3) The invention designs distributed-cache timeout control. The cached timeout is refreshed uniformly after each station executes; it is refreshed automatically when the service reruns; and when the remaining time is found to be too short, it is extended automatically, so there is no need to worry about choosing a timeout value or about whether the cache has expired. The timeout set by the @WorkLine annotation is applied uniformly to the cache of every station, reducing the discrepancies between the timeouts set along the pipeline. The default growth factor is 1.5: when the timeout is perceived to be too short, it is extended by this factor (as in the sketch above).
4) The invention adopts thread-level data sharing, realizing data sharing among stations through wrapped thread-local variables.
5) The invention adds a distributed lock to the pipeline by default: for a given serial number, only one pipeline runs at a time. A minimal lock sketch follows this list.
6) The invention can rate-limit a pipeline or a station. Since a limit can be set on each sub-service's station, layered rate limiting of services becomes possible, which suits certain scenarios very well. Rate limiting better guarantees the processing capacity and stability of a given station.
7) The invention's data cache switches automatically between thread sharing and distributed sharing, triggering the rise/fall-back automatically.
8) The invention realizes log management and service link tracing: the processing log of every station can be inspected, and the processing of a given pipeline can be traced.
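The per-serial-number lock from point 5 could rest on a set-if-absent primitive, as Redis provides; the KeyValueStore interface here is invented for the sketch:

class PipelineLock {
    private final KeyValueStore store; // assumed Redis-like client

    PipelineLock(KeyValueStore store) { this.store = store; }

    // Only one pipeline may run per serial number: the lock key embeds the serial number.
    boolean tryLock(String serialNo, long ttlSeconds) {
        return store.setIfAbsent("lock:" + serialNo, "1", ttlSeconds);
    }

    void unlock(String serialNo) {
        store.delete("lock:" + serialNo);
    }
}

interface KeyValueStore {
    boolean setIfAbsent(String key, String value, long ttlSeconds);
    void delete(String key);
}

Giving the lock a TTL (rather than holding it indefinitely) would keep a crashed pipeline from blocking the serial number forever; this is a design assumption, not stated in the patent.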
Drawings
FIG. 1 is a service hierarchy diagram of the present invention.
FIG. 2 is an architecture diagram of the pipeline and stations of the present invention.
FIG. 3 is a storage structure diagram of the registry of the present invention.
FIG. 4 is a flow chart of station caching in the present invention.
FIG. 5 is a flow chart of station execution in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
The invention provides a layered streaming processing framework system and processing method. In practice, a task line is divided into relatively isolated code segments, and each segment is treated as a sub-service, yielding the service layering. The task line itself carries only the skeleton code of the service and can be viewed as a pipeline; the concrete business processing is performed by the sub-services, each of which can be viewed as a station. The invention applies effectively to business processing fields such as finance, for example: a credit application is accepted, passes internal approval (sub-services such as agreement signing, OCR, and risk control), and finally the success or failure of the application is returned.
Next, the layered streaming processing framework system and processing method provided by the invention are explained in terms of service layering, streaming processing, the registry, station data caching, and architecture design.
1. Service layering
The invention discloses the idea of layered streaming processing of services. Three-layer and MVC architectures exist to reduce the coupling between system modules; the invention goes a step further and decouples the business layer itself, reducing its internal coupling. Services are layered into a main service and sub-services: the main service is viewed as a pipeline (WorkLine) and manages the operation of the sub-services; the sub-services are viewed as individual stations (Stations) on the pipeline, each handling a specific piece of business.
The traditional three-layer architecture consists of a presentation layer, a business logic layer, and a data access layer; the invention divides the business logic layer into one main service and N sub-services, giving the multi-layer service architecture shown in FIG. 1.
2. Streaming processing: pipeline and stations
The main service can be viewed as a pipeline that mainly expresses the processing flow of the sub-services; the sub-services can be viewed as individual stations on the pipeline that handle specific business and can be nested. The stations run in pipelined form and suit both synchronous and asynchronous services. With a synchronous method, the framework gives a clear structure, thread-level data sharing, distributed-cache timeout control, log link tracing, and so on; with an asynchronous method, that is, under repeated task scheduling, it additionally provides station caching. See FIG. 2.
3. Registry
Pipeline information and station information are stored in a registry, which may be Redis or Zookeeper (Redis is used as the registry in this description). The pipeline information contains the pipeline's code, its timeout, and the set of stations that have executed successfully. The station information contains the station's execution state, its timeout, and the keys it has cached in Redis; these distributed-cache keys are used mainly for timeout control. The structure of the registry is shown in FIG. 3.
4. Station data caching
1) Station caching
Stations that have already executed can skip execution on the next compensation run (task rerun): if a station executed successfully, it is skipped in subsequent scheduling. The existing approach sets temporary flag fields in the database, but that is too intrusive, and with many sub-services the fields become too redundant. The idea of station caching is shown in FIG. 4: if a station has not yet executed, its business is executed; otherwise execution is skipped.
2) Data sharing between stations
Business data may be cached within a station. It falls into two categories:
First, thread-local shared data, passed through thread-local variables. If a rerun may be required, the data rises from the thread into the distributed cache; during the rerun, it falls back from the distributed cache into the thread; if no rerun is needed, the data triggers no rise or fall-back.
Secondly, distributed shared data, stored in the distributed cache (Redis). The cached keys are handed to the framework, which performs timeout control: the timeout is set uniformly for the whole pipeline and is extended automatically according to the actual time the service consumes. As long as the service is running, the cache never expires; once the service succeeds or fails, the cache is invalidated. Developers therefore need not worry about choosing a timeout, nor about whether the cache has expired when using it.
5. Architecture design
1) Adding dependencies
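The original snippet survives only as an image, so the Maven coordinates below are placeholders rather than the real artifact:

<!-- Hypothetical coordinates: groupId, artifactId, and version are assumptions. -->
<dependency>
    <groupId>com.example</groupId>
    <artifactId>workline-spring-boot-starter</artifactId>
    <version>1.0.0</version>
</dependency>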
2) Pipeline annotation
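The annotation declaration also survives only as an image; reconstructed from the attribute descriptions that follow, it could look like this sketch:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface WorkLine {
    boolean enableLock() default true; // distributed lock: one pipeline per serial number
    String id();                       // unique number of the pipeline
    long expireTime() default 600;     // timeout of the whole pipeline, in seconds (default value assumed)
}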
enableLock: whether the distributed lock is enabled; by default, only one pipeline can execute for a given serial number.
id: the unique number of the pipeline.
expireTime: the timeout of the entire pipeline, in seconds.
3) Station annotation
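As with @WorkLine, the declaration is image-only in the original; a sketch reconstructed from the attribute descriptions below:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Station {
    String id() default "";        // see the id rules below: pipeline number, or free station
    String station() default "";   // unique station name; defaults to the method name
    long expireTime() default -1;  // station timeout; -1 (assumed) means use the pipeline's
}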
id: takes effect when no pipeline exists, that is, when execution did not enter the station from a method annotated with @WorkLine. If no pipeline exists and id is non-empty, the current station itself serves as a pipeline with id as its unique number; if no pipeline exists and id is empty, the current station is a free station that depends on the current thread; if a pipeline does exist, the current station is attached to that pipeline.
station: the name of the station, which uniquely identifies it; stations on one pipeline should have different names. Defaults to the method name of the station.
expireTime: the station's timeout. Defaults to the timeout of the pipeline it belongs to; if a station's timeout is set greater than the pipeline's, the whole pipeline uses the station's value. This guarantees that the pipeline's timeout is the maximum of its stations' timeouts.
The following, more complex embodiment describes how the invention solves the problems above. Its business is complex and time-consuming, requires asynchronous task processing, and involves scenarios such as exceptions, failures, timeouts, and retries. Specifically:
The task is scheduled: execution enters the @WorkLine-annotated task method in the main service QuotaApplyServiceImpl, and the unique code of the pipeline is determined from the annotation's id attribute, where id is the business serial number (such as a credit application serial number). Execution then enters the @Station-annotated methods in the sub-service QuotaApplySubServiceImpl. The stations include: protocol signing, OCR information storage, signature state query, credit information storage, image data upload, and pushing the client's loan information.
The pseudo code is as follows:
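Since the pseudo code is preserved only as images, the following is a guessed reconstruction assuming a Spring environment. The six station names come from the list above; the method signatures are inventions, with return types simplified to void here (the real stations return a processing result, sketched further below):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// The six station methods, named after the station list above; signatures are guesses.
interface QuotaApplySubService {
    void signProtocols(String serialNo);
    void saveOcrInfo(String serialNo);
    void querySignState(String serialNo);
    void saveCreditInfo(String serialNo);
    void uploadImageData(String serialNo);
    void sendLoanInfo(String serialNo);
}

@Service
public class QuotaApplyServiceImpl {

    @Autowired
    private QuotaApplySubService subService;

    // The pipeline: only the processing skeleton, no business detail.
    @WorkLine(id = "#serialNo", expireTime = 600)
    public void quotaApplyTask(String serialNo) {
        subService.signProtocols(serialNo);   // protocol signing
        subService.saveOcrInfo(serialNo);     // OCR information storage
        subService.querySignState(serialNo);  // signature state query
        subService.saveCreditInfo(serialNo);  // credit information storage
        subService.uploadImageData(serialNo); // image data upload
        subService.sendLoanInfo(serialNo);    // push client loan information
    }
}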
Each station handles a specific task; for example, the @Station annotation is added to the signProtocols method. The pseudo code is as follows:
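Again a guessed reconstruction of image-only pseudo code: callSignSystem is invented, the StationResult enum is declared inline here and revisited with full comments after the result list below, and only signProtocols and the @Station annotation come from the patent text:

enum StationResult { SUCCESS, SKIP, AGAIN, QUIT }

public class QuotaApplySubServiceImpl {

    @Station(station = "signProtocols")
    public StationResult signProtocols(String serialNo) {
        try {
            boolean accepted = callSignSystem(serialNo); // remote protocol-signing call
            if (!accepted) {
                return StationResult.AGAIN;              // limited retry: reschedule the task
            }
            return StationResult.SUCCESS;                // cache the result, skip on rerun
        } catch (RuntimeException networkJitter) {
            return StationResult.SKIP;                   // unlimited retry: reschedule the task
        }
    }

    private boolean callSignSystem(String serialNo) {
        return true; // stands in for the real remote call
    }
}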
the stations process respective service codes and are relatively decoupled. Data sharing among sites, if the data sharing is thread level sharing, a thread local variable tool packaged by a frame can be used; if the distributed sharing exists, the key is stored in the distributed cache, and the key is only required to be stored in the keysInStation, which indicates that the key is cached under the current site, and the key is given to the frame for timeout control.
After the site service is executed, the returned result may be SUCCESS, SKIP, AGAIN, QUIT. The specific process executed by the station is as shown in fig. 5:
1) SUCCESS is a terminal state indicating the station executed successfully: a timeout is set for the cache data the station generated, and the station is recorded in the pipeline. When the pipeline reruns and finds the station recorded, the station is skipped;
2) SKIP indicates unlimited retries: the station's cache is removed, the station finishes, and the task waits to be scheduled again;
3) AGAIN indicates limited retries: the station's cache is removed, the station finishes, and the task waits to be scheduled again;
4) QUIT is a terminal state indicating the station failed: the station is recorded in the pipeline. When the pipeline reruns and finds the station recorded, the station is skipped.
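These four branches could be dispatched as in the sketch below; the ResultRegistry methods are illustrative assumptions, and StationResult is the enum declared in the previous sketch:

// StationResult as declared in the previous sketch: SUCCESS, SKIP, AGAIN, QUIT.
interface ResultRegistry {
    void setStationCacheTimeout(String pipelineId, String station);
    void recordStation(String pipelineId, String station);
    void removeStationCache(String pipelineId, String station);
}

class ResultHandler {
    void onStationResult(StationResult result, ResultRegistry registry,
                         String pipelineId, String station) {
        switch (result) {
            case SUCCESS:
                registry.setStationCacheTimeout(pipelineId, station); // cache with a timeout
                registry.recordStation(pipelineId, station);          // skipped on rerun
                break;
            case SKIP:                                                // unlimited retry
            case AGAIN:                                               // limited retry
                registry.removeStationCache(pipelineId, station);     // drop cache, reschedule
                break;
            case QUIT:
                registry.recordStation(pipelineId, station);          // failed, skipped on rerun
                break;
        }
    }
}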
The above is only a preferred embodiment of the invention, and the scope of protection is not limited to this embodiment: all technical solutions under the idea of the invention fall within its scope of protection. It should be noted that modifications and refinements made by those of ordinary skill in the art without departing from the principles of the invention are also to be regarded as within the scope of protection of the invention.

Claims (10)

1. A layered streaming framework system, characterized in that: the system comprises, from bottom to top, a database, a data access layer, a business logic layer, and a presentation layer; the business logic layer is divided into a main service and a plurality of sub-services, where the main service is a pipeline that expresses the processing flow of the sub-services and each sub-service is a station on the pipeline that handles a specific piece of business.
2. The layered streaming framework system of claim 1, characterized in that: the system further comprises a registry and a distributed cache, wherein the registry stores pipeline information and station information, and the distributed cache holds the business data generated within stations.
3. The layered streaming framework system of claim 2, characterized in that: the pipeline information comprises the pipeline code, the pipeline's timeout, and the set of stations that have executed successfully; the station information comprises the station's execution state, the station's timeout, and the keys the station has cached in the distributed cache.
4. The layered streaming framework system of claim 2, characterized in that: the registry is either Redis or Zookeeper.
5. The layered streaming framework system of claim 2, characterized in that: the business data cached within a station is divided into thread-local shared data and distributed shared data.
6. The layered streaming framework system of claim 5, characterized in that: the thread-local shared data is passed through thread-local variables; if the pipeline needs to be rerun, the thread-local shared data rises from the thread into the distributed cache; during the rerun, it falls back from the distributed cache into the thread; when no rerun is needed, no rise or fall-back is triggered.
7. The layered streaming framework system of claim 5, characterized in that: the distributed shared data is stored in the distributed cache, and the cached keys are sent to the registry for timeout control; the timeout is set uniformly for the whole pipeline and is extended automatically according to the actual time the service consumes; while the service is running, the distributed cache never expires; once the service succeeds or fails, the cache is invalidated.
8. A layered streaming processing method based on the framework system of any one of claims 1-7, characterized in that:
a task is scheduled: the unique pipeline number is determined from the business serial number given in the main service's annotation, and the station name that uniquely identifies each station is determined from the annotated method of the station in the sub-service;
a station starts processing: it is first judged whether pipeline information can be obtained; if not, the station is a free station and attaches to the current thread; if so, it is judged whether the station has already executed the business: if it has, execution is skipped and the station's timeout is refreshed; if it has not, the business executes normally; on normal execution, the produced data is stored by thread-local sharing and distributed sharing, and a business processing result is returned.
9. The layered streaming processing method of claim 8, characterized in that: the business processing result is one of four types: SUCCESS, SKIP, AGAIN, and QUIT:
when the result is SUCCESS, the station executed successfully: a timeout is set for the cache data the station generated, and the station is recorded in the pipeline; when the pipeline reruns and finds the station recorded, the station is skipped;
when the result is SKIP, unlimited retries are indicated: the station's cache data is removed, the station finishes, and the task waits to be scheduled again;
when the result is AGAIN, limited retries are indicated: the station's cache data is removed, the station finishes, and the task waits to be scheduled again;
when the result is QUIT, the station failed: the station is recorded in the pipeline; when the pipeline reruns and finds the station recorded, the station is skipped.
10. The layered streaming processing method of claim 9, characterized in that: when the business processing result is SUCCESS or QUIT, after the station is recorded in the pipeline, it is judged whether more than half of the total service processing time has been consumed; if so, the pipeline's timeout is extended and the cache data of the pipeline and the station are refreshed; if not, only the station's cache data is refreshed.
Application CN202210155443.2A, priority date 2022-02-18, filing date 2022-02-18: Layered streaming processing framework system and processing method; status Pending; published as CN114610278A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210155443.2A CN114610278A (en) 2022-02-18 2022-02-18 Layered streaming processing framework system and processing method


Publications (1)

Publication Number Publication Date
CN114610278A 2022-06-10

Family

ID=81858614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210155443.2A (published as CN114610278A, Pending), priority date 2022-02-18, filing date 2022-02-18: Layered streaming processing framework system and processing method

Country Status (1)

CN: CN114610278A


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000 No. 1 Suning Avenue, Xuanwu District, Nanjing City, Jiangsu Province

Applicant after: Nanyin Faba Consumer Finance Co., Ltd.

Address before: 210000 No. 1 Suning Avenue, Xuanwu District, Nanjing City, Jiangsu Province

Applicant before: Suning Consumer Finance Co., Ltd.