Distributed scheduled task scheduling framework practice

A distributed task scheduling framework is an essential tool for almost every large-scale application. This article introduces the background and pain points that motivate the use of a task scheduling framework, walks through hands-on practice with the open source distributed task scheduling frameworks commonly used in the industry, and analyzes their advantages and disadvantages, along with some thoughts on fitting them to our own business.

1. Business background

1.1 Why scheduled task scheduling is needed

(1) Time-driven processing: sending coupons on the hour, updating revenue daily, and refreshing tag data and audience data daily.

(2) Batch data processing: compiling statistical reports monthly and updating SMS statuses in batches, where real-time requirements are low.

(3) Asynchronous execution and decoupling: refreshing activity status and running offline queries asynchronously, decoupled from the core business logic.

1.2 Usage requirements and pain points

(1) Task execution monitoring and alarm capabilities.

(2) Tasks can be flexibly and dynamically configured without restarting.

(3) Transparent to business code, with low coupling, streamlined configuration, and easy development.

(4) Easy to test.

(5) High availability, no single point of failure.

(6) A task must not be executed more than once per trigger, to prevent logic errors.

(7) Distributed parallel processing capabilities for large tasks.

2. Practice and exploration of open source frameworks

2.1 Java native Timer and ScheduledExecutorService

2.1.1 Timer usage
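As a minimal sketch of basic Timer usage (the class name TimerDemo and the method firesRepeatedly are hypothetical, chosen only for illustration), the following schedules a repeating TimerTask and waits until it has fired a given number of times:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimerDemo {

    /** Schedules a repeating task and returns true once it has fired `runs` times (5 s timeout). */
    static boolean firesRepeatedly(int runs) {
        CountDownLatch latch = new CountDownLatch(runs);
        // A Timer owns exactly one background thread; every task shares it.
        Timer timer = new Timer("demo-timer", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                latch.countDown(); // stand-in for real business logic
            }
        }, 0L, 50L); // fixed-delay: first run immediately, then roughly every 50 ms
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel(); // stop the timer thread
        }
    }

    public static void main(String[] args) {
        System.out.println("fired 3 times: " + firesRepeatedly(3));
    }
}
```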

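Timer defects:

(1) Timer runs all tasks on a single thread, so a long-running task delays every task queued behind it.

(2) If a TimerTask throws an uncaught exception, the Timer thread terminates and the remaining tasks stop running.

(3) Timer schedules against the system clock, so changing the system time affects execution.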

Because of these defects, avoid Timer where possible. IntelliJ IDEA will also explicitly prompt you to use ScheduledThreadPoolExecutor instead of Timer.

2.1.2 Use of ScheduledExecutorService

ScheduledExecutorService fixes the defects of Timer. First, it is implemented internally by ScheduledThreadPoolExecutor, a thread pool that can execute multiple tasks concurrently.

If a task throws an exception, only that task is affected: its own subsequent runs are suppressed, while tasks running on other threads continue normally. In addition, ScheduledExecutorService schedules by relative delay, so execution is not disturbed by changes to the system time.
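The isolation described above can be sketched as follows. This is an illustrative example rather than production code; the class name ScheduledPoolDemo and the method healthyTaskSurvives are hypothetical. One periodic task fails on its first run, and a second task keeps running regardless:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ScheduledPoolDemo {

    /** Returns true if the healthy task ran `runs` times even though another task failed. */
    static boolean healthyTaskSurvives(int runs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        CountDownLatch healthy = new CountDownLatch(runs);
        try {
            // This task throws on its first run; only its own later runs are suppressed.
            pool.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated failure");
            }, 0, 20, TimeUnit.MILLISECONDS);

            // This task is unaffected and keeps running on schedule.
            pool.scheduleWithFixedDelay(healthy::countDown, 0, 20, TimeUnit.MILLISECONDS);

            return healthy.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy task survived: " + healthyTaskSurvives(3));
    }
}
```

Note that the failing task stops silently. In real code it is usually worth wrapping the task body in a try/catch that logs the error.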

Of course, ScheduledExecutorService has its own limitations: it can only schedule by delay relative to the current time, so it cannot directly express absolute-time or calendar-based schedules.
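A common workaround for this limitation, sketched here under the assumption that an hourly job is wanted, is to compute the delay to the next wall-clock boundary yourself and pass it as the initial delay. The class name NextHourDelay and the method millisUntilNextHour are hypothetical:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class NextHourDelay {

    /** Milliseconds from `now` until the next top of the hour. */
    static long millisUntilNextHour(LocalDateTime now) {
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour).toMillis();
    }

    public static void main(String[] args) {
        long initialDelay = millisUntilNextHour(LocalDateTime.now());
        System.out.println("first run in " + initialDelay + " ms");
        // The value would then be used as the initial delay, for example:
        // executor.scheduleAtFixedRate(task, initialDelay, 3_600_000L, TimeUnit.MILLISECONDS);
    }
}
```

This only approximates calendar scheduling: the fixed period knows nothing about clock adjustments or daylight saving time, which is one reason cron-style triggers are preferred for such jobs.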

2.2 Spring Task

2.2.1 Spring Task usage

Spring Task is a lightweight scheduled task framework developed by Spring itself. It does not depend on any additional packages, and its configuration is relatively simple.

Annotation configuration is used here

2.2.2 Spring Task defects

Spring Task itself does not support persistence, and there is no official distributed cluster mode. Developers have to implement these by hand in the business application, and it does not meet the needs of visualization and easy configuration.

2.3 The forever classic Quartz

2.3.1 Basic introduction

The Quartz framework is the best-known open source task scheduling tool in the Java field and is the de facto standard for scheduled tasks. Many open source scheduled task frameworks are built on Quartz's core scheduling.

2.3.2 Principle Analysis

Core components and architecture

Key concepts

(1) Scheduler: The task scheduler, which is the controller that performs task scheduling. It is essentially a scheduling container that registers all Triggers and their corresponding JobDetails. It uses a thread pool as the basic component for running tasks, which improves execution efficiency.

(2) Trigger: Defines the time rules for task scheduling and tells the scheduler when to trigger a task. CronTrigger is a powerful trigger built on cron expressions.

(3) Calendar: A collection of specific points in calendar time. A trigger can be associated with a Calendar, which is used to exclude certain time points (for example, holidays) from its firing schedule.

(4) JobDetail: Describes an executable job, including the Job implementation class and other static information such as the job's name and listeners.

(5) Job: The task execution interface. It has a single execute method, which runs the real business logic.

(6) JobStore: The task storage mechanism, mainly RAMJobStore and JDBCJobStore. RAMJobStore keeps tasks in JVM memory, so they can be lost and their number is limited by memory. JDBCJobStore persists task information to a database and supports clustering.

2.3.3 Practical instructions

(1) About the basic use of Quartz

(2) Business use requires that tasks can be modified dynamically and are not lost on restart, so task information is generally stored in a database.

(3) Componentization

(4) Extension

2.3.4 Defects and deficiencies

(1) Task information needs to be persisted to business data tables, which couples scheduling to the business.

(2) Scheduling logic and execution logic coexist in the same project. When machine performance is fixed, business and scheduling will inevitably affect each other.

(3) In Quartz cluster mode, nodes acquire tasks only by competing for an exclusive database lock, and task execution has no complete load balancing mechanism.
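To make the last point concrete, here is a small in-memory simulation. It is not Quartz code: a ReentrantLock stands in for the exclusive database lock, and the names LockBasedAcquisition and acquireAll are hypothetical. Several "nodes" compete for the lock, and whichever node holds it claims the next due trigger:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class LockBasedAcquisition {

    /** Simulates nodes competing for due triggers; returns trigger name -> node that acquired it. */
    static Map<String, String> acquireAll(int nodeCount, List<String> dueTriggers) {
        // Stand-in for the exclusive database lock that cluster nodes compete for.
        ReentrantLock triggerLock = new ReentrantLock();
        Deque<String> pending = new ArrayDeque<>(dueTriggers);
        Map<String, String> acquiredBy = new ConcurrentHashMap<>();

        Thread[] nodes = new Thread[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            String nodeName = "node-" + i;
            nodes[i] = new Thread(() -> {
                while (true) {
                    String trigger;
                    triggerLock.lock();           // only one node may claim triggers at a time
                    try {
                        trigger = pending.poll(); // claim the next due trigger, if any
                    } finally {
                        triggerLock.unlock();
                    }
                    if (trigger == null) {
                        return;                   // nothing left to acquire
                    }
                    acquiredBy.put(trigger, nodeName);
                }
            });
            nodes[i].start();
        }
        for (Thread node : nodes) {
            try {
                node.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return acquiredBy;
    }

    public static void main(String[] args) {
        // Which node wins each trigger depends on lock timing, not on node load.
        System.out.println(acquireAll(3, Arrays.asList("t1", "t2", "t3", "t4", "t5", "t6")));
    }
}
```

Every trigger is claimed exactly once, which gives mutual exclusion. Nothing in the scheme looks at how busy each node is, though, so the distribution of work can be uneven from run to run.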

2.4 The lightweight powerhouse XXL-JOB

2.4.1 Basic introduction

XXL-JOB is a lightweight distributed task scheduling platform. Its main characteristics are that it is platform-based, easy to deploy, fast to develop with, simple to learn, lightweight, and easy to extend, and its code is still being actively updated.

It mainly provides functional modules such as dynamic task configuration management, task monitoring and statistical reports, and scheduling logs. It supports multiple run modes and routing strategies, and can shard data processing in a simple way based on the number of machines in the corresponding executor cluster.

2.4.2 Principle Analysis

Before version 2.1.0, the core scheduling module was based on the Quartz framework. Starting with version 2.1.0, XXL-JOB uses its own scheduling component, removing the Quartz dependency and using a time wheel instead.
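The time wheel idea can be sketched in a few lines. This is a simplified illustration and not XXL-JOB's actual implementation; the class name TimeRing and its methods are hypothetical. Jobs that are due soon are parked in one of 60 slots keyed by the second at which they should fire, and a ticker drains the slot for the current second:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TimeRing {
    private static final int SLOTS = 60; // one slot per second of a minute

    // slot index -> ids of jobs that should fire in that second (not thread-safe; sketch only)
    private final Map<Integer, List<Integer>> ring = new HashMap<>();

    /** Parks a job id in the slot for the second at which it should fire. */
    void push(long fireAtEpochSecond, int jobId) {
        int slot = (int) (fireAtEpochSecond % SLOTS);
        ring.computeIfAbsent(slot, k -> new ArrayList<>()).add(jobId);
    }

    /** Removes and returns every job parked in the slot for the given second. */
    List<Integer> tick(long nowEpochSecond) {
        List<Integer> due = ring.remove((int) (nowEpochSecond % SLOTS));
        return due == null ? new ArrayList<>() : due;
    }

    public static void main(String[] args) {
        TimeRing ring = new TimeRing();
        long now = System.currentTimeMillis() / 1000;
        ring.push(now + 2, 42);
        System.out.println("due in 2 s: " + ring.tick(now + 2));
    }
}
```

Because the ring only has 60 slots, this sketch assumes that only jobs due within the next minute are pushed. A scheduler thread would pre-read upcoming jobs from storage and a second thread would call tick once per second.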

2.4.3 Practical instructions

Please refer to the official documentation for detailed configuration and introduction.

2.4.3.1 Demo use

Example 1: Simple task. Declare the handler with @JobHandler(value="offlineTaskJobHandler") and implement the business logic. (Note: Dubbo is used here; its integration is described later.)

Example 2: Shard broadcast task.
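In a shard broadcast, every executor in the cluster is triggered and receives its own shard index together with the shard total. A minimal sketch of how a handler might use those two numbers to pick its share of the data follows; the class name ShardDemo and the method idsForShard are hypothetical, and the modulo rule is one simple convention rather than the only option:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardDemo {

    /** Returns the ids this executor should handle: those where id % shardTotal == shardIndex. */
    static List<Long> idsForShard(List<Long> allIds, int shardIndex, int shardTotal) {
        List<Long> mine = new ArrayList<>();
        for (long id : allIds) {
            if (id % shardTotal == shardIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        int shardTotal = 3; // number of executors in the cluster
        for (int shardIndex = 0; shardIndex < shardTotal; shardIndex++) {
            System.out.println("shard " + shardIndex + " handles " + idsForShard(ids, shardIndex, shardTotal));
        }
    }
}
```

Each id lands in exactly one shard, so the executors can process a large data set in parallel without overlapping.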

2.4.3.2 Integrate Dubbo

(1) Add the dubbo-spring-boot-starter dependency and the business facade jar dependency.

(2) Add the Dubbo consumer configuration to the configuration file (multiple configuration files can be defined per environment and switched through profiles).

(3) Just inject the facade interface through @Reference in the code.

(4) Add the @EnableDubboConfiguration annotation to the startup program.

2.4.4 Task visual configuration

The built-in admin platform makes it easy for developers to manage tasks and monitor execution logs, and it provides some functions that facilitate testing.

2.4.5 Extensions

(1) Optimization of task monitoring and reports.

(2) Extension of task alarm channels, such as connecting to an alarm center and providing in-site message and SMS alarms.

(3) Different monitoring, alarm, and retry strategies for exceptions that occur inside the actual business execution.

2.5 Highly available Elastic-Job

2.5.1 Basic introduction

Elastic-Job is a distributed scheduling solution consisting of two independent sub-projects, Elastic-Job-Lite and Elastic-Job-Cloud.

Elastic-Job-Lite is positioned as a lightweight, decentralized solution that provides distributed task coordination services in the form of jar packages.

Elastic-Job-Cloud uses Mesos + Docker solutions to provide additional services such as resource management, application distribution, and process isolation.

Unfortunately, at the time of writing, the project had gone two years without an update.

2.5.2 Principle analysis

2.5.3 Practical instructions

2.5.3.1 Demo use

(1) Install ZooKeeper, configure the registry center config, and add the ZooKeeper registry center settings to the configuration file.

(2) Configure the data source config and add the data source configuration to the configuration file.

(3) Configure event config.

(4) To make it easy to configure different trigger settings for each task, add the ElasticSimpleJob annotation.

(5) Initialize the configuration.

(6) Implement the SimpleJob interface, integrate Dubbo as described above, and complete the business logic.

2.6 Other open source frameworks

(1) Saturn: A distributed task scheduling platform open sourced by Vipshop, built as a modification of Elastic-Job.

(2) SIA-TASK: CreditEase’s open source distributed task scheduling platform.

3. Comparison of advantages and disadvantages and thinking on business scenario adaptation

Business thinking:

4. Conclusion

For systems without especially demanding concurrency requirements, XXL-JOB is a good choice: it is simple to configure and deploy, easy to use, requires no extra components, and provides a user-friendly visual console. For systems that want to make direct use of the capabilities of open source distributed frameworks, choose according to your own circumstances.
