IPM-Scheduler is short for Impala Pool Memory Scheduler, a tool that dynamically adjusts the memory allocation of each Impala pool. It bases its decisions on historical query data collected from Impala over a given period of time. By adjusting pool resource allocation, IPM-Scheduler reduces query waiting time and improves memory utilization. In our scenario, total waiting time was reduced by 70%-80%.
- Impala ( >=impala-2.5.0+cdh5.7.2 )
- Cloudera Manager ( >=cdh5.7.2 )
- Python3 ( >= 3.5 )
Important: Tested on CDH 5.7.2 and 5.12.1; other versions are not guaranteed to work.
$ git clone https://github.com/gridsum/IPM-scheduler.git
$ cd IPM-scheduler
$ python3 setup.py install
$ echo "export SCHEDULER_HOME=$(pwd)" >> ~/.bashrc
- You must edit the cloudera_manager and pool settings in the config file. You can also refer to the config instructions to modify other config items as needed.
- Start daemon:
$ ./bin/scheduler_daemon.sh start
- Stop daemon:
$ ./bin/scheduler_daemon.sh stop
The scheduler works in three steps:
- First, it fetches historical query information for a given period from Cloudera Manager.
- Then, it generates a memory allocation plan for each pool based on the scheduler config, the Impala config, and the historical query information.
- Finally, it applies the plan by modifying the Impala config through Cloudera Manager.
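To make the planning step concrete, here is a minimal sketch of one possible allocation rule: split a fixed memory budget across pools in proportion to the memory their past queries used. This is an illustration only, not IPM-Scheduler's actual algorithm. The function name `plan_pool_memory`, the `(pool, memory_mb)` record shape, and the sample numbers are all assumptions, and the fetch and apply steps that talk to Cloudera Manager are left out.

```python
from collections import defaultdict


def plan_pool_memory(queries, total_memory_mb):
    """Split total_memory_mb across pools in proportion to historical demand.

    queries: iterable of (pool_name, memory_used_mb) tuples.
    This record shape is hypothetical, chosen only for the example.
    """
    # Sum the memory used by past queries in each pool.
    demand = defaultdict(float)
    for pool, mem_mb in queries:
        demand[pool] += mem_mb

    if not demand:
        return {}

    total_demand = sum(demand.values())
    if total_demand == 0:
        # No measurable usage: fall back to an equal split.
        share = total_memory_mb / len(demand)
        return {pool: share for pool in demand}

    # Proportional split of the budget.
    return {
        pool: total_memory_mb * used / total_demand
        for pool, used in demand.items()
    }


# Made-up history: two queries in "etl", one in "adhoc".
history = [("etl", 300), ("adhoc", 100), ("etl", 100)]
plan = plan_pool_memory(history, total_memory_mb=1000)
print(plan)
```

A real plan would also need to respect per-pool minimums and whatever limits the scheduler config defines; the proportional rule here is just the simplest starting point.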
schedule_module_name: 'scheduler'
schedule_py_name: 'example_schedule'
schedule_class_name: 'DoNothing1Schedule'
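One plausible reading of these three keys is that the scheduler imports `<schedule_module_name>.<schedule_py_name>` and looks up `schedule_class_name` in it. The sketch below shows that pattern with the standard `importlib` module. The helper `load_schedule_class` is hypothetical, and how IPM-Scheduler actually resolves these keys is an assumption here. So that the snippet runs without the project installed, the demo points the keys at a standard-library class.

```python
import importlib


def load_schedule_class(config):
    """Resolve a class from three config keys (hypothetical helper).

    Assumes the class lives at
    <schedule_module_name>.<schedule_py_name>.<schedule_class_name>.
    """
    dotted_path = "{}.{}".format(
        config["schedule_module_name"], config["schedule_py_name"]
    )
    module = importlib.import_module(dotted_path)
    return getattr(module, config["schedule_class_name"])


# Stand-in values so the example is runnable anywhere;
# with the project installed these would be the values shown above.
demo_config = {
    "schedule_module_name": "json",
    "schedule_py_name": "decoder",
    "schedule_class_name": "JSONDecoder",
}
cls = load_schedule_class(demo_config)
print(cls.__module__, cls.__name__)
```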
- Edit the config file.
- Edit the config items about email.
- Edit the config item:
enable_schedule_report: true
- Back up the Impala config:
$ ./bin/scheduler_utils.sh backup
- Roll back the Impala config:
$ ./bin/scheduler_utils.sh rollback
- Check the scheduler config file:
$ ./bin/scheduler_utils.sh check
impala-toolbox-help@gridsum.com
IPM-Scheduler is licensed under the Apache License 2.0.