HPCC Fair-Share Scheduling
Under fair-share scheduling, jobs from accounts with lower recent usage totals are scheduled to start sooner than jobs from accounts that have accumulated heavier recent usage. This balances usage across accounts and favors job starts for users who have not been running much recently over those from accounts with large-scale continuous usage. The more usage an account accumulates, the longer its subsequent jobs will wait relative to submissions from accounts that have not been running such heavy workloads recently.
Basic Algorithm
Our cluster uses the "fair tree" algorithm, explained for example on this page on the SchedMD site. In short, every pending job receives an expected start time that depends on the account's recent usage relative to other accounts, the size of the request in terms of CPU cores, nodes, and memory, the estimated time to complete, and other factors described at that link. You can also consult our online Job Submission Guide for further details on submitting jobs to our cluster.
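Since Fair Tree is the fairshare algorithm used by the Slurm scheduler, you can inspect your own account's accumulated usage and resulting fair-share factor with the `sshare` command (a sketch, assuming standard Slurm client tools are in your path):

```shell
# Show fair-share information for your own associations.
# The FairShare column is a value between 0 and 1; higher means
# your jobs are currently favored by the fair-tree ranking.
sshare -U

# Show the full account hierarchy in long format, including RawUsage
# and effective usage, which determine each account's position in the tree.
sshare -a -l
```

Watching how your FairShare value drops after periods of heavy usage makes the scheduling behavior described above directly visible.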
Factors That Can Affect Scheduling
Several factors affect scheduling. The best approach is to specify only the amount of time you realistically expect each job to take (plus a small buffer if execution time is variable), and similarly to request only the amount of memory each job needs to execute without running out. Reducing these values to the minimums needed for successful execution improves the scheduler's ability to fit each job into available projected slots, after factoring in the fair-share weighting described above.
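As a sketch, a right-sized Slurm batch script might look like the following; the job name, program, and specific values here are illustrative, not defaults for our cluster:

```shell
#!/bin/bash
#SBATCH --job-name=my_analysis   # illustrative name
#SBATCH --time=02:30:00          # realistic estimate plus a small buffer
#SBATCH --mem=8G                 # only the memory the job actually needs
#SBATCH --ntasks=1               # a single-task job in this example

./my_analysis_program            # placeholder for your actual workload
```

Trimming `--time` and `--mem` down from generous guesses to values based on observed runs is usually the quickest way to shorten queue waits.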
If the cluster is fully utilized, which is the normal running condition, any new job submission or interactive session request will wait for a suitably sized queue slot to open up. For this to happen, a slot of equivalent size must be freed by one or more other jobs ending, with a wait time that depends on the size and other parameters of the new request. This accounts for the delay before a job starts, and it is generally a good thing: jobs can only start instantly when the cluster is underutilized, with idle cores and nodes sitting unused.
Optimizing Submissions
To minimize this wait time, request only the number of cores and nodes that your job needs and will fully utilize. Requesting more cores than the job will use simply wastes the CPU time assigned to the job and slows down work for all users of the cluster, and requesting more time or memory than required reduces the scheduler's ability to place the job. The default run-time limit is 48 wall-clock hours in most cases, but the scheduler can sometimes fit shorter jobs into the optimization that it performs, so specifying a shorter run time when your job needs less time to complete its work will often result in an earlier start.
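For example, if your program runs 16 parallel tasks on one node, request exactly that rather than a whole large node or multiple nodes. You can also ask Slurm for its current start-time estimate for a pending job; the directives and the job ID below are placeholders:

```shell
# In the batch script: request only what the job will actually use.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # matches the parallelism of the code
#SBATCH --time=06:00:00        # well under the 48-hour default when possible

# From the command line: show the scheduler's projected start time
# for a pending job (the job ID here is a placeholder).
squeue --start -j 1234567
```

The projected start time can shift as other jobs finish early or are submitted, but it is a useful sanity check on whether a request is sized reasonably.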
It is also useful to pay attention to the availability of cores within each partition. The balance of available nodes and cores can vary considerably from day to day, or even within a given day. You can sometimes improve overall throughput by submitting jobs to more than one partition, if your code can run with reasonable parameters in each of them.
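One way to check per-partition availability, assuming standard Slurm tools (the partition names below are placeholders, not necessarily those on our cluster):

```shell
# The %C field prints cores as allocated/idle/other/total per partition,
# so the second number shows how many cores are currently free.
sinfo -o "%P %a %D %C"

# Slurm also accepts a comma-separated partition list; the job starts
# in whichever listed partition can schedule it first.
sbatch --partition=short,standard myjob.sh
```

Checking `sinfo` before choosing a partition can reveal pockets of idle cores that would start your job much sooner than the most popular partition.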
Purchasing Priority Access
Some research groups have large-scale computing needs for which purchasing top priority for job submissions, along with removal of the wall-clock time limit, is the best way to get a large amount of work done in minimum time. For information on how to purchase priority access and/or dedicated storage, please consult this page.
Please feel free to contact hpccsupport@ttu.edu if you need further information about job scheduling or purchased access.
High Performance Computing Center
Phone: 806.742.4350
Email: hpccsupport@ttu.edu