3.10 Workload Managers: SLURM/LSF
Working on shared computing clusters has become commonplace in contemporary data science applications, and some working knowledge of workload management programs (also known as schedulers) is essential for running statistical software in these environments. Two popular workload managers are SLURM (https://slurm.schedmd.com/documentation.html) and IBM's Platform Load Sharing Facility (LSF), both widely used for distributed high-performance computing. These schedulers execute batch jobs on networked Unix and Windows systems across many different architectures. A user typically interfaces with the scheduler through a command line tool or a scripting language, specifying the hardware resources and program inputs; the scheduler then distributes the work across the available resources and runs jobs according to system prioritization schemes. In this way, hundreds or even thousands of programs can run in parallel, greatly increasing the scale of statistical computation feasible within a reasonable time frame. For example, a simulation study for a novel statistical method might require many thousands of runs at various configurations, and a cluster can complete such a study in days rather than months.
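As a concrete sketch, a minimal SLURM batch script using a job array to launch 1,000 simulation runs might look as follows. The #SBATCH directives are standard SLURM options, while the program name run_simulation.R and its --config argument are hypothetical stand-ins for the user's own simulation code.

#!/bin/bash
#SBATCH --job-name=sim-study         # label shown in the queue
#SBATCH --array=1-1000               # one task per simulation configuration
#SBATCH --cpus-per-task=1            # hardware resources requested per task
#SBATCH --mem=2G                     # memory per task
#SBATCH --time=01:00:00              # wall-clock limit per task
#SBATCH --output=logs/sim_%A_%a.out  # %A = job ID, %a = array index

# Each array task runs the same program on a different configuration,
# selected via the SLURM_ARRAY_TASK_ID environment variable; the scheduler
# runs tasks in parallel as resources become available.
Rscript run_simulation.R --config "${SLURM_ARRAY_TASK_ID}"

Such a script would be submitted with sbatch and the resulting jobs monitored with squeue; LSF provides analogous commands (bsub to submit, bjobs to monitor).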