Slurm
Open-source cluster management and job scheduling system with high fault tolerance and scalability. Allocates compute nodes to users, launches and monitors jobs on them, and queues pending work, providing flexible scheduling for large clusters.
OpenPBS
High-speed, scalable, secure, and resilient job scheduling system supporting modern infrastructure, middleware, and applications. Optimizes job turnaround time and efficiently manages geographically distributed resources.
SGE (Sun Grid Engine)
Batch queuing system that automates job scheduling and execution, with flexible scheduling policies, log tracking, troubleshooting support, automatic retry, and scalable resource management.
Open MPI
Open-source implementation of the MPI (Message Passing Interface) standard. Enables high-performance distributed computing across multiple computers. Highly scalable and fault-tolerant, with flexible configuration for a wide range of applications.
MPICH
Open-source implementation of the MPI standard, providing reliable, high-performance inter-process communication. Runs on a wide range of platforms and architectures with flexible settings, backed by a strong open-source developer community.
Intel oneAPI
Toolkit of compilers, libraries, and analysis tools for parallel programming, optimized for Intel hardware. Helps high-performance applications make full use of the available computing resources.
Docker
Lightweight container platform that isolates applications from the host environment. Ensures consistent execution across development and production while keeping resource usage low and startup fast.
Apptainer
Container platform (formerly Singularity) designed for scientific computing and HPC (High-Performance Computing). Supports MPI, GPUs, and a variety of scientific libraries, making it well suited to research environments.
OpenHPC
Open-source software stack for high-performance computing (HPC). Provides comprehensive tools for system management and application deployment, integrating numerous HPC applications and libraries.
Open OnDemand
Open-source remote access and management platform for HPC clusters. Enables users to submit jobs, manage files, and run visualization tools via a web-based interface.
Zabbix
Open-source network monitoring and management solution. Offers real-time and historical data collection for servers, network devices, and applications.
NVIDIA CUDA
GPU-accelerated parallel computing platform for high-performance computing, deep learning, and scientific applications. Delivers significant computational speedups using optimized software-hardware integration.
MIG (Multi-Instance GPU)
NVIDIA’s GPU partitioning technology, allowing a single physical GPU to be divided into multiple isolated instances (up to seven on supported GPUs), each with its own dedicated compute and memory resources, enabling efficient resource allocation for diverse workloads.
Red Hat Enterprise Linux (RHEL)
Enterprise-grade Linux distribution with long-term support (10 years), offering high stability and security for mission-critical applications.
Rocky Linux
Open-source Linux distribution designed to be fully compatible with RHEL, providing enterprise-level stability and security with long-term, community-driven support.
Ubuntu
Popular open-source Linux distribution known for ease of use, extensive community support, and a rich ecosystem of applications.