In today’s data-driven world, reliable and scalable storage solutions are essential. However, the cost of enterprise-grade storage can be prohibitive, particularly for small and medium-sized organizations. What if you could leverage your existing, older server hardware to create a robust, scalable, and cost-effective storage system? Today, I am going to introduce you to Ceph, the open-source software-defined storage solution that can transform your old servers and PCs into a state-of-the-art enterprise-grade storage cluster, and you don’t have to pay a single penny! In this article, I will hold your hand and guide you through the process of repurposing your old server hardware using Ceph, unlocking its full potential to build a modern, decentralized, and highly available storage system.
Many companies and organizations have old computers and servers gathering dust—hardware that has been decommissioned from past projects or deemed End-of-Life (EOL) on vendor roadmaps. Instead of letting these valuable resources become e-waste, I offer you a compelling alternative: transform them into a powerful Network Attached Storage (NAS) system. Imagine building a storage cluster with the combined capacity of all your old hard drives, managed by a sophisticated and resilient software layer. That’s the promise of Ceph, and that’s what we are going to explore today!
Ceph: Software-Defined Storage Revolution
Ceph stands out as a leader in software-defined storage (SDS). This means that the storage management logic resides in software, not dedicated hardware. This hardware agnosticism allows Ceph to run on a wide variety of hardware, making it an ideal choice for repurposing and mixing old hardware (servers and PCs) for this project. Its key benefits include:
- Scalability: Easily expand your storage capacity by adding more hardware as needed. Start small and grow as your data needs increase.
- Flexibility: Ceph adapts to your specific requirements, supporting various storage types and configurations.
- Cost-Effectiveness: By leveraging existing hardware and open-source software, you significantly reduce storage costs compared to proprietary solutions.
- Resilience: Ceph’s decentralized architecture ensures high availability. If one node fails, your data remains safe and accessible.
Ceph’s ability to integrate seamlessly with diverse hardware setups makes it a powerful tool for transforming outdated equipment into a modern, efficient storage solution.
Understanding Ceph’s Architecture: The Magic Behind the Scenes
Ceph’s magic lies in its distributed nature. A Ceph cluster consists of multiple compute devices (your old servers and PCs). Each hard drive within these devices is managed by an Object Storage Daemon (OSD), the process responsible for storing data on that disk. Ceph uses RADOS (Reliable Autonomic Distributed Object Store) as its core storage engine. Instead of writing files as blocks the way traditional storage systems do, Ceph stores them as objects. This object-based approach offers improved data management and metadata handling.
Here’s a breakdown of key Ceph concepts:
- Storage Pools: Logical groupings of storage resources, allowing you to organize your storage based on different needs. For example, you might create separate pools for different departments within your organization, such as “Finance” and “Engineering.”
- Placement Groups (PGs): Manage the distribution of data across OSDs. Each PG is replicated across multiple OSDs for redundancy. For instance, if you have 100 OSDs and 300 PGs, each PG will be distributed across several OSDs to ensure data redundancy and availability.
- CRUSH Algorithm: Determines how data is placed across the cluster. CRUSH (Controlled Replication Under Scalable Hashing) ensures efficient data distribution and balancing, even as the cluster grows or changes. For example, if you add new OSDs to the cluster, CRUSH will automatically redistribute data to maintain balance (see the command example after this list).
- CephFS: A distributed file system designed specifically for Ceph. CephFS decentralizes file metadata for optimal performance. This means that file operations are distributed across multiple metadata servers, reducing bottlenecks and improving performance. For example, a large-scale web application can benefit from CephFS by distributing its file storage across multiple servers, ensuring high availability and performance.
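To make the CRUSH algorithm less abstract: once a cluster is running, you can ask Ceph exactly where it would place a given object. The pool and object names below are placeholders for whatever pool and object you want to inspect:
~$ ceph osd map cephfs_data myfile.txt # Prints the placement group and the OSDs CRUSH selects for this object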
By grasping these key concepts, you can truly appreciate the robustness and flexibility of Ceph’s architecture. This makes Ceph an outstanding choice for building a modern, scalable storage system using repurposed hardware. Now that we’ve covered the theory, it’s time to roll up our sleeves and dive into the practical side of things! Let’s get our hands dirty and start building our Ceph cluster!
Building Your Ceph Cluster: A Step-by-Step Configuration Guide
1. Hardware Preparation
Assume you can get hold of two old servers and two old PCs, each with a few disks. Here’s how to prepare them:
- Servers: Ensure each server has at least 8GB of RAM, a quad-core processor, and multiple hard drives (preferably 500GB or larger).
- PCs: Ensure each PC has at least 4GB of RAM, a dual-core processor, and multiple hard drives (preferably 300GB or larger).
2. Software Installation
In this stage, we will install the necessary software on each individual node (servers and PCs) in our Ceph cluster. We’ll use Ubuntu 22.04 for this project, Docker for containerized Ceph deployment, and LVM2 for logical volume management. Follow these detailed steps:
- Install Ubuntu 22.04: Install Ubuntu 22.04 on each individual node (either the server or desktop edition will work for this setup).
- Update and Upgrade the System: Run these commands on each individual node to ensure that all packages are up-to-date.
~$ sudo apt update && sudo apt -y upgrade
~$ sudo reboot
- Install Docker and Required Packages: Docker is used for containerized deployment of Ceph, which simplifies the installation and management of Ceph components. Run the following commands on each individual node so that every node can run Docker containers for the Ceph daemons.
- Install Required Packages: This command installs the packages Docker needs: apt-transport-https (for secure APT transport), ca-certificates (for CA certificates), curl (for downloading files), and software-properties-common (for managing software repositories).
~$ sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
- Add Docker’s Official GPG Key: This command downloads Docker’s GPG key and adds it to your system’s keyring. The GPG key is used to verify the authenticity of Docker packages.
~$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
- Set Up the Docker Repository: This command adds Docker’s official APT repository to your system’s software sources list. The $(lsb_release -cs) command dynamically inserts your Ubuntu version codename (e.g., “jammy” for Ubuntu 22.04).
~$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Update the Package Database and Install Docker: These commands refresh your package database to include Docker’s repository, install Docker CE, and start the Docker service.
~$ sudo apt update
~$ sudo apt install -y docker-ce
~$ sudo systemctl start docker
- Enable Docker to Start on Boot: This command enables Docker to start automatically when the system boots.
~$ sudo systemctl enable docker
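- Verify Docker (Optional): To confirm Docker is working before moving on, you can run Docker’s small hello-world test image:
~$ sudo docker run --rm hello-world # Pulls a tiny test image and prints a confirmation message if Docker is working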
- Install LVM2: LVM2 is used for logical volume management. Run this command on each individual node.
~$ sudo apt install -y lvm2
3. Cluster Setup
Setting up your Ceph cluster involves configuring static IP addresses for stability, installing the necessary tools, and initializing the cluster. You will run these commands on each individual node and the monitor node as specified. Here’s an example IP plan for your cluster:
- Server 1: 192.168.1.10 (Monitor node)
- Server 2: 192.168.1.11
- PC 1: 192.168.1.12
- PC 2: 192.168.1.13
- Configure Static IPs: Configuring static IPs ensures that each node in your cluster has a consistent IP address, which is crucial for stable communication between nodes. Edit the /etc/netplan/01-netcfg.yaml file on each machine and add the following configuration, replacing the IP address with the appropriate one for each machine:
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: no
      addresses:
        - 192.168.1.10/24
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
Apply the Netplan configuration to set the static IP address:
~$ sudo netplan apply
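- Verify the Address (Optional): You can confirm that the static address took effect. The interface name eth0 is assumed here, as in the configuration above; substitute your actual interface name if it differs:
~$ ip addr show eth0 # The configured static address should appear on the interface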
- Install Cephadm: Cephadm is the tool used to deploy and manage the Ceph cluster. Download the Cephadm script, make it executable, and move it to a directory in your PATH. Run these commands on each individual node to install Cephadm:
~$ curl --silent --remote-name --location https://github.com/ceph/ceph/raw/master/src/cephadm/cephadm
~$ chmod +x cephadm
~$ sudo mv cephadm /usr/local/bin/
- Bootstrap the Cluster: Bootstrapping initializes the cluster with the first monitor node and sets up the initial Ceph daemons. This step is crucial. Run this command on the monitor node (192.168.1.10):
~$ sudo cephadm bootstrap --mon-ip 192.168.1.10
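- Access the Ceph CLI: Because the Ceph daemons run in containers, the ceph command is not automatically available on the host after bootstrapping. A minimal way to get a working CLI is cephadm’s shell; optionally, you can also install the ceph CLI on the monitor node so the ceph commands later in this guide can be run directly (the release name below is just an example for this setup):
~$ sudo cephadm shell -- ceph -s # Runs the ceph CLI from a temporary container that already has the cluster's configuration
~$ sudo cephadm add-repo --release quincy # Optional: add Ceph's package repository on the host...
~$ sudo cephadm install ceph-common # ...and install the ceph CLI locally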
- Check Running Docker Containers: To ensure that Docker is running and the Ceph daemons are deployed correctly on all nodes, run the following command on each individual node.
~$ sudo docker ps
- Add Hosts: Adding hosts integrates the other nodes into the Ceph cluster. Run these commands on the monitor node (192.168.1.10) to add each additional node to the cluster (see the SSH key note after these commands).
~$ ceph orch host add server2 192.168.1.11
~$ ceph orch host add pc1 192.168.1.12
~$ ceph orch host add pc2 192.168.1.13
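Note that cephadm manages the other nodes over SSH, so the cluster’s public key generated during bootstrap must be present on each node before the host add commands above will succeed. A minimal sketch, assuming root SSH logins are permitted on the other nodes (run on the monitor node):
~$ ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.1.11
~$ ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.1.12
~$ ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.1.13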
4. OSD Deployment
Object Storage Daemons (OSDs) are the backbone of your Ceph cluster, responsible for storing data, handling data replication, recovery, and rebalancing. Deploying OSDs is a critical step in setting up your Ceph cluster, as it ensures that your storage resources are properly utilized and managed. In this section, we will guide you through the process of preparing your disks and deploying them as OSDs within your Ceph cluster. By following these steps, you will enable your cluster to store and manage data efficiently, ensuring high availability and resilience. Let’s get started!
- Prepare Disks: This step involves wiping any existing data on the disks to ensure they are clean and ready for use as OSDs. Run these commands on each individual node where the disks are located, replacing /dev/sdX with the appropriate disk identifier for each disk you are preparing.
~$ sudo sgdisk --zap-all /dev/sdX
~$ sudo wipefs -a /dev/sdX
- Deploy OSDs: Deploying OSDs involves adding the prepared disks as Object Storage Daemons to the cluster. Run these commands on the monitor node (192.168.1.10) to add the disks from each individual node, replacing /dev/sdX, /dev/sdY, /dev/sdZ, and /dev/sdW with the appropriate disk identifiers for each disk you are adding.
~$ ceph orch daemon add osd server1:/dev/sdX
~$ ceph orch daemon add osd server2:/dev/sdY
~$ ceph orch daemon add osd pc1:/dev/sdZ
~$ ceph orch daemon add osd pc2:/dev/sdW
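- Verify the OSDs: To see which disks cephadm considers available and to confirm the new OSDs came up, run these on the monitor node:
~$ ceph orch device ls # Lists the disks on each host and whether they are available for OSDs
~$ ceph osd tree # Shows every OSD, the host it lives on, and its up/down status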
5. CephFS Configuration
Configuring CephFS involves creating storage pools, setting up the Ceph File System, and deploying the Metadata Server (MDS). You should run these commands on the monitor node (192.168.1.10) to configure CephFS for your cluster.
- Create Pools: Pools are logical groupings of storage resources. Run these commands on the monitor node to create the necessary pools for CephFS. CephFS uses two primary pools: cephfs_data and cephfs_meta.
  - cephfs_data: This pool stores the actual file data. When you save a file to CephFS, the file’s contents are stored in the cephfs_data pool.
  - cephfs_meta: This pool stores the metadata associated with the files, such as file names, directories, permissions, and other attributes. The Metadata Server (MDS) uses this pool to manage and retrieve metadata efficiently.
~$ ceph osd pool create cephfs_data 32
~$ ceph osd pool create cephfs_meta 32
- Create the File System: This command creates the Ceph File System (CephFS) using the previously created pools. Run this on the monitor node (192.168.1.10):
~$ ceph fs new mycephfs cephfs_meta cephfs_data
- Deploy Metadata Server: The Metadata Server (MDS) manages the file system metadata. Run this command on the monitor node (192.168.1.10) to deploy the MDS:
~$ ceph orch apply mds mycephfs --placement="count=1"
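- Verify the File System: A quick check that the file system and its MDS are active:
~$ ceph fs status mycephfs # Shows the MDS state and the data/metadata pools backing the file system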
6. Accessing Your Cluster From Client Machines and Monitoring Your Storage Cluster
Accessing and monitoring your Ceph storage cluster is crucial for ensuring its performance, health, and availability. This step involves mounting the Ceph File System (CephFS) on client machines and setting up monitoring tools to keep track of the cluster’s status. You should run these commands on the appropriate client machines and the monitor node.
Mount CephFS
Mounting CephFS allows you to access the storage cluster from client machines (the computers or devices that connect to the Ceph cluster to use its storage resources). These machines can be running different operating systems, such as Linux or Windows, and they access the storage provided by the Ceph cluster for various purposes, such as storing files, running applications, or performing data analysis. Here are the steps for both Linux and Windows clients:
- On Linux: Run these commands on the Linux client machine to install the Ceph client packages and mount CephFS:
~$ sudo apt install -y ceph-common ceph-fuse # Install Ceph client packages (kernel mount helper and FUSE client)
~$ sudo mkdir /mnt/cephfs # Create a mount point (the directory where CephFS will be mounted)
~$ sudo mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secret=<key> # Replace <key> with your admin key
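To obtain the admin key referenced above, run this on the monitor node:
~$ sudo ceph auth get-key client.admin # Prints the secret to use in place of <key>
If you want the mount to persist across reboots, an /etc/fstab entry along these lines (a sketch using the same placeholders) can be added on the client:
192.168.1.10:6789:/ /mnt/cephfs ceph name=admin,secret=<key>,_netdev,noatime 0 2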
- On Windows: Windows machines can access CephFS using SMB (Server Message Block) shares, which lets them map network drives to the Ceph storage; this requires an SMB gateway that exports CephFS (see the sketch after these steps). You can map a network drive to the CephFS share using the following steps:
- Open File Explorer and click on “This PC”.
- Click on “Map network drive”.
- Enter the folder path as \\192.168.1.10\cephfs and click “Finish”.
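- Set Up an SMB Gateway (Assumed): The \\192.168.1.10\cephfs path above assumes an SMB gateway is exporting CephFS; the steps in this guide do not create one by themselves. A minimal sketch, assuming Samba runs on the monitor node and CephFS is already mounted there at /mnt/cephfs, is to install Samba, add a share like the following to /etc/samba/smb.conf, and restart the service:
~$ sudo apt install -y samba # Install Samba on the gateway node
[cephfs]
   path = /mnt/cephfs
   browseable = yes
   read only = no
~$ sudo systemctl restart smbd # Reload the share definition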
7. Ceph Dashboard
The Ceph dashboard provides a visual interface for monitoring the cluster. Run these commands on the monitor node (192.168.1.10) to enable and access the dashboard:
- Enable the Dashboard: This command enables the Ceph dashboard module:
~$ ceph mgr module enable dashboard
- Create a Self-Signed Certificate and Set Credentials: These commands create a self-signed certificate for the dashboard and set the admin login credentials (recent Ceph releases require the password to be supplied via a file rather than on the command line):
~$ ceph dashboard create-self-signed-cert
~$ echo -n "<password>" > /tmp/dashboard_pass.txt # Replace <password> with your desired password
~$ ceph dashboard set-login-credentials admin -i /tmp/dashboard_pass.txt
- Access the Dashboard: Open a web browser and navigate to https://192.168.1.10:8443. Log in using the credentials you set.
8. Ceph Cluster Monitoring
Monitoring the cluster involves checking its health, status, and running Docker containers, so that you can keep track of the cluster’s performance and availability and manage it more easily. Run these commands on the monitor node (192.168.1.10): ceph health gives a one-line health summary, ceph status shows the detailed cluster state, and ceph -w watches cluster events in real time.
~$ ceph health
~$ ceph status
~$ ceph -w
- Check Running Docker Containers: Verify that the Ceph daemons are running in Docker containers:
~$ sudo docker ps
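Two more commands worth knowing for day-to-day monitoring are ceph df and ceph osd df, which report capacity and utilization at the cluster, pool, and individual OSD level:
~$ ceph df # Cluster-wide and per-pool capacity and usage
~$ ceph osd df # Per-OSD utilization, useful for spotting imbalanced disks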
Ceph’s flexibility allows for diverse hardware setups. You can mix and match different hardware components, as long as they meet Ceph’s requirements. This is where the true power of repurposing comes in. By understanding Ceph’s internals, you can fine-tune your cluster for optimal performance, and you can explore advanced features like tiering and block storage to further enhance your storage solution.

Ceph offers a compelling alternative to expensive proprietary storage solutions. By repurposing your old hardware, you can build a modern, scalable, and resilient storage system at a fraction of the cost. Embrace the power of open-source software and transform your unused hardware into a valuable asset. With Ceph, you’re not just building a storage system; you’re building a foundation for innovation and growth.

We encourage you to explore the full potential of Ceph and stay tuned for more ideas on how to repurpose and reuse old hardware to save on company costs. Together, we can make technology more sustainable and cost-effective.
Happy building!