Proxmox Cluster (Tucana Cloud) - ZFS Pool - Part IV

This article documents the beginning of my journey from data stored on single disks to proper redundancy and backup.

Every journey has a beginning, and for me it all started when a disk holding (hopefully) not-so-important data failed pretty much beyond recovery.

With a couple of RAID cards available, I decided to place all my data on top of RAID arrays.

We should be careful here, because a RAID array is not a backup. As the saying goes: "RAID is not backup! If you do not have a minimum of 3 backups, you do not have a backup!"

My data is now spread around as follows:

  • 5.5TB in a RAID6 with 8 x 1TB disks on a home server.
  • 4.8TB in a RAID6 with 10 x 600GB disks on my hypervisor 1 (HV1), colocated in a rack.
  • 4.8TB in a RAID6 with 10 x 600GB disks on my hypervisor 2 (HV2), colocated in a rack.

I am currently building another desktop, and the existing one with the 5.5TB array will become a server. As my infrastructure grows, my focus has shifted to storage, and the question hammering my mind is:

What's the best storage strategy moving forward?

My current storage is neither uniform nor scalable, and I need to move to something:

  • Manageable - a format that allows easy monitoring.
  • Scalable - something that is easy to change in the future.
  • Efficient - a setup that provides efficiency for different use cases.
  • Redundant - protection for the stored data.

After some research, it seems that the answer to the question above is: ZFS.

What is ZFS?

According to iXsystems, ZFS is:

OpenZFS (ZFS) file system, handles both disk and volume management. ZFS offers RAID options mirror, stripe, and its own parity distribution called RAIDZ that functions like RAID5 on hardware RAID. The file system is extremely flexible and secure, with various drive combinations, checksums, snapshots, and replications all possible. For a deeper dive on ZFS technology, read the ZFS Primer section of the FreeNAS documentation.
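
To make those options concrete, here is a minimal sketch of what each layout looks like on the command line; the pool name tank and the device names are placeholders, not part of my setup:

# Striped pool (no redundancy)
zpool create tank /dev/sdx /dev/sdy

# Mirrored pool
zpool create tank mirror /dev/sdx /dev/sdy

# RAIDZ pool (single parity, comparable in spirit to RAID5)
zpool create tank raidz /dev/sdx /dev/sdy /dev/sdz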

Since the goal of this article is not to explain the technologies in detail but to present an overview of my setup, I will list all the material used during research at the end.

Let's create a real setup and run some tests with a ZFS storage system.

We will start by listing our disks:

root@hv2:~/storage/megacli# megacli -pdlist -aALL | grep -i 'slot\|raw'
Slot Number: 0
Raw Size: 136.732 GB [0x11177330 Sectors]
Slot Number: 1
Raw Size: 136.732 GB [0x11177330 Sectors]
Slot Number: 2
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Slot Number: 3
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Slot Number: 4
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 5
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 6
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 7
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 8
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 9
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 10
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 11
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 12
Raw Size: 558.792 GB [0x45d964b8 Sectors]
Slot Number: 13
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Slot Number: 20
Raw Size: 931.512 GB [0x74706db0 Sectors]
Slot Number: 21
Raw Size: 68.366 GB [0x88bb992 Sectors]

The disks that will be used in our ZFS pool are the 558GB ones. They are in slots 4 to 13.
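
With a longer list, a quick way to pull out just the slot numbers of the 558GB drives is to pair up the grep output; this assumes the same megacli output format shown above and should print slots 4 through 13 here:

root@hv2:~/storage/megacli# megacli -pdlist -aALL | grep -i 'slot\|raw' | paste - - | grep '558' | awk '{print $3}'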

ATTENTION - If your controller supports it, use JBOD (passthrough) instead of RAID 0 to avoid read-modify-write overhead. Further explanation can be found here and here.
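
Whether JBOD is available at all depends on the controller model and firmware. On MegaRAID cards it is exposed as an adapter property that MegaCLI can query and, if supported, enable; the exact property name below may vary by firmware, so treat this as a sketch rather than something I ran here:

# Check whether the adapter has JBOD support enabled
megacli -AdpGetProp EnableJBOD -aALL
# Enable JBOD mode if the controller supports it
megacli -AdpSetProp EnableJBOD 1 -aALL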

The script below will create the single-disk RAID 0 arrays needed:

root@hv2:~/storage/megacli# cat create-raid-0.sh 
#!/bin/bash

# This script creates a single-disk RAID 0 array for every disk in a slot range.
#
# Arguments (in order):
# 1) Enclosure ID. Ex: 17
#    Run: megacli -LdPdInfo -aALL -NoLog | grep -i enclosure
#    to confirm the enclosure ID (there is usually only one).
#    Make sure to get this number right to avoid data loss.
# 2) Start of the slot range. Ex: 4
# 3) End of the slot range. Ex: 13
#
# Usage example:
# ./create-raid-0.sh 17 4 13
# The command above creates a single-disk RAID 0 array for each disk
# from slot 4 to slot 13.
#
# The script has only minimal error checking, so USE WITH CAUTION.

# Full path to the MegaRaid CLI binary
MEGACLI="/opt/MegaRAID/MegaCli/MegaCli64"

ENCLOSURE_ID=$1
RANGE_START=$2
RANGE_END=$3

##################### Range Validation ################################
if [ "$RANGE_START" -gt "$RANGE_END" ]
then
        echo "Invalid range: start of range is greater than end." >&2
        exit 1
fi
#######################################################################


while [ "$RANGE_START" -le "$RANGE_END" ]
do
        # Create a single-disk RAID 0 virtual drive for the current slot.
        "$MEGACLI" -cfgldadd -r0[${ENCLOSURE_ID}:${RANGE_START}] WB RA Cached CachedBadBBU -strpsz512 -a0 -NoLog
        ((RANGE_START=RANGE_START+1))
done

If you don't use the script, it is possible to create the arrays manually by replacing 17:4 (Enclosure ID:Disk Slot Number) with values that reflect your environment in the command below:

/opt/MegaRAID/MegaCli/MegaCli64 -cfgldadd -r0[17:4] WB RA Cached CachedBadBBU -strpsz512 -a0 -NoLog

The command can take a few minutes to complete. Just wait until you get your shell prompt back.

root@hv2:~/storage/megacli# ./create-raid-0.sh 17 4 13
                                     
Adapter 0: Created VD 1

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 2

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 3

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 4

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 5

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 6

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 7

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 8

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 9

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
                                     
Adapter 0: Created VD 10

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

We can now use our jsonify script to inspect the virtual disks:

root@hv2:~/storage/megacli# ./jsonify.sh | jq '.[]'
{
  "id": "0",
  "raid-type": "  Primary-1, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  136.218 GB ",
  "state": "  Optimal ",
  "number-drives": "  2 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  0 ",
      "disk-size": "  136.732 GB [0x11177330 Sectors] ",
      "disk-state": "  Online, Spun Up "
    },
    {
      "enclosure-id": "  17 ",
      "slot": "  1 ",
      "disk-size": "  136.732 GB [0x11177330 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "1",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  4 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "2",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  5 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "3",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  6 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "4",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  7 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "5",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  8 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "6",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  9 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "7",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  10 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "8",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  11 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "9",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.281 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  12 ",
      "disk-size": "  558.792 GB [0x45d964b8 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
{
  "id": "10",
  "raid-type": "  Primary-0, Secondary-0, RAID Level Qualifier-0 ",
  "size": "  558.406 GB ",
  "state": "  Optimal ",
  "number-drives": "  1 ",
  "physical-disks": [
    {
      "enclosure-id": "  17 ",
      "slot": "  13 ",
      "disk-size": "  558.911 GB [0x45dd2fb0 Sectors] ",
      "disk-state": "  Online, Spun Up "
    }
  ]
}
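
If you do not have the jsonify helper handy, roughly the same information can be pulled straight from MegaCLI; the field labels may differ slightly between MegaCLI versions:

root@hv2:~/storage/megacli# megacli -LDInfo -Lall -aALL | grep -iE '^raid level|^size|^state'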

And we can now see the new disks presented to our Proxmox host:
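
A quick way to confirm that from the shell is to list the block devices; the device names (sdb through sdk below) depend on your system, and sda is assumed to be the boot volume:

root@hv2:~# lsblk -d -o NAME,SIZE,TYPE,MODEL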

You can use the Proxmox GUI to create the ZFS pool, but because I am a CLI guy, let's create it with the command below:


root@hv2:~/storage/megacli# zpool create -f -o ashift=12 hv2-tucana-lab \
mirror /dev/sdb /dev/sdc \
mirror /dev/sdd /dev/sde \
mirror /dev/sdf /dev/sdg \
mirror /dev/sdh /dev/sdi \
mirror /dev/sdj /dev/sdk

We have created a pool of 5 x 2-way mirrors out of the ten 558GB drives (the ashift=12 option aligns the pool to 4K sectors). This configuration is similar to RAID10 and offers fast IOPS, which suits the goal of hosting VMs; the trade-off is capacity, since the usable space is half of the total raw space available.
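
Before putting any data on the pool, it is worth double-checking the layout: zpool status should list five mirror vdevs with two devices each (output omitted here):

root@hv2:~/storage/megacli# zpool status hv2-tucana-lab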

root@hv2:~/storage/megacli# zfs list
NAME                           USED  AVAIL     REFER  MOUNTPOINT
hv2-tucana-lab                27.8G  2.60T       96K  /hv2-tucana-lab
hv2-tucana-lab/folder         2.00G  2.60T     2.00G  /hv2-tucana-lab/folder

We will now present the pool to Proxmox, this time using the GUI.

Navigate to Datacenter > Storage > Add > ZFS

If you do not want to leave the CLI, the command below will achieve the same result:

# pvesm add zfspool hv2-tucana-lab -pool hv2-tucana-lab
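
As a variation, the add command can also declare which content types the storage will hold and enable thin provisioning, used instead of the bare command above; -content and -sparse are standard zfspool storage options, but check pvesm help on your Proxmox version:

# pvesm add zfspool hv2-tucana-lab -pool hv2-tucana-lab -content images,rootdir -sparse 1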

We can now start deploying VMs onto the newly created ZFS pool. A future article will cover how to test IOPS and tune the parameters of different RAIDZ types to get a faster and more resilient pool.

We will also explore how to set up an SSD for caching, how to resilver the pool after a disk failure, and how to set up monitoring. Stay tuned for more articles.

Resources

  • Overview of ZFS Pools in FreeNAS - iXsystems, Inc.
  • Six Metrics for Measuring ZFS Pool Performance, Part 1 - iXsystems, Inc.
  • Six Metrics for Measuring ZFS Pool Performance, Part 2 - iXsystems, Inc.
  • Storage - Proxmox VE
  • ZFS on Linux - Proxmox VE
  • Storage: ZFS over iSCSI - Proxmox VE
  • ZFS over iSCSI for Proxmox and FreeNAS
  • Using Proxmox/Debian as a ZFS-over-iSCSI server - DEEPDOC.AT
  • ZFS over iSCSI Lacking Container Support - How do I get around this?
  • Using ZFS with MergerFS - Perfect Media Server
  • ZFS 101 - Understanding ZFS storage and performance
  • MegaCLI Scripts and Commands - Calomel.org
  • ZFS tuning cheat sheet - JRS Systems: the blog