Adaptec - Replace Failed Disk

Replacing failed disks is a routine part of managing storage arrays, and a mistake can lead to catastrophic data loss. The steps below show how to replace a failed disk on an Adaptec RAID controller using the arcconf command-line utility.

First, let's get the controller ID.

arcconf getversion
Controllers found: 1
Controller #1
==============
Firmware           : 5.3-0 (19204)
Staged Firmware    : 0.0-0 (19204)
BIOS               : 5.3-0 (19204)
Driver             : 1.2-1 (50983)
Boot Flash         : 0.0-0 (19204) 
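The controller number can also be pulled out of this output in a script. Below is a minimal sketch, assuming the "Controller #N" line format shown above; controller_ids is a hypothetical helper of our own, not an arcconf command.

```shell
# Hypothetical helper (our own name, not an arcconf command): read the output
# of "arcconf getversion" on stdin and print each controller number.
# Assumes controllers are listed as "Controller #N", as in the output above.
controller_ids() {
    awk '/^Controller #[0-9]+/ { sub(/^Controller #/, ""); print $1 }'
}

# Example (requires arcconf): arcconf getversion | controller_ids
```

On the system above it would print 1, the number used as the first argument of the commands that follow.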

We can now check the status of our RAID arrays. Running arcconf getconfig without arguments prints its usage:

arcconf getconfig
Usage: GETCONFIG <Controller#> [AD | LD [LD#] | PD | MC | CN | [AL]] [nologs]
 ===================================================================================

 Prints controller configuration information.

    Option  AD  : Adapter information only
            LD  : Logical device information only
            LD# : Optionally display information about the specified logical device
            PD  : Physical device information only
            MC  : Maxcache 3.0 information only
            CN  : Connector information for smartHBA only
            AL  : All information (optional)

Passing the controller number and the LD option prints only the logical device information:

arcconf getconfig 1 LD
Controllers found: 1
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
   Logical device name                      : LogicalDrv 0
   Block Size of member drives              : 512 Bytes
   RAID level                               : 6 Reed-Solomon
   Unique Identifier                        : EF9AD836
   Status of logical device                 : Suboptimal, Fault Tolerant
   Size                                     : 5720054 MB
   Parity space                             : 1906688 MB
   Stripe-unit size                         : 256 KB
   Read-cache setting                       : Enabled
   Read-cache status                        : On
   Write-cache setting                      : Enabled
   Write-cache status                       : On
   MaxCache read cache setting              : Enabled
   MaxCache read cache status               : Off
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : No
   Power settings                           : Disabled
   --------------------------------------------------------
   Logical device segment information
   --------------------------------------------------------
   Segment 0                                : Present (Controller:1,Connector:0,Device:0)      WD-WCAW34523030
   Segment 1                                : Present (Controller:1,Connector:0,Device:1) 9XG4HR6T
   Segment 2                                : Present (Controller:1,Connector:0,Device:2)      WD-WMATV6234407
   Segment 3                                : Present (Controller:1,Connector:0,Device:3)      WD-WMATV6327356
   Segment 4                                : Present (Controller:1,Connector:1,Device:0)      WD-WCAW34522334
   Segment 5                                : Present (Controller:1,Connector:1,Device:1)      WD-WMATV6292750
   Segment 6                                : Present (Controller:1,Connector:1,Device:2) 9XG4HNW3
   Segment 7                                : Missing



Command completed successfully.

The logical device is Suboptimal but still Fault Tolerant, since RAID 6 can survive the loss of two drives and only segment 7 (HDD7) is Missing. That drive needs to be replaced, so I will locate the failed disk and physically swap it.
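On a server with several arrays, the Missing segments can be picked out of the LD output with a short awk filter. This is a sketch that assumes the "Segment N : Missing" layout shown above; missing_segments is a hypothetical helper name, not an arcconf command.

```shell
# Hypothetical helper (our own name, not an arcconf command): read the output
# of "arcconf getconfig <controller> LD" on stdin and print the number of
# every segment reported as Missing.
missing_segments() {
    awk -F':' '
        # Segment lines look like "   Segment 7      : Missing"
        $1 ~ /^[ \t]*Segment [0-9]+[ \t]*$/ && $2 ~ /Missing/ {
            split($1, words, " ")   # words[1] = "Segment", words[2] = number
            print words[2]
        }
    '
}

# Example (requires arcconf): arcconf getconfig 1 LD | missing_segments
```

Fed the output above, it would print 7.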

After the replacement, we need to rescan the controller for new drives.

arcconf rescan 1
Controllers found: 1
 Rescan started in the background and can take upto 10 mins to complete.

Command completed successfully.

We can then confirm that the new disk has been recognised.

arcconf getconfig 1 PD | grep -i "Device #\|Serial Number"
      Device #0
         Serial number                      : WD-WCAW34523030
      Device #1
         Serial number                      : 9XG4HR6T
      Device #2
         Serial number                      : WD-WMATV6234407
      Device #3
         Serial number                      : WD-WMATV6327356
      Device #4
         Serial number                      : WD-WCAW34522334
      Device #5
         Serial number                      : WD-WMATV6292750
      Device #6
         Serial number                      : 9XG4HNW3
      Device #7
         Serial number                      : WD-WMATV6104245

The new disk is listed under Device #7.
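Instead of comparing serial numbers by eye, we can list the disks the controller sees that are not members of any logical device. A sketch, assuming the PD and LD outputs have been saved to files and follow the formats shown above; unassigned_serials is a hypothetical helper, not an arcconf command.

```shell
# Hypothetical helper (our own name, not an arcconf command): print serial
# numbers that appear in the physical device listing but belong to no
# Present segment of any logical device.
#   $1 = file with "arcconf getconfig <controller> PD" output
#   $2 = file with "arcconf getconfig <controller> LD" output
# Assumes the "Serial number : ..." and "Segment N : Present (...) SERIAL"
# layouts shown above. Both files must be non-empty (FNR == NR test).
unassigned_serials() {
    awk '
        FNR == NR {
            # First file (PD): remember every serial number in order.
            if ($0 ~ /Serial number/) {
                sub(/^[^:]*:[ \t]*/, "")
                sub(/[ \t]+$/, "")
                serial[++n] = $0
            }
            next
        }
        # Second file (LD): the member serial is the last field of a Present line.
        /Segment [0-9]+/ && /Present/ { member[$NF] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(serial[i] in member)) print serial[i]
        }
    ' "$1" "$2"
}

# Example (requires arcconf; file names are arbitrary):
#   arcconf getconfig 1 PD > /tmp/pd.txt
#   arcconf getconfig 1 LD > /tmp/ld.txt
#   unassigned_serials /tmp/pd.txt /tmp/ld.txt
```

Any other device in the PD listing that has a serial number but no array membership, such as an unassigned spare, would also be printed, so treat the result as a shortlist to check against the label on the new disk.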

Rebuild the Array

Before the rebuild, we need to initialize the replacement disk with the TASK START command:

 Usage: TASK START <Controller#> DEVICE <Channel# ID#> <task> [<password>][noprompt] [nologs]

    Channel# ID#   : The Channel and ID of the physical device on which task is to be
                     performed.  Optionally ALL indicates all ready drives for initialize
                     task only (ex. ARCCONF TASK START 1 DEVICE ALL INITIALIZE).
    Physical Tasks : verify_fix
                     verify
                     clear
                     initialize
                     secureerase

To initialize the new disk, we need its channel and device ID, which the physical device information provides:

arcconf getconfig 1 PD | grep -i "Device #\|Serial Number\|Channel"
...
Device #7
         Reported Channel,Device(T:L)       : 0,7(7:0)
         Serial number                      : WD-WMATV6104245

Device #7 reports channel 0, device 7, so we pass those values to TASK START:

arcconf TASK START 1 DEVICE 0 7 initialize
Controllers found: 1
Initializing Channel 0, Device 7 will erase its metadata, the section
where all the logical device definition data is stored

Are you sure you want to continue?
Press y, then ENTER to continue or press ENTER to abort: y

Initializing Channel 0, Device 7.

Command completed successfully.
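The channel and device ID can also be looked up from the serial number instead of read by eye. The sketch below assumes each device block lists its "Reported Channel,Device" line before its "Serial number" line, as in the output above; channel_device_for_serial is a hypothetical helper, not an arcconf command. Because initialize erases the disk's metadata, it is worth echoing the values before passing them to TASK START.

```shell
# Hypothetical helper (our own name, not an arcconf command): read the output
# of "arcconf getconfig <controller> PD" on stdin and print "<channel> <device>"
# for the drive with the given serial number.
# Assumes, as in the output above, that each "Device #N" block lists
# "Reported Channel,Device(T:L) : C,D(T:L)" before its "Serial number" line.
channel_device_for_serial() {
    awk -v want="$1" '
        /Device #/ { chan = ""; dev = "" }            # start of a new device block
        /Reported Channel,Device/ {
            v = $NF                                   # e.g. "0,7(7:0)"
            sub(/\(.*/, "", v)                        # drop "(7:0)" -> "0,7"
            split(v, cd, ",")
            chan = cd[1]; dev = cd[2]
        }
        /Serial number/ {
            s = $0
            sub(/^[^:]*:[ \t]*/, "", s)               # keep the value after the colon
            sub(/[ \t]+$/, "", s)
            if (s == want && chan != "") { print chan, dev; exit }
        }
    '
}

# Example (requires arcconf; check the output before initializing):
#   set -- $(arcconf getconfig 1 PD | channel_device_for_serial WD-WMATV6104245)
#   echo "channel=$1 device=$2"
#   arcconf TASK START 1 DEVICE "$1" "$2" initialize
```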

After the drive has been initialized, the rebuild may start on its own. If it does not, we need to set the drive as a hot spare for our suboptimal logical drive (logical drive 0):

arcconf SETSTATE 1 DEVICE 0 7 HSP LOGICALDRIVE 0

The rebuild should then start automatically, which we can confirm with getstatus:

arcconf getstatus 1
Controllers found: 1
Logical device Task:
   Logical device                 : 0
   Task ID                        : 106
   Current operation              : Rebuild
   Status                         : In Progress
   Priority                       : High
   Percentage complete            : 0


Command completed successfully.
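Since rebuilding a large RAID 6 array can take a long time, it can be handy to poll its progress. This sketch assumes the getstatus layout shown above (a Current operation line followed by Percentage complete); rebuild_percent and wait_for_rebuild are hypothetical helpers, and the 300-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper (our own name, not an arcconf command): read the output
# of "arcconf getstatus <controller>" on stdin and print the completion
# percentage of a running Rebuild task, or nothing if there is none.
rebuild_percent() {
    awk '
        /Current operation/ { op = $NF }
        /Percentage complete/ && op == "Rebuild" { print $NF; exit }
    '
}

# Hypothetical helper: poll until the controller stops reporting a rebuild.
# An empty result (including an arcconf failure) ends the loop.
wait_for_rebuild() {
    ctrl=${1:-1}
    while :; do
        pct=$(arcconf getstatus "$ctrl" | rebuild_percent)
        [ -n "$pct" ] || break
        echo "Rebuild ${pct}% complete"
        sleep 300
    done
    echo "No rebuild task reported on controller $ctrl"
}

# Example (requires arcconf): wait_for_rebuild 1
```

Because wait_for_rebuild cannot tell a finished rebuild from a failed arcconf call, confirm the final state with arcconf getconfig 1 LD afterwards.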

Once the rebuild finishes, the logical device should report as Optimal, with segment 7 Present again.

arcconf getconfig 1 LD
Controllers found: 1
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
   Logical device name                      : LogicalDrv 0
   Block Size of member drives              : 512 Bytes
   RAID level                               : 6 Reed-Solomon
   Unique Identifier                        : EF9AD836
   Status of logical device                 : Optimal
   Size                                     : 5720054 MB
   Parity space                             : 1906688 MB
   Stripe-unit size                         : 256 KB
   Read-cache setting                       : Enabled
   Read-cache status                        : On
   Write-cache setting                      : Enabled
   Write-cache status                       : On
   MaxCache read cache setting              : Enabled
   MaxCache read cache status               : Off
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : No
   Power settings                           : Disabled
   --------------------------------------------------------
   Logical device segment information
   --------------------------------------------------------
   Segment 0                                : Present (Controller:1,Connector:0,Device:0)      WD-WCAW34523030
   Segment 1                                : Present (Controller:1,Connector:0,Device:1) 9XG4HR6T
   Segment 2                                : Present (Controller:1,Connector:0,Device:2)      WD-WMATV6234407
   Segment 3                                : Present (Controller:1,Connector:0,Device:3)      WD-WMATV6327356
   Segment 4                                : Present (Controller:1,Connector:1,Device:0)      WD-WCAW34522334
   Segment 5                                : Present (Controller:1,Connector:1,Device:1)      WD-WMATV6292750
   Segment 6                                : Present (Controller:1,Connector:1,Device:2) 9XG4HNW3
   Segment 7                                : Present (Controller:1,Connector:1,Device:3)      WD-WMATV6104245



Command completed successfully.
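To check this from a script or monitoring job, the LD output can be reduced to an exit status. A minimal sketch, assuming the "Status of logical device" line format shown above; all_optimal is a hypothetical helper that succeeds only if at least one logical device is listed and all of them report Optimal.

```shell
# Hypothetical helper (our own name, not an arcconf command): read the output
# of "arcconf getconfig <controller> LD" on stdin and exit 0 only if at least
# one logical device is listed and every one reports "Optimal".
all_optimal() {
    awk '
        /Status of logical device/ {
            seen = 1
            v = $0
            sub(/^[^:]*:[ \t]*/, "", v)   # keep the value after the colon
            sub(/[ \t]+$/, "", v)
            if (v != "Optimal") bad = 1
        }
        END { rc = (seen && !bad) ? 0 : 1; exit rc }
    '
}

# Example (requires arcconf):
#   arcconf getconfig 1 LD | all_optimal && echo "All logical devices optimal"
```

Against the output above it would exit 0; against the earlier "Suboptimal, Fault Tolerant" output it would exit 1.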

Resources

What is the cause and resolution for array state degraded and array members state optimal?
After drive failure and replacement auto-rebuild does not start, why?
Adaptec hardware RAID controller - Hetzner Docs