Wednesday, November 25, 2015

Formatted with Type 2 Protection, huh?


I bought some Seagate SAS disks (ST9600104SS), along with a Synology expansion unit (RX1213sas), to expand the storage array of a Synology RackStation. If you recognize the title, you already know why I am posting.

I added the new batch of disks to the Synology expansion unit and connected the unit to the host Synology with external mini-SAS cabling. The disks showed up just fine in the DiskStation administration panel. Great. Then I tried to expand the RAID group, and nothing happened: no GUI message, no error message. After about two weeks of troubleshooting (expansion unit cabling, etc.), I checked /var/log/messages and got my first real clue:

sfdisk: exception.c:159 Error: Input/output error during write on /dev/sas15

An I/O error. I then tried to partition the disk with fdisk (gparted is not available) and hit the same issue: I could not write partition information to the disk. At this point I was not sure whether the problem was with the expansion unit or with the disks. Over a two-week period I did the following to narrow it down:
  • The cold spare disks from the original batch work just fine in the expansion unit, so the expansion unit and cabling are good.
  • The new batch of disks works just fine in a Dell server with a RAID controller: I was able to write a partition through the RAID controller, and I even created a volume and installed an OS.
  • Once I had partitioned/wiped the disks in the Dell server, I tried them in the Synology expansion unit again: same I/O error as before.
I got a breakthrough by using smartctl (thankfully available on the Synology) to pull the SMART information from the new batch of disks and compare it with the old batch:

Original disk [serial blanked out]:

NAS> smartctl -i /dev/sas1
smartctl 6.2 (build date Oct 28 2015) [x86_64-linux-3.10.35] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST9600104SS
Revision:             FMF2
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Rotation Rate:        10000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5003c3e2d97
Serial number:        
Device type:          disk
Transport protocol:   SAS
Local Time is:        Tue Nov 24 07:40:08 2015 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

New disk:

NAS> smartctl -i /dev/sas15
smartctl 6.2 (build date Oct 28 2015) [x86_64-linux-3.10.35] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST9600104SS
Revision:             MS05
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Formatted with type 2 protection
Rotation Rate:        10000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5002891f82b
Serial number:        
Device type:          disk
Transport protocol:   SAS
Local Time is:        Mon Nov 23 11:36:05 2015 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
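Comparing two outputs like these by eye works for a pair of disks; for a larger batch, a small script can do it. The sketch below is a hypothetical helper (not part of smartmontools) that parses saved `smartctl -i` output into fields and reports only the fields that differ. It assumes the output layout shown above, and it deliberately ignores fields that differ on every disk anyway (serial number, logical unit id, local time).

```python
def parse_smartctl_info(text: str) -> dict:
    """Parse the information section of `smartctl -i` output into {field: value}.

    "Field:   value" lines become entries; a line without a colon, such as
    "Formatted with type 2 protection", is recorded as a flag set to True.
    """
    fields = {}
    in_info = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("=== START OF INFORMATION SECTION"):
            in_info = True
            continue
        if not in_info or not line:
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            fields[line] = True
            continue
        key, value = key.strip(), value.strip()
        # "SMART support is" appears twice; keep both values.
        fields[key] = f"{fields[key]} / {value}" if key in fields else value
    return fields


# Fields that are unique to each disk and not interesting for this comparison.
PER_DISK = {"Serial number", "Logical Unit id", "Local Time is"}


def diff_info(old_text: str, new_text: str) -> dict:
    """Return {field: (old value, new value)} for every field that differs."""
    old, new = parse_smartctl_info(old_text), parse_smartctl_info(new_text)
    diffs = {}
    for key in list(old) + [k for k in new if k not in old]:
        if key in PER_DISK:
            continue
        if old.get(key) != new.get(key):
            diffs[key] = (old.get(key), new.get(key))
    return diffs
```

To use it, save each output first (for example `smartctl -i /dev/sas1 > old.txt`), read both files, and pass their contents to `diff_info`. Run against the two outputs above, it should report the firmware revision and the protection flag.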

Notice the differences? For one, the firmware revisions differ. That is because the original batch are Dell (OEM) branded disks, while the new batch are Seagate (retail) branded disks. But it was the second difference that caused me some confusion:

Formatted with type 2 protection

Not knowing what this was, I went down a seemingly never-ending spiral of T10 Protection Information [PDF] standards. It's pretty neat. As I understand it, the disk formats the platters with 520-byte sectors instead of the more traditional 512-byte sectors. The 8 extra bytes per sector let the drive check that the data read back from a sector is the same data that was written to it, a form of data verification. The disk then still presents the system (HBA controller or RAID card) with the normal 512 bytes of data per sector, so any SCSI-compatible controller should be able to read and write to it just fine.
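To make the 520-byte idea concrete: in the T10 scheme the 8 extra bytes form a protection tuple made of a 2-byte guard tag (a CRC-16 of the 512 data bytes), a 2-byte application tag, and a 4-byte reference tag. The Python sketch below builds and checks such a sector in software. It is only an illustration under those assumptions: the helper names are hypothetical, the reference tag is filled Type 1 style with the low 32 bits of the LBA, and real drives and HBAs do all of this in hardware.

```python
import struct

SECTOR_DATA = 512  # user data bytes per logical block
PI_LEN = 8         # protection tuple bytes stored alongside each block


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protected_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Return 512 data bytes followed by an 8-byte protection tuple.

    Guard tag = CRC of the data; reference tag = low 32 bits of the LBA
    (Type 1 style). Hypothetical helper, for illustration only.
    """
    if len(data) != SECTOR_DATA:
        raise ValueError("expected exactly 512 data bytes")
    guard = crc16_t10dif(data)
    pi = struct.pack(">HHI", guard, app_tag, lba & 0xFFFFFFFF)  # 2 + 2 + 4 bytes, big-endian
    return data + pi


def guard_ok(sector: bytes) -> bool:
    """Recompute the CRC over the data and compare it with the stored guard tag."""
    data, pi = sector[:SECTOR_DATA], sector[SECTOR_DATA:SECTOR_DATA + PI_LEN]
    stored_guard, _app_tag, _ref_tag = struct.unpack(">HHI", pi)
    return stored_guard == crc16_t10dif(data)


if __name__ == "__main__":
    sector = protected_sector(b"\x55" * 512, lba=1234)
    print(len(sector), guard_ok(sector))  # 520 True
    damaged = bytes([sector[0] ^ 0x01]) + sector[1:]  # flip one data bit
    print(guard_ok(damaged))  # False
```

Flipping a single data bit is enough to make the guard check fail, which is exactly the "read back what was written" verification described above.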

At this point I focused on the protection information, rather than on trying to match the firmware on the disks.

The first problem to tackle was finding a way to use better tools on the disks, since the built-in Synology utilities are pretty bare-bones. Because these are SAS disks I couldn't just pop them into a desktop, but luckily I had an old server with an HBA controller (for software RAID) rather than a RAID card, which made it easy to query the disks directly. I ran smartctl on this server against one of the new disks, and it DID NOT explicitly say it was a Type 2 disk like the output above. That threw me for a loop, but it confirmed that I needed better tools.

In Seagate's T10 Protection Information document [above] there is a paragraph on how to set and determine the PI type using FMTPINFO. While looking into how to query this, I found that Linux has a suite of utilities designed for exactly this purpose, sg3_utils. The CentOS live USB I was using sadly did not have the package installed, but it was in the repos:

yum install sg3_utils

Using the excellent examples section of this Ubuntu man page for sg_format, I once again queried an original disk and a new disk for their protection type:

Original disk:

[root@livecd ~]# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Thin provisioning: tpe=0, tprz=0
   ....

New disk:

[root@livecd ~]# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=1, p_type=1, p_i_exponent=0
   Thin provisioning: tpe=0, tprz=0
   ....
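When checking more than a couple of disks, the Protection line can be decoded mechanically. This is a sketch, assuming `sg_readcap -l` prints the `prot_en=N, p_type=N` pair exactly as in the outputs above; the function name is hypothetical. It relies on the READ CAPACITY(16) convention that, with protection enabled, p_type counts from zero (0 is Type 1, 1 is Type 2, 2 is Type 3).

```python
import re

# With PROT_EN=1, READ CAPACITY(16) reports P_TYPE zero-based:
# 0 -> Type 1, 1 -> Type 2, 2 -> Type 3.
_PROT_RE = re.compile(r"prot_en=(\d+),\s*p_type=(\d+)")


def protection_type(readcap_output: str) -> int:
    """Return 0 when no protection information is enabled, else the PI type (1-3)."""
    match = _PROT_RE.search(readcap_output)
    if match is None:
        raise ValueError("no 'prot_en=..., p_type=...' found; run sg_readcap with -l")
    prot_en, p_type = int(match.group(1)), int(match.group(2))
    return p_type + 1 if prot_en else 0


if __name__ == "__main__":
    original = "Protection: prot_en=0, p_type=0, p_i_exponent=0"
    new = "Protection: prot_en=1, p_type=1, p_i_exponent=0"
    print(protection_type(original), protection_type(new))  # 0 2
```

In practice the input would come from something like `subprocess.run(["sg_readcap", "-l", dev], capture_output=True, text=True).stdout`, with `dev` being whichever device node the disk shows up as.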

Notice the prot_en and p_type fields. With protection enabled (prot_en=1), p_type counts from zero, so p_type=1 means Type 2 protection, which matches what smartctl reported on the Synology. Now I knew without a doubt that the first and second batches of disks I purchased used two completely different formats. What I still didn't know was WHY the NAS controller would not read and write to these disks, but I figured that if I could low-level format them with NO protection information, I might get lucky. Thankfully the Ubuntu man page above has excellent examples, and I was easily able to format the disks with sg_format. Please note that a format completely erases the disk!

[root@livecd ~]# sg_format --format --fmtpinfo=0 /dev/sda
SEAGATE   ST9600104SS   MS05   peripheral_type: disk [0x0]
  << supports protection information>>
Mode Sense (block descriptor) data, prior to changes:
  Number of blocks=1172123568 [0x45dd2fb0]
  Block size=512 [0x200]

A FORMAT will commence in 10 seconds
ALL data on /dev/sda will be DESTROYED
    Press control-C to abort
A FORMAT will commence in 5 seconds
ALL data on /dev/sda will be DESTROYED
    Press control-C to abort

Format has started
Format in progress, 0% done
....
Format in progress, 99% done
FORMAT Complete
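As a sanity check on the numbers sg_format printed: the block count times the 512-byte block size gives exactly the User Capacity that smartctl reported for both batches. Assuming the full 8-byte protection tuple was stored for every logical block, as the 520-byte description above suggests, that protection information came to roughly 9.4 GB on top of the user capacity:

```latex
\begin{aligned}
1\,172\,123\,568 \times 512\ \text{bytes} &= 600\,127\,266\,816\ \text{bytes} \\
1\,172\,123\,568 \times 8\ \text{bytes} &= 9\,376\,988\,544\ \text{bytes} \approx 9.4\ \text{GB}
\end{aligned}
```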

The format took about 8 hours. Let's do another check of the protection type:

[root@livecd ~]# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Thin provisioning: tpe=0, tprz=0
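With a whole batch of disks to convert, the format-and-verify steps can be scripted. The sketch below only wraps the two commands already used in this post (`sg_format --format --fmtpinfo=0` and `sg_readcap -l`); the helper names and device paths are hypothetical. It defaults to a dry run that just prints the commands, because sg_format erases everything on the target disk.

```python
import subprocess


def format_cmd(device: str) -> list:
    # Same flags as above: --fmtpinfo=0 formats with NO protection information.
    # This DESTROYS all data on the device.
    return ["sg_format", "--format", "--fmtpinfo=0", device]


def readcap_cmd(device: str) -> list:
    # Re-read capacity and protection info to confirm prot_en=0 afterwards.
    return ["sg_readcap", "-l", device]


def plan(devices: list) -> list:
    """Return, in order, the commands that would reformat and re-check each device."""
    steps = []
    for device in devices:
        steps.append(format_cmd(device))
        steps.append(readcap_cmd(device))
    return steps


def run(devices: list, dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for argv in plan(devices):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop on the first failing command


if __name__ == "__main__":
    # Hypothetical device names: confirm them on your own system first.
    run(["/dev/sdb", "/dev/sdc"])
```

Run sequentially like this, the roughly 8 hours per disk add up, so it is worth double-checking each device path (for example with `sg_readcap -l`) before passing `dry_run=False`.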

Woohoo! With the disks formatted without protection information, I put them back in the expansion unit to try once again, and what do you know, DiskStation was able to add the disks to the RAID group just fine.


After the initial failures with this batch of disks I could easily have scoffed and returned the disks or the expansion unit for a refund (and I would have gotten one), but instead simple curiosity about what Protection Information was led me to the solution, and to knowledge I can apply to similar problems in the future.