2008-10-02

DRBD on Debian: Build and Test Notes

These are notes on building DRBD (Distributed Replicated Block Device) on Debian, plus a record of some simple tests.
The whole test environment runs on VMware Server 1.07 build-108231.
Host configuration:
OS- Windows 2003 R2 Enterprise with SP2
CPU- Intel Q6600 2.4GHz
RAM- DDR2-800 2GB *4 = 8GB
HDD- Seagate ST3500320AS *2 on ICH9 AHCI
The partition holding the VMs is a Windows software RAID-0 volume.

Guest configuration:
OS- Debian 4.0 etch [Kernel:2.6.18-6-686 (2.6.18.dfsg.1-22etch2)]
CPU- 2 processors assigned by the VM
RAM- 512MB
HDD- 8G SCSI, Independent-persistent
Eth0- Bridged
Eth1- Host-Only #dedicated private network for DRBD replication traffic

All the steps and tests below are performed on the guests.
First prepare two identical VM environments: NODE-A and NODE-B.
(To save time, you can do the base Debian install once, copy the .vmdk disk file to the second VM, and adjust its settings.)
Disk partitioning:
/dev/sda1 256M /boot #FS:ext3
/dev/sda2 7.7G LVM #PV for LVM, belongs to VG0
/dev/sda3 512M swap
LVM:
/dev/VG0/LVROOT 4G / #FS:XFS root filesystem; not split any further for a test setup
/dev/VG0/LVDRBD 2G /DRBD #for DRBD testing

Network settings
NODE-A
eth0 192.168.1.101 #bridged to the host NIC, has external connectivity
eth1 192.168.100.101 #Host-Only segment, used for DRBD replication
NODE-B
eth0 192.168.1.102 #bridged to the host NIC, has external connectivity
eth1 192.168.100.102 #Host-Only segment, used for DRBD replication
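On etch these addresses would typically live in /etc/network/interfaces; a minimal sketch for NODE-A (NODE-B is identical except for the .102 addresses) — the gateway address is an assumption, not taken from the original setup:

```
# /etc/network/interfaces on NODE-A (sketch; gateway value is assumed)
auto eth0
iface eth0 inet static
    address 192.168.1.101
    netmask 255.255.255.0
    gateway 192.168.1.1

auto eth1
iface eth1 inet static
    address 192.168.100.101
    netmask 255.255.255.0
```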

Installation / test procedure:
1. Configure the APT sources list
On NODE-A, first add Backports to /etc/apt/sources.list so that DRBD 8 becomes available (etch itself only ships drbd0.7, which is rather old):
==================================
NODE-A# echo "deb http://www.backports.org/debian etch-backports main" >>/etc/apt/sources.list
==================================
Because Backports' GPG key is not yet trusted, apt-get update will warn:
==================================
W: GPG error: http://www.backports.org etch-backports Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY EA8E8B2116BA136C
W: You may want to run apt-get update to correct these problems
==================================
so the key has to be imported:
NODE-A# gpg --keyserver hkp://subkeys.pgp.net --recv-keys 16BA136C
NODE-A# gpg --export | apt-key add -
After that, apt-get update runs cleanly.

2. Install drbd8 (source & utils)
NODE-A# apt-get install drbd8-source drbd8-utils
All required dependencies are pulled in at the same time.

3. Build the drbd8 kernel module
After drbd8-source is installed, the source tarball is at /usr/src/drbd8.tar.bz2.
Unpack it:
NODE-A# cd /usr/src ; tar jxvf /usr/src/drbd8.tar.bz2
Use module-assistant to compile the drbd8 kernel module:
NODE-A# module-assistant auto-install drbd8
This automatically installs every build dependency needed during compilation, and installs the module once it is built.
The resulting kernel module package is /usr/src/drbd8-2.6.18-6-686_8.0.13-2~bpo40+1+2.6.18.dfsg.1-22etch2_i386.deb (the filename varies with the running kernel version).

4. Copy the compiled kernel module & drbd8-utils from NODE-A straight to NODE-B and install them there, which saves repeating the same steps on NODE-B:
/usr/src/drbd8-2.6.18-6-686_8.0.13-2~bpo40+1+2.6.18.dfsg.1-22etch2_i386.deb #drbd8 kernel module
/var/cache/apt/archives/drbd8-utils_2%3a8.0.13-2~bpo40+1_i386.deb #drbd8-utils dug out of APT's cache archives
On NODE-B, just install both with dpkg -i (this works as long as NODE-A and NODE-B run the same kernel):
NODE-B# dpkg -i drbd8-2.6.18-6-686_8.0.13-2~bpo40+1+2.6.18.dfsg.1-22etch2_i386.deb drbd8-utils_2%3a8.0.13-2~bpo40+1_i386.deb

5. Configure /etc/drbd.conf
A basic configuration for testing, adapted from the packaged defaults; a minimal setup looks like this:
NODE-A & NODE-B : /etc/drbd.conf
==================================
common {
    syncer { rate 10M; }
}
resource r0 {
    protocol C;
    disk { on-io-error detach; }
    on NODE-A {
        device    /dev/drbd0;
        disk      /dev/VG0/LVDRBD;
        address   192.168.100.101:7788;
        meta-disk internal;
    }
    on NODE-B {
        device    /dev/drbd0;
        disk      /dev/VG0/LVDRBD;
        address   192.168.100.102:7788;
        meta-disk internal;
    }
}
==================================

6. Initialize resource r0
NODE-A# drbdadm create-md r0
NODE-B# drbdadm create-md r0


7. Start the DRBD service
NODE-A# /etc/init.d/drbd start
NODE-B# /etc/init.d/drbd start

8. Check the state of DRBD resource r0
NODE-A# drbdadm state r0
Secondary/Secondary
NODE-B# drbdadm state r0
Secondary/Secondary
=====The connection is up; both nodes are currently Secondary=====
NODE-A# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
NODE-B# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
=====/proc/drbd shows Inconsistent/Inconsistent because nothing has been synchronized yet=====

9. Run the initial sync of r0, making NODE-A primary
Start synchronizing, using NODE-A's data as the baseline:
NODE-A# drbdadm -- --overwrite-data-of-peer primary r0
The /dev/VG0/LVDRBD device backing r0 is 2G, so the initial sync takes a while: syncer { rate 10M; } in /etc/drbd.conf caps the sync bandwidth at 10 MB/s (80 Mbps), and the 2G of data took roughly three minutes and twenty seconds.
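That duration lines up with a back-of-the-envelope estimate: device size divided by the capped sync rate.

```shell
# Rough initial-sync estimate: device size / syncer rate
# 2 GiB ~= 2097152 KiB; rate 10M = 10240 KiB/s
echo $(( 2097152 / 10240 ))   # prints 204 -> ~204 s, close to the observed 3m20s
```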

While the sync is running, /proc/drbd on NODE-B shows:
NODE-B# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
ns:0 nr:1842176 dw:1842176 dr:0 al:0 bm:112 lo:0 pe:0 ua:0 ap:0
[================>...] sync'ed: 87.9% (254876/2097052)K
finish: 0:00:23 speed: 10,984 (10,288) K/sec
resync: used:0/61 hits:115023 misses:113 starving:0 dirty:0 changed:113
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
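The finish estimate in that output is just the remaining data divided by the current sync speed:

```shell
# finish = remaining KiB / current speed in KiB/s (numbers from /proc/drbd above)
echo $(( 254876 / 10984 ))   # prints 23 -> matches "finish: 0:00:23"
```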

After the sync completes, /proc/drbd on NODE-A shows:
NODE-A# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:2097052 nr:0 dw:0 dr:2097052 al:0 bm:128 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:130938 misses:128 starving:0 dirty:0 changed:128
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

After the sync, check eth1, the NIC dedicated to DRBD:
NODE-A:~# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:0C:29:68:EC:2C
inet addr:192.168.100.101 Bcast:192.168.100.254 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:659296 errors:0 dropped:0 overruns:0 frame:0
TX packets:1502254 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:47778966 (45.5 MiB) TX bytes:2248845944 (2.0 GiB)
Interrupt:177 Base address:0x1480

NODE-B:~# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:0C:29:D3:37:A3
inet addr:192.168.100.102 Bcast:192.168.100.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1502140 errors:12 dropped:18 overruns:0 frame:0
TX packets:659298 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2248686736 (2.0 GiB) TX bytes:47778708 (45.5 MiB)
Interrupt:177 Base address:0x1480

You can see that NODE-A transferred 2G of data to NODE-B.

10. Create a file system on NODE-A (primary); I used XFS:
NODE-A:~# mkfs.xfs /dev/drbd0
meta-data=/dev/drbd0 isize=256 agcount=8, agsize=65532 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=524256, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2560, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0

11. On NODE-A (primary), mount /dev/drbd0 at /DRBD:
NODE-A:~# mount /dev/drbd0 /DRBD
NODE-A:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VG0-LVROOT
4184064 609632 3574432 15% /
tmpfs 258408 0 258408 0% /lib/init/rw
udev 10240 52 10188 1% /dev
tmpfs 258408 0 258408 0% /dev/shm
/dev/sda1 241116 13240 215428 6% /boot
/dev/drbd0 2086784 288 2086496 1% /DRBD

12. Test large-file (1G) write speed on NODE-A (primary)
Writing 1G of data to the local disk took 2.69529 seconds (the VM's disk cache is the likely reason it looks this fast):
NODE-A:~# dd if=/dev/zero of=/TEST_1G bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.69529 seconds, 389 MB/s
Writing 1G of data to DRBD took 10.8353 seconds:
NODE-A:~# dd if=/dev/zero of=/DRBD/TEST_1G bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.8353 seconds, 96.8 MB/s
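As a sanity check on the rates dd reports, the same figures fall out of bytes divided by seconds:

```shell
# Throughput implied by the two dd runs above (bytes / seconds / 10^6 = MB/s)
awk 'BEGIN{printf "%.0f MB/s to local disk\n", 1048576000/2.69529/1000000}'
awk 'BEGIN{printf "%.1f MB/s to DRBD\n",       1048576000/10.8353/1000000}'
```

which reproduces the 389 MB/s and 96.8 MB/s dd printed.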

While the write is in progress, CPU usage on NODE-B (secondary):
Tasks: 55 total, 3 running, 52 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 5.3%sy, 0.0%ni, 53.0%id, 0.0%wa, 7.2%hi, 34.5%si, 0.0%st
Mem: 516820k total, 36056k used, 480764k free, 368k buffers
Swap: 498004k total, 0k used, 498004k free, 20104k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2333 root 16 0 0 0 0 R 69 0.0 1:09.22 drbd0_receiver
2338 root -3 0 0 0 0 S 0 0.0 0:00.56 drbd0_asender
1 root 15 0 1948 648 552 S 0 0.1 0:02.43 init

On the secondary, the load shows up as:
5.3%sy -- system CPU time
7.2%hi -- hardware IRQ
34.5%si -- software interrupts

While the 1G write to DRBD is running, iptraf on NODE-B (secondary) shows this eth1 traffic:
Peak total activity: 623426.88 kbits/s, 76228.80 packets/s
Peak incoming rate: 609721.00 kbits/s, 50648.00 packets/s
Peak outgoing rate: 14150.92 kbits/s, 25580.80 packets/s

The VM's Host-Only NIC can sustain more than 500 Mbps.
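That figure follows directly from the iptraf peak above (iptraf reports kbits/s):

```shell
# Convert iptraf's peak total activity from kbits/s to Mbits/s
awk 'BEGIN{printf "%.0f\n", 623426.88/1000}'   # prints 623 -> ~623 Mbps peak
```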

13. Test writing 1000 small 1M files on NODE-A (primary)
First create a 1M file:
NODE-A:~# dd if=/dev/zero of=/tmp/0 bs=1M count=1
Then use a while loop to copy that 1M file 1000 times:
NODE-A:~# date ;i=1;while [ $i -le 1000 ] ; do cp /tmp/0 /DRBD/$i; i=$[$i+1]; done;date
Thu Oct 2 23:14:00 CST 2008
Thu Oct 2 23:14:10 CST 2008
The timestamps show it took 10 seconds.
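1000 files of 1 MiB in 10 seconds works out to roughly the same throughput as the single large file, so per-file overhead is negligible at this size:

```shell
# 1000 files * 1 MiB in 10 s, expressed in MB/s (1 MiB = 1048576 bytes)
awk 'BEGIN{printf "%.1f\n", 1000*1048576/10/1000000}'   # prints 104.9
```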

Meanwhile, iptraf on NODE-B (secondary) shows this eth1 traffic:
Peak total activity: 674872.19 kbits/s, 82699.00 packets/s
Peak incoming rate: 659846.19 kbits/s, 54869.80 packets/s
Peak outgoing rate: 15513.48 kbits/s, 27829.20 packets/s

14. Demote NODE-A to secondary and make NODE-B primary
/dev/drbd0 mounted on NODE-A must be unmounted first:
NODE-A:~# umount /DRBD
NODE-A:~# drbdadm secondary r0
On NODE-B, promote it to primary and mount it at /DRBD:
NODE-B:~# drbdadm primary r0
NODE-B:~# mount /dev/drbd0 /DRBD
NODE-B:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VG0-LVROOT
4184064 500296 3683768 12% /
tmpfs 258408 0 258408 0% /lib/init/rw
udev 10240 52 10188 1% /dev
tmpfs 258408 0 258408 0% /dev/shm
/dev/sda1 241116 13240 215428 6% /boot
/dev/drbd0 2086784 1024564 1062220 50% /DRBD

Since the tests ran inside VMs rather than on physical hardware, the numbers may be skewed; treat the results as indicative only.
Detailed DRBD documentation: http://www.drbd.org/users-guide/users-guide.html
