1. Introduction
This post experiments with CephFS disaster recovery: if the CephFS metadata is corrupted or lost, how can the user data be recovered? The following demonstrates the process step by step.
2. Preparing the Test Environment
2.1 Preparing the Test Cluster
The experiment was done on Luminous (L); Jewel (J) works as well. The test environment:
[root@ceph05 ~]# ceph -v
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
[root@ceph05 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.07397 root default
-3 0.03699 host ceph05
0 hdd 0.01799 osd.0 up 1.00000 1.00000
1 hdd 0.01900 osd.1 up 1.00000 1.00000
-5 0.03699 host ceph06
2 hdd 0.01900 osd.2 up 1.00000 1.00000
3 hdd 0.01900 osd.3 up 1.00000 1.00000
[root@ceph05 deployceph]# ceph -s
cluster:
id: 176feab8-ca22-47bf-b809-202deac53c6f
health: HEALTH_WARN
crush map has straw_calc_version=0
services:
mon: 1 daemons, quorum ceph05
mgr: ceph05(active)
mds: cephfs-1/1/1 up {0=ceph05=up:active}
osd: 4 osds: 4 up, 4 in
data:
pools: 10 pools, 304 pgs
objects: 918 objects, 2.60GiB
usage: 6.70GiB used, 71.3GiB / 78.0GiB avail
pgs: 304 active+clean
2.2 Preparing Test Data
Mount CephFS with the kernel client:
[root@ceph05 deployceph]# mount -t ceph 192.168.10.30:/ /cephfs
[root@ceph05 deployceph]# df -h|grep ceph
···
192.168.10.30:/ 78G 6.8G 72G 9% /cephfs
···
Write some data. I used a few representative file types: txt, jpg, png, pdf, Word, and Excel:
[root@ceph05 deployceph]# ll /cephfs/
total 5912
-rw-r--r-- 1 root root 31232 Mar 15 12:18 111.doc
-rw-r--r-- 1 root root 20593 Mar 15 12:18 22.xlsx
-rw-r--r-- 1 root root 12494 Mar 15 12:17 5be23a3eec2c0.png
-rw-r--r-- 1 root root 3189 Mar 15 12:17 cmap.txt
-rw-r--r-- 1 root root 5985243 Mar 15 12:17 hello0.pdf
3. Simulating the Failure
Simulate complete metadata loss by deleting every object in the metadata pool:
[root@ceph05 deployceph]# rados -p metadata ls|xargs -i rados -p metadata rm {}
[root@ceph05 deployceph]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
78.0GiB 71.3GiB 6.71GiB 8.60
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
.rgw.root 1 1.09KiB 0 63.3GiB 4
default.rgw.control 2 0B 0 63.3GiB 8
default.rgw.meta 3 720B 0 63.3GiB 5
default.rgw.log 4 0B 0 63.3GiB 207
default.rgw.buckets.index 5 0B 0 63.3GiB 1
default.rgw.buckets.data 6 1.02KiB 0 63.3GiB 2
pool01 7 2.60GiB 3.94 63.3GiB 666
rbd 8 36B 0 63.3GiB 4
metadata 17 0B 0 63.3GiB 0
data 18 5.77MiB 0 63.3GiB 6
The metadata pool now contains no objects. Because the MDS caches metadata in memory, restart it to see the effect:
[root@ceph05 deployceph]# systemctl restart ceph-mds@ceph05
[root@ceph05 deployceph]# ceph -s
cluster:
id: 176feab8-ca22-47bf-b809-202deac53c6f
health: HEALTH_WARN
1 filesystem is degraded
1 filesystem has a failed mds daemon
crush map has straw_calc_version=0
services:
mon: 1 daemons, quorum ceph05
mgr: ceph05(active)
mds: cephfs-0/1/1 up , 1 failed
osd: 4 osds: 4 up, 4 in
data:
pools: 10 pools, 304 pgs
objects: 905 objects, 2.60GiB
usage: 6.71GiB used, 71.3GiB / 78.0GiB avail
pgs: 304 active+clean
The cluster is now unhealthy, and any access to the kernel-client mount hangs: the data can no longer be read through CephFS.
4. Recovery
Recovery uses a Python script I wrote (full source at the end of this article). Put the script on any node in the cluster and run it:
[root@ceph05 rcy]# python recovery_cephfs.py -p data
The -p option selects the CephFS data pool. When the run finishes, the current directory contains two new directories and the script's log file.
[root@ceph05 rcy]# ll
total 16
-rw-r--r-- 1 root root 3826 Mar 15 13:57 recovery_cephfs.py
drwxr-xr-x 2 root root 120 Mar 15 13:57 recoveryfiles
-rw-r--r-- 1 root root 4804 Mar 15 13:57 recovery.log
drwxr-xr-x 2 root root 4096 Mar 15 13:57 recoveryobjs
The recovered files are under the recoveryfiles directory:
[root@ceph05 rcy]# ll recoveryfiles/
total 12364
-rw-r--r-- 1 root root 5985243 Mar 15 13:57 10000000000-pdf
-rw-r--r-- 1 root root 3189 Mar 15 13:57 10000000001-text
-rw-r--r-- 1 root root 12494 Mar 15 13:57 10000000002-png
-rw-r--r-- 1 root root 31232 Mar 15 13:57 10000000003-text
-rw-r--r-- 1 root root 20593 Mar 15 13:57 10000000004-excel
Each recovered file is named "<the file's inode in CephFS>-<its probable type>". The type suffix tells you which application to open the file with, so you can verify it is intact.
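If a known-good copy of a file still exists somewhere (for example a backup of the source), comparing checksums confirms the recovery is byte-exact. A minimal sketch; the recovered file name matches the listing above, while backup/hello0.pdf is a hypothetical known-good copy:

# coding: utf-8
# Compare a recovered file against a known-good copy by MD5.
import hashlib

def md5sum(path, chunk_size=4 * 1024 * 1024):
    """Hash a file in 4 MiB chunks so large files do not exhaust memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# 'backup/hello0.pdf' is a hypothetical path; adjust for your environment
print(md5sum('recoveryfiles/10000000000-pdf') == md5sum('backup/hello0.pdf'))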
5. Summary
As long as no objects are lost from the data pool, the data can be recovered in full even when the CephFS metadata is completely destroyed. The recovery approach (illustrated by the sketch after this list):
- fetch the objects from the data pool
- group each file's objects by inode
- concatenate the objects in order
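This works because of how CephFS names its data objects: each object is named "<inode in hex>.<stripe index in hex>", so 10000000000.00000000 is the first stripe of inode 0x10000000000, and with the default layout each object holds up to 4 MiB. Grouping object names by inode and sorting by stripe index therefore rebuilds each file. A minimal sketch of that grouping, with object names taken from this experiment:

# coding: utf-8
# Minimal sketch of the grouping step. Assumes the default CephFS object
# naming "<inode-hex>.<stripe-index-hex>", e.g. "10000000000.00000000".
from collections import defaultdict

# e.g. the output of `rados -p data ls`
object_names = [
    '10000000000.00000000', '10000000000.00000001',
    '10000000001.00000000',
]

files = defaultdict(list)
for name in object_names:
    inode, stripe = name.split('.')
    files[inode].append(name)

for inode, parts in sorted(files.items()):
    # stripe indices are fixed-width hex, so a plain string sort
    # puts the objects in the order they must be concatenated
    parts.sort()
    print('%s -> %s' % (inode, parts))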
Notes on using the script:
- it currently recognizes only txt, jpg, png, pdf, Word, and Excel; other types have to be added to the script yourself (see the sketch after this list)
- it only works with replicated pools
- for very large amounts of data the script is impractical, but the same approach can be followed to recover files one by one
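On the first point: adding a new type means adding one more keyword branch matched against the output of file(1). A table-driven variant of the script's get_file_type() makes that a one-line change; the zip entry below is a hypothetical addition, the rest mirrors the script:

# coding: utf-8
# Table-driven variant of get_file_type(); add a (keyword, type) pair per
# new file type. 'zip' is a hypothetical addition: file(1) reports zip
# archives as "Zip archive data".
FILE_TYPE_KEYWORDS = [
    ('word', 'word'),
    ('excel', 'excel'),
    ('pdf', 'pdf'),
    ('zip', 'zip'),
    ('text', 'text'),
    ('jpeg', 'jpg'),
    ('png', 'png'),
]

def guess_type(file_out):
    """Map the output of file(1) to a type suffix; default to 'text'."""
    out = file_out.lower()
    for keyword, file_type in FILE_TYPE_KEYWORDS:
        if keyword in out:
            return file_type
    return 'text'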
6. The Script
# coding: utf-8
import os
import shutil
import json
import sys
import subprocess
import logging
import argparse

__auth__ = 'ypdai'

logging.basicConfig(filename='./recovery.log', format='%(asctime)s : %(levelname)s %(message)s',
                    level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
RECOVERY_OBJ_DIR = os.path.join(BASE_DIR, 'recoveryobjs')
RECOVERY_FILE_DIR = os.path.join(BASE_DIR, 'recoveryfiles')

def exec_cmd(cmd):
    """
    Run a shell command; return its combined stdout/stderr and exit code.
    :param cmd:
    :return:
    """
    logging.info('exec_cmd():: cmd: {}'.format(cmd))
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # communicate() drains the pipe while waiting; wait() followed by read()
    # can deadlock once the command's output fills the pipe buffer
    out, _ = p.communicate()
    code = p.returncode
    logging.info('exec_cmd():: cmd exec out: {}, code: {}'.format(out, code))
    return out, code

def prepare(pool_name):
    """
    Prepare for recovery:
    1. check that the given pool name is a CephFS data pool
    2. (re)create the recoveryobjs and recoveryfiles directories
    :param pool_name:
    :return:
    """
    cmd = 'ceph fs ls -f json-pretty'
    out, code = exec_cmd(cmd)
    out = json.loads(out)
    # the pool must be a data pool of at least one filesystem
    if not any(pool_name in fs.get('data_pools', []) for fs in out):
        return False
    if os.path.isdir(RECOVERY_OBJ_DIR):
        shutil.rmtree(RECOVERY_OBJ_DIR)
    os.mkdir(RECOVERY_OBJ_DIR)
    if os.path.isdir(RECOVERY_FILE_DIR):
        shutil.rmtree(RECOVERY_FILE_DIR)
    os.mkdir(RECOVERY_FILE_DIR)
    return True

def get_file_type(file_path):
    """Guess a file's type from the output of file(1); default to text."""
    cmd = 'file %s' % file_path
    out, code = exec_cmd(cmd)
    out = out.split(':')[-1].lower()
    file_type = 'text'
    if 'word' in out:
        file_type = 'word'
    elif 'excel' in out:
        file_type = 'excel'
    elif 'pdf' in out:
        file_type = 'pdf'
    elif 'text' in out:
        file_type = 'text'
    elif 'jpeg' in out:
        file_type = 'jpg'
    elif 'png' in out:
        file_type = 'png'
    return file_type

def do_recovery(pool_name):
    """
    Do the actual recovery:
    1. download every object from the data pool
    2. for each file, locate its head object, then append the file's
       remaining objects to it in stripe order
    3. guess the file's real type from the head object's content
    :param pool_name:
    :return:
    """
    cmd = 'for obj in $(rados -p %s ls);do rados -p %s get ${obj} %s/${obj};done' % (
        pool_name, pool_name, RECOVERY_OBJ_DIR)
    out, code = exec_cmd(cmd)
    if code != 0:
        logging.error('do_recovery():: get obj from rados failed.')
        return
    cmd = 'ls %s' % RECOVERY_OBJ_DIR
    out, code = exec_cmd(cmd)
    if code != 0:
        logging.error('do_recovery():: list obj failed.')
        return
    done_lst = []
    objects = out.split()
    for obj in objects:
        inode, number = obj.split('.')
        if inode in done_lst:
            continue
        # stripe indices are fixed-width hex, so a plain sort yields stripe order
        cmd = '''ls -l %s | awk '{print $NF}' | grep ^%s |sort''' % (RECOVERY_OBJ_DIR, inode)
        out, code = exec_cmd(cmd)
        files = out.split('\n')
        head_file = files[0]
        file_type = get_file_type('%s/%s' % (RECOVERY_OBJ_DIR, head_file))
        cmd = 'cp %s/%s %s/%s-%s' % (RECOVERY_OBJ_DIR, head_file, RECOVERY_FILE_DIR, inode, file_type)
        out, code = exec_cmd(cmd)
        for f in files[1:]:
            if not f:
                continue
            cmd = 'cat %s/%s >> %s/%s-%s' % (RECOVERY_OBJ_DIR, f, RECOVERY_FILE_DIR, inode, file_type)
            out, code = exec_cmd(cmd)
        done_lst.append(inode)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-p', '--pool', required=True, type=str, dest='pool',
                        help='select given cephfs data pool by name')
    args = parser.parse_args()
    if not prepare(args.pool):
        logging.error('main():: invalid pool name.')
        sys.exit(1)
    logging.info('=== main():: recovery start')
    do_recovery(args.pool)
    logging.info('=== main():: recovery done')