Ramblings of Linux openstack & ceph

Ansible - Rolling Ceph Update


I run a PB-scale Ceph cluster providing RBD and RGW access. We are used to Ceph's point-release updates and to the advice from Sage to update existing clusters to the new version as soon as possible.

I have a lot of experience upgrading Ceph clusters, all the way from Firefly to the current Hammer. The process has always been to update the mons, followed by the OSDs, and finally the clients (in my case the RGWs and OpenStack clients). Doing the updates manually was time-consuming and very dull. Ansible to the rescue: with the playbook below we were able to upgrade a PB-scale cluster from 0.94.5 to 0.94.6 in just over an hour, fully unattended.

- hosts: all
  tasks:
  - name: Update packages
    apt: upgrade=dist update_cache=yes

- hosts: mons
  serial: 1
  tasks:

  - name: Restart ceph-mon service
    service: >
      name=ceph-mon-all
      state=restarted

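  # The pattern after egrep was lost from the original listing;
  # {{ ansible_hostname }} is an assumption about what it contained.
  - name: Waiting for the monitor to join the quorum
    shell: >
      ceph -s | grep quorum | head -n1 | egrep -sq {{ ansible_hostname }}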
    register: result
    until: result.rc == 0
    retries: 5
    delay: 10

- hosts: osd 
  serial: 1
  tasks:

  - name: Set OSD maintenance flags
    command: ceph osd set {{ item }}
    with_items:
      - noout
      - noscrub
      - nodeep-scrub
    delegate_to: ceph-mon-1

  - name: Waiting for clean PGs pre restart
    shell: >
      test "$(ceph pg stat | sed 's/^.*pgs://;s/active+clean.*//;s/ //')" -eq "$(ceph pg stat | sed 's/pgs.*//;s/^.*://;s/ //')" && ceph health | egrep -sq "HEALTH_OK|HEALTH_WARN"
    register: result
    until: result.rc == 0
    retries: 300
    delay: 10
    delegate_to: ceph-mon-1

  - name: Restart OSD processes
    service: >
      name=ceph-osd-all
      state=restarted

  - name: Waiting for clean PGs post restart
    shell: >
      test "$(ceph pg stat | sed 's/^.*pgs://;s/active+clean.*//;s/ //')" -eq "$(ceph pg stat | sed 's/pgs.*//;s/^.*://;s/ //')" && ceph health | egrep -sq "HEALTH_OK|HEALTH_WARN"
    register: result
    until: result.rc == 0
    retries: 300
    delay: 10
    delegate_to: ceph-mon-1

  - name: Unset OSD maintenance flags
    command: ceph osd unset {{ item }}
    with_items:
      - noout
      - noscrub
      - nodeep-scrub
    delegate_to: ceph-mon-1

- hosts: radosgw
  serial: 1
  tasks:
  - name: Restart rgw service after upgrade
    service: >
      name=radosgw-all
      state=restarted

  - name: Wait for the rgw service to be running
    # TODO - replace this with a until: service status=running?
    shell: >
      pgrep radosgw
    register: result
    until: result.rc == 0
    retries: 100
    delay: 10

- hosts: loadbalancers
  serial: 1
  tasks:
  - name: Restart Load Balancers after the upgrade
    service: >
      name=haproxy
      state=restarted

  - name: Wait for the haproxy service to be running
    shell: >
      pgrep haproxy
    register: result
    until: result.rc == 0
    retries: 100
    delay: 10
