Unverified commit 221dc3ec authored by travisn, committed by Jared Watts

docs: User docs/guides updates for multiple storage type support

Signed-off-by: travisn <tnielsen@redhat.com>
parent 3a2a2d20
Showing with 336 additions and 278 deletions
@@ -22,27 +22,27 @@ storage cluster.
Most of the examples make use of the `ceph` client command. A quick way to use
the Ceph client suite is from a [Rook Toolbox container](toolbox.md).
The Kubernetes based examples assume Rook OSD pods are in the `rook-ceph` namespace.
If you run them in a different namespace, modify `kubectl -n rook-ceph [...]` to fit
your situation.
## Log Collection
All Rook logs can be collected in a Kubernetes environment with the following command:
```bash
(for p in $(kubectl -n rook-ceph get pods -o jsonpath='{.items[*].metadata.name}')
do
for c in $(kubectl -n rook-ceph get pod ${p} -o jsonpath='{.spec.containers[*].name}')
do
echo "BEGIN logs from pod: ${p} ${c}"
kubectl -n rook-ceph logs -c ${c} ${p}
echo "END logs from pod: ${p} ${c}"
done
done
for i in $(kubectl -n rook-ceph-system get pods -o jsonpath='{.items[*].metadata.name}')
do
echo "BEGIN logs from pod: ${i}"
kubectl -n rook-ceph-system logs ${i}
echo "END logs from pod: ${i}"
done) | gzip > /tmp/rook-logs.gz
```
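Sections in the resulting archive are delimited by the `BEGIN`/`END` marker lines that the loop above emits, so one pod's logs can be sliced back out with `sed`. A sketch against sample marker lines (the pod names are hypothetical):

```shell
# Build a tiny sample using the same BEGIN/END markers the collector emits,
# then slice out just the rook-ceph-mon0 section (pod names are hypothetical).
printf '%s\n' \
  'BEGIN logs from pod: rook-ceph-mon0 mon' \
  'mon log line 1' \
  'END logs from pod: rook-ceph-mon0 mon' \
  'BEGIN logs from pod: rook-ceph-osd-x7k osd' \
  'osd log line 1' > sample.log

sed -n '/^BEGIN logs from pod: rook-ceph-mon0/,/^END logs from pod: rook-ceph-mon0/p' sample.log
```

On a real archive, pipe through `zcat /tmp/rook-logs.gz` first and apply the same `sed` filter.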
@@ -60,14 +60,14 @@ difficult. The following scripts will clear things up quickly.
# Get OSD Pods
# This uses the example/default cluster name "rook-ceph"
OSD_PODS=$(kubectl get pods --all-namespaces -l \
app=rook-ceph-osd,rook_cluster=rook-ceph -o jsonpath='{.items[*].metadata.name}')
# Find node and drive associations from OSD pods
for pod in $(echo ${OSD_PODS})
do
echo "Pod: ${pod}"
echo "Node: $(kubectl -n rook-ceph get pod ${pod} -o jsonpath='{.spec.nodeName}')"
kubectl -n rook-ceph exec ${pod} -- sh -c '\
for i in /var/lib/rook/osd*; do
[ -f ${i}/ready ] || continue
echo -ne "-$(basename ${i}) "
@@ -213,7 +213,7 @@ ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
Now we have a separate storage group for our SSDs, but we can't use that storage
until we associate a pool with it. The default group already has a pool called
`rbd` in many cases. If you [created a pool via CustomResourceDefinition](ceph-pool-crd.md),
it will use the default storage group as well.
Here's how to create new pools:
@@ -280,7 +280,7 @@ and OSDs in the `default` root hierarchy.
The `size` setting of a pool tells the cluster how many copies of the data
should be kept for redundancy. By default the cluster will distribute these
copies between `host` buckets in the CRUSH Map. This can be set when [creating a
pool via CustomResourceDefinition](ceph-pool-crd.md) or after creation with `ceph`.
So for example let's change the `size` of the `rbd` pool to three:
@@ -330,12 +330,12 @@ The default override settings are blank. Cutting out the extraneous properties,
we would see the following defaults after creating a cluster:
```bash
$ kubectl -n rook-ceph get ConfigMap rook-config-override -o yaml
kind: ConfigMap
apiVersion: v1
metadata:
name: rook-config-override
namespace: rook-ceph
data:
config: ""
```
@@ -345,7 +345,7 @@ The next time the daemon pod(s) start, the settings will be merged with the default
settings created by Rook.
```bash
kubectl -n rook-ceph edit configmap rook-config-override
```
Modify the settings and save. Each line you add should be indented from the `config` property as such:
@@ -355,7 +355,7 @@ apiVersion: v1
kind: ConfigMap
metadata:
name: rook-config-override
namespace: rook-ceph
data:
config: |
[global]
......
@@ -14,16 +14,16 @@ This guide assumes you have created a Rook cluster as explained in the main [Qui
## Provision Storage
Before Rook can start provisioning storage, a StorageClass and its storage pool need to be created. This is needed for Kubernetes to interoperate with Rook for provisioning persistent volumes. For more options on pools, see the documentation on [creating storage pools](ceph-pool-crd.md).
Save this storage class definition as `storageclass.yaml`:
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: Pool
metadata:
name: replicapool
namespace: rook-ceph
spec:
replicated:
size: 3
@@ -31,15 +31,16 @@ spec:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
pool: replicapool
clusterName: rook-ceph
```
Create the storage class.
```bash
kubectl create -f storageclass.yaml
```
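A pod can then request storage from this class through a standard PersistentVolumeClaim. A minimal sketch, assuming the `rook-ceph-block` class created above (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim        # illustrative name
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi        # illustrative size
```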
## Consume the storage: Wordpress sample
@@ -63,7 +64,7 @@ mysql-pv-claim Bound pvc-95402dbc-efc0-11e6-bc9a-0cc47a3459ee 20Gi
wp-pv-claim Bound pvc-39e43169-efc1-11e6-bc9a-0cc47a3459ee 20Gi RWO 1m
```
Once the wordpress and mysql pods are in the `Running` state, get the cluster IP of the wordpress app and enter it in your browser:
```bash
$ kubectl get svc wordpress
@@ -73,6 +74,12 @@ wordpress 10.3.0.155 <pending> 80:30841/TCP 2m
You should see the wordpress app running.
If you are using Minikube, the Wordpress URL can be retrieved with this one-line command:
```console
echo http://$(minikube ip):$(kubectl get service wordpress -o jsonpath='{.spec.ports[0].nodePort}')
```
**NOTE:** When running in a vagrant environment, there will be no external IP address to reach wordpress with. You will only be able to reach wordpress via the `CLUSTER-IP` from inside the Kubernetes cluster.
## Consume the storage: Toolbox
@@ -86,6 +93,6 @@ To clean up all the artifacts created by the block demo:
```
kubectl delete -f wordpress.yaml
kubectl delete -f mysql.yaml
kubectl delete -n rook-ceph pool replicapool
kubectl delete storageclass rook-ceph-block
```
---
title: Ceph Cluster
weight: 32
indent: true
---
# Ceph Cluster CRD
Rook allows creation and customization of storage clusters through the custom resource definitions (CRDs). The following settings are
available for a Ceph cluster.
## Settings
@@ -25,7 +25,8 @@ Settings can be specified at the global level to apply to the cluster as a whole
- If a path is not specified, an [empty dir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) will be used and the config will be lost when the pod or host is restarted. This option is **not recommended**.
- **WARNING**: For test scenarios, if you delete a cluster and start a new cluster on the same hosts, the path used by `dataDirHostPath` must be deleted. Otherwise, stale keys and other config will remain from the previous cluster and the new mons will fail to start.
If this value is empty, each pod will get an ephemeral directory to store their config files that is tied to the lifetime of the pod running on that node. More details can be found in the Kubernetes [empty dir docs](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).
- `network`: The network settings for the cluster
- `hostNetwork`: uses network of the hosts instead of using the SDN below the containers.
- `monCount`: set the number of mons to be started. The number should be odd and between `1` and `9`. Default if not specified is `3`.
For more details on the mons and when to choose a number other than `3`, see the [mon health design doc](https://github.com/rook/rook/blob/master/design/mon-health.md).
- `placement`: [placement configuration settings](#placement-configuration-settings)
@@ -39,7 +40,7 @@ For more details on the mons and when to choose a number other than `3`, see the
- [storage configuration settings](#storage-configuration-settings)
#### Node updates
Nodes can be added and removed over time by updating the Cluster CRD, for example with `kubectl -n rook-ceph edit cluster rook-ceph`.
This will bring up your default text editor and allow you to add and remove storage nodes from the cluster.
This feature is only available when `useAllNodes` has been set to `false`.
@@ -48,14 +49,13 @@ This feature is only available when `useAllNodes` has been set to `false`.
In addition to the cluster level settings specified above, each individual node can also specify configuration to override the cluster level settings and defaults.
If a node does not specify any configuration then it will inherit the cluster level settings.
- `name`: The name of the node, which should match its `kubernetes.io/hostname` label.
- `config`: Config settings applied to all OSDs on the node unless overridden by `devices` or `directories`. See the [config settings](#osd-configuration-settings) below.
- [storage selection settings](#storage-selection-settings)
- [storage configuration settings](#storage-configuration-settings)
### Storage Selection Settings
Below are the settings available, both at the cluster and individual node level, for selecting which storage resources will be included in the cluster.
- `useAllDevices`: `true` or `false`, indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs. **Not recommended** unless you have a very controlled environment where you will not risk formatting of devices with existing data. When `true`, all devices will be used except those with partitions created or a local filesystem. Is overridden by `deviceFilter` if specified.
- `deviceFilter`: A regular expression that allows selection of devices to be consumed by OSDs. If individual devices have been specified for a node then this filter will be ignored. This field uses [golang regular expression syntax](https://golang.org/pkg/regexp/syntax/). For example:
- `sdb`: Only selects the `sdb` device if found
@@ -63,19 +63,22 @@ Below are the settings available, both at the cluster and individual node level,
- `^sd[a-d]`: Selects devices starting with `sda`, `sdb`, `sdc`, and `sdd` if found
- `^s`: Selects all devices that start with `s`
- `^[^r]`: Selects all devices that do *not* start with `r`
- `devices`: A list of individual device names belonging to this node to include in the storage cluster.
- `name`: The name of the device (e.g., `sda`).
- `config`: Device-specific config settings. See the [config settings](#osd-configuration-settings) below.
- `directories`: A list of directory paths that will be included in the storage cluster. Note that using two directories on the same physical device can cause a negative performance impact.
- `path`: The path on disk of the directory (e.g., `/rook/storage-dir`).
- `config`: Directory-specific config settings. See the [config settings](#osd-configuration-settings) below.
- `location`: Location information about the cluster to help with data placement, such as region or data center. This is directly fed into the underlying Ceph CRUSH map. More information on CRUSH maps can be found in the [ceph docs](http://docs.ceph.com/docs/master/rados/operations/crush-map/).
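The `deviceFilter` patterns above can be sanity-checked locally before applying them to a cluster. A sketch over a hypothetical device list, using `grep -E` as an approximation of golang regexp syntax for patterns this simple:

```shell
# Hypothetical device names on a node; check which ones each
# deviceFilter pattern would select.
devices="sda sdb sdc sdd sde rbd0 nvme0n1"

match() {
  # print the names from $devices selected by the extended-regex pattern $1
  out=""
  for d in $devices; do
    echo "$d" | grep -qE "$1" && out="$out $d"
  done
  echo "${out# }"
}

match '^sd[a-d]'   # sda sdb sdc sdd
match '^[^r]'      # everything except rbd0
```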
### OSD Configuration Settings
The following storage selection settings are specific to Ceph and do not apply to other backends. All variables are key-value pairs represented as strings.
- `metadataDevice`: Name of a device to use for the metadata of OSDs on each node. Performance can be improved by using a low latency device (such as SSD or NVMe) as the metadata device, while other spinning platter (HDD) devices on a node are used to store data.
- `storeType`: `filestore` or `bluestore`, the underlying storage format to use for each OSD. The default is set dynamically to `bluestore` for devices, while `filestore` is the default for directories. Set this store type explicitly to override the default. Warning: Bluestore is **not** recommended for directories in production. Bluestore does not purge data from the directory and over time will grow without the ability to compact or shrink.
- `databaseSizeMB`: The size in MB of a bluestore database. Include quotes around the size.
- `walSizeMB`: The size in MB of a bluestore write ahead log (WAL). Include quotes around the size.
- `journalSizeMB`: The size in MB of a filestore journal. Include quotes around the size.
### Placement Configuration Settings
@@ -119,13 +122,13 @@ For more information on resource requests/limits see the official Kubernetes doc
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
# cluster level storage configuration and selection
@@ -133,11 +136,11 @@ spec:
useAllNodes: true
useAllDevices: true
deviceFilter:
location:
config:
metadataDevice:
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
```
### Storage Configuration: Specific devices
@@ -149,13 +152,13 @@ Each node's 'name' field should match their 'kubernetes.io/hostname' label.
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
# cluster level storage configuration and selection
@@ -163,11 +166,11 @@ spec:
useAllNodes: false
useAllDevices: false
deviceFilter:
location:
config:
metadataDevice:
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
nodes:
- name: "172.17.4.101"
directories: # specific directories to use for storage can be specified for each node
@@ -176,7 +179,7 @@ spec:
devices: # specific devices to use for storage can be specified for each node
- name: "sdb"
- name: "sdc"
config: # configuration can be specified at the node level which overrides the cluster level config
storeType: bluestore
- name: "172.17.4.301"
deviceFilter: "^sd."
@@ -191,22 +194,22 @@ Individual nodes can override the cluster wide specified directories list.
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
# cluster level storage configuration and selection
storage:
useAllNodes: false
useAllDevices: false
config:
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
directories:
- path: "/rook/storage-dir"
nodes:
@@ -227,13 +230,13 @@ tolerate taints with a key of 'storage-node'.
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
placement:
@@ -271,13 +274,13 @@ You can override these requests/limits for OSDs per node when using `useAllNodes
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
# cluster level resource requests/limits configuration
......
---
title: Ceph Shared File System
weight: 38
indent: true
---
# Ceph Shared File System CRD
Rook allows creation and customization of shared file systems through the custom resource definitions (CRDs). The following settings are available
for Ceph file systems.
## Sample
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: Filesystem
metadata:
name: myfs
namespace: rook-ceph
spec:
metadataPool:
replicated:
@@ -60,7 +60,7 @@ spec:
### Pools
The pools allow all of the settings defined in the Pool CRD spec. For more details, see the [Pool CRD](ceph-pool-crd.md) settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.
- `metadataPool`: The settings used to create the file system metadata pool. Must use replication.
- `dataPools`: The settings to create the file system data pools. If multiple pools are specified, Rook will add the pools to the file system. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
@@ -71,5 +71,5 @@ The metadata server settings correspond to the MDS daemon settings.
- `activeCount`: The number of active MDS instances. As load increases, CephFS will automatically partition the file system across the MDS instances. Rook will create double the number of MDS instances as requested by the active count. The extra instances will be in standby mode for failover.
- `activeStandby`: If true, the extra MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. The instances will be assigned by CephFS in failover pairs. If false, the extra MDS instances will all be on passive standby mode and will not maintain a warm cache of the metadata.
- `placement`: The mds pods can be given standard Kubernetes placement restrictions with `nodeAffinity`, `tolerations`, `podAffinity`, and `podAntiAffinity` similar to placement defined for daemons configured by the [cluster CRD](/cluster/examples/kubernetes/ceph/cluster.yaml).
- `resources`: Set resource requests/limits for the Filesystem MDS Pod(s), see [Resource Requirements/Limits](ceph-cluster-crd.md#resource-requirementslimits).
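Taken together, these settings form the metadata server portion of the Filesystem spec. A sketch with illustrative values, assuming the `metadataServer` block of this CRD:

```yaml
spec:
  metadataServer:
    activeCount: 1       # 1 active MDS; Rook also creates 1 extra instance
    activeStandby: true  # the extra instance keeps a warm metadata cache
```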
---
title: Ceph Object Store
weight: 36
indent: true
---
# Ceph Object Store CRD
Rook allows creation and customization of object stores through the custom resource definitions (CRDs). The following settings are available
for Ceph object stores.
## Sample
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: ObjectStore
metadata:
name: my-store
namespace: rook-ceph
spec:
metadataPool:
replicated:
@@ -64,7 +64,7 @@ spec:
### Pools
The pools allow all of the settings defined in the Pool CRD spec. For more details, see the [Pool CRD](ceph-pool-crd.md) settings. In the example above, there must be at least three hosts (size 3) and at least three devices (2 data + 1 coding chunks) in the cluster.
- `metadataPool`: The settings used to create all of the object store metadata pools. Must use replication.
- `dataPool`: The settings to create the object store data pool. Can use replication or erasure coding.
@@ -80,4 +80,4 @@ The gateway settings correspond to the RGW daemon settings.
- `instances`: The number of pods that will be started to load balance this object store. Ignored if `allNodes` is true.
- `allNodes`: Whether RGW pods should be started on all nodes. If true, a daemonset is created. If false, `instances` must be set.
- `placement`: The Kubernetes placement settings to determine where the RGW pods should be started in the cluster.
- `resources`: Set resource requests/limits for the Gateway Pod(s), see [Resource Requirements/Limits](ceph-cluster-crd.md#resource-requirementslimits).
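These settings sit under the `gateway` block of the ObjectStore spec. A sketch with illustrative values:

```yaml
spec:
  gateway:
    port: 80          # http port for the RGW service
    instances: 1      # ignored when allNodes is true
    allNodes: false   # set true to run RGW as a daemonset on every node
```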
---
title: Ceph Pool
weight: 34
indent: true
---
# Ceph Pool CRD
Rook allows creation and customization of storage pools through the custom resource definitions (CRDs). The following settings are available
for pools.
@@ -12,11 +12,11 @@ for pools.
## Sample
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: Pool
metadata:
name: ecpool
namespace: rook-ceph
spec:
replicated:
# size: 3
......
@@ -28,18 +28,18 @@ There are two main categories of information you will need to investigate issues
## Kubernetes Tools
Kubernetes status is the first line of investigating when something goes wrong with the cluster. Here are a few artifacts that are helpful to gather:
- Rook pod status:
- `kubectl get pod -n rook-ceph -o wide`
- `kubectl get pod -n rook-ceph-system -o wide`
- Logs for Rook pods
- Logs for the operator: `kubectl logs -n rook-ceph-system -l app=rook-operator`
- Logs for a specific pod: `kubectl logs -n rook-ceph <pod-name>`, or a pod using a label such as mon1: `kubectl logs -n rook-ceph -l mon=rook-ceph-mon1`
- Logs on a specific node to find why a PVC is failing to mount:
- Rook agent errors around the attach/detach: `kubectl logs -n rook-ceph-system <rook-ceph-agent-pod>`
- Connect to the node, then get kubelet logs (if your distro is using systemd): `journalctl -u kubelet`
- See the [log collection topic](advanced-configuration.md#log-collection) for a script that will help you gather the logs
- Other Rook artifacts:
- The monitors that are expected to be in quorum: `kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml | grep data`
- More artifacts in the `rook-ceph` namespace: `kubectl -n rook-ceph get all`
## Ceph Tools
After you verify the basic health of the running pods, next you will want to run Ceph tools for status of the storage components. There are two ways to run the Ceph tools, either in the Rook toolbox or inside other Rook pods that are already running.
@@ -54,7 +54,7 @@ After you verify the basic health of the running pods, next you will want to run
The Ceph tools are found in all of the Rook pods where Ceph is running, such as the operator, monitors, and OSDs. Rather than starting the toolbox pod, you can connect to the existing pods to more quickly execute the Ceph tools. For example, to connect to the operator pod:
```bash
kubectl -n rook-ceph-system exec -it $(kubectl -n rook-ceph-system get pods -l app=rook-operator -o jsonpath='{.items[0].metadata.name}') -- bash
```
Now from inside the operator pod you can execute the Ceph tools.
@@ -83,13 +83,13 @@ There are many Ceph sub-commands to look at and manipulate Ceph objects, well be
* `kubectl describe pod` for the pod mentions one or more of the following:
* `PersistentVolumeClaim is not bound`
* `timeout expired waiting for volumes to attach/mount`
* `kubectl -n rook-ceph-system get pod` shows the rook-ceph-agent pods in a `CrashLoopBackOff` status
## Possible Solutions Summary
* `rook-ceph-agent` pod is in a `CrashLoopBackOff` status because it cannot deploy its driver on a read-only filesystem: [Flexvolume configuration pre-reqs](./k8s-pre-reqs.md#flexvolume-configuration)
* Persistent Volume and/or Claim are failing to be created and bound: [Volume Creation](#volume-creation)
* `rook-ceph-agent` pod is failing to mount and format the volume: [Rook Agent Mounting](#volume-mounting)
* You are using Kubernetes 1.7.x or earlier and the Kubelet has not been restarted after `rook-ceph-agent` is in the `Running` status: [Restart Kubelet](#kubelet-restart)
## Investigation Details
If you see some of the symptoms above, it's because the requested Rook storage for your pod is not being created and mounted successfully.
@@ -113,38 +113,38 @@ Events:
To troubleshoot this, let's walk through the volume provisioning steps in order to confirm where the failure is happening.
### Rook Agent Deployment
The `rook-ceph-agent` pods are responsible for mapping and mounting the volume from the cluster onto the node that your pod will be running on.
If the `rook-ceph-agent` pod is not running then it cannot perform this function.
Below is an example of the `rook-ceph-agent` pods failing to get to the `Running` status because they are in a `CrashLoopBackOff` status:
```console
> kubectl -n rook-ceph-system get pod
NAME READY STATUS RESTARTS AGE
rook-ceph-agent-ct5pj 0/1 CrashLoopBackOff 16 59m
rook-ceph-agent-zb6n9 0/1 CrashLoopBackOff 16 59m
rook-operator-2203999069-pmhzn 1/1 Running 0 59m
```
If you see this occurring, you can get more details about why the `rook-ceph-agent` pods are continuing to crash with the following command and its sample output:
```console
> kubectl -n rook-ceph-system get pod -l app=rook-ceph-agent -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.message}{"\n"}{end}'
rook-ceph-agent-ct5pj mkdir /usr/libexec/kubernetes: read-only file system
rook-ceph-agent-zb6n9 mkdir /usr/libexec/kubernetes: read-only file system
```
From the output above, we can see that the agents were not able to bind mount to `/usr/libexec/kubernetes` on the host they are scheduled to run on.
For some environments, this default path is read-only and therefore a better path must be provided to the agents.
First, clean up the agent deployment with:
```console
kubectl -n rook-ceph-system delete daemonset rook-ceph-agent
```
Once the `rook-ceph-agent` pods are gone, **follow the instructions in the [Flexvolume configuration pre-reqs](./k8s-pre-reqs.md#flexvolume-configuration)** to ensure a good value for `--volume-plugin-dir` has been provided to the Kubelet.
After that has been configured, and the Kubelet has been restarted, start the agent pods up again by restarting `rook-operator`:
```console
kubectl -n rook-ceph-system delete pod -l app=rook-operator
```
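When checking the Kubelet configuration, it helps to pull the current `--volume-plugin-dir` value out of the Kubelet's command line. Below is a rough sketch; the `KUBELET_ARGS` value is a made-up sample, and on a real node you would capture it with something like `ps -o args= -C kubelet`:

```shell
# Hypothetical Kubelet command line; capture the real one from the node.
KUBELET_ARGS="/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --volume-plugin-dir=/var/lib/kubelet/volumeplugins"

# Split the arguments onto separate lines and extract the flag's value,
# falling back to the Kubelet default when the flag is absent.
plugin_dir=$(printf '%s\n' $KUBELET_ARGS | sed -n 's/^--volume-plugin-dir=//p')
echo "${plugin_dir:-/usr/libexec/kubernetes/kubelet-plugins/volume/exec/}"
```

If the printed directory does not match the value the operator was given, the agents will not be able to register the Flexvolume driver there.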
### Kubelet Restart
#### **Kubernetes 1.7.x only**
If the `rook-ceph-agent` pods are all in the `Running` state then another thing to confirm is that **if you are running on Kubernetes 1.7.x**, the Kubelet must be restarted after the `rook-ceph-agent` pods are running.
A symptom of this can be found in the Kubelet's log/journal, with the following error saying `no volume plugin matched`:
```console
Let's confirm that with the following commands and their output:
```console
> kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-9f273fbc-bdbf-11e7-bc4c-001c428b9fc8 20Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 25m
> kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-9f273fbc-bdbf-11e7-bc4c-001c428b9fc8 20Gi RWO rook-ceph-block 25m
```
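If you want to script this check, the STATUS column can be pulled out of the `kubectl get pvc` output. The sketch below parses a captured sample (taken from the listing above) so it can run without a cluster; in a live cluster you would pipe `kubectl get pvc` directly into the same `awk` step:

```shell
# Captured sample of `kubectl get pvc` output.
pvc_output='NAME             STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS      AGE
mysql-pv-claim   Bound     pvc-9f273fbc-bdbf-11e7-bc4c-001c428b9fc8   20Gi       RWO           rook-ceph-block   25m'

# Print each claim with its status; anything not "Bound" needs investigation.
echo "$pvc_output" | awk 'NR>1 {print $1, $2}'
```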
Both your volume and its claim should be in the `Bound` status.
If either of them is not in the `Bound` status, then look for details of the issue in the `rook-operator` logs:
```console
kubectl -n rook-ceph-system logs `kubectl -n rook-ceph-system -l app=rook-operator get pods -o jsonpath='{.items[*].metadata.name}'`
```
If the volume is failing to be created, there should be details in the `rook-operator` log output, especially those tagged with `op-provisioner`.
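For example, the provisioning activity can be filtered out of the operator log with a simple `grep`. The log lines below are a hypothetical sample; on a live cluster, replace the sample with the `kubectl ... logs` command shown above:

```shell
# Hypothetical sample of operator log output.
log='2018-03-28 18:58:31.000000 I | op-k8sutil: waiting for pod rook-ceph-mon0
2018-03-28 18:58:32.041603 I | op-provisioner: creating volume with configuration {pool:replicapool clusterNamespace:rook-ceph fstype:}'

# Keep only the volume-provisioning lines.
echo "$log" | grep 'op-provisioner'
```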
One common cause for the `rook-operator` failing to create the volume is when the `clusterNamespace` field of the `StorageClass` does not match the namespace of the Rook cluster.
In that scenario, the `rook-operator` log would show a failure similar to the following:
```
2018-03-28 18:58:32.041603 I | op-provisioner: creating volume with configuration {pool:replicapool clusterNamespace:rook-ceph fstype:}
2018-03-28 18:58:32.041728 I | exec: Running command: rbd create replicapool/pvc-fd8aba49-32b9-11e8-978e-08002762c796 --size 20480 --cluster=rook --conf=/var/lib/rook/rook-ceph/rook.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring
E0328 18:58:32.060893 5 controller.go:801] Failed to provision volume for claim "default/mysql-pv-claim" with StorageClass "rook-ceph-block": Failed to create rook block image replicapool/pvc-fd8aba49-32b9-11e8-978e-08002762c796: failed to create image pvc-fd8aba49-32b9-11e8-978e-08002762c796 in pool replicapool of size 21474836480: Failed to complete '': exit status 1. global_init: unable to open config file from search list /var/lib/rook/rook-ceph/rook.config
. output:
```
The solution is to ensure that the [`clusterNamespace`](https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/rook-storageclass.yaml#L25) field matches the **namespace** of the Rook cluster when creating the `StorageClass`.
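A quick way to sketch that check is to compare the `clusterNamespace` parameter of the `StorageClass` with the namespace of the cluster. The two values below are illustrative; on a live cluster the first would come from `kubectl get storageclass <name> -o jsonpath='{.parameters.clusterNamespace}'` and the second from the namespace in your cluster CRD:

```shell
# Illustrative values; fetch the real ones from the StorageClass and cluster CRD.
sc_cluster_namespace="rook-ceph"
cluster_namespace="rook-ceph"

if [ "$sc_cluster_namespace" = "$cluster_namespace" ]; then
  echo "clusterNamespace matches"
else
  echo "mismatch: StorageClass=$sc_cluster_namespace cluster=$cluster_namespace"
fi
```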
### Volume Mounting
The final step in preparing Rook storage for your pod is for the `rook-ceph-agent` pod to mount and format it.
If all the preceding sections have been successful or inconclusive, then take a look at the `rook-ceph-agent` pod logs for further clues.
You can determine which `rook-ceph-agent` is running on the same node that your pod is scheduled on by using the `-o wide` output, then you can get the logs for that `rook-ceph-agent` pod similar to the example below:
```console
> kubectl -n rook-ceph-system get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-agent-h6scx 1/1 Running 0 9m 172.17.8.102 172.17.8.102
rook-ceph-agent-mp7tn 1/1 Running 0 9m 172.17.8.101 172.17.8.101
rook-operator-2203999069-3tb68 1/1 Running 0 9m 10.32.0.7 172.17.8.101
> kubectl -n rook-ceph-system logs rook-ceph-agent-h6scx
2017-10-30 23:07:06.984108 I | rook: starting Rook v0.5.0-241.g48ce6de.dirty with arguments '/usr/local/bin/rook agent'
...
```
In the `rook-ceph-agent` pod logs, you may see a snippet similar to the following:
```console
Failed to complete rbd: signal: interrupt.
```
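A sketch of that kernel check, using `sort -V` to compare version strings (the kernel release is hard-coded here as a sample; on a node you would use `uname -r`):

```shell
# Kernel release to test (sample value); on a node: kernel=$(uname -r)
kernel="4.4.0-87-generic"
required="3.15"

# Strip the distro suffix, then let sort -V order the two versions.
# If the required version sorts first (or equal), the kernel is new enough.
lowest=$(printf '%s\n%s\n' "${kernel%%-*}" "$required" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "kernel $kernel is at least $required"
else
  echo "kernel $kernel is older than $required"
fi
```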
If `uname -a` shows that you have a kernel version older than `3.15`, you'll need to upgrade your kernel to use block storage.
### Filesystem Mounting
In the `rook-ceph-agent` pod logs, you may see a snippet similar to the following:
```console
2017-11-07 00:04:37.808870 I | rook-flexdriver: WARNING: The node kernel version is 4.4.0-87-generic, which do not support multiple ceph filesystems. The kernel version has to be at least 4.7. If you have multiple ceph filesystems, the result could be inconsistent
```
We want to help you get your storage working and learn from those lessons to prevent the same issues for others.
Create a [rook-tools pod](./toolbox.md) to investigate the current state of Ceph. Here is an example of what one might see. In this case the `ceph status` command would just hang, so a CTRL-C needed to be sent.
```console
$ kubectl -n rook-ceph exec -it rook-tools bash
root@rook-tools:/# ceph status
^CCluster connection interrupted or timed out
```
Another indication is when one or more of the MON pods restart frequently. Note the 'mon107' that has only been up for 16 minutes in the following output.
```console
$ kubectl -n rook-ceph get all -o wide --show-all
NAME READY STATUS RESTARTS AGE IP NODE
po/rook-ceph-mgr0-2487684371-gzlbq 1/1 Running 0 17h 192.168.224.46 k8-host-0402
po/rook-ceph-mon107-p74rj 1/1 Running 0 16m 192.168.224.28 k8-host-0402
```

If the first mon is not detected healthy, the operator will continue to check until it becomes healthy again.
### Operator fails to connect to the mon
First look at the logs of the operator to confirm if it is able to connect to the mons.
```
$ kubectl -n rook-ceph-system logs -l app=rook-operator
```
Likely you will see an error similar to the following that the operator is timing out when connecting to the mon. The last command is `ceph mon_status`,
followed by a timeout message five minutes later.
```
2018-01-21 21:47:32.375833 I | exec: Running command: ceph mon_status --cluster=rook --conf=/var/lib/rook/rook-ceph/rook.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/442263890
2018-01-21 21:52:35.370533 I | exec: 2018-01-21 21:52:35.071462 7f96a3b82700 0 monclient(hunting): authenticate timed out after 300
2018-01-21 21:52:35.071462 7f96a3b82700 0 monclient(hunting): authenticate timed out after 300
2018-01-21 21:52:35.071524 7f96a3b82700 0 librados: client.admin authentication error (110) Connection timed out
```

A common issue is that the CNI is not configured correctly.
Second we need to verify if the mon pod started successfully.
```
$ kubectl -n rook-ceph get pod -l app=rook-ceph-mon
NAME READY STATUS RESTARTS AGE
rook-ceph-mon0-r8tbl 0/1 CrashLoopBackOff 2 47s
```
You should see the reason by describing the pod.
```
# the pod shows a termination status that the keyring does not match the existing keyring
$ kubectl -n rook-ceph describe pod -l mon=rook-ceph-mon0
...
Last State: Terminated
Reason: Error
```

Then when the cluster CRD is applied to start a new cluster, the rook-operator should start the pods as expected.
When an OSD starts, the device or directory will be configured for consumption. If there is an error with the configuration, the pod will crash and you will see the CrashLoopBackoff
status for the pod. Look in the osd pod logs for an indication of the failure.
```
$ kubectl -n rook-ceph logs rook-ceph-osd-fl8fs
...
```
One common case for failure is that you have re-deployed a test cluster and some state remains from the previous deployment.
If your cluster is larger than a few nodes, you may get lucky enough that the monitors were able to start and form quorum. However, now the OSDs pods may fail to start due to the
old state. Looking at the OSD pod logs you will see an error about the file already existing.
```
$ kubectl -n rook-ceph logs rook-ceph-osd-fl8fs
...
2017-10-31 20:13:11.187106 I | mkfs-osd0: 2017-10-31 20:13:11.186992 7f0059d62e00 -1 bluestore(/var/lib/rook/osd0) _read_fsid unparsable uuid
2017-10-31 20:13:11.187208 I | mkfs-osd0: 2017-10-31 20:13:11.187026 7f0059d62e00 -1 bluestore(/var/lib/rook/osd0) _setup_block_symlink_or_file failed to create block symlink to /dev/disk/by-partuuid/651153ba-2dfc-4231-ba06-94759e5ba273: (17) File exists
```
Rook allows you to create and manage your storage cluster through custom resource definitions (CRDs). Each type of resource
has its own CRD defined.
## Ceph
- [Cluster](ceph-cluster-crd.md): A Rook cluster provides the basis of the storage platform to serve block, object stores, and shared file systems.
- [Pool](ceph-pool-crd.md): A pool manages the backing store for a block store. Pools are also used internally by object and file stores.
- [Object Store](ceph-object-store-crd.md): An object store exposes storage with an S3-compatible interface.
- [File System](ceph-filesystem-crd.md): A file system provides shared storage for multiple Kubernetes pods.
The basic approach is to inject a new monmap into the healthy mon so that it is the only mon in quorum, and then restart the good mon.
### Stop the operator
First, stop the operator so that it will not try to fail over the mons while we are modifying the monmap:
```bash
kubectl -n rook-ceph-system delete deployment rook-operator
```
### Inject a new monmap
......@@ -30,7 +30,7 @@ In this example, the healthy mon is `rook-ceph-mon1`, while the unhealthy mons a
Connect to the pod of a healthy mon and run the following commands.
```bash
kubectl -n rook exec -it <mon-pod> bash
kubectl -n rook-ceph exec -it <mon-pod> bash
# set a few simple variables
cluster_namespace=rook-ceph
```

Exit the shell to continue.
Edit the configmap that the operator uses to track the mons.
```bash
kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints
```
In the `data` element you will see three mons such as the following (or more depending on your `moncount`):
Save the file and exit.
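As a sketch, trimming the bad mons out of such an endpoints entry can be done with standard shell tools. The endpoint string below is a made-up example with `rook-ceph-mon1` as the healthy mon; in practice you edit the real entry in the configmap:

```shell
# Hypothetical endpoints entry: one healthy mon (mon1) and two bad mons.
data="rook-ceph-mon0=10.0.0.10:6790,rook-ceph-mon1=10.0.0.11:6790,rook-ceph-mon2=10.0.0.12:6790"

# Keep only the healthy mon's entry and rejoin with commas.
echo "$data" | tr ',' '\n' | grep '^rook-ceph-mon1=' | paste -sd, -
```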
### Restart the mon
You will need to restart the good mon pod to pick up the changes. Delete the good mon pod and kubernetes will automatically restart the mon.
```bash
kubectl -n rook-ceph delete pod -l mon=rook-ceph-mon1
```
Start the rook [toolbox](/Documentation/toolbox.md) and verify the status of the cluster.
The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.
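One way to sanity-check the quorum size from captured `ceph status` output is to count the entries in the monmap line. The line below is an illustrative sample with a single mon and a hypothetical address:

```shell
# Sample monmap line from `ceph status` (hypothetical address).
mon_line="monmap e1: 1 mons at {rook-ceph-mon1=10.0.0.11:6790/0}"

# Extract the contents of the braces and count the comma-separated mons.
echo "$mon_line" | sed 's/.*{\(.*\)}.*/\1/' | tr ',' '\n' | grep -c .
```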
Start the rook operator again to resume monitoring the health of the cluster.
```bash
# create the operator. it is safe to ignore the errors that a number of resources already exist.
kubectl create -f operator.yaml
```
The operator will automatically add more mons to increase the quorum size again, depending on the `monCount`.
Please refer to [cephfs experimental features](http://docs.ceph.com/docs/master/cephfs/experimental-features/) for more details.
## Create the File System
Create the file system by specifying the desired settings for the metadata pool, data pools, and metadata server in the `Filesystem` CRD. In this example we create the metadata pool with replication of three and a single data pool with erasure coding. For more options, see the documentation on [creating shared file systems](ceph-filesystem-crd.md).
Save this shared file system definition as `filesystem.yaml`:
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
metadataPool:
replicated:
```
The Rook operator will create all the pools and other resources necessary to start the service. This may take a minute to complete.
```bash
# Create the file system
$ kubectl create -f filesystem.yaml
# To confirm the file system is configured, wait for the mds pods to start
$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME READY STATUS RESTARTS AGE
rook-ceph-mds-myfs-7d59fdfcf4-h8kw9 1/1 Running 0 12s
rook-ceph-mds-myfs-7d59fdfcf4-kgkjp 1/1 Running 0 12s
```
To see detailed status of the file system, start and connect to the [Rook toolbox](toolbox.md). A new line will be shown with `ceph status` for the `mds` service. In this example, there is one active instance of MDS which is up, with one MDS instance in `standby-replay` mode in case of failover.
```bash
$ ceph status
...
services:
mds: myfs-1/1/1 up {[myfs:0]=mzw58b=up:active}, 1 up:standby-replay
```

```yaml
volumes:
- name: image-store
flexVolume:
driver: ceph.rook.io/rook
fsType: ceph
options:
fsName: myfs # name of the filesystem specified in the filesystem CRD.
clusterNamespace: rook-ceph # namespace where the Rook cluster is deployed
# by default the path is /, but you can override and mount a specific path of the filesystem by using the path attribute
# path: /some/path/inside/cephfs
```
After creating it with `kubectl create -f kube-registry.yaml`, you now have a docker registry which is HA with persistent storage.
#### Kernel Version Requirement
If the Rook cluster has more than one filesystem and the application pod is scheduled to a node with kernel version older than 4.7, inconsistent results may arise since kernels older than 4.7 do not support specifying filesystem namespaces.
## Teardown
To clean up all the artifacts created by the file system demo:
```bash
kubectl delete -f kube-registry.yaml
```
To delete the filesystem components and backing data, delete the Filesystem CRD. **Warning: Data will be deleted**
```
kubectl -n rook-ceph delete Filesystem myfs
```
Please refer to the below sections for more information on your specific platform.
Restart Kubelet in order for this change to take effect.
## For Kubernetes >= 1.9.x
In Kubernetes 1.9.x, you must provide the Flexvolume plugin directory set above when deploying the [rook-operator](/cluster/examples/kubernetes/ceph/operator.yaml) by setting the environment variable `FLEXVOLUME_DIR_PATH`. For example:
```yaml
- name: FLEXVOLUME_DIR_PATH
value: "/var/lib/kubelet/volumeplugins"
```
(In the `operator.yaml` manifest replace `<PathToFlexVolumes>` with the path)
## For Rancher
---
title: Ceph Operator
weight: 51
indent: true
---
# Ceph Operator Helm Chart
Installs [rook](https://github.com/rook/rook) to create, configure, and manage Ceph clusters on Kubernetes.
## Introduction
This chart bootstraps a [rook-ceph-operator](https://github.com/rook/rook) deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
## Prerequisites
kubectl --namespace kube-system patch deploy/tiller-deploy -p '{"spec": {"template": {"spec": {"serviceAccount": "tiller"}}}}'
## Installing
The Ceph Operator helm chart will install the basic components necessary to create a storage platform for your Kubernetes cluster.
After the helm chart is installed, you will need to [create a Rook cluster](quickstart.md#create-a-rook-cluster).
The `helm install` command deploys rook on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation. It is recommended that the rook operator be installed into the `rook-ceph-system` namespace (you will install your clusters into separate namespaces).
Rook currently publishes builds to the `alpha` and `master` channels. In the future, `beta` and `stable` will also be available.
### Alpha
The alpha channel is the most recent release of Rook that is considered ready for testing by the community.
```console
helm repo add rook-alpha https://charts.rook.io/alpha
```
For the v0.7 release (see the [v0.7 documentation](https://rook.io/docs/rook/v0.7/helm-operator.html)):
```console
helm install --namespace rook-system rook-alpha/rook
```
After the v0.8 release is available:
```console
helm install --namespace rook-ceph-system rook-alpha/rook-ceph
```
### Master
The master channel includes the latest commits, with all automated tests green. Historically it has been very stable, though there is no guarantee.
To install the helm chart from master, you will need to pass the specific version returned by the `search` command.
```console
helm repo add rook-master https://charts.rook.io/master
helm search rook-ceph
helm install --namespace rook-ceph-system rook-master/rook-ceph --version <version>
```
For example:
```
helm install rook-master/rook-ceph --version v0.6.0-156.gef983d6
```
### Development Build
To deploy a local build from your development environment:
1. Copy the image to your K8s cluster, such as with the `docker save` then the `docker load` commands
1. Install the helm chart
```console
cd cluster/charts/rook-ceph
helm install --namespace rook-ceph-system --name rook-ceph .
```
## Uninstalling the Chart
To uninstall/delete the `rook-ceph` deployment:
```console
$ helm delete --purge rook-ceph
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
The following table lists the configurable parameters of the rook-operator chart and their default values.
| Parameter | Description | Default |
|--------------------|--------------------------------------|----------------------|
| `image.repository` | Image | `rook/ceph` |
| `image.tag` | Image tag | `master` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `rbacEnable` | If true, create & use RBAC resources | `true` |
You can pass the settings with helm command line parameters. Specify each parameter using the
`--set key=value[,key=value]` argument to `helm install`. For example, the following command will install rook where RBAC is not enabled.
```console
$ helm install --namespace rook-ceph-system --name rook-ceph rook-alpha/rook-ceph --set rbacEnable=false
```
### Settings File
Alternatively, a yaml file that specifies the values for the above parameters (`values.yaml`) can be provided while installing the chart.
```console
$ helm install --namespace rook-ceph-system --name rook-ceph rook-alpha/rook-ceph -f values.yaml
```
Here are the sample settings to get you started.
```yaml
image:
prefix: rook
repository: rook/ceph
tag: master
pullPolicy: IfNotPresent
```
Ensure that the Prometheus server pod gets created and advances to the `Running` state before moving on:
```bash
kubectl -n rook-ceph get pod prometheus-rook-prometheus-0
```
## Prometheus Web Console
Once the Prometheus server is running, you can open a web browser and go to the URL that is output from this command:
```bash
echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"
```
You should now see the Prometheus monitoring website.
To clean up all the artifacts created by the monitoring walkthrough, copy/paste the following commands:
```bash
kubectl delete -f service-monitor.yaml
kubectl delete -f prometheus.yaml
kubectl delete -f prometheus-service.yaml
kubectl -n rook-ceph delete statefulset prometheus-rook-prometheus
kubectl delete -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.8/bundle.yaml
```
Then the rest of the instructions in the [Prometheus Operator docs](https://github.com/coreos/prometheus-operator#removal) can be followed to finish cleaning up.
This guide assumes you have created a Rook cluster as explained in the main Kubernetes quickstart guide.
## Create an Object Store
Now we will create the object store, which starts the RGW service in the cluster with the S3 API.
Specify your desired settings for the object store in the `object.yaml`. For more details on the settings see the [Object Store CRD](ceph-object-store-crd.md).
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: ObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
metadataPool:
replicated:
```
When the object store is created the Rook operator will create all the pools and other resources necessary to start the service. This may take a minute to complete.
```bash
# Create the object store
kubectl create -f object.yaml
# To confirm the object store is configured, wait for the rgw pod to start
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
```
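To script the wait, the STATUS column of the pod listing can be checked. The sketch below parses a captured sample (the pod name is hypothetical); in a live cluster you would pipe the `kubectl get pod` command above into the same `awk` step:

```shell
# Captured sample of the rgw pod listing (hypothetical pod name).
pods='NAME                                     READY     STATUS    RESTARTS   AGE
rook-ceph-rgw-my-store-5db6f6b6f-abcde   1/1       Running   0          1m'

# Count pods whose STATUS column is anything other than Running.
not_running=$(echo "$pods" | awk 'NR>1 && $3 != "Running"' | grep -c . || true)
echo "pods not running: $not_running"
```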
## Create a User
```bash
export AWS_ACCESS_KEY_ID=<accessKey>
export AWS_SECRET_ACCESS_KEY=<secretKey>
```
- `Host`: The DNS host name where the rgw service is found in the cluster. Assuming you are using the default `rook-ceph` cluster, it will be `rook-ceph-rgw-my-store.rook-ceph`.
- `Endpoint`: The endpoint where the rgw service is listening. Run `kubectl -n rook-ceph get svc rook-ceph-rgw-my-store`, then combine the clusterIP and the port.
- `Access key`: The user's `access_key` as printed above
- `Secret key`: The user's `secret_key` as printed above
The variables for the user generated in this example would be:
```bash
export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph
export AWS_ENDPOINT=10.104.35.31:80
export AWS_ACCESS_KEY_ID=XEZDB3UJ6X7HVBE7X7MA
export AWS_SECRET_ACCESS_KEY=7yGIZON7EhFORz0I40BFniML36D2rl8CQQ5kXU6l
```

To consume the object store from outside the cluster, you will need to set up an external service through a `NodePort`.
First, note the service that exposes RGW internal to the cluster. We will leave this service intact and create a new service for external access.
```bash
$ kubectl -n rook-ceph get service rook-ceph-rgw-my-store
$ kubectl -n rook-ceph get service rook-ceph-rgw-my-store
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-rgw-my-store 10.3.0.177 <none> 80/TCP 2m
```
......@@ -146,10 +146,10 @@ apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-rgw-my-store-external
  namespace: rook-ceph
labels:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
rook_object_store: my-store
spec:
ports:
targetPort: 80
selector:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
rook_object_store: my-store
sessionAffinity: None
type: NodePort
kubectl create -f rgw-external.yaml
See both rgw services running and notice what port the external service is running on:
```bash
$ kubectl -n rook-ceph get service rook-ceph-rgw-my-store rook-ceph-rgw-my-store-external
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-rgw-my-store ClusterIP 10.104.82.228 <none> 80/TCP 4m
rook-ceph-rgw-my-store-external NodePort 10.111.113.237 <none> 80:31536/TCP 39s
```
Internally the rgw service is running on port `80`. The external port in this case is `31536`. Now you can access the object store from anywhere! All you need is the hostname for any machine in the cluster, the external port, and the user credentials.
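Putting that together, the external S3 endpoint is simply a node address combined with the NodePort. The values below are samples (the node hostname is hypothetical; the port is the one from the output above):

```shell
node="node1.example.com"   # any node in the cluster (hypothetical hostname)
node_port="31536"          # external port from the PORT(S) column, 80:31536/TCP

export AWS_ENDPOINT="${node}:${node_port}"
echo "$AWS_ENDPOINT"
```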
If you are using `dataDirHostPath` to persist rook data on kubernetes hosts, make sure your host has at least 5GB of space available on the specified path.
If you're feeling lucky, a simple Rook cluster can be created with the following kubectl commands. For the more detailed install, skip to the next section to [deploy the Rook operator](#deploy-the-rook-operator).
```
cd cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
```
After the cluster is running, you can create [block, object, or file](#storage) storage to be consumed by other applications in your cluster.
......@@ -37,11 +37,11 @@ After the cluster is running, you can create [block, object, or file](#storage)
The first step is to deploy the Rook system components, which include the Rook agent running on each node in your cluster as well as the Rook operator pod.
```bash
cd cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml
# verify the rook-ceph-operator and rook-ceph-agent pods are in the `Running` state before proceeding
kubectl -n rook-ceph-system get pod
```
You can also deploy the operator with the [Rook Helm Chart](helm-operator.md).
For versions of Kubernetes prior to 1.8, the Kubelet process on all nodes will require a restart after the Rook operator and Rook agents have been deployed.
## Create a Rook Cluster
Now that the Rook operator and agent pods are running, we can create the Rook cluster. For the cluster to survive reboots,
make sure you set the `dataDirHostPath` property. For more settings, see the documentation on [configuring the cluster](ceph-cluster-crd.md).
Save the cluster spec as `cluster.yaml`:
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: ceph.rook.io/v1alpha1
kind: Cluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
storage:
useAllNodes: true
useAllDevices: false
config:
storeType: bluestore
databaseSizeMB: "1024"
journalSizeMB: "1024"
```
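Note that the size values under `config` are quoted: the updated CRD consumes them as strings, so unquoted integers can be rejected. A quick, illustrative check that all size values in your spec are quoted (the scratch file path is arbitrary):

```shell
# Write a fragment of the spec to a scratch file and flag any unquoted
# numeric size values; prints a confirmation when none are found.
cat > /tmp/rook-config-check.yaml <<'EOF'
    config:
      storeType: bluestore
      databaseSizeMB: "1024"
      journalSizeMB: "1024"
EOF
if grep -Eq 'SizeMB: [0-9]+$' /tmp/rook-config-check.yaml; then
  echo "found unquoted size values"
else
  echo "all size values quoted"
fi
```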
Create the cluster:
```bash
kubectl create -f cluster.yaml
```
Use `kubectl` to list pods in the `rook-ceph` namespace. You should be able to see the following pods once they are all running:
```bash
$ kubectl -n rook-ceph get pod
NAME READY STATUS RESTARTS AGE
rook-ceph-mgr0-1279756402-wc4vt 1/1 Running 0 5m
rook-ceph-mon0-jflt5 1/1 Running 0 6m
##### ClusterRole and ClusterRoleBinding
Next up you require a `ClusterRole` and a corresponding `ClusterRoleBinding`, which enables the Rook Agent `ServiceAccount` to run the rook-ceph-agent `Pods` on all nodes
with privileged rights. Here are the definitions:
```yaml
kind: Namespace
metadata:
name: rook-system
---
# Allow the rook-ceph-agent serviceAccount to use the privileged PSP
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: rook-ceph-agent-psp
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: privileged-psp-user
subjects:
- kind: ServiceAccount
name: rook-ceph-agent
namespace: rook-system
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: rook-default-psp
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: privileged-psp-user
subjects:
- kind: ServiceAccount
name: default
namespace: rook-ceph
---
# Allow the rook-ceph-osd serviceAccount to use the privileged PSP
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: rook-ceph-osd-psp
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: privileged-psp-user
subjects:
- kind: ServiceAccount
name: rook-ceph-osd
namespace: rook-ceph
```
# Cleaning up a Cluster
If you want to tear down the cluster and bring up a new one, be aware of the following resources that will need to be cleaned up:
- `rook-ceph-system` namespace: The Rook operator and agent created by `operator.yaml`
- `rook-ceph` namespace: The Rook storage cluster created by `cluster.yaml` (the cluster CRD)
- `/var/lib/rook`: Path on each host in the cluster where configuration is cached by the ceph mons and osds
Note that if you changed the default namespaces or paths in the sample yaml files, you will need to adjust these namespaces and paths throughout these instructions.
These commands will clean up the resources from the [block](block.md#teardown) and [file](filesystem.md#teardown) walkthroughs.
```console
kubectl delete -f wordpress.yaml
kubectl delete -f mysql.yaml
kubectl delete -n rook-ceph pool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f kube-registry.yaml
```
## Delete the Cluster CRD
After those block and file resources have been cleaned up, you can then delete your Rook cluster. This is important to delete **before removing the Rook operator and agent or else resources may not be cleaned up properly**.
```console
kubectl delete -n rook-ceph cluster rook-ceph
```
Verify that the cluster CRD has been deleted before continuing to the next step.
```
kubectl -n rook-ceph get cluster
```
## Delete the Operator and Agent
This will begin the process of all cluster resources being cleaned up, after which you can delete the rest of the deployment with the following:
```console
kubectl delete -n rook-ceph-system daemonset rook-ceph-agent
kubectl delete -f operator.yaml
kubectl delete clusterroles rook-ceph-agent
kubectl delete clusterrolebindings rook-ceph-agent
```
Optionally remove the rook-ceph namespace if it is not in use by any other resources.
```
kubectl delete namespace rook-ceph
```
## Delete the data on hosts
The most common issue cleaning up the cluster is that the `rook-ceph` namespace or the cluster CRD remains indefinitely in the `terminating` state.
Look at the pods:
```
kubectl -n rook-ceph get pod
```
If a pod is still terminating, you will need to wait or else attempt to forcefully terminate it (`kubectl delete pod <name>`).
Now look at the cluster CRD:
```
kubectl -n rook-ceph get cluster
```
If the cluster CRD still exists even though you have executed the delete command earlier, see the next section on removing the finalizer.
When a Cluster CRD is created, a finalizer is added to it automatically by the Rook operator.
The operator is responsible for removing the finalizer after the mounts have been cleaned up. If for some reason the operator is not able to remove the finalizer (ie. the operator is not running anymore), you can delete the finalizer manually.
```
kubectl -n rook-ceph edit cluster rook-ceph
```
This will open a text editor (usually `vi`) to allow you to edit the CRD. Look for the `finalizers` element and delete the following line:
```
- cluster.ceph.rook.io
```
Now save the changes and exit the editor. Within a few seconds you should see that the cluster CRD has been deleted and will no longer block other cleanup such as deleting the `rook-ceph` namespace.
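The edit simply drops the finalizer entry from the metadata. The sketch below shows the same change applied with `sed` to a sample of the CRD metadata (the heredoc stands in for what you would see in the editor); as a non-interactive alternative, `kubectl -n rook-ceph patch cluster rook-ceph --type=merge -p '{"metadata":{"finalizers":[]}}'` accomplishes the same thing:

```shell
# Illustrative: remove the finalizer line from sample cluster CRD metadata.
# (The now-empty finalizers: key is harmless for this sketch.)
cat <<'EOF' | sed '/- cluster\.ceph\.rook\.io/d'
metadata:
  finalizers:
  - cluster.ceph.rook.io
  name: rook-ceph
EOF
```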
Visit the official Tectonic manual boot documentation for more information.
## Start Rook
After the Tectonic Installer ran and the Kubernetes cluster is started and ready, you can follow the [Rook installation guide](quickstart.md).
If you want to specify which disks Rook uses, follow the instructions in [creating Rook clusters](ceph-cluster-crd.md).
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rook-tools
  namespace: rook-ceph
spec:
  dnsPolicy: ClusterFirstWithHostNet
  containers:
```

```bash
kubectl create -f rook-tools.yaml
```
Wait for the toolbox pod to download its container and get to the `running` state:
```bash
kubectl -n rook-ceph get pod rook-tools
```
Once the rook-tools pod is running, you can connect to it with:
```bash
kubectl -n rook-ceph exec -it rook-tools bash
```
All available tools in the toolbox are ready for your troubleshooting needs. Example:
```bash
rados df
```
When you are done with the toolbox, remove the pod:
```bash
kubectl -n rook-ceph delete pod rook-tools
```
## Troubleshooting without the Toolbox
In order to successfully upgrade a Rook cluster, the following prerequisites must be met:
Review the [health verification section](#health-verification) in order to verify your cluster is in a good starting state.
* `dataDirHostPath` must be set in your Cluster spec.
This persists metadata on host nodes, enabling pods to be terminated during the upgrade and for new pods to be created in their place.
More details about `dataDirHostPath` can be found in the [Cluster CRD readme](./ceph-cluster-crd.md#cluster-settings).
* All pods consuming Rook storage should be created, running, and in a steady state. No Rook persistent volumes should be in the act of being created or deleted.
The minimal sample Cluster spec that will be used in this guide can be found below (note that the specific configuration may not be applicable to all environments):
In this guide, we will be upgrading a live Rook cluster running `v0.7.0` to the latest `master` build.
Let's get started!
### Agents
The Rook agents are deployed by the operator to run on every node.
They are in charge of handling all operations related to the consumption of storage from the cluster.
The agents are deployed and managed by a Kubernetes daemonset.
Since the agents are stateless, the simplest way to update them is by deleting them and allowing the operator to create them again.
Delete the agent daemonset and permissions:
```bash
kubectl -n rook-system delete daemonset rook-agent
kubectl delete clusterroles rook-agent
kubectl delete clusterrolebindings rook-agent
```
Now when the operator is recreated, the agent daemonset will automatically be created again with the new version.
### Operator
The Rook operator is the management brains of the cluster, so it should be upgraded first before other components.
In the event that the new version requires a migration of metadata or config, the operator is the one that would understand how to perform that migration.
Since the upgrade process for this version includes support for storage providers beyond Ceph, we will need to start up a Ceph specific operator.
Let's delete the deployment for the old operator and its permissions first:
```bash
kubectl -n rook-system delete deployment rook-operator
kubectl delete clusterroles rook-operator
kubectl delete clusterrolebindings rook-operator
```
Now we need to create the new Ceph specific operator.
**IMPORTANT:** Ensure that you are using the latest manifests from either `master` or the `release-0.8` branch. If you have custom configuration options set in your old `rook-operator.yaml` manifest, you will need to set those values in the new Ceph operator manifest below.
Navigate to the new Ceph manifests directory, apply your custom configuration options if you are using any, and then create the new Ceph operator:
```bash
cd cluster/examples/kubernetes/ceph
cat operator.yaml | sed -e 's/namespace: rook-ceph-system/namespace: rook-system/g' | kubectl create -f -
```
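If you want to confirm what the `sed` substitution will do before piping the full manifest into `kubectl`, you can run it against a small sample first (the two-line fragment below is illustrative):

```shell
# The rewrite maps the new default namespace back to the legacy rook-system
# namespace used by the existing cluster.
printf 'metadata:\n  namespace: rook-ceph-system\n' \
  | sed -e 's/namespace: rook-ceph-system/namespace: rook-system/g'
```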
Once the command is executed, Kubernetes will create the new operator deployment and start its pod.
#### Operator Health Verification
To verify the operator pod is `Running` and using the new version of `rook/ceph:master`, use the following commands:
```bash
OPERATOR_POD_NAME=$(kubectl -n rook-system get pods -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-system get pod ${OPERATOR_POD_NAME} -o jsonpath='{.status.phase}{"\n"}{.spec.containers[0].image}{"\n"}'
```
The toolbox pod should also be updated to the new image, so we will delete the old pod and start the new toolbox.
```bash
kubectl -n rook delete pod rook-tools
```
After verifying the old tools pod has terminated, start the new toolbox. You will need to either create the toolbox using the yaml in the master branch
or simply set the version of the container to `rook/toolbox:master` before creating the toolbox.
```
cat rook-tools.yaml | sed -e 's/namespace: rook-ceph/namespace: rook/g' | kubectl create -f -
```
### API
The old API deployment should be removed:
```bash
kubectl -n rook delete deploy rook-api
```
### Monitors
There are multiple monitor pods to upgrade and they are each individually managed by their own replica set.
**For each** monitor's replica set, you will need to update the pod template spec's image version field to `rook/ceph:master`.
For example, we can update the replica set for `mon0` with:
```bash
kubectl -n rook set image replicaset/rook-ceph-mon0 rook-ceph-mon=rook/ceph:master
```
Once the replica set has been updated, we need to manually terminate the old pod which will trigger the replica set to create a new pod using the new version.
If all of the monitors (and the cluster health overall) look good, then we can move on to the next component.
This is okay as long as the cluster health looks good and all monitors eventually reach quorum again.
### Object Storage Daemons (OSDs)
The OSD pods can be managed in two different ways, depending on how you specified your storage configuration in your [Cluster spec](./ceph-cluster-crd.md#cluster-settings).
* **Use all nodes:** all storage nodes in the cluster will be managed by a single daemon set.
Only the one daemon set will need to be edited to update the image version, then each OSD pod will need to be deleted so that a new pod will be created by the daemon set to take its place.
* **Specify individual nodes:** each storage node specified in the cluster spec will be managed by its own individual replica set.
```bash
kubectl -n rook edit replicaset rook-ceph-osd-<node>
```
Update the version of the container.
```
image: rook/ceph:master
```
Once the daemon set (or replica set) is updated, we can begin deleting each OSD pod **one at a time** and verifying a new one comes up to replace it that is running the new version.
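The one-at-a-time flow can be sketched as a simple loop. The pod names below are illustrative placeholders, and the `kubectl` calls are echoed rather than executed since they assume a live cluster:

```shell
# Sketch of the rolling OSD restart: delete one pod, wait for its
# replacement to come back healthy, then move on to the next one.
OSD_PODS='rook-ceph-osd-node1-abc12 rook-ceph-osd-node2-def34'
for pod in $OSD_PODS; do
  echo "kubectl -n rook delete pod ${pod}"
  # In a live cluster, pause here and re-check `ceph status` (or the new
  # pod's phase) before continuing to the next OSD.
done
```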
Remember after each OSD pod to verify the cluster health using the instructions in the [health verification section](#health-verification).
### Ceph Manager
Similar to the Rook operator, the Ceph manager pods are managed by a deployment.
We will edit the deployment to use the new image version of `rook/ceph:master`:
```bash
kubectl -n rook set image deploy/rook-ceph-mgr0 rook-ceph-mgr0=rook/ceph:master
```
To verify that the manager pod is `Running` and on the new version, use the following:
```bash
kubectl -n rook get pod -l app=rook-ceph-mgr -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.spec.containers[0].image}{"\n"}{end}'
```
### Legacy Custom Resource Definitions (CRDs)
During this upgrade process, the new Ceph operator automatically migrated legacy custom resources to their new `rook.io/v1alpha2` and `ceph.rook.io/v1alpha1` types.
First confirm that there are no remaining legacy CRD instances:
```bash
kubectl -n rook get clusters.rook.io
kubectl -n rook get objectstores.rook.io
kubectl -n rook get filesystems.rook.io
kubectl -n rook get pools.rook.io
kubectl -n rook get volumeattachments.rook.io
```
After confirming that each of those commands returns `No resources found`, it is safe to go ahead and delete the legacy CRD types:
```bash
kubectl delete crd clusters.rook.io
kubectl delete crd filesystems.rook.io
kubectl delete crd objectstores.rook.io
kubectl delete crd pools.rook.io
kubectl delete crd volumeattachments.rook.io
```
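The check-then-delete sequence can also be expressed as a single loop over the legacy types. The commands are echoed here rather than executed, since they assume a live cluster with the legacy CRDs still registered:

```shell
# For each legacy type: first verify no instances remain, then delete the CRD.
for t in clusters filesystems objectstores pools volumeattachments; do
  echo "kubectl -n rook get ${t}.rook.io   # expect: No resources found"
  echo "kubectl delete crd ${t}.rook.io"
done
```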
### Optional Components
If you have optionally installed either [object storage](./object.md) or a [shared file system](./filesystem.md) in your Rook cluster, the sections below will provide guidance on how to update them as well.
They are both managed by deployments, which we have already covered in this guide, so the instructions will be brief.
#### Object Storage (RGW)
If you have object storage installed, first edit the RGW deployment to use the new image version of `rook/ceph:master`:
```bash
kubectl -n rook set image deploy/rook-ceph-rgw-my-store rook-ceph-rgw-my-store=rook/ceph:master
```
To verify that the RGW pod is `Running` and on the new version, use the following:
```bash
kubectl -n rook get pod -l app=rook-ceph-rgw -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.spec.containers[0].image}{"\n"}{end}'
```
#### Shared File System (MDS)
If you have a shared file system installed, first edit the MDS deployment to use the new image version of `rook/ceph:master`:
```bash
kubectl -n rook set image deploy/rook-ceph-mds-myfs rook-ceph-mds-myfs=rook/ceph:master
```
To verify that the MDS pod is `Running` and on the new version, use the following:
```bash
kubectl -n rook get pod -l app=rook-ceph-mds -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.spec.containers[0].image}{"\n"}{end}'
```
## Completion
At this point, your Rook cluster should be fully upgraded to running version `rook/ceph:master` and the cluster should be healthy according to the steps in the [health verification section](#health-verification).
## Upgrading Kubernetes
Rook cluster installations on Kubernetes prior to version 1.7.x, use [ThirdPartyResource](https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-third-party-resource/) that have been deprecated as of 1.7 and removed in 1.8. If upgrading your Kubernetes cluster Rook TPRs have to be migrated to CustomResourceDefinition (CRD) following [Kubernetes documentation](https://kubernetes.io/docs/tasks/access-kubernetes-api/migrate-third-party-resource/). Rook TPRs that require migration during upgrade are:
......