This project is mirrored from https://gitee.com/wangmingco/rook.git.
- 01 Dec, 2020 4 commits
-
-
Sébastien Han authored
We can now collect logs directly in a sidecar container. A new CRD setting has been added under `spec.logCollector`, with `enabled: true` and `periodicity: 24h`. Every 24h we will rotate the log files for each Ceph daemon. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit c6a87203)
# Conflicts:
#	Documentation/ceph-cluster-crd.md
#	cluster/examples/kubernetes/ceph/cluster.yaml
#	pkg/operator/ceph/cluster/crash/crash.go
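To make the shape of the new setting concrete, here is a minimal, self-contained Go sketch; the type and field names are local illustrations of the CRD keys above, not the actual Rook API types:
```go
package main

import "fmt"

// Local, illustrative mirror of the new logCollector block in the CephCluster CRD
// (spec.logCollector.enabled / spec.logCollector.periodicity). The real Rook API
// types may use different names.
type LogCollectorSpec struct {
	Enabled     bool   `json:"enabled"`
	Periodicity string `json:"periodicity"` // e.g. "24h": rotate each daemon's log files every 24 hours
}

type ClusterSpec struct {
	LogCollector LogCollectorSpec `json:"logCollector"`
}

func main() {
	spec := ClusterSpec{LogCollector: LogCollectorSpec{Enabled: true, Periodicity: "24h"}}
	fmt.Printf("log collector enabled=%t, rotating every %s\n",
		spec.LogCollector.Enabled, spec.LogCollector.Periodicity)
}
```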
-
Sébastien Han authored
We don't need to print this every 60s in the operator log. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit aca5a8cc)
-
mergify[bot] authored
ceph: cleanup should ignore ceph daemon pods that are not scheduled on any node. (bp #6719)
-
Santosh Pillai authored
Before cleaning up the cluster, we wait for all the daemon pods to be deleted. This fails when a daemon pod is in a pending state and has no NodeName. This PR ignores daemon pods that are not scheduled on any node. Signed-off-by:
Santosh Pillai <sapillai@redhat.com> (cherry picked from commit 753bdb35)
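A minimal Go sketch of the filtering described above; the helper name is hypothetical and this is not the actual Rook cleanup code:
```go
package cleanup

import (
	corev1 "k8s.io/api/core/v1"
)

// filterScheduledPods illustrates the idea above: when waiting for Ceph daemon
// pods to be removed, skip pods that were never scheduled onto a node
// (Spec.NodeName is empty, e.g. stuck in Pending).
func filterScheduledPods(pods []corev1.Pod) []corev1.Pod {
	scheduled := make([]corev1.Pod, 0, len(pods))
	for _, pod := range pods {
		if pod.Spec.NodeName == "" {
			// Pod is not assigned to any node; ignore it so cleanup can proceed.
			continue
		}
		scheduled = append(scheduled, pod)
	}
	return scheduled
}
```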
-
- 26 Nov, 2020 3 commits
-
-
mergify[bot] authored
ceph: fix pod labels set on csi components (bp #6702)
-
Alexander Trost authored
Signed-off-by:
Alexander Trost <galexrt@googlemail.com> (cherry picked from commit b04f8823)
-
Satoru Takeuchi authored
ceph: fix metadata device passed by-id (bp #6696)
-
- 25 Nov, 2020 6 commits
-
-
Sébastien Han authored
The code assumed that devices were passed by the user as "/dev/sda", which is bad! We all know people should be using paths like /dev/disk/by-id, so we must support them.
Closes: https://github.com/rook/rook/issues/6685
Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 5d9612c2)
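A minimal Go sketch of one way to support by-id paths, resolving the symlink to the underlying device node; the helper name is hypothetical and this is not the actual Rook implementation:
```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveDevicePath shows how a stable path such as /dev/disk/by-id/... can be
// resolved to the underlying block device (e.g. /dev/sda), so callers are no
// longer forced to pass bare /dev/sdX names.
func resolveDevicePath(device string) (string, error) {
	// /dev/disk/by-id entries are symlinks to the real device node.
	resolved, err := filepath.EvalSymlinks(device)
	if err != nil {
		return "", fmt.Errorf("failed to resolve device path %q: %w", device, err)
	}
	return resolved, nil
}
```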
-
mergify[bot] authored
ceph: ability to abort orchestration (bp #6693)
-
Sébastien Han authored
We can now prioritize orchestrations on certain events. Today only two events will cancel an ongoing orchestration (if any):
* request for cluster deletion
* request for cluster upgrade
If one of the two is caught by the watcher, we cancel the ongoing orchestration. For that we implemented a simple approach based on checkpoints: we check for a cancellation request at certain points of the orchestration, mainly before each of the mon/mgr/osd orchestration loops. This solution is not perfect, but we are waiting for controller-runtime to release its 0.7 version, which will embed context support. With that we will be able to cancel reconciles more precisely and rapidly. Operator log example:
```
2020-11-24 13:54:59.499719 I | op-mon: parsing mon endpoints: a=10.109.126.120:6789
2020-11-24 13:54:59.499719 I | op-mon: parsing mon endpoints: a=10.109.126.120:6789
2020-11-25 12:59:12.986264 I | ceph-cluster-controller: done reconciling ceph cluster in namespace "rook-ceph"
2020-11-25 13:07:33.776947 I | ceph-cluster-controller: CR has changed for "rook-ceph". diff=
  v1.ClusterSpec{
    CephVersion: v1.CephVersionSpec{
      Image: "ceph/ceph:v15.2.5",
-     AllowUnsupported: true,
+     AllowUnsupported: false,
    },
    DriveGroups: nil,
    Storage: {UseAllNodes: true, Selection: {UseAllDevices: &true}},
    ... // 20 identical fields
  }
2020-11-25 13:07:33.777039 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"
2020-11-25 13:07:33.785088 I | op-mon: parsing mon endpoints: a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789
2020-11-25 13:07:33.788626 I | ceph-cluster-controller: detecting the ceph image version for image ceph/ceph:v15.2.5...
2020-11-25 13:07:35.280789 I | ceph-cluster-controller: detected ceph image version: "15.2.5-0 octopus"
2020-11-25 13:07:35.280806 I | ceph-cluster-controller: validating ceph version from provided image
2020-11-25 13:07:35.285888 I | op-mon: parsing mon endpoints: a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789
2020-11-25 13:07:35.287828 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-11-25 13:07:35.288082 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2020-11-25 13:07:35.621625 I | ceph-cluster-controller: cluster "rook-ceph": version "15.2.5-0 octopus" detected for image "ceph/ceph:v15.2.5"
2020-11-25 13:07:35.642688 I | op-mon: start running mons
2020-11-25 13:07:35.646323 I | op-mon: parsing mon endpoints: a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789
2020-11-25 13:07:35.654070 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.107.242.49:6789","10.109.71.30:6789","10.98.93.224:6789"]}] data:a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789 mapping:{"node":{"a":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"},"b":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"},"c":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"}}} maxMonId:2]
2020-11-25 13:07:35.868253 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-11-25 13:07:35.868573 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2020-11-25 13:07:37.074353 I | op-mon: targeting the mon count 3
2020-11-25 13:07:38.153435 I | op-mon: checking for basic quorum with existing mons
2020-11-25 13:07:38.178029 I | op-mon: mon "a" endpoint is [v2:10.107.242.49:3300,v1:10.107.242.49:6789]
2020-11-25 13:07:38.670191 I | op-mon: mon "b" endpoint is [v2:10.109.71.30:3300,v1:10.109.71.30:6789]
2020-11-25 13:07:39.477820 I | op-mon: mon "c" endpoint is [v2:10.98.93.224:3300,v1:10.98.93.224:6789]
2020-11-25 13:07:39.874094 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.107.242.49:6789","10.109.71.30:6789","10.98.93.224:6789"]}] data:a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789 mapping:{"node":{"a":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"},"b":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"},"c":{"Name":"minikube","Hostname":"minikube","Address":"192.168.39.3"}}} maxMonId:2]
2020-11-25 13:07:40.467999 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-11-25 13:07:40.469733 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2020-11-25 13:07:41.071710 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-11-25 13:07:41.078903 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2020-11-25 13:07:41.125233 I | op-mon: deployment for mon rook-ceph-mon-a already exists. updating if needed
2020-11-25 13:07:41.327778 I | op-k8sutil: updating deployment "rook-ceph-mon-a" after verifying it is safe to stop
2020-11-25 13:07:41.327895 I | op-mon: checking if we can stop the deployment rook-ceph-mon-a
2020-11-25 13:07:44.045644 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-a"
2020-11-25 13:07:44.045706 I | op-mon: checking if we can continue the deployment rook-ceph-mon-a
2020-11-25 13:07:44.045740 I | op-mon: waiting for mon quorum with [a b c]
2020-11-25 13:07:44.109159 I | op-mon: mons running: [a b c]
2020-11-25 13:07:44.474596 I | op-mon: Monitors in quorum: [a b c]
2020-11-25 13:07:44.478565 I | op-mon: deployment for mon rook-ceph-mon-b already exists. updating if needed
2020-11-25 13:07:44.493374 I | op-k8sutil: updating deployment "rook-ceph-mon-b" after verifying it is safe to stop
2020-11-25 13:07:44.493403 I | op-mon: checking if we can stop the deployment rook-ceph-mon-b
2020-11-25 13:07:47.135524 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-b"
2020-11-25 13:07:47.135542 I | op-mon: checking if we can continue the deployment rook-ceph-mon-b
2020-11-25 13:07:47.135551 I | op-mon: waiting for mon quorum with [a b c]
2020-11-25 13:07:47.148820 I | op-mon: mons running: [a b c]
2020-11-25 13:07:47.445946 I | op-mon: Monitors in quorum: [a b c]
2020-11-25 13:07:47.448991 I | op-mon: deployment for mon rook-ceph-mon-c already exists. updating if needed
2020-11-25 13:07:47.462041 I | op-k8sutil: updating deployment "rook-ceph-mon-c" after verifying it is safe to stop
2020-11-25 13:07:47.462060 I | op-mon: checking if we can stop the deployment rook-ceph-mon-c
2020-11-25 13:07:48.853118 I | ceph-cluster-controller: CR has changed for "rook-ceph". diff=
  v1.ClusterSpec{
    CephVersion: v1.CephVersionSpec{
-     Image: "ceph/ceph:v15.2.5",
+     Image: "ceph/ceph:v15.2.6",
      AllowUnsupported: false,
    },
    DriveGroups: nil,
    Storage: {UseAllNodes: true, Selection: {UseAllDevices: &true}},
    ... // 20 identical fields
  }
2020-11-25 13:07:48.853140 I | ceph-cluster-controller: upgrade requested, cancelling any ongoing orchestration
2020-11-25 13:07:50.119584 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-c"
2020-11-25 13:07:50.119606 I | op-mon: checking if we can continue the deployment rook-ceph-mon-c
2020-11-25 13:07:50.119619 I | op-mon: waiting for mon quorum with [a b c]
2020-11-25 13:07:50.130860 I | op-mon: mons running: [a b c]
2020-11-25 13:07:50.431341 I | op-mon: Monitors in quorum: [a b c]
2020-11-25 13:07:50.431361 I | op-mon: mons created: 3
2020-11-25 13:07:50.734156 I | op-mon: waiting for mon quorum with [a b c]
2020-11-25 13:07:50.745763 I | op-mon: mons running: [a b c]
2020-11-25 13:07:51.045108 I | op-mon: Monitors in quorum: [a b c]
2020-11-25 13:07:51.054497 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: CANCELLING CURRENT ORCHESTATION
2020-11-25 13:07:52.055208 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"
2020-11-25 13:07:52.070690 I | op-mon: parsing mon endpoints: a=10.107.242.49:6789,b=10.109.71.30:6789,c=10.98.93.224:6789
2020-11-25 13:07:52.088979 I | ceph-cluster-controller: detecting the ceph image version for image ceph/ceph:v15.2.6...
2020-11-25 13:07:53.904811 I | ceph-cluster-controller: detected ceph image version: "15.2.6-0 octopus"
2020-11-25 13:07:53.904862 I | ceph-cluster-controller: validating ceph version from provided image
```
Closes: https://github.com/rook/rook/issues/6587
Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit ad249904)
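A minimal Go sketch of the checkpoint-based cancellation described above; all names are illustrative and the real operator wiring differs:
```go
package cluster

import (
	"errors"
	"sync/atomic"
)

// errOrchestrationCancelled is a hypothetical sentinel error for this sketch.
var errOrchestrationCancelled = errors.New("cancelling current orchestration")

// orchestrator sketches the checkpoint approach: the watcher flips a flag when
// a cluster deletion or upgrade request arrives, and the orchestration checks
// that flag before each major phase (mons, mgr, osds).
type orchestrator struct {
	cancelRequested atomic.Bool
}

// requestCancellation is called by the watcher on a deletion or upgrade request.
func (o *orchestrator) requestCancellation() { o.cancelRequested.Store(true) }

// checkCancellation is the "checkpoint" evaluated between orchestration phases.
func (o *orchestrator) checkCancellation() error {
	if o.cancelRequested.Load() {
		return errOrchestrationCancelled
	}
	return nil
}

func (o *orchestrator) orchestrate() error {
	for _, phase := range []func() error{o.reconcileMons, o.reconcileMgr, o.reconcileOSDs} {
		if err := o.checkCancellation(); err != nil {
			return err
		}
		if err := phase(); err != nil {
			return err
		}
	}
	return nil
}

// Placeholder phases; the real loops live in the mon/mgr/osd controllers.
func (o *orchestrator) reconcileMons() error { return nil }
func (o *orchestrator) reconcileMgr() error  { return nil }
func (o *orchestrator) reconcileOSDs() error { return nil }
```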
-
Sébastien Han authored
Since we want to pass a context to it, let's extract the logic into its own predicate. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 70c6752e)
-
Sébastien Han authored
Since the predicate for the CephCluster object will soon move into its own predicate, we need to export isDoNotReconcile so that it can be consumed by the "cluster" package. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 9b9f45b7)
-
Sébastien Han authored
Since we moved to the controller-runtime, events are processed one by one and so are reconciles. This means we won't have multiple orchestrations happening at the same time. Thus removing this code. Also removing one unused variable. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 10dd8e11)
-
- 24 Nov, 2020 1 commit
-
-
mergify[bot] authored
Bump Controller Runtime version to 0.6 (bp #6568)
-
- 20 Nov, 2020 6 commits
-
-
Sébastien Han authored
We must add the finalizer right after the object creation, otherwise the server will later return an error on update saying that the object has been modified. Indeed, it has been modified by the task that updates the status when the object is first created. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 97be23e3)
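A minimal Go sketch of the ordering described above, using recent controller-runtime helpers (the function name is hypothetical; this is not the actual Rook code):
```go
package controller

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureFinalizer adds the finalizer as the very first step of the reconcile,
// before any status update, so a later Update does not fail with a
// "the object has been modified" conflict.
func ensureFinalizer(ctx context.Context, c client.Client, obj client.Object, finalizer string) error {
	if controllerutil.ContainsFinalizer(obj, finalizer) {
		return nil
	}
	controllerutil.AddFinalizer(obj, finalizer)
	// Persist the finalizer immediately; status updates happen afterwards on a
	// freshly fetched copy of the object.
	return c.Update(ctx, obj)
}
```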
-
Arun Kumar Mohan authored
Signed-off-by:
Arun Kumar Mohan <amohan@redhat.com> (cherry picked from commit f5fc08b8)
-
Arun Kumar Mohan authored
Signed-off-by:
Arun Kumar Mohan <amohan@redhat.com> (cherry picked from commit ded16f77)
-
Arun Kumar Mohan authored
Fetched latest lib-bucket-provisioner changes as well. Signed-off-by:
Arun Kumar Mohan <amohan@redhat.com> (cherry picked from commit 65d16bfc)
-
Arun Kumar Mohan authored
Signed-off-by:
Arun Kumar Mohan <amohan@redhat.com> (cherry picked from commit 9ff98995)
-
Arun Kumar Mohan authored
Updating the dependencies' versions to match the newer Operator SDK version v1.x. Signed-off-by:
Arun Kumar Mohan <amohan@redhat.com> (cherry picked from commit 421f340c)
-
- 19 Nov, 2020 6 commits
-
-
Travis Nielsen authored
build: Update release version to v1.5.1
-
Travis Nielsen authored
For the patch release we update the version to v1.5.1 Signed-off-by:
Travis Nielsen <tnielsen@redhat.com>
-
mergify[bot] authored
ceph: update cephcsi to latest v3.1.2 release (bp #6676)
-
Madhu Rajanna authored
Updating cephcsi to v3.1.2, which is the latest bugfix release. Signed-off-by:
Madhu Rajanna <madhupr007@gmail.com> (cherry picked from commit ca3e2385)
-
mergify[bot] authored
ceph: OSD PDB reconciler changes (bp #6497)
-
Santosh Pillai authored
- Creates a single PDB (max-unavailable=1) for all OSDs. This PDB allows one OSD to go down at a given time.
- When a drain is detected, blocking PDBs (max-unavailable=0) will be created for each failure domain that is not being drained, and the main PDB (max-unavailable=1) will be deleted. This will allow all the OSDs in the currently drained failure domain to be removed while blocking the deletion of OSDs in other failure domains.
- Once the PGs are healthy again, the blocking PDBs will be deleted and the main PDB will be restored.
- Add a PG healthcheck timeout.
- Delete any legacy node drain pods and blocking OSD PDBs.
Signed-off-by:
Santosh Pillai <sapillai@redhat.com> (cherry picked from commit 8602b9c1)
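A minimal Go sketch of the main "allow one OSD down" budget described above, built with the Kubernetes policy/v1 API; the name, namespace, and labels are illustrative, not the actual Rook resources:
```go
package osd

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// mainOSDPDB sketches the single PDB covering all OSDs. During a drain this
// budget would be deleted and replaced by blocking (maxUnavailable=0) PDBs for
// the failure domains that are NOT being drained.
func mainOSDPDB(namespace string) *policyv1.PodDisruptionBudget {
	maxUnavailable := intstr.FromInt(1)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "rook-ceph-osd", // illustrative name
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			// Allow exactly one OSD to be unavailable at a time.
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "rook-ceph-osd"},
			},
		},
	}
}
```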
-
- 18 Nov, 2020 10 commits
-
-
mergify[bot] authored
ci: fix device intermittent failure (bp #6666)
-
mergify[bot] authored
ceph: add snapshot scheduling for mirrored pools (bp #6553)
-
Sébastien Han authored
When we are done creating the partitions it's good to give the kernel some time to reprobe the device and for udev to finish syncing up. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit 38e47560)
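An illustrative Go sketch (not the actual CI change) of one way to wait for the kernel and udev to settle after partitioning:
```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// settleAfterPartitioning waits for udev to finish processing pending events
// and gives the kernel a moment to reprobe the device before it is used.
func settleAfterPartitioning() {
	// "udevadm settle" blocks until the udev event queue is empty.
	if err := exec.Command("udevadm", "settle").Run(); err != nil {
		log.Printf("udevadm settle failed: %v", err)
	}
	// Small extra grace period for the kernel to reread the partition table.
	time.Sleep(5 * time.Second)
}

func main() {
	settleAfterPartitioning()
}
```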
-
Sébastien Han authored
Permissions on the disk might have changed due to the partitions being created, so the CI user is not able to read the device correctly.
Closes: https://github.com/rook/rook/issues/6580
Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit dcd84e63)
-
Sébastien Han authored
Now, we can schedule snapshots on pools from the CephBlockPool CR when the pool is mirrored. It can be enabled like this:
```
mirroring:
  enabled: true
  mode: pool
  snapshotSchedules:
    - interval: 24h # daily snapshots
      startTime: 14:00:00-05:00
```
Multiple schedules are supported since snapshotSchedules is a list. Signed-off-by:
Sébastien Han <seb@redhat.com> (cherry picked from commit afc7ecff)
-
mergify[bot] authored
docs: Clarify helm warning that could delete cluster (bp #6655)
-
mergify[bot] authored
ceph: Restore mon clusterIP if the service is missing (bp #6658)
-
mergify[bot] authored
ceph: update cleanupPolicy design doc (bp #6592)
-
Travis Nielsen authored
In a disaster recovery scenario, the mon service may have been accidentally deleted, while the expected mon endpoint is still found in the mon endpoints configmap. In this case, we recreate the mon service with the same endpoint as before. Signed-off-by:
Travis Nielsen <tnielsen@redhat.com> (cherry picked from commit 66de535a)
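A minimal Go sketch of recreating a mon service with the previously recorded ClusterIP; the names, labels, and ports are illustrative, not the actual Rook objects:
```go
package mon

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// restoredMonService rebuilds a deleted mon service using the ClusterIP saved
// in the mon endpoints configmap, so the stored mon endpoint stays valid.
func restoredMonService(namespace, monName, savedClusterIP string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "rook-ceph-mon-" + monName,
			Namespace: namespace,
			Labels:    map[string]string{"app": "rook-ceph-mon", "mon": monName},
		},
		Spec: corev1.ServiceSpec{
			// Reuse the previously assigned ClusterIP instead of letting
			// Kubernetes allocate a new one.
			ClusterIP: savedClusterIP,
			Selector:  map[string]string{"app": "rook-ceph-mon", "mon": monName},
			Ports: []corev1.ServicePort{
				{Name: "msgr1", Port: 6789, TargetPort: intstr.FromInt(6789)},
			},
		},
	}
}
```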
-
Santosh Pillai authored
The cleanup policy design doc is not up to date with respect to the latest implementation. This PR updates the design doc. Signed-off-by:
Santosh Pillai <sapillai@redhat.com> (cherry picked from commit cd936b64)
-
- 17 Nov, 2020 4 commits
-
-
Travis Nielsen authored
In the helm chart the CRDs are installed if crds.enabled is set to true. If false, the helm chart will not install them. If it is changed to false during an upgrade, the CRDs will be removed and the cluster will be destroyed. There is no way to prevent this while still being flexible about CRD management, so we make the warnings as clear as possible. Signed-off-by:
Travis Nielsen <tnielsen@redhat.com> (cherry picked from commit ba534ce2)
-
mergify[bot] authored
ceph: support ceph cluster and CSI on multus in different namespace (bp #6396)
-
mergify[bot] authored
ceph: add external script to the container image (bp #6648)
-
mergify[bot] authored
ceph: update ceph quick start doc to use new crds.yaml file (bp #6646)
-