OpenShift 4.20 and ceph-csi community edition
OpenShift ↔ Community Ceph CSI Runbook (RBD + CephFS)
This is the repeatable step-by-step procedure derived from your PoC (OCP 4.20 + upstream ceph-csi) to connect OpenShift to a community Ceph cluster and provision two volume types: RBD (RWO/Block) and CephFS (RWX/Filesystem).
- RBD PVC rbd-rwo-test became Bound ✅
- CephFS PVC cephfs-rwx-test became Bound ✅
0) What you need before starting
- Ceph: FSID, MON IPs, admin access to create pools/users/subvol groups.
- Network: OCP worker nodes can reach the Ceph MONs on 6789 (msgr1) and/or 3300 (msgr2); a quick reachability check is sketched after this list.
- OpenShift: cluster-admin access (oc), ability to apply upstream manifests.
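Optional pre-flight check from one worker node (a sketch; <WORKER> and <MON1> are placeholders for your node name and a MON address):
oc debug node/<WORKER> -- chroot /host bash -c '
  for port in 3300 6789; do
    timeout 3 bash -c "</dev/tcp/<MON1>/$port" && echo "port $port reachable" || echo "port $port NOT reachable"
  done
'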
1) Community Ceph side (do this first)
1.1 Create RBD pool for OpenShift
cephadm shell --fsid <FSID> -- bash -lc '
ceph osd pool create ocp-rbd 128
ceph osd pool application enable ocp-rbd rbd
'
1.2 Create CSI client for RBD (IMPORTANT: include mgr caps)
cephadm shell --fsid <FSID> -- bash -lc '
ceph auth get-or-create client.ocp \
mon "profile rbd" \
osd "profile rbd pool=ocp-rbd" \
mgr "allow r"
ceph auth get client.ocp
ceph auth get-key client.ocp
'
1.3 Ensure CephFS exists and you know its fsName
cephadm shell --fsid <FSID> -- ceph fs ls
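If ceph fs ls comes back empty, create a filesystem first (a sketch; "ocpfs" is just an example name, and this command also creates the cephfs.<FSNAME>.meta/.data pools and asks the orchestrator to deploy MDS daemons):
cephadm shell --fsid <FSID> -- ceph fs volume create ocpfs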
1.4 Create CSI client for CephFS (IMPORTANT: include mgr caps)
Your failure “does your client key have mgr caps?” was fixed by adding mgr "allow r".
cephadm shell --fsid <FSID> -- bash -lc '
ceph auth get-or-create client.ocp-cephfs \
mon "allow r" \
mgr "allow r" \
mds "allow rw" \
osd "allow rw pool=cephfs.<FSNAME>.data, allow rw pool=cephfs.<FSNAME>.meta"
ceph auth get client.ocp-cephfs
ceph auth get-key client.ocp-cephfs
'
1.5 Create CephFS CSI subvolume group
Your failure “subvolume group 'csi' does not exist” was fixed by creating it.
cephadm shell --fsid <FSID> -- bash -lc '
ceph fs subvolumegroup create <FSNAME> csi
ceph fs subvolumegroup ls <FSNAME>
'
2) OpenShift side
2.1 Create a namespace for upstream ceph-csi
oc new-project ceph-csi
2.2 Create config maps (cluster config + ceph.conf base + kms config)
cat <<'EOF' | oc apply -n ceph-csi -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
data:
  config.json: |-
    [
      {
        "clusterID": "<FSID>",
        "monitors": [
          "<MON1>:6789",
          "<MON2>:6789",
          "<MON3>:6789"
        ]
      }
    ]
EOF
cat <<'EOF' | oc apply -n ceph-csi -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-config
data:
  ceph.conf: |
    [global]
    fsid = <FSID>
    mon_host = <MON1>,<MON2>,<MON3>
EOF
cat <<'EOF' | oc apply -n ceph-csi -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-encryption-kms-config
data:
  config.json: |
    {}
EOF
2.3 Create secrets
cat <<'EOF' | oc apply -n ceph-csi -f -
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
type: Opaque
stringData:
  userID: "ocp"
  userKey: "<KEY_OF_client.ocp>"
EOF
cat <<'EOF' | oc apply -n ceph-csi -f -
apiVersion: v1
kind: Secret
metadata:
  name: csi-cephfs-secret
type: Opaque
stringData:
  userID: "ocp-cephfs"
  userKey: "<KEY_OF_client.ocp-cephfs>"
EOF
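If the host you run oc from also has cephadm access, you can avoid pasting keys into YAML at all (a sketch of the same two secrets created imperatively):
oc -n ceph-csi create secret generic csi-rbd-secret \
  --from-literal=userID=ocp \
  --from-literal=userKey="$(cephadm shell --fsid <FSID> -- ceph auth get-key client.ocp)"
oc -n ceph-csi create secret generic csi-cephfs-secret \
  --from-literal=userID=ocp-cephfs \
  --from-literal=userKey="$(cephadm shell --fsid <FSID> -- ceph auth get-key client.ocp-cephfs)"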
2.4 Deploy upstream ceph-csi manifests (RBD + CephFS)
- Apply RBAC + CSIDriver + Nodeplugins (DaemonSets) + Provisioners (Deployments) for both drivers.
- Make sure the upstream YAML uses namespace: ceph-csi (upstream examples often default to default); see the sketch below.
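A minimal deployment sketch, assuming the upstream repo layout under deploy/rbd/kubernetes and deploy/cephfs/kubernetes (pin the release branch you actually validated):
git clone --depth 1 --branch <RELEASE_BRANCH> https://github.com/ceph/ceph-csi.git
cd ceph-csi
# Upstream RBAC manifests typically reference "namespace: default"; rewrite it first.
grep -rl 'namespace: default' deploy/rbd/kubernetes deploy/cephfs/kubernetes \
  | xargs -r sed -i 's/namespace: default/namespace: ceph-csi/g'
oc apply -n ceph-csi -f deploy/rbd/kubernetes/
oc apply -n ceph-csi -f deploy/cephfs/kubernetes/
# Note: these directories also ship a sample csi-config-map.yaml; if it overwrites
# the ConfigMap from step 2.2, simply re-apply that ConfigMap afterwards.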
2.5 OpenShift SCC
oc adm policy add-scc-to-user privileged -n ceph-csi -z rbd-csi-nodeplugin
oc adm policy add-scc-to-user privileged -n ceph-csi -z cephfs-csi-nodeplugin
2.6 The #1 culprit you hit: /etc/ceph read-only
- Symptom: failed to write ceph configuration file (open /etc/ceph/keyring: read-only file system)
- Cause: /etc/ceph mounted from a ConfigMap is read-only, but ceph-csi writes a keyring there.
- Fix pattern (what made your RBD work; a sketch follows this list):
  - Mount /etc/ceph from an emptyDir volume (writable).
  - Add an initContainer that writes /etc/ceph/ceph.conf, /etc/ceph/keyring, /tmp/csi/keys/userID and /tmp/csi/keys/userKey.
  - Do this for both provisioners: RBD and CephFS.
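A minimal sketch of that fix for the RBD provisioner pod spec. The volume and container names below follow the upstream defaults but may differ in your manifest version; the initContainer name and image placeholder are illustrative, so reuse whatever cephcsi image the Deployment already pulls. The CephFS provisioner gets the same treatment with client.ocp-cephfs and csi-cephfs-secret.
# Illustrative fragment of the provisioner pod spec (not a complete Deployment).
spec:
  template:
    spec:
      volumes:
        # was: configMap "ceph-config" (read-only); emptyDir makes /etc/ceph writable
        - name: ceph-config
          emptyDir: {}
        - name: keys-tmp-dir
          emptyDir:
            medium: Memory
      initContainers:
        - name: write-ceph-config            # illustrative name
          image: <SAME_CEPHCSI_IMAGE_AS_THE_PLUGIN_CONTAINERS>
          env:
            - name: USER_KEY
              valueFrom:
                secretKeyRef:
                  name: csi-rbd-secret
                  key: userKey
          command:
            - sh
            - -c
            - |
              cat > /etc/ceph/ceph.conf <<CONF
              [global]
              fsid = <FSID>
              mon_host = <MON1>,<MON2>,<MON3>
              CONF
              cat > /etc/ceph/keyring <<KEYRING
              [client.ocp]
              key = ${USER_KEY}
              KEYRING
              printf '%s' "ocp"         > /tmp/csi/keys/userID
              printf '%s' "${USER_KEY}" > /tmp/csi/keys/userKey
          volumeMounts:
            - name: ceph-config
              mountPath: /etc/ceph
            - name: keys-tmp-dir
              mountPath: /tmp/csi/keys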
2.7 Create StorageClasses
cat <<'EOF' | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  clusterID: "<FSID>"
  pool: "ocp-rbd"
  imageFeatures: "layering"
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
EOF
cat <<'EOF' | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-rwx
provisioner: cephfs.csi.ceph.com
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  clusterID: "<FSID>"
  fsName: "<FSNAME>"
  pool: "cephfs.<FSNAME>.data"
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
EOF
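Quick sanity check that both classes registered with the expected provisioners:
oc get storageclass ceph-rbd cephfs-rwx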
3) Validation
3.1 CSI pods
oc -n ceph-csi get pods
oc -n ceph-csi get ds,deploy
3.2 RBD test PVC
cat <<'EOF' | oc apply -n default -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-rwo-test
spec:
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
EOF
oc -n default get pvc rbd-rwo-test
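Because this PVC is volumeMode: Block, Bound alone doesn't exercise the data path; a throwaway pod with volumeDevices does (a sketch; pod name, image and device path are illustrative):
cat <<'EOF' | oc apply -n default -f -
apiVersion: v1
kind: Pod
metadata:
  name: rbd-block-tester
spec:
  containers:
    - name: tester
      image: registry.access.redhat.com/ubi9/ubi-minimal
      command: ["sleep", "3600"]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: rbd-rwo-test
EOF
oc -n default wait --for=condition=Ready pod/rbd-block-tester
oc -n default exec rbd-block-tester -- ls -l /dev/xvda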
3.3 CephFS test PVC
cat <<'EOF' | oc apply -n default -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-rwx-test
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: cephfs-rwx
EOF
oc -n default get pvc cephfs-rwx-test
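To confirm the RWX volume mounts and is writable end to end, a similar throwaway pod works (a sketch; names are illustrative):
cat <<'EOF' | oc apply -n default -f -
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-rwx-tester
spec:
  containers:
    - name: tester
      image: registry.access.redhat.com/ubi9/ubi-minimal
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /mnt/cephfs
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-rwx-test
EOF
oc -n default wait --for=condition=Ready pod/cephfs-rwx-tester
oc -n default exec cephfs-rwx-tester -- sh -c 'echo hello > /mnt/cephfs/hello && cat /mnt/cephfs/hello'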
4) “Gotchas” summary (from your PoC)
- Read-only /etc/ceph from a ConfigMap caused a ceph-csi crash: fixed with emptyDir + an initContainer.
- Missing CephFS mgr caps caused “does your client key have mgr caps?”: fixed with mgr "allow r".
- Missing CephFS subvolume group caused “subvolume group 'csi' does not exist”: fixed by creating it.
- Pending provisioner replicas due to anti-affinity + master taints: scale replicas down or relax scheduling (see below).
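For the last one, the quick fix is to scale the provisioners down (deployment names assumed to match the upstream defaults):
oc -n ceph-csi scale deploy/csi-rbdplugin-provisioner --replicas=1
oc -n ceph-csi scale deploy/csi-cephfsplugin-provisioner --replicas=1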
Security note: don’t store real keys in Git; for anything beyond a PoC, use sealed-secrets or an external secret store.
