Ceph Storage Architecture

What is Ceph Storage?

Ceph Storage is a software-defined storage solution that scales out to multiple petabytes, using servers built from industry-standard hardware and storage devices.

Ceph Storage Architecture:

Ceph is based on a modular and distributed architecture that contains the following elements:

  1. An object storage back end known as RADOS (Reliable Autonomic Distributed Object Store). RADOS is a self-healing and self-managing software-based object store.
  2. A variety of access methods to interact with RADOS.

The Ceph storage back end is based on the following daemons:

  • Monitors (Ceph-MON) maintain maps of the cluster state, including the monitor map, the manager map, the OSD map, the MDS map, and the CRUSH map. These maps are used to help the other daemons coordinate with each other.
  • Object Storage Devices (Ceph-OSDs) store data and handle data replication, recovery, and rebalancing.
  • Managers (Ceph-MGR) keep track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. They expose cluster information through a browser-based dashboard and a REST API.
  • Metadata Servers (MDSs) store metadata used by CephFS to allow efficient POSIX command execution by clients.
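
A rough sketch of how these daemons look from a client: the librados Python binding can send the same "status" command that the ceph -s CLI uses. The config path is an assumption for a default deployment, and the exact JSON fields vary slightly between Ceph releases.

    import json
    import rados

    # Connect to the cluster as the default admin client using the local ceph.conf.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # "status" is the command behind "ceph -s"; its reply summarizes the
    # MON, MGR, OSD, and MDS daemons described above.
    cmd = json.dumps({'prefix': 'status', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    status = json.loads(outbuf)

    print(status['health'])    # overall cluster health
    print(status['osdmap'])    # OSD counts and state
    cluster.shutdown()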

Ceph Monitors (MONs)

  • Ceph Monitors (MONs) are the daemons that maintain a master copy of the cluster map. 
  • The cluster map is a collection of six maps that contain information about the state of the Ceph cluster and its configuration.
  • Monitors provide consensus for distributed decision making.
  • Each cluster event must be handled correctly, the appropriate map updated, and the update replicated to each Monitor daemon.
  • The cluster must be configured with an odd number of monitors, and more than half of them must be functional for the Ceph Storage cluster to remain operational and accessible.
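
A minimal sketch of checking the quorum from Python, assuming a default /etc/ceph/ceph.conf and admin keyring:

    import json
    import rados

    # Ask the monitors which of them are currently in quorum.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    cmd = json.dumps({'prefix': 'quorum_status', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    quorum = json.loads(outbuf)

    in_quorum = quorum['quorum_names']
    total = len(quorum['monmap']['mons'])
    # The cluster stays accessible only while more than half of the monitors
    # (for example 2 of 3, or 3 of 5) remain in quorum.
    print(f"{len(in_quorum)} of {total} monitors in quorum: {in_quorum}")
    cluster.shutdown()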

Ceph Object Storage Devices (OSDs)

  • OSDs are the building blocks of the Ceph storage cluster.
  • An OSD connects a storage device (such as a hard disk) to the Ceph storage cluster.
  • An individual storage server may run multiple OSD daemons and provide multiple OSDs to the cluster.
  • The CRUSH algorithm is used to store objects in OSDs using placement groups (see the sketch at the end of this section).
  • The replication of objects to multiple OSDs is handled automatically.
  • One OSD is the primary OSD for the object's placement group.
  • Ceph clients always contact the primary OSD in the acting set when they read or write data. The primary OSD:
    • Serves all I/O requests
    • Replicates and protects the data
    • Checks the coherence of the data
    • Rebalances the data
    • Recovers the data

  • The other OSDs are secondary OSDs. A secondary OSD:

    • Always acts under the control of the primary OSD
    • Is capable of becoming the primary OSD

  • Each OSD has its own OSD journal. The OSD journal is not related to the file system journal but is used to improve the performance of write operations to the OSD.

    • Each write operation to the Ceph cluster is acknowledged to the client after all involved OSD journals have recorded the write request. The OSD then commits the operation to its backing file system.
    • Every few seconds, the OSD stops writing new requests to its journal to apply the contents of the OSD journal to the backing file system. It then trims the committed requests from the journal to reclaim space on the journal's storage device.
    • OSD journals use raw volumes on the OSD nodes and should be configured on a separate device, if possible a fast device such as an SSD, for performance-oriented and/or write-intensive environments.
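
The sketch referenced above asks the cluster where CRUSH would place a single named object. The pool and object names are made-up examples, and the command is sent through the librados Python binding rather than the ceph CLI:

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # "osd map" reports which placement group and OSDs CRUSH selects for an object.
    cmd = json.dumps({'prefix': 'osd map', 'pool': 'mypool',
                      'object': 'myobject', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    mapping = json.loads(outbuf)

    # 'pgid' is the placement group, 'acting' lists the OSDs holding the replicas,
    # and 'acting_primary' is the primary OSD that serves all I/O for the object.
    print(mapping['pgid'], mapping['acting'], mapping['acting_primary'])
    cluster.shutdown()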

Ceph Managers (MGRs)

  • Ceph Managers (MGRs) provide for the collection of cluster statistics.
  • It's recommended that you deploy at least two Ceph managers for each cluster, each running in a separate failure domain.
  • If no manager is available in the cluster, client I/O operations are not affected; however, attempts to query cluster statistics will fail.
  • The manager daemon centralizes access to all data collected from the cluster and can provide a simple web dashboard to storage administrators on TCP port 7000 (by default).
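
As a minimal illustration, and assuming the dashboard module is enabled on its default port, the dashboard is a plain HTTP endpoint that can be probed like any other web service; the host name below is a placeholder:

    from urllib.request import urlopen

    # Hypothetical manager host; replace with the node running the active ceph-mgr.
    DASHBOARD_URL = "http://mgr-node.example.com:7000/"

    # A simple GET is enough to verify that the active manager is serving the dashboard.
    with urlopen(DASHBOARD_URL, timeout=5) as response:
        print(response.status, response.reason)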

Metadata Server (MDS)

  • The Ceph Metadata Server (MDS) is a service that provides POSIX-compliant, shared file system metadata management, supporting the directory hierarchy and file metadata such as ownership, time stamps, and mode.
  • The MDS stores its metadata in RADOS rather than in local storage, and it has no access to file content; it is required only for CephFS file access.
  • The MDS is therefore a Ceph component but not a RADOS component.
  • The MDS enables CephFS to interact with the Ceph Object Store, mapping an inode to an object, and remembering where data is stored within a tree. Clients accessing a CephFS file system first make a request to an MDS, which provides the information needed to get files from the correct OSDs.
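
As a small sketch of the metadata served by the MDS, the libcephfs Python binding can stat a path in the file system. The path below is a made-up example, and the client must be able to reach both the monitors and an active MDS:

    import cephfs

    # Mount CephFS; the MDS answers the metadata request below,
    # while file content itself would come from the OSDs.
    fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
    fs.mount()

    # stat() returns the ownership, mode, and time stamp metadata managed by the MDS.
    print(fs.stat(b'/some/file.txt'))

    fs.unmount()
    fs.shutdown()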

Ceph Access Methods:

The Ceph Native API (librados)

  • librados is a native C library that allows applications to work directly with RADOS to access objects stored by the Ceph cluster.
  • librados is used as the bottom layer of other Ceph interfaces. For example, services such as Ceph Block Device and Ceph Object Gateway are built using librados.
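
A minimal librados sketch, assuming a default ceph.conf and an existing pool named "mypool" (an example name), that writes one object into RADOS and reads it back:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # An I/O context is bound to one pool; all object operations go through it.
    ioctx = cluster.open_ioctx('mypool')

    ioctx.write_full('hello-object', b'Hello from librados')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()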

The Ceph Object Gateway (RADOS Gateway)

  • The Ceph Object Gateway provides RESTful APIs compatible with Amazon S3 and OpenStack Swift. It is also referred to as the RADOS Gateway, RADOSGW, or RGW.
  • It is an object storage interface built using librados.
  • It uses this library to communicate with the Ceph cluster and writes to OSD processes directly.
  • It provides applications with a RESTful gateway API and supports two interfaces: Amazon S3 and OpenStack Swift. Typical use cases include:
    • Image storage (for example, SmugMug, Tumblr)
    • Backup services
    • File storage and sharing (for example, Dropbox)
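
Because the gateway speaks the S3 protocol, any S3 client can talk to it. The sketch below uses boto3; the endpoint URL and credentials are placeholders for a user created on the gateway with radosgw-admin:

    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw-node.example.com:8080',   # placeholder RGW endpoint
        aws_access_key_id='ACCESS_KEY',                    # placeholder credentials
        aws_secret_access_key='SECRET_KEY',
    )

    # Standard S3 calls are served by RGW and stored as RADOS objects.
    s3.create_bucket(Bucket='demo-bucket')
    s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'Hello from RGW')
    print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())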

The Ceph Block Device (RBD, librbd)

  • The Ceph Block Device (RBD, librbd) provides a block device-like interface to the RADOS object store. Each Ceph Block Device is referred to as a RADOS Block Device (RBD) image.
  • It provides block storage within a Ceph cluster through RBD images.
  • RBD images are constructed from individual objects scattered across different OSDs in the cluster.
  • Because the objects that make up the RBD are distributed to different OSDs around the cluster, access to the block device is automatically parallelized.
  • It provides:
    • Storage for virtual disks in the Ceph cluster 
    • Mount support in the Linux kernel 
    • Boot support in QEMU, KVM, and OpenStack Cinder
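
A minimal sketch with the rbd Python binding, assuming a pool named "rbd" exists; the image name and size are arbitrary examples:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    # Create a 1 GiB image; its data is striped over many RADOS objects.
    rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)

    with rbd.Image(ioctx, 'demo-image') as image:
        image.write(b'hello block device', 0)   # write at offset 0
        print(image.read(0, 18))                # read the same bytes back

    ioctx.close()
    cluster.shutdown()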

The Ceph File System (CephFS, libcephfs)

  • The Ceph File System (CephFS, libcephfs) provides access to a Ceph cluster through a POSIX-like file system interface.
  • Ceph File System (CephFS) is a parallel file system that provides a scalable, single-hierarchy shared disk.
  • The metadata associated with the files stored in CephFS is managed by the Ceph Metadata Server (MDS). This includes information such as the access, change, and modification time stamps for the files.
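
A minimal sketch using the libcephfs Python binding, assuming a running CephFS file system and an admin keyring; the file path is an example:

    import os
    import cephfs

    fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
    fs.mount()

    # Create, write, and read back a file through the POSIX-like interface;
    # the MDS handles the metadata while the OSDs hold the file data.
    fd = fs.open(b'/hello.txt', os.O_CREAT | os.O_RDWR, 0o644)
    fs.write(fd, b'Hello CephFS', 0)
    print(fs.read(fd, 0, 12))
    fs.close(fd)

    fs.unmount()
    fs.shutdown()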
