Additional resource requirements for Cloudera Data Engineering

For standalone Cloudera Data Engineering, Cloudera recommends three nodes (one master and two workers), with the following minimum and recommended resource requirements for each node:

Component | Minimum | Recommended
Node Count | 2 | 4
CPU | 16 cores for the CDE workspace (base and virtual cluster) and 8 cores for workloads | 16 cores for the CDE workspace (base and virtual cluster) and 32 cores for workloads (you can extend this depending upon the workload size)
Memory | 64 GB for the CDE workspace (base and virtual cluster) and 32 GB for workloads (you can extend this depending upon the workload size) | 64 GB for the CDE workspace (base and virtual cluster) and 64 GB for workloads (you can extend this depending upon the workload size)
Storage | 200 GB blob storage and 500 GB NFS storage | 200 GB blob storage and 500 GB NFS storage
Network Bandwidth | 1 GB/s to all nodes and the base cluster | 10 GB/s to all nodes and the base cluster

CDE Service and Virtual Cluster requirements

  • CDE Service requirements: Overall, a CDE service requires 7 CPU cores, 15 GB of memory, and 110 GB of Block PV or NFS PV storage.
    Table 1. CDE Service requirements:
    Component | vCPU | Memory | Block PV or NFS PV | Number of replicas
    Embedded DB | 4 | 8 GB | 100 GB | 1
    Config Manager | 500m | 1 GB | -- | 2
    Dex Downloads | 250m | 512 MB | -- | 1
    Knox | 250m | 1 GB | -- | 1
    Management API | 1 | 2 GB | -- | 1
    NGINX Ingress Controller | 100m | 90 MB | -- | 1
    FluentD Forwarder | 250m | 512 MB | -- | 1
    Grafana | 250m | 512 MB | 10 GB | 1
    Data Connector | 250m | 512 MB | -- | 1
    Total | 7 | 15 GB | 110 GB | --
  • CDE Virtual Cluster requirements:
    • For Spark 3: each virtual cluster requires 5.35 CPU cores, 15.6 GB of memory, and 400 GB of Block PV or Shared Storage PV overall.
    • For Spark 2: each virtual cluster requires an additional 500m CPU, 4.5 GB of memory, and 100 GB of storage, that is, 5.85 CPU cores, 20.1 GB of memory, and 500 GB of Block PV or Shared Storage PV overall.
    Table 2. CDE Virtual Cluster requirements for Spark 3:
    Component | vCPU | Memory | Block PV or NFS PV | Number of replicas
    Airflow API | 350m | 612 MB | 100 GB | 1
    Airflow Scheduler | 1 | 1 GB | 100 GB | 1
    Airflow Web | 250m | 512 MB | -- | 1
    Runtime API | 250m | 512 MB | 100 GB | 1
    Livy | 3 | 12 GB | 100 GB | 1
    SHS | 250m | 1 GB | -- | 1
    Pipelines | 250m | 512 MB | -- | 1
    Total | 5350m | 15.6 GB | 400 GB | --
  • Workloads: You must configure resources depending upon the workload; a worked sizing example follows this list.
    • The Spark driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
    • In addition, the Spark driver uses 110m CPU and 232 MB of memory for its sidecar container.
    • The Spark executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
    • In addition, each Spark executor uses 10m CPU and 32 MB of memory for its sidecar container.
    • Minimal Airflow jobs need 100m CPU and 200 MB of memory per Airflow worker.
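The sizing rules above can be turned into a quick estimate. The following is a minimal sketch in Python that restates that arithmetic for a single Spark job: the configured driver and executor resources, plus the 40% memory overhead and the per-container sidecar costs listed above. The function name, variable names, and example job size are illustrative only and are not part of any CDE API.

```python
# Back-of-the-envelope sizing for a single CDE Spark job, using the
# figures quoted above: 40% memory overhead, driver sidecar 110m CPU /
# 232 MB, executor sidecar 10m CPU / 32 MB. Names and values are
# illustrative assumptions, not part of CDE.

def spark_job_footprint(driver_cores, driver_mem_gb,
                        executor_cores, executor_mem_gb, num_executors):
    """Return (total_cores, total_memory_gb) requested by the job's pods."""
    driver_sidecar_cores, driver_sidecar_mem_gb = 0.110, 232 / 1024
    exec_sidecar_cores, exec_sidecar_mem_gb = 0.010, 32 / 1024
    overhead = 1.40  # configured memory plus 40% overhead

    driver_mem = driver_mem_gb * overhead + driver_sidecar_mem_gb
    exec_mem = (executor_mem_gb * overhead + exec_sidecar_mem_gb) * num_executors

    cores = (driver_cores + driver_sidecar_cores
             + (executor_cores + exec_sidecar_cores) * num_executors)
    return cores, driver_mem + exec_mem

# Example: a 1-core / 4 GB driver with ten 2-core / 8 GB executors.
cores, mem_gb = spark_job_footprint(1, 4, 2, 8, 10)
print(f"~{cores:.2f} cores, ~{mem_gb:.1f} GB memory")  # ~21.21 cores, ~118.1 GB memory
```

Note that this estimate covers only the job's own pods; the CDE service (Table 1) and each virtual cluster (Table 2) consume their listed resources in addition to the workloads.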