NoSQL Nest

From Zero to Green: Automating a Production-Ready, Secured 3-Node Elasticsearch Cluster

AKEBLI Ouassim — Tue, 17 Feb 2026 08:48:31 GMT

Chapter 1: Architecture & Design

Before writing a single line of code, we must understand the "physical" and "logical" layout of the cluster we are building. This project deploys a hyper-converged, 3-node Elasticsearch cluster using Vagrant and VirtualBox. The infrastructure simulates a production environment with dedicated networks, storage partitioning, and full SSL security.

1.1 Environment Specification

Component	Specification	Details
OS Base	Ubuntu 22.04 LTS (Jammy64)	Standardized base box.
Compute	3 Nodes (`es1`, `es2`, `es3`)	4 GB RAM, 2 vCPUs per node.
Storage	15GB Dedicated LVM Disk	Partitioned for `/mnt` (System), `home`, `data`, `logs`, and `backup`.
Network	Dual Interface	`NAT` (Mgmt/SSH) + `Host-Only` (Cluster Traffic `192.168.56.x`).
Software	Elasticsearch 9.3.0	Manual Tarball Installation (Archive).
Security	xPack Security Enabled	Full SSL/TLS for Inter-node (Transport) and Client (HTTP) traffic.

1.2 Logical Architecture

This diagram illustrates the physical layout of the cluster. It highlights the "Hub-and-Spoke" network topology where all nodes connect via a private Virtual Switch, and the Storage Architecture where es1 acts as the central NFS repository for backups.

graph TD
    %% === HOST LAYER ===
    subgraph HostLayer ["Host Machine
32GB RAM / 8 Cores
Provider: VirtualBox"]
        style HostLayer fill:#f9f9f9,stroke:#333,stroke-width:2px

        %% --- VIRTUAL SWITCHING ---
        subgraph Switches ["Virtual Network Infrastructure"]
            style Switches fill:#ffffff,stroke:#999,stroke-dasharray: 5 5

            VS_NAT["NAT Switch
DHCP / Internet Access
Port Forwarding: SSH (2222->22)"]
            style VS_NAT fill:#dbeeff,stroke:#0055bb,color:#0055bb

            VS_PROD["Host-Only Switch
Subnet: 192.168.56.0/24
Gateway: 192.168.56.1"]
            style VS_PROD fill:#e0ffe0,stroke:#007700,color:#007700
        end

        %% ============================================================
        %% NODE 1 (MASTER / NFS SERVER)
        %% ============================================================
        subgraph ES1 ["VM: es1 (4GB RAM / 2 vCPU)
OS: Ubuntu 22.04 LTS"]
            style ES1 fill:#fff,stroke:#333

            subgraph ES1_OS ["OS Configuration"]
                style ES1_OS fill:#f4f4f4,stroke:none
                K1["sysctl: vm.max_map_count=262144
ulimit: nofile=65535, memlock=inf
User: elasticsearch (1122)"]
            end

            subgraph ES1_App ["JVM: Elasticsearch 9.3.0"]
                style ES1_App fill:#ffeae6,stroke:#cc5500
                JVM1["Heap: 2GB (Xms2g Xmx2g)
GC: G1GC
bootstrap.memory_lock: true"]
                Roles1["Node Roles: Master, Data, Ingest
SSL: Enabled (Transport & HTTP)"]
            end

            subgraph ES1_Store ["LVM: vg01 (15GB XFS)"]
                style ES1_Store fill:#e1e1e1,stroke:none
                D1_Data["/mnt/../data (4GB)
(Indices & Shards)"]
                D1_Logs["/mnt/../logs (1GB)"]
                D1_home["/mnt/../home (2GB)"]
                D1_NFS["/mnt/../backup (4GB)
Type: Physical Dir
Status: NFS EXPORT"]
                style D1_NFS fill:#fffec8,stroke:#dba900,stroke-width:2px
            end

            subgraph ES1_Net ["Network Interfaces"]
                style ES1_Net fill:none,stroke:none
                N1_NAT["enp0s3 (DHCP)
Mgmt"]
                N1_PROD["prod (Static)
IP: 192.168.56.11
Ports: 9200, 9300, 2049"]
            end
        end

        %% ============================================================
        %% NODE 2 (DATA NODE)
        %% ============================================================
        subgraph ES2 ["VM: es2 (4GB RAM / 2 vCPU)"]
            style ES2 fill:#fff,stroke:#333

            subgraph ES2_App ["JVM: Elasticsearch 9.3.0"]
                style ES2_App fill:#ffeae6,stroke:#cc5500
                JVM2["Heap: 2GB (Locked)"]
            end

            subgraph ES2_Store ["LVM: vg01 (15GB XFS)"]
                style ES2_Store fill:#e1e1e1,stroke:none
                D2_Data["/mnt/../data (4GB)"]
                D2_Logs["/mnt/../logs (1GB)"]
                D2_home["/mnt/../home (2GB)"]
                D2_NFS["/mnt/../backup
Type: NFS Mount
Source: 192.168.56.11"]
                style D2_NFS fill:#fffec8,stroke:#dba900,stroke-dasharray: 5 5
            end

            subgraph ES2_Net ["Network Interfaces"]
                style ES2_Net fill:none,stroke:none
                N2_NAT["enp0s3 (DHCP)"]
                N2_PROD["prod (Static)
IP: 192.168.56.12
Ports: 9200, 9300"]
            end
        end

        %% ============================================================
        %% NODE 3 (DATA NODE)
        %% ============================================================
        subgraph ES3 ["VM: es3 (4GB RAM / 2 vCPU)"]
            style ES3 fill:#fff,stroke:#333

            subgraph ES3_App ["JVM: Elasticsearch 9.3.0"]
                style ES3_App fill:#ffeae6,stroke:#cc5500
                JVM3["Heap: 2GB (Locked)"]
            end

            subgraph ES3_Store ["LVM: vg01 (15GB XFS)"]
                style ES3_Store fill:#e1e1e1,stroke:none
                D3_Data["/mnt/../data (4GB)"]
                D3_Logs["/mnt/../logs (1GB)"]
                D3_home["/mnt/../home (2GB)"]
                D3_NFS["/mnt/../backup
Type: NFS Mount
Source: 192.168.56.11"]
                style D3_NFS fill:#fffec8,stroke:#dba900,stroke-dasharray: 5 5
            end

            subgraph ES3_Net ["Network Interfaces"]
                style ES3_Net fill:none,stroke:none
                N3_NAT["enp0s3 (DHCP)"]
                N3_PROD["prod (Static)
IP: 192.168.56.13
Ports: 9200, 9300"]
            end
        end
    end

    %% === CONNECTIONS ===

    %% 1. SSH / Management (Blue)
    VS_NAT -- "SSH / APT" --> N1_NAT
    VS_NAT -- "SSH / APT" --> N2_NAT
    VS_NAT -- "SSH / APT" --> N3_NAT
    linkStyle 0,1,2 stroke:#0055bb,stroke-width:2px

    %% 2. Cluster Traffic (Green)
    N1_PROD == "TCP 9300 (Transport SSL)" ==> VS_PROD
    N2_PROD == "TCP 9300 (Transport SSL)" ==> VS_PROD
    N3_PROD == "TCP 9300 (Transport SSL)" ==> VS_PROD
    linkStyle 3,4,5 stroke:#007700,stroke-width:3px

    %% 3. NFS Data Flow (Yellow / Dashed)
    D1_NFS -.-> N1_PROD
    VS_PROD -. "TCP 2049 (NFSv4)" .-> N2_PROD
    VS_PROD -. "TCP 2049 (NFSv4)" .-> N3_PROD

    N2_PROD -.-> D2_NFS
    N3_PROD -.-> D3_NFS

    linkStyle 6,7,8,9,10 stroke:#dba900,stroke-width:2px,stroke-dasharray: 5 5

1.3 Provisioning Workflow

The deployment is orchestrated via 13 modular shell scripts. This diagram details the execution flow, specifically the critical Phase 3 (Script 09) where nodes synchronize via the NFS share to securely generate and distribute SSL certificates without manual intervention.

sequenceDiagram
    autonumber
    participant V as Vagrant Host
    participant N1 as ES1 Master
    participant N23 as ES2 and ES3
    participant NFS as NFS Share

    Note over V, N23: === PHASE 1 - OS PREPARATION Parallel ===

    par Run Scripts 01-05
        V->>N1: "Provision 01-05"
        V->>N23: "Provision 01-05"
    and
        N1->>N1: "Install Pkgs - Create User - LVM Setup"
        N23->>N23: "Install Pkgs - Create User - LVM Setup"
    end

    Note over V, N23: === PHASE 2 - INFRASTRUCTURE Sequential Logic ===

    V->>N1: "Run 06-nfs.sh"
    N1->>N1: "Start NFS Server and Export backup dir"

    V->>N23: "Run 06-nfs.sh"
    N23->>N1: "Mount NFS Share from 192.168.56.11"

    par Run Scripts 07-08
        V->>N1: "07-Network and 08-Install"
        V->>N23: "07-Network and 08-Install"
    and
        N1->>N1: "Rename NIC to prod - Unpack Tarball"
        N23->>N23: "Rename NIC to prod - Unpack Tarball"
    end

    Note over V, N23: === PHASE 3 - SECURITY BARRIER Script 09 ===

    rect rgb(255, 240, 240)
        Note right of V: Critical Synchronization Point

        V->>N1: "Run 09-certs.sh"
        N1->>NFS: "Generate CA elastic-stack-ca.p12"
        N1->>NFS: "Generate Node Certs"

        V->>N23: "Run 09-certs.sh"

        loop Wait for CA
            N23->>NFS: "Check if ca.p12 exists"
            NFS-->>N23: "Not yet... sleep 2s"
        end

        NFS-->>N23: "CA Found"
        N23->>N23: "Generate Node Certs using CA"
        N23->>NFS: "Create DONE marker file"

        N1->>NFS: "Check if all 3 markers exist"
        N1->>NFS: "Delete CA - Cleanup security risk"
    end

    Note over V, N23: === PHASE 4 - FINAL CONFIG Parallel ===

    par Run Scripts 10-13
        V->>N1: "10-Keystore - 11-Service - 12-Config - 13-Sudoers"
        V->>N23: "10-Keystore - 11-Service - 12-Config - 13-Sudoers"
    and
        N1->>N1: "Set SSL Passwords - Enable Service - Heap - Sudo Rights"
        N23->>N23: "Set SSL Passwords - Enable Service - Heap - Sudo Rights"
    end

    Note over N1, N23: === CLUSTER BOOT ===

    N1->>N1: "Systemd Start PID 1"
    N23->>N23: "Systemd Start"

    N1->>N23: "Handshake over Port 9300 SSL"
    N23->>N1: "Join Cluster es-cluster"

    Note over V, N23: SUCCESS - Cluster is Green

Chapter 2: Lab Setup & Deployment

Now that we understand the design, let's build it. We will use PowerShell to create the script files instantly.

2.1 Prerequisites

Ensure you have the following installed on your host machine:

VirtualBox (Version 7.0 or higher recommended)
Vagrant (Version 2.4 or higher)
PowerShell (Standard on Windows, or pwsh on Mac/Linux)

2.2 Initialize the Lab

Open your PowerShell terminal and run the following commands to create a clean workspace:

mkdir lab_elasticsearch
cd lab_elasticsearch

2.3 Generate Provisioning Scripts & Technical Script Breakdown

The Vagrantfile (Orchestrator)

This file defines the 3 VMs, sets their IPs, RAM, and CPUs, and tells Vagrant to run the 13 shell scripts in order.

@'
Vagrant.configure("2") do |config|
  # Base box: Ubuntu 22.04 LTS (Jammy), as listed in the environment specification
  config.vm.box = "ubuntu/jammy64"

  (1..3).each do |i|
    config.vm.define "es#{i}" do |node|
      node.vm.hostname = "es#{i}"
      node.vm.network "private_network", ip: "192.168.56.#{10+i}"

      # Hardware: 15GB Disk + 4GB RAM
      node.vm.disk :disk, size: "15GB", name: "extra_storage"

      node.vm.provider "virtualbox" do |vb|
        vb.memory = "4096"
        vb.cpus = 2
      end      

      node.vm.provision "packages", type: "shell", path: "01-packages.sh"
      node.vm.provision "user", type: "shell", path: "02-user.sh"
      node.vm.provision "memlock", type: "shell", path: "03-config.sh"
      node.vm.provision "file_structure", type: "shell", path: "04-storage.sh"
      node.vm.provision "swap", type: "shell", path: "05-swap.sh"
      node.vm.provision "nfs", type: "shell", path: "06-nfs.sh"
      node.vm.provision "network", type: "shell", path: "07-network.sh"
      node.vm.provision "install", type: "shell", path: "08-install.sh"
      node.vm.provision "certs", type: "shell", path: "09-certs.sh"
      node.vm.provision "keystore", type: "shell", path: "10-keystore.sh"
      node.vm.provision "service", type: "shell", path: "11-service.sh"
      node.vm.provision "es_config", type: "shell", path: "12-configure-cluster.sh"
      node.vm.provision "sudoers", type: "shell", path: "13-sudoers.sh"

    end
  end
end
'@ | Set-Content -Path "Vagrantfile"

Phase 1: Foundation (Scripts 01-05)

This phase establishes a standardized OS environment tuned specifically for high-performance database workloads.

01-packages.sh: Installs essential system utilities. lvm2 and xfsprogs are required for the storage layer, and libuser provides additional user-management tools. Setting the DEBIAN_FRONTEND=noninteractive environment variable prevents apt from hanging on interactive prompts.

@'
#!/bin/bash
set -e

export DEBIAN_FRONTEND=noninteractive
apt-get update -y

apt-get install -y libuser acl tar lvm2 xfsprogs wget net-tools
'@ | Set-Content -Path "01-packages.sh"

02-user.sh (Identity): Creates the elasticsearch user with a fixed UID (1122) and GID (1122).
- Why Fixed IDs? Since we are using NFS, the numeric User ID must match on all 3 servers. If es1 writes a file as user 1001 and es2 reads it as user 1002, permission errors will occur.

@'
#!/bin/bash
set -e

# Create Group
if ! getent group elasticsearch >/dev/null; then
    groupadd -g 1122 -r elasticsearch
fi

# Create User
if ! id -u elasticsearch >/dev/null 2>&1; then
    useradd -u 1122 -g 1122 -r -s /bin/bash -m -d /home/elasticsearch elasticsearch
fi
'@ | Set-Content -Path "02-user.sh"
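A quick way to confirm that the IDs really match is to compare the passwd entry on every node. The sketch below assumes a hypothetical helper, `uid_gid_of`, which is not part of the original scripts; it extracts the UID:GID pair from a passwd-format line such as the one `getent passwd elasticsearch` prints.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print "UID:GID"
# from a passwd-format line read on stdin.
uid_gid_of() {
    awk -F: '{ print $3 ":" $4 }'
}

# Example with a literal line; on a node you would pipe in
# `getent passwd elasticsearch` instead.
echo "elasticsearch:x:1122:1122::/home/elasticsearch:/bin/bash" | uid_gid_of
```

Running `getent passwd elasticsearch` on es1, es2, and es3 and feeding each result through the helper should give the same pair three times.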

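03-config.sh (Kernel Tuning): Applies critical sysctl and resource-limit settings.
- vm.max_map_count=262144: Required for Lucene (the search engine core) to memory-map index files for efficient access. With a lower value, Elasticsearch fails its bootstrap checks and refuses to start once it binds to a non-loopback address.
- limits.d/elasticsearch.conf: Raises the open-file limit to 65535, because Elasticsearch holds thousands of files open simultaneously, allows unlimited locked memory (needed for bootstrap.memory_lock), and sets the process/thread limit to 4096.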

@'
#!/bin/bash
set -e

# set limits
cat <<EOF > /etc/security/limits.d/elasticsearch.conf
elasticsearch   -       nofile      65535
elasticsearch   hard    memlock     unlimited
elasticsearch   soft    memlock     unlimited
elasticsearch   -       nproc       4096
EOF

# set sysctl
cat <<EOF > /etc/sysctl.d/elasticsearch.conf
vm.max_map_count=262144
EOF

# Apply sysctl immediately
sysctl --system
'@ | Set-Content -Path "03-config.sh"
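To check that the kernel setting actually took effect, the value can be compared against the required minimum. This is a minimal sketch; `meets_minimum` is a hypothetical helper, and literal numbers stand in for the live value so the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a value meets a required minimum.
meets_minimum() {
    local actual=$1 required=$2
    [ "$actual" -ge "$required" ]
}

# On a provisioned node the first argument would come from
# `sysctl -n vm.max_map_count`; literals keep the sketch portable.
if meets_minimum 262144 262144; then
    echo "vm.max_map_count OK"
else
    echo "vm.max_map_count too low"
fi
```

On a real node, something like `meets_minimum "$(sysctl -n vm.max_map_count)" 262144` would perform the same check against the running kernel.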

04-storage.sh (LVM Partitioning): Initializes the raw 15GB disk as an LVM physical volume and creates the volume group vg01 on it. It then carves out separate logical volumes for Home, Data, Logs, and Backups, each formatted with XFS (well suited to large files). This separation prevents a runaway log file from crashing the database by filling the data volume.

@'
#!/bin/bash
set -e

# Create VG01
if ! vgs vg01 >/dev/null 2>&1; then
    DISK=$(lsblk -dn -o NAME,SIZE | grep '15G' | awk '{print "/dev/"$1}')
    pvcreate $DISK
    vgcreate vg01 $DISK
fi

# Helper to create LV and Format XFS
create_lv() {
    local size=$1
    local name=$2
    if ! lvs vg01/$name >/dev/null 2>&1; then
        lvcreate -L $size -n $name vg01
        mkfs.xfs /dev/vg01/$name
    fi
}

# Create LVs
create_lv 2G mnt
create_lv 4G data
create_lv 2G home
create_lv 1G logs
create_lv 4G backup

# Mount hierarchy
mount_and_fstab() {
    local lv=$1
    local path=$2

    mkdir -p $path
    if ! grep -q "$path " /proc/mounts; then
        mount /dev/vg01/$lv $path
        echo "/dev/vg01/$lv $path xfs defaults 0 0" >> /etc/fstab
    fi
}

# Mount /mnt
mount_and_fstab mnt "/mnt"

# Create subfolders
mkdir -p /mnt/elasticsearch/data
mkdir -p /mnt/elasticsearch/home
mkdir -p /mnt/elasticsearch/logs
mkdir -p /mnt/elasticsearch/backup

# Mount the sub-volumes
mount_and_fstab data   "/mnt/elasticsearch/data"
mount_and_fstab home   "/mnt/elasticsearch/home"
mount_and_fstab logs   "/mnt/elasticsearch/logs"
mount_and_fstab backup "/mnt/elasticsearch/backup"

# Fix Permissions
chown -R 1122:1122 /mnt/elasticsearch

lsblk
'@ | Set-Content -Path "04-storage.sh"
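It is worth checking that the requested logical volumes fit inside the 15GB volume group. The sketch below simply adds up the sizes passed to create_lv in the script above; `lv_total_gb` is a hypothetical helper introduced for the illustration.

```shell
#!/bin/bash
# Sum the logical-volume sizes (in GB) requested by 04-storage.sh.
lv_total_gb() {
    local total=0 size
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

DISK_GB=15
USED_GB=$(lv_total_gb 2 4 2 1 4)   # mnt, data, home, logs, backup
echo "allocated ${USED_GB}G of ${DISK_GB}G, $((DISK_GB - USED_GB))G unallocated"
```

The volumes add up to 13GB, which leaves roughly 2GB of headroom in vg01 (slightly less in practice because of LVM metadata) for growing a volume later.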

05-swap.sh (Performance): Permanently disables swap memory. If the OS swaps Elasticsearch memory to disk, performance degrades instantly and can cause Garbage Collection pauses that disconnect the node from the cluster.

@'
#!/bin/bash
set -e

# Turn off swap immediately
swapoff -a

# Remove swap entry from /etc/fstab so it stays off after reboot
sed -i '/swap/s/^/#/' /etc/fstab

# Verify
if [ $(swapon --show | wc -l) -eq 0 ]; then
    echo "Swap is disabled."
else
    echo "Swap might still be active."
    swapon --show
fi
'@ | Set-Content -Path "05-swap.sh"
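One caveat with the sed line above is that every re-provisioning run prepends another `#` to the swap entry. A possible idempotent variant, sketched here under the assumption of GNU sed (as shipped with Ubuntu), only touches lines that are not already commented. `comment_out_swap` is a hypothetical name, and the demonstration works on a temporary file rather than the real /etc/fstab.

```shell
#!/bin/bash
# Comment out active swap entries in an fstab-style file, skipping lines
# that are already commented so repeated runs do not stack '#' characters.
comment_out_swap() {
    local fstab=$1
    sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Demonstration on a throwaway file, never on the real /etc/fstab.
demo=$(mktemp)
printf '%s\n' 'LABEL=cloudimg-rootfs / ext4 defaults 0 1' \
              '/swap.img none swap sw 0 0' > "$demo"
comment_out_swap "$demo"
comment_out_swap "$demo"   # second run changes nothing
cat "$demo"
rm -f "$demo"
```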

Phase 2: Infrastructure (Scripts 06-08)

This phase builds the network and storage plumbing required for clustering.

06-nfs.sh (Shared Storage):
- On ES1: Installs nfs-kernel-server and exports the /mnt/elasticsearch/backup partition.
- On ES2/3: Installs nfs-common and mounts that export. This allows all nodes to see the same "Snapshot Repository," enabling the entire cluster to back up data to a single location.

@'
#!/bin/bash
set -e

HOSTNAME=$(hostname)
NFS_SERVER_IP="192.168.56.11"
SHARE_PATH="/mnt/elasticsearch/backup"

# SERVER CONFIG (Only on es1)
if [ "$HOSTNAME" == "es1" ]; then

    apt-get update -y
    apt-get install -y nfs-kernel-server

    # Permission Check
    chown 1122:1122 "$SHARE_PATH"
    chmod 775 "$SHARE_PATH"

    # Configure Exports
    if ! grep -q "$SHARE_PATH" /etc/exports; then
        echo "$SHARE_PATH 192.168.56.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
    fi

    exportfs -a
    systemctl restart nfs-kernel-server

    # Verify
    touch "$SHARE_PATH/verify_nfs.txt"

# CLIENT CONFIG (Only on es2 & es3)
else

    apt-get update -y
    apt-get install -y nfs-common

    mkdir -p "$SHARE_PATH"
    chown 1122:1122 "$SHARE_PATH"

    # Check if NFS is already mounted
    if ! grep -q "$NFS_SERVER_IP:$SHARE_PATH" /proc/mounts; then
        mount "$NFS_SERVER_IP:$SHARE_PATH" "$SHARE_PATH"
        echo "$NFS_SERVER_IP:$SHARE_PATH $SHARE_PATH nfs defaults 0 0" >> /etc/fstab
    fi
fi
'@ | Set-Content -Path "06-nfs.sh"
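After this script runs, it can be useful to confirm that the backup path on es2 and es3 is really served over NFS and is not just a local directory. The following is a minimal sketch; `is_nfs_mount` is a hypothetical helper that reads a /proc/mounts-style file (source, mount point, filesystem type, ...).

```shell
#!/bin/bash
# Hypothetical helper: succeed if `target` appears as an NFS mount in a
# /proc/mounts-style file (fields: source, mount point, fstype, ...).
is_nfs_mount() {
    local target=$1 mounts=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts"
}

if is_nfs_mount /mnt/elasticsearch/backup; then
    echo "backup share is mounted over NFS"
else
    echo "backup share is NOT an NFS mount on this host"
fi
```

On es1 the second message is the expected one, since es1 exports its local XFS volume instead of mounting it.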

07-network.sh (Interface Naming): Identifies the secondary network card (enp0s8 usually) and renames it to prod using systemd link rules. This guarantees that our configuration files (network.host: _prod_) will always find the correct interface, regardless of how VirtualBox assigns PCI slots.

@'
#!/bin/bash
set -e
echo "--- Renaming Interface enp0s8 to prod ---"

CURRENT_NAME="enp0s8"
NEW_NAME="prod"

# Check if rename is already done
if ip link show "$NEW_NAME" >/dev/null 2>&1; then
    exit 0
fi

# Get the MAC address of the current interface
if [ -d "/sys/class/net/$CURRENT_NAME" ]; then
    MAC=$(cat /sys/class/net/$CURRENT_NAME/address)
else
    exit 1
fi

# Create persistent Systemd Link Rule
cat <<EOF > /etc/systemd/network/10-rename-prod.link
[Match]
MACAddress=$MAC

[Link]
Name=$NEW_NAME
EOF

# Update Netplan Configuration
sed -i "s/$CURRENT_NAME/$NEW_NAME/g" /etc/netplan/*.yaml

# Apply Changes Immediately
# We can safely down this interface because Vagrant uses enp0s3 (NAT) for SSH.
ip link set $CURRENT_NAME down
ip link set $CURRENT_NAME name $NEW_NAME
ip link set $NEW_NAME up

# Apply Netplan to bind the IP to the new name
netplan apply

#ip addr show $NEW_NAME
'@ | Set-Content -Path "07-network.sh"
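The script assumes the host-only interface is called enp0s8. If that assumption does not hold on your machine, one alternative is to look the interface up by the static IP that Vagrant assigned. The sketch below parses the one-record-per-line output of `ip -o -4 addr show`; `iface_for_ip` is a hypothetical helper, and a sample line stands in for the live command.

```shell
#!/bin/bash
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the name of the interface that carries the given IPv4 address.
iface_for_ip() {
    local ip=$1
    awk -v ip="$ip" '$3 == "inet" && index($4, ip "/") == 1 { print $2 }'
}

# Sample line in the format produced by `ip -o`.
sample='3: enp0s8    inet 192.168.56.11/24 brd 192.168.56.255 scope global enp0s8'
echo "$sample" | iface_for_ip 192.168.56.11
```

On a node, `ip -o -4 addr show | iface_for_ip 192.168.56.11` could supply the value of CURRENT_NAME instead of the hard-coded string.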

08-install.sh (Software): Downloads the official Elasticsearch 9.3.0 tarball and extracts it to /mnt/elasticsearch/home. We use the tarball method (instead of apt) for total control over the installation directory structure.

@'
#!/bin/bash
set -e
echo "--- Downloading and Installing Elasticsearch ---"

URL="https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-9.3.0-linux-x86_64.tar.gz"
TMP_FILE="/tmp/elasticsearch.tar.gz"
DEST_DIR="/mnt/elasticsearch/home"

# Download
wget -q -O "$TMP_FILE" "$URL"

# Extract
tar -xzf "$TMP_FILE" -C "$DEST_DIR" --strip-components=1

# Cleanup
rm -f "$TMP_FILE"

# Permission Fix
chown -R 1122:1122 "$DEST_DIR"

'@ | Set-Content -Path "08-install.sh"
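The install script trusts whatever wget delivers. Elastic publishes a `.sha512` file next to each release archive, so a checksum comparison could be added before extraction. The sketch below shows only the comparison step; `verify_sha512` is a hypothetical helper, and the demonstration uses a temporary file instead of the real tarball.

```shell
#!/bin/bash
# Hypothetical helper: compare a file's SHA-512 digest with an expected value.
verify_sha512() {
    local file=$1 expected=$2 actual
    actual=$(sha512sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Self-contained demonstration on a temporary file.
tmp=$(mktemp)
echo "payload" > "$tmp"
good=$(sha512sum "$tmp" | awk '{ print $1 }')
verify_sha512 "$tmp" "$good" && echo "checksum matches"
rm -f "$tmp"
```

In 08-install.sh, the expected value would be the first field of the downloaded `.sha512` file, and a mismatch should abort before tar runs.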

Phase 3: Security & Identity (Scripts 09-10)

This is the most complex phase, handling the "Chicken and Egg" problem of distributing security certificates in an automated environment.

09-certs.sh (Certificate Authority):
- Synchronization Barrier: The script uses the NFS share as a "Dead Drop." es1 generates a Certificate Authority (CA). es2 and es3 enter a while loop, pausing execution until they see the CA file appear on the shared drive.
- Generation: Once the CA is available, each node generates its own node.p12 certificate signed by that CA.
- Cleanup: The last node to finish deletes the CA file from the share to ensure the master key is not left exposed.

@'
#!/bin/bash
set -e

ES_HOME="/mnt/elasticsearch/home"
CERT_TOOL="$ES_HOME/bin/elasticsearch-certutil"
NFS_PATH="/mnt/elasticsearch/backup" 
LOCAL_CERTS="$ES_HOME/config/certs"
HOSTNAME=$(hostname)

# IP Calc
NODE_NUM=$(echo $HOSTNAME | tr -dc '0-9')
NODE_IP="192.168.56.$((10 + NODE_NUM))"

# --- CA GENERATION (es1 only) ---
if [ "$HOSTNAME" == "es1" ]; then
    if [ ! -f "$NFS_PATH/elastic-stack-ca.p12" ]; then
        $CERT_TOOL ca --out "$NFS_PATH/elastic-stack-ca.p12" --pass ""
    fi
    chmod 777 "$NFS_PATH/elastic-stack-ca.p12"
fi

# --- CERT GENERATION ---
while [ ! -f "$NFS_PATH/elastic-stack-ca.p12" ]; do sleep 2; done

mkdir -p "$LOCAL_CERTS"

# UPDATED NAME: node.p12
if [ ! -f "$LOCAL_CERTS/node.p12" ]; then
    $CERT_TOOL cert \
        --ca "$NFS_PATH/elastic-stack-ca.p12" \
        --ca-pass "" \
        --out "$LOCAL_CERTS/node.p12" \
        --pass "" \
        --name "$HOSTNAME" \
        --dns "$HOSTNAME,localhost" \
        --ip "$NODE_IP,127.0.0.1"
fi

chown -R 1122:1122 "$LOCAL_CERTS"
chmod 600 "$LOCAL_CERTS/node.p12"
chmod 700 "$LOCAL_CERTS"

# --- CLEANUP ---
touch "$NFS_PATH/$HOSTNAME.cert_done"
DONE_COUNT=$(find "$NFS_PATH" -maxdepth 1 -name "*.cert_done" | wc -l)

if [ "$DONE_COUNT" -ge 3 ]; then
    rm -f "$NFS_PATH/elastic-stack-ca.p12"
    rm -f "$NFS_PATH"/*.cert_done
    rm -f "$NFS_PATH"/*.txt
fi
'@ | Set-Content -Path "09-certs.sh"
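The wait loop in the script spins forever if the CA never appears, for example when the NFS mount failed. A bounded version of the same barrier could look like the sketch below; `wait_for_file` and its default timeout of 300 seconds are assumptions made for the illustration.

```shell
#!/bin/bash
# Hypothetical helper: poll for a file, giving up after `timeout` seconds.
wait_for_file() {
    local path=$1 timeout=${2:-300} interval=${3:-2} waited=0
    while [ ! -f "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for $path" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Demonstration: the file already exists, so the call returns immediately.
marker=$(mktemp)
wait_for_file "$marker" 10 1 && echo "found $marker"
rm -f "$marker"
```

Because the scripts run with `set -e`, a timeout would stop provisioning with a clear message instead of hanging.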

10-keystore.sh (Secret Management): Elasticsearch stores sensitive passwords in a secure elasticsearch.keystore file. This script adds the SSL passwords (empty strings in this lab context) to the keystore. It passes each value with --stdin and input redirection (< $PASS_FILE), because the tool's interactive prompt needs a terminal, which the headless Vagrant provisioner does not provide.

@'
#!/bin/bash
set -e
echo "--- Setup Keystore ---"

ES_HOME="/mnt/elasticsearch/home"
KEYSTORE_BIN="$ES_HOME/bin/elasticsearch-keystore"
PASS_FILE="/tmp/keystore_pass"

# Create a temp file with a NEWLINE
echo "" > "$PASS_FILE"

# Create Keystore
if [ ! -f "$ES_HOME/config/elasticsearch.keystore" ]; then
    $KEYSTORE_BIN create
fi

# Add Keys function
add_key() {
    local key_name=$1
    if ! $KEYSTORE_BIN list | grep -q "$key_name"; then
        $KEYSTORE_BIN add --stdin --force "$key_name" < "$PASS_FILE"
    fi
}

# Add Transport Layer Passwords
add_key "xpack.security.transport.ssl.keystore.secure_password"
add_key "xpack.security.transport.ssl.truststore.secure_password"

# Add HTTP Layer Passwords
add_key "xpack.security.http.ssl.keystore.secure_password"
add_key "xpack.security.http.ssl.truststore.secure_password"

# Cleanup & Permissions
rm -f "$PASS_FILE"

chown 1122:1122 "$ES_HOME/config/elasticsearch.keystore"
chmod 600 "$ES_HOME/config/elasticsearch.keystore"

'@ | Set-Content -Path "10-keystore.sh"
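The add_key function checks for an existing entry with a plain `grep -q`, which treats the dots in the setting name as wildcards and also matches substrings. A stricter check, sketched here with a hypothetical `has_key` helper, matches the whole line literally. The sample listing is a stand-in for the output of `elasticsearch-keystore list`.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if `key` appears as a whole line on stdin.
# -x matches the entire line, -F treats the key as a literal string rather
# than a regular expression (so the dots are not wildcards).
has_key() {
    grep -qxF -- "$1"
}

# Stand-in for the output of `elasticsearch-keystore list`.
listing='keystore.seed
xpack.security.http.ssl.keystore.secure_password'

if echo "$listing" | has_key "xpack.security.http.ssl.keystore.secure_password"; then
    echo "key already present"
fi
```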

Phase 4: Service & Cluster (Scripts 11-13)

The final phase configures the application and boots the cluster.

11-service.sh (Systemd Integration): Creates a systemd unit file to manage Elasticsearch as a background service. We use Type=simple because it is the most robust method for tarball installations, avoiding timeout issues often seen with Type=notify.

@'
#!/bin/bash
set -e

# Variables
SERVICE_FILE="/etc/systemd/system/elasticsearch.service"
ES_HOME="/mnt/elasticsearch/home"
ES_CONF="$ES_HOME/config"
USER="elasticsearch"
GROUP="elasticsearch"

# Create the Systemd Unit File

cat <<EOF > $SERVICE_FILE
[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
RuntimeDirectory=elasticsearch
PrivateTmp=true

# Environment Variables
Environment=ES_HOME=$ES_HOME
Environment=ES_PATH_CONF=$ES_CONF
Environment=PID_DIR=/run/elasticsearch

# Execution
WorkingDirectory=$ES_HOME
User=$USER
Group=$GROUP
ExecStart=$ES_HOME/bin/elasticsearch -p /run/elasticsearch/elasticsearch.pid --quiet

# Logging
StandardOutput=journal
StandardError=inherit

# Resource Limits
LimitNOFILE=65535
LimitNPROC=4096
LimitAS=infinity
LimitFSIZE=infinity
LimitMEMLOCK=infinity

# Timeouts
TimeoutStartSec=75
TimeoutStopSec=0
KillSignal=SIGTERM
KillMode=process
SendSIGKILL=no
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
EOF

# Set Kernel Parameters
SYSCTL_FILE="/etc/sysctl.d/99-elasticsearch.conf"
if [ ! -f "$SYSCTL_FILE" ]; then
    echo "vm.max_map_count=262144" > "$SYSCTL_FILE"
    sysctl -p "$SYSCTL_FILE"
fi

# Reload Systemd and Enable
systemctl daemon-reload
systemctl enable elasticsearch

# Check Status
systemctl status elasticsearch --no-pager | grep "Loaded:"
'@ | Set-Content -Path "11-service.sh"
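The entries in /etc/security/limits.d are applied through PAM to login sessions, whereas a systemd service takes its limits from the Limit* directives in the unit file, so those are the values worth verifying. The sketch below parses KEY=VALUE lines of the kind `systemctl show` prints; `unit_prop` is a hypothetical helper and the sample text stands in for the live command.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one property from KEY=VALUE lines
# on stdin (the format `systemctl show` uses).
unit_prop() {
    awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2) }'
}

# Stand-in for: systemctl show elasticsearch -p LimitNOFILE -p LimitMEMLOCK
sample='LimitNOFILE=65535
LimitMEMLOCK=infinity'
echo "$sample" | unit_prop LimitMEMLOCK
```

On a node, piping `systemctl show elasticsearch -p LimitMEMLOCK` into the helper should print `infinity` if the unit file was picked up after daemon-reload.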

12-configure-cluster.sh (Bootstrap): Writes the final elasticsearch.yml and jvm.options.
- Heap: Sets -Xms2g -Xmx2g (50% of RAM).
- Discovery: Lists all 3 IPs so nodes can find each other.
- Binding: network.host: [_local_, "_prod_"] makes the node listen only on the loopback address and on the prod interface we configured, so it is not exposed on the NAT interface.
- Security: Enables xpack.security and points to the certificates generated in Script 09.

@'
#!/bin/bash
set -e
echo "--- Configuring Elasticsearch Cluster & JVM ---"

# Variables
ES_HOME="/mnt/elasticsearch/home"
CONFIG_DIR="$ES_HOME/config"
YML_FILE="$CONFIG_DIR/elasticsearch.yml"
JVM_FILE="$CONFIG_DIR/jvm.options.d/heap.options"
HOSTNAME=$(hostname)

# JVM HEAP CONFIGURATION
cat <<EOF > $JVM_FILE
-Xms2g
-Xmx2g
EOF
chown 1122:1122 $JVM_FILE

# ELASTICSEARCH.YML CONFIGURATION

cat <<EOF > $YML_FILE
# --- Cluster & Node ---
cluster.name: es-cluster
node.name: ${HOSTNAME}

# --- Paths ---
path.data: /mnt/elasticsearch/data
path.logs: /mnt/elasticsearch/logs
path.repo: ["/mnt/elasticsearch/backup"]

# --- Network ---
network.host: [_local_, "_prod_"]
http.port: 9200

# --- Discovery ---
discovery.seed_hosts: ["192.168.56.11", "192.168.56.12", "192.168.56.13"]
cluster.initial_master_nodes: ["es1", "es2", "es3"]

# --- Memory ---
bootstrap.memory_lock: true

# --- Safety ---
action.destructive_requires_name: true

# --- Security (xPack) ---
xpack.security.enabled: true

# Transport Layer
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/node.p12
xpack.security.transport.ssl.truststore.path: certs/node.p12

# HTTP Layer
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/node.p12
xpack.security.http.ssl.truststore.path: certs/node.p12
EOF

# Secure the config file
chown 1122:1122 $YML_FILE
chmod 660 $YML_FILE

# RESTART SERVICE

# We perform a reload/restart to apply the changes
systemctl restart elasticsearch

# Wait and Check Health
sleep 15
if systemctl is-active --quiet elasticsearch; then
    PROD_IP=$(ip -4 addr show prod | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
    echo "SUCCESS: Elasticsearch is running on $HOSTNAME binding to prod ($PROD_IP)"
else
    exit 1
fi
'@ | Set-Content -Path "12-configure-cluster.sh"
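The heap is hard-coded to 2g, which is 50% of the 4GB given to each VM. If you change the VM memory, the same rule can be computed instead of edited by hand. This is a minimal sketch: `heap_mb` is a hypothetical helper, and the 31GB cap is an assumption based on the commonly cited advice to keep the heap below the JVM's compressed-pointer threshold.

```shell
#!/bin/bash
# Hypothetical helper: half of total RAM in MB, capped at 31 GB (assumed cap).
heap_mb() {
    local total_mb=$1 cap_mb=$((31 * 1024)) half
    half=$((total_mb / 2))
    if [ "$half" -gt "$cap_mb" ]; then
        half=$cap_mb
    fi
    echo "$half"
}

# For the 4096 MB VMs in this lab this prints the same 2 GB heap as above.
echo "-Xms$(heap_mb 4096)m"
echo "-Xmx$(heap_mb 4096)m"
```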

13-sudoers.sh (Service Management): Grants the elasticsearch user specific sudo privileges to start, stop, restart, and check the status of the elasticsearch service without requiring a password. This allows for automated maintenance scripts or easier manual intervention without needing root access.

@'
#!/bin/bash
set -e

SUDO_FILE="/etc/sudoers.d/elasticsearch"

# Allow systemctl commands without password
cat <<EOF > $SUDO_FILE
elasticsearch ALL=(root) NOPASSWD: /usr/bin/systemctl start elasticsearch.service
elasticsearch ALL=(root) NOPASSWD: /usr/bin/systemctl stop elasticsearch.service
elasticsearch ALL=(root) NOPASSWD: /usr/bin/systemctl restart elasticsearch.service
elasticsearch ALL=(root) NOPASSWD: /usr/bin/systemctl status elasticsearch.service
elasticsearch ALL=(root) NOPASSWD: /usr/bin/systemctl status elasticsearch
EOF

# Strict permissions are required for sudoers files
chmod 0440 $SUDO_FILE
'@ | Set-Content -Path "13-sudoers.sh"

2.4 Deploy the Cluster

Start the deployment. Vagrant will bring up the three VMs and execute the 13 scripts on each.

vagrant up

Note: This process will take approximately 5-10 minutes depending on your internet connection speed (downloading the 600MB Elasticsearch tarball).

Chapter 3: Verification & Expected Results

Once vagrant up completes, you should verify that your cluster is healthy, secure, and functioning as designed.

3.1 Terminal Output Check

At the very end of the deployment logs in your PowerShell window, you should see the success message from Script 12 for each node:

es1: SUCCESS: Elasticsearch is running on es1 binding to prod (192.168.56.11)
es2: SUCCESS: Elasticsearch is running on es2 binding to prod (192.168.56.12)
es3: SUCCESS: Elasticsearch is running on es3 binding to prod (192.168.56.13)

3.2 Service Validation

vagrant ssh es1

Once inside, run the following to confirm the service status:

sudo -iu elasticsearch
sudo systemctl status elasticsearch

Expected Result: You should see Active: active (running) and the memory usage should be close to 2GB (due to bootstrap.memory_lock).

3.3 Cluster Health Check (SSL Verification)

Since we enabled security, we must use HTTPS and authenticate. Use the built-in elastic superuser. Since we haven't set a password yet, we will reset it first to something we know (e.g., 123456), then check the health.

# 1. Reset password
/mnt/elasticsearch/home/bin/elasticsearch-reset-password -u elastic -i 

# (Type '123456' when prompted)

# 2. Check Cluster Health
export ELAS_PASS='123456'
curl -k -u elastic:$ELAS_PASS "https://192.168.56.11:9200/_cluster/health?pretty"

Expected Result:

{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 3,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "unassigned_primary_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Status: Green means every primary and replica shard is allocated.
Number of nodes: 3 confirms es2 and es3 joined successfully.
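For scripted checks it is handy to pull just the status field out of the health response. The sketch below does this with sed so that it works without jq; `health_status` is a hypothetical helper, and the sample JSON is an abbreviated copy of the expected result above.

```shell
#!/bin/bash
# Hypothetical helper: extract the "status" value from _cluster/health JSON
# on stdin using sed only, so it works without jq.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

sample='{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "number_of_nodes" : 3
}'
echo "$sample" | health_status
```

Piping the curl command from step 2 into `health_status` gives a single word that a script can compare against `green`.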

3.4 NFS Storage Verification

Verify that the NFS share is mounted correctly on the worker nodes.

# Still inside es1: leave the elasticsearch shell, then the SSH session
exit
exit

# Login to es2
vagrant ssh es2

# Check disk mounts
sudo -iu elasticsearch
df -h | grep backup

Expected Result:

192.168.56.11:/mnt/elasticsearch/backup  4.0G   61M  4.0G   2% /mnt/elasticsearch/backup

This confirms that es2 is effectively using the disk space of es1 for its snapshot repository.

Conclusion

Congratulations! You have successfully engineered a platform, not just installed software.

By following this guide, you have moved beyond simple "Hello World" tutorials and built a Production-Grade Infrastructure that mimics real-world enterprise environments.

What you have achieved:

True High Availability: A 3-node cluster that can survive the loss of one server without losing data, provided every index keeps at least one replica.
Hardened Security: Full SSL encryption on the Transport layer (Node-to-Node) and HTTP layer (Client-to-Node), protecting your data from the start.
Professional Storage: A tiered LVM storage strategy that protects your OS from log floods and enables snapshots via NFS.
Zero-Touch Automation: A reusable Vagrant template that spins up identical, clean environments in minutes, perfect for testing upgrades or training new team members.

Your cluster is now ready for data ingestion, Kibana integration, or experimenting with complex sharding strategies. You have built a solid foundation—now go build something amazing on top of it.

Elasticsearch in Production: The Ultimate Guide to Architecture and Operations

AKEBLI Ouassim — Wed, 04 Feb 2026 13:02:23 GMT

Definition

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. At its core it is a NoSQL database, but unlike traditional databases designed for simple storage and retrieval, Elasticsearch is optimized for speed and relevance.

Architecture & Capacity Planning

This diagram illustrates a typical production cluster architecture, showing the separation of duties between dedicated master nodes, tiered data nodes (hot/warm), and coordinating nodes.

1. Node Roles

A. Mandatory Core Roles

1. Master-eligible node (`master`)

Function: Responsible for cluster-wide actions such as creating or deleting indices, tracking which nodes are part of the cluster, and allocating shards to nodes.
Why it is mandatory: Without a master, the cluster cannot form and no cluster-level change can be tracked.
Production tip: You generally need 3 dedicated master nodes for High Availability in order to avoid "split-brain".

2. Data node (`data`)

Fonction : Contient les shards qui hébergent vos documents indexés. Ces nœuds effectuent des opérations liées aux données comme le CRUD, la recherche et les agrégations.
Pourquoi c'est obligatoire : Sans nœuds de données, vous ne pouvez stocker aucune donnée.
Sous-rôles (Tiered Architecture) :
- data_content : Pour les données à usage général qui ne correspondent pas à un cycle de vie de séries temporelles (time-series).
- data_hot : Pour les données time-series strictes (ex : logs) qui sont activement écrites et interrogées.
- data_warm : Pour les données plus anciennes qui sont en lecture seule (read-only) mais encore fréquemment interrogées.
- data_cold : Pour les données consultées peu fréquemment (optimisé pour le stockage).
- data_frozen : Pour les données stockées dans le stockage objet (S3) et rarement interrogées (Searchable Snapshots).

B. Core Utility Roles (Highly Recommended)

Although they are not strictly "mandatory" for the cluster to start, they are standard in almost every production environment.

1. Ingest node (`ingest`)

Function: Runs "ingest pipelines" to pre-process documents before indexing. This acts like a lightweight Logstash inside Elasticsearch (e.g. JSON parsing, removing fields, renaming fields).
Default behavior: Every node is an ingest node by default unless configured otherwise.

2. Coordinating-only node (No specific role defined)

Function: These nodes have an empty role list (node.roles: []). They act as "Smart Load Balancers". They accept search requests, fan them out to the data nodes that hold the relevant shards, gather the results, perform the final reduce step (sorting/aggregation), and send the response to the client.
Use case: Large clusters with heavy search traffic, to keep data nodes from being overwhelmed by CPU-hungry aggregation work.

C. Specialized Roles (Optional)

These are specific to certain Elastic Stack features.

1. Machine Learning node (`ml`)

Function: Runs native Machine Learning jobs (anomaly detection, forecasting).
Requirement: These jobs are CPU- and RAM-intensive. If you use ML features, you must have at least one ML node.

2. Remote Cluster Client (`remote_cluster_client`)

Function: Lets the cluster connect to other clusters (Cross-Cluster Search or Cross-Cluster Replication).
Default: Enabled by default on all nodes.

3. Transform node (`transform`)

Function: Runs transform jobs that pivot or summarize data into new indices (similar to "Materialized Views" in SQL).

4. Voting-only node (`voting_only`)

Function: A master-eligible node that can take part in master elections (voting) but cannot become the elected master.
Use case: Rarely used; mainly for tie-breaking in clusters that would otherwise have an even number of master-eligible nodes.
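The roles above end up as a single `node.roles` line in each node's elasticsearch.yml. Here is a minimal sketch of that mapping. The profile names (`dedicated_master`, `hot_data`, and so on) are hypothetical labels chosen for illustration; only the role values themselves come from the list above.

```python
# Hypothetical node "profiles" mapped to the node.roles values described above.
# The profile names are illustrative; only the role strings are Elasticsearch terms.
NODE_PROFILES = {
    "dedicated_master": ["master"],
    "voting_only": ["master", "voting_only"],  # voting_only is set together with master
    "hot_data": ["data_hot", "data_content"],
    "warm_data": ["data_warm"],
    "ingest": ["ingest"],
    "coordinating_only": [],  # empty list: the node only routes and merges requests
}


def render_node_roles(profile: str) -> str:
    """Return the elasticsearch.yml line for a given profile name."""
    roles = NODE_PROFILES[profile]
    return f"node.roles: [{', '.join(roles)}]"


if __name__ == "__main__":
    for name in NODE_PROFILES:
        print(f"{name:18} {render_node_roles(name)}")
```

Keeping the mapping in one place makes it harder for two nodes that should be identical to end up with different role lists.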

2. Hardware Requirements

These figures are based on the constraints of the JVM (Java Virtual Machine) and on operational best practices for recovery and stability.

A. RAM (Memory)

This is the most critical resource. It is split between the JVM Heap (for the application) and the OS Filesystem Cache (for Lucene segment files).

Minimum: 8 GB - 16 GB
- Running a production node with less than 8GB is risky. You need enough headroom for the OS to cache frequently accessed index segments.
The "Sweet Spot": 64 GB
- This is the standard spec for a high-performance node. It lets you allocate about 30GB to the heap and leave about 34GB for the OS cache.
Maximum (Effective): 64 GB (Physical RAM)

⚠️ The "Compressed Oops" limit: You must strictly avoid allocating more than 31GB-32GB to the JVM Heap. If the heap exceeds ~32GB, the JVM stops using "Compressed Object Pointers" (pointers grow from 32-bit to 64-bit). Every object reference then takes twice the space, so a heap just above the threshold can hold less useful data than one just below it.

B. Disk (Storage)

Disk speed dictates indexing throughput, and disk size dictates how long recovery (shard rebalancing) takes.

Type requirement: SSD / NVMe
- Spinning HDDs are acceptable only for the "Cold" or "Frozen" tiers.
Minimum capacity: 200 GB
- Small clusters need enough space for logs, the OS, and headroom for shard merging.

Maximum capacity (per node):

Hot Nodes (High IO): Limit of 2 TB - 4 TB.
- Reasoning: If a node holds 10TB of "hot" data and fails, replicating those 10TB to a new node over the network takes hours or days, leaving the cluster in a "Yellow" (at-risk) state.
Warm/Cold Nodes: 10 TB - 16 TB.
- Reasoning: Since these nodes are query-heavy (reads) but low-write, you can fill them with dense storage, provided you accept slower recovery times.
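To put numbers on the "hours or days" claim, here is a rough back-of-the-envelope sketch. It assumes decimal terabytes and a transfer limited only by the network link; the `efficiency` parameter is a hypothetical knob for the usable fraction of that link. Real recoveries are usually slower because of throttling, disk speed, and competing traffic.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy data_tb terabytes over a link of link_gbps gigabits/s.

    efficiency is the usable fraction of the link (1.0 means the full line rate).
    """
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB of hot data over 1 Gbps vs 10 Gbps, at full line rate.
    print(round(transfer_hours(10, 1), 1))   # 22.2
    print(round(transfer_hours(10, 10), 1))  # 2.2
```

Even in this best case, a 10TB node on a 1 Gbps link needs close to a full day to rebuild, which is the reasoning behind the 2 TB - 4 TB guideline for hot nodes.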

C. Network

Elasticsearch is a distributed system; the network is its bus.

Minimum: 1 Gbps (Gigabit Ethernet)
- Acceptable only for small clusters with low indexing rates.
- Risk: During a "peer recovery" (when a node comes back online), a 1 Gbps link can be fully saturated, which increases search latency.
Recommended / Maximum: 10 Gbps - 25 Gbps
- 10 Gbps is the norm for modern production clusters, so that recovery does not impact live traffic.
Latency: Should be a few milliseconds (intra-datacenter).
- Warning: Stretching a single cluster across distinct geographic regions (e.g. US-East to EU-West) is generally unsupported and will cause instability due to timeout errors.

3. Sizing & Sharding

This diagram gives a clear view of how a single index is split into primary shards (P), and how each primary shard has a matching replica shard (R) placed on a different node for high availability.

A. Shard Strategy

Sharding is the mechanism that lets Elasticsearch scale beyond the hardware limits of a single server. However, it is also the most common source of performance problems in production.

1. The Concept

An Elasticsearch index is really a logical grouping of shards. Each shard is a self-contained instance of Apache Lucene, which is a fully functional search engine in its own right. When you run a search against an index, Elasticsearch queries all relevant shards in parallel and merges the results.

2. The "Oversharding" Trap

New users often think: "If shards provide parallelism, more shards means more speed." This is a misconception known as oversharding.

Metadata overhead: Every shard consumes resources. The Cluster State (the "brain" of the cluster) must track the location, status, and size of every shard. If you have 100,000 small shards, the Cluster State becomes huge and updates (such as creating a new index) become extremely slow.
The "Map-Reduce" tax: When you search an index with 50 shards, the coordinating node must send the request to 50 places, wait for 50 responses, and merge 50 results. If those shards are tiny (e.g. 50MB), the overhead of managing the request outweighs the benefit of parallel processing.
Memory cost: Every shard has a baseline memory footprint in the JVM Heap to hold Lucene segment information. Too many small shards will exhaust your heap even when the cluster is idle.

✅ The Golden Rule: 10GB - 50GB. For general-purpose search (e.g. products, users), aim for a shard size between 10GB and 50GB.

Why > 10GB? To minimize per-shard overhead and get efficient compression.
Why < 50GB? To keep recovery fast. If a node fails, moving a 50GB shard to a new node over the network is manageable. Moving a 500GB shard takes so long that your cluster stays in a vulnerable state ("Yellow" health) for hours.
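The golden rule can be turned into a quick sizing estimate. This is a minimal sketch that assumes you already know the expected total primary data size of the index; the 30GB default target is simply a midpoint of the 10GB - 50GB range, chosen for illustration, and not an official recommendation.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near target_shard_gb."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target; never go below one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 120, 600):
        shards = primary_shard_count(size_gb)
        print(f"{size_gb} GB -> {shards} primary shard(s), ~{size_gb / shards:.0f} GB each")
```

A small index (5GB here) still gets a single shard; splitting it further would only add the per-shard overhead described above.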

B. Replicas

Replication serves two distinct purposes in a cluster: High Availability (HA) and Read Throughput.

1. Failover (High Availability)

A Replica Shard is an exact copy of a Primary Shard.

The mechanism: If the node holding a Primary Shard crashes, the master node promptly "promotes" the Replica Shard (living on a different node) to be the new Primary.
Production standard: You must set number_of_replicas: 1 (at minimum). This ensures that if a single node fails, no data is lost and the cluster stays fully operational.
The trade-off: Replicas multiply your storage needs. 100GB of data with 1 replica requires 200GB of physical disk space.

2. Read Throughput (Scaling Search)

Every write goes to the Primary shard first and is then applied to its Replicas, so replicas do not add write capacity. Searches, on the other hand, can be served by any copy.

Load balancing: When a search request arrives, the coordinating node routes it intelligently. It can go to the Primary or to any of its Replicas.
Scaling up: If your application is "Read Heavy" (e.g. an e-commerce site where users search often but products rarely change), you can increase performance by adding more replicas.
- Example: An index with 1 Primary and 5 Replicas lets 6 nodes answer search requests simultaneously for that specific data.

Key distinction:

Primary Shards are fixed when the index is created (changing the count requires reindexing, or the split/shrink APIs).

Replica Shards can be changed dynamically. You can go from 1 replica to 5 replicas with a single settings update if you expect a traffic spike (such as Black Friday), and scale back down afterwards. The new copies still need time to be built.
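The storage trade-off and the dynamic replica change can be sketched as follows. The JSON body is meant for the update index settings API (`PUT /<index>/_settings`); the helper names are hypothetical and nothing here talks to a real cluster.

```python
import json


def disk_needed_gb(primary_data_gb: float, replicas: int) -> float:
    """Total disk needed for the primaries plus all replica copies."""
    return primary_data_gb * (1 + replicas)


def replica_settings_body(replicas: int) -> str:
    """JSON body for PUT /<index>/_settings to change the replica count."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    print(disk_needed_gb(100, 1))    # 100 GB of data with 1 replica
    print(replica_settings_body(5))  # body to scale up before a traffic spike
```

Checking `disk_needed_gb` before raising the replica count is a cheap way to confirm the cluster has room for the extra copies.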

Environment Preparation (OS Tuning)

Elasticsearch is sensitive to Operating System settings. Leaving them untuned will often prevent the cluster from starting at all (Bootstrap Checks).

1. Disable Swapping (The "Performance Killer")

The concept: On a standard server, when physical RAM is full, the OS moves inactive memory pages to disk (swap space). For Elasticsearch, this is catastrophic. The Java Garbage Collector (GC) needs to scan memory to reclaim space. If that memory is on disk (roughly 100,000x slower than RAM), a GC cycle that usually takes milliseconds will take seconds or minutes.

The result: The node becomes unresponsive ("Stop-the-world" pause), the cluster thinks the node is dead, ejects it, and triggers a massive data rebalance.

How to configure it

You have two main methods. Best practice is to do both.

OS Level: Disable swap completely. The command below only lasts until the next reboot; to make it permanent, also comment out or remove the swap entries in /etc/fstab.
```
 sudo swapoff -a
```
Application Level (Memory Lock): Force Elasticsearch to lock its memory address space in RAM so the OS cannot swap it out.
- In elasticsearch.yml:
```
  bootstrap.memory_lock: true
```
- Note: You may need to edit the systemd service file (systemctl edit elasticsearch) to allow this limit:
```
  [Service]
  LimitMEMLOCK=infinity
```

2. File Descriptors (The "Capacity Limit")

The concept: Elasticsearch (via Lucene) breaks your data into heavily compressed, immutable files called "segments". A single node can easily keep thousands of these small files open at the same time. The default Linux limit for open files per process is often 1024. That is far too low. If Elasticsearch hits this limit, it can silently lose data or crash because it cannot write to new files.

How to configure it

You must raise the limit to at least 65,536.

Check the current limit: ulimit -n
Permanent fix: Edit /etc/security/limits.conf:
```
  elasticsearch - nofile 65536
```
(If you install from an RPM/Deb package, this is often done automatically, but you should verify it. For services started by systemd, limits.conf is not applied; the limit comes from LimitNOFILE in the unit file.)

3. Virtual Memory (The `mmap` Requirement)

The concept: This is specific to how Lucene reads data. It uses a system call named mmap (memory map) to map files on disk directly into the virtual memory address space. This is very fast because it lets the kernel handle file caching. However, the default OS limit on the number of memory maps a process can own is typically 65,530. Elasticsearch needs many more.

How to configure it

This is the most common cause of startup failures.

Command (live):
```
  sysctl -w vm.max_map_count=262144
```
Permanent fix: Add this line to /etc/sysctl.conf:
```
  vm.max_map_count=262144
```
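The three OS settings above can be checked in one pass before starting a node. The sketch below is a hypothetical pre-flight helper, not an official tool. `find_problems` is a pure function that compares values against the thresholds from this section. `read_host_values` is Linux-only and reads the limits of the current process, which may differ from those of the `elasticsearch` service user.

```python
from pathlib import Path

# Thresholds taken from the sections above.
REQUIRED_MAX_MAP_COUNT = 262144
REQUIRED_NOFILE = 65536


def find_problems(max_map_count: int, nofile_limit: int, swap_total_kb: int) -> list[str]:
    """Return a human-readable list of settings that are below the guidance."""
    problems = []
    if max_map_count < REQUIRED_MAX_MAP_COUNT:
        problems.append(f"vm.max_map_count is {max_map_count}, need {REQUIRED_MAX_MAP_COUNT}")
    if nofile_limit < REQUIRED_NOFILE:
        problems.append(f"open file limit is {nofile_limit}, need {REQUIRED_NOFILE}")
    if swap_total_kb > 0:
        problems.append(f"swap is enabled ({swap_total_kb} kB)")
    return problems


def read_host_values() -> tuple[int, int, int]:
    """Linux only: read the current values from /proc and the process limits."""
    import resource  # Unix-only module, imported lazily

    max_map_count = int(Path("/proc/sys/vm/max_map_count").read_text())
    nofile_soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    swap_total_kb = 0
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("SwapTotal:"):
            swap_total_kb = int(line.split()[1])
    return max_map_count, nofile_soft, swap_total_kb


if __name__ == "__main__":
    if Path("/proc/sys/vm/max_map_count").exists():
        for problem in find_problems(*read_host_values()):
            print("WARNING:", problem)
```

Separating the comparison from the host reads keeps the logic easy to test and lets you feed it values collected by other means, such as a configuration management tool.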

4. JVM Heap Size (The Balancing Act)

This is the most misunderstood setting. You are configuring the memory of the Java Virtual Machine (JVM).

A. `Xms` and `Xmx` (Min vs. Max)

The problem: By default, Java starts with a small heap (Xms) and grows it as needed up to the max (Xmx). This resizing pauses execution.
The solution: Set both to the same value. This allocates all the memory at startup and prevents resize pauses.
```
  # /etc/elasticsearch/jvm.options
  -Xms4g
  -Xmx4g
```

B. The 50% Rule (Why not 100%?)

If you have a 64GB machine, why give only 30GB to Elasticsearch? Why not 60GB?

The reason: Elasticsearch relies on two kinds of memory:
1. JVM Heap: For query objects, aggregations, and the Cluster State.
2. OS Filesystem Cache: This is where the actual data (Lucene segments) lives.
If you give all the RAM to the heap, the OS has no room left to cache files. The disk will be thrashed and performance will collapse.
Rule: 50% to the heap, 50% left free for the OS, with the heap never exceeding the limit described next.

C. The 32GB Limit (Compressed Oops)

You must never set the heap above ~32GB (the exact threshold varies; 30GB-31GB is generally safe).

The science: Below 32GB, Java uses "Compressed Ordinary Object Pointers" (Compressed Oops). It uses 32-bit pointers to reference memory.
The trap: Once you cross the threshold (e.g. 32.1GB), Java switches to 64-bit pointers. They are larger.
The result: A 35GB heap actually stores less data than a 31GB heap because the pointers themselves take up much more room. It also consumes more CPU bandwidth.
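The three rules above (equal Xms and Xmx, half of the RAM, and the compressed oops ceiling) combine into a small calculation. This is a minimal sketch: the 31GB cap is the conservative value suggested in this section, and the function names are hypothetical.

```python
# Conservative ceiling that keeps the JVM on compressed object pointers.
COMPRESSED_OOPS_SAFE_GB = 31


def recommended_heap_gb(physical_ram_gb: int) -> int:
    """Half of the physical RAM, capped below the compressed oops threshold."""
    return max(1, min(physical_ram_gb // 2, COMPRESSED_OOPS_SAFE_GB))


def jvm_options(physical_ram_gb: int) -> list[str]:
    """Return matching -Xms/-Xmx lines so the heap never resizes at runtime."""
    heap = recommended_heap_gb(physical_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram_gb in (8, 64, 128):
        print(ram_gb, "GB RAM ->", " ".join(jvm_options(ram_gb)))
```

Note that a 128GB machine gets the same heap as a 64GB one; the extra memory goes to the filesystem cache.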

Security (The "Must-Have" Layer)

This section often trips up new administrators because it involves certificates and passwords, which can be tedious. However, in modern versions of Elasticsearch (8.x and later), security is enabled by default. You cannot run a production cluster without it.

1. TLS/SSL Encryption (The "Encrypted Tunnel")

Encryption prevents "Man-in-the-Middle" attacks. In Elasticsearch, it is implemented in two distinct layers. If you skip the first one, your cluster will not even start.

A. Transport Layer (Node-to-Node)

What it is: The internal communication channel on port 9300 where nodes talk to each other (master election, shard movement, data replication).
Why it is mandatory: Elasticsearch requires mutual trust. Node A must prove to Node B that it is a legitimate member of the cluster, and not a rogue server trying to steal data.
The mechanism:
1. You generate a Certificate Authority (CA).
2. You sign a certificate for each node using that CA.
Crucial note: A node bound to a non-loopback address leaves "Development Mode" and runs the production bootstrap checks. With security enabled and no transport TLS configured, it fails those checks and refuses to start.
Key tool: bin/elasticsearch-certutil (this built-in tool simplifies creating these certificates).

B. HTTP Layer (Client-to-Cluster)

What it is: The external API on port 9200 where Kibana, your application (Java/Python/Node.js), and users connect.
Why it is critical: Without it, Basic Auth credentials (username/password) are sent in clear text. Anyone on the network can intercept the admin password.
Configuration: You generally use the same CA to sign these certificates, or you can use a public CA (Let's Encrypt, Verisign) if your cluster is public-facing.

2. Authentication & RBAC (The "Gatekeeper")

Once the connection is secure, you must control who connects and what they can touch. This is Role-Based Access Control (RBAC).

A. Built-in Users

When you start the cluster for the first time, you set the passwords of the reserved accounts that are vital to the stack, for example with bin/elasticsearch-reset-password:

elastic: The "Superuser" (root). It has full control.
- Danger: Do not use this account in your application code! If these credentials leak, your entire cluster is compromised.
kibana_system: A service account used only by the Kibana server to talk to Elasticsearch. It cannot be used to log in to the dashboard.

B. The Principle of Least Privilege

You should create custom roles for each specific use case.

The "Developer" role: Can read and monitor indices but cannot delete data or change cluster settings.
The "App" role: Can write to the logs-* indices but cannot read the salary-data index.
Document Level Security (DLS): You can even restrict access within a single index.
- Example: "User A can search the employees index, but only documents where department: 'marketing'."
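The "App" role and the DLS example can be written as role definitions for the security role API (`PUT /_security/role/<name>`). The sketch below only builds the request bodies. The function names and the choice of the `create_doc` privilege for write-only access are illustrative assumptions, and document level security may depend on your subscription level.

```python
import json


def app_writer_role() -> dict:
    """Write-only access to logs-*; no other index is listed, so none is readable."""
    return {"indices": [{"names": ["logs-*"], "privileges": ["create_doc"]}]}


def marketing_reader_role() -> dict:
    """Read access to employees, filtered to marketing documents (DLS)."""
    return {
        "indices": [
            {
                "names": ["employees"],
                "privileges": ["read"],
                # Document level security: only matching documents are visible.
                "query": {"term": {"department": "marketing"}},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(app_writer_role(), indent=2))
    print(json.dumps(marketing_reader_role(), indent=2))
```

Roles grant nothing by default, so leaving `salary-data` out of the list is what keeps the application away from it.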

3. Audit Logging (The "Black Box")

If data disappears or leaks, how do you know what happened?

What it tracks: You can configure it to log specific events: "Authentication Failed," "Index Deleted," or even "User X ran query Y."
Compliance: It is commonly required to meet standards such as GDPR, HIPAA, and PCI-DSS.
Performance warning: Audit logging is I/O intensive.
Bad practice: Logging every "read" operation. This will fill your disk with logs and slow down search performance.
Best practice: Log only "write/delete" operations and "authentication failures" so you can detect brute-force attacks.

Deployment Methods

Introduction: One Size Does Not Fit All

Elasticsearch is infrastructure-agnostic. It runs wherever Java runs. The choice of deployment method usually depends on three factors:

Existing infrastructure: Are you already "all-in" on Kubernetes? Do you have racks of physical servers?
Team expertise: Are your operators comfortable with Linux kernel tuning, or do they prefer writing YAML manifests?
Scalability needs: Do you need to add 10 nodes in 5 minutes during Black Friday, or is your cluster relatively static?

1. Bare Metal / Virtual Machines (The "Classic" Approach)

This is the traditional way to deploy software. You treat Elasticsearch like any other database (PostgreSQL, MySQL).

How it works:

You provision Linux servers (physical hardware or VMs such as EC2/Azure VMs), install Java (only if you run old versions of ES; recent versions bundle their own JDK), configure the OS prerequisites (as discussed in the OS Tuning section), add the Elastic repository, and install through the package manager:

# Ubuntu/Debian example (adjust 9.x to your major version)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/9.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-9.x.list
sudo apt-get update && sudo apt-get install elasticsearch

Pros:

Maximum performance: You have direct access to hardware resources with no containerization abstraction layer. This is ideal for ultra-high-performance use cases.
Simpler troubleshooting: If you need to debug network latency or disk I/O, you use standard Linux tools (iostat, tcpdump) directly on the host.
Persistence is easy: Data is written directly to the attached disks. You do not have to deal with complex Container Storage Interfaces (CSI).

Cons:

Maintenance overhead: Scaling out means manually provisioning and configuring a new server. Upgrades require careful, manual "rolling restarts".
Configuration drift: Without solid configuration management tools (Ansible, Chef, Puppet), servers can slowly diverge in their configuration over time, leading to "it works on node 1 but not on node 2" problems.
Best for: Traditional IT environments, stable long-lived clusters, and teams with strong Linux administration skills.

2. Docker / Containers (The "Rapid Prototyping" Approach)

Docker changed everything by letting developers launch complex stacks locally in seconds.

How it works: Elastic provides official, pre-hardened Docker images. You rarely run plain docker run commands. Instead, you use Docker Compose to define a multi-node cluster in a single YAML file.

Pros:

Speed: You can go from zero to a working 3-node cluster on your laptop in under 60 seconds.
Consistency: The environment is identical across development, testing, and staging. "It works on my machine" actually means something.
Isolation: Dependencies are packaged with the container.

Cons:

Not a production orchestrator: Docker Compose is generally not recommended for multi-host production environments. It lacks the advanced failover, scaling, and networking features needed for high availability.
State management: You must be very careful with volume mapping to make sure data persists when a container restarts.

Best for: Local development, CI/CD test pipelines, and very small single-host production deployments.

3. Kubernetes & ECK (The "Cloud-Native" Standard)

If your organization has adopted Kubernetes (K8s), this is almost certainly how you should deploy Elasticsearch. But there is one massive caveat.

The "Helm Chart" Trap: New K8s users often try to deploy Elasticsearch with standard, generic Helm charts. Avoid this. Elasticsearch is a complex, stateful distributed system. A standard K8s deployment does not understand that you cannot simply kill 3 master nodes at once during an upgrade without destroying the cluster.

The Solution: Elastic Cloud on Kubernetes (ECK). Elastic has developed its own Kubernetes Operator called ECK.

What is an Operator? Think of it as a software robot that runs inside your K8s cluster and carries human operational knowledge about Elasticsearch. It knows exactly in which order to restart nodes so that the cluster never goes down.

How it works: Instead of managing pods and StatefulSets directly, you install the ECK operator, then submit a simple custom resource YAML to K8s saying "I want an Elasticsearch cluster".

The Operator sees the YAML and automatically creates the Services, StatefulSets, and PersistentVolumeClaims, and generates the TLS certificates.
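To give an idea of how small that custom resource is, here is a minimal sketch that builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts as well as YAML. The cluster name, node count, storage size, and version passed in the usage example are placeholder assumptions; check which versions your ECK operator release supports.

```python
import json


def elasticsearch_manifest(name: str, version: str, node_count: int,
                           storage: str = "100Gi") -> dict:
    """Build an ECK Elasticsearch custom resource with a single nodeSet."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "nodeSets": [
                {
                    "name": "default",
                    "count": node_count,
                    # The claim named elasticsearch-data backs the data path.
                    "volumeClaimTemplates": [
                        {
                            "metadata": {"name": "elasticsearch-data"},
                            "spec": {
                                "accessModes": ["ReadWriteOnce"],
                                "resources": {"requests": {"storage": storage}},
                            },
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(elasticsearch_manifest("prod-search", "9.3.0", 3), indent=2))
```

Everything else, including Services, certificates, and restart ordering, is derived by the operator from this one resource.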

Pros:

Automated Day 2 operations: The Operator automatically handles scaling, rolling upgrades, secure configuration, and backups.
Elastic ecosystem: It makes deploying Kibana, APM Server, and Beats alongside Elasticsearch very easy.

Cons:

High complexity: You need significant Kubernetes expertise before adding the complexity of running a stateful database on top of it.

Best for: Modern, cloud-native organizations that need highly dynamic and scalable infrastructure.

Configuration Best Practices

The Critical `elasticsearch.yml` Configuration Guide

The elasticsearch.yml file is the control center of your node. Although there are hundreds of settings, getting these few wrong is the most frequent cause of production outages or data loss.

1. Identity: Cluster & Node Names

In isolation, names do not seem technical. In a distributed system, they are vital for observability and isolation.

A. `cluster.name`

The default: elasticsearch
The risk: If you leave the default, a malicious or careless developer starting a local instance on the same network (Wi-Fi or VPN) could accidentally discover and join your production cluster.
Best practice: Be descriptive and environment-specific.

cluster.name: prod-search-cluster-v1

B. `node.name`

The default: The server's hostname.
The risk: Hostnames like ip-10-0-0-5 are hard to read in logs or Kibana dashboards.
Best practice: Use a naming convention that shows the node's role and number. This makes debugging much faster ("Oh, prod-master-02 is down" is more actionable than "Server X is down").

node.name: prod-data-hot-01

2. Discovery (Preventing "Split-Brain")

"Discovery" is the process by which nodes find each other and elect a leader (Master). If it is misconfigured, nodes will form separate, competing clusters, leading to data inconsistency (Split-Brain).

A. `discovery.seed_hosts` (The Phone Book)

This setting tells the node: "When you wake up, call these people to ask where the cluster is."

Configuration: You do not need to list every node. Just list the IP addresses or hostnames of your master-eligible nodes.

discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]

Note: If you run in the cloud (AWS), you could use a plugin (such as discovery-ec2) to detect them automatically, but hard-coding the IPs is safer on bare metal.

B. `cluster.initial_master_nodes` (The Bootstrapper)

This is the most confusing setting for beginners. It is used only once in the entire life of the cluster: the very first time you turn it on.

The problem: When you start 3 empty nodes, they all think "I should be the King". Without this setting, they could form 3 separate clusters of 1 node each.
The solution: This setting forces them to form a quorum. It says: "Do not bootstrap the cluster until you see a vote from these specific nodes."
Configuration: It MUST match the node.name of your master nodes exactly.

cluster.initial_master_nodes: ["prod-master-01", "prod-master-02", "prod-master-03"]

Critical warning: Once the cluster has formed for the first time, remove this setting (or comment it out) in your configuration management. If you leave it in place and later restart a node to rejoin an existing cluster, it could try to bootstrap a new cluster instead of joining the old one.
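One way to honor that warning in configuration management is to render the bootstrap line only for the very first start. A minimal sketch follows; the `first_bootstrap` flag and the function name are hypothetical, and the host and node names are the sample values used above.

```python
def _yaml_list(values: list[str]) -> str:
    """Render a list of strings as an inline YAML list of quoted items."""
    return "[" + ", ".join(f'"{v}"' for v in values) + "]"


def discovery_settings(seed_hosts: list[str], master_node_names: list[str],
                       first_bootstrap: bool) -> str:
    """Return the discovery lines for elasticsearch.yml.

    cluster.initial_master_nodes is emitted only for the initial bootstrap.
    """
    lines = [f"discovery.seed_hosts: {_yaml_list(seed_hosts)}"]
    if first_bootstrap:
        lines.append(f"cluster.initial_master_nodes: {_yaml_list(master_node_names)}")
    return "\n".join(lines)


if __name__ == "__main__":
    hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
    masters = ["prod-master-01", "prod-master-02", "prod-master-03"]
    print(discovery_settings(hosts, masters, first_bootstrap=True))
    print("--- after the cluster has formed ---")
    print(discovery_settings(hosts, masters, first_bootstrap=False))
```

Flipping the flag to False after the first successful start drops the bootstrap line from every later rendering of the file.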

3. Path Settings (Saving Your OS)

By default (with package installs), Elasticsearch writes data to /var/lib/elasticsearch and logs to /var/log/elasticsearch. This is dangerous.

A. The "Root Partition" Risk

On Linux, /var is usually part of the root partition (/). If your users flood the cluster with data (filling path.data) or the cluster emits massive error loops (filling path.logs), you will fill the root disk to 100%.

The consequence: When / is full, system services start failing. You may not be able to log in over SSH to fix it, and you may have to reboot into rescue mode.

B. `path.data`

Best practice: Mount a large, separate physical disk (NVMe/SSD) on a path such as /mnt/data and point Elasticsearch at it. If that disk fills up, Elasticsearch stops working, but the OS stays alive, which lets you resolve the problem.

path.data: /mnt/data/elasticsearch

Pro tip (multiple paths): You can provide several paths. This is not true RAID 0 striping: each shard lives entirely on one path, and Elasticsearch spreads whole shards across the paths. Multiple data paths have been deprecated since version 7.13, so on recent versions prefer a single path backed by RAID or LVM.

path.data:
  - /mnt/disk1
  - /mnt/disk2

C. `path.logs`

Best practice: Ideally, ship logs to a remote system (using Filebeat). If you store them locally, keep them on a partition separate from path.data so that a massive log spike cannot eat into your data storage space.

Operations & Maintenance: From Hobby to Production

Cette section définit la différence entre un cluster "hobby" et un cluster de "production". Le déploiement est un événement unique ; les opérations sont éternelles.

1. Monitoring: Vous ne pouvez pas gérer ce que vous ne voyez pas

Une erreur courante est d'attendre que les utilisateurs se plaignent que "La recherche est lente" avant de vérifier le cluster. Vous avez besoin d'une visibilité proactive.

A. Les Outils

1. Elastic Stack Monitoring (La Voie Native)

Comment ça marche : Vous activez xpack.monitoring. Le cluster envoie des métriques à lui-même (ou de préférence, à un "Monitoring Cluster" séparé pour éviter d'ajouter de la charge au système de production).
Avantages : Profondément intégré ; l'interface Kibana est pré-construite et excellente.

2. Prometheus & Grafana (La Voie Cloud-Native)

Comment ça marche : Vous exécutez un conteneur sidecar elasticsearch-exporter. Prometheus le scrape, et Grafana le visualise.
Avantages : Standard de l'industrie ; vous permet de corréler les métriques Elasticsearch avec les métriques Linux/Network sur le même tableau de bord.

B. The "Big 4" Metrics to Watch

JVM Heap Usage
- Healthy: A "sawtooth" pattern (memory fills up, Garbage Collection cleans it, repeat).
- Danger: A flat line near 75-90%. This means the node is starved of memory and will soon crash with an OutOfMemoryError.
Garbage Collection (GC) Count & Time
- Danger: If "Old Gen" GC time climbs, your node is pausing execution (Stop-the-World) to clean memory. Search requests hang during these pauses.
CPU Usage
- High CPU is normal during heavy indexing, but if it stays at 100% continuously, your nodes are undersized or your queries are too complex (e.g., wildcards starting with *).
Thread Pool Rejections
- This is the most critical error metric. It means the node is saying: "I am too busy; I cannot accept this request." If search or write rejections are > 0, you have a capacity problem.
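
As a rough illustration of two of these checks, the sketch below scans one node's statistics for high heap usage and thread pool rejections. It assumes the input is a single entry from the `nodes` object returned by `GET _nodes/stats/jvm,thread_pool`; the function name `find_warnings` and the 75% default are illustrative choices, not part of any Elastic tooling.

```python
def find_warnings(node_stats, heap_limit_pct=75):
    """Return human-readable warnings for one node's stats.

    `node_stats` is assumed to be one entry of the `nodes` object returned
    by GET _nodes/stats/jvm,thread_pool. The 75% heap threshold is an
    illustrative default taken from the range quoted above.
    """
    warnings = []

    heap_pct = node_stats["jvm"]["mem"]["heap_used_percent"]
    if heap_pct >= heap_limit_pct:
        warnings.append(f"heap at {heap_pct}% (limit {heap_limit_pct}%)")

    # Rejections are cumulative counters since node start; any non-zero
    # value for search or write deserves a look.
    for pool in ("search", "write"):
        rejected = node_stats["thread_pool"][pool]["rejected"]
        if rejected > 0:
            warnings.append(f"{pool} thread pool rejected {rejected} requests")

    return warnings


# Hand-written sample data, not real cluster output.
sample = {
    "jvm": {"mem": {"heap_used_percent": 88}},
    "thread_pool": {"search": {"rejected": 0}, "write": {"rejected": 12}},
}
for line in find_warnings(sample):
    print(line)
```

A single heap reading cannot distinguish a sawtooth from a flat line, so in practice you would apply a check like this to several samples over time.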

2. Backups: Replicas ≠ Backups

This is the most important lesson in data safety.

The Myth

"I have 2 replicas, so I have 3 copies of my data. I don't need backups."

The Reality

Replicas protect against hardware failure (a disk crash). They do not protect against human error.

Scenario: You accidentally run DELETE /users.
Result: Elasticsearch deletes the primary shard immediately and instantly propagates that delete to all replica shards. Your data is gone from all 3 copies within milliseconds.

The Solution: Snapshots & SLM

You must take Snapshots, which are incremental backups sent to an external repository (S3, Google Cloud Storage, Azure Blob, or a shared NFS drive).

Incremental: The first snapshot copies everything. The second snapshot copies only the segments that have changed. It is light and fast.
SLM (Snapshot Lifecycle Management): Don't write manual scripts. Use the built-in SLM feature to define a policy.
Example Policy: "Take a snapshot every night at 2 AM. Keep the last 30 snapshots. Delete older ones automatically."
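
The example policy above could be expressed roughly as follows. This is a minimal sketch that only builds the JSON body you would send to `PUT _slm/policy/<policy-id>`; the repository name `my_backup_repo` and the helper name `nightly_slm_policy` are assumptions, and the repository must already be registered.

```python
import json


def nightly_slm_policy(repository, keep=30):
    """Build a request body for PUT _slm/policy/<policy-id>."""
    return {
        # Cron fields: second minute hour day-of-month month day-of-week
        "schedule": "0 0 2 * * ?",
        # Date math in the name gives each snapshot a dated, unique name.
        "name": "<nightly-snap-{now/d}>",
        "repository": repository,
        "config": {"indices": ["*"]},
        # Keep at most `keep` snapshots; older ones are removed by retention.
        "retention": {"max_count": keep},
    }


# "my_backup_repo" is a hypothetical repository name.
policy = nightly_slm_policy("my_backup_repo")
print(json.dumps(policy, indent=2))
```

Retention runs as its own scheduled task, so older snapshots may linger for a while after the count is exceeded.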

3. Updates: The "Rolling Restart" Strategy

Updating a database used to mean "Scheduled Downtime" on a Sunday night. With Elasticsearch, you can upgrade with zero downtime if you follow the "Rolling Restart" procedure.

The Logic

You never shut down the whole cluster. You shut down one node, upgrade it, bring it back up, and move on to the next.

The Critical Step: Disabling Allocation

Before stopping a node, you must tell the cluster: "I am shutting this node down on purpose. Do not panic and start rebuilding its data elsewhere."

The Workflow

1. Stop Allocation (This restricts allocation to primaries, so the cluster does not start rebuilding the node's replicas elsewhere while it is down).

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

2. Stop the Node: Run systemctl stop elasticsearch.

3. Upgrade: Update the package or replace the Docker image.

4. Start the Node: Run systemctl start elasticsearch.

5. Wait for the Node to Rejoin: Watch the logs or _cat/nodes until the node is back in the cluster. The cluster will not return to green until allocation is re-enabled in the next step.

6. Re-enable Allocation

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

7. Repeat: Wait for the cluster to report green (_cat/health), then move on to the next node.
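
The workflow can be written down as an ordered plan before anyone touches a node. The sketch below only generates that checklist and executes nothing; the node names `es1` to `es3` come from this project's environment, and the tuple layout is an illustrative choice.

```python
def rolling_restart_plan(nodes):
    """Return the ordered steps for a rolling restart as plain tuples.

    Each step is (target, action, detail). Nothing is executed here; the
    list is meant to be reviewed or fed to your own automation.
    """
    settings_path = "PUT _cluster/settings"
    key = "cluster.routing.allocation.enable"
    plan = []
    for node in nodes:
        plan.append(("cluster", settings_path, {"persistent": {key: "primaries"}}))
        plan.append((node, "shell", "systemctl stop elasticsearch"))
        plan.append((node, "shell", "upgrade package or image"))
        plan.append((node, "shell", "systemctl start elasticsearch"))
        plan.append(("cluster", "GET _cat/nodes", f"wait until {node} is listed"))
        # None serializes to JSON null, which resets the setting to its default.
        plan.append(("cluster", settings_path, {"persistent": {key: None}}))
        plan.append(("cluster", "GET _cat/health", "wait for green"))
    return plan


plan = rolling_restart_plan(["es1", "es2", "es3"])
for target, action, detail in plan:
    print(f"{target:8} {action:22} {detail}")
```

Generating the plan as data makes it easy to review the order, in particular that allocation is re-enabled and the cluster is green before the next node is stopped.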

Common Pitfalls: The Difference Between Novices and Experts

This section separates novices from experts. These are the problems that never show up in a "Hello World" tutorial; they only appear when you are in production, under load, and usually at 3 AM.

1. Split Brain (The "Two Captains" Problem)

This diagram provides a visual representation of the "split-brain" scenario, where a network partition leads to the formation of two separate clusters, each with its own master, risking data inconsistency.

This is the nightmare scenario for distributed systems.

The Scenario

Imagine you have a 3-node cluster (A, B, C) in a single room. A network switch fails, cutting the room in two. Nodes A and B can talk to each other, but node C is isolated.

The Glitch

Nodes A+B realize that C has disappeared. They elect Node A as Master.
Node C thinks A and B are dead. It elects itself as Master.

The Result (Split Brain)

You now have two active masters in the same cluster.

Application 1 writes data to Node A.
Application 2 writes data to Node C.

The Catastrophe

When the network comes back, you have two different versions of history. Elasticsearch cannot "merge" these timelines. You will probably lose the data written on the smaller side of the partition.

The Fix (Quorum)

In older versions (6.x and earlier): You had to manually set discovery.zen.minimum_master_nodes to (N/2) + 1.
In modern versions (7.x+): Elasticsearch automatically uses a Voting Configuration system. However, you must make sure you have 3 master-eligible nodes (an odd number) so that a vote always has a majority winner. Never run a production cluster with exactly 2 master-eligible nodes.
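
The majority rule is simple enough to check by hand. Here is a minimal sketch of the (N/2) + 1 arithmetic applied to the A/B/C scenario; the function names are illustrative and this is not how Elasticsearch implements voting internally.

```python
def quorum(master_eligible):
    """Smallest number of votes that is a strict majority."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """True if one side of a network partition can still elect a master."""
    return partition_size >= quorum(master_eligible)


# 3 master-eligible nodes split into {A, B} and {C}.
print(quorum(3))               # 2
print(can_elect_master(2, 3))  # True: A+B keep a master
print(can_elect_master(1, 3))  # False: C cannot elect itself
# 2 master-eligible nodes split 1/1: neither side has a majority.
print(can_elect_master(1, 2))  # False
```

With a quorum in force, the isolated node C simply stops accepting cluster-level changes instead of becoming a second master, and a 2-node cluster loses its master entirely as soon as either node is unreachable.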

2. Mapping Explosion ("Death by Fields")

Elasticsearch is schemaless by default (Dynamic Mapping), which sounds great until it crashes your cluster.

The Scenario

You are logging user cookies or HTTP headers. A developer decides to send a JSON document where the keys are unique UUIDs or timestamps.

{
  "2024-01-30_12:00": "error",
  "2024-01-30_12:01": "info"
}

The Bug

Dynamic mapping sees a new field ("2024-01-30_12:00") and adds it to the Cluster State (the global registry of all settings).

The Result

Every unique key becomes a new field. If you send 10,000 documents with unique keys, you create 10,000 fields.

The Cluster State becomes massive (hundreds of MB).
This state must be synchronized to every node. Updating it takes seconds. The cluster becomes unresponsive.

The Fix

Disable Dynamic Mapping: Set dynamic: false or strict in your production templates.
Use the flattened Data Type: If you must store unstructured JSON with unknown keys, map that specific field as type: "flattened". Elasticsearch will treat the entire JSON object as a single keyword field, preventing the explosion.
Limit Fields: The default limit is 1,000 fields per index (index.mapping.total_fields.limit). Do not raise it. If you hit it, your data model is wrong.
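
Often the cleanest fix is on the producer side: move the variable part out of the key and into a value. The sketch below reshapes the problematic document from the scenario so that the field names stay fixed; `remodel`, `field_names`, and the `events`/`timestamp`/`level` names are hypothetical.

```python
def remodel(doc):
    """Turn {"<timestamp>": "<level>", ...} into a list of fixed-shape events."""
    return {
        "events": [
            {"timestamp": key, "level": value} for key, value in sorted(doc.items())
        ]
    }


def field_names(docs):
    """Collect the distinct top-level keys dynamic mapping would see."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return names


raw_docs = [
    {"2024-01-30_12:00": "error", "2024-01-30_12:01": "info"},
    {"2024-01-30_12:02": "warn"},
]
print(len(field_names(raw_docs)))                        # 3, grows with every new key
print(len(field_names([remodel(d) for d in raw_docs])))  # 1, always just "events"
```

After remodeling, the mapping contains the same small set of fields (`events.timestamp`, `events.level`) no matter how many documents arrive.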

3. Deep Pagination (The "Killer Query")

Your users want to jump to "Page 50,000" of the search results. You must tell them "No".

The Scenario

A user runs a query with from: 50000, size: 10.

The Bug (Distributed Sorting Cost)

To find the "top 10" results starting at 50,000, Elasticsearch cannot simply skip the first 50,000 records.

Every shard involved in the search must fetch its own top 50,010 results and hold them in memory.
If you have 10 shards, the coordinating node receives 50,010 * 10 = 500,100 entries (document IDs and sort values).
It must sort this half-million entries in RAM, skip the first 50,000, return the next 10, and throw away everything else.
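
The arithmetic generalizes to any offset and shard count. A small sketch (the function name `deep_page_cost` is illustrative):

```python
def deep_page_cost(from_, size, shards):
    """Return (entries_per_shard, entries_at_coordinator) for from/size paging."""
    per_shard = from_ + size
    return per_shard, per_shard * shards


per_shard, at_coordinator = deep_page_cost(50_000, 10, 10)
print(per_shard)            # 50010
print(at_coordinator)       # 500100
print(at_coordinator - 10)  # 500090 entries sorted only to be thrown away
```

The cost grows linearly with both the offset and the number of shards, which is why the same query that is harmless on page 2 becomes expensive on page 5,000.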

The Result

Massive CPU spikes and Garbage Collection (GC) loops. If several users do this simultaneously, the node runs out of memory (OOM) and crashes.

The Fix

Hard Limit: Elasticsearch sets index.max_result_window to 10,000 by default. Do not raise it unless you know exactly what you are doing.
For Users (search_after): This is the efficient way to paginate. It tells Elasticsearch: "Give me the next 10 results after this specific sort value from the last result." It does not require a deep scan.
For Scripts (Scroll API / PIT): If you need to export the whole dataset, use Point-in-Time (PIT) with search_after, or the Scroll API, both of which are designed for batch processing.
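
To make the search_after idea concrete, here is a minimal sketch that builds the request body for the next page from the last hit of the previous one. It does not talk to a cluster; the field names `created_at` and `id` and the sample hit are assumptions for illustration.

```python
def next_page_body(query, sort, size, last_hit=None):
    """Build a search body for the first page or the page after `last_hit`."""
    body = {"query": query, "sort": sort, "size": size}
    if last_hit is not None:
        # Each hit in a sorted search response carries its own "sort" array.
        body["search_after"] = last_hit["sort"]
    return body


# Hypothetical fields; the second one is a tiebreaker that keeps order stable.
sort = [{"created_at": "asc"}, {"id": "asc"}]
first = next_page_body({"match_all": {}}, sort, 10)
# Hypothetical last hit of the first page, as found in hits.hits[-1].
last_hit = {"_id": "42", "sort": ["2024-01-30T12:00:00Z", "42"]}
second = next_page_body({"match_all": {}}, sort, 10, last_hit)
print(second["search_after"])
```

Because each page starts from a sort value rather than an offset, every shard only has to return `size` entries, however deep the user has paged.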

Elasticsearch in Production: The Definitive Architecture & Operations Guide

AKEBLI Ouassim — Mon, 02 Feb 2026 16:06:20 GMT

Definition

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. At its core, it is a NoSQL database, but unlike traditional databases designed for storage and retrieval, Elasticsearch is optimized for speed and relevance.

Architecture & Capacity Planning

This diagram illustrates a typical production cluster architecture, showing the separation of duties between dedicated master nodes, tiered data nodes (hot/warm), and coordinating nodes.

1. Node Roles

A. Mandatory Basic Roles

1. Master-eligible node (`master`)

Function: Responsible for cluster-wide actions such as creating or deleting indices, tracking which nodes are part of the cluster, and allocating shards to nodes.
Why it's mandatory: Without a master, the cluster cannot be formed, and no cluster-level changes can be tracked.
Production Tip: You typically need 3 dedicated master nodes for High Availability (to avoid "split-brain").

2. Data node (`data`)

Function: Holds the shards that contain your indexed documents. These nodes perform data-related operations like CRUD, search, and aggregations.
Why it's mandatory: Without data nodes, you cannot store any data.
Sub-roles (Tiered Architecture):
- data_content: For general-purpose data that doesn't fit a time-series lifecycle.
- data_hot: For strictly time-series data (e.g., logs) that is being actively written to and queried.
- data_warm: For older data that is read-only but still queried frequently.
- data_cold: For data accessed infrequently (optimized for storage).
- data_frozen: For data stored in object storage (S3) and rarely queried (Searchable Snapshots).

B. Core Utility Roles (Highly Recommended)

While not strictly "mandatory" for the cluster to start, these are standard in almost all production environments.

1. Ingest node (`ingest`)

Function: Runs "ingest pipelines" to pre-process documents before indexing. This acts like a lightweight Logstash inside Elasticsearch (e.g., parsing JSON, removing fields, renaming fields).
Default Behavior: Every node is an ingest node by default unless configured otherwise.

2. Coordinating-only node (No specific role set)

Function: These nodes have an empty role list (node.roles: []). They act as "Smart Load Balancers." They accept search requests, distribute them to the specific data nodes holding the data, gather the results, perform the final reduction (sorting/aggregating), and send the response to the client.
Use Case: Large clusters with heavy search traffic to prevent data nodes from being overwhelmed by CPU-intensive aggregation tasks.

C. Specialized Roles (Optional)

These are specific to certain features of the Elastic Stack.

1. Machine Learning node (`ml`)

Function: Runs the native Machine Learning jobs (anomaly detection, forecasting).
Requirement: These jobs are CPU and RAM intensive. If you use ML features, you must have at least one ML node.

2. Remote Cluster Client (`remote_cluster_client`)

Function: Allows the cluster to connect to other clusters (Cross-Cluster Search or Cross-Cluster Replication).
Default: Enabled by default on all nodes.

3. Transform node (`transform`)

Function: Runs transform jobs which pivot or summarize data into new indices (similar to "Materialized Views" in SQL).

4. Voting-only node (`voting_only`)

Function: A master-eligible node that can participate in master elections (voting) but cannot actually become the elected master.
Use Case: Rarely used; mostly for tie-breaking in even-numbered clusters.

2. Hardware Requirements

These figures are based on the constraints of the JVM (Java Virtual Machine) and operational best practices for recovery and stability.

A. RAM (Memory)

This is the most critical resource. It is split between the JVM Heap (for the application) and the OS Filesystem Cache (for Lucene segment files).

Minimum: 8 GB - 16 GB
- Running a production node with less than 8GB is risky. You need enough headroom for the OS to cache frequently accessed index segments.
The "Sweet Spot": 64 GB
- This is the standard specification for a high-performance node. It allows you to allocate ~30GB to the Heap and leave ~34GB for the OS cache.
Maximum (Effective): 64 GB (Physical RAM)

⚠️ The "Compressed Oops" Limit
You should strictly avoid allocating more than 31GB-32GB to the JVM Heap. If the Heap crosses ~32GB, the JVM stops using "Compressed Object Pointers" (pointers swell from 32-bit to 64-bit). Every object reference then takes twice the space, so a heap just above the threshold can hold fewer objects than one just below it.

B. Disk (Storage)

Disk speed dictates indexing throughput, and disk size dictates how long recovery (rebalancing shards) takes.

Type Requirement: SSD / NVMe
- Spinning HDDs are only acceptable for "Cold" or "Frozen" tiers.
Minimum Capacity: 200 GB
- Small clusters need enough space for logs, the OS, and headroom for shard merging.

Maximum Capacity (Per Node):

Hot Nodes (High IO): 2 TB - 4 TB limit.
- Reasoning: If a node holds 10TB of hot data and dies, replicating that 10TB to a new node over the network takes hours or days, leaving the cluster in a "Yellow" (at risk) state.
Warm/Cold Nodes: 10 TB - 16 TB.
- Reasoning: Since these nodes are read-mostly with very little write activity, you can pack them with dense storage, provided you accept slower recovery times.
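
The recovery-time reasoning above can be put into rough numbers. The sketch below computes a best-case transfer time from data size and link speed; it assumes decimal units (1 TB = 10^12 bytes), a fully available link, and no recovery throttling, so real recoveries will take longer. The function name and the `usable_fraction` parameter are illustrative.

```python
def recovery_hours(data_tb, link_gbps, usable_fraction=1.0):
    """Best-case hours to copy `data_tb` terabytes over a `link_gbps` link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * usable_fraction)
    return seconds / 3600


print(round(recovery_hours(10, 1), 1))   # 22.2
print(round(recovery_hours(10, 10), 1))  # 2.2
print(round(recovery_hours(2, 10), 1))   # 0.4
```

Even under these optimistic assumptions, a 10 TB node on a 1 Gbps link needs most of a day, which is the argument for both smaller hot nodes and faster networks.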

C. Network

Elasticsearch is a distributed system; the network is the bus.

Minimum: 1 Gbps (Gigabit Ethernet)
- Acceptable only for small clusters with low indexing rates.
- Risk: During a "peer recovery" (when a node comes back online), a 1 Gbps link will be 100% saturated, causing search latency to spike.
Recommended / Maximum: 10 Gbps - 25 Gbps
- 10 Gbps is the standard for modern production clusters to ensure recovery doesn't impact live traffic.
Latency: Must be low single-digit ms (intra-datacenter).
- Warning: Spanning a single cluster across distinct geographical regions (e.g., US-East to EU-West) is generally unsupported and will cause instability due to timeout errors.

3. Sizing & Sharding

This diagram provides a clear visualization of how a single index is divided into primary shards (P) and how each primary shard has a corresponding replica shard (R) distributed across different nodes for high availability.

A. Shard Strategy

Sharding is the mechanism that allows Elasticsearch to scale beyond the hardware limits of a single server. However, it is the most common source of performance issues in production.

1. The Concept

An Elasticsearch index is actually a logical grouping of Shards. Each shard is a self-contained instance of Apache Lucene, which is a fully functional search engine in its own right. When you execute a search on an index, Elasticsearch queries all relevant shards in parallel and merges the results.

2. The "Oversharding" Trap

New users often think, "If shards provide parallelism, more shards must mean more speed." This is a fallacy known as oversharding.

The Metadata Overhead: Every shard consumes resources. The Cluster State (the "brain" of the cluster) must track the location, status, and size of every shard. If you have 100,000 small shards, the Cluster State grows huge, and updates (like creating a new index) become incredibly slow.
The "Map-Reduce" Tax: When you search an index with 50 shards, the coordinating node must send the request to 50 places, wait for 50 responses, and merge 50 results. If those shards are tiny (e.g., 50MB), the overhead of managing the request outweighs the benefit of parallel processing.
Memory Cost: Each shard has a baseline memory footprint in the JVM Heap to hold Lucene segment info. Too many small shards will exhaust your Heap memory even if the cluster is idle.

✅ The Golden Rule: 10GB – 50GB
For general-purpose search (e.g., products, users), aim for a shard size between 10GB and 50GB.

Why > 10GB? To minimize the per-shard overhead and maximize efficient compression.
Why < 50GB? To ensure recovery is fast. If a node fails, moving a 50GB shard to a new node over the network is manageable. Moving a 500GB shard takes so long that your cluster remains in a vulnerable state ("Yellow" health) for hours.
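
As a starting point for choosing a primary shard count, you can divide the expected index size by a target shard size inside that range. The sketch below assumes a 30GB target, which is just one value within the 10GB to 50GB band, and the function name is illustrative.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Number of primaries so each shard lands near `target_shard_gb`."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size_gb in (5, 120, 600):
    count = primary_shards(size_gb)
    print(size_gb, count, round(size_gb / count, 1))
```

Small indices end up with a single shard below the 10GB mark, which is fine: splitting a 5GB index further would only add overhead.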

B. Replicas

Replication serves two distinct purposes in a cluster: High Availability (HA) and Read Throughput.

1. Failover (High Availability)

A Replica Shard is a precise copy of a Primary Shard.

The Mechanism: If the node holding a Primary Shard crashes, the master node instantly "promotes" the Replica Shard (living on a different node) to be the new Primary.
Production Standard: You must set number_of_replicas: 1 (at minimum). This ensures that if any single node fails, no data is lost, and the cluster remains fully operational.
The Trade-off: Replicas double your storage requirements. 100GB of data with 1 replica requires 200GB of physical disk space.

2. Read Throughput (Scaling Search)

Every write goes to the Primary shard first and is then replicated to each Replica, so replicas do not add write capacity. Searches, however, can be served by the Primary or by any Replica.

Load Balancing: When a search request comes in, the coordinating node intelligently routes it. It can go to the Primary or any of its Replicas.
Scaling Up: If your application is "Read Heavy" (e.g., an e-commerce site where users search frequently but products change rarely), you can increase performance by adding more replicas.
- Example: An index with 1 Primary and 5 Replicas allows 6 nodes to answer search queries simultaneously for that specific data.

Key Distinction:

Primary Shards are fixed at index creation (changing the count requires reindexing, or the split/shrink APIs).

Replica Shards can be changed dynamically. You can scale from 1 replica to 5 replicas instantly if you expect a traffic spike (like Black Friday), and scale back down afterwards.
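
The trade-off between disk space and read capacity is plain multiplication. A short sketch (the function name is illustrative):

```python
def replica_footprint(primary_data_gb, replicas):
    """Return (total_disk_gb, searchable_copies_per_shard)."""
    copies = 1 + replicas
    return primary_data_gb * copies, copies


print(replica_footprint(100, 1))  # (200, 2)
print(replica_footprint(100, 5))  # (600, 6)
```

Going from 1 to 5 replicas triples the number of copies that can answer a search, and triples the disk bill along with it.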

Environment Preparation (OS Tuning)

Elasticsearch is sensitive to Operating System configurations. Failing to tune these will often prevent the cluster from starting (Bootstrap Checks).

1. Disable Swapping (The "Performance Killer")

The Concept: In a standard server, if physical RAM is full, the OS moves inactive memory pages to the hard disk (swap space). For Elasticsearch, this is catastrophic. The Java Garbage Collector (GC) needs to scan memory to reclaim space. If that memory is on the disk (which is 100,000x slower than RAM), a GC cycle that usually takes milliseconds will take seconds or minutes.

The Result: The node becomes unresponsive ("Stop-the-world" pause), the cluster thinks the node is dead, drops it, and triggers a massive data rebalance.

How to configure it

You have two main methods. The best practice is to do both.

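OS Level: Disable swap completely. The command below takes effect immediately but does not survive a reboot; to make it permanent, also comment out the swap entries in /etc/fstab.
```
 sudo swapoff -a
```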
Application Level (Memory Lock): Force Elasticsearch to lock its memory address space into RAM so the OS cannot swap it out.
- In elasticsearch.yml:
```
  bootstrap.memory_lock: true
```
- Note: You may need to edit the systemd service file (systemctl edit elasticsearch) to allow this limit:
```
  [Service]
  LimitMEMLOCK=infinity
```

2. File Descriptors (The "Capacity Limit")

The Concept: Elasticsearch (via Lucene) breaks your data into heavily compressed immutable files called "segments." A single node can easily hold thousands of these small files open simultaneously. The default Linux limit for open files per user is often 1024. This is far too low. If Elasticsearch hits this limit, it can silently lose data or crash because it cannot write to new files.

How to configure it

You must increase the limit to at least 65,536.

Check current limit: ulimit -n
Permanent fix: Edit /etc/security/limits.conf:
```
  elasticsearch - nofile 65536
```
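(If you install via RPM/Deb package, the systemd unit already sets LimitNOFILE, but you must verify it. Note that systemd services ignore /etc/security/limits.conf, so for a systemd-managed tarball install you set LimitNOFILE in the unit file instead.)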

3. Virtual Memory (The `mmap` Requirement)

The Concept: This is specific to how Lucene reads data. It uses a system call called mmap (memory map) to map the files on the disk directly into the virtual memory address space. This is incredibly fast because it lets the kernel manage file caching. However, the default operating system limit on how many "memory maps" a process can own is usually 65,530. Elasticsearch requires significantly more.

How to configure it

This is the most common reason for startup failures.

Command (Live):
```
  sysctl -w vm.max_map_count=262144
```
Permanent fix: Add this line to /etc/sysctl.conf:
```
  vm.max_map_count=262144
```
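
The three OS prerequisites so far (swap, file descriptors, memory maps) are easy to verify before starting a node. The following is a minimal pre-flight sketch for a Linux host, using the thresholds quoted in this section; the function names are hypothetical, and `read_current_values` reports the limits of the process running the script, which may differ from those of the Elasticsearch service.

```python
import resource


def os_problems(max_map_count, nofile_limit, swap_total_kb):
    """Compare observed values with the thresholds quoted in this section."""
    problems = []
    if max_map_count < 262144:
        problems.append(f"vm.max_map_count={max_map_count}, need >= 262144")
    if nofile_limit < 65536:
        problems.append(f"nofile={nofile_limit}, need >= 65536")
    if swap_total_kb > 0:
        problems.append(f"swap enabled ({swap_total_kb} kB)")
    return problems


def read_current_values():
    """Read the live values on a Linux host (standard procfs paths)."""
    with open("/proc/sys/vm/max_map_count") as f:
        max_map_count = int(f.read())
    soft_nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_nofile == resource.RLIM_INFINITY:
        soft_nofile = float("inf")
    swap_total_kb = 0
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("SwapTotal:"):
                swap_total_kb = int(line.split()[1])
    return max_map_count, soft_nofile, swap_total_kb


# Example with typical untuned defaults, so the sketch runs anywhere.
print(os_problems(65530, 1024, 2097152))
```

On a real host you would call `os_problems(*read_current_values())`, ideally as the same user that runs Elasticsearch.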

4. JVM Heap Size (The Balancing Act)

This is the most misunderstood setting. You are configuring the Java Virtual Machine (JVM) memory.

A. `Xms` and `Xmx` (Min vs. Max)

The Problem: By default, Java starts with a small heap (Xms) and grows it as needed up to the max (Xmx). This resizing process pauses execution.
The Fix: Set them to the same value. This allocates all the memory immediately at startup, preventing resizing pauses.
```
  # /etc/elasticsearch/jvm.options
  -Xms4g
  -Xmx4g
```

B. The 50% Rule (Why not 100%?)

If you have a 64GB machine, why give Elasticsearch only 30GB? Why not 60GB?

The Reason: Elasticsearch relies on two types of memory:
1. JVM Heap: For query objects, aggregations, and cluster state.
2. OS Filesystem Cache: This is where the actual data (Lucene segments) lives.
If you give all RAM to the Heap, the OS has no room to cache the files. The disk will be thrashed, and performance will tank.
Rule: 50% to Heap, 50% left free for the OS.

C. The 32GB Limit (Compressed Oops)

You must never set the Heap above ~32GB (exact threshold varies, usually 30GB-31GB is safe).

The Science: Below 32GB, Java uses "Compressed Ordinary Object Pointers" (Compressed Oops). It uses 32-bit pointers to reference memory.
The Trap: Once you cross the threshold (e.g., 32.1GB), Java switches to 64-bit pointers. These are larger.
The Result: A 35GB Heap can actually hold fewer objects than a 31GB Heap because the pointers themselves take up so much more space. The larger pointers also consume more memory bandwidth and CPU cache.
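
The three rules (equal Xms and Xmx, 50% of RAM, stay under the compressed-oops threshold) combine into one small calculation. This sketch assumes a 30GB cap, chosen from the "30GB-31GB is safe" range above; the function names are illustrative.

```python
def heap_gb(ram_gb, cap_gb=30):
    """Half of RAM, capped below the compressed-oops threshold."""
    return min(ram_gb // 2, cap_gb)


def jvm_options(ram_gb):
    """Render matching -Xms/-Xmx lines for jvm.options."""
    size = heap_gb(ram_gb)
    return f"-Xms{size}g\n-Xmx{size}g"


for ram in (8, 64, 128):
    print(ram, heap_gb(ram))
print(jvm_options(8))
```

For an 8GB machine this reproduces the `-Xms4g` / `-Xmx4g` example shown earlier; for 64GB and above the cap takes over, and the extra RAM goes to the filesystem cache.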

Security (The "Must-Have" Layer)

This section often trips up new administrators because it involves certificates and passwords, which can be tedious. However, in modern Elasticsearch (version 8.x+), security is enabled by default. You cannot run a production cluster without it.

1. TLS/SSL Encryption (The "Encrypted Tunnel")

Encryption prevents "Man-in-the-Middle" attacks. In Elasticsearch, we implement this in two distinct layers. If you miss the first one, your cluster will not even start.

A. Transport Layer (Node-to-Node)

What it is: The internal communication channel on port 9300 where nodes talk to each other (electing masters, moving shards, replicating data).
Why it's mandatory: Elasticsearch requires mutual trust. Node A must prove to Node B that it is a legitimate part of the cluster, not a rogue server trying to steal data.
The Mechanism:
1. You generate a Certificate Authority (CA).
2. You sign a certificate for each node using that CA.
Crucial Note: With security enabled, if you do not enable Transport SSL, the bootstrap checks fail and the node refuses to start as soon as it binds to a non-loopback IP address (i.e., once it leaves "Development Mode").
Key Tool: bin/elasticsearch-certutil (This built-in tool simplifies creating these certificates).

B. HTTP Layer (Client-to-Cluster)

What it is: The external API on port 9200 where Kibana, your application (Java/Python/Node.js), and users connect.
Why it's critical: Without this, Basic Auth credentials (username/password) are sent in plain text. Anyone on the network can sniff the admin password.
Configuration: You generally use the same CA to sign these certificates, or you can use a public CA (Let's Encrypt, Verisign) if your cluster is public-facing.

2. Authentication & RBAC (The "Gatekeeper")

Once the connection is secure, you need to control who is logging in and what they can touch. This is Role-Based Access Control (RBAC).

A. The Built-in Users

When you first set up the cluster, you run bin/elasticsearch-reset-password (for example, with -u elastic) to set the passwords of the reserved accounts that are vital for the stack:

elastic: The "Superuser" (Root). It has full control.
- Danger: Do not use this account in your application code! If those credentials leak, your entire cluster is compromised.
kibana_system: A service account used only by the Kibana server to talk to Elasticsearch. It cannot be used to log in to the dashboard.

B. The Principle of Least Privilege

You should create custom roles for every specific use case.

The "Developer" Role: Can read and monitor indices but cannot delete data or change cluster settings.
The "App" Role: Can write to the logs-* index but cannot read from the salary-data index.
Document Level Security (DLS): You can even restrict access within a single index.
- Example: "User A can search the employees index, but only documents where department: 'marketing'."
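
The Document Level Security example could look roughly like this as a role definition. The sketch only builds the JSON body for `PUT _security/role/<role-name>`; the function name and the choice of the `monitor` cluster privilege are assumptions, while the index and field names come from the example above.

```python
import json


def marketing_reader_role():
    """Body for PUT _security/role/<role-name> (role name is up to you)."""
    return {
        "cluster": ["monitor"],
        "indices": [
            {
                "names": ["employees"],
                "privileges": ["read"],
                # Document Level Security: only matching documents are visible.
                "query": {"term": {"department": "marketing"}},
            }
        ],
    }


role = marketing_reader_role()
print(json.dumps(role, indent=2))
```

A user holding only this role can search `employees`, sees marketing documents exclusively, and has no write or delete privileges anywhere.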

3. Audit Logging (The "Black Box")

If data disappears or leaks, how do you know what happened?

What it tracks: You can configure it to log specific events: "Authentication Failed," "Index Deleted," or even "User X searched for query Y."
Compliance: Audit trails are typically required to demonstrate compliance with standards like GDPR, HIPAA, and PCI-DSS.
Performance Warning: Audit logging is I/O intensive.
- Bad Practice: Logging every single "read" operation. This will fill your disk with logs and slow down search performance.
- Best Practice: Log only "write/delete" operations and "authentication failures" to catch brute-force attacks.

Deployment Methods

The Introduction: One Size Does Not Fit All

Elasticsearch is infrastructure-agnostic. It runs wherever Java runs. The choice of deployment method usually depends on three factors:

Existing Infrastructure: Are you already all-in on Kubernetes? Do you have racks of physical servers?
Team Expertise: Are your ops people comfortable with Linux kernel tuning, or do they prefer writing YAML manifests?
Scalability Needs: Do you need to add 10 nodes in 5 minutes during Black Friday, or is your cluster relatively static?

1. Bare Metal / Virtual Machines (The "Classic" Approach)

This is the traditional way of deploying software. You treat Elasticsearch like any other database (PostgreSQL, MySQL).

How it works:

You provision Linux servers (physical hardware or VMs like EC2/Azure VMs), install Java (if using older ES versions), configure the OS prerequisites (as discussed in the OS Tuning section), add the Elastic repository, and install via package managers:

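# Ubuntu/Debian example: import the signing key, add the 9.x repository, then install
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/9.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-9.x.list
sudo apt-get update && sudo apt-get install elasticsearch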

Pros:

Maximum Performance: You have direct access to hardware resources without any containerization abstraction layer. This is ideal for ultra-high-performance use cases.
Simpler Troubleshooting: If you need to debug network latency or disk I/O, you use standard Linux tools (iostat, tcpdump) directly on the host.
Persistence is easy: Data is written directly to attached disks. You don't have to worry about complex container storage interfaces (CSI).

Cons:

Maintenance Overhead: Scaling up means manually provisioning a new server and configuring it. Upgrades require careful, manual "rolling restarts."
Configuration Drift: Without strong configuration management tools (Ansible, Chef, Puppet), servers can slowly diverge in their configurations over time, leading to "it works on node 1 but not node 2" issues.

Best For: Traditional IT environments, long-running stable clusters, and teams with strong Linux administration skills.

2. Docker / Containers (The "Rapid Prototyping" Approach)

Docker changed everything by allowing developers to spin up complex stacks locally in seconds.

How it works: Elastic provides official, pre-hardened Docker images. You rarely run plain docker run commands. Instead, you use Docker Compose to define a multi-node cluster in a single YAML file.

Pros:

Speed: You can go from zero to a working 3-node cluster on your laptop in under 60 seconds.
Consistency: The environment is identical across development, testing, and staging. "It works on my machine" actually means something.
Isolation: Dependencies are packaged with the container.

Cons:

Not a Production Orchestrator: Docker Compose is generally not recommended for multi-host production environments. It lacks advanced failover, scaling, and networking features needed for high availability.
State Management: You must be very careful with volume mapping to ensure data persists if a container restarts.

Best For: Local development, CI/CD testing pipelines, and very small, single-host production deployments.

3. Kubernetes & ECK (The "Cloud-Native" Standard)

If your organization has adopted Kubernetes (K8s), this is almost certainly how you should deploy Elasticsearch. But there is a massive caveat.

The "Helm Chart" Trap: New K8s users often try to deploy Elasticsearch using standard generic Helm charts. Avoid this. Elasticsearch is a complex, stateful distributed system. A standard K8s deployment doesn't understand that you can't just kill 3 master nodes simultaneously during an update without destroying the cluster.

The Solution: Elastic Cloud on Kubernetes (ECK). Elastic developed their own Kubernetes Operator called ECK.

What is an Operator? Think of it as a software robot that runs inside your K8s cluster and possesses human operational knowledge about Elasticsearch. It knows exactly the order in which to restart nodes so the cluster never goes down.

How it works: Instead of managing pods and statefulsets directly, you install the ECK operator, and then you submit a simple custom resource YAML to K8s saying "I want an Elasticsearch cluster".

The Operator sees the YAML and automatically creates the Services, StatefulSets, and PersistentVolumeClaims, and generates the TLS certificates.
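
For a sense of how small that custom resource is, the sketch below builds a minimal one and prints it as JSON, which kubectl accepts alongside YAML. The cluster name `prod-search` and the helper name are hypothetical, the version matches this project's Elasticsearch 9.3.0, and a real deployment would also declare storage and resource settings per node set.

```python
import json


def eck_cluster(name, version, nodes):
    """Build an Elasticsearch custom resource for the ECK operator."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "nodeSets": [{"name": "default", "count": nodes}],
        },
    }


# "prod-search" is a hypothetical cluster name.
manifest = eck_cluster("prod-search", "9.3.0", 3)
# Save the output to a file and apply it with `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```

Everything else (pods, services, certificates) is derived by the operator from these few fields.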

Pros:

Day 2 Operations Automated: The Operator handles scaling, rolling upgrades, and secure configuration automatically. Backups still need a snapshot repository and an SLM policy that you define.
Elastic Ecosystem: It makes deploying Kibana, APM Server, and Beats alongside Elasticsearch incredibly easy.

Cons:

High Complexity: You need significant Kubernetes expertise before you add the complexity of running a stateful database on top of it.

Best For: Modern, cloud-native organizations requiring highly dynamic, scalable infrastructure.

Configuration Best Practices

The Critical `elasticsearch.yml` Configuration Guide

The elasticsearch.yml file is the control center of your node. While there are hundreds of settings, getting these specific few wrong is the most common cause of production outages or data loss.

1. Identity: Cluster & Node Names

In a vacuum, names don't seem technical. In a distributed system, they are vital for observability and isolation.

A. `cluster.name`

The Default: elasticsearch
The Risk: If you leave this as default, a rogue developer starting a local instance on the same network (Wi-Fi or VPN) could accidentally discover and join your production cluster.
Best Practice: Be descriptive and specific to the environment.

cluster.name: prod-search-cluster-v1

B. `node.name`

The Default: The server’s hostname.
The Risk: Hostnames like ip-10-0-0-5 are hard to read in logs or Kibana dashboards.
Best Practice: Use a naming convention that indicates the role and number of the node. This makes debugging much faster ("Oh, prod-master-02 is down" is more actionable than "Server X is down").

node.name: prod-data-hot-01

2. Discovery (Preventing the "Split-Brain")

"Discovery" is the process where nodes find each other and elect a leader (Master). If this is misconfigured, nodes will form separate, competing clusters, leading to data inconsistency (Split-Brain).

A. `discovery.seed_hosts` (The Phone Book)

This setting tells the node: "When you wake up, call these people to ask where the cluster is."

Configuration: You don't need to list every single node. Just list the IP addresses or hostnames of your Master-eligible nodes.

discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]

Note: If using Cloud/AWS, you might use a plugin (like discovery-ec2) to auto-detect these, but hardcoding IPs is safest for bare metal.

B. `cluster.initial_master_nodes` (The Bootstrapper)

This is the most confusing setting for beginners. It is only used once in the entire life of the cluster: the very first time you turn it on.

The Problem: When you start 3 empty nodes, they all think, "I should be the King." Without this setting, they might form 3 separate clusters of 1 node each.
The Solution: This setting forces them to form a quorum. It says, "Do not start the cluster until you see a vote from these specific nodes."
Configuration: MUST match the node.name of your master nodes exactly.

cluster.initial_master_nodes: ["prod-master-01", "prod-master-02", "prod-master-03"]

Critical Warning: Once the cluster has formed for the first time, remove this setting (or comment it out) from your config management. If you leave it, and later try to restart a node to join an existing cluster, it might try to bootstrap a new cluster instead of joining the old one.
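Because the entries must match node.name exactly, a quick pre-flight comparison can catch a mismatch before the first start. The following is a minimal sketch, not part of the original setup: the function name and the sample values are hypothetical, and it assumes you have already collected the node.name values of your master-eligible nodes.

```python
def bootstrap_mismatches(initial_master_nodes, master_node_names):
    """Return names that appear in only one of the two lists.

    An empty result means cluster.initial_master_nodes matches the
    node.name values of the master-eligible nodes exactly.
    """
    configured = set(initial_master_nodes)
    actual = set(master_node_names)
    # Symmetric difference: entries missing on either side.
    return sorted(configured ^ actual)


# Hypothetical values: the third entry uses a hostname instead of node.name.
initial = ["prod-master-01", "prod-master-02", "ip-10-0-0-7"]
names = ["prod-master-01", "prod-master-02", "prod-master-03"]

print(bootstrap_mismatches(initial, names))
```

Any name in the output points at a typo, a stale entry, or a hostname used where a node.name was expected.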

3. Path Settings (Saving your OS)

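With a package install (DEB/RPM), Elasticsearch writes data to /var/lib/elasticsearch and logs to /var/log/elasticsearch by default. With the archive (tarball) install, the defaults are the data and logs directories inside the installation directory. Either default is dangerous.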

A. The Risk of the "Root Partition"

On Linux, /var is usually part of the root partition (/). If your users spam the cluster with data (filling path.data) or the cluster throws massive error loops (filling path.logs), you will fill up the root disk to 100%.

The Consequence: When / is full, services fail to write and may crash, and you may be unable to SSH in to fix it. In the worst case you have to reboot into rescue mode.

B. `path.data`

Best Practice: Mount a separate, large physical disk (NVMe/SSD) to a path like /mnt/data and point Elasticsearch there. If this disk fills up, Elasticsearch stops working, but the OS stays alive, allowing you to fix the issue.

path.data: /mnt/data/elasticsearch

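Note (Multiple Data Paths): You can provide multiple paths, but this is not a software RAID 0. Elasticsearch keeps all files of a given shard on a single path and does not stripe a shard across disks. Multiple data paths have also been deprecated since 7.13, so a single path on top of LVM or RAID is the safer choice.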

path.data:
  - /mnt/disk1
  - /mnt/disk2

C. `path.logs`

Best Practice: Ideally, ship logs to a remote system (using Filebeat). If storing locally, keep them on a separate partition from path.data so that a massive log spike doesn't consume your data storage space.
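To verify that path.data, path.logs, and the root partition really sit on separate filesystems, you can compare the device IDs the OS reports. This is a minimal sketch under stated assumptions: the function names are hypothetical, and comparing st_dev is a heuristic that may not distinguish every layout (bind mounts, for example).

```python
import os
import tempfile


def on_same_filesystem(path_a, path_b):
    """True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_paths(data_path, logs_path, root="/"):
    """Return warnings for the layout problems described above."""
    warnings = []
    if on_same_filesystem(data_path, root):
        warnings.append(f"{data_path} shares a filesystem with {root}")
    if on_same_filesystem(logs_path, root):
        warnings.append(f"{logs_path} shares a filesystem with {root}")
    if on_same_filesystem(data_path, logs_path):
        warnings.append(f"{data_path} and {logs_path} share a filesystem")
    return warnings


# Demo with directories that exist everywhere; on a real node you would
# pass the configured path.data and path.logs values instead.
for warning in check_paths(os.getcwd(), tempfile.gettempdir()):
    print("WARNING:", warning)
```

An empty result means the three locations are on different devices, which is the layout recommended above.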

Operations & Maintenance: From Hobby to Production

This section defines the difference between a "hobby" cluster and a "production" cluster. Deployment is a one-time event; operations are forever.

1. Monitoring: You Cannot Manage What You Cannot See

A common mistake is waiting for users to complain "Search is slow" before checking the cluster. You need proactive visibility.

A. The Tools

1. Elastic Stack Monitoring (The Native Way)

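How it works: You enable monitoring collection, either through the legacy xpack.monitoring.collection.enabled setting or, in current releases, through Metricbeat or Elastic Agent. The metrics are stored in the cluster itself or, preferably, in a separate "Monitoring Cluster" so that monitoring does not add load to the production system.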
Pros: Deeply integrated; the Kibana UI is pre-built and excellent.

2. Prometheus & Grafana (The Cloud-Native Way)

How it works: You run an elasticsearch-exporter sidecar container. Prometheus scrapes it, and Grafana visualizes it.
Pros: Industry standard; allows you to correlate Elasticsearch metrics with Linux/Network metrics on the same dashboard.

B. The "Big 4" Metrics to Watch

JVM Heap Usage
- Healthy: A "sawtooth" pattern (memory fills up, Garbage Collection clears it, repeats).
- Danger: A flat line near 75-90%. This means the node is starving for memory and will soon crash with OutOfMemoryError.
Garbage Collection (GC) Count & Time
- Danger: If "Old Gen" GC time spikes, your node is pausing execution (Stop-the-World) to clean memory. Search requests will hang during these pauses.
CPU Usage
- High CPU is normal during heavy indexing, but if it stays at 100% continuously, your nodes are under-provisioned or your queries are too complex (e.g., wildcards starting with *).
Thread Pool Rejections
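- This is the most critical error metric. It means the node is saying, "I am too busy; I cannot accept this request." The rejected counters are cumulative since node start, so watch for growth between samples: if search or write rejections keep increasing, you have a capacity problem.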
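As an illustration of the thread pool metric, the sketch below compares two samples of GET _nodes/stats/thread_pool and reports which rejection counters grew in between. The helper name and the trimmed sample data are hypothetical; only the nodes, name, thread_pool, and rejected keys are taken from the response shape of that API.

```python
def rejection_deltas(before, after, pools=("search", "write")):
    """Compare two _nodes/stats/thread_pool samples.

    Returns {(node_name, pool): new_rejections} for counters that grew.
    """
    deltas = {}
    for node_id, node in after["nodes"].items():
        previous = before["nodes"].get(node_id)
        if previous is None:
            continue  # node joined between samples; no baseline yet
        for pool in pools:
            now = node["thread_pool"][pool]["rejected"]
            then = previous["thread_pool"][pool]["rejected"]
            if now > then:
                deltas[(node["name"], pool)] = now - then
    return deltas


# Trimmed, hypothetical samples taken a minute apart.
sample_1 = {"nodes": {"abc": {"name": "es1", "thread_pool": {
    "search": {"rejected": 0}, "write": {"rejected": 3}}}}}
sample_2 = {"nodes": {"abc": {"name": "es1", "thread_pool": {
    "search": {"rejected": 0}, "write": {"rejected": 15}}}}}

print(rejection_deltas(sample_1, sample_2))
```

A non-empty result is the signal to alert on; a counter that shrinks usually just means the node restarted and its counters were reset.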

2. Backups: Replicas ≠ Backups

This is the single most important lesson in data safety.

The Myth

"I have 2 replicas, so I have 3 copies of my data. I don't need backups."

The Reality

Replicas protect against Hardware Failure (disk crash). They do not protect against Human Error.

Scenario: You accidentally run DELETE /users.
Result: Elasticsearch deletes the Primary shard immediately, and instantly propagates that delete instruction to all Replica shards. Your data is gone from all 3 copies in milliseconds.

The Solution: Snapshots & SLM

You must take Snapshots, which are incremental backups sent to external repository storage (S3, Google Cloud Storage, Azure Blob, or a shared NFS drive).

Incremental: The first snapshot copies everything. The second snapshot only copies the segments that changed. It is lightweight and fast.
SLM (Snapshot Lifecycle Management): Do not write manual scripts. Use the built-in SLM feature to define a policy.
Example Policy: "Take a snapshot every night at 2 AM. Keep the last 30 snapshots. Delete older ones automatically."
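The example policy could be expressed roughly as follows. This is a sketch, not a tested production policy: the policy ID nightly-snapshots and the repository name nfs_backup are hypothetical, the repository must already be registered, and the schedule uses Elasticsearch cron syntax (seconds first).

```python
import json

# Request body for PUT _slm/policy/nightly-snapshots (hypothetical policy ID).
policy = {
    "schedule": "0 0 2 * * ?",         # every day at 02:00
    "name": "<nightly-snap-{now/d}>",  # date math gives each snapshot a dated name
    "repository": "nfs_backup",        # hypothetical, pre-registered repository
    "config": {"indices": ["*"]},      # snapshot all indices
    "retention": {"max_count": 30},    # keep at most the last 30 snapshots
}

print("PUT _slm/policy/nightly-snapshots")
print(json.dumps(policy, indent=2))
```

Retention is applied by a separate scheduled task, so older snapshots are removed on the next retention run rather than the moment a new snapshot completes.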

3. Updates: The "Rolling Restart" Strategy

Upgrading a database used to mean "Scheduled Downtime" on a Sunday night. With Elasticsearch, you can upgrade with zero downtime if you follow the "Rolling Restart" procedure.

The Logic

You never turn off the whole cluster. You turn off one node, upgrade it, turn it back on, and move to the next.

The Critical Step: Disabling Allocation

Before you shut down a node, you must tell the cluster: "I am turning this node off on purpose. Do not panic and do not start rebuilding its data elsewhere."

The Workflow

1. Stop Allocation (This keeps the cluster from rebuilding the stopped node's replica shards on other nodes).

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

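2. Stop the Node: Run systemctl stop elasticsearch.

3. Upgrade: Update the package, replace the archive, or replace the Docker image.

4. Start the Node: Run systemctl start elasticsearch.

5. Wait for the Node to Rejoin: Watch the logs or _cat/nodes until the node is back in the cluster. The cluster will not return to green until allocation is re-enabled in the next step.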

6. Re-enable Allocation

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

7. Repeat: Wait until _cat/health reports green, then move to the next node.
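The workflow can be written down as an ordered checklist per node, which is useful as a starting point for a runbook or automation. The sketch below only prints the steps and does not contact a cluster; the function name is hypothetical, the node names es1 to es3 are the lab's, and the final wait-for-green call is an added safeguard before moving on.

```python
import json

DISABLE = {"persistent": {"cluster.routing.allocation.enable": "primaries"}}
ENABLE = {"persistent": {"cluster.routing.allocation.enable": None}}


def rolling_restart_steps(node):
    """Return the ordered steps for one node as (kind, detail) tuples."""
    return [
        ("api", "PUT _cluster/settings " + json.dumps(DISABLE)),
        ("shell", f"[{node}] systemctl stop elasticsearch"),
        ("manual", f"[{node}] upgrade the package, archive, or image"),
        ("shell", f"[{node}] systemctl start elasticsearch"),
        ("api", "GET _cat/nodes  (repeat until the node is listed again)"),
        ("api", "PUT _cluster/settings " + json.dumps(ENABLE)),
        ("api", "GET _cluster/health?wait_for_status=green&timeout=60s"),
    ]


for node in ["es1", "es2", "es3"]:
    for kind, detail in rolling_restart_steps(node):
        print(f"{kind:6} {detail}")
```

Keeping the steps as data makes it easy to review the order, and each tuple could later be handed to a function that actually executes it.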

Common Pitfalls: The Difference Between Novices and Experts

This section separates the novices from the experts. These are the issues that don't show up in a "Hello World" tutorial; they only appear when you are in production, under load, and usually at 3 AM.

1. Split Brain (The "Two Captains" Problem)

Figure: The "split-brain" scenario. A network partition leads to the formation of two separate clusters, each with its own master, risking data inconsistency.

This is the nightmare scenario for distributed systems.

The Scenario

Imagine you have a cluster of 3 nodes (A, B, C) in a single room. A network switch fails, cutting the room in half. Node A and B can talk to each other, but Node C is isolated.

The Glitch

Nodes A+B realize C is gone. They elect Node A as Master.
Node C thinks A and B are dead. With no quorum requirement to stop it, it elects itself as Master.

The Result (Split Brain)

You now have two active masters in the same cluster.

Application 1 writes data to Node A.
Application 2 writes data to Node C.

The Catastrophe

When the network comes back, you have two different versions of the history. Elasticsearch cannot "merge" these timelines. You will likely lose the data written to the smaller side of the partition.

The Fix (Quorum)

In older versions (6.x and below): You had to manually set discovery.zen.minimum_master_nodes to (N / 2) + 1, where N is the number of master-eligible nodes and the division rounds down.
In modern versions (7.x+): Elasticsearch manages the voting configuration automatically. However, you must still provide 3 master-eligible nodes (an odd number) so that one side of any partition can hold a majority. Never run a production cluster with exactly 2 master-eligible nodes: losing either one leaves the other without a majority.
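The majority rule behind the fix is simple arithmetic. The sketch below is a simplified model with hypothetical function names; in 7.x+ the real voting configuration is adjusted automatically, so treat this as an illustration of the principle rather than of Elasticsearch internals.

```python
def quorum(master_eligible):
    """Votes needed for a majority: (N / 2) + 1, rounding down."""
    return master_eligible // 2 + 1


def can_elect_master(visible_nodes, master_eligible):
    """A partition side may elect a master only if it holds a majority."""
    return visible_nodes >= quorum(master_eligible)


# The scenario above: 3 master-eligible nodes split into A+B and C.
print(can_elect_master(2, 3))  # side with A and B
print(can_elect_master(1, 3))  # isolated node C

# How many node losses each cluster size tolerates.
for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
```

With 2 nodes the quorum is also 2, so the cluster tolerates no failures at all, which is why a third master-eligible node is the minimum for production.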

2. Mapping Explosion (The "Death by Fields")

Elasticsearch is schema-less by default (Dynamic Mapping), which sounds great until it crashes your cluster.

The Scenario

You are logging user cookies or HTTP headers. A developer decides to send a JSON document where the keys are unique UUIDs or timestamps.

{
  "2024-01-30_12:00": "error",
  "2024-01-30_12:01": "info"
}

The Glitch

Dynamic mapping sees a new field ("2024-01-30_12:00") and adds it to the index mapping, which lives in the Cluster State (the global registry of settings, mappings, and shard locations that every node holds).

The Result

Every unique key becomes a new field. If you send 10,000 documents with unique keys, you create 10,000 fields.

The Cluster State becomes massive (hundreds of MBs).
This state must be synced to every node. Updating it takes seconds. The cluster becomes unresponsive.

The Fix

Disable Dynamic Mapping: Set dynamic: false or strict in your production templates.
Use the flattened Data Type: If you need to store unstructured JSON with unknown keys, map that specific field as type: "flattened". Elasticsearch maps the entire JSON object as a single field and indexes its leaf values as keywords, preventing the explosion.
Limit Fields: The default limit is 1,000 fields per index (index.mapping.total_fields.limit). Do not raise this. If you hit it, your data model is wrong.
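To see why key-per-value documents are a modeling problem, you can count the distinct field paths a batch of documents would produce. This is a simplified sketch with hypothetical helper names: real dynamic mapping may also add multi-fields and counts object fields toward the limit, so actual numbers can be higher.

```python
def field_paths(doc, prefix=""):
    """Yield the dotted path of every leaf field in a JSON-like dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from field_paths(value, prefix=path + ".")
        else:
            yield path


def distinct_fields(docs):
    """Count the unique field paths across all documents."""
    seen = set()
    for doc in docs:
        seen.update(field_paths(doc))
    return len(seen)


# Bad model: the timestamp is the key, so every document adds a field.
bad = [{f"2024-01-30_12:{m:02d}": "error"} for m in range(60)]
# Better model: fixed keys, the timestamp becomes a value.
good = [{"timestamp": f"2024-01-30T12:{m:02d}", "level": "error"} for m in range(60)]

print(distinct_fields(bad), distinct_fields(good))  # 60 2
```

The second model stays at two fields no matter how many documents arrive, which is the property you want from a production mapping.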

3. Deep Pagination (The "Killer Query")

Your users want to jump to "Page 50,000" of the search results. You must tell them "No."

The Scenario

A user runs a query with from: 50000, size: 10.

The Glitch (Distributed Sorting cost)

To find the "top 10" results starting at 50,000, Elasticsearch cannot just skip the first 50,000 records.

Every shard involved in the search must fetch its own top 50,010 results and hold them in memory.
If you have 10 shards, the coordinating node receives 50,010 * 10 = 500,100 documents.
It must sort all half-million entries in RAM, skip the first 50,000, return the next 10, and throw the remaining 450,090 away.
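The cost of a from/size request can be worked out directly. The sketch below is an upper-bound estimate with a hypothetical function name; it assumes every shard holds at least from + size matching documents.

```python
def deep_page_cost(from_, size, shards):
    """Estimate the work behind a from/size request (upper bound)."""
    per_shard = from_ + size        # each shard must rank this many hits
    gathered = per_shard * shards   # entries the coordinating node receives
    return {
        "per_shard": per_shard,
        "gathered": gathered,
        "returned": size,
        "discarded": gathered - size,
    }


print(deep_page_cost(50_000, 10, 10))
# {'per_shard': 50010, 'gathered': 500100, 'returned': 10, 'discarded': 500090}

print(deep_page_cost(0, 10, 10))
# {'per_shard': 10, 'gathered': 100, 'returned': 10, 'discarded': 90}
```

The first page touches 100 entries while page 5,001 touches over half a million for the same 10 results, and the cost grows linearly with both the offset and the shard count.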

The Result

Massive CPU spikes and Garbage Collection (GC) loops. If multiple users do this simultaneously, the node runs out of memory (OOM) and crashes.

The Fix

Hard Limit: Elasticsearch defaults index.max_result_window to 10,000. Do not increase this unless you know exactly what you are doing.
For Users (search_after): This is the efficient way to paginate. It tells Elasticsearch, "Give me the next 10 results after this specific sort value from the last result." It requires no deep scanning.
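For Scripts (PIT / Scroll API): If you need to export the entire dataset, open a Point-in-Time (PIT) and page through it with search_after. The older Scroll API serves the same batch-processing purpose, but Elastic's documentation no longer recommends it for deep pagination.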
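A search_after loop boils down to copying the sort values of the last hit into the next request. The sketch below only builds request bodies and does not call a cluster; the helper name, the field names timestamp and order_id, and the sample hit are hypothetical. It assumes the sort includes a unique tiebreaker field so that no two documents share the same sort values.

```python
def next_page_body(query, sort, size, last_hit=None):
    """Build the search body for the first or a following page."""
    body = {"query": query, "sort": sort, "size": size}
    if last_hit is not None:
        # Resume right after the last document of the previous page.
        body["search_after"] = last_hit["sort"]
    return body


query = {"match_all": {}}
sort = [{"timestamp": "desc"}, {"order_id": "asc"}]  # second key breaks ties

first = next_page_body(query, sort, size=10)

# Hypothetical last hit of the first response; each hit carries the
# sort values that were used to order it.
last_hit = {"_id": "order-0042", "sort": [1706616000000, "order-0042"]}
second = next_page_body(query, sort, size=10, last_hit=last_hit)

print(second["search_after"])
```

Because each request asks every shard for only size hits after a known position, the cost per page stays flat no matter how deep the user scrolls.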
