4.2 Server, VM, and Cloud Monitoring - Comprehensive Monitoring for Unified Infrastructure

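In modern enterprise environments, hybrid configurations that combine on-premises servers, virtualized environments, and public clouds are the norm. New Relic Infrastructure monitors these different environments through a single agent and data model, so operations teams can work from one view instead of one tool per platform.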

This section walks through practical techniques for monitoring the whole infrastructure, from physical servers to multi-cloud environments.

🎯 Learning objectives for this section

📊 Technical skills

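  • Advanced monitoring configuration for Linux and Windows servers
  • Specialized monitoring of virtualization platforms (VMware, Hyper-V)
  • Strategic implementation of multi-cloud integration
  • Optimization of hybrid cloud monitoring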

🏢 Business value

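  • Lower operating costs: efficiency gains from unified monitoring (30-50% reduction)
  • Higher availability: failure prevention through proactive monitoring (60% shorter MTTR)
  • Resource optimization: cost savings from usage analysis (20-40% reduction)
  • Compliance: automated handling of regulatory requirements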

🖥️ Advanced physical and virtual server monitoring

🐧 Enterprise configuration for Linux server monitoring

⚙️ Advanced agent configuration

yaml
# /etc/newrelic-infra.yml - enterprise Linux configuration
license_key: YOUR_ENTERPRISE_LICENSE_KEY
display_name: "{{.environment}}-{{.role}}-{{.hostname}}"

# Performance tuning (sample rates are integer seconds, without a unit suffix)
metrics_system_sample_rate: 15
metrics_process_sample_rate: 20
metrics_network_sample_rate: 10
metrics_storage_sample_rate: 20

# Enterprise environment identification
custom_attributes:
  # Infrastructure classification
  environment: production
  data_center: tokyo-dc1
  rack_location: "rack-A-15"
  server_class: bare_metal
  
  # Business information
  business_unit: ecommerce
  cost_center: infrastructure
  service_tier: tier1
  criticality: mission_critical
  
  # Compliance
  compliance_zone: pci_dss
  data_classification: confidential
  backup_policy: daily_encrypted
  retention_policy: 7years
  
  # Operations
  maintenance_window: "02:00-04:00_JST"
  primary_contact: "infrastructure-team"
  escalation_policy: "critical_infra"

# Detailed system monitoring
enable_process_metrics: true
process_config:
  # Web servers
  - name: "nginx_processes"
    match:
      - "nginx: master process"
      - "nginx: worker process"
    attributes:
      service: web_tier
      component: reverse_proxy
      monitoring_level: comprehensive
  
  # Application servers
  - name: "java_applications"
    match:
      - "java.*tomcat"
      - "java.*spring"
      - "java.*jetty"
    attributes:
      service: app_tier
      component: application_server
      jvm_monitoring: enabled
      gc_monitoring: detailed
  
  # Databases
  - name: "database_servers"
    match:
      - "postgres.*server"
      - "mysql.*server"
    attributes:
      service: data_tier
      component: database
      replication_monitoring: enabled
      backup_monitoring: enabled

# Detailed network monitoring
network_interface_filters:
  enabled_interface_filters:
    - "eth*"      # physical interfaces
    - "en*"       # predictable interface names
    - "bond*"     # bonding
    - "team*"     # teaming
  disabled_interface_filters:
    - "lo"        # loopback
    - "docker*"   # Docker virtual interfaces
    - "br-*"      # bridges
    - "veth*"     # virtual Ethernet

# Advanced storage monitoring
file_systems_config:
  # Critical partitions
  include_file_systems:
    - mount_point: "/"
      fs_type: "ext4"
      attributes:
        partition_type: root
        backup_required: true
        monitoring_level: critical
    
    - mount_point: "/var/log"
      fs_type: "ext4"
      attributes:
        partition_type: logs
        log_rotation: enabled
        retention_days: 30
    
    - mount_point: "/opt/app"
      fs_type: "ext4"
      attributes:
        partition_type: application
        backup_required: true
        snapshot_enabled: true
    
    - mount_point: "/data"
      fs_type: "xfs"
      attributes:
        partition_type: database
        backup_required: true
        encryption: enabled
        
  # Excluded file system types
  ignore_file_system_types:
    - "tmpfs"
    - "devtmpfs"
    - "sysfs"
    - "proc"
    - "squashfs"

# Security settings
strip_command_line: true
disable_cloud_metadata: false
http_server_enabled: true
http_server_host: "127.0.0.1"
http_server_port: 8003

# Logging
log_file: "/var/log/newrelic-infra/newrelic-infra.log"
log_format: "json"
log_to_stdout: false
verbose: 1
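
The custom attributes in the configuration above are attached to the data the agent reports, so they can be used as filters when querying it. The following is a minimal Python sketch, not part of the original configuration, that builds an NRQL query string against `SystemSample` from a dictionary of attribute filters. The helper names `nrql_literal` and `build_host_query` are hypothetical, and the sketch rejects values containing quotes rather than assuming an escaping rule.

```python
"""Sketch: build an NRQL query that filters on agent custom attributes."""


def nrql_literal(value):
    """Render a Python value as an NRQL literal (strings are single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if "'" in text or "\\" in text:
        # Escaping is out of scope for this sketch, so such values are rejected.
        raise ValueError(f"unsupported character in value: {text!r}")
    return f"'{text}'"


def build_host_query(metric, filters, facet="hostname", since="30 minutes ago"):
    """Return an NRQL string that averages `metric` over matching hosts."""
    # Sort the filter keys so the generated query is deterministic.
    where = " AND ".join(
        f"{key} = {nrql_literal(val)}" for key, val in sorted(filters.items())
    )
    query = f"SELECT average({metric}) FROM SystemSample"
    if where:
        query += f" WHERE {where}"
    return f"{query} FACET {facet} SINCE {since}"


if __name__ == "__main__":
    print(build_host_query(
        "cpuPercent",
        {"environment": "production", "service_tier": "tier1"},
    ))
```

Keeping attribute names such as `environment` and `service_tier` identical across Linux and Windows hosts is what makes one query of this shape usable for the whole fleet.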

📊 System performance monitoring script

bash
#!/bin/bash
# Enterprise Linux system monitoring script
# /usr/local/bin/enterprise-system-monitor.sh

# Configuration
NEWRELIC_INSERT_KEY="YOUR_INSERT_KEY"
NEWRELIC_ACCOUNT_ID="YOUR_ACCOUNT_ID"
HOSTNAME=$(hostname)
ENVIRONMENT="production"

# API endpoint
INSIGHTS_API="https://insights-collector.newrelic.com/v1/accounts/$NEWRELIC_ACCOUNT_ID/events"

# Collect detailed system metrics
collect_system_metrics() {
    echo "=== Collecting Enterprise System Metrics ==="
    
    # CPU details
    local cpu_cores=$(nproc)
    local load_1min=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
    local load_5min=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $2}' | sed 's/,//')
    local load_15min=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $3}')
    
    # Memory details
    local memory_total=$(free -b | grep '^Mem:' | awk '{print $2}')
    local memory_used=$(free -b | grep '^Mem:' | awk '{print $3}')
    local memory_free=$(free -b | grep '^Mem:' | awk '{print $4}')
    local memory_cached=$(free -b | grep '^Mem:' | awk '{print $6}')
    local swap_total=$(free -b | grep '^Swap:' | awk '{print $2}')
    local swap_used=$(free -b | grep '^Swap:' | awk '{print $3}')
    
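    # Disk I/O statistics. Column positions for reads/writes depend on the sysstat
    # version, so check them against the `iostat -x` header; %util is the last column.
    # "+0" makes awk print 0 instead of an empty string when no device matches.
    local disk_reads=$(iostat -x 1 1 | grep -E '^(sd|nvme)' | awk '{sum+=$4} END {print sum+0}')
    local disk_writes=$(iostat -x 1 1 | grep -E '^(sd|nvme)' | awk '{sum+=$5} END {print sum+0}')
    local disk_util=$(iostat -x 1 1 | grep -E '^(sd|nvme)' | awk '{if($NF>max) max=$NF} END {print max+0}')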
    
    # Network statistics
    local net_rx_bytes=$(cat /sys/class/net/eth0/statistics/rx_bytes 2>/dev/null || echo 0)
    local net_tx_bytes=$(cat /sys/class/net/eth0/statistics/tx_bytes 2>/dev/null || echo 0)
    local net_rx_errors=$(cat /sys/class/net/eth0/statistics/rx_errors 2>/dev/null || echo 0)
    local net_tx_errors=$(cat /sys/class/net/eth0/statistics/tx_errors 2>/dev/null || echo 0)
    
    # Process statistics ("+0" keeps the JSON valid when the count is zero)
    local total_processes=$(ps -e --no-headers | wc -l)
    local running_processes=$(ps aux | awk '$8 ~ /^R/ {count++} END {print count+0}')
    local zombie_processes=$(ps aux | awk '$8 ~ /^Z/ {count++} END {print count+0}')
    
    # Send to New Relic
    curl -X POST "$INSIGHTS_API" \
         -H "Content-Type: application/json" \
         -H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
         -d "[{
           \"eventType\": \"EnterpriseSystemMetrics\",
           \"timestamp\": $(date +%s),
           \"hostname\": \"$HOSTNAME\",
           \"environment\": \"$ENVIRONMENT\",
           \"cpu.cores\": $cpu_cores,
           \"cpu.load_1min\": $load_1min,
           \"cpu.load_5min\": $load_5min,
           \"cpu.load_15min\": $load_15min,
           \"memory.total_bytes\": $memory_total,
           \"memory.used_bytes\": $memory_used,
           \"memory.free_bytes\": $memory_free,
           \"memory.cached_bytes\": $memory_cached,
           \"memory.usage_percent\": $(awk -v u="$memory_used" -v t="$memory_total" 'BEGIN {printf "%.2f", (t > 0) ? u * 100 / t : 0}'),
           \"swap.total_bytes\": $swap_total,
           \"swap.used_bytes\": $swap_used,
           \"disk.reads_per_sec\": $disk_reads,
           \"disk.writes_per_sec\": $disk_writes,
           \"disk.utilization_percent\": $disk_util,
           \"network.rx_bytes\": $net_rx_bytes,
           \"network.tx_bytes\": $net_tx_bytes,
           \"network.rx_errors\": $net_rx_errors,
           \"network.tx_errors\": $net_tx_errors,
           \"processes.total\": $total_processes,
           \"processes.running\": $running_processes,
           \"processes.zombie\": $zombie_processes
         }]"
}

# Security monitoring
collect_security_metrics() {
    echo "=== Collecting Security Metrics ==="
    
    # Failed login count
    local failed_logins=$(grep "Failed password" /var/log/auth.log 2>/dev/null | wc -l || echo 0)
    
    # Recent sudo usage (capped at the last 10 log entries)
    local sudo_usage=$(grep "sudo:" /var/log/auth.log 2>/dev/null | tail -n 10 | wc -l || echo 0)
    
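    # File system changes in critical directories, read from the AIDE log
    local system_file_changes=0
    if [ -f "/var/log/aide/aide.log" ]; then
        # grep -c already prints 0 when nothing matches, so only mask its exit status
        system_file_changes=$(grep -c "changed" /var/log/aide/aide.log 2>/dev/null || true)
        system_file_changes=${system_file_changes:-0}
    fi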
    
    # Open (listening) network sockets
    local open_connections=$(ss -tuln | grep LISTEN | wc -l)
    
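    # Suspicious process detection (simplified): exact-name match for netcat variants,
    # so unrelated commands that merely contain "nc" (rsync, sync, ...) are not counted
    local suspicious_processes=$(pgrep -c -x 'nc|netcat|ncat' || true)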
    
    curl -X POST "$INSIGHTS_API" \
         -H "Content-Type: application/json" \
         -H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
         -d "[{
           \"eventType\": \"SecurityMetrics\",
           \"timestamp\": $(date +%s),
           \"hostname\": \"$HOSTNAME\",
           \"environment\": \"$ENVIRONMENT\",
           \"security.failed_logins\": $failed_logins,
           \"security.sudo_usage\": $sudo_usage,
           \"security.file_changes\": $system_file_changes,
           \"security.open_connections\": $open_connections,
           \"security.suspicious_processes\": $suspicious_processes,
           \"security.last_update\": \"$(date -Iseconds)\"
         }]"
}

# Application-specific metrics
collect_application_metrics() {
    echo "=== Collecting Application Metrics ==="
    
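    # Database connection count (PostgreSQL example)
    local db_connections=0
    if command -v psql >/dev/null 2>&1; then
        db_connections=$(psql -At -c "SELECT count(*) FROM pg_stat_activity;" 2>/dev/null)
        # psql prints nothing when the connection fails; fall back to 0
        db_connections=${db_connections:-0}
    fi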
    
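    # Web server statistics (Nginx example)
    local nginx_active_connections=0
    local nginx_requests_per_sec=0   # holds a per-minute count despite the name
    if [ -f "/var/log/nginx/access.log" ]; then
        # Established connections on ports 80/443 (not listening sockets)
        nginx_active_connections=$(ss -Htn state established '( sport = :80 or sport = :443 )' | wc -l)
        # Requests logged during the previous full minute (default access-log time format)
        local last_minute=$(LC_ALL=C date -d '1 minute ago' '+%d/%b/%Y:%H:%M')
        nginx_requests_per_sec=$(grep -c "\[$last_minute" /var/log/nginx/access.log || true)
    fi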
    
    # Redis statistics
    local redis_connected_clients=0
    local redis_memory_usage=0
    if command -v redis-cli >/dev/null 2>&1; then
        redis_connected_clients=$(redis-cli info clients 2>/dev/null | grep "connected_clients:" | cut -d: -f2 | tr -d '\r')
        redis_memory_usage=$(redis-cli info memory 2>/dev/null | grep "used_memory:" | cut -d: -f2 | tr -d '\r')
        # redis-cli prints nothing when the server is unreachable; fall back to 0
        redis_connected_clients=${redis_connected_clients:-0}
        redis_memory_usage=${redis_memory_usage:-0}
    fi
    
    curl -X POST "$INSIGHTS_API" \
         -H "Content-Type: application/json" \
         -H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
         -d "[{
           \"eventType\": \"ApplicationMetrics\",
           \"timestamp\": $(date +%s),
           \"hostname\": \"$HOSTNAME\",
           \"environment\": \"$ENVIRONMENT\",
           \"database.connections\": $db_connections,
           \"webserver.active_connections\": $nginx_active_connections,
           \"webserver.requests_per_minute\": $nginx_requests_per_sec,
           \"cache.connected_clients\": $redis_connected_clients,
           \"cache.memory_usage_bytes\": $redis_memory_usage
         }]"
}

# Main entry point
main() {
    echo "Starting Enterprise System Monitoring for $HOSTNAME"
    echo "Timestamp: $(date)"
    
    # Collect each metric group
    collect_system_metrics
    collect_security_metrics  
    collect_application_metrics
    
    echo "Monitoring data collection completed"
}

# Dispatch on the first argument
case "$1" in
    "system")
        collect_system_metrics
        ;;
    "security")
        collect_security_metrics
        ;;
    "application")
        collect_application_metrics
        ;;
    "all"|"")
        main
        ;;
    *)
        echo "Usage: $0 {system|security|application|all}"
        exit 1
        ;;
esac
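
The script above assembles its JSON payload by string interpolation, which produces invalid JSON as soon as one command returns empty output. As a hedged alternative, the following Python sketch normalizes raw command output to numbers and lets the `json` module do the serialization. The helper names `to_number` and `build_event` are hypothetical, and the sketch only builds the request body; sending it to the events endpoint is left out.

```python
"""Sketch: build an event payload that stays valid JSON for any raw input."""
import json
import math
import time


def to_number(raw, default=0):
    """Convert raw command output to int or float; use `default` if unparsable."""
    text = "" if raw is None else str(raw).strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        value = float(text)
    except ValueError:
        return default
    # NaN and infinity are not valid JSON numbers, so treat them as unparsable.
    return value if math.isfinite(value) else default


def build_event(event_type, hostname, raw_metrics, timestamp=None):
    """Return the JSON body (a one-element array) for a single custom event."""
    event = {
        "eventType": event_type,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "hostname": hostname,
    }
    for name, raw in raw_metrics.items():
        event[name] = to_number(raw)
    return json.dumps([event])


if __name__ == "__main__":
    # An empty string stands in for a command that printed nothing.
    print(build_event(
        "EnterpriseSystemMetrics",
        "web01",
        {"cpu.load_1min": "0.42", "processes.zombie": ""},
    ))
```

Defaulting unparsable values to 0 mirrors the `|| echo 0` intent of the shell script; omitting the attribute instead would be an equally reasonable choice if a missing value should be distinguishable from a real zero.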

🪟 Windows server monitoring

⚙️ Windows agent configuration

yaml
# C:\Program Files\New Relic\newrelic-infra\newrelic-infra.yml
license_key: YOUR_ENTERPRISE_LICENSE_KEY
display_name: "WIN-{{.environment}}-{{.hostname}}"

# Windows-specific settings
enable_win_services: true
enable_win_processes: true

# Custom attributes (Windows environment)
custom_attributes:
  # System information
  os_family: windows
  os_version: "2019"
  domain: "corp.company.com"
  
  # Business classification
  environment: production
  business_unit: finance
  application_tier: web_tier
  
  # Windows-specific
  windows_edition: "Standard"
  active_directory: enabled
  exchange_server: true
  iis_role: enabled

# Windows service monitoring
win_services_config:
  enabled_services:
    - "W3SVC"          # IIS
    - "MSSQLSERVER"    # SQL Server
    - "SQLSERVERAGENT" # SQL Server Agent
    - "DNS"            # DNS Server
    - "DHCP"           # DHCP Server
    - "Spooler"        # Print Spooler
    - "Schedule"       # Task Scheduler
    - "EventLog"       # Windows Event Log

# Windows process monitoring
win_process_config:
  - name: "iis_processes"
    match:
      - "w3wp.exe"
      - "iisexpress.exe"
    attributes:
      service: web_server
      tier: frontend
      
  - name: "sql_server_processes"
    match:
      - "sqlservr.exe"
      - "sqlagent.exe"
    attributes:
      service: database
      tier: data

# Performance counters
performance_counters:
  - name: "processor_utilization"
    counter: "\\Processor(_Total)\\% Processor Time"
    attributes:
      metric_type: system_performance
      
  - name: "memory_available"
    counter: "\\Memory\\Available MBytes"
    attributes:
      metric_type: system_performance
      
  - name: "iis_requests"
    counter: "\\Web Service(_Total)\\Total Method Requests/sec"
    attributes:
      metric_type: application_performance
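
The counter values above use the Windows performance counter path form `\Object(Instance)\Counter`, where the instance part is optional. Inside a double-quoted YAML string each `\\` is one literal backslash. As an illustration only, this Python sketch splits such a path into its parts, which can help when validating a list of counters before deploying it; `parse_counter_path` is a hypothetical helper and handles only local paths without a machine-name prefix.

```python
"""Sketch: split a Windows performance counter path into its components."""
import re

# \Object(Instance)\Counter, with "(Instance)" optional.
_COUNTER_PATH = re.compile(
    r"^\\(?P<object>[^\\(]+)"        # performance object, e.g. Processor
    r"(?:\((?P<instance>[^)]*)\))?"  # optional instance, e.g. _Total
    r"\\(?P<counter>.+)$"            # counter name, e.g. % Processor Time
)


def parse_counter_path(path):
    """Return a dict with 'object', 'instance' (or None), and 'counter'."""
    match = _COUNTER_PATH.match(path)
    if match is None:
        raise ValueError(f"not a local counter path: {path!r}")
    return match.groupdict()


if __name__ == "__main__":
    for raw in (
        "\\Processor(_Total)\\% Processor Time",
        "\\Memory\\Available MBytes",
        "\\Web Service(_Total)\\Total Method Requests/sec",
    ):
        print(parse_counter_path(raw))
```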

📊 Windows PowerShell monitoring script

powershell
# Windows enterprise monitoring script
# C:\Scripts\Enterprise-Windows-Monitor.ps1

param(
    [Parameter(Mandatory=$true)]
    [string]$NewRelicInsertKey,
    
    [Parameter(Mandatory=$true)]  
    [string]$NewRelicAccountId,
    
    [string]$Environment = "production"
)

# Configuration
$InsightsAPI = "https://insights-collector.newrelic.com/v1/accounts/$NewRelicAccountId/events"
$Hostname = $env:COMPUTERNAME

# Collect system metrics
function Collect-SystemMetrics {
    Write-Host "=== Collecting Windows System Metrics ===" -ForegroundColor Green
    
    # CPU utilization (Get-CimInstance replaces the deprecated Get-WmiObject)
    $CPUUsage = Get-CimInstance Win32_Processor | Measure-Object -Property LoadPercentage -Average | Select-Object -ExpandProperty Average
    
    # Memory (FreePhysicalMemory is reported in KB)
    $TotalMemory = (Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory
    $FreeMemory = (Get-CimInstance Win32_OperatingSystem).FreePhysicalMemory * 1024
    $UsedMemory = $TotalMemory - $FreeMemory
    $MemoryUsagePercent = [math]::Round(($UsedMemory / $TotalMemory) * 100, 2)
    
    # Disks (DriveType 3 = local fixed disk)
    $DiskInfo = Get-CimInstance Win32_LogicalDisk | Where-Object {$_.DriveType -eq 3} | ForEach-Object {
        [PSCustomObject]@{
            DriveLetter = $_.DeviceID
            TotalSize = $_.Size
            FreeSpace = $_.FreeSpace
            UsedSpace = $_.Size - $_.FreeSpace
            UsagePercent = [math]::Round((($_.Size - $_.FreeSpace) / $_.Size) * 100, 2)
        }
    }
    
    # Process and service statistics
    $TotalProcesses = (Get-Process).Count
    $RunningServices = (Get-Service | Where-Object {$_.Status -eq 'Running'}).Count
    $StoppedServices = (Get-Service | Where-Object {$_.Status -eq 'Stopped'}).Count
    
    # Network statistics (@() keeps .Count reliable for zero or one adapter)
    $NetworkAdapters = @(Get-CimInstance Win32_NetworkAdapter | Where-Object {$_.NetEnabled -eq $true})
    $ActiveConnections = (Get-NetTCPConnection | Where-Object {$_.State -eq 'Established'}).Count
    
    # Build the metrics payload
    $MetricsData = @{
        eventType = "WindowsSystemMetrics"
        timestamp = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
        hostname = $Hostname
        environment = $Environment
        "cpu.usage_percent" = $CPUUsage
        "memory.total_bytes" = $TotalMemory
        "memory.used_bytes" = $UsedMemory
        "memory.free_bytes" = $FreeMemory
        "memory.usage_percent" = $MemoryUsagePercent
        "processes.total" = $TotalProcesses
        "services.running" = $RunningServices
        "services.stopped" = $StoppedServices
        "network.active_connections" = $ActiveConnections
        "network.adapters_enabled" = $NetworkAdapters.Count
    }
    
    # Add per-drive disk attributes
    foreach ($Disk in $DiskInfo) {
        $DriveLetter = $Disk.DriveLetter.Replace(":", "")
        $MetricsData["disk.$DriveLetter.total_bytes"] = $Disk.TotalSize
        $MetricsData["disk.$DriveLetter.used_bytes"] = $Disk.UsedSpace
        $MetricsData["disk.$DriveLetter.free_bytes"] = $Disk.FreeSpace
        $MetricsData["disk.$DriveLetter.usage_percent"] = $Disk.UsagePercent
    }
    
    # Send to New Relic
    Send-MetricsToNewRelic -Data $MetricsData
}

# IIS monitoring
function Collect-IISMetrics {
    Write-Host "=== Collecting IIS Metrics ===" -ForegroundColor Green
    
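    # Web-Server is the feature name used by Get-WindowsFeature; test .Installed,
    # because the cmdlet returns an object even when the role is not installed
    if ((Get-WindowsFeature -Name Web-Server -ErrorAction SilentlyContinue).Installed) {
        # Retrieve IIS statistics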
        $IISSites = Get-IISSite
        $W3WPProcesses = Get-Process -Name w3wp -ErrorAction SilentlyContinue
        
        # Application pool statistics
        $AppPools = Get-IISAppPool
        $RunningAppPools = ($AppPools | Where-Object {$_.State -eq 'Started'}).Count
        $StoppedAppPools = ($AppPools | Where-Object {$_.State -eq 'Stopped'}).Count
        
        $IISData = @{
            eventType = "WindowsIISMetrics"
            timestamp = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
            hostname = $Hostname
            environment = $Environment
            "iis.sites_total" = $IISSites.Count
            "iis.worker_processes" = $W3WPProcesses.Count
            "iis.app_pools_running" = $RunningAppPools
            "iis.app_pools_stopped" = $StoppedAppPools
        }
        
        Send-MetricsToNewRelic -Data $IISData
    }
}

# SQL Server monitoring
function Collect-SQLServerMetrics {
    Write-Host "=== Collecting SQL Server Metrics ===" -ForegroundColor Green
    
    try {
        # Check the SQL Server service
        $SQLService = Get-Service -Name "MSSQLSERVER" -ErrorAction SilentlyContinue
        
        if ($SQLService -and $SQLService.Status -eq 'Running') {
            # Attempt to connect to SQL Server
            $ConnectionString = "Server=localhost;Database=master;Integrated Security=true;Connection Timeout=10;"
            $Connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
            $Connection.Open()
            
            # Basic statistics query
            $Command = $Connection.CreateCommand()
            $Command.CommandText = @"
                SELECT 
                    (SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE is_user_process = 1) as ActiveConnections,
                    (SELECT COUNT(*) FROM sys.databases WHERE state = 0) as OnlineDatabases,
                    (SELECT COUNT(*) FROM sys.dm_exec_requests) as ActiveRequests
"@
            
            $Reader = $Command.ExecuteReader()
            if ($Reader.Read()) {
                $SQLData = @{
                    eventType = "WindowsSQLServerMetrics"
                    timestamp = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
                    hostname = $Hostname
                    environment = $Environment
                    # ToString() sends "Running" rather than the enum's numeric value
                    "sqlserver.service_status" = $SQLService.Status.ToString()
                    "sqlserver.active_connections" = $Reader["ActiveConnections"]
                    "sqlserver.online_databases" = $Reader["OnlineDatabases"]
                    "sqlserver.active_requests" = $Reader["ActiveRequests"]
                }
                
                Send-MetricsToNewRelic -Data $SQLData
            }
            
            $Reader.Close()
            $Connection.Close()
        }
    }
    catch {
        Write-Warning "SQL Server metrics collection failed: $($_.Exception.Message)"
    }
}

# Active Directory monitoring
function Collect-ActiveDirectoryMetrics {
    Write-Host "=== Collecting Active Directory Metrics ===" -ForegroundColor Green
    
    try {
        # Check whether this host is a domain controller
        $DCRole = Get-WindowsFeature -Name AD-Domain-Services -ErrorAction SilentlyContinue
        
        if ($DCRole -and $DCRole.InstallState -eq 'Installed') {
            # Retrieve AD statistics
            $ADUsers = (Get-ADUser -Filter * -ErrorAction SilentlyContinue).Count
            $ADComputers = (Get-ADComputer -Filter * -ErrorAction SilentlyContinue).Count
            $ADGroups = (Get-ADGroup -Filter * -ErrorAction SilentlyContinue).Count
            
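            # Check whether this host holds a forest-wide FSMO role.
            # -ExpandProperty accepts a single property, and the role holders are FQDNs.
            $Forest = Get-ADForest
            $FSMORoles = @($Forest.SchemaMaster, $Forest.DomainNamingMaster)
            $IsFSMOHolder = [bool]($FSMORoles | Where-Object { $_ -like "$env:COMPUTERNAME.*" })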
            
            $ADData = @{
                eventType = "WindowsActiveDirectoryMetrics"
                timestamp = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
                hostname = $Hostname
                environment = $Environment
                "ad.users_count" = $ADUsers
                "ad.computers_count" = $ADComputers
                "ad.groups_count" = $ADGroups
                "ad.is_fsmo_holder" = $IsFSMOHolder
                "ad.service_running" = (Get-Service -Name "NTDS" -ErrorAction SilentlyContinue).Status -eq 'Running'
            }
            
            Send-MetricsToNewRelic -Data $ADData
        }
    }
    catch {
        Write-Warning "Active Directory metrics collection failed: $($_.Exception.Message)"
    }
}

# Send metrics to New Relic
function Send-MetricsToNewRelic {
    param(
        [hashtable]$Data
    )
    
    try {
        $JsonData = $Data | ConvertTo-Json -Compress
        $Body = "[$JsonData]"
        
        $Headers = @{
            'Content-Type' = 'application/json'
            'X-Insert-Key' = $NewRelicInsertKey
        }
        
        Invoke-RestMethod -Uri $InsightsAPI -Method POST -Headers $Headers -Body $Body
        Write-Host "Metrics sent successfully for $($Data.eventType)" -ForegroundColor Green
    }
    catch {
        Write-Error "Failed to send metrics to New Relic: $($_.Exception.Message)"
    }
}

# Main entry point
function Main {
    Write-Host "Starting Windows Enterprise Monitoring for $Hostname" -ForegroundColor Cyan
    Write-Host "Timestamp: $(Get-Date)" -ForegroundColor Cyan
    
    # Run each collector
    Collect-SystemMetrics
    Collect-IISMetrics
    Collect-SQLServerMetrics
    Collect-ActiveDirectoryMetrics
    
    Write-Host "Windows monitoring data collection completed" -ForegroundColor Cyan
}

# Dispatch on the first argument
switch ($args[0]) {
    "system" { Collect-SystemMetrics }
    "iis" { Collect-IISMetrics }
    "sqlserver" { Collect-SQLServerMetrics }
    "ad" { Collect-ActiveDirectoryMetrics }
    default { Main }
}
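
The script above stores disk figures under dynamic attribute names such as `disk.C.total_bytes`, so every drive letter adds four new attributes to the event type. One possible alternative, sketched here in Python under the assumption that a separate event type is acceptable, is to emit one event per drive with the drive letter as an attribute value. The event type `WindowsDiskMetrics` and the function `disk_events` are hypothetical names, not part of the original script.

```python
"""Sketch: one event per drive instead of per-drive attribute names."""


def disk_events(hostname, environment, disks, timestamp):
    """Build one event dict per drive from a list of drive descriptions.

    Each item in `disks` needs 'drive', 'total_bytes', and 'free_bytes'.
    """
    events = []
    for disk in disks:
        total = disk["total_bytes"]
        free = disk["free_bytes"]
        if not total:
            # Skip drives that report no size to avoid dividing by zero.
            continue
        used = total - free
        events.append({
            "eventType": "WindowsDiskMetrics",  # hypothetical event type
            "timestamp": timestamp,
            "hostname": hostname,
            "environment": environment,
            "drive": disk["drive"].rstrip(":"),
            "disk.total_bytes": total,
            "disk.used_bytes": used,
            "disk.free_bytes": free,
            "disk.usage_percent": round(used / total * 100, 2),
        })
    return events


if __name__ == "__main__":
    sample = [
        {"drive": "C:", "total_bytes": 100, "free_bytes": 25},
        {"drive": "D:", "total_bytes": 0, "free_bytes": 0},
    ]
    for event in disk_events("WIN-APP01", "production", sample, 1700000000):
        print(event)
```

With this shape, a single query can group usage by the `drive` attribute, and adding a drive does not change the set of attribute names.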

☁️ Virtualization environment monitoring

🔧 Monitoring VMware environments

⚙️ VMware vSphere integration configuration

yaml
# VMware vSphere integration configuration
# /etc/newrelic-infra/integrations.d/vmware-vsphere.yml

integrations:
  - name: nri-vmware-vsphere
    env:
      # vCenter connection details
      VCENTER_URL: "https://vcenter.company.com/sdk"
      VCENTER_USER: "YOUR_VCENTER_MONITORING_USER"
      VCENTER_PASS: "secure_monitoring_password"
      
      # SSL settings
      VALIDATE_SSL: true
      CA_BUNDLE_FILE: "/etc/ssl/certs/ca-bundle.crt"
      
      # Collection settings
      METRICS: true
      EVENTS: true
      INVENTORY: true
      
      # Advanced settings
      DATACENTER_LOCATION: "tokyo-dc1"
      ENABLE_VM_METRICS: true
      ENABLE_HOST_METRICS: true
      ENABLE_CLUSTER_METRICS: true
      ENABLE_DATASTORE_METRICS: true
      ENABLE_RESOURCE_POOL_METRICS: true
      
      # Performance settings
      BATCH_SIZE: 100
      TIMEOUT: 60
      
    interval: 300s  # every 5 minutes
    
    labels:
      environment: production
      virtualization: vmware
      datacenter: tokyo-dc1
      integration: vsphere
      
    # Resource filter
    inventory_source: vmware
    
    # Custom attribute mapping
    custom_attributes:
      vm_monitoring_level: detailed
      host_monitoring_level: comprehensive
      cluster_monitoring_level: summary
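
The configuration above keeps the vCenter password in plain text. For a custom collector, one common way to avoid that is to read the connection settings from environment variables supplied by a secrets manager or the service unit. The following Python sketch assumes the same variable names as the configuration (`VCENTER_URL`, `VCENTER_USER`, `VCENTER_PASS`, `VALIDATE_SSL`); the `VCenterSettings` class and `load_vcenter_settings` function are hypothetical helpers.

```python
"""Sketch: load vCenter connection settings from environment variables."""
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VCenterSettings:
    url: str
    user: str
    password: str
    validate_ssl: bool = True


def load_vcenter_settings(env=None):
    """Read settings from `env` (defaults to os.environ); fail fast if any are missing."""
    env = os.environ if env is None else env
    required = ("VCENTER_URL", "VCENTER_USER", "VCENTER_PASS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Certificate validation stays on unless it is explicitly switched off.
    flag = env.get("VALIDATE_SSL", "true").strip().lower()
    return VCenterSettings(
        url=env["VCENTER_URL"],
        user=env["VCENTER_USER"],
        password=env["VCENTER_PASS"],
        validate_ssl=flag not in ("false", "0", "no"),
    )


if __name__ == "__main__":
    demo_env = {
        "VCENTER_URL": "https://vcenter.company.com/sdk",
        "VCENTER_USER": "monitoring-user",  # placeholder value
        "VCENTER_PASS": "example-only",     # placeholder value
    }
    settings = load_vcenter_settings(demo_env)
    print(settings.url, settings.validate_ssl)
```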

📊 Detailed VMware monitoring script

python
#!/usr/bin/env python3
"""
VMware enterprise monitoring script
"""

import json
import requests
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl
import time
from datetime import datetime, timezone

class VMwareMonitor:
    def __init__(self, vcenter_host, username, password, newrelic_insert_key, account_id):
        self.vcenter_host = vcenter_host
        self.username = username
        self.password = password
        self.newrelic_insert_key = newrelic_insert_key
        self.account_id = account_id
        self.insights_api = f"https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"
        self.service_instance = None
        
    def connect(self):
        """Connect to vCenter."""
        try:
            # Certificate verification is disabled here for brevity;
            # in production, verify against a trusted CA instead
            context = ssl.create_default_context()
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
            
            self.service_instance = SmartConnect(
                host=self.vcenter_host,
                user=self.username,
                pwd=self.password,
                sslContext=context
            )
            print(f"✅ Connected to vCenter: {self.vcenter_host}")
            return True
            
        except Exception as e:
            print(f"❌ Failed to connect to vCenter: {e}")
            return False
    
    def disconnect(self):
        """Disconnect from vCenter."""
        if self.service_instance:
            Disconnect(self.service_instance)
            print("📤 Disconnected from vCenter")
    
    def collect_cluster_metrics(self):
        """Collect cluster statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            cluster_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.ClusterComputeResource], True
            )
            
            cluster_metrics = []
            
            for cluster in cluster_view.view:
                # Basic information
                cluster_info = {
                    'eventType': 'VMwareClusterMetrics',
                    'timestamp': int(time.time()),
                    'cluster_name': cluster.name,
                    'environment': 'production'
                }
                
                # Host statistics
                total_hosts = len(cluster.host)
                connected_hosts = sum(1 for host in cluster.host if host.runtime.connectionState == 'connected')
                
                # Resource statistics
                if cluster.summary:
                    cluster_info.update({
                        'cluster.total_hosts': total_hosts,
                        'cluster.connected_hosts': connected_hosts,
                        'cluster.total_cpu_cores': cluster.summary.numCpuCores or 0,
                        'cluster.total_cpu_threads': cluster.summary.numCpuThreads or 0,
                        'cluster.total_memory_mb': cluster.summary.totalMemory // (1024*1024) if cluster.summary.totalMemory else 0,
                        'cluster.ha_enabled': cluster.configuration.dasConfig.enabled if cluster.configuration.dasConfig else False,
                        'cluster.drs_enabled': cluster.configuration.drsConfig.enabled if cluster.configuration.drsConfig else False
                    })
                
                # VM statistics
                total_vms = 0
                powered_on_vms = 0
                for host in cluster.host:
                    total_vms += len(host.vm)
                    powered_on_vms += sum(1 for vm in host.vm if vm.runtime.powerState == 'poweredOn')
                
                cluster_info.update({
                    'cluster.total_vms': total_vms,
                    'cluster.powered_on_vms': powered_on_vms
                })
                
                cluster_metrics.append(cluster_info)
            
            cluster_view.Destroy()
            return cluster_metrics
            
        except Exception as e:
            print(f"❌ Failed to collect cluster metrics: {e}")
            return []
    
    def collect_host_metrics(self):
        """Collect ESXi host statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            host_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.HostSystem], True
            )
            
            host_metrics = []
            
            for host in host_view.view:
                if host.runtime.connectionState != 'connected':
                    continue
                    
                host_info = {
                    'eventType': 'VMwareHostMetrics',
                    'timestamp': int(time.time()),
                    'host_name': host.name,
                    'environment': 'production'
                }
                
                # Basic information
                if host.summary:
                    host_info.update({
                        'host.connection_state': host.runtime.connectionState,
                        'host.power_state': host.runtime.powerState,
                        'host.cpu_cores': host.summary.hardware.numCpuCores,
                        'host.cpu_threads': host.summary.hardware.numCpuThreads,
                        'host.cpu_mhz': host.summary.hardware.cpuMhz,
                        'host.memory_mb': host.summary.hardware.memorySize // (1024*1024),
                        'host.esxi_version': host.config.product.version if host.config else 'unknown',
                        'host.esxi_build': host.config.product.build if host.config else 'unknown'
                    })
                
                # Performance statistics
                if host.summary.quickStats:
                    host_info.update({
                        'host.cpu_usage_mhz': host.summary.quickStats.overallCpuUsage or 0,
                        'host.memory_usage_mb': host.summary.quickStats.overallMemoryUsage or 0,
                        'host.uptime_seconds': host.summary.quickStats.uptime or 0
                    })
                    
                    # Utilization calculation
                    if host.summary.hardware:
                        total_cpu = host.summary.hardware.numCpuCores * host.summary.hardware.cpuMhz
                        total_memory = host.summary.hardware.memorySize // (1024*1024)
                        
                        host_info['host.cpu_usage_percent'] = round(
                            (host.summary.quickStats.overallCpuUsage / total_cpu) * 100, 2
                        ) if total_cpu > 0 else 0
                        
                        host_info['host.memory_usage_percent'] = round(
                            (host.summary.quickStats.overallMemoryUsage / total_memory) * 100, 2
                        ) if total_memory > 0 else 0
                
                # VM statistics
                if hasattr(host, 'vm'):
                    host_info.update({
                        'host.total_vms': len(host.vm),
                        'host.powered_on_vms': sum(1 for vm in host.vm if vm.runtime.powerState == 'poweredOn')
                    })
                
                host_metrics.append(host_info)
            
            host_view.Destroy()
            return host_metrics
            
        except Exception as e:
            print(f"❌ Failed to collect host metrics: {e}")
            return []
    
    def collect_vm_metrics(self):
        """Collect virtual machine statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            vm_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.VirtualMachine], True
            )
            
            vm_metrics = []
            
            for vm in vm_view.view:
                if not vm.summary:
                    continue
                    
                vm_info = {
                    'eventType': 'VMwareVMMetrics',
                    'timestamp': int(time.time()),
                    'vm_name': vm.name,
                    'environment': 'production'
                }
                
                # Basic information
                vm_info.update({
                    'vm.power_state': vm.runtime.powerState,
                    'vm.connection_state': vm.runtime.connectionState,
                    'vm.cpu_count': vm.summary.config.numCpu,
                    'vm.memory_mb': vm.summary.config.memorySizeMB,
                    'vm.guest_os': vm.summary.config.guestFullName or 'unknown',
                    'vm.vm_tools_status': vm.summary.guest.toolsStatus if vm.summary.guest else 'unknown',
                    'vm.template': vm.summary.config.template
                })
                
                # Performance statistics (powered-on VMs only)
                if vm.runtime.powerState == 'poweredOn' and vm.summary.quickStats:
                    vm_info.update({
                        'vm.cpu_usage_mhz': vm.summary.quickStats.overallCpuUsage or 0,
                        'vm.memory_usage_mb': vm.summary.quickStats.hostMemoryUsage or 0,
                        'vm.guest_memory_usage_mb': vm.summary.quickStats.guestMemoryUsage or 0,
                        'vm.uptime_seconds': vm.summary.quickStats.uptimeSeconds or 0
                    })
                    
                    # Utilization percentages
                    if vm.summary.config.numCpu and vm.runtime.host:
                        host_cpu_mhz = vm.runtime.host.summary.hardware.cpuMhz
                        total_vm_cpu_mhz = vm.summary.config.numCpu * host_cpu_mhz
                        
                        # quickStats fields may be None, so fall back to 0 before dividing
                        vm_info['vm.cpu_usage_percent'] = round(
                            ((vm.summary.quickStats.overallCpuUsage or 0) / total_vm_cpu_mhz) * 100, 2
                        ) if total_vm_cpu_mhz > 0 else 0
                    
                    if vm.summary.config.memorySizeMB:
                        vm_info['vm.memory_usage_percent'] = round(
                            ((vm.summary.quickStats.hostMemoryUsage or 0) / vm.summary.config.memorySizeMB) * 100, 2
                        ) if vm.summary.config.memorySizeMB > 0 else 0
                
                # Disk information: committed = space in use, uncommitted = space the VM may still grow into
                if vm.summary.storage:
                    committed = vm.summary.storage.committed or 0
                    uncommitted = vm.summary.storage.uncommitted or 0
                    vm_info.update({
                        'vm.provisioned_storage_gb': round((committed + uncommitted) / (1024**3), 2),
                        'vm.used_storage_gb': round(committed / (1024**3), 2)
                    })
                
                # Host information
                if vm.runtime.host:
                    vm_info['vm.host_name'] = vm.runtime.host.name
                
                vm_metrics.append(vm_info)
            
            vm_view.Destroy()
            return vm_metrics
            
        except Exception as e:
            print(f"❌ Failed to collect VM metrics: {e}")
            return []
    
    def send_to_newrelic(self, metrics_data):
        """Send metrics to New Relic."""
        if not metrics_data:
            return
            
        try:
            headers = {
                'Content-Type': 'application/json',
                'X-Insert-Key': self.newrelic_insert_key
            }
            
            # Send in batches of 100 events
            batch_size = 100
            for i in range(0, len(metrics_data), batch_size):
                batch = metrics_data[i:i+batch_size]
                response = requests.post(
                    self.insights_api,
                    headers=headers,
                    json=batch,
                    timeout=30
                )
                
                if response.status_code == 200:
                    print(f"✅ Sent {len(batch)} metrics to New Relic")
                else:
                    print(f"❌ Failed to send metrics: {response.status_code}")
                    
        except Exception as e:
            print(f"❌ Failed to send metrics to New Relic: {e}")
    
    def run_monitoring(self):
        """Main monitoring routine."""
        print("🚀 Starting VMware Enterprise Monitoring")
        print(f"📅 Timestamp: {datetime.now(timezone.utc).isoformat()}")
        
        if not self.connect():
            return False
        
        try:
            # Collect each group of metrics
            print("📊 Collecting cluster metrics...")
            cluster_metrics = self.collect_cluster_metrics()
            
            print("🖥️  Collecting host metrics...")
            host_metrics = self.collect_host_metrics()
            
            print("💻 Collecting VM metrics...")
            vm_metrics = self.collect_vm_metrics()
            
            # Send the metrics
            all_metrics = cluster_metrics + host_metrics + vm_metrics
            if all_metrics:
                print(f"📤 Sending {len(all_metrics)} metrics to New Relic...")
                self.send_to_newrelic(all_metrics)
                print("✅ VMware monitoring completed successfully")
            else:
                print("⚠️  No metrics collected")
                
            return True
            
        except Exception as e:
            print(f"❌ Monitoring failed: {e}")
            return False
            
        finally:
            self.disconnect()

# Entry point
if __name__ == "__main__":
    import os
    
    # Read settings from environment variables
    VCENTER_HOST = os.environ.get('VCENTER_HOST', 'vcenter.company.com')
    VCENTER_USER = os.environ.get('VCENTER_USER', '')
    VCENTER_PASS = os.environ.get('VCENTER_PASS', '')
    NEWRELIC_INSERT_KEY = os.environ.get('NEWRELIC_INSERT_KEY', '')
    NEWRELIC_ACCOUNT_ID = os.environ.get('NEWRELIC_ACCOUNT_ID', '')
    
    # Credentials have no usable defaults, so all of them must be set explicitly
    if not all([VCENTER_USER, VCENTER_PASS, NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID]):
        print("❌ Required environment variables not set")
        exit(1)
    
    monitor = VMwareMonitor(
        VCENTER_HOST, VCENTER_USER, VCENTER_PASS, 
        NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID
    )
    
    success = monitor.run_monitoring()
    exit(0 if success else 1)
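
The percentage and storage arithmetic inside `collect_vm_metrics` can be pulled out into pure functions, which makes it testable without a vCenter connection. The following is a minimal sketch under that assumption: the helper names `cpu_usage_percent` and `storage_gb` are hypothetical and not part of the script above. It treats `committed` as space already in use and `committed + uncommitted` as the provisioned total, which is how the vSphere API defines those fields.

```python
from typing import Optional


def cpu_usage_percent(usage_mhz: Optional[int], num_cpu: int, host_cpu_mhz: int) -> float:
    """CPU usage as a share of the VM's theoretical capacity (vCPUs x host clock)."""
    capacity_mhz = num_cpu * host_cpu_mhz
    if capacity_mhz <= 0:
        return 0.0
    # quickStats values can be unset (None), so treat a missing value as 0
    return round((usage_mhz or 0) / capacity_mhz * 100, 2)


def storage_gb(committed_bytes: int, uncommitted_bytes: Optional[int]) -> dict:
    """Split VM storage into used (committed) and provisioned (committed + uncommitted)."""
    gib = 1024 ** 3
    used = committed_bytes / gib
    provisioned = (committed_bytes + (uncommitted_bytes or 0)) / gib
    return {
        "vm.used_storage_gb": round(used, 2),
        "vm.provisioned_storage_gb": round(provisioned, 2),
    }
```

Keeping these calculations separate from the pyVmomi calls means the edge cases (unset statistics, zero capacity) can be checked with plain unit tests.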

💻 Hyper-V Environment Monitoring

⚙️ Hyper-V Monitoring with PowerShell

powershell
# Hyper-V enterprise monitoring script
# C:\Scripts\Enterprise-HyperV-Monitor.ps1

param(
    [Parameter(Mandatory=$true)]
    [string]$NewRelicInsertKey,
    
    [Parameter(Mandatory=$true)]
    [string]$NewRelicAccountId,
    
    [string]$Environment = "production"
)

# Settings
$InsightsAPI = "https://insights-collector.newrelic.com/v1/accounts/$NewRelicAccountId/events"
$Hostname = $env:COMPUTERNAME

# Collect Hyper-V host information
function Collect-HyperVHostMetrics {
    Write-Host "=== Collecting Hyper-V Host Metrics ===" -ForegroundColor Green
    
    try {
        # Verify that the Hyper-V role is installed
        $HyperVFeature = Get-WindowsFeature -Name Hyper-V -ErrorAction SilentlyContinue
        if (-not ($HyperVFeature -and $HyperVFeature.InstallState -eq 'Installed')) {
            Write-Warning "Hyper-V role not installed"
            return
        }
        
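        # Host resource information (CIM cmdlets work in Windows PowerShell 5.1 and PowerShell 7)
        $VMHost = Get-VMHost
        # Sum cores across all sockets rather than reading only the first processor
        $Processors = @(Get-CimInstance -ClassName Win32_Processor)
        $HostProcessor = [pscustomobject]@{
            NumberOfCores = ($Processors | Measure-Object -Property NumberOfCores -Sum).Sum
            NumberOfLogicalProcessors = ($Processors | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
        }
        $HostMemory = Get-CimInstance -ClassName Win32_ComputerSystem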
        
        # Virtual machine counts by state
        $AllVMs = Get-VM
        $RunningVMs = $AllVMs | Where-Object {$_.State -eq 'Running'}
        $StoppedVMs = $AllVMs | Where-Object {$_.State -eq 'Off'}
        $PausedVMs = $AllVMs | Where-Object {$_.State -eq 'Paused'}
        
        # Virtual switch counts by type
        $VirtualSwitches = Get-VMSwitch
        $ExternalSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'External'}
        $InternalSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'Internal'}
        $PrivateSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'Private'}
        
        # Resource pool counts
        $ProcessorPools = Get-VMResourcePool -ResourcePoolType Processor
        $MemoryPools = Get-VMResourcePool -ResourcePoolType Memory
        
        $HostData = @{
            eventType = "HyperVHostMetrics"
            timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
            hostname = $Hostname
            environment = $Environment
            "host.hyperv_version" = $VMHost.Version
            "host.virtual_hard_disk_path" = $VMHost.VirtualHardDiskPath
            "host.virtual_machine_path" = $VMHost.VirtualMachinePath
            "host.processor_cores" = $HostProcessor.NumberOfCores
            "host.logical_processors" = $HostProcessor.NumberOfLogicalProcessors
            "host.total_memory_gb" = [math]::Round($HostMemory.TotalPhysicalMemory / 1GB, 2)
            "vms.total" = $AllVMs.Count
            "vms.running" = $RunningVMs.Count
            "vms.stopped" = $StoppedVMs.Count
            "vms.paused" = $PausedVMs.Count
            "switches.total" = $VirtualSwitches.Count
            "switches.external" = $ExternalSwitches.Count
            "switches.internal" = $InternalSwitches.Count
            "switches.private" = $PrivateSwitches.Count
            "resource_pools.processor" = $ProcessorPools.Count
            "resource_pools.memory" = $MemoryPools.Count
        }
        
        Send-MetricsToNewRelic -Data $HostData
    }
    catch {
        Write-Error "Failed to collect Hyper-V host metrics: $($_.Exception.Message)"
    }
}

# Collect detailed virtual machine information
function Collect-HyperVVMMetrics {
    Write-Host "=== Collecting Hyper-V VM Metrics ===" -ForegroundColor Green
    
    try {
        $VMs = Get-VM
        
        foreach ($VM in $VMs) {
            # Basic VM information
            $VMData = @{
                eventType = "HyperVVMMetrics"
                timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
                hostname = $Hostname
                environment = $Environment
                "vm.name" = $VM.Name
                "vm.state" = $VM.State.ToString()  # ConvertTo-Json would otherwise emit the enum as an integer
                "vm.status" = $VM.Status
                "vm.generation" = $VM.Generation
                "vm.version" = $VM.Version
                "vm.cpu_count" = $VM.ProcessorCount
                "vm.memory_assigned_mb" = $VM.MemoryAssigned / 1MB
                "vm.memory_startup_mb" = $VM.MemoryStartup / 1MB
                "vm.dynamic_memory_enabled" = $VM.DynamicMemoryEnabled
                "vm.uptime_seconds" = $VM.Uptime.TotalSeconds
            }
            
            # Dynamic memory settings
            if ($VM.DynamicMemoryEnabled) {
                $VMData["vm.memory_minimum_mb"] = $VM.MemoryMinimum / 1MB
                $VMData["vm.memory_maximum_mb"] = $VM.MemoryMaximum / 1MB
            }
            
            # Performance statistics (running VMs only)
            if ($VM.State -eq 'Running') {
                try {
                    # CPU usage: average the per-vCPU samples whose instance name starts with the VM name
                    $CpuSamples = (Get-Counter "\Hyper-V Hypervisor Virtual Processor(*)\% Guest Run Time").CounterSamples |
                                  Where-Object {$_.InstanceName -like "$($VM.Name):*"}
                    
                    if ($CpuSamples) {
                        $CpuAverage = ($CpuSamples | Measure-Object -Property CookedValue -Average).Average
                        $VMData["vm.cpu_usage_percent"] = [math]::Round($CpuAverage, 2)
                    }
                    
                    # Memory pressure: filter the individual samples, not the sample set as a whole
                    $MemoryPressure = (Get-Counter "\Hyper-V Dynamic Memory VM(*)\Current Pressure").CounterSamples |
                                     Where-Object {$_.InstanceName -eq $VM.Name} |
                                     Select-Object -First 1
                    
                    if ($MemoryPressure) {
                        $VMData["vm.memory_pressure"] = [math]::Round($MemoryPressure.CookedValue, 2)
                    }
                }
                catch {
                    Write-Warning "Failed to collect performance data for VM: $($VM.Name)"
                }
            }
            
            # Network adapter information
            $NetworkAdapters = Get-VMNetworkAdapter -VM $VM
            $VMData["vm.network_adapters"] = $NetworkAdapters.Count
            
            # Hard disk information
            $HardDrives = Get-VMHardDiskDrive -VM $VM
            $VMData["vm.hard_drives"] = $HardDrives.Count
            
            # Integration services status
            $IntegrationServices = Get-VMIntegrationService -VM $VM
            $EnabledServices = ($IntegrationServices | Where-Object {$_.Enabled}).Count
            $VMData["vm.integration_services_enabled"] = $EnabledServices
            $VMData["vm.integration_services_total"] = $IntegrationServices.Count
            
            Send-MetricsToNewRelic -Data $VMData
        }
    }
    catch {
        Write-Error "Failed to collect VM metrics: $($_.Exception.Message)"
    }
}

# Detailed virtual switch monitoring
function Collect-HyperVSwitchMetrics {
    Write-Host "=== Collecting Hyper-V Switch Metrics ===" -ForegroundColor Green
    
    try {
        $Switches = Get-VMSwitch
        
        foreach ($Switch in $Switches) {
            $SwitchData = @{
                eventType = "HyperVSwitchMetrics"
                timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
                hostname = $Hostname
                environment = $Environment
                "switch.name" = $Switch.Name
                "switch.type" = $Switch.SwitchType.ToString()  # send the enum name, not its integer value
                "switch.allow_management_os" = $Switch.AllowManagementOS
                "switch.embedded_teaming_enabled" = $Switch.EmbeddedTeamingEnabled
                "switch.iov_enabled" = $Switch.IovEnabled
                "switch.packet_direct_enabled" = $Switch.PacketDirectEnabled
            }
            
            # For external switches, add physical adapter information
            if ($Switch.SwitchType -eq 'External') {
                $NetAdapter = Get-NetAdapter | Where-Object {$_.InterfaceDescription -eq $Switch.NetAdapterInterfaceDescription}
                if ($NetAdapter) {
                    $SwitchData["switch.physical_adapter"] = $NetAdapter.Name
                    $SwitchData["switch.link_speed_gbps"] = $NetAdapter.Speed / 1000000000  # Speed is numeric (bits/s); LinkSpeed is a formatted string
                    $SwitchData["switch.adapter_status"] = $NetAdapter.Status
                }
            }
            
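            # Number of VMs connected to this switch (a VM with several adapters is counted once)
            $ConnectedVMs = @(Get-VMNetworkAdapter -VMName * | Where-Object {$_.SwitchName -eq $Switch.Name} |
                              Select-Object -ExpandProperty VMName -Unique)
            $SwitchData["switch.connected_vms"] = $ConnectedVMs.Count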
            
            Send-MetricsToNewRelic -Data $SwitchData
        }
    }
    catch {
        Write-Error "Failed to collect switch metrics: $($_.Exception.Message)"
    }
}

# Send a single event to New Relic
function Send-MetricsToNewRelic {
    param([hashtable]$Data)
    
    try {
        $JsonData = $Data | ConvertTo-Json -Compress
        $Body = "[$JsonData]"
        
        $Headers = @{
            'Content-Type' = 'application/json'
            'X-Insert-Key' = $NewRelicInsertKey
        }
        
        Invoke-RestMethod -Uri $InsightsAPI -Method POST -Headers $Headers -Body $Body
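        # Pick a label without the ?? operator so the script also runs on Windows PowerShell 5.1
        $Label = if ($Data.ContainsKey('vm.name')) { $Data['vm.name'] } elseif ($Data.ContainsKey('switch.name')) { $Data['switch.name'] } else { 'Host' }
        Write-Host "✅ Metrics sent for $($Data.eventType): $Label" -ForegroundColor Green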
    }
    catch {
        Write-Error "Failed to send metrics to New Relic: $($_.Exception.Message)"
    }
}

# Main routine
function Main {
    Write-Host "🚀 Starting Hyper-V Enterprise Monitoring for $Hostname" -ForegroundColor Cyan
    Write-Host "📅 Timestamp: $(Get-Date)" -ForegroundColor Cyan
    
    # Collect each group of metrics
    Collect-HyperVHostMetrics
    Collect-HyperVVMMetrics
    Collect-HyperVSwitchMetrics
    
    Write-Host "✅ Hyper-V monitoring completed" -ForegroundColor Cyan
}

# Run
Main
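
The Hyper-V virtual processor counter reports one sample per vCPU, so a per-VM figure has to be aggregated from several counter instances. The sketch below shows that aggregation in Python, independent of PowerShell. It assumes instance names of the form `VMNAME:hv vp N` and uses a hypothetical helper name, `average_cpu_by_vm`; check the instance names that `Get-Counter` returns on your hosts before relying on this pattern.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_cpu_by_vm(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-vCPU counter values into one figure per VM."""
    values = defaultdict(list)
    for instance_name, cooked_value in samples:
        # Assumed instance format: "VMNAME:hv vp N"; split on the last colon
        vm_name, sep, _ = instance_name.rpartition(":")
        if not sep:
            continue  # aggregate instances such as "_total" carry no VM name
        values[vm_name.lower()].append(cooked_value)
    return {vm: round(sum(v) / len(v), 2) for vm, v in values.items()}
```

Reading the counter once and grouping the samples this way avoids one counter query per VM, which matters on hosts that run many guests.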

☁️ Multi-Cloud Integrated Monitoring

🔧 AWS Cloud Integration

⚙️ AWS CloudFormation Template

yaml
# CloudFormation template for the AWS and New Relic integration
# aws-newrelic-integration.yaml

AWSTemplateFormatVersion: '2010-09-09'
Description: 'New Relic Infrastructure AWS Integration Setup'

Parameters:
  NewRelicAccountId:
    Type: String
    Description: 'Your New Relic Account ID'
  
  ExternalId:
    Type: String  
    Description: 'External ID for New Relic (Your Account ID)'
  
  Environment:
    Type: String
    Default: 'production'
    AllowedValues: ['production', 'staging', 'development']

Resources:
  # New Relic Integration Role
  NewRelicInfrastructureRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: !Sub 'NewRelic-Infrastructure-Role-${Environment}'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: 'arn:aws:iam::754728514883:root'  # New Relic AWS Account
            Action: 'sts:AssumeRole'
            Condition:
              StringEquals:
                'sts:ExternalId': !Ref ExternalId
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/ReadOnlyAccess'
      Policies:
        - PolicyName: 'NewRelicBudgetPolicy'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'budgets:ViewBudget'
                Resource: '*'
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Purpose
          Value: 'NewRelic-Monitoring'
        - Key: Team
          Value: 'Infrastructure'

  # Custom IAM policy (enterprise permissions)
  NewRelicEnhancedPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: 'NewRelicEnhancedPermissions'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # EC2 detailed permissions
          - Effect: Allow
            Action:
              - 'ec2:DescribeInstances'
              - 'ec2:DescribeInstanceStatus'
              - 'ec2:DescribeInstanceAttribute'
              - 'ec2:DescribeVolumes'
              - 'ec2:DescribeVolumeStatus'
              - 'ec2:DescribeVolumeAttribute'
              - 'ec2:DescribeSnapshots'
              - 'ec2:DescribeImages'
              - 'ec2:DescribeSecurityGroups'
              - 'ec2:DescribeNetworkInterfaces'
              - 'ec2:DescribeVpcs'
              - 'ec2:DescribeSubnets'
              - 'ec2:DescribeRouteTables'
              - 'ec2:DescribeInternetGateways'
              - 'ec2:DescribeNatGateways'
              - 'ec2:DescribeReservedInstances'
              - 'ec2:DescribeSpotInstanceRequests'
            Resource: '*'
          
          # RDS detailed permissions
          - Effect: Allow
            Action:
              - 'rds:DescribeDBInstances'
              - 'rds:DescribeDBClusters'
              - 'rds:DescribeDBSubnetGroups'
              - 'rds:DescribeDBParameterGroups'
              - 'rds:DescribeDBClusterParameterGroups'
              - 'rds:DescribeDBSnapshots'
              - 'rds:DescribeDBClusterSnapshots'
              - 'rds:DescribeEvents'
              - 'rds:DescribeEventSubscriptions'
              - 'rds:DescribeDBLogFiles'
              - 'rds:DownloadDBLogFilePortion'
            Resource: '*'
          
          # ELB/ALB detailed permissions
          - Effect: Allow
            Action:
              - 'elasticloadbalancing:DescribeLoadBalancers'
              - 'elasticloadbalancing:DescribeTargetGroups'
              - 'elasticloadbalancing:DescribeTargetHealth'
              - 'elasticloadbalancing:DescribeListeners'
              - 'elasticloadbalancing:DescribeRules'
              - 'elasticloadbalancing:DescribeSSLPolicies'
              - 'elasticloadbalancing:DescribeTags'
            Resource: '*'
          
          # Lambda detailed permissions
          - Effect: Allow
            Action:
              - 'lambda:GetFunction'
              - 'lambda:GetFunctionConfiguration'
              - 'lambda:GetPolicy'
              - 'lambda:ListFunctions'
              - 'lambda:ListEventSourceMappings'
              - 'lambda:ListTags'
              - 'lambda:GetEventSourceMapping'
            Resource: '*'
          
          # CloudWatch extended permissions
          - Effect: Allow
            Action:
              - 'cloudwatch:GetMetricStatistics'
              - 'cloudwatch:GetMetricData'
              - 'cloudwatch:ListMetrics'
              - 'cloudwatch:DescribeAlarms'
              - 'cloudwatch:DescribeAlarmsForMetric'
              - 'logs:DescribeLogGroups'
              - 'logs:DescribeLogStreams'
              - 'logs:GetLogEvents'
            Resource: '*'
          
          # Auto Scaling permissions
          - Effect: Allow
            Action:
              - 'autoscaling:DescribeAutoScalingGroups'
              - 'autoscaling:DescribeAutoScalingInstances'
              - 'autoscaling:DescribeLaunchConfigurations'
              - 'autoscaling:DescribePolicies'
              - 'autoscaling:DescribeScalingActivities'
            Resource: '*'
          
          # ECS/Fargate permissions
          - Effect: Allow
            Action:
              - 'ecs:DescribeClusters'
              - 'ecs:DescribeServices'
              - 'ecs:DescribeTasks'
              - 'ecs:DescribeTaskDefinition'
              - 'ecs:ListClusters'
              - 'ecs:ListServices'
              - 'ecs:ListTasks'
            Resource: '*'
          
          # S3 permissions
          - Effect: Allow
            Action:
              - 's3:GetBucketLocation'
              - 's3:GetBucketNotification'
              - 's3:GetBucketVersioning'
              - 's3:GetBucketWebsite'
              - 's3:ListAllMyBuckets'
              - 's3:GetBucketTagging'
              - 's3:GetBucketLogging'
              - 's3:GetBucketCORS'
              - 's3:GetBucketPolicy'
              - 's3:GetBucketPolicyStatus'
            Resource: '*'
          
          # Cost Explorer & Billing
          - Effect: Allow
            Action:
              - 'ce:GetCostAndUsage'
              - 'ce:GetReservationCoverage'
              - 'ce:GetReservationPurchaseRecommendation'
              - 'ce:GetReservationUtilization'
              - 'ce:ListCostCategoryDefinitions'
              - 'aws-portal:ViewBilling'
              - 'aws-portal:ViewUsage'
            Resource: '*'
          
          # Organizations (for multi-account environments)
          - Effect: Allow
            Action:
              - 'organizations:DescribeOrganization'
              - 'organizations:ListAccounts'
              - 'organizations:ListRoots'
              - 'organizations:ListOrganizationalUnitsForParent'
              - 'organizations:DescribeAccount'
            Resource: '*'
      Roles:
        - !Ref NewRelicInfrastructureRole

  # SNS Topic for New Relic Notifications
  NewRelicNotificationTopic:
    Type: 'AWS::SNS::Topic'
    Properties:
      TopicName: !Sub 'NewRelic-Infrastructure-Notifications-${Environment}'
      DisplayName: 'New Relic Infrastructure Notifications'
      
  # CloudWatch Alarms for New Relic Integration Health
  NewRelicIntegrationHealthAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmName: !Sub 'NewRelic-Integration-Health-${Environment}'
      AlarmDescription: 'Monitor New Relic integration health'
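      # Assumes this metric is actually published to CloudWatch in your account; confirm it exists before relying on the alarm
      MetricName: 'AssumeRoleFailures'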
      Namespace: 'AWS/STS'
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref NewRelicNotificationTopic

Outputs:
  RoleArn:
    Description: 'ARN of the New Relic Infrastructure Role'
    Value: !GetAtt NewRelicInfrastructureRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-RoleArn'
  
  ExternalId:
    Description: 'External ID for New Relic Integration'
    Value: !Ref ExternalId
    Export:
      Name: !Sub '${AWS::StackName}-ExternalId'
  
  NotificationTopicArn:
    Description: 'SNS Topic ARN for New Relic notifications'
    Value: !Ref NewRelicNotificationTopic
    Export:
      Name: !Sub '${AWS::StackName}-NotificationTopic'
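
Because the template sets an explicit `RoleName`, CloudFormation requires the `CAPABILITY_NAMED_IAM` acknowledgement when the stack is created. The sketch below assembles the arguments for boto3's `create_stack` call without contacting AWS. The function name `create_stack_kwargs` and the stack naming scheme are assumptions made for illustration; `ExternalId` is set to the New Relic account ID, following the parameter description in the template.

```python
ALLOWED_ENVIRONMENTS = ("production", "staging", "development")


def create_stack_kwargs(template_body: str, account_id: str,
                        environment: str = "production") -> dict:
    """Build keyword arguments for CloudFormation create_stack."""
    # Mirror the template's AllowedValues so mistakes fail before any API call
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"environment must be one of {ALLOWED_ENVIRONMENTS}")
    return {
        "StackName": f"newrelic-integration-{environment}",  # hypothetical naming scheme
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": "NewRelicAccountId", "ParameterValue": account_id},
            {"ParameterKey": "ExternalId", "ParameterValue": account_id},
            {"ParameterKey": "Environment", "ParameterValue": environment},
        ],
        # Needed because the template creates an IAM role with a custom name
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }
```

The result can be passed to `boto3.client("cloudformation").create_stack(**kwargs)` after reading the template file into `template_body`.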

📊 AWS Cost Monitoring Script

python
#!/usr/bin/env python3
"""
AWS cost and usage monitoring script
Sends data to New Relic Insights
"""

import boto3
import requests
import json
from datetime import datetime, timedelta
import os
from decimal import Decimal

class AWSCostMonitor:
    def __init__(self, newrelic_insert_key, newrelic_account_id, aws_profile=None):
        self.newrelic_insert_key = newrelic_insert_key
        self.newrelic_account_id = newrelic_account_id
        self.insights_api = f"https://insights-collector.newrelic.com/v1/accounts/{newrelic_account_id}/events"
        
        # AWS Session
        if aws_profile:
            self.session = boto3.Session(profile_name=aws_profile)
        else:
            self.session = boto3.Session()
        
        self.ce_client = self.session.client('ce')  # Cost Explorer
        self.organizations_client = None
        
        # Organizations client (for multi-account environments)
        try:
            self.organizations_client = self.session.client('organizations')
        except Exception:
            print("Organizations service not available - single account mode")
    
    def get_account_info(self):
        """Retrieve account information."""
        try:
            sts_client = self.session.client('sts')
            identity = sts_client.get_caller_identity()
            
            account_info = {
                'account_id': identity['Account'],
                'user_id': identity['UserId'],
                'arn': identity['Arn']
            }
            
            # Organizations information
            if self.organizations_client:
                try:
                    org = self.organizations_client.describe_organization()
                    account_info['organization_id'] = org['Organization']['Id']
                    account_info['master_account_id'] = org['Organization']['MasterAccountId']
                except Exception:
                    pass
            
            return account_info
            
        except Exception as e:
            print(f"Failed to get account info: {e}")
            return {}
    
    def get_cost_and_usage(self, days=7):
        """Retrieve cost and usage data."""
        try:
            end_date = datetime.now().date()
            start_date = end_date - timedelta(days=days)
            
            # Daily cost data
            response = self.ce_client.get_cost_and_usage(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Granularity='DAILY',
                Metrics=['BlendedCost', 'UsageQuantity'],
                GroupBy=[
                    {'Type': 'DIMENSION', 'Key': 'SERVICE'},
                ]
            )
            
            cost_metrics = []
            for result in response['ResultsByTime']:
                date = result['TimePeriod']['Start']
                
                total_cost = Decimal('0')
                for group in result['Groups']:
                    service_name = group['Keys'][0]
                    cost = Decimal(group['Metrics']['BlendedCost']['Amount'])
                    usage = Decimal(group['Metrics']['UsageQuantity']['Amount'])
                    
                    total_cost += cost
                    
                    if cost > 0:  # only services that incurred a cost
                        cost_metrics.append({
                            'eventType': 'AWSCostMetrics',
                            'timestamp': int(datetime.fromisoformat(date).timestamp()),
                            'date': date,
                            'service': service_name,
                            'cost_usd': float(cost),
                            'usage_quantity': float(usage),
                            'currency': 'USD'
                        })
                
                # Total cost for the day
                cost_metrics.append({
                    'eventType': 'AWSCostDaily',
                    'timestamp': int(datetime.fromisoformat(date).timestamp()),
                    'date': date,
                    'total_cost_usd': float(total_cost),
                    'currency': 'USD'
                })
            
            return cost_metrics
            
        except Exception as e:
            print(f"Failed to get cost data: {e}")
            return []
    
    def get_reservation_utilization(self):
        """Retrieve Reserved Instance utilization."""
        try:
            end_date = datetime.now().date()
            start_date = end_date - timedelta(days=30)
            
            response = self.ce_client.get_reservation_utilization(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Granularity='MONTHLY'
            )
            
            reservation_metrics = []
            for result in response['UtilizationsByTime']:
                total = result['Total']
                
                reservation_metrics.append({
                    'eventType': 'AWSReservationUtilization',
                    'timestamp': int(datetime.now().timestamp()),
                    'time_period_start': result['TimePeriod']['Start'],
                    'time_period_end': result['TimePeriod']['End'],
                    'utilization_percentage': float(total['UtilizationPercentage']),
                    'purchased_hours': float(total['PurchasedHours']),
                    'used_hours': float(total['TotalActualHours']),  # the API reports used hours as TotalActualHours
                    'unused_hours': float(total['UnusedHours']),
                    'total_actual_hours': float(total['TotalActualHours']),
                    'net_ri_savings': float(total['NetRISavings']),
                    'on_demand_cost_of_ri_hours_used': float(total['OnDemandCostOfRIHoursUsed']),
                    'realized_savings': float(total['RealizedSavings'])
                })
            
            return reservation_metrics
            
        except Exception as e:
            print(f"Failed to get reservation utilization: {e}")
            return []
    
    def get_cost_forecast(self):
        """Retrieve the cost forecast."""
        try:
            start_date = datetime.now().date()
            end_date = start_date + timedelta(days=30)
            
            response = self.ce_client.get_cost_forecast(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Metric='BLENDED_COST',
                Granularity='MONTHLY'
            )
            
            forecast_metrics = []
            total = response['Total']
            
            forecast_metrics.append({
                'eventType': 'AWSCostForecast',
                'timestamp': int(datetime.now().timestamp()),
                'forecast_period_start': start_date.strftime('%Y-%m-%d'),
                'forecast_period_end': end_date.strftime('%Y-%m-%d'),
                'forecasted_cost_usd': float(total['Amount']),
                'currency': total['Unit'],
                'confidence_level': 'MEDIUM'  # AWS default
            })
            
            return forecast_metrics
            
        except Exception as e:
            print(f"Failed to get cost forecast: {e}")
            return []
    
    def get_rightsizing_recommendations(self):
        """Retrieve EC2 rightsizing recommendations."""
        try:
            response = self.ce_client.get_rightsizing_recommendation(
                Service='AmazonEC2',  # required parameter
                Configuration={
                    'BenefitsConsidered': True,
                    'RecommendationTarget': 'SAME_INSTANCE_FAMILY'
                }
            )
            
            rightsizing_metrics = []
            
            # Summary information (the API returns these values as strings)
            summary = response.get('Summary', {})
            rightsizing_metrics.append({
                'eventType': 'AWSRightsizingSummary',
                'timestamp': int(datetime.now().timestamp()),
                'total_recommendation_count': int(summary.get('TotalRecommendationCount', 0)),
                'estimated_total_monthly_savings_usd': float(summary.get('EstimatedTotalMonthlySavingsAmount', 0)),
                'savings_currency': summary.get('SavingsCurrencyCode', 'USD'),
                'savings_percentage': float(summary.get('SavingsPercentage', 0))
            })
            
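            # Individual recommendations
            for rec in response.get('RightsizingRecommendations', [])[:10]:  # at most 10 entries
                current = rec.get('CurrentInstance', {})
                ec2_details = current.get('ResourceDetails', {}).get('EC2ResourceDetails', {})
                # Savings are nested under the terminate/modify detail, not at the top level
                if rec.get('RightsizingType') == 'TERMINATE':
                    savings = rec.get('TerminateRecommendationDetail', {}).get('EstimatedMonthlySavings', 0)
                else:
                    targets = rec.get('ModifyRecommendationDetail', {}).get('TargetInstances', [])
                    savings = targets[0].get('EstimatedMonthlySavings', 0) if targets else 0
                rightsizing_metrics.append({
                    'eventType': 'AWSRightsizingRecommendation',
                    'timestamp': int(datetime.now().timestamp()),
                    'account_id': rec.get('AccountId', ''),
                    'instance_id': current.get('ResourceId', ''),
                    'current_instance_type': ec2_details.get('InstanceType', ''),
                    'recommendation_type': rec.get('RightsizingType', ''),
                    'estimated_monthly_savings_usd': float(savings or 0),
                    'recommendation_source': 'cost_explorer'
                })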
            
            return rightsizing_metrics
            
        except Exception as e:
            print(f"Failed to get rightsizing recommendations: {e}")
            return []
    
    def send_to_newrelic(self, metrics_data):
        """Send metrics to New Relic."""
        if not metrics_data:
            print("No metrics to send")
            return
        
        try:
            headers = {
                'Content-Type': 'application/json',
                'X-Insert-Key': self.newrelic_insert_key
            }
            
            # Add account information to each metric
            account_info = self.get_account_info()
            for metric in metrics_data:
                metric.update(account_info)
                metric['environment'] = os.environ.get('ENVIRONMENT', 'production')
            
            # Send in batches of 100
            batch_size = 100
            for i in range(0, len(metrics_data), batch_size):
                batch = metrics_data[i:i+batch_size]
                
                response = requests.post(
                    self.insights_api,
                    headers=headers,
                    json=batch,
                    timeout=30
                )
                
                if response.status_code == 200:
                    print(f"✅ Sent {len(batch)} cost metrics to New Relic")
                else:
                    print(f"❌ Failed to send metrics: {response.status_code} - {response.text}")
                    
        except Exception as e:
            print(f"❌ Failed to send metrics to New Relic: {e}")
    
    def run_cost_monitoring(self):
        """Main cost-monitoring routine."""
        print("🚀 Starting AWS Cost Monitoring")
        print(f"📅 Timestamp: {datetime.now().isoformat()}")
        
        all_metrics = []
        
        # Collect each kind of cost data
        print("💰 Collecting cost and usage data...")
        cost_metrics = self.get_cost_and_usage(days=7)
        all_metrics.extend(cost_metrics)
        
        print("📊 Collecting reservation utilization...")
        reservation_metrics = self.get_reservation_utilization()
        all_metrics.extend(reservation_metrics)
        
        print("🔮 Collecting cost forecast...")
        forecast_metrics = self.get_cost_forecast()
        all_metrics.extend(forecast_metrics)
        
        print("⚡ Collecting rightsizing recommendations...")
        rightsizing_metrics = self.get_rightsizing_recommendations()
        all_metrics.extend(rightsizing_metrics)
        
        # Send to New Relic
        if all_metrics:
            print(f"📤 Sending {len(all_metrics)} cost metrics to New Relic...")
            self.send_to_newrelic(all_metrics)
            print("✅ AWS cost monitoring completed successfully")
        else:
            print("⚠️  No cost metrics collected")

# Main entry point
if __name__ == "__main__":
    # Read settings from environment variables
    NEWRELIC_INSERT_KEY = os.environ.get('NEWRELIC_INSERT_KEY', '')
    NEWRELIC_ACCOUNT_ID = os.environ.get('NEWRELIC_ACCOUNT_ID', '')
    AWS_PROFILE = os.environ.get('AWS_PROFILE', None)
    
    if not all([NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID]):
        print("❌ Required environment variables not set")
        print("Please set: NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID")
        raise SystemExit(1)  # works without the site-provided exit() helper
    
    monitor = AWSCostMonitor(
        NEWRELIC_INSERT_KEY,
        NEWRELIC_ACCOUNT_ID,
        AWS_PROFILE
    )
    
    monitor.run_cost_monitoring()
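
The fixed batch size of 100 in send_to_newrelic caps the number of events per request but not the request size. The sketch below shows one way to cap both. The names iter_batches, MAX_EVENTS_PER_BATCH, and MAX_PAYLOAD_BYTES are hypothetical and not part of the script above, and the 1 MB per-POST limit is an assumption that should be checked against the current Event API documentation.

```python
import json
from typing import Any, Dict, Iterator, List

# Assumed limits; verify them against the current Event API documentation.
MAX_EVENTS_PER_BATCH = 100
MAX_PAYLOAD_BYTES = 1_000_000


def iter_batches(
    events: List[Dict[str, Any]],
    max_events: int = MAX_EVENTS_PER_BATCH,
    max_bytes: int = MAX_PAYLOAD_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of events that respect both a count cap and a byte cap."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 2  # the enclosing "[" and "]"
    for event in events:
        # Serialized size of this event plus a ", " separator. json.dumps with
        # default settings matches how requests encodes a json= payload, so
        # the running total is a slight overestimate of the real body size.
        event_bytes = len(json.dumps(event).encode("utf-8")) + 2
        would_overflow = batch_bytes + event_bytes > max_bytes
        if batch and (len(batch) >= max_events or would_overflow):
            yield batch
            batch, batch_bytes = [], 2
        # A single event larger than max_bytes is still emitted on its own,
        # because it cannot be split; the caller decides how to handle it.
        batch.append(event)
        batch_bytes += event_bytes
    if batch:
        yield batch
```

Inside send_to_newrelic, the range-based loop could then be replaced with `for batch in iter_batches(metrics_data):`, leaving the POST call unchanged.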

✅ Section 4.2 completion check

🎯 Learning objectives check

When you have finished this section, check whether you can do the following:

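🖥️ Physical and virtual server monitoring

  • [ ] Configure Linux/Windows servers for enterprise use
  • [ ] Implement comprehensive monitoring of a VMware vSphere environment
  • [ ] Set up detailed monitoring of a Hyper-V environment
  • [ ] Add security and compliance monitoring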

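☁️ Integrated cloud monitoring

  • [ ] Set up the integration with AWS CloudFormation
  • [ ] Build monitoring for a multi-account environment
  • [ ] Implement cost monitoring and optimization recommendations
  • [ ] Configure forecasting and alerts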

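🏢 Enterprise capabilities

  • [ ] Design a monitoring architecture for large-scale environments
  • [ ] Address compliance requirements
  • [ ] Analyze and implement cost-efficiency improvements
  • [ ] Build a cross-organizational monitoring practice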

🚀 Next steps

Once you have mastered server and cloud monitoring, move on to the next section:


🎯 Next step: In 4.3 Container and Kubernetes Environments, learn how to monitor modern containerized environments!