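4.2 Server, VM, and Cloud Monitoring - Comprehensive Monitoring for Integrated Infrastructure
Modern enterprise environments are typically hybrid: on-premises servers, virtualized environments, and public clouds combined in complex ways. New Relic Infrastructure monitors these different infrastructure types through a single platform, which helps streamline operations.
This section walks systematically through practical techniques for monitoring every kind of infrastructure, from physical servers to modern multi-cloud environments.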
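🎯 Learning Objectives for This Section
📊 Technical Skills
- Advanced monitoring configuration for Linux and Windows servers
- Specialized monitoring of virtualized environments (VMware, Hyper-V)
- Strategic implementation of multi-cloud integration
- Optimization of hybrid cloud monitoring
🏢 Business Value
- Lower operating costs: efficiency gains from unified monitoring (30-50% reduction)
- Higher availability: failure prevention through proactive monitoring (60% shorter MTTR)
- Resource optimization: cost savings from usage analysis (20-40% reduction)
- Compliance: automated handling of regulatory requirements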
🖥️ Advanced Monitoring of Physical and Virtual Servers
🐧 Enterprise Configuration for Linux Server Monitoring
⚙️ Advanced Agent Configuration
yaml
# /etc/newrelic-infra.yml - enterprise Linux configuration
license_key: YOUR_ENTERPRISE_LICENSE_KEY
display_name: "{{.environment}}-{{.role}}-{{.hostname}}"
# Performance tuning (sample rates are integers, in seconds)
metrics_system_sample_rate: 15
metrics_process_sample_rate: 20
metrics_network_sample_rate: 10
metrics_storage_sample_rate: 20
# Enterprise environment identification
custom_attributes:
  # Infrastructure classification
  environment: production
  data_center: tokyo-dc1
  rack_location: "rack-A-15"
  server_class: bare_metal
  # Business information
  business_unit: ecommerce
  cost_center: infrastructure
  service_tier: tier1
  criticality: mission_critical
  # Compliance
  compliance_zone: pci_dss
  data_classification: confidential
  backup_policy: daily_encrypted
  retention_policy: 7years
  # Operations
  maintenance_window: "02:00-04:00_JST"
  primary_contact: "infrastructure-team"
  escalation_policy: "critical_infra"
# Detailed system monitoring
enable_process_metrics: true
process_config:
  # Web servers
  - name: "nginx_processes"
    match:
      - "nginx: master process"
      - "nginx: worker process"
    attributes:
      service: web_tier
      component: reverse_proxy
      monitoring_level: comprehensive
  # Application servers
  - name: "java_applications"
    match:
      - "java.*tomcat"
      - "java.*spring"
      - "java.*jetty"
    attributes:
      service: app_tier
      component: application_server
      jvm_monitoring: enabled
      gc_monitoring: detailed
  # Databases
  - name: "database_servers"
    match:
      - "postgres.*server"
      - "mysql.*server"
    attributes:
      service: data_tier
      component: database
      replication_monitoring: enabled
      backup_monitoring: enabled
# Detailed network monitoring
network_interface_filters:
  enabled_interface_filters:
    - "eth*"    # physical interfaces
    - "en*"     # newer predictable interface names
    - "bond*"   # bonding
    - "team*"   # teaming
  disabled_interface_filters:
    - "lo"      # loopback
    - "docker*" # Docker virtual interfaces
    - "br-*"    # bridges
    - "veth*"   # virtual Ethernet
# Advanced storage monitoring
file_systems_config:
  # Critical partitions to monitor
  include_file_systems:
    - mount_point: "/"
      fs_type: "ext4"
      attributes:
        partition_type: root
        backup_required: true
        monitoring_level: critical
    - mount_point: "/var/log"
      fs_type: "ext4"
      attributes:
        partition_type: logs
        log_rotation: enabled
        retention_days: 30
    - mount_point: "/opt/app"
      fs_type: "ext4"
      attributes:
        partition_type: application
        backup_required: true
        snapshot_enabled: true
    - mount_point: "/data"
      fs_type: "xfs"
      attributes:
        partition_type: database
        backup_required: true
        encryption: enabled
  # File system types to exclude
  ignore_file_system_types:
    - "tmpfs"
    - "devtmpfs"
    - "sysfs"
    - "proc"
    - "squashfs"
# Security settings
strip_command_line: true
disable_cloud_metadata: false
http_server_enabled: true
http_server_host: "127.0.0.1"
http_server_port: 8003
# Log management
log_file: "/var/log/newrelic-infra/newrelic-infra.log"
log_format: "json"
log_to_stdout: false
verbose: 1
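To reason about which interfaces the include and exclude patterns above would select, the short Python sketch below evaluates the same glob patterns against a list of interface names. It is only an illustration: the function names are hypothetical, and it assumes that exclusions take precedence over inclusions, which is a choice made for this sketch rather than documented agent behavior.

```python
from fnmatch import fnmatchcase

# Patterns copied from the configuration above.
ENABLED_PATTERNS = ["eth*", "en*", "bond*", "team*"]
DISABLED_PATTERNS = ["lo", "docker*", "br-*", "veth*"]


def is_monitored(interface, enabled=ENABLED_PATTERNS, disabled=DISABLED_PATTERNS):
    """Return True when the name matches an enabled pattern and no disabled one.

    Assumption for this sketch: exclusions win over inclusions.
    """
    if any(fnmatchcase(interface, pattern) for pattern in disabled):
        return False
    return any(fnmatchcase(interface, pattern) for pattern in enabled)


def select_interfaces(names):
    """Filter interface names, preserving their original order."""
    return [name for name in names if is_monitored(name)]


if __name__ == "__main__":
    sample = ["lo", "eth0", "ens192", "docker0", "veth1a2b", "bond0", "br-3f", "wlan0"]
    print(select_interfaces(sample))
```

Running a host's actual interface list (for example, the directory names under `/sys/class/net`) through `select_interfaces` is a quick way to spot an interface, such as a wireless `wlan0`, that none of the patterns cover.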
📊 System Performance Monitoring Script
bash
#!/bin/bash
# Enterprise Linux system monitoring script
# /usr/local/bin/enterprise-system-monitor.sh
# Configuration
NEWRELIC_INSERT_KEY="YOUR_INSERT_KEY"
NEWRELIC_ACCOUNT_ID="YOUR_ACCOUNT_ID"
HOSTNAME=$(hostname)
ENVIRONMENT="production"
# API endpoint
INSIGHTS_API="https://insights-collector.newrelic.com/v1/accounts/$NEWRELIC_ACCOUNT_ID/events"
# Collect detailed system metrics
collect_system_metrics() {
echo "=== Collecting Enterprise System Metrics ==="
# CPU details
local cpu_cores=$(nproc)
local load_1min load_5min load_15min
read -r load_1min load_5min load_15min _ < /proc/loadavg
# Memory details (column 6 of `free -b` is buff/cache on current procps)
local memory_total=$(free -b | awk '/^Mem:/ {print $2}')
local memory_used=$(free -b | awk '/^Mem:/ {print $3}')
local memory_free=$(free -b | awk '/^Mem:/ {print $4}')
local memory_cached=$(free -b | awk '/^Mem:/ {print $6}')
local swap_total=$(free -b | awk '/^Swap:/ {print $2}')
local swap_used=$(free -b | awk '/^Swap:/ {print $3}')
# Disk I/O statistics; a single iostat run is reused for all three values.
# Assumption: r/s and w/s are columns 4 and 5. Column positions differ between
# sysstat versions, so check the header of `iostat -x` on your hosts.
# %util is read from the last column. "+0" makes awk print 0 when no device matches.
local iostat_out=$(iostat -x 1 1 | grep -E '^(sd|nvme)')
local disk_reads=$(echo "$iostat_out" | awk '{sum+=$4} END {print sum+0}')
local disk_writes=$(echo "$iostat_out" | awk '{sum+=$5} END {print sum+0}')
local disk_util=$(echo "$iostat_out" | awk '{if ($NF>max) max=$NF} END {print max+0}')
# Network statistics (eth0 only; adjust the interface name for your hosts)
local net_rx_bytes=$(cat /sys/class/net/eth0/statistics/rx_bytes 2>/dev/null || echo 0)
local net_tx_bytes=$(cat /sys/class/net/eth0/statistics/tx_bytes 2>/dev/null || echo 0)
local net_rx_errors=$(cat /sys/class/net/eth0/statistics/rx_errors 2>/dev/null || echo 0)
local net_tx_errors=$(cat /sys/class/net/eth0/statistics/tx_errors 2>/dev/null || echo 0)
# Process statistics (--no-headers keeps the header row out of the counts;
# "+0" makes awk print 0 instead of an empty string when nothing matches)
local total_processes=$(ps -e --no-headers | wc -l)
local running_processes=$(ps -eo stat --no-headers | awk '$1 ~ /^R/ {count++} END {print count+0}')
local zombie_processes=$(ps -eo stat --no-headers | awk '$1 ~ /^Z/ {count++} END {print count+0}')
# Send to New Relic
curl -X POST "$INSIGHTS_API" \
-H "Content-Type: application/json" \
-H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
-d "[{
\"eventType\": \"EnterpriseSystemMetrics\",
\"timestamp\": $(date +%s),
\"hostname\": \"$HOSTNAME\",
\"environment\": \"$ENVIRONMENT\",
\"cpu.cores\": $cpu_cores,
\"cpu.load_1min\": $load_1min,
\"cpu.load_5min\": $load_5min,
\"cpu.load_15min\": $load_15min,
\"memory.total_bytes\": $memory_total,
\"memory.used_bytes\": $memory_used,
\"memory.free_bytes\": $memory_free,
\"memory.cached_bytes\": $memory_cached,
\"memory.usage_percent\": $(awk -v u="$memory_used" -v t="$memory_total" 'BEGIN { printf "%.2f", (t > 0) ? u * 100 / t : 0 }'),
\"swap.total_bytes\": $swap_total,
\"swap.used_bytes\": $swap_used,
\"disk.reads_per_sec\": $disk_reads,
\"disk.writes_per_sec\": $disk_writes,
\"disk.utilization_percent\": $disk_util,
\"network.rx_bytes\": $net_rx_bytes,
\"network.tx_bytes\": $net_tx_bytes,
\"network.rx_errors\": $net_rx_errors,
\"network.tx_errors\": $net_tx_errors,
\"processes.total\": $total_processes,
\"processes.running\": $running_processes,
\"processes.zombie\": $zombie_processes
}]"
}
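# Security monitoring
collect_security_metrics() {
echo "=== Collecting Security Metrics ==="
# Failed login count (Debian/Ubuntu log path; RHEL-family systems use /var/log/secure)
local failed_logins=$(grep -c "Failed password" /var/log/auth.log 2>/dev/null)
failed_logins=${failed_logins:-0}
# Recent sudo usage (capped at the last 10 matching entries)
local sudo_usage=$(grep "sudo:" /var/log/auth.log 2>/dev/null | tail -n 10 | wc -l)
# File changes in critical directories (requires AIDE)
local system_file_changes=0
if [ -f "/var/log/aide/aide.log" ]; then
# grep -c already prints 0 when nothing matches, so "|| echo 0" would emit a second line
system_file_changes=$(grep -c "changed" /var/log/aide/aide.log 2>/dev/null)
system_file_changes=${system_file_changes:-0}
fi
# Listening sockets (ss -l lists listeners, not established connections)
local open_connections=$(ss -tuln | grep LISTEN | wc -l)
# Suspicious process detection (simplified): an exact process-name match avoids
# false hits on names that merely contain "nc", such as rsync
local suspicious_processes=$(pgrep -c -x 'nc|netcat|ncat')
suspicious_processes=${suspicious_processes:-0}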
curl -X POST "$INSIGHTS_API" \
-H "Content-Type: application/json" \
-H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
-d "[{
\"eventType\": \"SecurityMetrics\",
\"timestamp\": $(date +%s),
\"hostname\": \"$HOSTNAME\",
\"environment\": \"$ENVIRONMENT\",
\"security.failed_logins\": $failed_logins,
\"security.sudo_usage\": $sudo_usage,
\"security.file_changes\": $system_file_changes,
\"security.open_connections\": $open_connections,
\"security.suspicious_processes\": $suspicious_processes,
\"security.last_update\": \"$(date -Iseconds)\"
}]"
}
# Application-specific metrics
collect_application_metrics() {
echo "=== Collecting Application Metrics ==="
# Database connections (PostgreSQL example)
local db_connections=0
if command -v psql >/dev/null 2>&1; then
db_connections=$(psql -At -c "SELECT count(*) FROM pg_stat_activity;" 2>/dev/null)
# Fall back to 0 when psql cannot connect and prints nothing
db_connections=${db_connections:-0}
fi
# Web server statistics (Nginx example)
local nginx_active_connections=0
local nginx_requests_per_min=0
if [ -f "/var/log/nginx/access.log" ]; then
# Established TCP connections on ports 80 and 443
nginx_active_connections=$(ss -Htn state established '( sport = :80 or sport = :443 )' | wc -l)
# Requests logged during the previous full minute
# (assumes the default access log timestamp, e.g. [10/Oct/2024:13:55:36 +0900])
local last_minute=$(LC_ALL=C date -d '1 minute ago' '+%d/%b/%Y:%H:%M')
nginx_requests_per_min=$(grep -c "\[$last_minute" /var/log/nginx/access.log)
fi
# Redis statistics
local redis_connected_clients=0
local redis_memory_usage=0
if command -v redis-cli >/dev/null 2>&1; then
redis_connected_clients=$(redis-cli info clients 2>/dev/null | grep "^connected_clients:" | cut -d: -f2 | tr -d '\r')
redis_connected_clients=${redis_connected_clients:-0}
redis_memory_usage=$(redis-cli info memory 2>/dev/null | grep "^used_memory:" | cut -d: -f2 | tr -d '\r')
redis_memory_usage=${redis_memory_usage:-0}
fi
curl -X POST "$INSIGHTS_API" \
-H "Content-Type: application/json" \
-H "X-Insert-Key: $NEWRELIC_INSERT_KEY" \
-d "[{
\"eventType\": \"ApplicationMetrics\",
\"timestamp\": $(date +%s),
\"hostname\": \"$HOSTNAME\",
\"environment\": \"$ENVIRONMENT\",
\"database.connections\": $db_connections,
\"webserver.active_connections\": $nginx_active_connections,
\"webserver.requests_per_minute\": $nginx_requests_per_min,
\"cache.connected_clients\": $redis_connected_clients,
\"cache.memory_usage_bytes\": $redis_memory_usage
}]"
}
# Main entry point
main() {
echo "Starting Enterprise System Monitoring for $HOSTNAME"
echo "Timestamp: $(date)"
# Collect each group of metrics
collect_system_metrics
collect_security_metrics
collect_application_metrics
echo "Monitoring data collection completed"
}
# Select what to run from the first argument
case "$1" in
"system")
collect_system_metrics
;;
"security")
collect_security_metrics
;;
"application")
collect_application_metrics
;;
"all"|"")
main
;;
*)
echo "Usage: $0 {system|security|application|all}"
exit 1
;;
esac
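The script above assembles its JSON payload by interpolating shell variables into a string, so a command that prints nothing (or prints two lines) produces invalid JSON and the whole event is rejected. As a hedged alternative, the sketch below shows the same idea in Python: raw command output is coerced to a number with a default, and the payload is serialized with `json.dumps`. The helper names (`to_number`, `build_event`, `to_payload`) are hypothetical and exist only for this illustration.

```python
import json
import math
import time


def to_number(raw, default=0):
    """Convert raw command output to int or float; use `default` if empty or malformed."""
    text = str(raw).strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        value = float(text)
    except ValueError:
        return default
    # NaN and infinity are not valid JSON numbers, so treat them as missing.
    return value if math.isfinite(value) else default


def build_event(event_type, hostname, environment, raw_metrics):
    """Build one flat event dict from a mapping of metric name to raw text."""
    event = {
        "eventType": event_type,
        "timestamp": int(time.time()),
        "hostname": hostname,
        "environment": environment,
    }
    for key, raw in raw_metrics.items():
        event[key] = to_number(raw)
    return event


def to_payload(events):
    """Serialize a list of events as a JSON array string."""
    return json.dumps(events)


if __name__ == "__main__":
    raw = {"cpu.load_1min": "0.42", "processes.zombie": "", "security.file_changes": "0\n0"}
    print(to_payload([build_event("EnterpriseSystemMetrics", "web01", "production", raw)]))
```

Because serialization happens in one place, a missing value degrades to `0` instead of breaking the request; whether `0` or omitting the attribute is the better default depends on how the data is queried later.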
🪟 Windows Server Monitoring
⚙️ Windows Agent Configuration
yaml
# C:\Program Files\New Relic\newrelic-infra\newrelic-infra.yml
license_key: YOUR_ENTERPRISE_LICENSE_KEY
display_name: "WIN-{{.environment}}-{{.hostname}}"
# Windows-specific settings
enable_win_services: true
enable_win_processes: true
# Custom attributes (Windows environment)
custom_attributes:
  # System information
  os_family: windows
  os_version: "2019"
  domain: "corp.company.com"
  # Business classification
  environment: production
  business_unit: finance
  application_tier: web_tier
  # Windows-specific
  windows_edition: "Standard"
  active_directory: enabled
  exchange_server: true
  iis_role: enabled
# Windows service monitoring
win_services_config:
  enabled_services:
    - "W3SVC"          # IIS
    - "MSSQLSERVER"    # SQL Server
    - "SQLSERVERAGENT" # SQL Server Agent
    - "DNS"            # DNS Server
    - "DHCP"           # DHCP Server
    - "Spooler"        # Print Spooler
    - "Schedule"       # Task Scheduler
    - "EventLog"       # Windows Event Log
# Windows process monitoring
win_process_config:
  - name: "iis_processes"
    match:
      - "w3wp.exe"
      - "iisexpress.exe"
    attributes:
      service: web_server
      tier: frontend
  - name: "sql_server_processes"
    match:
      - "sqlservr.exe"
      - "sqlagent.exe"
    attributes:
      service: database
      tier: data
# Performance counters
performance_counters:
  - name: "processor_utilization"
    counter: "\\Processor(_Total)\\% Processor Time"
    attributes:
      metric_type: system_performance
  - name: "memory_available"
    counter: "\\Memory\\Available MBytes"
    attributes:
      metric_type: system_performance
  - name: "iis_requests"
    counter: "\\Web Service(_Total)\\Total Method Requests/sec"
    attributes:
      metric_type: application_performance
📊 Windows PowerShell Monitoring Script
powershell
# Windows enterprise monitoring script
# C:\Scripts\Enterprise-Windows-Monitor.ps1
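param(
[Parameter(Mandatory=$true)]
[string]$NewRelicInsertKey,
[Parameter(Mandatory=$true)]
[string]$NewRelicAccountId,
[string]$Environment = "production",
# Which collector to run. $args is not populated in a script whose param block
# uses [Parameter()] attributes, so the selection is a named parameter.
[ValidateSet("system", "iis", "sqlserver", "ad", "all")]
[string]$Mode = "all"
)
# Configuration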
$InsightsAPI = "https://insights-collector.newrelic.com/v1/accounts/$NewRelicAccountId/events"
$Hostname = $env:COMPUTERNAME
# Collect system metrics
function Collect-SystemMetrics {
Write-Host "=== Collecting Windows System Metrics ===" -ForegroundColor Green
# CPU usage
$CPUUsage = Get-WmiObject Win32_Processor | Measure-Object -Property LoadPercentage -Average | Select-Object -ExpandProperty Average
# Memory information
$TotalMemory = (Get-WmiObject Win32_ComputerSystem).TotalPhysicalMemory
$FreeMemory = (Get-WmiObject Win32_OperatingSystem).FreePhysicalMemory * 1024
$UsedMemory = $TotalMemory - $FreeMemory
$MemoryUsagePercent = [math]::Round(($UsedMemory / $TotalMemory) * 100, 2)
# Disk information
$DiskInfo = Get-WmiObject Win32_LogicalDisk | Where-Object {$_.DriveType -eq 3} | ForEach-Object {
[PSCustomObject]@{
DriveLetter = $_.DeviceID
TotalSize = $_.Size
FreeSpace = $_.FreeSpace
UsedSpace = $_.Size - $_.FreeSpace
UsagePercent = [math]::Round((($_.Size - $_.FreeSpace) / $_.Size) * 100, 2)
}
}
# Process statistics
$TotalProcesses = (Get-Process).Count
$RunningServices = (Get-Service | Where-Object {$_.Status -eq 'Running'}).Count
$StoppedServices = (Get-Service | Where-Object {$_.Status -eq 'Stopped'}).Count
# Network statistics
$NetworkAdapters = Get-WmiObject Win32_NetworkAdapter | Where-Object {$_.NetEnabled -eq $true}
$ActiveConnections = (Get-NetTCPConnection | Where-Object {$_.State -eq 'Established'}).Count
# Build the metrics payload
$MetricsData = @{
eventType = "WindowsSystemMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"cpu.usage_percent" = $CPUUsage
"memory.total_bytes" = $TotalMemory
"memory.used_bytes" = $UsedMemory
"memory.free_bytes" = $FreeMemory
"memory.usage_percent" = $MemoryUsagePercent
"processes.total" = $TotalProcesses
"services.running" = $RunningServices
"services.stopped" = $StoppedServices
"network.active_connections" = $ActiveConnections
"network.adapters_enabled" = $NetworkAdapters.Count
}
# Add per-drive disk information
foreach ($Disk in $DiskInfo) {
$DriveLetter = $Disk.DriveLetter.Replace(":", "")
$MetricsData["disk.$DriveLetter.total_bytes"] = $Disk.TotalSize
$MetricsData["disk.$DriveLetter.used_bytes"] = $Disk.UsedSpace
$MetricsData["disk.$DriveLetter.free_bytes"] = $Disk.FreeSpace
$MetricsData["disk.$DriveLetter.usage_percent"] = $Disk.UsagePercent
}
# Send to New Relic
Send-MetricsToNewRelic -Data $MetricsData
}
# IIS monitoring
function Collect-IISMetrics {
Write-Host "=== Collecting IIS Metrics ===" -ForegroundColor Green
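# Get-WindowsFeature identifies the IIS role as Web-Server; test .Installed because
# the cmdlet returns an object even when the role is not installed
if ((Get-WindowsFeature -Name Web-Server -ErrorAction SilentlyContinue).Installed) {
# Retrieve IIS statistics
$IISSites = Get-IISSite
$W3WPProcesses = Get-Process -Name w3wp -ErrorAction SilentlyContinue
# Application pool statistics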
$AppPools = Get-IISAppPool
$RunningAppPools = ($AppPools | Where-Object {$_.State -eq 'Started'}).Count
$StoppedAppPools = ($AppPools | Where-Object {$_.State -eq 'Stopped'}).Count
$IISData = @{
eventType = "WindowsIISMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"iis.sites_total" = $IISSites.Count
"iis.worker_processes" = $W3WPProcesses.Count
"iis.app_pools_running" = $RunningAppPools
"iis.app_pools_stopped" = $StoppedAppPools
}
Send-MetricsToNewRelic -Data $IISData
}
}
# SQL Server monitoring
function Collect-SQLServerMetrics {
Write-Host "=== Collecting SQL Server Metrics ===" -ForegroundColor Green
try {
# Check the SQL Server service
$SQLService = Get-Service -Name "MSSQLSERVER" -ErrorAction SilentlyContinue
if ($SQLService -and $SQLService.Status -eq 'Running') {
# Attempt a SQL Server connection
$ConnectionString = "Server=localhost;Database=master;Integrated Security=true;Connection Timeout=10;"
$Connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
$Connection.Open()
# Basic statistics query
$Command = $Connection.CreateCommand()
$Command.CommandText = @"
SELECT
(SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE is_user_process = 1) as ActiveConnections,
(SELECT COUNT(*) FROM sys.databases WHERE state = 0) as OnlineDatabases,
(SELECT COUNT(*) FROM sys.dm_exec_requests) as ActiveRequests
"@
$Reader = $Command.ExecuteReader()
if ($Reader.Read()) {
$SQLData = @{
eventType = "WindowsSQLServerMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
# ToString() sends "Running" rather than the enum's numeric value
"sqlserver.service_status" = $SQLService.Status.ToString()
"sqlserver.active_connections" = $Reader["ActiveConnections"]
"sqlserver.online_databases" = $Reader["OnlineDatabases"]
"sqlserver.active_requests" = $Reader["ActiveRequests"]
}
Send-MetricsToNewRelic -Data $SQLData
}
$Reader.Close()
$Connection.Close()
}
}
catch {
Write-Warning "SQL Server metrics collection failed: $($_.Exception.Message)"
}
}
# Active Directory monitoring
function Collect-ActiveDirectoryMetrics {
Write-Host "=== Collecting Active Directory Metrics ===" -ForegroundColor Green
try {
# Confirm that this server is a domain controller
$DCRole = Get-WindowsFeature -Name AD-Domain-Services -ErrorAction SilentlyContinue
if ($DCRole -and $DCRole.InstallState -eq 'Installed') {
# Retrieve AD statistics
$ADUsers = (Get-ADUser -Filter * -ErrorAction SilentlyContinue).Count
$ADComputers = (Get-ADComputer -Filter * -ErrorAction SilentlyContinue).Count
$ADGroups = (Get-ADGroup -Filter * -ErrorAction SilentlyContinue).Count
# Check the forest-wide FSMO roles. -ExpandProperty accepts a single property,
# and role holders are reported as FQDNs, so match on the host-name prefix.
$Forest = Get-ADForest
$FSMORoles = @($Forest.SchemaMaster, $Forest.DomainNamingMaster)
$IsFSMOHolder = [bool]($FSMORoles | Where-Object { $_ -like "$env:COMPUTERNAME.*" })
$ADData = @{
eventType = "WindowsActiveDirectoryMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"ad.users_count" = $ADUsers
"ad.computers_count" = $ADComputers
"ad.groups_count" = $ADGroups
"ad.is_fsmo_holder" = $IsFSMOHolder
"ad.service_running" = (Get-Service -Name "NTDS" -ErrorAction SilentlyContinue).Status -eq 'Running'
}
Send-MetricsToNewRelic -Data $ADData
}
}
catch {
Write-Warning "Active Directory metrics collection failed: $($_.Exception.Message)"
}
}
# Send metrics to New Relic
function Send-MetricsToNewRelic {
param(
[hashtable]$Data
)
try {
$JsonData = $Data | ConvertTo-Json -Compress
$Body = "[$JsonData]"
$Headers = @{
'Content-Type' = 'application/json'
'X-Insert-Key' = $NewRelicInsertKey
}
Invoke-RestMethod -Uri $InsightsAPI -Method POST -Headers $Headers -Body $Body
Write-Host "Metrics sent successfully for $($Data.eventType)" -ForegroundColor Green
}
catch {
Write-Error "Failed to send metrics to New Relic: $($_.Exception.Message)"
}
}
# Main entry point
function Main {
Write-Host "Starting Windows Enterprise Monitoring for $Hostname" -ForegroundColor Cyan
Write-Host "Timestamp: $(Get-Date)" -ForegroundColor Cyan
# Run each metrics collector
Collect-SystemMetrics
Collect-IISMetrics
Collect-SQLServerMetrics
Collect-ActiveDirectoryMetrics
Write-Host "Windows monitoring data collection completed" -ForegroundColor Cyan
}
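# Select what to run from the -Mode script parameter ($args is not populated
# in a script whose param block uses [Parameter()] attributes)
switch ($Mode) {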
"system" { Collect-SystemMetrics }
"iis" { Collect-IISMetrics }
"sqlserver" { Collect-SQLServerMetrics }
"ad" { Collect-ActiveDirectoryMetrics }
default { Main }
}
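`Collect-SystemMetrics` above flattens per-drive statistics into attribute names such as `disk.C.total_bytes`, because each event is a flat set of key-value pairs. The Python sketch below restates that flattening step on its own so the naming scheme is easy to inspect. The function name and the input dictionary keys (`drive`, `total_bytes`, `free_bytes`) are assumptions made for this illustration, and the guard against a zero-size drive is an addition that the PowerShell version does not have.

```python
def flatten_disks(disks):
    """Turn a list of per-drive dicts into flat event attributes.

    Each input dict is assumed to look like:
        {"drive": "C:", "total_bytes": 200, "free_bytes": 50}
    """
    attributes = {}
    for disk in disks:
        letter = disk["drive"].rstrip(":")
        total = disk["total_bytes"] or 0
        free = disk["free_bytes"] or 0
        used = total - free
        prefix = f"disk.{letter}"
        attributes[f"{prefix}.total_bytes"] = total
        attributes[f"{prefix}.used_bytes"] = used
        attributes[f"{prefix}.free_bytes"] = free
        # Guard against drives that report a size of zero.
        attributes[f"{prefix}.usage_percent"] = (
            round(used / total * 100, 2) if total else 0.0
        )
    return attributes


if __name__ == "__main__":
    print(flatten_disks([
        {"drive": "C:", "total_bytes": 200, "free_bytes": 50},
        {"drive": "D:", "total_bytes": 0, "free_bytes": 0},
    ]))
```

One trade-off of this naming scheme is that the attribute set varies from host to host; emitting one event per drive with a `drive` attribute is a common alternative when drives need to be compared in a single query.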
☁️ Monitoring Virtualized Environments
🔧 Monitoring VMware Environments
⚙️ VMware vSphere Integration Configuration
yaml
# VMware vSphere integration configuration
# /etc/newrelic-infra/integrations.d/vmware-vsphere.yml
integrations:
  - name: nri-vmware-vsphere
    env:
      # vCenter connection details
      VCENTER_URL: "https://vcenter.company.com/sdk"
      VCENTER_USER: "[email protected]"
      VCENTER_PASS: "secure_monitoring_password"
      # SSL settings
      VALIDATE_SSL: true
      CA_BUNDLE_FILE: "/etc/ssl/certs/ca-bundle.crt"
      # Collection settings
      METRICS: true
      EVENTS: true
      INVENTORY: true
      # Advanced settings
      DATACENTER_LOCATION: "tokyo-dc1"
      ENABLE_VM_METRICS: true
      ENABLE_HOST_METRICS: true
      ENABLE_CLUSTER_METRICS: true
      ENABLE_DATASTORE_METRICS: true
      ENABLE_RESOURCE_POOL_METRICS: true
      # Performance settings
      BATCH_SIZE: 100
      TIMEOUT: 60
    interval: 300s # every 5 minutes
    labels:
      environment: production
      virtualization: vmware
      datacenter: tokyo-dc1
      integration: vsphere
    # Resource filter
    inventory_source: vmware
    # Custom attribute mapping
    custom_attributes:
      vm_monitoring_level: detailed
      host_monitoring_level: comprehensive
      cluster_monitoring_level: summary
📊 Detailed VMware Monitoring Script
python
#!/usr/bin/env python3
"""
VMware enterprise monitoring script
"""
import ssl
import time
from datetime import datetime, timezone

import requests
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
class VMwareMonitor:
    def __init__(self, vcenter_host, username, password, newrelic_insert_key, account_id):
        self.vcenter_host = vcenter_host
        self.username = username
        self.password = password
        self.newrelic_insert_key = newrelic_insert_key
        self.account_id = account_id
        self.insights_api = f"https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"
        self.service_instance = None
    def connect(self):
        """Connect to vCenter."""
        try:
            # Certificate verification is disabled here for brevity;
            # in production, verify against a trusted certificate instead.
            context = ssl.create_default_context()
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
            self.service_instance = SmartConnect(
                host=self.vcenter_host,
                user=self.username,
                pwd=self.password,
                sslContext=context
            )
            print(f"✅ Connected to vCenter: {self.vcenter_host}")
            return True
        except Exception as e:
            print(f"❌ Failed to connect to vCenter: {e}")
            return False
    def disconnect(self):
        """Disconnect from vCenter."""
        if self.service_instance:
            Disconnect(self.service_instance)
            print("📤 Disconnected from vCenter")
    def collect_cluster_metrics(self):
        """Collect cluster statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            cluster_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.ClusterComputeResource], True
            )
            cluster_metrics = []
            for cluster in cluster_view.view:
                # Basic information
                cluster_info = {
                    'eventType': 'VMwareClusterMetrics',
                    'timestamp': int(time.time()),
                    'cluster_name': cluster.name,
                    'environment': 'production'
                }
                # Host statistics
                total_hosts = len(cluster.host)
                connected_hosts = sum(1 for host in cluster.host if host.runtime.connectionState == 'connected')
                # Resource statistics
                if cluster.summary:
                    cluster_info.update({
                        'cluster.total_hosts': total_hosts,
                        'cluster.connected_hosts': connected_hosts,
                        'cluster.total_cpu_cores': cluster.summary.numCpuCores or 0,
                        'cluster.total_cpu_threads': cluster.summary.numCpuThreads or 0,
                        'cluster.total_memory_mb': cluster.summary.totalMemory // (1024*1024) if cluster.summary.totalMemory else 0,
                        'cluster.ha_enabled': cluster.configuration.dasConfig.enabled if cluster.configuration.dasConfig else False,
                        'cluster.drs_enabled': cluster.configuration.drsConfig.enabled if cluster.configuration.drsConfig else False
                    })
                # VM statistics
                total_vms = 0
                powered_on_vms = 0
                for host in cluster.host:
                    total_vms += len(host.vm)
                    powered_on_vms += sum(1 for vm in host.vm if vm.runtime.powerState == 'poweredOn')
                cluster_info.update({
                    'cluster.total_vms': total_vms,
                    'cluster.powered_on_vms': powered_on_vms
                })
                cluster_metrics.append(cluster_info)
            cluster_view.Destroy()
            return cluster_metrics
        except Exception as e:
            print(f"❌ Failed to collect cluster metrics: {e}")
            return []
    def collect_host_metrics(self):
        """Collect ESXi host statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            host_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.HostSystem], True
            )
            host_metrics = []
            for host in host_view.view:
                if host.runtime.connectionState != 'connected':
                    continue
                host_info = {
                    'eventType': 'VMwareHostMetrics',
                    'timestamp': int(time.time()),
                    'host_name': host.name,
                    'environment': 'production'
                }
                # Basic information
                if host.summary:
                    host_info.update({
                        'host.connection_state': host.runtime.connectionState,
                        'host.power_state': host.runtime.powerState,
                        'host.cpu_cores': host.summary.hardware.numCpuCores,
                        'host.cpu_threads': host.summary.hardware.numCpuThreads,
                        'host.cpu_mhz': host.summary.hardware.cpuMhz,
                        'host.memory_mb': host.summary.hardware.memorySize // (1024*1024),
                        'host.esxi_version': host.config.product.version if host.config else 'unknown',
                        'host.esxi_build': host.config.product.build if host.config else 'unknown'
                    })
                # Performance statistics
                if host.summary.quickStats:
                    cpu_used = host.summary.quickStats.overallCpuUsage or 0
                    memory_used = host.summary.quickStats.overallMemoryUsage or 0
                    host_info.update({
                        'host.cpu_usage_mhz': cpu_used,
                        'host.memory_usage_mb': memory_used,
                        'host.uptime_seconds': host.summary.quickStats.uptime or 0
                    })
                    # Usage percentages (unset quick stats are treated as 0)
                    if host.summary.hardware:
                        total_cpu = host.summary.hardware.numCpuCores * host.summary.hardware.cpuMhz
                        total_memory = host.summary.hardware.memorySize // (1024*1024)
                        host_info['host.cpu_usage_percent'] = round(
                            (cpu_used / total_cpu) * 100, 2
                        ) if total_cpu > 0 else 0
                        host_info['host.memory_usage_percent'] = round(
                            (memory_used / total_memory) * 100, 2
                        ) if total_memory > 0 else 0
                # VM statistics
                if hasattr(host, 'vm'):
                    host_info.update({
                        'host.total_vms': len(host.vm),
                        'host.powered_on_vms': sum(1 for vm in host.vm if vm.runtime.powerState == 'poweredOn')
                    })
                host_metrics.append(host_info)
            host_view.Destroy()
            return host_metrics
        except Exception as e:
            print(f"❌ Failed to collect host metrics: {e}")
            return []
    def collect_vm_metrics(self):
        """Collect virtual machine statistics."""
        try:
            content = self.service_instance.RetrieveContent()
            vm_view = content.viewManager.CreateContainerView(
                content.rootFolder, [vim.VirtualMachine], True
            )
            vm_metrics = []
            for vm in vm_view.view:
                if not vm.summary:
                    continue
                vm_info = {
                    'eventType': 'VMwareVMMetrics',
                    'timestamp': int(time.time()),
                    'vm_name': vm.name,
                    'environment': 'production'
                }
                # Basic information
                vm_info.update({
                    'vm.power_state': vm.runtime.powerState,
                    'vm.connection_state': vm.runtime.connectionState,
                    'vm.cpu_count': vm.summary.config.numCpu,
                    'vm.memory_mb': vm.summary.config.memorySizeMB,
                    'vm.guest_os': vm.summary.config.guestFullName or 'unknown',
                    'vm.vm_tools_status': vm.summary.guest.toolsStatus if vm.summary.guest else 'unknown',
                    'vm.template': vm.summary.config.template
                })
                # Performance statistics (powered-on VMs only)
                if vm.runtime.powerState == 'poweredOn' and vm.summary.quickStats:
                    cpu_used = vm.summary.quickStats.overallCpuUsage or 0
                    host_memory_used = vm.summary.quickStats.hostMemoryUsage or 0
                    vm_info.update({
                        'vm.cpu_usage_mhz': cpu_used,
                        'vm.memory_usage_mb': host_memory_used,
                        'vm.guest_memory_usage_mb': vm.summary.quickStats.guestMemoryUsage or 0,
                        'vm.uptime_seconds': vm.summary.quickStats.uptimeSeconds or 0
                    })
                    # Usage percentages (unset quick stats are treated as 0)
                    if vm.summary.config.numCpu and vm.runtime.host:
                        host_cpu_mhz = vm.runtime.host.summary.hardware.cpuMhz
                        total_vm_cpu_mhz = vm.summary.config.numCpu * host_cpu_mhz
                        vm_info['vm.cpu_usage_percent'] = round(
                            (cpu_used / total_vm_cpu_mhz) * 100, 2
                        ) if total_vm_cpu_mhz > 0 else 0
                    if vm.summary.config.memorySizeMB:
                        vm_info['vm.memory_usage_percent'] = round(
                            (host_memory_used / vm.summary.config.memorySizeMB) * 100, 2
                        )
                # Disk information: committed is the space in use; uncommitted is the
                # additional space the VM may still claim, so provisioned is their sum.
                if vm.summary.storage:
                    committed = vm.summary.storage.committed or 0
                    uncommitted = vm.summary.storage.uncommitted or 0
                    vm_info.update({
                        'vm.provisioned_storage_gb': round((committed + uncommitted) / (1024**3), 2),
                        'vm.used_storage_gb': round(committed / (1024**3), 2)
                    })
                # Host information
                if vm.runtime.host:
                    vm_info['vm.host_name'] = vm.runtime.host.name
                vm_metrics.append(vm_info)
            vm_view.Destroy()
            return vm_metrics
        except Exception as e:
            print(f"❌ Failed to collect VM metrics: {e}")
            return []
    def send_to_newrelic(self, metrics_data):
        """Send metrics to New Relic."""
        if not metrics_data:
            return
        try:
            headers = {
                'Content-Type': 'application/json',
                'X-Insert-Key': self.newrelic_insert_key
            }
            # Send in batches of 100 events
            batch_size = 100
            for i in range(0, len(metrics_data), batch_size):
                batch = metrics_data[i:i+batch_size]
                response = requests.post(
                    self.insights_api,
                    headers=headers,
                    json=batch,
                    timeout=30
                )
                if response.status_code == 200:
                    print(f"✅ Sent {len(batch)} metrics to New Relic")
                else:
                    print(f"❌ Failed to send metrics: {response.status_code}")
        except Exception as e:
            print(f"❌ Failed to send metrics to New Relic: {e}")
    def run_monitoring(self):
        """Main monitoring routine."""
        print("🚀 Starting VMware Enterprise Monitoring")
        print(f"📅 Timestamp: {datetime.now(timezone.utc).isoformat()}")
        if not self.connect():
            return False
        try:
            # Collect each group of metrics
            print("📊 Collecting cluster metrics...")
            cluster_metrics = self.collect_cluster_metrics()
            print("🖥️ Collecting host metrics...")
            host_metrics = self.collect_host_metrics()
            print("💻 Collecting VM metrics...")
            vm_metrics = self.collect_vm_metrics()
            # Send the metrics
            all_metrics = cluster_metrics + host_metrics + vm_metrics
            if all_metrics:
                print(f"📤 Sending {len(all_metrics)} metrics to New Relic...")
                self.send_to_newrelic(all_metrics)
                print("✅ VMware monitoring completed successfully")
            else:
                print("⚠️ No metrics collected")
            return True
        except Exception as e:
            print(f"❌ Monitoring failed: {e}")
            return False
        finally:
            self.disconnect()
# Main entry point
if __name__ == "__main__":
    import os
    import sys

    # Read settings from environment variables
    VCENTER_HOST = os.environ.get('VCENTER_HOST', 'vcenter.company.com')
    VCENTER_USER = os.environ.get('VCENTER_USER', '[email protected]')
    VCENTER_PASS = os.environ.get('VCENTER_PASS', '')
    NEWRELIC_INSERT_KEY = os.environ.get('NEWRELIC_INSERT_KEY', '')
    NEWRELIC_ACCOUNT_ID = os.environ.get('NEWRELIC_ACCOUNT_ID', '')
    if not all([VCENTER_PASS, NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID]):
        print("❌ Required environment variables not set")
        sys.exit(1)
    monitor = VMwareMonitor(
        VCENTER_HOST, VCENTER_USER, VCENTER_PASS,
        NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID
    )
    success = monitor.run_monitoring()
    sys.exit(0 if success else 1)
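Two small pieces of logic in the script above, the usage-percentage calculation and the 100-event batching, can be isolated into pure functions that are testable without a vCenter connection. The sketch below is one way to do that; the names `usage_percent` and `batches` are hypothetical and not part of pyVmomi or any New Relic library.

```python
from typing import Iterator, List, Optional


def usage_percent(used: Optional[float], capacity: Optional[float]) -> float:
    """Return used/capacity as a percentage rounded to two decimals.

    Quick stats can be unset (None) for hosts or VMs that have just started,
    so missing or non-positive inputs yield 0.0 instead of raising.
    """
    if not used or not capacity or capacity <= 0:
        return 0.0
    return round(used / capacity * 100, 2)


def batches(events: List[dict], size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` events."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(events), size):
        yield events[start:start + size]


if __name__ == "__main__":
    # Example: a 4-core host at 2400 MHz per core currently using 1200 MHz.
    print(usage_percent(1200, 4 * 2400))
    print([len(batch) for batch in batches([{}] * 250)])
```

Keeping these helpers free of vCenter objects means they can be unit-tested directly, while the collector methods stay responsible only for reading values from the API.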
💻 Monitoring Hyper-V Environments
⚙️ Hyper-V Monitoring with PowerShell
powershell
# Hyper-V enterprise monitoring script
# C:\Scripts\Enterprise-HyperV-Monitor.ps1
param(
[Parameter(Mandatory=$true)]
[string]$NewRelicInsertKey,
[Parameter(Mandatory=$true)]
[string]$NewRelicAccountId,
[string]$Environment = "production"
)
# Configuration
$InsightsAPI = "https://insights-collector.newrelic.com/v1/accounts/$NewRelicAccountId/events"
$Hostname = $env:COMPUTERNAME
# Collect Hyper-V host information
function Collect-HyperVHostMetrics {
Write-Host "=== Collecting Hyper-V Host Metrics ===" -ForegroundColor Green
try {
# Confirm that the Hyper-V role is installed
$HyperVFeature = Get-WindowsFeature -Name Hyper-V -ErrorAction SilentlyContinue
if (-not ($HyperVFeature -and $HyperVFeature.InstallState -eq 'Installed')) {
Write-Warning "Hyper-V role not installed"
return
}
# Host resource information
$VMHost = Get-VMHost
$HostProcessor = Get-WmiObject Win32_Processor | Select-Object -First 1
$HostMemory = Get-WmiObject Win32_ComputerSystem
# Virtual machine statistics
$AllVMs = Get-VM
$RunningVMs = $AllVMs | Where-Object {$_.State -eq 'Running'}
$StoppedVMs = $AllVMs | Where-Object {$_.State -eq 'Off'}
$PausedVMs = $AllVMs | Where-Object {$_.State -eq 'Paused'}
# Virtual switch statistics
$VirtualSwitches = Get-VMSwitch
$ExternalSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'External'}
$InternalSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'Internal'}
$PrivateSwitches = $VirtualSwitches | Where-Object {$_.SwitchType -eq 'Private'}
# Resource pool statistics
$ProcessorPools = Get-VMResourcePool -ResourcePoolType Processor
$MemoryPools = Get-VMResourcePool -ResourcePoolType Memory
$HostData = @{
eventType = "HyperVHostMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"host.hyperv_version" = $VMHost.Version
"host.virtual_hard_disk_path" = $VMHost.VirtualHardDiskPath
"host.virtual_machine_path" = $VMHost.VirtualMachinePath
"host.processor_cores" = $HostProcessor.NumberOfCores
"host.logical_processors" = $HostProcessor.NumberOfLogicalProcessors
"host.total_memory_gb" = [math]::Round($HostMemory.TotalPhysicalMemory / 1GB, 2)
"vms.total" = $AllVMs.Count
"vms.running" = $RunningVMs.Count
"vms.stopped" = $StoppedVMs.Count
"vms.paused" = $PausedVMs.Count
"switches.total" = $VirtualSwitches.Count
"switches.external" = $ExternalSwitches.Count
"switches.internal" = $InternalSwitches.Count
"switches.private" = $PrivateSwitches.Count
"resource_pools.processor" = $ProcessorPools.Count
"resource_pools.memory" = $MemoryPools.Count
}
Send-MetricsToNewRelic -Data $HostData
}
catch {
Write-Error "Failed to collect Hyper-V host metrics: $($_.Exception.Message)"
}
}
# Collect detailed virtual machine information
function Collect-HyperVVMMetrics {
Write-Host "=== Collecting Hyper-V VM Metrics ===" -ForegroundColor Green
try {
$VMs = Get-VM
foreach ($VM in $VMs) {
# Basic VM information
$VMData = @{
eventType = "HyperVVMMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"vm.name" = $VM.Name
# ToString() sends "Running" rather than the enum's numeric value
"vm.state" = $VM.State.ToString()
"vm.status" = $VM.Status
"vm.generation" = $VM.Generation
"vm.version" = $VM.Version
"vm.cpu_count" = $VM.ProcessorCount
"vm.memory_assigned_mb" = $VM.MemoryAssigned / 1MB
"vm.memory_startup_mb" = $VM.MemoryStartup / 1MB
"vm.dynamic_memory_enabled" = $VM.DynamicMemoryEnabled
"vm.uptime_seconds" = $VM.Uptime.TotalSeconds
}
# Dynamic memory settings
if ($VM.DynamicMemoryEnabled) {
$VMData["vm.memory_minimum_mb"] = $VM.MemoryMinimum / 1MB
$VMData["vm.memory_maximum_mb"] = $VM.MemoryMaximum / 1MB
}
# Performance statistics (running VMs only)
if ($VM.State -eq 'Running') {
try {
# CPU usage: filter the individual counter samples (not the sample set) and
# average across the VM's virtual processors.
# Assumption: instances are named "<vm name>:hv vp <n>".
$VMProcessor = (Get-Counter "\Hyper-V Hypervisor Virtual Processor(*)\% Guest Run Time").CounterSamples |
Where-Object {$_.InstanceName -like "$($VM.Name):*"} |
Measure-Object -Property CookedValue -Average
if ($VMProcessor.Count -gt 0) {
$VMData["vm.cpu_usage_percent"] = [math]::Round($VMProcessor.Average, 2)
}
# Memory pressure (reported only for VMs that use dynamic memory)
$MemoryPressure = (Get-Counter "\Hyper-V Dynamic Memory VM(*)\Current Pressure").CounterSamples |
Where-Object {$_.InstanceName -eq $VM.Name} |
Select-Object -First 1
if ($MemoryPressure) {
$VMData["vm.memory_pressure"] = [math]::Round($MemoryPressure.CookedValue, 2)
}
}
catch {
Write-Warning "Failed to collect performance data for VM: $($VM.Name)"
}
}
# Network adapter information
$NetworkAdapters = Get-VMNetworkAdapter -VM $VM
$VMData["vm.network_adapters"] = $NetworkAdapters.Count
# Hard disk information
$HardDrives = Get-VMHardDiskDrive -VM $VM
$VMData["vm.hard_drives"] = $HardDrives.Count
# Integration services status
$IntegrationServices = Get-VMIntegrationService -VM $VM
$EnabledServices = ($IntegrationServices | Where-Object {$_.Enabled}).Count
$VMData["vm.integration_services_enabled"] = $EnabledServices
$VMData["vm.integration_services_total"] = $IntegrationServices.Count
Send-MetricsToNewRelic -Data $VMData
}
}
catch {
Write-Error "Failed to collect VM metrics: $($_.Exception.Message)"
}
}
# Detailed virtual switch monitoring
function Collect-HyperVSwitchMetrics {
Write-Host "=== Collecting Hyper-V Switch Metrics ===" -ForegroundColor Green
try {
$Switches = Get-VMSwitch
foreach ($Switch in $Switches) {
$SwitchData = @{
eventType = "HyperVSwitchMetrics"
timestamp = [int64](([datetime]::UtcNow) - (Get-Date "1/1/1970")).TotalSeconds
hostname = $Hostname
environment = $Environment
"switch.name" = $Switch.Name
# SwitchType is an enum; convert it so the JSON carries the name rather than a number
"switch.type" = $Switch.SwitchType.ToString()
"switch.allow_management_os" = $Switch.AllowManagementOS
"switch.embedded_teaming_enabled" = $Switch.EmbeddedTeamingEnabled
"switch.iov_enabled" = $Switch.IovEnabled
"switch.packet_direct_enabled" = $Switch.PacketDirectEnabled
}
# For external switches, add physical adapter information
if ($Switch.SwitchType -eq 'External') {
$NetAdapter = Get-NetAdapter | Where-Object {$_.InterfaceDescription -eq $Switch.NetAdapterInterfaceDescription}
if ($NetAdapter) {
$SwitchData["switch.physical_adapter"] = $NetAdapter.Name
# Speed is numeric (bits per second); LinkSpeed is a display string such as "10 Gbps"
$SwitchData["switch.link_speed_gbps"] = $NetAdapter.Speed / 1000000000
$SwitchData["switch.adapter_status"] = $NetAdapter.Status
}
}
# Number of VM network adapters connected to this switch
$ConnectedVMs = Get-VMNetworkAdapter -VMName * | Where-Object {$_.SwitchName -eq $Switch.Name}
$SwitchData["switch.connected_vms"] = @($ConnectedVMs).Count
Send-MetricsToNewRelic -Data $SwitchData
}
}
catch {
Write-Error "Failed to collect switch metrics: $($_.Exception.Message)"
}
}
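# Function that sends data to New Relic
function Send-MetricsToNewRelic {
param([hashtable]$Data)
try {
$JsonData = $Data | ConvertTo-Json -Compress
$Body = "[$JsonData]"
$Headers = @{
'Content-Type' = 'application/json'
'X-Insert-Key' = $NewRelicInsertKey
}
Invoke-RestMethod -Uri $InsightsAPI -Method POST -Headers $Headers -Body $Body
# The ?? operator requires PowerShell 7+, so use if/elseif to stay compatible with Windows PowerShell 5.1
$Label = if ($Data.ContainsKey('vm.name')) { $Data['vm.name'] } elseif ($Data.ContainsKey('switch.name')) { $Data['switch.name'] } else { 'Host' }
Write-Host "✅ Metrics sent for $($Data.eventType): $Label" -ForegroundColor Green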
}
catch {
Write-Error "Failed to send metrics to New Relic: $($_.Exception.Message)"
}
}
# Main entry point
function Main {
Write-Host "🚀 Starting Hyper-V Enterprise Monitoring for $Hostname" -ForegroundColor Cyan
Write-Host "📅 Timestamp: $(Get-Date)" -ForegroundColor Cyan
# Collect each group of metrics
Collect-HyperVHostMetrics
Collect-HyperVVMMetrics
Collect-HyperVSwitchMetrics
Write-Host "✅ Hyper-V monitoring completed" -ForegroundColor Cyan
}
# Run
Main
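The VM collector above reports CPU usage from a single virtual processor per VM. As a rough illustration of how per-vCPU samples could instead be averaged per VM, here is a minimal Python sketch. It assumes counter instance names take the form `<vm name>:hv vp <n>`, which should be verified on the actual host, and `average_cpu_by_vm` is a hypothetical helper, not part of any Hyper-V or New Relic tooling.

```python
from collections import defaultdict


def average_cpu_by_vm(samples):
    """Average '% Guest Run Time' per VM from (instance_name, value) pairs.

    Assumes instance names look like '<vm name>:hv vp <n>'. Names without
    a ':' separator (for example an aggregate instance) are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # vm name -> [sum, sample count]
    for instance_name, value in samples:
        vm_name, sep, _ = instance_name.rpartition(":")
        if not sep:
            continue
        entry = totals[vm_name.lower()]
        entry[0] += value
        entry[1] += 1
    return {vm: round(total / count, 2) for vm, (total, count) in totals.items()}


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    demo = [
        ("web01:hv vp 0", 40.0),
        ("web01:hv vp 1", 20.0),
        ("db01:hv vp 0", 75.5),
        ("_total", 45.2),
    ]
    print(average_cpu_by_vm(demo))
```

Averaging across all vCPUs gives one value per VM, which is usually easier to alert on than a single processor's reading.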
☁️ Multi-cloud integrated monitoring
🔧 AWS cloud integration
⚙️ AWS CloudFormation template
yaml
# AWS New Relic integration CloudFormation template
# aws-newrelic-integration.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'New Relic Infrastructure AWS Integration Setup'
Parameters:
  NewRelicAccountId:
    Type: String
    Description: 'Your New Relic Account ID'
  ExternalId:
    Type: String
    Description: 'External ID for New Relic (Your Account ID)'
  Environment:
    Type: String
    Default: 'production'
    AllowedValues: ['production', 'staging', 'development']
Resources:
  # New Relic Integration Role
  NewRelicInfrastructureRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: !Sub 'NewRelic-Infrastructure-Role-${Environment}'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: 'arn:aws:iam::754728514883:root' # New Relic AWS Account
            Action: 'sts:AssumeRole'
            Condition:
              StringEquals:
                'sts:ExternalId': !Ref ExternalId
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/ReadOnlyAccess'
      Policies:
        - PolicyName: 'NewRelicBudgetPolicy'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'budgets:ViewBudget'
                Resource: '*'
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Purpose
          Value: 'NewRelic-Monitoring'
        - Key: Team
          Value: 'Infrastructure'
  # Custom IAM policy (enterprise permissions)
  NewRelicEnhancedPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: 'NewRelicEnhancedPermissions'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # EC2 detailed permissions
          - Effect: Allow
            Action:
              - 'ec2:DescribeInstances'
              - 'ec2:DescribeInstanceStatus'
              - 'ec2:DescribeInstanceAttribute'
              - 'ec2:DescribeVolumes'
              - 'ec2:DescribeVolumeStatus'
              - 'ec2:DescribeVolumeAttribute'
              - 'ec2:DescribeSnapshots'
              - 'ec2:DescribeImages'
              - 'ec2:DescribeSecurityGroups'
              - 'ec2:DescribeNetworkInterfaces'
              - 'ec2:DescribeVpcs'
              - 'ec2:DescribeSubnets'
              - 'ec2:DescribeRouteTables'
              - 'ec2:DescribeInternetGateways'
              - 'ec2:DescribeNatGateways'
              - 'ec2:DescribeReservedInstances'
              - 'ec2:DescribeSpotInstanceRequests'
            Resource: '*'
          # RDS detailed permissions
          - Effect: Allow
            Action:
              - 'rds:DescribeDBInstances'
              - 'rds:DescribeDBClusters'
              - 'rds:DescribeDBSubnetGroups'
              - 'rds:DescribeDBParameterGroups'
              - 'rds:DescribeDBClusterParameterGroups'
              - 'rds:DescribeDBSnapshots'
              - 'rds:DescribeDBClusterSnapshots'
              - 'rds:DescribeEvents'
              - 'rds:DescribeEventSubscriptions'
              - 'rds:DescribeDBLogFiles'
              - 'rds:DownloadDBLogFilePortion'
            Resource: '*'
          # ELB/ALB detailed permissions
          - Effect: Allow
            Action:
              - 'elasticloadbalancing:DescribeLoadBalancers'
              - 'elasticloadbalancing:DescribeTargetGroups'
              - 'elasticloadbalancing:DescribeTargetHealth'
              - 'elasticloadbalancing:DescribeListeners'
              - 'elasticloadbalancing:DescribeRules'
              - 'elasticloadbalancing:DescribeSSLPolicies'
              - 'elasticloadbalancing:DescribeTags'
            Resource: '*'
          # Lambda detailed permissions
          - Effect: Allow
            Action:
              - 'lambda:GetFunction'
              - 'lambda:GetFunctionConfiguration'
              - 'lambda:GetPolicy'
              - 'lambda:ListFunctions'
              - 'lambda:ListEventSourceMappings'
              - 'lambda:ListTags'
              - 'lambda:GetEventSourceMapping'
            Resource: '*'
          # CloudWatch extended permissions
          - Effect: Allow
            Action:
              - 'cloudwatch:GetMetricStatistics'
              - 'cloudwatch:GetMetricData'
              - 'cloudwatch:ListMetrics'
              - 'cloudwatch:DescribeAlarms'
              - 'cloudwatch:DescribeAlarmsForMetric'
              - 'logs:DescribeLogGroups'
              - 'logs:DescribeLogStreams'
              - 'logs:GetLogEvents'
            Resource: '*'
          # Auto Scaling permissions
          - Effect: Allow
            Action:
              - 'autoscaling:DescribeAutoScalingGroups'
              - 'autoscaling:DescribeAutoScalingInstances'
              - 'autoscaling:DescribeLaunchConfigurations'
              - 'autoscaling:DescribePolicies'
              - 'autoscaling:DescribeScalingActivities'
            Resource: '*'
          # ECS/Fargate permissions
          - Effect: Allow
            Action:
              - 'ecs:DescribeClusters'
              - 'ecs:DescribeServices'
              - 'ecs:DescribeTasks'
              - 'ecs:DescribeTaskDefinition'
              - 'ecs:ListClusters'
              - 'ecs:ListServices'
              - 'ecs:ListTasks'
            Resource: '*'
          # S3 permissions
          - Effect: Allow
            Action:
              - 's3:GetBucketLocation'
              - 's3:GetBucketNotification'
              - 's3:GetBucketVersioning'
              - 's3:GetBucketWebsite'
              - 's3:ListAllMyBuckets'
              - 's3:GetBucketTagging'
              - 's3:GetBucketLogging'
              - 's3:GetBucketCORS'
              - 's3:GetBucketPolicy'
              - 's3:GetBucketPolicyStatus'
            Resource: '*'
          # Cost Explorer & Billing
          # Note: AWS has been replacing the aws-portal actions with fine-grained
          # billing permissions; verify the last two entries against current IAM documentation.
          - Effect: Allow
            Action:
              - 'ce:GetCostAndUsage'
              - 'ce:GetReservationCoverage'
              - 'ce:GetReservationPurchaseRecommendation'
              - 'ce:GetReservationUtilization'
              - 'ce:ListCostCategoryDefinitions'
              - 'aws-portal:ViewBilling'
              - 'aws-portal:ViewUsage'
            Resource: '*'
          # Organizations (for multi-account environments)
          - Effect: Allow
            Action:
              - 'organizations:DescribeOrganization'
              - 'organizations:ListAccounts'
              - 'organizations:ListRoots'
              - 'organizations:ListOrganizationalUnitsForParent'
              - 'organizations:DescribeAccount'
            Resource: '*'
      Roles:
        - !Ref NewRelicInfrastructureRole
  # SNS Topic for New Relic Notifications
  NewRelicNotificationTopic:
    Type: 'AWS::SNS::Topic'
    Properties:
      TopicName: !Sub 'NewRelic-Infrastructure-Notifications-${Environment}'
      DisplayName: 'New Relic Infrastructure Notifications'
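  # CloudWatch Alarms for New Relic Integration Health
  # NOTE: confirm that this metric exists in your account before relying on the alarm.
  # STS does not appear to publish an 'AssumeRoleFailures' metric by default, so it may
  # need to be produced by a CloudTrail metric filter. With TreatMissingData set to
  # notBreaching, the alarm stays in OK state while the metric is absent.
  NewRelicIntegrationHealthAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmName: !Sub 'NewRelic-Integration-Health-${Environment}'
      AlarmDescription: 'Monitor New Relic integration health'
      MetricName: 'AssumeRoleFailures'
      Namespace: 'AWS/STS'
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref NewRelicNotificationTopic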
Outputs:
  RoleArn:
    Description: 'ARN of the New Relic Infrastructure Role'
    Value: !GetAtt NewRelicInfrastructureRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-RoleArn'
  ExternalId:
    Description: 'External ID for New Relic Integration'
    Value: !Ref ExternalId
    Export:
      Name: !Sub '${AWS::StackName}-ExternalId'
  NotificationTopicArn:
    Description: 'SNS Topic ARN for New Relic notifications'
    Value: !Ref NewRelicNotificationTopic
    Export:
      Name: !Sub '${AWS::StackName}-NotificationTopic'
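The core of the template is the role's trust policy: a cross-account principal may call `sts:AssumeRole` only when it presents the expected external ID. The following Python sketch builds the same document as a plain dictionary, which can be handy for reviewing or unit-testing the policy outside CloudFormation. `build_trust_policy` is a hypothetical helper written for illustration, and the external ID in the demo is a placeholder.

```python
import json


def build_trust_policy(principal_account_id, external_id):
    """Return an IAM trust policy allowing cross-account sts:AssumeRole.

    The role can only be assumed when the caller supplies external_id.
    """
    if not external_id:
        raise ValueError("external_id must be a non-empty string")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{principal_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": str(external_id)}},
            }
        ],
    }


if __name__ == "__main__":
    # 754728514883 is the principal account used in the template;
    # "1234567" is a placeholder external ID, not a real account ID.
    print(json.dumps(build_trust_policy("754728514883", "1234567"), indent=2))
```

Rejecting an empty external ID up front mirrors the intent of the `Condition` block: without it, any caller in the trusted account could assume the role.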
📊 AWS cost monitoring script
python
#!/usr/bin/env python3
"""
AWS cost and usage monitoring script.
Sends the collected data to New Relic Insights.
"""
import boto3
import requests
import json
from datetime import datetime, timedelta
import os
from decimal import Decimal


class AWSCostMonitor:
    def __init__(self, newrelic_insert_key, newrelic_account_id, aws_profile=None):
        self.newrelic_insert_key = newrelic_insert_key
        self.newrelic_account_id = newrelic_account_id
        self.insights_api = f"https://insights-collector.newrelic.com/v1/accounts/{newrelic_account_id}/events"
        # AWS Session
        if aws_profile:
            self.session = boto3.Session(profile_name=aws_profile)
        else:
            self.session = boto3.Session()
        self.ce_client = self.session.client('ce')  # Cost Explorer
        self.organizations_client = None
        # Organizations client (for multi-account environments)
        try:
            self.organizations_client = self.session.client('organizations')
        except Exception:
            print("Organizations service not available - single account mode")
    def get_account_info(self):
        """Get account information."""
        try:
            sts_client = self.session.client('sts')
            identity = sts_client.get_caller_identity()
            account_info = {
                'account_id': identity['Account'],
                'user_id': identity['UserId'],
                'arn': identity['Arn']
            }
            # Organizations information (skipped when the caller lacks access)
            if self.organizations_client:
                try:
                    org = self.organizations_client.describe_organization()
                    account_info['organization_id'] = org['Organization']['Id']
                    account_info['master_account_id'] = org['Organization']['MasterAccountId']
                except Exception:
                    pass
            return account_info
        except Exception as e:
            print(f"Failed to get account info: {e}")
            return {}
    def get_cost_and_usage(self, days=7):
        """Get cost and usage data."""
        try:
            end_date = datetime.now().date()
            start_date = end_date - timedelta(days=days)
            # Daily cost data
            response = self.ce_client.get_cost_and_usage(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Granularity='DAILY',
                Metrics=['BlendedCost', 'UsageQuantity'],
                GroupBy=[
                    {'Type': 'DIMENSION', 'Key': 'SERVICE'},
                ]
            )
            cost_metrics = []
            for result in response['ResultsByTime']:
                date = result['TimePeriod']['Start']
                # Epoch seconds for 00:00 UTC on that date; naive arithmetic
                # avoids a shift by the local time zone offset
                day_epoch = int((datetime.fromisoformat(date) - datetime(1970, 1, 1)).total_seconds())
                total_cost = Decimal('0')
                for group in result['Groups']:
                    service_name = group['Keys'][0]
                    cost = Decimal(group['Metrics']['BlendedCost']['Amount'])
                    usage = Decimal(group['Metrics']['UsageQuantity']['Amount'])
                    total_cost += cost
                    if cost > 0:  # only services that incurred cost
                        cost_metrics.append({
                            'eventType': 'AWSCostMetrics',
                            'timestamp': day_epoch,
                            'date': date,
                            'service': service_name,
                            'cost_usd': float(cost),
                            'usage_quantity': float(usage),
                            'currency': 'USD'
                        })
                # Daily total cost
                cost_metrics.append({
                    'eventType': 'AWSCostDaily',
                    'timestamp': day_epoch,
                    'date': date,
                    'total_cost_usd': float(total_cost),
                    'currency': 'USD'
                })
            return cost_metrics
        except Exception as e:
            print(f"Failed to get cost data: {e}")
            return []
    def get_reservation_utilization(self):
        """Get Reserved Instance utilization."""
        try:
            end_date = datetime.now().date()
            start_date = end_date - timedelta(days=30)
            response = self.ce_client.get_reservation_utilization(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Granularity='MONTHLY'
            )
            reservation_metrics = []
            for result in response['UtilizationsByTime']:
                total = result['Total']
                reservation_metrics.append({
                    'eventType': 'AWSReservationUtilization',
                    'timestamp': int(datetime.now().timestamp()),
                    'time_period_start': result['TimePeriod']['Start'],
                    'time_period_end': result['TimePeriod']['End'],
                    'utilization_percentage': float(total['UtilizationPercentage']),
                    'purchased_hours': float(total['PurchasedHours']),
                    # The response has no 'UsedHours' field; TotalActualHours
                    # holds the reservation hours that were actually used
                    'used_hours': float(total['TotalActualHours']),
                    'unused_hours': float(total['UnusedHours']),
                    'total_actual_hours': float(total['TotalActualHours']),
                    'net_ri_savings': float(total['NetRISavings']),
                    'on_demand_cost_of_ri_hours_used': float(total['OnDemandCostOfRIHoursUsed']),
                    'realized_savings': float(total['RealizedSavings'])
                })
            return reservation_metrics
        except Exception as e:
            print(f"Failed to get reservation utilization: {e}")
            return []
    def get_cost_forecast(self):
        """Get the cost forecast."""
        try:
            start_date = datetime.now().date()
            end_date = start_date + timedelta(days=30)
            response = self.ce_client.get_cost_forecast(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Metric='BLENDED_COST',
                Granularity='MONTHLY'
            )
            forecast_metrics = []
            total = response['Total']
            forecast_metrics.append({
                'eventType': 'AWSCostForecast',
                'timestamp': int(datetime.now().timestamp()),
                'forecast_period_start': start_date.strftime('%Y-%m-%d'),
                'forecast_period_end': end_date.strftime('%Y-%m-%d'),
                'forecasted_cost_usd': float(total['Amount']),
                'currency': total['Unit'],
                # Static label set by this script; the API response does not return it
                'confidence_level': 'MEDIUM'
            })
            return forecast_metrics
        except Exception as e:
            print(f"Failed to get cost forecast: {e}")
            return []
    def get_rightsizing_recommendations(self):
        """Get EC2 rightsizing recommendations."""
        try:
            response = self.ce_client.get_rightsizing_recommendation(
                Service='AmazonEC2',  # required parameter
                Configuration={
                    'BenefitsConsidered': True,
                    'RecommendationTarget': 'SAME_INSTANCE_FAMILY'
                }
            )
            rightsizing_metrics = []
            # Summary information
            summary = response.get('Summary', {})
            rightsizing_metrics.append({
                'eventType': 'AWSRightsizingSummary',
                'timestamp': int(datetime.now().timestamp()),
                'total_recommendation_count': int(summary.get('TotalRecommendationCount') or 0),
                'estimated_total_monthly_savings_usd': float(summary.get('EstimatedTotalMonthlySavingsAmount') or 0),
                'savings_currency': summary.get('SavingsCurrencyCode', 'USD'),
                'savings_percentage': float(summary.get('SavingsPercentage') or 0)
            })
            # Individual recommendations
            for rec in response.get('RightsizingRecommendations', [])[:10]:  # at most 10
                current = rec.get('CurrentInstance', {})
                ec2_details = current.get('ResourceDetails', {}).get('EC2ResourceDetails', {})
                # Savings are nested under the TERMINATE or MODIFY detail
                if rec.get('RightsizingType') == 'TERMINATE':
                    savings = rec.get('TerminateRecommendationDetail', {}).get('EstimatedMonthlySavings')
                else:
                    targets = rec.get('ModifyRecommendationDetail', {}).get('TargetInstances', [])
                    savings = targets[0].get('EstimatedMonthlySavings') if targets else None
                rightsizing_metrics.append({
                    'eventType': 'AWSRightsizingRecommendation',
                    'timestamp': int(datetime.now().timestamp()),
                    'account_id': rec.get('AccountId', ''),
                    'instance_id': current.get('ResourceId', ''),
                    'current_instance_type': ec2_details.get('InstanceType', ''),
                    'recommendation_type': rec.get('RightsizingType', ''),
                    'estimated_monthly_savings_usd': float(savings or 0),
                    'recommendation_source': 'cost_explorer'
                })
            return rightsizing_metrics
        except Exception as e:
            print(f"Failed to get rightsizing recommendations: {e}")
            return []
    def send_to_newrelic(self, metrics_data):
        """Send metrics to New Relic."""
        if not metrics_data:
            print("No metrics to send")
            return
        try:
            headers = {
                'Content-Type': 'application/json',
                'X-Insert-Key': self.newrelic_insert_key
            }
            # Add account information to every metric
            account_info = self.get_account_info()
            for metric in metrics_data:
                metric.update(account_info)
                metric['environment'] = os.environ.get('ENVIRONMENT', 'production')
            # Send in batches of 100
            batch_size = 100
            for i in range(0, len(metrics_data), batch_size):
                batch = metrics_data[i:i+batch_size]
                response = requests.post(
                    self.insights_api,
                    headers=headers,
                    json=batch,
                    timeout=30
                )
                if response.status_code == 200:
                    print(f"✅ Sent {len(batch)} cost metrics to New Relic")
                else:
                    print(f"❌ Failed to send metrics: {response.status_code} - {response.text}")
        except Exception as e:
            print(f"❌ Failed to send metrics to New Relic: {e}")
    def run_cost_monitoring(self):
        """Main cost monitoring routine."""
        print("🚀 Starting AWS Cost Monitoring")
        print(f"📅 Timestamp: {datetime.now().isoformat()}")
        all_metrics = []
        # Collect each kind of cost data
        print("💰 Collecting cost and usage data...")
        cost_metrics = self.get_cost_and_usage(days=7)
        all_metrics.extend(cost_metrics)
        print("📊 Collecting reservation utilization...")
        reservation_metrics = self.get_reservation_utilization()
        all_metrics.extend(reservation_metrics)
        print("🔮 Collecting cost forecast...")
        forecast_metrics = self.get_cost_forecast()
        all_metrics.extend(forecast_metrics)
        print("⚡ Collecting rightsizing recommendations...")
        rightsizing_metrics = self.get_rightsizing_recommendations()
        all_metrics.extend(rightsizing_metrics)
        # Send to New Relic
        if all_metrics:
            print(f"📤 Sending {len(all_metrics)} cost metrics to New Relic...")
            self.send_to_newrelic(all_metrics)
            print("✅ AWS cost monitoring completed successfully")
        else:
            print("⚠️ No cost metrics collected")
# Main entry point
if __name__ == "__main__":
    # Read configuration from environment variables
    NEWRELIC_INSERT_KEY = os.environ.get('NEWRELIC_INSERT_KEY', '')
    NEWRELIC_ACCOUNT_ID = os.environ.get('NEWRELIC_ACCOUNT_ID', '')
    AWS_PROFILE = os.environ.get('AWS_PROFILE', None)
    if not all([NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID]):
        print("❌ Required environment variables not set")
        print("Please set: NEWRELIC_INSERT_KEY, NEWRELIC_ACCOUNT_ID")
        raise SystemExit(1)
    monitor = AWSCostMonitor(
        NEWRELIC_INSERT_KEY,
        NEWRELIC_ACCOUNT_ID,
        AWS_PROFILE
    )
    monitor.run_cost_monitoring()
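Two small steps in the script are easy to get subtly wrong: turning a `YYYY-MM-DD` cost date into an event timestamp, and splitting events into batches of 100. The sketch below isolates both as standalone helpers so they can be tested without AWS or New Relic credentials. The function names are hypothetical, and treating the date as midnight UTC is an assumption about how the cost data should be aligned.

```python
from datetime import datetime, timezone


def utc_midnight_epoch(date_str):
    """Convert 'YYYY-MM-DD' to epoch seconds at 00:00 UTC on that day."""
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def chunk_events(events, batch_size=100):
    """Yield consecutive slices of at most batch_size events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


if __name__ == "__main__":
    # Illustrative events only; the attribute names are placeholders.
    events = [{"eventType": "AWSCostDaily", "n": i} for i in range(250)]
    for batch in chunk_events(events):
        print(len(batch), "events in batch")
    print(utc_midnight_epoch("2024-01-01"))
```

Attaching an explicit UTC time zone keeps the timestamp independent of the time zone of the machine that runs the job.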
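✅ Section 4.2 completion checklist
🎯 Learning objectives check
After completing this section, check whether you can do each of the following:
🖥️ Physical and virtual server monitoring
- [ ] Configure enterprise settings for Linux/Windows servers
- [ ] Implement comprehensive monitoring of a VMware vSphere environment
- [ ] Set up detailed monitoring of a Hyper-V environment
- [ ] Add security and compliance monitoring
☁️ Cloud integrated monitoring
- [ ] Set up the integration with AWS CloudFormation
- [ ] Build monitoring for a multi-account environment
- [ ] Implement cost monitoring and optimization recommendations
- [ ] Configure forecasting and alerts
🏢 Enterprise capabilities
- [ ] Design a monitoring architecture for large-scale environments
- [ ] Address compliance requirements
- [ ] Analyze and implement cost-efficiency improvements
- [ ] Build a monitoring practice that spans the organization
🚀 Next steps
Once you have mastered server and cloud monitoring, move on to the next section:
- 4.3 Containers and Kubernetes environments - monitoring modern container environments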