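A CloudFormation template that makes task execution redundant with CloudWatch Events + Lambda + RunCommand instead of cron
In the previous article, I built this so that the command to run was passed in as JSON through the parameters of the CloudWatch Events target.
However, accepting something as dangerous as a command to execute from the outside was unsettling. Commands containing characters that are special in JSON also could not be specified. In addition, that approach processed multiple tasks with the same state machine/Lambda function, so the logs got mixed together and debugging was difficult.
To resolve these problems, I created a CloudFormation template with the following changes.
- Create a separate stack per task, so that each task runs with its own state machine/Lambda functions
- Pass the command to the Lambda function as an environment variable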
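As a rough illustration of the JSON special-character problem and why an environment variable sidesteps it, here is a minimal sketch. The command string and the naive string templating are hypothetical and are not taken from the previous article; they only show one way such a failure can arise.

```javascript
// Hypothetical command containing double quotes, which are special in JSON.
const command = `bash -c 'echo "hello"'`;

// Naively pasting the command into a JSON document yields invalid JSON,
// because the inner double quotes terminate the string early.
const naiveJson = `{"command": "${command}"}`;

// Returns true when the text parses as JSON, false otherwise.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Simulate the Lambda environment variable that the template sets.
// The value is an opaque string, so no JSON escaping is involved.
process.env.Command = command;
const fromEnv = process.env.Command;

console.log(isValidJson(naiveJson)); // false
console.log(fromEnv === command);    // true
```

With the environment-variable approach, the command reaches the function exactly as it was entered as a stack parameter.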
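Usage
Prerequisites
Preparing the EC2 instance
The EC2 instance must be set up for AWS Systems Manager so that Run Command can reach it (SSM Agent installed, and an instance profile that allows Systems Manager access).
Preparing the SNS topics
Create the SNS topics that will be notified when the command succeeds or fails.
If you want success and failure notifications to go to different destinations, create a separate topic for each. Note the ARNs of the topics you created; they are used as parameters of the CloudFormation template.
Creating the stack
Create a CloudFormation stack using the template below.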
Parameters:
  RuleName:
    Type: String
  ScheduleExpression:
    Type: String
  EC2InstanceTagName:
    Type: String
    Default: Name
  EC2InstanceTagValue:
    Type: String
  Command:
    Type: String
    Default: sar
  Enabled:
    Type: String
    Default: "true"
    AllowedValues:
      - "true"
      - "false"
  TopicCommandExecutionSucceededArn:
    Type: String
  TopicCommandExecutionFailedArn:
    Type: String
Conditions:
  isEnabled: !Equals [ !Ref Enabled, "true" ]
Resources:
  StateMachineExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - states.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: run-statemachine
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action: lambda:InvokeFunction
                Resource:
                  - !GetAtt LambdaSendCommand.Arn
                  - !GetAtt LambdaWaitForCommandExecutions.Arn
              - Effect: Allow
                Action: sns:Publish
                Resource:
                  - !Ref TopicCommandExecutionSucceededArn
                  - !Ref TopicCommandExecutionFailedArn
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
      Policies:
        - PolicyName: run-command
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - ssm:SendCommand
                  - ssm:ListCommandInvocations
                  - ssm:GetCommandInvocation
                Resource: "*"
  StartStateMachineExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - events.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: run-statemachine
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action: states:StartExecution
                Resource: !Ref StateMachine
  LambdaSendCommand:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub ${AWS::StackName}-SendCommand
      Code:
        ZipFile: |+
          const AWS = require('aws-sdk');
          const ssm = new AWS.SSM({apiVersion: '2014-11-06'});
          // The command and the target tag come from environment variables.
          const { TagKey, TagValue, Command } = process.env;
          const debug = (key, object) => { console.log(`DEBUG: ${key}\n`, JSON.stringify(object)); }
          exports.handler = async (event, context) => {
            console.log("INFO: request Received.\nEvent:\n", JSON.stringify(event));
            // Run the command on one tagged instance at a time.
            const sendCommandParams = {
              DocumentName: 'AWS-RunShellScript',
              Targets: [
                {
                  Key: `tag:${TagKey}`,
                  Values: [TagValue]
                }
              ],
              Parameters: {
                commands: [Command],
                executionTimeout: ['3600']
              },
              MaxConcurrency: '1',
              MaxErrors: '1',
              TimeoutSeconds: 3600,
            };
            debug("sendCommandParams", sendCommandParams);
            const sendCommandResult = await ssm.sendCommand(sendCommandParams).promise();
            debug("sendCommandResult", sendCommandResult);
            const results = {
              sendCommandParams: sendCommandParams,
              sendCommandResult: sendCommandResult
            };
            debug("results", results);
            return results;
          };
      Environment:
        Variables:
          Command: !Ref Command
          TagKey: !Ref EC2InstanceTagName
          TagValue: !Ref EC2InstanceTagValue
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: "nodejs12.x"
      MemorySize: 128
      Timeout: 60
  LambdaPermissionLambdaSendCommand:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt LambdaSendCommand.Arn
      Principal: states.amazonaws.com
      SourceArn: !Ref StateMachine
  LambdaWaitForCommandExecutions:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub ${AWS::StackName}-WaitForCommandExecutions
      Code:
        ZipFile: |+
          const AWS = require('aws-sdk');
          const ssm = new AWS.SSM({apiVersion: '2014-11-06'});
          const debug = (key, object) => { console.log(`DEBUG: ${key}\n`, JSON.stringify(object)); }
          // Thrown while the command is still running; the state machine retries on it.
          class CommandNotYetCompleteError extends Error {
            constructor(message) {
              super(message);
              this.name = 'CommandNotYetCompleteError';
            }
          }
          exports.handler = async (event, context) => {
            console.log("INFO: request Received.\nEvent:\n", JSON.stringify(event));
            const { sendCommandParams, sendCommandResult } = event;
            let commandStatus;
            const listCommandInvocationsParams = {
              CommandId: sendCommandResult.Command.CommandId
            };
            debug("listCommandInvocationsParams", listCommandInvocationsParams);
            const listCommandInvocationsResult = await ssm.listCommandInvocations(listCommandInvocationsParams).promise().catch(e => console.error("CommandInvocations", e));
            debug("listCommandInvocationsResult", listCommandInvocationsResult);
            const getCommandInvocationParams = {
              CommandId: sendCommandResult.Command.CommandId,
              InstanceId: listCommandInvocationsResult.CommandInvocations[0].InstanceId,
            };
            debug("getCommandInvocationParams", getCommandInvocationParams);
            const getCommandInvocationResult = await ssm.getCommandInvocation(getCommandInvocationParams).promise().catch(e => console.error("getCommandInvocation", e));
            debug("getCommandInvocationResult", getCommandInvocationResult);
            if (getCommandInvocationResult) {
              commandStatus = getCommandInvocationResult.Status;
            }
            // Any status other than these four is treated as "still running".
            if (commandStatus !== "Success" && commandStatus !== "Cancelled" && commandStatus !== "TimedOut" && commandStatus !== "Failed") {
              throw new CommandNotYetCompleteError("Command is not yet complete. Retry");
            }
            const results = {
              sendCommandParams: sendCommandParams,
              sendCommandResult: sendCommandResult,
              getCommandInvocationParams: getCommandInvocationParams,
              getCommandInvocationResult: getCommandInvocationResult,
              commandStatus: commandStatus
            };
            debug("results", results);
            return results;
          };
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: "nodejs12.x"
      MemorySize: 128
      Timeout: 60
  LambdaPermissionWaitForCommandExecutions:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt LambdaWaitForCommandExecutions.Arn
      Principal: states.amazonaws.com
      SourceArn: !Ref StateMachine
  StateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: !Sub ${AWS::StackName}-StateMachine
      DefinitionString: !Sub
        - |+
          {
            "Comment": "ExecuteScheduleTask",
            "StartAt": "SendCommand",
            "States": {
              "SendCommand": {
                "Type": "Task",
                "Resource": "${lambdaSendCommandArn}",
                "Retry": [
                  {
                    "ErrorEquals": [ "States.TaskFailed", "States.Timeout" ],
                    "IntervalSeconds": 10,
                    "MaxAttempts": 6,
                    "BackoffRate": 1.0
                  }
                ],
                "Next": "WaitForCommandExecutions"
              },
              "WaitForCommandExecutions": {
                "Type": "Task",
                "Resource": "${lambdaWaitForCommandExecutionsArn}",
                "Retry": [
                  {
                    "ErrorEquals": [ "CommandNotYetCompleteError" ],
                    "IntervalSeconds": 10,
                    "MaxAttempts": 360,
                    "BackoffRate": 1.0
                  },
                  {
                    "ErrorEquals": [ "States.TaskFailed", "States.Timeout" ],
                    "IntervalSeconds": 10,
                    "MaxAttempts": 6,
                    "BackoffRate": 1.0
                  }
                ],
                "Next": "ChoiceCommandStatus"
              },
              "ChoiceCommandStatus": {
                "Type": "Choice",
                "Choices": [
                  {
                    "Variable": "$.commandStatus",
                    "StringEquals": "Success",
                    "Next": "NotifySuccess"
                  }
                ],
                "Default": "NotifyFail"
              },
              "NotifySuccess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                  "Subject": "Step Functions succeeded",
                  "Message.$":"$",
                  "TopicArn": "${TopicCommandExecutionSucceededArn}"
                },
                "End": true
              },
              "NotifyFail": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                  "Subject": "Step Functions failed",
                  "Message.$":"$",
                  "TopicArn": "${TopicCommandExecutionFailedArn}"
                },
                "Next": "Fail"
              },
              "Fail": {
                "Type": "Fail"
              }
            }
          }
        - lambdaSendCommandArn: !GetAtt LambdaSendCommand.Arn
          lambdaWaitForCommandExecutionsArn: !GetAtt LambdaWaitForCommandExecutions.Arn
      RoleArn: !GetAtt StateMachineExecutionRole.Arn
  Rule:
    Type: AWS::Events::Rule
    Properties:
      Description: !Sub ${RuleName}
      ScheduleExpression: !Ref ScheduleExpression
      State: !If [isEnabled, "ENABLED", "DISABLED"]
      Targets:
        - Id: StateMachine
          Arn: !Ref StateMachine
          RoleArn: !GetAtt StartStateMachineExecutionRole.Arn
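The template polls for command completion by having the second Lambda function throw CommandNotYetCompleteError and letting the state machine's Retry block call it again. The sketch below mirrors that logic outside AWS to show roughly how long the polling can last. The helper names isTerminalStatus and totalRetryWaitSeconds are introduced here for illustration only, and the calculation ignores the time each Lambda invocation itself takes.

```javascript
// Statuses the template's wait function treats as finished.
const TERMINAL_STATUSES = ['Success', 'Cancelled', 'TimedOut', 'Failed'];

// Hypothetical helper: true when polling can stop.
function isTerminalStatus(status) {
  return TERMINAL_STATUSES.includes(status);
}

// Hypothetical helper: sums the waits between retry attempts,
// multiplying the interval by BackoffRate after each attempt.
function totalRetryWaitSeconds({ IntervalSeconds, MaxAttempts, BackoffRate }) {
  let total = 0;
  let interval = IntervalSeconds;
  for (let i = 0; i < MaxAttempts; i++) {
    total += interval;
    interval *= BackoffRate;
  }
  return total;
}

// Values copied from the CommandNotYetCompleteError retrier in the template.
const pollingRetry = { IntervalSeconds: 10, MaxAttempts: 360, BackoffRate: 1.0 };

console.log(totalRetryWaitSeconds(pollingRetry)); // 3600
console.log(isTerminalStatus('InProgress'));      // false
console.log(isTerminalStatus('Failed'));          // true
```

Under these assumptions the retry budget comes to about 3600 seconds of waiting, which lines up with the executionTimeout of 3600 passed to AWS-RunShellScript. If you lengthen the command timeout, the retry settings probably need to grow with it.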
The parameters and example values are as follows.
Parameter | Description | Example |
---|---|---|
Command | Shell command string to run on the EC2 instance | sudo -u apache bash -c 'cd /var/www/cgi-bin/app && /usr/bin/perl ./tools/run-periodic-tasks' |
EC2InstanceTagName | Tag name used to identify the EC2 instance | Name |
EC2InstanceTagValue | Tag value used to identify the EC2 instance | test |
Enabled | Whether this rule is enabled | true |
RuleName | Name of the rule created in CloudWatch Events. Choose something easy to recognize | run-periodic-tasks |
ScheduleExpression | A cron expression or a rate expression | rate(5 minutes) |
TopicCommandExecutionFailedArn | ARN of the SNS topic notified when the command fails | arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:CommandExecutionFailed |
TopicCommandExecutionSucceededArn | ARN of the SNS topic notified when the command succeeds | arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:CommandExecutionSucceeded |
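If you create the stack from a script rather than the console, the parameter values above have to be turned into the ParameterKey/ParameterValue list that the CloudFormation CreateStack API expects. The following is a minimal sketch of that conversion. The helper buildCreateStackInput and the stack name are hypothetical, and the template body is a placeholder string. CAPABILITY_IAM is included because the template creates IAM roles.

```javascript
// Hypothetical helper: builds the input object for CloudFormation CreateStack.
function buildCreateStackInput(stackName, templateBody, parameters) {
  return {
    StackName: stackName,
    TemplateBody: templateBody,
    // Required because the template creates IAM roles.
    Capabilities: ['CAPABILITY_IAM'],
    Parameters: Object.entries(parameters).map(([ParameterKey, ParameterValue]) => ({
      ParameterKey,
      ParameterValue: String(ParameterValue),
    })),
  };
}

// Example values taken from the parameter table; the account ID stays masked.
const input = buildCreateStackInput('run-periodic-tasks', '<template body here>', {
  RuleName: 'run-periodic-tasks',
  ScheduleExpression: 'rate(5 minutes)',
  EC2InstanceTagName: 'Name',
  EC2InstanceTagValue: 'test',
  Command: "sudo -u apache bash -c 'cd /var/www/cgi-bin/app && /usr/bin/perl ./tools/run-periodic-tasks'",
  Enabled: true,
  TopicCommandExecutionSucceededArn: 'arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:CommandExecutionSucceeded',
  TopicCommandExecutionFailedArn: 'arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:CommandExecutionFailed',
});

console.log(input.Parameters.length); // 8
```

With the AWS SDK for JavaScript v2 (the same SDK the Lambda functions use), an object shaped like this could be passed to the createStack method of AWS.CloudFormation. Using one stack name per task keeps each task's state machine and logs separate, as intended.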
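Once stack creation completes, the whole set of resources is in place (the CloudWatch Events rule, the Step Functions state machine, the Lambda functions, and the related IAM roles), and execution starts according to ScheduleExpression.
Changing the stack
Edit the stack, change the parameters, and run the update.
Deleting the stack
When the task is no longer needed, delete the stack to remove the resources it created. This stack does not create the EC2 instance or the SNS topics, so deleting it has no effect on them.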
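Since the schedule is driven entirely by ScheduleExpression, it can help to sanity-check the value before creating the stack. The sketch below is a rough, hypothetical check, not the validation that CloudWatch Events itself performs. It assumes the rate(value unit) form with a singular unit for 1 and a plural unit otherwise, and the six-field cron(...) form.

```javascript
// Rough classifier for schedule expressions: 'rate', 'cron', or 'invalid'.
// This only checks the overall shape, not every rule the service enforces.
function classifyScheduleExpression(expr) {
  const rate = /^rate\((\d+) (minute|hour|day)(s?)\)$/.exec(expr);
  if (rate) {
    const value = Number(rate[1]);
    const plural = rate[3] === 's';
    // The unit is singular for 1 and plural for larger values.
    return value >= 1 && plural === (value !== 1) ? 'rate' : 'invalid';
  }
  const cron = /^cron\((.+)\)$/.exec(expr);
  // Cron expressions here have six fields, including the year.
  if (cron && cron[1].trim().split(/\s+/).length === 6) {
    return 'cron';
  }
  return 'invalid';
}

console.log(classifyScheduleExpression('rate(5 minutes)'));    // rate
console.log(classifyScheduleExpression('cron(0 12 * * ? *)')); // cron
console.log(classifyScheduleExpression('*/5 * * * *'));        // invalid
```

The last example shows that a plain five-field crontab entry is not accepted as is; it has to be rewritten in the cron(...) or rate(...) form.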