Securely set up RDS Aurora Serverless and use custom resources in CloudFormation to create additional databases and users across multiple stages or even services.
Why would you even want to share a Database?
Everything that allocates RAM and CPU on AWS is quiet expensive. Although it is called Aurora Serverless, it still allocates a server that just auto scales very well.
Aurora is capable of scaling down to 0 compute, but the wake up process is way too slow to use this in production, and it’s also quiet annoying in dev environment.
Then, if you use microservices, it is even harder to justify an entire database server for a service which idles most of the time and/or only stores a few hundred records.
MySQL has a solution build in: databases, which act like namespaces. You can use a single MySQL server for multiple purposes by just creating multiple databases and users. That sounds obvious but managing it is traditionally an annoying manual process and AWS has no way of declaratively doing so.
So let’s do something about it.
I want all infrastructure declaration in CloudFormation. Every manual step is 1 step too much.
Basic structure
I’m going to use serverless to deploy the infrastructure, but most of my examples will be pure CloudFormation so you should be able to get along if you use other tools.
I’m going to use 2 kinds of cloudformation stacks
- A shared stack (
serverless-shared.yml
) which i’m going to deploy once per AWS account. It contains the Database Server, the VPC configuration and a few Lambda functions that help us in creating additional resources. - The application stack (
serverless.yml
) which I use as an example service and defines a MySQL database and potentially multiple MySQL users with different privileges.
Prepare serverless
You can skip this if you don’t use serverless, but here is the first part of my serverless-shared.yml
.
Note the variableSyntax
definition. This allows me to use the !Sub
function in a serverless file,
which would normally collide with the serverless variable syntax.
# serverless-shared.yml
service: shared
provider:
name: aws
region: eu-west-1
stage: ${opt:stage, 'global'} # generic stage name to indicate that this is not targeted at dev/prod environments
# Allow using CloudFormation variable syntax without changing the serverless syntax
# The differentiating factor is the capitalization of the first character
# Upper case character means CloudFormation syntax, eg: ${AWS::Region}, ${DatabaseSecret}
# Everything else means serverless syntax, eg: ${ssm:...}, ${self:...}
# https://www.serverless.com/framework/docs/providers/aws/guide/variables#using-custom-variable-syntax
variableSyntax: "\\${((?![A-Z])[ ~:a-zA-Z0-9._@'\",\\-\\/\\(\\)]+?)}"
resources:
# CloudFormation template here
Define a VPC
All resources in AWS have to be deployed to a VPC. The Aurora CloudFormation template says that it’s optional but in reality: it’ll just deploy the database into the default VPC and the default VPC can’t be directly addressed in CloudFormation so that is a no-go.
So let’s define the simplest VPC in CloudFormation possible.
# serverless-shared.yml
Resources:
# the simplest possible VPC with 3 Availability Zones
VPC:
Type: AWS::EC2::VPC
Properties:
CidrBlock: 10.192.0.0/16
EnableDnsSupport: true
EnableDnsHostnames: true
Tags:
- {Key: Name, Value: !Ref AWS::StackName} # the console will display this as the name
Subnet1:
Type: AWS::EC2::Subnet
Properties: {CidrBlock: 10.192.0.0/20, AvailabilityZone: !Select [0, !GetAZs ''], VpcId: !Ref VPC}
Subnet2:
Type: AWS::EC2::Subnet
Properties: {CidrBlock: 10.192.16.0/20, AvailabilityZone: !Select [1, !GetAZs ''], VpcId: !Ref VPC}
Subnet3:
Type: AWS::EC2::Subnet
Properties: {CidrBlock: 10.192.32.0/20, AvailabilityZone: !Select [2, !GetAZs ''], VpcId: !Ref VPC}
# subnet and security groups for the database later
DatabaseSubnetGroup:
Type: AWS::RDS::DBSubnetGroup
Properties:
DBSubnetGroupDescription: Database
SubnetIds: [!Ref Subnet1, !Ref Subnet2, !Ref Subnet3]
DatabaseSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Database
VpcId: !Ref VPC
SecurityGroupIngress:
- {CidrIp: !GetAtt VPC.CidrBlock, FromPort: 3306, ToPort: 3306, IpProtocol: tcp}
This is a simple VPC with 3 subnets to cover 3 Availability-Zones.
Don’t worry about evenly covering Availability-Zones, AWS will randomly allocate them to your AWS Account, so it is perfectly fine to always use the first few Zones. But you definitely want at least 3 to take advantage of automatic failovers.
I then also define the subnet group that we later use for the actual database.
Define the Database Server
# serverless-shared.yml
Resources:
# [...]
DatabaseSecret:
Type: AWS::SecretsManager::Secret
Properties:
Name: !Sub '/${AWS::StackName}/database/root'
GenerateSecretString:
SecretStringTemplate: '{"username": "root"}'
GenerateStringKey: "password"
ExcludeCharacters: '"@/\'
DatabaseServer:
Type: AWS::RDS::DBCluster
Properties:
Engine: aurora-mysql # aurora = mysql 5.6, aurora-mysql = mysql 5.7
EngineMode: serverless
EnableHttpEndpoint: true
MasterUsername: !Sub '{{resolve:secretsmanager:${DatabaseSecret}:SecretString:username}}'
MasterUserPassword: !Sub '{{resolve:secretsmanager:${DatabaseSecret}:SecretString:password}}'
BackupRetentionPeriod: 10 # days
ScalingConfiguration: {MinCapacity: 1, MaxCapacity: 2, AutoPause: true}
DBSubnetGroupName: !Ref DatabaseSubnetGroup
VpcSecurityGroupIds: [!GetAtt DatabaseSecurityGroup.GroupId]
DatabaseSecretAttachment:
Type: AWS::SecretsManager::SecretTargetAttachment
Properties:
SecretId: !Ref DatabaseSecret
TargetId: !Ref DatabaseServer
TargetType: AWS::RDS::DBCluster
Outputs:
DatabaseServer:
Description: The Database Server Arn
Value: !Sub 'arn:${AWS::Partition}:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DatabaseServer}'
Export: {Name: !Sub '${AWS::StackName}-database-server'}
Let’s go though it.
- Define a Secret to store the credentials of the master user and generate it’s password.
- Define the Database Server. An Aurora serverless in mysql 5.7 mode.
Aurora usually brings new features to 5.6 first, but 5.7 allows longer usernames (32 chars vs 16)
and longer indexes which results in no error when trying to index a
VARCHAR(255)
field in utf8mb4 mode. Those 2 reasons resolve some headaches, so I’d stick with 5.7 unless you have a good reason not to. Also JSON columns. - I attach the secret to the Database which will add a few fields like
host
to it which is nice, because it means that one only needs to read the secret and has all information to connect to the database.
The secret content will look like this in the end:
{
"password": "RandomPassword123!",
"engine": "mysql",
"port": 3306,
"host": "stack-name-cluster-name.cluster-abcdabcdabcda.eu-west-1.rds.amazonaws.com",
"username": "root"
}
Note that the secret manager costs ~40 cents per month for managing this secret. We, however, need the password packaged as a secret in order to use the rds-data api later. You could also implement password rotation here if you like.
Implement custom resources
This is the most interesting part.
CloudFormation only sets up the initial database server but all further steps normally have to be done manually, but we can do better and create custom resources in CloudFormation to create databases and users using CloudFormation.
Create a database custom resource
Let’s look at the serverless.yml
of the consuming service first.
# serverless.yml
Resources:
Database:
Type: Custom::Database
Properties:
ServiceToken: !ImportValue 'shared-global-database-service-token'
Name: !Ref AWS::StackName
This is how I want to define additional database in different stacks. I just define how I want them to be named and that’s it. In this case, I just use the stack name itself.
Custom resources always have a ServiceToken
which is a reference to the lambda that handles them,
so let’s implement that resource in the shared stack:
# serverless-shares.yml
Transform:
- AWS::Serverless-2016-10-31 # for inline lambda function
Resources:
DatabaseResourceLambda:
Type: AWS::Serverless::Function
Properties:
Description: !Sub 'Provides a custom CloudFormation resource to create MySQL databases in ${DatabaseServer}.'
FunctionName: !Sub '${AWS::StackName}-database-resource'
Handler: index.handler
Runtime: nodejs12.x
Role: !GetAtt DatabaseAccessPolicy.Arn # attach role to access the database
Environment:
Variables:
resourceArn: !Sub 'arn:${AWS::Partition}:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DatabaseServer}'
secretArn: !Ref DatabaseSecret
InlineCode: |-
const RDSDataService = new (require('aws-sdk/clients/rdsdataservice'))({apiVersion: '2018-08-01'});
const response = require('cfn-response');
const {resourceArn, secretArn} = process.env;
exports.handler = async function (event, context) {
try {
console.log(event);
const name = event.ResourceProperties.Name;
if (event.RequestType === 'Delete') {
const sql = 'DROP DATABASE IF EXISTS `' + name + '`';
await RDSDataService.executeStatement({resourceArn, secretArn, sql}).promise();
return await response.send(event, context, response.SUCCESS);
}
if (name !== event.PhysicalResourceId) {
const sql = 'CREATE DATABASE `' + name + '`';
await RDSDataService.executeStatement({resourceArn, secretArn, sql}).promise();
}
return await response.send(event, context, response.SUCCESS, {Server: resourceArn}, name);
} catch (error) {
if (/^Communications link failure/.test(error.message)) {
throw error; // let lambda reattempt this action if aurora is paused
}
console.error(error);
return await response.send(event, context, response.FAILED);
}
};
DatabaseAccessPolicy:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub '${AWS::StackName}-${AWS::Region}-database-access-policy'
AssumeRolePolicyDocument: {Version: '2012-10-17', Statement: [{Effect: Allow, Action: sts:AssumeRole, Principal: {Service: [lambda.amazonaws.com]}}]}
Policies:
- PolicyName: !Sub "${AWS::StackName}-database-access"
PolicyDocument:
Version: '2012-10-17'
Statement:
- {Effect: Allow, Action: logs:CreateLog*, Resource: !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}*:*'}
- {Effect: Allow, Action: logs:PutLogEvents, Resource: !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}*:*:*'}
- {Effect: Allow, Action: rds-data:ExecuteStatement, Resource: '*'} # this api does not support target resources
- {Effect: Allow, Action: secretsmanager:GetSecretValue, Resource: !Sub 'arn:${AWS::Partition}:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:/${AWS::StackName}/database/*'}
Outputs:
DatabaseServiceToken:
Description: Lambda function that can be used to create a database on a database server in CloudFormation
Value: !GetAtt DatabaseResourceLambda.Arn
Export: {Name: !Sub '${AWS::StackName}-database-service-token'}
I define a lambda function purely in CloudFormation. I could use a serverless function, but I can’t declare their code inline. This function basically has to handle 3 scenarios:
- Create: On which I execute a
CREATE DATABASE
statement.event.PhysicalResourceId
will be undefined in that scenario so the statement is always executed. I return the database name as thePhysicalResourceId
which allows you to use!Ref
to get the name. - Update: Which is the same as Create, but I look at the
event.PhysicalResourceId
and only execute the statement if it does not match. - Delete: Which deletes the resource using
DROP DATABASE IF EXISTS
. This function is also called if thePhysicalResourceId
changed in the “Update complete, cleanup in progress” stage you might have seen in CloudFormation at some point already. This is also why you don’t have to bother with the old database during Update.
There are 2 more important things to note:
- You always want you lambda to respond to resource create requests,
which is why I wrap the entire function in a
try {} catch {}
block. If you don’t do that, you have to wait for CloudFormation to time out, which takes ~30 minutes. It’s also good practice to just log the event so if you are stuck, so you can manually respond using the ResponseURL in the event. - Aurora Serverless might be paused, and that is the one situation where I let the lambda fail in a controlled manner. The CloudFormation invoke will be handled like an event which means Lambda will retry the exectution after some time. If aurora is paused, I just fail and lambda will retry at which point the database is hopefully up and running.
Create a user custom resource
Let’s, again, look at the usage first:
# serverless.yml
Resource:
DatabaseUserSecret:
Type: AWS::SecretsManager::Secret
Properties:
Name: !Sub '/shared-global/database/${AWS::StackName}'
GenerateSecretString:
SecretStringTemplate: !Sub '{"username": "${AWS::StackName}"}'
GenerateStringKey: "password"
ExcludeCharacters: '"@/\'
DatabaseUserSecretAttachment:
Type: AWS::SecretsManager::SecretTargetAttachment
Properties:
SecretId: !Ref DatabaseUserSecret
TargetId: !ImportValue 'shared-global-database-server'
TargetType: AWS::RDS::DBCluster
DatabaseUser:
Type: Custom::DatabaseUser
DependsOn: DatabaseUserSecretAttachment
Properties:
ServiceToken: !ImportValue 'shared-global-database-user-service-token'
SecretId: !Ref DatabaseUserSecret
Privileges:
- {Permission: ALL, Database: !Ref Database, Table: '*'}
Ok so there are multiple things going on here.
- Create a secret for our new user, so we have a place to store the password. I use the stack name as a user name here. Note that the username im MySQL (5.7) is limited to 32 characters.
- Attach the secret to the database which will fill in the rest of the properties, just like when we created the root secret.
- Use the custom resource, that I’ll show you later, to create the user with the secret. I also define the users privileges, so I can limit the user to the database we defined earlier. Feel free to create even more restricted user but remeber that any secret in the SecretManager costs 40 cents per month.
So now, let’s define the custom resource in the shared stack:
# serverless-shares.yml
Transform:
- AWS::Serverless-2016-10-31 # for inline lambda function
Resources:
DatabaseUserResourceLambda:
Type: AWS::Serverless::Function
Properties:
Description: !Sub 'Provides a custom CloudFormation resource to create MySQL users in ${DatabaseServer}.'
FunctionName: !Sub '${AWS::StackName}-database-user-resource'
Handler: index.handler
Runtime: nodejs12.x
Role: !GetAtt DatabaseAccessPolicy.Arn # attach role to access the database
Environment:
Variables:
resourceArn: !Sub 'arn:${AWS::Partition}:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DatabaseServer}'
secretArn: !Ref DatabaseSecret
InlineCode: |-
const RDSDataService = new (require('aws-sdk/clients/rdsdataservice'))({apiVersion: '2018-08-01'});
const SecretsManager = new (require('aws-sdk/clients/secretsmanager'))({apiVersion: '2017-10-17'});
const response = require('cfn-response');
const {resourceArn, secretArn} = process.env;
exports.handler = async function (event, context) {
try {
console.log(event);
const userSecretId = event.ResourceProperties.SecretId;
const secret = await SecretsManager.getSecretValue({SecretId: userSecretId}).promise();
const {username, password} = JSON.parse(secret.SecretString);
if (event.RequestType === 'Delete') {
await executeStatement('DROP USER IF EXISTS :username', {username});
return await response.send(event, context, response.SUCCESS);
}
if (userSecretId !== event.PhysicalResourceId) {
await executeStatement('CREATE USER :username IDENTIFIED BY :password', {username, password});
}
await executeStatement('REVOKE ALL PRIVILEGES, GRANT OPTION FROM :username', {username});
for (const {Permission, Database, Table} of event.ResourceProperties.Privileges) {
await executeStatement("GRANT " + Permission + " ON `" + Database + "`." + Table + " TO :username", {username});
}
return await response.send(event, context, response.SUCCESS, null, userSecretId);
} catch (error) {
if (/^Communications link failure/.test(error.message)) {
throw error; // let lambda reattempt this action if aurora is paused
}
console.error(error);
return await response.send(event, context, response.FAILED);
}
};
function executeStatement(sql, parameters) {
parameters = Object.entries(parameters).map(([name, stringValue]) => ({name, value: {stringValue}}));
return RDSDataService.executeStatement({resourceArn, secretArn, sql, parameters}).promise();
}
Outputs:
DatabaseUserServiceToken:
Description: Lambda function that can be used to create a user on the database server in CloudFormation
Value: !GetAtt DatabaseUserResourceLambda.Arn
Export: {Name: !Sub '${AWS::StackName}-database-user-service-token'}
This is similar to the DatabaseResourceLambda
from earlier. It even reuses the same IAMRole.
Let’s get through it.
- Create: (where
event.PhysicalResourceId
is not defined) will create the user and then execute aGRANT
per privilege. - Update: will create a new user when the secret has changed and then execute a
GRANT
per privilege. - Delete: Will simply drop the user if it exists.
bring it together
So that is a lot of YAML so how can we use it all.
Let me show with more pseudo-yaml™.
# serverless.yml
provider:
# [...]
environment:
# the specifics differ depending on how you access the database
DATABASE_RESOURCE: !GetAtt Database.Server
DATABASE_SECRET: !Ref DatabaseUser
DATABASE_NAME: !Ref Database
resources:
Resources:
Database: # [...]
DatabaseUser: # [...]
And that’s all you need in your service. Now you can deploy as many stages or services as you want, using the same database server.
You now should idealy use the rds-data api to work with the database. But that’s not always an option, so you can also define the VPC subnets as output in the shared stack and import them into your services.
But, If you can get some time, you should check if your database abstraction has an rds-data driver or, if not, how hard it would be to create one. I created one for php myself because not having to use a VPC in my main application (and a costly a NAT Gateway) is awesome. You don’t even have to deal with how to receive the Secret and you get free connection pool management too.
Bonus: schedule the AutoPause feature
Aurora Serverless has an AutoPause feature which allows it to scale to 0 capacity when not used. There is a massive downside though: it takes up to around a minute to wake up again. This is unacceptable for production sites and very annoying during development as well, but an unpaused Aurora Serverless costs ~$50 a month so it is worth investigating some alternatives.
What we can do is disabling AutoPause during work hours. Let’s check:
1 capacity unit costs $0,07 per hour (in the eu, 1 cent cheaper in US)
- 24 hours/day ~ 7 days/week = $50,40 / 30 days
- 12 hours/day ~ 7 days/week = $25,20 / 30 days
- 10 hours/day ~ 5 days/week = $15,00 / 30 days ~ average
You see, just running it for half a day is a lot cheaper and limiting it even further will bring it in throwing distance to the smallest RDS instances possible while still having the Multi-AZ failover and performance of an Aurora Database, and possibly swallowing the price of a NAT Gateway if you can use the rds-data api instead of TCP connections.
Depending on what you are doing, it might even be fine for internal management tools that are only used during work hours. However, usually that’s not a good experience but cutting costs in half for development is still nice.
Remember, the database is still available when paused, it’ll just take a minute to start so it’s not like you application will be inaccessable outside your work hours.
And also, it’s Aurora Serverless. It can scale up automatically without you having to set up scaling rules.
So how do we implement that? With a Lambda function in our shared CloudFormation template, of course.
# serverless-shared.yml
Transform:
- AWS::Serverless-2016-10-31 # for inline lambda function
Resources:
DatabaseScalingSchedule:
Type: AWS::Serverless::Function
Properties:
Description: !Sub 'Changes the Scaling Configuration of the database ${DatabaseServer}.'
FunctionName: !Sub '${AWS::StackName}-database-scaling'
Handler: index.handler
Runtime: nodejs12.x
Role: !GetAtt DatabaseAccessPolicy.Arn
Environment: {Variables: {SERVER: !Ref DatabaseServer}}
Events:
Worktime: {Type: Schedule, Properties: {Schedule: 'cron(0 7 ? * MON-FRI *)', Input: '{"AutoPause": false, "MinCapacity": 1, "MaxCapacity": 4}'}}
Freetime: {Type: Schedule, Properties: {Schedule: 'cron(0 17 ? * MON-FRI *)', Input: '{"AutoPause": true, "MinCapacity": 1, "MaxCapacity": 2}'}}
InlineCode: |-
const RDS = new (require('aws-sdk/clients/rds'))({apiVersion: '2014-10-31'});
exports.handler = async function (event) {
return await RDS.modifyDBCluster({DBClusterIdentifier: process.env.SERVER, ScalingConfiguration: event}).promise();
};
DatabaseAccessPolicy:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub '${AWS::StackName}-${AWS::Region}-database-access-policy'
AssumeRolePolicyDocument: {Version: '2012-10-17', Statement: [{Effect: Allow, Action: sts:AssumeRole, Principal: {Service: [lambda.amazonaws.com]}}]}
Policies:
- PolicyName: !Sub "${AWS::StackName}-database-access"
PolicyDocument:
Version: '2012-10-17'
Statement:
# [...]
- {Effect: Allow, Action: rds:ModifyDBCluster, Resource: !Sub 'arn:${AWS::Partition}:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DatabaseServer}'}
Quiet a lot shorter than the lambda functions from earlier, but this one does not require error handling. I reuse the role from before, but I need to allow the ModifyDBCluster call.
I just basically pass the event object as ScalingConfiguration
to modifyDBCluster
.
Then I can just define scheduled events that pass my desired configuration at the desired times.
Working example
I build a demo project that has shows a symfon/PHP project running with this database setup.
Even if you don’t use PHP or Serverless, it might be worth a look to see or even try a working example.
See it here: github.com/Nemo64/serverless-symfony
Changes
- 2020-09-17: I removed the canned password rotation since it requires a VpcEndpoint that costs extra. I don’t want to increase the size of this post so I decided to remove it rather than to extend it.