Developer Workflow: Enable Stateful Applications on Kubernetes (201)

The goal of this tutorial is to provide automation for stateful applications on Kubernetes. In the previous tutorial (101), you created and deployed a Python Flask application on Kubernetes, using Skaffold and the Hub CLI to configure, deploy, and debug it. In this tutorial you will learn a developer workflow for building apps that use a relational database as their backend.

Overview

In the previous tutorial you created a stateless Python Flask application. It served randomly picked words from an array of strings that was hard-coded inside the application. In this tutorial you will enhance the application to work with stateful services, such as a PostgreSQL database. In addition to configuring a database to run on Kubernetes, developers often ask: how do I develop locally, how should I manage database connections, and how do I implement configuration management and database schema upgrades?

To make stateful applications more robust and to implement DevOps best practices, you will split the problem into four steps:

  1. Deploy a stack with a relational database such as PostgreSQL
  2. Discover database parameters and create a database connection
  3. Transform the application from stateless to stateful
  4. Implement database deployment and migration scripts

Deploy a stack with a PostgreSQL database

This tutorial assumes that you have a PostgreSQL database already installed. You can use one of the demo clusters available in the Agile Stacks demo environment, or you can bring your own cluster and deploy an overlay stack with a PostgreSQL database. For highly scalable production deployments, you may prefer to create a stack that uses a managed database service such as Amazon RDS. Please refer to the following tutorial for details about deploying PostgreSQL on Kubernetes: Infrastructure Workflow - Creating Stack Templates

Implement Database Configuration Files

When migrating applications to Kubernetes, one of the recommended application-level changes to implement is extracting application configuration from application code. Database configuration consists of any information that varies across deployments and environments, such as database endpoint URLs, credentials, and various connection parameters. For example, if you have two environments, say staging and production, and each contains a separate database, your application should not have the database endpoint and credentials hard-coded in the application code. Instead, it should be stored in configuration files and environment variables.

By extracting configuration values from your application code, your application becomes more secure, portable, and maintainable.
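
As a minimal sketch of this idea (the variable names DATABASE_URL and DB_POOL_SIZE are hypothetical; this tutorial uses HUB_DATABASE_URI below), reading values from the environment lets the same code run unchanged in staging and production:

from os import environ

# Required value: fail fast if the deployment did not provide it
database_url = environ['DATABASE_URL']

# Optional values can carry safe defaults
pool_size = int(environ.get('DB_POOL_SIZE', '5'))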

  1. Navigate to the Python Flask application directory

Open a Terminal window and change the working directory to the Python Flask application you created in the previous tutorial (101).

$ cd stack-apps/apps/python-flask 
  2. Log in to the SuperHub with the CLI

If you don't have the SuperHub CLI installed, please follow the steps from tutorial 101.

$ hub login -u john.doe@donotreply.com
Password: *******

# eval this in your shell
$ export HUB_API=https://api.agilestacks.io
$ export HUB_TOKEN=5c1a78bc..........49ccf45d28

You need to export the HUB_TOKEN environment variable to authenticate the Hub CLI. This step configures the Hub CLI and generates your application configuration for the specified Kubernetes cluster.

  3. Discover database configuration information

The hub ls command lets you list and filter the Kubernetes clusters in your environment that meet certain criteria. In this case, we are looking for Kubernetes clusters that provide the PostgreSQL database service. The following command shows a Kubernetes cluster (cluster1.bluesky.dev.superhub.io) and a stack with a PostgreSQL component deployed as an overlay stack on top of it (database@... is the name of the overlay stack).

$ hub ls -c 'postgresql'
database@cluster1.bluesky.dev.superhub.io
If hub ls does not show any matching clusters, you need to create a new Kubernetes cluster with a PostgreSQL database. You can deploy a new cluster via the Hub CLI as described in Deploy Kubernetes with Hub CLI, or via the Control Plane UI as described in the Creating Stacks tutorial.

You can browse the configuration parameters for postgresql with the following command:

$ hub ls -c 'postgresql' | hub show
{
  "components": [
    "postgresql"
  ],
  "environment": "TEST01",
  "parameters": {
    "cloud.availabilityZone": "eu-central-1a",
    "cloud.kind": "aws",
    "cloud.region": "eu-central-1",
    ...
  }
}

There are many different parameters that describe the full configuration state of the cluster. However, if you are interested in just database-specific configuration parameters you can limit the set of parameters with the following command.

$ hub ls -c 'postgresql' | hub show -c 'postgresql'
{
  "components": [
    "postgresql"
  ],
  "outputs": {
    "component.postgresql.host": "postgresql-database....cluster.local",
    "component.postgresql.namespace": "postgresql",
    "component.postgresql.port": "5432",
    "component.postgresql.user": "postgres"
  },
  ...
}

Now let's enhance this command by adding a parameters filter in the JQ query language syntax. The following command extracts the database host from the stack configuration parameters.

$ hub ls -c 'postgresql' | hub show -c 'postgresql' -q '.outputs.component.postgresql.host'

"postgresql-database-postgresql.postgresql.svc.cluster.local"

This technique is handy for writing application configuration and automation scripts. Next, you will update the code generator with database-specific details. The code generator is configured in the <my-app-root>/.hub directory:

python-flask/                        # root directory of the application
├── .hub                             # SuperHub config, related to code generation
│   ├── env                          # `hub configure` will download environment configuration here
│   │   ├── configure                # shell script triggered by the `hub configure` command to provide configuration for the app
│   │   ├── <stack-name>.env         # environment file (fetched by `hub configure`); holds all necessary configuration for the application
│   │   └── kubeconfig.<stack>.yaml  # kubeconfig of the stack (fetched by `hub configure`)
│   ├── templates                    # bases for the Jsonnet manifest generation; added to the JSONNET_PATH
│   │   ├── deployment.json          # JSON or YAML format is accepted for a base
│   │   ├── deployment.yaml          # YAML is converted to JSON before being passed to Jsonnet
│   │   ├── skaffold.json            # JSON or YAML format is accepted for a base
│   │   ├── skaffold.yaml            # YAML is converted to JSON before being passed to Jsonnet
│   │   ├── vscode-launch.json
│   │   ├── vscode-settings.json
│   │   └── vscode-tasks.json
│   ├── Makefile                     # triggers the `make generate` routines
│   ├── dockerconfig.json
│   └── skaffold.yaml.jsonnet        # Jsonnet script: its output will have the same file name in the application directory, but with a `json` extension
├── k8s                              # auto-generated Kubernetes deployment files
│   ├── deployment.yaml
│   ├── ingress.yaml
│   ├── kaniko-secret.yaml
│   └── service.yaml
├── src                              # application source code
│   ├── app.py
│   └── requirements.txt
├── .env                             # symlink to the active environment configuration file
├── .gitignore                       # excludes generated files from Git commits
├── Dockerfile                       # defines the application container image
└── skaffold.yaml                    # auto-generated Skaffold configuration
  4. Modify .hub/env/configure

In this step you will add database specific parameters to the application configuration.

Add the database-related lines to the file .hub/env/configure. The resulting file should look like this:

#!/bin/bash
export JQ_ARGS="-rMc"
uuid=$(uuidgen | tr '[:upper:]' '[:lower:]' | tr -d - | cut -c-4)
HUB_APP_NAME="${HUB_APP_NAME:-rubik-$uuid}"

DB_HOST=`hub show -c postgresql -q '.outputs.component.postgresql.host'`
DB_PORT=`hub show -c postgresql -q '.outputs.component.postgresql.port'`
DB_USER=`hub show -c postgresql -q '.parameters.component.postgresql.user'`
DB_NAME=`hub show -c postgresql -q '.parameters.component.postgresql.database'`
DB_PASS=postgres


DOCKER_HOST=`hub show -q '.parameters.component.docker.auth.host'`
DOCKER_USER=`hub show -q '.parameters.component.docker.auth.basic.username'`
DOCKER_PASS=`hub show -q '.parameters.component.docker.auth.basic.password' | hub ext show-secret`

{
which docker \
&& docker login "${DOCKER_HOST}" -u "${DOCKER_USER}" -p "${DOCKER_PASS}"
} > /dev/null 2>&1

TMPL="#!/bin/shexport
HUB_APP_NAME=${HUB_APP_NAME}
export HUB_DOMAIN_NAME=`hub show -q '.parameters.dns.domain'`
export HUB_INGRESS_HOST=`hub show -q '.parameters.component.ingress.fqdn'`
export HUB_DOCKER_HOST=${DOCKER_HOST}
export HUB_DOCKER_USER=${DOCKER_USER}
export HUB_DOCKER_PASS=${DOCKER_PASS}
export KUBECONFIG=$KUBECONFIG
export HUB_DOTENV=$HUB_DOTENV

export SKAFFOLD_DEFAULT_REPO=`hub show -q '.parameters.component.docker.auth.host'`/library
export SKAFFOLD_PROFILE=incluster
export SKAFFOLD_NAMESPACE=default
export SKAFFOLD_CACHE_ARTIFACTS=true
export HUB_DATABASE_URI=postgres://$DB_USER:$DB_PASS@$DB_HOST:$DB_PORT/$DB_NAME"

echo "$TMPL"
Note: the hub ext show-secret command decodes secrets stored in the SuperHub vault, so that plain-text secrets are not kept in the source code.

Database configuration parameters are discovered from the deployed cluster and stored in environment variables to avoid hard-coding them in the application.

  5. Configure your application to point to the selected Kubernetes cluster
$ hub ls
cluster1.bluesky.dev.superhub.io
database@cluster1.bluesky.dev.superhub.io

$ hub configure -f -s cluster1.bluesky.dev.superhub.io

The hub configure command specifies the cluster where the application is going to be deployed. Please replace cluster1.bluesky.dev.superhub.io with the name of your cluster. When you execute hub configure, SuperHub saves the Kubernetes cluster configuration file (kubeconfig) in the .hub/env directory. It also generates the configuration settings for the .env file. Your .env file should look similar to the following example:

#!/bin/bash
export HUB_APP_NAME=rubik-f7af
export HUB_DOMAIN_NAME=cluster1.bluesky.dev.superhub.io
export HUB_INGRESS_HOST=app.cluster1.bluesky.dev.superhub.io
export HUB_DOCKER_HOST=cluster1-harbor.app.cluster1.bluesky.superhub.io
export HUB_DOCKER_USER=admin
export HUB_DOCKER_PASS=****
export KUBECONFIG=.hub/env/kubeconfig.cluster1.bluesky.superhub.io.yaml
export SKAFFOLD_CACHE_ARTIFACTS=true
...
export HUB_DATABASE_URI=postgres://postgres:**@postgresql.svc.cluster.local:5432/postgres

Execute the script to apply environment settings to the current shell:

$ source .env
  6. Add a database connection URL to the deployment manifest. Code generation of the Kubernetes configuration files is implemented with Jsonnet, which provides a powerful templating DSL. To add the database configuration, modify the deployment manifest file .hub/k8s/deployment.yaml.jsonnet:
local k8s = import 'k8s.libsonnet';
local template = import 'deployment.json';
local app = std.extVar("HUB_APP_NAME");
local appLabels = { app: app };
local result = template + {
  metadata+: {
    name: app,
  },
  spec+: {
    selector+: {
      matchLabels+: appLabels,
    },
    template+: {
      metadata+: {
        labels+: appLabels,
      },
      spec+: {
        containers: [
          container + {
            env: [
              k8s.envVar("HUB_DATABASE_URI")
            ],
          }
          for container in super.containers
        ],
        initContainers: [{
          image: "app",
          name: "init-db",
          command: ["flask", "db", "upgrade"],
          env: [
            k8s.envVar("HUB_DATABASE_URI")
          ]
        }],
      },
    },
  },
};

std.prune(result)
Download a copy of the deployment.yaml.jsonnet script here

In this step, you have defined a new HUB_DATABASE_URI environment variable for the application container. We also added a new "init container" section to automate database initialization with Flask-Migrate.

Click here to learn more about k8s.envVar()
  7. Generate the application configuration files. You need to regenerate the configuration files each time you change your cluster.
$ make -C ".hub" clean generate

> Deleting: ../k8s/deployment.yaml
> Deleting: ../k8s/ingress.yaml
> ...
> Deleting: ../.vscode/settings.json
> Deleting: ../.vscode/tasks.json

> Generated: ../k8s/deployment.yaml
> Generated: ../k8s/ingress.yaml
> ...
> Generated: ../.vscode/settings.json
> Generated: ../.vscode/tasks.json
You should now see the container environment variable HUB_DATABASE_URI propagated into the Kubernetes deployment manifest k8s/deployment.yaml.

Enable database workflows with SQLAlchemy

In the 101 tutorial you created the Python Flask application as a stateless app. For simplicity, the application hard-coded the words (values) in a static array. In this section, you will update the application to read the values from a database.

The following Python modules allow the application to work with databases efficiently.

Modify the src/requirements.txt file and add the following modules:

psycopg2-binary   # PostgreSQL driver
flask_sqlalchemy  # ORM framework for Flask
flask_migrate     # database migration scripts
Download complete requirements.txt file

In the next steps you will modify the application to add support for databases:

  1. Introduce a new script src/models.py with Python-to-SQL ORM mappings
  2. Update src/config.py with database settings
  3. Update src/app.py to introduce SQLAlchemy and Flask-Migrate to the application
  4. Update src/routes.py to add support for the database model
  5. Create database migration scripts with the flask db ... CLI

Create a database model

SQLAlchemy is a library that facilitates communication between Python programs and databases. Most frequently, it is used as an Object Relational Mapper (ORM) that translates Python classes to tables in relational databases and automatically converts function calls to SQL statements. In this tutorial we are using a very simple database schema, but in real-life applications an ORM really helps to manage database complexity.
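
To see this translation at work outside of Flask, here is a self-contained sketch using plain SQLAlchemy with an in-memory SQLite database (illustrative only; the application in this tutorial uses the Flask-SQLAlchemy wrapper shown below):

from sqlalchemy import Column, String, create_engine
from sqlalchemy.ext.declarative import declarative_base  # sqlalchemy.orm.declarative_base on SQLAlchemy 1.4+
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Word(Base):
    __tablename__ = 'words'
    value = Column(String(32), primary_key=True)

    def __repr__(self):
        return self.value

# echo=True prints every SQL statement SQLAlchemy generates
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)           # emits CREATE TABLE words (...)

session = sessionmaker(bind=engine)()
session.add(Word(value='kubernetes'))
session.commit()                           # emits INSERT INTO words ...
print(session.query(Word).all())           # emits SELECT ... FROM words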

Create a new file src/models.py - this Python script will contain the DB models for SQLAlchemy. It will create a database table with one VARCHAR(32) column and mark that column as the primary key.

from sqlalchemy import Column, String
from app import db

class Word(db.Model):
    __tablename__ = 'words'

    value = Column(String(32), primary_key=True)

    def __repr__(self):
        return self.value
Download complete models.py script

This script defines a database table words with a single VARCHAR(32) column. The __repr__() method implements the object's to-string conversion. The string representation of a mapped object will simplify the conversion of the Flask route response object below.
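
For example, because __repr__() returns the stored value, a Word object renders as its word wherever Python asks for a representation (a quick illustrative check, assuming the model above is importable):

from models import Word

w = Word(value='helm')
print(w)    # -> helm
print([w])  # -> [helm]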

Update the application configuration

Open the file src/config.py and add the database-related lines shown below. To create the SQLAlchemy connection string, we reference the environment variable HUB_DATABASE_URI, which was defined in the Kubernetes deployment manifest.

from os import environ

class Config(object):
    JSON_ADD_STATUS = environ.get('JSON_ADD_STATUS', False)
    JSON_STATUS_FIELD_NAME = environ.get('JSON_STATUS_FIELD_NAME', 'code')
    FLASK_DEBUG = int(environ.get("FLASK_DEBUG", 0))
    FLASK_RUN_RELOAD = int(environ.get("FLASK_RUN_RELOAD", 0))
    PTVSD_ENABLED = (FLASK_DEBUG == 1 and FLASK_RUN_RELOAD == 0)
    SQLALCHEMY_DATABASE_URI = environ['HUB_DATABASE_URI']
    SQLALCHEMY_TRACK_MODIFICATIONS = False
Download complete config.py script

To learn more about SQLAlchemy configuration, follow the link here.

Introduce SQLAlchemy and Flask-Migrate

Next, add the following lines of code to the file src/app.py. Note that the import of the data model Word is at the end of the file - this is a workaround to avoid a circular dependency.

from flask import Flask
from os import environ
from config import Config
from flask_sqlalchemy import SQLAlchemy

application = Flask(__name__)
application.config.from_object(Config)

db = SQLAlchemy(application)

if application.config.get('PTVSD_ENABLED'):
    import ptvsd
    application.logger.info("Starting `ptvsd` on port: 3000")
    ptvsd.enable_attach(address=('0.0.0.0', 3000))

# Workaround for circular dependency issue
import routes
from models import Word


if __name__ == "__main__":
    application.run()
Download complete app.py script

Next, you will add the database model to the Flask routes. Modify the src/routes.py file:

from app import application
...
from models import Word

WORDS = Word.query.all()  # select * from words

def get_words(howmany=1):
    """
    returns list of random words
    """
    return sample(WORDS, howmany)
...

@application.route("/gimme")
@application.route("/gimme/<int:howmany>")
def gimme(howmany=1):
    """
    returns random list of words with the size defined in parameter howmany
    """
    dbarray = get_words(howmany)
    result = [w.value for w in dbarray]
    return json_response(data=result)
Download complete routes.py script

The code is now ready to be tested. However, if you run the application it will be database-aware, but the database will still be empty. Let's add some test data with database migration scripts.

Database Migration Scripts

As you change your microservice's code, you will eventually need to change your database's schema as well, so that the two match at all times. One simple way to achieve this is via database migration scripts. Based on best practices for microservice architecture, each microservice should be fully responsible for its own database schema upgrades and test data. Database migration frameworks keep track of the database schema revisions and can upgrade the database to the latest version when needed. In some cases the database schema has to be downgraded - to support deployment rollback when something goes wrong.

We will use Alembic, the migration library that Flask-Migrate is built on. To handle database migrations, add the following lines to the application file src/app.py:

from flask import Flask
from os import environ
from config import Config
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

application = Flask(__name__)
application.config.from_object(Config)

db = SQLAlchemy(application)
migrate = Migrate(application, db)
...
Download complete app.py script

With the above application you can create a migration repository using the following CLI commands. In the next step, you will set up a temporary network tunnel to generate the database migration scripts. The port-forward feature of kubectl tunnels traffic from a specified port on your local machine to the specified port of the remote database service; kubectl creates a temporary gateway between your local port and the Kubernetes cluster. In your terminal window, change the current working directory to <app-root>/src:

$ cd src/
$ export FLASK_APP=app.py
$ export FLASK_ENV=development
$ export FLASK_SKIP_DOTENV=1 # to avoid confusion with ../.env file
$ pip install flask-migrate flask-json uptime --upgrade
$ flask db --help
$ kubectl -n postgresql port-forward svc/postgresql-database-postgresql 5432 &
$ export HUB_DATABASE_URI=postgres://postgres:postgres@localhost:5432/postgres
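
Before generating the migrations, you can optionally verify that the tunnel works with a short Python check (a sketch only; it reuses the credentials from the HUB_DATABASE_URI example above and requires the psycopg2-binary package):

import psycopg2

# Connect through the kubectl port-forward tunnel on localhost
conn = psycopg2.connect(host='localhost', port=5432,
                        user='postgres', password='postgres',
                        dbname='postgres')
with conn.cursor() as cur:
    cur.execute('SELECT version()')
    print(cur.fetchone()[0])
conn.close()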

Executing the database migration scripts involves the following steps:

  1. Initialize the database migration repository
  2. Create a database migration revision that installs the database schema (referred to as datadef)
  3. Create a database migration revision with the initial data set for the words database table (referred to as dataset)

Still in the <app-root>/src directory, run:

$ flask db init
$ flask db revision -m datadef
$ flask db revision -m dataset

You should see a new migrations/ directory with contents similar to:

├── migrations
│   ├── versions                    # directory with all migration revisions
│   │   ├── 9e48bb6d4347_datadef.py # first database revision
│   │   └── c649e4ff4d71_dataset.py # second database revision
│   ├── README
│   ├── alembic.ini                 # settings such as logging, etc.
│   ├── env.py                      # configuration for the environment
│   └── script.py.mako              # template for new revisions
Note that the revision hashes will be different for your app.

Next, modify the first database revision, migrations/versions/*_datadef.py. This script will create the database schema using the mappings imported from the Flask app.

from alembic import op
import sqlalchemy as sa

revision = 'e2c2664f5a85'  # will be different for your app
down_revision = None       # this is the first revision
branch_labels = None
depends_on = None

from app import db

def upgrade():
    db.create_all()
    db.session.commit()


def downgrade():
    db.drop_all()
    db.session.commit()
Download the complete example of datadef.py script

The second revision (dataset.py) defines the data set available to the Flask app. The following code creates an initial list of values to populate the database.

from alembic import op
import sqlalchemy as sa

revision = 'fac5656a6672'       # will be different for your app
down_revision = 'e2c2664f5a85'  # a reference to the previous revision
branch_labels = None
depends_on = None

from app import db, Word  # pylint: disable=import-error

DATA = [
    Word(value="helm"),
    Word(value="kustomize"),
    Word(value="kubernetes"),
    Word(value="aws"),
    Word(value="gcp"),
    Word(value="azure"),
    Word(value="terraform"),
    Word(value="docker"),
    Word(value="shell"),
    Word(value="vault"),
    Word(value="istio"),
]


def upgrade():
    db.session.add_all(DATA)
    db.session.commit()


def downgrade():
    for data in DATA:
        rec = db.session.query(Word).filter_by(value=data.value)
        if rec:
            rec.delete()

    db.session.commit()
Download the complete example for dataset.py here

Now you are ready to run the application. Prior to running Skaffold, make sure that you are inside the application directory, which contains the file skaffold.yaml. If your current working directory is src, go one level up. Before deploying the application, confirm that the database configuration has been properly propagated to the Kubernetes deployment manifest. Note that this operation may take a minute to complete.

$ cd ..
$ skaffold render

kind: Deployment
...
spec:
  containers:
  - name: main
    env:
    - name: HUB_DATABASE_URI
      value: postgres://postgres....svc.cluster.local:5432/postgres
...
  initContainers:
  - name: init-db
    command: ["flask", "db", "upgrade"]
    env:
    - name: HUB_DATABASE_URI
      value: postgres://postgres...svc.cluster.local:5432/postgres

At the beginning of this tutorial we updated the Jsonnet script .hub/k8s/deployment.yaml.jsonnet and generated the deployment manifests. The environment variable HUB_DATABASE_URI points the application container, called main, to the database server. There is also an init container that uses the same application image; it executes the database migration scripts before the application is deployed. You are now ready to run the application.

$ skaffold dev
Listing files to watch...
- rubik
Generating tags...
- rubik -> .../library/rubik:20200318-160628
...
Watching for changes...

INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade  -> e2c2664f5a85, datadef
INFO [alembic.runtime.migration] Running upgrade e2c2664f5a85 -> fac5656a6672, dataset

* Serving Flask app "app.py"

Based on the logs, the init container executed the database migration prior to the application deployment.

Access the deployed application

There are several ways to access and test the deployed application from the browser.

Via VS Code: open the tasks shortcut (Cmd+Shift+P or F1), type Tasks: Run Task, and then select Ingress: Open in browser.

Via the ingress resource, defined in the file k8s/ingress.yaml.

Via the command line, with the following command:

$ kubectl get ingress --all-namespaces

NAMESPACE   NAME                             HOSTS
default     rubik                            rubik.app.cluster1.bluesky.superhub.io
harbor      cluster1-harbor-harbor-ingress   cluster1-harbor.app.cluster1.bluesky.superhub.io
...

Based on the output shown above, you can access the deployed application at the following URL: https://rubik.app.cluster1.bluesky.superhub.io. Use kubectl get ingress to find your own application URL.
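
You can also verify the endpoint programmatically. A quick smoke test might look like the following sketch (replace the host with your own ingress URL; the requests package is assumed to be installed):

import requests

resp = requests.get('https://rubik.app.cluster1.bluesky.superhub.io/gimme/3')
resp.raise_for_status()
print(resp.json())  # e.g. {"data": ["helm", "aws", "vault"]}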

The application looks exactly the same as in the previous tutorial. However, the application data is now queried from the Postgres database.

Conclusion

You have implemented a stateful application on Kubernetes. While we used PostgreSQL in this tutorial, you can similarly use Amazon RDS or other databases. Configuration management is very important for stateful applications that need to be deployed in multiple environments such as Dev, Test, and Production. Modern tools such as Skaffold, Helm, or Kustomize help with extracting application configuration from application code. The Agile Stacks SuperHub discovers configuration from deployed Kubernetes clusters and automates the configuration of environment-specific variables for the application.
