Creating Components from Operators: Spark on Kubernetes

Igor Mameshin

A custom component is a component that is created and maintained by you, the user. These components can be integrated into any Stack Template in the AgileStacks SuperHub. You can manage, configure, and implement change control for multiple operators using the SuperHub. In this tutorial, you will create an Apache Spark component based on the Spark Operator from OperatorHub, a collection of community-maintained Kubernetes Operators that makes it easier to find and share Operators for Kubernetes.

The Apache Spark Cluster component uses an operator to deploy the cluster. The operator lifecycle is controlled by OLM, which must be present on the platform. While it is possible to deploy the Apache Spark Operator as is, in this tutorial you will deploy it as part of a stack. Deploying Apache Spark as a custom component on the AgileStacks platform allows you to customize it, make it part of a stack deployment unit, and integrate it with the following components:

  • Kubernetes
  • Traefik
  • Let's Encrypt (recommended, but not mandatory)
  • OLM (Operator Lifecycle Manager)
  • OKD Console (recommended, but not mandatory)
  • Dex/Okta for Single-Sign-On
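
To illustrate how OLM controls the operator lifecycle, a Subscription resource links an operator to a catalog source and keeps it updated as new versions are released. The package, channel, and catalog names below are illustrative assumptions, not values prescribed by this tutorial; check your OperatorHub catalog for the actual values:

```yaml
# Hypothetical OLM Subscription for a Spark operator.
# Package, channel, and catalog names are assumptions; look them up
# in your OperatorHub catalog before using this manifest.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: spark-operator
  namespace: operators
spec:
  channel: stable                 # update channel to track
  name: radanalytics-spark        # package name in the catalog
  source: operatorhubio-catalog   # CatalogSource to install from
  sourceNamespace: olm
  installPlanApproval: Automatic  # let OLM apply operator upgrades automatically
```

With `installPlanApproval: Automatic`, OLM watches the subscribed channel and upgrades the operator whenever the catalog publishes a new version.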

Component registration on the Control Plane

To create a custom component through the Control Plane, follow the steps below:

  1. Open Components > Create
  2. Enter the component Name and Title, and choose a Category the component belongs to. Users are allowed to create their own categories.
  3. If this is your first component, a new Git repository must be created for the source code of the component. To avoid issues with permissions, we currently only support the Agile Stacks hosted Git service (repositories from GitHub or other external sources can't be added). Click + Create new. The Git repository name is derived from the component's name and can't be changed. NOTE: When a new Git repository is created, Agile Stacks generates and pushes a Custom Component skeleton (example) to the repository. It's a simple Kubernetes service with a UI that contains a few pods and an ingress.
  4. If you created Custom Components previously, you can choose one of the existing Git repositories (a single repository can contain multiple custom components). In this case you must specify a directory name in the GIT sub path field; this directory within the existing Git repository will contain the source code of the component.
  5. Configure who can access the component in the Permissions section. Members of groups with Admin permissions can modify the component's meta information, modify access rights to the component, see and choose the component on the Happy Meal, and deploy it. Groups with Write permissions can see and choose the component on the Happy Meal, and deploy it. Groups with Read permissions can see the component on the Happy Meal, but can't deploy it.
  6. Optionally, you can provide a short component description in the Brief field and a long description in the Description field. In addition, Component Logo (valid URL), Version, Maturity, License and Tags can be entered on this page.
  7. Click the Save button.

By default, the Custom Component repository contains the source code of an example service. Control Plane users are expected to replace the example with the source code of their own component. See the Create and deploy Spark Operator component section below for more details about editing the source code of custom components.

Create and deploy Spark Operator component

To deploy the Apache Spark example custom component:

  1. Clone the custom component repository.
  2. Change to the directory where you cloned the component and delete all existing example files.
  3. For a complete example of custom component deployment files that provision an Apache Spark cluster using an operator, check out the Apache Spark Example repository. The component deploys the operator and enhances it with useful Agile Stacks features for ingress management, TLS, and SSO. We provide this repository so you can copy and paste the component deployment files instead of writing them yourself. The repository contains the following files/directories:
    • hub-component.yaml Automation Hub deployment manifest, that describes the main building blocks of a custom component (parameters, tasks, template file locations, etc.). More details are available in the documentation
    • templates directory. The default template engine is curly (${} placeholders); other engines such as mustache are also supported. For more information, see the templating documentation.
    • Makefile, the custom component implementation. It:
      • Creates an operator subscription (using the template, see above) that links the operator with the operator catalog and keeps it up to date when new versions are released
      • Deploys the Apache Spark cluster custom resource that provisions the cluster, and waits until all pods are up
      • Creates an ingress to expose the Spark cluster dashboard to the world
      • Runs an example task to verify that the cluster is up and running
  4. Commit & push component files to the git repository.
  5. Create a new overlay stack template that contains the custom component. For more information, see the Create and deploy an overlay template with Custom Component section below.
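
The repository layout described above can be sketched as a minimal hub-component.yaml. The field names below follow the general shape of Automation Hub component manifests, but the exact schema and parameter names are assumptions; consult the Apache Spark Example repository for the authoritative file:

```yaml
# Minimal sketch of a hub-component.yaml; field and parameter names
# are illustrative assumptions, not copied from the example repository.
version: 1
kind: component
meta:
  name: spark
  brief: Apache Spark cluster deployed via the Spark Operator
requires:            # capabilities this component expects from the stack
  - kubernetes
  - olm
parameters:
  - name: component.spark.name
    value: spark-cluster
  - name: component.spark.workers
    value: 2
templates:
  files:
    - templates/*.template   # rendered with the curly ${} engine by default
```

The deploy and undeploy logic itself lives in the Makefile; the manifest's job is to declare parameters, dependencies, and which template files to render before the Makefile runs.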

Create and deploy an overlay template with Custom Component

In the previous step we created a Custom Component. Now the component can be selected from the Catalog and added to a Stack Template:

  1. Open Stack Templates > Create
  2. Enter a template name, for example Spark, and choose the Stack type Overlay
  3. The Custom Component created during the previous step should be visible and available for selection. Select the component. NOTE: By default, components with alpha maturity are not shown. To see alpha components on the Happy Meal, adjust the filter settings. You can use the search filter to quickly find a component by typing its name.
  4. Click Save for later
  5. Click Deploy
  6. Select On platform and choose the Platform Stack where you are going to deploy the template. The platform stack is the name of the existing Kubernetes cluster where you are going to deploy Spark. Enter a Name, such as spark in this example.
  7. Click Deploy

Test the deployed Spark Cluster

In this step, you will verify the component output to make sure the component was deployed properly.

  1. For this component, you have already created a test as part of the deployment Makefile: examine the run-example-task target.
  2. View the component output after the component is deployed.
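
The example task submits a SparkPi job to the cluster. With a Spark operator such as the radanalytics one, this could be expressed as a custom resource like the following; the resource name, jar path, and version are illustrative assumptions, so check the example repository for the real definition:

```yaml
# Hypothetical SparkApplication that computes Pi. If the job succeeds,
# the driver log should contain a "Pi is roughly ..." line like the
# output shown below.
apiVersion: radanalytics.io/v1
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
  executor:
    instances: 1   # a single executor is enough for this smoke test
```

Applying a resource like this and then reading the driver pod's log is essentially what the run-example-task target automates.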

If the Spark cluster was deployed successfully, you will see the following output in the stack deployment log:

20/02/10 17:01:36 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.278 s
20/02/10 17:01:36 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.398094 s
Pi is roughly 3.147755738778694
  3. Navigate to Stacks > List > stack name > Details.
  4. Open the Apache Spark Dashboard by clicking the Spark Operator button on the stack details page. You may need to enter a user name and password, since we added Dex authentication to Spark. The following dashboard page confirms that the Spark cluster was deployed successfully. Congratulations, you have deployed Spark on Kubernetes!
Spark Cluster on Kubernetes
