Creating Components from Operators: Spark on Kubernetes

Updated by Igor Mameshin

A custom component is a component that is created and maintained by you, the user. These components can be integrated into any Stack Template in the AgileStacks SuperHub. You can manage, configure, and implement change control for multiple operators using the SuperHub. In this tutorial, you will create an Apache Spark component based on the Spark Operator from OperatorHub:

https://operatorhub.io/operator/radanalytics-spark

OperatorHub.io is a collection of community-maintained Kubernetes Operators that makes it easier to find and share Operators for Kubernetes.

The Apache Spark Cluster component uses an operator to deploy the cluster. The operator lifecycle is controlled by OLM, which must be present on the platform. While it is possible to deploy the Apache Spark Operator as is, in this tutorial you will deploy it as part of a stack. Deploying Apache Spark as a custom component on the AgileStacks Platform allows you to customize it, make it part of a stack deployment unit, and integrate it with the following components:

  • Kubernetes
  • Traefik
  • Let's Encrypt (recommended, but not mandatory)
  • OLM (Operator Lifecycle Manager)
  • OKD Console (recommended, but not mandatory)

Component registration on the Control Plane

To create a custom component through the Control Plane, follow the steps below:

  1. Open Components > Create
  2. Enter the component Name and Title, and choose a Category the component belongs to. Users are allowed to create their own categories.
  3. If this is your first component, a new Git repository must be created for the source code of the component. To avoid issues with permissions, we currently support only the Agile Stacks hosted Git service (repositories from GitHub or other external sources can't be added). Click + Create new. The Git repository name is derived from the component's name and can't be changed. NOTE: When a new Git repository is created, Agile Stacks generates and pushes a Custom Component skeleton (example) to the repository. It is a simple Kubernetes service with a UI that contains a few pods and an ingress.
  4. If you created Custom Components previously, you can choose one of the existing Git repositories (a single repository can contain multiple custom components). In this case, you must specify in the GIT sub path field a directory name within the existing Git repository that will contain the source code of the component.
  5. Configure who can access the component in the Permissions section. Members of groups with Admin permissions can modify component meta information, modify access rights to the component, see and choose the component on the Happy Meal, and deploy it. Groups with Write permissions can see and choose the component on the Happy Meal and deploy it. Groups with Read permissions can see the component on the Happy Meal but can't deploy it.
  6. Optionally, you can provide a short component description in the Brief field and a long description in the Description field. In addition, Component Logo (valid URL), Version, Maturity, License and Tags can be entered on this page.
  7. Click the Save button.

By default, the Custom Component repository contains the source code of an example service. Control Plane users are required to replace the example with the source code of their own component. See the Create and deploy Spark Operator component section below for more details about editing the source code of custom components.

Create and deploy Spark Operator component

To deploy Apache Spark as an example custom component:

  1. Clone the custom component repository.
  2. Change to the directory where you cloned the component and delete all existing example files.
  3. Create a stack component from the operator published on OperatorHub.io:
    https://operatorhub.io/operator/radanalytics-spark

    Click the Install button to view instructions on how to install the operator programmatically. The operator developer recommends the following command to install the operator:
    kubectl create -f https://operatorhub.io/install/radanalytics-spark.yaml
    Copy this command to the Makefile and parametrize environment-specific properties, such as ${component.custom-spark.name}.
    The Makefile is the main file that implements the complete lifecycle of a component. It specifies all operations for the component: deploy, undeploy, test, etc. A sketch of such a Makefile is shown after this list. For more details about the Makefile format, refer to the section Stacks Under the Hood - Manifests.
    Copy subscription.yaml from the operator repository to create a Subscription resource so that OLM installs the operator in the specified namespace (an example Subscription is shown after this list).
  4. For a complete example of component deployment files that use operators, check out the Apache Spark Example repository. We provide this repository so that you can copy and paste component deployment files instead of writing them from scratch.
  5. Copy the following files from Apache Spark Example to the repository of your custom component:
    • hub-component.yaml: Automation Hub deployment manifest that describes a Custom Component (a sketch is shown after this list). More details are available in the commented lines of the example file or in the documentation.
    • Makefile: Custom component implementation as an Operator; kubectl is used to install the Apache Spark operator. See the commented lines of the Makefile for more information.
    • operator-group.yaml.template: Spark Operator configuration template (a sketch is shown after this list). The default template engine is curly (${}); mustache and commentary engines are also available. For more information, see the templating documentation.
    • ingress.yaml.template: Kubernetes Ingress template for the component.
  6. Commit & push component files to the git repository.
  7. Create a new overlay stack template that contains the custom component. For more information, see the Create and deploy an overlay template with Custom Component section below.
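
The following is a minimal sketch of what the component Makefile could look like. The NAMESPACE variable, the kubectl invocations, and the deploy/undeploy targets are assumptions made for illustration; the authoritative implementation is the Makefile in the Apache Spark Example repository.

    # Sketch of a component Makefile that installs the operator through OLM
    # (illustrative; recipe lines must be indented with tabs). NAMESPACE is
    # assumed to be injected by the Automation Hub via an env: binding in
    # hub-component.yaml; operator-group.yaml is assumed to be rendered from
    # operator-group.yaml.template before deployment.
    .DEFAULT_GOAL := deploy

    NAMESPACE ?= spark

    deploy:
    	-kubectl create namespace $(NAMESPACE)
    	kubectl apply -n $(NAMESPACE) -f operator-group.yaml
    	kubectl apply -n $(NAMESPACE) -f subscription.yaml

    undeploy:
    	-kubectl delete -n $(NAMESPACE) -f subscription.yaml
    	-kubectl delete -n $(NAMESPACE) -f operator-group.yaml

    .PHONY: deploy undeploy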
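
The operator itself is installed by OLM when a Subscription resource appears in the cluster; the install file referenced by the kubectl command above contains such a Subscription. A minimal, parametrized variant might look like the example below. The channel and catalog source values are taken from OperatorHub.io at the time of writing and may change, and the ${component.custom-spark.namespace} placeholder is an assumed parameter name.

    # subscription.yaml (illustrative; use the .template suffix if you parametrize it)
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: radanalytics-spark
      namespace: ${component.custom-spark.namespace}
    spec:
      channel: alpha                  # verify the current channel on OperatorHub.io
      name: radanalytics-spark
      source: operatorhubio-catalog   # catalog source installed together with OLM
      sourceNamespace: olm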
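
OLM also needs an OperatorGroup that defines which namespaces the operator watches; this is what operator-group.yaml.template provides. A minimal template rendered with the curly (${}) engine could look like the following; the parameter names are assumptions used for illustration.

    # operator-group.yaml.template (illustrative)
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: ${component.custom-spark.name}
      namespace: ${component.custom-spark.namespace}
    spec:
      targetNamespaces:
        - ${component.custom-spark.namespace}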
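
Finally, hub-component.yaml declares the component's parameters and outputs so that the Automation Hub can wire the component into a stack. The exact schema is described in Stacks Under the Hood - Manifests and in the commented example file; the fragment below is only an illustration with assumed parameter names and values.

    # hub-component.yaml (illustrative fragment)
    version: 1
    kind: component
    meta:
      name: custom-spark
      brief: Apache Spark operator installed through OLM

    requires:
      - kubernetes

    parameters:
      - name: component.custom-spark.name
        value: custom-spark
        env: COMPONENT_NAME
      - name: component.custom-spark.namespace
        value: spark
        env: NAMESPACE

    templates:
      files:
        - "*.template"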

Create and deploy an overlay template with Custom Component

In the previous step, we created a Custom Component. Now the component can be selected from the Catalog and added to a Stack Template:

  1. Open Stack Templates > Create
  2. Enter a template name, for example Spark, and choose Stack type Overlay
  3. The Custom Component created during the previous step should be visible and available for selection. Select the component. NOTE: By default, components with alpha maturity are not shown. To see alpha components on the Happy Meal, adjust the filter settings. You can use the search filter to quickly find a component by typing its name.
  4. Click Save for later
  5. Click Deploy
  6. Select On platform and choose the Platform Stack where you are going to deploy the template. The Platform Stack is the name of an existing Kubernetes cluster where you are going to deploy Spark. Type a Name, such as spark in this example.
  7. Click Deploy

Test the deployed Spark Cluster

In this step, you will verify the component output to make sure the component was properly deployed.

  1. For this component, you have already created a test as part of the deployment Makefile: examine the run-example-task target (a sketch is shown after this list).
  2. View the component output after the component is deployed.
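
The sketch below shows one way such a run-example-task target could be implemented: it runs the built-in SparkPi example against the deployed cluster. The pod label selector, the Spark installation path, and the namespace are assumptions made for illustration; refer to the Makefile in the Apache Spark Example repository for the actual implementation.

    # Hypothetical sketch of a test target (recipe lines must be indented with tabs).
    # The label selector and spark paths below are assumptions, not the operator's
    # documented interface.
    NAMESPACE  ?= spark
    MASTER_POD  = $(shell kubectl -n $(NAMESPACE) get pods \
                    -l radanalytics.io/podType=master \
                    -o jsonpath='{.items[0].metadata.name}')

    run-example-task:
    	kubectl -n $(NAMESPACE) exec $(MASTER_POD) -- \
    		/opt/spark/bin/run-example SparkPi 10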

If the Spark cluster was deployed successfully, you will see the following output in the stack deployment log.

20/02/10 17:01:36 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.278 s
20/02/10 17:01:36 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.398094 s
Pi is roughly 3.147755738778694
  3. Navigate to Stacks > List > stack name > Details.
  4. Open the Apache Spark Dashboard by clicking the Spark Operator button highlighted above. You may need to enter a user name and password, since we added Dex authentication to Spark. You should see the following dashboard page, which confirms that the Spark cluster was deployed successfully. Congratulations, you have deployed Spark on Kubernetes!
Spark Cluster on Kubernetes
