Thursday, November 18, 2021

Service to Service call patterns - GKE with Anthos Service Mesh on a single cluster

 This is second in a series of posts exploring service to service call patterns in some of the application runtimes on Google Cloud. The first in the series explored service to service call patterns in GKE

This post will expand on it by adding in a Service Mesh, specifically Anthos Service Mesh, and explore how the service to service patterns change in the presence of a mesh. The service to service call with be across services in a single cluster. The next post will explore services deployed to multiple GKE clusters.


Set-Up

The steps to set-up a GKE cluster and install Anthos service mesh on top of it is described in this document - https://cloud.google.com/service-mesh/docs/unified-install/install, in brief these are the commands that I had to run in my GCP Project to get a cluster running:


If the installation of cluster and the mesh has run through cleanly, a good way to verify the installation is to see if the cluster gets registered as a Anthos managed cluster in the Google Cloud Console.



The services that I will be installing is fairly simple and looks like this:


Using a UI, the caller can make the producer behave in certain ways:
  • Introduce response time delays
  • Respond with certain status codes
This will help check how the mesh environment will behave in the face of these behaviors.


The codebase for the "caller" and "producer" are in this repository - https://github.com/bijukunjummen/sample-service-to-service, there are kubernetes manifests available in the repository to bring up these services.

Behavior 1 - Mutual TLS

The first behavior that I want to see is for the the caller and the producer to verify each others identities by presenting and validating their certificates.

This can be done by adding in a istio DestinationRule for the producer, along these lines:






This also adds in the DestinationRule for the caller, this is because the caller gets the call from the browser via an Ingress Gateway and even this call needs to be authenticated using mtls

Alright now that the set-up in place, the following is what gets captured as the request flows from the Browser to the Ingress Gateway to the Caller to the Producer.


The sign that the mTLS works is seeing the "x-forwarded-client-cert" header, this is in both the Callers headers coming in from Ingress-gateway, and in the "Producers" headers coming in from the Caller.

Behavior 2 - Timeout

The second behavior that I want to explore is the timeouts. A request timeout can be set for the call from the Caller to Producer by creating a Virtual Service for the Producer with the value set, along these lines:



With this configuration in place a request from the caller with a delay of 6 seconds, causes the Mesh to timeout and present an error that looks like this:


The mesh responds with a http status code of 504 with a message of "Upstream timed out". 

Behavior 3 - Circuit Breaker

Circuit breaker is implemented using a Destination Rule resource
Here I have configuration which breaks the circuit if 3 continuous 5XX responses are received from the Producer in a 15 second interval, and then does not make a request for another 15 seconds

With this configuration in place a request with broken circuit looks like this:

The mesh responds with a http status code of 503 and a message of "no healthy upstream"


Conclusion

The neat thing is that in all scenarios so far, the way the Caller calls the Producer remains exactly the same, it is the mesh which injects in the appropriate security controls through mTLS and the resilience of calling service through timeouts and circuit breaker. 

Wednesday, October 27, 2021

Service to Service call patterns in Google Cloud - GKE

This is a series of posts that will explore service to service call patterns in some of the application runtimes in Google Cloud. This specific post will explore GKE without using a service mesh and the next post will explore GKE with Anthos Service Mesh.


Set Up

The set-up is simple, two applications - caller and producer are hosted on the application runtime with caller making a http request to the producer. An additional UI is packaged with the caller that should make it easy to test the different scenarios.


The producer is special, a few faults can be injected into the producers response based on the post body from the caller:

  1. An arbitrary delay
  2. A specific response http status code

These will be used for checking how the runtimes behave under faulty situation.

GKE Autopilot Runtime

The fastest way to get a fully managed Kubernetes cluster in Google Cloud is to spin up a GKE Autopilot cluster. Assuming such a cluster is available, the service to service call pattern is through the abstraction of a Kubernetes service and looks something like this:

A manifest file which enables this is the following:


Once a service resource is created, here called "sample-producer" for instance, a client can call it using the services FQDN - sample-producer.default.svc.cluster.local. In my sample, the caller and the called are in the same namespace, for such cases calling by just the service name is sufficient.

A sample service to service call and its output in a simple UI looks like this:



A few things to see here:
  1. As the request flows from the browser to the caller to the producer, the headers are captured at each stage and presented. There is nothing special with the headers so far, once service meshes come into play they start to get far more interesting.
  2. The delay does not do anything, the browser and the caller end up waiting no matter how high the delay.
  3. Along the same lines, if the producer starts failing, caller continues to send requests down to the service, instead of short circuiting it.

Conclusion

Service to service call in a Kubernetes environment is straightforward with the abstraction of a Kubernetes service resource providing a simple way for clients to reach the instances hosting an application. Layering in a service mesh provides a great way for the service to service calls to be much more resilient without the application explicitly needing to add in libraries to handle request timeouts or faulty upstream services. This will be the topic of the next blog post. 

Thursday, September 30, 2021

Google Cloud Deploy - CD for a Java based project

This is a short write-up on using Google Cloud Deploy for Continuous Deployment of a Java-based project. 

Google Cloud Deploy is a new entrant to the CD space. It facilitates a continuous deployment currently to GKE based targets and in future to other Google Cloud application runtime targets.

Let's start with why such a tool is required, why not an automation tool like Cloud Build or Jenkins. In my mind it comes down to these things:

  1. State - a dedicated CD tool can keep state of the artifact, to the environments where the artifact is deployed. This way promotion of deployments, rollback to an older version, roll forward is easily done. Such an integration can be built into a CI tool but it will involve a lot of coding effort.
  2. Integration with the Deployment environment - a CD tools integrates well the target deployment platform without too much custom code needed.

Target Flow

I am targeting a flow which looks like this, any merge to a "main" branch of a repository should:
1. Test and build an image
2. Deploy the image to a "dev" GKE cluster
3. The deployment can be promoted from the "dev" to the "prod" GKE cluster


Building an Image

Running the test and building the image is handled with a combination of Cloud Build providing the build automation environment and skaffold providing tooling through Cloud Native Buildpacks. It may be easier to look at the code repository to see how both are wired up - https://github.com/bijukunjummen/hello-skaffold-gke


Deploying the image to GKE

Now that an image has been baked, the next step is to deploy this into a GKE Kubernetes environment.  Cloud Deploy has a declarative way of specifying the environments(referred to as Targets) and how to promote the deployment through the environments. A Google Cloud Deploy pipeline looks like this:


The pipeline is fairly easy to read. Target(s) describe the environments to deploy the image to and the pipeline shows how progression of the deployment across the environments is handled. 

One thing to notice is that the "prod" target has been marked with a "requires approval" flag which is a way to ensure that the promotion to prod environment happens only with an approval. Cloud Deploy documentation has a good coverage of all these concepts. Also, there is a strong dependence on skaffold to generate the kubernetes manifests and deploying them to the relevant targets.

Given such a deployment pipeline, it can be put in place using:

gcloud beta deploy apply --file=clouddeploy.yaml --region=us-west1

Alright, now that the CD pipeline is in place, a "Release" can be triggered once the testing is completed in a "main" branch, a command which looks like this is integrated with the Cloud Build pipeline to do this, with a file pointing to the build artifacts:

gcloud beta deploy releases create release-01df029 --delivery-pipeline hello-skaffold-gke --region us-west1 --build-artifacts artifacts.json
This deploys the generated kubernetes manifests pointing to the right build artifacts to the "dev" environment


and can then be promoted to additional environments, prod in this instance.

Conclusion

This is a whirlwind tour of Google Cloud Deploy and the feature that it offers. It is still early days and I am excited to see where the Product goes. The learning curve is fairly steep, it is expected that a developer understands:
  1. Kubernetes, which is the only application runtime currently supported, expect other runtimes to be supported as the Product evolves.
  2. skaffold, which is used for building, tagging, generating kubernetes artifacts
  3. Cloud Build and its yaml configuration
  4. Google Cloud Deploys yaml configuration

It will get simpler as the Product matures.

Saturday, September 25, 2021

Cloud Build and Gradle/Maven Caching

One of the pain points in all the development projects that I have worked on has been setting up/getting an infrastructure for automation. This has typically meant getting access to an instance of Jenkins. I have great respect for Jenkins as a tool, but each deployment of Jenkins tends to become a Snowflake over time with the different set of underlying plugins, version of software, variation of pipeline script etc.

This is exactly the niche that a tool like Cloud Build solves for, deployment is managed by Google Cloud platform, and the build steps are entirely user driven based on the image used for each step of the pipeline.

In the first post I went over the basics of creating a Cloud Build configuration and in the second post went over a fairly comprehensive pipeline for a Java based project.

This post will conclude the series by showing an approach to caching in the pipeline - this is far from original, I am borrowing generously from a few sample configurations that I have found. So let me start by describing the issue being solved for.


Problem

Java has two popular build tools - Gradle and Maven. Each of these tools download a bunch of dependencies and cache these dependencies at startup -
  1. The tool itself is not a binary, but a wrapper which knows to download the right version of the tools binary.
  2. The projects dependencies specified in tool specific DSL's are then downloaded from repositories.
The issue is that across multiple builds the dependencies tend to get downloaded when run

Caching across Runs of a Build

The solution is to cache the downloaded artifacts across the different runs of a build. There is unfortunately no built in way (yet) in Cloud Build to do this, however a mechanism can be built along these lines:
  1. Cache the downloaded dependencies into Cloud Storage at the end of the build 
  2. And then use it to rehydrate the dependencies at the beginning of the build, if available

A similar approach should work for any tool that downloads dependencies. The trick though is figuring out where each tool places the dependencies and knowing what to save to Cloud storage and back. 

Here is an approach for Gradle and Maven.

Each step of the cloud build loads the exact same volume:
    
	volumes:
      - name: caching.home
        path: /cachinghome

Then  explodes the cached content from cloud storage into this volume.

    dir: /cachinghome
    entrypoint: bash
    args:
      - -c
      - |
        (
          gsutil cp gs://${_GCS_CACHE_BUCKET}/gradle-cache.tar.gz /tmp/gradle-cache.tar.gz &&
          tar -xzf /tmp/gradle-cache.tar.gz
        ) || echo 'Cache not found'
    volumes:
      - name: caching.home
        path: /cachinghome

Now, Gradle and Maven store the dependencies into a ".gradle" and ".m2" folder in a users home directory respectively. The trick then is to link the $USER_HOME/.gradle and $USER_HOME/.m2 folder to the exploded directory:


  - name: openjdk:11
    id: test
    entrypoint: "/bin/bash"
    args:
      - '-c'
      - |-
        export CACHING_HOME="/cachinghome"
        USER_HOME="/root"
        GRADLE_HOME="$${USER_HOME}/.gradle"
        GRADLE_CACHE="$${CACHING_HOME}/gradle"

        mkdir -p $${GRADLE_CACHE}

        [[ -d "$${GRADLE_CACHE}" && ! -d "$${GRADLE_HOME}" ]] && ln -s "$${GRADLE_CACHE}" "$${GRADLE_HOME}"
        ./gradlew check
    volumes:
      - name: caching.home
        path: /cachinghome

The gradle tasks should now use the cached content if available or create the cached content if it is being run for the first time. 


It may be simpler to see a sample build configuration which is here - https://github.com/bijukunjummen/hello-cloud-build/blob/main/cloudbuild.yaml


Friday, August 20, 2021

Cloud Build - CI/CD for a Java Project

In a previous blog post I went over the basics of what it takes to create a configuration for Cloud Build. This post will expand on it by creating a functional CI/CD pipeline for a java project using Cloud Build. Note that I am claiming the pipeline will be functional but far from optimal, a follow up post at some point will go over potential optimizations to the pipeline.


Continuous Integration

The objective of Continuous integration is to ensure that developers regularly merge quality code into a common place. The quality is ascertained using automation, which is where a tool like Cloud Build comes in during the CI process.

Consider a flow where developers work on feature branches and when ready send a pull request to the main branch

Now to ensure quality, checks should be run on the developers feature branch before it is allowed to be merged into the "main" branch. This means two things:

1. Running quality checks on the developers feature branch
2. Merges to main branch should not be permitted until checks are run.


Let's start with Point 1 - Running quality checks on a feature branch

Running quality checks on a feature branch

This is where integration of Cloud Build with the repo comes into place. I am using this repository - https://github.com/bijukunjummen/hello-cloud-build, to demonstrate this integration with Cloud Build. If you have access to a Google Cloud environment, a new integration of Cloud build build with a repository looks something like this:



Once this integration is in place, a Cloud Build "trigger" should be created to act on a new pull request to the repository:




Here is where the Cloud Build configuration comes into play, it specifies what needs to happen when a Pull Request is made to the repository. This is a Java based project with gradle as the build tool, I want to run tests and other checks, which is normally done through a gradle task called "check", a build configuration which does this is simple:




steps:
  - name: openjdk:11
    id: test
    entrypoint: "./gradlew"
    args: [ "check" ]

Onto the next objective - Merges to the main branch should not be allowed until the checks are clean

Merges to main branch only with a clean build

This is done on the repository side on github, through settings that look like this - 

The settings protects the "main" branch by only allowing in merges after the checks in the PR branch is clean. It also prevents checking in code directly to the main branch.


With these two considerations, checking the feature branch before merges are allowed, and allowing merges to "main" branch after checks should ensure that quality code should get into the "main" branch. 

Onto the Continuous Deployment side of the house. 


Continuous Deployment

So now presumably a clean code has made its way to the main branch and we want to deploy it to an environment. 

In Cloud Build this translates to a "trigger", that acts on commits to specific branches and looks like this for me:


and again the steps expressed as a Cloud Build configuration, has steps to re-run the checks and deploy the code to Cloud Run 


steps:
  - name: openjdk:11
    id: test
    entrypoint: "/bin/bash"
    args:
      - '-c'
      - |-
        ./gradlew check

  - name: openjdk:11
    id: build-image
    entrypoint: "/bin/bash"
    args:
      - '-c'
      - |-
        ./gradlew jib --image=gcr.io/$PROJECT_ID/hello-cloud-build:$SHORT_SHA
 
  - name: 'gcr.io/cloud-builders/gcloud'
    id: deploy
    args: [ 'run', 'deploy', "--image=gcr.io/$PROJECT_ID/hello-cloud-build:$SHORT_SHA", '--platform=managed', '--project=$PROJECT_ID', '--region=us-central1', '--allow-unauthenticated', '--memory=256Mi', '--set-env-vars=SPRING_PROFILES_ACTIVE=gcp', 'hello-cloud-build' ]

Here I am using Jib to create the image.

Wrapup


With this tooling in place, a developer flow looks like this. A PR triggers checks and shows up like this on the github side:


and once checks are complete, allows the branch to be merged in:


After merge the code gets cleanly deployed.


Tuesday, August 10, 2021

Google Cloud Build - Hello World

I have been exploring Google Cloud Build recently and this post is a simple introduction to this product. You can think of it as a tool that enables automation of deployments. This post though will not go as far as automating deployments, instead just covering the basics of what it involves in getting a pipeline going. A follow up post will show a continuous deployment pipeline for a java application. 

Steps

The basic steps to set-up a Cloud Build in your GCP project is explained here. Assuming that the Cloud Build has been set-up, I will be using this github project to create a pipeline.

Cloud pipeline is typically placed as a yaml configuration in a file named by convention as "cloudbuild.yaml". The pipeline is described as a series of steps, each step runs in a docker container and the name of the step points to the docker image. So for eg. a step which echo's a message looks like this:

Here the name "bash" points to the docker image named "bash" in docker hub

The project does not need to be configured in Google Cloud Build to run it, instead a utility called "cloud-build-local" can be used for running the build file. 


git clone git@github.com:bijukunjummen/hello-cloud-build.git
cd hello-cloud-build
cloud-build-local .
Alright, now to add a few more steps. Consider a build file with 2 steps: Here the two steps will run serially, first Step A, then Step B. A sample output looks like this on my machine:

Starting Step #0 - "A"
Step #0 - "A": Already have image (with digest): bash
Step #0 - "A": Step A
Finished Step #0 - "A"
2021/08/10 12:50:23 Step Step #0 - "A" finished
Starting Step #1 - "B"
Step #1 - "B": Already have image (with digest): bash
Step #1 - "B": Step B
Finished Step #1 - "B"
2021/08/10 12:50:25 Step Step #1 - "B" finished
2021/08/10 12:50:26 status changed to "DONE"

Concurrent Steps

A little more complex, say if I wanted to execute a few steps concurrently, the way to do it is using waitFor property of a step.


Here "waitFor" value of "-" indicates the start of the build, so essentially Step A and B will run concurrently and an output in my machine looks like this:

Starting Step #1 - "B"
Starting Step #0 - "A"
Step #1 - "B": Already have image (with digest): bash
Step #0 - "A": Already have image (with digest): bash
Step #1 - "B": Step B
Step #0 - "A": Step A
Finished Step #1 - "B"
2021/08/10 12:54:21 Step Step #1 - "B" finished
Finished Step #0 - "A"
2021/08/10 12:54:21 Step Step #0 - "A" finished

One more example where Step A is executed first and then Step B and Step C concurrently:

Passing Data

A root volume at path "/workspace" carries through the build, so if a step wants to pass data to another step then it can be passed through this "/workspace" folder. Here Step A is writing to a file and Step B is reading from the same file.

Conclusion

This covers the basics of the steps in a Cloud Build configuration file. In a subsequent post I will be using these to create a pipeline to deploy a java based application to Google Cloud Run.

Friday, July 2, 2021

Kotlin "Result" type for functional exception handling

In a previous post I had gone over how a "Try" type can be created in Kotlin from scratch to handle exceptions in a functional way. There is no need however to create such a type in Kotlin, a type called "Result" already handles the behavior of "Try" and this post will go over how it works. I will be taking the scenario from my previous post, having two steps:
  1. Parsing a Url
  2. Fetching from the Url
Either of these steps can fail
  • the URL may not be well formed, and 
  • fetching from a remote url may have network issues
So onto the basics of how such a call can be made using the Result type. You can imagine that parsing URL can return this Result type, capturing any exception that may result from such a call:
fun parseUrl(url: String): Result<URL> = 
        kotlin.runCatching { URL(url) }

Kotlin provides the "runCatching" function which accepts the block that can result in an exception and traps the result OR the exception in the "Result" type. Now that a "Result" is available, some basic checks can be made on it, I can check that the call succeeded using the "isSuccess" and "isFailure" properties:
val urlResult: Result<URL> = parseUrl("http://someurl")
urlResult.isSuccess == true
urlResult.isFailure == false

I can get the value using various "get*" methods:
urlResult.getOrNull() // Returns null if the block completed with an exception
urlResult.getOrDefault(URL("http://somedefault")) // Returns a default if the block completed with an exception
urlResult.getOrThrow() // Throws an exception if the block completed with an exception

The true power of "Result" type is however in chaining operations on it. So for eg, if you wanted to retrieve the host name given the url:
val urlResult: Result<URL> = parseUrl("http://someurl")
val hostResult: Result<String> = urlResult.map { url -> url.host }

Or a variant "mapCatching" which can trap any exception when using map operation and capture that as a "Result":
val getResult: Result<String> = urlResult.mapCatching { url -> throw RuntimeException("something failed!") }

All very neat! One nit that I have with the current "Result" is a missing "flatMap" operation, so for eg. consider a case where I have these two functions:
fun parseUrl(url: String): Result<URL> =
    kotlin.runCatching { URL(url) }
    
fun getFromARemoteUrl(url: URL): Result<String> {
    return kotlin.runCatching { "a result" }
}

I would have liked to be able to chain these two operations, along these lines:
val urlResult: Result<URL> = parseUrl("http://someurl")
val getResult: Result<String> = urlResult.flatMap { url -> getFromARemoteUrl(url)}

but a operator like "flatMap" does not exist (so far, as of Kotlin 1.5.20) 

I can do today is a bit of hack:
val urlResult: Result<URL> = parseUrl("http://someurl")
val getResult: Result<String> = urlResult.mapCatching { url -> getFromARemoteUrl(url).getOrThrow() }
OR even better, create an extension function which makes flatMap available to "Result" type, this way and use it:
fun <T, R> Result<T>.flatMap(block: (T) -> (Result<R>)): Result<R> {
    return this.mapCatching {
        block(it).getOrThrow()
    }
}
val urlResult: Result<URL> = parseUrl("http://someurl")
val getResult: Result<String> = urlResult.flatMap { url -> getFromARemoteUrl(url)}
This concludes my exploration of the Result type and the ways to use it. I have found it to be a excellent type to have in my toolbelt.