Saturday, September 24, 2022

Cloud Deploy with Cloud Run

Google Cloud Deploy is a service for continuously deploying to Google Cloud application runtimes. It has supported Google Kubernetes Engine (GKE) so far, and is now starting to support Cloud Run. This post is a quick trial of this new and exciting support in Cloud Deploy.

It may be simpler to explore the entire sample which is available in my github repo here 

End to end Flow

The sample attempts to do the following:

A Cloud Build based build first builds an image. This image is handed over to Cloud Deploy, which deploys it to Cloud Run. The "dev" and "prod" targets are simulated by Cloud Run applications whose names are prefixed with the environment name.

Building an image

There are many ways to build a container image. My personal favorite is the excellent Google Jib tool, which only requires a build plugin to be in place to create AND publish a container image. Once an image is created, the next task is to get the tagged image name for use in, say, a Kubernetes deployment manifest.

Skaffold does a great job of orchestrating these two steps, creating an image and rendering the application runtime manifests with the image locations. Since the deployment is to a Cloud Run environment, the manifest looks something like this:

Now, the manifest for each target environment may look a little different. In my case, for example, the application name for the dev environment has a "dev-" prefix and for the prod environment a "prod-" prefix. This is where another tool called Kustomize fits in. Kustomize is fairly intuitive: it expresses the variations for each environment as a patch file. In my case, where I want to prefix the name of the application in the dev environment with "dev-", the Kustomize configuration looks something like this:

So now, we have 3 tools:
  1. For building an image - Google Jib
  2. Generating the manifests based on environment - Kustomize
  3. Rendering the image name in the manifests - Skaffold
Skaffold does a great job of wiring all the tools together, and looks something like this for my example:

Deploying the Image

In the Google Cloud environment, Cloud Build is used for calling Skaffold and building the image. I have a cloudbuild.yaml file available with my sample, which shows how Skaffold is invoked and the image built.

Let's come to the topic of the post, about deploying this image to Cloud Run using Cloud Deploy. Cloud Deploy uses a configuration file to describe where the image needs to be deployed, which is Cloud Run in this instance and how the deployment needs to be promoted across environments. The environments are referred to as "targets" and look like this in my configuration:

They point to the project and region for the Cloud Run service.

Next is the configuration to describe how the pipeline will take the application through the targets:

This simply shows that the application will first be deployed to the "dev" target and then promoted to the "prod" target after approval.

The "profiles" in each of the stages name the profile that will be activated in Skaffold, which in turn determines which Kustomize overlay will be used to create the manifest.

That covers the entire Cloud Deploy configuration. The next step once the configuration file is ready is to create the deployment pipeline, which is done using a command which looks like this:

gcloud deploy apply --file=clouddeploy.yaml --region=us-west1

and registers the pipeline with Cloud Deploy service.

So just to quickly recap: I now have the image built by Cloud Build, the manifests generated using Skaffold and Kustomize, and a pipeline registered with Cloud Deploy. The next step is to trigger the pipeline for the image and the artifacts, which is done through another command that is hooked up to Cloud Build:
gcloud deploy releases create release-$SHORT_SHA --delivery-pipeline clouddeploy-cloudrun-sample --region us-west1 --build-artifacts artifacts.json

This would trigger the deploy to the different Cloud Run targets - "dev" in my case to start with:

Once deployed, I have a shiny Cloud Run app all ready to accept requests!

This can now be promoted to my "prod" target with a manual approval process:


Cloud Deploy's support for Cloud Run works great. It takes familiar tooling, Skaffold, which is typically meant for Kubernetes manifests, and uses it cleverly for Cloud Run deployment flows. I look forward to more capabilities in Cloud Deploy, such as support for Blue/Green and Canary deployment models.

Sunday, September 4, 2022

Skaffold for Local Java App Development

Skaffold is a tool which handles the workflow of building, pushing and deploying container images and has the added benefit of facilitating an excellent local dev loop. 

In this post I will be exploring using Skaffold for local development of a Java based application

Installing Skaffold

Installing Skaffold locally is straightforward, and explained well here. It works great with minikube as a local kubernetes development environment. 

Skaffold Configuration

My sample application is available in a github repository here -

Skaffold requires at a minimum, a configuration expressed in a skaffold.yml file, with details of 

  • How to build an image
  • Where to push the image 
  • How to deploy the image - Kubernetes artifacts which should be hydrated with the details of the published image and used for deployment.

In my project, the skaffold.yml file looks like this:

apiVersion: skaffold/v2beta16
kind: Config
metadata:
  name: hello-skaffold-gke
build:
  artifacts:
  - image: hello-skaffold-gke
    jib: {}
deploy:
  kubectl:
    manifests:
    - kubernetes/hello-deployment.yaml
    - kubernetes/hello-service.yaml

This tells Skaffold:

  • that the container image should be built using the excellent jib tool
  • The location of the kubernetes deployment artifacts, in my case a deployment and a service describing the application
The Kubernetes manifests need not hardcode the container image tag. Instead, they can use a placeholder which gets hydrated by Skaffold:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-skaffold-gke-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-skaffold-gke
  template:
    metadata:
      labels:
        app: hello-skaffold-gke
    spec:
      containers:
        - name: hello-skaffold-gke
          image: hello-skaffold-gke
          ports:
            - containerPort: 8080
The image section gets populated with the real tagged image name by Skaffold.

Now that we have a Skaffold descriptor in terms of skaffold.yml file and Kubernetes manifests, let's see some uses of Skaffold.

Building a local Image

A local image is built using the "skaffold build" command. Trying it on my local environment:

skaffold build --file-output artifacts.json

results in an image published to the local Docker registry, along with an artifacts.json file whose content points to the created image:

{
  "builds": [{
      "imageName": "hello-skaffold-gke",
      "tag": "hello-skaffold-gke:a44382e0cd08ba65be1847b5a5aad099071d8e6f351abd88abedee1fa9a52041"
  }]
}

If I wanted to tag the image with the coordinates to the Artifact Registry, I can specify an additional flag "default-repo", the following way:

skaffold build --file-output artifacts.json

resulting in an artifacts.json file with content that looks like this:

{
  "builds": [{
      "imageName": "hello-skaffold-gke",
      "tag": ""
  }]
}

The Kubernetes manifests can now be hydrated using a command which looks like this:

skaffold render -a artifacts.json --digest-source=local

which hydrates the manifests, and the output looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-skaffold-gke-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-skaffold-gke
  template:
    metadata:
      labels:
        app: hello-skaffold-gke
    spec:
      containers:
      - image:
        name: hello-skaffold-gke
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello-skaffold-gke-service
  namespace: default
spec:
  ports:
  - name: hello-skaffold-gke
    port: 8080
  selector:
    app: hello-skaffold-gke
  type: LoadBalancer
The right image name now gets plugged into the Kubernetes manifests and can be used for deploying to any Kubernetes environment.


Local Development loop with Skaffold

The additional benefit of having a Skaffold configuration file is in the excellent local development loop provided by Skaffold. All that needs to be done to get into the development loop is to run the following command:

skaffold dev --port-forward

which builds an image, renders the Kubernetes artifacts pointing to the image, and deploys them to the relevant local Kubernetes environment, minikube in my case:

➜  hello-skaffold-gke git:(main) ✗ skaffold dev --port-forward
Listing files to watch...
 - hello-skaffold-gke
Generating tags...
 - hello-skaffold-gke -> hello-skaffold-gke:5aa5435-dirty
Checking cache...
 - hello-skaffold-gke: Found Locally
Tags used in deployment:
 - hello-skaffold-gke -> hello-skaffold-gke:a44382e0c008bf65be1847b5a5aad099071d8e6f351abd88abedee1fa9a52041
Starting deploy...
 - deployment.apps/hello-skaffold-gke-deployment created
 - service/hello-skaffold-gke-service created
Waiting for deployments to stabilize...
 - deployment/hello-skaffold-gke-deployment is ready.
Deployments stabilized in 2.175 seconds
Port forwarding service/hello-skaffold-gke-service in namespace default, remote port 8080 ->
Press Ctrl+C to exit
Watching for changes...
The dev loop kicks in if any file in the project is changed: the image gets rebuilt and deployed again, which is surprisingly quick with a tool like Jib creating the images.

Debugging with Skaffold

Debugging also works great with Skaffold. It starts the appropriate debugging agent for the language being used, so for Java, if I were to run the following command:

skaffold debug --port-forward

and attach a debugger in IntelliJ using a "Remote process" configuration pointing to the debug port, execution would pause when code with a breakpoint is invoked!

Debugging Kubernetes artifacts

Since real Kubernetes artifacts are being used in the dev loop, we get to test the artifacts and see if there are any typos in them. For example, if I were to make a mistake and refer to "port" as "por", it would show up in the dev loop with an error like this:

WARN[0003] deployer cleanup:kubectl create: running [kubectl --context minikube create --dry-run=client -oyaml -f /Users/biju/learn/hello-skaffold-gke/kubernetes/hello-deployment.yaml -f /Users/biju/learn/hello-skaffold-gke/kubernetes/hello-service.yaml]
 - stdout: "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: hello-skaffold-gke-deployment\n  namespace: default\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: hello-skaffold-gke\n  template:\n    metadata:\n      labels:\n        app: hello-skaffold-gke\n    spec:\n      containers:\n      - image: hello-skaffold-gke\n        name: hello-skaffold-gke\n        ports:\n        - containerPort: 8080\n"
 - stderr: "error: error validating \"/Users/biju/learn/hello-skaffold-gke/kubernetes/hello-service.yaml\": error validating data: [ValidationError(Service.spec.ports[0]): unknown field \"por\" in io.k8s.api.core.v1.ServicePort, ValidationError(Service.spec.ports[0]): missing required field \"port\" in io.k8s.api.core.v1.ServicePort]; if you choose to ignore these errors, turn validation off with --validate=false\n"
 - cause: exit status 1  subtask=-1 task=DevLoop
kubectl create: running [kubectl --context minikube create --dry-run=client -oyaml -f /Users/biju/learn/hello-skaffold-gke/kubernetes/hello-deployment.yaml -f /Users/biju/learn/hello-skaffold-gke/kubernetes/hello-service.yaml]
 - stdout: "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: hello-skaffold-gke-deployment\n  namespace: default\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: hello-skaffold-gke\n  template:\n    metadata:\n      labels:\n        app: hello-skaffold-gke\n    spec:\n      containers:\n      - image: hello-skaffold-gke\n        name: hello-skaffold-gke\n        ports:\n        - containerPort: 8080\n"
 - stderr: "error: error validating \"/Users/biju/learn/hello-skaffold-gke/kubernetes/hello-service.yaml\": error validating data: [ValidationError(Service.spec.ports[0]): unknown field \"por\" in io.k8s.api.core.v1.ServicePort, ValidationError(Service.spec.ports[0]): missing required field \"port\" in io.k8s.api.core.v1.ServicePort]; if you choose to ignore these errors, turn validation off with --validate=false\n"
 - cause: exit status 1
This is a great way to make sure that the Kubernetes manifests are tested in some way before deployment.


Skaffold is an awesome tool to have in my toolbox. It facilitates building container images, tagging them with sane names, hydrating the Kubernetes manifests with those images, and deploying the manifests to a Kubernetes environment. In addition, it provides a great development and debugging loop.

Wednesday, July 6, 2022

Google Cloud Function Gradle Plugin

It is easy to develop a Google Cloud Function using Java with Gradle as the build tool. It is, however, not so simple to test it locally.

The currently recommended approach to testing, especially with Gradle, is complicated. It requires pulling in Invoker libraries and adding a custom task to run the invoker function.

I have now authored a Gradle plugin which makes local testing much easier!


The way the Invoker is added in for a Cloud Function Gradle project looks like this today:

This has a lot of opaque details. For example, what does the "invoker" configuration even mean, and what is the magical task that is being registered?


Now contrast it with the approach with the plugin:

All the boilerplate is now gone, and the configuration of the function class and of the port to start it up on is much simpler. Adding this new plugin contributes a task that can be invoked the following way:

./gradlew cloudFunctionRun
It would start up an endpoint using which the function can be tested locally.


It may be far easier to see fully working samples incorporating this plugin. These samples are available here —

Thursday, June 23, 2022

Google Cloud Functions (2nd Gen) Java Sample

Cloud Functions (2nd Gen) is Google’s serverless Functions as a Service platform. The 2nd generation is built on top of the excellent Google Cloud Run. Think of Cloud Run as a serverless environment for running containers which respond to events (HTTP being the most basic, with all sorts of other events available via Eventarc).

The blue area above shows the flow of code. The Google Cloud CLI for Cloud Functions orchestrates a flow where the source code is placed in a Google Cloud Storage bucket, a Cloud Build is triggered to build this code and package it into a container, and finally this container is run using Cloud Run, which the user can access via the Cloud Functions console. Cloud Functions essentially becomes a pass-through to Cloud Run.

The rest of this post will go into the details of how such a function can be written using Java.

tl;dr — sample code is available here, and has all the relevant pieces hooked up.

Method Signature

Exposing a function that responds to HTTP events is fairly straightforward: it just needs to conform to the Functions Framework interface, which for Java is available here.

To pull in this dependency using gradle as the build tool looks like this:

The dependency is required purely for compilation, at runtime the dependency is provided through a base image that Functions build time uses.

The function signature looks like this:

Testing the Function

This function can be tested locally using an Invoker that is provided by the Functions Framework for Java. My code shows how it can be hooked up with Gradle. Suffice it to say that the Invoker allows an endpoint to be brought up and tested with utilities like curl.

Deploying the Function

Now comes the easy part: deploying the function. Since a lot of Google Cloud services need to be orchestrated to get a function deployed (GCS, Cloud Build, Cloud Run and Cloud Functions), the command line to deploy the function does a great job of indicating which services need to be activated. The command to run looks like this:

gcloud beta functions deploy java-http-function \
--gen2 \
--runtime java17 \
--trigger-http \
--entry-point functions.HelloHttp \
--source ./build/libs/ \

Note that, at least for Java, it is sufficient to build the code locally and provide the built uber jar (a jar with all dependencies packaged in) as the source.

Once deployed, the endpoint can be found using the following command:

gcloud beta functions describe java-http-function --gen2
and the resulting endpoint accessed via a curl command!

Hello World

What is Deployed

This is a bit of an exploration of what gets deployed into a GCP project, let’s start with the Cloud Function itself.

See how, for a Gen2 function, a “Powered by Cloud Run” label shows up, linking to the actual Cloud Run deployment that powers this Cloud Function. Clicking through leads to:


This concludes the steps to deploy a simple Java based Gen2 Cloud Function that responds to http calls. The post shows how the Gen 2 Cloud Function is more or less a pass through to Cloud Run. The sample is available in my github repository —

Saturday, May 14, 2022

Google Cloud Structured Logging for Java Applications

One piece of advice for logging that I have seen when targeting applications to cloud platforms is to simply write to Standard Out and let the platform take care of sending the output to the appropriate log sinks. This mostly works, except when it doesn't, and it especially doesn't when analyzing failure scenarios. For Java applications this typically means looking through a stack trace, and each line of a stack trace is treated as a separate log entry by the log sinks. This creates these problems:

  1. Correlating multiple lines of output as being part of a single stack trace
  2. Since applications are multi-threaded even related logs may not be in just the right order
  3. The severity of logs is not correctly determined and so does not find its way into the Error Reporting system

This post will go into a few approaches to logging from a Java application in Google Cloud Platform.


Let me go over the problem once more. Say I were to log the message "Hello Logging" at INFO level in Java code.

This shows up the following way in the GCP Logging console:

{
  "textPayload": "2022-04-29 22:00:12.057  INFO 1 --- [or-http-epoll-1] org.bk.web.GreetingsController           : Hello Logging",
  "insertId": "626c5fec0000e25a9b667889",
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "service_name": "hello-cloud-run-sample",
      "configuration_name": "hello-cloud-run-sample",
      "project_id": "biju-altostrat-demo",
      "revision_name": "hello-cloud-run-sample-00008-qow",
      "location": "us-central1"
    }
  },
  "timestamp": "2022-04-29T22:00:12.057946Z",
  "labels": {
    "instanceId": "instanceid"
  },
  "logName": "projects/myproject/logs/",
  "receiveTimestamp": "2022-04-29T22:00:12.077339403Z"
}

This looks reasonable. Now consider the case of logging an error:

{
  "textPayload": "\t\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe( ~[reactor-core-3.4.17.jar:3.4.17]",
  "insertId": "626c619b00005956ab868f3f",
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "revision_name": "hello-cloud-run-sample-00008-qow",
      "project_id": "biju-altostrat-demo",
      "location": "us-central1",
      "configuration_name": "hello-cloud-run-sample",
      "service_name": "hello-cloud-run-sample"
    }
  },
  "timestamp": "2022-04-29T22:07:23.022870Z",
  "labels": {
    "instanceId": "0067430fbd3ad615324262b55e1604eb6acbd21e59fa5fadd15cb4e033adedd66031dba29e1b81d507872b2c3c6cd58a83a7f0794965f8c5f7a97507bb5b27fb33"
  },
  "logName": "projects/biju-altostrat-demo/logs/",
  "receiveTimestamp": "2022-04-29T22:07:23.317981870Z"
}

There would be several of these in the GCP Logging console, one for each line of the stack trace, with no way to correlate them. Additionally, there is no severity attached to these events, so the error would not end up in the Google Cloud Error Reporting service.

Configuring Logging

There are a few approaches to configuring logging for a Java application targeted to be deployed to Google Cloud. The simplest approach, if using Logback, is to use the Logging appender provided by Google Cloud available here -

Adding the appender is easy. A logback.xml file with the appender configured looks like this:

    <appender name="gcpLoggingAppender" class="">
    </appender>
    <root level="INFO">
        <appender-ref ref="gcpLoggingAppender"/>
    </root>
This works great, but it has a huge catch. It requires connectivity to a GCP environment as it writes the logs directly to Cloud Logging system, which is not ideal for local testing. 

An approach that works both when running in a GCP environment and locally is to simply direct the output to Standard Out with an appender that writes the logs in a JSON structured format, so that they are shipped correctly to Cloud Logging:
    <appender name="gcpLoggingAppender" class="">
    </appender>
    <root level="INFO">
        <appender-ref ref="gcpLoggingAppender"/>
    </root>
If you are using Spring Boot as the framework, the approach can even be customized such that in a local environment the logs get written to Standard Out line by line, and when deployed to GCP the logs are written as JSON output:
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>

    <appender name="gcpLoggingAppender" class="">
    </appender>

    <root level="INFO">
        <springProfile name="gcp">
            <appender-ref ref="gcpLoggingAppender"/>
        </springProfile>
        <springProfile name="local">
            <appender-ref ref="CONSOLE"/>
        </springProfile>
    </root>

This Works..But

The Google Cloud logging appender works great; however, there is an issue: it doesn't capture the entirety of a stack trace for some reason. I have an issue open which should address this. In the meantime, if capturing the full stack in the logs is important, a different approach is to simply write a JSON formatted log using the JSON layout from the logback-contrib project:

<appender name="jsonLoggingAppender" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
        <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter">
        </jsonFormatter>
        <timestampFormat>yyyy-MM-dd HH:mm:ss.SSS</timestampFormat>
    </layout>
</appender>
The fields, however, do not match the structured log format recommended by GCP, especially the severity. A quick tweak can be made by implementing a custom JsonLayout class that looks like this:

package org.bk.logback.custom;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.contrib.json.classic.JsonLayout;
// Assumption: Severity is the enum from the google-cloud-logging library
import com.google.cloud.logging.Severity;

import java.util.Map;

public class GcpJsonLayout extends JsonLayout {
    private static final String SEVERITY_FIELD = "severity";

    // Hook offered by JsonLayout for adding extra fields to each log entry
    @Override
    protected void addCustomDataToJsonMap(Map<String, Object> map, ILoggingEvent event) {
        map.put(SEVERITY_FIELD, severityFor(event.getLevel()));
    }

    // Maps Logback levels (by their integer values) to Cloud Logging severities
    private static Severity severityFor(Level level) {
        return switch (level.toInt()) {
            // TRACE
            case 5000 -> Severity.DEBUG;
            // DEBUG
            case 10000 -> Severity.DEBUG;
            // INFO
            case 20000 -> Severity.INFO;
            // WARNING
            case 30000 -> Severity.WARNING;
            // ERROR
            case 40000 -> Severity.ERROR;
            default -> Severity.DEFAULT;
        };
    }
}

which takes care of mapping to the right Severity levels for Cloud Error reporting. 
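
To make the target format concrete, here is a minimal sketch that uses only the JDK and none of the Logback classes above. It assumes the container's standard output is ingested by Cloud Logging, which treats a single-line JSON object with "severity" and "message" fields as one structured entry. The class and method names (StructuredLogLine, toJsonLine, escape) are made up for this illustration, and the hand-rolled escaping only stands in for what a JSON library would normally do:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StructuredLogLine {

    // Builds one JSON object on a single line; the stack trace (if any)
    // is folded into the "message" field so it stays one log entry.
    static String toJsonLine(String severity, String message, Throwable error) {
        String text = message;
        if (error != null) {
            StringWriter trace = new StringWriter();
            error.printStackTrace(new PrintWriter(trace, true));
            text = message + "\n" + trace;
        }
        return "{\"severity\":\"" + escape(severity)
                + "\",\"message\":\"" + escape(text) + "\"}";
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Writes a single line to standard out, however long the stack trace is.
        System.out.println(toJsonLine("ERROR", "Request failed",
                new IllegalStateException("boom")));
    }
}
```

Because the whole stack trace travels inside one "message" value, it is emitted as a single line rather than one line per stack frame, which is the problem described at the start of this post.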


Use Google Cloud Logback appender and you should be set. Consider the alternate approaches only if you think you are lacking more of the stacktrace.

Saturday, April 30, 2022

Calling Google Cloud Services in Java

If you want to call Google Cloud Services using a Java based codebase, then broadly there are two approaches to incorporating the client libraries in your code — the first, let’s call it a “direct” approach is to use the Google Cloud Client libraries available here, the second approach is to use a “wrapper”, Spring Cloud GCP libraries available here.

So given both these libraries, which one should you use? My take is simple: if you have a Spring Boot based app, Spring Cloud GCP should likely be the preferred approach, else the “direct” libraries.

Using Pub/Sub Client libraries

The best way to see the two approaches in action is to use them for making a call, in this case to publish a message to Cloud Pub/Sub.
The kind of contract I am expecting to implement looks like this:

The “message” is a simple type and looks like this, represented as a Java record:

Given this, let’s start with the “direct” approach.

Direct Approach

The best way that I have found to get to the libraries is using this page, which in turn links to the client libraries for the specific GCP services; the Cloud Pub/Sub one is here. I use Gradle for my builds, and pulling in the Pub/Sub libraries with Gradle is done this way:

implementation platform('')
With the library pulled in, the code to publish a message looks like this:

The message is converted to raw JSON and published to Cloud Pub/Sub, which returns an ApiFuture type. I have previously covered how such a type can be converted to reactive types, which is what is finally returned from the publishing code.

The “publisher” is created using a helper method:

Publisher publisher = Publisher.newBuilder("sampletopic").build();

Spring Cloud GCP Approach

The documentation for Spring Cloud GCP project is available here, first to pull in the dependencies, for a Gradle based project it looks like this:

dependencies {
   implementation ''
}

dependencyManagement {
   imports {
      mavenBom "${springCloudGcpVersion}"
      mavenBom "${springCloudVersion}"
   }
}

With the right dependencies pulled in, Spring Boot auto-configuration comes into play and automatically creates a type called the PubSubTemplate, with properties that can tweak its configuration. Code to publish a message to a topic using a PubSubTemplate looks like this:


Given these two code snippets, these are some of the differences:
  • Spring Cloud GCP has taken care of a bunch of boilerplate around how to create a Publisher (and a Subscriber if listening to messages)
  • The PubSubTemplate provides simpler helper methods for publishing messages and for listening to messages, the return type which is ListenableFuture with PubSubTemplate can easily be transformed to reactive types unlike the ApiFuture return type
  • Testing with Spring Cloud GCP is much simpler as the Publisher needs to be tweaked extensively to work with an emulator and Spring Cloud GCP handles this complication under the covers


The conclusion for me is that Spring Cloud GCP is compelling. If a project is Spring Boot based, then Spring Cloud GCP will fit in great and provides just the right level of abstraction for dealing with the Google Cloud APIs.
The snippets in this blog post don’t do justice to some of the complexities of the codebase. My GitHub repo may help with a complete working codebase with both “direct” and Spring Cloud GCP based code —

Saturday, March 5, 2022

Modeling one-to-many relation in Firestore, Bigtable, Spanner

I like working with services that need little to no provisioning effort — these are typically termed as Fully Managed services by different Providers.

The most provisioning effort is typically required for database systems. I remember having to operate a Cassandra cluster in a previous job; the amount of effort spent on provisioning and upkeep was far from trivial, and I came to appreciate and empathize dearly with the role of a database administrator during that time.

My objective in this post is to explore how a one-to-many relationship can be maintained in 3 managed database solutions on Google Cloud — Firestore, Bigtable and Spanner.

Data Model

The data model is to represent a Chat Room with Chat Messages in the rooms.

Chat Room just has name as an attribute. Each Chat Room has a set of Chat Messages, with each message having a payload and creation date as attributes. A sample would look something like this:

So now comes the interesting question: how can this one-to-many relation be modeled using Firestore, Bigtable and Spanner? Let’s start with Firestore.

One-to-many using Firestore

Managing a One-to-many relation comes naturally to Firestore. The concepts map directly to the structures of Firestore:

  • Each Chat Room instance and each Chat Message can be thought of as a Firestore “Document”.
  • All the Chat Room instances are part of a “ChatRooms” “Collection”
  • Each Chat Room “Document” has a “Sub-Collection” to hold all the Chat Messages relevant to it, this way establishing a One-to-Many relationship

One-to-Many using Bigtable

A quick aside, in Bigtable information is stored in the following form

Each Chat Room and Chat Room message can be added in as rows with carefully crafted row keys.

  • A chat room needs to be retrieved by its id, so a row key may look something like this: “ROOM/R#room-id”
  • Chat Room message row key can be something like this: “MESSAGES/R#chatroom-id/M#message-id”

Since Bigtable queries can be based on prefixes, a retrieval of messages by a prefix of “MESSAGES/R#chatroom-id” would retrieve all messages in the Chat Room “chatroom-id”. This is not as intuitive as the Firestore structure, as it requires carefully thinking about the row key structure.
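
The row key idea can be tried out without Bigtable at all. The sketch below is only an analogy: it uses a JDK TreeMap as a stand-in for Bigtable's rows sorted by key, and the class and method names (RowKeyPrefixScan, put, scanByPrefix) are hypothetical and not part of any Bigtable client API. It assumes plain ASCII row keys, so that Java's string ordering matches a byte-wise ordering:

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowKeyPrefixScan {
    // Rows kept sorted by key, mimicking how a sorted key-value store lays out data.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Returns the values of all rows whose key starts with the prefix, in key order.
    // Every such key sorts between the prefix itself and the prefix followed by
    // the largest possible char, so a range lookup is enough.
    List<String> scanByPrefix(String prefix) {
        String upperBound = prefix + Character.MAX_VALUE;
        return List.copyOf(rows.subMap(prefix, true, upperBound, false).values());
    }

    public static void main(String[] args) {
        RowKeyPrefixScan table = new RowKeyPrefixScan();
        table.put("ROOM/R#room-1", "General");
        table.put("MESSAGES/R#room-1/M#001", "hello");
        table.put("MESSAGES/R#room-1/M#002", "world");
        table.put("MESSAGES/R#room-10/M#001", "other room");

        // The trailing "/" keeps room-10 out of the results for room-1.
        System.out.println(table.scanByPrefix("MESSAGES/R#room-1/"));
    }
}
```

One thing the sketch makes visible is that the prefix should end at a delimiter: scanning by “MESSAGES/R#room-1” without the trailing “/” would also pick up the messages of “room-10”.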

One-to-Many using Spanner

Spanner behaves like a traditional relational database with a lot of smarts under the covers to scale massively. So from a one-to-many data model perspective, the relational concepts just carry over.

Chat Rooms can be stored in a “ChatRooms” table with the columns holding attributes of a chat room

Chat Messages can be stored in a “ChatMessages” table with columns holding the attributes of a chat message. A foreign key, say “ChatRoomId” in Chat Message can point to the relevant Chat Room.

Given this, all chat messages for a room can be retrieved using a query on Chat Messages with a filter on the Chat Room Id.


I hope this gives a taste of what it takes to model in these three excellent fully managed GCP databases.

Tuesday, February 1, 2022

Google Cloud Java Client — ApiFuture to Reactive types

Google Cloud Java Client libraries use an ApiFuture type to represent the result of an API call. The calls are asynchronous and the ApiFuture type represents the result once the call is completed.

If you have used Reactive stream based libraries like Project Reactor, a big benefit of using the Reactive types like Mono and Flux is that they provide a rich set of operators that provide a way to transform the data once available from the asynchronous call.

This should become clearer in an example. Consider a Cloud Firestore call to retrieve a ChatRoom entity by id:

There are a few issues here. The “get()” call blocks and waits for the response of the async call to come through, and it can throw an exception which needs to be accounted for. Then the response is shaped into the ChatRoom type.

Now, look at the same flow with reactive types, assuming that there is a utility available to convert the ApiFuture type to the Mono type:

Here the map operator takes care of transforming the result to the required “ChatRoom” type and any exception is wrapped in Mono type itself.
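
The same contrast can be reproduced with nothing but the JDK. In the sketch below, CompletableFuture is used purely as a stand-in for both ApiFuture and Mono, with thenApply playing the role of the map operator. The names (BlockingVersusChained, fetchName, the two-field ChatRoom record) are assumptions made for the illustration and are not the Firestore or Reactor APIs:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class BlockingVersusChained {
    record ChatRoom(String id, String name) {}

    // Stand-in for an asynchronous client call that eventually yields a room name.
    static CompletableFuture<String> fetchName(String id) {
        return CompletableFuture.supplyAsync(() -> "room-" + id);
    }

    // Blocking style: wait with get() and deal with the checked exceptions by hand.
    static ChatRoom blocking(String id) {
        try {
            String name = fetchName(id).get();
            return new ChatRoom(id, name);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Chained style: describe the transformation; a failure stays inside the future.
    static CompletableFuture<ChatRoom> chained(String id) {
        return fetchName(id).thenApply(name -> new ChatRoom(id, name));
    }

    public static void main(String[] args) {
        System.out.println(blocking("42"));
        System.out.println(chained("42").join());
    }
}
```

The chained variant has no try/catch of its own, which is the property the reactive version gets from wrapping exceptions in the Mono.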

Alright, so now how can the ApiFutureUtil be implemented? A basic implementation looks like this:

This utility serves the purpose of transforming the ApiFuture type, however one catch is that this Mono type is hot. What does this mean? Normally a reactive streams pipeline (with all the operators chained together) represents a computation, and this computation comes alive only when somebody subscribes to the pipeline. With an ApiFuture converted to a Mono, the underlying API call has already been started, so it runs even without anybody subscribing. This is okay, as the purpose is to use the Mono type for its operators. If “cold” behavior is desired, then even the API call itself can be deferred, something like this:
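
The hot versus cold distinction can also be seen in a JDK-only analogy. In this sketch a Supplier plays the role of the pipeline and calling get() on it plays the role of subscribing; CompletableFuture again stands in for ApiFuture. The names HotVersusCold, hot and cold are hypothetical. In Reactor itself, deferring creation until subscription is the job of Mono.defer:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class HotVersusCold {

    // Hot: the call is started right here, before anybody asks for the result.
    // Every later get() hands back the same, already running future.
    static <T> Supplier<CompletableFuture<T>> hot(Supplier<CompletableFuture<T>> call) {
        CompletableFuture<T> started = call.get();
        return () -> started;
    }

    // Cold: nothing is started here. The call is made each time get() is invoked.
    static <T> Supplier<CompletableFuture<T>> cold(Supplier<CompletableFuture<T>> call) {
        return call;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> apiCall = () -> {
            calls.incrementAndGet();
            return CompletableFuture.completedFuture("chat-room");
        };

        Supplier<CompletableFuture<String>> hot = hot(apiCall);
        System.out.println("after creating hot: " + calls.get());

        Supplier<CompletableFuture<String>> cold = cold(apiCall);
        System.out.println("after creating cold: " + calls.get());

        cold.get().join();
        System.out.println("after using cold: " + calls.get());
    }
}
```

The counter only moves when the hot wrapper is created or when the cold wrapper is actually used, which mirrors the difference between converting an already started future and deferring the API call.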

I hope this gives some idea of how Reactive Stream types can be created from ApiFuture. This is far from original, though. If you want a canned approach, a better solution is to use the Spring-Cloud-Gcp Java library, which already has these utilities baked in.

Sunday, January 16, 2022

Service to Service Call Pattern - Multi-Cluster Ingress

Multi-Cluster Ingress is a neat feature of Anthos and GKE (Google Kubernetes Engine), whereby a user accessing an application that is hosted on multiple GKE clusters in different zones is directed to the cluster nearest to them!

So, for example, consider two GKE clusters: one in us-west1, based out of Oregon, USA, and another in europe-north1, based out of Finland. An application is installed on these two clusters. Now, a user accessing the application from the US will be led to the GKE cluster in us-west1, and a user coming in from Europe will be led to the GKE cluster in europe-north1. Multi-Cluster Ingress enables this easily!

Enabling Multi-Cluster Ingress

Alright, so how does this work?

Let me once again assume that I have two clusters available in my GCP project, one in the us-west1-a zone and another in europe-north1-a, and an app called "Caller" deployed to these two clusters. For a single cluster, traffic from a user outside the cluster is typically brought in using an "Ingress".
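As a rough sketch, a single-cluster Ingress for the caller could look something like the following. The resource name, the backing Service name "sample-caller" and port 8080 are assumptions for illustration, not taken from the sample repo.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-caller-ingress    # hypothetical name
spec:
  defaultBackend:
    service:
      name: sample-caller        # assumed name of the caller's Service
      port:
        number: 8080             # assumed port
```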

This works great for a single cluster, however not so for a bunch of clusters. A different kind of an Ingress resource is required that spans GKE clusters and this is where a Multi-Cluster ingress comes in - an ingress that spans clusters.

Multi-Cluster Ingress is a Custom resource provided by GKE and looks something like this:
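A minimal sketch, assuming the "networking.gke.io/v1" API version, a hypothetical resource name and a backend port of 8080. Only the "sample-caller-mcs" backend name comes from the description in this post.

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: sample-caller-ingress          # hypothetical name
spec:
  template:
    spec:
      backend:
        serviceName: sample-caller-mcs # the MultiClusterService to route to
        servicePort: 8080              # assumed port
```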

It is defined in one of the clusters, designated as a "config" cluster.
See how there is a reference to "sample-caller-mcs" above. That points to a "MultiClusterService" resource, which is again a custom resource that will work only in the context of a GKE project. A definition for such a resource looks almost like a Service, and here is the one for "sample-caller-mcs":
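A minimal sketch of such a resource follows. The selector label "app: sample-caller", the port name and port 8080 are assumptions about how the caller pods are labelled and exposed.

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: sample-caller-mcs
spec:
  template:
    spec:
      selector:
        app: sample-caller   # assumed pod label
      ports:
        - name: web          # hypothetical port name
          protocol: TCP
          port: 8080         # assumed port
          targetPort: 8080
```

The "template" section carries an ordinary Service spec, which is why the resource reads almost like a plain Kubernetes Service.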

Now that there is a MultiClusterIngress defined pointing to a MultiClusterService, here is what happens under the covers:
1. A load balancer is created which uses an IP advertised using anycast - better details are here. These anycast IPs help get the request through to the cluster closest to the user.
2. A Network Endpoint Group (NEG) is created for every cluster that matches the definition of the MultiClusterService. These NEGs are used as the backends of the load balancer.

Sample Application

I have a sample set of applications and deployment manifests available here that demonstrates Multi-Cluster Ingress. There are instructions to go with it here. This brings up an environment which looks like this:

Simulating a request coming in from us-west1-a is easy for me since I am in the US. Another approach is to simply spin up an instance in us-west1-a and use that to make a request the following way:

And the "caller" invoked should be the one in us-west1-a, similarly if the request is made from an instance in europe-north1-a:

The "caller" invoked will be the one in europe-north1-a!!


This really boggles my mind, being able to spin up two clusters on two different continents, and having a request from the user directed to the one closest to them, in a fairly simple way. There is a lot going on under the covers, however this is abstracted out using the resource types of MultiClusterIngress and MultiClusterService. 

Tuesday, January 4, 2022

Service to Service call pattern - Using Anthos Service Mesh

Anthos Service Mesh makes it very simple for a service in one cluster to call a service in another cluster. Not just calling the service, but also doing so securely, with fault tolerance and observability built in.

This is the fourth in a series of posts on service to service call patterns in Google Cloud.

The first post explored Service to Service call pattern in a GKE runtime using a Kubernetes Service abstraction

The second post explored Service to Service call pattern in a GKE runtime with Anthos Service mesh

The third post explored the call pattern across multiple GKE runtimes with Multi-Cluster Service

Target Call Pattern

There are two services deployed to two different clusters. The "caller" in "cluster1" invokes the "producer" in "cluster2".

Creating Clusters and Anthos Service Mesh

The entire script to create the cluster is here. The script:
1. Spins up two GKE standard clusters
2. Adds firewall rules to enable IPs in one cluster to reach the other cluster
3. Installs service mesh on each of the clusters

Caller and Producer Installation

The caller and the producer are deployed using normal Kubernetes deployment descriptors; no additional special resource is required to get the set-up to work. So, for example, the caller's deployment looks like this:

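apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-caller-v1
  labels:
    app: sample-caller
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-caller
      version: v1
  template:
    metadata:
      labels:
        app: sample-caller
        version: v1
    spec:
      serviceAccountName: sample-caller-sa
      containers:
        - name: sample-caller
          image: <caller-image>  # placeholder, the image reference is not shown here
          ports:
            - containerPort: 8080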

Caller to Producer Call

The neat thing with this entire set-up is that, from the caller's perspective, a call continues to be made to the DNS name of a service representing the producer. So assuming that the producer's service is deployed to the same namespace, a DNS name of "producer" should just work.

So with this in place, a call from the caller to producer looks something like this:

The call fails, with a message that the "sample-producer" host name in cluster1 cannot be resolved. This is perfectly okay as such a service has not been created in cluster1. Creating such a service:

resolves the issue and a call cleanly goes through!! This is magical: see how the service in cluster1 resolves the pods in cluster2!
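A Service of the kind described above might be sketched as follows. Only the "sample-producer" name comes from the error message. The selector label, port name and port number are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sample-producer
  labels:
    app: sample-producer   # assumed label
spec:
  selector:
    app: sample-producer   # assumed pod label of the producer
  ports:
    - name: http           # hypothetical port name
      port: 8080           # assumed port
```

Note that no pods in cluster1 need to match the selector. The Service mainly gives the "sample-producer" host name something to resolve to, and the mesh takes care of routing to the endpoints in cluster2.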

Additionally, the presence of the x-forwarded-client-cert header in the producer indicates that mTLS is being used during the call.

Fault Tolerance

So security via mTLS is accounted for; now I want to layer in some level of fault tolerance. This can be done by ensuring that calls time out instead of just hanging, and by not making repeated calls to the producer if it starts to be non-responsive. This is typically done using Istio configuration. Since Anthos Service Mesh is essentially a managed Istio, the configuration for a timeout looks something like this, using a VirtualService configuration:
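A minimal sketch of such a VirtualService, assuming the producer is addressed as "sample-producer". The resource name, the "networking.istio.io/v1beta1" API version and the 5 second timeout are illustrative choices, not values from the sample repo.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sample-producer-route    # hypothetical name
spec:
  hosts:
    - sample-producer            # assumed producer host name
  http:
    - timeout: 5s                # illustrative value: fail the call after 5 seconds
      route:
        - destination:
            host: sample-producer
```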

And a circuit breaker uses a DestinationRule, which looks like this:
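Again as a sketch under assumptions: the resource name, the API version and every threshold below are illustrative, and the host is assumed to be "sample-producer".

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-producer-dr         # hypothetical name
spec:
  host: sample-producer            # assumed producer host name
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1 # illustrative: limit queued requests
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 3      # illustrative: eject a host after 3 consecutive 5xx errors
      interval: 15s                # how often hosts are evaluated
      baseEjectionTime: 15s        # minimum time an ejected host stays out
```

The connection pool limits cap how much work piles up against a slow producer, while outlier detection temporarily removes hosts that keep failing.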

All of it is just straight Kubernetes configuration, and it just works across multiple clusters.


The fact that I can treat multiple clusters as if they were a single cluster is, I believe, the real value proposition of Anthos Service Mesh. All the work around enabling such communication securely and with fault tolerance is what the Mesh brings to the table.

My repository has all the samples that I have used for the post -