Saturday, May 14, 2022

Google Cloud Structured Logging for Java Applications

 One piece of advice I have seen for logging, when targeting applications at cloud platforms, is to simply write to standard out and let the platform take care of sending the output to the appropriate log sinks. This mostly works, except when it doesn't, and it especially doesn't when analyzing failure scenarios. For Java applications this typically means looking through a stack trace, and each line of a stack trace is treated as a separate log entry by the log sinks. This creates the following problems:

  1. Correlating multiple lines of output as being part of a single stack trace
  2. Since applications are multi-threaded, even related logs may not be in quite the right order
  3. The severity of logs is not correctly determined, and so errors do not find their way into the Error Reporting system

This post will go over a few approaches to logging from a Java application on Google Cloud Platform.


Problem

Let me go over the problem once more. Say I were to log the following way in Java code:

  LOGGER.info("Hello Logging");
  

And it shows up the following way in the GCP Logging console:

{
  "textPayload": "2022-04-29 22:00:12.057  INFO 1 --- [or-http-epoll-1] org.bk.web.GreetingsController           : Hello Logging",
  "insertId": "626c5fec0000e25a9b667889",
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "service_name": "hello-cloud-run-sample",
      "configuration_name": "hello-cloud-run-sample",
      "project_id": "biju-altostrat-demo",
      "revision_name": "hello-cloud-run-sample-00008-qow",
      "location": "us-central1"
    }
  },
  "timestamp": "2022-04-29T22:00:12.057946Z",
  "labels": {
    "instanceId": "instanceid"
  },
  "logName": "projects/myproject/logs/run.googleapis.com%2Fstdout",
  "receiveTimestamp": "2022-04-29T22:00:12.077339403Z"
}
  

This looks reasonable. Now consider what gets logged in case of an error:

{
  "textPayload": "\t\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2068) ~[reactor-core-3.4.17.jar:3.4.17]",
  "insertId": "626c619b00005956ab868f3f",
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "revision_name": "hello-cloud-run-sample-00008-qow",
      "project_id": "biju-altostrat-demo",
      "location": "us-central1",
      "configuration_name": "hello-cloud-run-sample",
      "service_name": "hello-cloud-run-sample"
    }
  },
  "timestamp": "2022-04-29T22:07:23.022870Z",
  "labels": {
    "instanceId": "0067430fbd3ad615324262b55e1604eb6acbd21e59fa5fadd15cb4e033adedd66031dba29e1b81d507872b2c3c6cd58a83a7f0794965f8c5f7a97507bb5b27fb33"
  },
  "logName": "projects/biju-altostrat-demo/logs/run.googleapis.com%2Fstdout",
  "receiveTimestamp": "2022-04-29T22:07:23.317981870Z"
}

There would be multiple of these entries in the GCP Logging console, one for each line of the stack trace, with no way to correlate them together. Additionally, there is no severity attached to these events, so the error would not end up in the Google Cloud Error Reporting service.

Configuring Logging

There are a few approaches to configuring logging for a Java application targeted for deployment to Google Cloud. The simplest approach, if using Logback, is to use the Logging appender provided by Google Cloud, available here - https://github.com/googleapis/java-logging-logback.

Adding the appender is easy; a logback.xml file with the appender configured looks like this:

<configuration>
    <appender name="gcpLoggingAppender" class="com.google.cloud.logging.logback.LoggingAppender">
    </appender>
    <root level="INFO">
        <appender-ref ref="gcpLoggingAppender"/>
    </root>
</configuration>
This works great, but it has a huge catch. It requires connectivity to a GCP environment, as it writes the logs directly to the Cloud Logging system, which is not ideal for local testing.

An approach that works when running in a GCP environment as well as locally is to simply redirect the output to standard out. This ensures that the logs are written in a JSON structured format and shipped correctly to Cloud Logging:
<configuration>
    <appender name="gcpLoggingAppender" class="com.google.cloud.logging.logback.LoggingAppender">
        <redirectToStdout>true</redirectToStdout>
    </appender>
    <root level="INFO">
        <appender-ref ref="gcpLoggingAppender"/>
    </root>
</configuration>
If you are using Spring Boot as the framework, the approach can even be customized such that in a local environment the logs get written to standard out line by line, and when deployed to GCP, the logs are written as JSON output:
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>

    <appender name="gcpLoggingAppender" class="com.google.cloud.logging.logback.LoggingAppender">
        <redirectToStdout>true</redirectToStdout>
    </appender>

    <root level="INFO">
        <springProfile name="gcp">
            <appender-ref ref="gcpLoggingAppender"/>
        </springProfile>
        <springProfile name="local">
            <appender-ref ref="CONSOLE"/>
        </springProfile>
    </root>
</configuration>  
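
For the "gcp" profile to kick in when deployed, the profile has to be activated in that environment. One way of doing this for Cloud Run is to set the standard SPRING_PROFILES_ACTIVE environment variable at deploy time; a sketch, with an illustrative image path:

gcloud run deploy hello-cloud-run-sample \
    --image=gcr.io/myproject/hello-cloud-run-sample \
    --set-env-vars=SPRING_PROFILES_ACTIVE=gcp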
  

This Works... But

The Google Cloud Logging appender works great, but there is an issue: it doesn't capture the entirety of a stack trace for some reason. I have an issue open which should address this. In the meantime, if capturing the full stack trace in the logs is important, then a different approach is to simply write a JSON-formatted log using the JSON layout provided by the logback-contrib project:

<appender name="jsonLoggingAppender" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
        <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter">
        </jsonFormatter>
        <timestampFormat>yyyy-MM-dd HH:mm:ss.SSS</timestampFormat>
        <appendLineSeparator>true</appendLineSeparator>
    </layout>
</appender> 
  
The fields, however, do not match the structured log format recommended by GCP, especially the severity. A quick tweak can be made by implementing a custom JsonLayout class that looks like this:

package org.bk.logback.custom;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.contrib.json.classic.JsonLayout;
import com.google.cloud.logging.Severity;

import java.util.Map;

public class GcpJsonLayout extends JsonLayout {
    private static final String SEVERITY_FIELD = "severity";

    @Override
    protected void addCustomDataToJsonMap(Map<String, Object> map, ILoggingEvent event) {
        map.put(SEVERITY_FIELD, severityFor(event.getLevel()));
    }

    private static Severity severityFor(Level level) {
        return switch (level.toInt()) {
            case Level.TRACE_INT, Level.DEBUG_INT -> Severity.DEBUG;
            case Level.INFO_INT -> Severity.INFO;
            case Level.WARN_INT -> Severity.WARNING;
            case Level.ERROR_INT -> Severity.ERROR;
            default -> Severity.DEFAULT;
        };
    }
}

This takes care of mapping to the right Severity levels for Cloud Error Reporting.
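
To wire this in, the ConsoleAppender shown earlier just has to reference the custom layout class; a sketch, assuming the package name used above:

<appender name="jsonLoggingAppender" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="org.bk.logback.custom.GcpJsonLayout">
        <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter"/>
        <timestampFormat>yyyy-MM-dd HH:mm:ss.SSS</timestampFormat>
        <appendLineSeparator>true</appendLineSeparator>
    </layout>
</appender>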

Conclusion

Use the Google Cloud Logback appender and you should be set. Consider the alternate approaches only if you find that parts of the stack trace are missing from your logs.

Saturday, April 30, 2022

Calling Google Cloud Services in Java

If you want to call Google Cloud services from a Java codebase, then broadly there are two approaches to incorporating the client libraries in your code. The first, let's call it the "direct" approach, is to use the Google Cloud client libraries available here; the second approach is to use a "wrapper", the Spring Cloud GCP libraries available here.

So given both these libraries, which one should you use? My take is simple: if you have a Spring Boot based app, Spring Cloud GCP should likely be the preferred approach; otherwise, use the "direct" libraries.


Using Pub/Sub Client libraries

The best way to see the two approaches in action is to use them for making a call, in this case to publish a message to Cloud Pub/Sub.
The kind of contract I am expecting to implement looks like this:
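
A minimal sketch of such a contract, assuming Project Reactor's Mono as the return type (the names are illustrative):

import reactor.core.publisher.Mono;

// Contract for publishing a message, resolving to the published message id
public interface MessagePublisher {
    Mono<String> publish(Message message);
}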

The "message" is a simple type, represented as a Java record:
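
Something along these lines, with illustrative fields:

// A simple message type represented as a Java record
public record Message(String id, String payload) {
}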


Given this, let’s start with the “direct” approach.

Direct Approach

The best way that I have found to get to the libraries is through this page - https://github.com/googleapis/google-cloud-java/, which in turn links to the client libraries for the specific GCP services; the Cloud Pub/Sub one is here - https://github.com/googleapis/java-pubsub. I use Gradle for my builds, and pulling in the Pub/Sub libraries with Gradle is done this way:


implementation platform('com.google.cloud:libraries-bom:25.1.0')
implementation('com.google.cloud:google-cloud-pubsub')
With the library pulled in, the code to publish a message looks like this:
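
A sketch of what this could look like, assuming Jackson for the JSON serialization and the illustrative MessagePublisher contract from above:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.api.core.ApiFuture;
import com.google.api.core.ApiFutureCallback;
import com.google.api.core.ApiFutures;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.common.util.concurrent.MoreExecutors;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import reactor.core.publisher.Mono;

public class DirectPubSubMessagePublisher implements MessagePublisher {
    private final Publisher publisher;
    private final ObjectMapper objectMapper;

    public DirectPubSubMessagePublisher(Publisher publisher, ObjectMapper objectMapper) {
        this.publisher = publisher;
        this.objectMapper = objectMapper;
    }

    @Override
    public Mono<String> publish(Message message) {
        // Serialize the message to raw JSON and wrap it in a PubsubMessage
        return Mono.fromCallable(() -> objectMapper.writeValueAsString(message))
                .map(json -> PubsubMessage.newBuilder()
                        .setData(ByteString.copyFromUtf8(json))
                        .build())
                .flatMap(pubsubMessage -> {
                    // publish returns an ApiFuture holding the message id
                    ApiFuture<String> future = publisher.publish(pubsubMessage);
                    // Bridge the ApiFuture to a Mono
                    return Mono.<String>create(sink ->
                            ApiFutures.addCallback(future, new ApiFutureCallback<String>() {
                                @Override
                                public void onFailure(Throwable t) {
                                    sink.error(t);
                                }

                                @Override
                                public void onSuccess(String messageId) {
                                    sink.success(messageId);
                                }
                            }, MoreExecutors.directExecutor()));
                });
    }
}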


The message is converted to raw JSON and published to Cloud Pub/Sub, which returns an ApiFuture type. I have previously covered how such a type can be converted to reactive types, which is what is finally returned from the publishing code.

The “publisher” is created using a helper method:


// Topic referenced by project id and topic name
Publisher publisher = Publisher.newBuilder(TopicName.of("myproject", "sampletopic")).build();

Spring Cloud GCP Approach

The documentation for the Spring Cloud GCP project is available here. First, the dependencies need to be pulled in; for a Gradle based project this looks like the following:



dependencies {
   implementation 'com.google.cloud:spring-cloud-gcp-starter-pubsub'
}

dependencyManagement {
   imports {
      mavenBom "com.google.cloud:spring-cloud-gcp-dependencies:${springCloudGcpVersion}"
      mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
   }
}

With the right dependencies pulled in, Spring Boot auto-configuration comes into play and automatically creates a type called PubSubTemplate, with properties to tweak its configuration. Code to publish a message to a topic using a PubSubTemplate looks like this:
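
A sketch along these lines, reusing the illustrative contract from before, assuming a topic named "sampletopic" and a Spring Cloud GCP version where publish returns a ListenableFuture:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.cloud.spring.pubsub.core.PubSubTemplate;
import reactor.core.publisher.Mono;

public class SpringPubSubMessagePublisher implements MessagePublisher {
    private final PubSubTemplate pubSubTemplate;
    private final ObjectMapper objectMapper;

    public SpringPubSubMessagePublisher(PubSubTemplate pubSubTemplate, ObjectMapper objectMapper) {
        this.pubSubTemplate = pubSubTemplate;
        this.objectMapper = objectMapper;
    }

    @Override
    public Mono<String> publish(Message message) {
        return Mono.fromCallable(() -> objectMapper.writeValueAsString(message))
                // publish returns a ListenableFuture<String>; completable() bridges it
                // to a CompletableFuture, which Reactor can consume directly
                .flatMap(json -> Mono.fromFuture(
                        pubSubTemplate.publish("sampletopic", json).completable()));
    }
}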

Comparison


Given these two code snippets, these are some of the differences:
  • Spring Cloud GCP takes care of a bunch of boilerplate around creating a Publisher (and a Subscriber when listening to messages)
  • PubSubTemplate provides simpler helper methods for publishing and listening to messages; its ListenableFuture return type can easily be transformed to reactive types, unlike the ApiFuture returned by the direct approach
  • Testing with Spring Cloud GCP is much simpler, as the raw Publisher needs to be tweaked extensively to work with an emulator, and Spring Cloud GCP handles this complication under the covers

Conclusion

The conclusion for me is that Spring Cloud GCP is compelling. If a project is Spring Boot based, then Spring Cloud GCP will fit in great and provides just the right level of abstraction for dealing with the Google Cloud APIs.
The snippets in this blog post don't do justice to some of the complexities of the codebase; my GitHub repo may help with a complete working codebase with both "direct" and Spring Cloud GCP based code - https://github.com/bijukunjummen/gcp-pub-sub-sample

Saturday, March 5, 2022

Modeling one-to-many relation in Firestore, Bigtable, Spanner

I like working with services that need little to no provisioning effort; these are typically termed Fully Managed services by different providers.

The most provisioning effort is typically required for database systems. I remember having to operate a Cassandra cluster in a previous job, and the effort spent on provisioning and upkeep was far from trivial; I came to deeply appreciate and empathize with the role of a database administrator during that time.

My objective in this post is to explore how a one-to-many relationship can be maintained in three managed database solutions on Google Cloud: Firestore, Bigtable and Spanner.

Data Model

The data model is to represent a Chat Room with Chat Messages in the rooms.



A Chat Room just has a name as an attribute. Each Chat Room has a set of Chat Messages, with each message having a payload and creation date as attributes. A sample would look something like this:
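
With illustrative values:

{
  "name": "general",
  "chatMessages": [
    {
      "payload": "Hello!",
      "createdDate": "2022-03-05T10:00:00Z"
    },
    {
      "payload": "Hi there",
      "createdDate": "2022-03-05T10:01:00Z"
    }
  ]
}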

So now comes the interesting question: how can this one-to-many relation be modeled using Firestore, Bigtable and Spanner? Let's start with Firestore.

One-to-many using Firestore

Managing a one-to-many relation comes naturally to Firestore. The concepts map directly to Firestore structures (see the sketch after this list):

  • Each Chat Room instance and each Chat Message can be thought of as a Firestore "Document".
  • All the Chat Room instances are part of a "ChatRooms" "Collection".
  • Each Chat Room "Document" has a "Sub-Collection" to hold all the Chat Messages relevant to it, this way establishing a one-to-many relationship.
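
A sketch using the Firestore Java client, with illustrative ids and field names:

import com.google.cloud.Timestamp;
import com.google.cloud.firestore.DocumentReference;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;

import java.util.Map;

public class ChatRoomsFirestoreSample {
    public static void main(String[] args) throws Exception {
        Firestore db = FirestoreOptions.getDefaultInstance().getService();

        // A Chat Room is a document in the "ChatRooms" collection
        DocumentReference room = db.collection("ChatRooms").document("room-1");
        room.set(Map.of("name", "general")).get();

        // Each Chat Message is a document in a "ChatMessages" sub-collection of the room
        DocumentReference message = room.collection("ChatMessages").document("message-1");
        message.set(Map.of("payload", "Hello!", "createdDate", Timestamp.now())).get();
    }
}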

One-to-Many using Bigtable

A quick aside: in Bigtable, information is stored in the following form - each table holds rows identified by a unique row key, and the values in a row are held in columns grouped under column families.
Each Chat Room and Chat Message can be added as rows with carefully crafted row keys.

  • A chat room needs to be retrieved by its id, so a row key may look something like this: “ROOM/R#room-id”
  • A Chat Message row key can be something like this: “MESSAGES/R#chatroom-id/M#message-id”

Since Bigtable queries can be based on prefixes, a retrieval with a prefix of “MESSAGES/R#chatroom-id” would return all messages in the Chat Room “chatroom-id”. This is not as intuitive as the Firestore structure, as it requires thinking carefully about the row key structure.
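
A sketch of such a prefix-based retrieval using the Bigtable Java client, with illustrative project, instance and table names:

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class ChatRoomsBigtableSample {
    public static void main(String[] args) throws Exception {
        try (BigtableDataClient client = BigtableDataClient.create("myproject", "my-instance")) {
            // A prefix scan retrieves all messages for the chat room "chatroom-id"
            Query query = Query.create("chat").prefix("MESSAGES/R#chatroom-id");
            for (Row row : client.readRows(query)) {
                System.out.println(row.getKey().toStringUtf8());
            }
        }
    }
}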

One-to-Many using Spanner

Spanner behaves like a traditional relational database, with a lot of smarts under the covers to scale massively. So from a one-to-many data modeling perspective, the relational concepts just carry over.

Chat Rooms can be stored in a “ChatRooms” table, with the columns holding the attributes of a chat room.

Chat Messages can be stored in a “ChatMessages” table, with columns holding the attributes of a chat message. A foreign key, say “ChatRoomId”, in Chat Messages points to the relevant Chat Room.





Given this, all chat messages for a room can be retrieved using a query on Chat Messages with a filter on the Chat Room Id.
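
A sketch of such a query using the Spanner Java client, with illustrative table and column names:

import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Statement;

public class ChatRoomsSpannerSample {
    // Retrieve all messages for a chat room by filtering on the foreign key
    static void printMessages(DatabaseClient dbClient, String chatRoomId) {
        Statement statement = Statement.newBuilder(
                        "SELECT Payload, CreatedDate FROM ChatMessages WHERE ChatRoomId = @chatRoomId")
                .bind("chatRoomId").to(chatRoomId)
                .build();
        try (ResultSet resultSet = dbClient.singleUse().executeQuery(statement)) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString("Payload"));
            }
        }
    }
}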

Conclusion

I hope this gives a taste of what it takes to model data in these three excellent fully managed GCP databases.