OmbuLabs Blog: The Lean Software Boutique

Hacktoberfest 2023: How We Merged Open Source Contributions with Learning Objectives (2023-11-21)

As a company, one of our core values is to be “open by default.” At the same time, one of our goals is to use our open source investment time as a way to improve our skills as engineers and designers.

In that spirit, this year we decided to organize our open source contribution time in a way that wasn’t limited to our own open source projects. This is a short post to explain how we aligned our open source contributions with our learning goals, what contributions we made, and why it mattered.

Our Motivation

Last year, as a company, we did an exercise in participating in Hacktoberfest with our team. There were positive and negative notes but, overall, feedback around the exercise itself was positive.

This year we had specific goals and topics we wanted to focus on as a team. We decided to use open source projects as a way to learn and practice while also contributing to the community.

Therefore, this year we aligned our open source contributions with our learning purposes. As a part of our company, we conduct monthly one-on-one calls with our full-time employees. In those calls, we learn about areas and skills that our direct reports would like to improve.

The problem is that sometimes client work doesn’t give us the opportunities we need to work on said skills.

That’s why we decided to use the month of October to contribute to open source projects with the following intentions:

  • For senior engineers: We wanted them to improve their upgrading and debugging skills so that they could get better at fixing medium to high complexity bugs.

  • For mid-level engineers: We wanted them to work on features so that they could improve their skills when it came to greenfield-like projects.

Hacktoberfest Plan

This year we decided not to restrict contributions to repositories that were officially participating in Hacktoberfest.

We asked everyone to suggest repositories before we started and we quickly came up with a list of approved projects.

Senior engineers were asked to work on two kinds of issues: technical debt and bugs.

Mid-level engineers were asked to work on any kind of issue they found interesting, with a focus on new features or feature changes.

To organize that:

  • We divided everyone into pairs.
  • Each pair was asked to select issues from the list of approved projects.

Our Teams

This time we decided to split into teams:

Our Own Open Source Projects

When it came to our own projects, we decided to have only Ariel and Ernesto’s team work on open source projects maintained by OmbuLabs.

We focused on these projects:

Approved Projects

We wanted to make sure that our teams focused on projects that were approved by our engineering management team. The list included some well-known and really useful tools that we’ve been using for years:

Contributions

In terms of contributions, we considered activity on pull requests and issues as a valid contribution. We understand that sometimes you are looking to add value to an open source project, and after hours of research and trying many different things, all you can add is a comment to an existing issue. In our exercise, and in general, that counts as a contribution too!

Issues

Here are all the issues where we added value:

Pull requests

Here are all the pull requests we submitted:

Investment Time

In total during the month of October we invested 392 hours in our open source contributions. That represents an investment of $79,000 into open source by 10 of our senior and mid-level engineers.

Next Steps

We plan to take all of our contributions across the finish line, using our regular, paid monthly open source investment time. Outside of Hacktoberfest, on average, as a team we invest 38 hours per month on open source contributions.

We look forward to continuing our investment in the open source projects that add value to the world and our communities. We believe this is the way to hone our craft, learn new things faster, and become better professionals.

etagwerker
Running Airflow on Google Kubernetes Engine without Helm (2023-11-17)

Google Cloud Platform (GCP) can be a very good option for Airflow and, although it offers its own managed deployment of Airflow, Cloud Composer, managing our own deployment gives us more granular control over the underlying infrastructure, impacting choices such as what Python version to run and even when to upgrade Airflow itself.

The Airflow community maintains a Helm chart for Airflow deployment on a Kubernetes cluster. The Helm chart comes with a lot of resources, as it contains a full Airflow deployment with all the capabilities. We didn’t need all of that, and we wanted granular control over the infrastructure. Therefore, we chose not to use Helm, although it provides a very good starting point for the configuration.

Overview

The Airflow installation consists of five different components that interact with each other, as illustrated below:

Airflow Components (Source: Official Airflow Documentation)

In order to configure our Airflow deployment on GCP, we used a few different services:

  • Google Kubernetes Engine (GKE) for the cluster hosting the scheduler and webserver deployments
  • Postgres instance in CloudSQL for the metadata database
  • Git-sync to store DAG files in an ephemeral volume, syncing directly from GitHub

NOTE: The steps below assume you have both the Google Cloud SDK and kubectl installed, and a GCP project set up.

CloudSQL and Cluster Set Up

Before deploying Airflow, we need to configure a CloudSQL instance for the metadata database and the GKE cluster that will host the Airflow deployment. We opted to use a Virtual Private Cloud (VPC) to allow the connection between GKE and CloudSQL.

Setting up a CloudSQL instance

To create a CloudSQL instance for the Airflow database:

gcloud sql instances create airflow-metadb \
    --database-version=POSTGRES_15 \
    --tier=db-n1-standard-2 \
    --region=us-east1 \
    --network=airflow-network \
    --root-password=admin

Customize the database version, tier, region, and network to your needs. If you don’t plan on using a VPC, you don’t need the network argument. Check out the gcloud sql instances create documentation for a full list of what’s available.

Connect to the newly created instance to create a database to serve as the Airflow metadata database. Here, we’ll create a database called airflow_meta:

gcloud beta sql connect airflow-metadb

This will open a Postgres shell, where you can create the database.

CREATE DATABASE airflow_meta;

Finally, get the instance’s IP address and port to construct the database connection URL, which will be needed for the Airflow set up. You’ll need the IP address listed as PRIVATE:

gcloud sql instances describe airflow-metadb

Your connection URL should follow the format:

postgresql+psycopg2://username:password@instance-ip-address:port/db-name

for a Postgres instance.
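
For example, assuming the private IP returned by the describe command is 10.10.0.3, the default Postgres port 5432, the default postgres user, and the root password set earlier (all of these are placeholders for your own values), the URL would look like:

postgresql+psycopg2://postgres:admin@10.10.0.3:5432/airflow_meta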

Setting up a GKE Cluster

Before initializing a new Kubernetes cluster on GKE, make sure you have the right project set in the gcloud CLI:

gcloud config set project airflow

Create a new cluster on GKE:

gcloud container clusters create airflow-cluster \
    --machine-type e2-standard-2 \
    --num-nodes 1 \
    --region "us-east1" \
    --scopes "cloud-platform"

Choose the correct machine type for your needs. If your cluster ends up requesting more resources than you need, you’ll end up overpaying for Airflow. Conversely, if you have fewer resources than required, you will run into issues such as memory pressure. Also choose the number of nodes to start with and the region according to your needs. Setting the --scopes argument to cloud-platform allows the GKE cluster to communicate with other GCP resources. If that is not needed or desired, remove it.

For a full list of the options available, check the gcloud container clusters create documentation.

Authenticate kubectl against your newly created cluster:

gcloud container clusters get-credentials airflow-cluster --region "us-east1"

and create a Kubernetes namespace for the Airflow deployment. Although not necessary, this is a good practice, and it’ll allow for the grouping and isolating of resources, enabling, for example, separation of a production and staging deployment within the same cluster.

kubectl create namespace airflow

The cluster should now be set up and ready.

Cluster Preparation

Our goal was to have Airflow deployed to a GKE cluster and the Airflow UI exposed via a friendly subdomain. In order to do that, we need to obtain and use a certificate.

To make the process of obtaining, renewing, and using certificates as easy as possible, we decided to use cert-manager, a native Kubernetes certificate management controller. For that to work, though, we need to ensure that traffic is routed to the correct service: requests made to the cert-manager solver to confirm domain ownership must reach the right service, and so must requests made to access the Airflow UI.

In order to do that, an nginx ingress controller was needed.

NGINX Ingress Controller Configuration

Unlike an Ingress, an Ingress Controller is an application running inside the cluster that configures a load balancer according to multiple Ingress resources. The NGINX ingress controller is deployed in a pod alongside that load balancer.

To help keep the ingress controller resources separate from the rest, let’s create a namespace for it:

kubectl create namespace ingress-nginx

The easiest way to deploy the ingress controller to the cluster is through the official Helm Chart. Make sure you have helm installed, then add the nginx Helm repository and update your local Helm chart repository cache:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

Install the ingress-nginx Helm chart in the cluster:

helm install nginx-ingress ingress-nginx/ingress-nginx -n ingress-nginx

where nginx-ingress is the name we’re assigning to the instance of the Helm chart we’re deploying, ingress-nginx/ingress-nginx is the chart to be installed (the ingress-nginx chart in the ingress-nginx Helm repository) and -n ingress-nginx specifies the namespace within the Kubernetes cluster in which to install the chart.

With the controller installed, run:

kubectl get services -n ingress-nginx

and look for the EXTERNAL-IP of the ingress-nginx-controller service. That is the IP address of the load balancer. To expose the Airflow UI via a subdomain, we configured an A record pointing to this IP address.
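
If the domain is managed in Cloud DNS, the A record can be created with gcloud as sketched below (the zone name, subdomain, and IP address are placeholders); otherwise, create the equivalent record in your DNS provider’s console:

gcloud dns record-sets create airflow.my_domain.com. \
    --zone=my-zone \
    --type=A \
    --ttl=300 \
    --rrdatas=203.0.113.10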

Cert-manager Configuration

Now that the controller is in place, we can proceed with the installation of the cert-manager. First, apply the CRD (CustomResourceDefinition) resources:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.13.0/cert-manager.crds.yaml

The cert-manager relies on its own custom resource types to work; this step ensures those resources are installed.

Like with the controller, we’ll also create a separate namespace for the cert-manager resources:

kubectl create namespace cert-manager

And install cert-manager using the Helm chart maintained by Jetstack:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.13.0

With cert-manager installed, we now need two additional resources to configure it: a ClusterIssuer and Certificate.

The ClusterIssuer creates a resource to represent a certificate issuer within Kubernetes, i.e., it defines a Kubernetes resource to tell cert-manager who the certificate issuing entity is and how to connect to it. You can create a simple ClusterIssuer for Let’s Encrypt as follows:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: my_email@my_domain.com
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - http01:
        ingress:
          class: nginx

The Certificate resource then defines the certificate to issue:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: airflow-certificate
  namespace: airflow
spec:
  secretName: cert-tls-secret
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  commonName: airflow.my_domain.com
  dnsNames:
  - airflow.my_domain.com
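
Assuming the two manifests are saved as cluster-issuer.yaml and airflow-certificate.yaml (the file names are arbitrary), they can be applied with:

kubectl apply -f cluster-issuer.yaml
kubectl apply -f airflow-certificate.yaml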

With both resources applied, and assuming everything went well and the DNS records are set up correctly, when you run:

kubectl describe certificate airflow-certificate -n airflow

you should see Status: True at the bottom of the certificate’s description, indicating the certificate has been issued.

Now our cluster is ready to receive the Airflow deployment.

Deploying Airflow

The Airflow deployment includes a few different pieces that need to work together for Airflow to function properly. The Airflow installation in Kubernetes ends up looking more like this:

Airflow on Kubernetes (Source: Official Airflow Documentation)

Our complete Airflow deployment resources ended up looking like this:

resources
|---- airflow.cfg
|---- secrets.yaml
|---- persistent_volumes
      |---- airflow-logs-pvc.yaml
|---- rbac
      |---- cluster-role.yaml
      |---- cluster-rolebinding.yaml
|---- scheduler
      |---- scheduler-deployment.yaml
      |---- scheduler-serviceaccount.yaml
|---- statsd
      |---- statsd-deployment.yaml
      |---- statsd-service.yaml
|---- webserver
      |---- webserver-deployment.yaml
      |---- webserver-ingress.yaml
      |---- webserver-service.yaml
      |---- webserver-serviceaccount.yaml

Secrets and Configuration

In order to successfully deploy Airflow, we need to make sure the airflow.cfg file is available in the relevant pods. Airflow allows you to configure a variety of different things through this file (check the Configuration Reference for more detailed information).

In Kubernetes, this kind of configuration is stored in a ConfigMap, which is a special kind of “volume” you can mount inside your pods and use to make configuration files available to them. The ConfigMap works together with Kubernetes Secrets, meaning you can reference a Secret directly inside a ConfigMap or pass the Secret as an environment variable and reference that.

Of note: Kubernetes Secrets are somewhat unsafe, considering they just contain a base64-encoded string that can be easily decoded. If secrets need to be versioned or committed somewhere, it’s better to use GCP’s Secret Manager instead.

A ConfigMap for the airflow.cfg file can be created running:

kubectl create configmap airflow-config --from-file=airflow.cfg -n airflow

where airflow-config is the name of the ConfigMap created and the -n airflow flag is necessary to create the resource in the correct namespace.

Kubernetes secrets can be created using a secrets.yaml manifest file to declare individual secrets:

apiVersion: v1
kind: Secret
metadata:
  name: airflow-metadata
type: Opaque
data:
  connection: "your-base64-encrypted-connection-string"
  fernet-key: "your-base64-encrypted-fernet-key"

---

apiVersion: v1
kind: Secret
metadata:
  name: git-sync-secrets
type: Opaque
data:
  username: "your-base64-encrypted-username"
  token: "your-base64-encrypted-token"

If you decide to go with plain Kubernetes secrets, keep this yaml file private (don’t commit it to a repository). To apply it to your cluster and create all the defined secrets, run:

kubectl apply -f secrets.yaml -n airflow

This command will apply the secrets.yaml file to the Kubernetes cluster, in the airflow namespace. If secrets.yaml is a valid Kubernetes manifest file and the secrets are properly defined, all Kubernetes Secrets specified in the file will be created in that cluster and namespace.

Persistent Volumes

What volumes (and how many volumes) you’ll need will depend on how you decide to store Airflow logs and how your DAGs are structured. There are, in essence, two ways to store DAG information:

  • Store DAGs in a persistent volume
  • Sync them from a git repository into an ephemeral volume mounted inside the cluster

The key point to keep in mind is that the folder the Airflow scheduler and webserver are watching to retrieve DAGs from and fill in the DagBag needs to contain built DAGs Airflow can process. In our case, our DAGs are static, built directly into DAG files. Therefore, we went with a simple git-sync approach, syncing our DAG files into an ephemeral volume and pointing the webserver and scheduler there.

This means the only persistent volume we needed was to store Airflow logs.

A PersistentVolume is a cluster resource that exists independently of a Pod, meaning the disk and data stored there will persist as the cluster changes, and Pods are deleted and created. These can be dynamically created through a PersistentVolumeClaim, which is a request for and claim to a PersistentVolume resource:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

This creates an airflow-logs-pvc resource we can use to store Airflow logs.

Role-Based Access Control (RBAC)

Kubernetes RBAC is a security feature allowing us to manage access to resources within the cluster through defined roles. A Role is a set of rules that defines the actions allowed within a specific namespace. A RoleBinding is a way to associate a specific Role with a user or, in our case, a service account.

To define roles that apply cluster-wide rather than specific to a namespace, you can use a ClusterRole and an associated ClusterRoleBinding instead.

In the context of our Airflow deployment, a ClusterRole is required to allow the relevant service account to manage Pods. Therefore, we created an airflow-pod-operator role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: airflow-pod-operator
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "delete", "get", "list", "patch", "watch"]

with an associated role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: airflow-pod-operator
subjects:
- kind: ServiceAccount
  name: airflow-service-account
  namespace: airflow
roleRef:
  kind: ClusterRole
  name: airflow-pod-operator
  apiGroup: rbac.authorization.k8s.io

Scheduler Deployment

The scheduler is a critical component of the Airflow application, and it needs to be deployed to its own Pod inside the cluster. At its core, the scheduler is responsible for ensuring DAGs run when they are supposed to, and tasks are scheduled and ordered accordingly.

The scheduler deployment manifest file that comes with the Helm chart (you can find it inside the scheduler folder) is a good starting point for the configuration. You’ll only need to tweak it a bit to match your namespace and any specific configuration you might have around volumes.

In our case, we wanted to sync our DAGs from a GitHub repository, so we needed to configure a git-sync container. An easy way to get started is to configure the connection with a username and token, although for a production deployment it’s best to configure the connection via SSH. With git-sync configured, our scheduler deployment looked like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-scheduler
  namespace: airflow
  labels:
    tier: airflow
    component: scheduler
    release: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
      component: scheduler
      release: airflow
  template:
    metadata:
      labels:
        tier: airflow
        component: scheduler
        release: airflow
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      serviceAccountName: airflow-service-account
      volumes:
        - name: config
          configMap:
            name: airflow-config
        - name: dags-volume
          emptyDir: {}
        - name: logs-volume
          persistentVolumeClaim:
            claimName: airflow-logs-pvc
      initContainers:
        - name: run-airflow-migrations
          image: apache/airflow:2.7.1-python3.11
          imagePullPolicy: IfNotPresent
          args: ["bash", "-c", "airflow db migrate"]
          env:
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
      containers:
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v4.0.0-rc5
          args:
            - --repo=https://github.com/ombulabs/airflow-pipelines
            - --depth=1
            - --period=60s
            - --link=current
            - --root=/git
            - --ref=main
          env:
            - name: GITSYNC_USERNAME
              valueFrom:
                secretKeyRef:
                  name: git-sync-secrets
                  key: username
            - name: GITSYNC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: git-sync-secrets
                  key: token
          volumeMounts:
            - name: dags-volume
              mountPath: /git

        - name: scheduler
          image: us-east1-docker.pkg.dev/my_project/airflow-images/airflow-deployment:latest
          imagePullPolicy: Always
          args:
            - scheduler
          env:
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: "/git/current"
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
          livenessProbe:
            failureThreshold: 15
            periodSeconds: 30
            exec:
              command:
                - python
                - -Wignore
                - -c
                - |
                  import os
                  os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                  os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
                  from airflow.jobs.scheduler_job import SchedulerJob
                  from airflow.utils.net import get_hostname
                  import sys
                  job = SchedulerJob.most_recent_job()
                  sys.exit(0 if job.is_alive() and job.hostname == get_hostname() else 1)
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: dags-volume
              mountPath: /git
            - name: logs-volume
              mountPath: "/opt/airflow/logs"

The scheduler deployment is divided into two “stages”: the initContainers and the containers. When Airflow starts, it needs to run database migrations in the metadata database; that is what the init container does. It runs as soon as the scheduler pod starts and ensures the database migration is completed before the main application containers start. Once the init container is done with this startup task, the git-sync and scheduler containers can run.

Notice that the scheduler container references a custom image in Artifact Registry. Given our pipeline setup and choice of executor, we replaced the official Airflow image in the deployment with our own image.
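
Such a custom image is typically a thin layer on top of the official one. Here is a minimal sketch, assuming a requirements.txt with the extra providers and dependencies your DAGs need (the file name and contents are assumptions, not part of our actual setup):

# Extend the official Airflow image with project-specific dependencies
FROM apache/airflow:2.7.1-python3.11
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt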

Webserver Deployment

The webserver is another critical Airflow component: it exposes the Airflow UI and manages user interaction with Airflow. Its deployment is very similar to that of the scheduler, with minor differences, so we won’t go into it in detail. The manifest file looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-webserver
  namespace: airflow
  labels:
    tier: airflow
    component: webserver
    release: airflow
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 1
  selector:
    matchLabels:
      tier: airflow
      component: webserver
      release: airflow
  template:
    metadata:
      labels:
        tier: airflow
        component: webserver
        release: airflow
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      serviceAccountName: default
      volumes:
        - name: config
          configMap:
            name: airflow-config
        - name: dags-volume
          emptyDir: {}
        - name: logs-volume
          persistentVolumeClaim:
            claimName: airflow-logs-pvc
      initContainers:
        - name: run-airflow-migrations
          image: apache/airflow:2.7.1-python3.11
          imagePullPolicy: IfNotPresent
          args: ["bash", "-c", "airflow db migrate"]
          env:
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
      containers:
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v4.0.0-rc5
          args:
            - --repo=https://github.com/ombulabs/airflow-pipelines
            - --depth=1
            - --period=60s
            - --link=current
            - --root=/git
            - --ref=main
          env:
            - name: GITSYNC_USERNAME
              valueFrom:
                secretKeyRef:
                  name: git-sync-secrets
                  key: username
            - name: GITSYNC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: git-sync-secrets
                  key: token
          volumeMounts:
            - name: dags-volume
              mountPath: /git

        - name: webserver
          image: us-east1-docker.pkg.dev/my_project/airflow-images/ombu-airflow-deployment:latest
          imagePullPolicy: Always
          args:
            - webserver
          env:
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-metadata
                  key: connection
            - name: AIRFLOW__WEBSERVER__AUTH_BACKEND
              value: "airflow.api.auth.backend.basic_auth"
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: dags-volume
              mountPath: /git
            - name: logs-volume
              mountPath: "/opt/airflow/logs"
          ports:
            - name: airflow-ui
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15

Perhaps the most notable thing here is the presence of the AIRFLOW__WEBSERVER__AUTH_BACKEND environment variable. This allows us to use a basic authentication backend with Airflow. As part of this deployment, we didn’t configure the creation of a root user, meaning one needed to be created from within the container by the first person trying to access the UI. If you find yourself in the same situation:

Run

kubectl exec -it <webserver-pod-name> -n airflow -c webserver -- /bin/sh

to access the shell within the webserver container. By default, running the command without the -c webserver flag will access the git-sync container, which is not what we want. Once inside the shell, run:

su airflow

to switch to the airflow user. This is needed to run airflow commands. Now you can run:

airflow users create --username <your_username> --firstname <first_name> --lastname <last_name> --role <the-user-role> --email <your-email> --password <your-password>

This will create a user with the specified role. It only needs to be run to create the first admin user after a fresh deployment; additional users can be created directly from within the interface.

Services and Ingresses

Having the webserver deployed to a pod is not enough to be able to access the UI. It needs a Service resource associated with it to allow access to the workload running inside the cluster. From our webserver manifest file, we defined an airflow-ui port name for the 8080 container port. Now we need a service that exposes this port so that network traffic can be directed to the correct pod:

kind: Service
apiVersion: v1
metadata:
  name: webserver-svc
  namespace: airflow
spec:
  type: ClusterIP
  selector:
    tier: airflow
    component: webserver
    release: airflow
  ports:
    - name: airflow-ui
      protocol: TCP
      port: 80
      targetPort: 8080

There are several types of Kubernetes Services, with the ClusterIP type being the default. It provides an internal IP and DNS name, making the service only accessible within the cluster. This means that we now have a service associated with the webserver, but we still can’t access the UI through a friendly subdomain like a regular application.

For that, we’ll configure an ingress next. An Ingress is an API object that defines the rules and configurations to manage external access to our cluster’s services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airflow-ingress
  namespace: airflow
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt"
spec:
  ingressClassName: "nginx"
  tls:
    - hosts:
        - airflow.my_domain.com
      secretName: cert-tls-secret
  rules:
    - host: airflow.my_domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webserver-svc
                port:
                  number: 80

The key configuration here that allows us to define the settings for secure HTTPS connections is the tls section. There, we can list all hosts for which to enable HTTPS and the name of the Kubernetes Secret that holds the TLS certificate and private key to use to secure the connection. This secret is automatically created by cert-manager.

Service Accounts

Finally, in order to ensure our resources have the necessary permissions to spawn and manage pods, we need to configure service accounts for them. You can choose to configure individual service accounts for each resource or a single service account for all resources, depending on your security requirements.

The ServiceAccount resource can be configured as:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow-service-account
  namespace: airflow
  labels:
    tier: airflow
    component: scheduler
    release: airflow
automountServiceAccountToken: true

Since we wanted users to be able to manage workflows directly from the UI, we also configured a service account for the webserver.

The StatsD Application

This is an optional component that collects metrics inside the Airflow application. The deployment is similar to the other two, so we won’t dive into it.

Conclusion

Airflow is now deployed to a GKE cluster and accessible via our chosen subdomain. This allows us to have a higher level of control over our infrastructure, while still leveraging GKE’s built-in resources to auto-scale as needed.

abizzinotto
Introducing the Account Advocate - A Dedicated Partner for Success (2023-08-16)

As a company, we are committed to ensuring our clients’ success and believe that maintaining strong relationships with the people who trust us with their projects is a driving force of success. One of our core values is, in fact, Continuous Improvement, and we make an effort to live it every day.

In that spirit, we are excited to announce a new role in our organization: the Account Advocate, a key member of our team fully dedicated to championing client interests, fostering collaboration, and ensuring successful partnerships that go above and beyond.

What is the Account Advocate?

The Account Advocate is a key, strategic role focused on ensuring that our clients are happy with our partnership not only from a technical and delivery perspective, but also from a business perspective. They are an advocate and representative for your business stakeholders inside our team, dedicated to connecting your vision with our delivery and ensuring your goals are met and any potential concerns are heard and addressed.

The Account Advocate works closely with the Project Manager to ensure success, but while the Project Manager focuses on delivery and the success of the existing project, the Account Advocate focuses on the overall relationship with the business: they make sure value delivery expectations are met, that your team is being heard, and that we’re delivering value to your company at every opportunity.

They also facilitate communication with senior leadership on both ends, ensuring that you have all the support you need for a successful collaboration.

How does the relationship work?

Communication is key to everything we do. We value open and honest communication with our clients and between our teams. As such, you will have plenty of contact and checkpoints with our delivery team.

The Account Advocate is focused on more strategic goals and higher-level partnership priorities, so they will aim to meet with business stakeholders quarterly. If a different frequency is preferred, we will most definitely adapt, but we believe at least quarterly contact is important to ensure success and happiness on both ends of the partnership.

While communicating and collaborating with you, the Account Advocate will focus on:

  1. Client Happiness: We are committed to understanding your goals, challenges and opportunities. Client happiness is at the core of our business, and they are your voice within our organization, ensuring your feedback is being heard and any concerns you might have are understood and addressed swiftly.

  2. Strong Partnership and Collaboration: Ongoing collaboration makes partnerships grow stronger, and we are interested in delivering as much value to your organization as we can. They will collaborate closely with your business team to foster trust and open communication and facilitate collaboration at the higher levels of leadership.

  3. New Opportunities: We are invested in your success and believe in going above and beyond in everything we do. The Account Advocate is interested in hearing what other problems we can help solve, what other challenges we can help you overcome, and, overall, other ways in which we can contribute to delivering cost-effective solutions that solve real problems and generate actual value for you and your team.

  4. Problem Resolution: We believe in Challenging Projects over Profitable Projects; that’s why we are so passionate about every project we work on. That also means we understand challenges arise and are a part of every successful collaboration. The Account Advocate is focused on solving any issues swiftly and transparently, ensuring minimal disruption.

Collaboration is Key to Success

As we introduce the Account Advocate role to our team and to our partnership, we are excited to see how it will contribute to an even more successful and strong relationship with our clients. This role strengthens our commitment to client happiness and success and our interest in building long-lasting relationships based on trust, open communication and transparency.

We look forward to working with you and your team on our next successful project! Contact us to get your next project started!

abizzinotto
Design Sprint Day 5: Test (2023-05-23)

This is part of our series on design sprints. If you haven’t read our previous articles, I encourage you to read more about our design sprint process.

Day 5 of the design sprint is about testing your prototype and getting feedback on your ideas. That way, you can quickly learn what is or isn’t working about the concept. Yesterday, the interviewer spent time putting together a list of questions for the interview sessions. Earlier this week, your team recruited 5 participants for Friday’s research. Now you are ready to do the dang thing.

Design Sprint Day 5 - Test

Why do we test?

We test early with a low-fidelity prototype because it’s smart and far less expensive than waiting until something is built. It’s important to try to find test participants who are outside of your organization, or at least unfamiliar with the product. The Design Sprint can’t be considered complete before research is done, so get ready to find out how other people feel about what you’ve been working on all week.

Activity 1: Preparing for the Interviews

Identify Research Goals

What does the team hope to learn from these interviews? A high level goal of “Do people like this?” might become something like “What do people think about the solution? What are the positives and negatives? What do people like or dislike about our solution vs our competitor’s solution?”

Writing the Interview Questions

  • Start with easy open-ended interview questions that align with your research goals, such as “How long have you been doing…”.

  • Only ask open-ended questions, no “yes/no” questions, nor “multiple choice” questions like “would you do x?”.

  • You can ask things like “What was the most useful part of this prototype? What was the least useful?”.

  • Avoid asking any questions that might lead a participant to a particular answer. You want to learn as much as possible in the sessions, so keep the questions open-ended. You’ll be surprised how much you learn.

  • When you have finished writing your questions, run a pilot version of the session with a team member.

Adjust as needed if you notice any hiccups.

Materials

  • A laptop: A video-enabled virtual meeting tool, like Zoom, Webex, or Google Hangouts that enables your participants to share their screen and your team to observe from their computers.

  • A link to the prototype (like a Google slides link or something like that).

Activity 2: The Interview

Notes for the Moderator

People will try to please you and will generally be kind in interviews, so assure them that you’d like them to be honest with their feedback. Make sure that your interviewees understand that you are not testing them, but rather that they are helping you test the prototype. Tell the interviewee that they are not under any scrutiny and that all difficulties or issues are useful information for the team and will help make the solution better. Plan for each interview to take about 30 minutes or so, depending on how many questions will be asked. Give yourself about 20 minutes between interviews to organize your notes and prepare for the next session.

Notes for the Observers

While the interviews are happening, the rest of the team should be paying attention and watching the interviews remotely. While observing, they should be taking notes on post-its of any notable comments, behaviors, or other observations. These notes will be used to determine the next course of action in terms of adjustments and fixes. Don’t worry about taking overlapping notes. The notes will be organized later and duplication will not affect the quality of the work at all.

Activity 3: Finding Themes

Once the interviews are complete, the team reviews their notes together, grouping like notes into themes. The team will discuss these themes. You’ll learn what went well, what didn’t go so well, and what direction or changes you should try in the next iteration. Any changes should be prioritized by the team, and then used to determine the next steps for your fledgling product.

At this point, you have successfully completed the Design Sprint! Bravo!

nathalie
Handling Environment Variables in Ruby (2023-05-17)

Configuring your Rails application can be tricky. How do you define secrets? How do you set different values for your local development and for production? How can you make it more maintainable and easy to use?

Using environment variables to store information in the environment itself is one of the most used techniques to address some of these issues. However, if not done properly, the developer experience can deteriorate over time, making it difficult to onboard new team members. Security vulnerabilities can even be introduced if secrets are not handled with care.

In this article, we’ll talk about a few tools that we like to use at OmbuLabs and ideas to help you manage your environment variables efficiently.

12 Factor Apps

In 2011, Heroku created The Twelve-Factor App methodology aimed at providing good practices to simplify the development and deployment of web applications.

As the name suggests, the methodology includes twelve factors, and the third factor states that the configuration of the application should be stored in the environment.

The idea of storing configuration in the environment was not created by Heroku, but Heroku’s popularity for Ruby and Rails applications made this approach widely adopted.

The main benefit is that our code doesn’t have to store secrets or configuration values that can vary depending on where or how the application is run. Our code simply assumes that those values are available and correct.

Dotenv

Storing configuration in the environment is simple for a single-app production environment: it is easy to set environment variables for the whole system.

Hosting providers like Heroku or Render have a configuration panel to manage the environment variables. However, when many applications have to run on the same system, each of them may need different values for a given environment variable, and then the “environment” depends on the current project and not only on the system.

One of many tools to assist with this is the dotenv gem, which wraps our application with specific environment values based on hidden files that can be loaded independently for each app without polluting the system’s environment variables.

The way dotenv works is that it reads environment variable names and values from a file named .env and populates the ENV hash with them.

By default, dotenv will NOT override variables if they are already present in the ENV hash, but that can be changed using overload instead of load when initializing the gem.
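
As a minimal sketch (the variable name and value are just placeholders):

# .env
DATABASE_URL="postgres://localhost/my_app_development"

# somewhere early in the boot process (Ruby)
require "dotenv"
Dotenv.load              # sets ENV["DATABASE_URL"] only if it is not already set
# Dotenv.overload(".env") would replace the value even if it is already set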

Sample or Template files

Since the .env file holds information that is specific to a given environment, this file is not meant to be included in the git repository.

How do we let new engineers know that we make use of a .env file or what the required environment variables are? The dotenv gem provides a good solution.

The dotenv gem provides a template feature to generate a .env.template file with the same environment variables but without actual values.

Another common practice is to use a file called .env.sample with similar content.

When a new developer clones the repository, they can copy the .env.template or .env.sample file as .env (or any of the variants, we’ll talk about this in a moment) and replace the values as needed.
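
For example:

cp .env.sample .env
# then edit .env and replace the placeholder values with ones that work on your machine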

Dotenv-validator

One issue that we have faced in many projects is when a new developer needs to know the environment variables (listed in a .env.sample file), but doesn’t know which values make sense to use.

In many cases any value works when the code doesn’t depend on the actual format of the value. However, when the data type or format does matter then things can go wrong.

One example we had of this issue was a third-party gem that required an API secret: the gem would verify the format of the secret against a regular expression, and some actions would fail with an invalid secret format error.

To prevent this, we created and open-sourced the dotenv-validator gem, which leverages a .env.sample file with comments that provide extra information about the expected format of each environment variable’s value.

This gem includes a mechanism to warn an engineer about missing or incorrect environment variables when the application starts.

Dotenv-Rails

By default, dotenv only looks for a file named .env, but, when using dotenv-rails, it will provide some naming conventions that we can adopt to further differentiate the environment variables we use not only per app but also per Rails environment.

When running a Rails app with dotenv-rails, environment variable files are looked up in this order:

root.join(".env.#{Rails.env}.local"),
(root.join(".env.local") unless Rails.env.test?),
root.join(".env.#{Rails.env}"),
root.join(".env")

Using this convention we can specify different environment variables for the same application when we run the application with rails s or when we run the tests.

Note that all the files listed above are loaded and processed by dotenv in that specific order. This means you can have generic environment variables in a .env file and be more specific overriding/defining only some of them in a file for the current Rails environment without having to copy all the variables to the new file.
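
For example (variable names and values are illustrative), since .env.test is loaded before .env and variables that are already set are not overridden, the test-specific value wins when running in the test environment:

# .env
API_URL="https://api.example.com"
REDIS_URL="redis://localhost:6379/0"

# .env.test
API_URL="http://localhost:3000/fake_api"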

Foreman

New Rails applications come with a bin/dev script that uses the foreman gem to run multiple processes at once. foreman is aware of the .env file and will load it before our application loads it. However, there’s one important difference: the way foreman parses the .env file is not the same as the way dotenv processes the same file.

The dotenv gem understands comments and ignores them when setting the values in the ENV hash, while foreman does not ignore them. So, a .env file that looks like this:

MY_ENV="my value"    # some comment here

Will produce different values for ENV["MY_ENV"] depending on how the application is run:

  • when running the app directly with rails s, the comment is ignored by dotenv and ENV["MY_ENV"] returns the string "my value"
  • when running the app through foreman the comment is not ignored, so ENV["MY_ENV"] returns the string '"my value" # some comment here' (then, when the Rails app loads, the .env file is parsed again by dotenv but since the variable was already defined by foreman, it is not replaced)

One workaround for this is to rely on the naming convention of alternative files: if, for example, we use .env.development and .env.test files, these will only be parsed by dotenv thanks to the dotenv-rails convention and not by foreman.

Another option is to configure the initialization of dotenv to use overload instead of load.

Docker

Docker is a really popular solution for containerizing applications, and Docker-related files will be created by Rails for new apps (since Rails 7.1).

When using docker-compose, it will look for a .env file and, in some cases, it may not ignore comments or may process the values differently than dotenv.

You can check the docs here.

If environment variables are not populated correctly by docker-compose compared to dotenv, the workarounds used for foreman can be used here too.

Dotenv wrapper

Sometimes we have to run applications that are not aware of the .env file but do expect some configuration in the ENV hash. For example, a background job process running a worker that reads some information from the ENV hash.

In that case, instead of changing our job-runner code to load dotenv, we can use the dotenv executable to wrap any command. For example:

dotenv -f ".env.local" bundle exec rake sidekiq

This wrapper can then be used in a Procfile to ensure dotenv works as expected when using foreman, for example when we don’t use a .env file.
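
For example, a Procfile entry reusing the command above might look like this (the process name is illustrative):

worker: dotenv -f ".env.local" bundle exec rake sidekiq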

Figaro

Another popular gem with a similar functionality is the figaro gem. Compared to dotenv, figaro is focused more on Ruby on Rails applications and provides some features like ensuring the presence of specific environment variables (one of the features of dotenv-validator).

dotenv is not focused on Ruby on Rails applications (but can be used with no issues) and its development has been more active.

In Conclusion

Because of the work we do at OmbuLabs with multiple clients, handling environment variables with a .env file is key for us to quickly change between projects locally without polluting the system’s environment variables.

For our projects we don’t use a .env file in production, since we define the environment variables in the Heroku dashboard, but we still use dotenv-validator to ensure that the application has all the variables with correct values to avoid unexpected issues.

We try to keep the .env.sample file with development-ready values, but it’s not always possible when some variables can be specific for a machine or developer, so adding format validation can help the developer set the correct value.

Feel free to reach out to OmbuLabs for help in your project, we offer many types of services.

arieljuod
Quickstart: Preparing Your Organization To Work with an Agency (2023-05-10)

You’ve signed a contract with an agency, awesome! At this point you already know that agencies like ours offer expertise and resources that can help you overcome challenges, increase efficiency, and achieve your goals.

Now that the executives, sales team, and lawyers have signed off on the project, how do you get off to a quick start to accomplish your business goals?

Access Provisioning

Provide access for the external agency before the project kick-off call.

Access provisioning can take anywhere from 1-7 days in best case scenarios. Our projects are billed as a fixed retainer, and we begin billing from the project start date, whether we have working access or not.

Be sure to select your project start date while keeping in mind how long it will take to grant the agency full access.

As soon as contracts are signed, one of our project managers will reach out to provide information and begin the access process. Clients who have documentation around external contract access provisioning and are proactive about onboarding our team are able to start projects without delay.

Update your Documentation

Update your readme and communicate your QA process workflow.

When was the last time you set up your application locally? What does your QA process look like? How long does it normally take to review PRs?

Our PM and engineers ask these questions at the start of any new project. We’ve found that projects begin swiftly when clients have recently updated their readme, and can explain their QA process clearly.

If it’s been a while since you updated your documentation, or if you don’t have any, now is the time to whip something up to support the collaborative effort!

Establish Point of Contact (POC)

Appoint a clear decision maker and escalation point to facilitate seamless communication.

Depending on your organization’s size, the decision makers could be the same people who are about to collaborate with us. In many cases, our initial communication is with executives or lead engineers.

When a contract is signed, it is important to inform the agency who they will interact with daily, who is needed to conduct check-in calls, and who is a decision maker in the case of code or other types of project changes.

Explain the Business Case

Communicate the business case to the development team.

As you prepare your developers and engineers to work with us or another agency, it is useful to explain the business case, applicable scope of work information, how you foresee the workflow changing (if at all), and expectations around collaboration with external stakeholders.

Teams that understand their roles and responsibilities clearly can collaborate best.

Conclusion

While hiring an external agency may have some challenges, there are easy steps to take to prepare your organization and team to mitigate those risks and start projects without a hitch.

You can hit the ground running by documenting and clearly communicating your access provisioning, setup, and QA processes. Having a clear project POC and preparing your team with internal communication will make for a smooth and quick transition.

Are you interested in working with our agency? We provide many services including staff augmentation, Ruby on Rails Upgrades, and JavaScript Upgrades. You can also check out some of our case studies if you want to know more about past companies who have worked with us.

shpm55
Design Sprint Day 4: Prototype (2023-05-02)

This is part of our series on design sprints. If you haven’t read our previous articles, I encourage you to read more about our design sprint process.

Day 4 is a little different from the other days of the Design Sprint. Instead of a series of workshops, we will spend most of the day each working on one part of the prototype.

Towards the end of the day, we will do a test run to check on our progress and adjust from there.

Design Sprint Day 4 - Prototype

How to start

Using the storyboard from Wednesday as our map, we will divide and conquer the prototype.

Example of a storyboard.

The team will be split into 5 roles:

  • Makers
  • Asset Collectors
  • Writers
  • Stitcher
  • Interviewer

Makers

Makers will create the various sections of the prototype.

How many? 2 to 3 Makers.

Makers will split the storyboard (or storyboards!) into sections.

Each maker is responsible for creating the prototype for their sections of storyboard.

Asset Collectors

Asset Collectors will gather images/icons and other assets that the makers will need.

How many? 2-3 Asset Collectors.

The asset collectors will make sure that the makers have the assets they need to continue their work. This means finding images, icons, illustrations, sounds, or anything else the makers need, so that they can stay focused and leave these decisions to someone else.

Writer

The writer provides the text for all the parts of the prototype.

How many? 1 Writer.

The writer fine tunes all the copy from the storyboard and provides that copy to the makers.

This might include fake text for an article about whatever you’re prototyping, an email about the product, an advertisement etc, as well as the copy in the prototype.

Stitcher

The stitcher is responsible for taking the sections of the prototype or prototypes and attaching them together.

Their job is to make sure that the whole experience makes sense from end to end.

How many? 1 Stitcher.

The stitcher puts it all together and makes sure that all the pieces fit together into a seamless prototype.

Interviewer

The interviewer will write the interview script for Friday.

How many? 1 Interviewer.

The interviewer will write questions for the interviews tomorrow based on the storyboard and the prototype.

But how will we build a prototype?

A prototype is a tool for research and discovery – not a functional app.

A good prototype feels real enough to closely replicate your desired experience and help your interviewees get in the headspace of the problem you’re asking them to think about.

The goal for THIS prototype is to make something that does those things AND can be ready to test after 8 hours of work.

The prototype is not a blueprint for a product. It’s a way to get feedback on an idea from people who might use a product like yours in the future, and then apply that feedback so that you have a really good idea of where to look next as you continue the process of making an idea into a service.

You don’t need any fancy design software to do this because this is intended to be accessible for everyone. You could lean on Keynote or PowerPoint.

I recommend these tools because they are basic, they are not designer-only, and because they are relatively easy to use. You can even use the transition features in both of them to show the flow of the prototype.

Stock Photos

You need images, you need text, you need to transition between scenarios and steps, and you need a way to set up your starting scenario. You can (and absolutely should) fake things if you need to.

Does there need to be an email? Fake it.

Does an automated phone call play into your scenario? Fake that, too.

The reason for this is to keep the prototype simple and to prevent the team from conflating this exercise with a normal design process. It’s certainly part of that process, though.

Feel empowered to use a bit of hand-waving during the interview if the prototype takes a little more imagination in some areas, too. Of course you can make this prototype using design software like Sketch, Figma, Balsamiq, XD, or whatever else you like, but beware of letting your Design Sprint prototype become something bigger than it needs to be.

Don’t get too precious about the prototype. Focus on picking a tool that will be easy to use, then use the heck out of it.

If you’re working on an iPad or iPhone app, Apple provides free iOS interface elements for Keynote. There are also lots of free UI kits for PowerPoint. You can also use images of elements you need and Frankenstein your way to a functional (read: testable) prototype.

Putting it all together

At about 3pm, roughly 5 hours into your prototype day, try the thing out.

The prototype should be in a strong rough-draft place. Quickly put your sections together (e.g. just have each Maker play through their sections in order) so you can walk through what you have.

Walk through the full scenario and see how it looks and feels. Make note of any rough spots, then take those notes back with you as you round the corner on getting the prototype to a testable place.

Finally, the stitcher will take all the files and put them together into one whole prototype. Make sure that the interview questions line up with the prototype, and you’re ready for your interviews on Day 5.

Need to see this in action? Contact us to validate your idea with a one-week Design Sprint! 🚀

]]>
nathalie
A Quick Intro to Graphs2023-04-25T04:15:34-04:002023-04-25T04:15:34-04:00https://www.ombulabs.com/blog/data-structures/a-quick-intro-to-graphsIn my previous post, I mentioned how I was having issues with populating my random maze. The main problem is that there isn’t a clear way to programmatically add random rooms and paths between those rooms. I ended up with a method far more complex than I wanted, and it only did the most basic thing: add random types of rooms and connect them randomly, without verifying room placement in any way.

This is a clear sign that we need another layer of abstraction. We need something that can hold our maze data and take care of placing the rooms and connecting them according to the rules we establish. After some research, I think I found the right alternative: the Graph.

What are graphs?

Generally speaking, a graph consists of a set of vertices or nodes that can be interconnected by a set of edges. There are many types of graphs, but, as a data type, graphs usually implement the concepts of undirected graphs and directed graphs.

A small graph

Undirected graphs are those whose edges don’t have a specific direction. As such, if nodes 1 and 2 of a graph are connected this way, it means we have a path going from 1 to 2 and that same path would allow us to go back, from 2 to 1.

In a directed graph, however, edges do have directions, and an edge that goes from 1 to 2 won’t allow you to move back; you’ll need to add a directed edge from 2 to 1.

The only other concepts concerning graphs that are of interest to us are the definitions of adjacency and a path.

Adjacency, as the name suggests, is the property two nodes have when they are connected by an edge.

Finally, a path merely represents a sequence of edges that connect any two nodes of our graph.
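
To make these definitions concrete, here is a tiny, purely illustrative Ruby sketch (not code from the maze project): an undirected graph stored as a hash of neighbor sets, an adjacency check, and a path expressed as a sequence of edges.

    require "set"

    # An undirected graph with 4 nodes, stored as node => set of neighbors.
    # Because the graph is undirected, every edge is recorded in both directions.
    graph = {
      1 => Set[2, 3],
      2 => Set[1, 4],
      3 => Set[1],
      4 => Set[2]
    }

    # Adjacency: two nodes are adjacent if an edge connects them.
    graph[1].include?(2) # => true  (1 and 2 are adjacent)
    graph[3].include?(4) # => false (no edge between 3 and 4)

    # A path is a sequence of edges connecting two nodes, e.g. 3 -> 1 -> 2 -> 4.
    path = [[3, 1], [1, 2], [2, 4]]
    path.all? { |a, b| graph[a].include?(b) } # => true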

Why use them?

In general, graphs are the go-to data structure when we care less about how data is stored and more about how it’s connected.

In our specific case, we want to generate a set of rooms that should be connected to each other. Indeed, if we wanted a really quirky experience, we could just have them all connect to other rooms without any regard for positioning, and, while I do kind of like that idea, I want a more structured approach.

Basically, my requirements are:

  1. Rooms (nodes) must have a limit to the number of doors (edges) they can have (4, for starters)

  2. I want them to be in a square grid. This implies that some pairs of rooms can’t have a door connecting them

  3. I want to be able to “grow” the maze by starting with a single room and then randomly adding adjacent rooms until I have my maze

Also, because my doors are edges, it makes sense to me that the graph we implement is undirected: I want the player to be able to cross back and forth through any given door.

Now, graphs don’t implement all the rules that satisfy these requirements, but they make it way easier to encode this information. That’s what we call separation of concerns. Our little maze builder will take care of determining the rules of generating rooms, walls and doors, and the Graph class will be responsible for holding the connection information itself.
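
As a rough illustration of that split (a hypothetical helper, not code from the maze project), the maze builder could own a rule like “two grid positions can only share a door if they are orthogonal neighbors,” while the Graph class simply records whichever connections the builder decides to make:

    # Hypothetical maze-builder rule: two rooms can only be connected by a door
    # if their grid positions are orthogonal neighbors (no diagonals).
    def can_connect?(a, b)
      (a[:row] - b[:row]).abs + (a[:col] - b[:col]).abs == 1
    end

    can_connect?({ row: 0, col: 0 }, { row: 0, col: 1 }) # => true
    can_connect?({ row: 0, col: 0 }, { row: 1, col: 1 }) # => false (diagonal)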

Basic operations and some design decisions

Now that we know what a graph is and why we’d want to use it, I want to briefly discuss what we need our graph to do.

Operations

Like other data structures, a graph implementation must provide us with a minimal set of operations that allow us to use it. The exact set may vary according to one’s needs; I decided to go with the following (a short usage sketch follows the list):

  1. adjacent(x, y) - Tests whether there’s an edge between nodes x and y
  2. neighbors(x) - Lists all nodes adjacent to x
  3. add_node(x) - Adds node x, if it isn’t present
  4. remove_node(x) - Removes node x, if it is present
  5. add_edge(x, y) - Adds an edge between nodes x and y
  6. remove_edge(x, y) - Removes the edge between nodes x and y
  7. get_node_value(x) - Returns the value associated with node x
  8. set_node_value(x, v) - Sets the value v to node x
  9. get_edge_value(x, y) - Returns the value associated with the edge between x and y
  10. set_edge_value(x, y, v) - Sets the value v to the edge between x and y
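
Here is a short usage sketch of how I picture these operations being called, using the maze vocabulary. The Graph class itself is the subject of Part 2, so treat the names and return values here as a rough sketch rather than the final API.

    graph = Graph.new

    # Two rooms connected by a door.
    graph.add_node(:room_a)
    graph.add_node(:room_b)
    graph.add_edge(:room_a, :room_b)

    graph.adjacent(:room_a, :room_b) # => true
    graph.neighbors(:room_a)         # => [:room_b]

    # Store data on nodes and edges: room type and door type.
    graph.set_node_value(:room_a, :treasure_room)
    graph.set_edge_value(:room_a, :room_b, :locked_door)

    graph.get_node_value(:room_a)          # => :treasure_room
    graph.get_edge_value(:room_a, :room_b) # => :locked_door

    # And the operations to undo all of the above.
    graph.remove_edge(:room_a, :room_b)
    graph.remove_node(:room_b)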

Design considerations

Not all graphs are equal, however. There are a few different ways to implement the functionality listed above, and which one is best depends on what we’ll be using the graph for.

The main decision comes down to how we’ll represent this graph in memory (and in code, for that matter).

There are two common ways to do so:

  1. Adjacency list: Nodes are stored as records or objects, and each node holds a list of adjacent nodes. If we want to store data on edges, each node must also hold a list of its edges, and each edge stores its incident nodes.

  2. Adjacency matrix: A two-dimensional matrix where the rows represent source nodes and columns represent the destination nodes. The data for nodes or edges must be stored separately.

Each approach has pros and cons to it.

The adjacency list is usually the one used for most applications, since it’s faster to add new nodes and edges and, if the graph isn’t too big, removing nodes and edges takes time proportional to the number of nodes or edges. However, if your main use case is adding or removing edges, or looking up whether two nodes are adjacent, the adjacency matrix is best. Its main drawback is that it’s the one that consumes the most memory.

The rule of thumb is: if you have a lot of edges compared to nodes (a dense graph), the adjacency matrix is preferred. If your graph is sparse (way more nodes than edges), the adjacency list is the way to go.

In our case, our graph could be either. But since mazes are usually better when there are more paths (i.e. more edges), we’ll usually end up with denser graphs, which is why I intend to implement our graph using an adjacency matrix.
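
To make the adjacency matrix idea concrete, here is one possible shape for it in Ruby: a minimal, unoptimized sketch (not the implementation I’ll cover in Part 2) that maps each node to an index and keeps edge values in a two-dimensional array, where nil means “no edge.” Because our graph is undirected, every edge is written in both directions.

    class MatrixGraph
      def initialize
        @index  = {} # node => row/column index in the matrix
        @matrix = [] # @matrix[i][j] holds the edge value, nil if there is no edge
      end

      def add_node(node)
        return if @index.key?(node)
        @index[node] = @matrix.size
        @matrix.each { |row| row << nil }       # new column for existing rows
        @matrix << Array.new(@index.size, nil)  # new row for the new node
      end

      # Undirected graph: the value is written in both directions.
      def add_edge(x, y, value = true)
        i, j = @index.fetch(x), @index.fetch(y)
        @matrix[i][j] = value
        @matrix[j][i] = value
      end

      def remove_edge(x, y)
        add_edge(x, y, nil)
      end

      def adjacent(x, y)
        !@matrix[@index.fetch(x)][@index.fetch(y)].nil?
      end

      def neighbors(x)
        @index.keys.select { |node| node != x && adjacent(x, node) }
      end
    end

Note how adjacent, add_edge, and remove_edge boil down to constant-time array reads and writes, which is exactly the trade-off described above; the cost is one extra row and column (mostly full of nils) for every room we add.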

Conclusion

I’ll end this article here; the subject is too extensive to cover graph theory, our design decisions, and our implementation in a single post. In Part 2, I’ll write about the code related to graphs.

]]>
mateuspereira
Why you should speak at conferences2023-04-04T21:00:00-04:002023-04-04T21:00:00-04:00https://www.ombulabs.com/blog/conferences/why-you-should-speak-at-conferencesSpeaking at conferences can be a daunting task, and I am not here to deny that. But beyond that daunting task lies a bunch of benefits. In this post, I will shed light on some of those benefits and on how you can make speaking at a conference a little less daunting. Please continue reading if this interests you.

First of all, what makes me the “right” person to talk about this?

Who am I?

I am a software engineer with just over 10 years of experience. I am also an introvert and have found ways to engage with complete strangers at a conference (more on this later!).

I have also spoken at over 10 conferences in the last 8 years. Most of them have been in foreign countries where I didn’t know anyone, and yet I have had a fun experience attending them.

What are the benefits of speaking at a conference?

As an introvert, you do not have to take the first step to interact with someone

If you are attending a conference and you find it hard to initiate a conversation with strangers, you may end up attending the conference all by yourself. By the time the conference is over, you might feel it was not a positive experience for you and that you don’t want to go through it again.

I have been there.

But when I spoke at my first conference (Ruby Conf India, 2014), I had a completely different experience. I met and spoke to a lot of people and had a good time at the conference.

This happened again at the next conference I spoke at, Ruby Conf Brasil, 2014. This time it was in a country I had never been to and where I did not know anyone, but I met a lot of people, made some friends, and had a good time.

It was then that I realized that, when I was speaking at a conference, I did not have to initiate the engagement. It was the other way around (mostly), and that made talking to a complete stranger no longer a nightmare. In fact, it gave me the confidence to initiate conversations myself.

Brand development

When we read brand development, let’s not just think about our employer. Yes, conferences do give them credibility and help them in brand development. But it also helps you as an individual to build your personal brand.

For instance, if you are a JavaScript developer and you frequently blog about JS-related topics and tweet about JS trends, the individuals who enjoyed your session at a JS conference may interact with you more on Twitter and in your blog articles afterwards. This helps you build a brand for yourself.

What it does for your employer.

Consider this: you are working at a Ruby on Rails consultancy, and when you give a talk at a Ruby conference, you mention where you work and put the name/logo of your employer in your slides. This gives them instant visibility. First, as a potential employer, for people who might be looking for work; and second, as a consultancy, for people who might be looking for Ruby on Rails shops for their next product development project.

It’s a win-win for you and your employer.

Getting over stage fright

One of the biggest reasons I did not even think of giving a talk at conferences was stage fright. And I will be honest, even after giving 10 or so talks, I still fear the stage (somewhat). My experience has been that you can learn to live with this fear and find ways to deal with it.

How do I deal with it? I try to remember and follow these things:

  • Relax and take deep breaths: I tend to sit in a quiet corner 15 minutes before my talk and take deep breaths. This helps me stay composed. I also tend to eat nothing, or something very light, before my talk.
  • Not knowing is OK. I used to think that if I was giving a talk on something, I should know the answer to each and every question asked by the audience. With time, I have realized that it is OK to not know all the answers. If I don’t know the answer to a question, I say so on stage and connect with that person on Twitter so I can follow up when I have the answer.
  • Humor always helps. I tend to start my talk with some light humor or an anecdote about myself, which gets some friendly laughter or smiles from the audience and sets the tone for my talk.

Travel and experience diverse cultures

If you are someone who likes to travel, this is one of the best ways to combine work and travel. You get to visit a new place and spend time with people who live there, and, in many cases, your employer covers the travel expenses or the conference reimburses you up to a certain amount.

If I get an opportunity to give a talk at a conference, OmbuLabs, my employer, provides me the benefit of covering the cost of flight and hotel for the duration of the conference. As an employee, that is a huge benefit as I have one less thing to worry about when it comes to speaking at conferences.

When you travel for a conference, it gives you a chance to not just travel and meet new people, but also learn more about their work culture, their best practices and the kind of work people in the community are doing.

How to get started with speaking at conferences?

When it comes to speaking at a conference, a few things one might worry about are:

  • Can I talk in front of so many people?
  • Can I talk about something for over 20 minutes?
  • Will people like my talk?

Here are a few ways to ease into it:

1) Giving a talk at your company

This is a great starting point because you are already familiar with the audience, a small group of people who you work and interact with regularly.

2) Lightning talks at a conference

This is a great opportunity to get some stage time at a real conference. Lightning talks are, as the name suggests, lightning fast. Each speaker gets 5 minutes to speak about a topic they want. So you get to be on stage in front of a relatively large audience, but for a short amount of time. It is a great way to get some experience.

3) Talk at local meetups

Local meetups are like mini conferences. You get to speak in front of some people, for a duration of about 20-25 mins on a topic of your choice. This is a great way to practice public speaking, all while getting involved in the local community of your choice. It is also a great way to get feedback from people.

4) Practice helps!

Once you have done all the above and are finally ready to speak at a conference, you want to make sure that you give your best. The best part about speaking at a conference is that you control the narrative of your talk; you control the direction you want the talk to go in.

So take time to prepare the content, don’t rush that activity and practice your talk as many times as you can. Use a mirror to practice, give the same talk to your team, to your mother, at your local meetup and then some more to the mirror.

Doing this not only makes you more confident about the content and the flow of the talk, but it also helps you time your talk.

I believe in over-preparing the content of the talk. If the duration of the talk is 25 minutes, I would ideally prepare content that lasts about 45 minutes, and from there I start trimming it down to what makes my talk the best.

While I am practicing my talk again and again, I am also timing myself. That way I don’t end up in a situation where the organizers ask me to finish my talk in the next 2 minutes while I am only 50% of the way through my content.

So, practice your talk as much as you can!

Conclusion

I sincerely hope the information in this post was helpful to you and will aid you in making the decision to volunteer to speak at a conference.

]]>
rishijain
How to Create a Positive and Productive Remote Work Culture2023-04-04T05:00:00-04:002023-04-04T05:00:00-04:00https://www.ombulabs.com/blog/culture/remote/positive-remote-work-cultureAt OmbuLabs, we love remote work. While other companies are asking employees to come back into the office, we are continuing to lean into the remote-first work culture that we had even before the pandemic. In this article, I will discuss the reasons why remote work is essential to the culture of our company. I will then outline how we create a culture that is both remote-friendly and productive.

Why do we work remotely?

Obviously, there are some undeniable benefits to working remotely, such as saving money on office costs. But there are other benefits that are hard to replicate in an office. Here are a few reasons why OmbuLabs loves remote work:

  1. To take advantage of talent. One of our company’s core values is that talent is everywhere. Diverse hiring makes better teams, and what is the best way to ensure that we have diversity in our teams? By removing geographic restrictions.

  2. Because we value personal autonomy. According to a Microsoft report, 85% of leaders have trouble believing that their employees are being productive when working a hybrid schedule. Workers do not need to be watched every moment. By trusting our employees, we allow them to do their best work.

  3. To create a work-life balance. Humans spend enough time working already. Do we really need to spend 1-2 hours in traffic, to sit in an office for 8 hours a day? Skipping the commute, and fitting work around our schedule translates to a better work-life balance and more engaged workers.

  4. To be more productive. What is better than being able to roll out of bed and start working? And working around illness and small kids, between running errands, allows us to focus fully while we are at work.

How do we create a remote-friendly culture?

As other companies have learned from the pandemic, you cannot just have your employees install Slack and Zoom on their computers and phones and declare yourself a remote company. To successfully function as a remote-friendly company, it’s important to think about how to mimic the real-world experience. How do you create an environment where team members can work together and share ideas? We use the same tools (Slack, Zoom, Slite) as everyone else. But what makes our culture different? We lean into those tools to recreate real-time communication.

One way we accomplish this is by coming together in real-time. Yes, that means we spend a lot of time in Slack and Zoom. ‘That sounds horrible!’, you may be thinking, but hear me out. On the surface, this sounds like we spend a lot of time in meetings. Which I guess is a problem if you think meetings have no value. But to capture the flow of ideas that only happens when you are in the same room? You need real-time communication.

To make it less like we are spending hours in countless Zoom meetings, we try to have fun during work hours. Some of the ways that we try to make it fun are:

  • Setting time for the whole team to get together. At the end of our weekly staff meeting, we have an ‘OmbuBeer’. This is our time to play games with each other and chit-chat.
  • Donut calls: this is our replacement for water cooler conversations. Every two weeks, we have a 30-minute to 1-hour call to chat with a different teammate about whatever we want.

How do we make remote work more productive?

Managers might not like remote work because it isn’t as easy to tell who is “working hard” when you can’t see people working. This is how we have overcome that issue:

  • We have several policies in place to ensure everyone has the same expectations. We have an “officeless guide”, a document that outlines our expectations around communication. This includes having policies around being available, especially outside work hours.
  • Since our team has members in several different time zones, we require our full-time staff to stick to a 4-hour overlap with EST. That ensures we can keep some degree of real-time communication with each other.
  • Time tracking policy: we have a strict time tracking policy that we make everyone sign. We require every team member to track at least 98% of their time in Noko, our timer app. This is super important for us, especially as a consultancy, since if we can’t bill our time accurately, we don’t get paid accurately.

Conclusion

Remote work is a valuable resource both to employers and employees. At OmbuLabs, remote work is essential to our culture, and enables us to carry out our core values as a company.

]]>
gelseyt