Amol Dighe

Google I/O 2026: Entering the Agentic Era

2026-05-21T00:00:00+00:00

Every year, Google I/O serves as a compass for the tech industry, charting the course for the next generation of consumer experiences. This year, it marked a major paradigm shift.

We have officially transitioned from the “Generative Era” of AI to the “Agentic Era.”

Instead of just responding to prompts or summarizing search queries, Google’s latest wave of technology centers on autonomous, proactive AI agents that can reason, orchestrate workflows, collaborate, and execute long-horizon digital tasks in the background.

graph TD
    A[Gemini 3.5 Core Engine] --> B(Consumer & Workspace Agents)
    A --> C(Developer Platforms & Security)
    A --> D(Hardware & Wearables)
    A --> E(AI Infrastructure)
    
    B --> B1[Gemini Spark]
    B --> B2[Universal Cart]
    B --> B3[Daily Brief]
    B --> B4[Ask YouTube & Ask Maps]
    B --> B5[Docs Live & Google Pics]
    
    C --> C1[Antigravity 2.0]
    C --> C2[CodeMender]
    
    D --> D1[Intelligent Eyewear]
    D --> D2[Wear OS 7 & Widgets]

    E --> E1[TPU 8t & TPU 8i]

Here is a comprehensive breakdown of every major tool and launch at Google I/O 2026:

1. Gemini 3.5 Flash

Gemini 3.5 Flash is the latest high-speed, lightweight model in the Gemini 3.5 series. It is built specifically for speed, low-latency responsiveness, and orchestrating complex “agentic” and coding workflows that require rapid, multi-step execution.

Ultra-Fast Speed: It operates up to four times faster than previous frontier models, making it ideal for real-time interactions.
Native Integration: Immediately rolled out as the default engine powering the Gemini App and the new AI Mode in Google Search.
Optimized for Coding: Features highly enhanced reasoning capabilities for programming and recursive troubleshooting tasks.

2. Gemini Spark

Gemini Spark is Google’s new “24/7 personal AI agent.” It is designed to act as an autonomous digital assistant that proactively manages daily chores, schedules, and digital tasks without requiring direct supervision or keeping the user’s browser active.

Persistent Cloud Execution: Runs continuously on dedicated virtual machines in Google Cloud, meaning it can sort emails, flag calendar conflicts, and book appointments even when your phone or laptop is completely shut off.
Proactive Orchestration: Learns your habits and preferences to autonomously draft replies, organize documents, and handle digital errands.
Exclusive Subscription Rollout: Currently available to trusted testers and coming soon to Google AI Ultra subscribers ($100/month or included in the updated premium tiers).

3. Gemini Omni (and Gemini Omni Flash)

Gemini Omni is a new multimodal generative “world model” that treats video, audio, image, and text as native inputs and outputs. It is focused heavily on real-time, creative generative video creation and fluid editing.

Conversational Video Editing: Features Gemini Omni Flash, allowing users to generate and modify video content simply by speaking to the model (e.g., “Add a warm lens flare,” or “Change the car color to metallic blue”).
Seamless Multimodality: Eliminates the latency of feeding inputs to separate models by natively processing high-fidelity video, speech, and text simultaneously.

4. Google Antigravity 2.0

Antigravity 2.0 is Google’s new, agent-first developer platform. It is a comprehensive suite designed to help engineers build, monitor, orchestrate, and deploy parallel multi-agent systems and agentic workflows.

Standalone Desktop App: A visual, state-of-the-art developer workspace to orchestrate multi-agent environments, track agent tasks in real-time, and run interactive simulations.
Antigravity CLI & Python SDK: Enables developers to build, test, and spin up agents programmatically directly from the command line.
Model Context Protocol (MCP) Support: Built-in standard protocol support, making it incredibly simple to securely connect agents to local databases, shell tools, and third-party APIs.
Generative App Development: Demonstration showcased the ability to construct complete apps and simple operating systems through natural language instructions.

5. CodeMender

CodeMender is an autonomous AI security and engineering agent built inside the Antigravity Agent Platform. It is used to automatically detect, analyze, patch, and rewrite critical vulnerabilities in codebases.

Self-Healing Codebases: Autonomously scans local or remote code repositories for critical vulnerabilities.
Automatic Patch Generation & Testing: Not only finds security issues but automatically drafts code patches, runs unit tests to ensure no regressions, and submits pull requests for human review.

6. Ask YouTube

Ask YouTube is a new conversational search experience built into YouTube that allows users to query the actual content of videos to find specific answers without having to watch them all the way through.

Time-Stamped Responses: If you ask “How do I calibrate the focus ring in this video?”, the AI answers the question in text and jumps you directly to the exact millisecond in the video where that step is shown.
Conversational Dialogue: Users can ask follow-up questions, summarize key takeaways of long-form podcasts, or extract ingredient lists from cooking videos.

7. Ask Maps

Ask Maps is a conversational, Gemini-powered assistant integrated directly within Google Maps. It allows users to query Google Maps using complex, scenario-based natural language to find location recommendations and plan itineraries without relying on restrictive keywords.

Scenario-Based Inquiries: Solves complex real-world queries (e.g., “Where can I charge my EV, in the next 10 minutes, with a restaurant nearby that serves pasta?”).
Deep Personalization: Accounts for your saved locations, travel patterns, and past preferences to offer highly tailored recommendations.

8. Google Pics

Google Pics is a brand-new AI design and precision image-editing tool integrated directly into Google Workspace (Docs, Slides, Drive), powered by an on-device Gemini Nano Banana model.

Dynamic Canvas Editing: Allows users to modify isolated components of an image (e.g., resizing or moving an object) while AI seamlessly fills in the background behind it.
Workspace Productivity: Enables office workers to design professional flyers, mockups, social media graphics, and translate texts embedded in visual assets on the fly.

9. Docs Live

Docs Live is a voice-enabled, interactive collaboration tool integrated within Google Workspace (including Google Docs, Gmail, and Google Keep). It allows users to write, draft, edit, and organize documents hands-free via real-time conversational voice dialogue.

Real-Time Spoken Dictation & Structuring: Converts spoken thoughts into beautifully structured, formatted outlines and paragraphs on the fly.
Voice-Driven Editing: Allows users to issue real-time verbal commands to refine drafts (e.g., “Change this paragraph’s tone to be more professional,” or “Insert a summary bullet list of these notes”).
Cross-Workspace Integration: Seamlessly search and extract context from Gmail inboxes and Google Drive files entirely through natural spoken conversations.

10. Universal Cart

Universal Cart is an agentic, unified shopping hub operating across Search, Gemini, YouTube, and Gmail. It allows users to shop, aggregate, compare, and check out items from multiple online retailers in one single checkout stream.

Background Deal Tracking: Constantly monitors prices, applies discount codes, flags restocks, and reports historical price drops for items in your cart.
Universal Commerce Protocol (UCP): A new standard that allows agents to securely complete purchases on behalf of users.

11. Daily Brief

Daily Brief is an intelligent personal dashboard powered by Gemini that automatically digests information from your digital life to give you a highly customized, actionable start to your day.

Multi-Source Triage: Aggregates and synthesizes unread emails, upcoming calendar appointments, and outstanding tasks into one comprehensive briefing.
Conversational Follow-ups: Proactively suggests actions (e.g., “You have an email from Sarah asking to reschedule; would you like me to move your 2:00 PM calendar block?”).

12. Wear OS 7 & Wear Widgets

Wear OS 7 is the latest major operating system update for smartwatches, with a massive architectural upgrade centered on real-time widgets and on-device intelligence.

Wear Widgets: Dynamic tiles that mirror active application states and phone-based information in real-time, rather than requiring you to open apps.
Gemini Smart Engine: Deep integration of Gemini to provide context-aware shortcuts and voice-driven agentic assistance directly on the wrist.

13. Google Intelligent Eyewear (Smart Glasses)

Developed in collaboration with Samsung, Warby Parker, and Gentle Monster, these are sleek, audio-first smart glasses designed to provide a highly private, hands-free spoken connection to Gemini on the go.

Audio-Focused Assistance: Provides real-time spoken translations, directions, and reminders quietly in your ear, eliminating the need for bulky displays or constant phone-checking.
Stylish Designs: Built with leading fashion brands like Warby Parker to look like premium, everyday eyewear. Launching in Fall 2026.

14. TPU 8t & TPU 8i (Eighth-Generation Tensor Processing Units)

These are Google’s custom-designed eighth-generation AI accelerator chips, split into two specialized variants: TPU 8t for massive model training and TPU 8i for real-time model inference.

TPU 8t (Training Optimized): Delivers nearly 3x the raw compute of previous generations, custom-built to scale training across over one million TPUs globally using JAX and Pathways to shrink frontier model training cycles from months to weeks.
TPU 8i (Inference Optimized): Specifically designed to power real-time agentic workloads, optimizing energy efficiency to deliver 2x higher performance-per-watt for low-latency, scalable AI application serving.

Billed as the “biggest upgrade to the Search box in 25 years,” Google Search has evolved from a text-and-link query engine into a multi-input reasoning engine.

Rich Multi-Input: Users can drag-and-drop text, images, files (such as spreadsheets), and even video files directly into the Search box to get instant, synthesized, intent-based answers.
Unified Desktop & Mobile Experience: Fully merges AI Overviews and AI Mode into a single, seamless, interactive search stream.

Conclusion: Welcome to the Future of Technology

Google I/O 2026 has set a bold course. The days of treating AI like a simple search bar or a writing prompt are fading. The agentic future is here, where AI operates as a collaborative partner, running in the background to handle the tedious work of software development, shopping, and organization.

Which of these announcements are you most excited to try?

Macbook Terminal Setup

2026-05-15T00:00:00+00:00

Recently I set up a new MacBook Pro. This gave me a chance to revisit my command-line setup and the tools I use daily to speed up my development workflow.

Install Homebrew

Homebrew allows you to install software on your Mac from the command line.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install iTerm2

iTerm2 is a terminal emulator for macOS. It is a replacement for the default terminal application.

brew install --cask iterm2

Install git

Git is a version control system for tracking changes in source code during software development. Git comes preinstalled on macOS. You can verify this by running the git --version command.

Install oh my zsh

Oh My Zsh is an open-source framework for managing your Zsh configuration. It includes a collection of community-maintained plugins and themes.

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Useful Plugins for Oh My Zsh

zsh-autosuggestions https://github.com/zsh-users/zsh-autosuggestions/blob/master/INSTALL.md

git clone https://github.com/zsh-users/zsh-autosuggestions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions

zsh-syntax-highlighting https://github.com/zsh-users/zsh-syntax-highlighting/blob/master/INSTALL.md

git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting

web-search

Edit file ~/.zshrc and add plugins to the line with plugins=(...).

plugins=(git web-search zsh-autosuggestions zsh-syntax-highlighting)

Run source ~/.zshrc to activate the plugins.

Homebrew formulae are also available to install the plugins

brew install zsh-autosuggestions
brew install zsh-syntax-highlighting

I prefer the git clone + ~/.zshrc approach above.

Install starship prompt

brew install starship

After installing, add eval "$(starship init zsh)" to the end of ~/.zshrc so the prompt loads in every new shell.

I will be using an existing starship preset, https://starship.rs/presets/tokyo-night, which needs a Nerd Font to be installed. The list of all Nerd Fonts can be found here: https://www.nerdfonts.com/font-downloads

Here’s a quick command to list all the available Nerd Fonts, followed by the command to install one:

brew search '/font-.*-nerd-font/' 
brew install --cask <font-name>

Example for installing JetBrains Mono Nerd Font

brew install --cask font-jetbrains-mono-nerd-font

To enable this font in iTerm2, follow these steps:

Open iTerm2
Go to Preferences -> Profiles -> Text
Click on “Change Font” and select “JetBrains Mono Nerd Font”

I like the Tokyo Night theme for the starship prompt. You can find more presets here: https://starship.rs/presets/. Use starship to import the preset from https://starship.rs/presets/tokyo-night:

starship preset tokyo-night -o ~/.config/starship.toml

Install carapace

Carapace is a shell completion generator for commands. It allows you to generate completion scripts for your shell.

brew install carapace

Setup carapace to work with zsh - https://carapace-sh.github.io/carapace-bin/setup.html#zsh

Edit the ~/.zshrc file and add the following lines:

autoload -U compinit && compinit
export CARAPACE_BRIDGES='zsh,fish,bash,inshellisense' # optional
zstyle ':completion:*' format $'\e[2;37mCompleting %d\e[m'
source <(carapace _carapace)

Run source ~/.zshrc to activate the changes and enjoy carapace autocompletion for all the tools you installed using brew (and lots of other tools).

Install fzf

fzf is a general-purpose command-line fuzzy finder.

brew install fzf

Install tmux

tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.

brew install tmux

This one is more useful on production servers, but it is good to have it installed locally on the Mac as well.
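After working through the steps above, a quick check that every command actually landed on PATH can save some head-scratching. The sketch below is only an illustration, not part of the original setup: the TOOLS list is my assumption of the command names each step installs, and iTerm2 is left out because it is a GUI app.

```python
import shutil

# Assumed command names for the tools installed in this post.
TOOLS = ["brew", "git", "zsh", "starship", "carapace", "fzf", "tmux"]


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

The which parameter exists only so the lookup can be swapped out when trying the function without the tools installed.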

Postgres Disk & WAL

2026-04-09T00:00:00+00:00

Why is there a size difference between the data directories in my PostgreSQL cluster?

I have a PostgreSQL Patroni cluster with 3 nodes. Each node has a separate data directory for PostgreSQL.

While verifying the status of the cluster, I noticed that the replicas were stuck in a starting state.

[root@postgres-poc-node-3:~]# patronictl list
+ Cluster: postgres-poc-node (7587706677345667891) -------------------------+---------+----------+-----+-------------+-----+------------+-----+
| Member                             | Host                                                | Role    | State    |  TL | Receive LSN | Lag | Replay LSN | Lag |
+------------------------------------+-----------------------------------------------------+---------+----------+-----+-------------+-----+------------+-----+
| postgres-poc-node-1 | postgres-poc-node-1.localhost:5432 | Replica | starting |     |     unknown |     |    unknown |     |
| postgres-poc-node-2 | postgres-poc-node-2.localhost:5432 | Replica | starting |     |     unknown |     |    unknown |     |
| postgres-poc-node-3 | postgres-poc-node-3.localhost:5432 | Leader  | running  | 135 |             |     |            |     |
+------------------------------------+-----------------------------------------------------+---------+----------+-----+-------------+-----+------------+-----+

Each of the replica nodes had to be reinitialized to bring them back to a running state.

[root@postgres-poc-node-2:~]# patronictl reinit postgres-poc-node postgres-poc-node-2
+ Cluster: postgres-poc-node (7587706677345667891) -------------------------+---------+----------+-----+-------------+-----+------------+-----+
| Member                             | Host                                                | Role    | State    |  TL | Receive LSN | Lag | Replay LSN | Lag |
+------------------------------------+-----------------------------------------------------+---------+----------+-----+-------------+-----+------------+-----+
| postgres-poc-node-1 | postgres-poc-node-1.localhost:5432 | Replica | starting |     |     unknown |     |    unknown |     |
| postgres-poc-node-2 | postgres-poc-node-2.localhost:5432 | Replica | starting |     |     unknown |     |    unknown |     |
| postgres-poc-node-3 | postgres-poc-node-3.localhost:5432 | Leader  | running  | 135 |             |     |            |     |
+------------------------------------+-----------------------------------------------------+---------+----------+-----+-------------+-----+------------+-----+
Are you sure you want to reinitialize members postgres-poc-node-2? [y/N]: y
Success: reinitialize for member postgres-poc-node-2

Once all the replica nodes were back online, I noticed that the data directory size on the replica nodes was significantly smaller than the leader node.

leader node

[root@postgres-poc-node-3:~]# du -sh /var/lib/postgresql/*/* | sort -h | tail -20
4.0K	/var/lib/postgresql/17/pg_stat_tmp
4.0K	/var/lib/postgresql/17/pg_tblspc
4.0K	/var/lib/postgresql/17/pg_twophase
4.0K	/var/lib/postgresql/17/PG_VERSION
4.0K	/var/lib/postgresql/17/postgresql.auto.conf
4.0K	/var/lib/postgresql/17/postgresql.conf
4.0K	/var/lib/postgresql/17/postgresql.conf.backup
4.0K	/var/lib/postgresql/17/postmaster.opts
4.0K	/var/lib/postgresql/17/postmaster.pid
4.0K	/var/lib/postgresql/patroni/pgpass
16K	/var/lib/postgresql/17/pg_logical
20K	/var/lib/postgresql/17/pg_replslot
32K	/var/lib/postgresql/17/postgresql.base.conf
32K	/var/lib/postgresql/17/postgresql.base.conf.backup
52K	/var/lib/postgresql/17/pg_multixact
600K	/var/lib/postgresql/17/global
15M	/var/lib/postgresql/17/pg_xact
31M	/var/lib/postgresql/17/pg_subtrans
3.3G	/var/lib/postgresql/17/base
16G	/var/lib/postgresql/17/pg_wal

replica node

[root@postgres-poc-node-2:~]# du -sh /var/lib/postgresql/*/* | sort -h | tail -20
4.0K	/var/lib/postgresql/17/pg_tblspc
4.0K	/var/lib/postgresql/17/pg_twophase
4.0K	/var/lib/postgresql/17/PG_VERSION
4.0K	/var/lib/postgresql/17/postgresql.auto.conf
4.0K	/var/lib/postgresql/17/postgresql.conf
4.0K	/var/lib/postgresql/17/postgresql.conf.backup
4.0K	/var/lib/postgresql/17/postmaster.opts
4.0K	/var/lib/postgresql/17/postmaster.pid
4.0K	/var/lib/postgresql/patroni/pgpass
16K	/var/lib/postgresql/17/pg_logical
20K	/var/lib/postgresql/17/pg_replslot
28K	/var/lib/postgresql/17/pg_subtrans
32K	/var/lib/postgresql/17/postgresql.base.conf
32K	/var/lib/postgresql/17/postgresql.base.conf.backup
52K	/var/lib/postgresql/17/pg_multixact
232K	/var/lib/postgresql/17/backup_manifest
568K	/var/lib/postgresql/17/global
15M	/var/lib/postgresql/17/pg_xact
33M	/var/lib/postgresql/17/pg_wal
3.3G	/var/lib/postgresql/17/base

base contains the actual table and index files. Both nodes have about 3.3 GB there, so the actual database contents are roughly identical.

The large difference is entirely from pg_wal:

Leader: 16 GB WAL retained
Replica: 33 MB WAL retained

This is expected because replicas replay WAL and discard old WAL locally, while the leader must retain WAL for replicas, crash recovery, PITR, and replication slots.
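The comparison above can also be done mechanically. Here is a minimal sketch, assuming the du -sh output of both nodes has been captured as text; the helper names (parse_size, parse_du, biggest_gap) are hypothetical, and sizes are treated as powers of 1024, which is how du -h reports them.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du -h size such as '3.3G' or '600K' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def parse_du(output):
    """Map the last path component of each du line to its size in bytes."""
    sizes = {}
    for line in output.strip().splitlines():
        size, path = line.split(None, 1)
        sizes[path.rsplit("/", 1)[-1]] = parse_size(size)
    return sizes


def biggest_gap(leader_du, replica_du):
    """Return the entry whose size differs most between the two nodes."""
    leader, replica = parse_du(leader_du), parse_du(replica_du)
    common = leader.keys() & replica.keys()
    return max(common, key=lambda name: abs(leader[name] - replica[name]))


# Two lines from each node's output shown above.
leader_du = """
3.3G /var/lib/postgresql/17/base
16G /var/lib/postgresql/17/pg_wal
"""
replica_du = """
33M /var/lib/postgresql/17/pg_wal
3.3G /var/lib/postgresql/17/base
"""
print(biggest_gap(leader_du, replica_du))
```

With the figures from this cluster, the entry that stands out is pg_wal, which matches the manual reading of the two listings.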

Write-Ahead Logging is PostgreSQL’s durability and replication mechanism.

Basic idea:

A client changes data: update orders set status='done' where id=10;
PostgreSQL first writes a WAL record describing the change into pg_wal
Only after WAL is safely flushed does PostgreSQL consider the transaction committed
Later, the actual table file in base/ is updated

This is called “write-ahead” because WAL is written before the actual data files.
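To make the ordering concrete, here is a toy model of the idea in Python. It is a sketch of the write-ahead principle only, not of how PostgreSQL implements it: the class TinyWalStore and its fields are hypothetical stand-ins for pg_wal, the files under base/, and the in-memory buffers.

```python
class TinyWalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self):
        self.wal = []            # durable, append-only log (stands in for pg_wal)
        self.data_files = {}     # durable table contents (stands in for base/)
        self.checkpoint_pos = 0  # WAL records before this index are in data_files
        self.buffers = {}        # in-memory state, lost on a crash

    def commit(self, key, value):
        self.wal.append((key, value))  # 1. log the change first
        self.buffers[key] = value      # 2. then change the in-memory copy

    def checkpoint(self):
        # Later, the data files are brought up to date with the buffers.
        self.data_files = dict(self.buffers)
        self.checkpoint_pos = len(self.wal)

    def crash(self):
        self.buffers = {}  # everything that lived only in memory is gone

    def recover(self):
        # Start from the data files, then replay WAL written after the checkpoint.
        self.buffers = dict(self.data_files)
        for key, value in self.wal[self.checkpoint_pos:]:
            self.buffers[key] = value


store = TinyWalStore()
store.commit("orders:10", "done")
store.checkpoint()
store.commit("orders:11", "shipped")  # logged, but not yet in the data files
store.crash()
store.recover()
print(store.buffers)
```

The second change survives the crash even though it never reached the data files, because recovery replays it from the log. A replica uses the same replay mechanism, except that it applies WAL shipped from the leader.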

Why does this exist?

crash recovery
replication
point-in-time recovery
consistency during restart

In this case, replication had been broken for a long time, so the leader node had to retain all the WAL files the replicas would need to catch up.

Once the cluster was healthy again, the replicas caught up and the old WAL files were removed from the leader node: pg_wal dropped from 16 GB to 337 MB.

[root@postgres-poc-node-3:~]# patronictl list
+ Cluster: postgres-poc-node (7587706677345667891) -------------------------+---------+-----------+-----+-------------+-----+------------+-----+
| Member                             | Host                                                | Role    | State     |  TL | Receive LSN | Lag | Replay LSN | Lag |
+------------------------------------+-----------------------------------------------------+---------+-----------+-----+-------------+-----+------------+-----+
| postgres-poc-node-1 | postgres-poc-node-1.localhost:5432 | Replica | streaming | 135 |  D/7F9BD988 |   0 |    unknown |     |
| postgres-poc-node-2 | postgres-poc-node-2.localhost:5432 | Replica | streaming | 135 |  D/7F9BD988 |   0 | D/7F9BD988 |   0 |
| postgres-poc-node-3 | postgres-poc-node-3.localhost:5432 | Leader  | running   | 135 |             |     |            |     |
+------------------------------------+-----------------------------------------------------+---------+-----------+-----+-------------+-----+------------+-----+

[root@postgres-poc-node-3:~]# du -sh /var/lib/postgresql/*/* | sort -h | tail -20
4.0K	/var/lib/postgresql/17/pg_stat_tmp
4.0K	/var/lib/postgresql/17/pg_tblspc
4.0K	/var/lib/postgresql/17/pg_twophase
4.0K	/var/lib/postgresql/17/PG_VERSION
4.0K	/var/lib/postgresql/17/postgresql.auto.conf
4.0K	/var/lib/postgresql/17/postgresql.conf
4.0K	/var/lib/postgresql/17/postgresql.conf.backup
4.0K	/var/lib/postgresql/17/postmaster.opts
4.0K	/var/lib/postgresql/17/postmaster.pid
4.0K	/var/lib/postgresql/patroni/pgpass
16K	/var/lib/postgresql/17/pg_logical
20K	/var/lib/postgresql/17/pg_replslot
28K	/var/lib/postgresql/17/pg_subtrans
32K	/var/lib/postgresql/17/postgresql.base.conf
32K	/var/lib/postgresql/17/postgresql.base.conf.backup
52K	/var/lib/postgresql/17/pg_multixact
600K	/var/lib/postgresql/17/global
15M	/var/lib/postgresql/17/pg_xact
337M	/var/lib/postgresql/17/pg_wal
3.3G	/var/lib/postgresql/17/base
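The Receive LSN and Replay LSN columns in the patronictl output are WAL positions. An LSN is written as two hexadecimal numbers separated by a slash, the high and low 32 bits of a byte offset, so replication lag in bytes is a subtraction. A small sketch with hypothetical helper names (PostgreSQL offers pg_wal_lsn_diff() for the same calculation in SQL):

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN such as 'D/7F9BD988' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(leader_lsn, replica_lsn):
    """Bytes of WAL the replica still has to receive or replay."""
    return max(lsn_to_bytes(leader_lsn) - lsn_to_bytes(replica_lsn), 0)


# Both replicas report the same receive position, so there is nothing left to ship.
print(lag_bytes("D/7F9BD988", "D/7F9BD988"))
```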

GitHub Agentic Workflow

2026-04-03T00:00:00+00:00

The focus of this post is GitHub agentic workflows. Before we dive in, let's brush up on the basics of GitHub Events and Actions.

GitHub Events

An event is any significant action performed on GitHub, ranging from code changes to administrative updates. Common examples include:

Push: When code is pushed to a branch.
Pull Request: When a pull request is created, updated, or merged.
Issues: When an issue is opened, edited, or labeled.
Release: When a new software release is published.
Scheduled Events: Using cron syntax to run tasks at specific times (e.g., nightly builds).
Manual Triggers: Using workflow_dispatch to start a workflow manually through the UI or API.

GitHub Actions

GitHub Actions is a powerful and flexible automation platform provided by GitHub that allows developers to automate tasks directly within their repositories. It is commonly used to implement CI/CD pipelines that build and test code on every pull request and deploy merged code to production. It can also automate any repository action by listening to various events such as code pushes, issue creation, pull request updates, or package registry events.

Workflow & Job

A workflow is the core component of GitHub Actions: an automated process that executes one or more tasks. Workflows are defined in YAML files stored in the .github/workflows directory of a repository. A workflow runs in response to a specific event, such as a code push or a manual trigger. A workflow consists of one or more jobs. Each job contains a series of individual steps (e.g., cloning code, installing dependencies, running tests) that are executed in sequence. Multiple jobs within a single workflow run in parallel by default.

GitHub Runners

A runner is a server (usually a virtual machine) that is responsible for executing the steps defined in a job. GitHub automatically provisions a runner for each job based on the runs-on configuration specified in the YAML file. Users can access real-time logs, outputs, and artifacts for every step executed on a runner through the GitHub UI.

Here’s a simple example of a GitHub Actions workflow - https://github.com/amoldighe/pytest/blob/main/.github/workflows/python-test.yaml

How GitHub Events Associate with GitHub Actions

The relationship between events and actions is defined within a Workflow file (a YAML file located in .github/workflows).

The “on” Key: You use the on keyword in your YAML configuration to specify which event(s) should trigger the automation. For example, on: push tells GitHub to run the workflow every time code is pushed to the repository.
Filtering: You can refine how an event triggers an action by adding filters. For instance, you can specify that a workflow should only run on a push to the main branch.
Contextual Information: When an event triggers a workflow, GitHub provides “event context” to the runner. This data includes details like who initiated the event, the branch name, and the commit ID, which the action can use to perform its tasks (like posting a comment on the specific issue that was just opened).

In summary, GitHub Events act as the “sensor” that detects activity, while GitHub Actions provide the “response” or logic that executes in reaction to that activity.
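The event context mentioned above reaches a job as environment variables plus a JSON payload file. A minimal sketch of reading it from a Python step follows; the function name read_event_context is hypothetical, while GITHUB_EVENT_NAME, GITHUB_ACTOR, GITHUB_REF_NAME, GITHUB_SHA and GITHUB_EVENT_PATH are default variables set by GitHub Actions.

```python
import json
import os


def read_event_context(env=os.environ):
    """Collect the event details GitHub Actions exposes to a job."""
    context = {
        "event": env.get("GITHUB_EVENT_NAME"),  # e.g. push, pull_request, issues
        "actor": env.get("GITHUB_ACTOR"),       # who initiated the event
        "ref": env.get("GITHUB_REF_NAME"),      # branch or tag name
        "commit": env.get("GITHUB_SHA"),        # commit ID
    }
    payload = {}
    payload_path = env.get("GITHUB_EVENT_PATH")
    if payload_path and os.path.exists(payload_path):
        with open(payload_path) as handle:
            payload = json.load(handle)
    # For issue events, the payload carries the issue that triggered the run.
    context["issue_number"] = payload.get("issue", {}).get("number")
    return context


if __name__ == "__main__":
    print(read_event_context())
```

Outside of a workflow run these variables are unset, so the function returns None for every field.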

What is Agentic Workflow?

GitHub supports agentic workflows through GitHub Actions, but this requires a paid plan with an active GitHub Copilot subscription. While the underlying GitHub Actions (which execute the workflows) have a free tier, the AI agents that power the automation depend on premium Copilot requests, and each run often consumes multiple requests.

Instead, we will implement the agentic workflow using GitHub Actions and the OpenRouter API.

What is OpenRouter?

OpenRouter is an API platform that provides access to various AI models from different providers through a unified interface. It allows developers to use different AI models with a single API key and pricing structure. It supports models from OpenAI, Anthropic, Google, and many more. The best part about OpenRouter is that it provides a free tier for several AI models. For this example, I will be using the free tier of OpenRouter for the PR review agent.

How to create a PR review agent?

I am setting up a workflow that is triggered when a pull request is opened or updated. It uses a GitHub Action to get the diff of the pull request and sends it to the OpenRouter API for review. The review is then posted as a comment on the pull request.
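To show what "sending the diff to OpenRouter" can look like, here is a rough sketch using only the Python standard library. It is not how the marketplace action works internally. It assumes OpenRouter's OpenAI-compatible chat completions endpoint; the function names and the system prompt are hypothetical.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_review_request(diff, api_key, model="google/gemma-3-27b-it:free", max_tokens=2048):
    """Build (but do not send) a chat completion request asking for a PR review."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a code reviewer. Be concise."},
            {"role": "user", "content": "Review this pull request diff:\n\n" + diff},
        ],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def request_review(diff, api_key):
    """Send the request and return the review text (needs network access and a valid key)."""
    with urllib.request.urlopen(build_review_request(diff, api_key)) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Splitting request building from sending keeps the payload easy to inspect without spending any API quota. Posting the returned text as a PR comment would be a separate call to the GitHub API, which is left out here.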

The workflow YAML - https://github.com/amoldighe/pytest/blob/main/.github/workflows/pr-review.yaml uses the DiffGuard AI PR Review action from the GitHub Marketplace.

      - name: AI PR Review
        uses: jonit-dev/openrouter-github-action@main
        with:
          # Required inputs
          github_token: ${{ secrets.GITHUB_TOKEN }} # Automatically provided
          open_router_key: ${{ secrets.OPEN_ROUTER_KEY }} # Must be set in repository secrets (secret name assumed here)

          # Optional inputs with defaults
          model_id: 'google/gemma-3-27b-it:free' # Default model
          max_tokens: '2048' # Default max tokens
          review_label: 'ai-review' # Optional: Only review PRs with this label

          # Optional custom prompt
          custom_prompt: |
            You are a security-focused reviewer. Analyze this PR with emphasis on:

The OpenRouter API key needs to be set up in your repository's secrets and variables and referenced in the workflow YAML file. Adding the ai-review label to the pull request triggers the workflow.

Here’s a sample output of the PR review analyzed using model google/gemma-3-27b-it

Details can be viewed on this PR - https://github.com/amoldighe/pytest/pull/6

OpenCode, Agents & Skills

2026-03-20T00:00:00+00:00

In the rapidly evolving landscape of AI-assisted software development, OpenCode has emerged as a significant player, particularly with its focus on agents and skills. This post delves into what these components are, how they function within the OpenCode ecosystem, and why they matter for developers looking to leverage AI more effectively.

What is OpenCode?

OpenCode is an open-source AI agent designed to help plan, implement, debug, and refactor entire codebases autonomously. Unlike traditional coding assistants that only autocomplete code or generate passive snippets in a chat window, OpenCode operates primarily through a continuous “agentic loop.” This means it can actively explore your file system, execute terminal commands, and modify code independently to achieve a complex goal.

OpenCode supports a wide variety of state-of-the-art AI models, from proprietary ones like Claude, OpenAI, and Gemini to powerful locally hosted models. Through its modular architecture of “Agents” and “Skills,” developers can create and share specialized workflows tailored to their specific tech stacks and organizational standards.

Install OpenCode - https://opencode.ai/

In OpenCode, agents are specialized AI assistants configured to handle specific tasks, workflows, and operations throughout the code development lifecycle. Instead of using a single monolithic AI for everything, OpenCode allows developers to define and use multiple targeted agents that possess custom prompts, utilize different underlying AI models, and have explicitly defined tool access.

Here is a breakdown of how agents in OpenCode work:

1. Types of Agents

Primary Agents: These are the main assistants you interact with directly in your workflow. Standard examples include the “Build” agent (a full-access AI for writing, modifying, and executing code) and the “Plan” agent (a read-only AI used for code analysis, architecture design, and exploration).
Subagents: These are specialized “worker” assistants that primary agents can seamlessly invoke when they encounter a specific or complex sub-task (e.g., a subagent dedicated purely to advanced web searching or analyzing specific logs).

2. Tools and Automation (The Agentic Loop)

OpenCode agents operate on an “agentic loop.” This means they don’t just predict text—they autonomously analyze a goal, decide which tools to use, execute actions, read the results, and iterate until the task is complete. Their standard toolset includes:

File Operations: Reading, writing, and precisely modifying logic in your workspace.
Terminal Execution: Running bash commands, scripts, and tests.
LSP Diagnostics: Checking for syntax and linting errors using the Language Server Protocol.
Web Fetching: Pulling in external context, APIs, or documentation from the internet.
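The loop described above can be sketched in a few lines of Python. This is an illustration of the general pattern, not OpenCode's implementation: run_agent, the scripted stand-in for the model, and the single read_file tool are all hypothetical.

```python
def run_agent(goal, decide, tools, max_steps=10):
    """Ask the model for the next action, run the tool, feed the result back, repeat."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action = decide(history)  # the model picks a tool or finishes
        if action["tool"] == "finish":
            return action["input"], history
        result = tools[action["tool"]](action["input"])  # execute the chosen tool
        history.append((action["tool"], result))         # let the model read the result
    return None, history  # gave up after max_steps


def scripted_model(history):
    """Stand-in for a real model: read one file, then report on it."""
    if len(history) == 1:
        return {"tool": "read_file", "input": "app.py"}
    return {"tool": "finish", "input": f"app.py has {len(history[-1][1])} characters"}


fake_files = {"app.py": "print('hi')"}
tools = {"read_file": lambda path: fake_files[path]}

answer, history = run_agent("Describe app.py", scripted_model, tools)
print(answer)
```

Restricting an agent, as described in the configuration section of this post, amounts to handing the loop a smaller tools dictionary.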

3. Agent Configuration

Agents are highly customizable and can be configured using an opencode.json file or Markdown specs. You can control:

Their specific role and system prompt.
Allowed tools (e.g., restricting an agent from running bash commands to keep it safely read-only).
The AI Model they use (OpenCode is provider-agnostic and prevents vendor lock-in, supporting Claude, OpenAI, Gemini, or locally hosted open-source models).
The Temperature (controlling how creative or deterministic the agent’s output is).

A custom OpenCode agent is defined in a Markdown file (.md) with YAML front matter containing metadata such as name, description, mode (primary/sub), model, temperature, and tool access (e.g., bash: false, write: true), followed by the instructions.
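To make that file layout concrete, here is a small standard-library sketch that splits such a file into its front matter and its instructions. It is not how OpenCode loads agents. The parser only handles flat key: value pairs with one level of nesting and keeps every value as a string; a real YAML library would be the better choice for anything more. SAMPLE is a shortened, illustrative agent file.

```python
SAMPLE = """\
---
name: Coding Review
mode: primary
temperature: 0.1
tools:
  read: true
  write: false
---

# Code Review Agent
Always start with phrase "CODING REVIEW..."
"""


def split_front_matter(text):
    """Return (front_matter_lines, body) for a Markdown file that starts with ---."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    end = next((i for i, line in enumerate(lines) if i > 0 and line.strip() == "---"), None)
    if end is None:
        return [], text
    return lines[1:end], "\n".join(lines[end + 1:]).strip()


def parse_simple_yaml(lines):
    """Parse flat 'key: value' pairs plus one level of indented children."""
    result, parent = {}, None
    for line in lines:
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip().strip('"')
        if line.startswith(" ") and parent is not None:
            result[parent][key] = value   # child of the last section, e.g. tools
        elif value == "":
            parent = key                  # a section header such as "tools:"
            result[key] = {}
        else:
            parent = None
            result[key] = value
    return result


meta_lines, body = split_front_matter(SAMPLE)
meta = parse_simple_yaml(meta_lines)
print(meta["mode"], meta["tools"])
```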

Example of an agent in OpenCode:

(base)  ~/.opencode/agents/ cat coding-review.md
---
id: coding-review
name: Coding Review
description: "Multi-language review agent for modular and functional coding"
category: development
type: standard
version: 1.0.0
author: opencode
mode: primary
temperature: 0.1
tools:
  read: true
  edit: false
  write: false
---

# Code Review Agent
Always start with phrase "CODING REVIEW..."

Focus:
You are a coding specialist focused on writing clean, maintainable, and scalable code. Your role is to review code using modular and functional programming principles.

Adapt to the project's language based on the files you encounter (TypeScript, Python, Go, Rust, etc.).

Core Responsibilities
Implement applications with focus on:

- Modular architecture design
- check for potential bugs & edge case
- Clean code principles
- Security consideration

Running the Coding Review agent above on a Python project, using one of OpenCode’s free models, gave the following output:

4. Collaboration and “Skills”

Agents in OpenCode can collaborate with one another, handing off tasks when specialized knowledge is needed. Furthermore, OpenCode incorporates Agentic Skills—reusable instructions and capability modules that you can attach to an agent. This teaches the agent exactly how to handle specialized frameworks, specific codebases, or distinct organizational patterns without having to rewrite custom prompts every time.

Skills allow OpenCode to act as a specialist in specific areas. Without skills, every conversation starts from zero. You explain the same conventions and correct the same mistakes. Every morning, back to zero.

Here is an example plan-writing skill added to an OpenCode agent: https://github.com/obra/superpowers/blob/main/skills/writing-plans/SKILL.md

(base)  ~/.opencode/skills/writing-plan/ ls -lth
total 24
-rw-r--r--@ 1 amol.di  staff   1.7K 21 Mar 01:11 plan-document-reviewer-prompt.md
-rw-r--r--@ 1 amol.di  staff   5.3K 21 Mar 00:55 SKILL.md

In OpenCode, a skill must be loaded before it can be used.

In short, OpenCode agents and skills act as a modular, autonomous engineering team living within your terminal or IDE, each precisely tailored to perform specific development duties efficiently.

PostgreSQL Fundamentals - Configuration & Structure

2026-03-16T00:00:00+00:00

This is a continuation of my effort to explore and understand Postgres. In this post I will be covering important configuration and its database structure.

Connection Settings

max_connections

Maximum number of concurrent client connections. Each connection consumes memory, so balance this with available RAM.

max_connections = 100

Higher values require more memory overall. Consider using connection pooling (PgBouncer, Pgpool-II) for high-volume applications.

Memory Settings

shared_buffers

Memory used for caching table data. This is the most critical setting for performance.

shared_buffers = 128MB

Recommended: 25% of available RAM. PostgreSQL also uses OS cache, so leaving memory for the OS is important.

effective_cache_size

Hint to the query planner about how much memory is available for caching. Helps the planner make better decisions.

effective_cache_size = 4GB

Recommended: 75% of available RAM. This doesn’t allocate memory - it just advises the planner.

maintenance_work_mem

Memory for maintenance operations like VACUUM, CREATE INDEX, and ALTER TABLE.

maintenance_work_mem = 64MB

Recommended: 512MB-1GB for heavy maintenance.

work_mem

Memory per query operation (sorting, hash joins, bitmap operations).

work_mem = 4MB

A complex query may use multiple work_mem allocations. Increase if you have complex queries with sorts/hashes.
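
A rough back-of-the-envelope calculation shows why work_mem should be raised with care. The function below is a simplified sketch: the name worst_case_work_mem_mb and the assumption that every connection runs the same number of sort/hash operations at once are mine, and it ignores parallel workers.

```python
def worst_case_work_mem_mb(connections: int, operations_per_query: int, work_mem_mb: int) -> int:
    """Upper bound if every connection runs a query whose sort/hash
    operations each use a full work_mem allocation at the same time."""
    return connections * operations_per_query * work_mem_mb


# 100 connections, 3 sort/hash operations per query, work_mem = 4MB
print(worst_case_work_mem_mb(100, 3, 4))   # 1200 (MB)
# The same workload with work_mem = 64MB
print(worst_case_work_mem_mb(100, 3, 64))  # 19200 (MB)
```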

Write Ahead Log (WAL) Settings

wal_buffers

Memory for WAL data before writing to disk.

wal_buffers = -1  # Auto-tuned (recommended)

The default (-1) lets PostgreSQL auto-tune based on shared_buffers.

min_wal_size / max_wal_size

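Control WAL recycling and checkpoint frequency. While WAL disk usage stays below min_wal_size, old WAL files are recycled rather than removed; max_wal_size is a soft limit on how much WAL may accumulate between automatic checkpoints.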

min_wal_size = 80MB
max_wal_size = 1GB

Increase max_wal_size if you have heavy write workloads or use replication.

Parallel Query Workers

max_worker_processes

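Maximum number of background worker processes the server can run. Parallel query workers, logical replication workers, and extension workers come out of this pool; autovacuum workers are limited separately by autovacuum_max_workers.

max_worker_processes = 8

A common starting point is the number of CPU cores. Changing this setting requires a server restart.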

max_parallel_workers_per_gather

Maximum parallel workers per parallel table scan or join.

max_parallel_workers_per_gather = 2

max_parallel_workers

Total parallel workers available across all parallel queries.

max_parallel_workers = 8

max_parallel_maintenance_workers

Parallel workers for maintenance operations (CREATE INDEX).

max_parallel_maintenance_workers = 2

Viewing Current Settings

-- Show all settings
SHOW ALL;

-- Show specific setting
SHOW shared_buffers;

-- Query with details
SELECT name, setting, unit, context 
FROM pg_settings 
WHERE name IN ('shared_buffers', 'max_connections', 'work_mem');

Reloading Configuration

After modifying postgresql.conf, reload or restart:

# Reload from the shell (keeps connections)
pg_ctl reload -D /path/to/data

-- Or from psql (requires superuser, or EXECUTE privilege on the function)
SELECT pg_reload_conf();

# Full restart, needed for settings such as shared_buffers and max_connections
pg_ctl restart -D /path/to/data

Quick Tuning Recommendations

Setting	Recommended Starting Value
shared_buffers	25% of RAM
effective_cache_size	75% of RAM
work_mem	4-64MB
maintenance_work_mem	256MB
max_connections	100
max_worker_processes	CPU cores

Remember to test changes in a staging environment before applying to production!
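
The percentage-based rules above are easy to turn into numbers for a given machine. The helper below is a minimal sketch (starting_values is a hypothetical name, not a PostgreSQL tool) that applies only the 25% / 75% / CPU-cores rules from the table; treat the output as a starting point to test, not as tuned values.

```python
from typing import Dict


def starting_values(ram_mb: int, cpu_cores: int) -> Dict[str, str]:
    """Apply the rule-of-thumb starting values to a machine's RAM and core count."""
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # 25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 75% of RAM
        "max_worker_processes": str(cpu_cores),
    }


# Example: a server with 16GB RAM and 8 cores
for name, value in starting_values(16 * 1024, 8).items():
    print(f"{name} = {value}")
```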

Postgres Databases

A Postgres server manages multiple databases, collectively known as a database cluster. Running initdb creates three databases: template0, template1, and postgres. template0 and template1 serve as templates for creating user databases.

template0 is a pristine template that should be left unmodified, while template1 can be customized: any object added to template1 is copied into every database created afterwards. This allows site-specific customizations to be present right from database creation. The postgres database is itself created from template1. Client tools such as psql connect to the database named after the user when none is specified, which is postgres for the default postgres user. User databases are created by cloning template1 unless another template is given.

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# \dt
Did not find any tables.

template1=# create table t1 (c1 int);
CREATE TABLE
template1=# \dt
          List of tables
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t1   | table | postgres
(1 row)

template1=# create database db01;
CREATE DATABASE
template1=# \c db01
You are now connected to database "db01" as user "postgres".
db01=# \dt
          List of tables
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t1   | table | postgres
(1 row)

Tables & files in Postgres

Every table in a Postgres database is stored in up to three files, named after the table’s file number (initially the same as its OID):

OID: the main file storing table data.
OID_fsm: the free space map, which manages the table’s free space.
OID_vm: the visibility map, which manages the visibility of table blocks.

Indexes have no _vm file, so they consist of only two files: OID and OID_fsm.

~/docker-build/n8n/postgres/base/16408 » ls
112              175              2610_vm          2666             2831             3394_vm          3602_vm          4166
113              2187             2611             2667             2832             3395             3603             4167
1247             2224             2612             2668             2833             3429             3603_fsm         4168
1247_fsm         2228             2612_fsm         2669             2834             3430             3603_vm          4169
1247_vm          2328             2612_vm          2670             2835             3431             3604             4170
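
To make the naming convention in the listing concrete, here is a small sketch that splits a file name into its number and its role. classify_fork is a hypothetical helper written for this post; it only looks at the suffix and knows nothing about segment files of very large tables (such as 16384.1), which it reports as main files.

```python
from typing import Tuple


def classify_fork(filename: str) -> Tuple[str, str]:
    """Split a file name from base/<database oid>/ into (file number, role)."""
    number, _, suffix = filename.partition("_")
    roles = {"": "main", "fsm": "free space map", "vm": "visibility map"}
    return number, roles.get(suffix, "other")


# A few names taken from the directory listing
for name in ["1247", "1247_fsm", "1247_vm", "2610_vm"]:
    print(name, "->", classify_fork(name))
```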

Postgres Tablespace

Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored.

initdb also creates two tablespaces pg_default and pg_global. If a tablespace is not specified when creating a table, it is stored in the pg_default tablespace. Tables managed at the database cluster level are stored in the pg_global tablespace.

postgres=# select oid, * from pg_tablespace;
 oid  | oid  |  spcname   | spcowner | spcacl | spcoptions
------+------+------------+----------+--------+------------
 1663 | 1663 | pg_default |       10 |        |
 1664 | 1664 | pg_global  |       10 |        |

The base directory under the Postgres data directory stores the databases that use the default tablespace.

~/docker-build/n8n/postgres/base » ls
1     16384 16408 4     5

A single tablespace can be utilized by multiple databases. Within the tablespace directory, subdirectories are created for each database, named after the database’s OID.

postgres=# select oid, datname from pg_database order by 1;
  oid  |  datname
-------+-----------
     1 | template1
     4 | template0
     5 | postgres
 16384 | n8n
 16408 | db01

Benefits of tablespaces:

First, if the partition or volume on which the cluster was initialized runs out of space and cannot be extended, a tablespace can be created on a different partition and used until the system can be reconfigured.
Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.

Simple example: moving a table’s index to a fast disk

Tables and indexes are stored independently in PostgreSQL, so the table data can stay in the default tablespace (pg_default) while the index lives in a new tablespace (index_space) on a fast disk.

Current situation:

pg_default
   orders
   orders_customer_id_idx

First create a tablespace on the new disk:

CREATE TABLESPACE index_space LOCATION '/mnt/fast_disk';

Then move the index:

ALTER INDEX orders_customer_id_idx SET TABLESPACE index_space;

After move:

pg_default
   orders

index_space
   orders_customer_id_idx

The table remains in its original tablespace; only the index moves.

Postgres Fundamentals - The Concept

2026-03-06T00:00:00+00:00

Having relied on PostgreSQL within a Patroni cluster to power our production Airflow environment for some time, I’ve recently begun exploring the database more deeply. I am consistently impressed by its robust feature set and its widespread reputation across the industry as a truly reliable, enterprise-grade database solution. This fascination inspired me to pull back the curtain and understand exactly “what is under the hood” and how PostgreSQL actually works. This blog post is the result of that journey, focusing on the essential fundamentals of PostgreSQL architecture and operation.

Introduction

Postgres is a relational database management system (RDBMS). It stores structured data and allows manipulation using SQL. Beyond being a data store, it is a transactional, concurrent, extensible data engine built to uphold the ACID properties at scale.

Atomicity - All or nothing
Consistency - A transaction must bring the database from one valid state to another.
Isolation - Concurrent transactions are isolated from each other.
Durability - Data is persistent even in case of system failure.

Unlike many relational databases, Postgres supports custom data types, table inheritance, and complex types such as JSONB and geometric objects, and it lets you attach functions and operators to them.

Postgres Architecture is built around a concept of background processes and shared memory.

Background Processes

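These are the essential ‘housekeeping’ processes that keep the system running. The postmaster (the top-level postgres process) handles incoming client connections and spawns an individual backend process for each one. The process listing below shows it together with the background processes it started and one idle backend.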

pts/0 S+ 0:00 \_ /usr/lib/postgresql/18/bin/psql -U postgres -d n8n
? Ss 0:17 postgres
? Ss 0:00 postgres: io worker 0
? Ss 0:00 postgres: io worker 2
? Ss 0:00 postgres: io worker 1
? Ss 0:01 postgres: checkpointer
? Ss 0:02 postgres: background writer
? Ss 0:02 postgres: walwriter
? Ss 0:06 postgres: autovacuum launcher
? Ss 0:00 postgres: logical replication launcher
? Ss 0:00 postgres: postgres postgres [local] idle

Process	Action Performed
Postmaster	Listens for new connection requests and spawns a dedicated backend process for each client.
Backend Process	Executes SQL queries, manages transactions, and retrieves or modifies data for a specific connected user.
Background Writer	Periodically flushes “dirty” data pages from the shared buffer cache to persistent disk storage.
Checkpointer	Creates synchronization points by forcing all modified memory buffers to disk and updating the WAL control file.
WAL Writer	Continuously writes transaction log data from memory buffers to sequential Write-Ahead Log files on disk.
Autovacuum Launcher	Monitors table bloat and schedules worker processes to reclaim space from dead tuples.
Autovacuum Worker	Performs the actual cleanup of deleted/updated rows and updates table statistics for the query planner.
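Stats Collector	Aggregates and records runtime information about table access, index usage, and row counts. It is a separate process up to PostgreSQL 14; from version 15 onward these statistics are kept in shared memory, which is why it does not appear in the listing above.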
Archiver	Copies completed WAL segment files to a secure backup storage location for point-in-time recovery.
Logger	Captures system error messages and performance events and writes them to the database log files.

Shared memory

Shared memory is the critical communication highway in PostgreSQL, allowing various background processes to access and update data without constant disk I/O.

Shared Buffer Pool Acts as the primary data cache; it loads 8KB pages from disk into memory so that multiple backend processes can read and modify the same data quickly without hitting the slow physical storage.

WAL Buffer A temporary staging area for Write-Ahead Log (WAL) records; it holds transaction logs in memory until they are flushed to disk by the WAL Writer, ensuring durability without stalling the transaction for every single write.

Commit Log (CLOG) A specialized memory area that tracks the status of every transaction (whether it is in progress, committed, or aborted), allowing processes to quickly determine if a row version (tuple) is visible based on its transaction ID.

Lock Manager Maintains a shared table of all database locks (row-level, table-level, etc.); it coordinates access between concurrent transactions to prevent them from conflicting or corrupting data during updates.

ProcArray Stores the status and metadata of all currently active backend processes; it is primarily used to generate “snapshots” for MVCC, helping the system decide which data versions are visible to which users at any given moment.

Storage

Postgres separates the logic of how queries are executed from storage, i.e. how data is laid out on disk.

On disk, PostgreSQL organizes data into a specific hierarchy designed for reliability and fast retrieval. Here is a breakdown of the primary storage structures:

Data Files (Heap Files) The primary storage for tables, where data is organized into 8KB pages. Rows are represented internally as “tuples,” which contain both the raw data and metadata (like transaction IDs) used for concurrency control. Instead of overwriting rows, Postgres appends new versions of rows (tuples) to these files, which is the physical foundation for MVCC.

Index Files Separate files (usually B-Trees) that store pointers to the physical locations of rows in the heap files. They allow the database to find specific data without scanning every single page in a table.

Write-Ahead Log (WAL) A sequential “journal” of every change made to the database. Before a change is applied to the main data files, it is recorded here first; this ensures that if the system crashes, the database can “replay” the log to restore its state.

TOAST Tables The Oversized-Attribute Storage Technique. When a row grows beyond roughly 2KB, Postgres compresses its large values (such as a big JSON document or long text) and, if needed, moves them out of line into a separate TOAST table, so the main table stays lean and each row still fits in an 8KB page. The main table keeps a pointer to the out-of-line value.

Commit Log (CLOG) A set of files in the pg_xact directory that stores the status of every transaction (Committed, Aborted, or In-Progress). This is the “source of truth” used to determine which row versions are visible to users.

Free Space Map (FSM) A binary file that tracks how much empty space is available in each 8KB page of a table. When a new row is inserted, Postgres consults the FSM to quickly find a page with enough room, rather than searching the whole table.

Visibility Map (VM) A simple map with two bits per page: one marks pages whose rows are all visible to every transaction (all-visible), the other marks pages whose rows are all frozen. This allows vacuum to skip those pages and helps speed up “Index-Only” scans.

MVCC & WAL are two important concepts in Postgres that enable high concurrency and data durability.

MVCC (Multi-Version Concurrency Control)

MVCC is the engine that allows multiple users to read and write to the same table simultaneously without locking each other out. The core philosophy is: “Readers never block writers, and writers never block readers.” Here is how it works at the transaction level:

The “No Overwrite” Rule

Unlike other databases that might update a row in place, Postgres never overwrites existing data.

When you UPDATE a row, Postgres marks the old version as “obsolete” and inserts a completely new version (a new tuple) into the table.
When you DELETE a row, it simply marks the row as “deleted” but leaves it on the disk for a while.

Transaction IDs (xmin and xmax)

Every row (tuple) on the disk has two hidden “bookkeeping” columns that manage visibility:

xmin: The ID of the transaction that created (inserted) the row.
xmax: The ID of the transaction that deleted or updated the row. If the row hasn’t been deleted, xmax is 0.

Snapshot Isolation

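When you start a transaction, Postgres gives you a Snapshot. A snapshot records which transactions were still in progress at the moment it was taken; changes from transactions that committed before that moment are visible, and everything else is not. (In the default READ COMMITTED isolation level, a fresh snapshot is taken for each statement rather than once per transaction.)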
The Logic: Your transaction can only “see” rows where the xmin is a transaction that was already committed before your snapshot was taken.
The Result: If User A is updating a row but hasn’t clicked “Commit” yet, User B can still read the old version of that row. User B is essentially looking at a “version” of the database from a point in the past.

Row Visibility Flow

To determine if a row is visible to your current transaction, Postgres follows these basic rules:

Is xmin committed? If no, the row is invisible (it’s from a future or failed transaction).
Is xmax zero or uncommitted? If yes, the row is still valid and visible.
Is xmax committed? If yes, the row is “dead” (deleted) and invisible to you, because a transaction finished deleting it before you looked.
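
The three rules above translate almost directly into code. The sketch below is a deliberately simplified model, not the actual PostgreSQL visibility check: RowVersion and is_visible are hypothetical names, the snapshot is reduced to a set of transaction IDs that count as committed for the reader, and a transaction seeing its own uncommitted changes is ignored.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class RowVersion:
    xmin: int      # transaction that inserted this version
    xmax: int = 0  # transaction that deleted/updated it; 0 means none


def is_visible(row: RowVersion, committed: Set[int]) -> bool:
    """Apply the three basic visibility rules against a simplified snapshot."""
    if row.xmin not in committed:
        return False  # inserted by a future, in-progress, or failed transaction
    if row.xmax == 0 or row.xmax not in committed:
        return True   # not deleted, or the deleter has not committed
    return False      # deleted by a committed transaction


# Transaction 102 updates a row created by transaction 100.
old_version = RowVersion(xmin=100, xmax=102)
new_version = RowVersion(xmin=102)

before_commit = {100, 101}      # 102 is still in progress
after_commit = {100, 101, 102}  # 102 has committed

print(is_visible(old_version, before_commit), is_visible(new_version, before_commit))
print(is_visible(old_version, after_commit), is_visible(new_version, after_commit))
```

A reader whose snapshot predates the commit of transaction 102 sees only the old version; a reader whose snapshot follows it sees only the new one. Neither had to wait for the other.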

The Cleanup (Vacuum)

Because every update creates a new version, the database would eventually run out of disk space. This “clutter” is called Bloat. The Autovacuum process periodically scans the table. It looks for rows where the xmax is so old that no active transaction could possibly need to see it anymore. It then clears those rows so the space can be reused for new data.

WAL (Write-Ahead Logging)

WAL is the fundamental mechanism that guarantees database durability (the ‘D’ in ACID). Its core philosophy can be summarized in one rule: “No change to data files is ever made until a description of that change has been written to the log and flushed to permanent storage.” Here is a conceptual breakdown of how WAL works:

The Problem: Memory vs. Disk Speed

To make databases fast, PostgreSQL does most of its work (reading, inserting, updating data) in memory (the Shared Buffer Pool). Writing data sequentially to a log file on disk is much faster than jumping around randomly updating massive data files. If the database crashed while changes were only in memory, those changes would be lost. WAL solves this.

The Solution: Log It First

When a transaction performs an action (e.g., UPDATE users SET age = 30 WHERE id = 1):

PostgreSQL first modifies the data in memory (creating a “dirty page”).
Before the change is written to the main data files, a description of the change (a “WAL record”) is constructed.
This WAL record is written sequentially into the WAL Buffer in memory.

The WAL Writer and Durability

The key moment for durability happens during a COMMIT:

When the application issues a COMMIT command, the transaction cannot be considered “complete” until its corresponding WAL records are safely on disk.
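At commit, the backend makes sure its WAL records have been flushed from the memory buffer into sequential WAL segments on physical storage; the dedicated WAL Writer process also flushes the buffer periodically in the background, so much of that work is often already done.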
Only after the WAL flush is successful does PostgreSQL report “COMMIT Complete” back to the application.

What Happens During a Crash?

If power is lost or the OS crashes:
Upon restart, PostgreSQL detects it didn’t shut down cleanly.
It looks at the main data files, which might be “inconsistent” (some changes might have made it to disk, others might have been lost from memory).
It finds the last known safe point (the last Checkpoint) in the WAL.
It begins Replay (Redo): It reads the WAL segments sequentially from that checkpoint forward.
It re-applies every single change described in the WAL to the main data files, bringing them to a consistent, committed state.
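
The recovery steps above can be illustrated with a toy model. This is a sketch under strong simplifying assumptions: the "data files" are a dictionary, each WAL record is a hypothetical (LSN, key, new value) tuple, and replay simply re-applies every record after the checkpoint. Real PostgreSQL WAL records describe page-level changes and recovery is considerably more involved.

```python
from typing import Dict, List, Tuple

WalRecord = Tuple[int, str, int]  # (LSN, key, new value), a made-up record format


def replay(data_files: Dict[str, int], wal: List[WalRecord], checkpoint_lsn: int) -> Dict[str, int]:
    """Re-apply, in order, every WAL record written after the last checkpoint."""
    recovered = dict(data_files)
    for lsn, key, value in wal:
        if lsn > checkpoint_lsn:
            recovered[key] = value
    return recovered


# The checkpoint at LSN 2 flushed the first two changes to the data files.
# The change at LSN 3 was committed (its WAL record is on disk), but the
# modified page was still only in memory when the crash happened.
wal = [(1, "age:1", 29), (2, "age:2", 41), (3, "age:1", 30)]
on_disk = {"age:1": 29, "age:2": 41}

print(replay(on_disk, wal, checkpoint_lsn=2))  # {'age:1': 30, 'age:2': 41}
```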

By using WAL, PostgreSQL achieves a balance:

Safety: The sequential WAL write ensures durability.
Performance: The actual data files can be updated lazily in the background by other processes (the Background Writer), allowing user transactions to finish quickly without waiting for random disk I/O.

Opensource AI Fundamentals

2026-02-24T00:00:00+00:00

What is Opensource AI?

A collection of technologies and frameworks needed to build systems and applications with open-source AI, e.g. an AI agent that books a flight or shops for the right shoes at the right price by comparing multiple shopping websites.

* AI Models - Proprietary vs Opensource Models

Choosing the right model is the foundation of any AI system. There are two categories:

Proprietary Models — Closed-source, API-access only, typically more capable out-of-the-box:

OpenAI GPT-4.5, o3, o3-mini, o1
Anthropic Claude 3.7 Sonnet (with extended thinking), Claude 3.5 Haiku
Google Gemini 2.0 Flash, Gemini 2.0 Pro (Experimental)
xAI Grok-3, Grok-3 mini
Microsoft Phi-4 (hosted via Azure AI Foundry; its weights are also openly available, see the open-source list below)

Opensource Models — Weights available publicly, can be run locally or self-hosted:

Llama 3.3 (70B) from Meta — Latest flagship open model, best-in-class instruction following
DeepSeek-R1 / DeepSeek-V3 — Top-tier reasoning & coding, rivals GPT-4o at 1/10th cost
Qwen 2.5 / Qwen2.5-Coder (7B, 32B, 72B) from Alibaba — Excellent coding & multilingual
Qwen3-VL (2B, 7B) — Multimodal (vision + language), great for image understanding tasks
Mistral Small 3.1 (24B) — Fast, efficient, Apache 2.0 licensed, strong instruction following
Gemma 3 (1B, 4B, 12B, 27B) from Google — Lightweight, optimized for local inference
Phi-4 (14B) from Microsoft — Punches above its weight on reasoning benchmarks
GLM-4 from Zhipu AI — Strong multilingual support, especially Chinese + English
Kimi k1.5 from Moonshot AI — Long-context reasoning model (up to 128k tokens)

* Model Ranking Leaderboard

Before picking a model, consult benchmarks and community rankings to find the best fit for your use case (coding, reasoning, instruction-following, multilingual, etc.):

llm-stats.com — Aggregated benchmarks and cost comparison
OpenRouter Rankings — Real-world usage and popularity rankings across providers
HuggingFace Open LLM Leaderboard — Standardized evals (MMLU, HellaSwag, ARC, etc.)

Key benchmarks to look at:

MMLU — General knowledge across 57 subjects
HumanEval / MBPP — Coding ability
MT-Bench — Multi-turn conversation quality
MATH / GSM8K — Mathematical reasoning

* Model Manager — Ollama & Docker Desktop Models

To run and manage open-source models locally, you need a model manager:

Ollama

Easiest way to download, run, and switch between local LLMs
Single command to pull and run: ollama run llama3
REST API at http://localhost:11434 — compatible with OpenAI API format
Supports: Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and more
Cross-platform: macOS, Linux, Windows

Docker Desktop Models

Docker Desktop (4.40+) has a built-in AI model runner

Pull and run models as containers: docker model run ai/llama3.2

~ » docker model list                                                                                  
MODEL NAME  PARAMETERS  QUANTIZATION   ARCHITECTURE  MODEL ID      CREATED       CONTEXT  SIZE
gemma3      3.88 B      MOSTLY_Q4_K_M  gemma3        a353a8898c9d  5 months ago           2.31 GiB

Exposes OpenAI-compatible API endpoint locally

(base)  ~/ curl http://localhost:12434/v1/models

{"object":"list","data":[{"id":"docker.io/ai/gemma3:latest","object":"model","created":1758368217,"owned_by":"docker"}]}

(base)  ~/ curl http://localhost:12434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/gemma3",
    "messages": [{"role": "user", "content": "who are you"}]
  }'

{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"I'm Gemma, a large language model created by the Gemma team at Google DeepMind. I'm an open-weights model, which means I’m widely available for public use! \n\nI can take text and images as inputs and respond with text. \n\nIt’s nice to meet you!"}}],"created":1775723162,"model":"model.gguf","system_fingerprint":"b1-0988acc","object":"chat.completion","usage":{"completion_tokens":65,"prompt_tokens":12,"total_tokens":77,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-0urCVAwxCjdfxOEOJ0fo2uNR987Am3Em","timings":{"cache_n":0,"prompt_n":12,"prompt_ms":124.101,"prompt_per_token_ms":10.34175,"prompt_per_second":96.69543355815023,"predicted_n":65,"predicted_ms":1414.579,"predicted_per_token_ms":21.762753846153846,"predicted_per_second":45.9500671224442}}%

Useful if your stack is already containerized

* Running a Local Model

Steps to get a model running locally:

Install Ollama: Download from ollama.com and install

Pull a model: ollama pull qwen3-vl:2b or ollama pull deepseek-coder-v2:latest

~ » ollama list
NAME                        ID              SIZE      MODIFIED
deepseek-coder-v2:latest    63fb193b3a9b    8.9 GB    45 hours ago
qwen3-vl:2b                 0635d9d857d4    1.9 GB    3 days ago
qwen2.5-coder:7b            dae161e27b0e    4.7 GB    12 days ago

Run the model interactively: ollama run qwen3-vl:2b

Use via API:

curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3-vl:2b", "prompt": "Explain RAG in simple terms"}'

Persist context using chat API for multi-turn conversations
Monitor performance: Check RAM/VRAM usage — most 7B models need ~8GB RAM; 13B needs ~16GB
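
For the multi-turn case, the idea is to resend the whole message history on every call to Ollama's /api/chat endpoint. The sketch below uses only the Python standard library; build_chat_request and chat are hypothetical helper names, and the model name is just the one pulled earlier. Nothing is sent until chat() is called, and that call needs Ollama running locally.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, messages: List[Dict[str, str]]) -> bytes:
    """Encode the full conversation so far as a non-streaming chat request."""
    return json.dumps({"model": model, "messages": messages, "stream": False}).encode("utf-8")


def chat(model: str, messages: List[Dict[str, str]], user_text: str) -> str:
    """Append the user turn, send the history, and record the assistant reply."""
    messages.append({"role": "user", "content": user_text})
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=build_chat_request(model, messages),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())["message"]
    # Keeping the reply in the history is what gives the model its "memory".
    messages.append({"role": reply["role"], "content": reply["content"]})
    return reply["content"]
```

Usage would look like history = [] followed by chat("qwen3-vl:2b", history, "Explain RAG in simple terms") and then a second chat(...) call with the same history list, so the follow-up question is answered in context.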

Tips:

Use quantized models (e.g., Q4_K_M) for lower memory footprint with minimal quality loss
GPU acceleration is automatic on Apple Silicon (Metal) and CUDA (NVIDIA)

* Building AI Agents — No-Code: Ollama + n8n

For no-code / low-code AI agent building:

n8n is an open-source workflow automation tool similar to Zapier/Make, but self-hostable and AI-native.

Architecture:

User Input → n8n Workflow → Ollama (local LLM) → Tool Calls → Response

Steps:

Self-host n8n via Docker: docker run -it --rm -p 5678:5678 n8nio/n8n
Add an AI Agent node in n8n
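Connect it to an Ollama Chat Model node (point it to http://localhost:11434, or to http://host.docker.internal:11434 when n8n runs in Docker Desktop and Ollama runs on the host)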
Add Tool nodes (e.g., HTTP Request, Google Search, Database query)
Define a system prompt and let the agent autonomously call tools

Use cases:

Auto-research and summarize news
Book flights by scraping airline sites
Price comparison across shopping websites
Email triage and auto-reply

* Building AI Agents — Code: Python + Ollama + OpenAI Agent SDK (ToDo)

SRE Concepts

2025-06-12T00:00:00+00:00

Let’s explore these SRE concepts:

1. Service Level Objectives (SLOs)

A Service Level Objective, or SLO, is a precise, measurable target for the reliability of a service. It is a key tool for defining what “good” looks like from a user’s perspective. An SLO is built on two core components:

Service Level Indicator (SLI): A quantitative measure of a service’s behavior. Common SLIs include:
- Latency: The time it takes to serve a request. (e.g., 99% of requests served in under 200ms)
- Availability: The percentage of time a service is operational and serving requests. (e.g., 99.9% uptime)
- Throughput: The number of requests a service can handle per second.
- Error Rate: The percentage of requests that result in an error.
Objective: The specific target you set for that SLI over a defined period.

Why they matter: SLOs shift the focus from internal metrics to what truly impacts the end user. They are the contract between the service provider and the customer (whether internal or external) that sets clear expectations for reliability.

2. Error Budgets

The error budget is a direct byproduct of your SLO. It represents the maximum amount of “unreliability” that a service can tolerate over a given period without violating its SLO.

The Math: If your SLO for a service’s availability is 99.9%, your error budget is the remaining 0.1% of time the service can be unavailable. For a 30-day month, that’s roughly 43 minutes of acceptable downtime.
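
The same arithmetic works for any SLO and window. A minimal sketch (error_budget_minutes is a hypothetical helper name):

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    total_minutes = days * 24 * 60
    return (1.0 - slo) * total_minutes


print(round(error_budget_minutes(0.999), 1))    # 43.2 minutes for "three nines"
print(round(error_budget_minutes(0.9999), 2))   # 4.32 minutes for "four nines"
```

Each extra nine cuts the budget by a factor of ten, which is why tightening an SLO is a cost decision as much as an engineering one.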

Why they matter: The error budget is the central mechanism for balancing innovation and reliability. It provides a clear, quantitative threshold for risk.

When the budget is “in the green” (you have time left): The team can take more risks, like deploying a new feature, knowing that a brief failure won’t violate the SLO.
When the budget is “depleted” (you’ve used up your downtime): The team must stop all non-essential feature development and focus solely on improving reliability and fixing the underlying issues that caused the downtime. This “stop and fix” rule prevents the team from digging a deeper reliability hole.

3. Distributed Tracing

As applications become more complex and move from monoliths to microservices, it becomes nearly impossible to track a single request as it travels through a dozen different services. Distributed tracing is the solution to this problem.

What it is: A method for observing and profiling requests as they flow through a distributed system. It creates a complete timeline of a single request, from the moment it enters the system to the final response.
Key Concepts:
- Span: A single unit of work within a trace, representing an operation like a database query, an API call, or a function execution. Each span has a start time, end time, and metadata.
- Trace: A collection of spans that represents a complete end-to-end journey of a request. Spans are connected in a parent-child relationship to show the flow.
- Context Propagation: The mechanism that passes unique trace and span IDs from one service to the next, allowing them to be connected into a single, cohesive trace.
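
How spans, traces, and context propagation fit together can be shown with a small sketch. The names Span, start_span, and outgoing_context are made up for illustration, and the context is a plain dictionary; production systems typically rely on a tracing library and standard headers rather than hand-rolled code like this.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def start_span(name: str, incoming: Optional[Dict[str, str]] = None) -> Span:
    """Continue the caller's trace if context arrived, otherwise start a new trace."""
    if incoming:
        return Span(name, incoming["trace_id"], incoming["span_id"])
    return Span(name, uuid.uuid4().hex)


def outgoing_context(span: Span) -> Dict[str, str]:
    """The IDs a service passes along with its outgoing request."""
    return {"trace_id": span.trace_id, "span_id": span.span_id}


# One request crossing three services
gateway = start_span("GET /checkout")
orders = start_span("orders.create", outgoing_context(gateway))
database = start_span("INSERT orders", outgoing_context(orders))

for span in (gateway, orders, database):
    print(span.name, "| trace", span.trace_id[:8], "| parent", span.parent_id)
```

All three spans share one trace ID, and each child records its caller's span ID, which is what lets a tracing backend reassemble the request timeline.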

Why it matters: Distributed tracing is essential for:

Root Cause Analysis: Quickly pinpointing which service or component failed or caused a slowdown.
Performance Optimization: Identifying bottlenecks and latency issues within a specific service or in the communication between services.
Understanding System Behavior: Providing a visual map of how different services interact with each other.

4. Architectural Fault Tolerance

Fault tolerance is the design philosophy of building a system that can continue to operate correctly, and with minimal impact, even when one or more of its components fail. It is about anticipating failure and designing a system to be resilient from the ground up.

Key Principles & Patterns:
- Redundancy: Having backup or duplicate components ready to take over if a primary component fails. This can be in the form of a hot-standby (active-passive) or multiple active components (active-active).
- Failover: The automatic process of switching to a redundant system or component when a failure is detected. This should be as fast and seamless as possible to minimize downtime.
- Circuit Breakers: A pattern that prevents a failing service from cascading its failure to other services. If a service is consistently failing, the circuit breaker “trips” and all subsequent requests fail fast instead of waiting and overloading the failing service. After a cooldown period, the breaker lets a trial request through and closes again if it succeeds.
- Load Balancing: Distributing incoming requests across multiple instances of a service to prevent a single point of failure and handle high traffic loads.
- Rate Limiting: A mechanism to control the rate of requests a service receives, protecting it from being overwhelmed and failing.
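
Of the patterns listed above, the circuit breaker is the easiest to misread from prose alone, so here is a minimal single-threaded sketch. The `CircuitBreaker` and `CircuitOpenError` names, the failure threshold, and the reset timeout are assumptions made for illustration; a production implementation would also need thread safety and per-dependency configuration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the struggling service.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or re-open after a failed trial call.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each downstream request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function performs the remote call, and treats `CircuitOpenError` as a signal to serve a fallback or cached response.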

By combining these concepts, SRE teams can move from a reactive, crisis-driven model to a proactive, data-informed approach to managing reliability. SLOs set the targets, error budgets provide the framework for risk management, distributed tracing offers the visibility to debug and optimize, and architectural fault tolerance ensures the system is built to withstand inevitable failures.

Data Warehouse vs Data Lake vs Data Lakehouse

2025-05-04T00:00:00+00:00

1. Data Warehouse (DW)

A centralized repository designed for structured data (tables, rows, columns). Optimized for business intelligence (BI), reporting, and analytics.

Data type: Structured (relational, transactional, processed).

Schema: Schema-on-write (define schema before loading).

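Cost: Typically higher. Traditional warehouses couple compute and storage tightly, although cloud warehouses such as Snowflake and BigQuery bill them separately.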

Use cases: Dashboards, trend analysis, financial reporting.

Examples: Snowflake, Amazon Redshift, Google BigQuery, Teradata.

2. Data Lake

A storage system that holds raw data of all types (structured, semi-structured, unstructured). Designed for flexibility and large-scale storage, not optimized for BI directly.

Data type: Structured (CSV, Parquet), semi-structured (JSON, XML), unstructured (logs, images, videos).

Schema: Schema-on-read (define schema when querying).

Cost: Cheaper (storage & compute decoupled).

Use cases: Data science, machine learning, big data analytics.

Examples: Amazon S3 + Athena, Azure Data Lake Storage, Hadoop HDFS.
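
The schema-on-write versus schema-on-read distinction between warehouses and lakes can be illustrated with a small sketch. Everything here is hypothetical: the `WAREHOUSE_SCHEMA` mapping, the two helper functions, and the sample records are made up for the example, and plain Python lists stand in for real storage.

```python
import json

# Assumed schema for the example: column name -> Python type.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}


def write_with_schema(table, row):
    """Schema-on-write: validate before storing, rejecting bad rows up front."""
    if set(row) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(row)}")
    for column, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(row[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(row)


def read_with_schema(raw_lines, schema):
    """Schema-on-read: raw text is stored as-is; types are applied at query time."""
    for line in raw_lines:
        record = json.loads(line)
        try:
            yield {col: cast(record[col]) for col, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # skip records that do not fit this query's schema


# Warehouse-style: only rows matching the schema are ever stored.
warehouse = []
write_with_schema(warehouse, {"order_id": 1, "amount": 9.5})

# Lake-style: anything can be stored; each query decides how to interpret it.
lake = [
    '{"order_id": "2", "amount": "12.0", "note": "gift"}',
    '{"event": "click"}',
]
rows = list(read_with_schema(lake, {"order_id": int, "amount": float}))
print(rows)
```

The trade-off shows up in where errors surface: the warehouse-style writer fails at load time, while the lake-style reader accepts everything and leaves each query to decide which records are usable.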

3. Data Lakehouse

A hybrid approach that combines the best of both data lakes and warehouses. Stores all data types like a data lake, but supports structured querying and ACID transactions like a warehouse.

Data type: All types (structured + unstructured).

Schema: Flexible → supports both schema-on-write and schema-on-read.

Cost: Generally more cost-effective than a data warehouse, and scalable like a data lake.

Use cases: BI + ML/AI + advanced analytics in one platform.

Examples: Delta Lake (Databricks), Apache Iceberg, Snowflake (via Iceberg table support).

Feature	Data Warehouse	Data Lake	Data Lakehouse
Data Types	Structured only	All (structured, semi-structured, unstructured)	All (structured, semi-structured, unstructured)
Schema	Schema-on-write	Schema-on-read	Both
Performance	High for BI	Slower (raw queries)	Optimized (BI + ML)
Cost	High	Low	Medium/Low
Best For	Business reporting	ML, AI, Big Data	Unified analytics