The same thesis, applied across several domains.

Each of these projects is a working test of the same architectural thesis: that the most useful AI work today isn't replacing human judgement — it's giving one person the operating leverage of a small, disciplined team. The pattern repeats across capital allocation, defensive operations, security research, and the way I run my own work.

In each case the agent does the heavy lifting, the human stays the decision-maker, every action is auditable, and every assumption is explicit. They are research projects, not products. The point of featuring them here is to show what the architectural thinking looks like in practice.

/ 01 In Evaluation

Kronos Engine

A research project applying foundation-model forecasting to personal capital discipline.

Investigation

Kronos Engine forecasts crypto price as a calibrated probability distribution and uses that distribution to discipline how capital is sized — it de-risks rather than predicts. It's backtested and deploy-ready, but not yet live-traded. The honest reckoning came from out-of-sample testing on a 74-day, 1,800-candle XRP window: the model has no directional edge (~48.6% hit rate, a coin flip) and lost to a naïve persistence baseline on volatility by ~38%, so I cut both of those uses entirely. The one property that held up was calibration — about 79% of realised prices landed inside the 80% forecast band — so the engine now leans on that single verified strength. It sizes off how wide the distribution is, and steps aside ahead of scheduled macro events.
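The calibration figure above comes down to a coverage count: how often the realised price landed inside the forecast band. A minimal sketch, assuming the band is stored as per-candle lower and upper quantile series. The function name `band_coverage` and the sample numbers are illustrative and not taken from the Kronos Engine code.

```python
from typing import Sequence


def band_coverage(realised: Sequence[float],
                  lower: Sequence[float],
                  upper: Sequence[float]) -> float:
    """Fraction of realised prices that fall inside [lower, upper]."""
    if not (len(realised) == len(lower) == len(upper)):
        raise ValueError("all series must have the same length")
    if len(realised) == 0:
        raise ValueError("need at least one observation")
    hits = sum(lo <= price <= hi
               for price, lo, hi in zip(realised, lower, upper))
    return hits / len(realised)


# Illustrative numbers only: 4 of 5 prices land inside the band.
coverage = band_coverage(
    realised=[0.52, 0.55, 0.61, 0.57, 0.58],
    lower=[0.50, 0.50, 0.50, 0.50, 0.50],
    upper=[0.60, 0.60, 0.60, 0.60, 0.60],
)
print(f"coverage = {coverage:.0%}")  # prints: coverage = 80%
```

A well-calibrated 80% band should score close to 0.80 on held-out data. A much higher score suggests the band is too wide to be useful, and a much lower one that it is overconfident.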

Architecture
  • Forecasting core: TimesFM-2.5 (200M-param, Google Research) run zero-shot with a native quantile head — direct quantile forecasts at a 1h candle close / 24h horizon, with no fine-tuning pipeline
  • Calibrated quantile envelope as the primary output — validated at ~79% coverage on the 80% band over a 74-day / 1,800-candle XRP holdout
  • Rebuilt risk gate: band-width percentile drives a position-size multiplier, with regime-breakout detection against the previous forecast band, stacked penalties on a hard floor, and fail-safe stale handling so the agent never trades on a missing or stale signal
  • Forward-looking macro layer: a FOMC / CPI / PPI / Fed-chair / CLARITY-Act calendar wired into the gate that auto-cuts size ahead of scheduled catalysts (HIGH ×0.4, ELEVATED ×0.8) — roughly 25% size about 12h before an FOMC decision
  • Automated out-of-sample evaluator freezes a forecast and grades it at horizon end — band coverage, MAE/MAPE, directional call, worst breach — with macro-window context, as a scheduled one-shot job
  • Evaluation harness reports the unflattering truth (directional hit rate, MAE, calibration coverage); those results are what drove dropping direction and volatility forecasting
  • Data and execution: multi-source price feed with fallback (OKX → Binance → Bybit) and Hyperliquid SDK execution. Planned — an authenticated Bybit feed and live VPS deployment
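The risk gate described in the list above can be sketched as a single function. This is one reading of "stacked penalties on a hard floor", not the engine's actual code: the linear width mapping, the floor value, the staleness cutoff, and all names (`Signal`, `size_multiplier`, `SIZE_FLOOR`, `MAX_SIGNAL_AGE_S`) are assumptions. Only the HIGH ×0.4 and ELEVATED ×0.8 multipliers come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Macro multipliers quoted in the text.
MACRO_MULTIPLIER = {"NONE": 1.0, "ELEVATED": 0.8, "HIGH": 0.4}

# Assumptions for illustration only.
SIZE_FLOOR = 0.1              # hypothetical hard floor on the multiplier
MAX_SIGNAL_AGE_S = 2 * 3600   # hypothetical staleness cutoff (two 1h candles)


@dataclass
class Signal:
    band_width_percentile: float  # 0.0 (narrowest seen) to 1.0 (widest seen)
    age_seconds: float            # time since the forecast was produced


def size_multiplier(signal: Optional[Signal], macro_level: str = "NONE") -> float:
    """Return a position-size multiplier in [0, 1]; 0.0 means do not trade."""
    # Fail safe: a missing or stale signal never produces a trade.
    if signal is None or signal.age_seconds > MAX_SIGNAL_AGE_S:
        return 0.0
    if not 0.0 <= signal.band_width_percentile <= 1.0:
        raise ValueError("band_width_percentile must be within [0, 1]")
    # Wider band means less size. The linear mapping is an assumption.
    width_factor = 1.0 - signal.band_width_percentile
    # Penalties stack multiplicatively, then the floor is applied.
    stacked = width_factor * MACRO_MULTIPLIER[macro_level]
    return max(SIZE_FLOOR, stacked)


# Illustrative call: a moderately wide band ahead of a HIGH-impact event.
print(f"{size_multiplier(Signal(0.375, 600), 'HIGH'):.2f}")  # prints: 0.25
```

Returning 0.0 for a missing or stale signal keeps the fail-safe path separate from the floor, so the floor only ever applies to a signal that is present and fresh.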

Calibration over prediction

The validation killed the parts that sounded smartest (calling direction, forecasting volatility) and left exactly one thing the model does reliably: quote a well-calibrated range. So the engine no longer pretends to know where price is going; it sizes off how wide the distribution is and steps aside before scheduled macro events. The model is still just a primitive; now it's the gate, the calendar, and the band width that do the work.

Stack: Python · PyTorch · TimesFM-2.5 (Google Research) · Hyperliquid SDK · OKX / Binance / Bybit data · SQLite · systemd (hourly gate + one-shot evaluator) · Hetzner CPX21 (planned) · Claude
Reference forecasting model: github.com/google-research/timesfm
/ 02 Live

Privacy Posture

An AI agent that systematically reduces personal data exposure across data brokers, breach databases, and OSINT sources.

Investigation

Most people's personal data is scattered across fifty-plus data brokers and people-search sites that scrape, package, and sell it. Manual removal is hours of repetitive work per broker, and the records reappear. Privacy Posture applies an AI agent, operating with human approval gates, to map the PII footprint, draft jurisdiction-correct removal requests, track every request through to confirmation, and re-check for reappearance on a schedule. It is effectively a small Security Operations Centre in which the analyst is an LLM, the case management system is a structured workspace, and the legal-basis library is encoded as a template set.

What's operating
  • 49 brokers mapped across 3 priority tiers, with an active opt-out queue
  • 6-template legal framework library covering CCPA, GDPR, PDPA, generic, follow-up, and reappearance scenarios
  • SLA logic per framework (CCPA 45d, GDPR 30d, PDPA 21d) with 14-day and 90-day reappearance check windows
  • Identity-verification minimisation logic that refuses brokers' overreaching ID-upload demands by citing data minimisation principles
  • Operations hub spanning PII Inventory, Broker Targets, Active and Completed Requests, Escalations, Breach Monitor, and Template Library
  • OSINT baseline scans across name variants, identifiers, and handles, with WHOIS / domain privacy posture checks
  • Time-to-draft per opt-out: ~30 seconds, vs. ~10 minutes manual
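The SLA logic in the list above is mostly date arithmetic. A minimal sketch using the day counts quoted there. The function names are hypothetical, and counting the reappearance checks from the confirmation date is an assumption.

```python
from datetime import date, timedelta

# Response windows quoted in the text, in days.
SLA_DAYS = {"CCPA": 45, "GDPR": 30, "PDPA": 21}
# Reappearance re-check windows quoted in the text, in days.
RECHECK_DAYS = (14, 90)


def response_due(framework: str, sent_on: date) -> date:
    """Date by which the broker should have responded to a request."""
    return sent_on + timedelta(days=SLA_DAYS[framework])


def is_overdue(framework: str, sent_on: date, today: date) -> bool:
    """True once the response window has passed without confirmation."""
    return today > response_due(framework, sent_on)


def recheck_dates(confirmed_on: date) -> list[date]:
    """Dates on which to re-scan a broker after a confirmed removal.

    Assumption: the windows are counted from the confirmation date.
    """
    return [confirmed_on + timedelta(days=d) for d in RECHECK_DAYS]


print(response_due("CCPA", date(2025, 1, 1)))  # prints: 2025-02-15
print(recheck_dates(date(2025, 3, 1)))
```

Keeping the day counts in one table means a new framework is a one-line change, and an overdue request can be routed to escalation by a scheduled check.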

Lesson worth featuring

The original ledger sat in Google Drive, but the connector is effectively read-only for binary files, which breaks the autonomy claim. Migration to a workspace where the agent has full read/write authority restored the loop. When building agent-native workflows, choose storage where the agent is a first-class citizen, or you re-introduce the manual steps you were trying to remove.

Stack: Claude (analyst agent, human-in-the-loop) · MCP integrations (Notion · Gmail) · structured case-management workspace · templated legal-basis library · scheduled OSINT & breach monitoring
/ 03 In Daily Use

Scope Sentinel

An AI-orchestrated research aide for HackerOne — recon, scope validation, and report drafting in one pipeline.

Investigation

A research-aide toolkit for solo bug bounty work, first deployed on HackerOne. Architecture is intentionally split: standard scanners (nuclei, subfinder, httpx) handle the technical execution, while a multi-agent Claude layer — planner, researcher, and report-drafter — handles scope parsing, scanner orchestration, and turning collected output plus researcher observations into platform-formatted draft reports for human review. The most distinctive capability is the report drafter: turning structured scanner output and unstructured notes into submissions ready for human approval.

What's working
  • Multi-agent Claude stack (planner + researcher + report-drafter) orchestrating scanners and turning output into platform-formatted drafts
  • Pulls and parses HackerOne program scope; validates assets are in-scope before any scanning begins
  • Coordinates a recon → scan → triage pipeline end-to-end, with findings, evidence, and drafts flowing through MCP into Notion, Drive, and Gmail
  • Platform-agnostic core; first-platform deployment on HackerOne, with adapters for additional programs as a build-out, not a rewrite
  • Honest workflow improvement: under five hours saved per week, with the bigger gain in consistency of report quality rather than raw throughput
  • Human stays the decision-maker at every consequential step — the tool reduces slow surface area, not judgement
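The scope check that runs before any scanning can be sketched as a hostname match against wildcard patterns. This is an illustration under assumptions, not Scope Sentinel's code: the function names are hypothetical, `example.com` is a placeholder, and real program scopes can also list CIDR ranges, mobile apps, and other asset types that this sketch ignores.

```python
from fnmatch import fnmatchcase
from typing import Iterable
from urllib.parse import urlsplit


def hostname_of(target: str) -> str:
    """Extract a lowercase hostname from a URL or a bare host."""
    parsed = urlsplit(target if "//" in target else f"//{target}")
    return (parsed.hostname or "").lower()


def in_scope(target: str,
             in_scope_patterns: Iterable[str],
             out_of_scope_patterns: Iterable[str] = ()) -> bool:
    """True only if the host matches an in-scope pattern and no exclusion."""
    host = hostname_of(target)
    if not host:
        return False
    # Exclusions win over inclusions, so a carve-out is never scanned.
    if any(fnmatchcase(host, p.lower()) for p in out_of_scope_patterns):
        return False
    return any(fnmatchcase(host, p.lower()) for p in in_scope_patterns)


# Placeholder scope for illustration.
allowed = ["*.example.com"]
excluded = ["admin.example.com"]
print(in_scope("https://api.example.com/login", allowed, excluded))  # True
print(in_scope("admin.example.com", allowed, excluded))              # False
```

Note that `*.example.com` matches any depth of subdomain but not the bare `example.com`, so an apex domain has to be listed explicitly if it is in scope.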

Why it's here

Scope Sentinel is the working version of a thesis I bring into commercial conversations: that AI's real leverage is helping one disciplined operator scale the rigor of their own work. The same multi-agent architecture, MCP integration, and human-in-the-loop design pattern apply far beyond security research.

Stack: Python · TypeScript · Claude (multi-agent: planner · researcher · report-drafter) · MCP → Notion · Drive · Gmail · nuclei · subfinder · httpx
/ 04 Live

GitHub Validator

A safety dossier on any public GitHub repository — built so I can vet unfamiliar code before I clone or install it.

Investigation

Before I pull an unfamiliar repo onto my machine, I want a fast read on two separate questions: is this code likely to harm my system, and is the project actually maintained or quietly abandoned. I paste a public repo URL and the tool pulls the repo's GitHub metadata, scans the source for known risky patterns, checks for known vulnerabilities against the OSV database, and runs a semantic check on whether the repo fits what I said I wanted it for. It returns a single-page dossier that deliberately splits a Risk Score ("will this code hurt me") from a maintenance Hygiene Score ("does the maintainer take security seriously"). Every run is appended to a Google Sheet, so I keep a running history of everything I've checked.

What's operating
  • Input is a public GitHub repo URL (github.com/owner/repo) — it validates the source repository, not the published registry package, which is a deliberate scope choice
  • Pulls GitHub metadata: stars, forks, last-commit date, account age, and a bus-factor read for single-maintainer risk
  • Safety scanner flags risky source patterns (e.g. eval, obfuscation, suspicious exec/network calls) and rolls them into a Risk Score
  • Vulnerability lookup against the OSV.dev open-source vulnerability database
  • Maintenance Posture check: detects SECURITY.md, CODEOWNERS, Dependabot/Renovate config, and a tests directory, scored as a Hygiene Score (0–100) with a tier label (strong / moderate / weak / minimal)
  • Claude-powered semantic match plus automatic logging — each validation writes a row to a Google Sheet through a token-authenticated Apps Script webhook (timestamp, repo, scores, flags, summary, archival status); installable as a PWA with an offline shell
  • Planned — client-side request cooldown and fetch timeouts, wiring the Hygiene Score columns into the Sheet log, and optional edge rate-limiting/WAF (gated behind a paid host plan)
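The Maintenance Posture check in the list above can be sketched as a weighted checklist over a repository's file paths. The four signals, the 0 to 100 range, and the tier labels come from the text. The equal weights, the tier thresholds, the file names matched, and the function names are assumptions, and the live app is JavaScript, so this Python version is only an illustration.

```python
from typing import Iterable

# Hypothetical equal weights; the real scoring may differ.
HYGIENE_WEIGHTS = {
    "security_md": 25,
    "codeowners": 25,
    "dependency_bot": 25,
    "tests_dir": 25,
}
# Hypothetical thresholds: (minimum score, tier label), highest first.
TIERS = ((75, "strong"), (50, "moderate"), (25, "weak"), (0, "minimal"))

BOT_CONFIGS = {"dependabot.yml", "dependabot.yaml", "renovate.json", ".renovaterc"}


def detect_signals(paths: Iterable[str]) -> dict[str, bool]:
    """Derive the four hygiene signals from a list of repo file paths."""
    lowered = [p.lower() for p in paths]
    names = {p.rsplit("/", 1)[-1] for p in lowered}
    return {
        "security_md": "security.md" in names,
        "codeowners": "codeowners" in names,
        "dependency_bot": bool(names & BOT_CONFIGS),
        "tests_dir": any(p.startswith(("tests/", "test/")) for p in lowered),
    }


def hygiene_score(signals: dict[str, bool]) -> tuple[int, str]:
    """Return (score 0-100, tier label) for the detected signals."""
    score = sum(w for key, w in HYGIENE_WEIGHTS.items() if signals.get(key))
    tier = next(label for floor, label in TIERS if score >= floor)
    return score, tier


paths = ["SECURITY.md", ".github/dependabot.yml", "tests/test_app.py", "src/app.py"]
print(hygiene_score(detect_signals(paths)))  # prints: (75, 'strong')
```

The function returns only the Hygiene Score, so it cannot be blended with the Risk Score by accident.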

Two Scores, Not One

The most useful decision was refusing to collapse everything into a single trust number. "Is this code dangerous" and "is this project maintained" are genuinely different questions, and a blended score hides the one you actually need in the moment. So the dossier reports them separately: Risk on one axis, maintenance Hygiene on the other.

Stack: Single-page JavaScript app · Tailwind CSS · HTML · PWA (service worker + manifest) · GitHub REST API · OSV.dev · Claude API (semantic match) · Google Apps Script · Google Sheets (logging) · Netlify
Live: enchanting-llama-294166.netlify.app · Repo private / not published
/ 05 In Daily Use

Second Brain

A personal knowledge vault on Google Drive that Claude reads as working memory — so my AI assistant starts every session already knowing my projects, decisions, and context instead of from zero.

Investigation

Every time I started a new chat with an AI, I re-explained who I am, what I'm building, and what we'd already decided — and the moment the conversation ended, all of that was gone. The thesis is that a second brain shouldn't be a passive archive I search through; it should be the AI's working memory, structured so an agent can read it on demand and write back to it. So I built a Drive-based vault organised by entity type (projects, people, tools, concepts, decisions) where every note opens with consistent metadata, an operating manual tells Claude how to navigate it, and session rituals make sure nothing useful disappears between conversations. I parsed my own full chat-export history into it, then synthesised the raw material into linked wiki pages. It now runs as the context layer behind how I actually work day to day.

What's operating
  • Ingested and parsed my complete AI chat export — 242 conversations — and bucketed them into 19 topic areas as raw source material
  • Synthesised that into ~33 cross-linked wiki pages (projects, tools, people, concepts, decisions), each with structured metadata and [[wikilinks]] between related entities
  • A CLAUDE.md operating manual that loads at the start of every session — identity, hard rules, a topic-routing table, and session start/end protocols
  • Split into CLAUDE.md (instructions only) and Home.md (reference data) so the manual stays lean
  • A command-center dashboard tracking open loops, bottlenecks, and action items across every project, refreshed each session
  • A template library and per-session logging plus weekly-review folders so the vault stays consistent and doesn't rot
  • Ingest and automation scripts (chat-export parsing, daily digest, VPS setup) staged in the vault — scheduled unattended runs are planned once the VPS is provisioned
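The structured metadata and [[wikilinks]] described in the list above are what make the vault machine-readable. A minimal sketch of reading one note, assuming flat `key: value` frontmatter between `---` lines and Obsidian-style links with optional `#heading` and `|alias` parts. The field names `type` and `status`, the function names, and the sample note are hypothetical. A real vault with nested YAML would need a proper YAML parser.

```python
import re

# Captures the link target and ignores any "#heading" or "|alias" suffix.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_note(text: str) -> tuple[dict[str, str], str]:
    """Split a note into (frontmatter, body); flat key: value pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def wikilinks(body: str) -> list[str]:
    """Link targets in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(body):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


note = (
    "---\n"
    "type: project\n"
    "status: active\n"
    "---\n"
    "Uses [[TimesFM]] and [[Hyperliquid SDK|the SDK]]. See [[TimesFM#Quantiles]].\n"
)
meta, body = split_note(note)
print(meta)             # prints: {'type': 'project', 'status': 'active'}
print(wikilinks(body))  # prints: ['TimesFM', 'Hyperliquid SDK']
```

With every note reduced to a metadata dict and a list of link targets, an agent can follow links between entities without reading the whole vault.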

Memory, not archive

The shift that made it work was treating the vault as the AI's working memory rather than my filing cabinet: organised for an agent to use, not just for me to search. The honest limitation: it's only as alive as the discipline around it, which is why the session rituals and review cadence matter more than the file count.

Stack: Google Drive · Markdown · YAML frontmatter · Obsidian · Claude Code · MCP · Python · rclone · Hetzner VPS (planned)
Get in touch

Let's talk about what you're building.

ryan@gruponugara.com
LinkedIn · Colombo, Sri Lanka · +94 (available on request)