Newton (Nick) Duong

About Me

Building intelligent systems that transform enterprise operations

Enterprise AI Architect with deep expertise in generative AI systems, cloud-native architecture, and the full ML model lifecycle. Recent accomplishments include designing RAG/GraphRAG architectures with vector databases (ChromaDB, Neo4j, FAISS), deploying multi-agent AI systems using GCP ADK with Agent-to-Agent (A2A) patterns, and architecting LLM-powered security analysis platforms with Gemini 3.0.

Skilled at defining integration patterns across LLMs, vector databases, APIs, microservices, and event-driven workflows. Technical leadership experience spans architecture governance, cross-functional design reviews, and mentoring engineering teams on AI best practices. Committed to embedding responsible AI, data governance, and security requirements into scalable, production-ready solutions.

Languages: Python, Java, TypeScript, JavaScript

Gen AI & LLMs: Claude (Opus/Sonnet), Claude Code, Gemini 3.0, ChatGPT, Llama

Multi-Agent & RAG: LangChain, LangGraph, GCP ADK (A2A), RAG/GraphRAG

Frameworks: Spring Boot, ReactJS, NodeJS, REST API, IBM API Connect

Vector DBs: ChromaDB, Neo4j, FAISS

ML & MLOps: PyTorch, Vertex AI, MLflow, LLMOps

Cloud: AWS, Azure, GCP (Vertex AI, Cloud Run)

DevOps & CI/CD: GitHub Actions, AWS CodePipeline, Jenkins, Docker, K8s, Terraform

Databases: DuckDB, Oracle, PostgreSQL, MongoDB, Redis, SQLite

Projects

A selection of enterprise AI and cloud architecture work

MISO Energy Trading Platform

Multi-agent AI system for energy market analysis featuring DAG-based workflow orchestration, PyTorch LSTM-based ML models with MLflow tracking, and graph analysis using DuckDB with Property Graph Query extensions.
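As a rough illustration of what DAG-based workflow orchestration means here, the sketch below runs tasks in dependency order using Python's standard-library graphlib. It is not the platform's actual code: the task names, the price figures, and the run_workflow helper are all hypothetical and exist only to show the pattern.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished.

    tasks: maps a task name to a function that receives the results so far.
    dependencies: maps a task name to the set of task names it depends on.
    """
    results = {}
    # static_order() yields every node after its predecessors and raises
    # graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
    return results


# Hypothetical three-step pipeline; the names and numbers are made up.
tasks = {
    "load_prices": lambda r: [30.0, 32.0, 31.0],
    "forecast": lambda r: sum(r["load_prices"]) / len(r["load_prices"]),
    "report": lambda r: f"forecast={r['forecast']:.1f}",
}
dependencies = {
    "load_prices": set(),
    "forecast": {"load_prices"},
    "report": {"forecast"},
}

if __name__ == "__main__":
    print(run_workflow(tasks, dependencies)["report"])
```

A production orchestrator would typically add retries, parallel execution of independent branches, and experiment tracking around each step; this sketch covers only the ordering guarantee.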

mlpartnership.com mlpartnersllc.com

Medical Billing System - AWS

Medical billing system with a React frontend (CloudFront), a Python API server (Lambda), and a SQLite database. The Audit tab uses Claude (Sonnet 4) and other LLMs to audit claims.

Live Demo Architecture

Medical Billing System - Self-Hosted

Medical billing system with a React frontend, a Python API server, and a PostgreSQL database. The Audit tab uses an Ollama-hosted LLM to audit claims.

Live Demo Architecture

Google Cloud RAG - Cloud Run

Retrieval Augmented Generation systems running on Google Cloud Run with multiple deployment configurations.

RAG v1 RAG v2 Architecture

RAG Systems

Retrieval Augmented Generation systems for enterprise knowledge management, combining vector databases with LLM capabilities including resume RAG and ASOP RAG implementations.
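The retrieve-then-generate flow behind these systems can be sketched in a few lines. The example below is only a toy under stated assumptions: embed is a bag-of-words stand-in for a real embedding model (such as MiniLM or BGE-M3), an in-memory list stands in for the vector database, the sample documents are invented, and the assembled prompt would be sent to an LLM in a real deployment.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Invented sample documents, for illustration only.
docs = [
    "The billing API runs on AWS Lambda",
    "The frontend is served through CloudFront",
    "Claims are stored in a SQLite database",
]

if __name__ == "__main__":
    print(build_prompt("where are claims stored", docs, k=1))
```

Swapping embed for a real model and the list for a vector store changes the quality of retrieval but not the overall shape of the pipeline.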

Resume RAG ASOP - MiniLM ASOP - BGE-M3

Context/Cache Systems

Context-Augmented and Cache-Augmented Generation systems for enterprise knowledge management, implemented with different embedding models.

Context AG Cache AG Architecture

Resume

Professional resume detailing 20+ years of experience in enterprise architecture and technical leadership.

View Resume

OpenWebUI

Open WebUI installed on a Dell Xeon server (64 GB RAM) and a Mac Mini M4 (16 GB) to compare response times for Ollama models.
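A response-time comparison of this kind can be approximated with a small timing harness. The sketch below is an assumption about how such a measurement might be done, not the setup actually used: median_latency and compare are hypothetical helpers, and the two backends merely sleep in place of a real request to an Ollama instance on each machine.

```python
import statistics
import time


def median_latency(generate, prompt, runs=5):
    """Return the median wall-clock seconds generate(prompt) takes."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(backends, prompt, runs=5):
    """Map each backend name to its median latency, fastest first."""
    results = {
        name: median_latency(fn, prompt, runs) for name, fn in backends.items()
    }
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Stand-in backends that only sleep. In practice each callable would send the
# prompt to the model server on one machine and wait for the full reply.
fake_backends = {
    "host_a": lambda prompt: time.sleep(0.02),
    "host_b": lambda prompt: time.sleep(0.001),
}

if __name__ == "__main__":
    for name, seconds in compare(fake_backends, "Hello").items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

Using the median rather than the mean keeps a single slow run, such as the first call that loads the model into memory, from dominating the result.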

Dell Server Mac Mini M4 Diagram

Ollama Chatbot

Custom-built enterprise chatbot leveraging Ollama's open-source AI models for natural language processing and contextual responses.

Explore Diagram

CI/CD Pipeline - duong.casa

This site is an example of continuous integration and continuous delivery (CI/CD) with automated SSH deployment.

GitHub Diagram

React Demo

Sample React demo showcasing modern frontend development practices.

Demo Architecture

MCP (Java SDK)

Model Context Protocol (MCP) system implemented with the MCP Java SDK for time retrieval and other integrations.

Learn More

Hybrid ML Training Infrastructure

Distributed setup combining high-memory batch processing with GPU-accelerated inference

Dell PowerEdge T140

Data pipelines, ETL processing, and large batch ML training

Intel Xeon E-2226G @ 3.40 GHz, 64 GB RAM, 1 TB HD, Rocky Linux 9.5, NGINX

Mac Mini M4

ML inference and model serving - 3-4x faster than CPU-only training

Apple M4 chip, 10-core GPU, 16 GB unified memory, MPS acceleration, macOS Sequoia 15.4