Nick Duong

Newton (Nick) Duong

AI Architect & Enterprise Systems Leader

About Me

Building intelligent systems that transform enterprise operations

Enterprise AI Architect with deep expertise in generative AI systems, cloud-native architecture, and the full ML model lifecycle. Recent accomplishments include designing RAG/GraphRAG architectures with vector databases (ChromaDB, Neo4j, FAISS), deploying multi-agent AI systems using GCP ADK with Agent-to-Agent (A2A) patterns, and architecting LLM-powered security analysis platforms with Gemini 3.0.

Skilled at defining integration patterns across LLMs, vector databases, APIs, microservices, and event-driven workflows. Technical leadership experience spans architecture governance, cross-functional design reviews, and mentoring engineering teams on AI best practices. Committed to embedding responsible AI, data governance, and security requirements into scalable, production-ready solutions.

Languages: Python, Java, TypeScript, JavaScript
Gen AI & LLMs: Claude (Opus/Sonnet), Claude Code, Gemini 3.0, ChatGPT, Llama
Multi-Agent & RAG: LangChain, LangGraph, GCP ADK (A2A), RAG/GraphRAG
Frameworks: Spring Boot, ReactJS, NodeJS, REST API, IBM API Connect
Vector DBs: ChromaDB, Neo4j, FAISS
ML & MLOps: PyTorch, Vertex AI, MLflow, LLMOps
Cloud: AWS, Azure, GCP (Vertex AI, Cloud Run)
DevOps & CI/CD: GitHub Actions, AWS CodePipeline, Jenkins, Docker, K8s, Terraform
Databases: DuckDB, Oracle, PostgreSQL, MongoDB, Redis, SQLite

Projects

A selection of enterprise AI and cloud architecture work

MISO Energy Trading Platform

Multi-agent AI system for energy market analysis featuring DAG-based workflow orchestration, PyTorch LSTM-based ML models with MLflow tracking, and graph analysis using DuckDB with Property Graph Query extensions.
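
As a rough illustration of what DAG-based workflow orchestration means here, the sketch below runs a handful of tasks in dependency order using Python's standard `graphlib` module. It is not the platform's actual code: `run_dag`, the task names, and the sample prices are all hypothetical, and a real system would add retries, parallelism, and agent calls.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task after its upstream tasks and collect the results.

    tasks: maps a task name to a callable taking a dict of upstream results.
    deps:  maps a task name to the set of task names it depends on.
    (Both structures are illustrative assumptions, not the platform's API.)
    """
    results = {}
    # static_order() yields every node with its predecessors first.
    for name in TopologicalSorter(deps).static_order():
        upstream = {dep: results[dep] for dep in deps.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical three-step workflow: load data, forecast, report.
tasks = {
    "load_prices": lambda up: [30.0, 32.5, 31.0],
    "forecast": lambda up: sum(up["load_prices"]) / len(up["load_prices"]),
    "report": lambda up: f"mean price {up['forecast']:.2f}",
}
deps = {"forecast": {"load_prices"}, "report": {"forecast"}}

print(run_dag(tasks, deps)["report"])
```

Keeping the dependency graph as plain data makes it easy to add a step without touching the runner.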

Medical Billing System - AWS

Medical Billing System with React frontend (CloudFront), Python API server (Lambda), and SQLite DB. Audit tab uses Claude (Sonnet 4) and other LLMs to audit claims.

Medical Billing System - Self-Hosted

Medical Billing System with React frontend, Python API server, and Postgres DB. Audit tab uses an Ollama-hosted LLM to audit claims.

Google Cloud RAG - Cloud Run

Retrieval Augmented Generation systems running on Google Cloud Run with multiple deployment configurations.

RAG Systems

Retrieval Augmented Generation systems for enterprise knowledge management, combining vector databases with LLM capabilities including resume RAG and ASOP RAG implementations.
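
The retrieve-then-prompt pattern behind these systems can be sketched in a few lines. This is a toy under stated assumptions: word counts stand in for a real embedding model, an in-memory list stands in for a vector database such as ChromaDB or FAISS, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical rather than taken from the projects.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base entries.
docs = [
    "python spring boot experience",
    "terraform and docker pipelines",
    "favorite hiking trails",
]
print(build_prompt("docker pipelines", docs, k=1))
```

Swapping `embed` for a real embedding model and `retrieve` for a vector-store query leaves the prompt assembly step unchanged.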

Context/Cache Systems

Context and Cache Augmented Generation systems for enterprise knowledge management with different embedding model implementations.

Resume

Professional resume detailing 20+ years of experience in enterprise architecture and technical leadership.

OpenWebUI

Open WebUI installed on a Dell Xeon server (64 GB RAM) and a Mac Mini M4 (16 GB) to compare response times for Ollama models.
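
One simple way to make such a comparison repeatable is to time the same prompt several times on each host and compare medians. The helper below is a minimal sketch, not the method used for this project: `median_latency` is a hypothetical name, and `generate` is assumed to be any callable that sends a prompt to one host's model and blocks until the reply arrives.

```python
import statistics
import time


def median_latency(generate, prompt, repeats=3):
    """Return the median wall-clock seconds for generate(prompt)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        generate(prompt)  # assumed to block until the full response is ready
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in for a real model call, so the sketch runs without a server.
def fake_model(prompt):
    return prompt.upper()


print(f"{median_latency(fake_model, 'hello'):.6f} s")
```

The median is used instead of the mean so that a single slow first call, such as a model being loaded into memory, does not dominate the result.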

Ollama Chatbot

Custom-built enterprise chatbot leveraging Ollama's open-source AI models for natural language processing and contextual responses.

CI/CD Pipeline - duong.casa

This site is an example of continuous integration and continuous delivery (CI/CD) with automated SSH deployment.

React Demo

Sample React JS demo showcasing modern frontend development practices.

MCP (Java SDK)

Model Context Protocol (MCP) system implemented with the MCP Java SDK, providing time retrieval and other integrations.

Hybrid ML Training Infrastructure

Distributed setup combining high-memory batch processing with GPU-accelerated inference

Dell PowerEdge T140

Data pipelines, ETL processing, and large batch ML training

Intel Xeon E-2226G @ 3.40 GHz, 64 GB RAM, 1 TB HD, Rocky Linux 9.5, NGINX

Mac Mini M4

ML inference and model serving - 3-4x faster than CPU-only training

Apple M4 chip, 10-core GPU, 16 GB unified memory, MPS acceleration, macOS Sequoia 15.4

Contact Me

Have a question? I'd love to hear from you