System Architecture
Technical overview of the Munin research assistant system.
High-Level Overview
```
                  +---------------------+
                  |        Users        |
                  +----------+----------+
                             |
                             v
                  +---------------------+
                  |  Cloudflare Tunnel  |
                  |    (Zero Trust)     |
                  |   Email OTP Auth    |
                  +----------+----------+
                             |
                             v
+-----------------------------------------------------------------------------+
|                                MUNIN CLUSTER                                |
|                                                                             |
|  +-------------------+     +-------------------+     +-------------------+  |
|  |    Open WebUI     |---->|   Retrieval API   |---->|      Qdrant       |  |
|  |     (Chat UI)     |     |    (RAG Layer)    |     |    (Vector DB)    |  |
|  |     Port 3000     |     |     Port 8080     |     |     Port 6333     |  |
|  +---------+---------+     +---------+---------+     +-------------------+  |
|            |                         |                                      |
|            |                         v                                      |
|            |               +-------------------+     +-------------------+  |
|            |               |      SearXNG      |     |       Neo4j       |  |
|            |               |   (Web Search)    |     |    (Citations)    |  |
|            |               |     Port 8888     |     |     Port 7474     |  |
|            |               +-------------------+     +-------------------+  |
|            v                                                                |
|  +-------------------+                                                      |
|  |       vLLM        |   Personas:                                          |
|  |     Port 8000     |   - Chatgeti  (research)                             |
|  |                   |   - Codegeti  (code)                                 |
|  | Qwen3-30B-A3B-AWQ |   - Writegeti (writing)                              |
|  |   (MoE 30B/3B)    |                                                      |
|  +-------------------+                                                      |
|                                                                             |
|  GPU 1 (RTX 5090) - 90% VRAM                                                |
+-----------------------------------------------------------------------------+
```
Persona Comparison
| Persona | Purpose | Thinking Mode | Tools Available |
|---|---|---|---|
| Chatgeti | Research assistant, paper analysis, literature discovery | Enabled | Paper Search, Citations, Academic Search (Semantic Scholar, PubMed), Web Search |
| Codegeti | Code assistant, scientific Python, SLURM scripts | Enabled | Web Search, Date/Time |
| Writegeti | Academic writing, scientific communication, editing | Enabled | Web Search, Date/Time |
All personas use the same underlying model: Qwen3-30B-A3B-Thinking-AWQ (MoE architecture, always-thinking).
Typical workflow: discover and analyze papers with Chatgeti -> implement methods with Codegeti -> write up the results with Writegeti.
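Since all three personas share one vLLM backend, switching personas is a matter of changing the system prompt in an OpenAI-compatible chat request. The sketch below illustrates that under stated assumptions: the persona prompts here are placeholders (the real ones live in the Open WebUI persona configuration), and the served model name may differ from the label used here.

```python
import json

# Hypothetical persona prompts -- illustrative only; the real prompts are
# configured in Open WebUI, not documented here.
PERSONAS = {
    "Chatgeti": "You are a research assistant for paper analysis and literature discovery.",
    "Codegeti": "You are a code assistant for scientific Python and SLURM scripts.",
    "Writegeti": "You are an editor for academic writing and scientific communication.",
}

def chat_payload(persona: str, question: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body for vLLM."""
    return {
        # All personas share the same underlying model served on port 8000.
        "model": "Qwen3-30B-A3B-Thinking-AWQ",
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": question},
        ],
    }

payload = chat_payload("Chatgeti", "Summarize recent work on MoE inference.")
print(json.dumps(payload, indent=2))
# To actually send it (from inside the cluster, since port 8000 is internal only):
# requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```

Because vLLM exposes the standard OpenAI chat API, any OpenAI-compatible client library can talk to it the same way.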
RAG (Retrieval-Augmented Generation) Flow
When you ask a question with RAG enabled, here's what happens:
1. Open WebUI receives your question and checks which knowledge bases are enabled.
2. The retrieval service queries all selected sources simultaneously (papers, web).
3. Your query is embedded using SPECTER (papers) and compared against stored vectors in Qdrant.
4. Top matching documents are ranked and assembled into context for the LLM, with source attribution.
5. The LLM receives your question + retrieved context and generates a grounded response with citations.
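The ranking-and-assembly step can be sketched as a small pure function. This is an illustration, not the retrieval service's actual code: the hit schema (`source`, `score`, `text`) and the `[n] (source)` attribution format are assumptions.

```python
# Minimal sketch of steps 4-5: rank hits from all sources by score and
# format the top matches with source attribution for the LLM prompt.

def assemble_context(hits: list[dict], top_k: int = 5) -> str:
    """Rank retrieved hits by score and format them as an attributed context block."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]
    blocks = [f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(ranked, 1)]
    return "\n\n".join(blocks)

# Example hits as they might come back from Qdrant and web search:
hits = [
    {"source": "papers", "score": 0.91, "text": "MoE layers route tokens to experts."},
    {"source": "web",    "score": 0.78, "text": "AWQ is a 4-bit weight quantization."},
    {"source": "papers", "score": 0.85, "text": "SPECTER embeds papers from title+abstract."},
]
context = assemble_context(hits, top_k=2)
# The LLM then receives question + context and can cite sources as [1], [2], ...
```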
Knowledge Bases
| Source | Content | Embeddings | Update Frequency |
|---|---|---|---|
| Papers | Scientific PDFs (title, abstract, metadata) | SPECTER (768 dim) | Manual upload |
| Web | Live web search results | N/A (real-time) | Real-time |
| User Docs | Your uploaded files | Open WebUI default | On upload |
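For the Papers source, the Qdrant collection must match SPECTER's output: 768-dimensional vectors compared by cosine similarity. A sketch of the corresponding collection config, in the shape of Qdrant's `PUT /collections/{name}` REST body; the collection name `papers` is an assumption:

```python
# Illustrative Qdrant collection config for the Papers knowledge base:
# 768-dim SPECTER vectors with cosine distance.
papers_collection = {
    "vectors": {
        "size": 768,          # SPECTER embedding dimension (see table above)
        "distance": "Cosine",
    },
}
# From inside the cluster (port 6333 is internal only):
# requests.put("http://localhost:6333/collections/papers", json=papers_collection)
```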
Model Configuration
| Parameter | Value | Notes |
|---|---|---|
| Model | Qwen3-30B-A3B-Thinking-AWQ-4bit | MoE architecture, always-thinking |
| Total Parameters | 30B | With 3B active per forward pass |
| Quantization | AWQ 4-bit | ~17GB model size |
| Context Window | 32,768 tokens | Constrained by RTX 5090 VRAM |
| Concurrent Requests | 3 | Balance between throughput and latency |
| Thinking Mode | Always enabled | Outputs reasoning in collapsible section |
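The 32,768-token window has to hold the question, the retrieved RAG context, the model's thinking output, and the answer, so oversized context must be caught before the request is sent. A rough budget check is sketched below; the 4-characters-per-token estimate and the 8,192-token reply reserve are heuristics for illustration, not values taken from the deployment.

```python
# Rough sketch: will question + context + a reserved reply budget fit the window?
CONTEXT_WINDOW = 32_768  # tokens, per the table above

def fits_in_context(question: str, context: str, reply_budget: int = 8_192) -> bool:
    """Estimate prompt tokens (~4 chars/token) and check against the window.

    The char-to-token ratio is a heuristic, not the model's real tokenizer.
    """
    est_prompt_tokens = (len(question) + len(context)) // 4
    return est_prompt_tokens + reply_budget <= CONTEXT_WINDOW

fits_in_context("What is MoE routing?", "x" * 4_000)  # small prompt: fits
```

In practice a retrieval layer would truncate or drop the lowest-ranked context blocks until this check passes.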
Service Ports
| Port | Service | Access |
|---|---|---|
| 3000 | Open WebUI | Via Cloudflare Tunnel |
| 6333 | Qdrant Vector DB | Internal only |
| 7474 / 7687 | Neo4j Graph DB | Internal only |
| 8000 | vLLM (all personas) | Internal only |
| 8070 | GROBID (PDF parsing) | Internal only |
| 8080 | Retrieval Service | Internal only |
| 8888 | SearXNG | Internal only |
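A quick way to verify the services above are up from inside the cluster is a plain TCP connect per port. The sketch below assumes all services are reachable on `localhost`; it checks only that something is listening, not that each service's own health endpoint (which varies per service) is healthy.

```python
import socket

# Ports taken from the Service Ports table above.
SERVICES = {
    "open-webui": 3000,
    "qdrant": 6333,
    "neo4j-http": 7474,
    "neo4j-bolt": 7687,
    "vllm": 8000,
    "grobid": 8070,
    "retrieval": 8080,
    "searxng": 8888,
}

def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# for name, port in SERVICES.items():
#     print(f"{name:12s} {'up' if is_listening(port) else 'down'}")
```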
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Edge | Cloudflare Tunnel + Access | Secure access, authentication, SSL |
| UI | Open WebUI | Chat interface, document upload |
| RAG | Custom FastAPI Service | Multi-source retrieval |
| Search | SearXNG | Privacy-respecting web search |
| Inference | vLLM | Fast LLM serving (OpenAI-compatible) |
| Model | Qwen3-30B-A3B-Thinking-AWQ | MoE architecture (30B/3B active) |
| Vectors | Qdrant | Semantic search |
| Graph | Neo4j | Citation relationships |
| Parsing | GROBID | Scientific PDF extraction |
| Embeddings | SPECTER, BGE-base | Text to vectors |
| Compute | SLURM | Job scheduling, GPU allocation |
| Containers | Docker Compose | Service orchestration |
User Types
| WebUI Users (Remote Researchers) | Cluster Users (SLURM/SSH Access) |
|---|---|
| Access: Cloudflare -> WebUI | Access: VPN -> SSH -> SLURM |
| Auth: Email OTP (30-day sessions) | Auth: Linux accounts + SSH keys |
| GPU: Via vLLM service | GPU: GPU 0 via SLURM jobs |
| Can: Chat with personas, use RAG, upload documents | Can: Submit SLURM jobs, run training/inference, SSH access |