Scenario

Target: NovaTech
Assessment type: Grey-box assessment focused on their AI infrastructure
Access: Network access to a segment hosting several AI-facing services

Systems:
- A public-facing customer assistant with HTTP API access
- An internal GitLab instance hosting AI project repositories
- Two knowledge base endpoints backed by RAG pipelines
- A SIEM server collecting AI interaction logs

Attack Surface

The AI attack surface is not monolithic; it consists of multiple layers.

The Orchestration Layer coordinates how requests flow through the system. Frameworks like LangChain, LangGraph, CrewAI, and AutoGen manage prompt construction, context windows, and multi-step reasoning. Each framework has characteristic behaviors and error messages that can be fingerprinted through interaction. In practice, these frameworks often embed the components shown in the diagram. RAG logic, tool definitions, and inference client calls are typically integrated within the orchestration code rather than existing as separate services.

- RAG: Fetches relevant context from vector databases before sending prompts to the model. While often implemented within orchestration frameworks, RAG functionality has distinct enumerable parameters.
- Agent Tools: Expose capabilities through protocols like Model Context Protocol (MCP) and include permission boundaries that can be tested.
- External Integration: Connects the AI to databases, file systems, and other agents via protocols like Google's Agent-to-Agent.
- Inference Server: Hosts the actual model and handles tokenization, generation, and response formatting. Examples include Ollama, vLLM, and TGI.
Reconnaissance Methods

Passive:
- Analyze HTTP headers
- Review public API documentation
- Examine source code repositories
- Mine job postings for tech stack hints

Active:
- Probe model knowledge cutoffs
- Tool enumeration
- RAG pipeline analysis via chunk boundary detection
- Permission testing

Model Layer:
- Information: Model identity (vendor, family, version), capability boundaries (context window, supported languages), training data characteristics (knowledge cutoff, domain expertise), and behavioral constraints (content policies, safety filters).
- Method: Knowledge probing, capability testing, and response pattern analysis.

RAG:
- Information: Embedding model identity, vector database type, chunking parameters (size, overlap, strategy), retrieval thresholds, and document sources.
- Method: Chunk boundary probing, embedding similarity analysis, and source citation extraction.

Agent:
- Information: Available tools and their schemas, permission boundaries, orchestration logic, and error handling behavior.
- Method: MCP schema extraction, tool invocation testing, and permission boundary probing.

Infrastructure:
- Information: Traditional web application information plus AI-specific details: API endpoints, rate limits, error message formats, and backend service identities.
- Method: HTTP header analysis, error message mining, and endpoint enumeration.

Passive Reconnaissance

HTTP Header

    offsec@kali:~$ curl -s -I http://192.168.50.21/
    HTTP/1.1 200 OK
    Server: nginx/1.24.0 (Ubuntu)
    ...
    X-Powered-By: NovaTech/2.1.0
    X-AI-Backend: OpenAI-GPT5.2
    X-RAG-Provider: ChromaDB

- X-AI-Backend: OpenAI's GPT-5.2 model
- X-RAG-Provider: ChromaDB as the vector DB for RAG functionality

API Endpoint

Probe common paths like /api/health, /api/status, etc.

    offsec@kali:~$ curl -s http://192.168.50.21/api/health | jq
    {
      "mcp_enabled": true,
      "model": "gpt-5.2-turbo",
      "rag_enabled": false,
      "service": "novatech-customer-assistant",
      "status": "healthy",
      ...
"version": "2.1.0" } offsec@kali:~$ curl -s -X POST http://192.168.50.21/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"messages":[{"role":"user","content":"Hello"}]}' | jq { "choices": [ { "finish_reason": "stop", "index": 0, "message": { "content": "Thank you for your inquiry. I'm the NovaTech Customer Assistant. This is a demonstration environment - the AI service is currently in maintenance mode. Please try again later or contact support.", "role": "assistant" } } ], "created": 1771625450, ... "model": "gpt-5.2-turbo", "object": "chat.completion", "rag_sources": null, "usage": { "completion_tokens": 50, "prompt_tokens": 297, "total_tokens": 347 } } Gathered information: The model provider and version (OpenAI GPT-5.2-turbo) The vector database (ChromaDB) The application version (NovaTech 2.1.0) MCP capability (enabled) Per-request token consumption OpenAI API compatibility Git Repo Enumeration Dependency Files like requirements.txt: offsec@kali:~$ cat support-assistant/requirements.txt # Project Aurora - AI Customer Support Assistant # Cloud-based architecture using Google Gemini API # Core LLM SDK google-generativeai>=0.8.0 # Agent Framework crewai>=0.41.0 # Vector Database Client pinecone-client>=3.0.0 # Google Cloud AI Platform (embeddings) google-cloud-aiplatform>=1.38.0 # Data Validation pydantic>=2.0 # Web Framework fastapi>=0.109.0 uvicorn>=0.27.0 ... offsec@kali:~$ cat code-reviewer/requirements.txt # Project Phoenix - AI Code Review Bot # Self-hosted architecture with GPU inference # Inference Server vllm>=0.6.0 # Agent Framework pyautogen>=0.2.0 # Vector Database pymilvus>=2.4.0 # Embeddings sentence-transformers>=2.3.0 # Code-specific Embeddings tree-sitter>=0.21.0 tree-sitter-python>=0.21.0 tree-sitter-javascript>=0.21.0 tree-sitter-go>=0.21.0 # Model Loading huggingface-hub>=0.20.0 transformers>=4.38.0 # Quantization autoawq>=0.2.0 ... 
Aurora (support-assistant) depends on google-generativeai (Gemini API), crewai (agent framework), and pinecone-client (managed vector database). These libraries act as client SDKs for external cloud services.

Phoenix (code-reviewer) depends on vllm (local inference server), pyautogen (agent framework), pymilvus (self-hosted vector database), and tree-sitter (AST parsing for code analysis). These dependencies operate entirely on local infrastructure, suggesting that both model inference and data storage are self-hosted.

Attack surface:
- Cloud-based: API key exposure, request interception, third-party dependencies
- Self-hosted: Local service security, access controls, infrastructure hardening

RAG Configuration

    offsec@kali:~$ cat support-assistant/config/rag.yaml
    # RAG Pipeline Configuration for Aurora Support Assistant
    # Optimized for customer documentation search

    chunking:
      strategy: "text"
      chunk_size: 512
      chunk_overlap: 100
      separator: "\n\n"

    embeddings:
      provider: "google"
      model: "text-embedding-004"
      dimensions: 768
      batch_size: 100

    retrieval:
      top_k: 5
      score_threshold: 0.75
      distance_metric: "cosine"
      rerank: false

    vector_store:
      provider: "pinecone"
      environment: "${PINECONE_ENVIRONMENT}"
      index_name: "${PINECONE_INDEX_NAME}"
      namespace: "customer-docs"
      metric: "cosine"
      pod_type: "p1.x1"
    ...

Aurora uses standard text chunking (512 characters) with Google's text-embedding-004 model.
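Knowing `chunk_size` and `chunk_overlap` makes it possible to estimate where chunk boundaries fall, which is what chunk boundary probing later tries to confirm from the outside. The sketch below assumes a plain fixed-size sliding window measured in characters; `chunk_spans` is a hypothetical helper, and a real splitter that honors the `"\n\n"` separator would shift these boundaries toward paragraph breaks.

```python
def chunk_spans(text_length, chunk_size=512, chunk_overlap=100):
    """Return (start, end) character offsets for a fixed-size sliding window.

    Defaults mirror the Aurora rag.yaml values. This models only the
    size/overlap arithmetic, not separator-aware splitting.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # 512 - 100 = 412 with the defaults
    spans = []
    start = 0
    while start < text_length:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end == text_length:
            break  # the last chunk reached the end of the document
        start += stride
    return spans


# A 1200-character document splits into three overlapping chunks.
print(chunk_spans(1200))  # [(0, 512), (412, 924), (824, 1200)]
```

Under these assumptions, each chunk after the first starts 412 characters after the previous one, so content placed near a multiple of 412 is likely to appear in two adjacent chunks.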
    offsec@kali:~$ cat code-reviewer/config/rag.yaml
    # RAG Pipeline Configuration for Phoenix Code Reviewer
    # Optimized for code semantic search

    chunking:
      strategy: "ast_aware"
      chunk_size: 1500
      chunk_overlap: 300
      languages:
        - python
        - javascript
        - typescript
        - go
        - java
        - rust

    code_parsing:
      extract_functions: true
      extract_classes: true
      extract_imports: true
      include_docstrings: true
      include_comments: true
      preserve_structure: true

    embeddings:
      provider: "huggingface"
      model: "Salesforce/codet5p-110m-embedding"
      dimensions: 256
      batch_size: 32
      device: "cuda:0"
      normalize: true

    reranker:
      enabled: true
      model: "BAAI/bge-reranker-base"
      top_k_rerank: 20

    retrieval:
      top_k: 10
      score_threshold: 0.65
      distance_metric: "IP"

    vector_store:
      provider: "milvus"
      host: "${MILVUS_HOST}"
      port: "${MILVUS_PORT}"
      collection_name: "${MILVUS_COLLECTION}"
      index_type: "IVF_FLAT"
      index_params:
        nlist: 1024
      search_params:
        nprobe: 16
      consistency_level: "Eventually"
    ...

Phoenix uses AST-aware chunking (1500 characters) with a code-specific embedding model (codet5p-110m). The larger chunk size and code-aware strategy indicate that Phoenix processes source code, not documents.

Agent Tools

Tool definitions reveal capabilities and framework choices.

    offsec@kali:~$ cat support-assistant/src/agents/tools.py
    ...
    from crewai import tool

    @tool
    def knowledge_search(query: str, department: str) -> str:
        """
        Search customer documentation and knowledge base.
        ...

    @tool
    def ticket_lookup(ticket_id: str) -> dict:
        """
        Look up support ticket status and history.
        ...

    @tool
    def escalate_ticket(ticket_id: str, reason: str) -> str:
        """Escalate ticket to human agent (create only)"""
        ...

These tools handle customer support workflows.
    offsec@kali:~$ cat code-reviewer/prompts/function_schemas.json
    {
      "functions": [
        {
          "name": "search_codebase",
          "description": "Semantic search across indexed repositories",
          "parameters": {"query": "string", "language": "string", "max_results": "int"}
        },
        {
          "name": "post_review_comment",
          "description": "Post inline comment on merge request",
          "parameters": {"repo": "string", "mr_id": "int", "file_path": "string", ...}
        },
        {
          "name": "run_security_scan",
          "description": "Run SAST security scanner on code (read-only)",
          ...
        }
      ]
    }

These functions enable code review.

System Prompts and Guardrails

These files define the AI's persona, capabilities, and restrictions.

    offsec@kali:~$ cat support-assistant/prompts/system.txt
    You are Aurora, NovaTech's official customer support AI assistant.

    ## Your Role
    You help customers with:
    - Product questions and feature explanations
    - Troubleshooting common issues
    - Support ticket creation and status updates
    ...

    ## Restrictions
    - DO NOT:
      - Promise features that are not yet released
      - Discuss pricing, discounts, or negotiate contracts
      - Share internal documentation or employee information
      - Compare NovaTech products to competitors
      - Discuss security vulnerabilities or ongoing incidents
    ...

    offsec@kali:~$ cat support-assistant/config/safety.yaml
    # Safety and Guardrails Configuration for Aurora

    # Gemini API Safety Settings
    gemini_safety_settings:
      HARM_CATEGORY_HARASSMENT: "BLOCK_MEDIUM_AND_ABOVE"
      HARM_CATEGORY_DANGEROUS_CONTENT: "BLOCK_MEDIUM_AND_ABOVE"
    ...
    blocked_topics:
      - "competitor comparisons"
      - "internal roadmap"
      - "security vulnerabilities"
      - "acquisition plans"
    ...
Supply Chain

    offsec@kali:~$ cat support-assistant/.env.example
    # Google Gemini API
    GOOGLE_API_KEY=your_google_api_key_here
    GOOGLE_PROJECT_ID=novatech-prod
    GOOGLE_REGION=us-central1

    # Pinecone Vector Database
    PINECONE_API_KEY=your_pinecone_api_key_here
    PINECONE_ENVIRONMENT=gcp-starter
    PINECONE_INDEX_NAME=aurora-prod

    # Application Settings
    LOG_LEVEL=INFO
    MAX_CONVERSATION_TURNS=50
    ENABLE_PII_DETECTION=true

    # Slack Integration (for escalations)
    SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/yyy/zzz
    SLACK_CHANNEL=#support-escalations
    ...

    offsec@kali:~$ cat code-reviewer/config/models.yaml
    # Model Configuration for Phoenix Code Reviewer
    inference_server:
      type: "vllm"
      host: "${VLLM_HOST}"
      port: "${VLLM_PORT}"
      ...
      gpu_memory_utilization: 0.92
      max_model_len: 32768
      tensor_parallel_size: 2
    ...
    primary_model:
      source: "huggingface"
      model_id: "Qwen/Qwen2.5-Coder-32B-Instruct"
      ...
      quantization:
        enabled: true
        method: "AWQ"
        bits: 4
        group_size: 128
      ...
      recommended_gpu: "2x NVIDIA A100 80GB"
    ...

Active Reconnaissance

AI capabilities in modern applications are typically embedded within web applications rather than exposed on dedicated ports. As a result, traditional port-based scanning is often insufficient. Instead, we probe for AI indicators in JavaScript configurations, API endpoints, and HTTP headers.

JavaScript exposes the API endpoint.

    offsec@kali:~$ curl -s http://192.168.50.31/js/chat-widget.js
    // NovaTech Chat Widget Configuration
    // Internal Use Only - Do Not Distribute
    (function() {
      window.__NOVATECH_CONFIG__ = {
        apiBase: "/api/v2",
        assistantEndpoint: "/api/v2/assistant",
        featureFlags: {
          enableAI: true,
          debugMode: false,
          legacySupport: true
        },
        timeout: 30000
      };
      console.log("NovaTech Helpdesk Widget Initialized");
    })();

The response reveals the provider, model, token counts, and other information.
    offsec@kali:~$ curl -s -X POST http://192.168.50.31/api/v2/assistant \
      -H "Content-Type: application/json" \
      -d '{"message": "Hello"}' | jq
    {
      "content": "How can I assist you today?",
      "metadata": {
        "provider": "ollama",
        "model": "llama3.2:1b",
        "latency_ms": 418,
        "created_at": "2026-02-20T22:44:55.365061897Z",
        "done": true,
        "done_reason": "stop",
        "load_duration": 176895153,
        "prompt_eval_count": 26,
        "prompt_eval_duration": 40254381,
        "eval_count": 8,
        "eval_duration": 195446762
      }
    }

Headers identify the API gateway.

    offsec@kali:~$ curl -sI http://192.168.50.31:8000/v1/billing
    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Length: 39
    Connection: keep-alive
    RateLimit-Reset: 23
    X-RateLimit-Remaining-Minute: 59
    X-RateLimit-Limit-Minute: 60
    RateLimit-Remaining: 59
    RateLimit-Limit: 60
    ...
    Server: kong/3.9.1
    X-Kong-Upstream-Latency: 10
    X-Kong-Proxy-Latency: 37
    Via: 1.1 kong/3.9.1

Model Identity

Identifying the underlying model reveals potential attack vectors because different model families exhibit distinct behaviors and weaknesses. However, models can be configured with system prompts, guardrails, and temperature settings that mask their identity.

Methods:
- Direct Identity Probing - Ask the model about its identity
- Contradiction Testing - Probe with false assertions to reveal training biases
- Context Window Testing - Measure memory limits through marker injection

Knowledge cutoff can be a reliable fingerprinting technique.

Model-Specific Behavior: Characteristic behaviors in response style, such as verbosity, code generation patterns, and refusal phrasing.

Capability Boundary Mapping: Evaluate a model's practical limits. Parameter count is a major factor.

Context Window Test: Inject a marker followed by filler text, then check whether the model can still recall the marker, to measure the effective window. Deployment tools like Ollama can configure context window size, so a measured limit may reflect a deployment setting rather than a model constraint.
In practice, however, most operators leave the default in place, and a mismatch between two endpoints still narrows the list of candidate models. As with all fingerprinting signals, context window results are most reliable when combined with other techniques. The knowledge cutoff is identical (early 2024), but context window size and behavioral patterns clearly differentiate the models. Identity probing provided direct confirmation in this case, though models can be configured to give misleading responses.
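The context window test described above can be automated as a binary search over filler size. The sketch below is illustrative only: `build_probe` and `find_recall_limit` are hypothetical helpers, `ask` stands for any caller-supplied function that sends a prompt to the target endpoint and returns the reply text, and the search assumes that recall fails monotonically once the marker is pushed out of the window. Filler is counted in words, which only approximates tokens.

```python
def build_probe(marker, filler_words):
    """Place a marker first, pad with filler, then ask for the marker back."""
    filler = " ".join(["lorem"] * filler_words)
    return (
        f"Remember this code: {marker}.\n"
        f"{filler}\n"
        "What was the code I asked you to remember?"
    )


def find_recall_limit(ask, marker="ZX-4417", low=0, high=200_000):
    """Return the largest filler size (in words) at which the marker is recalled.

    `ask` is a callable: prompt string in, response string out.
    Returns None if the marker is not recalled even with minimal filler.
    """
    if marker not in ask(build_probe(marker, low)):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # bias upward so the loop always progresses
        if marker in ask(build_probe(marker, mid)):
            low = mid   # still recalled: the limit is at least mid
        else:
            high = mid - 1  # forgotten: the limit is below mid
    return low
```

Each probe is a full request, so the search costs roughly 18 requests for the default range. Running it against two endpoints and comparing the results gives the kind of context window mismatch discussed above, which is best read alongside identity probing and knowledge cutoff answers rather than on its own.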