Security & Pipeline Architecture Decision Record (ADR)

1. Context and Constraints

The architecture is designed for a self-hosted homelab environment with internet-exposed applications. The development workflow heavily relies on AI coding agents. The system must satisfy the following constraints:

  • Language & Style: Exclusively TypeScript, using a highly functional, Domain-Driven Design architecture in the style of Scott Wlaschin's Domain Modeling Made Functional (pipeline composition, monads, and deep function chaining).
  • Infrastructure: Self-hosted Gitea/GitLab with Argo Workflows. Repositories are private.
  • Low Friction / Low Noise: Developer fatigue must be minimized. False positives are unacceptable, as they lead to ignoring security tools altogether.
  • AI Agent Defense: Must specifically catch AI-introduced flaws (hallucinated packages, insecure API usage, bypasses).
  • Low Maintenance: Avoid managing heavy databases or infrastructure just to run security scans.
  • Budget: Willing to pay for premium tools (e.g., ~$50/month) only if they significantly reduce developer friction and pipeline execution time.

2. Pipeline Execution Strategy: Argo Workflows

To maintain developer velocity (the "Friction" principle), pipeline feedback must be fast.

  • Decision: Utilize Argo Workflows natively configured with Directed Acyclic Graphs (DAGs) and Step-Level Memoization.
  • Trade-off Analyzed: Paid remote-caching CI runners (like RWX Mint, Dagger, or Earthly) vs. Argo.
  • Why Argo won: Argo natively supports parallel pod execution, so five distinct 1-minute security steps run simultaneously and the whole pipeline completes in roughly 1 minute instead of 5. Argo's native Step-Level Memoization skips a task when its inputs haven't changed, providing sufficient caching without the cost of enterprise CI tools.
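A minimal Workflow sketch of this layout (the step commands, cache ConfigMap name, and `revision` parameter are illustrative, not from the real cluster): the DAG tasks have no `dependencies`, so they fan out in parallel, and `memoize` keys each step on its inputs.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: security-scan-
spec:
  entrypoint: scan
  arguments:
    parameters:
      - name: revision
        value: main
  templates:
    - name: scan
      dag:
        tasks:                     # no dependencies -> all steps run in parallel
          - name: lint
            template: run-step
            arguments: { parameters: [{ name: cmd, value: "npx eslint ." }] }
          - name: iac
            template: run-step
            arguments: { parameters: [{ name: cmd, value: "checkov -d infra/" }] }
          - name: sast
            template: run-step
            arguments: { parameters: [{ name: cmd, value: "semgrep ci" }] }
    - name: run-step
      inputs:
        parameters:
          - name: cmd
      memoize:                     # skip the pod entirely if this key was seen
        key: "{{workflow.parameters.revision}}-{{inputs.parameters.cmd}}"
        maxAge: "24h"
        cache:
          configMap:
            name: scan-cache
      container:
        image: node:22
        command: [sh, -c]
        args: ["{{inputs.parameters.cmd}}"]
```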

3. The Security Stack (Defense in Depth)

Layer 1: Instant IDE Feedback (0-second delay)

  • Tool: eslint with eslint-plugin-security and @typescript-eslint.
  • Reasoning: Linters are "dumb" but instantaneous. They will catch AI agents generating immediately dangerous syntax (like eval() or unsafe Regex) before a commit is even made.
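A sketch of the corresponding flat config (`eslint.config.mjs`); assumes current versions of `typescript-eslint` and `eslint-plugin-security`, whose exported config names may differ across releases:

```javascript
// eslint.config.mjs -- sketch, not the project's actual config.
import tseslint from "typescript-eslint";
import security from "eslint-plugin-security";

export default tseslint.config(
  ...tseslint.configs.recommended,
  security.configs.recommended,
  {
    files: ["**/*.ts"],
    rules: {
      // Block the "immediately dangerous syntax" called out above.
      "no-eval": "error",
      "security/detect-unsafe-regex": "error",
      "security/detect-non-literal-regexp": "warn",
    },
  },
);
```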

Note (outdated): IaC scanning has since moved to Pulumi CrossGuard; the Checkov decision below is superseded.

Layer 2: Infrastructure as Code (IaC) Scanning

  • Tool: Checkov (Open Source)
  • Reasoning: Lightweight CLI tool to ensure the AI agents do not accidentally expose internal homelab ports to the internet or misconfigure container permissions.
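An illustrative pipeline invocation (the `infra/` path is an assumption): scan the IaC directory and fail the step on any misconfiguration, keeping output terse to respect the low-noise principle.

```shell
# Sketch: scan Kubernetes manifests and Dockerfiles in CI.
checkov -d infra/ \
  --framework kubernetes dockerfile \
  --compact --quiet
```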

Layer 3: Supply Chain Security

  • Tool: Socket.dev (Team Tier - $25/month)
  • Trade-off Analyzed: Standard npm audit / Dependabot vs. Socket.dev.
  • Why Socket won: AI agents are notorious for hallucinating package names, which leads to downloading typosquatted malware. Furthermore, standard tools alert on every CVE, causing massive alert fatigue. Socket provides Reachability Analysis—it maps the data flow and only alerts if the code actually calls the vulnerable function within a library. This drastically reduces false positives and justifies the $25/mo cost.

Layer 4: Static Application Security Testing (SAST)

  • Tool: Semgrep Code / Teams Tier ($30/month)
  • Trade-off Analyzed: Semgrep OSS / Free Tier vs. Semgrep Teams.
  • Why Semgrep Teams won: Standard SAST tools (including Semgrep OSS/Pro Engine) rely on strict AST Taint Analysis. Because this project uses a highly functional architecture (custom pipelines, generic monads like Result<T,E>, and .bind() chaining), traditional data-flow trackers easily lose the trace, leading to dangerous False Negatives.
  • The Deciding Feature: The Teams tier unlocks AI-Powered Detection (not just the AI Assistant for triage). Semgrep's AI reads the semantic intent of the code, successfully tracing data through complex functional wrappers where traditional scanners fail. This acts as an automated senior security reviewer reading the AI agent's PRs.

4. Alternatives Dismissed

Tool and Reason for Rejection:

  • CodeQL: Best-in-class taint analysis, but requires an exorbitant GitHub Advanced Security license for private repositories. Scans are also notoriously slow (compile-to-database architecture), violating the "low friction / fast pipeline" constraint.
  • SonarQube: Excellent for tech debt, but violates the "Low Maintenance" constraint. Requires spinning up and maintaining a dedicated Postgres database and Java server in the homelab. Generates too much noise out-of-the-box.
  • Snyk Code: Great UX, but lacks the ability to write custom rules. If the AI agent develops a specific bad habit unique to this codebase, Snyk cannot be easily tuned to block it.
  • Checkmarx / Veracode: Built for massive legacy enterprise compliance. Far too expensive, slow, and noisy for a modern, agile homelab setup.

Note (outdated): superseded; now using the Harvester default registry.

5. Future Considerations / Phase 2

  • Build Caching: If actual container build steps (docker build, npm install) become the bottleneck in Argo Workflows, evaluate adding open-source caching layers like Kaniko or BuildKit inside Argo pods before purchasing paid caching solutions.
  • Custom Semgrep Rules: If the AI agent repeatedly makes domain-specific logic errors (e.g., misusing a specific custom Monad), write lightweight custom Semgrep YAML rules to permanently block those specific anti-patterns.
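A sketch of what such a rule could look like (the `unsafeUnwrap` anti-pattern and file name are hypothetical, for illustration only):

```yaml
# rules/no-unsafe-result-unwrap.yaml -- hypothetical custom rule.
rules:
  - id: no-unsafe-result-unwrap
    languages: [typescript]
    severity: ERROR
    message: >
      Do not bypass the Result pipeline with unsafeUnwrap();
      handle the error branch explicitly.
    pattern: $RESULT.unsafeUnwrap(...)
```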

6. Detailed Analysis of Rejected Alternatives

When designing this pipeline, several industry-standard tools were evaluated. They were ultimately rejected because they violated one or more core constraints of this specific environment: Low Homelab Maintenance, Fast Pipeline Execution (Low Friction), Support for Functional Architectures, or Affordability for Private Repos.

Rejected: CodeQL (by GitHub)

  • What it is: A highly advanced, semantic code-scanning engine that treats code like a database to perform deep data-flow and taint analysis.
  • Why it was rejected (Dealbreakers):
    • The "Private Repo Tax": CodeQL is only free for public open-source projects. Because this homelab uses self-hosted private Gitea/GitLab repositories, using CodeQL would require purchasing GitHub Advanced Security (GHAS) enterprise licenses, which are prohibitively expensive for an individual.
    • Pipeline Friction: CodeQL requires a build phase. It has to compile the TypeScript code into a relational database before it can run queries. This adds significant time (often minutes) to the pipeline, violating the strict fast-feedback loop required for high-velocity AI agent development.

Rejected: SonarQube / SonarCloud

  • What it is: A holistic code quality and SAST platform that acts as a central dashboard for code smells, technical debt, and security vulnerabilities.
  • Why it was rejected (Dealbreakers):
    • High Infrastructure Maintenance: To run SonarQube for free on private code, it must be self-hosted. This requires standing up a dedicated Java-based application server and a PostgreSQL database in the homelab cluster. Managing, updating, and tuning this infrastructure directly contradicts the "Low Maintenance" constraint.
    • Noise and Focus: SonarQube heavily flags general "code smells" and style issues. In an AI-driven workflow, this generates massive alert fatigue. The goal is to catch critical security flaws and complex logic errors, not to argue with the scanner about whether an AI agent wrote a slightly redundant if statement.

Rejected: Snyk Code (SAST capabilities)

  • What it is: A developer-first, high-speed SAST tool powered by machine learning, famous for instant IDE feedback. (Note: Snyk was evaluated as a SAST alternative to Semgrep, whereas Socket was chosen for dependencies).
  • Why it was rejected (Dealbreakers):
    • The "Black Box" Limitation: Snyk does not allow users to write custom security rules. If the AI agent develops a bad habit specific to this project's unique functional domain-modeling architecture, there is no way to write a quick rule to block it. Semgrep's YAML rules allow for immediate, custom course-correction.
    • Scan Limits: The free tier heavily restricts the number of scans per month. Because AI agents often generate dozens of micro-commits and rapid iterations, the pipeline would frequently hit rate limits, blocking deployment.

Rejected: Legacy Enterprise Scanners (Checkmarx, Veracode, Fortify)

  • What they are: Heavyweight commercial application security testing platforms utilized by massive enterprises, banks, and governments for compliance auditing.
  • Why it was rejected (Dealbreakers):
    • Execution Speed: Historically known for extremely slow scan times (sometimes taking hours), completely destroying developer velocity and CI/CD parallelism.
    • Extreme Cost: Pricing starts in the tens of thousands of dollars.
    • High False Positives: Out-of-the-box, these tools are incredibly noisy and require dedicated AppSec teams to tune them to the specific application architecture.

Rejected: Paid CI/CD Execution Engines (RWX Mint, Dagger Cloud, Buildkite)

  • What they are: Next-generation CI platforms that offer advanced DAG execution, deep remote layer caching, and highly optimized parallel builds.
  • Why it was rejected (Dealbreakers):
    • Redundancy with Argo: While powerful, paying a premium for these platforms is unnecessary. Because the homelab already utilizes Kubernetes, Argo Workflows natively provides DAG parallel execution and step-level memoization for free. The compute is already paid for by the homelab hardware, making commercial CI orchestrators an unnecessary expense for this phase of the architecture.