Skip to content

Configuration Reference

Each Factory-managed project uses a factory.md file at its root. The CEO auto-generates this during discovery mode, but you can edit it manually.

Minimal Configuration

## Goal
One sentence describing what the project should achieve.

## Scope
### Modifiable
- src/**
- tests/**

## Guards
- Do not delete existing tests
- Do not modify files outside scope

## Eval
### Command
pytest --tb=short -q

### Threshold
0.8

All Sections

## Goal (required)

What the project should achieve. One sentence that guides the Strategist's hypotheses.

## Scope / Modifiable (required)

Glob patterns defining which files the factory may edit. Anything outside scope triggers a guard violation.

## Scope
### Modifiable
- src/**
- tests/**
- docs/**

## Guards (required)

Inviolable rules checked before every merge. Guard violations force a revert regardless of eval score.

## Guards
- Do not delete existing tests
- Do not modify files outside scope
- Do not remove error handling

## Eval / Command (required)

Shell command for running project evaluation. Must produce parseable output.

## Eval / Threshold

Minimum composite score to keep a change. Default: 0.8.

## Target Branch

Branch for experiment PRs. Default: main.

Set to a different branch (e.g. factory/dev) to stage all factory work separately:

## Target Branch
factory/dev

Override per-run: factory ceo ~/my-project --branch staging

## Hypothesis Budget

Controls hypothesis generation constraints per cycle. The Strategist clears as many backlog items as possible and adds at most max_new new items:

## Hypothesis Budget
- min_growth: 2
- max_new: 2
  • min_growth: Minimum hypotheses targeting growth dimensions (guaranteed, never cannibalized)
  • max_new: Maximum new items the Strategist may add to the backlog per cycle

Override per-run: factory ceo ~/my-project --min-growth 3 --max-new 1

## Project Eval

User-defined eval dimensions for domain-specific metrics:

## Project Eval
- name: benchmark_accuracy
  command: python eval/benchmark.py
  parse: json
  weight: 0.6
  timeout: 300
  description: Run benchmark and report accuracy
- name: response_latency
  command: python eval/latency_test.py
  parse: exit_code
  weight: 0.4

See Eval System for details on parse formats and scoring.

## Eval Weights

Custom weight distribution across the three eval tiers:

## Eval Weights
- hygiene: 0.25
- growth: 0.25
- project: 0.50

Default when project eval is present: 0.30 / 0.20 / 0.50. Without project eval: 0.50 / 0.50.

## Smoke Test

An e2e verification command that must pass before any change is kept:

## Smoke Test
```bash
curl -sf http://localhost:8000/health
Good smoke tests are fast (under 30s), test the core user flow, and catch integration issues that unit tests miss.

### `## Constraints`

Soft rules that guide behavior but don't block merges:

```markdown
## Constraints
- Prefer small, focused changes over large refactors
- Add tests for any new public function

## Research Target

Only for research/benchmark projects. Defines the metric to improve iteratively. When present, auto-detection routes to research mode instead of improve mode.

## Research Target
- objective: maximize SWE-bench resolve rate
- metric: resolved/total
- target: 0.35
- run_command: python run_benchmark.py
- result_path: results/output.json
- result_parser: json
- timeout: 3600
Field Purpose
objective Human-readable description of the research goal
metric Key to extract from results (JSON path or regex)
target Goal value — experiments stop when this is reached
run_command Shell command to execute the benchmark/evaluation
result_path Where the run command writes results
result_parser How to parse results: json, regex, or exit_code
timeout Maximum seconds for the run command

## Mutable Surfaces

Files the Builder is allowed to modify during research experiments. One glob pattern per line. Only used in research mode.

## Mutable Surfaces
- src/**/*.py
- config/*.yaml

## Fixed Surfaces

Ground truth files, test data, and eval infrastructure. These are fingerprinted for leakage detection and must never be modified. One glob pattern per line. Only used in research mode.

## Fixed Surfaces
- tests/gold/*.json
- eval/**/*.py
- data/benchmark/*.jsonl

## Research Constraints

Additional rules for the research loop. Only used in research mode.

## Research Constraints
- Do not use GPT-4 (cost constraint)
- Each experiment must complete within 30 minutes

## Cost Budget

Per-cycle or total budget constraints for research experiments.

## Cost Budget
$5/cycle, $50 total

.factory/ Directory

Generated at runtime by the factory. Add to .gitignore — do not edit manually:

.factory/
├── config.json              # Parsed from factory.md
├── eval_profile.json        # Discovered eval dimensions
├── results.tsv              # Append-only experiment history
├── events.jsonl             # Structured event log
├── performance_report.json  # Aggregated verdicts, observations, stats
├── experiments/
│   └── 001/
│       ├── hypothesis.md
│       ├── eval_before.json
│       ├── eval_after.json
│       ├── changes.diff
│       └── verdict.json
├── strategy/
│   ├── current.md
│   ├── observations.md
│   ├── backlog.md
│   └── insights.md
├── reviews/
│   ├── <role>-latest.md
│   └── ceo-verdict-<role>.md
├── archive/                 # Archivist notes
│   ├── experiments/
│   ├── strategies/
│   ├── sources/
│   └── patterns/
└── agents/                  # Per-project prompt overrides

Environment Variables

The Factory spawns Claude Code as subprocesses — it does not call the Claude API directly. No external services are required beyond Claude Code itself.

Variable Purpose Default
FACTORY_PROJECTS_DIR Parent directory for projects created from prompts ~/factory-projects
FACTORY_MODEL Model override for agent subprocesses (Claude Code default)
FACTORY_PLAYBOOKS_DIR Directory for ACE-evolved agent playbooks ~/.factory/playbooks
FACTORY_REGISTRY_DIR Override global registry location ~/.factory
FACTORY_RUNNER CLI backend: claude or bob claude

See Setup Guide — Environment Variables for the full list, including Claude Code authentication, Bob Shell, notifications, and advanced CEO options.