Configuration Reference¶
Each re:factory-managed project uses a factory.md file at its root. The CEO auto-generates this during discovery mode, but you can edit it manually.
Minimal Configuration¶
```markdown
## Goal
One sentence describing what the project should achieve.

## Scope
### Modifiable
- src/**
- tests/**

## Guards
- Do not delete existing tests
- Do not modify files outside scope

## Eval
### Command
pytest --tb=short -q
### Threshold
0.8
```
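To make the layout of the minimal file concrete, here is a rough Python sketch that groups the lines of a factory.md under their headings. It is not re:factory's parser: the function name `parse_sections` and the `Scope/Modifiable` key format are hypothetical, chosen only for illustration.

```python
def parse_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty body lines of a factory.md under their headings.

    Keys look like "Goal" or "Scope/Modifiable" (a hypothetical naming
    scheme, not something re:factory defines).
    """
    sections: dict[str, list[str]] = {}
    top = None      # most recent "## " heading
    current = None  # key that body lines are currently filed under
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("### "):
            current = f"{top}/{line[4:].strip()}"
            sections.setdefault(current, [])
        elif line.startswith("## "):
            top = line[3:].strip()
            current = top
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


MINIMAL = """\
## Goal
One sentence describing what the project should achieve.
## Scope
### Modifiable
- src/**
- tests/**
## Guards
- Do not delete existing tests
- Do not modify files outside scope
## Eval
### Command
pytest --tb=short -q
### Threshold
0.8
"""

config = parse_sections(MINIMAL)
print(config["Scope/Modifiable"])
print(config["Eval/Command"])
```

Subsection bodies end up under the combined key, so `config["Eval/Threshold"]` holds the threshold line and `config["Eval"]` itself stays empty.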
All Sections¶
## Goal (required)¶
What the project should achieve. One sentence that guides the Strategist's hypotheses.
## Scope / Modifiable (required)¶
Glob patterns defining which files re:factory may edit. Anything outside scope triggers a guard violation.
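As an illustration of the scope check, the sketch below tests changed paths against the Modifiable globs with Python's standard `fnmatch` module. The helper names are hypothetical, and re:factory's own glob semantics may differ; note that `fnmatch` lets `*` match across `/`, which is what makes `src/**` cover nested files here.

```python
from fnmatch import fnmatchcase


def in_scope(path: str, modifiable: list[str]) -> bool:
    """Return True if the path matches at least one Modifiable glob."""
    return any(fnmatchcase(path, pattern) for pattern in modifiable)


def scope_violations(changed: list[str], modifiable: list[str]) -> list[str]:
    """List the changed paths that fall outside the declared scope."""
    return [path for path in changed if not in_scope(path, modifiable)]


modifiable = ["src/**", "tests/**"]
changed = ["src/core/app.py", "tests/test_app.py", "README.md"]
print(scope_violations(changed, modifiable))
```

In this example only `README.md` is reported, since the other two paths match a Modifiable pattern.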
## Guards (required)¶
Inviolable rules checked before every merge. Guard violations force a revert regardless of eval score.
```markdown
## Guards
- Do not delete existing tests
- Do not modify files outside scope
- Do not remove error handling
```
## Eval / Command (required)¶
Shell command for running project evaluation. Must produce parseable output.
## Eval / Threshold¶
Minimum composite score to keep a change. Default: 0.8.
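The interaction between guards and the threshold can be summarized in a few lines of Python. This is a sketch under two assumptions: a score exactly equal to the threshold counts as passing, and the function name `decide` is hypothetical rather than part of re:factory.

```python
def decide(composite_score: float,
           guard_violations: list[str],
           threshold: float = 0.8) -> str:
    """Return "keep" or "revert" for a candidate change."""
    # Guard violations force a revert regardless of the eval score.
    if guard_violations:
        return "revert"
    # Assumption: meeting the threshold exactly is enough to keep the change.
    return "keep" if composite_score >= threshold else "revert"


print(decide(0.85, []))
print(decide(0.95, ["Do not delete existing tests"]))
print(decide(0.70, []))
```

The second call shows the important case: a high score does not rescue a change that broke a guard.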
## Target Branch¶
Branch for experiment PRs. Default: main.
Set this to a different branch (e.g. factory/dev) to stage all factory work separately.
Override per-run: factory ceo ~/my-project --branch staging
## Hypothesis Budget¶
Controls hypothesis generation constraints per cycle. The Strategist clears as many backlog items as possible and adds at most max_new new items:
- min_growth: Minimum hypotheses targeting growth dimensions (guaranteed, never cannibalized)
- max_new: Maximum new items the Strategist may add to the backlog per cycle
Override per-run: factory ceo ~/my-project --min-growth 3 --max-new 1
## Project Eval¶
User-defined eval dimensions for domain-specific metrics:
```markdown
## Project Eval
- name: benchmark_accuracy
  command: python eval/benchmark.py
  parse: json
  weight: 0.6
  timeout: 300
  description: Run benchmark and report accuracy
- name: response_latency
  command: python eval/latency_test.py
  parse: exit_code
  weight: 0.4
```
See Eval System for details on parse formats and scoring.
## Eval Weights¶
Custom weight distribution across the three eval tiers.
Default when project eval is present: 0.30 / 0.20 / 0.50. Without project eval: 0.50 / 0.50.
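To show how tier weights could feed into a composite score, here is a minimal sketch that assumes the composite is a plain weighted sum of per-tier scores in the 0 to 1 range. The combination rule and the example scores are assumptions for illustration; only the default weights come from the text above.

```python
import math


def composite_score(scores: list[float], weights: list[float]) -> float:
    """Weighted sum of per-tier scores (assumed combination rule)."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per tier score")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1.0")
    return sum(score * weight for score, weight in zip(scores, weights))


# Default weights when project eval is present: 0.30 / 0.20 / 0.50.
with_project_eval = composite_score([1.0, 0.5, 0.8], [0.30, 0.20, 0.50])

# Default weights without project eval: 0.50 / 0.50.
without_project_eval = composite_score([1.0, 0.6], [0.50, 0.50])

print(round(with_project_eval, 2), round(without_project_eval, 2))
```

Under this assumption, both example runs land on the default threshold of 0.8.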
## Smoke Test¶
An end-to-end (e2e) verification command that must pass before any change is kept.
Good smoke tests are fast (under 30s), test the core user flow, and catch integration issues that unit tests miss.
## Clean PR
Strips non-essential artifacts (eval scripts, benchmarks, `.factory/` data, eval test files) from PRs before pushing to external repositories. Useful when contributing factory-managed code to upstream repos that don't want factory infrastructure.
```markdown
## Clean PR
- clean_pr: true
- clean_pr_include: ["src/**", "lib/**"]
- clean_pr_exclude: ["src/internal/**"]
```

| Field | Purpose | Default |
|---|---|---|
| clean_pr | Enable clean PR mode | false |
| clean_pr_include | Include-only glob patterns; if set, only matching files survive | [] |
| clean_pr_exclude | Additional exclude patterns beyond defaults | [] |
Default excludes (always applied): eval/score.py, benchmarks/**, tests/eval_*, .factory/**. A file matched by both include and exclude is excluded (exclude wins).
Resolution precedence: CLI flag (--clean-pr / --no-clean-pr) > config.json > default (false).
## Constraints¶
Soft rules that guide behavior but don't block merges:
```markdown
## Constraints
- Prefer small, focused changes over large refactors
- Add tests for any new public function
```
## Research Target¶
Only for research/benchmark projects. Defines the metric to improve iteratively. When present, auto-detection routes to research mode instead of improve mode.
```markdown
## Research Target
- objective: maximize SWE-bench resolve rate
- metric: resolved/total
- target: 0.35
- run_command: python run_benchmark.py
- result_path: results/output.json
- result_parser: json
- timeout: 3600
```
| Field | Purpose |
|---|---|
| objective | Human-readable description of the research goal |
| metric | Key to extract from results (JSON path or regex) |
| target | Goal value; experiments stop when this is reached |
| run_command | Shell command to execute the benchmark/evaluation |
| result_path | Where the run command writes results |
| result_parser | How to parse results: json, regex, or exit_code |
| timeout | Maximum seconds for the run command |
## Mutable Surfaces¶
Files the Builder is allowed to modify during research experiments. One glob pattern per line. Only used in research mode.
## Fixed Surfaces¶
Ground truth files, test data, and eval infrastructure. These are fingerprinted for leakage detection and must never be modified. One glob pattern per line. Only used in research mode.
## Research Constraints¶
Additional rules for the research loop. Only used in research mode.
```markdown
## Research Constraints
- Do not use GPT-4 (cost constraint)
- Each experiment must complete within 30 minutes
```
## Inner Loop¶
Multi-run configuration for research mode. Runs the evaluation harness multiple times per cycle and aggregates the metric. Useful for stochastic pipelines where a single run doesn't give a reliable signal. Only used in research mode.
```markdown
## Inner Loop
- runs_per_cycle: 5
- aggregate: mean
- plateau_threshold: 3
- max_inner_runs_per_cycle: 10
```
| Field | Purpose | Default |
|---|---|---|
| runs_per_cycle | Number of times to run the harness per cycle | 1 |
| aggregate | How to combine scores: mean, median, max, all_pass | mean |
| plateau_threshold | Consecutive non-improving cycles before triggering outer loop | 3 |
| max_inner_runs_per_cycle | Optional cap on runs per cycle | None |
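The aggregation modes and the plateau rule can be illustrated with a short Python sketch. Several details are assumptions: `all_pass` is interpreted here as 1.0 only when every run scored 1.0, "non-improving" is read as "did not beat the best metric so far", and both function names are hypothetical.

```python
from statistics import mean, median


def aggregate_runs(scores: list[float], how: str = "mean") -> float:
    """Combine the scores of one cycle's runs into a single metric."""
    if not scores:
        raise ValueError("at least one run score is required")
    if how == "mean":
        return mean(scores)
    if how == "median":
        return median(scores)
    if how == "max":
        return max(scores)
    if how == "all_pass":
        # ASSUMPTION: a run "passes" only with a perfect score of 1.0.
        return 1.0 if all(score >= 1.0 for score in scores) else 0.0
    raise ValueError(f"unknown aggregate mode: {how}")


def has_plateaued(cycle_metrics: list[float], plateau_threshold: int = 3) -> bool:
    """True once the latest cycles form a long enough non-improving streak."""
    best = float("-inf")
    stale = 0  # consecutive cycles that failed to beat the best so far
    for metric in cycle_metrics:
        if metric > best:
            best = metric
            stale = 0
        else:
            stale += 1
    return stale >= plateau_threshold


print(aggregate_runs([0.2, 0.4, 0.9], "median"))
print(has_plateaued([0.5, 0.6, 0.6, 0.59, 0.6]))
```

In the second call the metric peaks at 0.6 in the second cycle and the next three cycles fail to beat it, so the default `plateau_threshold` of 3 is reached.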
## Outer Loop Surfaces¶
Surface scoping for inner/outer loop transitions. When inner loop improvements plateau, re:factory expands the Builder's scope to include outer surfaces for architectural changes. Only used in research mode.
```markdown
## Outer Loop Surfaces
- max_outer_cycles: 5
- inner: prompts/*.md
- inner: config/*.yaml
- outer: src/**/*.py
- outer: agents/**/*.md
```
| Field | Purpose |
|---|---|
| max_outer_cycles | Maximum outer loop expansions before stopping |
| inner: <glob> | Narrow surfaces used during inner loop (one per line) |
| outer: <glob> | Additional surfaces unlocked after plateau (one per line) |
Entries use prefix format — inner: and outer: followed by a glob pattern. Multiple entries per type are allowed.
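The prefix format lends itself to a small parser. The sketch below, with hypothetical function names, collects repeated `inner:` and `outer:` entries into lists and shows how the Builder's surfaces might widen after a plateau; it illustrates the format and is not re:factory's actual parsing code.

```python
def parse_surfaces(lines: list[str]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Split "- key: value" entries into surface lists and other settings."""
    surfaces: dict[str, list[str]] = {"inner": [], "outer": []}
    settings: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("- "):
            continue  # skip headings and blank lines
        key, _, value = line[2:].partition(":")
        key, value = key.strip(), value.strip()
        if key in surfaces:
            surfaces[key].append(value)  # multiple entries per type are allowed
        else:
            settings[key] = value
    return surfaces, settings


def active_surfaces(surfaces: dict[str, list[str]], plateaued: bool) -> list[str]:
    """Inner surfaces always apply; outer surfaces unlock after a plateau."""
    return surfaces["inner"] + (surfaces["outer"] if plateaued else [])


section = """\
## Outer Loop Surfaces
- max_outer_cycles: 5
- inner: prompts/*.md
- inner: config/*.yaml
- outer: src/**/*.py
- outer: agents/**/*.md
""".splitlines()

surfaces, settings = parse_surfaces(section)
print(active_surfaces(surfaces, plateaued=False))
print(active_surfaces(surfaces, plateaued=True))
```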
## Cost Budget¶
Per-cycle or total budget constraints for research experiments.
.factory/ Directory¶
Generated at runtime by re:factory. Add to .gitignore — do not edit manually:
```text
.factory/
├── config.json              # Parsed from factory.md
├── eval_profile.json        # Discovered eval dimensions
├── results.tsv              # Append-only experiment history
├── events.jsonl             # Structured event log
├── performance_report.json  # Aggregated verdicts, observations, stats
├── experiments/
│   └── 001/
│       ├── hypothesis.md
│       ├── eval_before.json
│       ├── eval_after.json
│       ├── changes.diff
│       └── verdict.json
├── strategy/
│   ├── current.md
│   ├── observations.md
│   ├── backlog.md
│   └── insights.md
├── reviews/
│   ├── <role>-latest.md
│   └── ceo-verdict-<role>.md
├── archive/                 # Archivist notes
│   ├── experiments/
│   ├── strategies/
│   ├── sources/
│   └── patterns/
└── agents/                  # Per-project prompt overrides
```
User Configuration (~/.factory/config.toml)¶
All FACTORY_* environment variables can also be set in ~/.factory/config.toml. Env vars remain fully supported (for CI, Docker, scripting) — config.toml is additive.
Precedence (highest wins)¶
Example config.toml¶
```toml
[defaults]
runner = "claude"                    # CLI backend: "claude" or "bob"
model = ""                           # Claude model for agent subprocesses
projects_dir = "~/factory-projects"  # Root for factory-managed projects

[credentials.vertex]
CLAUDE_CODE_USE_VERTEX = "1"
ANTHROPIC_VERTEX_PROJECT_ID = "my-gcp-project"
CLOUD_ML_REGION = "us-east5"

[credentials.bob]
FACTORY_RUNNER = "bob"
BOBSHELL_API_KEY = "..."
```
Commands¶
```bash
factory config edit           # Open config in $EDITOR (creates template if missing)
factory config show           # Show resolved config (secrets masked)
factory config show --reveal  # Show full values including secrets
factory config migrate        # Create starter config from current env vars
```
Credential profiles¶
Profiles let you switch between environments without juggling env vars:
```bash
factory ceo ~/my-project --profile vertex
factory run ~/my-project --profile bob --loop
factory agent researcher --task "..." --project ~/my-project --profile vertex
```
Profile credentials are injected via os.environ.setdefault(), so pre-existing env vars always win.
Security¶
- Config file is created with `0o600` permissions (owner read/write only)
- `factory config show` masks secrets by default (keys containing "key", "token", "secret", "password")
- Profile names are validated: `[a-zA-Z0-9_-]+` only (no path traversal)
- Credential keys must be valid shell identifiers: `[A-Z_][A-Z0-9_]*`
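The validation and masking rules above can be expressed with Python's `re` module. The two patterns are copied from the list; everything else is an assumption for illustration, including the helper names, the case-insensitive substring check, and the `****` mask string.

```python
import re

PROFILE_NAME = re.compile(r"[a-zA-Z0-9_-]+")
CREDENTIAL_KEY = re.compile(r"[A-Z_][A-Z0-9_]*")
SECRET_MARKERS = ("key", "token", "secret", "password")


def is_valid_profile_name(name: str) -> bool:
    """Accept only names made of letters, digits, "_" and "-"."""
    return PROFILE_NAME.fullmatch(name) is not None


def is_valid_credential_key(key: str) -> bool:
    """Accept only upper-case shell-style identifiers."""
    return CREDENTIAL_KEY.fullmatch(key) is not None


def mask_secrets(values: dict[str, str], mask: str = "****") -> dict[str, str]:
    """Hide values whose key contains a secret marker (assumed case-insensitive)."""
    return {
        key: mask if any(marker in key.lower() for marker in SECRET_MARKERS) else value
        for key, value in values.items()
    }


print(is_valid_profile_name("vertex"), is_valid_profile_name("../bob"))
print(mask_secrets({"BOBSHELL_API_KEY": "abc123", "CLOUD_ML_REGION": "us-east5"}))
```

Using `fullmatch` rather than `match` matters here: a name such as `../bob` contains valid characters but must be rejected as a whole.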
Available defaults keys¶
| Key | Env var equivalent | Default |
|---|---|---|
| runner | FACTORY_RUNNER | claude |
| model | FACTORY_MODEL | (Claude Code default) |
| projects_dir | FACTORY_PROJECTS_DIR | ~/factory-projects |
| vault_path | FACTORY_VAULT_PATH | (unset) |
| playbooks_dir | FACTORY_PLAYBOOKS_DIR | ~/.factory/playbooks |
| registry_dir | FACTORY_REGISTRY_DIR | ~/.factory |
| managed_dirs | FACTORY_MANAGED_DIRS | (unset) |
| runner_quiet | FACTORY_RUNNER_QUIET | (unset) |
| bob_dry_run | FACTORY_BOB_DRY_RUN | (unset) |
| bob_max_invocations_per_cycle | FACTORY_BOB_MAX_INVOCATIONS_PER_CYCLE | 8 |
| ceo_respawn_disabled | FACTORY_CEO_RESPAWN_DISABLED | (unset) |
| ceo_max_respawns | FACTORY_CEO_MAX_RESPAWNS | 3 |
Environment Variables¶
All environment variables listed below can alternatively be set in ~/.factory/config.toml (see above). Env vars are still supported for CI, Docker, and scripting.
| Variable | Purpose | Default |
|---|---|---|
| FACTORY_PROJECTS_DIR | Parent directory for projects created from prompts | ~/factory-projects |
| FACTORY_MODEL | Model override for agent subprocesses | (Claude Code default) |
| FACTORY_PLAYBOOKS_DIR | Directory for ACE-evolved agent playbooks | ~/.factory/playbooks |
| FACTORY_REGISTRY_DIR | Override global registry location | ~/.factory |
| FACTORY_RUNNER | CLI backend: claude or bob | claude |
See Setup Guide — Environment Variables for the full list, including Claude Code authentication, Bob Shell, notifications, and advanced CEO options.