DFNs API Design
This document describes the design of the DFNs (Definition Files) API (GitHub issue #262). It is intended to be developer-facing, not user-facing, though users may also find it informative.
This is a living document which will be updated as development proceeds.
Background
The modflow_devtools.dfn module currently provides utilities for parsing and working with MODFLOW 6 definition files. On the dfn branch, significant work has been done including:
Object models for DFN components (Dfn, Block, Field classes)
Schema definitions for both v1 (legacy) and v2 (in development)
Parsers for the old DFN format
Schema mapping capabilities including utilities for converting between flat and hierarchical component representations
A fetch_dfns() function for manually downloading DFN files from the MODFLOW 6 repository
Validation tools
However, there is currently no registry-based API for:
Automatically discovering and synchronizing DFN files from remote sources
Managing multiple versions of definition files simultaneously
Caching definition files locally for offline use
Users must manually download definition files or rely on whatever happens to be bundled with their installation. This creates problems similar to those the Models API addressed:
Version coupling: Users are locked to whatever DFN version is bundled
Manual management: Users must manually track and download DFN updates
No multi-version support: Difficult to work with multiple MODFLOW 6 versions simultaneously
Maintenance burden: Developers must manually update bundled DFNs
Objective
Create a DFNs API that:
Mirrors Models/Programs API patterns for consistency and familiarity
Leverages existing dfn module work (parsers, schemas, object models)
Provides automated discovery of definition files from MODFLOW 6 repository
Supports multiple versions simultaneously with explicit version addressing
Uses Pooch for fetching and caching (avoiding custom HTTP client code)
Handles schema evolution with proper separation of file format vs schema version
Maintains loose coupling between devtools and remote DFN sources
Overview
Make the MODFLOW 6 repository responsible for publishing a definition file registry.
Make modflow-devtools responsible for:
Defining the DFN registry publication contract
Providing registry-creation machinery
Storing bootstrap information locating the MODFLOW 6 repository
Discovering remote registries at install time or on demand
Caching registry metadata and definition files
Exposing a synchronized view of available definition files
Parsing and validating definition files
Mapping between schema versions
MODFLOW 6 is currently the only repository using the DFN specification system, but this leaves the door open for other repositories to begin using it.
Architecture
The DFNs API will mirror the Models and Programs API architecture, adapted for definition file-specific concerns.
Implementation approach: Following the Models API’s streamlined design, the DFNs API should consolidate core functionality in a single modflow_devtools/dfn/__init__.py file with clear class-based separation:
DfnCache: Cache management for registries and DFN files
DfnSourceRepo: Source repository with discovery/sync methods
DfnSourceConfig: Configuration container from bootstrap file
DfnRegistry: Pydantic data model for registry structure
PoochDfnRegistry: Remote fetching with Pooch integration
DiscoveredDfnRegistry: Discovery result with metadata
DfnSpec: Full specification with hierarchical and flat access
This single-module OO design improves maintainability while keeping the existing Dfn, Block, and Field dataclasses that are already well-established.
Bootstrap file
The bootstrap file tells modflow-devtools where to look for DFN registries. This file will be checked into the repository at modflow_devtools/dfn/dfns.toml and distributed with the package.
Bootstrap file contents
At the top level, the bootstrap file consists of a table of sources, each describing a repository that publishes definition files.
Each source has:
repo: Repository identifier (owner/name)
dfn_path: Path within the repository to the directory containing DFN files (defaults to doc/mf6io/mf6ivar/dfn)
registry_path: Path within the repository to the registry metadata file (defaults to .registry/dfns.toml)
refs: List of git refs (branches, tags, or commit hashes) to sync by default
User config overlay
Users can customize or extend the bundled bootstrap configuration by creating a user config file at:
Linux/macOS: ~/.config/modflow-devtools/dfns.toml (respects $XDG_CONFIG_HOME)
Windows: %APPDATA%/modflow-devtools/dfns.toml
The user config follows the same format as the bundled bootstrap file. Sources defined in the user config will override or extend those in the bundled config, allowing users to:
Add custom DFN repositories
Point to forks of existing repositories (useful for testing experimental schema versions)
Override default refs for existing sources
Implementation note: The user config path logic (get_user_config_path("dfn")) is shared across all three APIs (Models, Programs, DFNs) via modflow_devtools.config, but each API implements its own merge_bootstrap() function using API-specific bootstrap schemas.
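The override-or-extend behaviour described above could be sketched roughly as follows. This is an illustration only, not the actual merge_bootstrap() implementation: the helper name merge_sources, the key-by-key override rule, and the plain-dict representation of sources are all assumptions. The defaults are the ones listed in the bootstrap field descriptions.

```python
from copy import deepcopy

# Defaults taken from the bootstrap field descriptions above.
DEFAULTS = {
    "dfn_path": "doc/mf6io/mf6ivar/dfn",
    "registry_path": ".registry/dfns.toml",
}


def merge_sources(bundled: dict, user: dict) -> dict:
    """Overlay user-config sources on bundled sources (hypothetical helper).

    Assumption: a user source with an existing name overrides the bundled
    source key by key; a user source with a new name is added as-is.
    """
    merged = deepcopy(bundled)
    for name, overrides in user.items():
        merged.setdefault(name, {}).update(deepcopy(overrides))
    # Fill in the documented defaults for any source that omits them.
    for source in merged.values():
        for key, value in DEFAULTS.items():
            source.setdefault(key, value)
    return merged


bundled = {"modflow6": {"repo": "MODFLOW-ORG/modflow6", "refs": ["6.6.0", "develop"]}}
user = {
    "modflow6": {"refs": ["develop"]},  # override default refs
    "myfork": {"repo": "someuser/modflow6", "refs": ["experiment"]},  # add a fork
}
merged = merge_sources(bundled, user)
```

Here the bundled modflow6 source keeps its repo but takes the user's refs, and the hypothetical myfork source is added with default paths filled in.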
Sample bootstrap file
[sources.modflow6]
repo = "MODFLOW-ORG/modflow6"
dfn_path = "doc/mf6io/mf6ivar/dfn"
registry_path = ".registry/dfns.toml"
refs = [
"6.6.0",
"6.5.0",
"6.4.4",
"develop",
]
DFN spec and registry files
Two types of metadata files support the DFNs API:
Specification file (spec.toml): Part of the DFN set, describes the specification itself
Registry file (dfns.toml): Infrastructure for discovery and distribution
Specification file
A spec.toml file lives in the DFN directory alongside the DFN files. It describes the specification:
# MODFLOW 6 input specification
schema_version = "1.1"
[components]
# Component organization by type
simulation = ["sim-nam", "sim-tdis"]
models = ["gwf-nam", "gwt-nam", "gwe-nam"]
packages = ["gwf-chd", "gwf-drn", "gwf-wel", ...]
exchanges = ["exg-gwfgwf", "exg-gwfgwt", ...]
solutions = ["sln-ims"]
Notes:
The spec file is part of the DFN set, not registry infrastructure
Handwritten by MODFLOW 6 developers, not generated
Describes the specification as a whole (schema version, component organization)
Lives in the DFN directory: doc/mf6io/mf6ivar/dfn/spec.toml
v1/v1.1: Spec file is optional - can be inferred if not present:
schema_version can be inferred from DFN content or defaulted
components section (shown above) is just for categorization/convenience, not hierarchy
Hierarchy inferred from naming conventions (e.g., gwf-chd → parent is gwf-nam)
v2: Spec file is required for clarity and correctness:
Explicit schema_version = "2.0" declaration
Defines hierarchy via root attribute (string reference or inline definition)
Component files define children lists (preferred) or parent attributes (backward-compatible)
Can be a single file containing everything, or a spec file pointing to separate component files
Ensures clean structural/format separation
See Component Hierarchy section for details
Correspondence:
spec.toml (on disk) ↔ DfnSpec (in Python)
Minimal handwritten spec file (v1/v1.1):
schema_version = "1.1"
Or for v1/v1.1, no spec file needed - everything inferred.
Registry file format
A dfns.toml registry file for discovery and distribution (the specific naming distinguishes it from models.toml and programs.toml):
# Registry metadata (top-level, optional)
schema_version = "1.0"
generated_at = "2025-01-02T10:30:00Z"
devtools_version = "1.9.0"
[metadata]
ref = "6.6.0" # Optional, known from discovery context
# File listings (filenames and hashes, URLs constructed as needed)
[files]
"spec.toml" = {hash = "sha256:..."} # Specification file
"sim-nam.dfn" = {hash = "sha256:..."}
"sim-tdis.dfn" = {hash = "sha256:..."}
"gwf-nam.dfn" = {hash = "sha256:..."}
"gwf-chd.dfn" = {hash = "sha256:..."}
# ... all DFN files
Notes:
Registry is purely infrastructure for discovery and distribution
The files section maps filenames to hashes for verification
URLs are constructed dynamically from bootstrap metadata (repo, ref, dfn_path) + filename
This allows using personal forks by changing the bootstrap file
All registry metadata is optional - registries can be handwritten minimally
The specification file is listed alongside DFN files
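To make the dynamic URL construction in the notes above concrete, here is a small sketch. The helper name build_file_url is hypothetical; the URL pattern assumes GitHub raw content hosting, and the hash strings are the same placeholders used elsewhere in this document.

```python
def build_file_url(repo: str, ref: str, dfn_path: str, filename: str) -> str:
    """Build a raw-content URL for one DFN file (hypothetical helper)."""
    base = f"https://raw.githubusercontent.com/{repo}/{ref}/{dfn_path.strip('/')}"
    return f"{base}/{filename}"


# The registry stores only filenames and hashes (placeholder hashes here).
files = {
    "spec.toml": {"hash": "sha256:abc123..."},
    "sim-nam.dfn": {"hash": "sha256:def456..."},
}

# repo, ref, and dfn_path come from the bootstrap file, not the registry.
urls = {
    name: build_file_url("MODFLOW-ORG/modflow6", "6.6.0", "doc/mf6io/mf6ivar/dfn", name)
    for name in files
}
```

Pointing at a personal fork then only requires a different repo value in the bootstrap configuration; the registry contents stay the same.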
Minimal handwritten registry:
[files]
"spec.toml" = {hash = "sha256:abc123..."}
"sim-nam.dfn" = {hash = "sha256:def456..."}
"gwf-nam.dfn" = {hash = "sha256:789abc..."}
Sample files
For TOML-format DFNs (future v2 schema):
Option A: Separate component files (spec.toml references external files)
Spec file (spec.toml):
schema_version = "2.0"
root = "sim-nam" # References external sim-nam.toml file
Component file (sim-nam.toml):
children = ["sim-tdis", "gwf-nam", "gwt-nam", "gwe-nam", "exg-gwfgwf", "sln-ims"]
[options]
# ... fields
Component file (gwf-nam.toml):
children = ["gwf-dis", "gwf-chd", "gwf-wel", "gwf-drn", ...]
[options]
# ... fields
Registry (dfns.toml):
[files]
"spec.toml" = {hash = "sha256:..."}
"sim-nam.toml" = {hash = "sha256:..."}
"gwf-nam.toml" = {hash = "sha256:..."}
"gwf-chd.toml" = {hash = "sha256:..."}
# ... all component files
Option B: Single specification file (spec.toml contains everything)
spec.toml contains entire specification:
schema_version = "2.0"
[root] # Root component defined inline
name = "sim-nam"
[root.options]
# ... all sim-nam fields
[root.children.sim-tdis]
# ... all sim-tdis fields
[root.children.gwf-nam]
children = ["gwf-dis", "gwf-chd", "gwf-wel", ...] # Can nest children inline too
[root.children.gwf-nam.options]
# ... all gwf-nam fields
[root.children.gwf-nam.children.gwf-chd]
# ... all gwf-chd fields nested within gwf-nam
# ... entire hierarchy nested in one file
Registry just points to the one file:
[files]
"spec.toml" = {hash = "sha256:..."}
Key design: The root attribute is overloaded:
String value (root = "sim-nam"): Reference to external component file
Table/section ([root]): Inline component definition with full nested hierarchy
Component children are always a list of strings, whether referencing external files or naming nested inline sections.
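A loader could distinguish the two forms of root by inspecting the parsed value's type. The sketch below assumes the spec file has already been parsed into a plain dict (for example by a TOML parser); the function name describe_root and the returned mode labels are hypothetical.

```python
def describe_root(spec: dict) -> tuple:
    """Return (mode, root component name) for a parsed v2 spec.toml dict."""
    root = spec["root"]
    if isinstance(root, str):
        # String value: the name of an external component file.
        return ("external", root)
    if isinstance(root, dict):
        # Table value: the component is defined inline and carries a name key.
        return ("inline", root["name"])
    raise TypeError(f"'root' must be a string or a table, got {type(root).__name__}")


option_a = {"schema_version": "2.0", "root": "sim-nam"}
option_b = {"schema_version": "2.0", "root": {"name": "sim-nam", "options": {}}}
```

With these inputs, describe_root(option_a) reports an external reference and describe_root(option_b) reports an inline definition, both naming sim-nam.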
Registry discovery
DFN registries can be discovered in two modes, similar to the Models API.
Discovery modes
1. Registry as version-controlled file:
Registry files can be versioned in the repository at a conventional path, in which case discovery uses GitHub raw content URLs:
https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/dfns.toml
This mode supports any git ref (branches, tags, commit hashes).
2. Registry as release asset:
Registry files can also be published as release assets:
https://github.com/{org}/{repo}/releases/download/{tag}/dfns.toml
This mode:
Requires release tags only
Allows registry generation in CI without committing to repo
Provides faster discovery (no need to check multiple ref types)
Discovery precedence: Release asset mode takes precedence if both exist (same as Models API).
Registry discovery procedure
At sync time, modflow-devtools discovers remote registries for each configured ref:
1. Check for release tag (if release asset mode enabled):
Look for a GitHub release with the specified tag
Try to fetch dfns.toml from release assets
If found, use it and skip step 2
If release exists but lacks registry asset, fall through to step 2
2. Check for version-controlled registry:
Look for a commit hash, tag, or branch matching the ref
Try to fetch registry from {registry_path} via raw content URL
If found, use it
If ref exists but lacks registry file, raise error:
DfnRegistryDiscoveryError(f"Registry file not found in {registry_path} for 'modflow6@{ref}'")
3. Failure case:
If no matching ref found at all, raise error:
DfnRegistryDiscoveryError(f"Registry discovery failed, ref 'modflow6@{ref}' does not exist")
Note: For initial implementation, focus on version-controlled mode. Release asset mode requires MODFLOW 6 to start distributing DFN files with its releases (which it currently does not), but would be a natural addition once that happens.
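The discovery procedure can be sketched as a single function with its network access injected, so the control flow is visible without any HTTP code. This is a simplified illustration, not the planned implementation: the function name discover_registry, its parameters, and the fetch and ref_exists callables are assumptions, and the release lookup is reduced to a single URL fetch.

```python
from typing import Callable, Optional


class DfnRegistryDiscoveryError(Exception):
    """Stand-in for the error type named in the procedure above."""


def discover_registry(
    source: str,
    repo: str,
    ref: str,
    registry_path: str,
    fetch: Callable[[str], Optional[str]],
    ref_exists: Callable[[str], bool],
    release_assets: bool = False,
) -> tuple:
    """Return (mode, registry text) for one ref.

    fetch returns the body at a URL, or None if nothing is there.
    ref_exists reports whether the ref exists in the repository.
    Both are injected (hypothetical) so the sketch needs no network.
    """
    if release_assets:
        # Release asset mode takes precedence when enabled.
        url = f"https://github.com/{repo}/releases/download/{ref}/dfns.toml"
        text = fetch(url)
        if text is not None:
            return ("release_asset", text)
    if not ref_exists(ref):
        raise DfnRegistryDiscoveryError(
            f"Registry discovery failed, ref '{source}@{ref}' does not exist"
        )
    # Version-controlled mode via raw content URL.
    url = f"https://raw.githubusercontent.com/{repo}/{ref}/{registry_path}"
    text = fetch(url)
    if text is None:
        raise DfnRegistryDiscoveryError(
            f"Registry file not found in {registry_path} for '{source}@{ref}'"
        )
    return ("version_controlled", text)


# Fake remote: only the develop branch publishes a registry file.
remote = {
    "https://raw.githubusercontent.com/MODFLOW-ORG/modflow6/develop/.registry/dfns.toml": "[files]\n"
}
mode, text = discover_registry(
    "modflow6", "MODFLOW-ORG/modflow6", "develop", ".registry/dfns.toml",
    fetch=remote.get, ref_exists=lambda ref: ref == "develop",
)
```

Injecting fetch and ref_exists also makes the precedence rules easy to unit test against a dictionary of fake URLs.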
Registry/DFN caching
Cache structure mirrors the Models API pattern:
~/.cache/modflow-devtools/
├── dfn/
│   ├── registries/
│   │   └── modflow6/            # by source repo
│   │       ├── 6.6.0/
│   │       │   └── dfns.toml
│   │       ├── 6.5.0/
│   │       │   └── dfns.toml
│   │       └── develop/
│   │           └── dfns.toml
│   └── files/                   # Actual DFN files, managed by Pooch
│       └── modflow6/
│           ├── 6.6.0/
│           │   ├── sim-nam.dfn
│           │   ├── gwf-nam.dfn
│           │   └── ...
│           ├── 6.5.0/
│           │   └── ...
│           └── develop/
│               └── ...
Cache management:
Registry files cached per source repository and ref
DFN files fetched and cached individually by Pooch, verified against registry hashes
Cache persists across Python sessions for offline use
Cache can be cleared with dfn clean command
Users can check cache status with dfn info
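The cache layout above maps directly onto path construction. The two helpers below are hypothetical names used only for illustration; they assume the cache root is passed in explicitly rather than looked up from the platform.

```python
from pathlib import Path


def registry_cache_path(cache_root: Path, source: str, ref: str) -> Path:
    """Location of the cached registry file for one source and ref."""
    return cache_root / "dfn" / "registries" / source / ref / "dfns.toml"


def files_cache_dir(cache_root: Path, source: str, ref: str) -> Path:
    """Directory in which DFN files for one source and ref would be stored."""
    return cache_root / "dfn" / "files" / source / ref


cache_root = Path("~/.cache/modflow-devtools").expanduser()
reg = registry_cache_path(cache_root, "modflow6", "6.6.0")
```

Keying both trees by source and ref is what allows several MODFLOW 6 versions to be cached side by side.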
Registry synchronization
Synchronization updates the local registry cache with remote metadata.
Manual sync
Exposed as a CLI command and Python API:
# Sync all configured refs
python -m modflow_devtools.dfn sync
# Sync specific ref
python -m modflow_devtools.dfn sync --ref 6.6.0
# Sync to any git ref (branch, tag, commit hash)
python -m modflow_devtools.dfn sync --ref develop
python -m modflow_devtools.dfn sync --ref f3df630a
# Force re-download
python -m modflow_devtools.dfn sync --force
# Show sync status
python -m modflow_devtools.dfn info
# List available DFNs for a ref
python -m modflow_devtools.dfn list --ref 6.6.0
# List all synced refs
python -m modflow_devtools.dfn list
Or via Python API:
from modflow_devtools.dfn import sync_dfns, get_sync_status
# Sync all configured refs
sync_dfns()
# Sync specific ref
sync_dfns(ref="6.6.0")
# Check sync status
status = get_sync_status()
Automatic sync
At install time: Best-effort sync to default refs during package installation (fail silently on network errors)
On first use: If registry cache is empty for requested ref, attempt to sync before raising errors
Lazy loading: Don’t sync until DFN access is actually requested
Configurable (Experimental): Auto-sync is opt-in via environment variable: MODFLOW_DEVTOOLS_AUTO_SYNC=1 (set to “1”, “true”, or “yes”)
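Reading the opt-in flag might look like the sketch below. The helper name auto_sync_enabled is hypothetical, and treating the value case-insensitively is an assumption; only the variable name and the three accepted values come from the text above.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes"}


def auto_sync_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether auto-sync has been opted into (hypothetical helper)."""
    env = os.environ if environ is None else environ
    value = env.get("MODFLOW_DEVTOOLS_AUTO_SYNC", "")
    # Assumption: surrounding whitespace and letter case are ignored.
    return value.strip().lower() in TRUTHY
```

Accepting an explicit mapping keeps the check testable without mutating the real process environment.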
Source repository integration
For the MODFLOW 6 repository to integrate:
Optionally handwrite spec.toml in the DFN directory (if not present, everything is inferred):
# doc/mf6io/mf6ivar/dfn/spec.toml
schema_version = "1.1"
[components]
simulation = ["sim-nam", "sim-tdis"]
models = ["gwf-nam", "gwt-nam", "gwe-nam"]
# ...
If spec.toml is absent (v1/v1.1 only), DfnSpec.load() will:
Scan the directory for .dfn and .toml files
Infer schema version from DFN content
Infer component organization from filenames
Build hierarchy using naming conventions
Note: For v2 schema, spec.toml is required and must declare schema_version = "2.0"
Generate registry in CI:
# In MODFLOW 6 repository CI
python -m modflow_devtools.dfn.make_registry \
  --dfn-path doc/mf6io/mf6ivar/dfn \
  --output .registry/dfns.toml \
  --ref ${{ github.ref_name }}
Commit registry to .registry/dfns.toml
Example CI integration (GitHub Actions):
- name: Generate DFN registry
  run: |
    pip install modflow-devtools
    python -m modflow_devtools.dfn.make_registry \
      --dfn-path doc/mf6io/mf6ivar/dfn \
      --output .registry/dfns.toml \
      --ref ${{ github.ref_name }}
- name: Commit registry
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add .registry/dfns.toml
    git diff-index --quiet HEAD || git commit -m "chore: update DFN registry"
    git push
Note: Initially generate registries for version-controlled mode. Release asset mode would require MODFLOW 6 to start distributing DFNs with releases.
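The core of registry generation is hashing each definition file and writing the files table. The sketch below is not the make_registry implementation; the helper names build_file_table and render_registry are hypothetical, and it emits only the minimal handwritten-style registry (a files table with SHA-256 hashes) using the standard library.

```python
import hashlib
from pathlib import Path


def build_file_table(dfn_dir: Path) -> dict:
    """Map each .dfn/.toml file name in a directory to its hash entry."""
    table = {}
    for path in sorted(dfn_dir.iterdir()):
        if path.suffix in {".dfn", ".toml"}:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            table[path.name] = {"hash": f"sha256:{digest}"}
    return table


def render_registry(table: dict) -> str:
    """Render the minimal registry form: a [files] table of inline hashes."""
    lines = ["[files]"]
    lines += [f'"{name}" = {{hash = "{entry["hash"]}"}}' for name, entry in table.items()]
    return "\n".join(lines) + "\n"
```

A CI step could call build_file_table on the DFN directory and write the rendered text to the registry path; optional top-level metadata is omitted here.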
DFN addressing
Format: mf6@{ref}/{component}
Components include:
ref: Git ref (branch, tag, or commit hash) corresponding to a MODFLOW 6 version
component: DFN component name (without file extension)
Examples:
mf6@6.6.0/sim-nam - Simulation name file definition for MODFLOW 6 v6.6.0
mf6@6.6.0/gwf-chd - GWF CHD package definition for v6.6.0
mf6@develop/gwf-wel - GWF WEL package definition from develop branch
mf6@f3df630a/gwt-adv - GWT ADV package definition from specific commit
Benefits:
Explicit versioning prevents confusion
Supports multiple MODFLOW 6 versions simultaneously
Enables comparison between versions
Works with any git ref (not just releases)
Note: The source is always “mf6” (MODFLOW 6), but the addressing scheme allows for future sources if needed.
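Parsing an address of this form is straightforward. The sketch below uses hypothetical names (DfnAddress, parse_address) and assumes component names never contain a slash, so the last slash always separates ref from component.

```python
from typing import NamedTuple


class DfnAddress(NamedTuple):
    source: str
    ref: str
    component: str


def parse_address(address: str) -> DfnAddress:
    """Split 'source@ref/component' into its parts (hypothetical helper)."""
    source, sep, rest = address.partition("@")
    # Split on the last '/' so refs that contain '/' (branch names) survive.
    ref, slash, component = rest.rpartition("/")
    if not (source and sep and ref and slash and component):
        raise ValueError(f"Invalid DFN address: {address!r}")
    return DfnAddress(source, ref, component)


addr = parse_address("mf6@6.6.0/gwf-chd")
```

Splitting from the right is a deliberate choice here: it keeps branch-style refs intact at the cost of forbidding slashes in component names.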
Registry classes
The registry class hierarchy is based on a Pydantic DfnRegistry base class:
DfnRegistry (base class):
Pydantic model with optional meta field for registry metadata
Provides access to a DfnSpec (the full parsed specification)
Can be instantiated directly for data-only use (e.g., loading/parsing TOML files)
Key properties:
spec - The full DFN specification (lazy-loaded)
ref - Git ref for this registry
get_dfn(component) - Convenience for spec[component]
get_dfn_path(component) - Get local path to DFN file
schema_version - Convenience for spec.schema_version
components - Convenience for dict(spec.items())
RemoteDfnRegistry(DfnRegistry):
Handles remote registry discovery, caching, and DFN fetching. Constructs DFN file URLs dynamically from bootstrap metadata:
class RemoteDfnRegistry(DfnRegistry):
    def __init__(self, source: str = "modflow6", ref: str = "develop"):
        self.source = source
        self._ref = ref
        self._spec = None
        self._registry_meta = None
        self._bootstrap_meta = None
        self._pooch = None
        self._cache_dir = None
        self._load()

    def _setup_pooch(self):
        # Create Pooch instance with dynamically constructed URLs
        import pooch
        self._cache_dir = self._get_cache_dir()
        # Construct base URL from bootstrap metadata (NOT stored in registry)
        repo = self._bootstrap_meta["repo"]
        dfn_path = self._bootstrap_meta.get("dfn_path", "doc/mf6io/mf6ivar/dfn")
        base_url = f"https://raw.githubusercontent.com/{repo}/{self._ref}/{dfn_path}/"
        self._pooch = pooch.create(
            path=self._cache_dir,
            base_url=base_url,
            registry=self._registry_meta["files"],  # Just filename -> hash
        )

    def get_dfn_path(self, component: str) -> Path:
        # Use Pooch to fetch file (from cache or remote)
        # Pooch constructs full URL from base_url + filename at runtime
        filename = self._get_filename(component)
        return Path(self._pooch.fetch(filename))
Benefits of dynamic URL construction:
Registry files are smaller and simpler (no URLs stored)
Users can test against personal forks by modifying bootstrap file
Single source of truth for repository location
URLs adapt automatically when repo/path changes
LocalDfnRegistry(DfnRegistry):
For developers working with local DFN files:
class LocalDfnRegistry(DfnRegistry):
    def __init__(self, path: str | PathLike, ref: str = "local"):
        self.path = Path(path).expanduser().resolve()
        self._ref = ref
        self._spec = None

    @property
    def spec(self) -> DfnSpec:
        """Lazy-load the DfnSpec from local directory."""
        if self._spec is None:
            self._spec = DfnSpec.load(self.path)
        return self._spec

    def get_dfn_path(self, component: str) -> Path:
        # Return local file path directly
        # Look for both .dfn and .toml extensions
        for ext in [".dfn", ".toml"]:
            p = self.path / f"{component}{ext}"
            if p.exists():
                return p
        raise ValueError(f"Component {component} not found in {self.path}")
Design decisions:
Pydantic-based (not ABC) - allows direct instantiation for data-only use cases
Dynamic URL construction - DFN file URLs constructed at runtime, not stored in registry
No MergedRegistry - users typically work with one MODFLOW 6 version at a time, so merging across versions doesn’t make sense
Module-level API
Convenient module-level functions:
# Default registry for latest stable MODFLOW 6 version
from modflow_devtools.dfn import (
DEFAULT_REGISTRY,
DfnSpec,
get_dfn,
get_dfn_path,
list_components,
sync_dfns,
get_registry,
map,
)
# Get individual DFNs
dfn = get_dfn("gwf-chd") # Uses DEFAULT_REGISTRY
dfn = get_dfn("gwf-chd", ref="6.5.0") # Specific version
# Get file path
path = get_dfn_path("gwf-wel", ref="6.6.0")
# List available components
components = list_components(ref="6.6.0")
# Work with specific registry
registry = get_registry(ref="6.6.0")
gwf_nam = registry.get_dfn("gwf-nam")
# Load full specification - single canonical hierarchical representation
spec = DfnSpec.load("/path/to/dfns") # Load from directory
# Hierarchical access
spec.schema_version # "1.1"
spec.root # Root Dfn (simulation component)
spec.root.children["gwf-nam"] # Navigate hierarchy
spec.root.children["gwf-nam"].children["gwf-chd"]
# Flat dict-like access via Mapping protocol
gwf_chd = spec["gwf-chd"] # Get component by name
for name, dfn in spec.items(): # Iterate all components
    print(name)
len(spec) # Total number of components
# Access spec through registry (registry provides the spec)
registry = get_registry(ref="6.6.0")
spec = registry.spec # Registry wraps a DfnSpec
gwf_chd = registry.spec["gwf-chd"]
# Map between schema versions
dfn_v1 = get_dfn("gwf-chd", ref="6.4.4") # Older version in v1 schema
dfn_v2 = map(dfn_v1, schema_version="2") # Convert to v2 schema
DfnSpec class:
The DfnSpec dataclass represents the full specification with a single canonical hierarchical representation:
from collections.abc import Mapping
from dataclasses import dataclass

@dataclass
class DfnSpec(Mapping):
    """Full DFN specification with hierarchical structure and flat dict access."""
    schema_version: str
    root: Dfn  # Hierarchical canonical representation (simulation component)

    # Mapping protocol - provides flat dict-like access
    def __getitem__(self, name: str) -> Dfn:
        """Get component by name (flattened lookup)."""
        ...

    def __iter__(self):
        """Iterate over all component names."""
        ...

    def __len__(self):
        """Total number of components in the spec."""
        ...

    @classmethod
    def load(cls, path: Path | str) -> "DfnSpec":
        """
        Load specification from a directory of DFN files.
        The specification is always loaded as a hierarchical tree,
        with flat access available via the Mapping protocol.
        """
        ...
Design benefits:
Single canonical representation: Hierarchical tree is the source of truth
Flat access when needed: Mapping protocol provides dict-like interface
Simple, focused responsibility: DfnSpec only knows how to load from a directory
Clean layering: Registries built on top of DfnSpec, not intertwined
Clean semantics: DfnSpec = full specification, Dfn = individual component
Pythonic: Implements standard Mapping protocol
Separation of concerns:
DfnSpec: Canonical representation of the full specification (foundation)
Loads from a directory of DFN files via load() classmethod
Hierarchical tree via .root property
Flat dict access via Mapping protocol
No knowledge of registries, caching, or remote sources
Registries: Handle discovery, distribution, and caching (built on DfnSpec)
Fetch and cache DFN files from remote sources
Internally use DfnSpec to represent the loaded specification
Provide access via .spec property
get_dfn(component) → convenience for spec[component]
get_dfn_path(component) → returns cached file path
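One way the flat Mapping view could be derived from the hierarchical tree is a depth-first walk. The sketch below uses simplified stand-ins (Node for Dfn, Spec for DfnSpec) rather than the real classes, and assumes each component stores its children in a dict keyed by name.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified stand-in for Dfn: a name plus child components."""
    name: str
    children: dict = field(default_factory=dict)


@dataclass
class Spec(Mapping):
    """Simplified stand-in for DfnSpec."""
    schema_version: str
    root: Node

    def _walk(self, node: Node) -> Iterator:
        # Depth-first traversal, parents before children.
        yield node
        for child in node.children.values():
            yield from self._walk(child)

    def __getitem__(self, name: str) -> Node:
        for node in self._walk(self.root):
            if node.name == name:
                return node
        raise KeyError(name)

    def __iter__(self) -> Iterator:
        return (node.name for node in self._walk(self.root))

    def __len__(self) -> int:
        return sum(1 for _ in self._walk(self.root))


chd = Node("gwf-chd")
gwf = Node("gwf-nam", {"gwf-chd": chd})
spec = Spec("1.1", Node("sim-nam", {"gwf-nam": gwf}))
```

Because Mapping supplies items(), keys(), and membership tests from these three methods, spec.items() and "gwf-chd" in spec work without extra code. A real implementation would likely cache the flattened index instead of walking the tree on every lookup.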
Backwards compatibility with existing fetch_dfns():
# Old API (still works for manual downloads)
from modflow_devtools.dfn import fetch_dfns
fetch_dfns("MODFLOW-ORG", "modflow6", "6.6.0", "/tmp/dfns")
# New API (preferred - uses registry and caching)
from modflow_devtools.dfn import sync_dfns, get_registry, DfnSpec
sync_dfns(ref="6.6.0")
registry = get_registry(ref="6.6.0")
spec = registry.spec # Registry wraps a DfnSpec
Schema Versioning
A key design consideration is properly handling schema evolution while separating file format from schema version.
Separating format from schema
As discussed in issue #259, file format and schema version are orthogonal concerns:
File format (serialization):
dfn - Legacy DFN text format
toml - Modern TOML format (or potentially YAML, see below)
The format is simply how the data is serialized to disk. Any schema version can be serialized in any supported format.
Schema version (structural specification):
Defines what components exist and how they relate to each other
Defines which variables each component contains
Defines variable types, shapes, and constraints
Separates structural specification from input format representation concerns
The schema describes the semantic structure and meaning of the specification, independent of how it’s serialized.
Key distinction: The schema migration is about separating structural specification (components, relationships, variables, types) from input format representation. This is discussed in detail in pyphoenix-project issue #246.
For example:
Input format issue (v1): Period data defined as recarrays with artificial dimensions like maxbound
Structural reality (v2): Each column is actually a variable living on (a subset of) the grid, using semantically meaningful dimensions
The v1 schema conflates:
Structural information: Components, their relationships, and variables within each component
Format information: How MF6 allows arrays to be provided, when keywords like FILEIN/FILEOUT are necessary
The v2 schema should treat these as separate layers, where consumers can selectively apply formatting details atop a canonical data model.
Current state (on dfn branch):
The code supports loading both dfn and toml formats
The Dfn.load() function accepts a format parameter
Schema version is determined independently of file format
V1→V1.1 and V1→V2 schema mapping is implemented
Implications for DFNs API:
Registry metadata includes both format and schema_version fields
Registries can have different formats at different refs (some refs: dfn, others: toml)
The same schema version can be serialized in different formats
Schema mapping happens after loading, independent of file format
Users can request specific schema versions via map() function
Schema evolution
v1 schema (original):
Current MODFLOW 6 releases through 6.6.x
Flat structure with in_record, tagged, preserve_case, etc. attributes
Mixes structural specification with input format representation (recarray/maxbound issue)
Can be serialized as .dfn (original) or .toml
v1.1 schema (intermediate - current mainline on dfn branch):
Cleaned-up v1 with data normalization
Removed unnecessary attributes (in_record, tagged, etc.)
Structural improvements (period block arrays separated into individual variables)
Better parent-child relationships inferred from naming conventions
Can be serialized as .dfn or .toml
Recommendation from issue #259: Use this as the mainline rather than jumping straight to v2
v2 schema (future - comprehensive redesign):
For devtools 2.x / FloPy 4.x / eventually MF6
Requires explicit spec.toml file - no inference for v2 (ensures clarity and correctness)
Complete separation of structural specification from input format concerns (see pyphoenix-project #246)
Structural layer: components, relationships, variables, data models
Format layer: how MF6 allows arrays to be provided, FILEIN/FILEOUT keywords, etc.
Consumers can selectively apply formatting details atop canonical data model
Explicit parent-child relationships in DFN files (see Component Hierarchy section)
Modern type system with proper array types and semantically meaningful dimensions
Consolidated attribute representation (see Tentative v2 schema design)
Likely serialized as TOML or YAML (with JSON-Schema validation via Pydantic)
DFNs API strategy:
Support all schema versions via registry metadata
Provide transparent schema mapping where needed
Default to native schema version from registry
Allow explicit schema version selection via API
Maintain backwards compatibility during transitions
Tentative v2 schema design
Based on feedback from mwtoews in PR #229 and the structural/format separation discussed in pyphoenix-project #246:
Structural vs format separation: The v2 schema should cleanly separate:
Structural specification: Component definitions, relationships, variable data models
Generated classes encode only structure and data models
Use semantically meaningful dimensions (grid dimensions, time periods)
Format specification: How MF6 reads/writes the data (separate layer)
I/O layers exclusively handle input format concerns
FILEIN/FILEOUT keywords, array input methods, etc.
Consolidated attributes: Replace individual boolean fields with an attrs list:
# Instead of this (v1/v1.1):
optional = true
time_series = true
layered = false
# Use this (v2):
attrs = ["optional", "time_series"]
Array syntax for shapes: Use actual arrays instead of string representations:
# Instead of this (v1/v1.1):
shape = "(nper, nnodes)"
# Use this (v2):
shape = ["nper", "nnodes"]
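The two changes above (boolean flags folded into attrs, shape strings turned into arrays) could be applied mechanically when mapping a v1.1 field to the tentative v2 form. The helper below is a hypothetical sketch, not the existing schema mapping code; the set of flag names is assumed from the example above.

```python
def to_v2_field(field: dict, flags=("optional", "time_series", "layered")) -> dict:
    """Fold boolean flags into attrs and split a shape string into a list."""
    # Copy everything except the flags and the shape, which are rewritten.
    out = {k: v for k, v in field.items() if k not in flags and k != "shape"}
    out["attrs"] = [flag for flag in flags if field.get(flag) is True]
    shape = field.get("shape")
    if isinstance(shape, str):
        # "(nper, nnodes)" becomes ["nper", "nnodes"]
        dims = shape.strip().strip("()").split(",")
        out["shape"] = [d.strip() for d in dims if d.strip()]
    elif shape is not None:
        out["shape"] = list(shape)
    return out


v11 = {
    "name": "head",
    "optional": True,
    "time_series": True,
    "layered": False,
    "shape": "(nper, nnodes)",
}
v2 = to_v2_field(v11)
```

Flags set to false are simply dropped, which matches the example where layered does not appear in attrs.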
Format considerations:
TOML vs YAML: YAML’s more forgiving whitespace better accommodates long descriptions (common for scientific parameters)
Validation approach: Use Pydantic for both schema definition and validation
Pydantic provides rigorous validation (addresses pyphoenix-project #246 requirement for formal specification)
Built-in validation after parsing TOML/YAML to dict (no custom parsing logic)
Automatic JSON-Schema generation for documentation and external tooling
More Pythonic than using python-jsonschema directly
Pydantic integration:
from typing import Any
import tomli
from pydantic import BaseModel, Field

class FieldV2(BaseModel):
    name: str
    type: str
    block: str | None = None
    shape: list[str] | None = None
    attrs: list[str] = Field(default_factory=list)
    description: str = ""
    default: Any = None
    children: dict[str, "FieldV2"] | None = None

# Usage:
# 1. Parse TOML/YAML to dict (using tomli/pyyaml/etc)
#    tomli.load() needs a binary file handle; the file name is illustrative
with open("field.toml", "rb") as f:
    parsed = tomli.load(f)
# 2. Validate with Pydantic (built-in)
field = FieldV2(**parsed) # Validates automatically
# 3. Export JSON-Schema if needed (for docs, external tools)
schema = FieldV2.model_json_schema()
Benefits:
Validation and schema in one: Pydantic handles both, no separate validation library needed
Type safety: Full Python type hints and IDE support
JSON-Schema export: Available for documentation and external tooling
Widely adopted: Well-maintained, used throughout Python ecosystem
Better UX: Clear error messages, better handling of multi-line descriptions (if using YAML)
Component Hierarchy
Design decision: Component parent-child relationships are defined in spec.toml for v2, with backward-compatible support for parent attributes in component files.
The registry file’s purpose is to tell devtools what it needs to know to consume the DFNs and make them available to users (file locations, hashes). The specification file (spec.toml) and component files are the single source of truth for the specification itself, including component relationships.
v2 schema approach (primary) - Hierarchy in spec.toml:
# spec.toml
schema_version = "2.0"
root = "sim-nam" # Or inline [root] definition
# sim-nam.toml
children = ["sim-tdis", "gwf-nam", "gwt-nam", ...]
[options]
# ... field definitions
# gwf-nam.toml
children = ["gwf-dis", "gwf-chd", "gwf-wel", ...]
[options]
# ... field definitions
v2 schema approach (alternative) - parent attribute still supported:
# gwf-chd.toml
parent = "gwf-nam" # Backward-compatible
[options]
# ... field definitions
DfnSpec.load() can build the hierarchy from either:
children lists (preferred for v2) - parent components list their children
parent attributes (backward-compatible) - child components reference their parent
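Supporting both conventions amounts to normalizing them into one parent-to-children map before building the tree. The sketch below is an assumption about how that could work, not the DfnSpec.load() implementation; build_children_map is a hypothetical name and components are represented as plain parsed dicts.

```python
def build_children_map(components: dict) -> dict:
    """Return {parent name: [child names]} from parsed component files.

    components maps component name to its parsed dict, which may carry
    a children list, a parent string, both, or neither.
    """
    # Start from explicit children lists (the preferred v2 form).
    tree = {name: list(comp.get("children", [])) for name, comp in components.items()}
    # Then fold in parent attributes (the backward-compatible form).
    for name, comp in components.items():
        parent = comp.get("parent")
        if parent is None:
            continue
        if parent not in tree:
            raise KeyError(f"{name!r} names unknown parent {parent!r}")
        if name not in tree[parent]:
            tree[parent].append(name)
    return tree


components = {
    "sim-nam": {"children": ["gwf-nam"]},
    "gwf-nam": {"children": ["gwf-dis"]},
    "gwf-dis": {},
    "gwf-chd": {"parent": "gwf-nam"},
}
tree = build_children_map(components)
```

In this example gwf-chd ends up under gwf-nam even though gwf-nam does not list it, which is the gradual-migration case.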
Benefits of children in spec.toml:
Single top-down view - entire hierarchy visible from root
Matches DfnSpec design - spec.toml ↔ DfnSpec with .root and tree structure
Cleaner component files - focus on their structure, not their position in hierarchy
Easier validation - validate entire tree structure in one pass
Benefits of keeping parent support:
Backward compatibility - existing component files with parent still work
Gradual migration - can transition incrementally to v2
Flexibility - both approaches work, choose based on preference
Current state (v1/v1.1):
Hierarchy is implicit in naming conventions: gwf-dis → parent is gwf-nam
to_tree() function infers relationships from component names
Works but fragile (relies on naming conventions being followed)
No spec.toml required (everything inferred)
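To illustrate the kind of name-based inference described here, the sketch below guesses a parent from a component name. It is not the to_tree() logic: only the rule that a package like gwf-dis belongs to gwf-nam comes from this document. The set of model prefixes, the choice of sim-nam as root, and the rule that everything else attaches to the root are assumptions made for the example.

```python
from typing import Optional

# Assumption: model prefixes taken from the models list in the spec example.
MODEL_PREFIXES = {"gwf", "gwt", "gwe"}


def infer_parent(name: str, root: str = "sim-nam") -> Optional[str]:
    """Guess a component's parent from its name (hypothetical helper)."""
    if name == root:
        return None
    prefix, _, suffix = name.partition("-")
    # Documented convention: a model package hangs off "<prefix>-nam".
    if prefix in MODEL_PREFIXES and suffix != "nam":
        return f"{prefix}-nam"
    # Assumption: all other components attach directly to the root.
    return root
```

The fragility noted above is visible here: any component whose name departs from the prefix convention silently lands under the root.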
Backwards Compatibility Strategy
Since FloPy 3 is already consuming the v1.1 schema and we need to develop v2 schema in parallel, careful planning is needed to avoid breaking existing consumers.
Development approach
Mainline (develop branch):
Keep v1.1 schema stable on mainline
Implement DFNs API with full v1/v1.1 support
All v1.1 schema changes are additive only (no breaking changes)
FloPy 3 continues consuming from mainline without disruption
V2 development (dfn-v2 branch):
Create separate dfn-v2 branch for v2 schema development
Develop v2 schema, Pydantic models, and structural/format separation
Test v2 schema with experimental FloPy 4 development
Iterate on v2 design without affecting mainline stability
Integration approach:
Phase 1: DFNs API on mainline supports v1/v1.1 only
Phase 2: Add v2 schema support to mainline (v1, v1.1, and v2 all supported)
Phase 3: Merge dfn-v2 branch, deprecate v1 (but keep it working)
Phase 4: Eventually remove v1 support in devtools 3.x (v1.1 and v2 only)
Schema version support
The DFNs API will support multiple schema versions simultaneously:
# Schema version is tracked per registry/ref
registry_v1 = get_registry(ref="6.4.4") # MODFLOW 6.4.4 uses v1 schema
registry_v11 = get_registry(ref="6.6.0") # MODFLOW 6.6.0 uses v1.1 schema
registry_v2 = get_registry(ref="develop") # Future: develop uses v2 schema
# Get DFN in native schema version
dfn_v1 = registry_v1.get_dfn("gwf-chd") # Returns v1 schema
dfn_v11 = registry_v11.get_dfn("gwf-chd") # Returns v1.1 schema
# Transparently map to desired schema version
from modflow_devtools.dfn import map
dfn_v2 = map(dfn_v1, schema_version="2") # v1 → v2
dfn_v2 = map(dfn_v11, schema_version="2") # v1.1 → v2
Registry support:
Each registry metadata includes schema_version (from spec.toml or inferred)
Different refs can have different schema versions
RemoteDfnRegistry loads appropriate schema version for each ref
load() function detects schema version and uses appropriate parser/validator
Schema detection:
# In RemoteDfnRegistry or DfnSpec.load()
def _detect_schema_version(self) -> Version:
    # 1. Check spec.toml if present
    if spec_file := self._load_spec_file():
        return spec_file.schema_version
    # 2. Infer from DFN content, if a sample DFN is available
    if sample_dfn := self._load_sample_dfn():
        return infer_schema_version(sample_dfn)
    # 3. Default to latest stable
    return Version("1.1")
API compatibility
Breaking changes in current implementation:
The dfn branch introduces fundamental breaking changes that make it incompatible with a 1.x release:
Core types changed from TypedDict to dataclass:
# Old (develop) - dict-like access
dfn["name"]
field.get("type")
# New (dfn branch) - attribute access
dfn.name
field.type
Dfn structure changed:
Removed: sln, fkeys
Added: schema_version, parent, blocks
Renamed: fkeys → children
Removed exports:
get_dfns() - now fetch_dfns() in submodule, not re-exported from main module
FormatVersion, Sln, FieldType, Reader type aliases
Field structure changed - different attributes and semantics between v1/v2
Why aliasing is not feasible:
The TypedDict → dataclass change is fundamental and cannot be cleanly aliased:
Code using dfn["name"] syntax would break immediately
Making a dataclass behave like a dict requires implementing __getitem__, get(), keys(), values(), items(), etc.
Even with these methods, isinstance checks and type hints would behave differently
The complexity and maintenance burden outweigh the benefits
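To make the cost concrete, here is a minimal sketch of what such a shim might look like. CompatDfn is a hypothetical two-field stand-in, not the real Dfn class, and only part of the dict interface is shown.

```python
from dataclasses import dataclass, fields


@dataclass
class CompatDfn:
    """Hypothetical stand-in for Dfn with a partial dict-compatibility shim."""

    name: str
    schema_version: str

    def keys(self):
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        # Restrict subscript access to declared fields, not methods.
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)

    def get(self, key, default=None):
        return self[key] if key in self.keys() else default


dfn = CompatDfn(name="gwf-chd", schema_version="1.1")
legacy_name = dfn["name"]         # legacy subscript access works via the shim
is_dict = isinstance(dfn, dict)   # but the object is still not a dict
print(legacy_name, is_dict)       # gwf-chd False
```

Even this partial shim adds three methods to every core type, and values(), items(), membership tests, and mutation would all still need to be written and maintained.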
Recommendation: Release as devtools 2.0, not 1.x.
New API (devtools 2.x):
# DFNs API
from modflow_devtools.dfn import DfnSpec, get_dfn, get_registry, sync_dfns
# Sync and access DFNs
sync_dfns(ref="6.6.0")
dfn = get_dfn("gwf-chd", ref="6.6.0")
registry = get_registry(ref="6.6.0")
spec = registry.spec
# Attribute access (dataclass style)
print(dfn.name) # "gwf-chd"
print(dfn.blocks["options"])
# fetch_dfns() still available for manual downloads
from modflow_devtools.dfn.fetch import fetch_dfns
fetch_dfns("MODFLOW-ORG", "modflow6", "6.6.0", "/tmp/dfns")
Migration timeline
devtools 1.x (current stable):
Existing modflow_devtools/dfn.py with TypedDict-based API
get_dfns() function for manual downloads
No registry infrastructure
No changes - maintain stability for existing users
devtools 2.0 (this work):
❌ Breaking: Dfn, Field change from TypedDict to dataclass
❌ Breaking: get_dfns() renamed to fetch_dfns() (in submodule)
❌ Breaking: Several type aliases removed or moved
✅ New: Full DFNs API with registry infrastructure
✅ New: DfnSpec class with hierarchical and flat access
✅ New: RemoteDfnRegistry, LocalDfnRegistry classes
✅ New: CLI commands (sync, info, list, clean)
✅ New: Schema versioning and mapping (v1 ↔ v2)
✅ New: Pydantic-based configuration and validation
devtools 2.x (future minor releases):
Add v2 DFN schema support when MODFLOW 6 adopts it
Schema mapping between all versions (v1, v1.1, v2)
Additional CLI commands and features
Performance improvements
devtools 3.0 (distant future):
Consider removing v1 schema support (with deprecation warnings in 2.x)
Potential further API refinements
Key principles:
Clean break at 2.0 - no half-measures with aliasing
Multi-version schema support - DFNs API works with v1, v1.1, and v2 simultaneously
Clear migration path - document all breaking changes in release notes
Semantic versioning - breaking changes require major version bump
Testing strategy:
Test suite covers all schema versions (v1, v1.1, v2)
Test schema mapping in all directions (v1↔v1.1↔v2)
Test mixed-version scenarios (different refs with different schemas)
Integration tests with real MODFLOW 6 repository
Documentation:
Clear migration guide from 1.x to 2.x
Document all breaking changes with before/after examples
Document which MODFLOW 6 versions use which schema versions
Examples showing multi-version usage
Implementation Dependencies
Existing work on dfn branch
The dfn branch already includes substantial infrastructure:
Completed:
✅ Dfn, Block, Field dataclasses
✅ Schema definitions (FieldV1, FieldV2)
✅ Parsers for both DFN and TOML formats
✅ Schema mapping (V1 → V2) with MapV1To2
✅ Flat/tree conversion utilities (load_flat(), load_tree(), to_tree())
✅ fetch_dfns() function for manual downloads
✅ Validation utilities
✅ dfn2toml conversion tool
Integration with DfnSpec design:
The dfn branch currently has:
# Returns dict[str, Dfn] - flat representation
dfns = load_flat("/path/to/dfns")
# Returns root Dfn with children - hierarchical representation
root = load_tree("/path/to/dfns")
The new DfnSpec class will consolidate these:
# Single load, both representations available
spec = DfnSpec.load("/path/to/dfns")
spec.root # Hierarchical (same as old load_tree)
spec["gwf-chd"] # Flat dict access (same as old load_flat)
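The consolidation can be pictured as a small class that holds the tree and indexes it by name. This is only a sketch under assumptions: Node stands in for Dfn, SpecSketch stands in for DfnSpec, and the real class may store and build its index differently.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for Dfn: a name plus child nodes."""

    name: str
    children: list = field(default_factory=list)


@dataclass
class SpecSketch(Mapping):
    """Hierarchical root with flat, read-only, dict-style lookup by name."""

    schema_version: str
    root: Node
    _flat: dict = field(init=False, repr=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Walk the tree once and index every node by name.
        stack = [self.root]
        while stack:
            node = stack.pop()
            self._flat[node.name] = node
            stack.extend(node.children)

    def __getitem__(self, name: str) -> Node:
        return self._flat[name]

    def __iter__(self) -> Iterator:
        return iter(self._flat)

    def __len__(self) -> int:
        return len(self._flat)


root = Node("sim-nam", [Node("gwf-nam", [Node("gwf-chd")])])
spec = SpecSketch(schema_version="1.1", root=root)
print(spec.root.name)        # hierarchical access: sim-nam
print(spec["gwf-chd"].name)  # flat access: gwf-chd
print(len(spec))             # 3
```

Inheriting from Mapping supplies keys(), items(), values(), get(), and membership tests from the three methods defined here.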
Migration path:
Add DfnSpec class - wraps existing to_tree() logic and implements Mapping
Keep load_flat() and load_tree() - mark as internal/deprecated but maintain for compatibility
DfnSpec.load() implementation - uses existing functions internally:
@classmethod
def load(cls, path: Path | str) -> "DfnSpec":
    # Use existing load_flat for paths
    dfns = load_flat(path)
    # Use existing to_tree to build hierarchy
    root = to_tree(dfns)
    schema_version = root.schema_version  # or load from spec.toml
    return cls(schema_version=schema_version, root=root)
Update registries - make them wrap DfnSpec:
class RemoteDfnRegistry(DfnRegistry):
    @property
    def spec(self) -> DfnSpec:
        if self._spec is None:
            self._ensure_cached()  # Fetch all files
            self._spec = DfnSpec.load(self._cache_dir)  # Load from cache
        return self._spec
Future: Eventually remove load_flat() and load_tree() from public API
This approach:
Reuses all existing parsing/conversion logic
Provides cleaner API without breaking existing code
Smooth transition: old functions work, new class preferred
Note: FloPy 3 is already generating code from an early version of this schema (per pyphoenix-project #246), which creates some stability requirements for the v1.1/v2 transition.
Choreography with develop branch:
Currently:
develop branch has modflow_devtools/dfn.py (single file, basic utilities)
dfn branch has modflow_devtools/dfn/ (package with full implementation)
dfns-api branch (current) just adds planning docs
Merge sequence:
First: Merge dfns-api branch → develop (adds planning docs)
Then: Merge dfn branch → develop (replaces dfn.py with dfn/ package)
This replaces the single file with the package
Keeps the import path stable: from modflow_devtools.dfn import ... still works, although the imported types change (see API compatibility)
Adds substantial new functionality (schema classes, parsers, etc.)
Finally: Implement DFNs API features on develop (registries, sync, CLI, DfnSpec)
API changes during merge:
# Old dfn.py API (on develop now) - uses TypedDicts
from modflow_devtools.dfn import get_dfns, Field, Dfn
dfn["name"] # dict-like access
# New dfn/ package API (after dfn branch merge) - dataclasses
from modflow_devtools.dfn import Dfn, Block, Field # Now dataclasses
from modflow_devtools.dfn.fetch import fetch_dfns # Renamed, moved to submodule
from modflow_devtools.dfn import DfnSpec, get_registry, sync_dfns # New additions
dfn.name # attribute access
Breaking changes (see API compatibility section for full details):
Field, Dfn, etc. change from TypedDict to dataclass - requires 2.0 release
get_dfns() renamed to fetch_dfns() and moved to submodule
Several type aliases removed or moved to schema submodules
Implementation status (DFNs API):
✅ Bootstrap file and registry schema
✅ Registry discovery and synchronization
✅ Pooch integration for file caching
✅ Registry classes (DfnRegistry, RemoteDfnRegistry, LocalDfnRegistry)
✅ CLI commands (sync, info, list, clean)
✅ Module-level convenience API
✅ Registry generation tool (make_registry.py)
⚠️ Integration with MODFLOW 6 CI (requires registry branch merge in MF6 repo)
Core components
Foundation (no dependencies):
Merge dfn branch work (schema, parser, utility code)
Add bootstrap file (modflow_devtools/dfn/dfns.toml)
Define registry schema with Pydantic (handles validation and provides JSON-Schema export)
Implement registry discovery logic
Create cache directory structure utilities
Registry infrastructure (depends on Foundation):
Add Pooch as dependency
Implement DfnRegistry abstract base class
Implement RemoteDfnRegistry with Pooch for file fetching
Refactor existing code into LocalDfnRegistry
Implement sync_dfns() function
Add registry metadata caching with hash verification
Implement version-controlled registry discovery
Add auto-sync on first use (opt-in via MODFLOW_DEVTOOLS_AUTO_SYNC while experimental)
Implement DfnSpec dataclass with Mapping protocol for single canonical hierarchical representation with flat dict access
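The opt-in check for auto-sync could be as small as the sketch below. The environment variable name comes from the list above; the set of accepted values and the function name auto_sync_enabled are assumptions.

```python
import os

# Accepted spellings are an assumption, not a documented contract.
_TRUTHY = {"1", "true", "yes", "on"}


def auto_sync_enabled(environ=None) -> bool:
    """Return True only when the user has explicitly opted in."""
    env = os.environ if environ is None else environ
    value = env.get("MODFLOW_DEVTOOLS_AUTO_SYNC", "")
    return value.strip().lower() in _TRUTHY


print(auto_sync_enabled({}))                                   # False
print(auto_sync_enabled({"MODFLOW_DEVTOOLS_AUTO_SYNC": "1"}))  # True
```

Passing the environment as an argument keeps the check easy to test without touching the real process environment.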
CLI and module API (depends on Registry infrastructure):
Create modflow_devtools/dfn/__main__.py
Add commands: sync, info, list, clean
Add --ref flag for version selection
Add --force flag for re-download
Add convenience functions (get_dfn, get_dfn_path, list_components, etc.)
Create DEFAULT_REGISTRY for latest stable version
Maintain backwards compatibility with fetch_dfns()
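The command layout can be sketched with argparse. Which commands accept which flags is an assumption here (for example, --force is attached only to sync), and the handlers are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m modflow_devtools.dfn")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("sync", "info", "list", "clean"):
        cmd = sub.add_parser(name)
        # --ref selects a MODFLOW 6 version (branch, tag, or commit).
        cmd.add_argument("--ref", default=None)
        if name == "sync":
            # --force re-downloads even if files are already cached.
            cmd.add_argument("--force", action="store_true")
    return parser


args = build_parser().parse_args(["sync", "--ref", "6.6.0", "--force"])
print(args.command, args.ref, args.force)  # sync 6.6.0 True
```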
Registry generation tool (depends on Foundation):
Implement modflow_devtools/dfn/make_registry.py
Scan DFN directory and generate registry file (dfns.toml): file listings with hashes
Compute file hashes (SHA256) for all files (including spec.toml if present)
Registry output: just filename -> hash mapping (no URLs - constructed dynamically)
Support both full output (for CI) and minimal output (for handwriting)
Do NOT generate spec.toml - that’s handwritten by MODFLOW 6 developers
Optionally validate spec.toml against DFN set for consistency if it exists
For v1/v1.1: infer hierarchy from naming conventions for validation
For v2: read explicit parent relationships from DFN files for validation
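The hashing step of the generation tool might look like the following sketch. The function names and the [files] table layout are assumptions; only the filename -> SHA256 mapping comes from the plan above.

```python
import hashlib
from pathlib import Path


def hash_files(dfn_dir, patterns=("*.dfn", "*.toml")) -> dict:
    """Return a sorted filename -> SHA256 hex digest mapping for a directory."""
    root = Path(dfn_dir)
    paths = sorted(p for pat in patterns for p in root.glob(pat) if p.is_file())
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def render_registry(files: dict) -> str:
    """Render a minimal TOML registry; the [files] layout is an assumption."""
    lines = ["[files]"]
    lines += [f'"{name}" = "{digest}"' for name, digest in files.items()]
    return "\n".join(lines) + "\n"
```

Because no URLs are written, the same registry text stays valid wherever the files are hosted; callers combine it with a base URL at fetch time.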
MODFLOW 6 repository integration
CI workflow (depends on Registry generation tool):
Install modflow-devtools in MODFLOW 6 CI
Generate registry on push to develop and release tags
Commit registry to .registry/dfns.toml
Test registry discovery and sync
Note: spec.toml is handwritten by developers (optional), checked into repo like DFN files
Bootstrap configuration (depends on MODFLOW 6 CI):
Add stable MODFLOW 6 releases to bootstrap refs (6.6.0, 6.5.0, etc.)
Include develop branch for latest definitions
Test multi-ref discovery and sync
Testing and documentation
Testing (depends on all core components):
Unit tests for registry classes
Integration tests for sync mechanism
Network failure scenarios
Multi-version scenarios
Schema mapping tests (v1 → v1.1 → v2)
Tests for both file formats (dfn and toml)
Backwards compatibility tests with existing FloPy usage
Documentation (can be done concurrently with implementation):
Update docs/md/dfn.md with API examples
Document format vs schema separation clearly
Document schema evolution roadmap (v1 → v1.1 → v2)
Document component hierarchy approach (explicit in DFN files for v2)
Add migration guide for existing code
CLI usage examples
MODFLOW 6 CI integration guide
Relationship to Models and Programs APIs
The DFNs API deliberately mirrors the Models and Programs API architecture for consistency:
| Aspect | Models API | Programs API | DFNs API |
|---|---|---|---|
| Bootstrap file | | | |
| Registry format | TOML with files/models/examples | TOML with programs/binaries | TOML with files/components/hierarchy |
| Discovery | Release assets or version control | Release assets only | Version control (+ release assets future) |
| Caching | | | |
| Addressing | | | |
| CLI | | | |
| Primary use | Access model input files | Install program binaries | Parse definition files |
Key differences:
DFNs API focuses on metadata/parsing, not installation
DFNs API leverages existing parser infrastructure (Dfn, Block, Field classes)
DFNs API handles schema versioning/mapping (format vs schema separation)
DFNs API supports both flat and hierarchical representations
Shared patterns:
Bootstrap-driven discovery
Remote sync with Pooch caching
Ref-based versioning (branches, tags, commits)
CLI command structure
Lazy loading / auto-sync on first use
Environment variable opt-out for auto-sync
This consistency gives developers and users a familiar experience across all three APIs.
Cross-API Consistency
The DFNs API follows the same design patterns as the Models and Programs APIs for consistency. See the Cross-API Consistency section in models.md for full details.
Key shared patterns:
Pydantic-based registry classes (not ABCs)
Dynamic URL construction (URLs built at runtime, not stored in registries)
Bootstrap and user config files with identical naming (dfns.toml), distinguished by location
Top-level schema_version metadata field
Distinctly named registry file (dfns.toml)
Shared config utility: get_user_config_path("dfn")
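Dynamic URL construction can be illustrated as follows. This is a sketch under assumptions: Source and file_url are hypothetical names, the dfn_path value is a guess at where DFN files live in the repository, and raw.githubusercontent.com is used as the host for version-control discovery.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Source:
    """Hypothetical bootstrap entry for one repository."""

    repo: str      # "owner/name"
    dfn_path: str  # location of DFN files within the repo


def file_url(source: Source, ref: str, filename: str) -> str:
    """Build a raw-content URL at runtime from bootstrap fields plus a ref."""
    path = "/".join(part.strip("/") for part in (source.dfn_path, filename))
    return (
        f"https://raw.githubusercontent.com/{source.repo}/{quote(ref)}/{quote(path)}"
    )


# The dfn_path below is an assumed value for illustration.
src = Source(repo="MODFLOW-ORG/modflow6", dfn_path="doc/mf6io/mf6ivar/dfn")
url = file_url(src, "6.6.0", "gwf-chd.dfn")
print(url)
```

Since the registry stores only filenames and hashes, changing the ref or the host changes the URL without regenerating the registry.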
Unique to DFNs API:
Discovery via version control (release assets mode planned for future)
Extra dfn_path bootstrap field (location of DFN files within repo)
Schema versioning and mapping capabilities
No MergedRegistry (users work with one MF6 version at a time)
Design Decisions
Use Pooch for fetching
Following the recommendation in issue #262, the DFNs API will use Pooch for fetching to avoid maintaining custom HTTP client code. This provides:
Automatic caching: Pooch handles local caching with verification
Hash verification: Ensures file integrity
Progress bars: Better user experience for downloads
Well-tested: Pooch is mature and widely used
Consistency: Same approach as Models API
Use Pydantic for schema validation
Pydantic will be used for defining and validating DFN schemas (both registry schemas and DFN content schemas):
Built-in validation: No need for separate validation libraries like python-jsonschema
Type safety: Full Python type hints and IDE support
JSON-Schema export: Can generate JSON-Schema for documentation and external tooling
Developer experience: Clear error messages, good Python integration
Justification: Widely adopted, well-maintained, addresses the formal specification requirement from pyphoenix-project #246
Schema versioning strategy
Based on issue #259:
Separate format from schema: Registry metadata includes both
Support v1.1 as mainline: Don’t jump straight to v2
Backwards compatible: Continue supporting v1 for existing MODFLOW 6 releases
Schema mapping: Provide transparent conversion via map() function
Future-proof: Design allows for v2 when ready (devtools 2.x / FloPy 4.x)
Future enhancements
Release asset mode: Add support for registries as release assets (in addition to version control)
Registry compression: Compress registry files for faster downloads
Partial updates: Diff-based registry synchronization
Offline mode: Explicit offline mode that never attempts sync
Conda integration: Coordinate with conda-forge for bundled DFN packages
Multi-source support: Support definition files from sources other than MODFLOW 6
Validation API: Expose validation functionality for user-provided input files
Diff/compare API: Compare DFNs across versions to identify changes