DFNs API Design

This document describes the design of the DFNs (Definition Files) API (GitHub issue #262). It is intended to be developer-facing, not user-facing, though users may also find it informative.

This is a living document which will be updated as development proceeds.

Background

The modflow_devtools.dfn module currently provides utilities for parsing and working with MODFLOW 6 definition files. On the dfn branch, significant work has been done, including:

  • Object models for DFN components (Dfn, Block, Field classes)

  • Schema definitions for both v1 (legacy) and v2 (in development)

  • Parsers for the old DFN format

  • Schema mapping capabilities, including utilities for converting between flat and hierarchical component representations

  • A fetch_dfns() function for manually downloading DFN files from the MODFLOW 6 repository

  • Validation tools

However, there is currently no registry-based API for:

  • Automatically discovering and synchronizing DFN files from remote sources

  • Managing multiple versions of definition files simultaneously

  • Caching definition files locally for offline use

Users must manually download definition files or rely on whatever happens to be bundled with their installation. This creates problems similar to those the Models API addressed:

  1. Version coupling: Users are locked to whatever DFN version is bundled

  2. Manual management: Users must manually track and download DFN updates

  3. No multi-version support: Difficult to work with multiple MODFLOW 6 versions simultaneously

  4. Maintenance burden: Developers must manually update bundled DFNs

Objective

Create a DFNs API that:

  1. Mirrors Models/Programs API patterns for consistency and familiarity

  2. Leverages existing dfn module work (parsers, schemas, object models)

  3. Provides automated discovery of definition files from the MODFLOW 6 repository

  4. Supports multiple versions simultaneously with explicit version addressing

  5. Uses Pooch for fetching and caching (avoiding custom HTTP client code)

  6. Handles schema evolution with proper separation of file format vs schema version

  7. Maintains loose coupling between devtools and remote DFN sources

Overview

Make the MODFLOW 6 repository responsible for publishing a definition file registry.

Make modflow-devtools responsible for:

  • Defining the DFN registry publication contract

  • Providing registry-creation machinery

  • Storing bootstrap information locating the MODFLOW 6 repository

  • Discovering remote registries at install time or on demand

  • Caching registry metadata and definition files

  • Exposing a synchronized view of available definition files

  • Parsing and validating definition files

  • Mapping between schema versions

MODFLOW 6 is currently the only repository using the DFN specification system, but this leaves the door open for other repositories to begin using it.

Architecture

The DFNs API will mirror the Models and Programs API architecture, adapted for definition file-specific concerns.

Implementation approach: Following the Models API’s streamlined design, the DFNs API should consolidate core functionality in a single modflow_devtools/dfn/__init__.py file with clear class-based separation:

  • DfnCache: Cache management for registries and DFN files

  • DfnSourceRepo: Source repository with discovery/sync methods

  • DfnSourceConfig: Configuration container from bootstrap file

  • DfnRegistry: Pydantic data model for registry structure

  • PoochDfnRegistry: Remote fetching with Pooch integration

  • DiscoveredDfnRegistry: Discovery result with metadata

  • DfnSpec: Full specification with hierarchical and flat access

This single-module OO design improves maintainability while keeping the existing Dfn, Block, and Field dataclasses that are already well-established.

Bootstrap file

The bootstrap file tells modflow-devtools where to look for DFN registries. This file will be checked into the repository at modflow_devtools/dfn/dfns.toml and distributed with the package.

Bootstrap file contents

At the top level, the bootstrap file consists of a table of sources, each describing a repository that publishes definition files.

Each source has:

  • repo: Repository identifier (owner/name)

  • dfn_path: Path within the repository to the directory containing DFN files (defaults to doc/mf6io/mf6ivar/dfn)

  • registry_path: Path within the repository to the registry metadata file (defaults to .registry/dfns.toml)

  • refs: List of git refs (branches, tags, or commit hashes) to sync by default

User config overlay

Users can customize or extend the bundled bootstrap configuration by creating a user config file at:

  • Linux/macOS: ~/.config/modflow-devtools/dfns.toml (respects $XDG_CONFIG_HOME)

  • Windows: %APPDATA%/modflow-devtools/dfns.toml

The user config follows the same format as the bundled bootstrap file. Sources defined in the user config will override or extend those in the bundled config, allowing users to:

  • Add custom DFN repositories

  • Point to forks of existing repositories (useful for testing experimental schema versions)

  • Override default refs for existing sources

Implementation note: The user config path logic (get_user_config_path("dfn")) is shared across all three APIs (Models, Programs, DFNs) via modflow_devtools.config, but each API implements its own merge_bootstrap() function using API-specific bootstrap schemas.

Sample bootstrap file

[sources.modflow6]
repo = "MODFLOW-ORG/modflow6"
dfn_path = "doc/mf6io/mf6ivar/dfn"
registry_path = ".registry/dfns.toml"
refs = [
    "6.6.0",
    "6.5.0",
    "6.4.4",
    "develop",
]

DFN spec and registry files

Two types of metadata files support the DFNs API:

  1. Specification file (spec.toml): Part of the DFN set, describes the specification itself

  2. Registry file (dfns.toml): Infrastructure for discovery and distribution

Specification file

A spec.toml file lives in the DFN directory alongside the DFN files. It describes the specification:

# MODFLOW 6 input specification
schema_version = "1.1"

[components]
# Component organization by type
simulation = ["sim-nam", "sim-tdis"]
models = ["gwf-nam", "gwt-nam", "gwe-nam"]
packages = ["gwf-chd", "gwf-drn", "gwf-wel", ...]
exchanges = ["exg-gwfgwf", "exg-gwfgwt", ...]
solutions = ["sln-ims"]

Notes:

  • The spec file is part of the DFN set, not registry infrastructure

  • Handwritten by MODFLOW 6 developers, not generated

  • Describes the specification as a whole (schema version, component organization)

  • Lives in the DFN directory: doc/mf6io/mf6ivar/dfn/spec.toml

  • v1/v1.1: Spec file is optional - can be inferred if not present:

    • schema_version can be inferred from DFN content or defaulted

    • components section (shown above) is just for categorization/convenience, not hierarchy

    • Hierarchy inferred from naming conventions (e.g., gwf-chd → parent is gwf-nam)

  • v2: Spec file is required for clarity and correctness:

    • Explicit schema_version = "2.0" declaration

    • Defines hierarchy via root attribute (string reference or inline definition)

    • Component files define children lists (preferred) or parent attributes (backward-compatible)

    • Can be a single file containing everything, or a spec file pointing to separate component files

    • Ensures clean structural/format separation

    • See Component Hierarchy section for details

  • Correspondence: spec.toml (on disk) ↔ DfnSpec (in Python)

Minimal handwritten spec file (v1/v1.1):

schema_version = "1.1"

Alternatively, for v1/v1.1 no spec file is needed at all: everything can be inferred.

Registry file format

A dfns.toml registry file for discovery and distribution (the specific naming distinguishes it from models.toml and programs.toml):

# Registry metadata (top-level, optional)
schema_version = "1.0"
generated_at = "2025-01-02T10:30:00Z"
devtools_version = "1.9.0"

[metadata]
ref = "6.6.0"  # Optional, known from discovery context

# File listings (filenames and hashes, URLs constructed as needed)
[files]
"spec.toml" = {hash = "sha256:..."}  # Specification file
"sim-nam.dfn" = {hash = "sha256:..."}
"sim-tdis.dfn" = {hash = "sha256:..."}
"gwf-nam.dfn" = {hash = "sha256:..."}
"gwf-chd.dfn" = {hash = "sha256:..."}
# ... all DFN files

Notes:

  • Registry is purely infrastructure for discovery and distribution

  • The files section maps filenames to hashes for verification

  • URLs are constructed dynamically from bootstrap metadata (repo, ref, dfn_path) + filename

  • This allows using personal forks by changing the bootstrap file

  • All registry metadata is optional - registries can be handwritten minimally

  • The specification file is listed alongside DFN files

Minimal handwritten registry:

[files]
"spec.toml" = {hash = "sha256:abc123..."}
"sim-nam.dfn" = {hash = "sha256:def456..."}
"gwf-nam.dfn" = {hash = "sha256:789abc..."}

Sample files

For TOML-format DFNs (future v2 schema):

Option A: Separate component files (spec.toml references external files)

Spec file (spec.toml):

schema_version = "2.0"
root = "sim-nam"  # References external sim-nam.toml file

Component file (sim-nam.toml):

children = ["sim-tdis", "gwf-nam", "gwt-nam", "gwe-nam", "exg-gwfgwf", "sln-ims"]

[options]
# ... fields

Component file (gwf-nam.toml):

children = ["gwf-dis", "gwf-chd", "gwf-wel", "gwf-drn", ...]

[options]
# ... fields

Registry (dfns.toml):

[files]
"spec.toml" = {hash = "sha256:..."}
"sim-nam.toml" = {hash = "sha256:..."}
"gwf-nam.toml" = {hash = "sha256:..."}
"gwf-chd.toml" = {hash = "sha256:..."}
# ... all component files

Option B: Single specification file (spec.toml contains everything)

spec.toml contains entire specification:

schema_version = "2.0"

[root]  # Root component defined inline
name = "sim-nam"

[root.options]
# ... all sim-nam fields

[root.children.sim-tdis]
# ... all sim-tdis fields

[root.children.gwf-nam]
children = ["gwf-dis", "gwf-chd", "gwf-wel", ...]  # Can nest children inline too

[root.children.gwf-nam.options]
# ... all gwf-nam fields

[root.children.gwf-nam.children.gwf-chd]
# ... all gwf-chd fields nested within gwf-nam

# ... entire hierarchy nested in one file

Registry just points to the one file:

[files]
"spec.toml" = {hash = "sha256:..."}

Key design: The root attribute is overloaded:

  • String value (root = "sim-nam"): Reference to external component file

  • Table/section ([root]): Inline component definition with full nested hierarchy

Component children are always a list of strings, whether referencing external files or naming nested inline sections.

Registry discovery

DFN registries can be discovered in two modes, similar to the Models API.

Discovery modes

1. Registry as version-controlled file:

Registry files can be versioned in the repository at a conventional path, in which case discovery uses GitHub raw content URLs:

https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/dfns.toml

This mode supports any git ref (branches, tags, commit hashes).

2. Registry as release asset:

Registry files can also be published as release assets:

https://github.com/{org}/{repo}/releases/download/{tag}/dfns.toml

This mode:

  • Requires release tags only

  • Allows registry generation in CI without committing to repo

  • Provides faster discovery (no need to check multiple ref types)

Discovery precedence: Release asset mode takes precedence if both exist (same as Models API).

Registry discovery procedure

At sync time, modflow-devtools discovers remote registries for each configured ref:

  1. Check for release tag (if release asset mode enabled):

    • Look for a GitHub release with the specified tag

    • Try to fetch dfns.toml from release assets

    • If found, use it and skip step 2

    • If release exists but lacks registry asset, fall through to step 2

  2. Check for version-controlled registry:

    • Look for a commit hash, tag, or branch matching the ref

    • Try to fetch registry from {registry_path} via raw content URL

    • If found, use it

    • If ref exists but lacks registry file, raise error:

      DfnRegistryDiscoveryError(
          f"Registry file not found in {registry_path} for 'modflow6@{ref}'"
      )
      
  3. Failure case:

    • If no matching ref found at all, raise error:

      DfnRegistryDiscoveryError(
          f"Registry discovery failed, ref 'modflow6@{ref}' does not exist"
      )
      

Note: For initial implementation, focus on version-controlled mode. Release asset mode requires MODFLOW 6 to start distributing DFN files with releases (currently they don’t), but would be a natural addition once that happens.

Registry/DFN caching

Cache structure mirrors the Models API pattern:

~/.cache/modflow-devtools/
├── dfn/
│   ├── registries/
│   │   └── modflow6/              # by source repo
│   │       ├── 6.6.0/
│   │       │   └── dfns.toml
│   │       ├── 6.5.0/
│   │       │   └── dfns.toml
│   │       └── develop/
│   │           └── dfns.toml
│   └── files/                     # Actual DFN files, managed by Pooch
│       └── modflow6/
│           ├── 6.6.0/
│           │   ├── sim-nam.dfn
│           │   ├── gwf-nam.dfn
│           │   └── ...
│           ├── 6.5.0/
│           │   └── ...
│           └── develop/
│               └── ...

Cache management:

  • Registry files cached per source repository and ref

  • DFN files fetched and cached individually by Pooch, verified against registry hashes

  • Cache persists across Python sessions for offline use

  • Cache can be cleared with the dfn clean command

  • Users can check cache status with the dfn info command

Registry synchronization

Synchronization updates the local registry cache with remote metadata.

Manual sync

Exposed as a CLI command and Python API:

# Sync all configured refs
python -m modflow_devtools.dfn sync

# Sync specific ref
python -m modflow_devtools.dfn sync --ref 6.6.0

# Sync to any git ref (branch, tag, commit hash)
python -m modflow_devtools.dfn sync --ref develop
python -m modflow_devtools.dfn sync --ref f3df630a

# Force re-download
python -m modflow_devtools.dfn sync --force

# Show sync status
python -m modflow_devtools.dfn info

# List available DFNs for a ref
python -m modflow_devtools.dfn list --ref 6.6.0

# List all synced refs
python -m modflow_devtools.dfn list

Or via Python API:

from modflow_devtools.dfn import sync_dfns, get_sync_status

# Sync all configured refs
sync_dfns()

# Sync specific ref
sync_dfns(ref="6.6.0")

# Check sync status
status = get_sync_status()

Automatic sync

  • At install time: Best-effort sync to default refs during package installation (fail silently on network errors)

  • On first use: If registry cache is empty for requested ref, attempt to sync before raising errors

  • Lazy loading: Don’t sync until DFN access is actually requested

  • Configurable (experimental): Auto-sync is opt-in via the MODFLOW_DEVTOOLS_AUTO_SYNC environment variable (set to “1”, “true”, or “yes”)

Source repository integration

For the MODFLOW 6 repository to integrate:

  1. Optionally handwrite spec.toml in the DFN directory (if not present, everything is inferred):

    # doc/mf6io/mf6ivar/dfn/spec.toml
    schema_version = "1.1"
    
    [components]
    simulation = ["sim-nam", "sim-tdis"]
    models = ["gwf-nam", "gwt-nam", "gwe-nam"]
    # ...
    

    If spec.toml is absent (v1/v1.1 only), DfnSpec.load() will:

    • Scan the directory for .dfn and .toml files

    • Infer schema version from DFN content

    • Infer component organization from filenames

    • Build hierarchy using naming conventions

    Note: For v2 schema, spec.toml is required and must declare schema_version = "2.0"

  2. Generate registry in CI:

    # In MODFLOW 6 repository CI
    python -m modflow_devtools.dfn.make_registry \
      --dfn-path doc/mf6io/mf6ivar/dfn \
      --output .registry/dfns.toml \
      --ref ${{ github.ref_name }}
    
  3. Commit registry to .registry/dfns.toml

  4. Example CI integration (GitHub Actions):

    - name: Generate DFN registry
      run: |
        pip install modflow-devtools
        python -m modflow_devtools.dfn.make_registry \
          --dfn-path doc/mf6io/mf6ivar/dfn \
          --output .registry/dfns.toml \
          --ref ${{ github.ref_name }}
    
    - name: Commit registry
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git add .registry/dfns.toml
        git diff-index --quiet HEAD || git commit -m "chore: update DFN registry"
        git push
    

Note: Initially generate registries for version-controlled mode. Release asset mode would require MODFLOW 6 to start distributing DFNs with releases.

DFN addressing

Format: mf6@{ref}/{component}

The address has two variable parts:

  • ref: Git ref (branch, tag, or commit hash) corresponding to a MODFLOW 6 version

  • component: DFN component name (without file extension)

Examples:

  • mf6@6.6.0/sim-nam - Simulation name file definition for MODFLOW 6 v6.6.0

  • mf6@6.6.0/gwf-chd - GWF CHD package definition for v6.6.0

  • mf6@develop/gwf-wel - GWF WEL package definition from develop branch

  • mf6@f3df630a/gwt-adv - GWT ADV package definition from specific commit

Benefits:

  • Explicit versioning prevents confusion

  • Supports multiple MODFLOW 6 versions simultaneously

  • Enables comparison between versions

  • Works with any git ref (not just releases)

Note: The source is always “mf6” (MODFLOW 6), but the addressing scheme allows for future sources if needed.
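Parsing an address amounts to splitting on "@" and the last "/". parse_address and DfnAddress are hypothetical. Splitting on the last "/" is a choice made here so that branch names containing slashes still parse. Mapping the "mf6" prefix onto the "modflow6" bootstrap source is left to the caller.

```python
from typing import NamedTuple


class DfnAddress(NamedTuple):
    source: str
    ref: str
    component: str


def parse_address(address: str) -> DfnAddress:
    """Split "source@ref/component" into its parts (hypothetical helper).

    The component is taken after the last "/", so refs such as "feature/x"
    still parse.
    """
    source, at, rest = address.partition("@")
    ref, slash, component = rest.rpartition("/")
    if not (at and slash and source and ref and component):
        raise ValueError(f"Expected 'source@ref/component', got {address!r}")
    return DfnAddress(source, ref, component)
```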

Registry classes

The registry class hierarchy is based on a Pydantic DfnRegistry base class:

DfnRegistry (base class):

  • Pydantic model with optional meta field for registry metadata

  • Provides access to a DfnSpec (the full parsed specification)

  • Can be instantiated directly for data-only use (e.g., loading/parsing TOML files)

  • Key properties and methods:

    • spec - The full DFN specification (lazy-loaded)

    • ref - Git ref for this registry

    • get_dfn(component) - Convenience for spec[component]

    • get_dfn_path(component) - Get local path to DFN file

    • schema_version - Convenience for spec.schema_version

    • components - Convenience for dict(spec.items())

RemoteDfnRegistry(DfnRegistry):

Handles remote registry discovery, caching, and DFN fetching. Constructs DFN file URLs dynamically from bootstrap metadata:

from pathlib import Path


class RemoteDfnRegistry(DfnRegistry):
    # Note: if DfnRegistry is a Pydantic model, these attributes must be declared
    # as fields or PrivateAttr, and super().__init__() called before assigning them.
    def __init__(self, source: str = "modflow6", ref: str = "develop"):
        self.source = source
        self._ref = ref
        self._spec = None
        self._registry_meta = None  # parsed dfns.toml for this source/ref
        self._bootstrap_meta = None  # this source's entry in the bootstrap file
        self._pooch = None
        self._cache_dir = None
        self._load()  # expected to populate the metadata and call _setup_pooch()

    def _setup_pooch(self):
        # Create Pooch instance with dynamically constructed URLs
        import pooch

        self._cache_dir = self._get_cache_dir()

        # Construct base URL from bootstrap metadata (NOT stored in registry)
        repo = self._bootstrap_meta["repo"]
        dfn_path = self._bootstrap_meta.get("dfn_path", "doc/mf6io/mf6ivar/dfn")
        base_url = f"https://raw.githubusercontent.com/{repo}/{self._ref}/{dfn_path}/"

        # Pooch expects a flat filename -> hash mapping, so unwrap the
        # {hash = "sha256:..."} tables in the registry's [files] section
        hashes = {name: entry["hash"] for name, entry in self._registry_meta["files"].items()}

        self._pooch = pooch.create(
            path=self._cache_dir,
            base_url=base_url,
            registry=hashes,
        )

    def get_dfn_path(self, component: str) -> Path:
        # Use Pooch to fetch file (from cache or remote)
        # Pooch constructs full URL from base_url + filename at runtime
        filename = self._get_filename(component)
        return Path(self._pooch.fetch(filename))

Benefits of dynamic URL construction:

  • Registry files are smaller and simpler (no URLs stored)

  • Users can test against personal forks by modifying bootstrap file

  • Single source of truth for repository location

  • URLs adapt automatically when repo/path changes

LocalDfnRegistry(DfnRegistry):

For developers working with local DFN files:

from os import PathLike
from pathlib import Path


class LocalDfnRegistry(DfnRegistry):
    def __init__(self, path: str | PathLike, ref: str = "local"):
        self.path = Path(path).expanduser().resolve()
        self._ref = ref
        self._spec = None

    @property
    def spec(self) -> DfnSpec:
        """Lazy-load the DfnSpec from local directory."""
        if self._spec is None:
            self._spec = DfnSpec.load(self.path)
        return self._spec

    def get_dfn_path(self, component: str) -> Path:
        # Return local file path directly
        # Look for both .dfn and .toml extensions
        for ext in [".dfn", ".toml"]:
            p = self.path / f"{component}{ext}"
            if p.exists():
                return p
        raise ValueError(f"Component {component} not found in {self.path}")

Design decisions:

  • Pydantic-based (not ABC) - allows direct instantiation for data-only use cases

  • Dynamic URL construction - DFN file URLs constructed at runtime, not stored in registry

  • No MergedRegistry - users typically work with one MODFLOW 6 version at a time, so merging across versions doesn’t make sense

Module-level API

Convenient module-level functions:

# Default registry for latest stable MODFLOW 6 version
from modflow_devtools.dfn import (
    DEFAULT_REGISTRY,
    DfnSpec,
    get_dfn,
    get_dfn_path,
    list_components,
    sync_dfns,
    get_registry,
    map,  # note: shadows the built-in map() in the importing namespace
)

# Get individual DFNs
dfn = get_dfn("gwf-chd")  # Uses DEFAULT_REGISTRY
dfn = get_dfn("gwf-chd", ref="6.5.0")  # Specific version

# Get file path
path = get_dfn_path("gwf-wel", ref="6.6.0")

# List available components
components = list_components(ref="6.6.0")

# Work with specific registry
registry = get_registry(ref="6.6.0")
gwf_nam = registry.get_dfn("gwf-nam")

# Load full specification - single canonical hierarchical representation
spec = DfnSpec.load("/path/to/dfns")  # Load from directory

# Hierarchical access
spec.schema_version  # "1.1"
spec.root  # Root Dfn (simulation component)
spec.root.children["gwf-nam"]  # Navigate hierarchy
spec.root.children["gwf-nam"].children["gwf-chd"]

# Flat dict-like access via Mapping protocol
gwf_chd = spec["gwf-chd"]  # Get component by name
for name, dfn in spec.items():  # Iterate all components
    print(name)
len(spec)  # Total number of components

# Access spec through registry (registry provides the spec)
registry = get_registry(ref="6.6.0")
spec = registry.spec  # Registry wraps a DfnSpec
gwf_chd = registry.spec["gwf-chd"]

# Map between schema versions
dfn_v1 = get_dfn("gwf-chd", ref="6.4.4")  # Older version in v1 schema
dfn_v2 = map(dfn_v1, schema_version="2")  # Convert to v2 schema

DfnSpec class:

The DfnSpec dataclass represents the full specification with a single canonical hierarchical representation:

from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path

@dataclass
class DfnSpec(Mapping):
    """Full DFN specification with hierarchical structure and flat dict access."""

    schema_version: str
    root: Dfn  # Hierarchical canonical representation (simulation component)

    # Mapping protocol - provides flat dict-like access
    def __getitem__(self, name: str) -> Dfn:
        """Get component by name (flattened lookup)."""
        ...

    def __iter__(self):
        """Iterate over all component names."""
        ...

    def __len__(self):
        """Total number of components in the spec."""
        ...

    @classmethod
    def load(cls, path: Path | str) -> "DfnSpec":
        """
        Load specification from a directory of DFN files.

        The specification is always loaded as a hierarchical tree,
        with flat access available via the Mapping protocol.
        """
        ...
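One way to fill in the elided Mapping methods is to index the tree once at construction. In the sketch below, Node is a stand-in for the existing Dfn dataclass (only a name and children), and SpecSketch is a hypothetical name. The flat index assumes the tree is not mutated after loading and that component names are unique.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for the existing Dfn dataclass: only a name and child components."""

    name: str
    children: dict[str, "Node"] = field(default_factory=dict)


@dataclass
class SpecSketch(Mapping):
    """Hypothetical DfnSpec-like class: the tree is canonical, the flat index derived."""

    schema_version: str
    root: Node
    _index: dict[str, Node] = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Pre-order walk of the tree, recording each component by name
        self._index = {}
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.name in self._index:
                raise ValueError(f"Duplicate component name: {node.name!r}")
            self._index[node.name] = node
            # Reverse so children are visited in their declared order
            stack.extend(reversed(list(node.children.values())))

    def __getitem__(self, name: str) -> Node:
        return self._index[name]  # KeyError for unknown names, as Mapping expects

    def __iter__(self) -> Iterator[str]:
        return iter(self._index)

    def __len__(self) -> int:
        return len(self._index)
```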

Design benefits:

  • Single canonical representation: Hierarchical tree is the source of truth

  • Flat access when needed: Mapping protocol provides dict-like interface

  • Simple, focused responsibility: DfnSpec only knows how to load from a directory

  • Clean layering: Registries built on top of DfnSpec, not intertwined

  • Clean semantics: DfnSpec = full specification, Dfn = individual component

  • Pythonic: Implements standard Mapping protocol

Separation of concerns:

  • DfnSpec: Canonical representation of the full specification (foundation)

    • Loads from a directory of DFN files via load() classmethod

    • Hierarchical tree via .root property

    • Flat dict access via Mapping protocol

    • No knowledge of registries, caching, or remote sources

  • Registries: Handle discovery, distribution, and caching (built on DfnSpec)

    • Fetch and cache DFN files from remote sources

    • Internally use DfnSpec to represent the loaded specification

    • Provide access via .spec property

    • get_dfn(component) → convenience for spec[component]

    • get_dfn_path(component) → returns cached file path

Backwards compatibility with existing fetch_dfns():

# Old API (still works for manual downloads)
from modflow_devtools.dfn import fetch_dfns
fetch_dfns("MODFLOW-ORG", "modflow6", "6.6.0", "/tmp/dfns")

# New API (preferred - uses registry and caching)
from modflow_devtools.dfn import sync_dfns, get_registry, DfnSpec
sync_dfns(ref="6.6.0")
registry = get_registry(ref="6.6.0")
spec = registry.spec  # Registry wraps a DfnSpec

Schema Versioning

A key design consideration is properly handling schema evolution while separating file format from schema version.

Separating format from schema

As discussed in issue #259, file format and schema version are orthogonal concerns:

File format (serialization):

  • dfn - Legacy DFN text format

  • toml - Modern TOML format (or potentially YAML, see below)

The format is simply how the data is serialized to disk. Any schema version can be serialized in any supported format.

Schema version (structural specification):

  • Defines what components exist and how they relate to each other

  • Defines which variables each component contains

  • Defines variable types, shapes, and constraints

  • Separates structural specification from input format representation concerns

The schema describes the semantic structure and meaning of the specification, independent of how it’s serialized.

Key distinction: The schema migration is about separating structural specification (components, relationships, variables, types) from input format representation. This is discussed in detail in pyphoenix-project issue #246.

For example:

  • Input format issue (v1): Period data defined as recarrays with artificial dimensions like maxbound

  • Structural reality (v2): Each column is actually a variable living on (a subset of) the grid, using semantically meaningful dimensions

The v1 schema conflates:

  • Structural information: Components, their relationships, and variables within each component

  • Format information: How MF6 allows arrays to be provided, when keywords like FILEIN/FILEOUT are necessary

The v2 schema should treat these as separate layers, where consumers can selectively apply formatting details atop a canonical data model.

Current state (on dfn branch):

  • The code supports loading both dfn and toml formats

  • The Dfn.load() function accepts a format parameter

  • Schema version is determined independently of file format

  • v1→v1.1 and v1→v2 schema mapping is implemented

Implications for DFNs API:

  • Registry metadata includes both format and schema_version fields

  • Registries can have different formats at different refs (some refs: dfn, others: toml)

  • The same schema version can be serialized in different formats

  • Schema mapping happens after loading, independent of file format

  • Users can request specific schema versions via map() function

Schema evolution

v1 schema (original):

  • Current MODFLOW 6 releases through 6.6.x

  • Flat structure with in_record, tagged, preserve_case, etc. attributes

  • Mixes structural specification with input format representation (recarray/maxbound issue)

  • Can be serialized as .dfn (original) or .toml

v1.1 schema (intermediate - current mainline on dfn branch):

  • Cleaned-up v1 with data normalization

  • Removed unnecessary attributes (in_record, tagged, etc.)

  • Structural improvements (period block arrays separated into individual variables)

  • Better parent-child relationships inferred from naming conventions

  • Can be serialized as .dfn or .toml

  • Recommendation from issue #259: Use this as the mainline rather than jumping directly to v2

v2 schema (future - comprehensive redesign):

  • For devtools 2.x / FloPy 4.x / eventually MF6

  • Requires explicit spec.toml file - no inference for v2 (ensures clarity and correctness)

  • Complete separation of structural specification from input format concerns (see pyphoenix-project #246)

    • Structural layer: components, relationships, variables, data models

    • Format layer: how MF6 allows arrays to be provided, FILEIN/FILEOUT keywords, etc.

    • Consumers can selectively apply formatting details atop canonical data model

  • Explicit parent-child relationships in DFN files (see Component Hierarchy section)

  • Modern type system with proper array types and semantically meaningful dimensions

  • Consolidated attribute representation (see Tentative v2 schema design)

  • Likely serialized as TOML or YAML (with JSON-Schema validation via Pydantic)

DFNs API strategy:

  • Support all schema versions via registry metadata

  • Provide transparent schema mapping where needed

  • Default to native schema version from registry

  • Allow explicit schema version selection via API

  • Maintain backwards compatibility during transitions

Tentative v2 schema design

Based on feedback from mwtoews in PR #229 and the structural/format separation discussed in pyphoenix-project #246:

Structural vs format separation: The v2 schema should cleanly separate:

  • Structural specification: Component definitions, relationships, variable data models

    • Generated classes encode only structure and data models

    • Use semantically meaningful dimensions (grid dimensions, time periods)

  • Format specification: How MF6 reads/writes the data (separate layer)

    • I/O layers exclusively handle input format concerns

    • FILEIN/FILEOUT keywords, array input methods, etc.

Consolidated attributes: Replace individual boolean fields with an attrs list:

# Instead of this (v1/v1.1):
optional = true
time_series = true
layered = false

# Use this (v2):
attrs = ["optional", "time_series"]

Array syntax for shapes: Use actual arrays instead of string representations:

# Instead of this (v1/v1.1):
shape = "(nper, nnodes)"

# Use this (v2):
shape = ["nper", "nnodes"]
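The attrs and shape changes amount to a mechanical per-field conversion. The sketch assumes fields are plain dicts whose flags are already parsed to booleans, and handles only the three flags shown above. FLAG_ATTRS, parse_shape, and to_v2_field are hypothetical, and real DFN shape strings can be more elaborate than a comma-separated list, which this sketch does not handle.

```python
# v1/v1.1 boolean flags folded into the v2 `attrs` list (illustrative subset)
FLAG_ATTRS = ("optional", "time_series", "layered")


def parse_shape(shape: str) -> list[str]:
    """Convert a v1 shape string such as "(nper, nnodes)" to ["nper", "nnodes"]."""
    inner = shape.strip().removeprefix("(").removesuffix(")")
    return [dim.strip() for dim in inner.split(",") if dim.strip()]


def to_v2_field(v1: dict) -> dict:
    """Map one v1/v1.1 field (as a dict) onto the tentative v2 layout."""
    v2 = {k: v for k, v in v1.items() if k not in FLAG_ATTRS and k != "shape"}
    # Keep only flags that are explicitly true, in a stable order
    v2["attrs"] = [flag for flag in FLAG_ATTRS if v1.get(flag) is True]
    if isinstance(v1.get("shape"), str):
        v2["shape"] = parse_shape(v1["shape"])
    return v2
```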

Format considerations:

  • TOML vs YAML: YAML’s more forgiving whitespace better accommodates long descriptions (common for scientific parameters)

  • Validation approach: Use Pydantic for both schema definition and validation

    • Pydantic provides rigorous validation (addresses pyphoenix-project #246 requirement for formal specification)

    • Built-in validation after parsing TOML/YAML to dict (no custom parsing logic)

    • Automatic JSON-Schema generation for documentation and external tooling

    • More Pythonic than using python-jsonschema directly

Pydantic integration:

import tomllib  # Python 3.11+; use the tomli backport on older versions
from typing import Any

from pydantic import BaseModel, Field


class FieldV2(BaseModel):
    name: str
    type: str
    block: str | None = None
    shape: list[str] | None = None
    attrs: list[str] = Field(default_factory=list)
    description: str = ""
    default: Any = None
    children: dict[str, "FieldV2"] | None = None


# Usage:
# 1. Parse TOML/YAML to dict (using tomllib/tomli/pyyaml/etc)
with open("field.toml", "rb") as f:  # example file holding one field definition
    parsed = tomllib.load(f)

# 2. Validate with Pydantic (built-in; raises pydantic.ValidationError on bad input)
field = FieldV2.model_validate(parsed)

# 3. Export JSON-Schema if needed (for docs, external tools)
schema = FieldV2.model_json_schema()

Benefits:

  • Validation and schema in one: Pydantic handles both, no separate validation library needed

  • Type safety: Full Python type hints and IDE support

  • JSON-Schema export: Available for documentation and external tooling

  • Widely adopted: Well-maintained, used throughout Python ecosystem

  • Better UX: Clear error messages, better handling of multi-line descriptions (if using YAML)

Component Hierarchy

Design decision: Component parent-child relationships are defined in spec.toml for v2, with backward-compatible support for parent attributes in component files.

The registry file’s purpose is to tell devtools what it needs to know to consume the DFNs and make them available to users (file locations, hashes). The specification file (spec.toml) and component files are the single source of truth for the specification itself, including component relationships.

v2 schema approach (primary) - Hierarchy in spec.toml:

# spec.toml
schema_version = "2.0"
root = "sim-nam"  # Or inline [root] definition

# sim-nam.toml
children = ["sim-tdis", "gwf-nam", "gwt-nam", ...]

[options]
# ... field definitions

# gwf-nam.toml
children = ["gwf-dis", "gwf-chd", "gwf-wel", ...]

[options]
# ... field definitions

v2 schema approach (alternative) - parent attribute still supported:

# gwf-chd.toml
parent = "gwf-nam"  # Backward-compatible

[options]
# ... field definitions

DfnSpec.load() can build the hierarchy from either:

  1. children lists (preferred for v2) - parent components list their children

  2. parent attributes (backward-compatible) - child components reference their parent

Benefits of children in spec.toml:

  • Single top-down view - entire hierarchy visible from root

  • Matches DfnSpec design - spec.toml → DfnSpec with .root and tree structure

  • Cleaner component files - focus on their structure, not their position in hierarchy

  • Easier validation - validate entire tree structure in one pass

Benefits of keeping parent support:

  • Backward compatibility - existing component files with parent still work

  • Gradual migration - can transition incrementally to v2

  • Flexibility - both approaches work, choose based on preference

Current state (v1/v1.1):

  • Hierarchy is implicit in naming conventions: gwf-dis → parent is gwf-nam

  • to_tree() function infers relationships from component names

  • Works but fragile (relies on naming conventions being followed)

  • No spec.toml required (everything inferred)

Backwards Compatibility Strategy

Since FloPy 3 is already consuming the v1.1 schema and we need to develop the v2 schema in parallel, careful planning is needed to avoid breaking existing consumers.

Development approach

Mainline (develop branch):

  • Keep v1.1 schema stable on mainline

  • Implement DFNs API with full v1/v1.1 support

  • All v1.1 schema changes are additive only (no breaking changes)

  • FloPy 3 continues consuming from mainline without disruption

V2 development (dfn-v2 branch):

  • Create separate dfn-v2 branch for v2 schema development

  • Develop v2 schema, Pydantic models, and structural/format separation

  • Test v2 schema with experimental FloPy 4 development

  • Iterate on v2 design without affecting mainline stability

Integration approach:

  1. Phase 1: DFNs API on mainline supports v1/v1.1 only

  2. Phase 2: Add v2 schema support to mainline (v1, v1.1, and v2 all supported)

  3. Phase 3: Merge dfn-v2 branch, deprecate v1 (but keep it working)

  4. Phase 4: Eventually remove v1 support in devtools 3.x (v1.1 and v2 only)

Schema version support

The DFNs API will support multiple schema versions simultaneously:

from modflow_devtools.dfn import get_registry, map  # note: this map shadows the builtin

# Schema version is tracked per registry/ref
registry_v1 = get_registry(ref="6.4.4")  # MODFLOW 6.4.4 uses v1 schema
registry_v11 = get_registry(ref="6.6.0")  # MODFLOW 6.6.0 uses v1.1 schema
registry_v2 = get_registry(ref="develop")  # Future: develop uses v2 schema

# Get DFN in native schema version
dfn_v1 = registry_v1.get_dfn("gwf-chd")  # Returns v1 schema
dfn_v11 = registry_v11.get_dfn("gwf-chd")  # Returns v1.1 schema

# Transparently map to desired schema version
dfn_v2_from_v1 = map(dfn_v1, schema_version="2")  # v1 → v2
dfn_v2_from_v11 = map(dfn_v11, schema_version="2")  # v1.1 → v2

Registry support:

  • Each registry metadata includes schema_version (from spec.toml or inferred)

  • Different refs can have different schema versions

  • RemoteDfnRegistry loads appropriate schema version for each ref

  • load() function detects schema version and uses appropriate parser/validator

Schema detection:

from packaging.version import Version

# In RemoteDfnRegistry or DfnSpec.load()
def _detect_schema_version(self) -> Version:
    # 1. Check spec.toml if present
    if spec_file := self._load_spec_file():
        return spec_file.schema_version

    # 2. Infer from DFN content, if a sample loads and inference is conclusive
    if sample_dfn := self._load_sample_dfn():
        if (inferred := infer_schema_version(sample_dfn)) is not None:
            return inferred

    # 3. Otherwise default to latest stable
    return Version("1.1")

API compatibility

Breaking changes in current implementation:

The dfn branch introduces fundamental breaking changes that make it incompatible with a 1.x release:

  1. Core types changed from TypedDict to dataclass:

    # Old (develop) - dict-like access
    dfn["name"]
    field.get("type")
    
    # New (dfn branch) - attribute access
    dfn.name
    field.type
    
  2. Dfn structure changed:

    • Removed: sln, fkeys

    • Added: schema_version, parent, blocks

    • Renamed: fkeys → children

  3. Removed exports:

    • get_dfns() - now fetch_dfns() in submodule, not re-exported from main module

    • FormatVersion, Sln, FieldType, Reader type aliases

  4. Field structure changed - different attributes and semantics between v1/v2

Why aliasing is not feasible:

The TypedDict → dataclass change is fundamental and cannot be cleanly aliased:

  • Code using dfn["name"] syntax would break immediately

  • Making a dataclass behave like a dict requires implementing __getitem__, get(), keys(), values(), items(), etc.

  • Even with these methods, isinstance checks and type hints would behave differently

  • The complexity and maintenance burden outweigh the benefits
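A short demonstration of the difference, using hypothetical stand-in classes DfnDict and DfnData: a TypedDict instance is an ordinary dict at runtime, while a dataclass instance supports neither subscripting nor isinstance(..., dict) checks.

```python
from dataclasses import dataclass
from typing import TypedDict


class DfnDict(TypedDict):  # 1.x style: a plain dict at runtime
    name: str


@dataclass
class DfnData:  # 2.x style: a regular class instance
    name: str


old = DfnDict(name="gwf-chd")
new = DfnData(name="gwf-chd")

print(old["name"], isinstance(old, dict))  # gwf-chd True
print(new.name, isinstance(new, dict))  # gwf-chd False

message = ""
try:
    new["name"]  # 1.x-style access no longer works
except TypeError as err:
    message = str(err)  # 'DfnData' object is not subscriptable
```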

Recommendation: Release as devtools 2.0, not 1.x.

New API (devtools 2.x):

# DFNs API
from modflow_devtools.dfn import DfnSpec, get_dfn, get_registry, sync_dfns

# Sync and access DFNs
sync_dfns(ref="6.6.0")
dfn = get_dfn("gwf-chd", ref="6.6.0")
registry = get_registry(ref="6.6.0")
spec = registry.spec

# Attribute access (dataclass style)
print(dfn.name)  # "gwf-chd"
print(dfn.blocks["options"])

# fetch_dfns() still available for manual downloads
from modflow_devtools.dfn.fetch import fetch_dfns
fetch_dfns("MODFLOW-ORG", "modflow6", "6.6.0", "/tmp/dfns")

Migration timeline

devtools 1.x (current stable):

  • Existing modflow_devtools/dfn.py with TypedDict-based API

  • get_dfns() function for manual downloads

  • No registry infrastructure

  • No changes - maintain stability for existing users

devtools 2.0 (this work):

  • ❌ Breaking: Dfn, Field change from TypedDict to dataclass

  • ❌ Breaking: get_dfns() renamed to fetch_dfns() (in submodule)

  • ❌ Breaking: Several type aliases removed or moved

  • ✅ New: Full DFNs API with registry infrastructure

  • ✅ New: DfnSpec class with hierarchical and flat access

  • ✅ New: RemoteDfnRegistry, LocalDfnRegistry classes

  • ✅ New: CLI commands (sync, info, list, clean)

  • ✅ New: Schema versioning and mapping (v1 ↔ v2)

  • ✅ New: Pydantic-based configuration and validation

devtools 2.x (future minor releases):

  • Add v2 DFN schema support when MODFLOW 6 adopts it

  • Schema mapping between all versions (v1, v1.1, v2)

  • Additional CLI commands and features

  • Performance improvements

devtools 3.0 (distant future):

  • Consider removing v1 schema support (with deprecation warnings in 2.x)

  • Potential further API refinements

Key principles:

  1. Clean break at 2.0 - no half-measures with aliasing

  2. Multi-version schema support - DFNs API works with v1, v1.1, and v2 simultaneously

  3. Clear migration path - document all breaking changes in release notes

  4. Semantic versioning - breaking changes require major version bump

Testing strategy:

  • Test suite covers all schema versions (v1, v1.1, v2)

  • Test schema mapping in all directions (v1↔v1.1↔v2)

  • Test mixed-version scenarios (different refs with different schemas)

  • Integration tests with real MODFLOW 6 repository

Documentation:

  • Clear migration guide from 1.x to 2.x

  • Document all breaking changes with before/after examples

  • Document which MODFLOW 6 versions use which schema versions

  • Examples showing multi-version usage

Implementation Dependencies

Existing work on dfn branch

The dfn branch already includes substantial infrastructure:

Completed:

  • ✅ Dfn, Block, Field dataclasses

  • ✅ Schema definitions (FieldV1, FieldV2)

  • ✅ Parsers for both DFN and TOML formats

  • ✅ Schema mapping (V1 → V2) with MapV1To2

  • ✅ Flat/tree conversion utilities (load_flat(), load_tree(), to_tree())

  • ✅ fetch_dfns() function for manual downloads

  • ✅ Validation utilities

  • ✅ dfn2toml conversion tool

Integration with DfnSpec design:

The dfn branch currently has:

# Returns dict[str, Dfn] - flat representation
dfns = load_flat("/path/to/dfns")

# Returns root Dfn with children - hierarchical representation
root = load_tree("/path/to/dfns")

The new DfnSpec class will consolidate these:

# Single load, both representations available
spec = DfnSpec.load("/path/to/dfns")
spec.root  # Hierarchical (same as old load_tree)
spec["gwf-chd"]  # Flat dict access (same as old load_flat)
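One way DfnSpec could offer both views from a single tree is to implement the collections.abc.Mapping protocol over a traversal of the hierarchy. The sketch below uses hypothetical stand-ins (Component for Dfn, SpecSketch for DfnSpec) and is not the planned implementation.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass, field


@dataclass
class Component:  # hypothetical stand-in for Dfn
    name: str
    children: list["Component"] = field(default_factory=list)


@dataclass
class SpecSketch(Mapping[str, Component]):  # hypothetical stand-in for DfnSpec
    schema_version: str
    root: Component

    def _walk(self) -> Iterator[Component]:
        # Depth-first, pre-order traversal of the tree, root first.
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))

    def __getitem__(self, name: str) -> Component:
        for node in self._walk():
            if node.name == name:
                return node
        raise KeyError(name)

    def __iter__(self) -> Iterator[str]:
        return (node.name for node in self._walk())

    def __len__(self) -> int:
        return sum(1 for _ in self._walk())


root = Component("sim-nam", [Component("sim-tdis"), Component("gwf-nam", [Component("gwf-chd")])])
spec = SpecSketch(schema_version="1.1", root=root)
chd = spec["gwf-chd"]  # flat access
names = list(spec)  # hierarchical order, flattened
```

Lookups here walk the tree on every call; an implementation could instead build a name-to-component index once at load time, since the spec is not mutated afterwards.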

Migration path:

  1. Add DfnSpec class - wraps existing to_tree() logic and implements Mapping

  2. Keep load_flat() and load_tree() - mark as internal/deprecated but maintain for compatibility

  3. DfnSpec.load() implementation - uses existing functions internally:

    @classmethod
    def load(cls, path: Path | str) -> "DfnSpec":
        # Use existing load_flat for paths
        dfns = load_flat(path)
    
        # Use existing to_tree to build hierarchy
        root = to_tree(dfns)
        schema_version = root.schema_version  # or load from spec.toml
        return cls(schema_version=schema_version, root=root)
    
  4. Update registries - make them wrap DfnSpec:

    class RemoteDfnRegistry(DfnRegistry):
        @property
        def spec(self) -> DfnSpec:
            if self._spec is None:
                self._ensure_cached()  # Fetch all files
                self._spec = DfnSpec.load(self._cache_dir)  # Load from cache
            return self._spec
    
  5. Future: Eventually remove load_flat() and load_tree() from public API

This approach:

  • Reuses all existing parsing/conversion logic

  • Provides a cleaner API without breaking code that still calls load_flat() or load_tree()

  • Smooth transition: old functions work, new class preferred

Note: FloPy 3 is already generating code from an early version of this schema (per pyphoenix-project #246), which creates some stability requirements for the v1.1/v2 transition.

Choreography with develop branch:

Currently:

  • develop branch has modflow_devtools/dfn.py (single file, basic utilities)

  • dfn branch has modflow_devtools/dfn/ (package with full implementation)

  • dfns-api branch (current) just adds planning docs

Merge sequence:

  1. First: Merge dfns-api branch → develop (adds planning docs)

  2. Then: Merge dfn branch → develop (replaces dfn.py with dfn/ package)

    • This replaces the single file with the package

    • Preserves the import path: from modflow_devtools.dfn import ... still resolves (though the exported types change, as described below)

    • Adds substantial new functionality (schema classes, parsers, etc.)

  3. Finally: Implement DFNs API features on develop (registries, sync, CLI, DfnSpec)

API changes during merge:

# Old dfn.py API (on develop now) - uses TypedDicts
from modflow_devtools.dfn import get_dfns, Field, Dfn
dfn["name"]  # dict-like access

# New dfn/ package API (after dfn branch merge) - dataclasses
from modflow_devtools.dfn import Dfn, Block, Field  # Now dataclasses
from modflow_devtools.dfn.fetch import fetch_dfns  # Renamed, moved to submodule
from modflow_devtools.dfn import DfnSpec, get_registry, sync_dfns  # New additions
dfn.name  # attribute access

Breaking changes (see API compatibility section for full details):

  • Field, Dfn, etc. change from TypedDict to dataclass - requires 2.0 release

  • get_dfns() renamed to fetch_dfns() and moved to submodule

  • Several type aliases removed or moved to schema submodules

Implementation status (DFNs API):

  • ✅ Bootstrap file and registry schema

  • ✅ Registry discovery and synchronization

  • ✅ Pooch integration for file caching

  • ✅ Registry classes (DfnRegistry, RemoteDfnRegistry, LocalDfnRegistry)

  • ✅ CLI commands (sync, info, list, clean)

  • ✅ Module-level convenience API

  • ✅ Registry generation tool (make_registry.py)

  • ⚠️ Integration with MODFLOW 6 CI (requires registry branch merge in MF6 repo)

Core components

Foundation (no dependencies):

  1. Merge dfn branch work (schema, parser, utility code)

  2. Add bootstrap file (modflow_devtools/dfn/dfns.toml)

  3. Define registry schema with Pydantic (handles validation and provides JSON-Schema export)

  4. Implement registry discovery logic

  5. Create cache directory structure utilities

Registry infrastructure (depends on Foundation):

  1. Add Pooch as dependency

  2. Implement DfnRegistry abstract base class

  3. Implement RemoteDfnRegistry with Pooch for file fetching

  4. Refactor existing code into LocalDfnRegistry

  5. Implement sync_dfns() function

  6. Add registry metadata caching with hash verification

  7. Implement version-controlled registry discovery

  8. Add auto-sync on first use (opt-in via MODFLOW_DEVTOOLS_AUTO_SYNC while experimental)

  9. Implement the DfnSpec dataclass: a single canonical hierarchical representation that also implements the Mapping protocol for flat dict access

CLI and module API (depends on Registry infrastructure):

  1. Create modflow_devtools/dfn/__main__.py

  2. Add commands: sync, info, list, clean

  3. Add --ref flag for version selection

  4. Add --force flag for re-download

  5. Add convenience functions (get_dfn, get_dfn_path, list_components, etc.)

  6. Create DEFAULT_REGISTRY for latest stable version

  7. Maintain backwards compatibility with fetch_dfns()

Registry generation tool (depends on Foundation):

  1. Implement modflow_devtools/dfn/make_registry.py

  2. Scan DFN directory and generate registry file (dfns.toml): file listings with hashes

  3. Compute file hashes (SHA256) for all files (including spec.toml if present)

  4. Registry output: just filename -> hash mapping (no URLs - constructed dynamically)

  5. Support both full output (for CI) and minimal output (for handwriting)

  6. Do NOT generate spec.toml - that’s handwritten by MODFLOW 6 developers

  7. Optionally validate spec.toml against DFN set for consistency if it exists

  8. For v1/v1.1: infer hierarchy from naming conventions for validation

  9. For v2: read the explicit hierarchy (children lists, or backward-compatible parent attributes) from spec.toml and component files for validation

MODFLOW 6 repository integration

CI workflow (depends on Registry generation tool):

  1. Install modflow-devtools in MODFLOW 6 CI

  2. Generate registry on push to develop and release tags

  3. Commit registry to .registry/dfns.toml

  4. Test registry discovery and sync

  5. Note: spec.toml is handwritten by developers (optional), checked into repo like DFN files

Bootstrap configuration (depends on MODFLOW 6 CI):

  1. Add stable MODFLOW 6 releases to bootstrap refs (6.6.0, 6.5.0, etc.)

  2. Include develop branch for latest definitions

  3. Test multi-ref discovery and sync

Testing and documentation

Testing (depends on all core components):

  1. Unit tests for registry classes

  2. Integration tests for sync mechanism

  3. Network failure scenarios

  4. Multi-version scenarios

  5. Schema mapping tests (v1 → v1.1 → v2)

  6. Both file format tests (dfn and toml)

  7. Backwards compatibility tests with existing FloPy usage

Documentation (can be done concurrently with implementation):

  1. Update docs/md/dfn.md with API examples

  2. Document format vs schema separation clearly

  3. Document schema evolution roadmap (v1 → v1.1 → v2)

  4. Document component hierarchy approach (explicit in DFN files for v2)

  5. Add migration guide for existing code

  6. CLI usage examples

  7. MODFLOW 6 CI integration guide

Relationship to Models and Programs APIs

The DFNs API deliberately mirrors the Models and Programs API architecture for consistency:

| Aspect | Models API | Programs API | DFNs API |
|---|---|---|---|
| Bootstrap file | models/models.toml | programs/programs.toml | dfn/dfns.toml |
| Registry format | TOML with files/models/examples | TOML with programs/binaries | TOML with files/components/hierarchy |
| Discovery | Release assets or version control | Release assets only | Version control (+ release assets future) |
| Caching | ~/.cache/.../models | ~/.cache/.../programs | ~/.cache/.../dfn |
| Addressing | source@ref/path/to/model | program@version | mf6@ref/component |
| CLI | models sync/info/list | programs sync/info/install | dfn sync/info/list/clean |
| Primary use | Access model input files | Install program binaries | Parse definition files |

Key differences:

  • DFNs API focuses on metadata/parsing, not installation

  • DFNs API leverages existing parser infrastructure (Dfn, Block, Field classes)

  • DFNs API handles schema versioning/mapping (format vs schema separation)

  • DFNs API supports both flat and hierarchical representations

Shared patterns:

  • Bootstrap-driven discovery

  • Remote sync with Pooch caching

  • Ref-based versioning (branches, tags, commits)

  • CLI command structure

  • Lazy loading / auto-sync on first use

  • Environment variable opt-out for auto-sync

This consistency gives developers and users a familiar experience across all three APIs.

Cross-API Consistency

The DFNs API follows the same design patterns as the Models and Programs APIs for consistency. See the Cross-API Consistency section in models.md for full details.

Key shared patterns:

  • Pydantic-based registry classes (not ABCs)

  • Dynamic URL construction (URLs built at runtime, not stored in registries)

  • Bootstrap and user config files with identical naming (dfns.toml), distinguished by location

  • Top-level schema_version metadata field

  • Distinctly named registry file per API (dfns.toml for DFNs)

  • Shared config utility: get_user_config_path("dfn")

Unique to DFNs API:

  • Discovery via version control (release assets mode planned for future)

  • Extra dfn_path bootstrap field (location of DFN files within repo)

  • Schema versioning and mapping capabilities

  • No MergedRegistry (users work with one MF6 version at a time)

Design Decisions

Use Pooch for fetching

Following the recommendation in issue #262, the DFNs API will use Pooch for fetching to avoid maintaining custom HTTP client code. This provides:

  • Automatic caching: Pooch handles local caching with verification

  • Hash verification: Ensures file integrity

  • Progress bars: Better user experience for downloads

  • Well-tested: Pooch is mature and widely used

  • Consistency: Same approach as Models API

Use Pydantic for schema validation

Pydantic will be used for defining and validating DFN schemas (both registry schemas and DFN content schemas):

  • Built-in validation: No need for separate validation libraries like python-jsonschema

  • Type safety: Full Python type hints and IDE support

  • JSON-Schema export: Can generate JSON-Schema for documentation and external tooling

  • Developer experience: Clear error messages, good Python integration

  • Justification: Widely adopted, well-maintained, addresses the formal specification requirement from pyphoenix-project #246
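As a sketch of what a Pydantic registry schema might look like, the hypothetical RegistrySketch model below validates file hashes and exports JSON-Schema. It assumes Pydantic v2 (field_validator, model_validate, model_json_schema) and the sha256:-prefixed hash format; neither the model name nor its fields are the actual registry schema.

```python
from pydantic import BaseModel, ValidationError, field_validator

HEX_DIGITS = frozenset("0123456789abcdef")


class RegistrySketch(BaseModel):
    """Hypothetical model of a DFN registry file."""

    schema_version: str
    files: dict[str, str]  # file name -> "sha256:<64 hex digits>"

    @field_validator("files")
    @classmethod
    def check_hashes(cls, files: dict[str, str]) -> dict[str, str]:
        # Reject entries that are not lowercase SHA256 hash strings.
        for name, value in files.items():
            algo, _, digest = value.partition(":")
            if algo != "sha256" or len(digest) != 64 or not set(digest) <= HEX_DIGITS:
                raise ValueError(f"{name}: expected 'sha256:<64 hex digits>', got {value!r}")
        return files


ok = RegistrySketch.model_validate(
    {"schema_version": "1.1", "files": {"gwf-chd.dfn": "sha256:" + "0" * 64}}
)

error = None
try:
    RegistrySketch.model_validate({"schema_version": "1.1", "files": {"gwf-chd.dfn": "md5:abc"}})
except ValidationError as err:
    error = err  # the message names the offending field and file

schema = RegistrySketch.model_json_schema()  # JSON-Schema for docs and tooling
```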

Schema versioning strategy

Based on issue #259:

  • Separate format from schema: Registry metadata includes both

  • Support v1.1 as mainline: Don’t jump straight to v2

  • Backwards compatible: Continue supporting v1 for existing MODFLOW 6 releases

  • Schema mapping: Provide transparent conversion via map() function

  • Future-proof: Design allows for v2 when ready (devtools 2.x / FloPy 4.x)

Future enhancements

  1. Release asset mode: Add support for registries as release assets (in addition to version control)

  2. Registry compression: Compress registry files for faster downloads

  3. Partial updates: Diff-based registry synchronization

  4. Offline mode: Explicit offline mode that never attempts sync

  5. Conda integration: Coordinate with conda-forge for bundled DFN packages

  6. Multi-source support: Support definition files from sources other than MODFLOW 6

  7. Validation API: Expose validation functionality for user-provided input files

  8. Diff/compare API: Compare DFNs across versions to identify changes