# DAM 101: the concepts, in plain language

A short primer so the rest of this repository makes sense even if you have never
worked with a digital asset management system. Each concept ends with where
Keepstack implements it, so you can read the code next to the idea.

## What a DAM actually is

A digital asset management system is a library for files that carries structured
information about each file. The difference between a DAM and a shared drive is
the difference between a card catalog and a pile of books. A shared drive stores
bytes. A DAM stores bytes plus who made them, what they are, what you are allowed
to do with them, and how to find them again in five years.

The core loop is: **ingest** an asset, **describe** it with metadata, **find** it
through search, **use** it through a rendition or a share, and **govern** it with
permissions, an audit trail, and a retention rule.

## Metadata: the part that matters most

Metadata is data about the asset. There are three flavors, and a good DAM keeps
all three.

- **Technical metadata** is written by the device. A camera writes **EXIF**
  (exposure, lens, GPS, orientation). This is provenance you get for free.
- **Descriptive metadata** is written by people or AI: a title, a caption,
  keywords, a creator, a credit line. The professional standard for this on
  images is **IPTC**, and it is effectively mandatory at news and stock agencies.
- **Administrative and rights metadata**: license, copyright, usage expiry,
  retention date.

Two container formats carry these. **XMP** is a modern XML wrapper that can hold
descriptive, IPTC, and rights fields together, and can live inside the file or
as a sidecar `.xmp` next to it. The classic **IPTC-IIM** is the older binary
form. A subtle real-world failure mode: the same field (a creator name, say) can
exist in EXIF, IPTC-IIM, and XMP at once, so tools must read all three and must
not overwrite one copy while writing another.
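
A minimal sketch of the read side, assuming the three blocks have already been
parsed into plain dictionaries. The function name `resolve_field`, the
precedence order, and the sample values are illustrative assumptions, not
Keepstack's actual `metadata.py` logic.

```python
from typing import Optional

# Hypothetical precedence: prefer XMP, then IPTC-IIM, then EXIF.
# Real tools may choose a different order per field; this is an assumption.
PRECEDENCE = ("xmp", "iptc_iim", "exif")


def resolve_field(sources: dict[str, dict[str, str]],
                  field: str) -> tuple[Optional[str], list[str]]:
    """Return the winning value for `field` and the sources that disagree with it."""
    winner: Optional[str] = None
    conflicts: list[str] = []
    for name in PRECEDENCE:
        value = sources.get(name, {}).get(field)
        if not value:
            continue  # this container does not carry the field
        if winner is None:
            winner = value
        elif value != winner:
            conflicts.append(name)  # keep the evidence instead of silently dropping it
    return winner, conflicts


# Sample values are made up for illustration.
sources = {
    "exif": {"creator": "J. Rivera"},
    "iptc_iim": {"creator": "Jo Rivera"},
    "xmp": {"creator": "Jo Rivera", "title": "Harbor at dusk"},
}
print(resolve_field(sources, "creator"))
print(resolve_field(sources, "title"))
```

Returning the conflicting sources alongside the value lets a caller flag the
asset for review before any write-back touches the other containers.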

> In Keepstack: `metadata.py` extracts EXIF, IPTC, and XMP on upload and seeds the
> catalog's descriptive fields from them.

## Interoperability standards: why they are the moat

A file share has no vocabulary. Institutions need shared vocabularies so their
catalogs can talk to each other and to aggregators.

- **Dublin Core** is the lowest common denominator: fifteen elements (Title,
  Creator, Subject, Date, Rights, and so on) that almost every system
  understands. It is also the mandatory baseline for the harvesting protocol
  below.
- **OAI-PMH** (Open Archives Initiative Protocol for Metadata Harvesting) is a
  simple HTTP protocol that lets an aggregator like DPLA or Europeana mirror
  your catalog automatically. Expose OAI-PMH and your collection can be
  discovered far beyond your own site.
- **IIIF** (International Image Interoperability Framework) is a standard way to
  deliver images. The Image API turns one master image into any crop, size, or
  rotation through a predictable URL, so deep-zoom viewers work without you
  exporting a dozen derivatives.
- The heavier archival stack (**METS** for packaging, **PREMIS** for
  preservation events, **EAD** for archival finding aids, **MARC** for library
  records) is what national archives and libraries live on.
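
The "predictable URL" of the IIIF Image API follows the pattern
`{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper
below is a sketch of building such URLs. The function name, the
`example.org` endpoint, and the asset identifier are hypothetical, and the
default `size="max"` assumes Image API 3.0 (version 2 uses `full` there).

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API URL from its five request parameters."""
    # Percent-encode the identifier so a slash inside it cannot split the path.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


base = "https://example.org/iiif"  # hypothetical endpoint

# The whole master image at its maximum available size.
print(iiif_image_url(base, "asset-42"))

# An 800x600 crop starting at (100, 200), scaled to 400 px wide, rotated 90 degrees.
print(iiif_image_url(base, "asset-42", region="100,200,800,600",
                     size="400,", rotation="90"))
```

Because every derivative is just a different URL against the same master, a
deep-zoom viewer can request tiles on demand instead of relying on
pre-exported files.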

These standards are a few hundred lines each to support, but they are the entire
reason museums and archives pay for the incumbents. Shipping them for free is
the point of Keepstack.
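
To make the Dublin Core baseline concrete, here is a sketch that serializes a
catalog record as the `oai_dc` XML that OAI-PMH harvesters expect. The two
namespace URIs are the standard ones. The field names in `FIELD_TO_DC` and the
function `to_oai_dc` are assumptions for illustration, not the mapping in
Keepstack's `standards.py`.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from catalog field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "creator": "creator",
    "keywords": "subject",
    "created": "date",
    "license": "rights",
}


def to_oai_dc(asset: dict) -> str:
    """Serialize the mapped fields of `asset` as an oai_dc:dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, element in FIELD_TO_DC.items():
        values = asset.get(field)
        if values is None:
            continue
        # Dublin Core elements are repeatable, so a list becomes several elements.
        for value in (values if isinstance(values, list) else [values]):
            ET.SubElement(root, f"{{{DC}}}{element}").text = str(value)
    return ET.tostring(root, encoding="unicode")


record = {"title": "Harbor at dusk", "creator": "Jo Rivera",
          "keywords": ["harbor", "sunset"], "license": "CC BY 4.0"}
print(to_oai_dc(record))
```

A real OAI-PMH response wraps this element in the protocol's envelope
(request, header, datestamp), which the sketch leaves out.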

> In Keepstack: `standards.py` provides Dublin Core mapping, an OAI-PMH 2.0
> endpoint, and an IIIF Image API endpoint for every asset.

## Digital preservation: readable in fifty years

Preservation is the discipline of keeping files usable over decades. Two ideas
matter here.

- **Fixity** is proof a file has not changed or rotted. You record a checksum
  (such as a SHA-256 hash) when the file arrives, and you can re-hash it any time
  to confirm the bytes are identical. If the hash still matches, the file is intact.
- **Format migration** is the harder, longer-term half: moving files from
  obsolete formats to current ones before the old format becomes unreadable.
  This is what dedicated preservation systems like Preservica and Archivematica
  specialize in, and it follows the **OAIS** reference model (ISO 14721).
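
The fixity half is small enough to sketch with the standard library. The
function names and the chunk size below are assumptions for illustration, not
Keepstack's API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large master never loads into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """True when the file's current hash equals the checksum recorded at ingest."""
    return sha256_of(path) == recorded
```

At ingest you would store `sha256_of(path)` alongside the asset record, then
call `verify_fixity(path, recorded)` on a schedule and alert on any `False`.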

A neat trick ties preservation to storage: if you name each stored file by its
own SHA-256 hash (**content-addressable storage**), then integrity checking and
deduplication become the same mechanism. Identical files share one hash, so they
store once, and the hash is your fixity check.
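
A toy version of the idea, assuming a flat directory and whole-file reads. The
class `ContentStore` and its method names are hypothetical and are not taken
from Keepstack's `storage.py`.

```python
import hashlib
from pathlib import Path


class ContentStore:
    """Toy content-addressable store: the SHA-256 hex digest is the file name."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        """Store `data` and return its key; identical bytes are written only once."""
        key = hashlib.sha256(data).hexdigest()
        target = self.root / key
        if not target.exists():  # deduplication falls out of the naming scheme
            target.write_bytes(data)
        return key

    def is_intact(self, key: str) -> bool:
        """Fixity check: re-hash the stored bytes and compare with the file name."""
        stored = (self.root / key).read_bytes()
        return hashlib.sha256(stored).hexdigest() == key
```

Uploading the same file twice returns the same key and leaves one copy on
disk, and a repository-wide fixity check is a loop over `is_intact`.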

> In Keepstack: `storage.py` is content-addressable, and the dashboard runs a
> repository-wide fixity check. Format migration is on the roadmap, not built.

## Search: three kinds

- **Full-text and faceted search** is the precise, structured kind: match words
  in the title or metadata, then narrow by media type or tag. Fast and exact.
- **Semantic search** uses machine-learning embeddings to match meaning, so
  "sunset over water" can find a beach photo nobody tagged. It is the capability
  buyers most often say the incumbents fail at.
- **Visual similarity** ("more like this") uses the same embeddings with an
  image as the query instead of text.

A useful architectural insight: embed each asset once at ingest, then vary only
the query. A text query gives semantic search, an image query gives visual
similarity, and a very tight match threshold gives duplicate detection. One
pipeline, three features.
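
The sketch below shows the shape of that pipeline with a toy lexical embedding
(a normalized bag of words) standing in for a learned model, so it only matches
shared words rather than meaning. The image query is represented by reusing the
stored vector of an existing asset. All names, captions, and thresholds are
assumptions for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy lexical embedding: L2-normalized word counts as a sparse vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(n * n for n in counts.values())) or 1.0
    return {word: n / norm for word, n in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def rank(query_vec, index, threshold=0.0, exclude=None):
    """Return (asset_id, score) pairs at or above `threshold`, best first."""
    hits = [(aid, cosine(query_vec, vec))
            for aid, vec in index.items() if aid != exclude]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Embed each asset once at ingest (captions are made up).
captions = {
    "a1": "sunset over the harbor water",
    "a2": "sunset over the harbor water",  # byte-for-byte duplicate of a1
    "a3": "board meeting in conference room",
    "a4": "sunset over the beach",
}
index = {aid: embed(text) for aid, text in captions.items()}

# Same index, three features: only the query and threshold change.
semantic = rank(embed("sunset water"), index, threshold=0.1)
similar = rank(index["a1"], index, threshold=0.1, exclude="a1")
duplicates = rank(index["a1"], index, threshold=0.999, exclude="a1")
```

Swapping `embed` for a real text-and-image model changes the quality of the
matches but not the structure: the index is still built once, and the three
features still differ only in what is passed to `rank`.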

> In Keepstack: `search.py` does FTS5 full-text and faceting, plus embedding-based
> semantic search and "more like this". Embeddings are lexical by default and
> upgrade to a real model when a Cohere key is set.

## Governance: the boring part institutions require

- **Role-based access control (RBAC)** gives each user a role, and each role a
  set of permissions. Keepstack ships four: viewer, contributor, editor, admin.
- **An audit log** records who did what and when. Regulated and government buyers
  treat a complete, queryable trail as non-negotiable.
- **Records retention and legal hold**: government records carry legally mandated
  retention and disposition schedules, and a legal hold blocks deletion during
  litigation regardless of the schedule.
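
These three rules compose into a single decision at deletion time. The sketch
below uses the four role names listed above, but the permission sets, the
`Asset` fields, and the function names are hypothetical, and the legal-hold
flag models a feature the text describes rather than one Keepstack ships today.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical permission sets for the four roles.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "contributor": {"read", "upload"},
    "editor": {"read", "upload", "edit"},
    "admin": {"read", "upload", "edit", "delete"},
}


@dataclass
class Asset:
    asset_id: str
    retention_until: Optional[date] = None
    legal_hold: bool = False


def can(role: str, permission: str) -> bool:
    """RBAC check: does this role carry this permission?"""
    return permission in ROLE_PERMISSIONS.get(role, set())


def may_delete(role: str, asset: Asset, today: date) -> tuple[bool, str]:
    """Decide whether deletion is allowed and give a reason fit for an audit log."""
    if not can(role, "delete"):
        return False, "role lacks delete permission"
    if asset.legal_hold:  # a hold wins regardless of the retention schedule
        return False, "asset is under legal hold"
    if asset.retention_until and today < asset.retention_until:
        return False, "retention period has not expired"
    return True, "ok"
```

Returning the reason with the verdict means the audit trail can record refused
attempts as well as successful deletions.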

> In Keepstack: `auth.py` enforces the roles, `audit.py` logs every change, and
> assets carry a retention-until field. Full disposition workflow and legal hold
> are on the roadmap.

## Accessibility: a hard procurement gate

Two distinct obligations hide behind the word "accessibility".

1. Is the **application interface** usable with a keyboard and a screen reader?
   This is measured against **WCAG** and **Section 508**, and buyers often
   require a completed **VPAT** (Voluntary Product Accessibility Template, the
   standard conformance report) before they can purchase. Most vendors publish
   none, which is a real opening.
2. Is the **asset content** accessible: does each image carry alt text, does each
   video have captions? The DAM does not make content accessible by itself, but
   it must store and never strip the alt-text and caption metadata so downstream
   publishing is accessible. IPTC even has a dedicated Alt Text field for this.

> In Keepstack: alt text is a first-class field, AI-suggested on upload. A published
> VPAT and a full WCAG audit of the UI are on the roadmap.

## Where to go next

- The market context: [../RESEARCH.md](../RESEARCH.md)
- How Keepstack scores against the field: [../COMPARISON.md](../COMPARISON.md)
- How the code fits together: [../ARCHITECTURE.md](../ARCHITECTURE.md)
