Crate dailp

This crate defines the data structures and type system used by the whole DAILP infrastructure, including data migration, the GraphQL layer, and the front-end. There are a few key types to understand about our handling of annotated manuscripts and Cherokee lexical sources.

An AnnotatedDoc represents one manuscript broken down word by word (the words are generally referred to as “forms”). It has several metadata fields, such as its title, id, and collection. The core of an AnnotatedDoc is its list of segments, each of which may be an AnnotatedForm, a block (which itself contains segments), or a line break.

An AnnotatedForm is a single word located in some document that has multiple layers of representation. In DAILP’s Cherokee data, those layers are typically the source text, simple phonetics, phonemic representation, morphemic segmentation, and an English gloss. Each AnnotatedForm always knows what document it came from, retaining a sense of source and concrete reference.
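The relationship between these types can be sketched roughly as follows. This is a simplified illustration, not the crate's actual definitions: the field names, the `Segment` enum name, the `new` constructor, and the `forms` helper are all assumptions made for the example. It shows a document holding a recursive list of segments, and one way to collect every word while preserving reading order.

```rust
/// Hypothetical, simplified stand-in for one annotated word.
/// The real type has more fields; these names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AnnotatedForm {
    /// Id of the document this word came from.
    pub document_id: String,
    /// The word as written in the source text.
    pub source: String,
    pub simple_phonetics: Option<String>,
    pub phonemic: Option<String>,
    pub morphemic_segmentation: Option<String>,
    pub english_gloss: Option<String>,
}

impl AnnotatedForm {
    /// Build a form with only the source layer filled in.
    pub fn new(document_id: &str, source: &str) -> Self {
        AnnotatedForm {
            document_id: document_id.to_string(),
            source: source.to_string(),
            simple_phonetics: None,
            phonemic: None,
            morphemic_segmentation: None,
            english_gloss: None,
        }
    }
}

/// One segment of a document: a word, a nested block, or a line break.
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    Form(AnnotatedForm),
    Block(Vec<Segment>),
    LineBreak,
}

/// Hypothetical, simplified stand-in for an annotated manuscript.
#[derive(Debug, Clone, PartialEq)]
pub struct AnnotatedDoc {
    pub id: String,
    pub title: String,
    pub collection: Option<String>,
    pub segments: Vec<Segment>,
}

impl AnnotatedDoc {
    /// Collect every word in reading order, descending into nested blocks.
    pub fn forms(&self) -> Vec<&AnnotatedForm> {
        fn walk<'a>(segments: &'a [Segment], out: &mut Vec<&'a AnnotatedForm>) {
            for segment in segments {
                match segment {
                    Segment::Form(form) => out.push(form),
                    Segment::Block(inner) => walk(inner, out),
                    Segment::LineBreak => {}
                }
            }
        }
        let mut out = Vec::new();
        walk(&self.segments, &mut out);
        out
    }
}
```

Modeling a block as a variant that holds further segments keeps the nesting explicit, so a traversal such as `forms` only needs one recursive match.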

Re-exports§

Modules§

  • This module contains types related to authentication
  • Provides types for defining and structuring edited collections.
  • Types that power our features for reading / leaving comments on words and paragraphs
  • This module includes types which are intended to represent the IIIF Presentation API specification. We use these types to build IIIF manifests for any annotated document, allowing any IIIF image viewer to consume and properly display our content.
  • Provides types for structuring text-based pages.
  • Provides stripped-down types for edited collections
  • Provides the struct SheetResult which represents a Google Sheets spreadsheet. Also provides functions to retrieve a sheet from Google Sheets.
  • Provides a type representing a user.

Structs§

Enums§

  • Element within a spreadsheet before being transformed into a full document.
  • One representation of Cherokee phonology. There are several different writing systems for Cherokee phonology and we want to convert between them. This type enumerates all of the systems that we support and provides conversion from our internal orthography into any of these.
  • A contributor can have any number of roles, which define most of their contributions to the associated item (add or revise as needed).
  • The kind of a document in terms of what body it lives within. A reference document is a dictionary or grammar for example, while a corpus document might be a letter, journal, or notice.
  • Storage format for Cherokee phonetics. Consonants: t/th in storage, converted to d/t on output. Vowels: struct-defined
  • Cherokee vowel categories based on tone and length
  • The kind of segment that a particular sequence of characters in a morphemic segmentation represents.
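
The storage note above (t/th in storage, d/t on output) can be illustrated with a small sketch. The function name `th_to_dt` is hypothetical, and the sketch handles only the single consonant pair named in the description; the crate's own conversion presumably covers other consonants and the vowel handling as well.

```rust
/// Hypothetical sketch: convert stored t/th phonetics to d/t output.
/// Only the t/th pair from the description is handled here.
pub fn th_to_dt(stored: &str) -> String {
    let mut out = String::with_capacity(stored.len());
    let mut chars = stored.chars().peekable();
    while let Some(c) = chars.next() {
        if c == 't' {
            if chars.peek() == Some(&'h') {
                // Stored "th" is written as "t" on output.
                chars.next();
                out.push('t');
            } else {
                // Stored plain "t" is written as "d" on output.
                out.push('d');
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Scanning with one character of lookahead avoids the ordering problem that two chained string replacements would have, where the output of the first replacement could be rewritten by the second.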

Traits§

  • Trait that defines a function which takes in a possibly undefined value.

Functions§

  • Converts a given phonemic string from the Uchihara representation to the DAILP representation. For example: “a:!” => “áá”
  • Is the given gloss for a root morpheme? This is a crude calculation that just checks if there are any lowercase characters. Convention says that typically functional morpheme tags are all uppercase (plus numbers and punctuation), so having lowercase characters indicates a lexical morpheme gloss.
  • Parse a canonical morphemic segmentation from the two layers: morphemes and glosses.
  • TODO Convert all phonemic representations into the TAOC/DAILP format. TODO Store forms in any format with a tag defining the format so that GraphQL can do the conversion instead of the migration process.
  • Parse an iterator of spreadsheet cells into root noun forms ready to insert into the database.
  • Build a single verb surface form from the given row.
  • Gather many verb surface forms from the given row.
  • Parse spreadsheet cells into one verb form with a morphemic segmentation.
  • Parse spreadsheet cells into many verb forms with morphemic segmentations.
  • Convert consonants in the given d/t phonetics string into their Worcester phonetics equivalents. gw => qu, j => ts
  • Convert any unicode string to an ascii “slug” (useful for file names/url components)
  • Turns a string into an ltree-friendly slug with underscores.
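
A few of the helpers described above are simple enough to sketch. The function names below are hypothetical and the bodies are minimal approximations written from the descriptions alone, not the crate's implementations. In particular, the consonant conversion covers only the two mappings listed (gw => qu, j => ts), and the slug rules (lowercasing, collapsing runs of other characters, dropping non-ASCII) are assumptions.

```rust
/// Hypothetical sketch of the root-gloss heuristic: any lowercase
/// character suggests a lexical (root) gloss rather than a functional tag.
pub fn looks_like_root_gloss(gloss: &str) -> bool {
    gloss.chars().any(|c| c.is_lowercase())
}

/// Hypothetical sketch of the d/t to Worcester consonant conversion.
/// Only the two mappings named in the description are applied.
pub fn worcester_consonants(dt_phonetics: &str) -> String {
    dt_phonetics.replace("gw", "qu").replace('j', "ts")
}

/// Hypothetical sketch of an ltree-friendly slug: ASCII letters and digits
/// are kept (lowercased), and every run of other characters becomes a
/// single underscore. Leading and trailing underscores are dropped.
pub fn ltree_slug(input: &str) -> String {
    let mut out = String::new();
    for c in input.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('_') {
            out.push('_');
        }
    }
    out.trim_end_matches('_').to_string()
}
```

Under these assumptions, `looks_like_root_gloss("3SG.A")` is false while `looks_like_root_gloss("walk")` is true, and `ltree_slug("Hello, World!")` yields `hello_world`.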