This crate defines the data structures and type system used by the whole DAILP infrastructure, including the data migration, the GraphQL layer, and the front end. There are a few key types to understand about our handling of annotated manuscripts and Cherokee lexical sources.
An AnnotatedDoc represents one manuscript broken down word by word
(words are generally referred to as “forms”). It has several fields of metadata, like
its title, id, or collection. The meat of the AnnotatedDoc is its
segments, a list in which each segment may be an AnnotatedForm,
a block (which itself contains segments), or a line break.
An AnnotatedForm is a single word located in some document that has
multiple layers of representation. In DAILP’s Cherokee data, those layers
are typically the source text, simple phonetics, phonemic representation,
morphemic segmentation, and an English gloss. Each AnnotatedForm always
knows what document it came from, retaining a sense of source and concrete
reference.
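The nesting described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual definitions: the names `Doc`, `Form`, `Segment`, the field names, and the `forms` helper are all hypothetical, and the real AnnotatedDoc and AnnotatedForm carry many more fields.

```rust
// Hypothetical, simplified stand-ins for AnnotatedDoc / AnnotatedForm.
// Field and type names here are assumptions made for illustration only.

/// One word with its layers of representation.
#[derive(Debug, Clone)]
pub struct Form {
    /// ID of the document this form came from.
    pub document_id: String,
    /// Source text as written in the manuscript.
    pub source: String,
    pub simple_phonetics: Option<String>,
    pub phonemic: Option<String>,
    /// Morphemic segmentation, one entry per morpheme.
    pub segmentation: Vec<String>,
    pub english_gloss: Option<String>,
}

/// One segment of a document: a form, a nested block, or a line break.
#[derive(Debug, Clone)]
pub enum Segment {
    Form(Form),
    Block(Vec<Segment>),
    LineBreak,
}

/// A document: some metadata plus its list of segments.
#[derive(Debug, Clone)]
pub struct Doc {
    pub id: String,
    pub title: String,
    pub segments: Vec<Segment>,
}

/// Collect every form in reading order, descending into nested blocks.
pub fn forms(segments: &[Segment]) -> Vec<&Form> {
    let mut out = Vec::new();
    for segment in segments {
        match segment {
            Segment::Form(form) => out.push(form),
            Segment::Block(inner) => out.extend(forms(inner)),
            Segment::LineBreak => {}
        }
    }
    out
}
```

Because a block contains segments, the segment type is recursive, so any traversal (like the hypothetical `forms` helper) has to recurse as well. Storing the document ID on each form is one simple way to let a form "know" where it came from.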
Re-exports§
- pub use async_graphql;
- pub use chrono;
- pub use collection::*;
- pub use doc_metadata::*;
- pub use menu::*;
- pub use sheet_result::*;
Modules§
- This module contains types related to authentication
- Provides types for defining and structuring edited collections.
- Types that power our features for reading / leaving comments on words and paragraphs
- This module includes types which are intended to represent the IIIF Presentation API specification. We use these types to build IIIF manifests for any annotated document, allowing any IIIF image viewer to consume and properly display our content.
- Provides types for structuring text-based pages.
- Provides stripped-down types for edited collections
- Provides the struct SheetResult, which represents a Google Sheets spreadsheet. Also provides functions to retrieve a sheet from Google Sheets.
- Provides a type representing a user.
Structs§
- Represents a morphological gloss tag without committing to a single representation.
- Record for a DAILP admin
- A document with associated metadata and content broken down into pages and further into paragraphs with an English translation. Also supports each word being broken down into component parts and having associated notes.
- A single word in an annotated document. One word contains several layers of interpretation, including the original source text, multiple layers of linguistic annotation, and annotator notes. TODO Split into two types, one for migration and one for SQL + GraphQL
- A single word in an annotated document that can be edited. All fields except id are optional.
- Request to attach user-recorded audio to a document
- Request to attach user-recorded audio to a word
- A segment of audio representing a document, word, phrase, or other audio unit
- An ID for an audio slice
- InputType for AudioSlice for creating new documents
- An individual or organization that contributed to the creation or analysis of a particular document or source. Each contributor has a name and a role that specifies the type of their contributions.
- Basic personal details of an individual contributor, which can be retrieved from a particular instance of Contributor.
- Input Object for Contributor
- The creator of a document
- Request to update whether a piece of document audio should be included in an edited collection
- Request to update whether a piece of word audio should be included in an edited collection
- Connects to our backing database instance, providing high level functions for accessing the data therein.
- Internal Date type which wraps a reliable date library. Adds SQL and GraphQL support to the type.
- GraphQL input type for dates
- Internal DateTime type which wraps a reliable date library. Adds SQL and GraphQL support to the type.
- Delete a contributor attribution for a document based on the two ids
- A unique identifier for audio slices
- Reference to a document collection
- Database ID for one document
- All the metadata associated with one particular document. TODO Make more of these fields on-demand.
- Used for updating document metadata. All fields except id are optional.
- One page of an AnnotatedDoc
- One paragraph within a DocumentPage
- Reference to a document with a limited subset of fields, namely no contents of the document.
- Mostly unused type
- A rectangle slice of something, usually a large document image.
- Collection of images coming from a IIIF source. Generally used to represent the scans of multi-page manuscripts sourced from libraries/archives.
- Input object for IiifImages
- A IIIF server we use as an image source
- Database ID for an image source
- InputObject for ImageSourceId
- Structs for metadata loaders
- A connection between two lexical entries from the same or different sources
- Start of a new line
- Uniquely identifies a particular generalized morpheme based on its parent document, gloss, and index within that document.
- One particular morpheme and all the known words that contain that exact morpheme.
- A single unit of meaning and its gloss which can be edited.
- A concrete representation of a particular functional morpheme.
- Start of a new page
- A single document image from a IIIF source
- Key to retrieve the pages of a document given a document ID
- A paragraph in an annotated document that can be edited.
- Page ID meant for retrieving all paragraphs within.
- The reference position within a document of one specific form
- Attribution for a particular source, whether an institution or an individual. Most commonly, this will represent the details of a library or archive that houses documents used elsewhere.
- One page of a document containing one or more paragraphs
- One paragraph within a document with source text and overall English translation.
- One full translation broken into several TranslationBlocks.
- One block or paragraph of a translation document that should correspond to a block of original text. One block may contain several segments (or lines).
- Update the contributor attribution for a document
- A Universally Unique Identifier (UUID).
- A single unit of meaning and its corresponding English gloss.
- A list of words grouped by the document that contains them.
- Key to query the words within a paragraph given its database ID
Enums§
- Element within a spreadsheet before being transformed into a full document.
- One representation of Cherokee phonology. There are several different writing systems for Cherokee phonology and we want to convert between them. This type enumerates all of the systems that we support and provides conversion from our internal orthography into any of these.
- A contributor can have any number of roles, which define most of their contributions to the associated item (add or revise as needed)
- The kind of a document in terms of what body it lives within. A reference document is a dictionary or grammar for example, while a corpus document might be a letter, journal, or notice.
- Storage format for Cherokee phonetics. Consonants: t/th in storage, converted to d/t on output. Vowels: struct-defined
- Cherokee vowel categories based on tone and length
- The kind of segment that a particular sequence of characters in a morphemic segmentation represents.
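The storage-format entry above mentions that consonants are stored as t/th and converted to d/t on output. A minimal sketch of that one mapping is shown here. The function name `tth_to_dt` is hypothetical, and the sketch assumes only the `t`/`th` pair needs handling; the crate's actual conversion presumably covers other consonants and clusters, which this sketch passes through unchanged.

```rust
/// Convert alveolar stops from t/th storage notation to d/t output notation.
/// Assumption: only `th` -> `t` and plain `t` -> `d` are handled; every other
/// character is copied through as-is. This is not the crate's implementation.
pub fn tth_to_dt(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == 't' {
            if chars.peek() == Some(&'h') {
                // Aspirated `th` in storage becomes plain `t` on output.
                chars.next(); // consume the 'h'
                out.push('t');
            } else {
                // Unaspirated `t` in storage becomes `d` on output.
                out.push('d');
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

The conversion is done in a single left-to-right pass rather than with two chained string replacements, since replacing `th` with `t` first and then `t` with `d` would wrongly turn every stored `th` into `d`.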
Traits§
- Trait that defines a function which takes in a possibly undefined value.
Functions§
- Converts a given phonemic string from the Uchihara representation to the DAILP representation. For example: “a:!” => “áá”
- Is the given gloss for a root morpheme? This is a crude calculation that just checks if there are any lowercase characters. Convention says that typically functional morpheme tags are all uppercase (plus numbers and punctuation), so having lowercase characters indicates a lexical morpheme gloss.
- Parse a canonical morphemic segmentation from the two layers: morphemes and glosses.
- TODO Convert all phonemic representations into the TAOC/DAILP format. TODO Store forms in any format with a tag defining the format so that GraphQL can do the conversion instead of the migration process.
- Parse an iterator of spreadsheet cells into root noun forms ready to insert into the database.
- Build a single verb surface form from the given row.
- Gather many verb surface forms from the given row.
- Parse spreadsheet cells into one verb form with a morphemic segmentation.
- Parse spreadsheet cells into many verb forms with morphemic segmentations.
- Convert consonants in the given d/t phonetics string into their Worcester phonetics equivalents. gw => qu, j => ts
- Convert any unicode string to an ascii “slug” (useful for file names/url components)
- Turns a string into an ltree-friendly slug with underscores.
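Two of the functions listed above are described precisely enough to sketch: the root-gloss check (any lowercase character means a lexical gloss) and the Worcester consonant conversion (gw => qu, j => ts). The names `looks_like_root_gloss` and `worcester_consonants` are hypothetical, the sample inputs are illustrative rather than real data, and the crate's own functions may handle more cases than the ones stated in their descriptions.

```rust
/// Heuristic from the description above: a gloss with any lowercase character
/// is treated as a root (lexical) morpheme gloss, since functional tags are
/// conventionally uppercase plus numbers and punctuation.
pub fn looks_like_root_gloss(gloss: &str) -> bool {
    gloss.chars().any(|c| c.is_lowercase())
}

/// Apply only the two consonant mappings named in the description:
/// `gw` => `qu` and `j` => `ts`. Assumes lowercase d/t phonetics input.
pub fn worcester_consonants(dt_phonetics: &str) -> String {
    dt_phonetics.replace("gw", "qu").replace('j', "ts")
}
```

For example, `looks_like_root_gloss("walk")` is true while an all-uppercase tag such as `"PL"` is false, and `worcester_consonants("gwa")` yields `"qua"`. As the original description says, the gloss check is crude: a functional tag written with any lowercase letter would be misclassified as a root.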