Crate dailp


This crate defines the data structures and type system used across the whole DAILP infrastructure, including data migration, the GraphQL layer, and the front-end. There are a few key types to understand about how we handle annotated manuscripts and Cherokee lexical sources.

An AnnotatedDoc represents one manuscript broken down word by word (the words are generally referred to as “forms”). It has several fields of metadata, such as its title, ID, and collection. The core of an AnnotatedDoc is its list of segments, each of which may be an AnnotatedForm, a block (which itself contains segments), or a line break.
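To make the nesting concrete, here is a minimal sketch that models a document's segments as a recursive enum and collects its forms in reading order, descending into blocks and skipping line breaks. The names `Segment` and `collect_forms`, and the single `source` field on the form, are hypothetical simplifications for illustration, not the crate's real definitions.

```rust
// Hypothetical, simplified stand-ins; the real AnnotatedDoc and AnnotatedForm
// carry many more fields and may be structured differently.
#[derive(Debug, Clone)]
struct AnnotatedForm {
    source: String,
}

#[derive(Debug, Clone)]
enum Segment {
    Form(AnnotatedForm),
    Block(Vec<Segment>), // a block contains further segments
    LineBreak,
}

#[derive(Debug, Clone)]
struct AnnotatedDoc {
    title: String,
    segments: Vec<Segment>,
}

/// Pushes every form into `out` in document order, recursing into blocks.
fn collect_forms<'a>(segments: &'a [Segment], out: &mut Vec<&'a AnnotatedForm>) {
    for seg in segments {
        match seg {
            Segment::Form(form) => out.push(form),
            Segment::Block(children) => collect_forms(children, out),
            Segment::LineBreak => {} // line breaks carry no words
        }
    }
}

impl AnnotatedDoc {
    fn forms(&self) -> Vec<&AnnotatedForm> {
        let mut out = Vec::new();
        collect_forms(&self.segments, &mut out);
        out
    }
}

fn main() {
    // Placeholder content, not real DAILP data.
    let doc = AnnotatedDoc {
        title: "Example letter".to_string(),
        segments: vec![
            Segment::Form(AnnotatedForm { source: "osiyo".to_string() }),
            Segment::LineBreak,
            Segment::Block(vec![Segment::Form(AnnotatedForm { source: "dohitsu".to_string() })]),
        ],
    };
    let words: Vec<&str> = doc.forms().iter().map(|f| f.source.as_str()).collect();
    println!("{}: {:?}", doc.title, words);
}
```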

An AnnotatedForm is a single word located in some document that has multiple layers of representation. In DAILP’s Cherokee data, those layers are typically the source text, simple phonetics, phonemic representation, morphemic segmentation, and an English gloss. Each AnnotatedForm always knows what document it came from, retaining a sense of source and concrete reference.
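A rough sketch of how a form might bundle its layers together with a back-reference to its document is shown below. Every field name and type here (`DocumentId(u32)`, `PositionInDocument { document_id, index }`, `segments` as morpheme/gloss pairs) is an assumption made for illustration; the crate's actual `AnnotatedForm`, `DocumentId`, and `PositionInDocument` may differ.

```rust
// Hypothetical sketch; not the crate's real field layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DocumentId(u32); // assumption: the real ID type is a database ID

#[derive(Debug, Clone)]
struct PositionInDocument {
    document_id: DocumentId, // every form knows which document it came from
    index: usize,            // position of the word within that document
}

#[derive(Debug, Clone)]
struct AnnotatedForm {
    position: PositionInDocument,
    source: String,                   // original source text
    simple_phonetics: Option<String>, // simple phonetic layer
    phonemic: Option<String>,         // phonemic layer
    segments: Vec<(String, String)>,  // (morpheme, gloss) pairs
    english_gloss: Vec<String>,       // English gloss(es)
}

impl AnnotatedForm {
    /// Joins the morpheme and gloss layers into two aligned interlinear lines.
    fn interlinear(&self) -> (String, String) {
        let morphemes: Vec<&str> = self.segments.iter().map(|(m, _)| m.as_str()).collect();
        let glosses: Vec<&str> = self.segments.iter().map(|(_, g)| g.as_str()).collect();
        (morphemes.join("-"), glosses.join("-"))
    }
}

fn main() {
    // Placeholder values, not real DAILP data.
    let form = AnnotatedForm {
        position: PositionInDocument { document_id: DocumentId(1), index: 4 },
        source: "source-text".to_string(),
        simple_phonetics: Some("phonetics".to_string()),
        phonemic: None,
        segments: vec![
            ("stem".to_string(), "root.gloss".to_string()),
            ("suffix".to_string(), "TAG".to_string()),
        ],
        english_gloss: vec!["translation".to_string()],
    };
    let (morphemes, glosses) = form.interlinear();
    println!("doc {:?}, word {}: {} / {}", form.position.document_id, form.position.index, morphemes, glosses);
}
```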

Re-exports

pub use async_graphql;
pub use chrono;
pub use collection::*;
pub use doc_metadata::*;
pub use menu::*;
pub use sheet_result::*;

Modules

annotation
auth
This module contains types related to authentication
collection
Provides types for defining and structuring edited collections.
comment
Types that power our features for reading and leaving comments on words and paragraphs
doc_metadata
iiif
This module includes types which are intended to represent the IIIF Presentation API specification. We use these types to build IIIF manifests for any annotated document, allowing any IIIF image viewer to consume and properly display our content.
menu
page
Provides types for structuring text-based pages.
raw
Provides stripped-down types for edited collections
sheet_result
Provides the struct SheetResult which represents a Google Sheets spreadsheet. Also provides functions to retrieve a sheet from Google Sheets.
user
Provides a type representing a user.

Structs

AbstractMorphemeTag
Represents a morphological gloss tag without committing to a single representation.
Admin
Record for a DAILP admin
AnnotatedDoc
A document with associated metadata and content broken down into pages and further into paragraphs with an English translation. Also supports each word being broken down into component parts and having associated notes.
AnnotatedForm
A single word in an annotated document. One word contains several layers of interpretation, including the original source text, multiple layers of linguistic annotation, and annotator notes. TODO Split into two types, one for migration and one for SQL + GraphQL
AnnotatedFormUpdate
A single word in an annotated document that can be edited. All fields except id are optional.
AttachAudioToDocumentInput
Request to attach user-recorded audio to a document
AttachAudioToWordInput
Request to attach user-recorded audio to a word
AudioSlice
A segment of audio representing a document, word, phrase, or other audio unit
AudioSliceId
An ID for an audio slice
AudioSliceInput
Input type for AudioSlice, used when creating new documents
BookmarkedOn
ChaptersInCollection
Contributor
An individual or organization that contributed to the creation or analysis of a particular document or source. Each contributor has a name and a role that specifies the type of their contributions.
ContributorAttributionInput
ContributorDetails
Basic personal details of an individual contributor, which can be retrieved from a particular instance of Contributor.
ContributorInput
Input object for Contributor
ContributorsForDocument
Creator
The creator of a document
CreatorUpdate
CreatorWithDocId
CreatorsForDocument
CurateDocumentAudioInput
Request to update whether a piece of document audio should be included in an edited collection
CurateWordAudioInput
Request to update whether a piece of word audio should be included in an edited collection
Database
Connects to our backing database instance, providing high level functions for accessing the data therein.
Date
Internal Date type which wraps a reliable date library. Adds SQL and GraphQL support to the type.
DateInput
GraphQL input type for dates
DateTime
Internal DateTime type which wraps a reliable date library. Adds SQL and GraphQL support to the type.
DeleteContributorAttribution
Delete a contributor attribution for a document based on the two IDs
DocumentAudioId
A unique identifier for audio slices
DocumentCollection
Reference to a document collection
DocumentId
Database ID for one document
DocumentMetadata
All the metadata associated with one particular document. TODO Make more of these fields on-demand.
DocumentMetadataUpdate
Used for updating document metadata. All fields except id are optional.
DocumentPage
One page of an AnnotatedDoc
DocumentParagraph
One paragraph within a DocumentPage
DocumentReference
Reference to a document with a limited subset of fields, namely no contents of the document.
DocumentShortName
EditedCollectionDetails
FormId
Mostly unused type
Geometry
A rectangular slice of something, usually a large document image.
IiifImages
Collection of images coming from a IIIF source. Generally used to represent the scans of multi-page manuscripts sourced from libraries/archives.
IiifImagesInput
Input object for IiifImages
ImageSource
A IIIF server we use as an image source
ImageSourceId
Database ID for an image source
ImageSourceIdInput
Input object for ImageSourceId
KeywordWithDocId
Structs for metadata loaders
KeywordsForDocument
LanguageWithDocId
LanguagesForDocument
LexicalConnection
A connection between two lexical entries from the same or different sources
LineBreak
Start of a new line
MorphemeId
Uniquely identifies a particular generalized morpheme based on its parent document, gloss, and index within that document.
MorphemeReference
One particular morpheme and all the known words that contain that exact morpheme.
MorphemeSegmentUpdate
A single unit of meaning and its gloss which can be edited.
MorphemeTag
A concrete representation of a particular functional morpheme.
PageBreak
Start of a new page
PageId
PageImage
A single document image from a IIIF source
PagesInDocument
Key to retrieve the pages of a document given a document ID
ParagraphUpdate
A paragraph in an annotated document that can be edited.
ParagraphsInPage
Key to retrieve all paragraphs within a page given its page ID
PartsOfWord
PersonFullName
PositionInDocument
The reference position within a document of one specific form
SourceAttribution
Attribution for a particular source, whether an institution or an individual. Most commonly, this will represent the details of a library or archive that houses documents used elsewhere.
SpatialCoverageForDocument
SpatialCoverageWithDocId
SubjectHeadingWithDocId
SubjectHeadingsForDocument
TagForMorpheme
TagId
TranslatedPage
One page of a document containing one or more paragraphs
TranslatedSection
One paragraph within a document with source text and overall English translation.
Translation
One full translation broken into several TranslationBlocks.
TranslationBlock
One block or paragraph of a translation document that should correspond to a block of original text. One block may contain several segments (or lines).
UpdateContributorAttribution
Update the contributor attribution for a document
Uuid
A Universally Unique Identifier (UUID).
WordSegment
A single unit of meaning and its corresponding English gloss.
WordsInDocument
A list of words grouped by the document that contains them.
WordsInParagraph
Key to query the words within a paragraph given its database ID
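Because MorphemeId identifies a morpheme by its parent document, gloss, and index, it works naturally as a map key. The sketch below uses a hypothetical stand-in with assumed field types to group words by the exact morpheme they contain, in the spirit of MorphemeReference; `words_by_morpheme` is an illustrative helper, not a function in this crate.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for MorphemeId; field names and types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MorphemeId {
    document: String, // parent document
    gloss: String,    // morpheme gloss
    index: usize,     // index within that document
}

/// Maps each morpheme to every word (by source text) that contains it.
fn words_by_morpheme(words: &[(&str, Vec<MorphemeId>)]) -> HashMap<MorphemeId, Vec<String>> {
    let mut map: HashMap<MorphemeId, Vec<String>> = HashMap::new();
    for (source, morphemes) in words {
        for id in morphemes {
            map.entry(id.clone()).or_default().push(source.to_string());
        }
    }
    map
}

fn main() {
    // Placeholder identifiers, not real DAILP data.
    let a = MorphemeId { document: "DOC1".to_string(), gloss: "TAG".to_string(), index: 1 };
    let b = MorphemeId { document: "DOC1".to_string(), gloss: "TAG".to_string(), index: 2 };
    let words = vec![("w1", vec![a.clone()]), ("w2", vec![a.clone(), b.clone()])];
    let map = words_by_morpheme(&words);
    println!("{:?} -> {:?}", a, map[&a]);
}
```

Two morphemes with the same gloss but different indices remain distinct keys here, which matches the "exact morpheme" wording above.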

Enums

AnnotatedSeg
Element within a spreadsheet before being transformed into a full document.
CherokeeOrthography
One representation of Cherokee phonology. There are several different writing systems for Cherokee phonology and we want to convert between them. This type enumerates all of the systems that we support and provides conversion from our internal orthography into any of these.
ContributorRole
A contributor can have any number of roles, which define most of their contributions to the associated item (add or revise as needed)
DocumentType
The kind of a document in terms of what body it lives within. A reference document is a dictionary or grammar for example, while a corpus document might be a letter, journal, or notice.
PhonemicString
Storage format for Cherokee phonetics. Consonants: t/th in storage, converted to d/t on output. Vowels: struct-defined
VowelType
Cherokee vowel categories based on tone and length
WordSegmentRole
The kind of segment that a particular sequence of characters in a morphemic segmentation represents.
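The PhonemicString note above says consonants are stored as t/th and output as d/t. A minimal sketch of that one mapping follows: stored "th" becomes "t" and a bare stored "t" becomes "d". The function name `storage_to_output` is hypothetical, and the sketch deliberately covers only the t-series named in the docs; other consonant pairs and the vowel handling are left untouched.

```rust
/// Hypothetical sketch of the documented t/th -> d/t output conversion.
/// Only the t-series is handled; every other character passes through.
fn storage_to_output(stored: &str) -> String {
    let mut out = String::with_capacity(stored.len());
    let mut chars = stored.chars().peekable();
    while let Some(c) = chars.next() {
        if c == 't' {
            if chars.peek() == Some(&'h') {
                chars.next(); // consume the aspiration marker
                out.push('t'); // stored "th" is output as "t"
            } else {
                out.push('d'); // stored bare "t" is output as "d"
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    for s in ["ta", "tha", "atthi"] {
        println!("{} -> {}", s, storage_to_output(s));
    }
}
```

Peeking one character ahead keeps the conversion a single left-to-right pass, so "th" is never misread as a "t" followed by an unrelated "h".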

Traits

MaybeUndefinedExt
Trait that defines a function which takes in a possibly undefined value.

Functions

convert_udb
Converts a given phonemic string from the Uchihara representation to the DAILP representation. For example: “a:!” => “áá”
is_root_morpheme
Is the given gloss for a root morpheme? This is a crude calculation that just checks if there are any lowercase characters. Convention says that typically functional morpheme tags are all uppercase (plus numbers and punctuation), so having lowercase characters indicates a lexical morpheme gloss.
parse_gloss_layers
Parse a canonical morphemic segmentation from the two layers: morphemes and glosses.
root_noun_surface_form
TODO Convert all phonemic representations into the TAOC/DAILP format. TODO Store forms in any format with a tag defining the format so that GraphQL can do the conversion instead of the migration process.
root_noun_surface_forms
Parse an iterator of spreadsheet cells into root noun forms ready to insert into the database.
root_verb_surface_form
Build a single verb surface form from the given row.
root_verb_surface_forms
Gather many verb surface forms from the given row.
seg_verb_surface_form
Parse spreadsheet cells into one verb form with a morphemic segmentation.
seg_verb_surface_forms
Parse spreadsheet cells into many verb forms with morphemic segmentations.
simple_phonetics_to_worcester
Convert consonants in the given d/t phonetics string into their Worcester phonetics equivalents. gw => qu, j => ts
slugify
Convert any unicode string to an ascii “slug” (useful for file names/url components)
slugify_ltree
Turns a string into an ltree-friendly slug with underscores.
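Several of the functions above are described precisely enough to sketch. The re-implementations below are written from the one-line descriptions only and are not the crate's actual code. `is_root_morpheme` and `simple_phonetics_to_worcester` follow the documented rules directly. `parse_gloss_layers` assumes both layers use "-" as the morpheme separator and returns a hypothetical tuple type. `slugify_ltree` assumes a slug of lowercase ASCII alphanumerics joined by underscores, treating non-ASCII characters as separators instead of transliterating them.

```rust
/// Documented heuristic: any lowercase character marks a lexical (root) gloss.
fn is_root_morpheme(gloss: &str) -> bool {
    gloss.chars().any(|c| c.is_lowercase())
}

/// Documented consonant mappings only: gw => qu, j => ts.
fn simple_phonetics_to_worcester(input: &str) -> String {
    input.replace("gw", "qu").replace('j', "ts")
}

/// Hypothetical: pairs morphemes with glosses, assuming '-' separates both
/// layers. Each tuple is (morpheme, gloss, is_root). Returns None when the
/// layers have different lengths.
fn parse_gloss_layers(morphemes: &str, glosses: &str) -> Option<Vec<(String, String, bool)>> {
    let ms: Vec<&str> = morphemes.split('-').collect();
    let gs: Vec<&str> = glosses.split('-').collect();
    if ms.len() != gs.len() {
        return None;
    }
    Some(
        ms.iter()
            .zip(gs.iter())
            .map(|(m, g)| (m.to_string(), g.to_string(), is_root_morpheme(g)))
            .collect(),
    )
}

/// Hypothetical ltree-friendly slug: lowercase ASCII alphanumerics, with each
/// run of other characters collapsed into a single underscore.
fn slugify_ltree(input: &str) -> String {
    let mut out = String::new();
    let mut pending_sep = false;
    for c in input.chars() {
        if c.is_ascii_alphanumeric() {
            if pending_sep && !out.is_empty() {
                out.push('_'); // no leading or trailing underscores
            }
            pending_sep = false;
            out.push(c.to_ascii_lowercase());
        } else {
            pending_sep = true;
        }
    }
    out
}

fn main() {
    println!("{}", is_root_morpheme("TAG"));
    println!("{}", simple_phonetics_to_worcester("gwaja"));
    println!("{:?}", parse_gloss_layers("a-b", "X-y"));
    println!("{}", slugify_ltree("Cherokee Phoenix, 1828"));
}
```

Because `is_root_morpheme` only checks for lowercase characters, tags made of uppercase letters, digits, and punctuation (such as "3SG") are classified as functional, exactly as the convention described above implies.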