The UCLDC project harvests objects from collections in the Nuxeo DAMS and in external sources such as the OAC. An upcoming release will add an interface for registering your collection for harvest. For now, the collection registry is seeded with previously identified collections. All harvested data is stored in a Solr index using a standardized metadata schema, and can be retrieved through the publicly available Solr API.
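
As a rough illustration of retrieving harvested records, the sketch below builds a standard Solr select URL. The base URL is a placeholder (this page does not give the real endpoint), the helper name build_select_url is hypothetical, and the campus_name value is made up; q, fq, rows, and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption, not the real UCLDC Solr address.
SOLR_BASE = "https://solr.example.org/solr"


def build_select_url(query, filters=(), rows=10, base=SOLR_BASE):
    """Build a Solr /select URL that asks for a JSON response."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query becomes its own fq parameter.
    params += [("fq", f) for f in filters]
    return base.rstrip("/") + "/select?" + urlencode(params)


# Illustrative query: search the title field, filtered on a made-up campus name.
url = build_select_url("title:yosemite", filters=['campus_name:"Example Campus"'])
print(url)
```

The resulting URL can be fetched with any HTTP client; the response documents carry the fields described in the schema below.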

The Metadata Schema

The metadata schema was developed to be interoperable with the DPLA metadata schema, while also supporting the needs of the new Calisphere. The schema is still under active development, and this page will be updated as changes are made. Subscribe to this page to receive an email when it changes.

Name | Type | Comments | Multi-Valued
text | text_general | not stored; catchall text field for keyword search that indexes tokens. For each object, it contains the following fields: title, contributor, creator, coverage, date, description, extent, format, identifier, language, publisher, relation, rights, source, subject, and type | yes
text_rev | text_general_rev | not stored; the same as the text field, but reversed for efficient leading wildcard queries | yes
timestamp | date | timestamp on the Solr document; the default value is NOW, i.e. the time the object was created in the Solr index | no
COLLECTION REGISTRY FIELDS - all multivalued so an object can be related to more than one Campus, Repository, and/or Collection
campus | string | campus stores the URL to the registry API campus object | yes
campus_name | string | campus_name stores the name of the campus, so that clients don't need to look up against the registry API | yes
collection_url | string | collection stores the URL to the registry API collection object | yes
collection_name | string | collection_name stores the name of the collection, so that clients don't need to look up against the registry API | yes
collection_data | string | collection_url::collection_name | yes
repository_url | string | repository stores the URL to the registry API repository object | yes
repository_name | string | repository_name stores the name of the repository, so that clients don't need to look up against the registry API | yes
repository_data | string | repository_url::repository_name | yes
METADATA ON THE METADATA
created | date | refers to creation of the metadata document, not creation of the Solr document, nor creation of the content object | no
last_modified | date | refers to the date the metadata document was last modified | no
created_s | string | string variant of created for wildcard searching | no
last_modified_s | string | string variant of last_modified for wildcard searching | no
DUBLIN CORE FIELDS
title | text_general | only required field | yes
contributor | text_general | | yes
coverage | text_general | | yes
creator | text_general | | yes
date | text_general | | yes
description | text_general | | yes
extent | text_general | | yes
format | text_general | | yes
identifier | text_general | | yes
language | text_general | | yes
publisher | text_general | | yes
relation | text_general | | yes
rights | text_general | | yes
source | text_general | | yes
subject | text_general | | yes
type | text_general | | yes
date_facet | date | | yes
IMAGE FIELDS
url_item | string | best guess at the home URL for the item. Filled in by akara? Currently indexed so that items with a value can be searched for, but will likely not be indexed in the final release | yes
reference_image_md5 | string | not indexed; holds the md5 of the best image found for image objects. This will then be passed to the thumbnail server for nicely sized images. For now you can use md5s3stash to calculate the URL to the image | yes
payloads | payloads | | yes
_version_ | long | | yes
DUBLIN CORE STRING FIELDS - copies of the Dublin Core field by the same name, but stored and indexed as strings, instead of tokenized text
title_ss | string | | yes
contributor_ss | string | | yes
coverage_ss | string | | yes
creator_ss | string | | yes
date_ss | string | | yes
description_ss | string | | yes
extent_ss | string | | yes
format_ss | string | | yes
identifer_ss | string | | yes
language_ss | string | | yes
publisher_ss | string | | yes
relation_ss | string | | yes
rights_ss | string | | yes
source_ss | string | | yes
subject_ss | string | | yes
type_ss | string | | yes
facet_decade | string | https://github.com/ucldc/facet_decade | yes
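
The collection_data and repository_data fields pack a URL and a name into one string, joined by "::". A client could split them roughly as follows; the function name split_registry_data and the sample value are hypothetical, and the sketch assumes the URL itself never contains "::".

```python
def split_registry_data(value):
    """Split a 'url::name' value into a (url, name) tuple."""
    # partition splits on the first "::" only, so a name containing
    # "::" is still returned whole.
    url, sep, name = value.partition("::")
    if not sep:
        raise ValueError("expected 'url::name', got %r" % value)
    return url, name


# Made-up sample value, for illustration only.
sample = "https://registry.example.org/api/collection/1/::Example Collection"
print(split_registry_data(sample))
```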

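To show why a reversed copy of the text field helps with leading wildcard queries, here is a small sketch of the general idea. It is not Solr's implementation, and the function names and sample tokens are made up: if tokens are indexed in reverse, a query such as *graph becomes an ordinary prefix lookup for "hparg".

```python
from bisect import bisect_left


def build_reversed_index(tokens):
    """Return the distinct tokens, each reversed, in sorted order."""
    return sorted(t[::-1] for t in set(tokens))


def leading_wildcard(index, suffix):
    """Find tokens ending in suffix (the query *suffix) with a prefix scan."""
    prefix = suffix[::-1]
    matches = []
    # Jump to the first reversed token >= prefix, then scan while it matches.
    for rev in index[bisect_left(index, prefix):]:
        if not rev.startswith(prefix):
            break
        matches.append(rev[::-1])
    return sorted(matches)


index = build_reversed_index(["photograph", "telegraph", "photo", "graphite"])
print(leading_wildcard(index, "graph"))
```
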
Coming Soon (finalize by early May)

 

Name | Type | Comments | Multi-Valued
structmap_url | string | https://github.com/ucldc/ucldc-docs/wiki/media.json | no
structmap_text | string | deep harvest (nuxeo) items only | no
reference_image_dimensions | string | width:height in pixels, e.g. "100:100" | no
¿ ga_code ? | string | Google Analytics code, or look it up from institution-json | no
alternative_title | text_general | | yes
genre | text_general | | yes
temporal | text_general | | yes
rights_holder | text_general | | yes
rights_note | text_general | | yes
rights_date | text_general | | no
provenance | text_general | | yes
location | text_general | | yes
transcription | text_general | | no
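
A reference_image_dimensions value in the width:height form could be read like this. This is a minimal sketch; parse_dimensions is a hypothetical helper name, and it assumes both parts are whole numbers of pixels.

```python
def parse_dimensions(value):
    """Parse a 'width:height' string into a (width, height) tuple of ints."""
    # Unpacking raises ValueError unless there are exactly two parts.
    width, height = value.split(":")
    return int(width), int(height)


print(parse_dimensions("100:100"))
```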
  

All text_general fields also get an _ss string version.

Single-valued fields get an _s string version: rights_date and transcription.
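
The suffix convention can be sketched as a small helper. The function name string_field_name is hypothetical, and the rule it encodes (_ss for multi-valued fields, _s for single-valued ones) is our reading of the notes above rather than something taken from schema.xml.

```python
def string_field_name(field, multi_valued):
    """Return the name of the string copy of a text_general field."""
    # Assumption: multi-valued fields use _ss, single-valued fields use _s.
    return field + ("_ss" if multi_valued else "_s")


print(string_field_name("genre", True))
print(string_field_name("rights_date", False))
```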

 
    

 

Changes for beta launch (finalize by July 24)

 

Name | Type | Comments | Multi-Valued
sort_collection_data | string | sort_collection_name::collection_name::collection_url | yes
id | string | https://github.com/ucldc/ucldc-docs/wiki/pretty_id | no
harvest_id_s | string | more generic name in anticipation of moving off couchdb. _ss so initially no need to modify schema. | no
sort_date_start | date | | no
sort_date_end | date | | no
sort_title | string | probably needs some string normalization (remove quotes, initial articles?) | no
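
The normalization for sort_title has not been decided. As one possible sketch, the helper below removes quote characters, lowercases the title, and drops an initial English article. The function name, the article list, and the choice to lowercase are all assumptions for illustration.

```python
import re

# Assumption: only English articles are handled.
ARTICLES = ("a", "an", "the")


def normalize_sort_title(title):
    """Build a sort key: no quotes, lowercase, no initial article."""
    # Remove straight and curly quote characters (this also removes apostrophes).
    text = re.sub("[\"'\u2018\u2019\u201c\u201d]", "", title).lower()
    words = text.split()
    # Keep a lone article so a title such as "A" does not sort as empty.
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


print(normalize_sort_title('"The Golden Gate"'))
```

Because apostrophes are removed along with quotes, a word such as "don't" sorts as "dont"; whether that is acceptable is part of the open question.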

 

schema.xml