- WACKG leadership for upcoming year: Now is a good time to establish a co-chair or vice chair - we're about a year in, and it'd be great to have someone prepared to take on chairship. Please contact Kathryn to express interest and discuss responsibilities.
- Plan for upcoming meeting topics:
- August - collecting ethics (need a lead/facilitator for this meeting)
- September or October - metadata/discovery. Kevin Miller (UCD) to share an initiative to derive metadata from analyzed web archive content; could also use a facilitator to host discussion of metadata and discovery approaches.
- other topics for upcoming meetings (which will need a lead/facilitator): preservation, web archives use, web archives as data, social media archiving
|10-15 minutes||News of Note: announcements, conference presentations/report backs, project updates, calls for participation, etc.|| All|
- Christina Fidler (UCB) is using the Google API for YouTube metadata - can share more on this (see below).
- Charlie (UCSF), on the DOC-charged preservation group: the group has shared its report with DOC, but next steps are uncertain, with a focus on costs and resourcing. NB: handling WARCs is on the radar.
- From Archive-It:
|30-40 minutes||Collection development strategies (sharing and discussion)||Tori, Kathryn|
- Recaps of SCA presentations from Tori Maches (UCSD) on web archiving strategies for university archives and Kevin Miller (UCD) on using faculty ORCID data to identify web-based content (to be shared by email)
- At UCSD, building policy. Context: have been collecting for 10+ years; goals are to transition from pilot to program and stay within the data budget. Conducted an environmental scan and will be revisiting the policy draft. Consulted with folks in the library, including selectors, to establish wants/needs. Borrowed from the digitization policy and drew on consultation with CKG colleagues and NDSA survey results. Ensure that there's a mechanism to update the policy, given how frequently this space can change (not just tech/tools, but also policy approaches). Establish selection criteria that apply to a range of subjects: build upon existing local collections, uniqueness, volatility/ephemerality. Manage expectations throughout (e.g., "best faith effort")! Not really best practices - common practices, yes. Tie to existing collecting. Policy includes outlining responsibilities, including who ultimately makes selection decisions, and allocating portions of the data budget to each selector (including some for responsive collecting). Also covers obtaining permission for collecting and addressing robots.txt: establishing when not to ask permission, when to notify, and when to take platform-wide approaches. Analysis of NDSA surveys shows a marked decrease over time in asking permission, with a similar trend in robots.txt approaches (ignoring it in some contexts or as a blanket approach).
- Draft policy
- Kathryn Stine (CDL) on identifying/updating new seeds for the CA.gov collection - using an API to grab data about URLs, then parsing returned JSON to develop seedlist and proto-metadata (more detail to come in a write-up).
- Christina Fidler - using the Google API - has a write-up that she can share. Caveat: you need a login (so an institutional account helps!). JSON can be intimidating to parse, but there are free online tools to transform it, say, to CSV.
- ACTION: KS to set up wiki re tool inventory/write-ups on processes used to develop seedlists and metadata
- Open discussion of collection development strategies - share your examples of how you identify and find new content to grow collections (e.g., content-owner contacts, web scraping, APIs, etc.)
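
As a rough illustration of the JSON-to-seedlist approach discussed above (not the actual CA.gov or YouTube workflow), here is a minimal Python sketch that parses a JSON response into a deduplicated seed list with proto-metadata and writes it out as CSV. The response shape and field names ("results", "url", "title", "agency") and the helper names are assumptions made up for this example; a real API will return a different structure.

```python
import csv
import io
import json
from urllib.parse import urlsplit

# Hypothetical API response. The field names ("results", "url", "title",
# "agency") are assumptions for illustration, not any real API's schema.
SAMPLE_RESPONSE = """
{
  "results": [
    {"url": "https://www.example.ca.gov/", "title": "Example Agency", "agency": "Example"},
    {"url": "https://www.example.ca.gov/", "title": "Example Agency (duplicate)", "agency": "Example"},
    {"url": "http://Another.Example.ca.gov/about", "title": "About Another Agency", "agency": "Another"},
    {"title": "Record with no URL"}
  ]
}
"""


def normalize_seed(url):
    """Return a lightly normalized seed URL, or None if it is not usable."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    # Lowercase the host and drop any fragment; keep path and query as given.
    return f"{parts.scheme}://{parts.netloc.lower()}{path}{query}"


def build_seed_records(response_text):
    """Parse the JSON text and return one record per unique, valid seed."""
    data = json.loads(response_text)
    seen = set()
    records = []
    for item in data.get("results", []):
        seed = normalize_seed(item.get("url", ""))
        if seed is None or seed in seen:
            continue  # skip records with no usable URL and repeat seeds
        seen.add(seed)
        records.append({
            "seed": seed,
            "title": item.get("title", ""),
            "agency": item.get("agency", ""),
        })
    return records


def records_to_csv(records):
    """Serialize seed records to CSV text (seed URL plus proto-metadata)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["seed", "title", "agency"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = build_seed_records(SAMPLE_RESPONSE)
seed_csv = records_to_csv(records)
print(seed_csv)
```

The first occurrence of each seed wins, so the duplicate record's title is discarded; whether that is the right choice depends on the collection. The CSV columns are placeholders and would need to be mapped to whatever metadata fields the collecting tool expects.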