Child pages
  • 2019-02-14 Meeting notes

Date

Attendees

Goals

    • Catch up on news of note from campus web archiving projects/programs, other web archiving initiatives
    • Share out overview and details of the CA.gov project, encourage discussion about lessons learned, implications for other projects, especially capturing government information

Discussion items

Time | Item | Who | Notes
5-10 minutes | Administrivia | Kathryn
  •  Next meeting's agenda (3/14) - Kathryn reached out to Anna Perricci from webrecorder.io to see about scheduling a demo session, perhaps for the next meeting. Share any other agenda ideas on the listserv.
10-20 minutes | News of Note: announcements, conference presentations/report backs, project updates, calls for participation, etc. | All
  • any news about the UC Libraries Digital Preservation Strategy Group?
  • NYARC/Archive-It Advancing Art Libraries and Curated Web Archives forum - Kathryn to share documentation; check out NYARC's approach to collaborative web archiving, workflows, discovery strategies, engaging interns/fellows: https://www.nyarc.org/content/faq-web-archiving
  • Cobweb 2.0 - actively pursuing grant funding for next phase of work, looking for pilot users/testers - reach out to Kathryn if interested.
  • Archive-It updates: quarterly call 2/27, estimating data needs for 2019-2020 (CDL will be requesting data subscription amounts from campus AIT service contacts in late March or early April)
  • UCDLFx opportunities? 5/20-5/22 UCSD
    • born-digital CKG is doing a BoF, perhaps room for a WACKG as well?
    • Eric is on planning committee and can get us on the calendar - can nab a time/place and will check on scattering out the BoFs to avoid overlap
    • Next steps: Here's a Google doc to share ideas - please submit by eod Tuesday(!), then Tori, Rachael, and Kathryn can shape into proposal. Possible ideas: what have you crawled lately?
  • Kevin presenting at SCA - metadata work to develop automated taxonomy, mining WARC data with digital scholarship/IT, perhaps feeding into new functionality at AIT (moving seeds across collections, perhaps Q1)
  • Tori: also presenting at SCA with Marlayna Christensen from UCSD (panel with Kevin) - lessons learned re collection development, and organizing collections once set up, with UA
30-35 minutes | Project Spotlight: CA.gov (collection: https://archive-it.org/collections/5763, slide deck about the project) | Kathryn
  • Archive of CA.gov domain (state not mandated to retain, so UC/Stanford as well as state)
  • >1000 seeds, many still active, ~5TB
  • Collection development focus on legislative sites (legislative bodies and legislators), agencies, executive offices
  • Originally WAS (2007-2015), moved to Archive-It in 2015, grant application/funding 2016, snapshot/metadata sprint 2016-17, currently increasing amount/frequency captured
  • Capturing for posterity, responsive collecting (Google news alerts to inform time-sensitive crawls)
  • Working on improved QA workflows/documentation - how far to go?
  • Collection development
    • ID/review URLs from state, add content as found
    • Not everything can be captured as entered (site not found/robots.txt)
    • Sub-collections (Agencies, Legislature, Governor)
    • Responsive collecting (agency sunsets, legislator resignations, elections...)
    • Crawl frequencies depend on how dynamic content is
    • Complementary collecting
  • Ongoing questions/considerations - targeting PDFs, no authoritative site list, social media, staffing/resourcing, promoting/understanding use/barriers, metadata
    • Collecting election-related materials in addition to sites for officials who were elected: Ivy+ collection - how to approach now that there are two similar collections (campaign sites in one and official sites in another), led to more complementary election-related collecting for CA.gov project
    • Social media - exploring Archive-It and Webrecorder crawling in parallel
  • Description/metadata sprint - collection-level record, crowdsourced metadata for seeds (modeled on sprint idea). Participants would add metadata to a set of seeds (2-hour time block, 4-seed chunks). Called for participants in libraries/archives/library schools; ~80% follow-through rate for interested participants; low-barrier (Google Form, FAST headings); hoping to do another sprint

Action items
