Child pages
  • 2019-11-14 Meeting notes

  • Round robin sharing across the group (news of note for web archiving)
  • Share information and make connections around approaches to social media archiving.

Discussion items

5-10 minutes | Administrivia | Kathryn
  • Discussion topic backlog:
    • collecting ethics
    • capture tools scan/assessment
    • web archives use (including analytics from AIT)
    • web archives as data
    • preservation

10-15 minutes | News of Note: announcements, conference presentations/report-backs, project updates, calls for participation, etc. | All
  • Archive-It "State of the WARC" survey:
  • Calisphere investigation: exploring ways to integrate web archives into Calisphere. Details are still forthcoming, but they'd love to solicit feedback from the CKG as the project progresses. Matthew McKinley might be a guest at an upcoming WACKG meeting.
    • CF: Exciting! Is there a roll-out? KS: This is still very exploratory and investigative, and not something that's on the roadmap yet.
    • CM: ArchivesSpace-Archive-It plug-in? Other system integrations to consider?
      • challenge to integrate web archiving into collections...
    • KM: Interesting to consider integrating web archives into a larger discovery context. Also, could Calisphere host WARC files?
  • CF: Attended the BitCurator Forum and can share notes (conference materials will also be available on the website).
30-40 minutes | Social media collecting approaches (sharing and discussion)

Possible topics to consider:

  • Archiving approaches pros/cons - dynamic web capture vs. creator download/deposit?
  • Tool reviews
  • Ethical issues

  • CM: at UCSF - what does social media collecting look like?
    • many approaches seem to capture a broad swath of posts that are topically organized
    • requirements at UCSF are to archive creator accounts as part of their records - initially considered web capture to do this, but it seems to be a pain (hard to scrape/capture), so now investigating archiving downloaded data sets
    • more straightforward to get the data from creators
      • ethical issues - downloads can include all DMs, advertising engagements, etc. Should these be archived? Definitely need to let creators know that this is included. Options: save by default, delete by default, save but notify, delete but notify
    • downloaded data includes atomized content (e.g., all images as images, all video files, etc.) also JSON for posts
  • CF: downloaded posts - access considerations; can't emulate, but can archive/share the data
  • JC: Good to be able to download everything, but there is a curatorial question of what to keep: archiving an account for its own significance, or supporting research across aggregate content. What facets of the data need to be identified to creators so that they are made aware and can opt in or opt out?
  • CM: Collections as "data resources" - most are too messy to work with, but social media data is immediately available and structured as a data resource. There is also the question of whether researchers have the literacies to work with social media collections.
  • KM: Another consideration is whether this data could be anonymized (and stripped of PII)?
  • JC: Perhaps DMs are more sensitive than the advertising engagement, which could be separated out for more aggregate analysis.
  • KM: Great value in the look and feel of how content is presented online, but there is value in both the presentation and the data.
  • CM: Need to learn more about how archives are being used - it feels like researchers want data resources. What are you all seeing in terms of use and researcher need?
  • CF: Fascinated by Stanford's 4Chan archive - downloadable data files in their DAMS
  • CM: UCSF will be piloting - will report back.

Action items