2019-12-12 Meeting notes



Discussion items

5-10 minutes | Administrivia | Kathryn
  • Discussion topic backlog:

    • capture tools scan/assessment

    • web archives use (including analytics from AIT)

    • web archives as data

    • preservation

  • AIT subscriptions at mid-year
10-15 minutes | News of Note: announcements, conference presentations/report backs, project updates, calls for participation, etc. | All
  • CM: AIDS history project - making the collection available as data. Wondering if they can find ways to incorporate websites/social media into the datasets. Success in partnering with clinical data process folks. NB: data is available on Dryad (data OCR'ed from text docs so far). Had a workshop earlier this year on using NLP - there is a public repository on GitHub.
  • KM: have an ongoing project to derive an automated taxonomy from WARCs, working with the data lab at UCD on this, with grad students getting started. Will share out in future.
30-45 minutes | Web archives collecting ethics discussion | Tori

Questions around ethics and what the implications are for our work in web archiving. Much of the conversation has been around social media, where the stakes can be high and the content much more personal. Many of us, though, are working on archiving websites, including institutional websites.

Tension: much of what we're collecting is publicly available but can sometimes feel private. Perhaps this is a tension between intention/expectation and the fact that the content is available.

Getting requests to crawl people's, say, Flickr albums - which feel very personal and are curated, perhaps not for a public audience. So CM at UCSF is glad to take the tack of requesting social media data directly from creators - this mediation makes the donation/acquisition intentional and transparent.

Many folks don't know how their data is being used - role for archive in understanding/framing this.

GDPR and right to be forgotten - shift in public thinking about online presence/data, what is public evidence/public record - potential impact on our collecting ethics. Who has the right to be forgotten - just individuals?

Example: YouTube commenting is similar to a public comment on a newspaper article. What is less clear (and has less of an analog in pre/non-web contexts) is how to handle third-party contributed content (e.g., if collecting a person's social media account, what about their friends' content?). The CLIR email report recommends that you get permission from correspondents (for email, because it is a different beast than letters). In a perfect world, commenters would encounter a pop-up that lets them know that their content could be archived. CF is part of a DLF group looking at legal issues - can bring this work back if it results in documentation of legal questions/considerations. Relationship between the access considerations that play into access for digitized (and born digital) collections.

Use Case from CLIR report: "During this negotiation and appraisal stage, one should also determine if there are some parts of the email account(s) that the owner/donor does not want to deposit into a repository. For example, the owner may be corresponding with whistleblowers or other categories of individuals who might wish to remain anonymous at various companies or agencies – these individuals should be notified by the owner of the account that their messages may end up in your repository. They should be allowed to make the call to disallow inclusion in the archive. Or these decisions might be made by the owner themselves – who may not want to include correspondence from or with specific individuals. This can be common if an individual mixes personal and professional in one account. While you might argue for inclusion and embargos, the decision rests with them and the legal agreement your repository has signed."

How could we adapt traditional archiving ethos/approaches to web-based content? For example, with correspondence (letters sent TO the archival creator), we can take the same approach with email, but with social media and web archives, the content is not mediated and feels more public.

Affirmative consent from everyone represented in a collection? Other approaches on the other end, at the point of access/use - mediation, scrubbing data (PII), trigger warnings?

Social media archives approaches - not a question of if, but of whether we can be prepared for this.

Collections related to an event of local impact - sites that document community. How to present those ethically (e.g., people depicted)?

What about discussion of particular topics online that could be illegal in certain jurisdictions?

May be impossible/intractable to go back and identify people depicted/included in captured content - need to balance the ethical obligation to them against the desire to preserve ephemeral content.

SAA Documenting in Times of Crisis toolkit:

Real-world implications of archiving work - Trevor Owens's description of someone depicted in a tweeted photo from MO (in the wake of the killing of Michael Brown).

Also, recall Boston College's IRA collection and what could be promised/shared and not.

Ethics of responsibility - to creators/those depicted or mentioned, to users, to ourselves in accessioning/processing.

Researcher perspectives on using web-based content:

FOLLOW UP ideas:

  • Journal club? read through these reports?
  • CF happy to start a bibliography - we're all plugged in to different groups, so it'd be good to have a space to compare notes. Perhaps open the bibliography up to the born digital CKG?