DHAsia Hands-On Clinic | Stylometrics and Genre Research in Imperial Chinese Studies, with Paul Vierthaler

Thu February 11th 2016, 1:30 - 4:00pm
Event Sponsors: Wallenberg Hall, Center for Spatial and Textual Analysis (CESTA), History Department, Center for East Asian Studies
Location: CESTA, Wallenberg Hall, 4th Floor

In this hands-on workshop, Paul Vierthaler will introduce participants to the basics of using stylometry to analyze classical and vernacular Chinese texts.

By the end of the workshop, participants will have developed a workflow to sanitize, normalize, and then analyze documents in a corpus of Chinese texts. Participants will also work with tools to semi-automatically detect genre or authorship.

This workshop will focus on problems unique to working with the types of language found in pre-1911 Chinese texts. We will begin by going over how to prepare documents for analysis.

This will include a discussion of how to tokenize Chinese texts in ways that improve the accuracy of analysis, particularly when comparing texts written in very different linguistic registers. We will also cover text sanitization and data normalization.
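As a rough illustration of these preparation steps, here is a minimal Python sketch. It is not the workshop's own code. It assumes one common option for pre-modern Chinese: skip word segmentation and count single characters or overlapping character n-grams. The names `sanitize`, `normalize`, `char_ngrams`, `relative_frequencies` and the tiny `VARIANTS` table are hypothetical, for illustration only. The Unicode ranges kept are an assumption as well. They cover the basic CJK block and Extension A, so rarer characters from later extension blocks are dropped, and a real project might want to keep them.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny variant table (illustration only); a real
# project would need a curated list suited to its editions.
VARIANTS = {"爲": "為", "衆": "眾", "眞": "真"}

# Anything outside CJK Unified Ideographs (basic block + Extension A).
NON_HAN = re.compile(r"[^\u3400-\u4dbf\u4e00-\u9fff]")


def sanitize(text: str) -> str:
    """Apply NFKC (folds full-width and compatibility forms), then drop
    punctuation, whitespace and any other non-Han characters."""
    return NON_HAN.sub("", unicodedata.normalize("NFKC", text))


def normalize(text: str) -> str:
    """Map variant characters to a single standard form."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def char_ngrams(text: str, n: int = 1) -> list[str]:
    """Overlapping character n-grams; sidesteps word segmentation entirely."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_frequencies(tokens: list[str]) -> dict[str, float]:
    """Counts divided by the total number of tokens, so texts of
    different lengths are comparable."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


if __name__ == "__main__":
    raw = "子曰：「學而時習之，不亦說乎？」"
    clean = normalize(sanitize(raw))
    print(clean)                      # → 子曰學而時習之不亦說乎
    print(char_ngrams(clean, 2)[:3])  # → ['子曰', '曰學', '學而']
```

Character unigrams and bigrams are register-neutral in the sense that they need no dictionary, which is one reason they are a frequent baseline when comparing classical and vernacular texts. Word-level tokenization is an alternative the workshop may prefer.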

The workshop will conclude by discussing hierarchical cluster analysis, principal component analysis, and (time permitting) classification algorithms.
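To make the first two techniques concrete, here is a dependency-free Python sketch. It is not the workshop's method. Every modeling choice is an assumption: features are relative frequencies of the 50 most common characters, z-scored per feature; clustering uses Euclidean distance with average linkage; and PCA is computed by power iteration with deflation. In practice one would likely use `scipy.cluster.hierarchy.linkage` and scikit-learn's `PCA` instead. All function names are hypothetical, and classification is left out.

```python
import math
from collections import Counter


def feature_matrix(texts: dict[str, str], top_k: int = 50):
    """Rows of relative frequencies for the top_k most common characters in the corpus."""
    corpus_counts = Counter()
    for text in texts.values():
        corpus_counts.update(text)  # counts individual characters
    features = [ch for ch, _ in corpus_counts.most_common(top_k)]
    names = list(texts)
    rows = []
    for name in names:
        counts = Counter(texts[name])
        total = sum(counts.values()) or 1
        rows.append([counts[ch] / total for ch in features])
    return names, features, rows


def zscore(rows):
    """Standardize each feature column to mean 0 and (population) standard deviation 1."""
    standardized = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        standardized.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*standardized)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(names, rows):
    """Agglomerative clustering with average linkage; returns merges as (left, right, distance)."""
    clusters = {i: [i] for i in range(len(rows))}  # cluster id -> member row indices
    labels = {i: names[i] for i in range(len(rows))}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(euclidean(rows[p], rows[q]) for p, q in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((labels[a], labels[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        labels[next_id] = f"({labels[a]}, {labels[b]})"
        next_id += 1
    return merges


def pca_2d(rows, iterations=200):
    """Project rows onto the first two principal components (power iteration + deflation)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(m)] for i in range(m)]
    components = []
    for _ in range(2):
        v = [1.0 + 0.1 * j for j in range(m)]  # arbitrary non-uniform start vector
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
        components.append(v)
        # Deflate: remove the found component so the next pass finds the second one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return [[sum(x[j] * c[j] for j in range(m)) for c in components] for x in X]


if __name__ == "__main__":
    # Toy strings for illustration only, not real texts or results.
    toy = {"A1": "之乎者也之乎者也之乎", "A2": "之乎者也之乎者也之者",
           "B1": "了的是了的是了的是了", "B2": "了的是了的是了的是的"}
    names, features, rows = feature_matrix(toy)
    z = zscore(rows)
    for left, right, dist in average_linkage(names, z):
        print(f"merge {left} + {right} at {dist:.3f}")
    for name, (pc1, pc2) in zip(names, pca_2d(z)):
        print(f"{name}: PC1={pc1:.2f} PC2={pc2:.2f}")
```

On these invented toy strings, the two "A" strings and the two "B" strings merge first and fall on opposite sides of PC1. Z-scoring first keeps very common characters from dominating the distances, which is a standard but not obligatory choice in stylometry.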

IMPORTANT NOTE: Although focused on the Chinese case, the analytical approaches examined here are valuable for scholars working on any part of Asia and any time period.
