Recent Posts

Arabographic Optical Character Recognition (OCR)

1 minute read :: Posted on October 5, 2016

The OpenITI team, building on the foundational open-source OCR work of Leipzig University’s (LU) Alexander von Humboldt Chair for Digital Humanities, has achieved Optical Character Recognition (OCR) accuracy rates in the high nineties for classical Arabic-script texts. These numbers are based on our tests of seven...

Creating Frequency-Based Readers for Classical Arabic

7 minute read :: Posted on May 30, 2016

Learning classical Arabic is a long process. Most of us took great pleasure in advanced reading classes with our professors, but, often struggling with an overwhelming volume of new vocabulary, we also had, at least occasionally, a feeling that the traditional method is not necessarily the most effective one. While adva...

Chronological Coverage of an Arabic Corpus

21 minute read :: Posted on March 29, 2016

While looking for a way to identify all biographical collections and chronicles (and, by extension, all other texts that offer data for time-series analysis) in a collection of over 10,000 texts, it occurred to me that all these texts share a common feature: they are teeming with dates. So, what if we try t...

Introducing OpenArabic mARkdown

2 minute read :: Posted on November 8, 2015

TEI XML has long been the standard for tagging humanistic texts for research purposes. It is the standard in most digital libraries (including the Perseus Digital Library). Having texts in a TEI XML format that conforms to the standards of a long-standing library allows one to take advantage of libraries’ infrastr...

Distant Reading & the Islamic Archive

less than 1 minute read :: Posted on October 17, 2015

On October 16, 2015, the Digital Islamic Humanities Program at Brown University held its third annual scholarly gathering, a symposium on the subject “Distant Reading & the Islamic Archive.” [View the story on Storify]