Vito D'Orazio
University of Texas at Dallas
School of Economic, Political, and Policy Sciences
Political Science Program

TwoRavens

TwoRavens is a platform for data exploration, analysis, and meta-analysis. This project is funded by the Defense Advanced Research Projects Agency's Data-Driven Discovery of Models (D3M) program. 
  • Demo
  • About TwoRavens
  • GitHub repo

PreText

PreText is a software package written in Perl for representing text documents as data. It was designed to work with documents downloaded from LexisNexis, but it can also process other pre-structured documents.

The software includes the following features, each of which is optional:
  • Term weighting by normalized term frequency and by term frequency-inverse document frequency (TF-IDF)
  • Named entity recognition using Phil Schrodt's CountryCodes file
  • Stopword removal
  • Stemming using Porter's algorithm
  • Document frequency thresholding
  • Multiple output formats
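To illustrate what three of these steps do (stopword removal, TF-IDF weighting, and document frequency thresholding), here is a minimal Python sketch. It is not PreText's Perl code. It assumes the common definitions of normalized term frequency (term count divided by document length) and inverse document frequency (log of the number of documents over the number of documents containing the term), which may differ from PreText's exact formulas. The stopword list, the function names `tokenize`, `document_frequency`, and `tfidf`, and the `min_df` parameter are all hypothetical. Stemming and named entity recognition are left out.

```python
import math
from collections import Counter

# Hypothetical toy stopword list for illustration only; PreText's list may differ.
STOPWORDS = {"the", "a", "of", "and", "in", "to"}


def tokenize(text):
    """Split on whitespace, strip simple punctuation, lowercase, drop stopwords."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def document_frequency(docs):
    """Count the number of documents in which each term appears at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tfidf(docs, min_df=1):
    """Return one {term: weight} dict per tokenized document.

    tf(t, d) = count(t, d) / len(d)   (normalized term frequency; len(d) is
                                       measured after stopword removal)
    idf(t)   = log(N / df(t))         (N = number of documents)
    Terms that occur in fewer than min_df documents are dropped
    (document frequency thresholding).
    """
    n_docs = len(docs)
    df = document_frequency(docs)
    vocab = {t for t, c in df.items() if c >= min_df}
    weighted = []
    for tokens in docs:
        counts = Counter(t for t in tokens if t in vocab)
        length = len(tokens) or 1  # avoid division by zero for empty documents
        weighted.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weighted


if __name__ == "__main__":
    texts = [
        "The army crossed the border.",
        "The border dispute escalated.",
        "Talks in the capital ended.",
    ]
    docs = [tokenize(s) for s in texts]
    print(tfidf(docs))
    print(tfidf(docs, min_df=2))
```

With these three sentences, "border" is the only term that appears in two documents, so it gets a lower IDF weight than terms unique to a single document. With `min_df=2`, every other term is removed by the threshold, and the third document ends up with no weighted terms at all.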


Please refer to the PreText manual for details, and feel free to email me with any questions, comments, or suggestions.
  • GitHub for MID5
  • GitHub for MID4 version
  • PreText Manual
  • PreText Download