Vito D'Orazio
University of Texas at Dallas
School of Economic, Political, and Policy Sciences
Political Science Program

PreText

PreText is a software package written in Perl for representing text documents as data. It was designed for documents downloaded from LexisNexis, but it also works with pre-structured documents.

The software provides the following features, each of which is optional:
  • Term weighting by normalized term frequency and by term frequency-inverse document frequency (TF-IDF)
  • Named entity recognition using Phil Schrodt's CountryCodes file
  • Stopword removal
  • Stemming using Porter's algorithm
  • Document frequency thresholding
  • Multiple output formats available
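
To make the weighting and thresholding features above more concrete, here is a minimal Python sketch of the general techniques. It is not PreText's code (PreText is written in Perl) and its exact formulas may differ; see the manual for those. The sketch assumes normalized term frequency means a term's count divided by the document's retained token count, and that inverse document frequency is log(N / df). The stopword list, the function names, and the `min_df` parameter are hypothetical, chosen only for illustration. Stemming and named entity recognition are left out.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def document_frequencies(docs):
    """Count how many documents each term appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def weight_documents(texts, min_df=1):
    """Return one dict per document mapping term -> (tf, tf_idf).

    Assumed definitions (not taken from PreText):
      tf     = term count / number of retained tokens in the document
      idf    = log(number of documents / document frequency)
      min_df = terms found in fewer documents than this are dropped
    """
    docs = [tokenize(t) for t in texts]
    df = document_frequencies(docs)
    vocab = {term for term, n in df.items() if n >= min_df}
    n_docs = len(docs)

    weighted = []
    for tokens in docs:
        kept = [t for t in tokens if t in vocab]
        counts = Counter(kept)
        total = len(kept)
        row = {}
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / df[term])
            row[term] = (tf, tf * idf)
        weighted.append(row)
    return weighted
```

Calling `weight_documents(["the army attacked the city", "the city signed a treaty", "rebels attacked"], min_df=2)` keeps only the terms that occur in at least two documents and returns their weights per document.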


Please refer to the PreText manual for details and feel free to email me with any questions, comments or suggestions.

PreText Manual

PreText Download