TwoRavens
TwoRavens is a platform for data exploration, analysis, and meta-analysis. This project is funded by the Defense Advanced Research Projects Agency's Data-Driven Discovery of Models (D3M) program.
PreText
PreText is a software package written in Perl for representing text documents as data. The software has been designed to work with documents downloaded from LexisNexis, but it can work with pre-structured documents as well.
The software provides the following optional features:
- Term weighting by normalized term frequency and by term frequency-inverse document frequency (TF-IDF)
- Named entity recognition using Phil Schrodt's CountryCodes file
- Stopword removal
- Stemming using Porter's algorithm
- Document frequency thresholding
- Multiple output formats available
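To give a rough sense of how the stopword removal, term weighting, and document frequency thresholding features listed above fit together, here is a minimal sketch in Python. It is not PreText's Perl code and does not reflect its actual options or formulas: the function names, the tiny stopword list, the `min_df` parameter, and the choice of idf = log(N / df) are all assumptions made for illustration. Stemming and named entity recognition are left out.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def weight_documents(docs, min_df=1):
    """Return one {term: tf-idf weight} dict per document.

    tf  = term count / number of tokens in the document (normalized tf)
    idf = log(N / df), where df is the number of documents containing the term
    Terms with df < min_df are dropped (document frequency thresholding).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weighted = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
            if df[term] >= min_df
        })
    return weighted


docs = ["The war in the north", "Peace talks and the war", "Peace"]
weights = weight_documents(docs, min_df=2)
```

With `min_df=2`, terms that occur in only one document (here "north" and "talks") are dropped before weighting, so only the terms shared across documents remain.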
Please refer to the PreText manual for details and feel free to email me with any questions, comments or suggestions.