License large HTML full-text biological corpora
Build visualization tools
Do the NLP work I've outlined!
Continue my work on diagram understanding (vectorization sub-project)
The technology: Java and Oracle
NSF proposal for all this