Tools and queries

So the tools we need are neither rule-based parsers nor statistical tools for inducing the alleged rules. Instead, we need large-scale inspection of the data, starting with visualization tools that reveal the standard "coding sequences" of content.

Rather than plunging deep into questions of how to represent semantically what we find, we can take a radically different route: build direct mappings between text content and queries. For example, "Group II introns are mobile genetic elements" <--> "What are some mobile genetic elements?"
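
To make the idea concrete, here is a minimal sketch of such a mapping in Python. It is only an illustration under strong assumptions: it handles a single hypothetical sentence frame ("X are Y"), and the names COPULA, statement_to_query, and build_index are invented for this example rather than taken from any existing tool.

```python
import re

# Assumed sentence frame (hypothetical): "<subject> are <class>", optional final period.
COPULA = re.compile(r"^(?P<subject>.+?)\s+are\s+(?P<cls>.+?)\.?$", re.IGNORECASE)


def statement_to_query(sentence):
    """Map an 'X are Y' statement to a (query, answer) pair, or None if it does not fit."""
    match = COPULA.match(sentence.strip())
    if match is None:
        return None
    query = f"What are some {match.group('cls')}?"
    return query, match.group("subject")


def build_index(sentences):
    """Collect answers under each generated query (keys are lowercased queries)."""
    index = {}
    for sentence in sentences:
        pair = statement_to_query(sentence)
        if pair is not None:
            query, answer = pair
            index.setdefault(query.lower(), []).append(answer)
    return index


if __name__ == "__main__":
    print(statement_to_query("Group II introns are mobile genetic elements"))
```

The point of the sketch is that no intermediate semantic representation is built: the text is paired directly with the question it answers, and sentences that do not fit a known frame are simply skipped. A realistic system would need many such frames, found by inspecting the data.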