To do computational journalism, at least *some* data must be collected, stored, explored, analyzed, cleaned, managed and "governed." In the past few years, the "traditional" tools for doing this, called relational database management systems (RDBMS), have been supplemented by a new class of tools broadly known as "NoSQL" databases. The name NoSQL is a play on SQL, the most widely used language for working with a traditional RDBMS.
The NoSQL field is rapidly evolving, but enough knowledge exists to fill several books. The best overview of databases for computational journalists I've found so far comes from Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement.
I've been working through the book, which has been available in beta from the publisher for a few months, while collecting the tools for Data Journalism Developer Studio 2012LX and Computational Journalism Server. It covers, in order:
* PostgreSQL, a traditional RDBMS,
* Riak, a key-value database,
* HBase, a columnar database,
* MongoDB, a document-oriented database,
* CouchDB, a document-oriented database,
* Neo4j, a graph-oriented database, and
* Redis, a key-value database / data structure server.
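If those category names are unfamiliar, here is a rough sketch of how one record might look under each data model. It uses only Python's standard library (SQLite standing in for the relational case) and plain dictionaries and lists as stand-ins for the other stores; it does not call any of the seven databases. The band, its members and every key name below are made up for illustration.

```python
import json
import sqlite3

# Hypothetical record; the band and member names are invented for this sketch.
band = {"name": "The Examples", "members": ["Ann", "Bo"]}

# Relational: normalized rows in two tables, joined at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bands (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE members (band_id INTEGER REFERENCES bands(id), name TEXT)")
cur = conn.execute("INSERT INTO bands (name) VALUES (?)", (band["name"],))
band_id = cur.lastrowid
conn.executemany(
    "INSERT INTO members (band_id, name) VALUES (?, ?)",
    [(band_id, member) for member in band["members"]],
)
relational_rows = conn.execute(
    "SELECT b.name, m.name FROM bands b "
    "JOIN members m ON m.band_id = b.id ORDER BY m.name"
).fetchall()

# Key-value: an opaque value fetched by its key (a dict stands in for the store).
key_value_store = {"band:1": json.dumps(band)}

# Columnar (wide-column): a row key mapping to "family:qualifier" cells.
column_row = {"band:1": {"info:name": band["name"],
                         "members:Ann": "1", "members:Bo": "1"}}

# Document: the nested structure kept whole, with an identifier attached.
document = {"_id": "band:1", **band}

# Graph: nodes connected by labeled edges.
graph_edges = [(member, "MEMBER_OF", band["name"]) for member in band["members"]]

print(relational_rows)
print(graph_edges)
```

The point of the sketch is only that the same facts take different shapes: rows you join, a blob you fetch by key, a nested document, or edges you traverse. Which shape fits depends on the questions you need to ask of the data.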
All of these databases are open source, and they're all supported by either a corporate entity, a non-profit foundation, or some combination of the two. The title really should have been "Seven Databases in Seven Weekends": each database is covered in a three-day, hands-on session, so the book could easily be done as a series of weekend projects. You'll build things with these databases, including a Node.js application that combines Redis, CouchDB and Neo4j to provide a "band information service."
Appendix A contains a pair of tables that give an overview of the distinguishing characteristics of the seven databases. As the authors put it, "Although the tables are not a replacement for a true understanding, they should provide you with an at-a-glance sense of what each database is capable of, where it falls short, and how it fits into the modern database landscape."
I believe all of these databases have a place in modern computational journalism, as do the other two well-known open source RDBMS tools, MySQL and SQLite. In particular, for spatial / mapping projects, PostgreSQL, SQLite, MongoDB and CouchDB have robust geographic information system (GIS) capabilities, either built in or available as add-ons.
I think NoSQL databases will be the core of computational journalism for the next few years. The RDBMS isn't going away, of course, but if you limit yourself to "SQL thinking" or even "object-relational models" and "model-view-controller" architectures, there will be applications you can't build. This book will get you up to speed as fast as you're willing to go.