created on Nov. 16, 2012, 8:40 p.m. by Hevok & updated on Nov. 16, 2012, 8:40 p.m. by Hevok
========= Databases =========
There are many different types of databases, among them the most common ones are relational, document and key-value databases.
Relational
PostgreSQL
SQLite
Document
MongoDB
CouchDB
Key-Value
Redix
Riak
ACID ~~~~ The main theory/rules behind most/many databases is acid:
CAP ~~~ A database can only have two of the following properties:
Relational database dedicates a schema. Columns, rows, relation and normalization. Queries need to be well formulated, as it is easy to be sub-optimal.
Looping and doing a separate .save() for every individual item is inefficient::
names = set()
for book in Book.objects.filter(year=2012):
names.add(book.author.name)
Instead reformulate the statement to perform a single query::
names = Author.objects.filter(books__year=2012)\
.values_list('name', flat=True)\
.distinct()
.. categories = set() for entry in Entries.objects.filter(tag_name="databases"): categories.add(database.category.title)
.. names = Categor.objects.filter(entries__tag
MySQL ~~~~~ MySQL has many issues to be aware of but can also be very fast (like flipper):
Many things there do not work at the same time (often either one feature or another can be active). For instance, MyISAM allows full-table-locking, but no transactions (InnoDB should be preferred). Also the query optimizer is poor. The owner is Oracle (might be not good). However for some tasks its fast.
SQLite ~~~~~~ SQlite is great for development but not for production because of the following points:
SQLite has little integrity checking and it is impossible to do some table alterations. There is no concurrent access because it tries very hard to be a small (lightweight), portable database. On the other-hand this is also good because it is very portable and easy to install.
PostgreSQL ~~~~~~~~~~ Its hard to find any issues with Postgres, despite a few ones:
Postgres has a slow default configuration and it can be a little hard to learn. But despite this obstacles is has almost too many features and is incredible reliable. Further, it has all the SQL features, transaction for everything and even great geospatial support.
In document databases there is no fixed schema. They have low barrier to entry and are closer to python databases.
Their generic problem is that they are immature. Migrations in the databsse are not needed, but must still be performed in the level of the code. Schema have their use as they are there for benefit and protection. So a document database is more risky. There is neither transaction nor integrity checks. There are lots of other issues, like MongoDB is not freeing up disk space unless the entire database is dropped and re-created. Also data can be lost upon a crash (that is now better).
Advantages:
No fixed schema
Closer to python datatypes
Problems
Immature (but improving)
::
db.insert({
'name': 'Hevok',
'tags': ['chaos', 'order', 'simplicity', 'complexity']
'addresses': [
{'type': 'email', 'at': 'denigma.de'},
{'type': 'phone', 'dial': '012345'}
],
})
db.find({
'tags': 'chaos',
'addresses.type': 'phone',
})
key-value databases are fast, but they only can query by key (which is why they are fast).
They allow horizontal scaling (just through in another box), but the drawback is that the consistency is not perfect. It can only be queried by keys, which means they are mostly only useful for things like caching and timestamps. They often support map-reduce queries.
::
db.set('EVA', 'online')
state = db.get('EVA')
db.sadd('aspects', 'research')
db.sadd('aspects', 'programming')
db.sadd('aspects', 'design')
aspects = db.scard('aspects')
Spatial ~~~~~~~ Spatial databases are strange beasts.
Knowledge of projects is useful as every single region has its own projection, creating conversion confusion. Use spatial indexes because they really speed up some problems. In general they are add on to existing databases (for instance PostGIS for Postgres).
Filesystems ~~~~~~~~~~~ A filesystem can also act as a database. It is a hierarchical key-value store, which supports large files. They are very stable and well-known.
Graph ~~~~~ Graph databases are very efficient for neighbour queries, but not for anything else.
Round-Robin ~~~~~~~~~~~ As RRD loses deliberately old data it is actually useful for logging or statistics. There old data is not needed anyway and therefore prevents running out of disk-space.
It is unlikely that all the data fits in one paradigm. Big files are often already stored on filesystem instead in another database type.
Just getting a bigger server goes a long way, i.e. just a big big box instead of horizontal scaling. This prevents unnecessary pre-optimisation and work as well as headaches.
A database need to be explored and different things tried out with it. Different types of database need tested too.
Comment on This Data Unit