Denigma: Change

Change: DBs

created on Nov. 16, 2012, 8:40 p.m. by Hevok & updated on Nov. 16, 2012, 8:40 p.m. by Hevok

========= Databases =========

Kinds of Databases

There are many different types of databases, among them the most common ones are relational, document and key-value databases.

Relational
PostgreSQL
MySQL
SQLite
Document
MongoDB
CouchDB
Key-Value
Redix
Cassandra
Riak
Graph
Object/Hierarchical
Spatial
Time-series / RRD
Search

Theory

ACID ~~~~ The main theory/rules behind most/many databases is acid:

Atomicity
Consistency
Isolation
Durability

CAP ~~~ A database can only have two of the following properties:

Consistency
Availability
Partion tolerance

Relational

Relational database dedicates a schema. Columns, rows, relation and normalization. Queries need to be well formulated, as it is easy to be sub-optimal.

Looping and doing a separate .save() for every individual item is inefficient::

names = set()
for book in Book.objects.filter(year=2012):
    names.add(book.author.name)

Instead reformulate the statement to perform a single query::

names = Author.objects.filter(books__year=2012)\
                      .values_list('name', flat=True)\
                      .distinct()

.. categories = set() for entry in Entries.objects.filter(tag_name="databases"): categories.add(database.category.title)

.. names = Categor.objects.filter(entries__tag

MySQL ~~~~~ MySQL has many issues to be aware of but can also be very fast (like flipper):

No transactional DDL
Poor query optimizer
MyISAM: Full-table locking, no transactions
Oracle
Very fast for some operations

Many things there do not work at the same time (often either one feature or another can be active). For instance, MyISAM allows full-table-locking, but no transactions (InnoDB should be preferred). Also the query optimizer is poor. The owner is Oracle (might be not good). However for some tasks its fast.

SQLite ~~~~~~ SQlite is great for development but not for production because of the following points:

Little integrity checking (slowly being fixed)
Impossible to do some table alterations
No concurrent access
Very low overhead
Very portable

SQLite has little integrity checking and it is impossible to do some table alterations. There is no concurrent access because it tries very hard to be a small (lightweight), portable database. On the other-hand this is also good because it is very portable and easy to install.

PostgreSQL ~~~~~~~~~~ Its hard to find any issues with Postgres, despite a few ones:

Slow default configuration
Can be little harder to learn / less familiar
Almost too many features
Incredible reliable

Postgres has a slow default configuration and it can be a little hard to learn. But despite this obstacles is has almost too many features and is incredible reliable. Further, it has all the SQL features, transaction for everything and even great geospatial support.

Document

In document databases there is no fixed schema. They have low barrier to entry and are closer to python databases.

Their generic problem is that they are immature. Migrations in the databsse are not needed, but must still be performed in the level of the code. Schema have their use as they are there for benefit and protection. So a document database is more risky. There is neither transaction nor integrity checks. There are lots of other issues, like MongoDB is not freeing up disk space unless the entire database is dropped and re-created. Also data can be lost upon a crash (that is now better).

Advantages:
No fixed schema
Low barrier to entry
Closer to python datatypes
Problems
Immature (but improving)
No transactions
No integrity checking

db.insert({
'name': 'Hevok',
    'tags': ['chaos', 'order', 'simplicity', 'complexity']
    'addresses': [
       {'type': 'email', 'at': 'denigma.de'},
       {'type': 'phone', 'dial': '012345'}
    ],
 })

db.find({
   'tags': 'chaos',
   'addresses.type': 'phone',
})

Key-Value

key-value databases are fast, but they only can query by key (which is why they are fast).

Horizontal scaling (but with drawbacks)
Extremely fast
Can only fetch objects by key
Batch/map-reduce queries
Transactions not possible

They allow horizontal scaling (just through in another box), but the drawback is that the consistency is not perfect. It can only be queried by keys, which means they are mostly only useful for things like caching and timestamps. They often support map-reduce queries.

db.set('EVA', 'online')
state = db.get('EVA')

db.sadd('aspects', 'research')
db.sadd('aspects', 'programming')
db.sadd('aspects', 'design')
aspects = db.scard('aspects')

Spatial ~~~~~~~ Spatial databases are strange beasts.

Knowledge of projects = useful
Spatial indexes really speed up some problems
Generally add-on to existing DB

Knowledge of projects is useful as every single region has its own projection, creating conversion confusion. Use spatial indexes because they really speed up some problems. In general they are add on to existing databases (for instance PostGIS for Postgres).

Filesystems ~~~~~~~~~~~ A filesystem can also act as a database. It is a hierarchical key-value store, which supports large files. They are very stable and well-known.

Hierarchical key-value store
Allows multiple writers for appends
Supports very large files

Graph ~~~~~ Graph databases are very efficient for neighbour queries, but not for anything else.

Allow efficient neighbour queries
Generally not much use for anything else

Round-Robin ~~~~~~~~~~~ As RRD loses deliberately old data it is actually useful for logging or statistics. There old data is not needed anyway and therefore prevents running out of disk-space.

Deliberately loses old data
useful for logging or statics

Fazit

It is unlikely that all the data fits in one paradigm. Big files are often already stored on filesystem instead in another database type.

Just getting a bigger server goes a long way, i.e. just a big big box instead of horizontal scaling. This prevents unnecessary pre-optimisation and work as well as headaches.

A database need to be explored and different things tried out with it. Different types of database need tested too.

Tags: rest, databases, comparison

Categories: Tutorial

Parent: Tutorials

Comment: Created entry.

See entry | Admin

Facets

Professions

Achievements