Skip to content

Documents

Overview

Documents in GDN are JSON objects. These objects can be nested (to any depth) and may contain lists. Each document has a unique primary key which identifies it within its collection. Furthermore, each document is uniquely identified by its document handle across all collections in the same database. Different revisions of the same document (identified by its handle) can be distinguished by their document revision. Any transaction only ever sees a single revision of a document.

For example:

{
  "_id" : "myusers/3456789",
  "_key" : "3456789",
  "_rev" : "14253647",
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : {
    "street" : "Road To Nowhere 1",
    "city" : "Gotham"
  },
  "hobbies" : [
    {name: "swimming", howFavorite: 10},
    {name: "biking", howFavorite: 6},
    {name: "programming", howFavorite: 4}
  ]
}

All documents contain special attributes: the document handle is stored as a string in _id, the document's primary key in _key and the document revision in _rev. The value of the _key attribute can be specified by the user when creating a document. _id and _key values are immutable once the document has been created. The _rev value is maintained by C8 automatically.

Document Handle

A document handle uniquely identifies a document in the database. It is a string and consists of the collection's name and the document key (_key attribute) separated by /.

Document Key

A document key uniquely identifies a document in the collection it is stored in. It can and should be used by clients when specific documents are queried. The document key is stored in the _key attribute of each document. The key values are automatically indexed by C8DB in a collection's primary index. Thus looking up a document by its key is a fast operation. The _key value of a document is immutable once the document has been created.

By default, C8 will auto-generate a document key if no _key attribute is specified, and use the user-specified _key otherwise. The generated _key is guaranteed to be unique in the collection it was generated for. This also applies to sharded collections in a cluster. It can't be guaranteed that the _key is unique within a database or across a whole node or instance however.

This behavior can be changed on a per-collection level by creating collections with the keyOptions attribute.

Using keyOptions it is possible to disallow user-specified keys completely, or to force a specific regime for auto-generating the _key values.

Document Revision

As GDN supports MVCC (Multiple Version Concurrency Control), documents can exist in more than one revision. The document revision is the MVCC token used to specify a particular revision of a document (identified by its _rev). It is fully managed by the server and read-only for the user.

In GDN, the _rev strings are in fact time stamps. They use the local clock of the DBserver that actually writes the document and have millisecond accuracy. Actually, a "Hybrid Logical Clock" is used (for this concept see this paper).

Its value should be treated as opaque, no guarantees regarding its format and properties are given except that it will be different after a document update.Within one shard it is guaranteed that two different document revisions have a different _rev string, even if they are written in the same millisecond, and that these stamps are ascending.

Note

Different servers in cluster might have a clock skew, and therefore between different shards or even between different collections the time stamps are not guaranteed to be comparable. The Hybrid Logical Clock feature does one thing to address this issue: Whenever a message is sent from a node in cluster to another node, it is ensured that any timestamp taken on second node after the message has arrived is greater than any timestamp taken on first node before the message was sent.

The above ensures that if there is some causality between events on different servers, time stamps increase from cause to effect. A direct consequence of this is that sometimes a server has to take timestamps that seem to come from the future of its own clock. It will however still produce ever increasing timestamps. If the clock skew is small, then your timestamps will relatively accurately describe the time when the document revision was actually written.

GDN uses 64bit unsigned integer values to maintain document revisions internally. At this stage we intentionally do not document the exact format of the revision values. When returning document revisions to clients, C8 will put them into a string to ensure the revision is not clipped by clients that do not support big integers.

Note

The _rev attribute can be used as a pre-condition for queries, to avoid lost update situations. That is, if a client fetches a document from the server, modifies it locally (but with the _rev attribute untouched) and sends it back to the server to update the document, but meanwhile the document was changed by another operation, then the revisions do not match anymore and the operation is cancelled by the server. Without this mechanism, the client would accidentally overwrite changes made to the document without knowing about it.

In order to find a particular revision of a document, you need the document handle or key, and the document revision.

Multiple Documents in Single call

The basic document API has been designed to handle not only single documents but multiple documents in a single command. This is crucial for performance, in that it reduces the overhead of individual network round trips between the client and the server.

The general idea to perform multiple document operations in a single command is to use JSON arrays of objects in the place of a single document. As a consequence, document keys, handles and revisions for preconditions have to be supplied embedded in the individual documents given. Multiple document operations are restricted to a single document or edge collection.

Working with Monetary Data

Applications that handle monetary data often require to capture fractional units of currency and need to emulate decimal rounding without precision loss. Compared to relational databases, JSON does not support arbitrary precision out-of-the-box but there are suitable workarounds.

There are two ways to handle monetary data:

  • Monetary data as integer: If you store data as integer, decimals can be avoided by using a general scale factor, eg. 100 making 19.99 to 1999. This solution will work for digits of up to (excluding) 253 without precision loss. Calculations can then be done on the server side.

  • Monetary data as string: If you only want to store and retrieve monetary data you can do so without any precision loss by storing this data as string. However, when using strings for monetary data values it will not be possible to do calculations on them on the server. Calculations have to happen in application logic that is capable of doing arithmetic on string-encoded integers.

Data Retrieval

Queries are used to filter documents based on certain criteria, to compute new data, as well as to manipulate or delete existing documents. Queries can be as simple as a "query by example" or as complex as "joins" using many collections or traversing graph structures. They are written in the C8 Query Language (C8QL).

Cursors are used to iterate over the result of queries, so that you get easily processable batches instead of one big hunk.

Indexes are used to speed up searches. There are various types of indexes, such as hash indexes and geo indexes.