Skip to content

Essentials

Today’s applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.

Macrometa Global Data Network (GDN) is a realtime low latency global data layer that provides

  • Multi-model (key-value, docs, graphs, search) real-time database
  • Geo replicated streams for pub/sub, queuing and event processing.
  • Realtime complex event processing on streams and
  • A compute layer for serverless apps and functions colocated with data.

The platform is designed to sit across 100s of worldwide locations/pops and present one unified view.

GDN Essentials

Fabrics

Fabric is a collection of edge data centers linked together as a single, high performance cloud computing system consisting of storage, networking and processing functions. A fabric is created when a tenant account is provisioned with the edge locations. Each fabric contains Collections, Graphs, Streams, Stream Processors and Search.

Geo-Fabrics are subsets of the fabric and can be composed of one or more edge locations in Fabric (defined by tenant). Different geofabrics are usually used to isolate the data inside them (collections, documents etc.) from one another.

A geo fabric contains the following:

  • Collections - are a grouping of JSON documents and are like tables in a RDBMS. You can create any number of collections in a geo fabric. And a collection can have any number of documents.
  • Graphs - consists of vertices and edges. Edges are stored as documents in edge collections. A vertex can be a document of a document collection or of an edge collection (so edges can be used as vertices).
  • Streams - are a type of collection that capture data in motion. Streams support both pub-sub and queuing models. Messages are sent via streams by publishers to consumers who then do something with the message.
  • Stream Processors - to perform complex event processing in realtime on streams.
  • Search - A full-text search engine for information retrieval on one or more linked collections.

Real-time Data

When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. Macrometa GDN makes building realtime apps dramatically easier. The GDN can push data to applications in realtime across multiple data centers. It dramatically reduces the time and effort necessary to build scalable realtime apps.

Collections

A collection contains zero or more documents. If you are familiar with relational database management systems (RDBMS) then it is safe to compare collections to tables and documents to rows. The difference is that in a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas.

Macrometa GDN is schema-less, which means that there is no need to define what attributes a document can have. Every single document can have a completely different structure and still be stored together with other documents in a single collection. In practice, there will be common denominators among the documents in a collection, but the GDN itself doesn’t force you to limit yourself to a certain data structure.

There are two types of collections:

  • Document Collections - Also refered to as vertex collections in the context of graphs.
  • Edge Collections - Edge collections store documents as well, but they include two special attributes, _from and _to, which are used to create relations between documents.

Usually, two documents (vertices) stored in document collections are linked by a document (edge) stored in an edge collection. This is GDN graph data model. It follows the mathematical concept of a directed, labeled graph, except that edges don’t just have labels, but are full-blown documents.

Collections exist inside of fabrics. There can be one or many fabrics. Different fabrics are usually used for geo-fencing setups, as the data inside them (collections, documents etc.) is isolated from one another. The default fabric _system is special, because it cannot be removed. Fabric users are managed in this fabric.

Similarly fabrics may also contain view entities. A View in its simplest form can be seen as a read-only array or collection of documents. The view concept quite closely matches a similarly named concept available in most relational database management systems (RDBMS). Each view entity usually maps some implementation specific document transformation, (possibly identity), onto documents from zero or more collections.

Data Models

Macrometa GDN supports multiple types of data models.

Key/Value model

The key/value store data model is the easiest to scale. In GDN, this is implemented in the sense that a document collection always has a primary key _key attribute and in the absence of further secondary indexes the document collection behaves like a simple key/value store.

The only operations that are possible in this context are single key lookups and key/value pair insertions and updates. If _key is the only sharding attribute then the sharding is done with respect to the primary key and all these operations scale linearly.

If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.

Document model

The documents you can store closely follow the JSON format, although they are stored in a binary format called VelocyPack. A document contains zero or more attributes, each of these attributes having a value. A value can either be an atomic type, i.e., number, string, boolean or null, or a compound type, i.e. an array or embedded document / object. Arrays and sub-objects can contain all of these types, which means that arbitrarily nested data structures can be represented in a single document. Documents are grouped into collections.

Each document has a unique primary key which identifies it within its collection. Furthermore, each document is uniquely identified by its document handle across all collections in the same fabric. Different revisions of the same document (identified by its handle) can be distinguished by their document revision. Any transaction only ever sees a single revision of a document.

For example:

{
  "_id" : "myusers/3456789",
  "_key" : "3456789",
  "_rev" : "14253647",
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : {
    "street" : "Road To Nowhere 1",
    "city" : "Gotham"
  },
  "hobbies" : [
    {"name": "swimming", "howFavorite": 10},
    {"name": "biking", "howFavorite": 6},
    {"name": "programming", "howFavorite": 4}
  ]
}

All documents contain special attributes: the document handle is stored as a string in _id, the document’s primary key in _key and the document revision in _rev. The value of the _key attribute can be specified by the user when creating a document. _id and _key values are immutable once the document has been created. The _rev value is maintained by GDN automatically.

Graph model

You can turn your documents into graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the idea of a graph, which directly relates data items in the database.

In SQL databases, you have the notion of a relation table to store n:m relationships between two data tables. An edge collection is somewhat similar to these relation tables; A vertex collection resemble the data tables with the objects to connect.

While simple graph queries with fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections.

Graph data models are particularly good at queries on graphs that involve paths in the graph of an a priori unknown length. For example, finding the shortest path between two vertices in a graph, or finding all paths that match a certain pattern starting at a given vertex are such examples.

Stream model

Streams are a type of collection in GDN that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message. Streams can be created via client drivers (pyC8, jsC8), REST API or the web console.

Streams unifies queuing and pub-sub messaging into a unified messaging model that provides a lot of flexibility to users to consume messages in a way that is best for the use case at hand.

producer→stream→subscription→consumer

A stream is a named channel for sending messages. Each stream is backed by a distributed append-only log and can be local (at one edge location only) or global (across all edge locations in the Fabric).

Messages from publishers are only stored once on a stream, and can be consumed as many times as necessary by consumers. The stream is the source of truth for consumption. Although messages are only stored once on the stream, there can be different ways of consuming these messages.

Consumers are grouped together for consuming messages. Each group of consumers is a subscription on a stream. Each consumer group can have its own way of consuming the messages—exclusively, shared, or failover.

GDN provides information retrieval features, natively integrated into C8QL query language and with support for all data models. It is primarily a full-text search engine, a much more powerful alternative to the full-text index type.

GDN introduces the concept of Views which can be seen as virtual collections. Each View represents an inverted index to provide fast full-text searching over one or multiple linked collections and holds the configuration for the search capabilities, such as the attributes to index. It can cover multiple or even all attributes of the documents in the linked collections. Search results can be sorted by their similarity ranking to return the best matches first using popular scoring algorithms.

Configurable Analyzers are available for text processing, such as for tokenization, language-specific word stemming, case conversion, removal of diacritical marks (accents) from characters and more. Analyzers can be used standalone or in combination with Views for sophisticated searching.

The Search features are integrated into C8QL as SEARCH operation and a set of C8QL functions.

The Search features can be used to for various use cases like,

  • Find information in a research database using stemmed phrases, case and accent insensitive, with irrelevant terms removed from the search index (stop word filtering), ranked by relevance based on term frequency (TFIDF).

  • Perform federated full-text searches over product descriptions for a web shop, with the product documents stored in various collections.

  • Query a movie dataset for titles with words in a particular order (optionally with wildcards), and sort the results by best matching (BM25) but favor movies with a longer duration.

Stream Processing

GDN provides geo-replicated stream data processing capabilities to integrate streaming data and takes action based on streaming data.

GDN Essentials

The stream processing can be used for

  • Transforming your data from one format to another (e.g., to/from XML, JSON, AVRO, etc.).
  • Enriching data received from a specific source by combining it with databases, services, etc., via inline calculations and custom functions.
  • Correlating data streams by joining multiple streams to create an aggregate stream.
  • Cleaning data by filtering it and by modifying the content (e.g., obfuscating) in messages.
  • Deriving insights by identifying interesting patterns and sequences of events in data streams.
  • Summarizing data as and when it is generated using temporal windows and incremental time series aggregations.
  • Realtime ETL for Collections, tailing files, scraping HTTP Endpoints, etc.
  • Streaming Integrations i.e., integrating streaming data as well as trigger actions based on data streams. The action can be a single request to a service or a complex enterprise integration flow.

Drivers & CLI

Applications interact with GDN using REST or via a client libraries, also known as a client drivers. Client drivers handle all interaction with the GDN in a language appropriate to the application. Here is a list of client drivers in various languages currentl availabile for C8.

  • Python Driver - pyC8
  • Javascript Driver - jsC8

Support for additional language drivers and CLI is underway.