Tuesday, May 29, 2012

Common NoSQL Terminologies

Parallel Computing

Parallel computing partitions a task into identical subtasks each of which has its own processing power (cpu) but shared memory address space
For example, a code that computes the sum of the first N integers is very easy to program in parallel. Each processor will be responsible for computing a fraction of the global sum. However, at the end of the execution, all processors have to communicate their local sums to a single processor for example that will add them all up to obtain the global sum.

Distributed Computing

A distributed system consists of multiple autonomous computers that communicate through a computer network. These computers interact with each other in order to achieve a common goal and they have their own local memory.

Distributed computing can scale better than parallel computing because you can add more nodes if you run short of memory. Hadoop, ElasticSearch and most NoSQL databases are some examples of open-source projects which leverage concept of distributed computing.

Cluster

Cluster is set computers they mostly work on same task and can be viewed or treated as single system. They are mostly used for load balancing, high availability and high computing needs.

Node

                Each system in a cluster is known as Node. In other words a cluster is formed with set of Nodes. Each node has its own operating system, memory etc.

High Availability

                High Availability in a cluster means atleast one node is available for the user to access the cluster. For example if one node fails, the data in that node has to be replicated to other nodes in short time so that the user can access that. HA clusters usually use a heartbeat private network connection which is used to monitor the health and status of each node in the cluster. Load balancing servers are also one example of high availability clusters.


Vertically Scalable

                Scaling vertically (or scale up) means to add resources to a single node in a system, typically involving the addition of CPUs or memory to a single computer when performance degrades. Most of the relational databases are vertically scalable.

Horizontally Scalable

                Horizontal scalability is the capability of a system (cluster) to accept adding or removing multiple nodes (independent units of resources) and making them work as a single system. To scale horizontally (or scale out) usually we add more nodes to a system. For example we can increase the number of nodes in a load balancing cluster from two to three.

Single point of failure

                In a cluster if single node fails, the entire cluster will stop working. This kind of architecture is called single point of failure. High availability clusters should be architectured in a way to avoid single point of failure.

File:Single Point of Failure.png

shared nothing architecture

                In shared nothing architecture, each node is independent and self-sufficient, and there is no single point of failure across the system. None of the nodes in the cluster shares memory or disk storage.

Sharding

                A shard is a horizontal partition of a database table or data. In most of the NoSQL systems a single table will be split into multiple shards. These shards will be shared across the nodes in the cluster.


Replication

                It is the process of creating redundant copies of shards to achieve partition tolerance. Usually two copies of same shard will not be stored in a single node.

Fault Tolerance

                Fault-tolerant describes a cluster designed so that, in the event that a node fails, a backup component or procedure can immediately take its place with no loss of service.

Elasticity

                Elasticity is the ability to add new hardware to a cluster without any interruption or downtime. It should not require reconfiguration and supports incremental addition or removal of hardware.


No comments:

Post a Comment