Tuesday, May 29, 2012

Common NoSQL Terminologies

Parallel Computing

Parallel computing partitions a task into similar subtasks, each of which runs on its own processor (CPU) while all of them share a single memory address space.
For example, code that computes the sum of the first N integers is easy to parallelize. Each processor computes a fraction of the global sum. At the end of the execution, however, all processors have to communicate their local sums to a single processor, which adds them up to obtain the global sum.
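The sum example can be sketched in Java as follows. This is only an illustration, assuming a fixed thread pool stands in for the processors; the class name ParallelSum and the way the range is split into chunks are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {

    // Sums the integers 1..n using the given number of worker threads.
    static long sum(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk + 1;
                final int to = Math.min(n, (w + 1) * chunk);
                // Each worker computes the local sum of its own range.
                partials.add(pool.submit(() -> {
                    long local = 0;
                    for (int i = from; i <= to; i++) {
                        local += i;
                    }
                    return local;
                }));
            }
            // A single thread collects the local sums into the global sum.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(100, 4)); // prints 5050
    }
}
```

The final loop is the communication step described above: the workers run independently, but one thread still has to gather every local sum.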

Distributed Computing

A distributed system consists of multiple autonomous computers that communicate through a computer network. These computers interact with each other in order to achieve a common goal and they have their own local memory.

Distributed computing can scale better than parallel computing because you can add more nodes if you run short of memory. Hadoop, ElasticSearch and most NoSQL databases are examples of open-source projects that leverage the concept of distributed computing.

Cluster

A cluster is a set of computers that mostly work on the same task and can be viewed or treated as a single system. Clusters are mostly used for load balancing, high availability and compute-intensive workloads.

Node

Each system in a cluster is known as a node. In other words, a cluster is formed from a set of nodes. Each node has its own operating system, memory and so on.

High Availability

High availability in a cluster means that at least one node is always available for the user to access the cluster. For example, if one node fails, the data held on that node has to be replicated to other nodes within a short time so that the user can still access it. HA clusters usually use a private heartbeat network connection to monitor the health and status of each node in the cluster. Load-balancing servers are one example of high availability clusters.
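A heartbeat check can be sketched roughly as below. This is a simplified model, not the protocol of any particular HA product: the class name HeartbeatMonitor, the timeout value and the explicit timestamps are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HeartbeatMonitor {
    private final long timeoutMillis;
    // Timestamp of the most recent heartbeat received from each node.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat message arrives from a node.
    void recordHeartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    // A node counts as alive if it has sent a heartbeat within the timeout.
    boolean isAlive(String node, long nowMillis) {
        Long seen = lastSeen.get(node);
        return seen != null && nowMillis - seen <= timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3000);
        monitor.recordHeartbeat("node-a", 1000);
        System.out.println(monitor.isAlive("node-a", 3500)); // prints true
        System.out.println(monitor.isAlive("node-a", 5000)); // prints false
    }
}
```

When isAlive returns false, a real cluster would start failover, for example by promoting a replica held on another node.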


Vertically Scalable

Scaling vertically (or scaling up) means adding resources to a single node in a system, typically by adding CPUs or memory to a single computer when performance degrades. Most relational databases scale vertically.

Horizontally Scalable

Horizontal scalability is the capability of a system (cluster) to accept the addition or removal of nodes (independent units of resources) while still working as a single system. To scale horizontally (or scale out), we usually add more nodes to the system. For example, we can increase the number of nodes in a load-balancing cluster from two to three.

Single point of failure

A single point of failure is a component, such as a single node, whose failure causes the entire cluster to stop working. High availability clusters should be designed to avoid single points of failure.


Shared Nothing Architecture

In a shared nothing architecture, each node is independent and self-sufficient, and there is no single point of failure across the system. None of the nodes in the cluster shares memory or disk storage.

Sharding

A shard is a horizontal partition of a database table or data set. In most NoSQL systems a single table is split into multiple shards, and these shards are distributed across the nodes in the cluster.
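One common way to decide which shard a row belongs to is to hash its key. The sketch below assumes such a hash-based scheme, which the definition above does not prescribe; real systems may use ranges or consistent hashing instead, and the class name ShardRouter is hypothetical.

```java
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Maps a row key to a shard index in the range [0, shardCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> shard " + router.shardFor(key));
        }
    }
}
```

The same key always maps to the same shard, so a lookup only has to contact the node that holds that shard.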


Replication

Replication is the process of creating redundant copies of shards to achieve fault tolerance. Usually, two copies of the same shard are not stored on a single node.
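The rule that two copies of a shard should not share a node can be illustrated with a simple placement function. This is a minimal sketch assuming round-robin placement over numbered nodes; the name ReplicaPlacement and the scheme itself are illustrative, not taken from any specific database.

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaPlacement {

    // Returns the node indexes that hold the copies of the given shard.
    // Consecutive nodes are used, so no node receives the same shard twice.
    static List<Integer> nodesFor(int shard, int nodeCount, int copies) {
        if (copies > nodeCount) {
            throw new IllegalArgumentException("not enough nodes to keep copies apart");
        }
        List<Integer> nodes = new ArrayList<>();
        for (int c = 0; c < copies; c++) {
            nodes.add((shard + c) % nodeCount);
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(nodesFor(4, 3, 2)); // prints [1, 2]
    }
}
```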

Fault Tolerance

A fault-tolerant cluster is designed so that, if a node fails, a backup component or procedure can immediately take its place with no loss of service.

Elasticity

Elasticity is the ability to add new hardware to a cluster without any interruption or downtime. An elastic system should not require reconfiguration and should support incremental addition or removal of hardware.


Monday, May 28, 2012

Migrating data from SQL to NoSQL

Introduction:


NoSQL is probably the hottest term in database technology today. Database volumes have grown continuously since the earliest days of computing, but that growth has intensified dramatically over the past decade as databases have been tasked with accepting data feeds from customers, the public, point of sale devices, GPS, mobile devices, RFID readers and so on.
Cloud computing has also placed new challenges on the database. The economic vision for cloud computing is to provide computing resources on demand with a "pay-as-you-go" model. A pool of computing resources can exploit economies of scale and level out variable demand by adding or removing computing resources as the workload changes. The traditional RDBMS has been unable to provide these types of elastic services.
The first step in trying out any NoSQL database is migrating the data from the SQL system to the NoSQL system, but this is not a straightforward task. Often we have to de-normalize our schema before importing, because most NoSQL solutions don’t support table joins.
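De-normalizing usually means folding the rows of a join into one nested document per parent record. The sketch below shows the idea in plain Java, assuming a hypothetical join of customers and orders tables; the field names and the Denormalizer class are made up for illustration and are not part of any importer tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Denormalizer {

    // One row of a hypothetical "customers JOIN orders" result set.
    static final class JoinedRow {
        final int customerId;
        final String customerName;
        final String orderItem;

        JoinedRow(int customerId, String customerName, String orderItem) {
            this.customerId = customerId;
            this.customerName = customerName;
            this.orderItem = orderItem;
        }
    }

    // Groups the joined rows into one document per customer,
    // with the customer's orders embedded as a nested list.
    static List<Map<String, Object>> toDocuments(List<JoinedRow> rows) {
        Map<Integer, Map<String, Object>> byCustomer = new LinkedHashMap<>();
        for (JoinedRow row : rows) {
            Map<String, Object> doc = byCustomer.get(row.customerId);
            if (doc == null) {
                doc = new LinkedHashMap<>();
                doc.put("_id", row.customerId);
                doc.put("name", row.customerName);
                doc.put("orders", new ArrayList<String>());
                byCustomer.put(row.customerId, doc);
            }
            @SuppressWarnings("unchecked")
            List<String> orders = (List<String>) doc.get("orders");
            orders.add(row.orderItem);
        }
        return new ArrayList<>(byCustomer.values());
    }
}
```

Each resulting map corresponds to one flat document that a document store could hold without needing a join at query time.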

SQL to NoSQL Importer:


At my previous workplace I worked with Solr’s data import handler to import data from MySQL systems. It is pretty flexible, as you can write your own queries to fetch the data and create flat documents with your desired structure. I was searching for something similar to import data from MySQL to MongoDB. The main requirement was that the tool had to de-normalize the data on the fly while importing. I wasn’t able to find anything useful, and as it’s a simple utility I sat down to write it on my own. The result is http://code.google.com/p/sql-to-nosql-importer/ . It has support for MongoDB, CouchDB and ElasticSearch.

 

How to Run:


1) Head over to the project site and download the latest version: http://code.google.com/p/sql-to-nosql-importer/downloads/list

2) Go to the conf folder and open the import.properties file. This is the file where you have to supply the properties of the NoSQL system. The default configuration file is shown below.

3) In the same folder there is one more file called “db-data-config.xml”, where you have to supply all the properties related to the source (SQL) database. This file is much like the input file of Solr’s data import handler. The default is shown below.

4) After changing these values, run run.sh or run.bat, depending on your operating system.

Conclusion:


Migrating data from a SQL system to a NoSQL system is both the first and an important task in trying out NoSQL systems. Pay close attention to your schema design before importing the data. Please let me know in the comments section if you face any difficulties.

Sunday, May 27, 2012

Spring and Properties Files


Spring has always tried to be as transparent as possible when it comes to working with properties. Before Spring 3.1, however, the mechanisms for adding new sources of properties to Spring and for actually using the properties weren’t as flexible or as robust as they could be.
Starting with Spring 3.1, the new Environment and PropertySource abstractions simplify working with properties. The default Spring Environment now contains two property sources: the JVM system properties and the system environment variables, with the JVM system properties having precedence.
For more information on the unified property management in Spring 3.1, see this official article.

Registering Properties via the XML namespace

In an XML configuration, new property files can be made accessible to Spring via the following namespace element:
<context:property-placeholder location="com/foo/foo.properties"/>

Registering Properties via Java Annotations

Spring 3.1 also introduces the new @PropertySource annotation, as a convenient mechanism of adding property sources to the environment. This annotation is to be used in conjunction with Java based configuration and the @Configuration annotation:
@PropertySource("classpath:/com/foo/foo.properties")
As opposed to using the XML namespace element, the Java @PropertySource annotation does not automatically register a PropertySourcesPlaceholderConfigurer with Spring. Instead, the bean must be explicitly defined in the configuration to get the property resolution mechanism working. This unexpected behavior is by design, and the reasoning is documented on this issue.

Behind the Scenes – the Spring Configuration

Before Spring 3.1

Since the convenience of defining property sources with annotations was introduced in the recently released Spring 3.1, XML-based configuration was necessary in the previous versions.
Defining a <context:property-placeholder> XML element automatically registers a new PropertyPlaceholderConfigurer bean in the Spring Context. This is also the case in Spring 3.1 if, for backwards compatibility purposes, the XSD schemas are not updated to the 3.1 versions.

In Spring 3.1

From Spring 3.1 onward, the XML <context:property-placeholder> will no longer register the old PropertyPlaceholderConfigurer but the newly introduced PropertySourcesPlaceholderConfigurer. This replacement class was created to be more flexible and to better interact with the newly introduced Environment and PropertySource mechanism; it should be considered the standard for 3.1 applications.

Properties by hand in Spring 3.0 – PropertyPlaceholderConfigurer

Besides the convenient methods of getting properties into Spring – annotations and the XML namespace – the property configuration bean can also be defined and registered manually. Working with the PropertyPlaceholderConfigurer gives us full control over the configuration, with the downside of being more verbose and, most of the time, unnecessary.

Java configuration

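import org.springframework.beans.factory.config.PropertyPlaceholderConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;

// Static, so Spring can create this post-processor without first
// instantiating the enclosing @Configuration class.
@Bean
public static PropertyPlaceholderConfigurer properties(){
  PropertyPlaceholderConfigurer ppc = new PropertyPlaceholderConfigurer();
  Resource[] resources = new Resource[ ]
    { new ClassPathResource( "foo.properties" ) };
  ppc.setLocations( resources );
  // Leave unknown ${...} placeholders untouched instead of failing.
  ppc.setIgnoreUnresolvablePlaceholders( true );
  return ppc;
}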

XML configuration

<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
  <property name="locations">
    <list>
      <value>classpath:foo.properties</value>
    </list>
  </property>
  <property name="ignoreUnresolvablePlaceholders" value="true"/>
</bean>

Properties by hand in Spring 3.1 – PropertySourcesPlaceholderConfigurer

Similarly, in Spring 3.1, the new PropertySourcesPlaceholderConfigurer can also be configured manually:

Java configuration

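import org.springframework.context.annotation.Bean;
import org.springframework.context.support.PropertySourcesPlaceholderConfigurer;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;

// Static, so Spring can create this post-processor without first
// instantiating the enclosing @Configuration class.
@Bean
public static PropertySourcesPlaceholderConfigurer properties(){
  PropertySourcesPlaceholderConfigurer pspc =
   new PropertySourcesPlaceholderConfigurer();
  Resource[] resources = new Resource[ ]
   { new ClassPathResource( "foo.properties" ) };
  pspc.setLocations( resources );
  // Leave unknown ${...} placeholders untouched instead of failing.
  pspc.setIgnoreUnresolvablePlaceholders( true );
  return pspc;
}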

XML configuration

<bean class="org.springframework.context.support.PropertySourcesPlaceholderConfigurer">
  <property name="locations">
    <list>
      <value>classpath:foo.properties</value>
    </list>
  </property>
  <property name="ignoreUnresolvablePlaceholders" value="true"/>
</bean>

Using properties in Spring

Both the older PropertyPlaceholderConfigurer and the new PropertySourcesPlaceholderConfigurer added in Spring 3.1 resolve ${…} placeholders within bean definition property values and @Value annotations.
For example, to inject a property using the @Value annotation:
@Value( "${jdbc.url}" )
private String jdbcUrl;
Using properties in Spring XML configuration:
<bean id="dataSource">
  <property name="url" value="${jdbc.url}" />
</bean>
And lastly, obtaining properties via the new Environment APIs:
@Autowired
private Environment env;
...
dataSource.setUrl(env.getProperty("jdbc.url"));
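
To make the idea of ${…} placeholder resolution concrete, here is a much simplified model in plain Java. It is not how Spring implements it: the class PlaceholderResolver is hypothetical, and details such as default values and nested placeholders are deliberately left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderResolver {
    // Matches ${key} and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} in the text with its value from the properties.
    static String resolve(String text, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            // Unknown keys are left as-is, similar in spirit to
            // ignoring unresolvable placeholders.
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling resolve("${jdbc.url}", properties) with a map that contains jdbc.url returns the configured URL, while a key that is missing from the map is returned unchanged.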

Properties Search Precedence

By default, in Spring 3.1, local properties are searched last, after all environment property sources, including property files. This behavior can be overridden via the localOverride property of the PropertySourcesPlaceholderConfigurer, which can be set to true to allow local properties to override file properties.
In Spring 3.0 and before, the old PropertyPlaceholderConfigurer also attempted to look for properties both in the manually defined sources as well as in the System properties. The lookup precedence was also customizable via the systemPropertiesMode property of the configurer:
  • never – Never check system properties
  • fallback (default) – Check system properties if not resolvable in the specified properties files
  • override – Check system properties first, before trying the specified properties files. This allows system properties to override any other property source.
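
The three modes listed above can be modeled in a few lines of plain Java. This is only a sketch of the lookup order as described here, not Spring's own code; the class and enum names are hypothetical, and two maps stand in for the properties files and the system properties.

```java
import java.util.Map;

class SystemPropertiesModeSketch {
    enum Mode { NEVER, FALLBACK, OVERRIDE }

    // Resolves a key according to the lookup order of the chosen mode.
    static String resolve(String key,
                          Map<String, String> fileProperties,
                          Map<String, String> systemProperties,
                          Mode mode) {
        switch (mode) {
            case OVERRIDE: {
                // System properties win over the properties files.
                String value = systemProperties.get(key);
                return value != null ? value : fileProperties.get(key);
            }
            case FALLBACK: {
                // Properties files first, system properties only as a fallback.
                String value = fileProperties.get(key);
                return value != null ? value : systemProperties.get(key);
            }
            default:
                // NEVER: system properties are not consulted at all.
                return fileProperties.get(key);
        }
    }
}
```

With the same key defined in both places, FALLBACK and NEVER return the value from the file, while OVERRIDE returns the system property.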

Conclusion

This article covered the various ways to work with properties in Spring, discussing both the older mechanism of Spring 3.0 and below and the new support for properties introduced in Spring 3.1. For a project making heavy use of properties, check out the GitHub project.