A Study on Data Mining Techniques and Tools for Big Data

Dr. R. Beena, C. Bhuvaneshwari


Big Data refers to large-volume, growing data sets drawn from heterogeneous, autonomous sources in fields such as engineering, genomics, biology, meteorology, environmental research and many more. New technologies, systems and infrastructure must be developed to handle these data volumes. Deriving useful information from Big Data requires increasingly sophisticated methods of mathematical and statistical analysis and the design of efficient algorithms.

Big Data is constantly changing, and new algorithms and tools are continuously being developed to handle it. Big Data mining is about exploring large volumes of unstructured, imperfect, complex yet invaluable data and extracting useful information or knowledge for future use.

Hardware platforms such as GPUs and multicore CPUs can be used to speed up data processing, and tools such as Hadoop, Spark, Dynamo, Pentaho and SAMOA can be used to handle big data. Beyond these, many other platforms with different characteristics are available, and choosing the right one requires in-depth knowledge of the capabilities of each.

This paper provides an in-depth study of the various data mining algorithms and tools available for performing big data analytics.


Big Data Mining, Clustering, Classification, Big Data Tools, Hadoop, Spark, Pentaho, ASTERIX.

