Course Description
Course Objectives
Big Data deals with collecting and storing large volumes of data, much of it unstructured, while Data Science is about creating value from that data. The dual certification course offered by ExcelR trains students in the synergy between these two complementary segments of Data Analytics. Suited to both freshers and professionals, the course equips learners to use the Big Data framework Apache Hadoop to extract useful data and then analyze it as a data scientist. The training covers key techniques such as Statistical Analysis, Regression Analysis (one of the most widely used), and unsupervised Data Mining techniques. These techniques are explained using two of the most widely used data science tools in the industry: R and Python.
Methodology
- Lectures blending theoretical & practical exposure
- Hands-on exercises to reinforce the learning
- Quizzes to test understanding
- Discussions & a final exam to earn the certificate
Who should attend
- Candidates aspiring to be Big Data Analysts
- Analytics Managers / Professionals, Business Analysts, Software Developers
- Graduates who are looking to build a career in Big Data Administration and Machine Learning
- Employees of organizations that are planning to shift to Big Data tools
- Students who aim to work in the IT industry
Learning Outcomes
- Be able to install & set up Hadoop & Spark environments for storing and processing data
- Be able to understand the advantages of distributed storage with the Hadoop Distributed File System (HDFS) for batch processing
- Be able to understand the differences between Hadoop 1.x and Hadoop 2.x and the advantages of Hadoop 2.x
- Be able to perform exploratory queries on data batches using Spark, including parallel processing
- Be able to understand Spark RDD optimization techniques
- Be able to write Big Data programs suited to the underlying system architecture
- Understand the landscape of the various data generation sources
- Learn the tools & techniques used to analyze both structured & unstructured data
- Understand the differences between descriptive analytics & predictive analytics
- Perform text mining to analyze customer sentiment
- Understand data-driven machine learning approaches to making critical business decisions
- Understand how to build prediction models for day-to-day use
- Understand how to perform forecasting to make proactive business decisions
- Learn to present data in the most effective format using data visualization concepts
Course Curriculum
- Introduction to Hadoop
- HDFS (Hadoop Distributed File System)
- MapReduce
- Pig
- Hive
- Scala
- Spark
- Data Collection
- Exploratory Data Analysis
- Probability and its Distributions
- Hypothesis Testing
- Correlation Analysis
- Regression Analysis
- Data Mining Unsupervised Learning
- Text Mining for Unstructured Data
- Data Mining Supervised Learning / Machine Learning