What is Data Mining?
Big Data! Do you feel left out when your peers discuss data science and recent advances in Big Data? Have you ever wondered how Flipkart and Amazon suggest products to their customers? Do you know how financial institutions and retailers use Big Data to transform themselves into next-generation enterprises? Do you want to be part of a world-class, next-generation organization, change the rules of strategy making, and take your career to new heights?
This is where Data Science comes in, in the form of data mining concepts, which are considered among the most powerful techniques in Big Data analytics.
Data mining with R uncovers underlying patterns and insights in large amounts of data that would otherwise go unnoticed. Data mining tools predict behaviours and future trends, allowing businesses to make proactive, unbiased, data-driven decisions. These tools and techniques answer business questions in a scientific manner that traditional methods cannot. Adopting data mining in decision making has changed the way companies operate and has helped them improve revenues.
Companies in a wide range of industries, such as Information Technology, Retail, Telecommunications, Oil and Gas, Finance, and Healthcare, already use data mining tools and techniques to take advantage of historical data and to shape their future business strategies.
Data mining can be broadly categorized into two branches: supervised learning and unsupervised learning. Unsupervised learning deals with identifying significant facts, relationships, hidden patterns, trends and anomalies in data that carries no labels. Clustering, Principal Component Analysis, and Association Rules are examples of unsupervised learning. Supervised learning deals with prediction and classification, using machine learning algorithms trained on labelled data. Weka is one widely used tool for supervised learning.
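To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the data values, the function names `kmeans_1d` and `nearest_neighbor`, and the starting centers are all made up for illustration. The first function groups unlabelled numbers into clusters (unsupervised), while the second predicts a label from labelled examples (supervised).

```python
# Toy 1-D data: two visibly separate groups of values (hypothetical numbers).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]


def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group values around the given centers without any labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the index of its closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def nearest_neighbor(train, query):
    """Supervised: return the label of the closest labelled example (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label


# Unsupervised step: only the raw values are given.
centers = kmeans_1d(points, centers=[0.0, 6.0])

# Supervised step: each training value comes with a known label.
labelled = [(1.0, "low"), (1.2, "low"), (5.0, "high"), (5.3, "high")]
prediction = nearest_neighbor(labelled, 4.7)

print("cluster centers:", centers)
print("predicted label for 4.7:", prediction)
```

The clustering step discovers the two groups on its own, whereas the classifier can only reproduce labels it has been shown. In practice a library implementation would replace both toy functions.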
Things You Will Learn
- Basic matrix algebra
- Introduction to data mining
- Dimension reduction techniques: Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Association rules
- Sequential pattern mining
- Recommender Systems (Collaborative Filtering)
- Network Analytics: Degree Centrality, Closeness Centrality, etc.
- Cluster Analysis: applications to segmentation and anomaly detection
- Hierarchical clustering and K-means clustering with various distance measures, for continuous and categorical variables
- Overview of machine learning/supervised learning
- Data exploration methods: Understanding data (distributions, visualizations), data nuances, data transformations
- Basic classification algorithms
- Version spaces and decision tree classifiers
- K-Nearest Neighbors and Parzen window
- Bayesian classifiers: naïve Bayes and other discriminant classifiers
- Perceptron and Logistic regression
- Neural networks
- Advanced classification algorithms
- Bayesian Networks
- Support Vector Machines
- Model validation and interpretation
- Multi-class classification problem
- Bagging (Random Forest) and Boosting (Gradient Boosted Decision Trees)
- Regression Analysis
- Recommendation engines
- Information retrieval
- Practical tips in modeling: bias-variance trade-off, feature engineering, and incorporating domain knowledge