Data Science and Machine Learning in Python

If you are interested in this course, please click the link below to fill in the form, and we will get back to you.

Introduction

This is a code-intensive, hands-on course focused on building an intuitive and deep understanding of the concepts and algorithms behind modern Data Science and Machine Learning, using the Python programming language. The course teaches you not just to code, but to code professionally, following industry best practices, standards, and norms. You will progress from complete novice to someone proficient at data analysis and Machine Learning tasks using powerful Python libraries such as NumPy, Pandas, Matplotlib, and scikit-learn.

Course organization

The course is made up of the following units.

Unit	Hours
Python Fundamentals	10
Python Intermediate	10
Python for Data Science: NumPy	12
Data Manipulation with Pandas	12
Visualization with Matplotlib	5
Machine Learning with Scikit-learn	16
Total Hours	65

Course Contents

Python Fundamentals

This unit involves fundamental concepts and data structures in Python.

Topics Covered

Python variables
- Data types
- Type conversion
Python Operators
Lists
Conditionals
Loops
Functions
- Lambda functions
Tuples
Sets, Frozensets
- Set operations: intersection, union, difference
Dictionary
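
As a small taste of the set operations listed above, here is a minimal sketch. The data and variable names are made up for illustration and are not taken from the course material.

```python
# Hypothetical example data: units two learners have completed.
alice = {"fundamentals", "numpy", "pandas"}
bob = frozenset({"numpy", "pandas", "matplotlib"})  # immutable set

common = alice & bob        # intersection: units both have completed
either = alice | bob        # union: units at least one has completed
only_alice = alice - bob    # difference: units only Alice has completed

print(sorted(common))
print(sorted(either))
print(sorted(only_alice))
```

A frozenset supports the same read-only operations as a set but cannot be modified, which makes it usable as a dictionary key.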

Python Intermediate

This unit deals with more advanced concepts in Python.

Topics Covered

Functional programming
- map, filter, reduce
File handling: read, write, append

Object-oriented Programming in Python
- classes and object instances
- class and instance variables
- constructor (__init__ method)
- instance methods
- Encapsulation
- Inheritance
- Method overriding/Polymorphism
Comprehensions
- List comprehension
- Set comprehension
- Dictionary comprehension

Iterators/Iterables
Generators
- Generator expressions
Decorators
Closure
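
The sketch below touches several topics from this unit at once: map, filter, reduce, a generator expression, and a decorator whose state lives in a closure. The names `count_calls` and `square` are hypothetical and chosen only for this illustration.

```python
from functools import reduce, wraps


def count_calls(func):
    """Decorator that counts calls; the counter is kept in a closure."""
    count = 0

    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal count          # refers to the enclosing function's variable
        count += 1
        wrapper.calls = count   # expose the counter for inspection
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


numbers = [1, 2, 3, 4, 5, 6]
evens = list(filter(lambda n: n % 2 == 0, numbers))   # keep even numbers
squares = list(map(square, evens))                    # square each one
total = reduce(lambda acc, n: acc + n, squares, 0)    # sum the squares

# A generator expression produces values lazily, one at a time.
lazy_squares = (n * n for n in numbers)
first_two = [next(lazy_squares), next(lazy_squares)]

print(evens, squares, total, square.calls, first_two)
```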

Basic Statistical Concepts

In this unit, fundamental statistical concepts will be introduced with an intuitive feel.

Topics Covered

Mean/Median/Mode
Standard deviation/variance
Intro to Probability
Probability distributions
- Uniform Distribution
- Gaussian/Normal Distribution
Correlation/Covariance/Multivariate covariance
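
To make these quantities concrete, here is a minimal sketch using only Python's standard library. The height and weight values are invented for illustration, and the covariance and correlation are computed by hand using the population formulas.

```python
import statistics

# Invented sample data (heights in cm, weights in kg).
heights = [160, 165, 170, 170, 185]
weights = [55, 60, 68, 70, 82]

mean_h = statistics.mean(heights)
median_h = statistics.median(heights)
mode_h = statistics.mode(heights)
var_h = statistics.pvariance(heights)   # population variance
std_h = statistics.pstdev(heights)      # population standard deviation

# Population covariance: the average product of deviations from the means.
mean_w = statistics.mean(weights)
cov = sum((h - mean_h) * (w - mean_w)
          for h, w in zip(heights, weights)) / len(heights)

# Pearson correlation: covariance scaled by both standard deviations.
corr = cov / (std_h * statistics.pstdev(weights))

print(mean_h, median_h, mode_h, var_h, cov, round(corr, 3))
```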

Python for Data Science

This unit builds your hands-on skills in using popular Python packages for Data Science.

NumPy (Numerical Python Package)

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. NumPy arrays offer much more efficient storage and data operations than built-in Python lists, and the advantage grows as the arrays get larger. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.

Topics Covered

NumPy arrays
- NumPy array attributes
- array indexing
- array slicing
- accessing multi-dimensional arrays
- reshaping arrays
- array concatenation and splitting
NumPy standard data types
Computation on NumPy arrays: Universal functions (ufuncs)
Advanced ufunc features
- Aggregates
- Inner/Outer products
Example: Average heights of US presidents analysis (histogram, ufuncs)
Computation on Arrays: Broadcasting
Comparison, Masks, and Boolean Logic
Example: Counting Rainy Days in Seattle analysis
Fancy indexing
- Fancy indexing in multiple dimensions
- Combined indexing
Example: Selecting Random Points (covariance matrix, correlation, multivariate normal function etc.)
Sorting arrays
- Partial sorts: Partitioning
Structured Data: NumPy’s structured arrays
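
A few of the topics above (boolean masks, aggregates, broadcasting, fancy indexing) fit in a short sketch. It assumes NumPy is installed; the rainfall values are invented and are not the Seattle dataset used in the course example.

```python
import numpy as np

# Invented daily rainfall in millimetres for one week.
rain = np.array([0.0, 2.5, 0.0, 12.0, 7.5, 0.0, 1.0])

# Comparison ufuncs return boolean arrays that can be used as masks.
rainy = rain > 0
n_rainy = int(rainy.sum())          # True counts as 1 when summed
mean_rainy = rain[rainy].mean()     # mean over rainy days only

# Broadcasting: the (3,) vector of column means is stretched
# across both rows of the (2, 3) array.
grid = np.arange(6).reshape(2, 3)
centered = grid - grid.mean(axis=0)

# Fancy indexing: select several elements with a list of indices.
picked = rain[[1, 3, 4]]

print(n_rainy, mean_rainy)
print(centered)
print(picked)
```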

Data Manipulation with Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the NumPy package.

Topics Covered

Pandas objects
- Series objects
- DataFrame objects
- Index objects
Data indexing and selection in Series and DataFrame
- Indexers: loc and iloc
Operating on Data in Pandas
- Ufuncs: Index preservation
- Ufuncs: Index alignment
- Ufuncs: Operations between DataFrame and Series
Handling Missing Data
Combining Datasets: Merge and Join
- Relational Algebra
- Categories of Joins
- Specification of the merge key
- Specifying Set arithmetic for Joins
Aggregations and Grouping
- Example: Planets Data analysis
- GroupBy: Split, Apply, Combine
- Aggregate, filter, transform, apply methods
Working with Time series
- Dates and Times in Python
- NumPy’s datetime64
- Dates and Times in Pandas: Best of both worlds
- Pandas Time Series Data Structures
- Resampling, Shifting, and Windowing
- Example: Microsoft Stock market time series data analysis
- Example: Visualizing Seattle Bicycle Counts
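
The sketch below illustrates selection with loc, a left join with merge, and a split-apply-combine aggregation with groupby. It assumes pandas is installed; the `orders` and `regions` tables are hypothetical and were made up for this example.

```python
import pandas as pd

# Hypothetical tables invented for illustration.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat"],
    "amount": [10.0, 20.0, 5.0, 8.0],
})
regions = pd.DataFrame({
    "customer": ["ann", "bob"],
    "region": ["east", "west"],
})

# Label-based selection with a boolean condition.
big_spenders = orders.loc[orders["amount"] > 9, "customer"].tolist()

# GroupBy: split by customer, apply sum, combine into a Series.
totals = orders.groupby("customer")["amount"].sum()

# Left join: every order is kept; customers without a region get NaN.
merged = orders.merge(regions, on="customer", how="left")

# Rows whose group key is NaN are dropped by groupby by default.
by_region = merged.groupby("region")["amount"].sum()

print(big_spenders)
print(totals)
print(merged)
print(by_region)
```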

Visualization with Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive data visualizations in Python.

Topics Covered

Setting styles
How to display your plots
- Plotting from a script
- Plotting from IPython shell
- Plotting from an IPython notebook
Two interfaces for the price of one
- MATLAB-style interface
- Object-oriented interfaces
Simple Line plots
- Adjusting the plot: Line colors, styles, Axes limits, labeling
Simple Scatter plots
Example: Iris dataset visualization
Histograms, Binnings, and Density
Example: Visualizing Handwritten Digits
Multiple subplots
Visualization with Seaborn package
- Seaborn vs. matplotlib
- Histogram, KDE (Kernel Density Estimation), and densities
- Pair plots, faceted histograms, bar plots

Machine Learning with Scikit-learn

This unit delves into the theory and practice of applying different machine learning algorithms with the scikit-learn package.

Scikit-learn (imported as sklearn) is a widely used, robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface.

Topics Covered

Introduction to Machine Learning
Categories of Machine Learning
- Supervised ML
- Unsupervised ML
Qualitative examples of ML applications
- Classification: Predicting discrete labels
- Regression: Predicting continuous labels
- Clustering: Inferring labels on unlabeled data
- Dimensionality Reduction: Inferring structure of unlabeled data
Introduction to Scikit-learn package
Hyperparameters and Model validation
Feature Engineering
Naive Bayes algorithm
Linear Regression
Support Vector Machines
Random Forests
Principal Components Analysis (PCA)
K-means algorithm
Manifold Learning
Gaussian Mixture models
Kernel Density Estimation
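
The consistent interface mentioned above means most estimators are used the same way: create the model, call fit, then predict or score. Here is a minimal sketch with Gaussian Naive Bayes, assuming scikit-learn and NumPy are installed. The data is synthetic (two well-separated clusters generated for this example), not a course dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data: two 2-D Gaussian clusters centred at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# Hold out a quarter of the data to validate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = GaussianNB()                      # 1. choose and create the model
model.fit(X_train, y_train)               # 2. fit it to the training data
accuracy = model.score(X_test, y_test)    # 3. evaluate on held-out data
predictions = model.predict([[0.0, 0.0], [5.0, 5.0]]).tolist()

print(accuracy, predictions)
```

Swapping GaussianNB for another classifier generally leaves the fit, predict, and score calls unchanged.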

Programming Assignments and Projects

Assignment 1: Python Basics
Assignment 2: Python Conditionals
Assignment 3: Functions
Assignment 4: Loops
Assignment 5: File Handling
Assignment 6: Dictionaries
Assignment 7: NumPy
Assignment 8: Pandas
Assignment 9: Data Science Project – Covid data analysis
Assignment 10: Machine Learning Project – Framingham Heart Disease Dataset Predictive Analysis

Instructor Profile

Himalaya Kakshapati has more than a decade of experience working in the software industry. He has more than six years of experience in the area of Artificial Intelligence/Machine Learning (AI/ML), designing and building end-to-end AI applications for the U.S. healthcare industry using Cloud services, such as IBM Cloud, IBM Watson, AWS (Amazon Web Services) and cutting-edge tools and technologies. He has worked extensively on Data Science projects. He is also an AI consultant.

He is an avid educator as well and has taught Computing courses at various renowned colleges (most of them affiliated to UK-based universities) in Kathmandu with remarkable results.

He has completed the following specializations:

Building Artificial Intelligence Applications using Python and TensorFlow (University of Oxford, UK)
Spark NLP Data Scientist (John Snow Labs, USA)
Spark NLP Healthcare Data Scientist (John Snow Labs, USA)
Deep Learning Specialization (DeepLearning.AI)
Probabilistic Graphical Models Specialization (Stanford University, USA)
Mathematics for Machine Learning Specialization (Imperial College London, UK)
Natural Language Processing Specialization (DeepLearning.AI)
Python 3 Specialization (University of Michigan, USA)
Reinforcement Learning Specialization (University of Alberta, Canada)
IBM MobileFirst Platform Foundation Technical Sales Professional v1 (IBM)
Microsoft Certified Professional (Microsoft)

Himalaya has a Bachelor’s degree in Electrical Engineering and Mathematics from the University of Illinois Chicago, USA, and a Master’s degree in Software Engineering from the University of Hertfordshire, UK, both with distinction.

If you are interested in this course, please click the link below to fill in the form, and we will get back to you.

Google Form