
Data Science and Machine Learning in Python

If you are interested in this course, please fill in the form at the link below, and we will get back to you.

Google Form

Introduction

This is a hands-on, code-intensive course focused on building an intuitive and deep understanding of the concepts and algorithms behind modern Data Science and Machine Learning techniques, using the Python programming language. The course teaches you not just to code, but to code professionally, following industry best practices, standards, and norms. You will progress from a complete novice to someone who can perform data analysis and Machine Learning tasks at a professional level, using powerful Python libraries such as NumPy, Pandas, Matplotlib, and Scikit-learn.

Course organization

The course is made up of the following units.

Unit | Hours
Python Fundamentals | 10
Python Intermediate | 10
Python for Data Science: NumPy | 12
Data Manipulation with Pandas | 12
Visualization with Matplotlib | 5
Machine Learning with Scikit-learn | 16
Total | 65

Course Contents

Python Fundamentals

This unit covers fundamental concepts and data structures in Python.

Topics Covered 

  • Python variables
    • Data types
    • Type conversion
  • Python Operators
  • Lists
  • Conditionals
  • Loops
  • Functions
    • Lambda functions
  • Tuples
  • Sets, frozensets
    • Set operations: intersection, union, difference
  • Dictionaries
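
For a flavor of the fundamentals listed above, here is a short, illustrative Python sketch of type conversion, lambda functions, set operations, frozensets, and dictionaries. The variable names and values are made up for illustration and are not taken from the course materials.

```python
# Type conversion: parse strings into numbers
count = int("42")
price = float("3.5")

# A lambda function used as a sort key (sorted() is stable, so ties keep their order)
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))

# Set operations: intersection, union, difference
a = {1, 2, 3, 4}
b = {3, 4, 5}
common = a & b  # intersection -> {3, 4}
either = a | b  # union -> {1, 2, 3, 4, 5}
only_a = a - b  # difference -> {1, 2}

# A frozenset is immutable and hashable, so it can serve as a dictionary key
pair = frozenset({"x", "y"})
distances = {pair: 10}

# Dictionaries map keys to values
ages = {"alice": 30, "bob": 25}
ages["carol"] = 35
oldest = max(ages, key=ages.get)  # key with the largest value

print(count + price, by_length, common, either, only_a, oldest)
```

Because frozensets compare by content, `distances[frozenset({"y", "x"})]` finds the same entry as `pair`.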

Python Intermediate

This unit deals with more advanced concepts in Python.

Topics Covered

  • Functional programming
    • map, filter, reduce
  • File handling: read, write, append
  • Object-oriented Programming in Python
    • classes and object instances
    • class and instance variables
    • constructor (__init__ method)
    • instance methods
    • Encapsulation
    • Inheritance
    • Method overriding / Polymorphism
  • Comprehensions
    • List comprehension
    • Set comprehension
    • Dictionary comprehension
  • Iterators / Iterables
  • Generators
    • Generator expressions
  • Decorators
  • Closure
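
The intermediate topics above can be illustrated with a compact sketch covering map/filter/reduce, comprehensions, a small class hierarchy with method overriding, a generator, a closure, and a decorator. All names (`Shape`, `Rectangle`, `countdown`, `make_multiplier`, `logged`) are hypothetical examples, not course code.

```python
import functools

# map, filter, reduce: sum the squares of the even numbers
nums = [1, 2, 3, 4, 5, 6]
evens = filter(lambda n: n % 2 == 0, nums)
squares = map(lambda n: n * n, evens)
total = functools.reduce(lambda acc, n: acc + n, squares, 0)  # 4 + 16 + 36 = 56

# Comprehensions express the same idea for lists, sets, and dictionaries
square_list = [n * n for n in nums if n % 2 == 0]  # [4, 16, 36]
remainders = {n % 3 for n in nums}                 # {0, 1, 2}
square_map = {n: n * n for n in nums}              # {1: 1, 2: 4, ...}


# Classes, inheritance, and method overriding
class Shape:
    def __init__(self, name):
        self.name = name  # instance variable

    def area(self):
        return 0.0


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):  # overrides Shape.area (polymorphism)
        return self.width * self.height


areas = [s.area() for s in (Shape("point"), Rectangle(3, 4))]  # [0.0, 12]


# A generator function yields values lazily, one at a time
def countdown(start):
    while start > 0:
        yield start
        start -= 1


countdown_sum = sum(x for x in countdown(4))  # generator expression: 4 + 3 + 2 + 1


# A closure: `multiply` remembers `factor` from the enclosing scope
def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply


triple = make_multiplier(3)


# A decorator wraps a function; this one records the arguments of each call
def logged(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args):
        wrapper.calls.append(args)
        return func(*args)
    wrapper.calls = []
    return wrapper


@logged
def add(x, y):
    return x + y


add(2, 3)
add(10, 20)
```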

Basic Statistical Concepts

This unit introduces fundamental statistical concepts with an emphasis on intuition.

Topics Covered

  • Mean/Median/Mode
  • Standard deviation/variance
  • Intro to Probability
  • Probability distributions
    • Uniform Distribution
    • Gaussian/Normal Distribution
  • Correlation / Covariance / Multivariate covariance
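
A minimal sketch of the descriptive statistics listed above, using only Python's standard `statistics` module. The height and weight values are invented sample data, and the `covariance` and `correlation` helpers are hypothetical functions written from the textbook definitions (sample covariance with an n - 1 denominator, and Pearson correlation).

```python
import statistics

heights = [160, 165, 165, 170, 175, 180]
weights = [55, 60, 62, 68, 72, 80]

mean_h = statistics.mean(heights)      # 1015 / 6
median_h = statistics.median(heights)  # average of 165 and 170 = 167.5
mode_h = statistics.mode(heights)      # most common value: 165
var_h = statistics.variance(heights)   # sample variance (divides by n - 1)
std_h = statistics.stdev(heights)      # square root of the sample variance


def covariance(xs, ys):
    """Sample covariance: average product of deviations, divided by n - 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


print(mean_h, median_h, mode_h, var_h, std_h, correlation(heights, weights))
```

Note that the covariance of a variable with itself is its variance, and the correlation always lies between -1 and 1.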

Python for Data Science

This unit builds your hands-on skills in using popular Python packages for Data Science. 

NumPy (Numerical Python)

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. Compared with Python’s built-in lists, NumPy arrays provide much more efficient storage and data operations, especially as the arrays grow larger. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.

Topics Covered

  • NumPy arrays
    • NumPy array attributes
    • array indexing
    • array slicing
    • accessing multi-dimensional arrays
    • reshaping arrays
    • array concatenation and splitting
  • NumPy standard data types
  • Computation on NumPy arrays: Universal functions (ufuncs)
  • Advanced ufunc features
    • Aggregates
    • Inner/Outer products
  • Example: Analysis of the average heights of US presidents (histogram, ufuncs)
  • Computation on Arrays: Broadcasting
  • Comparison, Masks, and Boolean Logic
  • Example: Counting rainy days in Seattle
  • Fancy indexing
    • Fancy indexing in multiple dimensions
    • Combined indexing
  • Example: Selecting Random Points (covariance matrix, correlation, multivariate normal function etc.)
  • Sorting arrays
    • Partial sorts: Partitioning
  • Structured data: NumPy’s structured arrays
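
Several of the NumPy topics above can be sketched in a few lines. This assumes NumPy 1.17 or later (for `np.random.default_rng`); the arrays, the random seed, and the structured-array field names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Array attributes, reshaping, and slicing
x = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns: [[0..3], [4..7], [8..11]]
print(x.ndim, x.shape, x.dtype)
first_col = x[:, 0]              # [0, 4, 8]

# Ufuncs and aggregates run element-wise in compiled code
row_sums = np.add.reduce(x, axis=1)            # same as x.sum(axis=1)
outer = np.multiply.outer([1, 2, 3], [1, 10])  # outer product, shape (3, 2)

# Broadcasting: subtract each column's mean from every row
centered = x - x.mean(axis=0)

# Comparison operators produce Boolean masks
big = x[x > 6]                   # [7, 8, 9, 10, 11]

# Fancy indexing: pick elements (0, 1) and (2, 3) with integer arrays
picked = x[[0, 2], [1, 3]]       # [1, 11]

# Partial sort: the 3 smallest values land (unordered) in the first 3 slots
values = rng.integers(0, 100, size=10)
smallest3 = np.partition(values, 3)[:3]

# Structured array: named, typed fields in a single array
people = np.zeros(2, dtype=[("name", "U10"), ("age", "i4")])
people["name"] = ["Alice", "Bob"]
people["age"] = [25, 45]
older = people[people["age"] > 30]["name"]
```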

Data Manipulation with Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the NumPy package.

Topics Covered

  • Pandas objects
    • Series objects
    • DataFrame objects
    • Index objects
  • Data indexing and selection in Series and DataFrame
    • Indexers: loc and iloc
  • Operating on Data in Pandas
    • Ufuncs: Index preservation
    • Ufuncs: Index alignment
    • Ufuncs: Operations between DataFrame and Series
  • Handling Missing Data
  • Combining Datasets: Merge and Join
    • Relational Algebra
    • Categories of Joins
    • Specification of the merge key
    • Specifying set arithmetic for joins
  • Aggregations and Grouping
    • Example: Planets Data analysis
    • GroupBy: Split, Apply, Combine
    • Aggregate, filter, transform, apply methods
  • Working with Time series
    • Dates and Times in Python
    • NumPy’s datetime64
    • Dates and Times in Pandas: Best of both worlds
    • Pandas Time Series Data Structures
    • Resampling, Shifting, and Windowing
    • Example: Analysis of Microsoft stock market time series data
    • Example: Visualizing Seattle Bicycle Counts
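
As a rough sketch of several Pandas topics listed above (merging, `loc`/`iloc` indexing, GroupBy, missing data, and resampling), the example below uses two small tables whose column names and values are hypothetical. It assumes a reasonably recent Pandas release.

```python
import pandas as pd

# Two made-up tables that share an "employee" column
employees = pd.DataFrame({
    "employee": ["Ann", "Ben", "Cal", "Dee"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
salaries = pd.DataFrame({
    "employee": ["Ann", "Ben", "Cal", "Eve"],
    "salary": [50_000, 80_000, 90_000, 60_000],
})

# Inner join keeps only employees present in both tables (Dee and Eve drop out)
merged = pd.merge(employees, salaries, on="employee", how="inner")

# iloc selects by integer position, loc by label or Boolean mask
first_row_salary = merged.iloc[0]["salary"]
engineers = merged.loc[merged["group"] == "Engineering", "employee"]

# GroupBy: split by group, apply mean, combine the results into a Series
mean_salary = merged.groupby("group")["salary"].mean()

# Missing data: an outer join introduces NaN (Dee has no salary)
outer = pd.merge(employees, salaries, on="employee", how="outer")
n_missing = outer["salary"].isna().sum()
filled = outer["salary"].fillna(0)  # replace NaN with 0

# Time series: resample a daily series into weekly totals (weeks end on Sunday)
days = pd.date_range("2021-01-04", periods=14, freq="D")  # 2021-01-04 is a Monday
daily = pd.Series(1, index=days)
weekly = daily.resample("W").sum()
```

The `how` argument is where the set arithmetic for joins is specified: `"inner"` is an intersection of keys, `"outer"` a union.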

Visualization with Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive data visualizations in Python. 

Topics Covered

  • Setting styles
  • How to display your plots
    • Plotting from a script
    • Plotting from IPython shell
    • Plotting from an IPython notebook
  • Two interfaces for the price of one
    • MATLAB-style interface
    • Object-oriented interface
  • Simple line plots
    • Adjusting the plot: line colors, line styles, axes limits, labeling
  • Simple scatter plots
  • Example: Iris dataset visualization
  • Histograms, Binnings, and Density
  • Example: Visualizing Handwritten Digits
  • Multiple subplots
  • Visualization with Seaborn package
    • Seaborn vs. Matplotlib
    • Histogram, KDE (Kernel Density Estimation), and densities
    • Pair plots, faceted histograms, bar plots
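
A minimal sketch of Matplotlib's object-oriented interface, assuming Matplotlib and NumPy are installed. It selects the non-interactive `Agg` backend and saves to an in-memory buffer so it runs as a plain script; in a notebook you would normally just display the figure. The data and labels are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Object-oriented interface: create a Figure and explicit Axes objects
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: a simple line plot with custom color, style, limits, and labels
ax1.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax1.set(xlim=(0, 10), ylim=(-1.5, 1.5), xlabel="x", ylabel="y", title="Line plot")
ax1.legend()

# Right panel: a normalized histogram of normally distributed samples
rng = np.random.default_rng(seed=0)
ax2.hist(rng.normal(size=1000), bins=30, density=True, alpha=0.6)
ax2.set_title("Histogram")

# Save to an in-memory buffer (a script would typically call fig.savefig("plot.png"))
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In the MATLAB-style interface the same line plot would be drawn with `plt.plot(...)`, `plt.xlim(...)`, and `plt.title(...)`, which act on the "current" axes implicitly; the object-oriented style makes multi-panel figures easier to control.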

Machine Learning with Scikit-learn

This unit delves into the theory and practice of different machine learning algorithms, using the scikit-learn package.

Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface.

Topics Covered

  • Introduction to Machine Learning
  • Categories of Machine Learning
    • Supervised ML
    • Unsupervised ML
  • Qualitative examples of ML applications
    • Classification: Predicting discrete labels
    • Regression: Predicting continuous labels
    • Clustering: Inferring labels on unlabeled data
    • Dimensionality Reduction: Inferring structure of unlabeled data
  • Introduction to Scikit-learn package
  • Hyperparameters and Model validation
  • Feature Engineering
  • Naive Bayes algorithm
  • Linear Regression
  • Support Vector Machines
  • Random Forests
  • Principal Component Analysis (PCA)
  • K-means algorithm
  • Manifold Learning
  • Gaussian Mixture Models
  • Kernel Density Estimation
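
The basic Scikit-learn workflow covered above (choose a model, fit, score, validate) can be sketched on the Iris dataset that ships with the library. The choice of Gaussian Naive Bayes, PCA, and k-means here, along with parameters such as `test_size=0.25` and `random_state=0`, is purely illustrative and assumes a reasonably recent scikit-learn release.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Features matrix X (150 samples x 4 measurements) and target vector y
X, y = load_iris(return_X_y=True)

# Supervised learning: hold out a test set, fit on the rest, then score
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = GaussianNB()                    # 1. choose a model class
model.fit(X_train, y_train)             # 2. fit it to the training data
accuracy = model.score(X_test, y_test)  # 3. evaluate on unseen data

# Model validation: 5-fold cross-validation gives a less noisy estimate
cv_scores = cross_val_score(GaussianNB(), X, y, cv=5)

# Unsupervised learning: reduce to 2 dimensions, then cluster without labels
X_2d = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print(f"hold-out accuracy: {accuracy:.2f}, CV mean: {cv_scores.mean():.2f}")
```

Every estimator follows the same `fit` / `predict` (or `fit_transform`) pattern, which is what makes swapping one algorithm for another in this workflow straightforward.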

Programming Assignments and Projects

  1. Assignment 1: Python Basics
  2. Assignment 2: Python Conditionals
  3. Assignment 3: Functions
  4. Assignment 4: Loops
  5. Assignment 5: File Handling
  6. Assignment 6: Dictionaries
  7. Assignment 7: NumPy
  8. Assignment 8: Pandas
  9. Assignment 9: Data Science Project – Covid data analysis
  10. Assignment 10: Machine Learning Project – Framingham Heart Disease Dataset Predictive Analysis

Instructor Profile

Himalaya Kakshapati has more than a decade of experience working in the software industry. He has more than six years of experience in the area of Artificial Intelligence/Machine Learning (AI/ML), designing and building end-to-end AI applications for the U.S. healthcare industry using Cloud services, such as IBM Cloud, IBM Watson, AWS (Amazon Web Services) and cutting-edge tools and technologies. He has worked extensively on Data Science projects. He is also an AI consultant.

He is also an avid educator and has taught Computing courses at various renowned colleges in Kathmandu (most of them affiliated with UK-based universities) with remarkable results.

He has completed the following specializations:

  • Building Artificial Intelligence Applications using Python and TensorFlow (University of Oxford, UK)
  • Spark NLP Data Scientist (John Snow Labs, USA)
  • Spark NLP Healthcare Data Scientist (John Snow Labs, USA)
  • Deep Learning Specialization (DeepLearning.AI)
  • Probabilistic Graphical Models Specialization (Stanford University, USA)
  • Mathematics for Machine Learning Specialization (Imperial College London, UK)
  • Natural Language Processing Specialization (DeepLearning.AI)
  • Python 3 Specialization (University of Michigan, USA)
  • Reinforcement Learning Specialization (University of Alberta, Canada)
  • IBM MobileFirst Platform Foundation Technical Sales Professional v1 (IBM)
  • Microsoft Certified Professional (Microsoft)

Himalaya has a Bachelor’s degree in Electrical Engineering and Mathematics from the University of Illinois Chicago, USA, and a Master’s degree in Software Engineering from the University of Hertfordshire, UK, both with distinctions.
