Course overview

About this course

Data science is the applied study of data for statistical analysis and problem solving. This path of courses covers the data science pipeline that working data scientists use every day: data wrangling, analysis, machine learning, and communication and visualization.

Audience profile

This course is intended for individuals with some programming and math experience who are working toward implementing data science in their everyday work.

At course completion

After completing this course, students will be familiar with the following topics:

  • Data Science Overview
  • Data Gathering
  • Data Filtering
  • Data Transformation
  • Data Exploration
  • Data Integration
  • Data Analysis Concepts
  • Data Classification and Machine Learning
  • Data Communication and Visualization

Course details

Data Science Overview

Module 1: Defining Data Science

  • What is Data Science?
  • What is Data Wrangling?
  • What is Big Data?
  • What is Machine Learning?
  • Implementing Data Science

Module 2: Data Science Terminology

  • Data Communication
  • Data Science Pipeline
  • Data Science Tools

Data Gathering

Module 3: Data Extraction

  • Basic Data Gathering
  • Gathering Web Data
  • Extracting Spreadsheet Data with in2csv
  • Extracting Spreadsheet Data with Agate
  • Extracting Legacy Data from dBASE Tables
  • Extracting HTML Data

Module 4: Metadata

  • Gathering Metadata
  • Working with HTTP Headers
  • Working with Linux Log Files
  • Working with Email Headers

Module 5: Remote Data

  • Connecting to Remote Data
  • Copying Remote Data
  • Synchronizing Remote Data

Data Filtering

Module 6: Introduction to Data Filtering

  • Data Filtering Techniques and Tools
  • Processing Date Formats
  • Filtering HTTP Headers
  • Filtering CSV Data
  • Replacing Values with sed
  • Dropping Duplicate Data
  • Working with JPEG Headers
  • Filtering PDF Files
  • Filtering for Invalid Data
  • Parsing robots.txt

Data Transformation

Module 7: File Format Conversions

  • Converting CSV to JSON
  • Converting XML to JSON
  • Converting CSV to SQL
  • Converting SQL to CSV
  • Changing CSV Delimiters

Module 8: Data Conversions

  • Converting Dates
  • Converting Numbers
  • Rounding Numbers

Module 9: Optical Character Recognition

  • OCR JPEG Images
  • Extracting Text from PDF Files

Data Exploration

Module 10: Introduction to Data Exploration

  • Exploring CSV Data
  • Exploring CSV Statistics
  • Querying CSV Data
  • Plotting from the Command Line
  • Counting Words
  • Exploring Directory Trees
  • Determining Word Frequencies
  • Taking Random Samples
  • Finding the Top Rows
  • Finding Repeated Records
  • Identifying Outliers in Data

Data Integration

Module 11: Introduction to Data Integration

  • Joining CSV Data
  • Concatenating Log Files
  • Sorting Text Files
  • Merging XML Data
  • Aggregating Data
  • Normalizing Data
  • Denormalizing Data
  • Pivoting Data Tables
  • Homogenizing Rows

Data Analysis Concepts

Module 12: Data Science Math

  • Basic Data Science Math
  • Linear Algebra Vector Math
  • Linear Algebra Matrix Math
  • Linear Algebra Matrix Decomposition

Module 13: Data Analysis Concepts

  • Data Formation
  • Introduction to Probability
  • Working with Events
  • Working with Probability
  • Continuous Probability Distributions
  • Discrete Probability Distributions
  • Introduction to Bayes' Theorem

Module 14: Estimates and Measures

  • Sampling Data
  • Statistical Measures
  • Estimators
  • Sampling Distributions
  • Confidence Intervals
  • Hypothesis Tests
  • Chi-Square

Data Classification and Machine Learning

Module 15: Machine Learning Introduction

  • Introduction to Supervised Learning
  • Introduction to Unsupervised Learning
  • Understanding Linear Regression
  • Working with Predictors

Module 16: Regression and Classification

  • Understanding Logistic Regression
  • Understanding Dummy Variables
  • Using Naïve Bayes Classification
  • Working with Decision Trees

Module 17: Clustering

  • K-means Clustering
  • Using Cluster Validation
  • Using Principal Component Analysis

Module 18: Errors and Validation

  • Introduction to Errors
  • Defining Underfitting
  • Defining Overfitting
  • Using K-fold Cross-Validation
  • Using Neural Networks
  • Support Vector Machines (SVM)

Data Communication and Visualization

Module 19: Introduction to Data Communication

  • Effective Communication and Visualization
  • Correlation Versus Causation
  • Simpson’s Paradox
  • Presenting Data
  • Documenting Data Science
  • Visual Data Exploration

Module 20: Plotting

  • Creating Scatter Plots
  • Plotting Line Graphs
  • Creating Bar Charts
  • Creating Histograms
  • Creating Box Plots
  • Creating Network Visualizations
  • Creating a Bubble Plot
  • Creating Interactive Plots

Prerequisites

No prerequisites
