Data Engineer Roadmap

Chapter 1: Introduction to Python

  • Introduction to Python
  • Type Casting and Strings
  • Operators and Conditional Statements
  • Loops in Python
  • Functions (Built-in & User-Defined)

Chapter 2: Python Data Handling

  • Data Structures
  • Fundamentals of Classes
  • File Handling
  • Exception Handling
  • Regex

Chapter 3: Data Analysis with Python

  • NumPy
  • Pandas
  • Matplotlib & Seaborn
  • Exploratory Data Analysis
  • Data Cleaning

Chapter 4: SQL Basics

  • SQL Introduction
  • CRUD Operations
  • Group Functions
  • Join Queries
  • Subqueries & Other Functions
  • Table Constraints

Chapter 5: Advanced SQL

  • Advanced Queries on University Schema
  • Cursors
  • Views in SQL
  • Transactions
  • Window Functions
  • CTE (Common Table Expressions)
  • String Transformation and Regex
  • Date and Time Manipulation
  • Data Modeling

Chapter 6: Data Structures & Algorithms

  • Arrays
  • Algorithms
  • Linked Lists
  • Stacks and Queues
  • Recursion
  • Trees
  • Graphs
  • Dynamic Programming
  • Designing NLP Algorithms

Chapter 7: Big Data Introduction

  • What Is Big Data? Requirements
  • Monolithic vs Distributed Systems
  • What Are Hadoop and HDFS?
  • Hadoop Components and YARN
  • Node vs Cluster
  • Linux File System & Commands
  • HDFS Commands

Chapter 8: Scala Basics

  • Importance of Scala
  • Var vs Val
  • Type Inference
  • Data Types in Scala
  • String Interpolation & Comparison
  • If-Else Conditional
  • Match Case
  • For & While Loop

Chapter 9: Hive & HBase

  • Introduction to Hive
  • Fundamentals of HBase

Chapter 10: Spark

  • Introduction to Apache Spark
  • Spark RDD
  • Spark SQL

Chapter 11: Big Data on AWS

  • Airflow
  • Sqoop
  • AWS Elastic MapReduce (EMR)
  • AWS Athena

Chapter 12: Azure Databricks

  • Introduction to Databricks
  • Data Handling in Azure Databricks
  • Data Processing in Azure Databricks

Chapter 13: Google Cloud Platform

  • Introduction to GCP
  • BigQuery
  • Pub/Sub