Data Engineer Roadmap
Chapter 1: Introduction to Python
- Introduction to Python
- Type Casting and Strings
- Operators and Conditional Statements
- Loops in Python
- Functions (Built-in & User-Defined)
Chapter 2: Python Data Handling
- Data Structures
- Fundamentals of Classes
- File Handling
- Exception Handling
- Regex
Chapter 3: Data Analysis with Python
- NumPy
- Pandas
- Matplotlib & Seaborn
- Exploratory Data Analysis
- Data Cleaning
Chapter 4: SQL Basics
- SQL Introduction
- CRUD Operations
- Group Functions
- Join Queries
- Subqueries & Other Functions
- Table Constraints
Chapter 5: Advanced SQL
- Advanced Queries on University Schema
- Cursors
- Views in SQL
- Transactions
- Window Functions
- CTE (Common Table Expressions)
- String Transformation and Regex
- Date Time Manipulation
- Data Modeling
Chapter 6: Data Structures & Algorithms
- Arrays
- Algorithms
- Linked List
- Stacks and Queues
- Recursion
- Trees
- Graphs
- Dynamic Programming
- Designing NLP Algorithms
Chapter 7: Big Data Introduction
- What is Big Data? What are its Requirements?
- Monolithic vs Distributed Systems
- What are Hadoop and HDFS?
- What are the Hadoop Components? What is YARN?
- Node vs Cluster
- Linux File System & Commands
- HDFS Commands
Chapter 8: Scala Basics
- Importance of Scala
- Var vs Val
- Type Inference
- Data Types in Scala
- String Interpolation & Comparison
- If-Else Conditional
- Match Case
- For & While Loop
Chapter 9: Hive & HBase
- Introduction to Hive
- Fundamentals of HBase
Chapter 10: Spark
- Introduction to Apache Spark
- Spark RDD
- Spark SQL
Chapter 11: Big Data on AWS
- Airflow
- Sqoop
- AWS Elastic MapReduce (EMR)
- AWS Athena
Chapter 12: Azure Databricks
- Introduction to Databricks
- Data Handling in Azure Databricks
- Data Processing in Azure Databricks
Chapter 13: Google Cloud Platform
- Introduction to GCP
- BigQuery
- Pub/Sub