AI Engineer Roadmap for Beginners
AI Engineer
The following is a roadmap for a total beginner to learn AI Engineer (also known as ML Engineer) skills. It includes learning resources for technical (or tool) skills and soft (or core) skills.
Prerequisites: You must have skills, or an interest in building skills, in coding and math. Without these two you cannot become an AI engineer.
AI Engineer = Data Scientist + Software Engineer
- Python
- SQL
- DSA, Git and GitHub
- Pandas & EDA
- Machine Learning
- Deep Learning
- NLP or computer vision
- ML Ops
- Computer Science Fundamentals
- Math and statistics
- Communication
- Business understanding
Computer Science Fundamentals
- Data representation: Bits and Bytes, Storing text and numbers, Binary number system.
- Basics of computer networks, IP addresses, Internet routing protocol.
- UDP, TCP, HTTP, and the World Wide Web
- Programming basics: variables, strings, and numbers, if condition, loops
- Algorithm basics
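To make the data representation topics concrete, here is a minimal sketch of how text and numbers map to bytes and binary digits. The helper name `to_binary` is hypothetical and only for illustration; Python's built-in `bin` does the same job.

```python
# Text is stored as bytes; each byte is 8 bits.
text = "AI"
raw = text.encode("utf-8")                  # two bytes for two ASCII characters
bits = [format(b, "08b") for b in raw]      # each byte as an 8-digit binary string
print(list(raw))   # [65, 73]
print(bits)        # ['01000001', '01001001']


def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # remainder is the next (lowest) bit
        n //= 2
    return "".join(reversed(digits))


print(to_binary(10))      # 1010
print(int("1010", 2))     # 10 (binary string back to an integer)
```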
Beginners Python
- Variables, Numbers, Strings
- Lists, Dictionaries, Sets, Tuples
- If condition, for loop
- Functions, Lambda Functions
- Modules (pip install)
- Read, Write files
- Exception handling
- Classes, Objects
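Several of these beginner topics fit into one small program. The sketch below combines a class, a dictionary, loops, file reading and writing, and exception handling. The class name `WordCounter` and the file name are made up for this example.

```python
import os
import tempfile


class WordCounter:
    """Counts how often each word appears in a text file."""

    def __init__(self, path):
        self.path = path

    def count(self):
        counts = {}
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                for line in f:
                    for word in line.lower().split():
                        counts[word] = counts.get(word, 0) + 1
        except FileNotFoundError:
            # Report the problem and return an empty result instead of crashing.
            print(f"File not found: {self.path}")
        return counts


# Write a small sample file, then read it back and count the words.
sample_path = os.path.join(tempfile.gettempdir(), "roadmap_sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("python sql python\ngit python sql\n")

word_counts = WordCounter(sample_path).count()
print(word_counts)  # {'python': 3, 'sql': 2, 'git': 1}
```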
Data Structures and Algorithms in Python
- Data structures basics, Big O notation
- Data structures: Arrays, Linked List, Hash Table, Stack, Queue
- Data structures: Tree, Graph
- Algorithms: Binary search, Bubble sort, quick sort, merge sort
- Recursion
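As an illustration of the algorithms listed above, here is one possible implementation of binary search (O(log n) on sorted input) and a recursive merge sort (O(n log n)). It is a study sketch, not a replacement for Python's built-in `sorted` or the `bisect` module.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1      # target can only be in the right half
        else:
            hi = mid - 1      # target can only be in the left half
    return -1


def merge_sort(items):
    """Sort a list recursively: split in half, sort each half, merge."""
    if len(items) <= 1:       # base case of the recursion
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


data = merge_sort([7, 2, 9, 4, 1])
print(data)                    # [1, 2, 4, 7, 9]
print(binary_search(data, 7))  # 3
print(binary_search(data, 5))  # -1
```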
Advanced Python
- Inheritance, Generators, Iterators
- List Comprehensions, Decorators
- Multithreading, Multiprocessing
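The sketch below shows a generator, a list comprehension, and a decorator working together. The names `timed`, `squares`, and `sum_even_squares` are hypothetical and chosen only for this example.

```python
import functools
import time


def timed(func):
    """Decorator that records how long the wrapped function took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper


def squares(limit):
    """Generator: yields squares lazily, one value at a time."""
    for n in range(limit):
        yield n * n


@timed
def sum_even_squares(limit):
    # Generator expression: no intermediate list is built.
    return sum(s for s in squares(limit) if s % 2 == 0)


even_squares = [s for s in squares(6) if s % 2 == 0]  # list comprehension
print(even_squares)          # [0, 4, 16]
print(sum_even_squares(6))   # 20
```

After a call, `sum_even_squares.last_elapsed` holds the elapsed time in seconds.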
Version Control (Git, GitHub)
- What is a version control system? What are Git and GitHub?
- Basic commands: add, commit, push
- Branches, reverting change, HEAD, Diff and Merge
- Pull requests.
SQL
- Basics of relational databases
- Basic Queries: SELECT, WHERE, LIKE, DISTINCT, BETWEEN, GROUP BY, ORDER BY
- Advanced Queries: CTE, Subqueries, Window Functions
- Joins: Left, Right, Inner, Full
- Database creation, indexes, stored procedures.
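To practice these SQL topics without installing a database server, one option is Python's built-in `sqlite3` module. The sketch below creates two tables and an index, then combines a CTE, GROUP BY, a LEFT JOIN, and ORDER BY. The table and column names are invented for this example, and SQLite's dialect differs slightly from other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE INDEX idx_orders_customer ON orders (customer_id);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

query = """
WITH totals AS (                       -- CTE: total spend per customer
    SELECT customer_id, SUM(amount) AS spent
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, COALESCE(t.spent, 0.0) AS total_spent
FROM customers AS c
LEFT JOIN totals AS t ON t.customer_id = c.id   -- keeps customers with no orders
ORDER BY total_spent DESC, c.name
"""
rows = cur.execute(query).fetchall()
print(rows)  # [('Asha', 80.0), ('Ben', 20.0), ('Chen', 0.0)]
conn.close()
```

Changing LEFT JOIN to an inner JOIN would drop Chen, who has no orders, which is a quick way to see the difference between the join types.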
Data Visualization
- Numpy
- Pandas
- Data Visualization
- Matplotlib
- Seaborn
Math & Statistics for AI
- Basics: Descriptive vs inferential statistics, continuous vs discrete data, nominal vs ordinal data
- Linear Algebra: Vectors, Matrices, Eigenvalues and Eigenvectors
- Calculus: Basics of integral and differential calculus
- Basic plots: Histograms, pie charts, bar charts, scatter plot etc.
- Measures of central tendency: mean, median, mode
- Measures of dispersion: variance, standard deviation
- Probability basics
- Distributions: Normal distribution
- Correlation and covariance
- Central limit theorem
- Hypothesis testing: p value, confidence interval, type 1 vs type 2 error, Z test
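The descriptive statistics and the Z test above can be tried with Python's standard `statistics` module. This is a minimal sketch: the sample values, the hypothesized mean `mu0`, and the "known" population standard deviation `sigma` are all made-up assumptions for illustration.

```python
import math
import statistics

sample = [52, 48, 55, 51, 49, 53, 50, 54]

mean = statistics.mean(sample)        # central tendency
median = statistics.median(sample)
stdev = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
print(mean, median)                   # 51.5 51.5

# One-sample Z test. H0: the population mean equals mu0.
# Assumes the population standard deviation sigma is known.
mu0 = 50
sigma = 2.5
z = (mean - mu0) / (sigma / math.sqrt(len(sample)))

# Two-sided p value from the standard normal distribution.
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_value, 3))

# Reject H0 at the 5% level only if p_value < 0.05.
reject_h0 = p_value < 0.05
```

Rejecting a true H0 is a type 1 error; failing to reject a false H0 is a type 2 error.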
Exploratory Data Analysis (EDA)
- Exploratory Data Analysis (EDA)
- https://www.kaggle.com/code?searchQuery=exploratory+data+analysis
- Use the above link to search for exploratory data analysis notebooks. Practice EDA using at least 3 datasets.
- https://www.kaggle.com/datasets/rishabhkarn/ipl-auction2023/data
Machine Learning
Machine Learning: Preprocessing
- Handling NA values, outlier treatment, data normalization
- One hot encoding, label encoding
- Feature engineering
- Train test split
- Cross validation
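The preprocessing steps above can be sketched in plain Python to see what each one does. In practice, libraries such as pandas and scikit-learn provide tested versions of these steps; the function names and data below are hypothetical.

```python
import random


def fill_missing(values):
    """Handle NA values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max(values):
    """Normalize to the range [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """One hot encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def train_test_split(items, test_ratio=0.25, seed=42):
    """Shuffle reproducibly, then hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


areas = fill_missing([1000.0, None, 2000.0, 1500.0])
scaled = min_max(areas)
encoded, categories = one_hot(["pune", "delhi", "pune", "goa"])
train, test = train_test_split(list(zip(scaled, encoded)))

print(scaled)       # [0.0, 0.5, 1.0, 0.5]
print(categories)   # ['delhi', 'goa', 'pune']
print(encoded)      # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```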
Machine Learning: Model Building
- Types of ML: Supervised, Unsupervised
- Supervised: Regression vs Classification
Linear models
- Linear regression, logistic regression
- Gradient descent
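Gradient descent is easiest to understand on linear regression with one feature. The sketch below minimizes mean squared error for the line y = w*x + b. The learning rate, epoch count, and data are illustrative choices, not recommended defaults.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

A learning rate that is too large makes the updates diverge, and one that is too small makes training slow, so it is worth experimenting with `lr` on this data.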
Nonlinear models (tree-based models)
- Decision tree
- Random forest
- XGBoost
Model evaluation
- Regression: Mean Squared Error, Mean Absolute Error, MAPE
- Classification: Accuracy, Precision-Recall, F1 Score, ROC Curve, Confusion matrix
- Hyperparameter tuning: GridSearchCV, RandomizedSearchCV
- Unsupervised: K means, Hierarchical clustering, Dimensionality reduction (PCA)
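The evaluation metrics listed above are simple formulas. Here is a sketch that computes them by hand for a binary classifier and a tiny regression example. The labels and predictions are made up, and scikit-learn's `sklearn.metrics` offers ready-made versions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
print(confusion_counts(y_true, y_pred))        # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))
print(mse([3, 5], [2, 7]), mae([3, 5], [2, 7]))  # 2.5 1.5
```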
ML Ops
- What is API, FastAPI for Python server development
- DevOps Fundamentals: CI/CD pipelines, containerization (Docker, Kubernetes)
- Familiarity with at least one cloud platform (AWS, Azure etc.)
Machine Learning Projects with Deployment
You need to finish two end-to-end ML projects: one on regression, the other on classification.
Regression Project: Bangalore property price prediction
YouTube playlist link:
https://bit.ly/3ivycWr
The project covers the following:
- Data cleaning
- Feature engineering
- Model building and hyper parameter tuning
- Writing a Flask server as a web backend
- Building website for price prediction
- Deployment to AWS
Classification Project: Sports celebrity image classification
YouTube playlist link:
https://bit.ly/3ioaMSU
The project covers the following:
- Data collection and data cleaning
- Feature engineering and model training
- Flask server as a web backend
- Building website and deployment
Deep Learning
Topics
- What is a neural network? Forward propagation, back propagation
- Building multilayer perceptron
Special neural network architecture
- Convolutional neural network (CNN)
- Sequence models: RNN, LSTM
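Forward propagation in a multilayer perceptron is just repeated "weighted sum, then activation". The sketch below runs a forward pass through a 2-2-1 network whose weights are hand-picked (not learned) so that it computes XOR. In real training, back propagation would find such weights from data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked weights (an assumption for illustration):
# hidden neuron 1 behaves like OR, hidden neuron 2 like NAND,
# and the output neuron like AND, which together give XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B)
    output = dense(hidden, OUTPUT_W, OUTPUT_B)
    return output[0]


predictions = [round(forward(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(predictions)  # [0, 1, 1, 0]
```

A single layer cannot represent XOR, which is why the hidden layer is needed here.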
NLP or Computer Vision & GenAI
Many AI engineers choose a specialized track which is either NLP or Computer vision. You don’t need to learn both.
Natural Language Processing (NLP)
- Regex
- Text representation: Count vectorizer, TF-IDF, BOW, Word2Vec, Embeddings
- Text classification: Naïve Bayes
- Fundamentals of the spaCy & NLTK libraries
- One end to end project
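TF-IDF scores a word higher when it is frequent in one document but rare across the collection. Here is a minimal sketch using the plain definition tf * log(N / df). Library implementations such as scikit-learn's `TfidfVectorizer` use smoothed variants, so their numbers will differ. The documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
for doc, doc_scores in zip(docs, scores):
    print(doc, {t: round(s, 3) for t, s in doc_scores.items()})
```

The word "the" appears in every document, so its score is 0, while "sat" appears in only one document and scores highest in the first.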
Computer Vision (CV)
- Basic image processing techniques: Filtering, Edge Detection, Image Scaling, Rotation
- Library to use: OpenCV
- Convolutional Neural Networks (CNN) – Already covered in deep learning.
- Data preprocessing, augmentation – Already covered in deep learning.
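Filtering and edge detection both slide a small kernel over the image. The sketch below applies a horizontal Sobel kernel to a tiny grayscale image stored as nested lists. It is for understanding only: OpenCV provides optimized functions for this, and like most CV libraries this code does not flip the kernel (strictly speaking, it is cross-correlation).

```python
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def apply_kernel(image, kernel):
    """Slide the kernel over the image without padding ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            acc = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(acc)
        out.append(row)
    return out


# 4x4 image: dark left half, bright right half (a vertical edge).
edge_image = [[0, 0, 10, 10] for _ in range(4)]
flat_image = [[5, 5, 5, 5] for _ in range(4)]

print(apply_kernel(edge_image, SOBEL_X))  # [[40, 40], [40, 40]]
print(apply_kernel(flat_image, SOBEL_X))  # [[0, 0], [0, 0]]
```

The strong response on the first image and zero response on the flat image show how the kernel reacts only to changes in brightness.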
LLM & Langchain
Topics
- What is LLM, Vector database, Embeddings
- RAG (Retrieval Augmented Generation)
- Langchain framework
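The retrieval step of RAG can be sketched without any external service: embed the question and the documents, rank the documents by similarity, and place the best match in the prompt. Everything here is a simplifying assumption. `embed` is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector database, and no LLM is called.

```python
import math
from collections import Counter

documents = [
    "Docker packages an application into a container",
    "Gradient descent minimizes a loss function",
    "TF-IDF weighs words by how rare they are",
]


def embed(text):
    """Toy embedding: word counts. A real system would use a trained model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "what does gradient descent do"
print(build_prompt(question, documents))
```

Frameworks such as Langchain wrap these same steps (embedding, retrieval, prompt building, and the LLM call) behind higher-level components.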
Core Skills and Job Preparation
LinkedIn
- Create a professional-looking LinkedIn profile
- Start following prominent AI influencers
Increase engagement
- Start commenting meaningfully on AI and career-related posts
- Helps you network and build connections with others working in the industry
- Learning and brainstorming opportunity
- Remember online presence is a new form of resume
Business Fundamentals - Soft Skill
Learn business concepts from ThinkSchool and other YT Case Studies
Discord
Start asking questions and get help from the community.
This post shows how to ask questions the right way: https://bit.ly/3I70EbI
Core/Soft Skills
- Project Management
- Scrum: https://scrumtrainingseries.com/
- Kanban: https://youtu.be/jf0tlbt9lx0
- Tools: JIRA, Notion
ATS Resume Preparation
- Resumes are dying but not dead yet. Focus more on online presence.
- Here is the resume tips video along with some templates you can use for your data analyst resume: https://www.youtube.com/watch?v=buQSI8NLOMw
- Use this checklist to ensure you have the right ATS Resume
Portfolio Building Resources:
You need a portfolio website in 2024. You can build your portfolio by using these free resources.
GitHub
Upload your projects with code to GitHub, and use github.io to create a portfolio website.
Sample portfolio website:
Linktree
Helpful to add multiple links in one page.
- ATS friendly resume preparation
- LinkedIn optimization
- Certificate of course completion
- Online community access where jobs are posted
- Interview Questions and answers
- Bootcamp project
- Resume & Project Related Documents