Roadmap for Data Scientists and Machine Learning Engineers

1. Linear Algebra
- Beginner
- Intermediate
- Advanced
- Vectors
- Matrices
- Transpose of a matrix
- Inverse of a matrix
- Determinant of a matrix
- Trace of a matrix
- Dot product
- Eigenvalues
- Eigenvectors
- Singular Value Decomposition
- Principal Component Analysis
- Locality Sensitive Hashing
- Distances, Similarity
- Least squares solutions
- Non-negative matrix factorization
- Factor Analysis
- Graphs and Networks
- Markov matrices
- Fourier matrix
- Fast Fourier Transform
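
As a small illustration of the "Dot product" and "Distances, Similarity" topics above, here is a minimal pure-Python sketch. The function names are chosen for this example only; in practice a library such as NumPy would usually be used instead.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two points."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (norm_u * norm_v)
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to magnitude.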
2. Statistics
- Beginner
- Intermediate
- Advanced
- Analyzing categorical data
- Displaying and comparing quantitative data
- Summarizing quantitative data
- Modeling data distributions
- Exploring bivariate numerical data
- Study design
- Counting, permutations, and combinations
- Sampling distributions
- Gaussian distribution
- Confidence intervals
- Significance tests (hypothesis testing)
- Inference for categorical data (chi-square tests)
- Analysis of variance (ANOVA)
- Two-sample inference for the difference between groups
- Advanced regression (inference and transforming)
- Effect Sizes, Cohen's d
- P-Curve Analysis
- Causal Inference
- Binomial distribution, Poisson distribution
- Benford's law
- Gamma Distribution
- Beta Distribution
- Latent Dirichlet Allocation
- Latent Semantic Analysis
- A/B Testing
- Simpson's paradox
3. Probability
- Beginner
- Intermediate
- Advanced
- Basic theoretical probability
- Probability using sample spaces
- Basic set operations
- Experimental probability
- Addition rule
- Multiplication rule for independent events
- Multiplication rule for dependent events
- Conditional probability and independence
- Randomness, probability, and simulation
- Likelihood
- Bayes Rule
- Markov Chain, Hidden Markov Model
- Gaussian Mixture Model
- Binomial Mixture Model
- Maximum Likelihood Estimation
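
To make the "Bayes Rule" topic above concrete, the following sketch computes a posterior probability for a binary hypothesis. The function and parameter names are illustrative assumptions, not part of the roadmap.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H using Bayes' rule.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability")
    return likelihood * prior / evidence
```

For example, with a 1% prior, a 95% true-positive rate, and a 5% false-positive rate, the posterior is only about 0.16: most positive results still come from the much larger negative group.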
4. Calculus and Optimization
- Beginner
- Intermediate
- Advanced
- Derivative
- Find minimum, maximum
- Multivariable Functions
- Partial differentiation
- Exponential function, Exponential decay
- Logarithmic Functions
- Distance Measurement
- Local vs global optimization
- Constrained vs unconstrained optimization
- Convex vs nonconvex optimization
- Smooth vs nonsmooth optimization
- Linear programming
- First-order methods, Second-order methods
- Constrained optimization: Lagrange multipliers, Linear programming, Quadratic programming
5. Computer Science
- Beginner
- Intermediate
- Advanced
- List, Stack, Queue
- Hash function, Hash table
- Sorting: Insertion Sort, Selection Sort, Bubble Sort, Quicksort
- Binary Tree, Trie
- Binary Search
- Recursion
- Linked List, Priority Queue
- Sorting: Mergesort, Heapsort, External Sort
- Indexing, TF-IDF
- Complexity of algorithms
- Segment Tree
- Dynamic Programming
- Information theory
- Heuristic
- Search algorithm: BM25, Pagerank
- P, NP, NP Complete and NP Hard
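
As an example of the "Binary Search" and "Complexity of algorithms" topics above, here is a minimal iterative sketch. It assumes the input list is already sorted in ascending order; the function name is chosen for illustration.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```

Each iteration halves the search range, so the running time is O(log n), compared with O(n) for a linear scan.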
6. Programming & Deployment
- Beginner
- Intermediate
- Advanced
- Variable
- If Else
- Loop
- Operator
- Function
- String
- Unit test
- Object Oriented Programming
- Pointer
- Multithreading
- Multiprocessing
- Memory Profiling
- Web service application
- Schedule
- Design Pattern
- Microservice
- Docker
- MLflow
- Airflow
- Cache (Redis)
7. Database & Big Data
- Beginner
- Intermediate
- Advanced
- MySQL
- PostgreSQL
- Microsoft SQL Server
- MongoDB
- Spark
- HBase, Hive
- Neo4j
- Kafka
- Elasticsearch
8. Machine learning
- Beginner
- Intermediate
- Advanced
- Machine Learning: How to use API, library
- Overfitting, Underfitting
- Regularization
- Simple Evaluation Metrics: MAE, MSE, RMSE, MAPE. Confusion Matrix, Precision, Recall, Accuracy, F1-Score, ROC-AUC
- Imbalanced Data Handling
- Loss function
- Missing value Handling
- Feature Engineering, Feature Selection
- Cross validation
- Machine Learning algorithms
- Boosting and Bagging
- Resampling
- Tree-based model: LightGBM, CatBoost, XGBoost
- Ensemble Model, Stacking Model
- Evaluation metrics: Fbeta-score, Pearson, Spearman
- Dimensionality Reduction Techniques
- Basic GridSearchCV, RandomizedSearchCV
- Error Analysis
- Data Labeling
- Data Shift
- Active Learning
- Information Retrieval
- Optimization: Gradient Descent, Stochastic Gradient Descent
- Debugging ML Models
- How to choose the right Evaluation Metrics
- Advanced Hyperparameter Optimization: Hyperopt, Optuna
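
The classification metrics listed above (Precision, Recall, F1-Score) can be sketched directly from prediction counts. This is a minimal pure-Python version with an illustrative function name; libraries such as scikit-learn provide equivalent, more complete implementations.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Convention assumed here: return 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision answers "of the predicted positives, how many were right?", recall answers "of the actual positives, how many were found?", and F1 is their harmonic mean.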
9. Computer Vision
- Beginner
- Intermediate
- Advanced
- Image Processing: OpenCV
- Convolution, Max Pooling
- Histogram of oriented gradients (HOG)
- Image Classification (SVM)
- CNN, VGG-16, VGG-19, ResNet...
- OCR, Object Detection, Face Recognition
- Triplet loss, Siamese network
- One-shot Learning, Few-shot learning
- Evaluation Metrics: IoU
- MobileNet, EfficientNet
- Style Transfer
- Image Segmentation
- Generative Adversarial Networks
- Computer Vision on the Edge
10. Natural Language Processing
- Beginner
- Intermediate
- Advanced
- Text Processing, Regex
- Tokenizer, Stemming, Lemmatization
- N-Grams
- Part-of-Speech Tagging (POS Tagging)
- Language Model, Probability model
- TF-IDF, BM25
- Text Classification (SVM, Logistic Regression, Naive Bayes)
- Word Embedding
- Topic Modeling
- Named Entity Recognition
- Sequential Model: RNN, GRU, LSTM
- Text Similarity
- Seq2Seq, Attention, Transformer
- Beam Search
- Machine Translation
- Question Answering
- Generative Pre-trained Transformer
- Evaluation Metrics: Mean Reciprocal Rank, BLEU, ROUGE
- Large Language Modeling
11. Deep Learning
- Beginner
- Intermediate
- Advanced
- Batch size
- Tensor, CUDA
- Dropout
- Normalization, Regularization
- Vanishing & Exploding Gradients
- Activation functions: Sigmoid, ReLU, SELU, Tanh
- Backpropagation Algorithm
- Optimizers: Adagrad, Adam, RMSprop
- Transfer learning
- Fine-tuning
- Graph Neural Network
- Multitask learning
12. Time Series Analysis
- Beginner
- Intermediate
- Advanced
- Definition
- Time Series Patterns: Trend, Seasonal, Cyclic
- Time Series Decomposition: Trend, Seasonal, Residual
- Stationary Time Series
- Autocorrelation, Partial Autocorrelation, Autoregression
- Smoothing Time Series
- ARMA, ARIMA, SARIMA, GARCH, ARCH
- White noise
- Random walk
- Time Series Forecasting
- Judgmental forecasts
- Anomaly Detection
- Use Tree-based model
- Forecasting hierarchical or grouped time series
- Use Sequential Model
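
As one simple instance of the "Smoothing Time Series" topic above, here is a sketch of a trailing simple moving average. The function name and the choice of an unweighted window are assumptions for illustration; exponential smoothing is a common alternative.

```python
def moving_average(series, window):
    """Return the simple moving average of series over a fixed window.

    The result has len(series) - window + 1 points, one per full window.
    """
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]
```

A larger window removes more noise but also reacts more slowly to real changes in the trend.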
13. Recommender System
- Beginner
- Intermediate
- Advanced
- Not required
- Ranking Evaluation Metrics: MAP@K, NDCG
- Collaborative filtering
- Item-Similarity, User-Similarity
- Implicit Feedback, Explicit Feedback
- Matrix Factorization
- Cold-start Problem
- Temporal recommendation system
- Deep Neural Network Models for Recommendation
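
The NDCG ranking metric listed above can be sketched as follows. This version assumes linear gain (the relevance grade itself) and assumes the input list holds the graded relevance of every candidate item in ranked order; other formulations use an exponential gain of 2^rel - 1. The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items (linear gain)."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all
    return dcg_at_k(relevances, k) / ideal
```

A ranking that already lists items in descending relevance scores 1.0; placing relevant items lower in the list reduces the score.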
Follow my fanpage to get the latest posts!
https://www.facebook.com/datasciencedances/