Study Plan
This repository contains checklists to prepare for software engineering and machine learning interviews and jobs.
The Plan
Tracks
We are following two tracks:
 Software Engineering Track
 Machine Learning Track
Software engineering track:
 Paper and pencil working out algorithms
 Wiki: distilled, polished notes
 Git: todo list for topics
 Git: code practice
 Flashcards
Machine learning track:
 Paper and pencil notes (rough), problems (working out), thinking
 Note: following Alpaydin book, working through problems
 Wiki: distilled, polished notes and learnings
 Summary of major concepts
 Answers/examples worked out more clearly
 Fast notes, for studying, not presentation, so snap photos and upload
 Git: todo list for topics
 Git: code practice
 Flashcards
Daily Plan
Each day:
 Pick one subject from the list.
 Watch videos on the topic.
 Implement the concept in Java or Python.
 Optionally, implement in C (and/or in C++, with or without the stdlib).
 Write tests to ensure code is correct.
 Create flashcards
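As an illustration of the "implement, then test" steps above, here is a minimal sketch in Python. The choice of binary search and the names `binary_search` and `test_binary_search` are assumptions made for the example, not part of the plan itself.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def test_binary_search():
    data = [1, 3, 5, 7, 9, 11]
    assert binary_search(data, 1) == 0      # first element
    assert binary_search(data, 7) == 3      # middle element
    assert binary_search(data, 11) == 5     # last element
    assert binary_search(data, 4) == -1     # missing element
    assert binary_search([], 4) == -1       # empty input


test_binary_search()
```

Covering the edge cases (empty input, first and last element, missing element) is usually what catches off-by-one mistakes in this kind of exercise.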
After one week:
 Revisit and review
Long term strategy:
 Practice coding until you are sick of it.
 Add flashcards
 Work within limited constraints (think interviews).
 Know the built-in types.
Code:
Practice writing out on a whiteboard and/or on paper, before implementing on computer. Get a big drawing pad from the art store.
See the checklists below for completed tasks.
Software Engineering
Software Engineering: The Basics
Topics to review so you don't get weeded out.
Five essential screening questions:
 Coding - writing simple code with correct syntax (C, C++, Java).
 Object Oriented Design - basic concepts, class models, patterns.
 Scripting and Regular Expressions - know your Unix tooling.
 Data Structures - demonstrate basic knowledge of common data structures.
 Bits and Bytes - know about bits, bytes, and binary numbers.
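For the bits and bytes item, a short Python sketch of the kind of exercise that tends to come up. The function names are hypothetical and the functions assume non-negative integers.

```python
def count_set_bits(n):
    """Count the 1 bits in a non-negative integer."""
    count = 0
    while n:
        n &= n - 1   # clears the lowest set bit
        count += 1
    return count


def is_power_of_two(n):
    """A power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def to_binary(n, width=8):
    """Zero-padded binary string for a non-negative integer."""
    return format(n, "0{}b".format(width))
```

For example, `to_binary(5)` gives `'00000101'` and `count_set_bits(13)` gives `3`, since 13 is `1101` in binary.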
Things you absolutely, positively must know:
 Algorithm complexity
 Sorting - know how to sort, know at least 2 O(n log n) sort methods (merge sort and quicksort; quicksort is O(n log n) on average and O(n^2) in the worst case)
 Hashtables - the most useful data structure known to humankind.
 Trees - this is basic stuff, BFS/DFS, so learn it.
 Graphs - twice as important as you think they are.
 Other Data Structures - fill up your brain with other data structures.
 Math - discrete math, combinatorics, probability.
 Systems - operating system level, concurrency, threads, processing, memory.
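The two sorts named in the list above can be sketched as follows. This is a simple, non-in-place version meant for study, not a tuned implementation; the middle-element pivot is an assumption of this sketch.

```python
def merge_sort(items):
    """O(n log n) in all cases; returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Merge the two sorted halves; <= keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quicksort(items):
    """O(n log n) on average, O(n^2) in the worst case; returns a new list."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)
```

An in-place quicksort with explicit partitioning is the more common interview variant; the list-comprehension form here trades memory for readability.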
Software Engineering: The Full Topics List
A much longer and fuller list of topics:

Algorithm complexity

Data structures
 Arrays
 Linked lists
 Stacks
 Queues
 Hash tables
 Trees
 Binary search trees
 Heap trees
 Priority queues
 Balanced search trees
 Tree traversal: preorder, inorder, postorder, BFS, DFS
 Graphs
 Directed
 Undirected
 Adjacency matrix
 Adjacency list
 BFS, DFS
 Built-in Data Structures
 Java Collections
 C++ Standard Library
 Sets
 Disjoint Sets
 Union Find
 Advanced Tree Structures
 Red-Black Trees
 Splay Trees
 AVL Trees
 k-d Trees
 Van Emde Boas Trees
 N-ary, K-ary, M-ary Trees
 Balanced Search Trees
 2-3 Trees, 2-4 Trees
 Augmented Data Structures
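As one example from the graph items above, here is a minimal sketch of BFS and DFS over an adjacency list. The dict-of-lists representation and the small sample graph are assumptions for illustration.

```python
from collections import deque

# Directed graph as an adjacency list: node -> list of neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start):
    """Visit nodes depth first using recursion."""
    order = []
    seen = set()

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                visit(neighbor)

    visit(start)
    return order
```

On the sample graph, `bfs(graph, "A")` returns `['A', 'B', 'C', 'D']` and `dfs(graph, "A")` returns `['A', 'B', 'D', 'C']`.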

Algorithms
 NP, NP-Complete, Approximation Algorithms
 Searching
 Sequential search
 Binary search
 Sorting
 Selection
 Insertion
 Heapsort
 Quicksort
 Merge sort
 String algorithms
 String search methods
 String manipulation methods
 Recursion
 Dynamic programming
 Computational Geometry
 Convex Hull

Object Oriented Programming
 Design patterns

Bits and Bytes

Mathematics
 Combinatorics
 Probability
 Linear Algebra
 FFT
 Bloom Filter
 HyperLogLog

Systems Level Programming
 Processing and threads
 Caching
 Memory
 System routines
 Messaging Systems
 Serialization
 Queue Systems

Scaling
 Parallel Programming
 Systems Design
 Scalability
 Data Handling

Crypto and Security
 Information Theory
 Parity and Hamming Code
 Entropy
 Hash Attacks
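Two of the items above, parity and entropy, are small enough to sketch directly. The function names are hypothetical, and the entropy function assumes a non-empty sequence of symbols.

```python
import math
from collections import Counter


def even_parity_bit(bits):
    """Bit to append so that the total number of 1s is even."""
    return sum(bits) % 2


def shannon_entropy(symbols):
    """Entropy in bits per symbol of the empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `even_parity_bit([1, 0, 1, 1])` is `1`, and `shannon_entropy("aabb")` is `1.0` because the two symbols are equally likely.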

Unix
 Kernel Basics
 Command Line Tools
 Emacs/Vim

Supplemental topics
 Unicode
 Garbage Collection
 Networking
 Compilers
 Compression
 Endianness
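For the endianness item, a quick way to see the difference in Python, using the built-in `int.to_bytes` and `int.from_bytes`. The value `0x12345678` is just a sample.

```python
import sys

value = 0x12345678

# Big endian stores the most significant byte first.
big = value.to_bytes(4, byteorder="big")
# Little endian stores the least significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())      # 12345678
print(little.hex())   # 78563412

# Reading the bytes back requires the same byte order they were written in.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value

# Byte order of the machine running this script: "little" or "big".
print(sys.byteorder)
```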
Software Engineering: The ToDo List

Arrays

Linked lists

Stacks

Queues

Hash tables

Trees: binary search trees

Trees: heap trees

Trees: priority queues

Trees: balanced search trees

Trees: red-black trees

Trees: tree traversal

Graphs: directed and undirected

Graphs: graph <-> adjacency matrix/list

Graphs: BFS, DFS

Algorithms: NP, NP-Complete, Approximation

Search Algorithms: Sequential search

Search Algorithms: Binary search

Algorithms: Selection sort

Algorithms: Merge sort

Algorithms: Quick sort

Algorithms: Heap sort

Algorithms: String search methods

Algorithms: Recursion

Algorithms: Dynamic programming

Algorithms: Convex hull

Algorithms: Computational geometry

Bits and Bytes

Mathematics: Combinatorics

Mathematics: Probability

Mathematics: Linear Algebra (computational)

Mathematics: FFT

Systems: Processing and Threads

Systems: Caching

Systems: Memory

Systems: System routines

Systems: Messaging systems

Systems: Serialization

Systems: Queue systems

Scaling: Systems design

Scaling: Scalability

Scaling: Data handling

Parallel programming: Basic concepts/algorithms

Crypto Security: Information Theory

Crypto Security: Parity and Hamming Code

Crypto Security: Entropy

Crypto Security: Birthday/Hash Attacks

Crypto Security: Public Key Cryptography Math

Supplemental: Unicode

Supplemental: Garbage Collection

Supplemental: Networking

Supplemental: Compilers

Supplemental: Compression

Supplemental: Endianness
Machine Learning
Machine Learning: The Basics
Topics to review so you don't get weeded out.
 Supervised learning
 Unsupervised learning
 Semi-supervised learning
 Modeling business decisions usually uses supervised and unsupervised learning.
 Classification and regression are the most commonly seen machine learning models.
Machine Learning: The Full Topics List
A longer, fuller list of topics:

Regression
 Modeling relationship between variables, iteratively refined using an error measure.
 Linear Regression
 Logistic Regression
 OLS (Ordinary Least Squares) Regression
 Stepwise Regression
 MARS (Multivariate Adaptive Regression Splines)
 LOESS (Locally Estimated Scatterplot Smoothing)
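The idea of a model "iteratively refined using an error measure" can be sketched with one-variable linear regression fitted by gradient descent on mean squared error. The learning rate, epoch count, and sample data are assumptions chosen so this small example converges.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current parameters.
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

On this data the fit approaches a slope of 2 and an intercept of 1. OLS reaches the same answer in closed form, which makes a useful cross-check.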

Instance Based
 Build up database of data, compare new data to database; winner-take-all or memory-based learning.
 k-Nearest Neighbor
 Learning Vector Quantization
 Self-Organizing Map
 Locally Weighted Learning
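A minimal sketch of the "compare new data to the database" idea, using k-nearest neighbor with Euclidean distance and majority vote. The function name and toy training set are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label
    among the k training points closest to query."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
```

Here `knn_predict(train, (0.5, 0.5))` returns `'a'` and `knn_predict(train, (5.5, 5.5))` returns `'b'`. There is no training step; all the work happens at query time.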

Regularization
 Extension made to other methods, penalizes model complexity, favors simpler and more generalizable models.
 Ridge Regression
 LASSO (Least Absolute Shrinkage and Selection Operator)
 Elastic Net
 LARS (Least Angle Regression)
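To see how a penalty on model complexity changes a fit, here is a sketch of ridge regression in the simplest possible setting: one feature and no intercept (both simplifying assumptions). Minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x^2) + lam).

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y = w * x with L2 penalty lam * w**2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


xs = [1, 2, 3]
ys = [2, 4, 6]   # exactly y = 2x

unpenalized = ridge_weight(xs, ys, lam=0)    # ordinary least squares
penalized = ridge_weight(xs, ys, lam=14)     # weight shrinks toward zero
```

With `lam=0` the weight is `2.0`; with `lam=14` it shrinks to `1.0`. LASSO uses an absolute-value penalty instead, which can drive weights to exactly zero.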

Decision Tree
 Construct a model of decisions made on actual values of attributes in the data.
 Classification and Regression Tree
 CHAID (Chi-Squared Automatic Interaction Detection)
 Conditional Decision Trees

Bayesian
 Methods explicitly applying Bayes' Theorem for classification and regression problems.
 Naive Bayes
 Gaussian Naive Bayes
 Multinomial Naive Bayes
 Bayesian Network
 BBN (Bayesian Belief Network)
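All of the methods above rest on Bayes' Theorem, P(A|B) = P(B|A) P(A) / P(B). A small sketch for a binary hypothesis follows; the function name, parameter names, and the numbers in the example are illustrative assumptions.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | evidence) for a binary hypothesis.

    prior: P(hypothesis)
    likelihood: P(evidence | hypothesis)
    false_positive_rate: P(evidence | not hypothesis)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# A condition with 1% prevalence, a test with 90% sensitivity
# and a 5% false positive rate.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

Here `p` comes out to roughly 0.154: even after a positive test, the low prior keeps the posterior small. Naive Bayes applies the same rule per class, multiplying per-feature likelihoods under an independence assumption.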

Clustering
 Centroid-based and hierarchical modeling approaches; groups of maximum commonality.
 k-Means
 k-Medians
 Expectation Maximization
 Hierarchical Clustering
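The centroid-based approach can be sketched with k-means on one-dimensional data. The fixed iteration count and caller-supplied starting centroids are simplifying assumptions; real implementations typically randomize initialization and stop on convergence.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


result = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
```

On this data the centroids settle at `[2.0, 11.0]` after two iterations. k-medians follows the same loop but uses the median instead of the mean.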

Association Rule Algorithms
 Extract rules that best explain relationships between variables in data.
 Apriori algorithm
 Eclat algorithm

Neural Networks
 Inspired by structure and function of biological neural networks, used for regression and classification problems.
 Radial Basis Function Network (RBFN)
 Perceptron
 Back-Propagation
 Hopfield Network
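The perceptron is the simplest item in this list to implement from scratch. Below is a sketch of the perceptron learning rule on the logical AND function; the learning rate, epoch count, and function names are assumptions for the example.

```python
def predict(weights, bias, x):
    """Step activation: fire (1) when the weighted sum is positive."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """samples is a list of (inputs, target) pairs with targets 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Nudge the weights toward the correct answer when wrong.
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
```

AND is linearly separable, so the rule converges here. It would not converge on XOR, which is the usual motivation for multi-layer networks trained with back-propagation.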

Deep Learning
 Neural networks that exploit cheap and abundant computational power; semi-supervised, lots of data.
 Convolutional Neural Network (CNN)
 Recurrent Neural Network (RNN)
 Long Short-Term Memory Network (LSTM)
 Deep Boltzmann Machine (DBM)
 Deep Belief Network (DBN)
 Stacked Auto-Encoders

Dimensionality Reduction
 Find inherent structure in data, in an unsupervised manner, to describe data using less information.
 PCA
 t-SNE
 PLS (Partial Least Squares Regression)
 Sammon Mapping
 Multidimensional Scaling
 Projection Pursuit
 Principal Component Regression
 Partial Least Squares Discriminant Analysis
 Mixture Discriminant Analysis
 Quadratic Discriminant Analysis
 Regularized Discriminant Analysis
 Linear Discriminant Analysis

Ensemble
 Models composed of multiple weaker models, independently trained, that provide a combined prediction.
 Random Forest
 Gradient Boosting Machines (GBM)
 Boosting
 Bootstrapped Aggregation (Bagging)
 AdaBoost
 Stacked Generalization (Blending)
 Gradient Boosted Regression Trees
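The two building blocks of bagging, bootstrap sampling and combining predictions by vote, can be sketched as below. The three threshold "models" are hand-written stand-ins, an assumption made to keep the example short; in practice each model would be trained on its own bootstrap sample.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Most common prediction among the models."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_predict(models, x):
    """Combine the predictions of several models by majority vote."""
    return majority_vote([model(x) for model in models])


# Three weak threshold classifiers standing in for trained models.
models = [lambda x: x > 2, lambda x: x > 3, lambda x: x > 5]

sample = bootstrap_sample([1, 2, 3, 4, 5], random.Random(0))
```

For an input of 4 the individual votes are True, True, False, so `bagged_predict(models, 4)` returns `True`. Boosting differs in that models are trained sequentially, each one focusing on the errors of the previous ones.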
Machine Learning: The ToDo List
Have one repo, with a GitHub Pages site.
HTML landing page with info about each topic.
Notebook for each overarching topic, split into multiple notebooks as needed. For example, a notebook to compare ridge and lasso.

Regression: linear regression

Regression: logistic regression

Regression: OLS regression

Regression: Stepwise regression

Regression: MARS

Regression: LOESS

Instance: k-Nearest Neighbor

Instance: Learning Vector Quantization

Instance: Self-Organizing Map

Instance: Locally Weighted Learning

Regularization: Ridge regression

Regularization: LASSO

Regularization: Elastic net

Regularization: LARS

Decision Tree: classification tree

Decision Tree: CHAID

Decision Tree: conditional decision trees

Bayesian: LARS

Bayesian: Naive Bayes

Bayesian: Gaussian Naive Bayes

Bayesian: Multinomial Naive Bayes

Bayesian: Bayesian Network

Bayesian: Bayesian Belief Network

Clustering: k-Means

Clustering: k-Medians

Clustering: Expectation Maximization

Clustering: Hierarchical Clustering

Dimensionality Reduction: PCA

Dimensionality Reduction: t-SNE

Dimensionality Reduction: PLS

Dimensionality Reduction: Multidimensional Scaling

Dimensionality Reduction: Principal Component Regression

Dimensionality Reduction: Discriminant Analyses

Association Rule: Apriori algorithm

Deep Learning: CNN

Deep Learning: RNN

Deep Learning: LSTM

Deep Learning: DBM

Deep Learning: DBN

Deep Learning: Stacked Auto-Encoders