Computer science study plan.

Study Plan

This repository contains checklists to prepare for software engineering and machine learning interviews and jobs.

The Plan

Tracks

We are following two tracks:

  • Software Engineering Track
  • Machine Learning Track

Software engineering track:

  • Paper and pencil working out algorithms
  • Wiki: distilled, polished notes
  • Git: to-do list for topics
  • Git: code practice
  • Flashcards

Machine learning track:

  • Paper and pencil notes (rough), problems (working out), thinking
    • Note: following Alpaydin book, working through problems
  • Wiki: distilled, polished notes and learnings
    • Summary of major concepts
    • Answers/examples worked out more clearly
    • Fast notes, for studying, not presentation, so snap photos and upload
  • Git: to-do list for topics
  • Git: code practice
  • Flashcards

Daily Plan

Each day:

  • Pick one subject from the list.
  • Watch videos on the topic.
  • Implement the concept in Java or Python.
  • Optionally, implement in C (and/or in C++, with or without the stdlib).
  • Write tests to ensure code is correct.
  • Create flashcards
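As a rough illustration of the "implement the concept, then write tests" steps above, here is a minimal Python sketch using binary search as the day's topic. The function and test names are made up for this example and are not part of the repository.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def test_binary_search():
    # Hypothetical test cases: first, last, middle, missing, and empty input.
    data = [1, 3, 5, 7, 9, 11]
    assert binary_search(data, 1) == 0
    assert binary_search(data, 11) == 5
    assert binary_search(data, 7) == 3
    assert binary_search(data, 4) == -1
    assert binary_search([], 4) == -1


if __name__ == "__main__":
    test_binary_search()
    print("all tests passed")
```

Writing the edge cases (empty list, missing value, first and last element) before the implementation is one way to practice working within interview-style constraints.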

After one week:

  • Revisit and review

Long term strategy:

  • Practice coding until you are sick of it.
  • Add flashcards
  • Work within limited constraints (think interviews).
  • Know the built-in types.

Code:

Practice writing out on a whiteboard and/or on paper, before implementing on a computer. Get a big drawing pad from the art store.

See the to-do lists below for the checklist of completed tasks.

Software Engineering

Software Engineering: The Basics

Topics to review so you don't get weeded out.

Five essential screening questions:

  • Coding - writing simple code with correct syntax (C, C++, Java).
  • Object Oriented Design - basic concepts, class models, patterns.
  • Scripting and Regular Expressions - know your Unix tooling.
  • Data Structures - demonstrate basic knowledge of common data structures.
  • Bits and Bytes - know about bits, bytes, and binary numbers.
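For the "Bits and Bytes" item, a small Python sketch of two common warm-up exercises may help. The function names are illustrative only; the bit tricks themselves (clearing the lowest set bit with `n & (n - 1)`) are standard.

```python
def count_set_bits(n):
    """Count the 1-bits in a non-negative integer."""
    count = 0
    while n:
        n &= n - 1   # clears the lowest set bit
        count += 1
    return count


def is_power_of_two(n):
    """A positive power of two has exactly one set bit."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    print(format(10, "b"))          # binary representation of 10
    print(count_set_bits(0b1011))   # number of 1-bits in 1011
    print(is_power_of_two(64), is_power_of_two(12))
```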

Things you absolutely, positively must know:

  • Algorithm complexity
  • Sorting - know how to sort, and know at least two O(n log n) sort methods (merge sort, and quicksort, which is O(n log n) on average but O(n^2) in the worst case)
  • Hashtables - the most useful data structure known to humankind.
  • Trees - this is basic stuff, BFS/DFS, so learn it.
  • Graphs - twice as important as you think they are.
  • Other Data Structures - fill up your brain with other data structures.
  • Math - discrete math, combinatorics, probability.
  • Systems - operating system level, concurrency, threads, processing, memory.
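As one example of the O(n log n) sorts mentioned above, here is a minimal merge sort sketch in Python. It returns a new list rather than sorting in place, which is a simplifying choice for this example.

```python
def merge_sort(items):
    """Return a new sorted list containing the elements of items."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves; "<=" keeps equal elements in order (stable).
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The list is halved about log n times and each level does O(n) work in the merge step, which is where the O(n log n) bound comes from.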

Software Engineering: The Full Topics List

A much longer and fuller list of topics:

  • Algorithm complexity

  • Data structures

    • Arrays
    • Linked lists
    • Stacks
    • Queues
    • Hash tables
    • Trees
      • Binary search trees
      • Heap trees
      • Priority queues
      • Balanced search trees
      • Tree traversal: preorder, inorder, postorder, BFS, DFS
    • Graphs
      • Directed
      • Undirected
      • Adjacency matrix
      • Adjacency list
      • BFS, DFS
    • Built-In Data Structures
      • Java Collections
      • C++ Standard Library
    • Sets
      • Disjoint Sets
      • Union Find
    • Advanced Tree Structures
      • Red-Black Trees
      • Splay Trees
      • AVL Trees
      • k-D Trees
      • Van Emde Boas Trees
      • N-ary, K-ary, M-ary Trees
      • Balanced Search Trees
      • 2-3 Trees, 2-4 Trees
    • Augmented Data Structures
  • Algorithms

    • NP, NP-Complete, Approximation Algorithms
    • Searching
      • Sequential search
      • Binary search
    • Sorting
      • Selection
      • Insertion
      • Heapsort
      • Quicksort
      • Merge sort
    • String algorithms
      • String search methods
      • String manipulation methods
    • Recursion
    • Dynamic programming
    • Computational Geometry
      • Convex Hull
  • Object Oriented Programming

    • Design patterns
  • Bits and Bytes

  • Mathematics

    • Combinatorics
    • Probability
    • Linear Algebra
    • FFT
    • Bloom Filter
    • HyperLogLog
  • Systems Level Programming

    • Processing and threads
    • Caching
    • Memory
    • System routines
    • Messaging Systems
    • Serialization
    • Queue Systems
  • Scaling

    • Parallel Programming
    • Systems Design
    • Scalability
    • Data Handling
  • Crypto and Security

    • Information Theory
    • Parity and Hamming Code
    • Entropy
    • Hash Attacks
  • Unix

    • Kernel Basics
    • Command Line Tools
    • Emacs/Vim
  • Supplemental topics

    • Unicode
    • Garbage Collection
    • Networking
    • Compilers
    • Compression
    • Endianness
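To make the "Graphs: adjacency list, BFS" entries in the list above concrete, here is a short Python sketch of breadth-first search over a graph stored as a dictionary of neighbor lists. The graph and the function name are hypothetical examples.

```python
from collections import deque


def bfs_order(graph, start):
    """Return nodes reachable from start in breadth-first visiting order.

    graph is an adjacency list: a dict mapping each node to a list of neighbors.
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)   # mark on enqueue to avoid duplicates
                queue.append(neighbor)
    return order


if __name__ == "__main__":
    # Small directed example graph, made up for illustration.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    print(bfs_order(graph, "A"))
```

Swapping the queue for a stack (pop from the same end you push to) turns the same skeleton into an iterative DFS.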

Software Engineering: The To-Do List

  • Arrays

  • Linked lists

  • Stacks

  • Queues

  • Hash tables

  • Trees: binary search trees

  • Trees: heap trees

  • Trees: priority queues

  • Trees: balanced search trees

  • Trees: red-black trees

  • Trees: tree traversal

  • Graphs: directed and undirected

  • Graphs: graph <--> adjacency matrix/list

  • Graphs: BFS, DFS

  • Algorithms: NP, NP-Complete, Approximation

  • Search Algorithms: Sequential search

  • Search Algorithms: Binary search

  • Algorithms: Selection sort

  • Algorithms: Merge sort

  • Algorithms: Quick sort

  • Algorithms: Heap sort

  • Algorithms: String search methods

  • Algorithms: Recursion

  • Algorithms: Dynamic programming

  • Algorithms: Convex hull

  • Algorithms: Computational geometry

  • Bits and Bytes

  • Mathematics: Combinatorics

  • Mathematics: Probability

  • Mathematics: Linear Algebra (computational)

  • Mathematics: FFT

  • Systems: Processing and Threads

  • Systems: Caching

  • Systems: Memory

  • Systems: System routines

  • Systems: Messaging systems

  • Systems: Serialization

  • Systems: Queue systems

  • Scaling: Systems design

  • Scaling: Scalability

  • Scaling: Data handling

  • Parallel programming: Basic concepts/algorithms

  • Crypto Security: Information Theory

  • Crypto Security: Parity and Hamming Code

  • Crypto Security: Entropy

  • Crypto Security: Birthday/Hash Attacks

  • Crypto Security: Public Key Cryptography Math

  • Supplemental: Unicode

  • Supplemental: Garbage Collection

  • Supplemental: Networking

  • Supplemental: Compilers

  • Supplemental: Compression

  • Supplemental: Endianness

Machine Learning

Machine Learning: The Basics

Topics to review so you don't get weeded out.

  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Modeling business decisions usually uses supervised and unsupervised learning.
  • Classification and regression are the most commonly seen machine learning models.

Machine Learning: The Full Topics List

A longer, fuller list of topics:

  • Regression

    • Modeling relationship between variables, iteratively refined using an error measure.
    • Linear Regression
    • Logistic Regression
    • OLS (Ordinary Least Squares) Regression
    • Stepwise Regression
    • MARS (Multivariate Adaptive Regression Splines)
    • LOESS (Locally Estimated Scatterplot Smoothing)
  • Instance Based

    • Build up database of data, compare new data to database; winner-take-all or memory-based learning.
    • k-Nearest Neighbor
    • Learning Vector Quantization
    • Self-Organizing Map
    • Locally Weighted Learning
  • Regularization

    • Extension made to other methods, penalizes model complexity, favors simpler and more generalizable models.
    • Ridge Regression
    • LASSO (Least Absolute Shrinkage and Selection Operator)
    • Elastic Net
    • LARS (Least Angle Regression)
  • Decision Tree

    • Construct a model of decisions made on actual values of attributes in the data.
    • Classification and Regression Tree
    • CHAID (Chi-Squared Automatic Interaction Detection)
    • Conditional Decision Trees
  • Bayesian

    • Methods explicitly applying Bayes' Theorem for classification and regression problems.
    • Naive Bayes
    • Gaussian Naive Bayes
    • Multinomial Naive Bayes
    • Bayesian Network
    • BBN (Bayesian Belief Network)
  • Clustering

    • Centroid-based and hierarchical modeling approaches; groups of maximum commonality.
    • k-Means
    • k-Medians
    • Expectation Maximization
    • Hierarchical Clustering
  • Association Rule Algorithms

    • Extract rules that best explain relationships between variables in data.
    • Apriori algorithm
    • Eclat algorithm
  • Neural Networks

    • Inspired by structure and function of biological neural networks, used for regression and classification problems.
    • Radial Basis Function Network (RBFN)
    • Perceptron
    • Back-Propagation
    • Hopfield Network
  • Deep Learning

    • Neural networks that exploit cheap and abundant computational power; semi-supervised, lots of data.
    • Convolutional Neural Network (CNN)
    • Recurrent Neural Network (RNN)
    • Long Short-Term Memory Network (LSTM)
    • Deep Boltzmann Machine (DBM)
    • Deep Belief Network (DBN)
    • Stacked Auto-Encoders
  • Dimensionality Reduction

    • Find inherent structure in data, in an unsupervised manner, to describe data using less information.
    • PCA
    • t-SNE
    • PLS (Partial Least Squares Regression)
    • Sammon Mapping
    • Multidimensional Scaling
    • Projection Pursuit
    • Principal Component Regression
    • Partial Least Squares Discriminant Analysis
    • Mixture Discriminant Analysis
    • Quadratic Discriminant Analysis
    • Regularized Discriminant Analysis
    • Linear Discriminant Analysis
  • Ensemble

    • Models composed of multiple weaker models, independently trained, that provide a combined prediction.
    • Random Forest
    • Gradient Boosting Machines (GBM)
    • Boosting
    • Bootstrapped Aggregation (Bagging)
    • AdaBoost
    • Stacked Generalization (Blending)
    • Gradient Boosted Regression Trees
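The "Instance Based" description in the list above (build up a database of data, compare new data to it) can be sketched with a tiny k-Nearest Neighbor classifier. This is a minimal pure-Python illustration with made-up data, not a tuned implementation; it assumes Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote among the k nearest points.

    train is a list of (point, label) pairs; points are equal-length tuples.
    """
    # Sort the stored examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-cluster data set.
    train = [
        ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
        ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
    ]
    print(knn_predict(train, (1.1, 1.0)))
    print(knn_predict(train, (5.1, 5.0)))
```

There is no training step here: the "model" is the stored data itself, which is what the memory-based learning label refers to.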

Machine Learning: The To-Do List

Have one repo, with a GitHub Pages site.

HTML landing page with info about each topic.

Notebook for each overarching topic, split into multiple notebooks as needed. For example, a notebook to compare ridge and lasso.
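A ridge-versus-lasso comparison could start from something as small as the following Python sketch. It assumes a single feature with no intercept so that both penalized fits have closed forms; the function names, the penalty scaling, and the data are assumptions made for this illustration, and a real notebook would more likely use a library on multi-feature data.

```python
def _sums(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def ols_coef(xs, ys):
    """Minimize 0.5 * sum((y - b*x)^2)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / sxx


def ridge_coef(xs, ys, lam):
    """Minimize 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + lam)


def lasso_coef(xs, ys, lam):
    """Minimize 0.5 * sum((y - b*x)^2) + lam * |b| (soft thresholding)."""
    sxy, sxx = _sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [1, 2, 3, 4]   # made-up data with true slope 1
    for lam in (0, 10, 40):
        print(lam, ridge_coef(xs, ys, lam), lasso_coef(xs, ys, lam))
```

The point of the comparison is that the ridge penalty shrinks the coefficient toward zero without reaching it, while a large enough lasso penalty sets it exactly to zero.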

  • Regression: linear regression

  • Regression: logistic regression

  • Regression: OLS regression

  • Regression: Stepwise regression

  • Regression: MARS

  • Regression: LOESS

  • Instance: k-Nearest Neighbor

  • Instance: Learning Vector Quantization

  • Instance: Self-Organizing Map

  • Instance: Locally Weighted Learning

  • Regularization: Ridge regression

  • Regularization: LASSO

  • Regularization: Elastic net

  • Regularization: LARS

  • Decision Tree: classification tree

  • Decision Tree: CHAID

  • Decision Tree: conditional decision trees

  • Bayesian: LARS

  • Bayesian: Naive Bayes

  • Bayesian: Gaussian Naive Bayes

  • Bayesian: Multinomial Naive Bayes

  • Bayesian: Bayesian Network

  • Bayesian: Bayesian Belief Network

  • Clustering: k-Means

  • Clustering: k-Medians

  • Clustering: Expectation Maximization

  • Clustering: Hierarchical Clustering

  • Dimensionality Reduction: PCA

  • Dimensionality Reduction: t-SNE

  • Dimensionality Reduction: PLS

  • Dimensionality Reduction: Multidimensional Scaling

  • Dimensionality Reduction: Principal Component Regression

  • Dimensionality Reduction: Discriminant Analyses

  • Association Rule: Apriori algorithm

  • Deep Learning: CNN

  • Deep Learning: RNN

  • Deep Learning: LSTM

  • Deep Learning: DBM

  • Deep Learning: DBN

  • Deep Learning: Stacked Auto-Encoders