Study Plan

This repository contains checklists to prepare for software engineering and machine learning interviews and jobs.

Software Engineering

Software Engineering: The Basics

Topics to review so you don't get weeded out.

Five essential screening questions:

  • Coding - writing simple code with correct syntax (C, C++, Java).
  • Object Oriented Design - basic concepts, class models, patterns.
  • Scripting and Regular Expressions - know your Unix tooling.
  • Data Structures - demonstrate basic knowledge of common data structures.
  • Bits and Bytes - know about bits, bytes, and binary numbers (a short bit-manipulation sketch follows this list).
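
A quick warm-up for the bits-and-bytes item, as a minimal sketch in Python (the plan's suggested practice language). The helper names (count_set_bits, is_power_of_two) are illustrative, not from any particular source.

```python
# Minimal bit-manipulation refresher; helper names are illustrative only.

def count_set_bits(x: int) -> int:
    """Count 1-bits by repeatedly clearing the lowest set bit (Kernighan's trick)."""
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count

def is_power_of_two(x: int) -> bool:
    """A positive power of two has exactly one set bit."""
    return x > 0 and (x & (x - 1)) == 0

if __name__ == "__main__":
    assert count_set_bits(0b1011) == 3
    assert is_power_of_two(64) and not is_power_of_two(96)
    print(bin(13), hex(13), 13 >> 1, 13 << 1)  # 0b1101 0xd 6 26
```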

Things you absolutely, positively must know:

  • Algorithm complexity
  • Sorting - know how to sort; know at least two O(n log n) sorting algorithms, e.g. merge sort and quicksort (a merge sort sketch follows this list).
  • Hashtables - the most useful data structure known to humankind.
  • Trees - this is basic stuff, BFS/DFS, so learn it.
  • Graphs - twice as important as you think they are.
  • Other Data Structures - fill up your brain with other data structures.
  • Math - discrete math, combinatorics, probability.
  • Systems - operating system level, concurrency, threads, processing, memory.
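
For the sorting item above, a minimal top-down merge sort sketch in Python, one of the two O(n log n) methods named there; during practice, rewrite it from memory rather than copying it.

```python
# Minimal top-down merge sort, O(n log n); illustrative sketch only.

def merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

if __name__ == "__main__":
    assert merge_sort([5, 2, 4, 6, 1, 3]) == [1, 2, 3, 4, 5, 6]
```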

Software Engineering: The Full Topics List

A much longer and fuller list of topics:

  • Algorithm complexity

  • Data structures

    • Arrays
    • Linked lists
    • Stacks
    • Queues
    • Hash tables
    • Trees
      • Binary search trees
      • Heap trees
      • Priority queues
      • Balanced search trees
      • Tree traversal: preorder, inorder, postorder, BFS, DFS
    • Graphs
      • Directed
      • Undirected
      • Adjacency matrix
      • Adjacency list
      • BFS, DFS (a BFS sketch appears after this topics list)
    • Built-In Data Structures
      • Java Collections
      • C++ Standard Library
    • Sets
      • Disjoint Sets
      • Union Find
    • Advanced Tree Structures
      • Red-Black Trees
      • Splay Trees
      • AVL Trees
      • k-D Trees
      • Van Emde Boas Trees
      • N-ary, K-ary, M-ary Trees
      • Balanced Search Trees
      • 2-3 Trees, 2-4 Trees
    • Augmented Data Structures
  • Algorithms

    • NP, NP-Complete, Approximation Algorithms
    • Searching
      • Sequential search
      • Binary search
    • Sorting
      • Selection
      • Insertion
      • Heapsort
      • Quicksort
      • Merge sort
    • String algorithms
      • String search methods
      • String manipulation methods
    • Recursion
    • Dynamic programming
    • Computational Geometry
      • Convex Hull
  • Object Oriented Programming

    • Design patterns
  • Bits and Bytes

  • Mathematics

    • Combinatorics
    • Probability
    • Linear Algebra
    • FFT
    • Bloom Filter
    • HyperLogLog
  • Crypto and Security

    • Information Theory
    • Parity and Hamming Code
    • Entropy
    • Hash Attacks
  • Unix

    • Kernel Basics
    • Command Line Tools
    • Emacs/Vim
  • Systems Level Programming

    • Processes and threads
    • Caching
    • Memory
    • System routines
    • Messaging Systems
    • Serialization
    • Queue Systems
  • Scaling

    • Parallel Programming
    • Systems Design
    • Scalability
    • Data Handling
  • Supplemental topics

    • Unicode
    • Endianness
    • Networking
    • Compilers
    • Compression
    • Garbage Collection
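
For the graph entries above (adjacency lists and BFS/DFS), a minimal breadth-first search sketch in Python; the example graph and its vertex labels are made up for illustration.

```python
# Minimal breadth-first search over an adjacency list; illustrative sketch.
from collections import deque

def bfs(adjacency, start):
    """Return the vertices reachable from start, in visit order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

if __name__ == "__main__":
    # Hypothetical undirected graph stored as an adjacency list.
    graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    assert bfs(graph, "A") == ["A", "B", "C", "D"]
```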

Machine Learning

Machine Learning: The Basics

Topics to review so you don't get weeded out.

  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Business decisions are usually modeled with supervised and unsupervised learning.
  • Classification and regression are the most commonly encountered machine learning problems.

Machine Learning: The Full Topics List

A longer, fuller list of topics:

  • Regression

    • Models the relationship between variables, iteratively refined using an error measure.
    • Linear Regression
    • Logistic Regression
    • OLS (Ordinary Least Squares) Regression
    • Stepwise Regression
    • MARS (Multivariate Adaptive Regression Splines)
    • LOESS (Locally Estimated Scatterplot Smoothing)
  • Instance Based

    • Build up a database of examples and compare new data against it; winner-take-all or memory-based learning (a minimal k-NN sketch appears after this topics list).
    • k-Nearest Neighbor
    • Learning Vector Quantization
    • Self-Organizing Map
    • Locally Weighted Learning
  • Regularization

    • Extensions of other methods that penalize model complexity and favor simpler, more generalizable models.
    • Ridge Regression
    • LASSO (Least Absolute Shrinkage and Selection Operator)
    • Elastic Net
    • LARS (Least Angle Regression)
  • Decision Tree

    • Construct a model of decisions made on actual values of attributes in the data.
    • Classification and Regression Tree
    • CHAID (Chi-Squared Automatic Interaction Detection)
    • Conditional Decision Trees
  • Bayesian

    • Methods explicitly applying Bayes' Theorem for classification and regression problems.
    • Naive Bayes
    • Gaussian Naive Bayes
    • Multinomial Naive Bayes
    • Bayesian Network
    • BBN (Bayesian Belief Network)
  • Clustering

    • Centroid-based and hierarchical modeling approaches; groups of maximum commonality.
    • k-Means
    • k-Medians
    • Expectation Maximization
    • Hierarchical Clustering
  • Association Rule Algorithms

    • Extract rules that best explain relationships between variables in data.
    • Apriori algorithm
    • Eclat algorithm
  • Neural Networks

    • Inspired by the structure and function of biological neural networks; used for regression and classification problems.
    • Radial Basis Function Network (RBFN)
    • Perceptron
    • Back-Propagation
    • Hopfield Network
  • Deep Learning

    • Neural networks that exploit cheap and abundant computational power; semi-supervised, lots of data.
    • Convolutional Neural Network (CNN)
    • Recurrent Neural Network (RNN)
    • Long Short-Term Memory Network (LSTM)
    • Deep Boltzmann Machine (DBM)
    • Deep Belief Network (DBN)
    • Stacked Auto-Encoders
  • Dimensionality Reduction

    • Find inherent structure in data, in an unsupervised manner, to describe data using less information.
    • PCA
    • t-SNE
    • PLS (Partial Least Squares Regression)
    • Sammon Mapping
    • Multidimensional Scaling
    • Projection Pursuit
    • Principal Component Regression
    • Partial Least Squares Discriminant Analysis
    • Mixture Discriminant Analysis
    • Quadratic Discriminant Analysis
    • Regularized Discriminant Analysis
    • Linear Discriminant Analysis
  • Ensemble

    • Models composed of multiple weaker models, independently trained, that provide a combined prediction.
    • Random Forest
    • Gradient Boosting Machines (GBM)
    • Boosting
    • Bootstrapped Aggregation (Bagging)
    • AdaBoost
    • Stacked Generalization (Blending)
    • Gradient Boosted Regression Trees
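
For the instance-based family above, a minimal k-nearest-neighbor classifier sketch in Python; the toy points and labels are invented for illustration, and the Euclidean distance plus majority vote is just one reasonable choice (a real project would likely use a library such as scikit-learn).

```python
# Minimal k-nearest-neighbor classifier (instance-based learning); illustrative sketch.
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical 2-D toy data: two clusters labeled "a" and "b".
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    assert knn_predict(points, labels, (0.5, 0.5)) == "a"
    assert knn_predict(points, labels, (5.5, 5.5)) == "b"
```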

Daily Plan

Each day:

  • Pick one subject from the list.
  • Watch videos on the topic.
  • Implement the concept in Java or Python.
  • Optionally, implement in C (and/or in C++, with or without the stdlib).
  • Write tests to ensure the code is correct (a minimal unittest sketch follows this list).
  • Practice until you are sick of it.
  • Work within limited constraints (think interviews).
  • Know the built-in types.
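
As one concrete instance of the implement-then-test loop above, a minimal sketch using Python's built-in unittest module against a hypothetical binary_search implementation for that day's topic; the function and test cases are illustrative, not prescribed by this plan.

```python
# Minimal sketch of the daily loop: implement the day's topic, then test it.
import unittest

def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

class BinarySearchTest(unittest.TestCase):
    def test_found(self):
        self.assertEqual(binary_search([1, 3, 5, 7, 9], 7), 3)

    def test_missing(self):
        self.assertEqual(binary_search([1, 3, 5, 7, 9], 4), -1)

    def test_empty(self):
        self.assertEqual(binary_search([], 1), -1)

if __name__ == "__main__":
    unittest.main()
```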

Code:

Practice writing code out on a whiteboard and/or on paper before implementing it on a computer. Get a big drawing pad from the art store.