Jupyter Notebooks exploring the abalone data set for machine learning using Python.
Jupyter Notebook: Initial exploration of the abalone data set.
Jupyter Notebook: This notebook explores why linear and higher-order models fail to fit the data well. The reason? The data have high variance!
Jupyter Notebook: A notebook exploring the use of the covariance matrix, its eigenvalues, and its eigenvectors to extract principal components and visualize the results.
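The covariance-eigenvector approach to principal components can be sketched in a few lines of NumPy. This is a minimal sketch, not the notebook's code: the function name `pca_via_covariance` is hypothetical, and the three correlated columns are synthetic stand-ins for size measurements, not the real abalone data.

```python
import numpy as np

def pca_via_covariance(X, n_components=2):
    """Project rows of X onto the leading principal components (hypothetical helper).

    X is an (n_samples, n_features) array. Returns (scores, eigvals, eigvecs),
    with eigenvalues and eigenvectors sorted by decreasing variance.
    """
    # Center each feature so the covariance describes spread about the mean
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix; rowvar=False treats columns as variables
    C = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: coordinates of each sample along the leading eigenvectors
    scores = Xc @ eigvecs[:, :n_components]
    return scores, eigvals, eigvecs

# Synthetic, strongly correlated columns (NOT the abalone data)
rng = np.random.default_rng(0)
length = rng.uniform(0.2, 0.8, size=200)
X = np.column_stack([length,
                     0.8 * length + rng.normal(0, 0.02, 200),
                     0.3 * length + rng.normal(0, 0.02, 200)])
scores, eigvals, eigvecs = pca_via_covariance(X)
print(eigvals / eigvals.sum())  # fraction of total variance per component
```

Because the covariance matrix is symmetric, `np.linalg.eigh` is the appropriate routine; the variance of each score column equals the corresponding eigenvalue, which is what makes the eigenvalues useful for deciding how many components to keep.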
Jupyter Notebook: The fundamental shortcoming of the abalone data set is that the prediction task involves a dimension (time) that is completely orthogonal to those captured by the measurements (space and mass). This notebook attempts to cluster abalones by growth rate and to regress growth rate against ocean temperature data (a.k.a., cheating).
Jupyter Notebook: Attempts to fit the abalone data set by modeling the system response as a linear function of the input variables.
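Modeling the response as a linear function of the inputs amounts to an ordinary least-squares fit with an intercept. The sketch below assumes NumPy's `lstsq` solver; the helper names are hypothetical and the data are synthetic, so this illustrates the technique rather than reproducing the notebook.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ...] (hypothetical helper)."""
    # Prepend a column of ones so the intercept is fitted alongside the slopes
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    # Intercept plus weighted sum of the input columns
    return coef[0] + X @ coef[1:]

# Synthetic data with known coefficients (NOT the abalone data)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.1, 100)
coef = fit_linear(X, y)
print(coef)  # should land near [3, 2, -1]
```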
Jupyter Notebook: Further attempts to fit the abalone data set, using higher-order models for the system response.
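One common way to build a higher-order model is to expand an input into polynomial terms and reuse the same least-squares solver. This is a single-input sketch with hypothetical helper names and synthetic data; the notebook may construct its higher-order terms differently.

```python
import numpy as np

def poly_design(x, degree):
    # Columns x**0, x**1, ..., x**degree (Vandermonde matrix, increasing powers)
    return np.vander(x, degree + 1, increasing=True)

def fit_poly(x, y, degree):
    coef, *_ = np.linalg.lstsq(poly_design(x, degree), y, rcond=None)
    return coef

def rmse(coef, x, y):
    # Root-mean-square residual of the fitted polynomial on (x, y)
    resid = y - poly_design(x, len(coef) - 1) @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic quadratic response with noise (NOT the abalone data)
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = 1.0 + 0.5 * x - 2.0 * x ** 2 + rng.normal(0, 0.05, 200)
rmses = {d: rmse(fit_poly(x, y, d), x, y) for d in (1, 2, 5)}
print(rmses)
```

Because the lower-degree models are nested inside the higher-degree ones, the in-sample RMSE can only stay the same or drop as the degree grows; judging whether extra terms actually help therefore requires held-out data.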
Jupyter Notebook: Builds a simple k-nearest neighbors classifier that maps observed inputs to outputs.
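A k-nearest neighbors classifier stores the training points and labels each query by majority vote among its k closest neighbors. The NumPy sketch below assumes Euclidean distance and a tiny hand-made data set; `knn_predict` is a hypothetical name, and libraries such as scikit-learn offer an equivalent (`sklearn.neighbors.KNeighborsClassifier`).

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Majority-vote k-NN classification with Euclidean distance (hypothetical helper)."""
    # Squared distances between every query point and every training point
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in y_train[nearest]:
        labels, counts = np.unique(row, return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

# Two well-separated toy clusters (illustrative only)
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.2], [5.5, 5.5]])
print(knn_predict(X_train, y_train, X_query, k=3))  # → [0 1]
```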
Jupyter Notebook: Uses linear classifiers (like linear regression models, but for categorical rather than continuous outputs) to categorize abalones.
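The connection to linear regression is easiest to see in a least-squares linear classifier: regress a 0/1 indicator for each class on the inputs, then assign each sample to the class with the largest linear score. This is one simple linear classifier chosen for illustration (the notebook may use another, such as logistic regression or a perceptron); the helper names are hypothetical, and the three labels and clustered points are synthetic.

```python
import numpy as np

def fit_ls_classifier(X, y):
    """One-vs-rest least-squares linear classifier (hypothetical helper).

    Returns (classes, W), where W holds one column of weights per class.
    """
    classes = np.unique(y)
    A = np.column_stack([np.ones(len(X)), X])
    # Indicator targets: T[i, j] = 1 if sample i belongs to classes[j]
    T = (y[:, None] == classes[None, :]).astype(float)
    W, *_ = np.linalg.lstsq(A, T, rcond=None)
    return classes, W

def predict_ls_classifier(classes, W, X):
    A = np.column_stack([np.ones(len(X)), X])
    # Pick the class whose linear score is largest
    return classes[np.argmax(A @ W, axis=1)]

# Synthetic three-class data (NOT the abalone data)
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.array(["F", "I", "M"]), 50)
classes, W = fit_ls_classifier(X, y)
acc = np.mean(predict_ls_classifier(classes, W, X) == y)
print(acc)
```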
Jupyter Notebook: Uses a Gaussian process model to fit the observed input/output data via kriging.
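Gaussian process regression with a zero prior mean corresponds to simple kriging: the prediction at a new point is a covariance-weighted combination of the observed outputs, and the model also reports a predictive variance. The sketch assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters and a 1-D synthetic sine curve; the function names are hypothetical, and the notebook's kernel and fitting procedure may differ.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential covariance between every row of A and every row of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and latent-function variance of a zero-mean GP (hypothetical helper)."""
    # Training covariance plus a small noise term ("nugget") on the diagonal
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    # Cholesky factor gives a stable way to solve K @ alpha = y
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    # Variance: k(x*, x*) - k*^T K^{-1} k*, computed via v = L^{-1} k*
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, var

# Synthetic 1-D observations (NOT the abalone data)
X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X[:, 0])
X_test = np.array([[2.5], [20.0]])
mean, var = gp_predict(X, y, X_test)
print(mean, var)
```

Near the training data the predicted variance is small, while far from it (x = 20) the mean falls back to the zero prior and the variance returns to the prior value, which is the behavior that distinguishes kriging from a plain curve fit.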