Regularization in Artificial Neural Networks

Neural networks are highly parameterized models and can easily overfit the training data. The most direct way to combat this problem is with regularization techniques. A common technique is early stopping (EarlyStopping), which halts training once validation performance stops improving, so the weights are not updated well past the point of their peak usefulness. EarlyStopping can also be combined with Weight Decay and Dropout, or Weight Constraint can be used instead of Weight Decay to accomplish similar ends. [Read More]
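The patience rule behind early stopping can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Keras implementation: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all assumptions made up for this example.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based epoch indices.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.61, 0.58]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)
```

With these made-up losses, training stops at epoch 4 and the best weights are those from epoch 2. The later improvement at epoch 6 is never seen, which is the trade-off a small patience value makes.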

Hyperparameter Tuning in Neural Networks

Hyperparameter tuning matters more for neural networks than for most other models. Other supervised learning models might have a couple of hyperparameters, but neural networks can have dozens. These can substantially affect the accuracy of our models, and although tuning can be time-consuming, it is a necessary step when working with neural networks. Some of the important hyperparameters to tune are batch_size, training epochs, optimization algorithm, learning rate, momentum, activation functions, dropout regularization, number of neurons in the hidden layers, number of layers, and so on. Hyperparameter tuning... [Read More]
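To see why tuning gets expensive quickly, here is a minimal sketch that enumerates a grid over three of the hyperparameters listed above. The candidate values and the helper name `iter_grid` are assumptions chosen for illustration, not settings from the post.

```python
import itertools

# Hypothetical candidate values for three hyperparameters.
search_space = {
    "batch_size": [32, 64],
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.2, 0.5],
}


def iter_grid(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_grid(search_space))
print(len(configs))  # 2 * 3 * 2 combinations
print(configs[0])
```

Each configuration would need its own training run, so the cost grows multiplicatively with every hyperparameter added to the grid.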

Sketch Classification with Neural Networks

Classification of QuickDraw Dataset

We are going to use TensorFlow Keras to build a sketch classification model with a 3-layer feed-forward neural network (multilayer perceptron). The dataset is a subset of the QuickDraw dataset, sampled down to 10 classes with 10,000 observations per class. We will build a baseline classification model, then run a few experiments with different optimizers and learning rates to benchmark the performance of this simple Neural Network (NN) architecture. [Read More]

Classification of Imbalanced Dataset provided by Bridges to Prosperity (B2P) and FastAPI Framework deployment to AWS Elastic Beanstalk

Bridges to Prosperity (B2P)

About the Organization: Bridges to Prosperity (B2P) works with isolated communities to create access to essential health care, education and economic opportunities by building footbridges over impassable rivers. [Read More]

Building A Subreddit Recommendation Engine Using Machine Learning Techniques

Subreddit Recommender For Posts

Whenever a user submits a post to Reddit, the poster is required to choose a subreddit category. A subreddit is a topic-specific link aggregation page with posts related to that topic. Reddit provides a default list of popular subreddits for the user to submit the post to. If none of those subreddits is appropriate, the user needs to search for a subreddit that is more relevant to the post. In this post we discuss how to use ML models in conjunction with NLP techniques to generate a list of subreddit recommendations based on the post content. More specifically, the input is... [Read More]

Importing Modules In Python

How Python searches for a file: When importing from a Python file (module or script), Python does not use the path written in the file; it uses the file's full name and sys.path to identify the file. The full name is __package__ + __name__. If __package__ is None, it is simply __name__. [Read More]
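The role of sys.path can be checked with a small experiment. The sketch below writes a throwaway module into a temporary directory, adds that directory to sys.path, and imports the module by name. The module name `demo_mod_example` and its contents are hypothetical, created only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway module in a directory Python does not know about yet.
    Path(tmp, "demo_mod_example.py").write_text("VALUE = 42\n")

    # Python can only find the module once its directory is on sys.path.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module("demo_mod_example")
        value = demo.VALUE
        module_name = demo.__name__
        module_package = demo.__package__  # empty string for a top-level module
    finally:
        # Undo the changes so the experiment leaves no trace.
        sys.path.remove(tmp)
        sys.modules.pop("demo_mod_example", None)

print(value, module_name, repr(module_package))
```

The import succeeds only because the directory was placed on sys.path. The module is located by its name, not by any path stored inside the file.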

A Data Science API For Spotify Web Applications

Building A WEB API Service with Python And FastAPI

This article discusses how to use the FastAPI framework to implement a data science API. Running as a microservice, the data science API allows deployment of a machine learning model and provides multiple endpoints for a frontend JavaScript web app to interact with. The data is sent to the frontend in JSON format. This is part of a web application that interacts with the Spotify API. [Read More]

A Comparison of Supervised Multi-class Classification Methods for the Prediction of Forest Cover Types

In this post, common machine learning techniques, such as feature engineering, data transformation, cross-validation, and hyperparameter tuning, are applied to several regression-based and tree-based classifiers. For this comparative analysis, the Covertype dataset from the UCI Machine Learning Repository is used to predict the forest cover type from among 7 categories. This is a single-label, multi-class classification with equally weighted classes. For this work we selected the following multi-class classifiers: Logistic Regression, Ridge Regression, Random Forest, Gradient Boosting, and XGBoost. [Read More]