Project Goal: Use word embeddings to identify company names and stock tickers in natural text.
Assumption: Stock tickers and company names are used in similar contexts in natural text, such as a Reddit post or a tweet.
Under this assumption, word embeddings should be a good fit for identifying these target words, since embeddings are trained on the contexts in which words appear.
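To make the assumption concrete, here is a minimal sketch that uses simple context-window counts as a stand-in for learned embeddings. It is not the method used in the project: real embeddings (word2vec, for example) are dense vectors learned from much larger corpora. The toy posts, the `context_vector` and `cosine` helpers, and the window size are all made up for illustration.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration only.
posts = [
    "i bought shares of tsla this morning",
    "i bought shares of tesla this morning",
    "i sold shares of aapl last week",
    "i sold shares of apple last week",
    "my dog chased the ball this morning",
]


def context_vector(target, sentences, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


v_tsla = context_vector("tsla", posts)
v_tesla = context_vector("tesla", posts)
v_dog = context_vector("dog", posts)

# A ticker and its company name share contexts; an unrelated word does not.
ticker_vs_name = cosine(v_tsla, v_tesla)
ticker_vs_dog = cosine(v_tsla, v_dog)
```

In this toy corpus the ticker and the company name end up with a higher similarity than the ticker and an unrelated word, which is the property the project relies on.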
I have been interested in learning the Chrome extension framework for a long time, so I decided to take a crack at it. I heard Sam Parr talk on a podcast about how he uses RedditList to see what is trending around the web. So I decided to create a new-tab override Chrome extension that displays what's trending around the web in a minimal form. I have also been interested in learning Django, so I chose to build a Django API to handle all the web scraping.
I first needed to…
There are many different implementations of the circular queue, each of which may be better suited to specific applications. This blog post aims to explain how a circular queue works, along with its uses and advantages.
A queue is a simple data structure that implements FIFO (First-In-First-Out) ordering. This simply means that the first item added to your queue is the first one out. Just like a line or queue of customers at the deli, the first customer in line is the first to be served. …
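As a rough illustration of the idea, here is one possible circular queue built on a fixed-size list. It is a sketch of one common variant (reject new items when full), not necessarily the implementation the post walks through; the class and method names are my own.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue stored in a ring buffer."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = [None] * capacity
        self._head = 0  # index of the oldest item
        self._size = 0  # number of items currently stored

    def enqueue(self, item):
        if self._size == len(self._buf):
            raise OverflowError("queue is full")
        # The tail wraps around to the start of the list when it passes the end.
        tail = (self._head + self._size) % len(self._buf)
        self._buf[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue is empty")
        item = self._buf[self._head]
        self._buf[self._head] = None  # release the slot for reuse
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return item

    def __len__(self):
        return self._size


q = CircularQueue(3)
for customer in ["a", "b", "c"]:
    q.enqueue(customer)
first = q.dequeue()  # "a" was first in, so it is first out
q.enqueue("d")       # reuses the slot that "a" just freed
rest = [q.dequeue() for _ in range(len(q))]
```

Because the head and tail indices wrap with the modulo operator, no items are ever shifted, so both `enqueue` and `dequeue` run in constant time.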
Most Hiring Companies, Top Tools & Tech, and More
This is an August 2019 update of my original project where I simply aim to explore the job market for data analysts and data scientists in the Greater Boston Area.
These visuals were produced only from job listings posted on Indeed with the search term ‘data analyst’ or ‘data science’, and therefore only represent companies that chose to post on Indeed. Since this is an update of a previous post, I will only show the visuals here. …
The goal of this post is to share a scripting problem with which I was challenged. Not having much experience with these types of challenges, I thought it would be a great opportunity to share and look for feedback.
Write a script that will convert any .RIS file into a well-formed XML document.
Although I know that other languages are probably better suited to these types of processes…
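For a sense of what such a script involves, here is a minimal Python sketch. It is not the solution from the post, and the output layout (`references` / `reference` / `field` elements) is my own assumption, since the challenge's target schema is not shown here. It assumes the usual RIS line shape of a two-character tag, two spaces, a hyphen, and a value, with each record opened by `TY` and closed by `ER`. Lines that do not match that shape, including wrapped continuation lines, are skipped.

```python
import re
import xml.etree.ElementTree as ET

# Two-character tag, two spaces, a hyphen, then an optional space and the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def ris_to_xml(ris_text):
    """Convert RIS text to an XML string (element names are assumed, not standard)."""
    root = ET.Element("references")
    record = None
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # blank lines and anything that is not a tag line
        tag, value = match.groups()
        if tag == "TY":
            record = ET.SubElement(root, "reference", type=value)
        elif tag == "ER":
            record = None
        elif record is not None:
            field = ET.SubElement(record, "field", tag=tag)
            field.text = value  # ElementTree escapes &, <, > on output
    return ET.tostring(root, encoding="unicode")


sample = """TY  - JOUR
AU  - Smith, J.
TI  - Queues & Stacks
PY  - 2019
ER  - 
"""
xml_out = ris_to_xml(sample)
```

Building the tree with `ElementTree` rather than concatenating strings means special characters such as `&` are escaped automatically, which is what keeps the output well-formed.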
This is a quick walk-through of my first project working with some of the text analysis tools in R. The goal of this project was to explore the basics of text analysis, such as working with corpora, document-term matrices, and sentiment analysis.
I am using the job descriptions from my latest web-scraping project, which consists of about 5300 job postings pulled from Indeed.
We are going to focus on the job descriptions here, as they contain the most text and information. …
Scraping Indeed with rvest | Data Wrangling with Tidyverse | Text Mining with stringr & Tidyverse | Visualization with ggplot2.
In this project, I aimed to explore the job market for data analyst and data scientist roles in Boston. This seemed like a great opportunity to learn about web scraping, so I built a scraper to pull this information from Indeed and explore the data.
For the scraper I decided to use 2 different job titles in 3 different cities, producing 6 different search terms.
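The 2 × 3 combination can be sketched as a Cartesian product. The post's scraper is written in R with rvest; this Python snippet only illustrates how the six searches arise. Boston is the only city named here, so the other two city values are placeholders.

```python
from itertools import product

job_titles = ["data analyst", "data scientist"]
# Only Boston is named in the post; the other two entries are placeholders.
cities = ["Boston", "City B", "City C"]

# One search per (title, city) pair: 2 titles x 3 cities = 6 searches.
searches = [{"title": title, "city": city} for title, city in product(job_titles, cities)]
```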
In this project, I aimed to practice different hypothesis tests in R while exploring data from the 2017 MLB season. I will briefly walk through the data exploration and cleaning, but will focus on the statistical tests. After exploring the data, I chose the following questions to ask, leading to four different statistical tests.
A cheat-sheet walk through
Tidyverse is a collection of R packages designed to work together to help users stay organized and efficient throughout their data science projects. The core of Tidyverse consists of the following 8 packages:
1. readr: for data import.
2. tidyr: for data tidying.
3. tibble: for tibbles, a modern re-imagining of data frames.
4. dplyr: for data manipulation.
5. stringr: for strings.
6. ggplot2: for data visualisation.
7. purrr: for functional programming.
8. forcats: for dealing with factors.
See more on the tidyverse site.
Tidyverse and RStudio have put out extremely helpful…