Working with Messy Data | Assume Wisely
Working with Messy Data

Posted: April 28, 2018

This week I spent some time thinking about what data I want in my data warehouse. A big piece of what I want is a bird's-eye view of what my social networks are talking about. I’d like to see a pie chart showing what percentage of the conversation relates to sports, news, politics, Netflix, or just life in general. I’d like to be able to drill down into the conversations where I might add my two cents. I’ve tried listening on social media with Hootsuite, but I didn’t get very far and I wasted a lot of time. For now, my first data warehouse is going to focus on social media.

That requires using APIs, web scraping, and NLP (natural language processing) for starters.

Also, this week at work I started a data mining project which involves NLP.

Time to dust off my NLP essentials course. I thought my first real Python-based project would be similar to the Titanic dataset I worked with. Oh well. There goes that.

NLP Essentials

Natural language processing deals with data that isn't necessarily in neat columns and rows, such as emails, reviews, and social media posts. Before you can analyze text like this, you need to be able to parse through the tags and HTML.
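To make "parse through the tags and HTML" concrete, here is a minimal sketch using only Python's standard library. It is not the code from my notebook. The names `TextExtractor`, `strip_html`, `tokenize`, and the sample post are all made up for illustration, and it assumes the markup is simple (it would also keep the contents of script or style tags, which a real scraper should skip).

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; become &
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text found between tags
        self.chunks.append(data)


def strip_html(raw):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    # Joining with a space keeps words in neighboring tags from running together
    text = " ".join(parser.chunks)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Made-up example post, not real data
sample_post = "<p>Loved the <b>game</b> last night &amp; the new Netflix show!</p>"
clean = strip_html(sample_post)
tokens = tokenize(clean)
print(clean)
print(tokens)
```

Once a post is reduced to tokens like this, it is in a shape that counting or classification steps can work with. For messier pages, a dedicated HTML library would likely be a better fit than this bare-bones approach.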

Below is some code that I want to hold onto for later. I've tried several strategies for posting my notebook code into WordPress. This is my first attempt. It's not pretty. It's an image that links out to a Kaggle Kernel. You can fork it there.

Small edit: I am just linking out to Kaggle.

Git Sum (un)common sense,



© 2018 · Rho Lall