When was the last time someone wrote you a custom HTML letter? ;)
I am excited to show you what we do with the data!
I was intrigued by your challenge, and kept going further and further down the rabbit hole. You gave me at least three Key Performance Indicators (tweets, Facebook shares, and OkDork comments), and I ran at least three different experiments for each KPI. It took somewhat longer than just pressing the "GO" button, because we expanded your data with new metrics.
If you don't feel like reading you may jump down to the graphs, but I am going to start with a summary.
I think we have an interesting story here, but I would love to repeat the analysis for a much larger set of posts. The textual analysis will get crisper.
The data you gave me did not have sufficient information about the post popularity - we could create predictive models, but they all had poor accuracy (around 60% correlation with the real shares, which is weak).
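For readers who want to see what a "correlation with the real shares" measures, here is a minimal sketch. It assumes the figure is a Pearson correlation between actual and predicted share counts (the letter does not say which correlation was used), and the numbers are made up for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical share counts, NOT the OkDork data.
actual_shares = [120, 80, 300, 45, 900]
predicted_shares = [150, 90, 260, 60, 400]
print(round(pearson(actual_shares, predicted_shares), 2))
```

A value near 1 means the predictions rise and fall with the real shares; a value around 0.6 leaves a lot of the variation unexplained.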
We first added to your data a few additional metrics on the post stats:
Then we ran the experiments to predict the number of Facebook and Twitter shares, allowing the following metrics as inputs: numbers in the headline, headline length, word count of the post, number of images, whether or not there is a video, weekday, and words in the headline.
The weekday really rocked it, because we got models with 83-85% correlation (you can play with the posts below). All was cool, but the models were not capturing the viral posts (with a gigantic number of shares). You will see it in the graphs. For both Facebook and Twitter shares, the models predict most posts reasonably well, but fail to see the 'viral' posts.
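Several of the metrics above can be derived directly from a headline and a publication date. The sketch below is only an illustration of that idea: the function name `headline_features`, the example headline, and the example date are all hypothetical, and it is not the pipeline that was actually used.

```python
import re
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def headline_features(headline, published):
    """Derive simple per-post metrics from a headline and its publication date."""
    return {
        "headline_length": len(headline),                 # length in characters
        "has_number": bool(re.search(r"\d", headline)),   # any digit in the headline
        "headline_words": re.findall(r"[a-z']+", headline.lower()),
        "weekday": DAY_NAMES[published.weekday()],        # Monday is index 0
    }

# Hypothetical post, not taken from the OkDork data.
features = headline_features("7 Ways to Grow Your Blog", date(2014, 1, 1))
print(features)
```

Word count, image count, and the video flag would come from the post body itself, which this sketch does not parse.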
The reason for this is likely that the actual content of the posts was related to their epic popularity (I know - no rocket science). So, we decided to dig a little deeper.
These metrics were added to your data in the next iteration:
All these extra metrics were added to the table and fed to DataStories again. The results were MUCH better. We could more accurately predict the popularity of all posts, but the 'epic' posts were still under-predicted.
It helps to NOT BE NEUTRAL. Only two highly popular posts are neutral: "A (Proven) Freelancer’s Guide to Growing Your Business" and "Are things happening to you or are you making things happen?" And these two posts were still hard to predict accurately.
The only thing I don't like is that the usage of the word "two" was labeled as important (the fewer times you use it, the better for shares). This is an artefact of the small sample of posts that we have. I will remodel everything excluding the words "one" and "two", but thought I would show you this for now. If we repeat the process on a larger set, the word selection will be crisper.
Machine learning rules!
The rules for a higher number of Twitter shares are quite simple:
The actual number of posts per week day in 2014 was the following: 7 posts on Mondays, 12 on Tuesdays, 10 on Wednesdays, one on a Thursday, two on Fridays, 6 on Saturdays, and three on Sundays.
It looks like Noah's favorite days to post were Tuesday and Wednesday, but we see that all posts with more than 500 Twitter shares were published on Monday and Tuesday (with one exception).
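The weekday counts above add up to 41 posts. As a small sketch, here is how each day's share of the total can be tabulated from those counts; only the counts come from the letter, the variable names are mine.

```python
# Posts per weekday in 2014, as listed above.
posts_per_weekday = {
    "Monday": 7,
    "Tuesday": 12,
    "Wednesday": 10,
    "Thursday": 1,
    "Friday": 2,
    "Saturday": 6,
    "Sunday": 3,
}

total_posts = sum(posts_per_weekday.values())
busiest_day = max(posts_per_weekday, key=posts_per_weekday.get)

# Print each day's count and its share of all posts.
for day, count in posts_per_weekday.items():
    print(f"{day:<9} {count:>2}  {count / total_posts:.0%}")
```

Tuesday and Wednesday together account for 22 of the 41 posts, which is why they look like the favorite days.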
Another great way of looking at the data is sorting the rows of the table by the factor of interest.
OkDork published five posts with a book review in 2014. They were much shorter on average (740 words vs. an average of 2400 words for other posts). All of them had precisely one image. The average headline length was very similar to that of other posts - 47 characters. However, the book reviews got considerably fewer shares and comments.
Let's hope Noah will keep doing them despite the low share volume, because they are great!
We pulled in the text from all post pages and applied some of the simplest natural language processing algorithms to it.
We stripped the pages down to the contents of the posts, removed all stop words like "is", "the", "a", "are", etc., and converted each word to its infinitive form or nominative case. This gave us 5600 unique words. Then we selected the 3000 most frequent ones and first tried to understand which distinct topics represent the overall content of OkDork posts in 2014.
You may consider the topics as the shortest possible summary of all posts of 2014.
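The preprocessing described above (dropping stop words, then keeping only the most frequent words) can be sketched in a few lines. This is a simplified illustration under my own assumptions: the stop-word list is tiny, the example texts are invented, and the step that reduces words to their base forms is left out because it needs a proper NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"is", "the", "a", "are", "and", "to", "of", "in"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

def build_vocabulary(posts, size):
    """Return the `size` most frequent non-stop words across all posts."""
    counts = Counter()
    for text in posts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(size)]

# Invented example texts, not real OkDork posts.
posts = [
    "The taco is a lesson in marketing.",
    "Email marketing and the art of the taco.",
]
vocabulary = build_vocabulary(posts, size=3)
print(vocabulary)
```

With the real posts, `size` would be 3000, matching the number of most frequent words kept in the analysis.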
The topics are great, but they do not give us a clear way to segment the posts. So, we took all posts and the frequencies of all unique words in them and clustered the posts into six groups.
We named the resulting segments A, B, C, D, E, and F and calculated the average stats for them. The results are interesting!
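To give a feel for how posts can be grouped by their word frequencies, here is a minimal sketch. The letter does not say which clustering algorithm was used, so plain k-means is assumed purely for illustration, and the word-count vectors are invented toy data.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct posts
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every post to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its assigned posts.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels

# Toy word counts per post over a hypothetical vocabulary: [taco, email, ad].
post_vectors = [[5, 0, 0], [4, 1, 0], [0, 6, 1], [0, 5, 0]]
labels = kmeans(post_vectors, k=2)
print(labels)
```

On the real data each vector would hold the frequencies of the selected words for one post, and `k` would be the number of segments.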
| Name | Tweets | Top 10 Words |
| A | 30 | taco, shirt, minute, learned, favorite, lesson, started, get, poster, didnt |
| B | 103 | email, marketing, get, new, subscriber, okdork, want, post, reader, business |
| C | 117 | ad, car, retargeting, campaign, advertising, step, targeting, facebook, target, video |
| D | 361 | book, people, thing, business, time, yogurt, sugar, fat, review, take |
| E | 1441 | content, post, blog, headline, page, linkedin, share, social, article, kissmetrics |
| F | 1554 | email, client, job, people, hiring, work, noah, designer, freelancer, person |
It looks like the OkDork followers love everything Noah publishes equally. The top segments for them in terms of the number of comments are lessons & tacos (A), content & headlines (E), and hiring & freelancers (F).
Social media, however, really focuses on the last two segments E & F only!