A team of researchers from IBM has found that a user’s personality type may be accurately inferred from social media posts using much smaller sample sizes than previously thought.

The researchers published their experiment and results in a paper entitled “25 Tweets to Know You: A New Model to Predict Personality with Social Media.”

Using an innovative analysis framework, the researchers discovered that they can accurately predict a person’s Big-5 personality traits from as few as 25 separate posts on Twitter, where each post is limited to 140 characters. This is a significant reduction in the amount of information required to accurately predict personality traits compared with previous experiments.

The new framework analyzes tweets using a combination of two techniques: word embedding, which converts the words used in tweets into a representative vector, and Gaussian Processes, which build a non-linear model that uses those vectors to relate real-life tweets to personality traits.
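To make the two-step idea concrete, here is a minimal sketch in Python. It is not the IBM team's implementation: the tiny two-dimensional word vectors, the averaging step, the RBF kernel, its length scale, the noise level and the example trait scores are all assumptions chosen for illustration. It only shows the general shape of the approach, which is to turn a user's tweets into one vector and then fit a Gaussian Process regression from vectors to a trait score.

```python
import math

# Toy 2-d word embeddings. These values are made up for illustration;
# a real system would use vectors learned from a large text corpus.
EMBEDDINGS = {
    "party": [0.9, 0.1],
    "friends": [0.8, 0.2],
    "book": [0.1, 0.9],
    "quiet": [0.2, 0.8],
}

def embed_user(tweets):
    """Average the vectors of every known word across a user's tweets."""
    vectors = [EMBEDDINGS[w] for t in tweets
               for w in t.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel (an assumed choice of kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_fit(X, y, noise=1e-2):
    """Precompute the weights of the GP posterior mean (constant prior mean)."""
    mean = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - mean for v in y])
    return X, alpha, mean

def gp_predict(model, x):
    """Posterior mean of the GP at a new user vector x."""
    X, alpha, mean = model
    return mean + sum(a * rbf(x, xi) for a, xi in zip(alpha, X))

# Hypothetical training users: (tweets, self-reported extraversion score).
train = [
    (["party friends party"], 4.5),
    (["quiet book"], 1.5),
    (["friends book"], 3.0),
]
model = gp_fit([embed_user(t) for t, _ in train], [s for _, s in train])

outgoing = gp_predict(model, embed_user(["party friends"]))
reserved = gp_predict(model, embed_user(["book quiet"]))
print(outgoing, reserved)
```

In this toy setup the user who tweets about parties and friends receives a higher predicted extraversion score than the user who tweets about books. The Gaussian Process also allows the relationship between vectors and scores to be non-linear, which is the property the article attributes to the framework.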

The Big-5 personality traits, recognized by modern psychologists as the five basic indicators of overall personality, are extraversion, agreeableness, openness, conscientiousness and neuroticism.

Based on an evaluation of more than 1,300 Twitter users, the research team found that they were able to predict these five traits more accurately than previous attempts while using one-eighth as much data.

For the experiment, the team conducted a survey on Twitter to collect self-reported personality ratings, using a 50-item form to assess Big-5 traits. The respondents also agreed to allow the researchers to access their public tweets. The team recruited 1,323 participants with 200 original tweets each (retweets excluded), which allowed the researchers to determine the optimal number of tweets required for accurate prediction.

The researchers found that their model’s Big-5 prediction accuracy approached its peak with only 25 tweets, and that accuracy leveled off as the sample size increased from 25 to 200.
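The idea of finding the point where accuracy levels off can be sketched as a simple loop: limit each user to their first k tweets, predict a trait score, and compare the predictions with the self-reported scores. The sketch below assumes Pearson correlation as the accuracy measure and uses a hypothetical stand-in predictor (`toy_predict`) with made-up users. None of these details come from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def accuracy_by_sample_size(users, sizes, predict):
    """For each k in sizes, score predictions made from only the first k tweets.

    users   -- list of (tweets, self_reported_score) pairs
    predict -- any callable mapping a list of tweets to a predicted score
    """
    reported = [score for _, score in users]
    results = {}
    for k in sizes:
        predicted = [predict(tweets[:k]) for tweets, _ in users]
        results[k] = pearson(predicted, reported)
    return results

# Hypothetical stand-in model: the share of tweets that mention "party".
def toy_predict(tweets):
    return sum("party" in t for t in tweets) / len(tweets)

# Made-up users with self-reported extraversion scores.
users = [
    (["party", "party", "party", "book"], 4.0),
    (["book", "party", "book", "book"], 2.0),
    (["book", "book", "book", "book"], 1.0),
]
curve = accuracy_by_sample_size(users, [1, 4], toy_predict)
print(curve)
```

With real data, plotting the result for sample sizes between 1 and 200 would show where adding more tweets stops improving the correlation.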

There has been increasing interest in using social media posts for research. Hundreds of millions of users voluntarily post to social media sites every day, creating a rich trove of data that can be mined for research purposes. However, prior work that correlated social media posts with personality traits tended to use far larger samples and produced less accurate results.

The IBM research team found that combining word embedding, which creates the vectors, with Gaussian Processes, which analyze them, allowed for far more accurate results using far less data.

The research team made this personality prediction model available as an API on bluemix.com.