Personality Detection and Prediction Using Natural Language Processing

6 min read · Jun 25, 2021

Personality affects every aspect of human life. It not only predicts and describes an individual’s behaviour but also encompasses the way they think and feel, and it influences their motives, preferences, emotions and even health. Social networking platforms such as Facebook and Twitter have become an increasingly popular medium for individuals to share their thoughts and emotions, as well as a forum for opinions and sentiments about current or past news and events. The way an individual presents themselves online reflects their attitude, behaviour and personality. Some psychologists argue that there is a clear connection between a person’s personality or temperament and the way they behave online through likes, tweets or comments. However, the association between personality and computational behaviour is not yet fully understood.

The importance of personality recognition on social networks is evident from researchers’ recent attention to the development of automatic personality recognition systems. Classifying personality from the content of a user’s profile is a critical and difficult task because of the highly complex stylistic characteristics of blogs, posts and other creative writing, as well as likes, dislikes, comments and profile pictures. Various models for personality trait classification have been proposed. Some cyber personality-related studies indicate that psychologists employ several definitions of an individual’s personality, because personality describes how humans behave in response to different factors, including feelings, thoughts and emotions. Two of the most widely used personality models are the Myers-Briggs Type Indicator (MBTI) and the Big Five Factor Personality Model.

Myers-Briggs Type Indicator (MBTI) model

The MBTI was developed by Isabel Briggs Myers in the 1940s to make the theory of psychological types described by C. G. Jung (founder of analytical psychology) understandable and useful to the general public. Based on their answers to the questions on the inventory, people are identified as having one of 16 personality types. The goal of the MBTI is to allow respondents to explore and understand their personalities further, including likes, dislikes, strengths, weaknesses, possible career preferences and compatibility with other people.

Source: https://blog.adioma.com/16-personality-types/

Big Five model

A user’s personality can be compared against standard personality tests in what is known as the automated classification of personality. Probably the most standardized and important personality test, the Big Five uses five factors (openness, conscientiousness, extraversion, agreeableness and neuroticism) to describe personality and human psychology. This model is the personality theory most widely accepted by psychologists today. Evidence for this theory has been growing for many years, beginning with the research of D. W. Fiske (1949) and later expanded upon by other researchers including Norman (1967), Smith (1967), Goldberg (1981), and McCrae & Costa (1987).

Source: https://blog.adioma.com/5-personality-traits-infographic/

Recent controversies about the replicability of behavioural research based on statistical inference have sparked interest in developing more efficient techniques for analyzing the results of psychological experiments. Accordingly, automatic prediction of personality traits has received a lot of attention from academia, research and industry. In particular, personality trait prediction from multimodal data has emerged as a hot topic within the field of affective computing.

Deep learning is a sub-field of machine learning and is also known as hierarchical learning, deep machine learning and deep structured learning. In its simplest form, one set of neurons receives an input signal and another set sends an output signal. Models based on deep learning can facilitate tasks including computer vision, speech recognition, automatic handwriting generation and natural language processing (NLP). The complex and dynamic nature of human behaviour in cyber environments can be studied using NLP-powered textual analysis, because NLP is able to extract significant local and global features automatically and identify misinformation. Because of their learning capacity, deep learning-based neural network models are particularly effective for detecting personality traits.

Every word used in textual content carries a significant amount of sentimental value. The combined sentimental representation of the words in a given text ultimately decides its overall expression in terms of positivity, negativity, anger, anticipation, disgust, fear, joy, sadness, surprise, trust or any other emotion under consideration. Such representations can be used to determine the personality type of the person who expressed those thoughts. Ultimately, the personality of a particular individual is selected from the overall classification of a large amount of their data.
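As a rough illustration of how word-level emotions can be combined into an overall profile for a text, here is a minimal Python sketch. The tiny `TOY_LEXICON` dictionary and the `emotion_profile` function are hypothetical names made up for this example; a real system would rely on a published emotion lexicon and far more text.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping words to the emotions they evoke.
# This hand-written dictionary is an assumption for illustration only.
TOY_LEXICON = {
    "love": ["joy", "trust"],
    "happy": ["joy"],
    "afraid": ["fear"],
    "hate": ["anger", "disgust"],
    "wonderful": ["joy", "surprise"],
}

def emotion_profile(text):
    """Count how often each emotion is evoked by the words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    profile = Counter()
    for word in words:
        # Words missing from the lexicon contribute nothing.
        profile.update(TOY_LEXICON.get(word, []))
    return profile

profile = emotion_profile("I love this wonderful place, but I am afraid of leaving.")
print(profile.most_common())
```

A profile like this, built over many posts by the same person, is the kind of aggregate signal a personality classifier could use as one of its inputs.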

The first step of the personality prediction process is data processing. Datasets are usually collected from social media platforms, so they contain a lot of slang and words that cannot be used as meaningful semantic features for identifying one’s personality. The data also contains punctuation marks and emoticons that have to be removed. The dataset therefore goes through rounds of cleaning in which redundant punctuation, emoticons and “stop words” such as ‘a’ and ‘the’ are removed. This can be achieved using regular expressions in Python and the NLTK (Natural Language Toolkit) text processing library. Raw text cannot be fed directly to a machine learning model, so meaningful features need to be extracted from it in order to assess a person’s personality type. This step is known as feature extraction.
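The cleaning step can be sketched roughly as follows. This version uses only Python’s built-in `re` module and a small hand-written stop-word set so that it runs without downloads; the `STOP_WORDS` set and the `clean_post` function are assumptions made for this example, and in practice a fuller list such as NLTK’s English stop words would likely replace the toy set.

```python
import re

# Small illustrative stop-word set (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "and", "or",
              "to", "of", "in", "on", "i", "it", "so"}

def clean_post(text):
    """Lowercase a post, strip links, punctuation and emoticons, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]

tokens = clean_post("I am SO excited :) Check it out: https://example.com !!! The party is on 😀")
print(tokens)  # ['excited', 'check', 'out', 'party']
```

Keeping only letters is a blunt choice: it removes emoticons and punctuation in one pass, but it also discards digits and non-English characters, which may or may not be acceptable for a given dataset.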

In order to classify personality based on words, a weight needs to be assigned to each word. This is done by a process called vectorizing. The unstructured text is converted into vectors (numerical values) based on word frequency and weight using the Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) Vectorizer models. These are two of the most widely used vectorizing techniques in natural language processing. This step is followed by building the model and then training, testing and evaluating it to predict the personality of an individual.
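To show what these two vectorizers compute, here is a hand-rolled sketch in plain Python. It uses the textbook TF-IDF weight, count × log(N / document frequency); the function names `count_vectors` and `tfidf_vectors` and the sample documents are made up for this example. Library implementations such as scikit-learn’s `CountVectorizer` and `TfidfVectorizer` follow the same idea, but `TfidfVectorizer` applies smoothing and normalisation by default, so its numbers will differ from the ones below.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Return (vocabulary, vectors); each vector holds raw term counts."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tfidf_vectors(docs):
    """Weight raw counts by the textbook IDF, log(N / document frequency)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Number of documents in which each vocabulary term appears.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted

# Already-tokenized toy posts (illustrative data only).
docs = [
    ["love", "quiet", "evenings", "reading"],
    ["love", "loud", "parties", "love", "people"],
    ["reading", "people", "quiet"],
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts[1])
print(tfidf[1])
```

In the first post, "evenings" appears in only one document and so receives a higher TF-IDF weight than "love", which appears in two. This is the sense in which TF-IDF favours words that distinguish one author’s text from another’s.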

Ethical concerns behind the studies associated with personality prediction

Because the data is readily available, most personality detection studies are now conducted using social media profiles, shared content and responses to public content shared by other people. However, apart from the question of whether social media data is sufficient for training reliable personality models, these approaches raise ethical concerns, given that the data comes from social media profiles, which are considered highly private. The biggest issue is the misuse of data without the knowledge or consent of users. Especially after the infamous Cambridge Analytica scandal, the issue has gained more attention and awareness, and public opinion on digital targeting and modelling has deteriorated drastically. Alongside the consent problem, algorithmic bias is another issue in this context, making certain groups susceptible to the negative effects of psychological targeting.

Several studies have suggested that detecting the private characteristics of users can lead to bias and put certain groups in unfair situations. For instance, an algorithm that detects a Neurotic user might recommend jobs that involve no human interaction, ignoring the user’s educational and occupational background, only because the data indicates that Neurotic people do not engage in social interactions as much. The use of personality detection for psychographic targeting in advertising and marketing also raises ethical concerns. It enables companies to target their consumer segments easily on a large scale and increase their sales and conversion rates. However, psychographic targeting can be particularly harmful to vulnerable groups who engage in risky behaviours, for example when people with a gambling addiction are targeted with online gambling advertising. At the same time, when handled properly, the same targeting approach could make consumers more satisfied by directing them to spend their money only on products that are compatible with their personality, and by helping them avoid unnecessary and impulsive purchases. Hence, like any research, personality detection can be used for both acceptable and harmful purposes.

Written By — Moksha Thisarani

Sinhala Translator — Rumeshika Pallewela, Roshinie Jayasundara
Tamil Translator — Mohamed Izad

Written by YGSL
