We can observe many machine learning applications in our daily lives. One of the most significant is classifying individuals based on their personality traits. For example, the availability of large, high-dimensional datasets has paved the way for more effective marketing campaigns that target specific people. Such personality-based communication is highly effective in increasing the popularity and attractiveness of products and services. This leads to increased usage, higher customer satisfaction, and broader acceptance among users.
Some common examples of how personality-based approaches are used in machine learning include:
It is clear that a person's personality plays a significant role in their interactions. According to reports, companies often request social profiles from job candidates in their hiring forms to gain insight into their personalities and assign them tasks that they are well-suited for. This not only helps companies select suitable candidates but also increases their efficiency. Isn't this amazing?
So let’s start without any further delay!
The Big Five Personality Trait model, also known as the OCEAN model, is a widely used framework for assessing personality in psychology. It summarizes a person's overall character along five dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.
The availability of high-dimensional and fine-grained data about human behavior has made researching and observing human behavior much easier. For example, mobile sensing studies and data collected from daily activities have greatly impacted how psychologists conduct research and administer personality assessments.
In this direction, machine learning models have the potential to revolutionize research and assessment in personality psychology. Algorithms can handle large datasets with thousands of attributes, and many of them cope with collinearity better than classical regression models. Additionally, ML algorithms are highly efficient at recognizing patterns in datasets that humans may not be able to detect. These ML models can lead to more accurate, objective, and automated personality assessments.
Another example would be social media, where people express their likes, thoughts, feelings, and opinions. Machine learning models have been effectively using this data to predict individuals' "Big Five" (OCEAN) personality traits. Various supervised machine learning algorithms, such as Naïve Bayes and Support Vector Machines, are widely used in industries to predict personality traits. Additionally, researchers have recently started applying unsupervised learning methods to identify other psychological constructs in digital data.
In recent years, social media platforms like Facebook, Twitter, Instagram, and LinkedIn have grown in popularity among internet users. These platforms provide a valuable opportunity for researchers to study and understand individuals' online behaviors, preferences, and personalities. Different personalities are associated with different social interactions and behavior patterns on social media, such as status updates and preferences.
The figure below describes how to predict the personality traits of Facebook users based on different features and measures of the Big Five model.
Now that we have a basic understanding of personality traits and their use cases, let's dive deeper into predicting the Big Five personality traits.
This section predicts the Big Five personality traits using a dataset of 1,015,342 questionnaire answers collected online by Open Psychometrics. Let’s look at how the dataset actually appears. The code below loads the file, keeps the answer columns along with the country column, and prints the number of participants (1015341).
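import pandas as pd

# Load the tab-separated questionnaire file and work on a copy
data_raw = pd.read_csv("data-final.csv", sep='\t')
data = data_raw.copy()
pd.options.display.max_columns = 150
# Drop columns 50-106, which sit between the 50 answer columns and 'country'
data.drop(data.columns[50:107], axis=1, inplace=True)
# Drop everything after column 50 so only the answers and 'country' remain
data.drop(data.columns[51:], axis=1, inplace=True)
print('Number of participants: ', len(data))
data.head()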
Dataset: Consists of 110 columns.
Specific questions are asked for each personality trait, and participants must choose an answer between 1 and 5. The scale is labeled 1 = Disagree, 3 = Neutral, and 5 = Agree. In the column names, EXT corresponds to Extroversion, EST to Emotional Stability (Neuroticism), AGR to Agreeableness, CSN to Conscientiousness, and OPN to Openness.
Let’s look at how the answers to the questions for each personality trait are distributed. Here we show the frequency distribution of the answers for the Extroversion and Conscientiousness questions.
Conscientious Personality
Extroversion Personality
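The frequency distributions shown in these figures can be computed with a few lines of pandas. The sketch below is only an illustration: it runs on randomly generated toy answers rather than the real file, and the helper name answer_frequencies is hypothetical. It assumes the question columns are named EXT1 to EXT10, following the prefix convention described above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the questionnaire data (NOT the real dataset):
# 200 hypothetical respondents, 10 Extroversion questions, answers from 1 to 5.
rng = np.random.default_rng(0)
ext_cols = [f"EXT{i}" for i in range(1, 11)]
toy = pd.DataFrame(rng.integers(1, 6, size=(200, 10)), columns=ext_cols)


def answer_frequencies(df, prefix):
    """Count how often each answer (1-5) was chosen for every question of one trait."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    counts = {
        c: df[c].value_counts().reindex(range(1, 6), fill_value=0) for c in cols
    }
    # Rows are the answer values 1..5, columns are the questions.
    return pd.DataFrame(counts)


ext_freq = answer_frequencies(toy, "EXT")
print(ext_freq)
```

Calling ext_freq.plot(kind="bar", subplots=True) on the result is one way to turn the table into per-question bar charts.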
We need to scale the data to the range 0–1 using MinMaxScaler. Scaling helps optimize the model’s performance and generates better results.
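As a minimal sketch of this scaling step, the example below applies scikit-learn's MinMaxScaler to a tiny hand-made table. The three columns and their values are made up for illustration; on the real data the same call would be applied to the 50 answer columns.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny illustrative table (made-up answers on the 1-5 scale).
toy = pd.DataFrame(
    {"EXT1": [1, 3, 5, 4], "EXT2": [2, 2, 5, 1], "AGR1": [5, 4, 1, 3]}
)

# Rescale every column independently so its minimum maps to 0 and its maximum to 1.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = pd.DataFrame(
    scaler.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(scaled)
```

Because MinMaxScaler works column by column, a question whose answers happen to span only part of the 1–5 range is still stretched to the full 0–1 interval.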
Now we have our data in the desired format. So let’s take a step ahead and get our hands dirty by forming five clusters, where each cluster corresponds to one personality trait from the OCEAN model. For this problem, we are using the K-means clustering algorithm. After performing clustering, we have our results:
Cluster Distribution
from sklearn.cluster import KMeans
# Cluster on the answers only; drop rows with missing answers, if any, since KMeans cannot handle NaN
df_model = data.drop('country', axis=1).dropna()
# Define 5 clusters and fit the model
kmeans = KMeans(n_clusters=5)
k_fit = kmeans.fit(df_model)
# Predicting the clusters
pd.options.display.max_columns = 10
# labels_ holds the cluster label of each point
predictions = k_fit.labels_
df_model['Clusters'] = predictions
df_model.head(10)
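Once every participant carries a cluster label, a quick sanity check is to count how many points fall into each cluster and to look at the average answers per cluster. The sketch below does this on random toy data, not the real dataset. The column names follow the trait prefixes, and the n_init and random_state arguments are my additions to make the run reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy data (NOT the real dataset): 300 hypothetical respondents,
# two questions per trait, answers on the 1-5 scale.
rng = np.random.default_rng(42)
cols = [f"{t}{i}" for t in ("EXT", "EST", "AGR", "CSN", "OPN") for i in range(1, 3)]
toy = pd.DataFrame(rng.integers(1, 6, size=(300, len(cols))), columns=cols)

# Fit K-means with five clusters and attach the label of each point.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(toy)
toy["Clusters"] = kmeans.labels_

# Number of respondents per cluster.
cluster_sizes = toy["Clusters"].value_counts().sort_index()
# Average answer to every question inside each cluster.
cluster_means = toy.groupby("Clusters").mean()

print(cluster_sizes)
print(cluster_means.round(2))
```

Comparing the rows of cluster_means is a simple way to see which questions separate the clusters from each other.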
We can use the PCA algorithm for dimensionality reduction to visualize our final results. After performing PCA, we have:
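import matplotlib.pyplot as plt
import seaborn as sns
# df_pca holds the two principal components (PCA1, PCA2) and the cluster label of each point
plt.figure(figsize=(10,10))
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set2', alpha=0.9)
plt.title('Personality Clusters after PCA')
plt.show()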
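The plot above expects a DataFrame named df_pca with the columns PCA1, PCA2, and Clusters, but the step that builds it is not shown. One possible way to construct such a frame is sketched below. It is an assumption about that step and not necessarily what the original project does, and it uses random toy data with placeholder column names Q1 to Q10.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy feature table (NOT the real dataset): 300 respondents, 10 questions.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.integers(1, 6, size=(300, 10)).astype(float),
    columns=[f"Q{i}" for i in range(1, 11)],
)

# Cluster labels for each respondent (n_init and random_state added for reproducibility).
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)

# Project the 10-dimensional answers onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(features)

# Assemble the frame used for plotting: two components plus the cluster label.
df_pca = pd.DataFrame(components, columns=["PCA1", "PCA2"])
df_pca["Clusters"] = labels

print(df_pca.head())
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Note that PCA is fitted on the answer columns only. The cluster label is attached afterwards so that it colors the points without influencing the projection.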
To evaluate the model's performance, we use the reconstruction error. PCA projects the points into a low-dimensional space, and the original points are then reconstructed by projecting the low-dimensional representations back into the high-dimensional space. The smaller the distance between the reconstructions and the original points, the better the model captures the structure present in the data. The reconstruction error can also be used to compute the R² score, another measure of performance.
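The reconstruction procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, where 10 observed columns are generated from 2 hidden factors plus noise, not the evaluation code of the project. Using the mean squared distance as the error and the variance-weighted R² are my assumptions about one reasonable way to compute the two scores.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import r2_score

# Synthetic data with low-dimensional structure (NOT the real dataset):
# 2 latent factors drive 10 observed columns, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))

# Project into 2 dimensions, then map back to the original 10 dimensions.
pca = PCA(n_components=2).fit(X)
X_low = pca.transform(X)
X_rec = pca.inverse_transform(X_low)

# Reconstruction error: mean squared distance between original and reconstructed points.
reconstruction_error = np.mean(np.sum((X - X_rec) ** 2, axis=1))

# R² of the reconstruction, with each column weighted by its variance.
r2 = r2_score(X, X_rec, multioutput="variance_weighted")

print("Reconstruction error:", reconstruction_error)
print("R2 score:", r2)
```

With the variance-weighted option, the R² score matches the fraction of total variance retained by the kept components, so it can be cross-checked against pca.explained_variance_ratio_.sum().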
The complete code for personality prediction project can be found here.
Since the start of the century, Japan has seen a decreasing trend in its birth rate. This is attributed to a drop in the number of annual marriages from 800,000 in 2000 to 600,000 in 2019. Finding the right partner is difficult, even in the COVID-19 era when almost everything has moved online and become virtual. Hence, to counter the declining birth rate and help people find their eternal love, Japan’s government is leveraging Artificial Intelligence and Machine Learning so that people can get married and start their families.
However, Japan’s Cabinet believes that current dating services are not advanced enough to find the perfect match. They have relied on preferences such as age, income, and educational level filled in by the users. Hence, the Japanese government sought the help of Artificial Intelligence to find the perfect match based on more hidden patterns.
The new AI and ML-based dating systems have shown excellent results by focusing on individuals’ values and personalities. With this more personalized approach, rather than matching merely on age, income, and education level, the matched pair has a higher probability of getting married. The government also pays two-thirds of the operating costs of the new and improved AI dating systems to support such services.
Currently, Japan’s Cabinet Office is also seeking approval of two billion yen in the budget for the new and advanced AI-enabled dating service.
The usage of Machine Learning methods in psychological research is expected to increase sharply soon. Personalization is the key to businesses expanding and offering customer-oriented services. Similarly, personalization provides better options and gives better opportunities to individuals based on their personalities. Machine Learning has excellent potential in determining personality traits, which can be further used for self-monitoring and for businesses to hire employees based on their personality criteria.
Next Blog: Customer segmentation using hierarchical clustering
Enjoy Learning, Enjoy Algorithms!