Editor Note: This academic essay is available at ResearchGate
Nearly three decades ago, I started working in contact centres, after several years in industrial/labourer roles, and was asked to undertake a series of unfamiliar assessments.
I’d experienced tests to prove my role-based skills – such as interpreting warehouse documents – but I’d never been asked about word preferences or to agree/disagree with statements that felt in no way connected to the role of answering retail service calls.
These assessments are popular: 75-80% of FTSE500 organisations utilise personality tests to predict performance and for talent acquisition. Multiple non-commercial tests are available online, where users can “reveal who you really are” and “find your strengths” (CIPD, 2021; see also Independent, 2019; Salako, 2021; Truity, n.d.).
I was informed that my training would be tailored to my reported preferences. Perhaps this was its intention: rather than being a deciding datum in employment decisions, the assessment might be used to adapt teaching methods (Moyle & Hackston, 2018; Randall et al., 2017).
I remember worrying that I might be giving incorrect answers.
Surely self-reported surveys can be faked? Perhaps I could manipulate answers to reflect the type of candidate I thought they wanted to employ (Carter, 2016; Furnham, 1990; Martin et al., 2002)?
I arrived on time, relaxed and prepared for questions. Would my answers be different if I were delayed in traffic, stressed, caught off-guard, or more nervous in nature?
If I were financially desperate for the job (compared to just browsing), would that change my responses?
If I’d been recommended by another employee and wanted to leave a good impression of them, would my responses reflect positively?
With the popularity of personality testing, might I have had previous experience/practice in completing and faking results – and how would an assessor know?
These challenges go to the heart of personality testing. What are we testing for? Is personality something inherent within us (trait) or a reaction to our environment (state)? Does testing at different times or under different conditions change results? Do different types of tests identify the same personality?
If FTSE500 organisations are utilising these assessments, does that suggest rigour, reliability and validity, or have they been hoodwinked into a myth of what personality tests provide?
The test I completed in 1996 was the proprietary Myers-Briggs Type Indicator® (MBTI®), which groups respondents into one of sixteen “personalities” based on four dichotomies associated with Jungian psychology (Maltby et al., 2022, pp.54-56; see also 16personalities, n.d.; Mattoon & Hinshaw, 2003; Myers & Briggs Foundation, n.d.): ISTJ, ISFJ, INFJ, INTJ, ISTP, ISFP, INFP, INTP, ESTP, ESFP, ENFP, ENTP, ESTJ, ESFJ, ENFJ and ENTJ.
Completed c2-million times each year through a c95-, 144- or 244-question self-reported survey, MBTI® was developed by Isabel Myers and her mother, Katharine Cook Briggs, as a means to simplify C.G. Jung’s work and to understand individual differences – specifically between Isabel and her husband, Clarence – in the belief that such a psychological instrument might prove useful in World War II recruitment. However, development took longer than anticipated and assessments weren’t available until 1962 (CAPT, n.d.; Myers-Briggs Company, n.d.).
Given the prevalence of its organisational usage, we can ask a simple question: Does MBTI® measure what it claims to measure?
MBTI® focuses on patterns of traits, characterising an individual along four dimensions and assigning them to one of sixteen types (Funder, 2007).
A trait is a dimension of personality that categorises people on a spectrum, based on the level to which they manifest that dimension.
A type is a collection of traits, grouped on the basis of observed specific or situational/habitual responses (Burger, 2008; Maltby et al., 2022, p.175-17; Moyle & Hackston, 2018).
MBTI® outputs one of 16 four-letter combinations (see above), based on four dichotomies:
Introversion/Extraversion identifies where an individual gains/loses their energy. Extroverts prefer external stimuli (e.g., socialising, active involvement, music etc.). Introverts find these same stimuli reduce their personal energy, and prefer ideas, pictures, memories and reactions that are internalised (Myers & Briggs Foundation, n.d.).
Despite MBTI® being based on Jung’s work, introversion and extraversion are defined differently in Jungian psychology and are not directly related to energy. For Jung, extraverts were outgoing, candid and quick to form attachments, whereas introverts were defensive, hesitant, reflective and mistrustful (Maltby et al., 2022, p.56).
Sensing/Intuiting is related to processing information. Do we focus on the core information or do we prefer to interpret and add meaning?
Thinking/Feeling considers how we utilise that information in decision-making processes. Do we consider the logic and consistency of situations or do we first look at people and special circumstances?
Judging/Perceiving is the external manner in which we demonstrate that decision-making process. Do we prefer to get things decided or are we open to new information and options (Myers & Briggs Foundation, n.d.)?
Although variations exist, MBTI® statements are generally arranged on a 5- or 7-point Likert scale between “strongly disagree” and “strongly agree”. Alternatives include word association variants, and a recent study analysed published social media content, identifying that “people’s personality traits could be effectively predicted using social media profiles, their use of language, and their [sic] behavioral patterns.” (Li, 2021).
These statements typically ask respondents to rate their agreement with short descriptions of everyday social, information-gathering and decision-making preferences.
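Although the published items and scoring keys are proprietary, a minimal Python sketch can illustrate the general mechanism: Likert responses are summed and a cut-off converts that sum into one of two letters. Everything below – the item wording, the keying and the cut-off – is invented for illustration and is not actual MBTI® content.

```python
# A rough sketch (not actual MBTI® items or scoring rules): aggregating
# Likert responses into a single dichotomy letter. Items, keying and the
# cut-off below are invented for illustration.

# Hypothetical items keyed towards Extraversion (+1) or Introversion (-1).
ITEMS = [
    ("I gain energy from group discussions.", +1),
    ("I prefer to reflect alone before sharing an idea.", -1),
    ("I start conversations with people I have just met.", +1),
]

def score_ei(responses):
    """responses: 1-5 Likert answers (1 = strongly disagree, 5 = strongly
    agree), aligned with ITEMS. Returns a single letter, 'E' or 'I'."""
    total = 0
    for (_, keyed), answer in zip(ITEMS, responses):
        total += keyed * (answer - 3)      # centre the 1-5 scale on its midpoint
    return "E" if total >= 0 else "I"      # one cut-off collapses the whole scale

print(score_ei([5, 2, 4]))  # -> 'E' for this illustrative answer pattern
```

The relevant design point is that a continuous total is collapsed into a binary letter at a single cut-off, a simplification revisited in the validity and reliability discussion below.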
On completing the 1996 self-assessment, my resulting ENTP and its description seemed valid, despite my realisation that the small, simplified number of outcomes was comparable to horoscopes; indeed, some have attempted to link MBTI® to astrological signs (Esteves, 2022).
The “Debater” personality is quick-thinking, knowledgeable and energetic, but can be argumentative, insensitive and favour abstract ideas over getting things done (16personalities, n.d.). The ENTP profile indicates that I broadly prefer Extraversion, Intuition, Thinking and Perceiving.
On review of these preference descriptors, some aspects seem vague. Readers could apply these statements to themselves and, if the respondent is agreeable (a personality dimension not directly expressed in MBTI®[1]), there is a heightened susceptibility to the Barnum or Forer effect: the tendency to believe that generally flattering and sufficiently vague personality statements apply specifically to oneself (Dickson & Kelly, 1985; Poškus, 2014; Shtulman, 2015; VandenBos, 2007).
This effect has been linked with self-reported computer-based tests since the mid-to-late 1980s, when multifactor assessments could first be conducted digitally. However, not all validity issues were due to the Barnum effect; some were caused by subjective interpretations of questions (Guastello et al., 1989).
[1] Though MBTI® and NEO PI-R (Big Five) are not directly correlated at each dichotomy, there are correlations with: Agreeableness/TF; Extraversion/EI; Conscientiousness/JP; Openness/SN; and Neuroticism/EI (though to a smaller magnitude) (Furnham et al., 2003).
Validity is concerned with whether a measure is measuring what it claims (Maltby et al., 2022, p.663). All self-reported tests are reliant on honesty and objectivity, and MBTI® has been criticised for how easily it can be faked. Certain roles – and, in western environments, certain specific MBTI® results – can be considered more favourable, and this social pressure could influence respondents to answer inauthentically (Carter, 2016; McPeek et al., 2011; Moyle & Hackston, 2018). Moyle & Hackston suggest that using questionnaire data as one datum in a suite of components, supported by trained practitioners, could minimise fakeability concerns.
How an assessee interprets questions/statements, or the context in which they’re asked, can impact validity. MBTI® has been translated into multiple languages, but we must be cognisant of bias introduced to assessment approaches through the overwhelming focus on western, educated, industrialised, rich, democratic (WEIRD) cohorts (Schulz et al., 2018; see also Henrich et al., 2010; Lundgren et al., 2019; Muthukrishna et al., 2020).
Sutin et al. (2020) also found that responses to “I try to go to work or school even when I’m not feeling well” changed during the COVID-19 pandemic. Historically, this was an indicator of conscientiousness (linked to J/P in MBTI®), with a favourable score representing a higher level of the trait. Pandemic macroenvironmental and societal pressures changed how the item was interpreted and, as the assessment was not updated, it produced misleading results.
If we consider one of the four dimensions within MBTI®, it would be reasonable to expect the scores on that dimension to form a bimodal distribution, with two distinct local maxima (see below).
Pittenger (2005; see also Bess & Harvey, 2002) notes the conspicuous absence of this pattern and opines that the high frequency of mid-point scores produces a single continuous distribution across the dimension, with no evidence that an extrovert type is qualitatively different from an introvert type.
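A short simulation on assumed (randomly generated) data makes the point concrete: even when scores come from a single unimodal distribution, imposing a cut-off still sorts everyone into two “types”, with a sizeable share of respondents sitting close enough to the cut-off that trivially different answers would change their letter.

```python
# Simulated illustration of Pittenger's critique: scores drawn from ONE
# continuous, unimodal distribution still get labelled 'E' or 'I' once a
# cut-off is imposed, and many respondents sit right next to that cut-off.
import random

random.seed(1)
scores = [random.gauss(0, 1) for _ in range(10_000)]   # a single bell curve

types = ["E" if s >= 0 else "I" for s in scores]
near_cutoff = sum(abs(s) < 0.25 for s in scores) / len(scores)

print(f"'E' share: {types.count('E') / len(types):.2%}")        # roughly half
print(f"within 0.25 SD of the cut-off: {near_cutoff:.2%}")      # roughly a fifth
# No second peak is needed to produce two "types"; the dichotomy is an
# artefact of the cut-off, not evidence of qualitatively distinct groups.
```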
Reliability is concerned with the consistency of measurement at different points in time (Howitt & Cramer, 2014, p.306). MBTI® has received criticism for its test-retest reliability: Pittenger (2005) argued that people assessed twice often receive different type results, despite personality traits being considered stable and cross-situationally consistent (Banicki, 2017).
I have completed MBTI® assessments on 20+ occasions, always reporting ENTP. As my career includes the use of MBTI® assessments, it might be that I understand the dimension each question is referencing, and am being disingenuous in my responses or perhaps interpreting instructions differently (Kubinger, 2002; Mahar et al., 1995; Martin et al., 2002).
Even without this insider view, respondents can answer questions in a way that favours themselves, endorsing desirable and rejecting undesirable traits (Monaro et al., 2021). It can be argued that western societies tend to prefer extroverted employees, that social pressures might encourage retest fakery, and that respondents may view the descriptors as deeper and more meaningful than is objectively accurate (Caldwell & Burger, 1997; Stein & Swan, 2018).
A second consideration in reliability is standard deviation: the extent to which dataset values deviate from the mean. If deviation is large and retest reliability is low, differences between individual scores become less meaningful unless they, too, are large (APA Dictionary, n.d.; Pittenger, 1993). Pittenger (2005) later reported retest correlations that fall short of expectations, with a low of r(38) = .48 for the TF scale and r(38) = .73 for the EI scale when retested over 14 months.
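For readers less familiar with these statistics, the sketch below uses simulated data to show how a test-retest correlation (r) is calculated and why even a moderately high r on a continuous scale can still flip the reported letter for respondents near the cut-off; the sample size and noise level are assumptions chosen for illustration, not Pittenger’s figures.

```python
# Simulated illustration (not a re-analysis of Pittenger's data): how a
# test-retest correlation is computed, and how measurement noise around a
# cut-off flips the reported type letter for a non-trivial share of people.
import random
from statistics import correlation  # Pearson's r, Python 3.10+

random.seed(2)
true_scores = [random.gauss(0, 1) for _ in range(1_000)]
test = [t + random.gauss(0, 0.5) for t in true_scores]     # assumed noise level
retest = [t + random.gauss(0, 0.5) for t in true_scores]

r = correlation(test, retest)
flipped = sum((a >= 0) != (b >= 0) for a, b in zip(test, retest)) / len(test)

print(f"test-retest r = {r:.2f}")                       # around .8 with this noise
print(f"type letter changed on retest: {flipped:.1%}")  # roughly one in five
```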
Rather than measuring individual traits of personality on a spectrum or scale, MBTI® claims to group multiple traits together into a more general type. This simplification means any output is also grouped and simplified, losing the nuance within each trait’s spectrum.
Using the Myers-Briggs Type Indicator as our test case, our initial question of “how accurate are personality tests?” remains unanswered. Considering the validity and reliability evidence, MBTI® theory doesn’t represent a robust or suitable framework for studying personality (Stein & Swan, 2019). The Myers-Briggs organisation also states that the tool isn’t intended as a personality assessment or to predict performance, but for self-awareness and the awareness of others (Hayes, 2014). It’s easily faked, has low retest reliability and its validity is questionable.
MBTI® isn’t a personality model. But, if its limitations are appreciated, it can be both harmless and potentially useful as a heuristic for classifying people’s general tendencies. However, many organisations utilise MBTI® as if it were objective, and use its findings in HR decisions around placement and development. This is the core challenge with MBTI®: it’s a tool with its own mythology, which many organisations interpret as both ideology and science – claiming to measure something it does not, while being utilised to inform talent attraction (in both role-advert creation and candidate selection), retention, and development strategies (Burnett, 2013; CIPD, n.d.; Essig, 2014; Mahar et al., 2006).
The insight provided by MBTI® can prove useful in identifying an individual’s preferences. One can appreciate that Sensors’ natural preference is for structure, facts and truths. When assembling flat-packed furniture, they may arrange the pieces in order, check the box contents and read the instructions. Conversely, Intuitors might picture the finished product, then work towards that abstract goal. Neither is the best or worst way to achieve the task. However, Sensors might look too structured, rigid and inflexible in their approach to Intuitors; Intuitors may appear laissez-faire and disorganised to Sensors.
Therefore, the value in the MBTI® approach lies not in the 16 “personalities” that are grouped post-assessment, but in understanding each of the continua and how they relate to one another.
Does MBTI® measure personality accurately? No. Does it still have a place in today’s organisations? Yes – provided it’s not used in isolation, as the deciding factor in decision making, or as a measurement of personality.