Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 2: Going off on a Tangent – AI/ML Applications in Social Engineering

07 June 2019

This is the second blog in the Project Ava series – the first set out the aims of the research and the tools that our research team experimented with to facilitate their work.

In this blog, the team explore an interesting tangent as they play with the capabilities of IBM’s Natural Language Processing tool. Read on to find out more…

Overview

In the previous blog, we introduced our investigation into some of the public cloud offerings on AI/ML and their potential applicability to Project Ava. We must admit that during that part of our research we got slightly distracted when playing with some of IBM Watson’s offerings – specifically its Natural Language Processing (NLP) capabilities. This tool analyses “text to extract metadata from content such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles” [1].

Despite going off-piste from the goals of Project Ava, we devoted some time to exploring IBM Watson’s NLP capabilities further and how we might apply them to social engineering or phishing-type scenarios. In particular, we wanted to look at personality insights: whether it is possible to determine properties of an individual’s personality and exploit those properties to maximise the success of social engineering or phishing attempts.

Personality insights

IBM Watson provides a service which seeks to gain insight into how and why people think, based on the text that they write. Specifically, its test service allows for analysis of a person’s personality based on their tweets [2]. Running this test against my own Twitter profile resulted in the following:

Initially, I was slightly affronted at being assessed as a bit inconsiderate by a machine. However, having read the remainder of the personality portrait and pondered the fact that I adopt a slightly exaggerated persona beyond my actual self on Twitter (for comedic effect and/or sarcasm), this was a pretty accurate summary based on analysis of a mere 5,187 words across my tweet history.

Other traits that jumped out include:

You are open to and intrigued by new ideas and love to explore them – Correct. I guess that’s perhaps why I work in research.
You are intermittent, you have a hard time sticking with difficult tasks for a long time – Correct. I do context-switch often and struggle to focus on one main thing for an extended period of time.
Experiences that give a sense of efficiency hold some appeal to you – Correct – and probably for most techies – we love to be able to automate or hack our way to a more efficient solution.

The last paragraph in the screenshot above perhaps makes me sound a little selfish – I had not previously thought of myself in this way, but given the seemingly accurate assessment so far, I began to doubt and question myself, descending into an existential funk…

… but then I figured that it was ridiculous to be offended by the opinion of a machine and realised the potentially powerful effects of this type of analysis within social engineering scenarios.

From the screenshot above, we can also see that the analysis correctly picked out that I have experience playing music (I play guitar and piano), and that I am unlikely to be influenced by brand and social media when making product purchases (true – I have a blindness to advertisements).

These types of insight could be invaluable when thinking about manipulation of individuals into performing certain actions, such as clicking on a link, downloading a piece of software or allowing physical entry to a building.

We therefore set out to further explore the IBM Watson Personality Insights API and how we might use this in targeted scenarios. The solution allows us to analyse textual content sent to the API and returns a personality profile for the author of the input. The service infers personality characteristics based on three models, with focus on “the big five” and their associated facets [3]:

Agreeableness – a person’s tendency to be compassionate and cooperative toward others
Conscientiousness – a person’s tendency to act in an organised or thoughtful way
Extraversion – a person’s tendency to seek stimulation in the company of others
Emotional range (also referred to as neuroticism or natural reactions) – the extent to which a person’s emotions are sensitive to the person’s environment
Openness – the extent to which a person is open to experiencing different activities
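
As a rough illustration of what such a profile looks like once parsed, the sketch below flattens a profile into simple "trait : score" pairs. The field names (personality, children, needs, values, trait_id, percentile) reflect our assumption of the shape of the service's JSON response, and the sample profile is hand-written for illustration rather than real API output. No API call is made here.

```python
def flatten_profile(profile):
    """Flatten a Personality Insights-style profile into {trait_id: percentile}."""
    scores = {}
    for trait in profile.get("personality", []):
        scores[trait["trait_id"]] = trait["percentile"]
        # Facets are assumed to be nested under their parent Big Five trait.
        for facet in trait.get("children", []):
            scores[facet["trait_id"]] = facet["percentile"]
    # Needs and values are assumed to be flat lists with the same per-item fields.
    for section in ("needs", "values"):
        for item in profile.get(section, []):
            scores[item["trait_id"]] = item["percentile"]
    return scores


# Hand-written sample in the assumed response shape (illustrative values only).
sample_profile = {
    "word_count": 14560,
    "processed_language": "en",
    "personality": [
        {
            "trait_id": "big5_openness",
            "percentile": 0.68,
            "children": [
                {"trait_id": "facet_adventurousness", "percentile": 0.76},
                {"trait_id": "facet_intellect", "percentile": 0.91},
            ],
        }
    ],
    "needs": [{"trait_id": "need_curiosity", "percentile": 0.43}],
    "values": [{"trait_id": "value_hedonism", "percentile": 0.36}],
}

flat_scores = flatten_profile(sample_profile)
for trait_id, percentile in flat_scores.items():
    print("%s : %s" % (trait_id, percentile))
```

A flat dictionary like this makes it straightforward to compare any individual trait or facet against a threshold.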

We wrote an internal tool named Personality Insight Manipulation Suggester (PIMS) that can be configured to extract personality attribute facets of interest with a configurable threshold. A score of 0.75 or greater indicates readily-discernible aspects of a characteristic. The more text written by the target that is supplied, the more accurate the results.

Our intended use case here was to take text from public sources about a target, such as their social media accounts, and use that as input. Elaborating on the Twitter demo from IBM, we developed a tool to consume a corpus of text (in this example, my tweet history).

If any attribute facets of interest exceed the likelihood threshold (set at 0.75, but configurable), then the tool suggests some ‘manipulations’ for the target, related to those facets. For example, a high altruism score results in a suggested manipulation that involves making the target individual feel useful in helping others.
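
The threshold logic described above can be sketched in a few lines. This is a minimal illustration and not the actual PIMS code: the function name, the SUGGESTIONS mapping and its wording are hypothetical, while the three scores are taken from the PIMS output on my tweet history.

```python
THRESHOLD = 0.75  # configurable likelihood threshold

# Hypothetical mapping of facets of interest to suggestion text.
SUGGESTIONS = {
    "facet_altruism": "Request assistance with something the target will find fulfilling.",
    "facet_adventurousness": "Craft a manipulation outside the target's norm yet exciting.",
}


def suggest_manipulations(scores, suggestions=SUGGESTIONS, threshold=THRESHOLD):
    """Return (facet, score, suggestion) for each facet of interest at or above the threshold."""
    return [
        (facet, scores[facet], text)
        for facet, text in suggestions.items()
        if scores.get(facet, 0.0) >= threshold
    ]


# Scores copied from the PIMS output for my tweet history.
target_scores = {
    "facet_altruism": 0.41694259093170094,
    "facet_adventurousness": 0.7593883796613703,
    "facet_intellect": 0.9068674066123124,
}

for facet, score, text in suggest_manipulations(target_scores):
    print("%s : %s" % (facet, score))
    print(text)
```

Note that facet_intellect scores above the threshold but produces no suggestion, because in this sketch only facets listed in the mapping count as facets of interest.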

As a very quick example of our thinking, we came up with the following table looking at just some of the personality characteristics, their facets and potential suggested manipulations for high scoring individuals against those respective characteristics:

Personality Characteristic	Facet	High Scoring Individuals	API Parameter of Interest	Suggested Target Manipulation
Agreeableness	Altruism / Altruistic	Find that helping others is genuinely rewarding, and that doing things for others is a form of self-fulfilment rather than self-sacrifice.	facet_altruism	Request assistance in something that will appeal to the target and will fulfil them by knowing they are helping with the request.
Agreeableness	Trust / Trusting of others	Assume that most people are fundamentally fair, honest, and have good intentions. They take people at face value and are willing to forgive and forget.	facet_trust	This target exhibits a high level of trust, rendering them a good target for manipulation. Seek to appeal to their interests in any crafted manipulations.
Extraversion	Activity level / Energetic	Lead fast-paced and busy lives. They do things and move about quickly, energetically, and vigorously, and they are involved in many activities.	facet_activity_level	Create a manipulation for this target which links to some key aspect of their daily life to pique their interest. Their busy nature might make them less likely to think before performing manipulated actions.
Extraversion	Excitement-seeking	Are easily bored without high levels of stimulation.	facet_excitement_seeking	Craft a manipulation for this target that meets or exceeds their high level of excitement. Be bold and creative in the manipulation.
Emotional Range	Immoderation / Self-indulgence	Feel strong cravings and urges that they have difficulty resisting, even though they know that they are likely to regret them later. They tend to be oriented toward short-term pleasures and rewards rather than long-term consequences.	facet_immoderation	Identify common interests for this target then craft enticing manipulations that they will not be able to resist. Be grandiose so as to pique interest with this facet.
Openness	Adventurousness / Willingness to experiment	Eager to try new activities and experience different things. They find familiarity and routine boring.	facet_adventurousness	Understand the target’s common activities then craft a manipulation that is outside of their norm yet exciting enough to satisfy their eagerness to be adventurous.

Similarly, some characteristics and their facets might indicate phishing targets either to avoid or to approach differently. For example, with conscientiousness, we might avoid devoting effort to manipulating individuals who score highly on the following facets:

Cautiousness / Deliberate / Deliberateness – These individuals are disposed to think through possibilities carefully before acting. Therefore, they may be better at spotting things that look suspicious and/or may be wary of acting quickly, for example by clicking on links.
Dutifulness / Dutiful / Sense of Responsibility – These individuals have a strong sense of duty and obligation. If they are aware of phishing and/or have been trained about the associated dangers, they may be more likely to spot something suspicious and report it as a matter of duty.

The hypothesis here is that by understanding a target’s personality, we can better craft a manipulation, such as a phishing email or social engineering script, that is more likely to succeed.

The output of PIMS on my tweet history was as follows:

>python pims.py tweets.txt 
Target Attributes
-----------------
big5_agreeableness : 0.1792536382106265
facet_altruism : 0.41694259093170094
facet_cooperation : 0.318986237250926
facet_modesty : 0.0873722388405736
facet_morality : 0.15980257723249852
facet_sympathy : 0.6969288386007256
facet_trust : 0.6975657276803714
big5_conscientiousness : 0.21017094031629957
facet_achievement_striving : 0.46750241225479544
facet_cautiousness : 0.3234070832726173
facet_dutifulness : 0.26818352563423586
facet_orderliness : 0.3127800628979961
facet_self_discipline : 0.23265616294989055
facet_self_efficacy : 0.6355308407337696
big5_extraversion : 0.31673647923115467
facet_activity_level : 0.5805646809458663
facet_assertiveness : 0.559915253558856
facet_cheerfulness : 0.12500641247030364
facet_excitement_seeking : 0.5488505424292813
facet_friendliness : 0.37471053163981827
facet_gregariousness : 0.24220495489370197
big5_neuroticism : 0.3314110423096769
facet_anger : 0.7992248241528529
facet_anxiety : 0.5863983948253635
facet_depression : 0.6803465943133252
facet_immoderation : 0.5833513105426745
facet_self_consciousness : 0.6709581839516744
facet_vulnerability : 0.4407429865819616
big5_openness : 0.6821049411456432
facet_adventurousness : 0.7593883796613703
facet_artistic_interests : 0.5140319734900856
facet_emotionality : 0.3113117491259887
facet_imagination : 0.830705402848515
facet_intellect : 0.9068674066123124
facet_liberalism : 0.9004335592082839
need_liberty : 0.2296237711144805
need_ideal : 0.3218116251256743
need_love : 0.21187125011418867
need_practicality : 0.5497172336648453
need_self_expression : 0.20587328089116824
need_stability : 0.03562358274661781
need_structure : 0.3568232046961516
need_challenge : 0.45573482377558905
need_closeness : 0.10635637265877823
need_curiosity : 0.4301100527961385
need_excitement : 0.26183226485767574
need_harmony : 0.24498590340858972
value_conservation : 0.10107471188055861
value_hedonism : 0.3561331108847293
value_openness_to_change : 0.44889173968725327
value_self_enhancement : 0.48502684784568517
value_self_transcendence : 0.18459215333950185
word_count : 14560
processed_language : en
----------------------
Suggested Manipulation
----------------------
adventurousness : 0.759388379661
Understand the target's common interests then craft a manipulation that is either extreme within those interests or outside of their norm yet exciting.

This tool is in no way exhaustive. Further research is required on what might constitute a good suggested manipulation for specific traits and facets that score highly; conversely, potential manipulations for low-scoring traits and facets should also be explored.

In the example above on my tweet history, my ‘adventurousness’ score exceeds the defined threshold of 0.75, and thus a high-level manipulation is suggested that might maximise the chance of influencing how I behave and think.

Note: to use the tool, an IBM account is required (sign-up is free), which can be used to create an application name and generate an API access token to be copied and pasted into the Python script [4].

Fun with Markov chains

On a related theme: during our work in this domain, I recalled a great Black Hat USA 2016 presentation from ZeroFOX on “Weaponizing Data Science for Social Engineering” [5]. One of the areas explored in this talk was the use of Markov chains [6] trained on a Twitter user’s timeline to generate new text. Because such text resembles the tweets commonly written and/or read by a target victim, it might pique their curiosity and thus increase their propensity to click on a link.
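
To make the idea concrete, here is a minimal word-level Markov chain using only the Python standard library. It is a sketch of the general technique, not the approach used in the ZeroFOX talk: the function names, the start/end markers and the toy corpus are all invented for illustration, and a real run would be trained on a target’s tweets.

```python
import random
from collections import defaultdict

START, END = "<START>", "<END>"


def build_chain(sentences):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=30):
    """Walk the chain from START, sampling each next word until END is drawn."""
    word, output = START, []
    while len(output) < max_words:
        # Repeated followers appear multiple times, so sampling is frequency-weighted.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


# Toy corpus invented for illustration.
corpus = [
    "new research on phishing is out",
    "new tools make research fun",
    "phishing research is fun",
]
chain = build_chain(corpus)
print(generate(chain, random.Random(0)))
```

Each generated word depends only on the word before it, which is why the output tends to sound like the source author without necessarily making sense.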

Following a conversation with our chief scientist, Chris Anley, we also sought to explore whether we could “parody” ourselves through the use of Markov chains. Using an excellent Python package called Markovify [7], which builds Markov models of large corpora of text and generates random sentences from that input, we were able to write a parodyme.py script in a few lines of Python:

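import sys

import markovify

# Expect a corpus file and the number of sentences to generate.
if len(sys.argv) != 3:
    print("Usage: %s <corpus_file> <num_sentences>" % sys.argv[0])
    sys.exit(1)

# Get raw text as string.
with open(sys.argv[1]) as f:
    text = f.read()

# Build the model.
text_model = markovify.Text(text)

# Print randomly-generated sentences.
for i in range(int(sys.argv[2])):
    # make_sentence() returns None if it cannot build a suitable sentence.
    sentence = text_model.make_sentence()
    if sentence is not None:
        print(sentence)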

Using the script above, we passed in my tweet history as text and requested a number of sentences to generate as output. As a result, a number of amusing, yet seemingly Matt-like tweets were produced, which of course I tweeted back out from my account with the hashtag #MarkovMatt to distinguish them from my own, actual musings:

The statistics on impressions and engagements with the tweet certainly show that there is mileage in using such approaches to generate text that will be read by potential targets (e.g. in a phishing campaign).

Summary

In the background we will continue to explore this domain and are keen to quantify success of the techniques described above in real-world phishing or social engineering engagements. We are currently investigating whether there are applications of these techniques for our phishing tool, Piranha [8].

Despite going off on a tangent on this topic, we learned a lot, had fun, and uncovered a research area that is ripe for further work and which demonstrates the need for a multi-disciplinary approach, including specialists from cyber security, computer science, AI and psychology.

Further research in this domain will support the notion of “cyber security as a science” and help us better understand how and why certain people click on links, and thus what unique advice and guidance we can provide individuals based on their personalities, rather than a blanket “don’t click on dodgy emails”, which, as we know, does not work as a guaranteed control.

In the next blog we get back on track with Project Ava, where we explore existing approaches to web application security testing using Machine Learning techniques.

References

[1] https://www.ibm.com/uk-en/cloud/watson-natural-language-understanding
[2] https://personality-insights-demo.ng.bluemix.net/?source=myself
[3] https://console.bluemix.net/docs/services/personality-insights/models.html#models
[4] https://cloud.ibm.com/catalog/services/personality-insights
[5] https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter.pdf
[6] https://en.wikipedia.org/wiki/Markov_chain
[7] https://github.com/jsvine/markovify/blob/master/README.md
[8] https://www.nccgroup.trust/uk/our-services/security-consulting/managed-and-hosted-security-services/vulnerability-management-and-detection/phishing-simulation-piranha/

Written by NCC Group
First published on 07/06/19

Matt Lewis