Last week I restarted this newsletter with a plan to share practical details of my work at Fairwords. With this article, I get started in earnest by talking about how Generative AI has helped us achieve roadmap goals. Generative AI is remarkable technology that has solved some tough problems for us, and I hope you find the article interesting.
Background
Before digging into one way we are making use of generative AI, it is important to first understand the context. Fairwords is a small startup that primarily focuses on regulated markets. We develop, market, and sell two main products, Guide and Archive. Both require sophisticated analytics to detect risk in what we write, but they have starkly different operating requirements.
Fairwords Mission: To promote fairness in business by encouraging ethical, honest, and compliant communications.
The Archive is how regulated businesses, like financial services organizations, retain messages sent by covered employees and fulfill their legal obligation to identify fraud, market abuse, corrupt practices, and other risks. Archive analytics runs on streaming messages or periodic batch processes, depending on the source service.
Guide, by contrast, uses real-time analytics to let users know, before they hit send, that they may want to modify their language. This is our uniquely differentiated capability: it supercharges compliance by preventing the noise caused by innocuous messages from ever hitting the Archive.
Both of these services are enabled by consistent detection capabilities. In the Archive, and the communications monitoring space generally, the state of the market is to use deterministic rules to find strings of text known to be related to different types of risk.
For example, if an energy trader writes an IM to a colleague saying, “I plan to manipulate the price of brent today”, the business is supposed to notice. Not only should that message be discoverable in the retained records, the business should also have tools and processes in place to help detect such risky language.
I plan to manipulate the price of brent today
Basic Detection Challenge
To detect such language, the software may look for the specific phrase “manipulate the price”. Since that may be too specific, users may configure it to look for the simple term manipulate to ensure they do not miss similar phrases. Unfortunately, most leading communications monitoring tools stop at this point, giving users a choice between an overly specific phrase and a simple term that generates lots of false positives.
The ultimate solution for balancing these trade-offs is AI. A mathematical model can be trained to understand the context of the term manipulate so that it is flagged only in a risky context, as the toy sketch after the examples below illustrates.
Examples of manipulate in context:
I plan to manipulate the spot price of brent today ❎ Flag
I am going to have my spine manipulated today ✅ Ok
She asked me to manipulate the audit record ❎ Flag
Data engineers mostly just manipulate data ✅ Ok
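To make the trade-off concrete, here is a toy sketch (not Fairwords’ production code) of the naive keyword approach applied to the four examples above. Because it ignores context, it flags all four sentences, producing false positives on the two innocuous ones; resolving those is exactly what the context-aware model is for.

```python
# Toy illustration: a deterministic keyword rule flags every occurrence of
# "manipulate", regardless of context.
EXAMPLES = [
    ("I plan to manipulate the spot price of brent today", True),   # should flag
    ("I am going to have my spine manipulated today", False),       # should not flag
    ("She asked me to manipulate the audit record", True),          # should flag
    ("Data engineers mostly just manipulate data", False),          # should not flag
]

def keyword_rule(text: str) -> bool:
    """Deterministic rule: flag any message containing 'manipulat'."""
    return "manipulat" in text.lower()

for text, should_flag in EXAMPLES:
    flagged = keyword_rule(text)
    outcome = "correct" if flagged == should_flag else "false positive"
    print(f"{outcome:>14}: {text}")
```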
The problem with developing such an AI model lies in the training data. An analytic model can only be as good as the data used to train it. In order to teach the model to classify phrases that are ok versus not ok, you need lots of examples of both.
The primary approach to developing good training data is to find a large corpus of domain-relevant data, then have humans go through a random sample, labelling each record as manipulation risk or not manipulation risk.
In the real world, most conversations will not include anything having to do with market or price manipulation, so you may find that fewer than 0.1% of phrases contain something worth labelling as manipulation risk. Further, getting a meaningful set of language that conceivably resembles manipulation risk might take millions of records and, even then, some obvious phrases might never appear in the data.
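A quick back-of-the-envelope calculation makes the scale of the problem concrete; the target of 1,000 labelled risk examples below is a hypothetical figure, not a Fairwords requirement.

```python
# With a positive rate below 0.1%, how many messages would humans need to
# review to collect a useful number of risk examples?
positive_rate = 0.001        # assumed upper bound: <0.1% of phrases are risky
target_positives = 1_000     # hypothetical number of labelled risk examples needed

records_to_review = target_positives / positive_rate
print(f"~{records_to_review:,.0f} messages to review")   # ~1,000,000
```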
The second problem is that gaining access to such large amounts of sensitive workplace communications for the purpose of training a model can be difficult. Snippets sometimes enter the public domain when illegal activity is uncovered, but that data is too sparse to be meaningful on its own.
It turns out that, beyond market manipulation, many of the specific risks that compliance teams are concerned with come with the same training data challenges.
Mitigating Risk with Generative AI
Generative AI is a highly scalable NLP solution for generating human-like data that can then be used to train separate detection models. Yes, I know, you may ask yourself: if we don’t have enough data to train detection models, how can we train generative AI models?
While generative AI has been around since the 1950s, it is only very recently that large language models like GPT-3 (Generative Pre-Trained Transformer 3) have begun to generate human-like text and become reasonably available to the general public. It is the massive scale of these models that enables them to generate new content from just a few examples.
The availability of LLMs like GPT-3 removes the need for individual researchers to:
Develop the mathematical algorithms
Source massive amounts of data
Label massive amounts of data
Run the compute infrastructure necessary to produce the models
Some numbers illustrate these benefits. GPT-3 has been built at a scale that is out of reach for nearly everyone, from hobbyists up to large corporations: collecting and processing roughly 500 billion tokens (roughly, words) into a model with 175 billion parameters required about 2,500 petaflop/s-days of compute to train.
Needless to say, these are enormous numbers requiring a level of investment not possible at Fairwords. Luckily, we don’t need to make it. These models and alternatives are now available as open-source models (e.g. Bloom) and paid API services (e.g. OpenAI), removing the huge initial investment hurdle.
Nuts and Bolts
Given all of these benefits, Fairwords is able to use these models to quickly generate high-quality, human-like conversation data relevant to our domain with minimal input.
While we have built tools to support creating this synthetic training data, the process itself is straightforward (a rough sketch follows the steps below):
1. Provide sample input prompts to the selected model.
2. Review the results.
3. Re-input those results iteratively to generate a conversation flow.
4. Repeat steps 1-3 to produce many conversation variations, taking care to vary the starting prompts widely so the generated output is diverse.
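A minimal sketch of that loop might look like the following. It assumes the legacy OpenAI Python SDK (pre-1.0) and a GPT-3 completion model; the prompt, model name, and parameters are illustrative only, not our actual tooling.

```python
# Sketch of the prompt -> generate -> re-feed loop, assuming the legacy
# OpenAI Python SDK (<1.0). Prompt and parameters are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"

seed_prompt = (
    "Two energy traders are chatting over IM. One hints at moving the "
    "spot price of brent. Continue the conversation:\nTrader A:"
)

conversation = seed_prompt
for _ in range(4):                          # step 3: re-input results iteratively
    response = openai.Completion.create(
        model="text-davinci-003",           # a GPT-3 family completion model
        prompt=conversation,
        max_tokens=60,
        temperature=0.9,                    # higher temperature for more varied output
    )
    reply = response.choices[0].text.strip()
    conversation += " " + reply             # grow the conversation flow
    print(reply)
```

Varying the seed prompt across runs (step 4) is what produces the diversity needed downstream.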
Once we have raw conversation data, we can (optionally) co-mingle it with non-synthetic conversation data and put it through a data labelling process, either in house or via third-party services.
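For illustration, that co-mingling step could be as simple as the sketch below; the file names and column layout are hypothetical placeholders, not our actual pipeline.

```python
# Hypothetical sketch of combining synthetic and real conversation data
# before labelling. File names and columns are placeholders.
import pandas as pd

synthetic = pd.read_csv("synthetic.csv")        # generated conversations
real = pd.read_csv("real_sample.csv")           # sampled real conversations

synthetic["is_synthetic"] = True
real["is_synthetic"] = False

# Shuffle so labellers cannot infer the source from record order.
combined = pd.concat([synthetic, real]).sample(frac=1, random_state=7)
combined.to_csv("to_label.csv", index=False)    # hand off to in-house or third-party labelling
```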
After producing a training set in the range of 2,000 to 10,000 records, we can train our detection model. We then test its performance against the precision and recall goals we set, just as we would with non-synthetic data.
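As a rough illustration of that last step, the sketch below trains a simple text classifier and reports precision and recall; load_labelled_conversations() is a hypothetical loader, and the TF-IDF plus logistic regression model is a stand-in, not our actual detection model.

```python
# Illustrative sketch: train a simple detection model on labelled
# conversation data and check it against precision/recall goals.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts, labels = load_labelled_conversations()   # hypothetical loader, ~2,000-10,000 records

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),        # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("precision:", precision_score(y_test, preds))   # compare to precision goal
print("recall:   ", recall_score(y_test, preds))      # compare to recall goal
```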
The end result is that we can detect all sorts of specific compliance risks, even ones that rarely appear in workplace conversations. This lets us strike the ideal balance of increasing our ability to find risky conversations (recall) AND minimizing the rate of false positives (precision) that waste compliance teams’ time.
I share this as an example of how we are using new Generative AI capabilities in the market today. It takes a long time to develop the institutional capacity to effectively leverage data science. As Product Leaders, it is our job to ensure we are laying the groundwork to take advantage of these technological innovations.
Hopefully, this article plants a seed of ideas for how Generative AI can help you deliver on your roadmap goals.
Final note: while I do my best to understand the benefits and application of these technologies, I am definitely not “the” expert at Fairwords. I work with some great data scientists who actually know what they are doing (and try to teach me). If you want to connect to learn more, just comment on this article or DM me over on Twitter.