AI Automation - building bots with ChatGPT
# What is ChatGPT?
For those who have been away on Mars for the the past few months, ChatGPT is an "LLM" (Large Language Model) which is capable of a wide array of natural language processing tasks.
Essentially, it is a large probabilistic model, trained on all the data on the internet. It works by predicting the next word in a sentence, based on the previous words it has seen.
You can think of this a little like a fancy autocomplete - but ChatGPT goes well beyond anything we've seen before.
ChatGPT's understanding of context can be downright spooky at times; you can give it instructions and it appears to understand and be able to follow them. It is even capable even of writing correct code (as long as the task you give it is well defined enough).
The ability for ChatGPT to "understand" your input and perform tasks based on a natural language specification is what makes it such an impressive piece of technology.
# What can ChatGPT do to automate tasks?
# Generating text
At the moment, ChatGPT works by taking a text input and by outputting text in response. If you have a task that relies on text generation, then ChatGPT is able to perform these straight away - things like generating marketing materials from prompts, filling out email text, or automatically translating from one language to another.
It's worth noting that some of these use-cases are frowned upon. Even without the advent of powerful generative text tools, we all have to deal with the headache of spam. Adding more spam to this already towering pile may save you some time, but won't necessarily make the world a better place!
If you want your marketing to make an impact, you'll generally want to spend time and effort crafting your messages to cut through the noise and resonate with your customers.
Where ChatGPT can helpful is in assisting you to flesh out your ideas into full prose. ChatGPT writes quite competently, so if you're not a professional writer this can be pretty handy! However, it's difficult to automate completely.
# Structuring data
One of ChatGPT's most powerful features is its ability to take a piece of text and to reformat it into a different structure.
For example, you can give ChatGPT a prompt containing some unstructured text and ask it to extract certain data in the structure you specify. ChatGPT is sophisticated enough to be able to understand and follow these instructions.
The result? A task that's extremely hard using standard web scraping techniques - the extraction of unstructured data - becomes a relatively simple matter.
A downside of this is that writing the prompt can be a little painstaking. Luckily, the prompt design can be generalised to the point that the prompt structure can be implemented automatically into a tool like axiom.ai.
This is a screenshot from our upcoming release that shows some beta ChatGPT integration, using it as an AI data extractor (there'll be a lot more to come):
This basic ability seems like it can be expanded in quite a number of ways. ChatGPT is capable of answering mathematics questions, performing data analysis functions, and more. Applying this to unstructured data seems to create a lot of potential for powerful automation capabilities. Think of how the Star Trek computer works and you'll have an idea of why this could be exciting.
# Automatically building automations
This may be the most obviously interesting case for AI automation, although at this stage there are some limitations to how far you can go.
Essentially, this works the same way as the previous data structuring example. You provide a set of steps to ChatGPT in natural language, and it generates an output in code that will run the automation for you.
ChatGPT is more than happy to output JSON if you ask for it, and it doesn't take much to get it to understand what you want:
As this is a structured output, it then becomes possible to automatically translate these steps into any other similar DSL - including Axiom's. What's not to like? Extending this technology really is the Holy Grail for automation. Simply tell the computer what you want, in plain English, and it'll do it for you, no complex configuration or technical knowledge required!
Eagle-eyed readers may note some oddities in the above example - we'll explore the problems with this methodology below. Although exciting, there are still some significant issues to overcome. But who knows how rapidly this capability will advance? We're certainly keeping an eye on it, and will be implementing these techniques into axiom.ai in the near future.
# Questions and answers
ChatGPT was fundamentally designed as a back-and-forth process which automatically stores all your previous questions and incorporates them into ChatGPT's next responses. This gives it an awareness of context which works amazingly with its natural language processing abilities.
As a result, ChatGPT is the best chatbot software ever, and you are certainly going to see this aspect of the technology proliferate widely.
The context-learning features of ChatGPT (Open AI call it "few-shot learning") mean anyone can update its knowledge with new information and prime it to answer in a particular way. That can go as far as you like - you can even do things like paste entire pieces of documentation into the bot and have it pick out answers for you!
This means that with some creativity, it's absolutely possible to use ChatGPT right now to search through large volumes of documentation and the general web, to help you understand what you need to do to build a bot. This is a newly developing field which is called "prompt engineering", a field which is likely to become increasingly important as AI tools proliferate.
However, there are limitations.
On the technical side, there is a limit to prompt size, which can make it rather awkward if there's a lot of info to parse. Meanwhile, on the side of the user, it can be quite difficult to know what information ChatGPT even has available, or what question to ask.
To solve these issues, we're currently working on our own fine-tuned ChatGPT model. Fine-tuning is a more detailed way of training a bot to answer questions, and one that's more efficient than few-shot learning. Unfortunately, it's also a more technical approach and requires a large number of examples to be effective.
When done well, however, this fined tuned "automation guru" model will able to answer questions specifically about a wide variety of browser automation problems.
We have spent years in this space and have acquired a huge amount of knowledge in that time. Now, with ChatGPT, we have a way to encode this knowledge into software that is able to converse in a natural way. Watch this space!
# What are the downsides and limitations of ChatGPT?
# Falsehoods and hallucinations
A major issue in ChatGPT is its desire to "hallucinate" results, i.e. fabricate plausible looking text that is either false or nonsensical. When it comes to automations that need to run consistently, this is a tricky issue that needs to be addressed - quite apart from the valid societal concerns raised by relying on machines that can "lie" like this.
There are ways around this problem. The more advanced models made available by Open AI appear to be able to understand when you tell them not to output false results, something we find remarkable. Consider these two examples:
Whoops! It seems ChatGPT is a little overeager to answer our question, and has gone on a flight of fancy.
Let's see if we can fix that:
That's better! Here we've told ChatGPT not to make things up, and as a result the hallucinations have vanished. What's fascinating about this is that it appears that - at least on some level - ChatGPT is able to understand that some extrapolations are more justified than others.
In other words, it's capable of knowing when it's lying.
We have been implementing a few of these "anti-disinformation" countermeasures into our ChatGPT integration, but even so it's something to watch out for. It may not be possible yet to 100% guarantee that the answers ChatGPT is producing are accurate.
# Running costs
ChatGPT is not particularly expensive, but it can't be denied that running a large number of queries will start to stack up the costs.
ChatGPT works by using a token-based system; a token is approximately one word and one space. Both the prompt and ChatGPT's response to it are charged, so the longer these prompts are, the more expensive they get.
The cost is barely worth worrying about for relatively infrequent use, such as having ChatGPT on hand as a kind of assistant to ask information, correct code, and other similar tasks.
However, when it comes to full automation tasks, this can get a little more sticky. Invoking ChatGPT on thousands of pieces of data will rack up the charges, particularly when that data is substantial (as is the case for many web pages, for example).
Some pages are even so large that the currently available models aren't able to process them, although Open AI are addressing this with ChatGPT 4, which allows a higher maximum token limit.
ChatGPT can also be a little slow. When integrating ChatGPT into axiom to perform scraping tasks, using the AI to parse text automatically is considerably slower than using the visual scraper alone. This is partially because of ChatGPT's response times, which can take a few seconds, and partially because you need to scrape the page data anyway in order to pass it to ChatGPT. ChatGPT does not store up to date data automatically, and doesn't have a 1-1 record of all data on every site available for retrieval, so the scrape is required for accurate results.
Ultimately, whether it's worth using this tool or not depends on your use case. ChatGPT excels at pulling unstructured data from pages, or pages where the data you want can be placed in a number of different configurations. For many websites, however, the data is reasonably well structured using HTML code, which makes visual scraping tools perfectly sufficient for the task - and these tools are generally faster and less expensive to run.
# Specifying complexity
ChatGPT is extremely impressive and can look magical, or like there's really a person in there thinking through your problem. But it's important to remember that this is not the case. ChatGPT can only produce a useful result if it's given enough context to generate a completion.
For simple tasks, this works superbly. But as the task you need to perform gets more and more complex, ChatGPT's apparent ability begins to degrade. Hallucinations increase, and you have greater and greater trouble controlling the output. What's going on?
ChatGPT is not an expert on any particular task, but is a large model built from all the existing expertise that's available on the internet.
This leads to a critical difference between human and machine. A human's understanding is based on general principles, which the human can then extrapolate to solve new problems.
LLMs do not work like this. Instead, they are using your question as a set of inputs to weight the probability distrubtions of previously digested data. That means the output is only as good as the input, unlike with a real expert, who will be able to take very vague specifications and use their understanding to build a solution.
The upshot of this is that ChatGPT does not possess any understanding of your particular problem, unless you are capable of explaining it to the AI. Once you fully understand your task, you are able to tell ChatGPT what to do and it will produce the result you're after - code, data, whatever it is. But if you do not yet fully understand the steps involved, ChatGPT won't be able to either.
It's also an unfortunate fact that as tasks increase in complexity, the proportion of the work that is specification (as opposed to implementation) increases. Very complex tasks are almost entirely specification problems - the main challenge is in breaking down a requirement into smaller, more well-defined steps that can be solved indivually.
This can mean that a good visual UI with context specific help, one designed to help you with specifying your problem as well as building it, is probably still superior to the state of the art in generative AI. At least for now...
In any case, we'll be watching this space for further advancements! One thing that's certainly true about AI is that it's impossible to predict how rapidly it will advance in the future.
ChatGPT is a remarkable tool and we're investigating several implementations into axiom.ai in the near future, to help with the following use cases:
- Extracting unstructured data (coming soon!)
- Scaffolding automations
- Customer support assistance
- Locating selectors
In conjunction with working on improvements in the vital art of user experience, we're hoping that this technology can make it easier than ever for anyone to build bots and automation with AI.