Can ChatGPT help me extract CSS selectors?

I want to test if ChatGPT can extract CSS selectors from HTML. When I first tried this, it wasn't possible, but with the new models, it might be. Let's find out. How much prompt crafting will it take?

When scraping the web or automating a browser, I sometimes need to create a custom CSS selector to locate an element on the page. I usually use Google Chrome's inspector to examine the code and identify a unique selector—a process that takes just a few minutes.

I'm excited to see how ChatGPT performs. How detailed will my prompts need to be? Will ChatGPT understand how to create a unique selector while avoiding pitfalls like obfuscated CSS classes?

I'm going to test this with real-world selector challenges I've encountered. I'll start with a simple prompt: can ChatGPT solve the problem without guidance?

For clarity, I don't just want ChatGPT to generate a CSS selector—I want it to be unique to the element. I'm curious to see how ChatGPT achieves that and what the results will be.
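
In practice, "unique" has a mechanical definition: the selector must match exactly one node in the document. A minimal console check makes this concrete (a sketch; isUnique is just my own helper name):

// A selector is unique if it matches exactly one element on the page.
const isUnique = (selector) => document.querySelectorAll(selector).length === 1;

This is the bar every selector ChatGPT suggests will have to clear.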

Does ChatGPT understand the hierarchy of selectors? Will it be able to leverage other types of selectors? Will ChatGPT know what I mean by a unique selector? I'm Alex Barlow, co-founder of Axiom. Let's dive in.

The first model I'll test is OpenAI's o3-mini, as it allows me to paste a full page of HTML into the web UI. However, my gut tells me its reasoning could be total overkill, leading to funky results.

# How I will conduct my CSS experiments with AI and ChatGPT

I aim to determine whether ChatGPT can generate a unique CSS selector for an element within an HTML page. I will be testing ChatGPT and a few of its recent models. I will provide:

  1. The entire HTML of the page.
  2. The HTML of the element I want a unique CSS selector for, shown below:

<input data-testid="wordcount" type="number" placeholder="Max" min="0" class="sc-csKJxZ sc-dsAqUS dQSMvU" value="">

I will not share the full HTML of the page here; it's a whole page of code, and including it would not be practical. As for my prompts, I'll keep them simple to see how well AI can handle the task with minimal instruction.

I will also avoid including ChatGPT's responses word-for-word in this post; I fear that would make this the longest blog post in the world.

# Experiment 1: Minimal guidance

We want AI to perform tasks without needing a long list of instructions. For my first experiment, I kept it simple with a short prompt.

ChatGPT Prompt

Can you find a unique CSS selector for this element (I inserted the element's HTML) from this HTML (I inserted the HTML)?

# The result

Well, first off, it didn't produce a unique selector. Instead, the response was a lengthy explanation covering CSS selectors, JavaScript, and XPath, none of which I asked for. After nudging it, ChatGPT quickly generated a selector for me:

/* This is the unique selector recommended by ChatGPT */
input[data-testid="wordcount"]

ChatGPT claimed the selector was unique. However, after inspecting the HTML, I discovered there were two inputs matching `input[data-testid="wordcount"]`. Additionally, I would avoid using the data-testid attribute, as it originates from testing code and could be removed at any time.
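
The duplicate is easy to confirm in the browser console, run on the live page:

// Logs 2 here: two inputs share this data-testid, so the selector is not unique.
document.querySelectorAll('input[data-testid="wordcount"]').length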

# Experiment 2: Minimal guidance with some instruction

I make minimal changes to the prompt. Keep in mind that this is a short task, so using AI must be worth the effort; I don't want to spend hours writing prompts.

ChatGPT Prompt

Can you find a unique CSS selector for this element (I inserted the element's HTML) from this HTML (I inserted the HTML)?

I want a unique selector without using data-testid="wordcount" or any other data-testid attributes.

# The result

/* Option one - element with 3 attributes */
input[type="number"][placeholder="Min"][min="0"]
/* Option two - 3 CSS classes */
input.sc-csKJxZ.sc-dsAqUS.dQSMvU

ChatGPT states, "Either approach will give you a unique CSS selector without relying on data-testid attributes."

When manually checking the HTML, I found that the first solution was not unique because other inputs shared exactly the same attributes. The second suggestion used obfuscated CSS classes that could change periodically, causing the selector to break. I do not recommend doing that! I will make sure to mention this explicitly in my next prompt.
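
The uniqueness claim itself takes seconds to test in the console (a sketch using the two selectors above); the fragility of obfuscated classes, unfortunately, you have to judge by eye:

// Print the match count for each suggestion; a unique selector must log exactly 1.
['input[type="number"][placeholder="Min"][min="0"]',
 'input.sc-csKJxZ.sc-dsAqUS.dQSMvU'
].forEach((sel) => console.log(sel, document.querySelectorAll(sel).length));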

# Experiment 3: Expanding the prompt and using the console to validate results

ChatGPT Prompt

Can you find a unique CSS selector for this element (I inserted the element's HTML) from this HTML (I inserted the HTML)?

I want a unique selector without using data-testid="wordcount" or any other data-testid attributes. In addition, do not use CSS classes that are obviously obfuscated.

# The result

ChatGPT turned its attention to using absolute paths, which is excellent. I recommend this approach when solving selector riddles. Nice work! Or was it?

/* Option one - descendant path with nth-of-type */
#container form input[type="number"]:nth-of-type(2)
/* Option two - absolute path from the document root */
html > body > div:nth-child(1) > form:nth-child(1) > input:nth-of-type(2)

I was excited by the results, so I didn't just check them manually; I validated them using the Console in Chrome DevTools:

document.querySelector('html > body > div:nth-child(1) > form:nth-child(1) > input:nth-of-type(2)')
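// Fails on the real page: the path doesn't match the actual document structure.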

When I checked in the console, the selectors failed validation, and when I checked the HTML, they were completely incorrect. The selector `form input[type="number"]:nth-of-type(2)` misuses `:nth-of-type`, which counts an element's position among siblings of the same tag rather than among elements matching the rest of the selector, and it wasn't even close to accurate. Something had gone awry, or rather, some AI had.

# Experiment 4: A dead end

I started to wonder why the selectors were so inaccurate; the structure didn't match the HTML at all. I asked the AI to recheck the HTML. After restructuring the prompt several times and providing both the element and the HTML snippet again, the AI obliged and returned a few results.

form#myForm > div:nth-child(2) > div:nth-child(1) > input[placeholder="Max"]

However, it used this ID selector: #myForm. The issue was that this ID wasn't part of the HTML I provided. I asked ChatGPT to show me the HTML it used to generate the selector. It obliged, revealing it wasn't using the HTML snippet I gave it—it had invented its own example form. I'd reached a dead end; the AI was confused and hallucinating.

That's enough—time to start again in a fresh window.

To recap, the model kept losing track of the HTML snippet I shared:

  1. It started using its own basic form example.
  2. It invented an ID (#myForm) that didn’t exist.
  3. Its reasoning meant every prompt adjustment caused it to re-evaluate everything, moving further from the solution.

The reasoning model was completely the wrong choice. It tends to go off the rails, making leaps when I needed incremental improvements.

# Experiment 5: One last go with the GPT-4o model

I didn't use this model first because of its character limit—I can't share the entire HTML page for the AI to work out its solution fully. Starting from scratch:

ChatGPT Prompt

Construct a unique CSS selector:

Start with the form ID of the HTML form element containing the input. Use the document hierarchy from the form to the input, applying nth-child for each div.

For the input element, use an attribute selector. Do not use classes or attributes starting with data-test.

# The result

/* Option one - full path from the form ID using nth-of-type */
#CerebroFilter > div > div:nth-of-type(2) > div:nth-of-type(1) > div:nth-of-type(2) > input[type="number"][placeholder="Max"]
/* Option two - form ID plus placeholder attribute */
#CerebroFilter input[placeholder="Max"]
/* Option three - form ID plus placeholder and type attributes */
#CerebroFilter input[placeholder="Max"][type="number"]

This time, the model is faster, and unlike the reasoning models, it extracts the correct form ID without an issue. But again, the selectors are not unique.
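
Verification, at least, is quick. With the target input selected in the Elements panel, Chrome's $0 console shortcut refers to that element, so one line checks both that a selector hits the target and that it hits nothing else (a sketch using ChatGPT's third suggestion):

// $0 is DevTools' reference to the element currently selected in the Elements panel.
const sel = '#CerebroFilter input[placeholder="Max"][type="number"]';
console.log($0.matches(sel) && document.querySelectorAll(sel).length === 1);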

# Wrapping up: Is it worth using AI to find CSS selectors?

My experiments showed that ChatGPT can quickly generate CSS selectors but struggles to produce unique ones. Early attempts resulted in selectors that weren't unique or that depended on unstable hooks, such as testing-specific data attributes or obfuscated CSS classes.

When I added more instruction to the prompts, the results improved but still required manual validation, and more detailed prompts eventually led to increasingly complex and incorrect selectors. Worse, the reasoning model forgot the provided HTML context and created selectors based on imagined code structures.

Switching to the GPT-4o model showed better initial results. It successfully identified the correct form ID and structured the selectors logically. Yet even this improvement didn't guarantee unique selectors without manual verification.

The experiment wasn't a great success. ChatGPT wasn't better than a human, so we're not likely to be replaced anytime soon. However, it has me excited: with further prompt crafting, this could work well.

We plan to explore using AI to fix broken selectors in Axiom.ai, which could be a fantastic application of ChatGPT or similar models. It certainly showed enough promise to warrant further exploration.

However, we must remember that if the AI starts inventing HTML forms, it could confuse users. So we'll approach this cautiously—with a healthy pinch of salt.

Alex Barlow

Alex spent 14 years creating web apps, often automating repetitive tasks, before co-founding Axiom.ai. He’s hands-on with users and enjoys learning from them. He creates intricate automation the no-code way, and empowered by generative AI, he's extending his skill set to include code. Outside of work, he loves exploring the Scottish Highlands with his daughter and making sandcastles on Firemore Beach.
