Does Artificial Intelligence Query MySQL Better Than You?

 The signal-to-noise ratio in the hype for Artificial Intelligence is incredibly high. There are many claims that AI will replace software developers. Many years ago, developers were told to find other jobs as a new programming language was designed to be written by nonprogrammers, replacing expensive coders. The Common Business Orientated Language, or COBOL, was a big hit and is still very popular in some sectors. But it did not replace programmers.

But can an AI write Structured Query Language (SQL) as well for use with MySQL, or better than you? 

I decided to use the MySQL World and Sakila databases because they have been around for decades. Nearly everyone who has had tangential exposure to MySQL has used them, and their tables are easy to understand. These tried and true schemas have been used for documentation, blogs, webinars, and demos for years, so I had little doubt that some of that had been scraped and fed into the AI.

I randomly picked Grok as the AI and will try others as I have time.

First Try

To answer the question in the post's title, I will use the free Grok AI. I asked it to 'Use the MySQL world database to list all the cities in Texas.'  The results were less than spectacular.

The Answer:

I'm afraid I don't have direct access to the MySQL World database. However, I can still help you with your request! Since I can't query that specific database, I'll rely on my general knowledge and available resources to provide a list of cities in Texas. Would you like me to give you a list of some major cities in Texas based on what I know, or should I search for a more comprehensive list for you? Let me know how you'd like me to proceed!

Okay, I was hoping for a more sock-knocking-off answer. I realized I did not express my desire to get the SQL for the query. 

Yes, It Probably Does Write Better SQL Than You!

I tried again, asking for the SQL to answer the question of what cities are in Texas. I asked, 'Using the MySQL world database, write the SQL query to list all the cities in Texas.' And the answer:

Since I don’t have direct access to the MySQL World database, I’ll assume its structure based on common conventions for such databases. The MySQL World database typically includes tables like `City`, `Country`, and `CountryLanguage`, with `City` containing city names and linking to a country or region code. For this query, I’ll assume:
- The `City` table has columns like `Name` (city name), `CountryCode`, and `District` (state or region).
- Texas is represented as a `District` under the `CountryCode` for the United States (usually 'USA').
Here’s the SQL query to list all cities in Texas based on that structure:
```sql
SELECT Name
FROM City
WHERE CountryCode = 'USA' AND District = 'Texas'
ORDER BY Name;
```
This query:
- Selects the `Name` column (city names) from the `City` table.
- Filters for cities where the `CountryCode` is 'USA' and the `District` is 'Texas'.
- Orders the results alphabetically by city name.
If the database uses a different schema (e.g., a separate table for states or different column names), let me know, and I can adjust the query accordingly! Would you like me to explain any part of it further?

Wow! The query highlighted above even had ORDER By Name added to it. I got poorer results when interviewing junior SQL developers with the same question.

Open The Pod Bay Doors, HAL!

Okay, let us get a little tougher. 'Using the MySQLworld database, write the SQL to list the five largest cities by population, the district, and the country.'

I notice you're asking for a query using the MySQL "world" database. I'll assume you're referring to the common sample database that contains tables like `city`, `country`, and related information. Here's the SQL query to list the five largest cities by population, including their district and country:

```sql
SELECT 
    c.Name AS City,
    c.District,
    co.Name AS Country,
    c.Population
FROM 
    city c
JOIN 
    country co ON c.CountryCode = co.Code
ORDER BY 
    c.Population DESC
LIMIT 5;
```

This query:
1. Selects the city name, district, country name, and population
2. Joins the `city` table with the `country` table using the CountryCode
3. Orders the results by population in descending order (largest first)
4. Limits the output to the top 5 results

Would you like me to explain any part of this query in more detail?

Double WOW!  Notice that the CountryCode and City tables are aliases to make reading the query easier. There is also an ORDER BY included.  Okay, so AI can write some good SQL.

Getting Tougher

I asked, 'Use the MySQL Sakila database and write the SQL to list the top five grossing films by decade.' This query will need to make some joins and translate some terminology common to humans ('decade') into SQL.

Let's write a SQL query for the Sakila database to find the top 5 grossing films by decade. The Sakila database tracks film rentals and payments, so we'll need to:
1. Join relevant tables (film, inventory, rental, payment)
2. Calculate total revenue per film
3. Extract the decade from the release_year
4. Group and rank films by revenue within each decade

Here's the SQL query:

```sql
WITH FilmRevenue AS (
    SELECT 
        f.film_id,
        f.title,
        f.release_year,
        FLOOR(f.release_year / 10) * 10 AS decade,
        SUM(p.amount) AS total_revenue
    FROM film f
    JOIN inventory i ON f.film_id = i.film_id
    JOIN rental r ON i.inventory_id = r.inventory_id
    JOIN payment p ON r.rental_id = p.rental_id
    GROUP BY f.film_id, f.title, f.release_year
),
RankedFilms AS (
    SELECT 
        title,
        release_year,
        decade,
        total_revenue,
        ROW_NUMBER() OVER (PARTITION BY decade ORDER BY total_revenue DESC) AS revenue_rank
    FROM FilmRevenue
)
SELECT 
    decade,
    title,
    release_year,
    total_revenue,
    revenue_rank
FROM RankedFilms
WHERE revenue_rank <= 5
ORDER BY decade, revenue_rank;
```

This query will:
- Calculate total revenue per film by joining the necessary tables
- Create decades (1950, 1960, etc.) using FLOOR division
- Use ROW_NUMBER() to rank films within each decade by revenue
- Filter to show only the top 5 per decade
- Sort results by decade and rank

The output will show:
- Decade (like 1950, 1960, etc.)
- Film title
- Release year
- Total revenue
- Rank (1-5) within that decade

Note: The Sakila sample database primarily contains films from 2005-2006, so you might only see one decade in the results unless the database has been modified with more diverse release years.

Triple WOW! It wrote Common Table Expressions and a Window Function. And it did it very quickly. I will admit Grok did it much faster than I could. 

Impressions

My impressions are that AI can write very good SQL. My gut feeling is similar to my discovery many years ago that C could write low-level code on par with my assembly language skills.  The SQL is high-quality, quickly written, and easy to read (I fear I have to add for a human here).

Is AI going to be like IDEs, filling in gaps and completing lines for coders? I can see that happening. Will it replace coders? Well, someone still has to double-check the output but the above queries are pretty impressive. 

Time will tell if an AI like Grok can consistently produce high-quality Structured Query Language code and how well it follows coding standards. 

Excuse me while I have my Kier Dullea from 2001: A Space Odyssey moment.


Comments

Popular posts from this blog

Tracking MySQL Query Plans Over Time Part One