CSV File Load Error - Data Too Long or Value Too Long
You are trying to load a CSV file into a database, and almost inevitably, you run into ERROR: value too long for type character varying(64) if you are using PostgreSQL, or SQL Error [1406] [22001]: Data truncation: Data too long for column 'Name' at row 1 if you are using MariaDB or MySQL.
It doesn't matter whether you are using the DBeaver application or the native database import tools. Newbies will examine the very first line of the CSV file and scratch their heads, because there is no error in that line.
The database ingests the data one row at a time, so as far as the error is concerned, every row is "row 1." The offending line is hiding deep in the file, and there may be many more like it.
The error message you receive is helpful, but it DOES NOT tell you which line in the CSV file is causing you this grief. You could eyeball the data, but that quickly becomes tedious, and if the CSV file has thousands of lines, the odds of spotting the offending line grow long.
Splitting up the file into smaller chunks is labor-intensive.
Example
(In DBeaver, you can disable batch insert to skip particular rows.)
Reason:
SQL Error [22001]: Batch entry 0 INSERT INTO public.titanic ("PassengerId","Survived","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked")
VALUES (('308'::int8),('1'::int8),('1'::int8),('Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'),('female'),('17'::int8),('1'::int8),('0'::int8),('PC 17758'),('108.9'::real),('C65'),('C')) was aborted: ERROR: value too long for type character varying(64) Call getNextException to see other errors in the batch.
A Difference Between PostgreSQL and MariaDB/MySQL
PostgreSQL returns the offending row in the error message, as shown above. MariaDB and MySQL do not; they report only SQL Error [1406] [22001]: Data truncation: Data too long for column 'Name' at row 1.
DBeaver's import tool selected VARCHAR(64) for the Name column after examining some of the records. The unfortunate Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo) needs more space. And there may be other records with names as long, or longer.
Solution
You can use Google Sheets to examine the CSV file before attempting to load it into the database.
We can use =max(arrayformula(len(D1:D891))) on the column holding the names to determine the maximum length of a name. The longest name in this data is eighty-two characters long.
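If the file is too large or awkward to open in a spreadsheet, a short script can compute the same maximum for every column at once. Here is a minimal sketch, assuming Python 3 and its standard csv module; the file name titanic.csv is taken from the example above, so substitute your own file:

import csv

# Report the longest value in every column of the CSV file,
# so the VARCHAR columns can be sized before the import.
with open("titanic.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    widths = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if name in widths:
                widths[name] = max(widths[name], len(value or ""))

for name, width in widths.items():
    print(f"{name}: {width}")

Against the Titanic data, the Name column should report the same eighty-two characters the spreadsheet formula found.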
If you read the linked blog post, you will see that I defined the Name column as a TEXT field to solve the issue. But what if you are trying to save space? Knowing the width of that column is valuable.
So now you have a tool to determine the width of a column when you get the dreaded 'value too long' error.
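And if you also want to know exactly which records are over the limit, the piece of information the MariaDB/MySQL message withholds, the same idea extends to a record-by-record check. Here is a minimal sketch, again assuming Python 3 and the standard csv module; the 64-character limit, the Name column, and the file name come from the example above:

import csv

LIMIT = 64        # the VARCHAR(64) limit from the example; adjust to your schema
COLUMN = "Name"   # the column named in the error message

# Print every CSV record whose value exceeds the limit, with its record number.
with open("titanic.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for record_number, row in enumerate(reader, start=2):  # record 1 is the header
        value = row.get(COLUMN) or ""
        if len(value) > LIMIT:
            print(f"record {record_number}: {len(value)} characters: {value}")

No more splitting the file into chunks or eyeballing thousands of lines.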