Error "line too long" while using GpLoad
I was trying to load a 6GB data file to my Greenplum database, when I encountered the following error:
2012-04-30 00:34:04|ERROR|ERROR: gpfdist error - line too long in file (url.c:1192) (seg0 slice1 warrior:40000 pid=7466) (cdbdisp.c:1457)
DETAIL: External table ext_gpload20120430_002808_27522, line N/A of gpfdist://abc:8000//home//inputData.dat: ""
encountered while running INSERT INTO public.target_table ("col1","col2","col3","col4") SELECT "col1","col2","col3","col4" FROM ext_gpload20120430_002808_27522
The gpload documentation has the following option for gpfdist:
Sets the maximum allowed data row length in bytes. Default is 32768. Should be used when user data includes very wide rows (or when line too long error message occurs). Should not be used otherwise as it increases resource allocation. Valid range is 32K to 1MB.
Can anyone shed some light on what may be causing this error? I've never come across it before, although I've been using gpload for quite some time now.
Also, how can I set a gpfdist option when using gpload?
"Line too long" error explanation
We see this frequently when the data contains control characters that are not properly escaped. You should set the escape option to a character that is not present in the data. From the documentation:
Specifies the single character that is used for C escape sequences (such as \n, \t, \100, and so on) and for escaping data characters that might otherwise be taken as row or column delimiters. Make sure to choose an escape character that is not used anywhere in your actual column data. The default escape character is a \ (backslash) for text-formatted files and a " (double quote) for csv-formatted files; however, it is possible to specify another character to represent an escape. It is also possible to disable escaping in text-formatted files by specifying the value 'OFF' as the escape value. This is very useful for data such as text-formatted web log data that has many embedded backslashes that are not intended to be escapes.
FYI, there's an undocumented option that can be put into the INPUT section of the gpload control file (which I discovered by reading gpload.py):
- MAX_LINE_LENGTH: 32768
But yes, it's often caused by an unescaped quote somewhere in the middle of the data.
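To illustrate, here is a minimal sketch of a gpload control file that combines the undocumented MAX_LINE_LENGTH option with an explicit escape setting in the INPUT section. The database name, user, host, port, file path, and table are placeholders, and the exact option placement follows the usual gpload YAML layout, so adjust to match your environment:

```yaml
VERSION: 1.0.0.1
DATABASE: mydb
USER: gpadmin
GPLOAD:
  INPUT:
    - SOURCE:
        LOCAL_HOSTNAME:
          - abc
        PORT: 8000
        FILE:
          - /home/inputData.dat
    - FORMAT: text
    - DELIMITER: '|'
    # Disable escaping entirely, or set a character not present in the data
    - ESCAPE: 'OFF'
    # Undocumented: raise the maximum allowed row length (bytes)
    - MAX_LINE_LENGTH: 1048576
  OUTPUT:
    - TABLE: public.target_table
    - MODE: insert
```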
Also, I've seen this "line too long" error when loading compressed (gzip) files where the file turned out to be corrupted. Are you loading compressed files such that some bad gzip files might be getting through to the load process? Correcting the corrupted file resolved the problem for us. (Running gunzip -t filename validates whether a .gz file is good.)
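As a quick sketch of that validation step (file paths here are just examples), you can check each archive's integrity before handing it to the load process:

```shell
# Create a valid gzip file, then simulate corruption by truncating it.
printf 'some,data,row\n' | gzip > /tmp/good.gz
head -c 10 /tmp/good.gz > /tmp/bad.gz

# gunzip -t tests integrity without writing decompressed output;
# it exits 0 for a valid archive and nonzero for a damaged one.
gunzip -t /tmp/good.gz && echo "good.gz: OK"
gunzip -t /tmp/bad.gz 2>/dev/null || echo "bad.gz: corrupt"
```

A loop over `*.gz` with the same test makes it easy to quarantine bad files before gpload ever sees them.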
I have also seen it when loading geospatial data; polygon rows can get huge.