I've an issue with double-quotes in CSV file. One of the columns may contain this kind of value: "STATUS ""H"" "
I've got quote set to "
The file source fails on such records.
I found this thread and Scott tells us there that the file can't contain " in data.
Is this 100% correct?
I've got mutliple text columns and the pain is that I don't know which column might have these cases in future. To create a script means to write my own file parser for all files I use.
Any ideas?
Dima.
Hi,
If you have a CSV file, do you actually need to specify the text delimeter - or do you have some text that has commas in the string as well?
If your source file contains things like
"Status ""H"" ", "Next string", 1234, etc,
then I would try to solve the problem at source. That formatting suggests to me that the data is from another database where the actual data is ,Status "H", and the CSV export is trying to escape the " character by doubling it up.
Perhaps if you can export the CSV with no text delimiters, or with a character that is guaranteed not to appear, you may be OK.
Hope this helps,
Richard
|||Richard,
you are correct in your gueses. I do have data like this:
"Status ""H"" ", "Next string", 1234, etc,
I can't remove quoation mark since I may have data like this:
"Status ""H"" ", "Next,string", 1234, etc,
So, if I removed " I'd have another error like this:
Status "H", Next,string, 1234, etc,
So, it wouldn't parse it correctly.
I can't change quotation, symbol since I don't control source system...
Now, what I could change is data within the column. So, I could replace:
Status "H"
with something like
Status "H"
Then when I read this - I'd convert it back to:
Status "H"
What do you think?
Dima
|||Dima,
That looks like a solution to me - assuming you are going to make this change prior to bringing the data into the database, ie by pre-processing the CSV file.
What process are you going to use to make the changes? I'm sure you've thought of this, but if you were just to globally change "" to " then it will fall over when you three or more " in a row.
eg. """H"" is my status" should become ""H" is my status" , not ""H" is my status"
I've not tried it, but you can probably do this with an SSIS script of some sort, passing regular expressions. That would be better than having an external process change the file.
Regards,
Rich
|||The file I'm dumping is on Unix. So, I was thinking about creating sed script, using regular expressions to post-process the file after it's dumped...
Thank for the feedback BTW it helped me think this thru
No comments:
Post a Comment