csv-reading: Comma-Separated Value (CSV) Parsing
(require csv-reading)    package: csv-reading
1 Introduction
2 Reader Specs
newline-type —
Symbol representing the newline, or record-terminator, convention. The convention can be a fixed character sequence ('lf, 'crlf, or 'cr, corresponding to combinations of line-feed and carriage-return), any string of one or more line-feed and carriage-return characters ('lax), or adaptive ('adapt). 'adapt attempts to detect the newline convention at the start of the input and assumes that convention for the remainder of the input. Default: 'lax
separator-chars —
Non-null list of characters that serve as field separators. Normally, this will be a list of one character. Default: '(#\,) (list of the comma character)
quote-char —
Character that should be treated as the quoted field delimiter character, or #f if fields cannot be quoted. Note that there can be only one quote character. Default: #\" (double-quote)
quote-doubling-escapes? —
Boolean for whether or not a sequence of two quote-char characters within a quoted field constitutes an escape sequence for including a single quote-char within the string. Default: #t
comment-chars —
List of characters, possibly null, which comment out the entire line of input when they appear as the first character in a line. Default: '() (null list)
whitespace-chars —
List of characters, possibly null, that are considered whitespace constituents for purposes of the strip-leading-whitespace? and strip-trailing-whitespace? attributes described below. Default: '(#\space) (list of the space character)
strip-leading-whitespace? —
Boolean for whether or not leading whitespace in fields should be stripped. Note that whitespace within a quoted field is never stripped. Default: #f
strip-trailing-whitespace? —
Boolean for whether or not trailing whitespace in fields should be stripped. Note that whitespace within a quoted field is never stripped. Default: #f
newlines-in-quotes? —
Boolean for whether or not newline sequences are permitted within quoted fields. If true, then the newline characters are included as part of the field value; if false, then the newline sequence is treated as a premature record termination. Default: #t
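As a rough illustration of how these attributes combine, the sketch below parses two made-up strings. It assumes that a reader spec is written as a quoted association list, in the same shape as the specs used in the examples in this document, and that any attribute left out keeps its default. The names default-rows, either-spec, and either-rows are hypothetical, as is the sample data.

```racket
#lang racket/base
(require csv-reading)

;; With the default spec, a doubled quote inside a quoted field
;; stands for one literal quote character (quote-doubling-escapes? is #t).
(define default-rows
  (csv->list "name,remark\nAda,\"say \"\"hi\"\"\"\n"))

;; Hypothetical spec: accept either a comma or a semicolon as a separator.
;; Attributes not listed here are assumed to keep their defaults.
(define either-spec
  '((separator-chars #\, #\;)))

(define either-rows
  (csv->list (make-csv-reader "a,b;c\n1;2,3\n" either-spec)))
```

Here default-rows should hold two rows, the second of which has the field say "hi" with a single pair of quotes, and either-rows should hold two three-field rows.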
3 Making Reader Makers
procedure
(make-csv-reader-maker reader-spec)
→ (-> (or/c input-port? string?) (-> (listof string?)))
reader-spec : csv-reader-spec?
"fruits.csv"
apples | 2 | 0.42
bananas | 20 | 13.69
(define make-food-csv-reader
  (make-csv-reader-maker
   '((separator-chars #\|)
     (strip-leading-whitespace? . #t)
     (strip-trailing-whitespace? . #t))))
(define next-row (make-food-csv-reader (open-input-file "fruits.csv")))
> (next-row)
("apples" "2" "0.42")
> (next-row)
("bananas" "20" "13.69")
> (next-row)
()
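One reason to build a reader maker is to reuse a single spec across several inputs. The following is a minimal sketch of that idea, assuming the inputs are supplied as strings (which the contract above permits) rather than as files. The sample data and the names fruit-rows and veg-rows are made up for illustration.

```racket
#lang racket/base
(require csv-reading)

;; Same spec as the fruits.csv example: pipe-separated, spaces stripped.
(define make-food-csv-reader
  (make-csv-reader-maker
   '((separator-chars #\|)
     (strip-leading-whitespace? . #t)
     (strip-trailing-whitespace? . #t))))

;; Each call to the maker yields an independent reader for one input.
(define fruit-rows
  (csv->list (make-food-csv-reader "apples | 2 | 0.42\nbananas | 20 | 13.69\n")))

(define veg-rows
  (csv->list (make-food-csv-reader "kale | 1 | 3.10\n")))
```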
4 Making Readers
procedure
(make-csv-reader in [reader-spec])
in : (or/c input-port? string?)
reader-spec : csv-reader-spec? = '()
(define next-row
  (make-csv-reader (open-input-file "fruits.csv")
                   '((separator-chars #\|)
                     (strip-leading-whitespace? . #t)
                     (strip-trailing-whitespace? . #t))))
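A reader made this way is typically called in a loop. The sketch below assumes, based on the fruits.csv transcript in the previous section, that the reader returns the empty list once the input is exhausted. The input string and the name total are hypothetical.

```racket
#lang racket/base
(require csv-reading)

;; Made-up two-column input, passed as a string instead of a file.
(define next-row
  (make-csv-reader "apples|2\nbananas|20\n"
                   '((separator-chars #\|))))

;; Sum the second column, stopping when the reader returns the empty list.
(define total
  (let loop ([row (next-row)] [sum 0])
    (if (null? row)
        sum
        (loop (next-row) (+ sum (string->number (cadr row)))))))
```

Fields always arrive as strings, so any numeric interpretation (here via string->number) is left to the caller.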
5 High-Level Conveniences
(csv->list string)
(csv->list (make-csv-reader string))
(csv->list (make-csv-reader (open-input-string string)))
procedure
(csv-for-each proc reader-or-in)
proc : (-> (listof string?) any)
reader-or-in : (or/c (-> (listof string?)) input-port? string?)
procedure
(csv-map proc reader-or-in)
proc : (-> (listof string?) any/c)
reader-or-in : (or/c (-> (listof string?)) input-port? string?)
procedure
(csv->list reader-or-in)
reader-or-in : (or/c (-> (listof string?)) input-port? string?)
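The three procedures can be compared side by side on one small input. This is only a sketch: the sample string and the names data, quantities, row-count, and all-rows are invented for the example, and the input is passed as a string so that the default reader spec applies.

```racket
#lang racket/base
(require csv-reading)

(define data "apples,2\nbananas,20\n")

;; csv-map collects one result per row, in input order.
(define quantities
  (csv-map (lambda (row) (string->number (cadr row))) data))

;; csv-for-each is called for side effects only; here it counts rows.
(define row-count 0)
(csv-for-each (lambda (row) (set! row-count (add1 row-count))) data)

;; csv->list returns every row at once as a list of lists of strings.
(define all-rows (csv->list data))
```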
6 Converting CSV to SXML
"friends.csv"
Binoche,Ste. Brune,33-1-2-3
Posey,Main St.,555-5309
Ryder,Cellblock 9,
> (csv->sxml (open-input-file "friends.csv"))
(*TOP* (row (col-0 "Binoche") (col-1 "Ste. Brune") (col-2 "33-1-2-3"))
       (row (col-0 "Posey") (col-1 "Main St.") (col-2 "555-5309"))
       (row (col-0 "Ryder") (col-1 "Cellblock 9") (col-2 "")))
> (csv->sxml (open-input-file "friends.csv") 'friend '(name address phone))
(*TOP* (friend (name "Binoche")
               (address "Ste. Brune")
               (phone "33-1-2-3"))
       (friend (name "Posey")
               (address "Main St.")
               (phone "555-5309"))
       (friend (name "Ryder")
               (address "Cellblock 9")
               (phone "")))
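Because the result is ordinary list data, it can be inspected without any SXML tooling. The sketch below feeds csv->sxml a string port holding two of the rows from friends.csv and then pulls out the name of each friend with plain list operations. The names doc and names are hypothetical, and a dedicated SXML library may well be a better fit for larger documents.

```racket
#lang racket/base
(require csv-reading)

(define doc
  (csv->sxml (open-input-string "Posey,Main St.,555-5309\nRyder,Cellblock 9,\n")
             'friend
             '(name address phone)))

;; The result has the shape (*TOP* (friend (name ...) (address ...) (phone ...)) ...).
;; Skip the *TOP* tag, then look up the name element inside each friend element.
(define names
  (for/list ([friend (in-list (cdr doc))])
    (cadr (assq 'name (cdr friend)))))
```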
7 History
- Version 3:4 —
2016-12-09 Fixed bug of various %csv-reading:reader-or-in-arg calls passing string instead of symbol as first argument, which resulted in an error when an error-raising call was made. (Thanks to John B. Clements.)
- Version 3:3 —
2016-03-02 Tweaked info.rkt, filenames.
Linked to sxml-intro package, and changed “SXML/xexp” references to “SXML”.
- Version 3:2 —
2016-02-27 Updated home page URL.
Separated deps and build-deps.
- Version 3:1 —
2016-02-25 Fixed deps.
- Version 3:0 —
2016-02-21 Moving PLaneT csv package to new package system as csv-reading package, due to naming conflict.
Moved test suite into main source file.
- Version 2:0 —
2012-06-13 Converted to McFly and Overeasy.
- Version 0.11 —
Version 1:7 — 2011-08-22 Changed URL.
Changed references to Scheme to Racket. A little code cleanup, including using Racket error better.
- Version 0.10 —
Version 1:6 — 2010-04-13 Documentation fix.
- Version 0.9 —
Version 1:5 — 2009-03-14 Documentation fix.
- Version 0.8 —
Version 1:4 — 2009-02-23 Documentation changes.
- Version 0.7 —
Version 1:3 — 2009-02-22 License is now LGPL 3.
Moved to author’s new Scheme administration system.
- Version 0.6 —
Version 1:2 — 2008-08-12 For PLT 4 compatibility, new versions of csv-map and csv->list that don’t use set-cdr!. (Thanks to Doug Orleans.)
PLT 4 if compatibility change.
Minor documentation fixes.
- Version 0.5 —
2005-12-09 Changed a non-R5RS use of letrec to let*. (Thanks to David Pirotte and Guile.)
- Version 0.4 —
2005-06-07 Converted to Testeez.
Minor documentation changes.
- Version 0.3 —
2004-07-21 Minor documentation changes.
Test suite now disabled by default.
- Version 0.2 —
2004-06-01 Work-around for case-related bug in Gauche 0.8 and 0.7.4.2 that was tickled by csv-internal:make-portreader/positional. (Thanks to Grzegorz Chrupala for reporting.)
- Version 0.1 —
2004-05-31 First release, for testing with real-world input.
8 Legal
Copyright 2004, 2005, 2008–2012, 2016 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.