Will be used by the tokenizer for short lists of strings.
It's debatable whether Eof should be considered an error or not.
But it is clearly a special response that needs special handling,
so it is easier to keep it with the other unexpected conditions.
This also holds up better at higher abstraction levels, such as
the line reader.
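A minimal sketch of the design point above, in Python for illustration (the `Eof` name and `read_line` helper are hypothetical, not the actual API): by raising Eof like any other error, the line reader does not need a separate sentinel check at every call site, and callers at higher abstraction levels handle it uniformly.

```python
class Eof(Exception):
    """Raised when no more input is available (illustrative name)."""

def read_byte(stream):
    b = stream.read(1)
    if not b:
        raise Eof()
    return b[0]

def read_line(stream):
    # The line reader treats Eof like any other error and lets it
    # propagate, instead of checking a special return value per call.
    out = bytearray()
    try:
        while True:
            b = read_byte(stream)
            if b == 0x0A:  # newline ends the line
                break
            out.append(b)
    except Eof:
        if not out:
            raise  # no data at all: propagate Eof to the caller
        # partial final line without a trailing newline: return it
    return bytes(out)
```

With input `b"ab\ncd"`, successive calls yield `b"ab"`, then `b"cd"`, then raise Eof.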
Makes it easier to keep a consistent style.
Reads UTF-8 and UTF-16 into UTF-8 or UTF-16 strings.
If strict is true, decoding fails at the first invalid character.
If strict is false, invalid characters are replaced with U+FFFD.
For the replacement, I changed the behavior of uN::read_replace to
only advance one byte. Otherwise a common invalid case, when
ISO-8859-1 or WIN-1252 text is read as UTF-8, would skip many
characters.
If skip_bom is true, any BOM at the start of the stream is ignored.
If skip_bom is false, any BOM is included.
The input format can be forced; if it is not, detection is used,
which tries to guess and then falls back to UTF-8.
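The "advance one byte" replacement strategy can be sketched as follows; this is a hypothetical Python illustration of the behavior described above, not the actual implementation. On an invalid lead byte or broken sequence it emits U+FFFD and moves forward exactly one byte, so ISO-8859-1 text misread as UTF-8 loses only the bad byte instead of the characters that follow it.

```python
def decode_utf8_replace(data: bytes) -> str:
    """Decode UTF-8, replacing invalid bytes with U+FFFD one byte at a time."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # ASCII fast path
            out.append(chr(b))
            i += 1
            continue
        # Expected sequence length, judged from the lead byte.
        if 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            out.append("\uFFFD")  # invalid lead byte: skip ONE byte only
            i += 1
            continue
        seq = data[i:i + n]
        try:
            out.append(seq.decode("utf-8"))
            i += n
        except UnicodeDecodeError:
            out.append("\uFFFD")  # broken sequence: skip ONE byte only
            i += 1
    return "".join(out)
```

For example, the ISO-8859-1 bytes for "für" (`b"f\xfcr"`) decode to `f\uFFFDr`: the `r` survives, which it would not if the decoder skipped the full expected sequence length.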
Generate the lookup tables from UnicodeData.txt. To do that,
add gen_ugc, which uses the csv, buffers, line, io and other
modules to do the job.
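The core of what a gen_ugc-style generator does can be sketched like this (a hedged Python illustration, not the actual tool): UnicodeData.txt is a semicolon-separated file whose field 0 is the code point in hex and field 2 is the general category, so building a code-point-to-category table is a simple parse. A few real lines from the file are inlined to keep the example self-contained; a real generator would also handle the `<..., First>`/`<..., Last>` range entries, which this sketch omits.

```python
# Three genuine lines from UnicodeData.txt, inlined for the example.
SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""

def parse_ugc(text: str) -> dict:
    """Map each code point to its Unicode general category (Lu, Ll, Nd, ...)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)   # field 0: code point, hexadecimal
        table[cp] = fields[2]     # field 2: general category
    return table
```

From such a table the generator can then emit compact lookup structures (e.g. range arrays) instead of one entry per code point.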
Only a basic argument parser to start with.
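A "basic argument parser" in this sense might look like the following Python sketch (the function name and the supported syntax are assumptions for illustration, since the commit does not describe them): bare `--flag` options, `--name=value` options, and positional arguments.

```python
def parse_args(argv):
    """Split argv into flags, name=value options, and positionals."""
    flags = set()
    values = {}
    positional = []
    for arg in argv:
        if arg.startswith("--"):
            name, sep, value = arg[2:].partition("=")
            if sep:
                values[name] = value   # --name=value
            else:
                flags.add(name)        # bare --flag
        else:
            positional.append(arg)
    return flags, values, positional
```

For example, `parse_args(["--strict", "--encoding=utf-8", "in.txt"])` yields the flag set `{"strict"}`, the mapping `{"encoding": "utf-8"}`, and the positionals `["in.txt"]`.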