charsets {tools} | R Documentation
charset_to_Unicode is a matrix of Unicode code points with one row per byte value (0x00 to 0xFF) and columns for the common 8-bit encodings.
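The matrix has 256 rows (the examples below index it with 1:256), so the code point for byte value b sits in row b + 1, since R indices start at 1. The following is a minimal sketch of decoding raw ISO 8859-2 bytes into a UTF-8 string with the "ISOLatin2" column. The helper name decode_latin2 is hypothetical and is not part of tools.

```r
library(tools)

# Hypothetical helper (not in tools): decode raw ISO 8859-2 bytes via the table.
# Row b + 1 holds the Unicode code point for byte value b.
decode_latin2 <- function(bytes) {
  # as.integer() drops the "noquote"/"hexmode" classes, leaving plain integers
  cp <- as.integer(charset_to_Unicode[as.integer(bytes) + 1L, "ISOLatin2"])
  intToUtf8(cp)  # collapse the code points into one UTF-8 string
}

decode_latin2(as.raw(c(0x41, 0xA3, 0xB1)))  # returns "AŁą"
```

The same pattern works for any other column of the matrix. Only the column name changes.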
Adobe_glyphs is a data frame that gives Adobe glyph names for Unicode code points. It has two character columns: "adobe" (the glyph name) and "unicode" (the code point as a 4-digit hexadecimal string).
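To look up the names for a single code point, format it as a 4-digit hex string and compare it against the "unicode" column. This is a sketch: the helper adobe_names is hypothetical. Applying toupper() to the column is a precaution, since the letter case of the hex digits in the data is assumed here rather than documented.

```r
library(tools)

# Hypothetical helper (not in tools): all Adobe glyph names for one code point.
adobe_names <- function(code_point) {
  key <- sprintf("%04X", code_point)           # e.g. 0x141 -> "0141"
  hit <- toupper(Adobe_glyphs$unicode) == key   # compare 4-digit hex strings
  Adobe_glyphs$adobe[hit]
}

adobe_names(0x141)  # includes "Lslash"
```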
charset_to_Unicode
Adobe_glyphs
charset_to_Unicode is an integer matrix of class c("noquote", "hexmode"), so it prints in hexadecimal.
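The class affects only how the values print. The stored values are ordinary integers. The following short sketch inspects the class and strips it when plain integers are wanted, for example before arithmetic or matching.

```r
library(tools)

class(charset_to_Unicode)         # "noquote" "hexmode"
storage.mode(charset_to_Unicode)  # "integer"
dim(charset_to_Unicode)[1]        # 256 rows, one per byte value

# unclass() drops the classes but keeps dim and dimnames.
plain <- unclass(charset_to_Unicode)
plain[0xA3 + 1, "ISOLatin2"]      # 321, i.e. 0x141
```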
The mappings are those used by libiconv. Sources differ in how quotes and the minus/hyphen are mapped, and the postscript encoding files use a different mapping.
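Because the table follows libiconv, individual entries can be cross-checked against R's iconv(), which uses the platform's iconv implementation. This sketch assumes that the platform's iconv accepts the encoding name "ISO-8859-2". Agreement is expected for the letters checked here. Other positions, such as the quote and minus/hyphen characters in some encodings, are where sources may disagree.

```r
library(tools)

bytes <- as.raw(c(0x41, 0xA3, 0xB1))

# Decode each byte with the platform's iconv ...
via_iconv <- vapply(bytes, function(b)
  iconv(rawToChar(b), from = "ISO-8859-2", to = "UTF-8"), character(1))

# ... and with the lookup table (row b + 1 holds byte b).
via_table <- vapply(as.integer(bytes), function(b)
  intToUtf8(as.integer(charset_to_Unicode[b + 1L, "ISOLatin2"])), character(1))

via_iconv == via_table  # expected: TRUE TRUE TRUE
```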
Adobe_glyphs includes all the Adobe glyph names that correspond to single Unicode characters. It is sorted by Unicode code point and, within a code point, alphabetically by glyph name (there can be more than one name for a code point). The data are in the file ‘R_HOME/share/encodings/Adobe_glyphlist’.
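The sort order and the presence of several names per code point can both be checked directly. This is a sketch. The hex strings are converted with strtoi() so that the ordering check is numeric and does not depend on the collation locale.

```r
library(tools)

cp <- strtoi(Adobe_glyphs$unicode, base = 16L)  # hex strings -> integers

# Sorted by code point: non-decreasing, with ties allowed for shared points.
!is.unsorted(cp)

# Code points that have more than one Adobe glyph name.
counts <- table(Adobe_glyphs$unicode)
multi <- names(counts)[counts > 1]
head(Adobe_glyphs[Adobe_glyphs$unicode %in% multi, ], 6)
```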
## find Adobe names for ISOLatin2 chars.
latin2 <- charset_to_Unicode[, "ISOLatin2"]
aUnicode <- as.numeric(paste0("0x", Adobe_glyphs$unicode))
keep <- aUnicode %in% latin2
aUnicode <- aUnicode[keep]
aAdobe <- Adobe_glyphs[keep, 1]
## first match
aLatin2 <- aAdobe[match(latin2, aUnicode)]
## all matches
bLatin2 <- lapply(1:256, function(x) aAdobe[aUnicode == latin2[x]])
format(bLatin2, justify = "none")