NAME
utf8::all - turn on Unicode - all of it
VERSION
version 0.024
SYNOPSIS
    use utf8::all;                      # Turn on UTF-8, all of it.
    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)
DESCRIPTION
The use utf8 pragma tells the Perl parser to allow UTF-8 in the program
text in the current lexical scope. This also means that you can now use
literal Unicode characters as part of strings, variable names, and
regular expressions.
utf8::all goes further:
* charnames are imported so \N{...} sequences can be used to compile
Unicode characters based on names.
* On Perl v5.11.0 or higher, the use feature 'unicode_strings' is
enabled.
* use feature fc and use feature unicode_eval are enabled on Perl
5.16.0 and higher.
* Filehandles are opened with UTF-8 encoding turned on by default
(including STDIN, STDOUT, and STDERR when utf8::all is used from the
main package), meaning that they automatically convert UTF-8 octets
to characters and vice versa. If you don't want UTF-8 for a
particular filehandle, you'll have to set binmode $filehandle.
* @ARGV gets converted from UTF-8 octets to Unicode characters (when
utf8::all is used from the main package). This is similar to the
behaviour of the -CA perl command-line switch (see perlrun).
* readdir, readlink, readpipe (including the qx// and backtick
operators), and glob (including the <> operator) now all work with
and return Unicode characters instead of (UTF-8) octets (again only
when utf8::all is used from the main package).
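As a rough illustration of what the list above bundles together, the sketch below spells out some of the same effects with core pragmas and Encode only. It is an approximation under stated assumptions, not the module's actual implementation: @raw_args is a hypothetical stand-in for @ARGV, and the readdir, readlink, readpipe, and glob overrides are not reproduced.

```perl
use strict;
use warnings;
use utf8;                               # the source text itself is UTF-8
use feature qw(unicode_strings fc);     # fc needs Perl 5.16 or higher
use charnames qw(:full);                # \N{...} lookups by character name
use open qw(:std :encoding(UTF-8));     # UTF-8 layers for new handles and STDIN/STDOUT/STDERR
use Encode ();

# Hypothetical stand-in for @ARGV: UTF-8 octets as the shell would pass them.
my @raw_args = ("f\xC3\xB8\xC3\xB8");

# Decode octets into characters, roughly what -CA does for @ARGV.
my @args = map {
    Encode::decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
} @raw_args;

# The same string built from character names, and a case-folding comparison.
my $named     = "f\N{LATIN SMALL LETTER O WITH STROKE}\N{LATIN SMALL LETTER O WITH STROKE}";
my $same_fold = fc('Straße') eq fc('STRASSE') ? 1 : 0;

printf "%d octets became %d characters\n", length($raw_args[0]), length($args[0]);
```

Loading utf8::all replaces all of these lines with a single use statement and additionally wraps the filesystem functions, which this sketch leaves alone.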
Lexical Scope
The pragma is lexically scoped, so you can do the following if you have
some reason to:
    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n"; # 7
        print $out $utf8_str;         # out as utf8
    }
    open my $in, '<', 'outfile';      # in as raw
    my $text = do { local $/; <$in>};
    print length $text, "\n";         # 10, not 7!
Instead of lexical scoping, you can also use no utf8::all to turn off
the effects.
Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file
handles is always global and cannot be undone!
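The difference between 7 and 10 in the example above comes from reading the same data with and without a UTF-8 layer. A minimal sketch that reproduces it without utf8::all, assuming explicit layers on an in-memory file ($buffer is a hypothetical stand-in for 'outfile'):

```perl
use strict;
use warnings;

my $chars = "f\x{F8}\x{F8} b\x{101}r";    # 'føø bār' written with escapes

# Write through a UTF-8 layer into an in-memory "file".
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$out} $chars;
close $out or die "close failed: $!";

# Read it back raw: every octet counts as one element.
open my $raw, '<:raw', \$buffer or die "open failed: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# Read it back through the UTF-8 layer: characters again.
open my $in, '<:encoding(UTF-8)', \$buffer or die "open failed: $!";
my $decoded = do { local $/; <$in> };
close $in;

printf "%d %d\n", length($octets), length($decoded);   # prints "10 7"
```

Each ø and ā occupies two octets in UTF-8, which accounts for the three extra elements in the raw read.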
Enabling/Disabling Global Features
As described above, the default behaviour of utf8::all is to convert
@ARGV and to open the STDIN, STDOUT, and STDERR file handles with UTF-8
encoding, and override the readlink and readdir functions and glob
operators when utf8::all is used from the main package.
If you want to disable these features even when utf8::all is used from
the main package, add the option NO-GLOBAL (or LEXICAL-ONLY) to the use
line. E.g.:
    use utf8::all 'NO-GLOBAL';
If, on the other hand, you want to enable these global effects even when
utf8::all is used from a package other than main, use the option
GLOBAL on the use line:
    use utf8::all 'GLOBAL';
UTF-8 Errors
utf8::all treats invalid code points (i.e., UTF-8 that does not map to
a valid Unicode "character") as a fatal error.
For glob, readdir, and readlink, one can change this behaviour by
setting the attribute "$utf8::all::UTF8_CHECK".
ATTRIBUTES
$utf8::all::UTF8_CHECK
By default utf8::all marks decoding errors as fatal (default value for
this setting is Encode::FB_CROAK). If you want, you can change this by
setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports the
encoding errors as warnings, and Encode::FB_DEFAULT will completely
ignore them. Please see Encode for details. Note: Encode::LEAVE_SRC is
always enforced.
Important: This setting only controls the handling of decoding errors
in glob, readdir, and readlink.
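To show what these check values mean in practice, the sketch below calls Encode::decode directly on a malformed octet string. It demonstrates Encode's own behaviour, which the setting is described as passing through. The variable $bad is a hypothetical example input, and how utf8::all applies the value internally is not shown here.

```perl
use strict;
use warnings;
use Encode ();

my $bad = "caf\xE9";    # Latin-1 octets; the final octet is not valid UTF-8

# FB_CROAK: decoding dies on the malformed octet (the documented default).
my $ok = eval {
    Encode::decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print "FB_CROAK: ", ($ok ? "decoded" : "died"), "\n";

# FB_DEFAULT: no error is reported; Encode substitutes U+FFFD instead.
my $lenient = Encode::decode('UTF-8', $bad,
    Encode::FB_DEFAULT() | Encode::LEAVE_SRC());
printf "FB_DEFAULT: last character is U+%04X\n", ord substr($lenient, -1);

# With utf8::all loaded, the same constants would be assigned to
# $utf8::all::UTF8_CHECK, e.g. Encode::FB_WARN to get warnings instead.
```

Because Encode::LEAVE_SRC is set in both calls, $bad is left unmodified and can be reused.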
INTERACTION WITH AUTODIE
If you use autodie, which is a great idea, you need to use at least
version 2.12, released on June 26, 2012
<https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise,
autodie obliterates the IO layers set by the open pragma. See RT #54777
<https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7
<https://github.com/doherty/utf8-all/issues/7>.
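One way to enforce the minimum version and to inspect which layers a handle actually received is sketched below. It uses the core open pragma in place of utf8::all so that it runs without the module installed, and the scratch file name is a hypothetical choice. The printed layer list depends on the platform and Perl build, so no expected output is given.

```perl
use strict;
use warnings;
use autodie 2.12;                   # compile-time failure if autodie is older
use open qw(:encoding(UTF-8));      # default layers for open() in this scope

my $path = "layers-demo-$$.txt";    # hypothetical scratch file
open my $fh, '>', $path;            # autodie throws an exception on failure
my @layers = PerlIO::get_layers($fh);
close $fh;
unlink $path;

print "layers: @layers\n";          # look for a utf8/encoding entry here
```

If the list shows only the platform's base layers, the encoding layer was lost somewhere between the pragma and the open call.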
BUGS
Please report any bugs or feature requests on the bugtracker website
<https://github.com/doherty/utf8-all/issues>.
When submitting a bug or request, please include a test-file or a patch
to an existing test-file that illustrates the bug or desired feature.
COMPATIBILITY
The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8.
The readlink and readdir functions and glob operators will therefore
not be replaced on these systems.
SEE ALSO
* File::Find::utf8 for fully UTF-8 aware File::Find functions.
* Cwd::utf8 for fully UTF-8 aware Cwd functions.
AUTHORS
* Michael Schwern <mschwern@cpan.org>
* Mike Doherty <doherty@cpan.org>
* Hayo Baan <info@hayobaan.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern
<mschwern@cpan.org>; he originated it.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.