22.14.2 Replacing high ASCII characters for W3C validation
W3C validation tests complain if a file includes any characters with ASCII decimal values 128 through 159. The presence of these characters does not, by itself, prevent the file from validating. However, if the file contains real validation errors, the W3C validator reports these characters along with the actual errors. If you fix the errors and leave the characters, the complaint becomes just a note about “non-SGML” characters.
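If you want to know whether a file contains these characters before you run the validator, a quick scan is enough. The following Python sketch is not part of DITA2Go, and the name find_flagged is hypothetical. It assumes the file is read with the latin-1 codec, so that each byte keeps its decimal value, and it also reports numeric character references such as &#150;, on the assumption that either form can trigger the validator's note.

```python
import re

# Decimal range the W3C validator reports as "non-SGML" (128 through 159).
FLAGGED = range(128, 160)

# Numeric character references such as &#150; or &#x96;.
NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def find_flagged(text: str) -> list:
    """Return sorted (offset, code) pairs for characters in the flagged range.

    Checks both literal characters and numeric character references.
    """
    hits = []
    for offset, ch in enumerate(text):
        if ord(ch) in FLAGGED:
            hits.append((offset, ord(ch)))
    for match in NCR.finditer(text):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code in FLAGGED:
            hits.append((match.start(), code))
    return sorted(hits)


if __name__ == "__main__":
    # Sample input: a literal en dash (150) and two references in the range.
    sample = "a\x96b &#150; &#x97; &#160;"
    for offset, code in find_flagged(sample):
        print(offset, code)
```

To check a real output file, read it with open(path, encoding="latin-1") and pass the result to find_flagged.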
Note: Leaving these characters in your document does not make the output invalid, despite the somewhat misleading way the W3C validator lists them when something else in the output is not valid.
For most purposes you should not need to do anything about the characters in question. However, if you want to have DITA2Go remap or remove the offending characters, you can set the following option:
[HTMLOptions]
; ValidOnly = No (default, allow normal use of chars from 128 to 159),
; or Yes (for warning-free W3C validation, remaps or removes
; those chars)
ValidOnly=Yes
This option affects the following characters:
- 128 through 159 (the first 32 high ASCII characters), in all fonts except the following:
  - Symbol
  - Zapf Dingbats
  - Webdings
- 171 and 187 (the guillemets), in macros only.
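The scope rules above can be restated as a small predicate. This is a hypothetical Python sketch, not DITA2Go code: the name is_affected, and the idea of passing the font name and a macro flag as arguments, are assumptions made for illustration.

```python
# Hypothetical restatement of the ValidOnly=Yes scope rules; not DITA2Go code.
EXEMPT_FONTS = frozenset({"Symbol", "Zapf Dingbats", "Webdings"})
GUILLEMETS = frozenset({171, 187})  # left and right double guillemets


def is_affected(code: int, font: str, in_macro: bool) -> bool:
    """Return True if ValidOnly=Yes would remap or remove this character."""
    if 128 <= code <= 159:
        # The first 32 high ASCII characters, except in the exempt fonts.
        return font not in EXEMPT_FONTS
    if code in GUILLEMETS:
        # Guillemets are affected only when they appear in macros.
        return in_macro
    return False
```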
Setting ValidOnly=Yes changes the output as follows:
- curly quotes and base quotes become straight quotes; in macros, the double guillemets (171 and 187) also become straight double quotes
- en dashes become hyphens
- em dashes become a single hyphen in text, or a pair of hyphens in macros
- bullets (except those produced by <ul> tags) become mid-dots
- all other characters in the range are dropped, unless you map them yourself; see §30.4 Assigning properties to text formats.
Table 22-3 shows how DITA2Go treats characters in this range when ValidOnly=Yes. Depending on which version of the DITA2Go User’s Guide you are using to view the table, some characters might not be displayed.
Table 22-3 Characters replaced or removed for W3C validation

| Code (decimal) | Character | Name | Result when ValidOnly=Yes |
| --- | --- | --- | --- |
| 128 | € | euro | Removed |
| 129 | (none) | (none) | Removed |
| 130 | ‚ | single base quote | ' 039 (single quote) |
| 131 | ƒ | florin | Removed |
| 132 | „ | double base quote | " 034 (double quote) |
| 133 | … | ellipsis | Removed |
| 134 | † | dagger | Removed |
| 135 | ‡ | double dagger | Removed |
| 136 | ˆ | circumflex | Removed |
| 137 | ‰ | per thousand | Removed |
| 138 | Š | S caron | Removed |
| 139 | ‹ | left single guillemet | Removed |
| 140 | Œ | OE ligature | Removed |
| 141 | ˘ | (none) | Removed |
| 142 | Ž | Z caron | Removed |
| 143 | (none) | (none) | Removed |
| 144 | (none) | (none) | Removed |
| 145 | ‘ | left single quote | ' 039 (single quote) |
| 146 | ’ | right single quote | ' 039 (single quote) |
| 147 | “ | left double quote | " 034 (double quote) |
| 148 | ” | right double quote | " 034 (double quote) |
| 149 | • | bullet | · 183 (mid-dot), except in <ul> lists |
| 150 | – | en dash | - 045 (hyphen) |
| 151 | — | em dash | - 045 (hyphen) in text, -- (two hyphens) in macros |
| 152 | ˜ | tilde | Removed |
| 153 | ™ | trademark | Removed |
| 154 | š | s caron | Removed |
| 155 | › | right single guillemet | Removed |
| 156 | œ | oe ligature | Removed |
| 157 | ˝ | (varies; not used) | Removed |
| 158 | ž | z caron | Removed |
| 159 | Ÿ | Y diaeresis | Removed |
| ··· | | | |
| 171 | « | left double guillemet | " 034 (double quote), in macros only |
| 187 | » | right double guillemet | " 034 (double quote), in macros only |
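To make Table 22-3 concrete, the following Python sketch applies the same replacements with str.translate. It is an illustration, not DITA2Go's implementation; remap and build_table are hypothetical names. It assumes the input is a str in which each character's code equals the table's decimal value (for example, bytes decoded with the latin-1 codec), and it does not model the font exemptions or the <ul> exception for bullets.

```python
from typing import Dict, Optional

# Replacements from Table 22-3 that apply in both text and macros.
COMMON = {
    130: "'", 145: "'", 146: "'",   # base and curly single quotes
    132: '"', 147: '"', 148: '"',   # base and curly double quotes
    149: "\u00b7",                  # bullet -> mid-dot (183)
    150: "-",                       # en dash -> hyphen
}
TEXT_ONLY = {151: "-"}              # em dash -> one hyphen in text
MACRO_ONLY = {
    151: "--",                      # em dash -> two hyphens in macros
    171: '"', 187: '"',             # double guillemets -> straight quote
}


def build_table(in_macro: bool) -> Dict[int, Optional[str]]:
    # Start by removing every code from 128 to 159 (None deletes in translate).
    table: Dict[int, Optional[str]] = {code: None for code in range(128, 160)}
    table.update(COMMON)
    table.update(MACRO_ONLY if in_macro else TEXT_ONLY)
    return table


def remap(text: str, in_macro: bool = False) -> str:
    """Apply the Table 22-3 replacements to text or to macro content."""
    return text.translate(build_table(in_macro))
```

Keeping the text and macro differences in separate dictionaries makes the one divergent case (the em dash) and the macro-only guillemets easy to spot.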
See also:
- §22.4.3 Specifying character encoding for HTML
- §23.2.3 Specifying character encoding for generic XML
- §30.4 Assigning properties to text formats