Lexical Conventions
Treatment of Non-ASCII Characters
VHDL is defined to use the 8-bit character set ISO 8859-1. Verilog and SystemVerilog do not specify a character set at all, although the standards mention "ASCII", and refer to "printable ASCII" in the range 33-126 decimal for use in escaped identifiers.
Strictly speaking, if 7-bit ASCII is assumed, characters outside that range must be rejected, even in comments. In practice, however, we have seen code with non-ASCII characters (e.g. the copyright symbol) in comments.
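DSim accepts any 8-bit character sequence inside a comment. However, DSim treats the input as a sequence of 8-bit characters, one character per byte. It has no knowledge of UTF-8, UTF-16, byte order marks (BOM), etc. Therefore, only encodings that cannot spuriously produce a comment terminator are reliable. UTF-8 has this property: every byte of a non-ASCII character encoded in UTF-8 has its high bit set, so it can never be mistaken for an ASCII character such as a newline, "*" or "/". UTF-16 does not have this property, since either byte of a 16-bit code unit may fall in the ASCII range.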
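The difference between the encodings can be illustrated with a short Python sketch. This is not DSim code; the helper name can_end_comment_early and the sample string are hypothetical, chosen only to show what a byte-oriented scanner would see inside a comment.

```python
def can_end_comment_early(encoded: bytes) -> bool:
    """Return True if the bytes contain a comment terminator as seen by a
    byte-oriented scanner: a newline (ends a // comment) or the pair */
    (ends a /* ... */ comment)."""
    return b"\n" in encoded or b"*/" in encoded


# Hypothetical comment text made of three non-ASCII characters:
# U+00A9 (copyright sign), U+2A2F and U+010A.
sample = "\u00a9\u2a2f\u010a"

utf8 = sample.encode("utf-8")       # c2 a9 e2 a8 af c4 8a
utf16 = sample.encode("utf-16-be")  # 00 a9 2a 2f 01 0a

# In UTF-8, every byte of a non-ASCII character is 0x80 or above.
print(all(b >= 0x80 for b in utf8))   # True
print(can_end_comment_early(utf8))    # False
# In UTF-16, U+2A2F yields the bytes "*/" and U+010A contains a newline byte.
print(can_end_comment_early(utf16))   # True
```

The same text is therefore safe in a comment when the file is saved as UTF-8, but could terminate the comment prematurely if the file were saved as UTF-16.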
DSim accepts only printable 7-bit ASCII characters in escaped identifiers.
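As a rough illustration of this rule, the following Python sketch checks the body of an escaped identifier (the characters after the leading backslash, up to but not including the terminating white space) against the printable range 33-126 decimal. The function name is hypothetical and the check is an assumption about how the rule could be expressed, not DSim's actual implementation.

```python
def is_valid_escaped_identifier_body(body: str) -> bool:
    """Return True if body is non-empty and every character is printable
    7-bit ASCII, i.e. has a code in the range 33-126 decimal."""
    return len(body) > 0 and all(33 <= ord(ch) <= 126 for ch in body)


print(is_valid_escaped_identifier_body("bus+index"))   # True
print(is_valid_escaped_identifier_body("caf\u00e9"))   # False: U+00E9 is not ASCII
print(is_valid_escaped_identifier_body("a b"))         # False: space is code 32
```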