diff options
author | Jan Kara <jack@suse.cz> | 2018-04-12 18:22:23 +0300 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2018-04-24 10:36:38 +0300 |
commit | a2a9d0190f9964dd734348085c22582e519c145a (patch) | |
tree | 3308763b20a53455306180b2bc9dfc4447683694 | |
parent | f86815184c471dae61f3c764b7589872a91e930a (diff) | |
download | linux-a2a9d0190f9964dd734348085c22582e519c145a.tar.xz |
udf: Fix leak of UTF-16 surrogates into encoded strings
commit 44f06ba8297c7e9dfd0e49b40cbe119113cca094 upstream.
OSTA UDF specification does not mention whether the CS0 charset in case
of two bytes per character encoding should be treated in UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way but on systems such as Windows which work in UTF-16
internally, filenames would be treated as being in UTF-16 effectively.
In Linux it is more difficult to handle characters outside of Base
Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.
CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r-- | fs/udf/unicode.c | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c index 695389a4fc23..3a3be23689b3 100644 --- a/fs/udf/unicode.c +++ b/fs/udf/unicode.c @@ -28,6 +28,9 @@ #include "udf_sb.h" +#define SURROGATE_MASK 0xfffff800 +#define SURROGATE_PAIR 0x0000d800 + static int udf_uni2char_utf8(wchar_t uni, unsigned char *out, int boundlen) @@ -37,6 +40,9 @@ static int udf_uni2char_utf8(wchar_t uni, if (boundlen <= 0) return -ENAMETOOLONG; + if ((uni & SURROGATE_MASK) == SURROGATE_PAIR) + return -EINVAL; + if (uni < 0x80) { out[u_len++] = (unsigned char)uni; } else if (uni < 0x800) { |