kernel/linux.git/fs/udf/unicode.c, branch v6.12.80

udf: Fix uninitialized array access for some pathnames

2023-06-21T09:53:06+00:00

For filenames that begin with . and are between 2 and 5 characters long, the UDF charset conversion code would read uninitialized memory in the output buffer. The only practical impact is that the name may get a "unification hash" prepended when it is not actually needed, but it is still good to fix this.

Reported-by: syzbot+cd311b1e43cc25f90d18@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000e2638a05fe9dc8f9@google.com
Signed-off-by: Jan Kara
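The commit message does not show the fix itself. As a rough illustration of this class of bug, the standalone sketch below checks the length of a converted name before reading its first byte, so an empty output buffer is never inspected. The helper name `is_dot_or_dotdot` and its signature are hypothetical and are not taken from fs/udf/unicode.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the first len bytes of name spell
 * "." or "..", and 0 otherwise. The length is tested before any byte
 * is read, so a zero-length (possibly uninitialized) buffer is safe.
 */
int is_dot_or_dotdot(const char *name, size_t len)
{
	assert(name != NULL || len == 0);

	if (len == 0 || len > 2)	/* nothing valid to inspect */
		return 0;
	if (name[0] != '.')
		return 0;
	return len == 1 || name[1] == '.';
}
```

Without the `len == 0` test, a caller that produced no output bytes would still have `name[0]` compared against '.', which is the kind of read the commit describes.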

fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifier

2023-05-30T13:39:13+00:00

The notice refers to the full GPL 2.0 text on a now-defunct MIT FTP site [1]. Replace it with the appropriate SPDX license identifier.

Cc: Thomas Gleixner
Cc: Pali Rohár
Link: https://web.archive.org/web/20020809115410/ftp://prep.ai.mit.edu/pub/gnu/GPL [1]
Signed-off-by: Bagas Sanjaya
Signed-off-by: Jan Kara
Message-Id: <20230522005434.22133-2-bagasdotme@gmail.com>

udf: Fix iocharset=utf8 mount option

2021-08-12T14:07:09+00:00

Currently the iocharset=utf8 mount option is broken. To use UTF-8 as iocharset, it is required to use the utf8 mount option. Fix the iocharset=utf8 mount option to be equivalent to the utf8 mount option. If UTF-8 is used as iocharset, then s_nls_map is set to NULL. So simplify the surrounding code and remove the UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags: to distinguish between UTF-8 and non-UTF-8 it is enough to check whether s_nls_map is NULL.

Link: https://lore.kernel.org/r/20210808162453.1653-4-pali@kernel.org
Signed-off-by: Pali Rohár
Signed-off-by: Jan Kara
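The convention described here (a NULL s_nls_map means UTF-8) can be sketched in user space as follows. This is a minimal sketch, not the kernel's option parser: the struct types, the table list, `stub_load_nls()`, `apply_charset_option()` and `uses_utf8()` are all hypothetical stand-ins; only the field name s_nls_map and the option spellings come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's NLS table and UDF sb info. */
struct nls_table_stub { const char *charset; };
struct sb_info_stub { const struct nls_table_stub *s_nls_map; };

/* Assumed set of loadable non-UTF-8 tables, for illustration only. */
static const struct nls_table_stub stub_tables[] = {
	{ "iso8859-1" }, { "cp437" },
};

static const struct nls_table_stub *stub_load_nls(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(stub_tables) / sizeof(stub_tables[0]); i++)
		if (strcmp(stub_tables[i].charset, name) == 0)
			return &stub_tables[i];
	return NULL;
}

/* Returns 0 on success, -1 if the option or charset is not recognized. */
int apply_charset_option(struct sb_info_stub *sbi, const char *opt)
{
	const struct nls_table_stub *map;
	const char *name;

	assert(sbi != NULL && opt != NULL);

	if (strcmp(opt, "utf8") == 0) {
		sbi->s_nls_map = NULL;		/* NULL map means UTF-8 */
		return 0;
	}
	if (strncmp(opt, "iocharset=", 10) != 0)
		return -1;
	name = opt + 10;
	if (strcmp(name, "utf8") == 0) {
		sbi->s_nls_map = NULL;		/* same effect as plain utf8 */
		return 0;
	}
	map = stub_load_nls(name);
	if (map == NULL)
		return -1;			/* leave the old setting alone */
	sbi->s_nls_map = map;
	return 0;
}

/* With the flags gone, a single pointer test selects the code path. */
int uses_utf8(const struct sb_info_stub *sbi)
{
	return sbi->s_nls_map == NULL;
}
```

Keeping one source of truth (the pointer) avoids the state where a flag and the loaded table disagree, which is the simplification the commit message describes.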

udf: Allow mounting volumes with incorrect identification strings

2018-11-19T09:27:59+00:00

Commit c26f6c615788 ("udf: Fix conversion of 'dstring' fields to UTF8") started to be more strict when checking whether converted strings are properly formatted. Sudip reports that there are DVDs where the volume identification string is actually too long. UDF reports:

[ 632.309320] UDF-fs: incorrect dstring lengths (32/32)

during mount and fails the mount. This is a mostly harmless failure, as we don't need the volume identification (and even less the volume set identification) for anything. So just truncate the volume identification string if it is too long, and replace it with 'Invalid' if we cannot convert it for other reasons. This keeps slightly incorrect media mountable.

CC: stable@vger.kernel.org
Fixes: c26f6c615788 ("udf: Fix conversion of 'dstring' fields to UTF8")
Reported-and-tested-by: Sudip Mukherjee
Signed-off-by: Jan Kara
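The truncation idea can be sketched as below. A dstring is a fixed-size field whose first byte is the compression ID and whose last byte records how many bytes are in use; a recorded length equal to the field size (the "32/32" case) cannot fit. This is a minimal sketch with a hypothetical function name, and the rounding for 16-bit encoding is my own choice rather than a copy of the kernel's logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the number of usable bytes in a dstring
 * field (compression ID byte plus payload), clamped so that it never
 * overlaps the trailing length byte.
 */
size_t dstring_used_len(const uint8_t *field, size_t field_len)
{
	size_t used;

	assert(field != NULL);

	if (field_len < 2)
		return 0;

	used = field[field_len - 1];	/* recorded length */
	if (used >= field_len)		/* too long: truncate, do not fail */
		used = field_len - 1;

	if (used > 0 && field[0] == 16) {
		/* 16-bit encoding: keep only whole two-byte characters */
		size_t payload = (used - 1) & ~(size_t)1;

		used = payload + 1;
	}
	return used;
}
```

A caller that still cannot convert the clamped string would then fall back to a fixed placeholder such as the 'Invalid' string mentioned above, instead of failing the mount.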

udf: Add support for decoding UTF-16 characters

2018-04-19T14:00:48+00:00

Add support for decoding characters outside of the Base Multilingual Plane of UTF-16 encoded in the CS0 charset of UDF.

Signed-off-by: Jan Kara
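Decoding a character outside the Base Multilingual Plane means combining a high and a low surrogate into one code point. The standalone sketch below shows that arithmetic, assuming big-endian 16-bit units as used by the two-byte CS0 encoding. The function and macro names are hypothetical, and rejecting malformed input by returning 0 is a simplification chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SURR_MASK 0xfc00u	/* top six bits select the surrogate class */
#define SURR_HIGH 0xd800u	/* leading surrogate: 0xd800..0xdbff */
#define SURR_LOW  0xdc00u	/* trailing surrogate: 0xdc00..0xdfff */

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode one code point from buf. Returns the number of bytes consumed
 * (2 or 4), or 0 if the input is truncated or has an unpaired surrogate.
 */
size_t decode_utf16be(const uint8_t *buf, size_t len, uint32_t *cp)
{
	uint16_t hi, lo;

	assert(buf != NULL && cp != NULL);

	if (len < 2)
		return 0;
	hi = read_be16(buf);
	if ((hi & SURR_MASK) == SURR_LOW)
		return 0;			/* low surrogate cannot lead */
	if ((hi & SURR_MASK) != SURR_HIGH) {
		*cp = hi;			/* plain BMP character */
		return 2;
	}
	if (len < 4)
		return 0;			/* high surrogate at end of input */
	lo = read_be16(buf + 2);
	if ((lo & SURR_MASK) != SURR_LOW)
		return 0;			/* high surrogate not followed by low */

	/* Each surrogate carries 10 bits; the pair encodes cp - 0x10000. */
	*cp = 0x10000u + (((uint32_t)(hi & 0x3ffu) << 10) | (lo & 0x3ffu));
	return 4;
}
```

For example, the unit pair 0xD83D 0xDE00 decodes to U+1F600.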

udf: Add support for encoding UTF-16 characters

2018-04-19T14:00:48+00:00

Add support for storing characters outside of the Base Multilingual Plane of UTF-16 in the CS0 encoding of UDF.

Signed-off-by: Jan Kara
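The encoding direction splits a code point above 0xffff into a surrogate pair. The sketch below is a standalone illustration under the same assumption of big-endian 16-bit units; `encode_utf16be` is a hypothetical name, not a function from the UDF driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as big-endian UTF-16 into out. Returns the
 * number of bytes written (2 or 4), or 0 if cp is not encodable or the
 * buffer capacity cap is too small.
 */
size_t encode_utf16be(uint32_t cp, uint8_t *out, size_t cap)
{
	uint16_t hi, lo;

	assert(out != NULL);

	/* Surrogate code points and values past U+10FFFF are not characters. */
	if (cp > 0x10ffffu || (cp >= 0xd800u && cp <= 0xdfffu))
		return 0;

	if (cp < 0x10000u) {			/* BMP: a single 16-bit unit */
		if (cap < 2)
			return 0;
		out[0] = (uint8_t)(cp >> 8);
		out[1] = (uint8_t)(cp & 0xffu);
		return 2;
	}

	if (cap < 4)
		return 0;
	cp -= 0x10000u;				/* 20 bits remain */
	hi = (uint16_t)(0xd800u | (cp >> 10));	/* upper 10 bits */
	lo = (uint16_t)(0xdc00u | (cp & 0x3ffu));	/* lower 10 bits */
	out[0] = (uint8_t)(hi >> 8);
	out[1] = (uint8_t)(hi & 0xffu);
	out[2] = (uint8_t)(lo >> 8);
	out[3] = (uint8_t)(lo & 0xffu);
	return 4;
}
```

Note that a name containing such characters needs four bytes per character on disk, which matters when checking against the fixed size of a name field.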

udf: Push sb argument to udf_name_[to|from]_CS0()

2018-04-19T14:00:48+00:00

Push the superblock argument to the udf_name_[to|from]_CS0() functions so that we can decide about character conversion functions there.

Signed-off-by: Jan Kara

udf: Convert ident strings to proper charset

2018-04-19T14:00:48+00:00

The iocharset= mount option specifies the character set used on *console* (not on disk). So even dstrings from VRS need to be converted from CS0 to the specified charset and not always to UTF-8. This is barely user visible, as those strings are shown only in UDF debug messages.

CC: Andrew Gabbasov
Signed-off-by: Jan Kara

udf: Use UTF-32 <-> UTF-8 conversion functions from NLS

2018-04-19T14:00:48+00:00

Instead of implementing our own functions converting to and from UTF-8, use the ones provided by NLS.

Signed-off-by: Jan Kara

udf: Fix leak of UTF-16 surrogates into encoded strings

2018-04-18T14:34:55+00:00

The OSTA UDF specification does not mention whether the CS0 charset, in the case of the two-bytes-per-character encoding, should be treated as UTF-16 or UCS-2. The sample code in the standard does not treat UTF-16 surrogates in any special way, but on systems such as Windows, which work in UTF-16 internally, filenames would effectively be treated as being in UTF-16. In Linux it is more difficult to handle characters outside of the Base Multilingual Plane (beyond 0xffff), as the NLS framework works with 2-byte characters only. For now, just make sure we don't leak UTF-16 surrogates into the resulting string when loading names from the filesystem.

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang
Signed-off-by: Jan Kara
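One way to keep surrogates out of a converted name is to treat every 16-bit unit in the range 0xd800 to 0xdfff as invalid and substitute a placeholder. The sketch below illustrates that idea on an array of already-extracted units. The function names are hypothetical, and passing the substitute character as a parameter is an assumption made for this example, not a description of the kernel's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 surrogates occupy the range 0xd800..0xdfff. */
static int is_surrogate(uint16_t u)
{
	return u >= 0xd800u && u <= 0xdfffu;
}

/*
 * Replace every surrogate unit in units[0..n) with subst and return
 * how many units were replaced. All other units are left unchanged.
 */
size_t scrub_surrogates(uint16_t *units, size_t n, uint16_t subst)
{
	size_t i, replaced = 0;

	assert(units != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (is_surrogate(units[i])) {
			units[i] = subst;
			replaced++;
		}
	}
	return replaced;
}
```

This loses the original character but guarantees that the output never contains half of a surrogate pair, which would be invalid once converted to UTF-8.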