kernel/linux.git/fs/unicode/utf8-core.c, branch v6.19.11

Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

2024-11-23T04:50:55+00:00

Pull unicode updates from Gabriel Krisman Bertazi: - constify a read-only struct (Thomas Weißschuh) - fix the error path of unicode_load, avoiding a possible kernel oops if it fails to find the unicode module (André Almeida) - documentation fix, updating a filename in the README (Gan Jie) - add the link of my tree to MAINTAINERS (André Almeida) * tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: MAINTAINERS: Add Unicode tree unicode: change the reference of database file unicode: Fix utf8_load() error path unicode: constify utf8 data table

unicode: Recreate utf8_parse_version()

2024-10-28T12:36:54+00:00

All filesystems that currently support UTF-8 casefold can fetch the UTF-8 version from the filesystem metadata stored on disk. They can get the data stored and directly match it to a integer, so they can skip the string parsing step, which motivated the removal of this function in the first place. However, for tmpfs, the only way to tell the kernel which UTF-8 version we are about to use is via mount options, using a string. Re-introduce utf8_parse_version() to be used by tmpfs. This version differs from the original by skipping the intermediate step of copying the version string to an auxiliary string before calling match_token(). This versions calls match_token() in the argument string. The paramenters are simpler now as well. utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement higher level API for string handling") and later removed by 49bd03cc7e9 ("unicode: pass a UNICODE_AGE() tripple to utf8_load"). Signed-off-by: André Almeida Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com Reviewed-by: Theodore Ts'o Reviewed-by: Gabriel Krisman Bertazi Signed-off-by: Christian Brauner

unicode: Fix utf8_load() error path

2024-09-03T16:45:07+00:00

utf8_load() requests the symbol "utf8_data_table" and then checks if the requested UTF-8 version is supported. If it's unsupported, it tries to put the data table using symbol_put(). If an unsupported version is requested, symbol_put() fails like this: kernel BUG at kernel/module/main.c:786! RIP: 0010:__symbol_put+0x93/0xb0 Call Trace: ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x65/0x80 ? __symbol_put+0x93/0xb0 ? exc_invalid_op+0x51/0x70 ? __symbol_put+0x93/0xb0 ? asm_exc_invalid_op+0x1a/0x20 ? __pfx_cmp_name+0x10/0x10 ? __symbol_put+0x93/0xb0 ? __symbol_put+0x62/0xb0 utf8_load+0xf8/0x150 That happens because symbol_put() expects the unique string that identify the symbol, instead of a pointer to the loaded symbol. Fix that by using such string. Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Signed-off-by: André Almeida Reviewed-by: Theodore Ts'o Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com Signed-off-by: Gabriel Krisman Bertazi

unicode: remove MODULE_LICENSE in non-modules

2023-04-13T20:13:54+00:00

Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Acked-by: Gabriel Krisman Bertazi Cc: Luis Chamberlain Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa Cc: Gabriel Krisman Bertazi Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Luis Chamberlain

unicode: Add utf8-data module

2021-10-12T14:41:39+00:00

utf8data.h contains a large database table which is an auto-generated decodification trie for the unicode normalization functions. Allow building it into a separate module. Based on a patch from Shreeya Patel . Signed-off-by: Christoph Hellwig Signed-off-by: Gabriel Krisman Bertazi

unicode: cache the normalization tables in struct unicode_map

2021-10-11T20:02:02+00:00

Instead of repeatedly looking up the version add pointers to the NFD and NFD+CF tables to struct unicode_map, and pass a unicode_map plus index to the functions using the normalization tables. Signed-off-by: Christoph Hellwig Signed-off-by: Gabriel Krisman Bertazi

unicode: pass a UNICODE_AGE() tripple to utf8_load

2021-10-11T20:01:46+00:00

Don't bother with pointless string parsing when the caller can just pass the version in the format that the core expects. Also remove the fallback to the latest version that none of the callers actually uses. Signed-off-by: Christoph Hellwig Signed-off-by: Gabriel Krisman Bertazi

unicode: remove the charset field from struct unicode_map

2021-10-11T20:01:35+00:00

It is hardcoded and only used for a f2fs sysfs file where it can be hardcoded just as easily. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi Signed-off-by: Gabriel Krisman Bertazi

unicode: Add utf8_casefold_hash

2020-09-10T21:03:31+00:00

This adds a case insensitive hash function to allow taking the hash without needing to allocate a casefolded copy of the string. The existing d_hash implementations for casefolding allocate memory within rcu-walk, by avoiding it we can be more efficient and avoid worrying about a failed allocation. Signed-off-by: Daniel Rosenberg Reviewed-by: Gabriel Krisman Bertazi Reviewed-by: Eric Biggers Signed-off-by: Jaegeuk Kim

unicode: make array 'token' static const, makes object smaller

2019-09-17T15:48:24+00:00

Don't populate the array 'token' on the stack but instead make it static const. Makes the object code smaller by 234 bytes. Before: text data bss dec hex filename 5371 272 0 5643 160b fs/unicode/utf8-core.o After: text data bss dec hex filename 5041 368 0 5409 1521 fs/unicode/utf8-core.o (gcc version 9.2.1, amd64) Signed-off-by: Colin Ian King Reviewed-by: Theodore Ts'o Signed-off-by: Gabriel Krisman Bertazi