<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/unicode/utf8-core.c, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-11-23T04:50:55+00:00</updated>
<entry>
<title>Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode</title>
<updated>2024-11-23T04:50:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-23T04:50:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=060fc106b6854d3289d838ac3c98eb17afb261d7'/>
<id>urn:sha1:060fc106b6854d3289d838ac3c98eb17afb261d7</id>
<content type='text'>
Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of unicode_load, avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
</content>
</entry>
<entry>
<title>unicode: Recreate utf8_parse_version()</title>
<updated>2024-10-28T12:36:54+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-10-21T16:37:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=142fa60f61f93805471012f24e029af6d113c5cc'/>
<id>urn:sha1:142fa60f61f93805471012f24e029af6d113c5cc</id>
<content type='text'>
All filesystems that currently support UTF-8 casefold can fetch the
UTF-8 version from the filesystem metadata stored on disk. They can get
the data stored and directly match it to a integer, so they can skip the
string parsing step, which motivated the removal of this function in the
first place.

However, for tmpfs, the only way to tell the kernel which UTF-8 version
we are about to use is via mount options, using a string. Re-introduce
utf8_parse_version() to be used by tmpfs.

This version differs from the original by skipping the intermediate step
of copying the version string to an auxiliary string before calling
match_token(). This versions calls match_token() in the argument string.
The paramenters are simpler now as well.

utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement
higher level API for string handling") and later removed by 49bd03cc7e9
("unicode: pass a UNICODE_AGE() tripple to utf8_load").

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>unicode: Fix utf8_load() error path</title>
<updated>2024-09-03T16:45:07+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-09-02T22:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=156bb2c569cd869583c593d27a5bd69e7b2a4264'/>
<id>urn:sha1:156bb2c569cd869583c593d27a5bd69e7b2a4264</id>
<content type='text'>
utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  &lt;TASK&gt;
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</content>
</entry>
<entry>
<title>unicode: remove MODULE_LICENSE in non-modules</title>
<updated>2023-04-13T20:13:54+00:00</updated>
<author>
<name>Nick Alcock</name>
<email>nick.alcock@oracle.com</email>
</author>
<published>2023-03-07T18:02:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=573858e85d7d390313fbc1f452ae764512f9819d'/>
<id>urn:sha1:573858e85d7d390313fbc1f452ae764512f9819d</id>
<content type='text'>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock &lt;nick.alcock@oracle.com&gt;
Suggested-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa &lt;hasegawa-hitomi@fujitsu.com&gt;
Cc: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</content>
</entry>
<entry>
<title>unicode: Add utf8-data module</title>
<updated>2021-10-12T14:41:39+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T07:00:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b3d047870120bcd46d7cc257d19ff49328fd585'/>
<id>urn:sha1:2b3d047870120bcd46d7cc257d19ff49328fd585</id>
<content type='text'>
utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel &lt;shreeya.patel@collabora.com&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
</entry>
<entry>
<title>unicode: cache the normalization tables in struct unicode_map</title>
<updated>2021-10-11T20:02:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T07:00:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ca99ce756c27852d1ea1e555045de1c920f30ed'/>
<id>urn:sha1:6ca99ce756c27852d1ea1e555045de1c920f30ed</id>
<content type='text'>
Instead of repeatedly looking up the version add pointers to the
NFD and NFD+CF tables to struct unicode_map, and pass a
unicode_map plus index to the functions using the normalization
tables.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
</entry>
<entry>
<title>unicode: pass a UNICODE_AGE() tripple to utf8_load</title>
<updated>2021-10-11T20:01:46+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T07:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49bd03cc7e95cb78420305ca2f5ef67497b6fa80'/>
<id>urn:sha1:49bd03cc7e95cb78420305ca2f5ef67497b6fa80</id>
<content type='text'>
Don't bother with pointless string parsing when the caller can just pass
the version in the format that the core expects.  Also remove the
fallback to the latest version that none of the callers actually uses.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
</entry>
<entry>
<title>unicode: remove the charset field from struct unicode_map</title>
<updated>2021-10-11T20:01:35+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T06:59:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a440943e68cd1b5a853a6f60865967b7cc2539eb'/>
<id>urn:sha1:a440943e68cd1b5a853a6f60865967b7cc2539eb</id>
<content type='text'>
It is hardcoded and only used for a f2fs sysfs file where it can be
hardcoded just as easily.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
</entry>
<entry>
<title>unicode: Add utf8_casefold_hash</title>
<updated>2020-09-10T21:03:31+00:00</updated>
<author>
<name>Daniel Rosenberg</name>
<email>drosen@google.com</email>
</author>
<published>2020-07-08T09:12:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d7bfea8b8378277a25b42b28fe5a2a5ca76a7cf'/>
<id>urn:sha1:3d7bfea8b8378277a25b42b28fe5a2a5ca76a7cf</id>
<content type='text'>
This adds a case insensitive hash function to allow taking the hash
without needing to allocate a casefolded copy of the string.

The existing d_hash implementations for casefolding allocate memory
within rcu-walk, by avoiding it we can be more efficient and avoid
worrying about a failed allocation.

Signed-off-by: Daniel Rosenberg &lt;drosen@google.com&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
</entry>
<entry>
<title>unicode: make array 'token' static const, makes object smaller</title>
<updated>2019-09-17T15:48:24+00:00</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2019-09-06T13:58:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa28b98d6dbcdb1c822b53f90e509565dfc450b0'/>
<id>urn:sha1:aa28b98d6dbcdb1c822b53f90e509565dfc450b0</id>
<content type='text'>
Don't populate the array 'token' on the stack but instead make it
static const. Makes the object code smaller by 234 bytes.

Before:
   text	   data	    bss	    dec	    hex	filename
   5371	    272	      0	   5643	   160b	fs/unicode/utf8-core.o

After:
   text	   data	    bss	    dec	    hex	filename
   5041	    368	      0	   5409	   1521	fs/unicode/utf8-core.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
</entry>
</feed>
