path: root/net/ceph
Age | Commit message | Author | Files | Lines
5 days | libceph: make calc_target() set t->paused, not just clear it | Ilya Dryomov | 1 | -2/+9
commit c0fe2994f9a9d0a2ec9e42441ea5ba74b6a16176 upstream.

Currently calc_target() clears t->paused if the request shouldn't be paused anymore, but doesn't ever set t->paused even though it's able to determine when the request should be paused. Setting t->paused is left to __submit_request() which is fine for regular requests but doesn't work for linger requests -- since __submit_request() doesn't operate on linger requests, there is nowhere for lreq->t.paused to be set.

One consequence of this is that watches don't get reestablished on paused -> unpaused transitions in cases where requests have been paused long enough for the (paused) unwatch request to time out and for the subsequent (re)watch request to enter the paused state. On top of the watch not getting reestablished, rbd_reregister_watch() gets stuck with rbd_dev->watch_mutex held:

    rbd_register_watch
      __rbd_register_watch
        ceph_osdc_watch
          linger_reg_commit_wait

It's waiting for lreq->reg_commit_wait to be completed, but for that to happen the respective request needs to end up on need_resend_linger list and be kicked when requests are unpaused. There is no chance for that if the request in question is never marked paused in the first place.

The fact that rbd_dev->watch_mutex remains taken out forever then prevents the image from getting unmapped -- "rbd unmap" would inevitably hang in D state on an attempt to grab the mutex.

Cc: stable@vger.kernel.org
Reported-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 days | libceph: return the handler error from mon_handle_auth_done() | Ilya Dryomov | 1 | -1/+1
commit e84b48d31b5008932c0a0902982809fbaa1d3b70 upstream. Currently any error from ceph_auth_handle_reply_done() is propagated via finish_auth() but isn't returned from mon_handle_auth_done(). This results in higher layers learning that (despite the monitor considering us to be successfully authenticated) something went wrong in the authentication phase and reacting accordingly, but msgr2 still trying to proceed with establishing the session in the background. In the case of secure mode this can trigger a WARN in setup_crypto() and later lead to a NULL pointer dereference inside of prepare_auth_signature(). Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 days | libceph: make free_choose_arg_map() resilient to partial allocation | Tuo Li | 1 | -8/+12
commit e3fe30e57649c551757a02e1cad073c47e1e075e upstream. free_choose_arg_map() may dereference a NULL pointer if its caller fails after a partial allocation. For example, in decode_choose_args(), if allocation of arg_map->args fails, execution jumps to the fail label and free_choose_arg_map() is called. Since arg_map->size is updated to a non-zero value before memory allocation, free_choose_arg_map() will iterate over arg_map->args and dereference a NULL pointer. To prevent this potential NULL pointer dereference and make free_choose_arg_map() more resilient, add checks for pointers before iterating. Cc: stable@vger.kernel.org Co-authored-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Tuo Li <islituo@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
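The failure mode described in this commit message can be modeled in userspace. What follows is only a rough sketch, not the kernel code: `struct arg`, `struct arg_map`, `arg_map_free()` and `arg_map_alloc_partial()` are hypothetical stand-ins invented for illustration, and the real structures have a different layout. It shows a free routine that tolerates a map whose size was recorded before its array was allocated.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct arg {
    int *ids;
    unsigned int ids_size;
};

struct arg_map {
    struct arg *args;   /* may still be NULL if allocation failed */
    unsigned int size;  /* recorded before args is allocated */
};

/* Free a map that may be only partially constructed. */
static void arg_map_free(struct arg_map *map)
{
    unsigned int i;

    if (!map)
        return;

    /* Guard: size can be non-zero while args is still NULL. */
    if (map->args) {
        for (i = 0; i < map->size; i++)
            free(map->args[i].ids);  /* free(NULL) is a no-op */
        free(map->args);
    }
    free(map);
}

/* Simulate a decoder that fails right after recording the size. */
static struct arg_map *arg_map_alloc_partial(unsigned int size)
{
    struct arg_map *map = calloc(1, sizeof(*map));

    if (map)
        map->size = size;  /* args deliberately left NULL */
    return map;
}
```

Calling `arg_map_free(arg_map_alloc_partial(4))` exercises the partially allocated case; without the `map->args` check the loop would index a NULL array.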
5 days | libceph: replace overzealous BUG_ON in osdmap_apply_incremental() | Ilya Dryomov | 1 | -1/+3
commit e00c3f71b5cf75681dbd74ee3f982a99cb690c2b upstream. If the osdmap is (maliciously) corrupted such that the incremental osdmap epoch is different from what is expected, there is no need to BUG. Instead, just declare the incremental osdmap to be invalid. Cc: stable@vger.kernel.org Reported-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
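The idea of declaring bad input invalid instead of crashing can be illustrated with a small userspace sketch. It is not the kernel function: `struct map` and `apply_incremental()` are hypothetical, and the rule that the only acceptable incremental is the one for `epoch + 1` is an assumption made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily simplified map; only the epoch matters here. */
struct map {
    uint32_t epoch;
};

/*
 * Apply an incremental update carrying inc_epoch.  Assumption for this
 * sketch: the only acceptable incremental is the one for epoch + 1.
 */
static int apply_incremental(struct map *m, uint32_t inc_epoch)
{
    if (inc_epoch != m->epoch + 1)
        return -EINVAL;  /* reject the input instead of crashing */

    m->epoch = inc_epoch;
    return 0;
}
```

The caller sees an error for an unexpected epoch and the map is left untouched, which is the behavior one would want for data that arrives from the network.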
5 days | libceph: prevent potential out-of-bounds reads in handle_auth_done() | ziming zhang | 1 | -0/+2
commit 818156caffbf55cb4d368f9c3cac64e458fb49c9 upstream. Perform an explicit bounds check on payload_len to avoid a possible out-of-bounds access in the callout. [ idryomov: changelog ] Cc: stable@vger.kernel.org Signed-off-by: ziming zhang <ezrakiez@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 days | libceph: make decode_pool() more resilient against corrupted osdmaps | Ilya Dryomov | 1 | -64/+52
commit 8c738512714e8c0aa18f8a10c072d5b01c83db39 upstream. If the osdmap is (maliciously) corrupted such that the encoded length of ceph_pg_pool envelope is less than what is expected for a particular encoding version, out-of-bounds reads may ensue because the only bounds check that is there is based on that length value. This patch adds explicit bounds checks for each field that is decoded or skipped. Cc: stable@vger.kernel.org Reported-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Tested-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
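A userspace sketch of per-field bounds checking may help make the point concrete. This is not the kernel decoder: `struct cursor`, `get_u32()`, `skip_bytes()` and `decode_record()` are hypothetical names, and the record layout (a u32 envelope length, one u32 field, then a tail that is skipped) is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over an untrusted buffer: [p, end). */
struct cursor {
    const uint8_t *p;
    const uint8_t *end;
};

/* Read a little-endian u32, failing if fewer than 4 bytes remain. */
static int get_u32(struct cursor *c, uint32_t *out)
{
    if ((size_t)(c->end - c->p) < 4)
        return -1;
    *out = (uint32_t)c->p[0] | ((uint32_t)c->p[1] << 8) |
           ((uint32_t)c->p[2] << 16) | ((uint32_t)c->p[3] << 24);
    c->p += 4;
    return 0;
}

/* Skip n bytes, failing if that would run past the end. */
static int skip_bytes(struct cursor *c, size_t n)
{
    if ((size_t)(c->end - c->p) < n)
        return -1;
    c->p += n;
    return 0;
}

/*
 * Decode "u32 envelope length, u32 field, <skipped tail>" and check
 * every step, not just the envelope length.
 */
static int decode_record(const uint8_t *buf, size_t len, uint32_t *field)
{
    struct cursor c = { buf, buf + len };
    uint32_t env_len;

    if (get_u32(&c, &env_len))
        return -1;
    if ((size_t)(c.end - c.p) < env_len)
        return -1;              /* envelope claims more than we have */
    c.end = c.p + env_len;      /* confine decoding to the envelope */

    if (get_u32(&c, field))     /* fails if the envelope is too short */
        return -1;
    return skip_bytes(&c, (size_t)(c.end - c.p));
}
```

An envelope length of 2 passes the outer check when two bytes follow, but the field read inside it is then refused, which is the case the commit message describes.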
2025-12-07 | libceph: prevent potential out-of-bounds writes in handle_auth_session_key() | ziming zhang | 1 | -0/+2
commit 7fce830ecd0a0256590ee37eb65a39cbad3d64fc upstream. The len field originates from untrusted network packets. Boundary checks have been added to prevent potential out-of-bounds writes when decrypting the connection secret or processing service tickets. [ idryomov: changelog ] Cc: stable@vger.kernel.org Signed-off-by: ziming zhang <ezrakiez@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07 | libceph: fix potential use-after-free in have_mon_and_osd_map() | Ilya Dryomov | 2 | -25/+42
commit 076381c261374c587700b3accf410bdd2dba334e upstream.

The wait loop in __ceph_open_session() can race with the client receiving a new monmap or osdmap shortly after the initial map is received. Both ceph_monc_handle_map() and handle_one_map() install a new map immediately after freeing the old one

    kfree(monc->monmap);
    monc->monmap = monmap;

    ceph_osdmap_destroy(osdc->osdmap);
    osdc->osdmap = newmap;

under client->monc.mutex and client->osdc.lock respectively, but because neither is taken in have_mon_and_osd_map() it's possible for client->monc.monmap->epoch and client->osdc.osdmap->epoch arms in

    client->monc.monmap && client->monc.monmap->epoch &&
        client->osdc.osdmap && client->osdc.osdmap->epoch;

condition to dereference an already freed map. This happens to be reproducible with generic/395 and generic/397 with KASAN enabled:

    BUG: KASAN: slab-use-after-free in have_mon_and_osd_map+0x56/0x70
    Read of size 4 at addr ffff88811012d810 by task mount.ceph/13305
    CPU: 2 UID: 0 PID: 13305 Comm: mount.ceph Not tainted 6.14.0-rc2-build2+ #1266
    ...
    Call Trace:
    <TASK>
    have_mon_and_osd_map+0x56/0x70
    ceph_open_session+0x182/0x290
    ceph_get_tree+0x333/0x680
    vfs_get_tree+0x49/0x180
    do_new_mount+0x1a3/0x2d0
    path_mount+0x6dd/0x730
    do_mount+0x99/0xe0
    __do_sys_mount+0x141/0x180
    do_syscall_64+0x9f/0x100
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
    </TASK>

    Allocated by task 13305:
    ceph_osdmap_alloc+0x16/0x130
    ceph_osdc_init+0x27a/0x4c0
    ceph_create_client+0x153/0x190
    create_fs_client+0x50/0x2a0
    ceph_get_tree+0xff/0x680
    vfs_get_tree+0x49/0x180
    do_new_mount+0x1a3/0x2d0
    path_mount+0x6dd/0x730
    do_mount+0x99/0xe0
    __do_sys_mount+0x141/0x180
    do_syscall_64+0x9f/0x100
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

    Freed by task 9475:
    kfree+0x212/0x290
    handle_one_map+0x23c/0x3b0
    ceph_osdc_handle_map+0x3c9/0x590
    mon_dispatch+0x655/0x6f0
    ceph_con_process_message+0xc3/0xe0
    ceph_con_v1_try_read+0x614/0x760
    ceph_con_workfn+0x2de/0x650
    process_one_work+0x486/0x7c0
    process_scheduled_works+0x73/0x90
    worker_thread+0x1c8/0x2a0
    kthread+0x2ec/0x300
    ret_from_fork+0x24/0x40
    ret_from_fork_asm+0x1a/0x30

Rewrite the wait loop to check the above condition directly with client->monc.mutex and client->osdc.lock taken as appropriate. While at it, improve the timeout handling (previously mount_timeout could be exceeded in case wait_event_interruptible_timeout() slept more than once) and access client->auth_err under client->monc.mutex to match how it's set in finish_auth().

monmap_show() and osdmap_show() now take the respective lock before accessing the map as well.

Cc: stable@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02 | libceph: fix invalid accesses to ceph_connection_v1_info | Ilya Dryomov | 1 | -3/+4
commit cdbc9836c7afadad68f374791738f118263c5371 upstream. There is a place where generic code in messenger.c is reading and another place where it is writing to con->v1 union member without checking that the union member is active (i.e. msgr1 is in use). On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter, so such a read is almost guaranteed to return a bogus value instead of 0 when msgr2 is in use. This ends up being fairly benign because the side effect is just the invalidation of the authorizer and successive fetching of new tickets. con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that it's being written to can cause more serious consequences, but luckily it's not something that happens often. Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
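The hazard of touching an inactive union member can be sketched in userspace C. Everything here is hypothetical: `enum proto`, `struct conn`, the two info structs and the accessor names are invented for this example and do not reproduce the kernel's layout, so which fields overlap in this sketch says nothing about the real structure. The point is only that generic code consults a discriminator before reading or writing the v1 member.

```c
#include <stdint.h>

enum proto { PROTO_V1, PROTO_V2 };

/* Hypothetical per-protocol state; field names borrowed for flavor only. */
struct v1_info {
    uint32_t auth_retry;
    uint32_t connect_seq;
};

struct v2_info {
    void *out_iter;
    void *conn_bufs;
};

struct conn {
    enum proto proto;  /* says which union member is active */
    union {
        struct v1_info v1;
        struct v2_info v2;
    };
};

/* Read v1 state only when v1 is the active member. */
static uint32_t conn_auth_retry(const struct conn *con)
{
    return con->proto == PROTO_V1 ? con->v1.auth_retry : 0;
}

/* Write v1 state only when v1 is the active member. */
static void conn_reset_connect_seq(struct conn *con)
{
    if (con->proto == PROTO_V1)
        con->v1.connect_seq = 0;
}
```

With `proto` set to `PROTO_V2`, the read returns 0 rather than bytes of a pointer, and the reset leaves the v2 pointers alone.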
2024-07-18 | libceph: fix race between delayed_work() and ceph_monc_stop() | Ilya Dryomov | 1 | -2/+12
commit 69c7b2fe4c9cc1d3b1186d1c5606627ecf0de883 upstream.

The way the delayed work is handled in ceph_monc_stop() is prone to races with mon_fault() and possibly also finish_hunting(). Both of these can requeue the delayed work which wouldn't be canceled by any of the following code in case that happens after cancel_delayed_work_sync() runs -- __close_session() doesn't mess with the delayed work in order to avoid interfering with the hunting interval logic. This part was missed in commit b5d91704f53e ("libceph: behave in mon_fault() if cur_mon < 0") and use-after-free can still ensue on monc and objects that hang off of it, with monc->auth and monc->monmap being particularly susceptible to quickly being reused.

To fix this:

- clear monc->cur_mon and monc->hunting as part of closing the session in ceph_monc_stop()
- bail from delayed_work() if monc->cur_mon is cleared, similar to how it's done in mon_fault() and finish_hunting() (based on monc->hunting)
- call cancel_delayed_work_sync() after the session is closed

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/66857
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-20 | libceph: use kernel_connect() | Jordan Rife | 1 | -2/+2
commit 7563cf17dce0a875ba3d872acdc63a78ea344019 upstream. Direct calls to ops->connect() can overwrite the address parameter when used in conjunction with BPF SOCK_ADDR hooks. Recent changes to kernel_connect() ensure that callers are insulated from such side effects. This patch wraps the direct call to ops->connect() with kernel_connect() to prevent unexpected changes to the address passed to ceph_tcp_connect(). This change was originally part of a larger patch targeting the net tree addressing all instances of unprotected calls to ops->connect() throughout the kernel, but this change was split up into several patches targeting various trees. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/20230821100007.559638-1-jrife@google.com/ Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/ Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect") Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 | libceph: fix potential hang in ceph_osdc_notify() | Ilya Dryomov | 1 | -6/+14
commit e6e2843230799230fc5deb8279728a7218b0d63c upstream. If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-03 | rbd: harden get_lock_owner_info() a bit | Ilya Dryomov | 1 | -0/+1
commit 8ff2c64c9765446c3cef804fb99da04916603e27 upstream.

- we want the exclusive lock type, so test for it directly
- use sscanf() to actually parse the lock cookie and avoid admitting invalid handles
- bail if locker has a blank address

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 | libceph: harden msgr2.1 frame segment length checks | Ilya Dryomov | 1 | -15/+26
commit a282a2f10539dce2aa619e71e1817570d557fc97 upstream. ceph_frame_desc::fd_lens is an int array. decode_preamble() thus effectively casts u32 -> int but the checks for segment lengths are written as if on unsigned values. While reading in HELLO or one of the AUTH frames (before authentication is completed), arithmetic in head_onwire_len() can get duped by negative ctrl_len and produce head_len which is less than CEPH_PREAMBLE_LEN but still positive. This would lead to a buffer overrun in prepare_read_control() as the preamble gets copied to the newly allocated buffer of size head_len. Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Reported-by: Thelford Williams <thelford@google.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
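The signedness pitfall can be shown with a short userspace sketch. It is an illustration only: `SEGMENT_LEN_MAX`, `wire_to_int()` and both check functions are hypothetical and do not correspond to the kernel's constants or helpers. It assumes a 32-bit `int` and the usual wrapping behavior when an out-of-range u32 is converted to `int` (implementation-defined in C, but what GCC and Clang do).

```c
#include <stdint.h>

#define SEGMENT_LEN_MAX (1 << 20)  /* hypothetical limit for this sketch */

/* A length arrives as u32 on the wire but is stored in an int. */
static int wire_to_int(uint32_t wire)
{
    return (int)wire;  /* large values come out negative */
}

/* Written as if len were unsigned: a negative len slips through. */
static int check_len_unsigned_style(int len)
{
    return len <= SEGMENT_LEN_MAX ? 0 : -1;
}

/* Hardened: negative values are rejected explicitly. */
static int check_len_signed(int len)
{
    return (len >= 0 && len <= SEGMENT_LEN_MAX) ? 0 : -1;
}
```

A wire value such as `0xfffffff0` becomes a small negative number, passes the first check, and would then feed into length arithmetic; the second check refuses it.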
2022-05-25 | libceph: fix potential use-after-free on linger ping and resends | Ilya Dryomov | 1 | -183/+119
commit 75dbb685f4e8786c33ddef8279bab0eadfb0731f upstream.

request_reinit() is not only ugly as the comment rightfully suggests, but also unsafe. Even though it is called with osdc->lock held for write in all cases, resetting the OSD request refcount can still race with handle_reply() and result in use-after-free. Taking linger ping as an example:

    handle_timeout thread                     handle_reply thread

                                              down_read(&osdc->lock)
                                              req = lookup_request(...)
                                              ...
                                              finish_request(req)  # unregisters
                                              up_read(&osdc->lock)
                                              __complete_request(req)
                                                linger_ping_cb(req)

      # req->r_kref == 2 because handle_reply still holds its ref

    down_write(&osdc->lock)
    send_linger_ping(lreq)
      req = lreq->ping_req  # same req
      # cancel_linger_request is NOT
      # called - handle_reply already
      # unregistered
      request_reinit(req)
        WARN_ON(req->r_kref != 1)  # fires
        request_init(req)
          kref_init(req->r_kref)

                       # req->r_kref == 1 after kref_init

                                              ceph_osdc_put_request(req)
                                                kref_put(req->r_kref)

                 # req->r_kref == 0 after kref_put, req is freed

            <further req initialization/use> !!!

This happens because send_linger_ping() always (re)uses the same OSD request for watch ping requests, relying on cancel_linger_request() to unregister it from the OSD client and rip its messages out from the messenger. send_linger() does the same for watch/notify registration and watch reconnect requests. Unfortunately cancel_request() doesn't guarantee that after it returns the OSD client would be completely done with the OSD request -- a ref could still be held and the callback (if specified) could still be invoked too.

The original motivation for request_reinit() was inability to deal with allocation failures in send_linger() and send_linger_ping(). Switching to using osdc->req_mempool (currently only used by CephFS) respects that and allows us to get rid of request_reinit().

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-29 | libceph: fix doc warnings in cls_lock_client.c | Baokun Li | 1 | -3/+9
Add description to fix the following W=1 kernel build warnings:

    net/ceph/cls_lock_client.c:28: warning: Function parameter or member 'osdc' not described in 'ceph_cls_lock'
    net/ceph/cls_lock_client.c:28: warning: Function parameter or member 'oid' not described in 'ceph_cls_lock'
    net/ceph/cls_lock_client.c:28: warning: Function parameter or member 'oloc' not described in 'ceph_cls_lock'

[ idryomov: tweak osdc description ]

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29 | libceph: remove unnecessary ret variable in ceph_auth_init() | zuoqilin | 1 | -6/+1
There is no need to store the result in a variable first; return it directly to simplify the code. Signed-off-by: zuoqilin <zuoqilin@yulong.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29 | libceph: fix some spelling mistakes | Zheng Yongjun | 3 | -4/+4
Fix some spelling mistakes in comments:

- enconding ==> encoding
- ambigous ==> ambiguous
- orignal ==> original
- encyption ==> encryption

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29 | libceph: kill ceph_none_authorizer::reply_buf | Ilya Dryomov | 2 | -3/+2
We never receive authorizer replies with cephx disabled, so it is bogus. Also, it still uses the old zero-length array style. Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-24 | libceph: set global_id as soon as we get an auth ticket | Ilya Dryomov | 3 | -14/+13
Commit 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") delayed the setting of global_id too much. It is set only after all tickets are received, but in pre-nautilus clusters an auth ticket and the service tickets are obtained in separate steps (for a total of three MAuth replies). When the service tickets are requested, global_id is used to build an authorizer; if global_id is still 0 we never get them and fail to establish the session. Move the setting of global_id into protocol implementations. This way global_id can be set exactly when an auth ticket is received, neither sooner nor later. Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24 | libceph: don't pass result into ac->ops->handle_reply() | Ilya Dryomov | 3 | -11/+14
There is no result to pass in msgr2 case because authentication failures are reported through auth_bad_method frame and in MAuth case an error is returned immediately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-05-06 | Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client | Linus Torvalds | 3 | -20/+38
Pull ceph updates from Ilya Dryomov:
 "Notable items here are

   - a series to take advantage of David Howells' netfs helper library from Jeff
   - three new filesystem client metrics from Xiubo
   - ceph.dir.rsnaps vxattr from Yanhu
   - two auth-related fixes from myself, marked for stable.

  Interspersed is a smattering of assorted fixes and cleanups across the filesystem"

* tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits)
  libceph: allow addrvecs with a single NONE/blank address
  libceph: don't set global_id until we get an auth ticket
  libceph: bump CephXAuthenticate encoding version
  ceph: don't allow access to MDS-private inodes
  ceph: fix up some bare fetches of i_size
  ceph: convert some PAGE_SIZE invocations to thp_size()
  ceph: support getting ceph.dir.rsnaps vxattr
  ceph: drop pinned_page parameter from ceph_get_caps
  ceph: fix inode leak on getattr error in __fh_to_dentry
  ceph: only check pool permissions for regular files
  ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon
  ceph: avoid counting the same request twice or more
  ceph: rename the metric helpers
  ceph: fix kerneldoc copypasta over ceph_start_io_direct
  ceph: use attach/detach_page_private for tracking snap context
  ceph: don't use d_add in ceph_handle_snapdir
  ceph: don't clobber i_snap_caps on non-I_NEW inode
  ceph: fix fall-through warnings for Clang
  ceph: convert ceph_readpages to ceph_readahead
  ceph: convert ceph_write_begin to netfs_write_begin
  ...
2021-05-04 | libceph: allow addrvecs with a single NONE/blank address | Ilya Dryomov | 1 | -6/+14
Normally, an unused OSD id/slot is represented by an empty addrvec. However, it also appears to be possible to generate an osdmap where an unused OSD id/slot has an addrvec with a single blank address of type NONE. Allow such addrvecs and make the end result be exactly the same as for the empty addrvec case -- leave addr intact. Cc: stable@vger.kernel.org # 5.11+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-04-28 | libceph: don't set global_id until we get an auth ticket | Ilya Dryomov | 1 | -13/+23
With the introduction of enforcing mode, setting global_id as soon as we get it in the first MAuth reply will result in EACCES if the connection is reset before we get the second MAuth reply containing an auth ticket -- because on retry we would attempt to reclaim that global_id with no auth ticket at hand. Neither ceph_auth_client nor ceph_mon_client depends on global_id being set early, so just delay the setting until we get and process the second MAuth reply. While at it, complain if the monitor sends a zero global_id or changes our global_id as the session is likely to fail after that. Cc: stable@vger.kernel.org # needs backporting for < 5.11 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2021-04-28 | libceph: bump CephXAuthenticate encoding version | Ilya Dryomov | 1 | -1/+1
A dummy v3 encoding (exactly the same as v2) was introduced so that the monitors can distinguish broken clients that may not include their auth ticket in CEPHX_GET_AUTH_SESSION_KEY request on reconnects, thus failing to prove previous possession of their global_id (one part of CVE-2021-20288). The kernel client has always included its auth ticket, so it is compatible with enforcing mode as is. However we want to bump the encoding version to avoid having to authenticate twice on the initial connect -- all legacy clients (CephXAuthenticate < v3) are now forced to do so in order to expose insecure global_id reclaim. Marking for stable since at least for 5.11 and 5.12 it is trivial (v2 -> v3). Cc: stable@vger.kernel.org # 5.11+ URL: https://tracker.ceph.com/issues/50452 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2021-03-26 | net: ceph: Fix a typo in osdmap.c | Lu Wei | 1 | -1/+1
Modify "inital" to "initial" in net/ceph/osdmap.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 | libceph: remove osdtimeout option entirely | Ilya Dryomov | 1 | -6/+0
Commit 83aff95eb9d6 ("libceph: remove 'osdtimeout' option") deprecated osdtimeout over 8 years ago, but it is still recognized. Let's remove it entirely. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-02-16 | libceph: deprecate [no]cephx_require_signatures options | Ilya Dryomov | 1 | -6/+5
These options were introduced in 3.19 with support for message signing and are rather useless, as explained in commit a51983e4dd2d ("libceph: add nocephx_sign_messages option"). Deprecate them. In case there is someone out there with a cluster that lacks support for MSG_AUTH feature (very unlikely but has to be considered since we haven't formally raised the bar from argonaut to bobtail yet), make nocephx_sign_messages also waive MSG_AUTH requirement. This is probably how it should have been done in the first place -- if we aren't going to sign, requiring the signing feature makes no sense. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-01-21 | libceph: fix "Boolean result is used in bitwise operation" warning | Ilya Dryomov | 1 | -1/+1
This line dates back to 2013, but cppcheck complained because commit 2f713615ddd9 ("libceph: move msgr1 protocol implementation to its own file") moved it. Add parentheses to silence the warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-04 | libceph, ceph: disambiguate ceph_connection_operations handlers | Ilya Dryomov | 2 | -27/+27
For a few years now, kernel addresses are no longer included in oops dumps, at least on x86. All we get is a symbol name with offset and size. This is a problem for ceph_connection_operations handlers, especially con->ops->dispatch(). All three handlers have the same name and there is little context to disambiguate between e.g. monitor and OSD clients because almost everything is inlined. gdb sneakily stops at the first matching symbol, so one has to resort to nm and addr2line. Some of these are already prefixed with mon_, osd_ or mds_. Let's do the same for all others. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2021-01-04 | libceph: zero out session key and connection secret | Ilya Dryomov | 3 | -43/+62
Try and avoid leaving bits and pieces of session key and connection secret (gets split into GCM key and a pair of GCM IVs) around. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-12-28 | libceph: align session_key and con_secret to 16 bytes | Ilya Dryomov | 1 | -2/+10
crypto_shash_setkey() and crypto_aead_setkey() will do a (small) GFP_ATOMIC allocation to align the key if it isn't suitably aligned. It's not a big deal, but at the same time easy to avoid. The actual alignment requirement is dynamic, queryable with crypto_shash_alignmask() and crypto_aead_alignmask(), but shouldn't be stricter than 16 bytes for our algorithms. Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
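As a userspace analogy, key storage can be given a fixed alignment at declaration time so that a consumer with an alignment mask never needs to copy it into a bounce buffer. This sketch uses C11 `alignas`; `struct session`, `KEY_ALIGN` and `is_aligned()` are hypothetical names, and the mask test merely mimics the kind of check an alignmask implies. It is not how the kernel declares these fields.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_ALIGN 16  /* alignment assumed sufficient for this sketch */

/* Hypothetical container: the key is forced onto a 16-byte boundary. */
struct session {
    uint8_t mode;
    alignas(KEY_ALIGN) uint8_t key[16];
};

/* True if p satisfies the given alignment mask (alignment - 1). */
static int is_aligned(const void *p, size_t mask)
{
    return ((uintptr_t)p & mask) == 0;
}
```

Because the member carries the alignment, any properly allocated `struct session` has `key` on a 16-byte boundary regardless of the fields that precede it.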
2020-12-28 | libceph: fix auth_signature buffer allocation in secure mode | Ilya Dryomov | 1 | -1/+2
auth_signature frame is 68 bytes in plain mode and 96 bytes in secure mode but we are requesting 68 bytes in both modes. By luck, this doesn't actually result in any invalid memory accesses because the allocation is satisfied out of kmalloc-96 slab and so exactly 96 bytes are allocated, but KASAN rightfully complains. Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Reported-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-17 | Merge tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client | Linus Torvalds | 16 | -1914/+6384
Pull ceph updates from Ilya Dryomov:
 "The big ticket item here is support for msgr2 on-wire protocol, which adds the option of full in-transit encryption using AES-GCM algorithm (myself).

  On top of that we have a series to avoid intermittent errors during recovery with recover_session=clean and some MDS request encoding work from Jeff, a cap handling fix and assorted observability improvements from Luis and Xiubo and a good number of cleanups.

  Luis also ran into a corner case with quotas which sadly means that we are back to denying cross-quota-realm renames"

* tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client: (59 commits)
  libceph: drop ceph_auth_{create,update}_authorizer()
  libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
  libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
  libceph: introduce connection modes and ms_mode option
  libceph, rbd: ignore addr->type while comparing in some cases
  libceph, ceph: get and handle cluster maps with addrvecs
  libceph: factor out finish_auth()
  libceph: drop ac->ops->name field
  libceph: amend cephx init_protocol() and build_request()
  libceph, ceph: incorporate nautilus cephx changes
  libceph: safer en/decoding of cephx requests and replies
  libceph: more insight into ticket expiry and invalidation
  libceph: move msgr1 protocol specific fields to its own struct
  libceph: move msgr1 protocol implementation to its own file
  libceph: separate msgr1 protocol implementation
  libceph: export remaining protocol independent infrastructure
  libceph: export zero_page
  libceph: rename and export con->flags bits
  libceph: rename and export con->state states
  libceph: make con->state an int
  ...
2020-12-15 | libceph: drop ceph_auth_{create,update}_authorizer() | Ilya Dryomov | 1 | -28/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1 | Ilya Dryomov | 1 | -16/+5
This shouldn't cause any functional changes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph, ceph: implement msgr2.1 protocol (crc and secure modes) | Ilya Dryomov | 8 | -24/+4046
Implement msgr2.1 wire protocol, available since nautilus 14.2.11 and octopus 15.2.5. msgr2.0 wire protocol is not implemented -- it has several security, integrity and robustness issues and is therefore considered deprecated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: introduce connection modes and ms_mode option | Ilya Dryomov | 4 | -6/+87
msgr2 supports two connection modes: crc (plain) and secure (on-wire encryption). Connection mode is picked by server based on input from client.

Introduce ms_mode option:

    ms_mode=legacy        - msgr1 (default)
    ms_mode=crc           - crc mode, if denied fail
    ms_mode=secure        - secure mode, if denied fail
    ms_mode=prefer-crc    - crc mode, if denied agree to secure mode
    ms_mode=prefer-secure - secure mode, if denied agree to crc mode

ms_mode affects all connections, we don't separate connections to mons like it's done in userspace with ms_client_mode vs ms_mon_client_mode.

For now the default is legacy, to be flipped to prefer-crc after some time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
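The option values listed in this commit message map naturally onto an enum plus two predicates. The following is a userspace sketch of that mapping, not the kernel's option parser: `enum ms_mode`, `parse_ms_mode()`, `accepts_crc()` and `accepts_secure()` are hypothetical names, and the predicates simply restate the "if denied" rules from the message.

```c
#include <stddef.h>
#include <string.h>

enum ms_mode {
    MS_MODE_LEGACY,
    MS_MODE_CRC,
    MS_MODE_SECURE,
    MS_MODE_PREFER_CRC,
    MS_MODE_PREFER_SECURE,
    MS_MODE_INVALID
};

/* Map an ms_mode= value to the enum; unknown strings are invalid. */
static enum ms_mode parse_ms_mode(const char *s)
{
    static const struct {
        const char *name;
        enum ms_mode mode;
    } tbl[] = {
        { "legacy",        MS_MODE_LEGACY },
        { "crc",           MS_MODE_CRC },
        { "secure",        MS_MODE_SECURE },
        { "prefer-crc",    MS_MODE_PREFER_CRC },
        { "prefer-secure", MS_MODE_PREFER_SECURE },
    };
    size_t i;

    for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
        if (strcmp(s, tbl[i].name) == 0)
            return tbl[i].mode;
    return MS_MODE_INVALID;
}

/* Would a client configured with m end up agreeing to crc mode? */
static int accepts_crc(enum ms_mode m)
{
    return m == MS_MODE_CRC || m == MS_MODE_PREFER_CRC ||
           m == MS_MODE_PREFER_SECURE;
}

/* Would a client configured with m end up agreeing to secure mode? */
static int accepts_secure(enum ms_mode m)
{
    return m == MS_MODE_SECURE || m == MS_MODE_PREFER_CRC ||
           m == MS_MODE_PREFER_SECURE;
}
```

Under this reading, the two strict modes accept exactly one outcome, the two prefer- modes accept both, and legacy accepts neither because it stays on msgr1.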
2020-12-15 | libceph, rbd: ignore addr->type while comparing in some cases | Ilya Dryomov | 1 | -2/+4
For libceph, this ensures that libceph instance sharing (share option) continues to work. For rbd, this avoids blocklisting alive lock owners (locker addr is always LEGACY, while watcher addr is ANY in nautilus). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph, ceph: get and handle cluster maps with addrvecs | Ilya Dryomov | 4 | -55/+195
In preparation for msgr2, make the cluster send us maps with addrvecs including both LEGACY and MSGR2 addrs instead of a single LEGACY addr. This means advertising support for SERVER_NAUTILUS and also some older features: SERVER_MIMIC, MONENC and MONNAMES. MONNAMES and MONENC are actually pre-argonaut, we just never updated ceph_monmap_decode() for them. Decoding is unconditional, see commit 23c625ce3065 ("libceph: assume argonaut on the server side"). SERVER_MIMIC doesn't bear any meaning for the kernel client. Since ceph_decode_entity_addrvec() is guarded by encoding version checks (and in msgr2 case it is guarded implicitly by the fact that server is speaking msgr2), we assume MSG_ADDR2 for it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: factor out finish_auth() | Ilya Dryomov | 1 | -22/+30
In preparation for msgr2, factor out finish_auth() so it is suitable for both existing MAuth message based authentication and upcoming msgr2 authentication exchange. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: drop ac->ops->name field | Ilya Dryomov | 2 | -2/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: amend cephx init_protocol() and build_request() | Ilya Dryomov | 2 | -28/+49
In msgr2, initial authentication happens with an exchange of msgr2 control frames -- MAuth message and struct ceph_mon_request_header aren't used. Make that optional. Stop reporting cephx protocol as "x". Use "cephx" instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph, ceph: incorporate nautilus cephx changes | Ilya Dryomov | 6 | -48/+194
- request service tickets together with auth ticket. Currently we get auth ticket via CEPHX_GET_AUTH_SESSION_KEY op and then request service tickets via CEPHX_GET_PRINCIPAL_SESSION_KEY op in a separate message. Since nautilus, desired service tickets are shared together with auth ticket in CEPHX_GET_AUTH_SESSION_KEY reply.

- propagate session key and connection secret, if any. In preparation for msgr2, update handle_reply() and verify_authorizer_reply() auth ops to propagate session key and connection secret. Since nautilus, if secure mode is negotiated, connection secret is shared either in CEPHX_GET_AUTH_SESSION_KEY reply (for mons) or in a final authorizer reply (for osds and mdses).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: safer en/decoding of cephx requests and replies | Ilya Dryomov | 1 | -21/+26
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: more insight into ticket expiry and invalidation | Ilya Dryomov | 1 | -14/+25
Make it clear that "need" is a union of "missing" and "have, but up for renewal" and dout when the ticket goes missing due to expiry or invalidation by client. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
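The statement that "need" is a union of two sets can be written down as a bitmask computation. This is a userspace sketch under invented assumptions: `struct ticket`, its `have_key` and `renew_after` fields, and `tickets_need()` are hypothetical and only model the idea of one bit per service.

```c
/* Hypothetical per-service ticket state. */
struct ticket {
    int have_key;      /* non-zero if a ticket is held */
    long renew_after;  /* time from which renewal is due */
};

/*
 * Return a bitmask with bit i set if service i needs a ticket:
 * either it is missing, or it is held but up for renewal.
 */
static unsigned int tickets_need(const struct ticket *t, int n, long now)
{
    unsigned int missing = 0, renew = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (!t[i].have_key)
            missing |= 1u << i;
        else if (now >= t[i].renew_after)
            renew |= 1u << i;
    }
    return missing | renew;  /* "need" is the union of both sets */
}
```

Keeping `missing` and `renew` as separate masks until the final OR makes it easy to log which of the two reasons applies to each service.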
2020-12-15 | libceph: move msgr1 protocol specific fields to its own struct | Ilya Dryomov | 2 | -212/+216
A couple whitespace fixups, no functional changes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: move msgr1 protocol implementation to its own file | Ilya Dryomov | 3 | -1496/+1504
A pure move, no other changes. Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}() helpers are also moved. msgr2 will bring its own, more efficient, variants based on iov_iter. Switching msgr1 to them was considered but decided against to avoid subtle regressions. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: separate msgr1 protocol implementation | Ilya Dryomov | 1 | -50/+88
In preparation for msgr2, define internal messenger <-> protocol interface (as opposed to external messenger <-> client interface, which is struct ceph_connection_operations) consisting of try_read(), try_write(), revoke(), revoke_incoming(), opened(), reset_session() and reset_protocol() ops. The semantics are exactly the same as they are now. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15 | libceph: export remaining protocol independent infrastructure | Ilya Dryomov | 1 | -82/+75
In preparation for msgr2, make all protocol independent functions in messenger.c global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>