summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-03-19ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}Marcelo Ricardo Leitner11-87/+51
in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19ipv4,ipv6: grab rtnl before locking the socketMarcelo Ricardo Leitner2-14/+52
There are some setsockopt operations in ipv4 and ipv6 that are grabbing rtnl after having grabbed the socket lock. Yet this makes it impossible to do operations that have to lock the socket when already within a rtnl protected scope, like ndo dev_open and dev_stop. We normally take coarse grained locks first but setsockopt inverted that. So this patch invert the lock logic for these operations and makes setsockopt grab rtnl if it will be needed prior to grabbing socket lock. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19Merge branch 'listen_refactor_part_13'David S. Miller17-174/+110
Eric Dumazet says: ==================== inet: tcp listener refactoring, part 13 inet_hash functions are in a bad state : Too much IPv6/IPv4 copy/pasting. Lets refactor a bit. Idea is that we do not want to have an equivalent of inet_csk(sk)->icsk_af_ops for request socks in order to be able to use the right variant. In this patch series, I started to let IPv6/IPv4 converge to common helpers. Idea is to use ipv6_addr_set_v4mapped() even for AF_INET sockets, so that we can test if (sk->sk_family == AF_INET6 && !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) to tell if we deal with an IPv6 socket, or IPv4 one, at least in slow paths. Ideally, we could save 8 bytes per struct sock_common, if we alias skc_daddr & skc_rcv_saddr to skc_v6_daddr[3]/skc_v6_rcv_saddr[3]. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19inet: request sock should init IPv6/IPv4 addressesEric Dumazet5-10/+15
In order to be able to use sk_ehashfn() for request socks, we need to initialize their IPv6/IPv4 addresses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19inet: get rid of last __inet_hash_connect() argumentEric Dumazet3-9/+6
We now always call __inet_hash_nolisten(), no need to pass it as an argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19ipv6: get rid of __inet6_hash()Eric Dumazet6-75/+12
We can now use inet_hash() and __inet_hash() instead of private functions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19inet: add IPv6 support to sk_ehashfn()Eric Dumazet7-45/+45
Intent is to converge IPv4 & IPv6 inet_hash functions to factorize code. IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set() helpers. __inet6_hash can now use sk_ehashfn() instead of a private inet6_sk_ehashfn() and will simply use __inet_hash() in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19net: introduce sk_ehashfn() helperEric Dumazet2-11/+8
Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers for all kind of sockets (full sockets, timewait and request sockets) inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19netns: constify net_hash_mix() and various callersEric Dumazet9-31/+31
const qualifiers ease code review by making clear which objects are not written in a function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge branch 'mlx4-net'David S. Miller4-8/+24
Or Gerlitz says: ==================== mlx4 driver fixes for 4.0-rc Just few small fixes for the 4.0 rc cycle. The fix from Moni addresses an issue from 4.0-rc1 so we just need it for net. Eran's fix for off-by-one should go to 3.19.y too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net/mlx4_en: Set statistics bitmap at port initEran Ben Elisha1-2/+2
Port statistics bitmap will now be initialized at port init. Even before starting the port, statistics are visible to the user and must be properly masked. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18IB/mlx4: Saturate RoCE port PMA counters in case of overflowMajd Dibbiny1-4/+16
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of overflow, according to the IB spec, we have to saturate a counter to its max value, do that. Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net/mlx4_en: Fix off-by-one in ethtool statistics displayEran Ben Elisha1-1/+1
NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when displaying the statistics starting from tx_chksum_offload in ethtool. Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18IB/mlx4: Verify net device validity on port change eventMoni Shoua1-1/+5
Processing an event is done in a different context from the one when the event was dispatched. This requires a check that the slave net device is still valid when the event is being processed. The check is done under the iboe lock which ensure correctness. Fixes: a57500903093 ('IB/mlx4: Add port aggregation support') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge branch 'txq_max_rate'David S. Miller12-18/+181
Or Gerlitz says: ==================== Add max rate TXQ attribute Add the ability to set a max-rate limitation for TX queues. The attribute name is maxrate and the units are Mbs, to make it similar to the existing max-rate limitation knobs (ETS and SRIOV ndo calls). changes from V2: - added Documentation (thanks Florian and Tom) - rebased to latest net-next to comply with the swdev ndo removal - addressed more feedback from Dave on the comments style changes from V1: - addressed feedback from Dave changes from V0: - addressed feedback from Sergei John Fastabend (1): net: Add max rate tx queue attribute ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net/mlx4_en: Add tx queue maxrate supportOr Gerlitz1-0/+29
Add ndo_set_tx_maxrate support. To support per tx queue maxrate limit, we use the update-qp firmware command to do run-time rate setting for the qp that serves this tx ring. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net/mlx4_core: Add basic support for QP max-rate limitingOr Gerlitz7-6/+72
Add the low-level device commands and definitions used for QP max-rate limiting. This is done through the following elements: - read rate-limit device caps in QUERY_DEV_CAP: number of different rates and the min/max rates in Kbs/Mbs/Gbs units - enhance the QP context struct to contain rate limit units and value - allow to do run time rate-limit setting to QPs through the update-qp firmware command - QP rate-limiting is disallowed for VFs Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net: Add max rate tx queue attributeJohn Fastabend4-12/+80
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing for max-rate limiting. Along with DCB-ETS and BQL this provides another knob to tune queue performance. The limit units are Mbps. By default it is disabled. To disable the rate limitation after it has been set for a queue, it should be set to zero. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge tag 'sound-4.0-rc5' of ↵Linus Torvalds25-102/+167
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This is a collection of many small fixes. Most of fixes are for ASoC drivers, including the fixes of wrong field usages for boolean kctls. In addition, there is a fix in ASoC core for adding proper locks for component lists, and a fix for a HD-audio regression by the previous mono channel fix" * tag 'sound-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits) ALSA: hda - Treat stereo-to-mono mix properly ASoC: wm9713: Fix wrong value references for boolean kctl ASoC: wm9712: Fix wrong value references for boolean kctl ASoC: wm8960: Fix wrong value references for boolean kctl ASoC: wm8955: Fix wrong value references for boolean kctl ASoC: wm8904: Fix wrong value references for boolean kctl ASoC: wm8903: Fix wrong value references for boolean kctl ASoC: wm8731: Fix wrong value references for boolean kctl ASoC: wm2000: Fix wrong value references for boolean kctl ASoC: tas5086: Fix wrong value references for boolean kctl ASoC: pcm1681: Fix wrong value references for boolean kctl ASoC: es8238: Fix wrong value references for boolean kctl ASoC: cs4271: Fix wrong value references for boolean kctl ASoC: ak4641: Fix wrong value references for boolean kctl ASoC: adav80x: Fix wrong value references for boolean kctl ASoC: Fix component lists locking ASoC: Intel: remove conflicts when load/unload multiple firmware images ASoC: rt286: Change the DMI mapping for Dino ASoC: sgtl5000: remove useless register write clearing CHRGPUMP_POWERUP ASoC: fsl_ssi: Don't try to round-up for PM divisor calculation ...
2015-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds3-10/+18
Pull crypto fixes from Herbert Xu: "Fix a bug in the ARM XTS implementation that can cause failures in decrypting encrypted disks, and fix is a memory overwrite bug that can cause a crash which can be triggered from userspace" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - fix memory usage in GCM decryption crypto: arm/aes update NEON AES module to latest OpenSSL version
2015-03-18Merge branch 'for-linus' of ↵Linus Torvalds2-4/+30
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: - fix for potential race with module loading, from Petr Mladek. The race is very unlikely to be seen in real world and has been found by code inspection, but should be fixed for 4.0 anyway. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Fix subtle race with coming and going modules
2015-03-18Merge branch 'for-linus' of ↵Linus Torvalds5-33/+56
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - fixes for pen pen proximity / touch events in wacom driver, from Ping Cheng and Benjamin Tissoires - two new device-specific quirks from Oliver Neukum and Forest Wilkinson * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: wacom: check for wacom->shared before following the pointer HID: tivo: enable all buttons on the TiVo Slide Pro remote HID: add ALWAYS_POLL quirk for a Logitech 0xc007 HID: wacom: rely on actual touch down count to decide touch_down HID: wacom: do not send pen events before touch is up/forced out
2015-03-18cc2520: Add support for CC2591 amplifier.Brad Campbell3-8/+52
The TI CC2521 is an RF power amplifier that is designed to interface with the CC2520. Conveniently, it directly interfaces with the CC2520 and does not require any pins to be connected to a microcontroller/processor. Adding a CC2591 increases the CC2520's range, which is useful for border router and other wall-powered applications. Using the CC2591 with the CC2520 requires configuring the CC2520 GPIOs that are connected to the CC2591 to correctly set the CC2591 into TX and RX modes. Further, TI recommends that the CC2520_TXPOWER and CC2520_AGCCTRL1 registers are set differently to maximize the CC2591's performance. These settings are covered in TI Application Note AN065. This patch adds an optional `amplified` field to the cc2520 entry in the device tree. If present, the CC2520 will be configured to operate with a CC2591. The expected pin mapping is: CC2520 GPIO0 --> CC2591 EN CC2520 GPIO5 --> CC2591 PAEN Signed-off-by: Brad Campbell <bradjc5@gmail.com> Acked-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-18cc2520: Do not store platform_data in spi_deviceBrad Campbell1-49/+46
Storing the `platform_data` struct inside of the SPI struct when using the device tree allows for a later function to edit the content of that struct. This patch refactors the `cc2520_get_platformat_data` function to accept a pointer to a `cc2520_platform_data` struct and populates the fields inside of it. This change mirrors commit aaa1c4d226e4cd730075d3dac99a6d599a0190c7 ("at86rf230: copy pdata to driver allocated space"). Signed-off-by: Brad Campbell <bradjc5@gmail.com> Acked-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-18Merge branch 'rhashtable_remove_shift'David S. Miller5-17/+13
Herbert Xu says: ==================== rhashtable: Kill redundant shift parameter I was trying to squeeze bucket_table->rehash in by downsizing bucket_table->size, only to find that my spot had been taken over by bucket_table->shift. These patches kill shift and makes me feel better :) v2 corrects the typo in the test_rhashtable changelog and also notes the min_shift parameter in the tipc patch changelog. ==================== Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Remove max_shift and min_shiftHerbert Xu2-10/+1
Now that nobody uses max_shift and min_shift, we can safely remove them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18test_rhashtable: Use rhashtable max_size instead of max_shiftHerbert Xu1-1/+1
This patch converts test_rhashtable to use rhashtable max_size instead of the obsolete max_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18tipc: Use rhashtable max/min_size instead of max/min_shiftHerbert Xu1-2/+2
This patch converts tipc to use rhashtable max/min_size instead of the obsolete max/min_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18netlink: Use rhashtable max_size instead of max_shiftHerbert Xu1-1/+1
This patch converts netlink to use rhashtable max_size instead of the obsolete max_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Introduce max_size/min_sizeHerbert Xu2-4/+12
This patch adds the parameters max_size and min_size which are meant to replace max_shift and min_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Remove shift from bucket_tableHerbert Xu2-5/+2
Keeping both size and shift is silly. We only need one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge branch 'xgene-next'David S. Miller7-12/+98
Keyur Chudgar says: ==================== drivers: net: xgene: Add second SGMII based 1G interface This patch adds support for second SGMII based 1G interface. ==================== Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18drivers: net: xgene: Add second SGMII based 1G interfaceKeyur Chudgar4-12/+67
- Added resource initialization based on port-id field - Enabled second SGMII 1G interface Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18dtb: xgene: Add second SGMII based 1G interface nodeKeyur Chudgar2-0/+29
- Added new SGMII node for port 1 - Added port-id field Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Documentation: dtb: Add port-id field for APM X-Gene ethernetKeyur Chudgar1-0/+2
Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18pinctrl: sun4i: GPIOs configured as irq must be set to input before readingHans de Goede3-2/+17
On sun4i-a10, when GPIOs are configured as external interrupt the value for them in the data register does not seem to get updated, so set their mux to input (and restore afterwards) when reading the pin. Missed edges seem to be buffered, so this does not introduce a race condition. I've also tested this on sun5i-a13 and sun7i-a20 and those do not seem to be affected, the input value representation in the data register does seem to correctly get updated to the actual pin value while in irq mode there. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-03-18ovl: upper fs should not be R/Ohujianyang1-5/+19
After importing multi-lower layer support, users could mount a r/o partition as the left most lowerdir instead of using it as upperdir. And a r/o upperdir may cause an error like overlayfs: failed to create directory ./workdir/work during mount. This patch check the *s_flags* of upper fs and return an error if it is a r/o partition. The checking of *upper_mnt->mnt_sb->s_flags* can be removed now. This patch also remove /* FIXME: workdir is not needed for a R/O mount */ from ovl_fill_super() because: 1) for upper fs r/o case Setting a r/o partition as upper is prevented, no need to care about workdir in this case. 2) for "mount overlay -o ro" with a r/w upper fs case Users could remount overlayfs to r/w in this case, so workdir should not be omitted. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-18ovl: check lowerdir amount for non-upper mounthujianyang1-1/+7
Recently multi-lower layer mount support allow upperdir and workdir to be omitted, then cause overlayfs can be mount with only one lowerdir directory. This action make no sense and have potential risk. This patch check the total number of lower directories to prevent mounting overlayfs with only one directory. Also, an error message is added to indicate lower directories exceed OVL_MAX_STACK limit. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-18ovl: print error message for invalid mount optionshujianyang1-0/+1
Overlayfs should print an error message if an incorrect mount option is caught like other filesystems. After this patch, improper option input could be clearly known. Reported-by: Fabian Sturm <fabian.sturm@aduu.de> Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-18Bluetooth: Fix potential NULL dereference in SMP channel setupMarcel Holtmann1-3/+5
When the allocation of the L2CAP channel for the BR/EDR security manager fails, then the smp variable might be NULL. In that case do not try to free the non-existing crypto contexts Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-18act_bpf: allow non-default TC_ACT opcodes as BPF exec outcomeDaniel Borkmann1-8/+28
Revisiting commit d23b8ad8ab23 ("tc: add BPF based action") with regards to eBPF support, I was thinking that it might be better to improve return semantics from a BPF program invoked through BPF_PROG_RUN(). Currently, in case filter_res is 0, we overwrite the default action opcode with TC_ACT_SHOT. A default action opcode configured through tc's m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC, TC_ACT_OK. In cls_bpf, we have the possibility to overwrite the default class associated with the classifier in case filter_res is _not_ 0xffffffff (-1). That allows us to fold multiple [e]BPF programs into a single one, where they would otherwise need to be defined as a separate classifier with its own classid, needlessly redoing parsing work, etc. Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode mapping, where we would allow above possibilities. Thus, like in cls_bpf, a filter_res of 0xffffffff (-1) means that the configured _default_ action is used. Any unkown return code from the BPF program would fail in tcf_bpf() with TC_ACT_UNSPEC. Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED, which both have the same semantics, we have the option to either use that as a default action (filter_res of 0xffffffff) or non-default BPF return code. All that will allow us to transparently use tcf_bpf() for both BPF flavours. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge branch 'tipc_netns_leak'David S. Miller4-94/+44
Ying Xue says: ==================== tipc: fix netns refcnt leak The series aims to eliminate the issue of netns refcount leak. But during fixing it, another two additional problems are found. So all of known issues associated with the netns refcnt leak are resolved at the same time in the patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18tipc: withdraw tipc topology server name when namespace is deletedYing Xue1-0/+3
The TIPC topology server is a per namespace service associated with the tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn before we call sk_release_kernel because the kernel socket release is done in init_net and trying to withdraw a TIPC name published in another namespace will fail with an error as: [ 170.093264] Unable to remove local publication [ 170.093264] (type=1, lower=1, ref=2184244004, key=2184244005) We fix this by breaking the association between the topology server name and socket before calling sk_release_kernel. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18tipc: fix a potential deadlock when nametable is purgedYing Xue1-2/+2
[ 28.531768] ============================================= [ 28.532322] [ INFO: possible recursive locking detected ] [ 28.532322] 3.19.0+ #194 Not tainted [ 28.532322] --------------------------------------------- [ 28.532322] insmod/583 is trying to acquire lock: [ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [ 28.532322] but task is already holding lock: [ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc] [ 28.532322] [ 28.532322] other info that might help us debug this: [ 28.532322] Possible unsafe locking scenario: [ 28.532322] [ 28.532322] CPU0 [ 28.532322] ---- [ 28.532322] lock(&(&nseq->lock)->rlock); [ 28.532322] lock(&(&nseq->lock)->rlock); [ 28.532322] [ 28.532322] *** DEADLOCK *** [ 28.532322] [ 28.532322] May be due to missing lock nesting notation [ 28.532322] [ 28.532322] 3 locks held by insmod/583: [ 28.532322] #0: (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50 [ 28.532322] #1: (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc] [ 28.532322] #2: (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc] [ 28.532322] [ 28.532322] stack backtrace: [ 28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194 [ 28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.532322] ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007 [ 28.532322] ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998 [ 28.532322] ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38 [ 28.532322] Call Trace: [ 28.532322] [<ffffffff81792f3e>] dump_stack+0x4c/0x65 [ 28.532322] [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0 [ 28.532322] [<ffffffff810a4df3>] ? __bfs+0x23/0x270 [ 28.532322] [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0 [ 28.532322] [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc] [ 28.532322] [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc] [ 28.532322] [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc] [ 28.532322] [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc] [ 28.532322] [<ffffffff8163dece>] ops_init+0x4e/0x150 [ 28.532322] [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10 [ 28.532322] [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190 [ 28.532322] [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50 [ 28.532322] [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc] [ 28.532322] [<ffffffffa0024000>] ? 0xffffffffa0024000 [ 28.532322] [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0 [ 28.532322] [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0 [ 28.532322] [<ffffffff810e725b>] ? do_init_module+0x2b/0x200 [ 28.532322] [<ffffffff810e7294>] do_init_module+0x64/0x200 [ 28.532322] [<ffffffff810e9353>] load_module+0x12f3/0x18e0 [ 28.532322] [<ffffffff810e5890>] ? show_initstate+0x50/0x50 [ 28.532322] [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110 [ 28.532322] [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to remove a publication with a name sequence, the name sequence's lock is held. However, when tipc_nametbl_remove_publ() calling tipc_nameseq_remove_publ() to remove the publication, it first tries to query name sequence instance with the publication, and then holds the lock of the found name sequence. But as the lock may be already taken in tipc_purge_publications(), deadlock happens like above scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name sequence's lock, the deadlock can be avoided if it's directly invoked by tipc_purge_publications(). Fixes: 97ede29e80ee ("tipc: convert name table read-write lock to RCU") Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18tipc: fix netns refcnt leakYing Xue3-92/+39
When the TIPC module is loaded, we launch a topology server in kernel space, which in its turn is creating TIPC sockets for communication with topology server users. Because both the socket's creator and provider reside in the same module, it is necessary that the TIPC module's reference count remains zero after the server is started and the socket created; otherwise it becomes impossible to perform "rmmod" even on an idle module. Currently, we achieve this by defining a separate "tipc_proto_kern" protocol struct, that is used only for kernel space socket allocations. This structure has the "owner" field set to NULL, which restricts the module reference count from being be bumped when sk_alloc() for local sockets is called. Furthermore, we have defined three kernel-specific functions, tipc_sock_create_local(), tipc_sock_release_local() and tipc_sock_accept_local(), to avoid the module counter being modified when module local sockets are created or deleted. This has worked well until we introduced name space support. However, after name space support was introduced, we have observed that a reference count leak occurs, because the netns counter is not decremented in tipc_sock_delete_local(). This commit remedies this problem. But instead of just modifying tipc_sock_delete_local(), we eliminate the whole parallel socket handling infrastructure, and start using the regular sk_create_kern(), kernel_accept() and sk_release_kernel() calls. Since those functions manipulate the module counter, we must now compensate for that by explicitly decrementing the counter after module local sockets are created, and increment it just before calling sk_release_kernel(). Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Cong Wang <cwang@twopensource.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18Merge branch 'listener_refactor_part_12'David S. Miller12-73/+77
Eric Dumazet says: ==================== inet: tcp listener refactoring, part 12 By adding a pointer back to listener, we are preparing synack rtx handling to no longer be governed by listener keepalive timer, as this is the most problematic source of contention on listener spinlock. Note that TCP FastOpen had such pointer anyway, so we make it generic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18inet: fix request sock refcountingEric Dumazet7-18/+22
While testing last patch series, I found req sock refcounting was wrong. We must set skc_refcnt to 1 for all request socks added in hashes, but also on request sockets created by FastOpen or syncookies. It is tricky because we need to defer this initialization so that future RCU lookups do not try to take a refcount on a not yet fully initialized request socket. Also get rid of ireq_refcnt alias. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 13854e5a6046 ("inet: add proper refcounting to request sock") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18inet: avoid fastopen lock for regular accept()Eric Dumazet1-2/+4
It is not because a TCP listener is FastOpen ready that all incoming sockets actually used FastOpen. Avoid taking queue->fastopenq->lock if not needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18tcp: rename struct tcp_request_sock listenerEric Dumazet7-25/+15
The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a full pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18inet: add rsk_listener field to struct request_sockEric Dumazet2-4/+10
Once we'll be able to lookup request sockets in ehash table, we'll need to get access to listener which created this request. This avoid doing a lookup to find the listener, which benefits for a more solid SO_REUSEPORT, and is needed once we no longer queue request sock into a listener private queue. Note that 'struct tcp_request_sock'->listener could be reduced to a single bit, as TFO listener should match req->rsk_listener. TFO will no longer need to hold a reference on the listener. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>