summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2009-02-03connector: create connector workqueue only while needed onceFrederic Weisbecker1-0/+8
The netlink connector uses its own workqueue to relay the datas sent from userspace to the appropriate callback. If you launch the test from Documentation/connector and change it a bit to send a high flow of data, you will see thousands of events coming to the "cqueue" workqueue by looking at the workqueue tracer. This flow of events can be sent very quickly. So, to not encumber the kevent workqueue and delay other jobs, the "cqueue" workqueue should remain. But this workqueue is pointless most of the time, it will always be created (assuming you have built it of course) although only developpers with specific needs will use it. So avoid this "most of the time useless task", this patch proposes to create this workqueue only when needed once. The first jobs to be sent to connector callbacks will be sent to kevent while the "cqueue" thread creation will be scheduled to kevent too. The following jobs will continue to be scheduled to keventd until the cqueue workqueue is created, and then the rest of the jobs will continue to perform as usual, through this dedicated workqueue. Each time I tested this patch, only the first event was sent to keventd, the rest has been sent to cqueue which have been created quickly. Also, this patch fixes some trailing whitespaces on the connector files. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01net: move bsockets outside of read only beginning of struct inet_hashinfoEric Dumazet1-1/+2
And switch bsockets to atomic_t since it might be changed in parallel. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler.Jarek Poplawski2-3/+5
Patrick McHardy <kaber@trash.net> suggested: > How about making this flag and the warning message (in a out-of-line > function) globally available? Other qdiscs (f.i. HFSC) can't deal with > inner non-work-conserving qdiscs as well. This patch uses qdisc->flags field of "suspected" child qdisc. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01net: add ARP notify option for devicesStephen Hemminger2-0/+2
This adds another inet device option to enable gratuitous ARP when device is brought up or address change. This is handy for clusters or virtualization. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01smsc911x: allow mac address to be saved before device resetSteve Glendinning1-0/+1
Some platforms (for example pcm037) do not have an EEPROM fitted, instead storing their mac address somewhere else. The bootloader fetches this and configures the ethernet adapter before the kernel is started. This patch allows a platform to indicate to the driver via the SMSC911X_SAVE_MAC_ADDRESS flag that the mac address has already been configured via such a mechanism, and should be saved before resetting the chip. Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01smsc911x: add external phy detection overridesSteve Glendinning1-0/+2
On LAN9115/LAN9117/LAN9215/LAN9217, external phys are supported. These are usually indicated by a hardware strap which sets an "external PHY detected" bit in the HW_CFG register. In some cases it is desirable to override this hardware strap and force use of either the internal phy or an external PHY. This patch adds SMSC911X_FORCE_INTERNAL_PHY and SMSC911X_FORCE_EXTERNAL_PHY flags so a platform can indicate this preference via its platform_data. Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-31Merge branch 'master' of ↵David S. Miller1-1/+1
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000/e1000_main.c
2009-01-30gro: Avoid copying headers of unmerged packetsHerbert Xu2-2/+26
Unfortunately simplicity isn't always the best. The fraginfo interface turned out to be suboptimal. The problem was quite obvious. For every packet, we have to copy the headers from the frags structure into skb->head, even though for 99% of the packets this part is immediately thrown away after the merge. LRO didn't have this problem because it directly read the headers from the frags structure. This patch attempts to address this by creating an interface that allows GRO to access the headers in the first frag without having to copy it. Because all drivers that use frags place the headers in the first frag this optimisation should be enough. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-30gro: Move common completion code into helpersHerbert Xu1-0/+3
Currently VLAN still has a bit of common code handling the aftermath of GRO that's shared with the common path. This patch moves them into shared helpers to reduce code duplication. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-28net: wrong test in inet_ehash_locks_alloc()Eric Dumazet1-1/+1
In commit 9db66bdcc83749affe61c61eb8ff3cf08f42afec (net: convert TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one occurrence of rwlock_t to spinlock_t I believe sizeof(raw_spinlock_t) might be > 0 on !CONFIG_SMP if CONFIG_DEBUG_SPINLOCK while sizeof(raw_rwlock_t) should be 0 in this case. Fortunatly, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and rwlock_t, but at this might change in the future (being able to debug spinlocks but not rwlocks for example), better to be safe. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-28net: Allow RX queue selection to seed TX queue hashing.David S. Miller1-0/+15
The idea is that drivers which implement multiqueue RX pre-seed the SKB by recording the RX queue selected by the hardware. If such a seed is found on TX, we'll use that to select the outgoing TX queue. This helps get more consistent load balancing on router and firewall loads. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-27Phonet: use per-namespace devices listremi.denis-courmont@nokia1-1/+1
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-27Phonet: handle rtnetlink registration failureremi.denis-courmont@nokia1-1/+1
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-27Phonet: allow phonet_device_init() to fail, put it to __init sectionremi.denis-courmont@nokia2-2/+2
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-27Merge branch 'master' of ↵David S. Miller180-15290/+80
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-01-27Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6Linus Torvalds2-64/+5
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: i2c: Warn on deprecated binding model use eeprom: More consistent symbol names eeprom: Move 93cx6 eeprom driver to /drivers/misc/eeprom spi: Move at25 (for SPI eeproms) to /drivers/misc/eeprom i2c: Move old eeprom driver to /drivers/misc/eeprom i2c: Move at24 to drivers/misc/eeprom i2c: Quilt tree has moved i2c: Delete many unused adapter IDs i2c: Delete 10 unused driver IDs
2009-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds4-9/+8
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (92 commits) gianfar: Revive VLAN support vlan: Export symbols as non GPL symbols. bnx2x: tx_has_work should not wait for FW netxen: reduce memory footprint netxen: fix vlan tso/checksum offload net: Fix linux/if_frad.h's suitability for userspace. net: Move config NET_NS to from net/Kconfig to init/Kconfig isdn: Fix missing ifdef in isdn_ppp networking: document "nc" in addition to "netcat" in netconsole.txt e1000e: workaround hw errata af_key: initialize xfrm encap_oa virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs lcs: fix compilation for !CONFIG_IP_MULTICAST rtl8187: Add termination packet to prevent stall iwlwifi: fix rs_get_rate WARN_ON() p54usb: fix packet loss with first generation devices sctp: Fix another socket race during accept/peeloff sctp: Properly timestamp outgoing data chunks for rtx purposes sctp: Correctly start rtx timer on new packet transmissions. sctp: Fix crc32c calculations on big-endian arhes. ...
2009-01-26net: Fix linux/if_frad.h's suitability for userspace.Krzysztof Hałasa1-6/+4
The userspace interfaces are protected by CONFIG_* ifdefs and that of course can't work. Reported by Jaswinder Singh Rajput. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26i2c: Warn on deprecated binding model useJean Delvare1-3/+5
Let the kernel developers know that i2c_attach_client() and i2c_detach_client() are deprecated and should no longer be used. Drivers using these should be converted to the standard device driver binding model (probe and remove methods.) Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Ben Dooks <ben-linux@fluff.org>
2009-01-26i2c: Delete many unused adapter IDsJean Delvare1-51/+0
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2009-01-26i2c: Delete 10 unused driver IDsJean Delvare1-10/+0
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2009-01-26Merge branch 'for_linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: ocfs2: Remove ocfs2_dquot_initialize() and ocfs2_dquot_drop() quota: Improve locking
2009-01-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds2-1/+8
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: klist.c: bit 0 in pointer can't be used as flag debugfs: introduce stub for debugfs_create_size_t() when DEBUG_FS=n sysfs: fix problems with binary files PNP: fix broken pnp lowercasing for acpi module aliases driver core: Convert '/' to '!' in dev_set_name()
2009-01-26Merge branch 'for-next' of ↵Linus Torvalds162-15197/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k,m68knommu: merge header files Resolve trivial conflict in arch/m68knommu/include/asm/Kbuild
2009-01-26Merge branch 'drm-fixes' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/i915: Fix cursor physical address choice to match the 2D driver. drm: stash AGP include under the do-we-have-AGP ifdef drm: don't whine about not reading EDID data drm/i915: hook up LVDS DPMS property drm/i915: remove unnecessary debug output in KMS init i915: fix freeing path for gem phys objects. drm: create mode_config idr lock drm: fix leak of device mappings since multi-master changes.
2009-01-26Merge branch 'for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI hotplug: fix lock imbalance in pciehp PCI PM: Restore standard config registers of all devices early PCI/MSI: bugfix/utilize for msi_capability_init()
2009-01-26Merge branch 'fixes' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: i.MX31: framebuffer driver i.MX31: Image Processing Unit DMA and IRQ drivers dmaengine: add async_tx_clear_ack() macro dmaengine: dma_issue_pending_all == nop when CONFIG_DMA_ENGINE=n dmaengine: kill some dubious WARN_ONCEs fsldma: print correct IRQ on mpc83xx fsldma: check for NO_IRQ in fsl_dma_chan_remove() dmatest: Use custom map/unmap for destination buffer fsldma: use a valid 'device' for dma_pool_create dmaengine: fix dependency chaining
2009-01-26Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: add and use INIT_WORK_ON_STACK rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR relay: fix lock imbalance in relay_late_setup_files oprofile: fix uninitialized use of struct op_entry rcu: move Kconfig menu softlock: fix false panic which can occur if softlockup_thresh is reduced rcu: add __cpuinit to rcu_init_percpu_data()
2009-01-26Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds3-11/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix inconsistent lock state on resume in hres_timers_resume time-sched.c: tick_nohz_update_jiffies should be static locking, hpet: annotate false positive warning kernel/fork.c: unused variable 'ret' itimers: remove the per-cpu-ish-ness
2009-01-26Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds4-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) xen: unitialised return value in xenbus_write_transaction x86: fix section mismatch warning x86: unmask CPUID levels on Intel CPUs, fix x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn. x86: use standard PIT frequency xen: handle highmem pages correctly when shrinking a domain x86, mm: fix pte_free() xen: actually release memory when shrinking domain x86: unmask CPUID levels on Intel CPUs x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> x86: fix PTE corruption issue while mapping RAM using /dev/mem x86: mtrr fix debug boot parameter x86: fix page attribute corruption with cpa() Revert "x86: signal: change type of paramter for sys_rt_sigreturn()" x86: use early clobbers in usercopy*.c x86: remove kernel_physical_mapping_init() from init section fix: crash: IP: __bitmap_intersects+0x48/0x73 cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write work_on_cpu: Use our own workqueue. work_on_cpu: don't try to get_online_cpus() in work_on_cpu. ...
2009-01-24com20020: Fix allyesconfig build failure.David S. Miller1-1/+1
Reported by Stephen Rothwell. Due to missing 'extern' in the com20020_netdev_ops declaration, each file that includes linux/com20020.h gets another copy defined in it's resulting object file. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23Merge branch 'fix/asoc' into for-linusTakashi Iwai1-1/+1
2009-01-23ASoC: Add missing comma to SND_SOC_DAPM_SWITCH_E in soc-dapm.hPeter Ujfalusi1-1/+1
Typo fix. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-01-23sctp: Fix crc32c calculations on big-endian arhes.Vlad Yasevich1-1/+1
crc32c algorithm provides a byteswaped result. On little-endian arches, the result ends up in big-endian/network byte order. On big-endinan arches, the result ends up in little-endian order and needs to be byte swapped again. Thus calling cpu_to_le32 gives the right output. Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: enable namespace support in ipv4 multicast routing codeBenjamin Thery1-1/+2
This last patch makes the appropriate changes to use and propagate the network namespace where needed in IPv4 multicast routing code. This consists mainly in replacing all the remaining init_net occurences with current netns pointer retrieved from sockets, net devices or mfc_caches depending on the routines' contexts. Some routines receive a new 'struct net' parameter to propagate the current netns: * vif_add/vif_delete * ipmr_new_tunnel * mroute_clean_tables * ipmr_cache_find * ipmr_cache_report * ipmr_cache_unresolved * ipmr_mfc_add/ipmr_mfc_delete * ipmr_get_route * rt_fill_info (in route.c) Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: declare reg_vif_num per-namespaceBenjamin Thery1-0/+3
Preliminary work to make IPv4 multicast routing netns-aware. Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4. At the moment, this variable is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespaceBenjamin Thery1-0/+2
Preliminary work to make IPv4 multicast routing netns-aware. Declare IPv multicast routing variables 'mroute_do_assert' and 'mroute_do_pim' per-namespace in struct netns_ipv4. At the moment, these variables are only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: declare counter cache_resolve_queue_len per-namespaceBenjamin Thery1-0/+1
Preliminary work to make IPv4 multicast routing netns-aware. Declare variable cache_resolve_queue_len per-namespace: move it into struct netns_ipv4. This variable counts the number of unresolved cache entries queued in the list mfc_unres_queue. This list is kept global to all netns as the number of entries per namespace is limited to 10 (hardcoded in routine ipmr_cache_unresolved). Entries belonging to different namespaces in mfc_unres_queue will be identified by matching the mfc_net member introduced previously in struct mfc_cache. Keeping this list global to all netns, also allows us to keep a single timer (ipmr_expire_timer) to handle their expiration. In some places cache_resolve_queue_len value was tested for arming or deleting the timer. These tests were equivalent to testing mfc_unres_queue value instead and are replaced in this patch. At the moment, cache_resolve_queue_len is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: dynamically allocate mfc_cache_arrayBenjamin Thery1-0/+1
Preliminary work to make IPv4 multicast routing netns-aware. Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array, and move it to struct netns_ipv4. At the moment, mfc_cache_array is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: store netns in struct mfc_cacheBenjamin Thery1-0/+15
This patch stores into struct mfc_cache the network namespace each mfc_cache belongs to. The new member is mfc_net. mfc_net is assigned at cache allocation and doesn't change during the rest of the cache entry life. A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres. This will help to retrieve the current netns around the IPv4 multicast routing code. At the moment, all mfc_cache are allocated in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: dynamically allocate vif_tableBenjamin Thery1-0/+2
Preliminary work to make IPv6 multicast routing netns-aware. Dynamically allocate interface table vif_table and move it to struct netns_ipv4, and update MIF_EXISTS() macro. At the moment, vif_table is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-23netns: ipmr: allocate mroute_socket per-namespace.Benjamin Thery1-0/+4
Preliminary work to make IPv4 multicast routing netns-aware. Make IPv4 multicast routing mroute_socket per-namespace, moves it into struct netns_ipv4. At the moment, mroute_socket is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22Merge branch 'core/debugobjects' into core/urgentThomas Gleixner44-93/+191
2009-01-22debugobjects: add and use INIT_WORK_ON_STACKThomas Gleixner1-0/+6
Impact: Fix debugobjects warning debugobject enabled kernels spit out a warning in hpet code due to a workqueue which is initialized on stack. Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it in hpet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-01-22drm: create mode_config idr lockJesse Barnes1-1/+2
Create a separate mode_config IDR lock for simplicity. The core DRM config structures (connector, mode, etc. lists) are still protected by the mode_config mutex, but the CRTC IDR (used for the various identifier IDs) is now protected by the mode_config idr_mutex. Simplifies the locking a bit and removes a warning. All objects are protected by the config mutex, we may in the future, split the object further to have reference counts. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-01-22net: ppp_generic - introduce net-namespace functionality v2Cyrill Gorcunov1-0/+4
- Each namespace contains ppp channels and units separately with appropriate locks Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22virtio_net: add link status handlingMark McLoughlin1-0/+5
Allow the host to inform us that the link is down by adding a VIRTIO_NET_F_STATUS which indicates that device status is available in virtio_net config. This is currently useful for simulating link down conditions (e.g. using proposed qemu 'set_link' monitor command) but would also be needed if we were to support device assignment via virtio. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added future masking) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22inet: Allowing more than 64k connections and heavily optimize bind(0) time.Evgeniy Polyakov1-1/+2
With simple extension to the binding mechanism, which allows to bind more than 64k sockets (or smaller amount, depending on sysctl parameters), we have to traverse the whole bind hash table to find out empty bucket. And while it is not a problem for example for 32k connections, bind() completion time grows exponentially (since after each successful binding we have to traverse one bucket more to find empty one) even if we start each time from random offset inside the hash table. So, when hash table is full, and we want to add another socket, we have to traverse the whole table no matter what, so effectivelly this will be the worst case performance and it will be constant. Attached picture shows bind() time depending on number of already bound sockets. Green area corresponds to the usual binding to zero port process, which turns on kernel port selection as described above. Red area is the bind process, when number of reuse-bound sockets is not limited by 64k (or sysctl parameters). The same exponential growth (hidden by the green area) before number of ports reaches sysctl limit. At this time bind hash table has exactly one reuse-enbaled socket in a bucket, but it is possible that they have different addresses. Actually kernel selects the first port to try randomly, so at the beginning bind will take roughly constant time, but with time number of port to check after random start will increase. And that will have exponential growth, but because of above random selection, not every next port selection will necessary take longer time than previous. So we have to consider the area below in the graph (if you could zoom it, you could find, that there are many different times placed there), so area can hide another. Blue area corresponds to the port selection optimization. This is rather simple design approach: hashtable now maintains (unprecise and racely updated) number of currently bound sockets, and when number of such sockets becomes greater than predefined value (I use maximum port range defined by sysctls), we stop traversing the whole bind hash table and just stop at first matching bucket after random start. Above limit roughly corresponds to the case, when bind hash table is full and we turned on mechanism of allowing to bind more reuse-enabled sockets, so it does not change behaviour of other sockets. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Tested-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22dccp: Initialisation and type-checking of feature sysctlsGerrit Renker1-8/+0
This patch takes care of initialising and type-checking sysctls related to feature negotiation. Type checking is important since some of the sysctls now directly impact the feature-negotiation process. The sysctls are initialised with the known default values for each feature. For the type-checking the value constraints from RFC 4340 are used: * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes), tested and confirmed that it works up to 4294967295 - for Gbps speed; * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer); * CCIDs are between 0 .. 255; * request_retries, retries1, retries2 also between 0..255 for good measure; * tx_qlen is checked to be non-negative; * sync_ratelimit remains as before. Notes: ------ 1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c. 2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in other places, sometimes with exactly the same kind of definitions (e.g. "static int zero;"). It may be a good idea (kernel janitors?) to consolidate type checking. For the sake of keeping the changeset small and in order not to affect other subsystems, I have not strived to generalise here. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22dccp: Implement both feature-local and feature-remote Sequence Window featureGerrit Renker1-20/+4
This adds full support for local/remote Sequence Window feature, from which the * sequence-number-validity (W) and * acknowledgment-number-validity (W') windows derive as specified in RFC 4340, 7.5.3. Specifically, the following is contained in this patch: * integrated new socket fields into dccp_sk; * updated the update_gsr/gss routines with regard to these fields; * updated handler code: the Sequence Window feature is located at the TX side, so the local feature is meant if the handler-rx flag is false; * the initialisation of `rcv_wnd' in reqsk is removed, since - rcv_wnd is not used by the code anywhere; - sequence number checks are not done in the LISTEN state (cf. 7.5.3); - dccp_check_req checks the Ack number validity more rigorously; * the `struct dccp_minisock' became empty and is now removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>