Re: plans for user-backed device support
by Andy Grover
On 02/04/2015 12:21 PM, Alex Elsayed wrote:
>> Maybe we step back and define a DBus interface
>
> I'd be quite happy with that - I considered suggesting it, in fact, but
> wasn't sure of the prevailing opinion re: dbus around here.
>
> What would you do re: discovery, though?
>
> (Explaining some DBus things, which readers may already be aware of)
>
> In DBus, there's a two-level hierarchy of busnames holding objects.
> Busnames are either the inherent, connection-level one of the :\d+\.\d+ form
> or the human-readable well-known name form (reverse DNS). Objects, then,
> implement interfaces.
>
> However, well-known names can only have a single owner - so discovering
> which busnames have objects which implement an interface is non-trivial.
>
> The approach taken by KDE is to suffix the well-known name with the PID
> (org.kde.StatusNotifierItem-2055 or whatever), call ListNames, and filter in
> the client. This has the drawback of making DBus activation impossible.
>
> Another approach is for every implementor to try to claim the well-known
> name, and on failure contact the existing owner to republish their objects
> (possibly under a namespaced object path). This has the drawback of
> complicating the implementation somewhat, as well as making bus activation
> only able to activate a single 'default' implementation.
>
> A third approach would be to explicitly define a multiplexor, which backends
> ask to republish their objects. This simplifies implementations, and it
> could also provide its own API that requests a backend by name, and ensures
> that backend's object is available. This could be driven by something as
> simple as a key-value mapping from backend name to a well-known DBus name
> specific to that backend, which the multiplexor calls to trigger service
> activation.
>
> Thoughts?
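(As an aside, the client-side filtering that the PID-suffix approach forces
on consumers could be sketched like this; the names are illustrative, and the
actual org.freedesktop.DBus.ListNames call is elided:)

```python
import re

# Hypothetical ListNames() result: unique connection names (":N.M") plus
# well-known names, some PID-suffixed in the KDE StatusNotifierItem style.
bus_names = [
    ":1.7",
    ":1.42",
    "org.freedesktop.DBus",
    "org.kde.StatusNotifierItem-2055",
    "org.kde.StatusNotifierItem-3190",
]

def find_implementors(names, base):
    """Filter a ListNames() result for '<base>-<pid>' well-known names.

    This is the per-client work the PID-suffix scheme requires, and note
    that nothing here can trigger D-Bus activation of a missing backend --
    which is exactly the drawback described above.
    """
    pattern = re.compile(re.escape(base) + r"-\d+$")
    return [n for n in names if pattern.match(n)]

implementors = find_implementors(bus_names, "org.kde.StatusNotifierItem")
```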
It really seems to come down to: will multiple independent user-handler
daemons be needed? Because I'm trying really hard to make tcmu-runner
good enough so that the answer is no :-)
tcmu-runner supports multiple handler modules, so it's extensible. It is
permissively licensed, so there are no issues with non-FOSS handlers needing
their own daemon. It could also be replaced entirely (either with a modified
version of itself or from scratch) without giving up the single-busname,
service-activation approach.
So that would be my current preference. (The fact that the kernel API
doesn't preclude multiple handler daemons does not mean we need to
*support* those right away, or ever.)
If there are likely use cases that tcmu-runner is unsuitable for solving by
itself, then of course that would change things, so please let's talk
about them!
-- Andy
LIO crashing Fedora box, multiple versions and kernels tested
by Dan Lane
I have now built out several servers in an attempt to use LIO for my
lab, but all recent attempts have met failure. I used a very similar
setup in the past with a different lab so I know this all works, but
that build used Ubuntu and an older kernel (Ubuntu 11.04). Here is a
brief description of my test environment:
Storage servers: IBM bladecenter using HS21 (8853) blades
Disk backend: 10k SAS JBOD, RAID 1, 5 and 6 attempted
Disk controllers: On-board LSI and Serveraid 8k (adaptec) tested
Storage fabric: QLA2462 equivalent (IBM branded) 4Gb FC cards (PCI-X based)
OS: Fedora 19, 20 and 21 using multiple kernels, the latest being kernel 3.19.3
I'm able to get things up and running, but as soon as the ESXi hosts
start using the shared storage, the server blade crashes (it actually
seems to lock up). I've also tried multiple versions of ESXi, just in
case. To ensure the problem wasn't with the server I ran multiple
loops of bonnie++, used Linux "stress", and ran memtest+ on the server
for over 24 hours, with no failures. I found reports of similar
problems in the past and it was suggested that the problem was with
the performance of the backend storage. As unlikely as I expect this
to be with my storage, here are the results of bonnie++ on my RAID 6
array, hopefully one of you are better at interpreting them than I am.
[root@labsan1 bonnie]# bonnie++ -d /bonnie -r 4096 -s 32G -n 0 -m TEST
-f -b -n 128 -u root
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
TEST 32G 336366 47 80745 12 352922 28 542.4 8
Latency 348ms 297ms 42853us 224ms
Version 1.96 ------Sequential Create------ --------Random Create--------
TEST -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128 1786 9 +++++ +++ 1033 5 1769 9 +++++ +++ 984 5
Latency 286ms 478us 448ms 282ms 32us 597ms
I've seen quite a few different errors; the following are some of the
most common. Please let me know what I should do to collect the best
possible logs for troubleshooting the problem.
--------------------------------SNIP--------------------------------
Message from syslogd@labsan1 at Feb 5 23:49:01 ...
kernel:[17184.127000] NMI watchdog: BUG: soft lockup - CPU#3 stuck
for 22s! [systemd-udevd:1602]
--------------------------------/SNIP--------------------------------
--------------------------------SNIP--------------------------------
Apr 3 00:24:34 labsan1 kernel: rport-5:0-9: blocked FC remote port
time out: removing rport
Apr 3 00:24:34 labsan1 kernel: rport-4:0-9: blocked FC remote port
time out: removing rport
Apr 3 00:24:34 labsan1 kernel: rport-6:0-9: blocked FC remote port
time out: removing rport
Apr 3 00:24:34 labsan1 kernel: rport-3:0-9: blocked FC remote port
time out: removing rport
Apr 3 00:26:40 labsan1 kernel: [41095.711611] MODE SENSE:
unimplemented page/subpage: 0x1c/0x02
Apr 3 00:26:40 labsan1 kernel: MODE SENSE: unimplemented
page/subpage: 0x1c/0x02
Apr 3 00:34:03 labsan1 kernel: [41538.771168] Detected MISCOMPARE for
addr: ffff8800caeb7000 buf: ffff8800c9896e00
Apr 3 00:34:03 labsan1 kernel: [41538.771173] Target/iblock: Send
MISCOMPARE check condition and sense
Apr 3 00:34:03 labsan1 kernel: Detected MISCOMPARE for addr:
ffff8800caeb7000 buf: ffff8800c9896e00
Apr 3 00:34:03 labsan1 kernel: Target/iblock: Send MISCOMPARE check
condition and sense
Apr 3 00:34:48 labsan1 kernel: [41584.159441] ABORT_TASK: Found
referenced qla2xxx task_tag: 1170576
Apr 3 00:34:48 labsan1 kernel: [41584.159446] ABORT_TASK: ref_tag:
1170576 already complete, skipping
Apr 3 00:34:48 labsan1 kernel: [41584.159448] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1170576
Apr 3 00:34:48 labsan1 kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1170576
Apr 3 00:34:48 labsan1 kernel: ABORT_TASK: ref_tag: 1170576 already
complete, skipping
Apr 3 00:34:48 labsan1 kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1170576
Apr 3 00:34:49 labsan1 kernel: [41585.156976] ABORT_TASK: Found
referenced qla2xxx task_tag: 1171236
Apr 3 00:34:49 labsan1 kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1171236
Apr 3 00:34:50 labsan1 kernel: [41586.351683] ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1171236
Apr 3 00:34:50 labsan1 kernel: [41586.351691] ABORT_TASK: Found
referenced qla2xxx task_tag: 1172644
Apr 3 00:34:50 labsan1 kernel: ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1171236
Apr 3 00:34:50 labsan1 kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1172644
Apr 3 00:34:50 labsan1 kernel: [41586.472423] ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1172644
Apr 3 00:34:50 labsan1 kernel: [41586.472432] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1147388
Apr 3 00:34:50 labsan1 kernel: [41586.472438] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1171368
Apr 3 00:34:50 labsan1 kernel: [41586.472441] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1171456
Apr 3 00:34:50 labsan1 kernel: [41586.472445] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1171500
--------------------------------/SNIP--------------------------------
--------------------------------SNIP--------------------------------
Apr 3 02:04:40 labsan1 kernel: [ 2282.079013] rport-6:0-5: blocked
FC remote port time out: no longer a FCP target, removing starget
Apr 3 02:04:40 labsan1 kernel: rport-6:0-5: blocked FC remote port
time out: no longer a FCP target, removing starget
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] ------------[ cut here
]------------
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] WARNING: CPU: 3 PID: 0
at kernel/watchdog.c:317 watchdog_overflow_callback+0x92/0xc0()
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] Watchdog detected hard
LOCKUP on cpu 3
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] Modules linked in:
tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file
target_core_iblock iscsi_target_mod target_core_mod ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security i
p6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip
table_security iptable_raw coretemp kvm_intel kvm iTCO_wdt
iTCO_vendor_support gpio_ich ipmi_ssif ipmi_devintf lpc_ich mfd_core
i5000
_edac serio_raw edac_core ses ioatdma enclosure i5k_amb shpchp dca
ipmi_si acpi_cpufreq ipmi_msghandler nfsd auth_rpcgss nfs_acl lock
d grace sunrpc radeon i2c_algo_bit drm_kms_helper ttm drm qla2xxx bnx2
ata_generic pata_acpi scsi_transport_fc aacraid
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] CPU: 3 PID: 0 Comm:
swapper/3 Not tainted 3.19.3-200.fc21.x86_64 #1
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] Hardware name: IBM IBM
eServer BladeCenter HS21 -[8853L6U]-/Server Blade, BIOS -[BCE14
8BUS-1.21]- 04/04/2011
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] 0000000000000000
ef4cea677ede47e6 ffff88012fd85a60 ffffffff8176e215
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] 0000000000000000
ffff88012fd85ab8 ffff88012fd85aa0 ffffffff8109bc1a
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] 0000000000000000
ffff88012a510000 0000000000000000 ffff88012fd85c00
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] Call Trace:
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] <NMI>
[<ffffffff8176e215>] dump_stack+0x45/0x57
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8109bc1a>]
warn_slowpath_common+0x8a/0xc0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8109bca5>]
warn_slowpath_fmt+0x55/0x70
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81150ad2>]
watchdog_overflow_callback+0x92/0xc0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8119191b>]
__perf_event_overflow+0x9b/0x250
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81192434>]
perf_event_overflow+0x14/0x20
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8103460a>]
intel_pmu_handle_irq+0x1da/0x3f0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8102bafb>]
perf_event_nmi_handler+0x2b/0x50
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81018fd8>]
nmi_handle+0x88/0x130
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81019562>]
default_do_nmi+0x42/0x110
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810196b8>]
do_nmi+0x88/0xd0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81776d21>]
end_repeat_nmi+0x1e/0x2e
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8177448a>] ?
_raw_spin_lock_irqsave+0x4a/0x60
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8177448a>] ?
_raw_spin_lock_irqsave+0x4a/0x60
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8177448a>] ?
_raw_spin_lock_irqsave+0x4a/0x60
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] <<EOE>> <IRQ>
[<ffffffffa00eade2>] qlt_fc_port_deleted+0x62/0xd0 [qla2xxx]
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffffa008cc13>]
qla2x00_mark_device_lost+0x153/0x2e0 [qla2xxx]
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffffa00ac8a9>]
qla2x00_async_event+0xe39/0x1890 [qla2xxx]
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810cb828>] ?
sched_clock_cpu+0x88/0xb0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810c04ca>] ?
update_rq_clock.part.78+0x1a/0xe0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810cf9c3>] ?
update_blocked_averages+0x2f3/0x7a0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffffa00ade41>]
qla24xx_intr_handler+0x1a1/0x2f0 [qla2xxx]
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810f2a47>]
handle_irq_event_percpu+0x77/0x1a0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810f2bab>]
handle_irq_event+0x3b/0x60
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8101e8ca>] ?
native_sched_clock+0x2a/0x90
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810f5a99>]
handle_fasteoi_irq+0x79/0x120
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff81017414>]
handle_irq+0x74/0x140
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810bb54a>] ?
atomic_notifier_call_chain+0x1a/0x20
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8177777f>]
do_IRQ+0x4f/0xf0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8177556d>]
common_interrupt+0x6d/0x6d
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] <EOI>
[<ffffffff81103c98>] ? hrtimer_start+0x18/0x20
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8105ea56>] ?
native_safe_halt+0x6/0x10
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810fabb3>] ?
rcu_eqs_enter+0xa3/0xb0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8101f97e>]
default_idle+0x1e/0xc0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8102034f>]
arch_cpu_idle+0xf/0x20
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff810de15a>]
cpu_startup_entry+0x37a/0x3c0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] [<ffffffff8104af3a>]
start_secondary+0x1aa/0x1f0
Apr 3 02:05:10 labsan1 kernel: [ 2252.121715] ---[ end trace
05113eaf8d7e2e9d ]---
Apr 3 02:05:10 labsan1 kernel: [ 2312.441004] INFO: rcu_sched
detected stalls on CPUs/tasks: { 3} (detected by 0, t=60003 jiffies,
g=3933, c=3932, q=0)
Apr 3 02:05:10 labsan1 kernel: [ 2312.441207] Task dump for CPU 3:
Apr 3 02:05:10 labsan1 kernel: [ 2312.441211] swapper/3 R
running task 0 0 1 0x00000008
Apr 3 02:05:10 labsan1 kernel: [ 2312.441215] 0000000000000000
0000000000000000 ffffffffffffff1e ffffffff8105ea56
Apr 3 02:05:10 labsan1 kernel: [ 2312.441219] 0000000000000010
0000000000000246 ffff88012abf7e88 0000000000000018
Apr 3 02:05:10 labsan1 kernel: [ 2312.441222] ffffffff810fabb3
ffff88012abf7ea8 ffffffff8101f97e ffffffff81d2a6c0
Apr 3 02:05:10 labsan1 kernel: [ 2312.441226] Call Trace:
Apr 3 02:05:10 labsan1 kernel: [ 2312.441237] [<ffffffff8105ea56>] ?
native_safe_halt+0x6/0x10
Apr 3 02:05:10 labsan1 kernel: [ 2312.441242] [<ffffffff810fabb3>] ?
rcu_eqs_enter+0xa3/0xb0
Apr 3 02:05:10 labsan1 kernel: [ 2312.441247] [<ffffffff8101f97e>] ?
default_idle+0x1e/0xc0
Apr 3 02:05:10 labsan1 kernel: [ 2312.441251] [<ffffffff8102034f>] ?
arch_cpu_idle+0xf/0x20
Apr 3 02:05:10 labsan1 kernel: [ 2312.441254] [<ffffffff810de15a>] ?
cpu_startup_entry+0x37a/0x3c0
Apr 3 02:05:10 labsan1 kernel: [ 2312.441259] [<ffffffff8104af3a>] ?
start_secondary+0x1aa/0x1f0
Apr 3 02:05:10 labsan1 kernel: ------------[ cut here ]------------
Apr 3 02:05:10 labsan1 kernel: WARNING: CPU: 3 PID: 0 at
kernel/watchdog.c:317 watchdog_overflow_callback+0x92/0xc0()
Apr 3 02:05:10 labsan1 kernel: Watchdog detected hard LOCKUP on cpu 3
Apr 3 02:05:10 labsan1 kernel: Modules linked in:
Apr 3 02:05:16 labsan1 kernel: [ 2318.111331] ABORT_TASK: Found
referenced qla2xxx task_tag: 1192752
Apr 3 02:05:30 labsan1 kernel: [ 2332.303075] ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1192752
Apr 3 02:05:30 labsan1 kernel: [ 2332.303084] ABORT_TASK: Found
referenced qla2xxx task_tag: 1199704
--------------------------------/SNIP--------------------------------
Thanks,
Dan
New releases: targetcli.fb40 and rtslib.fb53
by Andy Grover
Hi all,
These releases include some small usability tweaks: the targetcli man page
now makes it a little clearer that the user will likely have to enable a
service to get settings restored on reboot, and the targetctl script in
rtslib-fb no longer returns a failure exit code when only recoverable errors
were encountered. In some cases, setting an attribute returned an error and
led to the service being marked as failed by systemd, which is stricter
than we want.
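(The pattern here, sketched hypothetically rather than quoting the actual
rtslib-fb code, is to collect recoverable failures as warnings during restore
and reserve a nonzero exit code for fatal ones; the function and error names
below are illustrative:)

```python
# Hedged sketch of the restore-error policy described above; not the
# real rtslib-fb API, just the shape of the fix.
def restore_config(items, apply_attr):
    """Apply each (name, value) attribute; collect recoverable failures
    instead of aborting, report them, and still exit successfully."""
    errors = []
    for name, value in items:
        try:
            apply_attr(name, value)
        except ValueError as e:  # treat a bad attribute as recoverable
            errors.append("%s: %s" % (name, e))
    # Return 0 even with recoverable errors, so systemd does not mark
    # the service failed; just surface them as warnings.
    for msg in errors:
        print("warning:", msg)
    return 0

# Usage: a setter that rejects one (made-up) unknown attribute.
def setter(name, value):
    if name == "fabric_max_sectors":
        raise ValueError("cannot find attribute")

rc = restore_config([("emulate_tpu", 1), ("fabric_max_sectors", 8192)], setter)
```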
Github:
https://github.com/agrover/targetcli-fb
https://github.com/agrover/rtslib-fb
https://github.com/agrover/configshell-fb
tarballs:
https://fedorahosted.org/released/targetcli-fb/
targetcli-fb 2.1.fb40:
Andy Grover <agrover(a)redhat.com> (4):
self.cfs_cwd is unused
Don't need to set ui_so.name = so.name in refresh()
Add enabling service to man page quickstart section
update to 2.1.fb40
rtslib-fb 2.1.fb53:
Andy Grover <agrover(a)redhat.com> (4):
Add properties to RTSRoot that return new Group objects
Remove trailing whitespace
Do not fail if there were recoverable errors on restore
update version to 2.1.fb53
Regards -- Andy
Re: target refuses to start due to sector size after update
by Andy Grover
On 04/03/2015 03:28 PM, Dan Lane wrote:
> That took care of half the problem, but it's still failing because of:
>
> block/data: Cannot find attribute: fabric_max_sectors, skipped
>
> I tried adding it but either I'm putting it in the wrong location or
> my syntax is wrong, please advise.
Hmm? I think this means saveconfig.json contains fabric_max_sectors, but
it doesn't exist as an attribute for LIO. So I'd think you'd want to
remove it from saveconfig.json, not add it.
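In case it helps, here's a rough sketch of stripping the attribute from
saveconfig.json with Python. I haven't checked your exact file layout, so
this just deletes the key wherever it appears under an "attributes" dict:

```python
import json

def strip_attribute(node, attr):
    """Recursively delete `attr` from any 'attributes' dict in the
    parsed saveconfig.json tree (layout assumed, not verified)."""
    if isinstance(node, dict):
        attrs = node.get("attributes")
        if isinstance(attrs, dict):
            attrs.pop(attr, None)
        for value in node.values():
            strip_attribute(value, attr)
    elif isinstance(node, list):
        for item in node:
            strip_attribute(item, attr)

# Example against a made-up fragment shaped like a storage object entry:
config = {"storage_objects": [{"name": "data",
                               "attributes": {"fabric_max_sectors": 8192,
                                              "emulate_tpu": 0}}]}
strip_attribute(config, "fabric_max_sectors")
print(json.dumps(config))
```

(Against the real file you'd json.load() it, run the strip, and json.dump()
it back, keeping a copy of the original first.)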
BTW, since this is a -fb-specific issue, we can spare everyone else and
use the targetcli-fb list (CC'd) if further discussion is needed.
-- Andy