Hi,
I've got a system that is acting as an OpenStack controller. As such, it's exporting LVM block devices over iSCSI using LIO, managed with the targetcli/targetctl tools. The system being tested is running CentOS 7.
We have an active/standby configuration, and when we want to switch activity we need to shut down the iSCSI targets, move activity to the other node, and then bring them back up again. To shut down the iSCSI targets we save the configuration with "targetctl save" and then run "targetctl clear". We then go on to try to deactivate the LVM volumes.
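Roughly, the shutdown half of the switchover boils down to something like this (a simplified sketch; the "cinder-volumes" VG name is made up for illustration):

    # save the running LIO configuration (defaults to /etc/target/saveconfig.json)
    targetctl save

    # tear down all targets, LUNs and backstores
    targetctl clear

    # then try to deactivate the LVM volumes -- this is where things go wrong
    vgchange -an cinder-volumes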
The problem we're running into is that if there are any actively-in-use targets (the backing store for a running boot-from-volume OpenStack guest, for example), after we run "targetctl clear" any LVM-related command hangs. As an example, I ran "vgs" and it hung with the following stack trace:
[<ffffffff81081ae5>] flush_work+0x105/0x1d0
[<ffffffff81081c39>] __cancel_work_timer+0x89/0x120
[<ffffffff81081d03>] cancel_delayed_work_sync+0x13/0x20
[<ffffffff812dba60>] disk_block_events+0x80/0x90
[<ffffffff811dee0e>] __blkdev_get+0x6e/0x4d0
[<ffffffff811df445>] blkdev_get+0x1d5/0x360
[<ffffffff811df67b>] blkdev_open+0x5b/0x80
[<ffffffff811a1cc7>] do_dentry_open+0x1a7/0x2e0
[<ffffffff811a1ef9>] vfs_open+0x39/0x70
[<ffffffff811b131d>] do_last+0x1ed/0x1270
[<ffffffff811b4082>] path_openat+0xc2/0x490
[<ffffffff811b584b>] do_filp_open+0x4b/0xb0
[<ffffffff811a33c3>] do_sys_open+0xf3/0x1f0
[<ffffffff811a34de>] SyS_open+0x1e/0x20
[<ffffffff81681249>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Other times it looks like this:

[<ffffffff811e19a3>] do_blockdev_direct_IO+0xbc3/0x2560
[<ffffffff811e3395>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811ddc77>] blkdev_direct_IO+0x57/0x60
[<ffffffff81134913>] generic_file_aio_read+0x6d3/0x750
[<ffffffff811de0ec>] blkdev_aio_read+0x4c/0x70
[<ffffffff811a38bd>] do_sync_read+0x8d/0xd0
[<ffffffff811a401c>] vfs_read+0x9c/0x170
[<ffffffff811a4b6f>] SyS_read+0x7f/0xe0
[<ffffffff81681249>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Or this:

[<ffffffff812dba11>] disk_block_events+0x31/0x90
[<ffffffff811dee0e>] __blkdev_get+0x6e/0x4d0
[<ffffffff811df445>] blkdev_get+0x1d5/0x360
[<ffffffff811df67b>] blkdev_open+0x5b/0x80
[<ffffffff811a1cc7>] do_dentry_open+0x1a7/0x2e0
[<ffffffff811a1ef9>] vfs_open+0x39/0x70
[<ffffffff811b131d>] do_last+0x1ed/0x1270
[<ffffffff811b4082>] path_openat+0xc2/0x490
[<ffffffff811b584b>] do_filp_open+0x4b/0xb0
[<ffffffff811a33c3>] do_sys_open+0xf3/0x1f0
[<ffffffff811a34de>] SyS_open+0x1e/0x20
[<ffffffff81681249>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
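(In case it matters, the traces above were grabbed from /proc, roughly like so:)

    # dump the kernel stack of the hung LVM command
    pid=$(pidof vgs)
    cat /proc/$pid/stack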
Eventually we seem to get a kernel log like this:

[23448.266758] session41: session recovery timed out after 900 secs

followed by a number of logs like this:

[23448.266772] sd 48:0:0:0: rejecting I/O to offline device
[23448.272701] sd 48:0:0:0: rejecting I/O to offline device
[23448.278630] sd 48:0:0:0: rejecting I/O to offline device
[23448.284554] sd 48:0:0:0: rejecting I/O to offline device
[23448.290481] sd 48:0:0:0: rejecting I/O to offline device
And the hung processes stay stuck.
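As far as I can tell, the 900 secs corresponds to the initiator-side iSCSI session recovery timeout (node.session.timeo.replacement_timeout in /etc/iscsi/iscsid.conf, which defaults to 120, so it looks like something in our setup raised it). The per-session value can be checked with something like:

    # show session details, including the recovery timeout
    iscsiadm -m session -P 3 | grep -i 'recovery timeout'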
Are we doing something wrong here by just calling "targetctl clear" while there are actively-in-use targets? Or is this a bug in the CentOS kernel?
Thanks, Chris
On 06/14/2016 11:07 AM, Chris Friesen wrote:
> I've got a system that is acting as an OpenStack controller. As such, it's exporting LVM block devices over iSCSI using LIO, managed with the targetcli/targetctl tools. The system being tested is running CentOS 7.
> We have an active/standby configuration, and when we want to switch activity we need to shut down the iSCSI targets, move activity to the other node, and then bring them back up again. To shut down the iSCSI targets we save the configuration with "targetctl save" and then run "targetctl clear". We then go on to try to deactivate the LVM volumes.
Have you tried deactivating the LVM volumes first?
> The problem we're running into is that if there are any actively-in-use targets (the backing store for a running boot-from-volume OpenStack guest, for example), after we run "targetctl clear" any LVM-related command hangs. As an example, I ran "vgs" and it hung with the following stack trace:
Unless I'm missing something, it makes sense that once you clear the target configuration, the initiator can no longer issue commands to the LUNs (and LVM commands may read from the PVs, etc.). The most logical approach is therefore to deactivate/quiesce the use of the LUNs before you make them inaccessible.
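Roughly (just a sketch, with made-up IQN/portal/VG names), the order would be:

    # on any initiator nodes: log out so nothing is still using the LUNs
    iscsiadm -m node -T iqn.2016-06.example:volumes -p 192.168.1.10 -u

    # on the target node: now tear the targets down
    targetctl save
    targetctl clear

    # the LVs can then be deactivated without anything hanging
    vgchange -an cinder-volumes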
-- Andy