Resolves: https://issues.redhat.com/browse/RHEL-7028
Currently, NFS dumping fails on some machines that have a dedicated PHY driver (dealing with the physical layer) or MDIO bus driver (connecting the MAC to PHY devices). This is because kexec-tools doesn't explicitly install a dedicated PHY or MDIO driver, and the NIC driver doesn't specify the dependency on the needed PHY or MDIO driver. So when the dependency on a PHY driver or MDIO driver is not found by dracut's instmods, the PHY or MDIO driver won't be installed.
This patch passes =drivers/net/phy and =drivers/net/mdio to dracut's instmods which will only install in-use PHY or MDIO driver(s).
Note that ideally we should find out which PHY driver is used by a NIC, but unfortunately no universal way to do so has been found (/sys/class/net/NIC_NAME/device/driver/module can be used to find the name of the PHY driver for some NICs, but it doesn't exist for others such as the Qualcomm Atheros AR8031). The same holds for an MDIO bus driver. Fortunately, no PHY or MDIO driver has been found to consume a large amount of memory so far.
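For illustration, the sysfs lookup mentioned above could be wrapped as follows. This is a minimal sketch and not code from kexec-tools: the function name nic_driver_module and the SYSFS_ROOT override are hypothetical, added only so the lookup can be tried against a scratch directory. It shows why a caller has to handle the case where the link is missing.

```shell
#!/bin/bash
# Hypothetical helper (not part of kexec-tools): print the name of the kernel
# module behind /sys/class/net/<nic>/device/driver/module, or return 1 when
# sysfs does not expose that link for the given NIC.
# SYSFS_ROOT is an assumption that lets the function run on a scratch tree.
nic_driver_module() {
    local _netif=$1 _sysfs=${SYSFS_ROOT:-/sys}
    local _link=$_sysfs/class/net/$_netif/device/driver/module

    # The link is absent for some NICs, so failure must be reported.
    [[ -e $_link ]] || return 1
    basename "$(readlink -f "$_link")"
}
```

A caller would typically fall back to some broader selection when the function returns non-zero, which is the situation the patch addresses by passing whole directories to instmods.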
Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Doreen Alongi <dalongi@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
---
 dracut-module-setup.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index ff53d084..905e6fbd 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -381,7 +381,7 @@ _get_hpyerv_physical_driver() {
 kdump_install_nic_driver() {
     local _netif _driver _drivers

-    _drivers=()
+    _drivers=('=drivers/net/phy' '=drivers/net/mdio')

     for _netif in $1; do
         [[ $_netif == lo ]] && continue
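To make the intent of the two '=dir' arguments concrete, here is a rough sketch of the selection the commit message describes: of all modules under the given directories, only those currently in use are picked. This is an illustration only and not how dracut's instmods is implemented; the function name select_in_use_modules and the module paths in the usage example are made-up sample data.

```shell
#!/bin/bash
# Illustration only (not dracut code). Reads "name path" pairs on stdin and
# prints the names of modules that are both in the space-separated list of
# loaded modules ($1) and located under one of the directories in $2...
select_in_use_modules() {
    local _loaded=" $1 " _name _path _dir
    shift
    while read -r _name _path; do
        # Skip modules that are not currently loaded.
        [[ $_loaded == *" $_name "* ]] || continue
        for _dir in "$@"; do
            if [[ $_path == *"/kernel/$_dir/"* ]]; then
                echo "$_name"
                break
            fi
        done
    done
}

# Sample data: realtek and r8169 are loaded, marvell is not.
select_in_use_modules "r8169 realtek" drivers/net/phy drivers/net/mdio <<'EOF'
realtek /lib/modules/6.5.0/kernel/drivers/net/phy/realtek.ko.xz
marvell /lib/modules/6.5.0/kernel/drivers/net/phy/marvell.ko.xz
r8169 /lib/modules/6.5.0/kernel/drivers/net/ethernet/realtek/r8169.ko.xz
EOF
# prints: realtek
```

With this kind of filtering, an unused PHY driver such as marvell in the sample data stays out of the kdump initramfs, which is what keeps the memory cost of the broader directory match low.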
Currently, network dumping fails over a NIC that is a Single Root I/O Virtualization (SR-IOV) virtual device. Usually the driver of the virtual device doesn't specify a dependency on the driver of the physical device. So to fix this issue, the driver of the physical device needs to be found and installed as well.
Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Signed-off-by: Coiby Xu <coxu@redhat.com>
---
 dracut-module-setup.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index 905e6fbd..a4544a77 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -378,6 +378,14 @@ _get_hpyerv_physical_driver() {
     _get_nic_driver "$_physical_nic"
 }

+_get_physical_function_driver() {
+    local _physfn_dir=/sys/class/net/"$1"/device/physfn
+
+    if [[ -e "$_physfn_dir" ]]; then
+        basename "$(readlink -f "$_physfn_dir"/driver)"
+    fi
+}
+
 kdump_install_nic_driver() {
     local _netif _driver _drivers

@@ -405,6 +413,9 @@ kdump_install_nic_driver() {
         fi

         _drivers+=("$_driver")
+        # For a Single Root I/O Virtualization (SR-IOV) virtual device,
+        # the driver of physical device needs to be installed as well
+        _drivers+=("$(_get_physical_function_driver "$_netif")")
     done

     [[ -n ${_drivers[*]} ]] || return
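The physfn lookup in this patch can be tried outside of dracut with a small variant. The sketch below is not the posted code: get_physfn_driver and add_physfn_driver are hypothetical names, and SYSFS_ROOT is an assumption added so the lookup can run against a scratch tree. The posted patch appends the command substitution unconditionally, which yields an empty array element for interfaces without a physfn link; the variant skips that case. Whether an empty element matters to instmods is not discussed in this thread.

```shell
#!/bin/bash
# Variant of _get_physical_function_driver for experimentation only.
# SYSFS_ROOT is an assumption; it defaults to the real /sys.
get_physfn_driver() {
    local _physfn_dir=${SYSFS_ROOT:-/sys}/class/net/$1/device/physfn

    # Only SR-IOV virtual functions have a physfn link to their parent.
    if [[ -e $_physfn_dir/driver ]]; then
        basename "$(readlink -f "$_physfn_dir/driver")"
    fi
}

# Append the physical function driver to the global _drivers array, but
# only when one was found.
add_physfn_driver() {
    local _pf_driver
    _pf_driver=$(get_physfn_driver "$1")
    if [[ -n $_pf_driver ]]; then
        _drivers+=("$_pf_driver")
    fi
    return 0
}
```

Calling add_physfn_driver for each interface in the loop would leave _drivers unchanged for ordinary NICs and add one entry for each virtual function.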
Hi Coiby,
On 09/27/23 at 10:34am, Coiby Xu wrote:
> Resolves: https://issues.redhat.com/browse/RHEL-7028
>
> Currently, NFS dumping fails on some machines that have a dedicated PHY driver (dealing with the physical layer) or MDIO bus driver (connecting the MAC to PHY devices). This is because kexec-tools doesn't explicitly install a dedicated PHY or MDIO driver, and the NIC driver doesn't specify the dependency on the needed PHY or MDIO driver. So when the dependency
Do you know why the NIC driver doesn't specify the dependency? In theory, it should. Is there a chance this can be fixed in the kernel or the NIC driver at the same time?
> on a PHY driver or MDIO driver is not found by dracut's instmods, the PHY or MDIO driver won't be installed.
>
> This patch passes =drivers/net/phy and =drivers/net/mdio to dracut's instmods which will only install in-use PHY or MDIO driver(s).
>
> Note that ideally we should find out which PHY driver is used by a NIC, but unfortunately no universal way to do so has been found (/sys/class/net/NIC_NAME/device/driver/module can be used to find the name of the PHY driver for some NICs, but it doesn't exist for others such as the Qualcomm Atheros AR8031). The same holds for an MDIO bus driver. Fortunately, no PHY or MDIO driver has been found to consume a large amount of memory so far.
>
> Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
> Reported-by: Doreen Alongi <dalongi@redhat.com>
> Signed-off-by: Coiby Xu <coxu@redhat.com>
> ---
>  dracut-module-setup.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
> index ff53d084..905e6fbd 100755
> --- a/dracut-module-setup.sh
> +++ b/dracut-module-setup.sh
> @@ -381,7 +381,7 @@ _get_hpyerv_physical_driver() {
>  kdump_install_nic_driver() {
>      local _netif _driver _drivers
>
> -    _drivers=()
> +    _drivers=('=drivers/net/phy' '=drivers/net/mdio')
>
>      for _netif in $1; do
>          [[ $_netif == lo ]] && continue
> --
> 2.41.0

_______________________________________________
kexec mailing list -- kexec@lists.fedoraproject.org
List Archives: https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org
Hi Baoquan,
On Thu, Sep 28, 2023 at 07:51:01AM +0800, Baoquan He wrote:
> Hi Coiby,
>
> On 09/27/23 at 10:34am, Coiby Xu wrote:
>> Resolves: https://issues.redhat.com/browse/RHEL-7028
>>
>> Currently, NFS dumping fails on some machines that have a dedicated PHY driver (dealing with the physical layer) or MDIO bus driver (connecting the MAC to PHY devices). This is because kexec-tools doesn't explicitly install a dedicated PHY or MDIO driver, and the NIC driver doesn't specify the dependency on the needed PHY or MDIO driver. So when the dependency
>
> Do you know why the NIC driver doesn't specify the dependency? In theory, it should. Is there a chance this can be fixed in the kernel or the NIC driver at the same time?
Sorry, I lost track of this work. Hangbin told me he can't answer the question of whether a NIC driver should specify the dependency, since it's outside his area of expertise.
So I dug a bit deeper myself. My conclusion is that a NIC driver (MAC driver) shouldn't specify a dependency on a specific PHY driver. A MAC driver deals with the data link layer and a PHY driver with the physical layer. So as long as a MAC driver can talk to the PHY layer via APIs, it doesn't care which PHY driver or device it's talking to. More can be found at https://docs.kernel.org/networking/phy.html. There are even external hot-pluggable PHY devices, as seen in https://www.kernel.org/doc/html/latest/networking/sfp-phylink.html
So we shouldn't fix it in the NIC driver or the kernel. Sorry, maybe my commit message was a bit misleading when I said "the NIC driver doesn't specify the dependency on the needed PHY or MDIO driver". So unless a driver, e.g. r8169, explicitly depends on a certain PHY driver, we shouldn't specify the dependency in the NIC driver:
commit 11287b693d03830010356339e4ceddf47dee34fa
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Mon Jan 7 21:49:09 2019 +0100

    r8169: load Realtek PHY driver module before r8169

    This soft dependency works around an issue where sometimes the genphy driver is used instead of the dedicated PHY driver. The root cause of the issue isn't clear yet. People reported the unloading/re-loading module r8169 helps, and also configuring this soft dependency in the modprobe config files. Important just seems to be that the realtek module is loaded before r8169.

    Once this has been applied preliminary fix 38af4b903210 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device") will be removed.

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 784ae5001656..abb94c543aa2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -708,6 +708,7 @@ module_param(use_dac, int, 0);
 MODULE_PARM_DESC(use_dac, "Enable PCI DAC. Unsafe on 32 bit PCI slot.");
 module_param_named(debug, debug.msg_enable, int, 0);
 MODULE_PARM_DESC(debug, "Debug verbosity level (0=none, ..., 16=all)");
+MODULE_SOFTDEP("pre: realtek");
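A soft dependency declared this way is visible from user space as a string of the form "pre: realtek". As a rough sketch of how a script could pick the "pre:" modules out of such a string, consider the helper below. The function name softdep_pre_modules is hypothetical, and the suggestion to feed it from `modinfo -F softdep` is an assumption about the local modinfo output rather than something stated in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the module names listed after "pre:" in a
# softdep string such as "pre: realtek" or "pre: a b post: c".
softdep_pre_modules() {
    local _word _in_pre=0 _out=()
    # Intentionally unquoted: split the string into whitespace-separated words.
    for _word in $1; do
        case $_word in
            pre:) _in_pre=1 ;;
            post:) _in_pre=0 ;;
            *)
                if ((_in_pre)); then
                    _out+=("$_word")
                fi
                ;;
        esac
    done
    echo "${_out[*]}"
}

# Possible use on a real system (assumes modinfo reports the softdep field):
#   softdep_pre_modules "$(modinfo -F softdep r8169)"
```

For NIC drivers that declare no such soft dependency, the result is empty, which is the case the '=drivers/net/phy' and '=drivers/net/mdio' arguments are meant to cover.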
On Thu, Apr 18, 2024 at 02:15:29PM GMT, Coiby Xu wrote:
> So we shouldn't fix it in the NIC driver or the kernel. Sorry, maybe my commit message was a bit misleading when I said "the NIC driver doesn't specify the dependency on the needed PHY or MDIO driver".
I have improved the commit msg and sent the patches to https://github.com/rhkdump/kdump-utils/pull/3
--
Best regards,
Coiby