Hi Philipp,
On Wed, Mar 30, 2022 at 11:34 PM Philipp Rudo prudo@redhat.com wrote:
Hi Tao,
On Tue, 29 Mar 2022 20:45:33 +0800 Tao Liu ltao@redhat.com wrote:
When AMD sme/sev enabled, it will require extra memory reserved for swiotlb. Currently the swiotlb kernel parameter is removed by /etc/sysconfig/kdump for 2nd kernel, so a 64MB memory for swiotlb will be reserved by default [1], and the 2nd kernel will fail of OOM with the default crashkernel value.
swiotlb memory is mandatory for AMD sme/sev. For rhel8, kernel commit 0adb0f4 ("[x86] Reserve at most 64M of SWIOTLB memory for crashkernel")
is that commit id correct? I cannot find it in my rhel-8 clone...
Sorry, it's ab30d3f, I copied the wrong commit id...
implemented the reservation in the kernel side, as part of the kernel crashkernel=auto implementation.
For rhel9 however, crashkernel=auto has been mirrored to userspace kexec-tools, so this patch will enable the swiotlb reservation for AMD sme/sev in kexec-tools as well.
Not really about this patch but I find it weird that we adjust the crashkernel=auto value for swiotlb but not for other cases wher we know that we need a larger crashkernel, e.g. when the dump target is on a LUKS encrypted device. I understand that in the past, when crashkernel=auto was handled by the kernel, we simply didn't know about what the dump target will be when we allocated the crashkernel during boot. But today, when crashkernel=auto is handled in userspace we can.
I didn't have much context on LUKS, by looking into the code, I see only "kdumpctl estimate" has the LUKS related crashkernel value increment. It needs the user to adjust crashkernel value manually by estimate-and-set. Maybe we can merge the crashkernel adjustment of swiotlb and luks together, so luks don't need to "kdumpctl estimate" manually anymore?
Signed-off-by: Tao Liu ltao@redhat.com
kdump-lib-initramfs.sh | 10 ++++++++++ kdump-lib.sh | 11 +++++++++-- kdumpctl | 7 ++++++- 3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh index 9be0fe9..699babd 100755 --- a/kdump-lib-initramfs.sh +++ b/kdump-lib-initramfs.sh @@ -8,6 +8,11 @@ KDUMP_CONFIG_FILE="/etc/kdump.conf" FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump" FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
+# Reserve 64MB for swiotlb when AMD sme/sev enabled. +AMD_SME_SEV_SWIOTLB=64 +# Copied from include/linux/swiotlb.h +IO_TLB_SHIFT=11
# Read kdump config in well formated style kdump_read_conf() { @@ -130,3 +135,8 @@ is_fs_dump_target() { [ -n "$(kdump_get_conf_val "ext[234]|xfs|btrfs|minix")" ] }
+is_amd_sme_sev_enabled() +{
dmesg | grep -sq "AMD Memory Encryption Features active"Is there a more reliable way, e.g. a sysfs attribute, to determine if SEV is enabled?
The dmesg has the problem that it is stored in a ring buffer. Which means if it runs full old entries are overwritten by new ones. So you cannot fully rely on it.
I agree with your point, dmesg isn't the ideal way to detect sme/sev enablement. However I didn't find a better way to do it.
Actually we need to verify the following:
1) sme enabled on host 2) sev enabled on host 3) sev enabled on VM guest
Only case 2 can be verified through $(cat /sys/module/kvm_amd/parameters/sev), $(dmesg | grep -sq "AMD Memory Encryption Features active") is generic for the 3 cases though...
Thanks, Tao Liu
Thanks Philipp
+} diff --git a/kdump-lib.sh b/kdump-lib.sh index 4ed5035..8ea6bfe 100755 --- a/kdump-lib.sh +++ b/kdump-lib.sh @@ -831,7 +831,7 @@ get_recommend_size() # $1 dump mode, if not specified, dump_mode will be judged by is_fadump_capable kdump_get_arch_recommend_crashkernel() {
local _arch _ck_cmdline _dump_mode
local _arch _ck_cmdline _dump_mode _mem_enc if [[ -z "$1" ]]; then if is_fadump_capable; then@@ -845,7 +845,14 @@ kdump_get_arch_recommend_crashkernel()
_arch=$(uname -m)
if [[ $_arch == "x86_64" ]] || [[ $_arch == "s390x" ]]; then
if [[ $_arch == "x86_64" ]]; thenif is_amd_sme_sev_enabled; then_mem_enc=$AMD_SME_SEV_SWIOTLBelse_mem_enc=0fi_ck_cmdline="1G-4G:$((192+$_mem_enc))M,4G-64G:$((256+$_mem_enc))M,64G-:$((512+$_mem_enc))M"elif [[ $_arch == "s390x" ]]; then _ck_cmdline="1G-4G:192M,4G-64G:256M,64G-:512M" elif [[ $_arch == "aarch64" ]]; then _ck_cmdline="2G-:448M"diff --git a/kdumpctl b/kdumpctl index 1869753..50aafda 100755 --- a/kdumpctl +++ b/kdumpctl @@ -21,6 +21,7 @@ FADUMP_REGISTER_SYS_NODE="/sys/kernel/fadump_registered" #kdump shall be the default dump mode DEFAULT_DUMP_MODE="kdump" image_time=0 +size_mb=$((1024 * 1024))
standard_kexec_args="-d -p"
@@ -622,6 +623,11 @@ load_kdump() { local ret
# Force 2nd kernel use swiotlb for AMD sme/sve to 64MB at most.if is_amd_sme_sev_enabled; thenKDUMP_COMMANDLINE_APPEND+=" swiotlb=$((($AMD_SME_SEV_SWIOTLB * $size_mb) >> $IO_TLB_SHIFT)) "fiKEXEC_ARGS=$(prepare_kexec_args "${KEXEC_ARGS}") KDUMP_COMMANDLINE=$(prepare_cmdline "${KDUMP_COMMANDLINE}" "${KDUMP_COMMANDLINE_REMOVE}" "${KDUMP_COMMANDLINE_APPEND}")@@ -1189,7 +1195,6 @@ do_estimate() local -A large_mods local baseline local kernel_size mod_size initrd_size baseline_size runtime_size reserved_size estimated_size recommended_size
local size_mb=$((1024 * 1024)) setup_initrd if [[ ! -f $TARGET_INITRD ]]; then