gcc in f8/x86 using more stack than in rhel5?

Jakub Jelinek jakub at redhat.com
Fri Aug 24 13:59:47 UTC 2007


On Thu, Aug 23, 2007 at 04:20:41PM -0500, Eric Sandeen wrote:
> I did a quick experiment, and it seems that by and large, gcc in f8 (gcc
> version 4.1.2 20070821 (Red Hat 4.1.2-19)) is using slightly more stack
> when building a kernel than is gcc on rhel5 (gcc version 4.1.1 20070105
> (Red Hat 4.1.1-52)).  There are a few functions using less, too, though.
> 
> After much checkstack output frobbing:
> 
> function [module] : old : new : delta
> --------------------------------------
> acpi_add_single_object [vmlinux]: 156 160 : 4
> acpi_cpufreq_cpu_init [acpi-cpufreq]: 136 140 : 4
> adi_connect [adi]: 144 148 : 4
> aes_decrypt [aes]: 140 144 : 4
> ahc_pci_config [aic7xxx]: 152 148 : -4
> ata_do_eh [libata]: 292 304 : 12
> aty128_probe [aty128fb]: 272 276 : 4
> balance_internal [reiserfs]: 148 160 : 12
> bond_arp_send_all [bonding]: 132 136 : 4
> capidtmf_recv_block [divacapi]: 188 192 : 4
> cciss_update_non_disk_devices [cciss]: 548 552 : 4
> cfb_copyarea [vmlinux]: 148 144 : -4
> check_balance [reiserfs]: 268 272 : 4
> copy_to_user_tmpl [vmlinux]: 400 404 : 4
> ctnetlink_new_expect [nf_conntrack_netlink]: 232 236 : 4
> decode_rs16 [reed_solomon]: 220 224 : 4
> diNewExt [jfs]: 112 108 : -4
> diva_add_card [divacapi]: 144 156 : 12
> do_balance [reiserfs]: 308 320 : 12
> do_cciss_request [cciss]: 544 548 : 4
> do_con_write [vmlinux]: 160 156 : -4
> do_tx [eni]: 120 124 : 4
> ea_dealloc_unstuffed [gfs2]: 116 120 : 4
> ehci_urb_enqueue [ehci-hcd]: 196 212 : 16
> ext4_expand_extra_isize_ea [ext4dev]: 128 132 : 4
> ext4_ext_insert_extent [ext4dev]: 144 148 : 4
> ext4_ext_remove_space [ext4dev]: 120 124 : 4
> facility_req [divacapi]: 396 392 : -4
> __fat_readdir [fat]: 252 256 : 4
> fat_search_long [fat]: 408 412 : 4
> fetch_frame [cpia]: 184 196 : 12
> ftdi_elan_status_work [ftdi-elan]: 252 248 : -4
> gdth_detect [gdth]: 440 436 : -4
> get_far_parent [reiserfs]: 140 152 : 12
> hfcpci_interrupt [hisax]: 356 360 : 4
> hptiop_probe [hptiop]: 152 156 : 4
> huft_build [vmlinux]: 152 144 : -8
> ieee80211_master_start_xmit [mac80211]: 176 180 : 4
> ieee80211_sta_work [mac80211]: 464 476 : 12
> i_ipmi_request [ipmi_msghandler]: 108 100 : -8
> inftl_scan_bbt [diskonchip]: 196 200 : 4
> ip_route_input [vmlinux]: 196 192 : -4
> ip_setsockopt [vmlinux]: 448 452 : 4
> isdn_tty_write [isdn]: 160 164 : 4
> ivtv_process_vbi_data [ivtv]: 116 120 : 4
> jfs_readdir [jfs]: 348 340 : -8
> key_schedule [cast5]: 608 596 : -12
> matroxfb_dh_set_par [matroxfb_crtc2]: 116 120 : 4
> matroxfb_ioctl [matroxfb_base]: 136 140 : 4
> mmc_blk_issue_rq [mmc_block]: 360 356 : -4
> module_verify_signature [vmlinux]: 124 120 : -4
> myri10ge_xmit [myri10ge]: 112 116 : 4
> nv_tx_timeout [forcedeth]: 156 144 : -12
> ocfs2_write_cluster_by_desc [ocfs2]: 112 108 : -4
> os_scsi_tape_open [osst]: 180 184 : 4
> paging32_page_fault [kvm]: 116 108 : -8
> parse_audio_unit [snd-usb-audio]: 132 144 : 12
> patch_cmi9880 [snd-hda-intel]: 108 124 : 16
> pbus_size_mem [vmlinux]: 124 128 : 4
> pkt_open [pktcdvd]: 556 560 : 4
> prism2_plx_probe [hostap_plx]: 116 136 : 20
> qla1280_nvram_config [qla1280]: 128 160 : 32
> r300_do_cp_cmdbuf [radeon]: 580 572 : -8
> radeon_check_modes [radeonfb]: 204 212 : 8
> radeon_get_pllinfo [radeonfb]: 176 192 : 16
> s2io_add_isr [s2io]: 176 184 : 8
> savage_dispatch_draw [savage]: 304 308 : 4
> sd_revalidate_disk [sd_mod]: 156 116 : -40
> search_by_key [reiserfs]: 256 272 : 16
> send_s870 [atp870u]: 100 104 : 4
> service_interrupt [atmel]: 188 192 : 4
> _snd_emu10k1_audigy_init_efx [snd-emu10k1]: 160 164 : 4
> snd_emu10k1_init_efx [snd-emu10k1]: 136 172 : 36
> snd_intel8x0_probe [snd-intel8x0]: 156 148 : -8
> snd_mixart_hw_params [snd-mixart]: 268 272 : 4
> snd_pcm_common_ioctl1 [snd-pcm]: 272 276 : 4
> snd_pcm_hw_refine [snd-pcm]: 160 168 : 8
> snd_pcm_oss_change_params [snd-pcm-oss]: 168 176 : 8
> snd_usb_create_midi_interface [snd-usb-lib]: 112 116 : 4
> sr_probe [sr_mod]: 140 144 : 4
> start_preview [saa7134]: 212 232 : 20
> st_ioctl [st]: 128 144 : 16
> stv680_newframe [stv680]: 176 192 : 16
> svcauth_gss_accept [auth_rpcgss]: 216 208 : -8
> sys_copyarea [syscopyarea]: 144 140 : -4
> sys_init_module [vmlinux]: 232 236 : 4
> tcp_sendmsg [vmlinux]: 128 120 : -8
> tcp_v4_do_rcv [vmlinux]: 104 108 : 4
> tulip_init_one [tulip]: 136 140 : 4
> tveeprom_hauppauge_analog [tveeprom]: 156 164 : 8
> txCommit [jfs]: 168 180 : 12
> udf_get_block [udf]: 412 416 : 4
> udf_get_filename [udf]: 584 580 : -4
> write_filehandle [nfsd]: 208 204 : -4
> xfs_alloc_delrec [xfs]: 132 144 : 12
> xfs_bmap_btalloc [xfs]: 172 180 : 8
> xfs_bmbt_delrec [xfs]: 212 216 : 4
> xfs_bmbt_newroot [xfs]: 132 140 : 8
> xfs_da_do_buf [xfs]: 188 192 : 4
> xfs_inobt_delrec [xfs]: 148 152 : 4
> xfs_inobt_newroot [xfs]: 172 176 : 4
> xtSearch [jfs]: 104 116 : 12
> xtUpdate [jfs]: 404 424 : 20
> zd1201_usbrx [zd1201]: 100 108 : 8
> ------------------------
> Functions increased: 79
> Functions decreased: 25
> Net change: +432 bytes
> 
> 
> That starts with checkstack output, so it's only checking functions
> using > 100 bytes of stack, and then the above is only showing functions
> with changed stack usage... but from a spot-check, smaller stack-users
> are affected as well.
> 
> With 4KSTACKS on x86, I'm afraid this could add up to more problems.
> 
> Any idea what might be causing this?

At least for the 4 testcases you've mailed me, there are two different
causes:
1) http://gcc.gnu.org/PR30364
2) http://gcc.gnu.org/PR30931

E.g. snd_emu10k1_init_efx growth is caused by 1), while qla1280_nvram_config
by 2); the other two, I believe, are caused by both together.

PR30364 is a fix for the case where -fwrapv isn't used (that option, on
the other hand, can pessimize loops): it is unsafe to reassociate
additions/subtractions if the type doesn't have defined overflow behavior.
Say with int a, b, (a - 20) + (b - 20) is unsafe to reassociate into
a + b - 40, because for certain values of a and b the former wouldn't
overflow while the latter would.  In the qla1280_nvram_config
case this is with pointers, which, at least as richi wrote the patch,
are also considered to have undefined overflow behavior.  But gcc
4.1.x/4.2.x (unlike the trunk) still reassociates e.g.
struct A
{
  unsigned int a, b, c;
};

struct B
{
  struct A d[12];
};

struct A *
foo (struct B *x, int y)
{
  return &x->d[y - 8];
}

	Jakub



