Ticket #2244 (assigned defect)

Opened 14 years ago

Last modified 7 years ago

MC consumes 100% cpu after wake up from suspend

Reported by: Spinal Owned by:
Priority: major Milestone: Future Releases
Component: mc-tty Version: 4.7.4
Keywords: high cpu Cc: zaytsev, slyfox, torohov_s_a@…, petre.rodan@…, graham@…
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

I often use hardware suspend to RAM or suspend to disk between sessions on my PC. Sometimes, after wake-up, something consumes 100% of my CPU (a 2.66 GHz CPU, by the way). Running top shows that it's an MC instance. The interesting thing is that I've closed all MCs, but one is still running somewhere in the background, consuming 100% CPU. "killall mc" fixes it. I don't know exactly how to trigger the bug, but it only occurs after wake-up from suspend. Please let me know if I can assist you in fixing this.
P.S. The bug was introduced by the Slavaz version of MC. I didn't notice such behaviour before.
P.P.S. I'm a Gentoo user.
My MC version is 4.7.2
USE flags:
X edit gpm nls -samba -slang

Attachments

mc-2244-infinite-loop-when-stdin-fd-got-deleted.patch (2.2 KB) - added by and 8 years ago.

Change History

comment:1 Changed 14 years ago by angel_il

Please show the output of
'top' and 'ps ax'

comment:2 Changed 14 years ago by andrew_b

  • Version changed from version not selected to 4.7.2

comment:3 Changed 14 years ago by Spinal

Okay.

~ $ top

top - 21:21:55 up 1 day,  1:57,  4 users,  load average: 1.18, 0.97, 0.62
Tasks: 145 total,   2 running, 143 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 74.5%sy, 24.5%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.7%si,  0.0%st
Mem:   1035200k total,   850560k used,   184640k free,   117364k buffers
Swap:  3903784k total,     8896k used,  3894888k free,   371240k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
10460 spinal    30  10  8360 3776 2548 R 95.3  0.4   3:05.41 mc                 
30336 root      30  10 70316  39m 9304 S  2.3  4.0   2:27.24 X                  
10032 spinal    30  10  176m  86m  20m S  1.0  8.6   1:13.24 opera              
10784 spinal    30  10 47632  20m 9548 S  0.7  2.0   0:00.34 Terminal           
30426 spinal    30  10 26388  14m 6956 S  0.3  1.4   0:01.64 xfce4-netload-p    
    1 root      20   0  1624  520  496 S  0.0  0.1   0:00.32 init               
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd           
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.49 ksoftirqd/0        
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.38 events/0           
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.01 khelper            
    6 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr          
    7 root      20   0     0    0    0 S  0.0  0.0   0:01.78 sync_supers        
    8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 bdi-default        
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.35 kblockd/0          
   10 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpid             
   11 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify       
   12 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_hotplug      

~ $ ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init [3]  
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00 [ksoftirqd/0]
    4 ?        S      0:00 [events/0]
    5 ?        S      0:00 [khelper]
    6 ?        S      0:00 [async/mgr]
    7 ?        S      0:01 [sync_supers]
    8 ?        S      0:00 [bdi-default]
    9 ?        S      0:00 [kblockd/0]
   10 ?        S      0:00 [kacpid]
   11 ?        S      0:00 [kacpi_notify]
   12 ?        S      0:00 [kacpi_hotplug]
   13 ?        S      0:04 [ata/0]
   14 ?        S      0:00 [ata_aux]
   15 ?        S      0:00 [ksuspend_usbd]
   16 ?        S      0:00 [khubd]
   17 ?        S      0:00 [kseriod]
   18 ?        S      0:02 [kswapd0]
   19 ?        S      0:00 [aio/0]
   20 ?        S      0:00 [crypto/0]
   24 ?        S      0:08 [scsi_eh_0]
   25 ?        S      0:00 [scsi_eh_1]
   26 ?        S      0:00 [scsi_eh_2]
   29 ?        S      0:00 [scsi_eh_3]
   32 ?        S      0:00 [edac-poller]
   33 ?        S      0:00 [usbhid_resumer]
   36 ?        S      0:00 [jbd2/sda5-8]
   37 ?        S      0:00 [ext4-dio-unwrit]
  129 ?        S<s    0:00 /sbin/udevd --daemon
  267 ?        S      0:00 [kpsmoused]
  500 ?        S      0:04 [flush-8:0]
  529 ?        S      0:00 [reiserfs/0]
  532 ?        S      0:00 [xfs_mru_cache]
  533 ?        S      0:02 [xfslogd/0]
  534 ?        S      0:00 [xfsdatad/0]
  535 ?        S      0:00 [xfsconvertd/0]
  536 ?        S      0:00 [xfsbufd]
  537 ?        S      0:00 [xfsaild]
  538 ?        S      0:00 [xfssyncd]
  539 ?        S      0:00 [jbd2/sda8-8]
  540 ?        S      0:00 [ext4-dio-unwrit]
  541 ?        S      0:00 [jbd2/sda9-8]
  542 ?        S      0:00 [ext4-dio-unwrit]
  543 ?        S      0:00 [xfsbufd]
  544 ?        S      0:00 [xfsaild]
  545 ?        S      0:00 [xfssyncd]
  546 ?        S      0:00 [xfsbufd]
  547 ?        S      0:00 [xfsaild]
  548 ?        S      0:00 [xfssyncd]
 3496 ?        S      0:00 supervising syslog-ng
 3497 ?        Ss     0:00 /usr/sbin/syslog-ng
 3560 ?        Ss     0:00 /usr/sbin/acpid
 3623 ?        Ss     0:00 /bin/bash /opt/scripts/sbin/acpid-helper
 3691 ?        Ss     0:00 /usr/bin/dbus-daemon --system
 3754 ?        Ssl    0:00 /usr/sbin/console-kit-daemon
 4525 ?        SNs    0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --user distcc --port 3632 --log-level critical --all
 4529 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --user distcc --port 3632 --log-level critical --all
 4589 ?        Ss     0:00 /usr/sbin/gpm -m /dev/input/mice -t ps2
 4652 ?        Ss     0:00 /usr/sbin/hald --use-syslog --verbose=no
 4653 ?        S      0:00 hald-runner
 4654 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --user distcc --port 3632 --log-level critical --all
 4676 ?        S      0:00 hald-addon-input: Listening on /dev/input/event4 /dev/input/event3 /dev/input/event0 /dev/input/event1
 4683 ?        S      0:09 hald-addon-storage: polling /dev/sr0 (every 2 sec)
 4700 ?        S      0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
 4720 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --user distcc --port 3632 --log-level critical --all
 4819 ?        Ss     0:00 /sbin/portmap
 4884 ?        Ss     0:00 /sbin/rpc.statd --no-notify
 4946 ?        S      0:00 [rpciod/0]
 4954 ?        Ss     0:00 /usr/sbin/rpc.mountd
 4956 ?        S      0:00 [lockd]
 4957 ?        S      0:00 [nfsd]
 4958 ?        S      0:00 [nfsd]
 4959 ?        S      0:00 [nfsd]
 4960 ?        S      0:00 [nfsd]
 4961 ?        S      0:00 [nfsd]
 4962 ?        S      0:00 [nfsd]
 4963 ?        S      0:00 [nfsd]
 4964 ?        S      0:00 [nfsd]
 5077 ?        Ssl    0:03 /usr/bin/mpd /etc/mpd.conf
 5138 ?        Ss     0:00 /usr/bin/mpdscribble --pidfile /var/run/mpdscribble.pid
 5201 ?        Ss     0:00 /usr/sbin/smbd -D
 5210 ?        S      0:00 /usr/sbin/smbd -D
 5211 ?        Ss     0:00 /usr/sbin/nmbd -D
 5279 ?        Ss     0:00 /usr/sbin/sshd
 5409 ?        Ss     0:00 /usr/sbin/cron
 5475 ?        Ss     0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
 5644 tty1     Ss     0:00 /bin/login --     
 5645 tty2     Ss+    0:00 /sbin/agetty 38400 tty2 linux
 5646 tty3     Ss+    0:00 /sbin/agetty 38400 tty3 linux
 5647 tty4     Ss+    0:00 /sbin/agetty 38400 tty4 linux
 5648 tty5     Ss+    0:00 /sbin/agetty 38400 tty5 linux
 5649 tty6     Ss+    0:00 /sbin/agetty 38400 tty6 linux
 5677 ?        SNsl   0:01 /usr/bin/smbnetfs /home/spinal/net
 5984 ?        S<     0:00 /sbin/udevd --daemon
 5985 ?        S<     0:00 /sbin/udevd --daemon
10032 ?        SNl    1:20 /opt/opera/lib/opera/10.10/opera -notrayicon
10042 ?        SN     0:00 /usr/libexec/gconfd-2
10161 ?        S      0:00 /usr/bin/inotifywait -qq -e close_write -e move -e delete_self /dev/shm/acpid.status
10169 ?        SN     0:01 /usr/bin/smbnetfs /home/spinal/net
10460 ?        RN     6:24 /usr/bin/mc -P /tmp/mc-spinal/mc.pwd.10447
10462 pts/1    SNs+   0:00 bash -rcfile .bashrc
10784 ?        SN     0:01 /usr/bin/Terminal
10785 ?        SN     0:00 gnome-pty-helper
10786 pts/2    SNs+   0:00 bash
10799 pts/3    SNs+   0:00 bash
10833 pts/4    SNs    0:00 bash
10843 pts/4    SN+    0:00 screen -r
10849 ?        SN     0:00 /usr/bin/smbnetfs /home/spinal/net
10850 ?        SN     0:00 /usr/bin/smbnetfs /home/spinal/net
10851 ?        SN     0:00 /usr/bin/smbnetfs /home/spinal/net
10852 ?        SN     0:00 /usr/bin/smbnetfs /home/spinal/net
10853 ?        SN     0:00 /usr/bin/smbnetfs /home/spinal/net
10887 pts/6    SNs    0:00 bash
10897 pts/6    RN+    0:00 ps ax
30056 ?        SNs    0:00 SCREEN
30057 pts/5    SNs+   0:00 -/bin/bash
30200 tty1     S+     0:00 -bash
30327 ?        SNs    0:00 /usr/bin/gdm
30333 ?        SN     0:00 /usr/bin/gdm
30336 tty7     RNs+   2:40 /usr/bin/X :0 -audit 0 -auth /var/gdm/:0.Xauth vt7
30355 ?        SNs    0:00 /bin/sh /etc/xdg/xfce4/xinitrc -- /etc/X11/xinit/xserverrc
30371 ?        SN     0:00 /usr/bin/dbus-launch --sh-syntax --exit-with-session
30372 ?        SNs    0:00 /usr/bin/dbus-daemon --fork --print-pid 6 --print-address 9 --session
30377 ?        SNs    0:00 /usr/bin/ssh-agent -- startxfce4
30384 ?        SN     0:00 xscreensaver -no-splash
30389 ?        SN     0:00 /usr/bin/xfce4-session
30391 ?        SN     0:00 /usr/libexec/xfconfd
30395 ?        SN     0:00 xfsettingsd
30397 ?        SN     0:02 xfwm4
30399 ?        SN     0:06 xfce4-panel
30401 ?        SN     0:00 Thunar --daemon
30403 ?        SN     0:02 xfdesktop
30405 ?        SN     0:00 /usr/libexec/gam_server
30406 ?        SN     0:00 /usr/libexec/xfce4/panel-plugins/xfce4-menu-plugin socket_id 16777244 name xfce4-menu id 5 display_name Меню Xfce size
30414 ?        SN     0:00 /bin/bash /home/spinal/.config/autorun/autorun.sh
30418 ?        SN     0:01 stardict
30419 ?        SN     0:00 xfce4-settings-helper
30424 ?        SN     0:00 /usr/libexec/xfce4/panel-plugins/xfce4-mpc-plugin socket_id 16777252 name xfce4-mpc-plugin id 12679081321 display_name
30425 ?        SNl    0:00 /usr/libexec/xfce4/panel-plugins/xfce4-mixer-plugin socket_id 16777253 name xfce4-mixer-plugin id 125438493015 display
30426 ?        SN     0:01 /usr/libexec/xfce4/panel-plugins/xfce4-netload-plugin socket_id 16777254 name netload id 125438500817 display_name Net
30427 ?        SN     0:00 /usr/libexec/xfce4/panel-plugins/xfce4-genmon-plugin socket_id 16777256 name genmon id 125438385110 display_name Gener
30428 ?        SN     0:00 /usr/libexec/xfce4/panel-plugins/xfce4-notes-plugin socket_id 16777257 name xfce4-notes-plugin id 12595153000 display_
30429 ?        SN     0:05 /usr/libexec/xfce4/panel-plugins/xfce4-cpugraph-plugin socket_id 16777258 name cpugraph id 12582873440 display_name CP
30430 ?        SN     0:01 /usr/libexec/xfce4/panel-plugins/xfce4-time-out-plugin socket_id 16777259 name xfce4-time-out-plugin id 12744763684 di
30431 ?        SN     0:01 /usr/libexec/xfce4/panel-plugins/orageclock socket_id 16777260 name orageclock id 12543837287 display_name Часы с дато
30439 ?        SN     0:00 /bin/bash /home/spinal/.config/autorun/autorun.sh
30442 ?        SN     0:00 trix

comment:4 Changed 14 years ago by ossi

it would probably be much more helpful to first attach strace (to see whether it is looping around some syscalls) and then gdb to the process.

comment:5 Changed 14 years ago by Spinal

Hello, Ossi.
Could you please tell me what I should do with gdb?
I'm a complete noob with it...

comment:6 Changed 14 years ago by angel_il

1. run mc
2. wait for mc to hang
3. in another terminal run 'top' or 'ps ax' and copy the PID of mc
4. start gdb -p PID_of_mc
5. run bt (copy/paste the output)

comment:7 Changed 14 years ago by Spinal

Okay. Here's the strace's output:
...
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99992})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99992})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
...

And here's gdb's:

~ $ gdb -p 28087

warning: Can not parse XML syscalls information; XML support was disabled at compile time.
GNU gdb (Gentoo 7.0.1 p1) 7.0.1
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>.
Attaching to process 28087
Reading symbols from /usr/bin/mc...(no debugging symbols found)...done.
Reading symbols from /lib/libext2fs.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libext2fs.so.2
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /lib/libgpm.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgpm.so.1
Reading symbols from /lib/libncursesw.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncursesw.so.5
Reading symbols from /usr/lib/libgmodule-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgmodule-2.0.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libglib-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libglib-2.0.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/libX11.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libX11.so
Reading symbols from /usr/lib/libxcb.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxcb.so.1
Reading symbols from /usr/lib/libXau.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
0xb77f9424 in kernel_vsyscall ()
(gdb) bt
#0 0xb77f9424 in kernel_vsyscall ()
#1 0xb75f98fd in select () from /lib/libc.so.6
#2 0x080ad5fd in try_channels ()
#3 0x080ae4da in tty_get_event ()
#4 0x0806116d in run_dlg ()
#5 0x08097130 in main ()
(gdb)

~ $ file /usr/bin/mc

/usr/bin/mc: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

The mc binary is not stripped but gdb says
Reading symbols from /usr/bin/mc...(no debugging symbols found)...done.

I'm now going to recompile mc with the -ggdb gcc option enabled to see if it makes any difference.

Please let me know if there's anything else I should consider doing.

comment:8 Changed 14 years ago by ossi

unhandled EOF
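
To illustrate what "unhandled EOF" means for the strace above: once a descriptor has reached end-of-file, select() reports it as readable every time and read() returns 0 immediately, so a loop that simply retries never blocks. The following is a minimal sketch in C, not mc code; the helper name count_eof_wakeups is made up for the illustration, and a pipe with its write end closed stands in for the dead terminal.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper, not part of mc: counts how many times in a row
 * select() reports fd as readable while read() returns 0 (EOF). */
int count_eof_wakeups(int fd, int rounds)
{
    int wakeups = 0;

    for (int i = 0; i < rounds; i++) {
        fd_set set;
        struct timeval tv;
        char c;

        tv.tv_sec = 0;
        tv.tv_usec = 100000;            /* same 100 ms timeout as in the strace */

        FD_ZERO(&set);
        FD_SET(fd, &set);

        if (select(fd + 1, &set, NULL, NULL, &tv) != 1)
            break;                      /* timeout or error: not the EOF case */
        if (read(fd, &c, 1) != 0)
            break;                      /* real data or an error, not EOF */
        wakeups++;                      /* "readable", but nothing to read */
    }
    return wakeups;
}
```

With a pipe whose write end has been closed, every round counts as a wakeup, which is the same pattern as the repeating select() and read(0, "", 1) = 0 lines in the log.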

comment:9 Changed 14 years ago by Spinal

Hm... And what is the solution?..

comment:10 Changed 14 years ago by slyfox

It looks like an mc bug in file descriptor exception (mis)handling. I wonder what exactly forces the input fd to close/'error out' on suspend/resume.

Do you use any sort of terminal multiplexers? (screen or something like that)

comment:11 Changed 14 years ago by slyfox

At least I seem to have found an easy way to reproduce the bug (many thanks to ossi)

$ mc
# attach to it in another session and close fd=4
$ gdb -p $mc_pid
call close(4)
quit

The poor mc process starts eating CPU in a dead loop.

comment:12 follow-up: ↓ 13 Changed 14 years ago by Spinal

I only use screen to run rtorrent. This is not connected to the bug.
What does file descriptor #4 point to?
Could you tell me how I can check whether that's actually the cause of the bug in my case?

comment:13 in reply to: ↑ 12 Changed 14 years ago by slyfox

Replying to Spinal:

I only use screen to run rtorrent. This is not connected to the bug.
What does file descriptor #4 point to?
Could you tell me how I can check whether that's actually the cause of the bug in my case?

According to the whole strace log, it's the descriptor of the subshell spawned by mc (the one available via Ctrl+O). Can you run mc under strace before suspend and then reproduce the hang?
It should look like:

$ strace -omc-log.strace mc
# reproduce the bug
# attach mc-log.strace here

I'd like to see the whole log to understand what caused the tty close/break.

And I'd like to see your exact linux kernel version:

$ uname -a

comment:14 Changed 14 years ago by Spinal

It's a pity, but I don't know how to reproduce the bug. It occurs about once a week, i.e. pretty rarely. I use mc extensively. And I'm not really sure it's connected to suspend. It may be just a coincidence.

~ $ uname -a

Linux supervisor 2.6.32-gentoo-r7 #2 PREEMPT Wed May 26 22:50:15 EEST 2010 i686 Intel(R) Celeron(R) CPU 2.66GHz GenuineIntel GNU/Linux

So, how could I do the needed checks, considering how rare this bug is?

comment:15 Changed 14 years ago by slyfox

Ah, I thought it was not so rare. I think we have enough info available to at least fix the dead loop.
So we can let the root cause live in the tree for a while :]

comment:16 Changed 14 years ago by Zenith88

  • Keywords high cpu added
  • Version changed from 4.7.2 to 4.7.0

Similar problem, but not related to suspend/resume. On my system mc sometimes 'disconnects' from the terminal and consumes 100% CPU. Exiting from mc via F10 does not help: the mc process remains running and only killing it helps. I've been noticing this for almost 10 years, since Slackware 2.
It happens both under an X terminal and on the console. I don't know what steps to take to reproduce it; it feels absolutely random.

comment:17 follow-ups: ↓ 18 ↓ 19 Changed 14 years ago by andrew_b

Try to compile mc without gpm support. Or simply switch off the gpm service.

comment:18 in reply to: ↑ 17 Changed 14 years ago by Spinal

Replying to andrew_b:

Try to compile mc without gpm support. Or simply switch off the gpm service.

Ok. I will notify you if the problem occurs again (with gpm disabled).
If there's no answer in two months, please consider the solution helpful.
Thanks for the advice.

comment:19 in reply to: ↑ 17 Changed 13 years ago by Spinal

  • Version changed from 4.7.0 to 4.7.4
  • Milestone changed from 4.7 to 4.7.5

Try to compile mc without gpm support. Or simply switch off the gpm service.

That didn't help.

Installed version: mc-4.7.4-r1(13:04:56 25.09.2010)(X edit nls -gpm -samba -slang)

The same behaviour occurred. Here's a copy of the strace (this output repeats in an infinite loop):
...
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99993})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
...

What did I do? I just launched smplayer by pressing Enter on an avi file. Then, when the movie finished, I saw 100% CPU load. It was midnight commander again, although I had closed all mc instances.
So this time it wasn't connected to gpm (mc is compiled with the -gpm USE flag).

Only "killall mc" helped.

comment:20 follow-up: ↓ 21 Changed 13 years ago by andrew_b

  • Milestone changed from 4.7.5 to 4.7

Please build mc with full debug info. When mc consumes 100% CPU, attach to the mc process using gdb -p and step through the code where the infinite loop occurs. Then post your results here. Thanks!

comment:21 in reply to: ↑ 20 ; follow-up: ↓ 22 Changed 13 years ago by Spinal

Please build mc with full debug info. When mc consumes 100% CPU, attach to the mc process using gdb -p and step through the code where the infinite loop occurs.

Thanks for the answer, Andrew.
Could you please advise me on how to do these steps properly:
1) build mc with full debug info
2) walk step-by-step through the code with gdb

I'm a Gentoo user, if that matters.
Thanks again!

comment:22 in reply to: ↑ 21 Changed 13 years ago by Spinal

FORGET MY LAST POST :)
I've found this article: http://www.unknownroad.com/rtfm/gdbtut/gdbinfloop.html
It should probably help.
Sorry for my naive question. Googling is not my strong suit ))

Replying to Spinal:

Please build mc with full debug info. When mc consumes 100% CPU, attach to the mc process using gdb -p and step through the code where the infinite loop occurs.

Thanks for the answer, Andrew.
Could you please advise me on how to do these steps properly:
1) build mc with full debug info
2) walk step-by-step through the code with gdb

I'm a Gentoo user, if that matters.
Thanks again!

comment:23 Changed 13 years ago by Spinal

(gdb) bt
#0 0xb76ea424 in kernel_vsyscall ()
#1 0xb74d184d in select () from /lib/libc.so.6
#2 0x080ac8ab in try_channels (set_timeout=<value optimized out>) at key.c:589
#3 0x080ad445 in getch_with_delay (event=0xbf92fc50, redo_event=0, block=1) at key.c:661
#4 tty_get_event (event=0xbf92fc50, redo_event=0, block=1) at key.c:1684
#5 0x08061718 in frontend_run_dlg (h=0x9a70358) at dialog.c:1043
#6 run_dlg (h=0x9a70358) at dialog.c:1075
#7 0x0809619f in create_panels_and_run_mc (argc=3, argv=0xbf92fe64) at main.c:1716
#8 do_nc (argc=3, argv=0xbf92fe64) at main.c:1798
#9 main (argc=3, argv=0xbf92fe64) at main.c:2048
(gdb) next
Single stepping until exit from function kernel_vsyscall,
which has no line number information.
0xb74d184d in select () from /lib/libc.so.6
(gdb) next
Single stepping until exit from function select,
which has no line number information.
try_channels (set_timeout=<value optimized out>) at key.c:590
590 if (v > 0) {
(gdb) next
591 check_selects (&select_set);
(gdb) next
592 if (FD_ISSET (input_fd, &select_set))
(gdb) next
596 }
(gdb) next
tty_get_event (event=0xbf92fc50, redo_event=0, block=1) at key.c:1684
1684 c = block ? getch_with_delay () : get_key_code (1);
(gdb) next

And then gdb hangs here doing nothing and consuming 100% cpu.
Will this be helpful?
Should I do something else to debug this?

Thank you.

comment:24 Changed 13 years ago by zaytsev

  • Cc zaytsev, slyfox added

Hi! Sorry, we are busy with release preps but thank you for helping with debugging this issue. Maybe Sly has something to say about it. We'll get back to it later.

comment:25 Changed 13 years ago by andrew_b

  • Component changed from mc-core to mc-tty

Some additional info can be found in #2416.

comment:26 follow-up: ↓ 40 Changed 13 years ago by Spinal

Please disregard my explanation of the bug! It's not connected to suspend to RAM.
It's more common than I thought. I experienced this bug on my N900 about 2 times last month.
I suddenly discovered that my battery was draining faster than usual.
"top" showed me that it was ... yes, it was midnight commander. Killall fixed things.
But it's interesting that I don't see many comments here from other mc users...

Am I a magnet for that bug or something? :-)

P.S. The packaged mc version is 4.7.4-maemo3 on the phone.

comment:27 follow-up: ↓ 28 Changed 13 years ago by angel_il

try building MC without subshell support

comment:28 in reply to: ↑ 27 Changed 13 years ago by Spinal

Replying to angel_il:

try building MC without subshell support

And what's the reason?
1) I don't know how to build software for the phone.
I use binary (.deb) packages prepared by the maemo community.
2) (More important) I use the subshell actively. What's the point of removing it?
3) I don't know how to reproduce the bug. It occurs randomly.

comment:29 Changed 13 years ago by angel_il

just do it, don't ask me 'why' :) i want to know whether the bug still reproduces if the subshell is disabled.

1) I don't know how to build software for the phone.

99% - bug in mc.

comment:30 Changed 13 years ago by Spinal

As I stated above: "I don't know how to reproduce the bug. It occurs randomly."
I cannot work without a subshell for a week or two, waiting to see if the bug appears.
Midnight commander without a subshell is a useless thing, IMHO.

comment:31 Changed 13 years ago by slyfox

  • Owner set to slyfox
  • Status changed from new to assigned
  • severity changed from no branch to on review

Created branch:2244_busy_loop
aka changeset:8de43bfa2c776a6142665cd78cb94b39617e5038

I don't guarantee the patch fixes this exact problem, but it will not hurt in any way.

I think (not sure) Spinal's case is the following:

  1. He closes the terminal window in mc
  2. this causes the subshell (and its descriptor) to die
  3. signal delivery is:
    • too late or
    • absent or
    • SIGCHLD is blocked or
    • something else (it's the major thing to find out)
  4. mc is fast enough to call select() on that invalid descriptor, so we get a busy loop.

A full strace log (from the very start of mc) would certainly help.

Please review.

comment:32 follow-up: ↓ 37 Changed 13 years ago by ossi

that should be an else-if, to make it clear that no fall-through from the v > 0 is possible.

anyway, while i agree that the patch won't hurt, it makes plain no sense in this context. the strace indicates clearly that the select does not fail (and it never will, unless an FD is actually actively closed somewhere or the system is in real trouble). as i said five months ago, the problem is that the EOF (on stdin) is not handled.

comment:33 Changed 13 years ago by slavazanko

  • Blocked By 2409 added

comment:34 Changed 13 years ago by slavazanko

  • Blocking 2409 added
  • Blocked By 2409 removed

comment:35 Changed 13 years ago by andrew_b

  • Votes for changeset set to andrew_b

comment:36 Changed 13 years ago by slavazanko

  • Votes for changeset changed from andrew_b to andrew_b slavazanko
  • severity changed from on review to approved

comment:37 in reply to: ↑ 32 Changed 13 years ago by slyfox

  • Votes for changeset andrew_b slavazanko deleted
  • severity changed from approved to on rework

Replying to ossi:

that should be an else-if, to make it clear that no fall-through from the v > 0 is possible.

Agreed, will amend.

anyway, while i agree that the patch won't hurt, it makes plain no sense in this context. the strace indicates clearly that the select does not fail (and it never will, unless an FD is actually actively closed somewhere or the system is in real trouble). as i said five months ago, the problem is that the EOF (on stdin) is not handled.

Oh my, right. I'm not sure how smplayer managed to close mc's stdin though.
I tried to address some nasty issue (which i can't reproduce with hangup) happening when one closes terminal window.

andrew_b slavazanko

Guys, I am removing your votes as I'll have to fix the EOF issue as well.

It might take a while, so the block on #2409 can be dropped at some point.

Sorry.

comment:38 Changed 13 years ago by slyfox

Another theory: Spinal's terminal sometimes does not send SIGHUP to mc.

Steps to reproduce another hangup:

  1. apply the following patch (mc ignores SIGHUP signal)
    diff --git a/lib/utilunix.c b/lib/utilunix.c
    index 5d0a207..3f7b8b2 100644
    --- a/lib/utilunix.c
    +++ b/lib/utilunix.c
    @@ -215,6 +215,7 @@ my_system (int flags, const char *shell, const char *command)
             signal (SIGQUIT, SIG_DFL);
             signal (SIGTSTP, SIG_DFL);
             signal (SIGCHLD, SIG_DFL);
    +        signal (SIGHUP, SIG_DFL);
    
             if (flags & EXECUTE_AS_SHELL)
                 execl (shell, shell, "-c", command, (char *) NULL);
    diff --git a/src/main.c b/src/main.c
    index 7b54789..6fb9e7c 100644
    --- a/src/main.c
    +++ b/src/main.c
    @@ -522,6 +522,15 @@ main (int argc, char *argv[])
     #endif /* HAVE_SUBSHELL_SUPPORT */
             mc_prompt = (geteuid () == 0) ? "# " : "$ ";
    
    +    {
    +        struct sigaction ignore;
    +        ignore.sa_handler = SIG_IGN;
    +        sigemptyset (&ignore.sa_mask);
    +        ignore.sa_flags = 0;
    +
    +        sigaction (SIGHUP, &ignore, NULL);
    +    }
    +
         /* Program main loop */
         if (!midnight_shutdown)
             do_nc ();
    
  2. build mc with ncurses (slang seems to handle it properly)
    F="$F -ggdb3"
    ../mc/configure --prefix=$(pwd)/_mc-bin \
                    --with-samba \
                    --with-mcserver \
                    --enable-charset \
                    --enable-extcharset \
                    --enable-maintainer-mode \
                    --with-screen=ncurses \
    &&
    make CFLAGS="$F" &&
    make install -j 3
    
  3. run ./_mc-bin/bin/mc in xterm and close xterm's window
  4. contemplate 100% CPU load

strace log:

select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99993})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99994})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99993})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99993})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99994})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99993})

gdb log:

#0  0x00007f5c76c348f3 in __select_nocancel () from /lib/libc.so.6
#1  0x0000000000442764 in try_channels (set_timeout=1) at ../../../mc/lib/tty/key.c:609
#2  0x00000000004429e9 in getch_with_delay () at ../../../mc/lib/tty/key.c:693
#3  0x00000000004447ba in tty_get_event (event=0x7fff8d16a500, redo_event=0, block=1) at ../../../mc/lib/tty/key.c:1877
#4  0x000000000044839e in frontend_run_dlg (h=0x1d65fd0) at ../../../mc/lib/widget/dialog.c:527
#5  0x000000000044952d in run_dlg (h=0x1d65fd0) at ../../../mc/lib/widget/dialog.c:1145
#6  0x000000000048af11 in create_panels_and_run_mc () at ../../../mc/src/filemanager/midnight.c:883
#7  0x000000000048c538 in do_nc () at ../../../mc/src/filemanager/midnight.c:1606
#8  0x0000000000422e75 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../mc/src/main.c:536

Looks similar, eh?

comment:39 Changed 13 years ago by slavazanko

  • Blocking 2409 removed

comment:40 in reply to: ↑ 26 Changed 13 years ago by zap

But, it's interesting that I don't see much comments here from other mc users...
Am I a magnet for that bug or something? :-)

You're not a magnet, I have experienced the same bug when porting mc to N900, and reported it here:

http://www.midnight-commander.org/ticket/2416

(the bug above also contains debug info, maybe it can shed additional light on the bug). The bug is triggered when you close the terminal without quitting mc first. This will quickly drain your battery, so it's a very serious bug for a smartphone.

but it was marked as a duplicate of this bug. Please use version 4.6.2-pre1-1maemo10 (the latest stable mc port), it's old but doesn't have this (and many other) bugs.

comment:41 Changed 13 years ago by zaytsev

Hi! Is there an N900 emulator or something? We don't have an N900, so you know, it's not easy to fix something if you only have strange backtraces, can't reproduce the bug yourself, and can't check whether it is fixed.

comment:42 follow-up: ↓ 44 Changed 13 years ago by ossi

oh, c'mon, don't be silly. the bug is rather obviously that get_key_code() returns -1 on EOF, which is interpreted as "try again" by getch_with_delay(). or something very similar. i found that after three minutes of just looking over the code, so it can't be that hard to find the problem when you do an actual review.
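To illustrate the point about -1 being overloaded, here is a minimal sketch of a one-byte reader that reports EOF separately from "try again". It is not mc's code: `read_one_key` and the `KEY_READ_*` names are hypothetical, and the real key.c goes through ncurses or slang rather than calling read() directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; the names are not taken from mc. */
enum key_read_status
{
    KEY_READ_OK,      /* one byte was stored in *out */
    KEY_READ_RETRY,   /* transient condition, the caller may poll again */
    KEY_READ_EOF,     /* input is gone for good, the caller should stop polling */
    KEY_READ_ERROR    /* unrecoverable error, errno is set */
};

/* Read a single byte from fd and classify the outcome. */
enum key_read_status
read_one_key (int fd, unsigned char *out)
{
    ssize_t n = read (fd, out, 1);

    if (n == 1)
        return KEY_READ_OK;
    if (n == 0)
        return KEY_READ_EOF;    /* same situation as read(0, "", 1) = 0 in the strace logs */
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return KEY_READ_RETRY;
    return KEY_READ_ERROR;
}
```

With a distinct EOF status, a caller such as an input loop can quit instead of retrying forever, which is the behaviour the diagnosis above says is missing.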

comment:43 Changed 13 years ago by zaytsev

Is this "don't be silly" comment for me? WTF? Go talk like this to someone who appreciates it.

comment:44 in reply to: ↑ 42 Changed 13 years ago by slyfox

Replying to ossi:

oh, c'mon, don't be silly. the bug is rather obviously that get_key_code() returns -1 on EOF, which is interpreted as "try again" by getch_with_delay(). or something very similar. i found that after three minutes of just looking over the code, so it can't be that hard to find the problem when you do an actual review.

That's cool and I agree mc needs a workaround, but I'd also like to know who stole SIGHUP.

It could easily be a flaw in n900 terminal.

comment:45 Changed 13 years ago by ossi

zaytsev: you don't have to appreciate it, as it was an expression of *my* disappreciation for your somewhat unconvincing approach to this problem.

slyfox: good point. otoh, some sources i found indicate that only the session leader (which would be the shell mc was started from) receives the hangup, and propagating it to the children is part of the shell's job control - which can be intentionally suppressed (nohup or disown -h) or could fail, for example, if the shell simply died (which may even be the reason for the terminal exiting in the first place).

so i think we have an explanation and a tentative solution.
ps: merry xmas! :)
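Since SIGHUP may or may not reach mc depending on the shell, a program can treat it as only one of two exit triggers, the other being EOF on stdin. The following is a rough sketch of the signal half only. The function names are made up for illustration and do not come from mc.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t hangup_received = 0;

static void
on_hangup (int signo)
{
    (void) signo;
    hangup_received = 1;
}

/* Install the SIGHUP handler. Returns 0 on success, -1 on failure. */
int
install_hangup_handler (void)
{
    struct sigaction sa;

    sa.sa_handler = on_hangup;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction (SIGHUP, &sa, NULL);
}

/* An event loop would check this once per iteration. */
int
should_quit_on_hangup (void)
{
    return hangup_received != 0;
}
```

If the hangup never arrives (nohup, disown -h, or a dead shell), the flag stays 0, so the loop still needs the EOF check to terminate.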

comment:46 Changed 13 years ago by angel_il

Orthodox Christmas on Jan. 7:)

comment:47 Changed 13 years ago by zaytsev

I am not considering the technical merits of your input, however, I find that you are repeatedly ranging from boorish to plain rude in expressing your opinions. It might be that it's a norm of life to be harsh to each other to pass for a 1337 h4x0r in the other open source communities that you are involved in, or a subtle trait of your personality which gives it such a unique touch, but unfortunately I couldn't care less.

Consequently, by saying that your comments are not appreciated I was trying to politely indicate that maybe you should for once drop your mentor tone and force yourself to try to be a bit more convivial if not friendly. Your attitude is considered to be intimidating not only by me, but also by the other members of the group, therefore if you absolutely want to stick to it, it would be better if you would keep your comments to yourself. Am I making myself clear enough now?

comment:48 Changed 13 years ago by ossi

ilya: accumulate the wishes for later then. :D

zyv: and being "part of the group" allows you to be exactly that? sorry, but the irony of *you* trying to teach me good manners borders on grotesque. have *i* made myself clear enough now?

comment:49 Changed 13 years ago by zaytsev

Using your own words, I don't expect you to appreciate it: if you see nothing wrong with your behavior there's frankly not much that I can do. The allusion to other group members was to keep you from being deluded that I am the only one who finds your tone irritating and hence it is my personal problem.

You are more than welcome to create your own community and express yourself in every imaginable way that you think you should. However, if you want to keep commenting on this specific trac you are expected to behave at least neutrally in the eyes of those whom you are addressing. I think this is a reasonable requirement.

I hope you have enough self esteem not to engage in a follow-up discussion on how I am expected to enforce it.

comment:50 Changed 13 years ago by ossi

I think this is a reasonable requirement.

i just wonder why you think it doesn't apply to you. or do you really not notice how aggressive, sarcastic and plain rude you often are to the bug reporters and sometimes your team mates?
and don't get me wrong: it's my philosophy to be just that towards those who are taxing my patience. but if you do that, you better make damn sure that it's not *you* who is wasting others' time.

comment:51 follow-up: ↓ 52 Changed 13 years ago by zaytsev

I don't think that the policies that I'm advocating for do not equally apply to me, however, to my mind, my own behavior is within the realms of acceptable. You have a track record of tapping on my nerves for more than one year on different occasions. Now you present this as your consistent responses to my aggression, but I can't see how I could have triggered your rudeness to me and others in the first place, when you were initially commenting on messages that were not destined to you directly in any way.

Your usual communication tactics are:

(1) Fish out something from the commit list and reply back something along the lines of "wtf, how this could have even been committed? of course anyone with a slightest clue would have done X and Y instead" (which obviously implies that you have the clue, but the day-to-day routine has to be performed by lower-grade programmers and you will be sending them your directions when they really irritate you with their lack of competence beyond of what you can handle)

(2) Go through the bugs backlog and wonder how come this was not noticed, or that has not been done yet, while for a person with your qualification this would have only taken a few moments. Sometimes, you have brought up points years ago on the mailing list with former developers, and yet nobody fixed it! Of course, anyone should be devoted to fixing things that annoy you in the first place. If one wants to work on mc, one has to be doing what you think is needed.

(3) Chime into a discussion on the trac and declare that the only true way to implement X is by doing Y and Z. Obviously, anyone that dares to disagree with you is an idiot. Moreover, this idiot has to implement it for you the way you want it to, because you have already pointed him the right way.

(4) Whenever once in a while you are in the mood of writing some divine code, you attach it to the ticket saying something like "Fix it!". Roger, sir! Of course, this was a sparkle of humor, which stupid slaves did not get.

So am I really the one that provokes you doing this all the time? Maybe I need to see a therapist.

The explanation I came up with is that the emphasis on your superiority comes naturally without any ulterior motives and you are sincerely surprised by subsequent reactions, but it doesn't make it any more pleasant.

Maybe your close friends that know all your virtues IRL will appreciate you calling them silly, but not me, sorry. And I don't think that my provocations are the reason for your lack of positive communication.

comment:52 in reply to: ↑ 51 Changed 13 years ago by Spinal

Sorry for my 50 cents but...
That really seems ridiculous, reading comments about manners on this tracker.
Is there any update on the bug? Is it going to be fixed?

comment:53 Changed 13 years ago by zap

  1. I can provide a remote SSH session and assistance to any mc developer who wishes to debug the problem on my N900. Contact me via jabber zap#jabber.ozerki.net if you want it.
  2. I will try to debug the problem myself, I'm just not familiar with the inner workings of mc. Now that I know someone must catch SIGHUP, I can set some breakpoints and see what happens.

comment:54 follow-up: ↓ 55 Changed 13 years ago by Shareth

100% way to reproduce it:

  1. open Konsole
  2. sudo bash
  3. mc
  4. close Konsole (Quit from menu or just by pressing X on window).

Works only after sudo but I guess the problem is not superuser itself.

The exact same thing happens if you run 'top' instead of 'mc' - top eats 100% cpu after closing Konsole. On the other hand htop in the same circumstances closes gracefully.

mc --version
GNU Midnight Commander 4.7.0.3

emerge -pv mc
app-misc/mc-4.7.0.3 USE="X edit gpm nls samba -slang"

uname -a
Linux ruf-gentoo 2.6.36-gentoo-r5-1 #1 SMP PREEMPT Thu Dec 23 12:55:11 MSK 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz GenuineIntel GNU/Linux

comment:55 in reply to: ↑ 54 Changed 13 years ago by angel_il

Replying to Shareth:

100% way to reproduce it:

  1. open Konsole
  2. sudo bash
  3. mc
  4. close Konsole (Quit from menu or just by pressing X on window).

Works only after sudo but I guess the problem is not superuser itself.

The exact same thing happens if you run 'top' instead of 'mc' - top eats 100% cpu after closing Konsole. On the other hand htop in the same circumstances closes gracefully.

mc --version
GNU Midnight Commander 4.7.0.3

emerge -pv mc
app-misc/mc-4.7.0.3 USE="X edit gpm nls samba -slang"

uname -a
Linux ruf-gentoo 2.6.36-gentoo-r5-1 #1 SMP PREEMPT Thu Dec 23 12:55:11 MSK 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz GenuineIntel GNU/Linux

Shareth, thanks, I'll try installing Konsole to reproduce it.

comment:56 Changed 12 years ago by sorath

  • Cc torohov_s_a@… added
  • Branch state set to no branch

The bug is still reproduced in Gentoo Linux with mc-4.7.5.2 (current stable), mc-4.7.5.5 and mc-4.8.0 (current masked) releases.

comment:57 Changed 12 years ago by sorath

Sorry, it seems that while adding myself to the Cc list I left the default "Branch state: no branch" option selected, and it changed the branch state to "no branch" (probably from the "on rework" state mentioned above).

comment:58 Changed 12 years ago by andrew_b

  • Milestone changed from 4.7 to Future Releases

comment:59 Changed 12 years ago by petertux

  • Cc petre.rodan@… added

hi

I'm another gentoo user confronted with this bug on both x86 and amd64. mc went into 100%cpu multiple times per day and today I got fed up with it and decided to investigate.

it is extremely easy to replicate: start xterm and mc inside it. close xterm. voila. nothing else is needed.

read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL) = 1 (in [0])
read(0, "", 1) = 0
select(5, [0 4], NUL

mc was built with ncurses and without slang:
[ebuild R ] app-misc/mc-4.7.5.2 USE="X ncurses -edit -gpm -nls -samba -slang" 0 kB
[ebuild R ] sys-libs/ncurses-5.7-r7 USE="unicode -ada -cxx -debug -doc -gpm -minimal -profile -static-libs -trace" 2,388 kB

if I compile it with slang and without ncurses:
[ebuild R ] app-misc/mc-4.7.5.2 USE="X slang -edit -gpm -ncurses -nls -samba" 0 kB

the bug disappears.

comment:60 follow-up: ↓ 61 Changed 11 years ago by ginggs

I am able to reproduce this with MC 4.8.10 on Ubuntu Raring.

Download and unpack the Debian source package for mc/3:4.8.10-2.
Change line 31 of debian/rules from:

--with-screen=slang \

to:

--with-screen=ncurses \

and line 14 of debian/control from:

,libslang2-dev

to:

,libncurses-dev

Build and install the package.
Start gnome-terminal or xterminal.
Run mc as root (sudo mc), bug does not occur as normal user.
Close the terminal window.
Mc process continues to run, consuming 100% CPU.

comment:61 in reply to: ↑ 60 ; follow-up: ↓ 62 Changed 11 years ago by slyfox

Replying to ginggs:

Start gnome-terminal or xterminal.
Run mc as root (sudo mc), bug does not occur as normal user.
Close the terminal window.
Mc process continues to run, consuming 100% CPU.

May I ask you to get a strace log bit of a process when it's in such state?

strace -p $pid -o log

Some lines should be enough to see where we don't handle errors on tty in/out.
And may I ask you to attach to it with gdb and get a backtrace?
This needs debugging symbols.

gdb -p $pid
bt full

Thanks!

comment:62 in reply to: ↑ 61 Changed 11 years ago by Spinal

Replying to slyfox:

Replying to ginggs:

Start gnome-terminal or xterminal.
Run mc as root (sudo mc), bug does not occur as normal user.
Close the terminal window.
Mc process continues to run, consuming 100% CPU.

May I ask you to get a strace log bit of a process when it's in such state?

strace -p $pid -o log

Some lines should be enough to see where we don't handle errors on tty in/out.
And may I ask you to attach to it with gdb and get a backtrace?
This needs debugging symbols.

gdb -p $pid
bt full

Thanks!

Hi, Slyfox.
I got this from mc-4.8.10.

This is gdb output:

GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>.
Attaching to process 2373
Reading symbols from /usr/bin/mc...done.

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /lib/libncursesw.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncursesw.so.5
Reading symbols from /lib/libext2fs.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libext2fs.so.2
Reading symbols from /usr/lib/libgmodule-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgmodule-2.0.so.0
Reading symbols from /usr/lib/libglib-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libglib-2.0.so.0
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/libX11.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libX11.so
Reading symbols from /usr/lib/libxcb.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxcb.so.1
Reading symbols from /usr/lib/libXau.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
0xb770c424 in __kernel_vsyscall ()
(gdb) bt full
#0  0xb770c424 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7480b9d in select () from /lib/libc.so.6
No symbol table info available.
#2  0x08081219 in try_channels (set_timeout=0) at key.c:621
        time_out = {tv_sec = 0, tv_usec = 99999}
        select_set = {fds_bits = {1, 0 <repeats 31 times>}}
        timeptr = <optimized out>
        v = <optimized out>
        maxfdp = <optimized out>
#3  0x080829f7 in getch_with_delay () at key.c:698
        c = <optimized out>
#4  tty_get_event (event=0xbfa582b0, redo_event=0, block=1) at key.c:2133
        c = <optimized out>
        flag = 0
        time_out = {tv_sec = 10, tv_usec = 0}
        time_addr = <optimized out>
        dirty = 1
#5  0x0806663a in frontend_dlg_run (h=0x8333280) at dialog.c:565
        d_key = <optimized out>
        event = {buttons = 0, x = -1, y = 135627276, 
          type = (GPM_MOVE | GPM_DRAG | GPM_DOWN | GPM_TRIPLE | GPM_HARD | unknown: 134632960)}
#6  dlg_run (h=0x8333280) at dialog.c:1252
No locals.
#7  0x0808ad21 in create_panels_and_run_mc () at midnight.c:959
No locals.
#8  do_nc () at midnight.c:1774
        ret = <optimized out>
        midnight_colors = {9, 9, 9, 9, 9}
#9  0x08054890 in main (argc=1, argv=0xbfa584d4) at main.c:397
        error = 0x0
        config_migrated = 0
        config_migrate_msg = 0xbfa58438 "\250\204\245\277\227E;\267\001"
        exit_code = 1

Strace:

read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99996})

comment:63 Changed 11 years ago by ginggs

  • Cc graham@… added

comment:64 Changed 11 years ago by ginggs

My gdb backtrace and strace log look much the same as Spinal's.

comment:65 Changed 8 years ago by ginggs

This issue is still present in mc 4.8.15 (compiled with ncurses).
Tested on Ubuntu 15.10 amd64 with mc 4.8.15-2 from Debian unstable.

comment:66 follow-up: ↓ 68 Changed 8 years ago by and

Can we have a fresh strace, or is the comment:61 strace log up to date?

Looks like we are looping in getch_with_delay() all day long:
mc tries to retrieve the next key, but slang/ncurses never returns a key after resume?

comment:67 Changed 8 years ago by ginggs

backtrace:

(gdb) bt full
#0  0x00007f9b50f53723 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x0000560bab8c7d7e in try_channels (set_timeout=set_timeout@entry=1) at key.c:626
        timeptr = 0x7ffc3a3002b0
        maxfdp = 1
        v = <optimized out>
        time_out = {tv_sec = 0, tv_usec = 99999}
        select_set = {fds_bits = {1, 0 <repeats 15 times>}}
#2  0x0000560bab8c971a in getch_with_delay () at key.c:722
        c = <optimized out>
#3  tty_get_event (event=event@entry=0x7ffc3a300400, redo_event=0, block=block@entry=1) at key.c:2138
        c = <optimized out>
        flag = 0
        ev = {buttons = 0 '\000', modifiers = 0 '\000', vc = 0, dx = 0, dy = 0, x = 0, y = 0, type = (unknown: 0), clicks = 0, margin = (unknown: 0), wdx = 0, wdy = 0}
        time_out = {tv_sec = 94608148554896, tv_usec = 94608148537344}
        time_addr = <optimized out>
        dirty = 1
#4  0x0000560bab8b892b in frontend_dlg_run (h=0x560bad162000) at dialog.c:568
        d_key = <optimized out>
        event = {buttons = 0 '\000', modifiers = 0 '\000', vc = 0, dx = 0, dy = 0, x = -1, y = -21621, 
          type = (GPM_MOVE | GPM_DRAG | GPM_UP | GPM_ENTER | GPM_LEAVE | unknown: 20480), clicks = -1391003136, margin = (GPM_TOP | GPM_BOT | GPM_RGT | unknown: 22016), 
          wdx = -28784, wdy = -21576}
#5  dlg_run (h=0x560bad162000) at dialog.c:1267
No locals.
#6  0x0000560bab8d1126 in create_panels_and_run_mc () at midnight.c:954
No locals.
#7  do_nc () at midnight.c:1757
        ret = <optimized out>
#8  0x0000560bab8aafc9 in main (argc=1, argv=0x7ffc3a300668) at main.c:418
        mcerror = 0x0
        config_migrated = <optimized out>
        config_migrate_msg = 0x0
        exit_code = 1

strace:

read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99998})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99997})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
read(0, "", 1)                          = 0
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
select(5, [0 4], NULL, NULL, {0, 100000}) = 1 (in [0], left {0, 99999})
select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
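The pattern in this trace (select() reporting fd 0 as readable, then read() returning 0) is what a descriptor at end-of-file looks like, so a loop that only retries never blocks. Here is a small sketch of a poll step that stops on EOF. `poll_one_byte` is a hypothetical helper, not a function from mc, and it talks to the descriptor directly instead of going through ncurses.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait up to 100 ms for fd to become readable, then try to read one byte.
 * Returns 1 if a byte was read, 0 if nothing arrived in time, and -1 on
 * EOF or error, in which case the caller should stop polling this fd.
 * For brevity, an interrupted select() (EINTR) is also reported as -1.
 */
int
poll_one_byte (int fd, unsigned char *out)
{
    fd_set readable;
    struct timeval timeout = { 0, 100000 };
    int ready;

    FD_ZERO (&readable);
    FD_SET (fd, &readable);

    ready = select (fd + 1, &readable, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;

    /* select() reports a descriptor at EOF as readable; read() then returns 0. */
    return (read (fd, out, 1) == 1) ? 1 : -1;
}
```

A caller that treats -1 as "quit" would leave the loop on the first zero-length read rather than spinning at 100% CPU.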

comment:68 in reply to: ↑ 66 Changed 8 years ago by ginggs

Replying to and:

Looks like we are looping in getch_with_delay() all day long:
mc tries to retrieve the next key, but slang/ncurses never returns a key after resume?

I don't think suspending and resuming are relevant.

Steps to reproduce:
compile mc --with-screen=ncurses (does not occur with slang)
subshell must be enabled and a suitable shell available (does not occur with busybox)
open a GUI terminal (gnome-terminal, xterminal, konsole is also mentioned)
start mc as root (sudo mc)
close GUI terminal
1 CPU will continue running at 100%

comment:69 Changed 8 years ago by and

thanks ginggs for more information.

With slang, mc exits with:

SLang_getkey returned SLANG_GETKEY_ERROR
Assuming EOF on stdin and exiting

But under ncurses, checking the getch() error condition is complicated.

ncurses getch() can return ERR, which is an error in delay mode but not strictly in no-delay mode.
So stdin EOF may never be signaled by ncurses getch() in no-delay mode (I have no test case to check whether ncurses getch() in no-delay mode returns ERR _and_ sets errno).

The patch handles the ncurses getch() error in delay mode at least, which solves the looping on stdin EOF.

Last edited 8 years ago by and

comment:70 Changed 8 years ago by ginggs

Thanks, andreas!
That patch mc-2244-infinite-loop-when-stdin-fd-got-deleted.patch works for me.

comment:71 Changed 8 years ago by and

#3108 is a duplicate of this

comment:72 Changed 8 years ago by zaytsev

Ticket #3108 has been marked as a duplicate of this ticket.

comment:73 Changed 7 years ago by andrew_b

  • Owner slyfox deleted