Ticket #2283 (new defect) — at Version 4

Opened 14 years ago

Last modified 9 years ago

mcview scrolling issues with heavy utf-8 files

Reported by: egmont Owned by:
Priority: major Milestone: 4.8.14
Component: mcview Version: 4.7.3
Keywords: Cc:
Blocked By: #2132 Blocking:
Branch state: no branch Votes for changeset:

Description (last modified by andrew_b)

wget http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
mcview UTF-8-demo.txt
Scroll up and down with the arrow (or pgup/pgdn) keys. Very often a partial line appears on the top row, and you have to press the arrow key two or more times to actually scroll by one line.
This happens when the topmost line being scrolled in or out contains lots of non-ASCII characters. More precisely, I believe it occurs exactly when the number of bytes forming the topmost row is greater than the terminal's width.
Buggy in both 4.7.3 and 4.7.0.7, in a fully UTF-8 environment.

Change History

Changed 14 years ago by egmont

Screenshot - though experiencing the behavior is much more useful

comment:1 Changed 14 years ago by egmont

Note: the bug only happens when word wrapping is enabled (that is, you see 2UnWrap in the button bar), and it happens even when the terminal is wider than the file.

comment:2 follow-up: ↓ 3 Changed 14 years ago by egmont

I'm looking at mc-4.7.0.7. Here the bug is in src/viewer/move.c, in the view->text_wrap_mode branches of the mcview_move_up() and mcview_move_down() functions. The logic that modifies col (e.g. col += width, col -= width) assumes that width and byte length are the same notion (because col actually means an offset in the file), and hence does not handle UTF-8 or CJK (double-width) characters correctly.

I don't see what the best solution would be; someone more familiar with mc's utf8/width functions could probably fix it much faster than I could.
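
To illustrate the mismatch described in this comment, here is a minimal sketch in C of advancing a file offset by display columns rather than by bytes. It is not mc's code: utf8_seq_len(), utf8_decode(), toy_width() and advance_columns() are hypothetical names, the width table is deliberately crude, and a real fix would presumably rely on wcwidth() or on mc's own string helpers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of the UTF-8 sequence that starts with lead byte c.
   Invalid lead bytes count as 1 so the caller always makes progress. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Decode one code point of len bytes (continuation bytes are not validated). */
uint32_t utf8_decode(const unsigned char *s, size_t len)
{
    if (len == 1) return s[0];
    uint32_t cp = s[0] & (0xFF >> (len + 1));
    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);
    return cp;
}

/* Hypothetical, very rough width table: 0 for combining diacriticals,
   2 for a few CJK/fullwidth ranges, 1 for everything else. */
int toy_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) || (cp >= 0x2E80 && cp <= 0x9FFF) ||
        (cp >= 0xAC00 && cp <= 0xD7A3) || (cp >= 0xFF01 && cp <= 0xFF60))
        return 2;
    return 1;
}

/* Starting at byte offset off, consume characters until adding the next one
   would exceed cols display columns. Returns the new byte offset. */
size_t advance_columns(const char *buf, size_t size, size_t off, int cols)
{
    int used = 0;
    while (off < size) {
        const unsigned char *p = (const unsigned char *) buf + off;
        size_t len = utf8_seq_len(*p);
        if (len > size - off)
            len = 1;            /* truncated sequence: treat as a single byte */
        int w = toy_width(utf8_decode(p, len));
        if (used + w > cols)
            break;
        used += w;
        off += len;
    }
    return off;
}
```

With a helper of this shape, a wrapped row of width columns would end at advance_columns(buf, size, col, width) rather than at col + width, so a row full of two-byte or three-byte characters no longer gets split partway through.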

comment:3 in reply to: ↑ 2 Changed 14 years ago by andrew_b

  • Blocked By 2132 added

Replying to egmont:

The logic that modifies col (e.g. col += width, col -= width) assumes that width and byte length are the same notion (because col actually means an offset in the file), and hence does not handle UTF-8 or CJK (double-width) characters correctly.

Yes, this is a known issue. At least ticket #2132 requires such a fix.

comment:4 Changed 14 years ago by andrew_b

  • Component changed from mc-core to mcview
  • Description modified