Ticket #1627 (closed defect: fixed)

Opened 7 years ago

Last modified 7 years ago

Viewer doesn't support CJK

Reported by: egmont
Owned by: slavazanko
Priority: major
Milestone: 4.7.0-pre4
Component: mcview
Version: 4.7.0-pre2
Keywords:
Cc: galtgendo@…
Blocked By:
Blocking:
Branch state:
Votes for changeset: committed-master

Description

In a fully UTF-8 environment, the built-in viewer doesn't properly display double-width (CJK) characters. (The editor does display them, though.) The locale and mc's display width are set to UTF-8, and the file charset (T) is set to UTF-8 or No translation.

Open any text file that has UTF-8 CJK characters in one of its lines. All but the last of these characters are replaced by a single (not double) space, and the last one is shown correctly (but at the wrong position).

The terminal is the one from Mac OS X 10.5.8; I don't think it matters. I ran mc-4.7-pre2 with S-Lang both on the Mac and on Linux over ssh.

Attachments

mc-widechar-view.patch (478 bytes) - added by mnk 7 years ago.
patch version 1.0

Change History

comment:1 Changed 7 years ago by andrew_b

  • Component changed from mc-core to mcview

comment:2 Changed 7 years ago by andrew_b

Currently, MC supports 7-bit, 8-bit and UTF-8 locales. Multibyte non-UTF-8 locales are not supported. Patches are welcome!

comment:3 follow-up: ↓ 4 Changed 7 years ago by egmont

I *am* talking about UTF-8.

CJK is a well-known abbreviation for Chinese, Japanese and Korean characters. They require special treatment in terminals because they take up 2 character cells. It has nothing to do with encoding.

CJK characters in UTF-8 are supported by mc's two-panel mode: as far as I can see, they are handled correctly in UI strings (mc's Japanese translation, for example), in filenames, on the command line, in dialog boxes, etc. They are also handled correctly (minor bugs aside) by mcedit. mcview seems to be the only component that doesn't support them at all. This is a regression from 4.6 with the UTF-8 patches, which displayed CJK correctly in the viewer.

comment:4 in reply to: ↑ 3 Changed 7 years ago by angel_il

Replying to egmont:

> CJK is a well-known abbreviation for Chinese, Japanese and Korean characters. They require special treatment in terminals because they take up 2 character cells. It has nothing to do with encoding.

You're right, but I think I have already fixed it in the current "master", haven't I?

comment:5 Changed 7 years ago by egmont

Current master produces the same buggy behavior for me.

comment:6 Changed 7 years ago by mnk

I really hate regressions like this.
I complained about this problem several months ago, and it got fixed then.
But whoever rewrote the viewer broke it again.

I'm in the process of getting reacquainted with mc (I've been out of touch lately),
but the outlook is good.
Perhaps I'll have a correct patch soon.

Changed 7 years ago by mnk

patch version 1.0

comment:7 Changed 7 years ago by mnk

v.1.0 mostly works; there are 2 issues:

  • floating wrapping: if you scroll through long CJK text (meaning nearly no spaces, almost entirely double-width characters), you'll see the line break in the first line move as you scroll (hard to describe; see for yourself). IIRC, it was like that even before the regression.

  • 4096-byte break: every 4096 bytes there's a chance that the boundary falls in the middle of a UTF-8 character (I'm not sure whether this can also happen with non-double-width characters, and I don't yet know where this number comes from), causing valid character(s) to be treated as unprintable.

comment:8 Changed 7 years ago by mnk

Well, the second issue seems to come from mcview_file_load_data,
but short of moving the distinction between UTF-8 and single-byte text from the
display stage to the data-loading stage, I can't see how to fix it.

comment:9 Changed 7 years ago by mnk

  • Cc galtgendo@… added

comment:10 Changed 7 years ago by mnk

v.1.0 only touches src/viewer/plain.c, but
src/viewer/nroff.c may need the same fix.

comment:11 Changed 7 years ago by angel_il

branch: 1627_widechar_in_viewer

  • 4096 bytes break: every 4096 bytes there's a chance it happens in the middle...

I need to think about this...

comment:12 Changed 7 years ago by angel_il

comment:13 Changed 7 years ago by angel_il

  • Status changed from new to accepted
  • Owner set to angel_il

comment:14 Changed 7 years ago by mnk

A good example of the floating wrapping can be seen
when viewing ftp://ftp.monash.edu.au/pub/nihongo/radkfile.gz
(after uncompressing it locally).
As you scroll down, you'll see the break in the first line move.

You'll see the 4096 bug there too.

comment:15 Changed 7 years ago by mnk

And as for your fix, are you sure zero-width characters
won't be a problem? (OK, I'm not sure which characters those are
or whether they're printable; I hope you do.)

comment:16 Changed 7 years ago by angel_il

  • Milestone changed from 4.7 to 4.7.0-pre4

comment:17 Changed 7 years ago by angel_il

  • severity changed from no branch to on review

comment:18 Changed 7 years ago by angel_il

nroff: b0c06ef13fbb559a16218241d3327490d08c2a4d

Other known problems should be fixed in #1730.

comment:19 Changed 7 years ago by andrew_b

  • Votes for changeset set to andrew_b

comment:22 Changed 7 years ago by slavazanko

  • Votes for changeset changed from andrew_b to andrew_b slavazanko
  • severity changed from on review to approved

comment:23 Changed 7 years ago by angel_il

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b slavazanko to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged

comment:24 Changed 7 years ago by angel_il

  • Status changed from testing to closed

comment:25 Changed 7 years ago by slavazanko

  • Status changed from closed to reopened
  • Votes for changeset commited-master deleted
  • Resolution fixed deleted
  • severity changed from merged to no branch

src/glibcompat.c has incorrect code.

comment:26 Changed 7 years ago by slavazanko

  • Status changed from reopened to accepted
  • Owner changed from angel_il to slavazanko
  • severity changed from no branch to on review

Created branch 1627_glib_macros_fix

Initial changeset: 7f056d01edf85b0790ed0cfe748d24d0ca904e18

Review, please.

comment:27 Changed 7 years ago by slavazanko

comment:28 Changed 7 years ago by angel_il

  • Votes for changeset set to angel_il

comment:29 Changed 7 years ago by andrew_b

  • Votes for changeset changed from angel_il to angel_il andrew_b
  • severity changed from on review to approved

comment:30 Changed 7 years ago by slavazanko

  • Status changed from accepted to testing
  • Votes for changeset changed from angel_il andrew_b to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged

comment:31 Changed 7 years ago by slavazanko

  • Status changed from testing to closed