Ticket #2396 (closed defect: fixed)
Find File "Whole words" search bug
| Reported by: | x906 | Owned by: | slavazanko |
|---|---|---|---|
| Priority: | critical | Milestone: | 4.7.5 |
| Component: | mc-search | Version: | master |
| Keywords: | | Cc: | gotar@…, zyv, shirsch |
| Blocked By: | | Blocking: | |
| Branch state: | | Votes for changeset: | committed-master committed-stable |
Description
When searching in files for a non-English word with "Whole words" set to "on", nothing is found.
Try searching for the word "время" and also "time" in the attached file.
mc ver: 4.7.4-90-g1e265ea
Attachments
Change History
comment:1 follow-up: ↓ 3 Changed 3 years ago by gotar
- Cc gotar@… added
Works fine with Polish diacritics and the ISO-8859-2 locale (LC_CTYPE to be exact). Are you using UTF-8? Try with KOI8-R.
comment:4 follow-ups: ↓ 5 ↓ 6 Changed 3 years ago by andrew_b
- Priority changed from major to critical
- Version changed from 4.7.4 to master
It seems we have a global bug in the search engine. For me, searching for whole non-ASCII words (Cyrillic, for example) doesn't work at all: not in files, not in the editor, and not in the viewer.
comment:6 in reply to: ↑ 4 ; follow-up: ↓ 7 Changed 3 years ago by gotar
x905 - I suspected it may be UTF-8 related, but apparently it's not
andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)
comment:7 in reply to: ↑ 6 Changed 3 years ago by andrew_b
Replying to gotar:
andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)
It doesn't work with Russian Cyrillic words in either KOI8-R or UTF-8 (for example, "время" in the ticket text). Moreover, searching with the "\bвремя\b" regular expression ("Regular expression" on, "Whole words" off) also doesn't find anything.
comment:8 follow-ups: ↓ 9 ↓ 17 Changed 3 years ago by slavazanko
As I found, non-Latin characters are not treated as word characters: http://www.regular-expressions.info/wordboundaries.html
"In all flavors, the characters [a-zA-Z0-9_] are word characters"
I don't know how to fix this, sorry :(
gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and with their uppercase counterparts)?
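To illustrate the rule quoted above: if an engine really treats only [a-zA-Z0-9_] as word characters, then every byte of a UTF-8 Cyrillic word is a non-word character, and a standalone word such as "время" has no \b position on either side. The C sketch below only demonstrates that assumption. The helpers is_ascii_word_char and count_word_boundaries are hypothetical names made up for this illustration, not functions from mc or from any regex library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the ASCII-only word class [a-zA-Z0-9_]. */
static bool is_ascii_word_char(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Hypothetical helper: count the positions where a \b-style boundary
 * would sit, i.e. every switch between word and non-word bytes.
 * The positions before the start and after the end of the string
 * are treated as non-word. */
static size_t count_word_boundaries(const char *s)
{
    size_t count = 0;
    bool prev = false;

    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        bool cur = is_ascii_word_char((unsigned char) s[i]);
        if (cur != prev)
            count++;
        prev = cur;
    }
    if (prev)
        count++;   /* boundary at the end of the string */
    return count;
}
```

Under this assumption, "time" has two boundaries (start and end), while the UTF-8 bytes of "время" produce none, so a pattern like \bвремя\b has nothing to anchor to.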
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 3 years ago by x905
Replying to slavazanko:
I don't know how to fix this, sorry :(
maybe look at source of grep?
this works: grep -iw "время" ./f1
comment:10 in reply to: ↑ 9 Changed 3 years ago by andrew_b
Replying to x905:
maybe look at source of grep?
Yes :). We found a solution in grep:
static char const word_beg[] = "(^|[^[:alnum:]_])(";
static char const word_end[] = ")([^[:alnum:]_]|$)";
This works for me in KOI8-R.
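As a rough sketch of how the two grep strings above could be used, the following C code wraps a search pattern between word_beg and word_end and matches it with the POSIX regex API (regcomp/regexec). The helper name matches_whole_word and the fixed 256-byte buffer are assumptions made for this illustration; this is not the code from grep or from the mc branch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static char const word_beg[] = "(^|[^[:alnum:]_])(";
static char const word_end[] = ")([^[:alnum:]_]|$)";

/* Hypothetical helper: true if `pattern` occurs in `text` as a whole
 * word, i.e. delimited by a non-word character or a string edge. */
static bool matches_whole_word(const char *pattern, const char *text)
{
    char wrapped[256];   /* illustrative fixed size */
    regex_t re;
    bool found;
    int n;

    assert(pattern != NULL && text != NULL);

    n = snprintf(wrapped, sizeof wrapped, "%s%s%s",
                 word_beg, pattern, word_end);
    if (n < 0 || (size_t) n >= sizeof wrapped)
        return false;    /* pattern too long for this sketch */

    if (regcomp(&re, wrapped, REG_EXTENDED | REG_NOSUB) != 0)
        return false;    /* invalid pattern */

    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With ASCII input, matches_whole_word("time", "what time is it") succeeds and matches_whole_word("time", "sometimes") does not. What [[:alnum:]] covers for non-ASCII text depends on LC_CTYPE, so a real program would presumably call setlocale() first; that part is not exercised here.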
comment:11 Changed 3 years ago by slavazanko
- Status changed from new to accepted
- Keywords stable-candidate added
- severity changed from no branch to on review
- Owner set to slavazanko
- Milestone changed from 4.7 to 4.7.5
Created branch 2396_find_whole_words (parent: master)
changeset:c859e906d0bd6e91b1e52a8002c98de043dcb817
All other changesets are typo fixes and code refactoring:
- 466b34b8fb1ebf30e19dc30ccd75c85623f85aef: Code cleanup for avoid compiler warnings
- 4af1277e4fc7e093de1b72ac2d5fa02bd4c30413: Fixed bit operations in mc_search_regexprocess_append_str()
- 2793a312492079fe99ad56bfeb81802886a1a5d6: Removed mc_search_cond_t->len (used mc_search_cond_t->str->len instead).
- 100df42d9578af79ce4b80d5a6572d9a81fbb683: Avoid extra-allocation of string while prepare to regexp-search.
Review, please.
x905: thanks for tip :)
comment:12 follow-up: ↓ 15 Changed 3 years ago by x905
Doesn't work: in the newly attached file (f2), mc finds all of lines 1-6,
but grep also fails on line 6 :(
mc 4.7.4-103-g4e2ffca
Also found another bug in this version: when pressing F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"
comment:13 Changed 3 years ago by slavazanko
See new start changeset:f46302b2651bee6246f9f3349cdbbb67144fb284
Bug should be fixed :)
Review branch again, please.
comment:14 Changed 3 years ago by x905
Better, but not complete: the line "6. невремя" is still in the search results
(4.7.4-102-g02acc44)
comment:15 in reply to: ↑ 12 Changed 3 years ago by andrew_b
Replying to x905:
Also found another bug in this version: when pressing F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"
Did you run new mc binary with old mc environment, i.e. without installation? If yes, this is not a bug. Some files changed their locations (#1424):
Install help files into /usr/share/mc/help instead of /usr/share/mc. Install hint files into /usr/share/mc/hints instead of /usr/share/mc.
comment:16 Changed 3 years ago by x905
Yes, the help problem is my fault: I ran sudo make install, but had other instances of mc running.
comment:17 in reply to: ↑ 8 Changed 3 years ago by gotar
Replying to slavazanko:
As I found, non-Latin characters are not treated as word characters: http://www.regular-expressions.info/wordboundaries.html
"In all flavors, the characters [a-zA-Z0-9_] are word characters"
gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and with their uppercase counterparts)?
Yes and no:
yes - it does find my string (in 'find file', not mcedit or mcview)
no - it treats every letter as a separate word (i.e. despite 'Search whole words', any substring of the example 'ąćęśłżźńó' is found, which means that all the characters are treated as word boundaries).
comment:18 Changed 3 years ago by slavazanko
Okay, check branch now. I have changed regexp for emulating '\b' behaviour (changeset:d5aa913edffc824075c72bcdd6411657df91f347). Hope this helps...
Review again, please.
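For readers wondering what emulating '\b' can look like without relying on the regex engine's own word class, here is a minimal C sketch. It is not the code from the changeset above. It assumes, purely for illustration, that ASCII letters, digits, '_' and every byte >= 0x80 (any part of a UTF-8 multi-byte sequence) count as word characters; is_word_byte and find_whole_word are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption of this sketch: any non-ASCII byte belongs to a word. */
static bool is_word_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_' || c >= 0x80;
}

/* Hypothetical helper: return a pointer to the first occurrence of
 * `word` in `text` that has no word byte directly before or after
 * it, or NULL if there is no such occurrence. */
static const char *find_whole_word(const char *text, const char *word)
{
    size_t len;
    const char *p;

    assert(text != NULL && word != NULL);
    len = strlen(word);
    if (len == 0)
        return NULL;

    p = text;
    while ((p = strstr(p, word)) != NULL) {
        bool left_ok = (p == text)
            || !is_word_byte((unsigned char) p[-1]);
        bool right_ok = (p[len] == '\0')
            || !is_word_byte((unsigned char) p[len]);
        if (left_ok && right_ok)
            return p;
        p++;   /* keep searching after this rejected occurrence */
    }
    return NULL;
}
```

With this check, "время" is found in a line such as "1. время" but not in "6. невремя", because the byte before the match belongs to the preceding letter. The byte >= 0x80 rule is a deliberate simplification: it also treats non-ASCII punctuation as part of a word.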
comment:19 Changed 3 years ago by andrew_b
- Votes for changeset set to andrew_b
Fine. This works for me.
comment:20 Changed 3 years ago by x905
works
comment:21 Changed 3 years ago by angel_il
- Votes for changeset changed from andrew_b to andrew_b angel_il
comment:23 Changed 3 years ago by andrew_b
Don't forget to update the po files after merging to master.
comment:24 Changed 3 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b angel_il to commited-master
- Resolution set to fixed
- severity changed from approved to merged
Merged to master: b60f00df0d8d1d52840ad81ed6529672957d555c
Updated *.po files: 5bf5dd170e00a4dc8e3fbf862dfbc23c1774a79e
comment:26 Changed 3 years ago by shirsch
- Status changed from testing to reopened
- Resolution fixed deleted
Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When not working, it can find nothing.
comment:27 follow-up: ↓ 29 Changed 3 years ago by slavazanko
- Cc shirsch added
- Status changed from reopened to closed
- Resolution set to invalid
Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When not working, it can find nothing.
Stop reopening! :)
The bug was fixed in our 'master' branch (in the repository) and the fix will be included in an upcoming release (4.7.0.10 in your case). Just wait for the new version. ;)
comment:28 Changed 3 years ago by slavazanko
Cherry-picked in stable branch:
comment:29 in reply to: ↑ 27 ; follow-up: ↓ 31 Changed 3 years ago by gotar
comment:30 Changed 3 years ago by andrew_b
- Keywords stable-candidate removed
- Status changed from closed to reopened
- Votes for changeset changed from commited-master to committed-master committed-stable
- Resolution invalid deleted
comment:31 in reply to: ↑ 29 Changed 3 years ago by andrew_b
- Status changed from reopened to closed
- Resolution set to fixed
