Ticket #3694 (closed defect: fixed)

Opened 8 years ago

Last modified 7 years ago

Fix hex pattern parsing

Reported by: mooffie Owned by: andrew_b
Priority: minor Milestone: 4.8.19
Component: mc-search Version: master
Keywords: Cc:
Blocked By: Blocking: #400, #3589
Branch state: merged Votes for changeset: committed-master

Description

There are a few bugs in the way hexadecimal search patterns are parsed.

Worse: errors in patterns aren't reported to the user. So, currently, hex searching is like shooting in the dark; when MC informs you "Search string not found" you can't know if it's because the data really isn't there or because you have an error in your pattern syntax.

This series of patches fixes this.

Tests are included, so it's safe to commit this.

Change History

comment:1 Changed 8 years ago by mooffie

  • Blocking 3695 added

comment:2 Changed 8 years ago by andrew_b

  • Blocking 400 added

comment:3 Changed 7 years ago by andrew_b

Thanks for patches!

I have some remarks.

  • 3694-0002-Hex-patterns-report-errors-to-the-user.patch​ added:

val is unsigned actually (see [8a0a49c07e0725891db89b773b3e4a6c9974ed03]. We don't want the merge conflict here). The MC_SEARCH_HEX_E_NUM_OUT_OF_RANGE message should be concordant with that.

  • 3694-0004-Hex-patterns-handle-invalid-characters.patch​:
    +        if (g_ascii_isspace (tmp_str[loop]))
    +        {
    +            /* Eat-up whitespace between tokens. */
    +            while (g_ascii_isspace (tmp_str[loop]))
    +                loop++;
    +        }
    

if is unneeded: this case is covered by first run of while.

  • 3694-0005-Hex-patterns-fix-quotes-handling.patch​:

The refactoring of loop in mc_search__hex_translate_to_regex() should be moved to separate patch.

-    g_string_ascii_down (mc_search_cond->str);

Why?

  • 3694-0009-Hex-patterns-fix-manual-page.patch​:

See above: hex value is unsigned.

comment:4 Changed 7 years ago by andrew_b

ping?

comment:5 follow-up: ↓ 6 Changed 7 years ago by mooffie

Sorry for not replying earlier. I'll tend to this issue soon.

Do you mind if I post a link to a branch on my github account instead of uploading files here? That's because I may want to insert an extra patch(1) and I don't see a way to delete files here.

(1) Probably at the beginning, to fix the documentation (man page) at step zero, because of @Andreas' removing support for negative numbers (which is a good thing).

comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 7 years ago by andrew_b

Replying to mooffie:

I don't see a way to delete files here.

Do you have the "Delete attachment" button at the bottom of attachment page? I no, probably you don't have permissions to do that. I can delete all your attachments in this ticket for you.

(1) Probably at the beginning, to fix the documentation (man page) at step zero, because of @Andreas' removing support for negative numbers (which is a good thing).

Probably, the 3693_cleanup branch is correct place for that.

comment:7 in reply to: ↑ 6 ; follow-up: ↓ 8 Changed 7 years ago by mooffie

Replying to andrew_b:

Replying to mooffie:

(1) Probably at the beginning, to fix the documentation (man page) at step zero, because of @Andreas' removing support for negative numbers (which is a good thing).

Probably, the 3693_cleanup branch is correct place for that.


Ok, I added a patch there (#3693). I'll wait till you approve that patch before I proceed.

(I included in that patch *all* the documentation fixes, as they're independent of the code fixes presented here. The reason I couldn't just remove the "-1 is converted to 0xFF" sentence, in that patch, is because I would've then left the example without a "bare" number.)

comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 7 years ago by andrew_b

Replying to mooffie:

Ok, I added a patch there (#3693). I'll wait till you approve that patch before I proceed.

I've added your patch to the 3693_cleanup branch. But then the 3693_cleanup branch must be merged into master one.
Could you please review (or just glance over) the 3693_cleanup branch?

Changed 7 years ago by mooffie

Changed 7 years ago by mooffie

Changed 7 years ago by mooffie

Changed 7 years ago by mooffie

(This patch, the n'th, is no longer necessary (was committed in #3693 instead).)

comment:9 in reply to: ↑ 8 Changed 7 years ago by mooffie

Ok, the patches were re-rolled.

Replying to andrew_b:

the 3693_cleanup branch must be merged into master one.
Could you please review (or just glance over) the 3693_cleanup branch?


I will do it this evening. I'll reply there if I see something amiss.

3694-0005-Hex-patterns-fix-quotes-handling.patch​:

-    g_string_ascii_down (mc_search_cond->str);

Why?


First: that line makes the pattern "aBc" match abc instead of aBc. The man page says "Text in quotes is matched exactly after removing the quotes" (Although proper case support for hex search will be done in #3589).

Second: that line was introduced in #2705 to make "0X" interchangeable with "0x". This concern is handled in the patch that follows ("clean up handling of 0x prefixes").

(BTW, the unit test covers both issues.)

3694-0005-Hex-patterns-fix-quotes-handling.patch​:

The refactoring of loop in mc_search__hex_translate_to_regex() should be moved to separate patch


There's no refactoring there. It's a bug fix: the escape character, '\', wasn't handled correctly: it must be excluded from the result string. The unit test covers this issue.

(It wasn't the only quote-related bug there: the index passed to g_string_append_len() was wrong.)

3694-0004-Hex-patterns-handle-invalid-characters.patch​:

+        if (g_ascii_isspace (tmp_str[loop]))
+        {
+            /* Eat-up whitespace between tokens. */
+            while (g_ascii_isspace (tmp_str[loop]))
+                loop++;
+        }

if is unneeded: this case is covered by first run of while.


This check is not redundant. (Thankfully, the tests will detect the bug that would ensue if one were to remove it.)

The current code is as follows:

while (loop < str_len && !error)
{
  if (try_to_read_something()) {
    ...
  }
  else if (try_to_read_something_else()) {
    ...
  }
  else if (try_to_read_something_else_still()) {
    ...
  }
  else if (try_to_read_something_else_yet()) {
    ...
  }
  else
    error = INPUT_ERROR;
}

This is an easy to understand structure, and relatively resistant to errors. What you want me to do is change it to:

while (loop < str_len && !error)
{
  skip_whitespace();

  if (try_to_read_something()) {
    ...
  }
  else if (try_to_read_something_else()) {
    ...
  }
  else if (try_to_read_something_else_still()) {
    ...
  }
  else
    if (!we_are_past_end_of_string)
      error = INPUT_ERROR;
}

Note how this makes the code less elegant. First, we duplicate the check for the end of the string (for the case it ends in whitespace). Second, whenever we modify the code in the future we'll have to be careful that all the try_to_read_something_*() handle the NUL case properly.

So, if you want me to change it, ask again and I will, though I don't see the benefit.

comment:10 follow-up: ↓ 11 Changed 7 years ago by mooffie

Interestinly, I now see that sscanf() knows to handle the "0x" and "0X" prefixes itself. The standard confirms this (via the link to strtoul). So I wonder why #2705 introduced the removal of these prefixes. (@Andrew: do you happen to recall why you did this? Perhaps there exist libc's that don't do this?)

This means that we can simply remove the 5 lines that handle the prefixes. The tests all pass.

(I'll later have a look in K&R's book to see how they document scanf(), to make sure this is how it behaves in all systems.)

comment:11 in reply to: ↑ 10 Changed 7 years ago by mooffie

Replying to mooffie:

(I'll later have a look in K&R's book to see how they document scanf(), to make sure this is how it behaves in all systems.)


Ok, that book documents "%x" as "hexadecimal integer (with or without leading 0x or 0X)". I'm attaching a patch to remove this explicit prefix handling.

comment:12 Changed 7 years ago by zaytsev

OMG, it's awesome, and of all things, with unit tests!!!11

Sorry, I can't have a look right now, because I'm very stressed out, among other things due to the fallout from this [1], but I still want to let you know somehow that what you're doing is very much appreciated.

[1]: https://mail.python.org/pipermail/pypy-dev/2016-October/014726.html

comment:13 Changed 7 years ago by mooffie

Note: because of #3720 one can't see search error messages (e.g. "Number out of range") in the viewer. So you should test them in the editor instead.

comment:14 Changed 7 years ago by andrew_b

  • Blocking 3589 added; 3695 removed

comment:15 Changed 7 years ago by andrew_b

  • Owner set to andrew_b
  • Status changed from new to accepted
  • Votes for changeset set to andrew_b
  • Branch state changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.19

Branch: 3694_hex_pattern_parsing
Initial changeset:bef3a8a8a8c283697b7fecf0645626ba0e504fd9

comment:16 Changed 7 years ago by andrew_b

  • Branch state changed from on review to approved

comment:17 Changed 7 years ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

Merged to master: [92b7acf7bfc2e29faf7d5c1169c895155e21b319].

git log pretty=oneline 4095a04..92b7acf

comment:18 Changed 7 years ago by andrew_b

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.