quoting issues with [[ string =~ re ]]
Obviously, the easiest (and I'd argue cleanest) resolution is to adopt the bash31/zsh approach instead.
You may also want to consider adding support for PCREs instead of EREs in the future (as zsh does with the rematchpcre option; PCREs are the de facto regex standard these days). And with the bash32+ approach, escaping correctly could become tricky.
ksh93 behaves a bit like bash32+, but quoting with quotes works differently from quoting with backslashes, and quotes only disable some RE operators ([[ a =~ ".+" ]] matches there, but neither [[ a =~ \.\+ ]] nor [[ a = "a*" ]] does).
Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors. To support ksh-like handling of | and parentheses, however, I need to implement the quirky syntax parser that treats them as normal word characters. *sigh*
Reply To magicant
> Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors.
Note that [[ =~ ]] comes from bash, not ksh. ksh93 added it later, but it's unfinished and pretty bogus there as mentioned above. ksh88, pdksh and all its derivatives don't have it.
I still need more time to learn what bash's and ksh's parsers are doing to handle special characters after the =~ token.
bash5.0  ksh2020  zsh5.8  yash2.50
0        0        SE      SE        [[ a =~ a|b ]]
0        0        SE      SE        [[ a =~ |a|b ]]
0        0        SE      SE        [[ a =~ a|| ]]
0        0        SE      SE        [[ a =~ ||a ]]
0        0        0       SE        [[ a =~ (a) ]]
0        0        0       SE        [[ a =~ (((a))) ]]
1        1        1       0         [[ a =~ "<" ]]
1        1        0       1         [[ a =~ "a|b" ]]
0        0        SE      0         [[ \\ =~ \\ ]]
1        1        0       1         [[ \\ =~ \\\\ ]]
1        1        0       1         [[ a =~ \(a\) ]]
1        1        0       1         [[ a =~ \.\+ ]]
1        1        0       1         [[ a =~ "a*" ]]
0        0        0       0         [[ a =~ a$ ]]
0        0        0       0         [[ z =~ [[:alpha:]] ]]
0        0        0       0         [[ \(\) =~ \(\) ]]
0        0        0       0         [[ \| =~ \| ]]
0        0        0       0         [[ aaa =~ a{3} ]]
1        0        0       1         [[ a =~ ".+" ]]
(SE stands for syntax error)
bash5.0  ksh2020  zsh5.8  yash2.50
0        0        0       0         [[ 2 =~ $((1+1)) ]]
0        0        0       0         [[ a =~ `echo a` ]]
0        0        0       0         [[ a =~ `echo "a|b"` ]]
0        0        0       0         v=a; [[ a =~ ${v} ]]
0        0        0       0         s=\*; [[ abc =~ ab${s}c ]]
0        0        0       0         s=\|; [[ a =~ a${s}b ]]
1        1        1       1         s="\|"; [[ a =~ a${s}b ]]
0        0        0       0         e=\(a\|b\); [[ a =~ ${e} ]]
0        0        0       0         v=a; [[ a =~ "${v}" ]]
1        1        0       1         s=\*; [[ abc =~ "ab${s}c" ]]
1        1        0       1         s=\|; [[ abc =~ "a${s}b" ]]
1        1        1       1         s="\|"; [[ abc =~ "a${s}b" ]]
1        1        0       1         e=\(a\|b\); [[ a =~ "${e}" ]]
Reply To magicant
> bash5.0 ksh2020 zsh5.8 yash2.50
Note that ksh2020 (based on ksh93v-) development has been abandoned (and it was very buggy). For a version of ksh93 that is still maintained in the open, you can have a look at https://github.com/ksh93/ksh (based on ksh93u+). It is unlikely to make a difference for your test cases, though.
In any case, yes, for a [[ =~ ]] portable to all shells that have it, at the moment, the only viable option is to store the regexp in a variable and use [[ $subject =~ $regexp ]] (with $regexp unquoted).
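As a rough illustration of that portable form, here is a minimal sketch. The helper name re_match and the sample regexp are made up for this example; the only point is that the regexp lives in a variable and is expanded unquoted to the right of =~.

```shell
# Sketch only: re_match is a hypothetical helper, not part of any shell.
# Assumes a shell that has [[ =~ ]] (bash, ksh93, zsh, or yash 2.48+).
re_match() {
    # $1: subject, $2: ERE. $2 is deliberately left unquoted so that
    # every shell treats its contents as regexp operators.
    [[ $1 =~ $2 ]]
}

regexp='^(a|b)+$'
for subject in abba abc; do
    if re_match "$subject" "$regexp"; then
        echo "$subject: match"
    else
        echo "$subject: no match"
    fi
done
```

Since ( ) and | never appear literally in the [[ ... ]] command itself, the parser differences listed in the tables above do not come into play.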
Fixed the ( ) and | issue in r4151, but the escaping issue still remains.
bash5.0  ksh2020  zsh5.8  yash2.50
0        0        0       0         [[ b =~ [a"-"c] ]]
1        0        1       1         [[ b = [a"-"c] ]]
1        1        1       1         [[ - =~ [a"-"c] ]]
0        1        0       0         [[ - = [a"-"c] ]]
1        1        1       0         [[ \\ =~ ["."] ]]
1        1        1       1         [[ \\ = ["."] ]]
0        1        0       1         [[ \\ =~ [a[.\\.]c] ]]
1        1        1       0         [[ \\ = [a[.\\.]c] ]]
0        0        0       0         [[ a] =~ ^[a"]"]$ ]]
1        1        1       1         [[ a] = [a"]"] ]]
0        0        0       0         [[ [a] =~ "["a] ]]
0        0        0       0         [[ [a] = "["a] ]]
2.48 introduced a Korn-style [[...]] construct. For the =~ operator, I see that the bash32+ approach, as opposed to the bash31/zsh one, was chosen with regard to quoting.
[[ a =~ . ]] does match as it does in zsh.
But:
[[ a =~ '.' ]] doesn't match, because quotes remove the special meaning of regex operators.
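To make the bash32+ rule concrete, here is a small sketch written for bash 3.2 or later with default compatibility settings (other shells may differ, which is the subject of this ticket). The function names are made up for the example.

```shell
# Sketch only, assuming bash 3.2+; dot_unquoted and dot_quoted are
# hypothetical helper names.
dot_unquoted() { [[ $1 =~ ^a.c$ ]]; }    # . is a regexp operator here
dot_quoted()   { [[ $1 =~ ^a"."c$ ]]; }  # the quoted . is taken literally

for s in abc a.c; do
    if dot_unquoted "$s"; then echo "unquoted: $s matches"; fi
    if dot_quoted "$s"; then echo "quoted:   $s matches"; fi
done
```

In bash, the unquoted form matches both abc and a.c, while the quoted form matches only a.c. Under the bash31/zsh approach the quotes would make no difference to the regexp.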
Now, a problem with that is that (, ), and | are regex operators but cannot appear in a normal shell word. At the moment in yash:
works (like in zsh) where || is the "OR" token inside [[...]]
But you can't use the | ERE operator:
Same as in zsh, but in zsh, like in bash3.1, you'd write [[ a =~ 'a|b' ]]; that doesn't work in yash because the quotes remove the special meaning of |.
In zsh, [[ a =~ (a|b) ]] works because (a|b) is the same syntax as the (a|b) glob operator (specific to zsh, ksh has @(a|b) instead).
There's a similar problem with ( and ):
yash also has the same bug (actually worse) as bash originally had in that, to remove the special meaning of re operators, it escapes them with \ before calling regcomp.
But it inserts that backslash even where it should not: inside bracket expressions (as bash originally did), and also before characters that are not regexp operators (bash didn't have that bug).
That means that [[ '\' =~ ["."] ]] matches (like in old bash versions), and so does [[ x =~ "<" ]] on systems where \< is the word boundary operator, for instance.
yash should insert that \ only where needed; [...] is a special case (also beware of [^]")"]).
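A minimal sketch of the "only where needed" rule, written as a bash function rather than as yash's actual C code (ere_quote is a hypothetical name). It backslash-escapes only the characters POSIX lists as special in an ERE outside bracket expressions, so a character such as < is left alone. It does not attempt the bracket-expression case, where a backslash is not an escape character and a different strategy is needed.

```shell
# Sketch only, assuming bash; ere_quote is a hypothetical helper.
# Escapes only the ERE special characters . [ \ ( ) * + ? { | ^ $
# so the result matches the input literally outside bracket expressions.
ere_quote() {
    local s=$1 out='' c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        case $c in
            '\'|'.'|'['|'('|')'|'*'|'+'|'?'|'{'|'|'|'^'|'$')
                out+="\\$c" ;;   # special in an ERE: needs a backslash
            *)
                out+=$c ;;       # ordinary character: leave untouched
        esac
    done
    printf '%s\n' "$out"
}

# Usage: build a regexp that matches a literal string.
re=$(ere_quote '1+1=2?')
if [[ '1+1=2?' =~ ^$re$ ]]; then echo "literal match"; fi
```

Leaving ordinary characters untouched is what avoids accidentally producing sequences like \< that some regcomp implementations treat as operators.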
There's also the question of whether [[ b =~ [a"-"c] ]] should work the same as [[ b = [a"-"c] ]].