Smart preparsing with the bash shell
Is hand quoting in bash a pain? Do you want to display the currently
executing command line in your xterm title? Don't you wish you could
just paste a URL into a waiting bash and have a browser start automatically?
The bash(1)
shell is a ubiquitous interactive command line environment. It is
powerful, but inevitably has many annoyances. Below is a trick you can
add to your .bashrc initialization file, which I call smart
preparsing.
When you're using bash interactively in an xterm,
smart preparsing lets you execute shell commands, and arbitrarily
modify the command line programmatically, right after you type RETURN
and right before bash looks at it. This is a generalization of the bash xterm title trick explained
previously.
You can do anything you like with the command line this way. You can
inspect it to see if it's a URL, then modify it to read "firefox URL".
You can take the value and write it into your xterm's title
bar. You can see if the command is special, and if so surround the
parameters with quotation marks, but only if they're missing.
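As a rough illustration of the kind of decision involved, here is a small sketch that works on a plain string, outside of readline. The function name rewrite_line is made up for this example, and treating https:// like http:// is an assumption of mine; hooking such logic into the interactive prompt is the hard part.

```shell
# Hypothetical helper: decide how a typed line should be rewritten.
# It only inspects and prints a string; it does not touch readline.
rewrite_line() {
    case $1 in
        http://*|https://*)
            # looks like a URL: quote it and hand it to the browser
            printf 'firefox "%s"\n' "$1"
            ;;
        *)
            # anything else is passed through unchanged
            printf '%s\n' "$1"
            ;;
    esac
}

rewrite_line 'http://localhost:8080'
rewrite_line 'ls -l'
```

The first call prints the line with firefox prepended and the URL quoted, the second prints the line unchanged.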
It's not trivial to do some of these things with a standard shell.
For example, the canonical form of a shell statement is
"COMMAND [ARG1 ARG2 ...]". If COMMAND isn't an executable program,
or some predefined symbol, the typical shell returns an error. So how do we
recognize a statement that looks like "http://localhost:8080" as a URL?
In the case of bash(1), the answer lies in using the
GNU readline facility
creatively. When bash is loaded in an xterm and asks for input, it delegates
the work of recognizing the keystrokes to the readline library.
In turn, readline has a macro binding facility, which lets us replace the
effect of typing a single key with a predefined sequence of other keys.
Take a look at the following bash commands. Each bind instruction
tells readline to replace a sequence of keys with a different sequence of
keys:
bind '"\C-x1": "\C-afoo \t\C-a\C-d\C-d\C-d\C-d"'
bind '"\C-x2": ""'
bind '"\C-x3": "\C-abar \t\C-a\C-d\C-d\C-d\C-d"'
bind 'RETURN: "\C-x1\C-x2\C-x3\C-e\n"'
The last instruction tells readline to replace the RETURN keystroke with
the sequence "\C-x1\C-x2\C-x3\C-e\n". In readline-speak, this sequence
means "press CONTROL+X, press 1, press CONTROL+X, press 2, press CONTROL+X,
press 3, press CONTROL+E, press NEWLINE".
The NEWLINE causes readline to
send the command line to bash for processing. If we hadn't set up the bind
instructions, then the effect of RETURN would be to press NEWLINE only.
The CONTROL+E key tells readline to put the cursor at the end of the
line; we do this just to be nice. What about "CONTROL+X followed by
1"? Normally, readline doesn't treat this specially, but look again at
the bind instructions above: the first instruction binds "CONTROL+X
followed by 1" to another sequence, so readline knows that when it
sees "CONTROL+X followed by 1" it should replace the sequence with
"CONTROL+A, f, o, o, SPACE, TAB, CONTROL+A..." etc. Similarly, we've
set up "CONTROL+X followed by 2" to be replaced by nothing, and
"CONTROL+X followed by 3" is replaced by another complicated sequence.
If you look at the macro definition for "\C-x1", it helps to know that
"\C-a" places the cursor at the beginning of the line, and "\C-d"
deletes the character immediately following the cursor. So the
sequence means: go to the beginning, type "foo " (note the trailing
space), press TAB, go back to the beginning, and delete the next four
characters, which are exactly the "foo " we just typed.
So imagine the current command line contains "abc". Then "\C-x1"
causes readline to replace the command line with "foo abc", to press
TAB while the cursor is one space after "foo", then to go back and
delete "foo " (with its space) so that in the end the command line is again "abc".
Try seeing what "\C-x3" does, then put all this together and explain
how the binding for RETURN works.
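One way to check a macro by hand is to replay its editing steps on a string. The sketch below is a toy model, not readline itself: the function replay and its step names (home, end, del, ins:TEXT) are invented for this illustration, and TAB is left out on the assumption that the completion inserts nothing.

```shell
# Toy model of a readline macro: replay editing steps on a buffer.
# The step names are made up for this sketch.
replay() {
    local buf=$1 pos=0 step text
    shift
    for step in "$@"; do
        case $step in
            home) pos=0 ;;                          # like \C-a
            end)  pos=${#buf} ;;                    # like \C-e
            del)  buf=${buf:0:pos}${buf:pos+1} ;;   # like \C-d
            ins:*)                                  # like typing TEXT
                text=${step#ins:}
                buf=${buf:0:pos}${text}${buf:pos}
                pos=$((pos + ${#text}))
                ;;
        esac
    done
    printf '%s\n' "$buf"
}

# "\C-x1" applied to "abc": the dummy prefix is added, then removed again.
replay "abc" home "ins:foo " home del del del del
```

The same function can replay the steps of "\C-x3" by typing "bar " instead of "foo ".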
So far, this seems rather pointless, but the TAB key has a special
meaning in readline. It calls the bash programmable completion for the
symbol foo. Normally, bash doesn't know about foo, but if we tell it
the following, then the effect will be to execute the function fooF()
whenever TAB is pressed.
function fooF() {
# some instructions...
}
complete -F fooF foo
What this means for us is that when we press RETURN, readline causes
bash to execute fooF(), and fooF() is a normal shell function which can
look at the command line through the $COMP_LINE variable. Note that
all this is happening before the NEWLINE ("\n") is sent, which
subsequently causes bash to actually read and execute the command line.
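A minimal sketch of such a completion function might look as follows. The variable typed_line is a name I made up for the example, and setting COMP_LINE by hand at the end is only a way to exercise the function outside of readline; during a real completion bash fills it in.

```shell
# Sketch of a completion function that only looks at the command line.
fooF() {
    # COMP_LINE includes the dummy "foo " prefix; strip it to recover
    # what was actually typed.
    typed_line=${COMP_LINE#foo }
    # Offer no completions, so nothing extra is inserted into the line.
    COMPREPLY=()
}
complete -F fooF foo

# Exercise the function by hand with a made-up command line.
COMP_LINE="foo ls -l /tmp"
fooF
printf '%s\n' "$typed_line"
```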
So we can do things in fooF(), such as filling in the xterm title with
the contents of the command line $COMP_LINE. Another thing
we can do is put some text in the variable $COMPREPLY. When bash
returns from executing fooF(), it automatically inserts the contents
of $COMPREPLY right after the cursor position. For example, if
we decide to put COMPREPLY="firefox ", then we get the following (the cursor
is represented by []):
abc[] (press RETURN)
foo []abc (press TAB)
(complete foo, execute fooF(), set COMPREPLY)
foo firefox []abc (finish the "\C-x1" sequence)
[]firefox abc (the "\C-x1" sequence is now finished)
(start the "\C-x2" sequence)...
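The trace above can be mimicked with ordinary string operations. This is only a sketch of the net effect, not of what readline does internally; the variable reply stands in for COMPREPLY, and the step variables are named for this illustration.

```shell
# Net effect of "\C-x1" when the completion function leaves text behind.
line="abc"
reply="firefox "               # stands in for COMPREPLY

step1="foo ${line}"            # \C-a, then type "foo "
step2="foo ${reply}${line}"    # TAB: the reply lands at the cursor
step3=${step2#foo }            # \C-a, then four \C-d remove "foo "

printf '%s\n' "$step1" "$step2" "$step3"
```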
So far, we're able to replace a command line such as "abc" with "firefox abc"
and the like, but how do we make bigger changes to the command line?
Remember that by default, we bound "\C-x2" to the empty string "", so
when it is called next, nothing is done and we go straight to "\C-x3".
But in fooF(), we're allowed
to rebind "\C-x2" again if we like, depending on what we see exists in
the command line.
For example, I have a shell function e() defined as follows:
function e() {
echo "$@" | bc -l
}
What this function does is call bc(1), the arbitrary precision calculator. I do this so I can type things
like "e 2+5" on the command line, and bash responds with 7.
Sometimes I type things like
% e "sqrt(2/77)"
.16116459280507605967
but invariably I forget to put in the quote marks, so I really type
% e sqrt(2/77)
bash: syntax error near unexpected token `('
This is correct behaviour, since bash treats () specially, but it's very annoying.
What I'd like instead is for bash to be smart enough to recognize that
I meant to type the argument in quotes.
Let's suppose that the function fooF() recognizes that I've typed
"e sqrt(2/77)" and decides to help me. It can rebind the "\C-x2" sequence
as follows:
bind '"\C-x2": "\C-a\M-f\C-f\"\C-e\""'
This instruction replaces "CONTROL+X followed by 2" with
a sequence which does the following: go to the beginning of the line,
go forward one word, go forward one character, press DOUBLEQUOTE, go to the
end of the line, press DOUBLEQUOTE again.
Imagine that the command line is "e sqrt(2/77)", and see what happens
to it when "\C-x2" is called.
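To check your answer, the same edit can be written as a plain string transformation. The function name quote_argument is hypothetical, and the sketch assumes the line contains at least one space separating the command from its argument.

```shell
# Mimic the rebound "\C-x2" macro on a string: put double quotes around
# everything that follows the first word.
quote_argument() {
    local line=$1
    local cmd=${line%% *}    # first word, e.g. "e"
    local arg=${line#* }     # everything after the first space
    printf '%s "%s"\n' "$cmd" "$arg"
}

quote_argument 'e sqrt(2/77)'
```

The output is the line with the argument in double quotes, which is what bash then gets to see.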
Now we're nearly done. We can call arbitrary scripts in fooF(), perform
arbitrary edits afterwards by redefining "\C-x2", and then let bash
properly see the command line.
The only thing left to do is to
clean up. This matters because a changed "\C-x2" binding stays changed: the
same edit would be applied to every later command line, and we don't want
strange key sequences executing if we press "CONTROL+X followed by 2" by mistake!
To clean up, we simply call another function barF(), using the same
technique used for fooF().
Here is the full system in place, which does title bar manipulations,
calls a browser on simple URLs and quotes the calculator input.
It also quotes the simple URLs in case they contain bad shell characters
such as "&".
If you add this somewhere in your .bashrc, then all these things will
happen when you press RETURN. But don't do it now, because you'll get
errors. The code below needs the bc and firefox programs to be installed,
and might interact with other statements in your .bashrc if you've
extensively customized it. Just use it as a guide for your own experiments.
# this function redefines the RETURN key in readline,
# so that we can do things before bash executes it.
#
# This is done by inserting the key sequence C-x1, C-x2, C-x3,
# where C-x1 calls the function fooF(), C-x2 can be redefined
# any way you like, and C-x3 calls the function barF().
#
# The idea is that fooF() performs something like changing the
# xterm title, and maybe rebinds C-x2 so that we can modify the
# command line. Then barF() performs cleanup and other things.
#
# The way that C-x1 calls fooF() is by prepending a dummy function
# foo before the command and pressing TAB, which calls the programmable
# completion facility of bash. Then bash calls fooF() for us, but we
# must not forget to unset COMPREPLY, otherwise we'll get garbage displayed.
# If you want to fill in COMPREPLY, you should do so in barF()
#
# We need the three stages because we can't change the command line
# directly inside the fooF() function, at least I don't know how.
#
# Currently, I'm using fooF() to change the xterm title, and look at
# the command line to see if I want to protect it from shell expansion.
# C-x2 either does nothing, or puts quote marks around the command line.
# barF() does cleanup.
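function setup-interactive-foo() {
    # use emacs-style editing, install the three macros, rebind RETURN,
    # and register the completion functions for the dummy commands
    set -o emacs
    bind '"\C-x1": "\C-afoo \t\C-a\C-d\C-d\C-d\C-d"'
    bind '"\C-x2": ""'
    bind '"\C-x3": "\C-abar \t\C-a\C-d\C-d\C-d\C-d"'
    bind 'RETURN: "\C-x1\C-x2\C-x3\C-e\n"'
    complete -F fooF foo
    complete -F barF bar
}
# C-x2 wraps the whole command line in double quotes
function setup-quote-protect() {
    bind '"\C-x2": "\C-a\"\C-e\""'
}
# C-x2 wraps everything after the first word in double quotes
function setup-quote-protect-arg() {
    bind '"\C-x2": "\C-a\M-f\C-f\"\C-e\""'
}
# C-x2 does nothing
function clear-quote-protect() {
    bind '"\C-x2": ""'
}
# succeeds if the character of $1 at offset $2 is a quote mark
function check-quote-char() {
    local C=${1:$2:1}
    [ "$C" = '"' ] || [ "$C" = "'" ]
}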
# command line looks like "foo $@"
function fooF() {
    # place command line in xterm title bar; printf with %s keeps
    # backslashes in the command line from being interpreted
    printf '\033]0;bash: %s\007' "${COMP_LINE:4}"
    # rudimentary parsing of command line
    case "${COMP_LINE:4}" in
        http://*)
            setup-quote-protect
            ;;
        e\ *)
            if ! check-quote-char "${COMP_LINE:4}" 2; then
                setup-quote-protect-arg
            fi
            ;;
    esac
    unset -v COMPREPLY
}
# command line looks like "bar $@"
function barF() {
    clear-quote-protect
    case "${COMP_LINE:4}" in
        \"http*)
            COMPREPLY="firefox "
            ;;
        *)
            unset -v COMPREPLY
            ;;
    esac
}
function e() {
    echo "$@" \
        | sed 's/sin/s/g;s/cos/c/g;s/arctan/a/g;s/log/l/g;s/exp/e/g;s/bessel/j/g;' \
        | bc -l
}