    [jao@zack] osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ out
    ('fred1', 1)
    ('fred2', 0)
    ('fred3', 5)
Now suppose you want to find the total number of open requests across the cluster. You can pipe the tuples into an aggregation command:
    [jao@zack] osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ agg 0 'total, node, count: total + count' $
    6
    cd installosh
    tar xzvf osh-0.8.0.tar.gz
    cd osh-0.8.0
    python ./setup.py install
After this is done, you may remove the installosh directory and its contents.
An alternative installation technique is to use the osh command installosh. If you have osh installed locally, then the osh command installosh can do a remote install, assuming you have root access on both machines, and assuming you have ssh configured to not prompt for a password.
Because osh is built on Python, the primitive numeric and string types are objects, and the familiar operators and methods can be used (assuming you're familiar with Python). In the example above, Python subscripting was used to access a field in a tuple inside another tuple.
Osh uses tuples and (less often) lists to represent collections of objects. For example, the sql command outputs tuples corresponding to rows retrieved from the database.
[jao@zack] vmstat -n 1 | osh ^ ... $
An alternative approach is to execute the vmstat command from osh, using the sh (shell) function to spawn a command running vmstat:
    [jao@zack] osh sh 'vmstat -n 1' ^ ... $
The escaped command must be quoted.
One other possibility is that the first osh command generates data, in which case osh is the first thing on the command line, e.g.
[jao@zack] osh sql 'select * from person' ^ ... $
    from oshconfig import *
    osh.sql = 'mydb'
    osh.sql.mydb.dbtype = 'postgres'
    osh.sql.mydb.host = 'localhost'
    osh.sql.mydb.db = 'mydb'
    osh.sql.mydb.user = 'fred'
    osh.sql.mydb.password = 'l3tme1n'
Subscript notation can also be used, e.g.
    from oshconfig import *
    osh.sql = 'mydb'
    osh.sql['mydb'].dbtype = 'postgres'
    osh.sql['mydb'].host = 'localhost'
    osh.sql['mydb'].db = 'mydb'
    osh.sql['mydb'].user = 'fred'
    osh.sql['mydb'].password = 'l3tme1n'
The .oshrc file contains Python code, which is executed when osh starts. The osh.sql entries configure the connection to the database (a postgres database on localhost, named mydb, accessed with username fred and password l3tme1n). The first line, osh.sql = 'mydb', configures mydb (described by the osh.sql.mydb lines) as the default database. If no database configuration is specified, as in example 2 above, then mydb will be used. With this configuration, these commands produce the same result:
    [jao@zack] osh sql 'select * from person' $
    [jao@zack] osh sql mydb 'select * from person' $
Remote execution profiles can be created too. Here is the setup for a remote execution profile named flock:
    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = ['192.168.100.1', '192.168.100.2', '192.168.100.3', '192.168.100.4']
The hosts can also be specified using map notation, in which the key is a logical name. Example:
    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = {'seagull1': '192.168.100.1',
                              'seagull2': '192.168.100.2',
                              'seagull3': '192.168.100.3',
                              'seagull4': '192.168.100.4'}
When referring to individual nodes, osh uses logical names (e.g. when the copyfrom command creates directories). To execute a command on each node of this cluster:
    [jao@zack] osh @flock [ ... ] ...
Or, because flock is the default cluster:
    [jao@zack] osh @ [ ... ] ...
To execute just on seagull3:
    [jao@zack] osh @flock:3 [ ... ] ...
This selects nodes belonging to the flock cluster containing the substring 3. (So flock:seagull would select all nodes.) A cluster configuration can also specify a default database for each node, e.g.
    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = {'seagull1': {'host': '192.168.100.1', 'db_profile': 'db1'},
                              'seagull2': {'host': '192.168.100.2', 'db_profile': 'db2'},
                              'seagull3': {'host': '192.168.100.3', 'db_profile': 'db3'},
                              'seagull4': {'host': '192.168.100.4', 'db_profile': 'db4'}}
This says that on seagull1, the default database profile is db1, on seagull2 it's db2, etc. This specification of a database overrides the default profile specified in the .oshrc file on each node, and can be overridden by a profile name specified with the sql command.
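The substring rule for picking nodes (as in @flock:3) can be pictured with a few lines of plain Python. This is only a sketch of the rule as described in the text, not osh's own code: the names FLOCK and select_nodes are hypothetical, and it assumes the pattern is matched against the logical node name.

```python
# Hypothetical cluster table mirroring the flock example: logical name -> address.
FLOCK = {
    'seagull1': '192.168.100.1',
    'seagull2': '192.168.100.2',
    'seagull3': '192.168.100.3',
    'seagull4': '192.168.100.4',
}

def select_nodes(hosts, pattern=''):
    """Return the logical names that contain pattern, in sorted order.

    An empty pattern selects every node, as a bare cluster name would.
    """
    return sorted(name for name in hosts if pattern in name)

print(select_nodes(FLOCK, '3'))        # only seagull3 matches
print(select_nodes(FLOCK, 'seagull'))  # every node name contains 'seagull'
```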
osh help f
For example, a list of files can be obtained as follows:
    [jao@zack] find . | osh ^ f 's: path(s)' $
This command is pointless by itself, as it produces the same output as find. agg can be added to this command, to compute the total size of these files, as follows:
    [jao@zack] find . | osh ^ f 's: path(s)' ^ agg 0 'sum, p: sum + p.size' $
The f command outputs a set of path objects. The first argument to agg specifies the initial value of the sum, 0. The aggregation function is:
    sum, p: sum + p.size
This function has two variables, sum and p. sum is the total size, for all paths processed so far. p is a path passed from the f command. sum + p.size adds the size of the file represented by p to the sum. This value will be passed to sum on the next invocation of the aggregation function. Or, if there are no more paths, then the sum is passed to the next command, which prints the result.
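The way agg threads a partial result through successive calls is the classic fold. The sketch below mimics it in plain Python 3 with functools.reduce. The agg function here is a hypothetical stand-in written for illustration, and the sizes are made-up values standing in for p.size.

```python
from functools import reduce

def agg(initial, function, stream):
    """Fold stream into one value: function(partial_result, item) -> next partial result."""
    return reduce(function, stream, initial)

# Made-up file sizes standing in for the size of each path object.
sizes = [120, 4096, 33]

# Same shape as 'sum, p: sum + p.size': the first argument carries the running total.
total = agg(0, lambda running, size: running + size, sizes)
print((total,))  # (4249,)
```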
There is a second form of the agg command, which computes sums for groups of input objects. For example, suppose we want to compute a histogram of word lengths for the words in /usr/share/dict/words (i.e., find the number of words of length 1, length 2, length 3, ...).
The osh command to do this is:
    [jao@zack] cat /usr/share/dict/words | osh ^ agg -g 'w: len(w)' 0 'count, n: count + 1' $
    (2, 49)
    (3, 536)
    (4, 2236)
    (5, 4176)
    (6, 6177)
    (7, 7375)
    (8, 7078)
    (9, 6093)
    (10, 4599)
    (11, 3072)
    (12, 1882)
    (13, 1138)
    (14, 545)
    (15, 278)
    (16, 103)
    (17, 57)
    (18, 23)
    (19, 3)
    (20, 3)
    (21, 2)
    (22, 1)
    (28, 1)
(The first number is the length. The second number is the number of words of that length.)
Agg uses this grouping function (specified with the -g flag):
    w: len(w)
which returns the length of word w. This causes agg to define a group for each word length. The remaining arguments to agg describe how to do aggregation for each group. The initial value of the aggregation is zero, and this function:
    count, n: count + 1
increments the group's counter.
agg -g does not generate output until the entire input stream has been processed. It has to be this way because group members are in no particular order. In some situations, group members appear consecutively. In these cases, the -c flag can be used instead of -g. This reduces the memory requirements of the agg command (to a single group instead of all groups), and also allows output to be generated sooner. This is important in some applications, e.g. when output from commands such as vmstat and top is being processed.
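The difference between the two grouping modes can be sketched in plain Python 3. Both functions below are hypothetical illustrations, not osh code: agg_grouped keeps one partial result per group in a dict and can only report at the end, while agg_consecutive relies on itertools.groupby and yields each group as soon as its last member has been seen.

```python
from itertools import groupby

def agg_grouped(key, initial, function, stream):
    """Like agg -g: one partial result per group, reported after all input is read."""
    partial = {}
    for item in stream:
        k = key(item)
        partial[k] = function(partial.get(k, initial), item)
    return sorted(partial.items())

def agg_consecutive(key, initial, function, stream):
    """Like agg -c: group members are assumed adjacent; each group is yielded when it ends."""
    for k, members in groupby(stream, key):
        result = initial
        for item in members:
            result = function(result, item)
        yield (k, result)

# A tiny word list that happens to be ordered by length, so both modes agree.
words = ['a', 'an', 'at', 'the', 'cat', 'tree']
count = lambda n, word: n + 1

print(agg_grouped(len, 0, count, words))
print(list(agg_consecutive(len, 0, count, words)))
```

If the input were not ordered by length, agg_consecutive would report the same length more than once, which is why the consecutive mode is only appropriate when adjacency is known in advance.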
    [jao@zack] osh copyfrom -c flock /var/log/messages /tmp
After executing this command, if flock contains nodes seagull1, seagull2, seagull3, and seagull4, then /tmp will contain:
    seagull1/messages
    seagull2/messages
    seagull3/messages
    seagull4/messages
    [jao@zack] osh copyto -pr -c flock /home/jao/foobar /tmp
-p preserves file modes and times. -r does a recursive copy.
    ('a', (1, 2, 3), 'x')
Then this command generates one output sequence for each item of the nested sequence, with each output sequence containing one of the items in the nested sequence:
    [jao@zack] osh ^ ... ^ expand 1 $
    ('a', 1, 'x')
    ('a', 2, 'x')
    ('a', 3, 'x')
The argument to expand is the position of the nested sequence to be expanded.
expand can also be used to expand a top-level sequence, by omitting the argument. For example, if the input stream contains these sequences:
('a', 1) ('b', 2) ('c', 3)
then expand with no arguments works as follows:
    [jao@zack] osh ^ ... ^ expand $
    ('a',)
    (1,)
    ('b',)
    (2,)
    ('c',)
    (3,)
expand can also be used to insert the contents of files into an output stream. For example, suppose we have two files, a.txt:
    a1
    a2
    a3
and b.txt:
    b1
    b2
Now suppose that the input stream to expand contains ('a.txt',) and ('b.txt',). Then a listing of the files, with each line including the name of a file and one line from the file, can be generated as follows:
    [jao@zack] osh ^ ... ^ f 'x: (x, x)' ^ expand 1 $
    ('a.txt', 'a1')
    ('a.txt', 'a2')
    ('a.txt', 'a3')
    ('b.txt', 'b1')
    ('b.txt', 'b2')
f 'x: (x, x)' duplicates the file name x. The first occurrence is kept for the output, and the second occurrence is expanded. When expand is applied to a string (the filename in position 1), the string is interpreted as a filename and the lines of that file are generated in each output tuple.
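The positional and top-level forms of expand can be approximated by a small Python 3 generator. This is a hypothetical sketch of the behaviour shown in the examples. It deliberately leaves out the file-reading case, and the function name and signature are assumptions made for illustration.

```python
def expand(row, position=None):
    """Yield one output tuple per item of the sequence being expanded.

    position=None expands the row itself; otherwise the nested sequence
    at that index is replaced, in turn, by each of its items.
    """
    if position is None:
        for item in row:
            yield (item,)
    else:
        for item in row[position]:
            yield row[:position] + (item,) + row[position + 1:]

print(list(expand(('a', (1, 2, 3), 'x'), 1)))
print(list(expand(('a', 1))))
```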
    [jao@zack] find /usr/bin | osh ^ f 's: path(s).size' ^ agg 0 'sum, size: sum + size' $
f can also be used as the first osh command, to run a function with no arguments. For example, to get a list of all process pids and command lines:
[jao@zack] osh f 'processes()' ^ expand ^ f 'p: (p.pid(), p.command_line())' $
    [jao@zack] osh f 'path("/etc").walk()' ^ expand ^ ...
would generate a stream of path objects, each representing a file under the directory /etc.
gen with no arguments generates integers starting at 0. (The end will not be reached for a very, very long time.)
gen N generates integers from 0 through N - 1. gen N S generates N integers starting at S.
Example: This command:
    [jao@zack] osh gen 3 $
    (0,)
    (1,)
    (2,)
The output contains tuples, each containing an integer. This is because osh always pipes tuples or lists of objects between commands.
    [jao@zack] osh help
    Usage: help [OSH_COMMAND]
    Print usage information for the named osh command.
    Builtin commands:
        agg copyfrom copyto expand f gen help install installosh out remote reverse select sh sort sql squish stdin timer unique version window
    [jao@zack] osh help unique
    Usage: unique [-c]
    Copies input objects to output, dropping duplicates. No output is generated until the end of the input stream occurs. However, if the duplicates are known to be consecutive, then specifying -c allows output to be generated sooner.
[jao@zack] osh gen 1000 ^ imp random ^ f 'random.randint(1, 10)' $
    [root@zack] osh install -c flock *.py
This will install all .py files in the current directory on the cluster named flock (as configured in .oshrc). This command must be run as root.
If the cluster name is omitted, then installation goes to the default cluster.
If there is no cluster with the specified name, then the name is interpreted as a hostname, permitting installation on a single host without specifying configuration in .oshrc. In this case, the user must be specified using the -u flag:
[root@zack] osh install -c seagull99 -u root *.py
To install osh on a cluster named flock:
    [root@zack] osh installosh flock
This will copy your (i.e., root's) .oshrc file to /root on each node. You can specify a different .oshrc file using the -c flag (but the file still goes to /root/.oshrc), e.g.
    [root@zack] osh installosh -c different_osh_config_file flock
To install to the default cluster, omit the cluster name argument. As with other remote commands, if the cluster name cannot be found in the .oshrc file, it will be interpreted as the name of a single node.
Formatting with the Python % operator can be done by providing a formatting string. For example, this command prints rows from the person table, using default tuple formatting:
    [jao@zack] osh sql 'select * from person' ^ out
    ('julia', 6)
    ('hannah', 11)
If a formatting string is used, then the command is:
    [jao@zack] osh sql 'select * from person' ^ out 'The age of %s is %d'
    The age of julia is 6
    The age of hannah is 11
To write to a file, replacing the existing contents, specify the file's name with the -f flag, e.g.
    [jao@zack] osh sql 'select * from person' ^ out -f people.txt
To append to the file instead, use the -a flag, e.g.
    [jao@zack] osh sql 'select * from person' ^ out -a people.txt
Because command sequences are so often terminated by ^ out, a slightly more convenient piece of syntax is provided. At the end of a command only, ^ out can be replaced by $. If you want to send the output to a file using the -f or -a flags, or specify a format, or if you want to invoke out anywhere but the end of a command, then you must use the out command; $ is not syntactically legal. (Most examples in this document use $.) It is often useful to generate output in a CSV format (comma-separated values). This can be done using a formatting string, but the out command also supports a flag, -c, that generates CSV format.
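The effect of a formatting string is just Python's % operator applied to each tuple, which can be tried directly in Python 3. The CSV part of this sketch uses the standard csv module as an approximation: the exact quoting rules applied by out -c are not described here, so treat that half as an assumption.

```python
import csv
import io

rows = [('julia', 6), ('hannah', 11)]

# Format-string output: each tuple is applied to the string with the % operator.
formatted = ['The age of %s is %d' % row for row in rows]
print('\n'.join(formatted))

# CSV output via the standard csv module, written to an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
csv_text = buffer.getvalue()
print(csv_text, end='')
```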
[jao@zack] osh py 'foo = 123; bar = 456' ^ f 'foo + bar' $ (579,)
    [jao@zack] cat /usr/share/dict/words | osh ^ select 's: len(s) >= 20' $
    ('antidisestablishmentarianism',)
    ('electroencephalogram',)
    ('electroencephalograph',)
    ('electroencephalography',)
    ('Mediterraneanization',)
    ('Mediterraneanizations',)
    ('nondeterministically',)
    [jao@zack] osh sh 'cat /home/jao/somefile' ^ ...
which is equivalent to
    [jao@zack] cat /home/jao/somefile | osh ^ ...
sh can also be used to run OS commands, binding input from earlier osh commands. Example:
    [jao@zack] osh gen 5 ^ sh 'mkdir dir%s'
osh gen 5 generates the integers 0, 1, 2, 3, and 4. Each value is substituted for %s in the mkdir command, and the resulting mkdir command is executed. This results in the creation of directories dir0, dir1, dir2, dir3 and dir4.
    [jao@zack] cat file | osh ^ sort $
The same could be done using the Unix sort command, of course.
Sorting is done using the default Python comparison function cmp. You can also provide your own sorting function. For example, to sort a list of words by length, shortest words first:
[jao@zack] cat file | osh ^ sort 's: len(s)' $
    [jao@zack] osh sql 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
If the query is an INSERT, DELETE or UPDATE statement, then there is no output. In these cases, the query may have variables, denoted by %s, which are assigned values from incoming objects. For example, suppose person.txt contains this data:
    alexander 13
    nathan 11
    zoe 6
This data can be loaded into the database by the following command:
    [jao@zack] cat person.txt | osh ^ f 's: s.split()' ^ sql "insert into person values('%s', %s)"
Splitting the lines of the file results in tuples ('alexander', '13'), ('nathan', '11'), ('zoe', '6'). These tuples are bound to the two %s occurrences in the INSERT statement. Running the SELECT statement again shows that the data from person.txt has been added:
    [jao@zack] osh sql 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
    ('alexander', 13)
    ('nathan', 11)
    ('zoe', 6)
To access a database using a non-default profile, specify the database configuration's name before the query, e.g.
    [jao@zack] osh sql mydb 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
The complete rules for selecting a database profile are as follows:
If the osh stream of objects contains sequences, then squish could be applied. For example, suppose the input stream contains these sequences:
    (1, 2, 3)
    (4, 5, 6)
    (7, 8, 9)
Then to compute the sum of each sequence, we could do this:
    [jao@zack] osh ^ ... ^ f '*x: reduce(lambda a, b: a + b, x)' $
    (6,)
    (15,)
    (24,)
Osh provides the squish command, which does the same sort of thing as applying the Python reduce function using the osh command f, but more concisely. The above command line is equivalent to the following:
    [jao@zack] osh ^ ... ^ squish + $
    (6,)
    (15,)
    (24,)
If the arguments to squish comprise a single occurrence of +, as above, then the + can be omitted, e.g.
    [jao@zack] osh ^ ... ^ squish $
    (6,)
    (15,)
    (24,)
If each input sequence contains nested sequences, then the squish command can be used to do multiple reductions in parallel. For example, suppose the input contains sequences of sequences like this:
    ((1, 2, 3), (10, 20, 30), (100, 200, 300))
To combine items in like positions (e.g. 1 + 10 + 100, 2 + 20 + 200, 3 + 30 + 300), we can do this:
    [jao@zack] osh ^ ... ^ squish '+ + +' $
    (111, 222, 333)
The operators that can appear in the argument to squish (and make sense) are +, *, min and max, e.g.
    [jao@zack] osh ^ ... ^ squish '+ min max' $
    (111, 2, 300)
111 is 1 + 10 + 100. 2 is min(2, 20, 200). 300 is max(3, 30, 300).
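The reductions described for squish can be reproduced in plain Python 3 to check the arithmetic. The squish function below is a hypothetical re-creation based only on the examples in the text. It assumes a single operator reduces the whole row, while several operators reduce like positions of the nested sequences.

```python
from functools import reduce
import operator

# The operators named in the text, mapped to two-argument Python functions.
OPERATORS = {'+': operator.add, '*': operator.mul, 'min': min, 'max': max}

def squish(row, spec='+'):
    """Reduce a row; with several operators, reduce like positions of nested sequences."""
    ops = [OPERATORS[name] for name in spec.split()]
    if len(ops) == 1:
        result = reduce(ops[0], row)
        # Concatenated tuples are already a row; a bare number is wrapped in one.
        return result if isinstance(result, tuple) else (result,)
    # zip(*row) regroups items by position: (1, 10, 100), (2, 20, 200), ...
    return tuple(reduce(op, column) for op, column in zip(ops, zip(*row)))

nested = ((1, 2, 3), (10, 20, 30), (100, 200, 300))
print(squish((1, 2, 3)))            # (6,)
print(squish(nested, '+ + +'))      # (111, 222, 333)
print(squish(nested, '+ min max'))  # (111, 2, 300)
```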
    [jao@zack] osh timer 1 $
    (2005, 9, 18, 23, 55, 57, 6, 261, 1)
    (2005, 9, 18, 23, 55, 58, 6, 261, 1)
    (2005, 9, 18, 23, 55, 59, 6, 261, 1)
    (2005, 9, 18, 23, 56, 0, 6, 261, 1)
    (2005, 9, 18, 23, 56, 1, 6, 261, 1)
    ...
The tuples generated as output have the same format as time.localtime(). (In fact, time.localtime() is used to generate them.)
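For readers unfamiliar with time.localtime(), the nine fields of such a tuple can be labelled with a short Python 3 snippet. The field names below are descriptive labels chosen for this sketch. The sample tuple is the first line of the timer output above.

```python
import time

# Descriptive labels for the nine positions of a time.localtime() tuple.
FIELDS = ('year', 'month', 'day', 'hour', 'minute', 'second',
          'weekday', 'yearday', 'isdst')

sample = (2005, 9, 18, 23, 55, 57, 6, 261, 1)
for name, value in zip(FIELDS, sample):
    print('%-8s %d' % (name, value))

# time.strftime accepts such a tuple directly, which is how a timestamp
# string can be derived from timer output.
stamp = time.strftime('%H:%M:%S', sample)
print(stamp)  # 23:55:57

# The current time has the same nine-field shape.
now = tuple(time.localtime())
```

The weekday field counts from Monday as 0, so 6 here is a Sunday.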
timer is useful for running other commands on a regular basis, in particular, monitoring commands. For example, the memory footprint of httpd processes could be monitored every 10 seconds as follows:
    [jao@zack] osh timer 10 ^\
    > f 'ts: (strftime("%H:%M:%S", ts), processes())' ^\
    > expand 1 ^\
    > select 'time, proc: proc.command_line().find("httpd") > 0' ^\
    > f 'time, proc: (time, proc.size())' $
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 29237248)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 17436672)
    ...
    the good the bad and the ugly
Then this command can be used to obtain the unique words (sorted):
    [jao@zack] cat ~/foo.txt | osh ^ f 's: s.split()' ^ expand ^ unique ^ sort $
    ('and',)
    ('bad',)
    ('good',)
    ('the',)
    ('ugly',)
    [jao@zack] osh version $
    ('0.8.0',)
As with other osh commands, if you don't explicitly request output (e.g. using $ or ^ out), there will be no output.
    [jao@zack] osh gen 100 ^ window 'n: n % 10 == 0' ^ squish $
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
    (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
    (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
    (30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
    (40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
    (50, 51, 52, 53, 54, 55, 56, 57, 58, 59)
    (60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
    (70, 71, 72, 73, 74, 75, 76, 77, 78, 79)
    (80, 81, 82, 83, 84, 85, 86, 87, 88, 89)
    (90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
Each input to window is a tuple containing a single integer. window combines these into a tuple of tuples. squish concatenates the interior tuples. (Without squish the first output tuple would be ((0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)).)
Another way to form windows is to specify window sizes. In this example, the gen command is used to generate a stream of numbers, 0 through 9. The window command turns these into two lists of five numbers each. The -d flag specifies window size:
    [jao@zack] osh gen 10 ^ window -d 5 ^ squish $
    (0, 1, 2, 3, 4)
    (5, 6, 7, 8, 9)
The -d flag means that the windows are disjoint -- each input to the window command is assigned to a single output list. Overlapping lists can be created by specifying the -o flag. After a list is formed, the next list is formed by shifting out the first item in the list, and adding a new item at the end, e.g.
    [jao@zack] osh gen 10 ^ window -o 5 ^ squish $
    (0, 1, 2, 3, 4)
    (1, 2, 3, 4, 5)
    (2, 3, 4, 5, 6)
    (3, 4, 5, 6, 7)
    (4, 5, 6, 7, 8)
    (5, 6, 7, 8, 9)
    (6, 7, 8, 9, None)
    (7, 8, 9, None, None)
    (8, 9, None, None, None)
    (9, None, None, None, None)
Notice that the last four lines of output contain padding (None) to fill out each list to 5 items as specified.
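The three ways of forming windows can be compared side by side in plain Python 3. These functions are hypothetical sketches that reproduce the outputs shown above (after squish). How osh treats a short final disjoint window is not stated in the text, so leaving it unpadded here is an assumption.

```python
def window_predicate(stream, starts_window):
    """Start a new window at each item for which starts_window(item) is true."""
    windows = []
    for item in stream:
        if starts_window(item) or not windows:
            windows.append([])
        windows[-1].append(item)
    return [tuple(w) for w in windows]

def window_disjoint(stream, size):
    """Non-overlapping windows, like -d: each item lands in exactly one window."""
    items = list(stream)
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]

def window_overlap(stream, size):
    """Sliding windows, like -o: one window per item, padded with None at the end."""
    items = list(stream)
    padded = items + [None] * (size - 1)
    return [tuple(padded[i:i + size]) for i in range(len(items))]

print(window_disjoint(range(10), 5))
for w in window_overlap(range(10), 5):
    print(w)
```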
Every osh command processes objects in one stream only, stream o by default. Example:
    [jao@zack] osh gen 3 $
    (0,)
    (1,)
    (2,)
The gen command writes its output to stream o, and the osh command prints whatever arrives on stream o. The stream processed by a command can be specified explicitly as follows:
    [jao@zack] osh gen 3 ^ o : out
    (0,)
    (1,)
    (2,)
If the stream label is changed from o to e, then no output is generated:
    [jao@zack] osh gen 3 ^ e : out
This is because the out command now processes only objects in stream e, and gen writes nothing to that stream.
Here is an example of a command that generates an error:
    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' $
    (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    (1, 2.0)
    (2, 1.5)
gen 3 generates the integers 0, 1, and 2. The f command generates tuples (x, float(x + 1) / x). For x = 0, division by zero occurs, which is an error. The first line of output describes this error, which shows up on stream e. The next two lines show (non-erroneous) output on stream o.
To make the handling of the o and e streams clearer, consider this command:
    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' ^ o : out 'OUT: %s', e : out 'ERR: %s'
    ERR: (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    OUT: (1, 2.0)
    OUT: (2, 1.5)
After the last ^ there are two out commands, separated by a comma:
    o : out 'OUT: %s'
    e : out 'ERR: %s'
The first out command handles the o stream and prints OUT at the beginning of each line. The second out command handles the e stream and prints ERR at the beginning of each line.
So why did the original command, with only a single out command (handling just the o stream), print both streams? Because osh provides a handler for the e stream if you don't. Suppressing error output is dangerous. In a future version of osh you will be able to replace the error handler.
Stream names can be changed using the :: operator. For example, if for some reason you wanted to switch the e and o streams:
    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' ^ o :: e, e :: o ^ o : out 'OUT: %s', e : out 'ERR: %s'
    OUT: (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    ERR: (1, 2.0)
    ERR: (2, 1.5)
o :: e moves everything in the o stream to the e stream, and e :: o does the opposite.
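One way to picture labelled streams is as (label, object) pairs that are routed to whichever handler is registered for the label. The Python 3 sketch below follows that idea. It is not how osh is implemented, all three function names are hypothetical, and the error tuple is simplified to the input and the message.

```python
def run_f(function, inputs):
    """Apply function to each input, yielding ('o', result) or ('e', error info)."""
    for x in inputs:
        try:
            yield ('o', function(x))
        except Exception as error:
            yield ('e', (x, str(error)))

def rename(labelled, mapping):
    """Relabel objects, e.g. {'o': 'e', 'e': 'o'} swaps the two streams."""
    for label, obj in labelled:
        yield (mapping.get(label, label), obj)

def dispatch(labelled, handlers):
    """Format each object with the handler registered for its stream label."""
    return [handlers[label](obj) for label, obj in labelled]

handlers = {'o': lambda obj: 'OUT: %s' % (obj,),
            'e': lambda obj: 'ERR: %s' % (obj,)}
compute = lambda x: (x, float(x + 1) / x)

lines = dispatch(run_f(compute, range(3)), handlers)
swapped = dispatch(rename(run_f(compute, range(3)), {'o': 'e', 'e': 'o'}), handlers)
print('\n'.join(lines))
print('\n'.join(swapped))
```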
    #!/usr/bin/python
    from osh.oshapi import *
Execution of an osh command is done by a function named osh. Commands are provided using function invocations inside the osh call. Piping of objects from one command to the next is implied by the order of the function invocations. Example:
    #!/usr/bin/python
    from osh.oshapi import *
    osh(gen(3), out('%s'))
gen(3) generates a stream containing the integers 0, 1, 2, exactly as the osh command gen would do, run from the command line. Results are piped to the next function invocation. out('%s') prints to stdout all objects received from the previous command, formatting using %s.
Output from this script is:
    0
    1
    2
Various osh commands take Python functions as arguments, e.g. f, select, and agg. In the osh API, a function may be passed by naming a function, by providing a lambda expression, or as a string. However, functions that will be invoked remotely must be passed as strings. (This is because remote invocation relies on pickling, and functions and lambdas do not seem to be pickle-able.)
For example, the following two osh invocations produce the same output:
    osh(gen(3), f(lambda x: x * 10), out())
    osh(gen(3), f('x: x * 10'), out())
The use of the 'o' stream for normal output and the 'e' stream for error output is identical to command-line osh. To specify handling for a particular stream, the osh API relies on Python dicts in which the key is a stream name, and the value is osh code that handles the stream.
For example, the following osh statement generates integers 0, 1, 2, 3, 4, and computes f(x) = x / (x - 2) for each. x = 2 results in division by zero.
    osh(gen(5), f(lambda x: x / (x - 2)), out())
Output:
    (0,)
    (-1,)
    ERROR: ('_F#1{}[<function <lambda> at 0xb7f47844>]', (2,), 'integer division or modulo by zero')
    (3,)
    (2,)
We can replace the invocation of out by a dict specifying the handling of the 'o' and 'e' streams:
    osh(gen(5),
        f(lambda x: x / (x - 2)),
        {'o': out('OK: %s'),
         'e': [f(lambda command, input, message: input), out('ERR: %s')]})
Now, normal output is handled by out, using the format 'OK: %s'; and error output, on stream 'e', is handled by a sequence of commands which picks the offending input value, and formats it using 'ERR: %s'.
    osh(gen(10, 1), agg(1, lambda fact, x: fact * x), out())
gen(10, 1) generates the integers 1, ..., 10. The invocation of agg computes factorial, keeping a partial result, multiplying it by each incoming integer. The partial result is initialized to 1, the first argument to agg; and the multiplication is done by the second argument, a lambda expression. In the lambda expression, fact is the partial result, x is the incoming integer, and fact * x is the next partial result.
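To see the partial results that the aggregation function passes from one call to the next, the same computation can be traced with itertools.accumulate in plain Python 3. This is an illustration only and does not use the osh API. The leading 1 plays the role of the initial value given to agg.

```python
from itertools import accumulate, chain
from operator import mul

# Start from the initial value 1, then multiply in each integer 1..10.
partials = list(accumulate(chain([1], range(1, 11)), mul))

for step, fact in enumerate(partials):
    print(step, fact)

# The last partial result is what agg would pass on to the next command.
print((partials[-1],))  # (3628800,)
```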
agg can also be used to compute an aggregate for "groups" of input values. For example, the following expression computes the sum of the odd and even integers between 0 and 9:
    osh(gen(10), agg(group(lambda x: x % 2), 0, lambda sum, x: sum + x), out())
There are two groups, 0 and 1, computed by x % 2 for each input value x. group(lambda x: x % 2) specifies the grouping function. Aggregation within a group is done by initializing a partial result to 0 (the second argument to agg), and then applying the aggregation function (the last argument to agg) repeatedly. The results from group 0 and group 1 are kept separate. Output from the above expression is:
    (0, 20)
    (1, 25)
If it is known that group members are consecutive in the input sequence, then the grouping function can be specified using consecutive() instead of group(). This reduces memory requirements (since only one partial result needs to be maintained at a time), and makes results available to the next osh function as each group is completed.
For example, suppose the grouping function is x / 2 instead of x % 2. Then for x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, the values of the grouping function will be 0, 0, 1, 1, 2, 2, 3, 3, 4, 4. Because the members of each group are adjacent, this expression:
    osh(gen(10), agg(group(lambda x: x / 2), 0, lambda sum, x: sum + x), out())
is equivalent to this one:
    osh(gen(10), agg(consecutive(lambda x: x / 2), 0, lambda sum, x: sum + x), out())
Both produce this output:
    (0, 1)
    (1, 5)
    (2, 9)
    (3, 13)
    (4, 17)
    osh(copyfrom('flock', '/var/log/messages', '/tmp'))
After executing this command, if flock contains nodes seagull1, seagull2, seagull3, and seagull4, then /tmp will contain:
    seagull1/messages
    seagull2/messages
    seagull3/messages
    seagull4/messages
The scp options -r (recursive), -p (preserve file attributes), and -C (compress) are supported. These can be specified using the scp function inside the copyfrom call. For example, to copy all of /var/log to /tmp, using all of these flags:
    osh(copyfrom('flock', scp('rpC'), '/var/log', '/tmp'))
osh(copyto('flock', scp('rpC'), '/home/jao/foobar', '/tmp'))
    ('a', (1, 2, 3), 'x')
Then this command generates one output sequence for each item of the nested sequence, with each output sequence containing one of the items in the nested sequence:
    osh(..., expand(1), out())
The output from this statement is:
    ...
    ('a', 1, 'x')
    ('a', 2, 'x')
    ('a', 3, 'x')
    ...
The argument to expand is the position of the nested sequence to be expanded.
expand can also be used to expand a top-level sequence, by omitting the argument. For example, if the input stream contains these sequences:
('a', 1) ('b', 2) ('c', 3)
then expand() (no arguments) generates this output:
    ('a',)
    (1,)
    ('b',)
    (2,)
    ('c',)
    (3,)
expand can also be used to insert the contents of files into an output stream. For example, suppose we have two files, a.txt:
    a1
    a2
    a3
and b.txt:
    b1
    b2
Now suppose that the input stream contains ('a.txt',) and ('b.txt',). Then a listing of the files, with each line including the name of a file and one line from the file, can be generated by this statement:
    osh(..., f(lambda x: (x, x)), expand(1))
This generates the following output:
    ('a.txt', 'a1')
    ('a.txt', 'a2')
    ('a.txt', 'a3')
    ('b.txt', 'b1')
    ('b.txt', 'b2')
f(lambda x: (x, x)) duplicates the file name x. The first occurrence is kept for the output, and the second occurrence is expanded. When expand is applied to a string (the filename in position 1), the string is interpreted as a filename and the lines of that file are generated in each output tuple.
For example, the following statements both print tuples of the form (x, x * 100), for x = 0, ..., 9:
    osh(gen(10), f(lambda x: (x, x * 100)), out())
    osh(gen(10), f('x: (x, x * 100)'), out())
    osh(gen(3), out())
generates this output:
    (0,)
    (1,)
    (2,)
gen(N, S) generates N integers starting at S.
    osh(gen(3), out())
prints gen output as tuples:
    (0,)
    (1,)
    (2,)
because osh passes tuples from one function to the next. Formatting using %s:
    osh(gen(3), out('%s'))
generates this output instead:
    0
    1
    2
To write to a file instead of stdout, use the append function to append to a file, or the write function to create or replace it:
    osh(gen(3), out(write('/tmp/numbers.txt')))
    osh(gen(3), out(append('/tmp/numbers.txt')))
It is often useful to generate output in a CSV format (comma-separated values). This can be done using a formatting string, but this can also be done by calling csv() inside of out(), e.g.
    osh(gen(3), out(write('/tmp/numbers.txt'), csv()))
    osh(gen(3), reverse(), out('%s'))
generates this output:
    2
    1
    0
osh(gen(100), select(lambda x: (x % 7) == 0), out())
    osh(sh('ls -l /tmp'), out())
sh can also be used to run OS commands, binding input piped in from other osh functions. For example, this statement creates directories dir0, ..., dir4 in /tmp:
    osh(gen(5), sh('mkdir /tmp/dir%s'), out())
    osh(stdin(), sort(), out())
Sorting is done using the default Python comparison function cmp. You can also provide your own sorting function. For example, to sort stdin by length of each input line:
    osh(stdin(), sort(lambda line: len(line)), out())
    osh(sql('select * from person'), out())
might generate this output:
    ('julia', 6)
    ('hannah', 11)
For INSERT, DELETE or UPDATE statements, there may be variables, denoted by %s, which are assigned values from incoming objects. For example, suppose person.txt contains this data:
    alexander 13
    nathan 11
    zoe 6
This data can be loaded into the database by the following command:
    osh(stdin(), f(lambda s: s.split()), sql("insert into person values('%s', %s)"))
Splitting the lines of the file results in tuples ('alexander', '13'), ('nathan', '11'), ('zoe', '6'). These tuples are bound to the two %s occurrences in the INSERT statement.
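The split-then-bind pattern can be tried without osh or a postgres server, using Python 3's built-in sqlite3 module and an in-memory database. Note the swap: this sketch uses sqlite3's ? placeholders rather than osh's %s substitution, and the table definition is an assumption made to match the person examples.

```python
import sqlite3

# The contents of person.txt from the example, one string per line.
lines = ['alexander 13', 'nathan 11', 'zoe 6']

connection = sqlite3.connect(':memory:')  # throwaway in-memory database
connection.execute('create table person(name text, age integer)')
connection.execute("insert into person values ('julia', 6)")
connection.execute("insert into person values ('hannah', 11)")

# Split each line into fields and bind them to the two placeholders.
for line in lines:
    name, age = line.split()
    connection.execute('insert into person values (?, ?)', (name, int(age)))

rows = connection.execute('select * from person order by rowid').fetchall()
for row in rows:
    print(row)
```

Binding values through placeholders, rather than pasting them into the SQL text, also avoids quoting problems when a name contains an apostrophe.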
To access a database using a non-default configuration (specified in .oshrc), specify the database configuration's name before the query, e.g.
    osh(sql('mydb', 'select * from person'), out())
The complete rules for selecting a database profile are as follows:
If the osh stream of objects contains sequences, then squish could be applied. For example, suppose the input stream contains these sequences:
    (1, 2, 3)
    (4, 5, 6)
    (7, 8, 9)
Then to compute the sum of each sequence, we could do this:
    osh(..., f(lambda *x: reduce(lambda a, b: a + b, x)), out())
producing this output:
    (6,)
    (15,)
    (24,)
Osh provides the squish command, which does the same sort of thing as applying the Python reduce function using the osh command f, but more concisely. The above statement is equivalent to:
    osh(..., squish('+'), out())
If the arguments to squish comprise a single occurrence of +, as above, then the + can be omitted, e.g.
    osh(..., squish(), out())
If each input sequence contains nested sequences, then the squish command can be used to do multiple reductions in parallel. For example, suppose the input contains sequences of sequences like this:
    ((1, 2, 3), (10, 20, 30), (100, 200, 300))
To combine items in like positions (e.g. 1 + 10 + 100, 2 + 20 + 200, 3 + 30 + 300), we can do this:
    osh(..., squish('+ + +'), out())
which yields this output:
    (111, 222, 333)
The operators that can appear in the argument to squish (and make sense) are +, *, min and max. For example, given the same input as in the preceding example, this statement:
    osh(..., squish('+ min max'), out())
yields this output:
    (111, 2, 300)
111 is 1 + 10 + 100. 2 is min(2, 20, 200). 300 is max(3, 30, 300).
    osh(timer(1), out())
generates this output, with lines appearing every second:
    (2005, 9, 18, 23, 55, 57, 6, 261, 1)
    (2005, 9, 18, 23, 55, 58, 6, 261, 1)
    (2005, 9, 18, 23, 55, 59, 6, 261, 1)
    (2005, 9, 18, 23, 56, 0, 6, 261, 1)
    (2005, 9, 18, 23, 56, 1, 6, 261, 1)
    ...
In general, the input to timer is a string of the form HH:MM:SS. An int can be passed for intervals up to 59 seconds.
Suppose the input stream contains these tuples:
(0,)
(1,)
(2,)
(3,)
(0,)
(1,)
(2,)
(3,)
Then this statement:
osh(..., unique(), out())
generates this output:
(1,)
(0,)
(3,)
(2,)
Ordering is not guaranteed. If the input is known to be structured such that all duplicates are consecutive, then this variant can be used:
osh(..., unique(consecutive()), out())
consecutive() reduces consecutive inputs to a single copy, minimizes memory requirements, and generates output sooner.
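The memory difference between the two variants can be seen in a plain Python sketch. Both functions below are hypothetical stand-ins, not osh code. Note one difference from osh: unique_any_order happens to emit rows in first-seen order, whereas osh makes no ordering promise.

```python
from itertools import groupby

def unique_any_order(rows):
    """Drop duplicates anywhere in the stream; must remember every distinct row."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row

def unique_consecutive(rows):
    """Drop only adjacent duplicates; remembers just the current row."""
    for row, _group in groupby(rows):
        yield row

scattered = [(0,), (1,), (2,), (3,), (0,), (1,), (2,), (3,)]
print(sorted(unique_any_order(scattered)))    # [(0,), (1,), (2,), (3,)]

grouped = [(0,), (0,), (1,), (1,), (2,)]
print(list(unique_consecutive(grouped)))      # [(0,), (1,), (2,)]
```

If duplicates are not adjacent, unique_consecutive lets them through, which is why the consecutive variant is only appropriate when the input is known to be grouped.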
osh(gen(100), window(lambda n: n % 10 == 0), squish(), out())
This statement generates the following output:
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
(20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
(30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
(40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
(50, 51, 52, 53, 54, 55, 56, 57, 58, 59)
(60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
(70, 71, 72, 73, 74, 75, 76, 77, 78, 79)
(80, 81, 82, 83, 84, 85, 86, 87, 88, 89)
(90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
Each input to window is a tuple containing a single integer. window combines these into a tuple of tuples. squish concatenates the interior tuples. (Without squish the first output tuple would be ((0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)).)
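The grouping and flattening steps can be sketched in plain Python. window_by_predicate and flatten are hypothetical names, and the rule that a true predicate starts a new window is inferred from the output shown above rather than taken from the osh source.

```python
def window_by_predicate(rows, starts_window):
    """Group rows into tuples; a row for which starts_window(*row) is true begins a new group."""
    current = []
    for row in rows:
        if current and starts_window(*row):
            yield tuple(current)
            current = []
        current.append(row)
    if current:
        # Emit whatever is left when the input ends.
        yield tuple(current)

def flatten(group):
    """Concatenate the inner tuples of one window."""
    return tuple(item for row in group for item in row)

rows = [(n,) for n in range(100)]
windows = list(window_by_predicate(rows, lambda n: n % 10 == 0))
print(windows[0])            # ((0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,))
print(flatten(windows[0]))   # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
```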
Another way to form windows is to specify window sizes. In this example, gen is used to generate a stream of numbers, 0 through 9. The window function turns these into two lists of five numbers each:
osh(gen(10), window(disjoint(5)), squish(), out())
The output from this statement is:
(0, 1, 2, 3, 4)
(5, 6, 7, 8, 9)
disjoint(5) specifies that windows of size 5 with non-overlapping elements should be created.
Overlapping lists can be created by specifying the window size using overlap. After one window is formed, the next one is formed by shifting out the first item in the list, and adding a new item at the end. For example, this statement:
osh(gen(10), window(overlap(5)), out())
generates this output:
(0, 1, 2, 3, 4)
(1, 2, 3, 4, 5)
(2, 3, 4, 5, 6)
(3, 4, 5, 6, 7)
(4, 5, 6, 7, 8)
(5, 6, 7, 8, 9)
(6, 7, 8, 9, None)
(7, 8, 9, None, None)
(8, 9, None, None, None)
(9, None, None, None, None)
Notice that the last four lines of output contain padding (None) to fill out each list to 5 items as specified.
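For comparison, here is a plain Python sketch of the two sizing rules. disjoint_windows and overlap_windows are hypothetical names. The None padding in the overlapping case follows the output shown above. How osh treats a final disjoint window that is shorter than the requested size is not stated here, so the sketch simply emits it short.

```python
def disjoint_windows(items, size):
    """Split items into consecutive, non-overlapping windows of the given size."""
    items = list(items)
    for start in range(0, len(items), size):
        # Assumption: a short final window is emitted as-is, without padding.
        yield tuple(items[start:start + size])

def overlap_windows(items, size):
    """Slide a window one item at a time, padding the tail with None."""
    items = list(items)
    padded = items + [None] * (size - 1)
    for start in range(len(items)):
        yield tuple(padded[start:start + size])

print(list(disjoint_windows(range(10), 5)))
# [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
print(list(overlap_windows(range(10), 5))[-1])
# (9, None, None, None, None)
```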
Example:
[jao@zack] find /usr/bin | osh ^ f 's: (n(), path(s))' $
(0, path('/usr/bin'))
(1, path('/usr/bin/consolehelper'))
(2, path('/usr/bin/catchsegv'))
(3, path('/usr/bin/gencat'))
(4, path('/usr/bin/getconf'))
(5, path('/usr/bin/getent'))
(6, path('/usr/bin/glibcbug'))
...
Each osh command has its own copy of the n() function. So using n() multiple times in the same command will yield different values on each call, e.g.
[jao@zack] find /usr/bin | osh ^ f 's: (n(), n(), path(s))' $
(0, 1, path('/usr/bin'))
(2, 3, path('/usr/bin/consolehelper'))
(4, 5, path('/usr/bin/catchsegv'))
(6, 7, path('/usr/bin/gencat'))
(8, 9, path('/usr/bin/getconf'))
(10, 11, path('/usr/bin/getent'))
(12, 13, path('/usr/bin/glibcbug'))
...
But calls in different commands are independent of one another, e.g.
[jao@zack] find /usr/bin | osh ^ f 's: (n(), path(s))' ^ f 't: (n(),) + t' $
(0, 0, path('/usr/bin'))
(1, 1, path('/usr/bin/consolehelper'))
(2, 2, path('/usr/bin/catchsegv'))
(3, 3, path('/usr/bin/gencat'))
(4, 4, path('/usr/bin/getconf'))
(5, 5, path('/usr/bin/getent'))
(6, 6, path('/usr/bin/glibcbug'))
...
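The per-command behavior of n() can be modeled with one counter per command. The sketch below is an illustration only: make_n is a hypothetical helper, and plain strings stand in for the path objects.

```python
from itertools import count

def make_n():
    """Return a fresh n() that yields 0, 1, 2, ... on successive calls."""
    counter = count()
    return lambda: next(counter)

paths = ['/usr/bin', '/usr/bin/consolehelper', '/usr/bin/catchsegv']

# One command calling n() twice per input: the counter advances on every call.
n = make_n()
same_command = [(n(), n(), p) for p in paths]
print(same_command[1])    # (2, 3, '/usr/bin/consolehelper')

# Two commands, each with its own counter: the values stay in step.
n1, n2 = make_n(), make_n()
first = [(n1(), p) for p in paths]
two_commands = [(n2(),) + t for t in first]
print(two_commands[1])    # (1, 1, '/usr/bin/consolehelper')
```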
ifelse(predicate, thenExpr, elseExpr) returns thenExpr if predicate is true, elseExpr otherwise.
For example, this command finds the longest word in /usr/share/dict/words:
[jao@zack] cat /usr/share/dict/words | osh ^ agg '""' 'longest, w: ifelse(len(w) > len(longest), w, longest)' $
('antidisestablishmentarianism',)
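The same aggregation can be written in plain Python with reduce. This is a sketch of the idea rather than the osh implementation: the ifelse below is a stand-in written for this example, and longest_word is a hypothetical helper. One point worth noting is that, like any Python function, this ifelse evaluates both value arguments before it is called.

```python
from functools import reduce

def ifelse(predicate, then_value, else_value):
    """Return then_value if predicate is true, else_value otherwise."""
    return then_value if predicate else else_value

def longest_word(words):
    """Fold over the words, keeping the longest seen so far (first one wins ties)."""
    return reduce(
        lambda longest, w: ifelse(len(w) > len(longest), w, longest),
        words,
        '',   # initial value, playing the role of the '""' argument to agg
    )

print(longest_word(['osh', 'pipeline', 'tuple']))   # pipeline
```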
The path module can be used inside osh commands to obtain objects representing files and directories. For example, here is a command to print a list of files under /etc:
[jao@zack] osh f 'path("/etc").walk()' ^ spread $
/etc/wgetrc
/etc/pnm2ppa.conf
/etc/a2ps.cfg
/etc/security
/etc/security/group.conf
/etc/security/chroot.conf
/etc/security/time.conf
...
This is obviously pointless, since osh adds little to what find already does. Here is another example which includes file size and sorts by descending file size:
[jao@zack] osh f 'path("/etc").walk()' ^ spread ^ f 'p: (p.size, p)' ^ sort 't: -t[0]' $
(129993L, path('/etc/lynx.cfg.sk'))
(115004L, path('/etc/squid/squid.conf.default'))
(115004L, path('/etc/squid/squid.conf'))
(91259L, path('/etc/ld.so.cache'))
(26104L, path('/etc/squid/mib.txt'))
(23735L, path('/etc/webalizer.conf'))
(15276L, path('/etc/a2ps.cfg'))
(11651L, path('/etc/squid/mime.conf.default'))
(11651L, path('/etc/squid/mime.conf'))
(6300L, path('/etc/pnm2ppa.conf'))
(4096L, path('/etc/security'))
...
[jao@zack] find /usr/bin | osh ^ f 's: path(s).abspath()' ^ f 'p: (p, p.stat())' ^ sort 't: -t[1].size' $
(path('/usr/bin/gmplayer'), stat{mode:33261, inode:345865, device:770, hardLinks:1, uid:0, gid:0, size:4975052, atime:1098396622, mtime:1065382845, ctime:1098221020})
(path('/usr/bin/mplayer'), stat{mode:33261, inode:345865, device:770, hardLinks:1, uid:0, gid:0, size:4975052, atime:1098396622, mtime:1065382845, ctime:1098221020})
(path('/usr/bin/mencoder'), stat{mode:33261, inode:345864, device:770, hardLinks:1, uid:0, gid:0, size:4331308, atime:1065382845, mtime:1065382845, ctime:1098221019})
(path('/usr/bin/emacs'), stat{mode:33261, inode:345185, device:770, hardLinks:2, uid:0, gid:0, size:4093052, atime:1105552462, mtime:1045723299, ctime:1076337118})
(path('/usr/bin/emacs-21.2'), stat{mode:33261, inode:345185, device:770, hardLinks:2, uid:0, gid:0, size:4093052, atime:1105552462, mtime:1045723299, ctime:1076337118})
(path('/usr/bin/gs'), stat{mode:33261, inode:344661, device:770, hardLinks:1, uid:0, gid:0, size:3233020, atime:1105475462, mtime:1053254951, ctime:1076346525})
(path('/usr/bin/ghostscript'), stat{mode:33261, inode:344661, device:770, hardLinks:1, uid:0, gid:0, size:3233020, atime:1105475462, mtime:1053254951, ctime:1076346525})
(path('/usr/bin/doxygen'), stat{mode:33261, inode:345455, device:770, hardLinks:1, uid:0, gid:0, size:3120192, atime:1043451839, mtime:1043451839, ctime:1076337735})
...
(p.size is no longer needed because the same information can be obtained from the stat object.)
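For readers who want to check such results outside osh, the size-sorted listing can be approximated with the Python standard library. This is a sketch under assumptions: files_by_size is a hypothetical helper, it uses os.walk and os.lstat rather than the path module, and it reports plain strings instead of path objects.

```python
import os

def files_by_size(root):
    """Return (size, path) pairs for everything under root, largest first."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat reports on a symlink itself rather than its target.
                entries.append((os.lstat(full).st_size, full))
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
    return sorted(entries, key=lambda t: -t[0])

for size, name in files_by_size('/etc')[:5]:
    print(size, name)
```

The sort key mirrors the 't: -t[0]' expression used in the osh command above.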
[jao@zack] top -n 1 b | grep httpd | osh ^ f 's: top(s.split())' $
top{pid:4123, user:root, priority:15, nice:0, size:2048, rss:76, share:48, stat:S, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:0, command:httpd}
top{pid:4137, user:apache, priority:15, nice:0, size:2040, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4138, user:apache, priority:15, nice:0, size:2052, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4139, user:apache, priority:15, nice:0, size:2048, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4140, user:apache, priority:15, nice:0, size:2048, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4141, user:apache, priority:15, nice:0, size:2056, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:0, command:httpd}
top{pid:4142, user:apache, priority:15, nice:0, size:2052, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4143, user:apache, priority:15, nice:0, size:2116, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
top{pid:4144, user:apache, priority:15, nice:0, size:2116, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
[jao@zack] vmstat -n 1 | osh ^ select 's: n() > 1' ^ f 's: (strftime("%H:%M:%S"), vmstat(tuple(s.split())))' ^ select 't, v: v.id < 20' $
[jao@zack] osh sql 'select * from person' ^ out
but this is not (no space after ^):
[jao@zack] osh sql 'select * from person' ^out
and neither is this (no space before or after ^):
[jao@zack] osh sql 'select * from person'^out
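One way to see what goes wrong in the last two forms is to look at how a POSIX shell splits each command line into arguments before osh ever runs. The sketch below uses Python's shlex module to approximate that splitting. pipe_tokens is a hypothetical helper, and the idea that osh looks for ^ as a separate argument is an assumption made for illustration, not something confirmed by the osh source.

```python
import shlex

def pipe_tokens(command_line):
    """Split a command line as a POSIX shell would and count stand-alone ^ tokens."""
    tokens = shlex.split(command_line)
    return tokens, tokens.count('^')

print(pipe_tokens("osh sql 'select * from person' ^ out"))
# (['osh', 'sql', 'select * from person', '^', 'out'], 1)

print(pipe_tokens("osh sql 'select * from person' ^out"))
# (['osh', 'sql', 'select * from person', '^out'], 0)

print(pipe_tokens("osh sql 'select * from person'^out"))
# (['osh', 'sql', 'select * from person^out'], 0)
```

In the second and third forms the ^ is glued to a neighboring word, so no argument consists of ^ alone.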