[tclug-list] line cut: another missing coreutil

Tue Jun 4 00:48:10 CDT 2013

I'll make data like so with a comma delimiter:

$ seq 100 | paste -d, - - | paste -d, - - - - -
1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30
31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60
61,62,63,64,65,66,67,68,69,70
71,72,73,74,75,76,77,78,79,80
81,82,83,84,85,86,87,88,89,90
91,92,93,94,95,96,97,98,99,100

It isn't handling a comma-separated list of rows correctly. Asking for 
rows 3,4 prints the same single line as asking for row 3 alone:

$ seq 100 | paste -d, - - | paste -d, - - - - - | print_ranges.awk - 3,4 1,2 ,
21,22

$ seq 100 | paste -d, - - | paste -d, - - - - - | print_ranges.awk - 3 1,2 ,
21,22

It does do it correctly when I use the minus instead of the comma:

$ seq 100 | paste -d, - - | paste -d, - - - - - | print_ranges.awk - 3-4 1,2 ,
21,22
31,32
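
A likely reason for the difference: the script opens the input again for 
every range, and when the file is "-" the first range has already read 
the pipe to the end, so 3,4 finds nothing left for row 4, while 3-4 is 
one range handled in one pass. Below is a minimal sketch of a one-pass 
alternative, assuming output in input order is acceptable. The name 
pick_rows and its variable names are made up for illustration and are not 
part of Jack's script.

```shell
# pick_rows ROWS COLS SEP: hypothetical helper that reads stdin exactly once.
# ROWS is a cut-style list such as 3,4 or 3-4,7; COLS is a comma list such as 1,2.
pick_rows() {
    awk -v rows="$1" -v cols="$2" -v sep="$3" '
    BEGIN {
        FS = OFS = sep
        nr = split(rows, r, ",")
        for (i = 1; i <= nr; i++) {
            n = split(r[i], ab, "-")
            lo[i] = ab[1] + 0                       # first row of this range
            hi[i] = (n > 1 ? ab[2] : ab[1]) + 0     # last row (same as first if no minus)
        }
        nc = split(cols, c, ",")
    }
    {
        for (i = 1; i <= nr; i++) {
            if (NR >= lo[i] && NR <= hi[i]) {
                line = $(c[1])
                for (k = 2; k <= nc; k++) line = line OFS $(c[k])
                print line
                break                               # print each input row at most once
            }
        }
    }'
}

seq 100 | paste -d, - - | paste -d, - - - - - | pick_rows 3,4 1,2 ,
```

Because the rows are matched against NR while the pipe is read once, the 
comma form and the minus form go through the same code path.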

Now to try it with a tab delimiter:

$ seq 100 | paste - - | paste - - - - - | print_ranges.awk - 3-4 1,2 "	"
21	22
31	32

It would be cool to be able to use the \t syntax to specify the tab 
delimiter, and it almost works. The columns are split correctly, but the 
output contains a literal \t instead of a tab:

$ seq 100 | paste - - | paste - - - - - | print_ranges.awk - 3-4 1,2 "\t"
21\t22
31\t32
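
My guess at what happens here: split() presumably treats the 
two-character string \t as a regular expression in which \t matches a 
tab, so the split succeeds, but printing the same variable emits the 
backslash and the t as they are. Below is a small sketch of two ways 
around it. The name show_sep is hypothetical. The first way translates 
the ARGV value by hand, the second passes the separator with -v, where 
awk processes escape sequences in the assigned value.

```shell
# show_sep SEP: made-up demo that prints a and b joined by the separator, twice.
show_sep() {
    awk -v v_sep="$1" 'BEGIN {
        a_sep = ARGV[1]                      # raw text: backslash followed by t
        if (a_sep == "\\t") a_sep = "\t"     # explicit translation for the ARGV route
        print "a" a_sep "b"
        print "a" v_sep "b"                  # -v already turned \t into a tab
    }' "$1"
}

show_sep '\t'
```

Either a one-line translation near the top of the script or a switch to 
-v for the separator should give real tabs in the output.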

Thanks, Jack.  That's a cool idea.

Mike

On Tue, 4 Jun 2013, zedlan at invec.net wrote:

> Mike -- try this:
>
> #!/usr/bin/gawk -f
>
> # print_ranges.awk
> # usage: print_ranges.awk file rows cols field-separator
> # ex: print_ranges.awk file 92-97,5-8,23-42,55-71 2,3,5 ,
>
> BEGIN {
>     OFS = ""
>     ORS = ""
>     FS = ""
>     arg_sep = ","
>
>     file = ARGV[1];
>     range_cnt = split(ARGV[2], ranges, arg_sep);
>     choice_cnt = split(ARGV[3], col_choice, arg_sep);
>     col_sep = ARGV[4]
>
>     for(i = 1; i <= range_cnt; i++) {
>         num = split(ranges[i], start_stop, "-");
>         n = 0;
>         start = ranges[i];
>         stop = start;
>
>         if (num > 1) {
>             start = start_stop[1]
>             stop = start_stop[2]
>         }
>
>         while(getline < file) {
>             n++;
>             if(n >= start && n <= stop) {
>                 cc_cnt = split($0, cc_arr, col_sep);
>
>                 for(j = 1; j <= cc_cnt; j++) {
>                     for(k = 1; k <= choice_cnt; k++) {
>                         if(j == col_choice[k]) {
>                             print cc_arr[j]
>                             if(k < choice_cnt) { print col_sep }
>                         }
>                     }
>                 }
>                 print "\n"
>             }
>         }
>         close(file)
>     }
> }
>
> jack
>
> -----Original Message-----
> From: Mike Miller [mailto:mbmiller+l at gmail.com]
> Sent: Monday, June 3, 2013 04:53 AM
> To: 'TCLUG Mailing List'
> Subject: Re: [tclug-list] line cut: another missing coreutil
>
> Thanks, Jack. Now the cut-like list of lines will work:
>
> seq 10000000 | print_ranges.awk 1-5,55,27
>
> One big problem is that in this example your script uses about 1.9 GB of
> memory and takes 20 seconds when the memory is immediately available. My
> friend's perl script uses 0.002 GB of memory and uses 0 seconds. The only
> difference is that it does not reorder the lines. When the last line of
> the 10 million input lines is in the output, then the perl script is
> slower, taking about a minute, but still only using 0.002 GB memory. Your
> script takes the same amount of time and uses the same amount of memory
> whether it has to read to the end or not. Thus both these use 1.8 GB of
> RAM and take 20 seconds:
>
> seq 10000000 | print_ranges.awk 1
> seq 10000000 | print_ranges.awk 10000000
>
> The perl script finishes the first one almost instantly but it takes
> longer than the awk script on the second one, though it uses minimal
> memory.
>
> Mike
>
> On Mon, 3 Jun 2013, zedlan at invec.net wrote:
>
> > Mike,
> >
> > I made a few changes to the script per your request:
> >
> > #!/usr/bin/gawk -f
> >
> > # print_ranges.awk
> > # usage: takes csv arg string to-from1,to-from2, ...
> > # ex: cat file | print_ranges.awk 92-97,5-8,23-42,55-71
> >
> > BEGIN {
> >     range_cnt = split(ARGV[1], ranges, ",");
> >
> >     while(getline < "-") {
> >         line_arr[++n] = $0;
> >     }
> >     close("-")
> >
> >     for(i = 1; i <= range_cnt; i++) {
> >         num = split(ranges[i], start_stop, "-");
> >
> >         if(num == 1) {
> >             start = ranges[i];
> >             stop = start;
> >         } else {
> >             start = start_stop[1]
> >             stop = start_stop[2]
> >         }
> >
> >         for(j = start; j <= stop; j++) {
> >             print line_arr[j];
> >         }
> >     }
> > }
>
> _______________________________________________
> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
> tclug-list at mn-linux.org
> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>
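
On the memory and run-time numbers in the quoted message: the earlier 
script stores every input line before printing, which would explain why 
it costs the same whether it needs line 1 or line 10000000. Below is a 
minimal sketch, under the assumption that reordering should be kept, that 
stores only the requested lines and stops reading after the highest 
requested line number. The name print_lines is hypothetical, and I have 
not timed it against the perl script.

```shell
# print_lines LIST: made-up name; LIST is a cut-style list such as 1-5,55,27.
print_lines() {
    awk -v list="$1" '
    BEGIN {
        nr = split(list, r, ",")
        for (i = 1; i <= nr; i++) {
            n = split(r[i], ab, "-")
            lo[i] = ab[1] + 0
            hi[i] = (n > 1 ? ab[2] : ab[1]) + 0
            for (j = lo[i]; j <= hi[i]; j++) want[j] = 1
            if (hi[i] > last) last = hi[i]      # highest line number requested
        }
    }
    (NR in want) { keep[NR] = $0 }              # store only requested lines
    NR >= last   { exit }                       # stop reading; END still runs
    END {
        # print in the order the ranges were given, not input order
        for (i = 1; i <= nr; i++)
            for (j = lo[i]; j <= hi[i]; j++)
                if (j in keep) print keep[j]
    }'
}

seq 100 | print_lines 1-5,55,27
```

Memory here grows with the number of requested lines rather than with the 
size of the input, so a very wide range would still be expensive. With a 
request like 1-5,55,27 on seq 10000000 it should stop after line 55.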