[tclug-list] line cut: another missing coreutil

Mon Jun 3 20:40:52 CDT 2013

Mike -- try this:

#!/usr/bin/gawk -f

# print_ranges.awk
# usage: print_ranges.awk file rows cols field-separator 
# ex: print_ranges.awk file 92-97,5-8,23-42,55-71 2,3,5 ,

BEGIN {
    OFS = ""
    ORS = ""      # print adds no newline; line ends are printed explicitly below
    FS = ""
    arg_sep = ","

    file = ARGV[1]
    range_cnt = split(ARGV[2], ranges, arg_sep)
    choice_cnt = split(ARGV[3], col_choice, arg_sep)
    col_sep = ARGV[4]

    for (i = 1; i <= range_cnt; i++) {
        # A range is either a single line number or start-stop.
        num = split(ranges[i], start_stop, "-")
        n = 0
        start = ranges[i]
        stop = start

        if (num > 1) {
            start = start_stop[1]
            stop = start_stop[2]
        }

        # Re-read the file for each range so ranges print in the order given.
        # Test for > 0: getline returns -1 on error, which would loop forever.
        while ((getline < file) > 0) {
            n++
            if (n >= start && n <= stop) {
                cc_cnt = split($0, cc_arr, col_sep)

                for (j = 1; j <= cc_cnt; j++) {
                    for (k = 1; k <= choice_cnt; k++) {
                        if (j == col_choice[k]) {
                            print cc_arr[j]
                            if (k < choice_cnt) { print col_sep }
                        }
                    }
                }
                print "\n"
            }
        }
        close(file)
    }
}

jack

-----Original Message-----
From: Mike Miller [mailto:mbmiller+l at gmail.com]
Sent: Monday, June 3, 2013 04:53 AM
To: 'TCLUG Mailing List'
Subject: Re: [tclug-list] line cut: another missing coreutil

Thanks, Jack. Now the cut-like list of lines will work:

seq 10000000 | print_ranges.awk 1-5,55,27

One big problem is that in this example your script uses about 1.9 GB of memory and takes 20 seconds when the memory is immediately available. My friend's perl script uses 0.002 GB of memory and uses 0 seconds. The only difference is that it does not reorder the lines. When the last line of the 10 million input lines is in the output, then the perl script is slower, taking about a minute, but still only using 0.002 GB memory. Your script takes the same amount of time and uses the same amount of memory whether it has to read to the end or not. Thus both these use 1.8 GB of RAM and take 20 seconds:

seq 10000000 | print_ranges.awk 1
seq 10000000 | print_ranges.awk 10000000

The perl script finishes the first one almost instantly but it takes longer than the awk script on the second one, though it uses minimal memory.

Mike

On Mon, 3 Jun 2013, zedlan at invec.net wrote:

> Mike,
>
> I made a few changes to the script per your request:
>
> #!/usr/bin/gawk -f
>
> # print_ranges.awk
> # usage: takes csv arg string to-from1,to-from2, ...
> # ex: cat file | print_ranges.awk 92-97,5-8,23-42,55-71
>
> BEGIN {
>   range_cnt = split(ARGV[1], ranges, ",");
>
>   while(getline < "-") {
>     line_arr[++n] = $0;
>   } close("-")
>
>   for(i = 1; i <= range_cnt; i++) {
>     num = split(ranges[i], start_stop, "-");
>
>     if(num == 1) {
>       start = ranges[i];
>       stop = start;
>     } else {
>       start = start_stop[1]
>       stop = start_stop[2]
>     }
>
>     for(j = start; j <= stop; j++) {
>       print line_arr[j];
>     }
>   }
> }

_______________________________________________
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
tclug-list at mn-linux.org
http://mailman.mn-linux.org/mailman/listinfo/tclug-list
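
The trade-off Mike describes (constant memory and an early stop, at the cost of not reordering lines) can be sketched in awk as well. The following is only an illustration under stated assumptions: it is not the perl script mentioned in the thread, the function name print_lines and the variable names are made up for this example, and it prints matching lines in input order, each at most once.

```shell
# print_lines: hypothetical helper name, not from the thread.
# Usage: seq 10000000 | print_lines 1-5,55,27
print_lines() {
  awk -v spec="$1" '
    BEGIN {
      # Parse "a-b,c,..." into parallel start/stop arrays and track the largest stop.
      n = split(spec, ranges, ",")
      for (i = 1; i <= n; i++) {
        parts = split(ranges[i], ss, "-")
        lo[i] = ss[1] + 0
        hi[i] = (parts > 1 ? ss[2] : ss[1]) + 0
        if (hi[i] > last) last = hi[i]
      }
    }
    {
      # Print the line once if its number falls in any range (input order, no reordering).
      for (i = 1; i <= n; i++)
        if (NR >= lo[i] && NR <= hi[i]) { print; break }
      # Stop reading as soon as no later line can match.
      if (NR >= last) exit
    }
  '
}
```

Because nothing is stored beyond the range list, memory use does not grow with the input, and a request such as `print_lines 1` can stop after the first line. A request for the last line still has to read the whole input, which matches the behaviour Mike reports for the streaming approach.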