
How to extract n number of lines from a text file

3 replies [Last post]
sanjeev's picture
Joined: 21 Feb 2011

Problem Scenario: Suppose you have a text file that contains thousands of lines and you need to extract n lines from any position in that file. Examples: the first n lines, the last n lines, or the lines from start_range to end_range.
Ex: first 5 lines
Last 5 lines
Text from line no 6 to 16 (11 lines)

Solution: Write a program in Perl (as I know Perl) to do this. The program takes a file name and the range of lines to extract. It takes two arguments for the first n or last n lines, and three arguments for a range. With two arguments, a positive n means the first n lines and a negative n means the last n lines. With three arguments, the second and third are the start and end line numbers of the range. Make sure the start of the range is not greater than the end and that both are integers.

 
[Code]:

#!/usr/bin/perl
use strict;
use warnings;

##########################################################################
# Author: Sanjeev Jaiswal
# Date: April, 2013
# Description: this script reads a text file and extracts
# the given range of lines into extracted_lines.txt
# $extracted_ref = extract_lines(\@lines, $start, $end);
###########################################################################

die "You should pass file name and then range e.g.\n"
  . "perl script-name filename 12 17 or\n"
  . "perl script-name filename 5 (from top) or\n"
  . "perl script-name filename -5 (for last 5 lines)\n"
  unless @ARGV == 2 || @ARGV == 3;

my ($file_input, $start_range, $end_range) = @ARGV;
my $file_output = "extracted_lines.txt";

# Validate the arguments before doing any arithmetic on them
die "Range should be integer only\n"
  unless checktype($start_range) && (@ARGV == 2 || checktype($end_range));

if (@ARGV == 2) {
    die "Number of lines cannot be 0\n" if $start_range == 0;
    if ($start_range > 0) {          # first n lines: indices 0 .. n-1
        $end_range   = $start_range - 1;
        $start_range = 0;
    }
    else {                           # last n lines: indices -n .. -1
        $end_range = -1;
    }
}
else {
    die "Line numbers start at 1\n" if $start_range < 1;
    die "End range ($end_range) cannot be lesser than start range ($start_range)\n"
      if $start_range > $end_range;
    # Line numbers on the command line start at 1, array indices at 0
    $start_range--;
    $end_range--;
}

open my $in,  "<", $file_input  or die "Can't open file $file_input: $!\n";
open my $out, ">", $file_output or die "Can't open file to write $file_output: $!\n";

my @content_from_first_file = <$in>;
my $final_text = extract_lines(\@content_from_first_file, $start_range, $end_range);

# Write each extracted line to the output file
print {$out} $_ foreach @{$final_text};

close($in);
close($out);

# This subroutine extracts the lines from the array for a given index range
sub extract_lines {
    my ($content, $start, $end) = @_;
    # Indices past either end of the file yield undef, so drop those
    my @sliced_text = grep { defined } @{$content}[$start .. $end];
    return \@sliced_text;
}

# Returns 1 if the argument is an optionally signed integer, otherwise 0
sub checktype {
    my $num = shift;
    return 1 if defined $num && $num =~ m/^[+-]?\d+$/;
    return 0;
}


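Note:
This is not the most efficient approach, because it reads the whole file into memory, but it works fine for files of moderate size. For large files, read the file line by line and stop once the range has been read. seek() works on byte offsets rather than line numbers, so it only helps when you already know where the wanted lines start.
Instead of a regex to check whether a value is an integer, older Perl versions could use isdigit from the POSIX module. That function was removed from POSIX in Perl 5.24, and it also rejects a leading sign, so the regex in checktype is the safer choice.
EX (Perl older than 5.24 only):
use POSIX;
print "$start_range is digits\n" if isdigit $start_range;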
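
For large files, one option is to read line by line and stop as soon as the range is complete. The following is only a minimal sketch under some assumptions: extract_range is a hypothetical helper that is not part of the script above, it takes 1-based inclusive line numbers, and the demo reads from an in-memory string so that it runs without an input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a reference
# to the lines $first..$last (1-based, inclusive) read from an open handle.
sub extract_range {
    my ($fh, $first, $last) = @_;
    my @lines;
    while (my $line = <$fh>) {
        next if $. < $first;     # $. holds the current line number of $fh
        push @lines, $line;
        last if $. >= $last;     # stop reading once the range is complete
    }
    return \@lines;
}

# Demo on an in-memory "file" of 20 lines (assumed sample data)
my $text = join '', map { "line $_\n" } 1 .. 20;
open my $fh, '<', \$text or die "Can't open in-memory file: $!\n";
print @{ extract_range($fh, 6, 16) };    # prints "line 6" through "line 16"
close $fh;
```

Only one line is held at a time until the range starts, and nothing after the end of the range is read. The last n lines would still need either the whole file or a rolling buffer of n lines, which this sketch does not cover.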

If you wish to share your code or have any suggestions/feedback, please share with us through the comment box.

Follow us at :
Facebook | Twitter
########### Give me the right place to stand, I shall move the earth. #################

Joined: 24 Feb 2011
Can I do the same to match on a string?
For example, I want to get each block of lines into its own array.
Ex:
I have a file test.txt
#cat test.txt

admin test line     -->start
some line to test here
some line to test here
not fixed how many lines are there
here to match     --->match here
some more line
some more line
length is not fixed
match here this string  --> match here

where my output should be

print @array = (some line to test here
some line to test here
not fixed how many lines are there );

print @array2 = (some more line
some more line
length is not fixed );

That is, the lines between the "admin test line" line and the "here to match" line should go into @array, and the lines between "here to match" and "match here this string" should go into @array2.


I just tested this with a simple script for the first match:

#!/usr/bin/perl
use strict;
use warnings;

my $path = "/usr/home/test";
my $test = $path . "/test.csv";
open(my $fh_test, "<", $test) || die "Could not open file $test: $!\n";
my @grabbed;
while (<$fh_test>) {
    if (/admin test line/) {
        push @grabbed, $_;
        # Inner loop reads from the same handle until the end text
        while (<$fh_test>) {
            last if /here to match/;
            push @grabbed, $_;
        }
    }
}
print "the array is @grabbed \n";
close($fh_test);



My array is coming out blank. Can you please suggest ...
sanjeev's picture
Joined: 21 Feb 2011
Your requirement doesn't match your code
Hi Rani,
From your requirement, you want a script that extracts the lines between two given texts, then the lines between the second and third texts, and so on.
To write generic code you need to work out the logic first. What you wrote, especially the nested while on the same file handle, is not the right approach: the outer while loop only runs until it finds the start text, and the following lines are read by the inner loop.

Before using anything like this, ask yourself: do you really need an extra loop?
By the way, you have used only one array, and it should hold the lines of the first match between those two texts. Your code does match and give output for the first part, but the extra loop is wasted effort.

For the second condition you need one more array, right? Or use a hash of arrays, or simply an array of arrays.

Check this code snippet:

my $flag = 0;

while (<$fh_test>) {
    last if $flag && /here to match/;   # stop at the end text
    push @grabbed, $_ if $flag;         # keep lines seen after the start text
    $flag = 1 if /admin test line/;     # start text found, collect from the next line
}


You could write $flag = 0 if /here to match/; instead, but then the loop would keep reading until the end of the file, so I used last. In the same way, try to work out logic that handles more such boundary conditions using minimal data structures and loops :D
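
As one possible way to extend the flag idea to several boundaries, here is a rough sketch that collects each block into an array of arrays. The helper name collect_blocks and its argument order are my own assumptions, not something from the scripts above, and the demo uses the sample file from the question as an in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes an open handle and an ordered list of marker
# regexes, and returns a reference to an array of arrays holding the lines
# found between each pair of consecutive markers (markers excluded).
sub collect_blocks {
    my ($fh, @markers) = @_;
    my @blocks;
    my $next = 0;                          # index of the marker we wait for
    while (my $line = <$fh>) {
        last if $next > $#markers;         # every marker has been seen
        if ($line =~ $markers[$next]) {
            $next++;
            # Open a new block unless this was the final marker
            push @blocks, [] if $next <= $#markers;
            next;
        }
        push @{ $blocks[-1] }, $line if $next > 0;   # inside a block
    }
    return \@blocks;
}

# Sample data copied from the question
my $sample = <<'END';
admin test line     -->start
some line to test here
some line to test here
not fixed how many lines are there
here to match     --->match here
some more line
some more line
length is not fixed
match here this string  --> match here
END

open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
my $blocks = collect_blocks($fh, qr/admin test line/, qr/here to match/,
                            qr/match here this string/);
close $fh;

for my $i (0 .. $#$blocks) {
    print "block ", $i + 1, ":\n", @{ $blocks->[$i] };
}
```

A single loop and one counter are enough here, and adding a fourth boundary only means passing one more regex. If the file ends before the last marker is found, the final block simply holds whatever was read up to that point.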


Joined: 24 Feb 2011
Thanks Sanjeev for providing the different options
Thanks Sanjeev for providing the different options, I just fixed it as below
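my (@test, @test1);
while (<$fh_test>) {
    # The .. (flip-flop) operator is true from the line matching the first
    # pattern through the line matching the second, both lines included
    push @test,  $_ if /admin test line/ .. /here to match/;
    push @test1, $_ if /here to match/ .. /match here this string/;
}
print "the first matched lines are @test\n";
print "the second matched lines are @test1\n";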


Cheers,
Saila
