On Wed, 10 Jan 2007, G. Scott Walters wrote:
> I've got a couple hundred PDF files that have been malformed with some
> extra lines AFTER the EOF. This keeps them from doing important
> things like printing, or displaying properly on some versions of
> Acrobat. Not all PDFs are necessarily affected by this issue...
>
> Since these files are hosted on a linux server, I figured the proper
> tool to solve this problem would be PERL. The question is, how....if I
> open the file with a standard open function, won't it read the file til
> the EOF and not beyond?
>
> I understand that SED might be helpful, but I'm sed-impaired, though I'm
> working on that.
This should do it:
perl -pi -e 'BEGIN{undef $/} ; s/\A(.+?%%EOF).*\z/$1\n/gs' *.pdf
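That will remove everything after the first %%EOF in all .pdf files in
the current directory and end each file with a single newline. The
BEGIN{undef $/} makes Perl read each file in one piece instead of line by
line, so the pattern can see the whole file. I tested it on some files and
it worked. It can also be run on a file that is not corrupted: a file whose
only %%EOF is already its last line keeps the same contents, though its
date stamp will change. It is pretty fast.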
Best,
Mike