Expected positive integer in object trailer #7
Thank you for your report. The In your case, it may be that We might consider adding some code or a manual an option to handle this case in the relaxed mode in the future.
Thank you for your reply! I tried removing the /Prev field, and now I get another error:
Below is the output of an xref dump with caradoc:
And here is the first line of the file:
Another question: I am trying to find a way to convert malformed PDF files into correct ones. I see two techniques. The first is to export the malformed PDF to a correct PDF. I ran a simple test with pdfcreator: printing a malformed PDF to a PDF that follows the PDF/A standard produces a correctly formatted file. The second is to parse the malformed PDF, correct the errors, then export. What do you think? Do you know of a tool of this type? Can it be used in a web environment (for example, converting a PDF during upload)? From a security point of view, should files converted to PDF/A be clean and no longer pose a virus threat? Thanks!
It looks like the first explanation was correct in your case (i.e. there should not be a /Prev field because there is no previous xref table).
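To locate this kind of problem before running the validator, a rough pre-check can be sketched in Python. This is only an illustration, not part of Caradoc: the function name `find_invalid_prev` is hypothetical, and the regular expression naively assumes the trailer dictionary contains no nested dictionaries. It flags any /Prev entry whose value is not a positive integer, using the trailer from the original report as sample input.

```python
import re


def find_invalid_prev(pdf_bytes: bytes):
    """Return (offset, value) pairs for trailer /Prev entries that are not positive.

    Naive sketch: assumes each trailer dictionary has no nested << >> pairs.
    """
    problems = []
    for trailer in re.finditer(rb"trailer\s*<<(.*?)>>", pdf_bytes, re.DOTALL):
        prev = re.search(rb"/Prev\s+(-?\d+)", trailer.group(1))
        if prev and int(prev.group(1)) <= 0:
            # Offset of the /Prev key, relative to the start of pdf_bytes.
            offset = trailer.start(1) + prev.start()
            problems.append((offset, int(prev.group(1))))
    return problems


# Trailer copied from the report; /Prev 0 is not a positive integer.
sample = b"trailer\n<< /Prev 0 /Root 5 0 R /Size 6 >>\nstartxref\n1871913\n%%EOF\n"
print(find_invalid_prev(sample))
```

An empty list means no suspicious /Prev entry was found. It does not mean the file is otherwise valid.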
You now have a type error in an object of type "content_stream". The error seems legitimate because the specification does not define a "/Type" field for this type. Also, bear in mind that Caradoc aims to be a strict validator (e.g. to avoid any ambiguities), but that a lot of PDF-producing software is not so strict, and type errors/inaccuracies are not uncommon. Besides, this is still a beta version, i.e. the type system does not yet implement all of the 700+ pages of the PDF specification, which requires a large amount of work: the specification describes everything in natural language (English text) and we have to convert it into a formal language. Even though the most common types are already implemented, you will probably end up with a type error/warning if your PDF input is a bit complex.
Caradoc is a good start to clean up the syntax. However, we do not modify the higher-level content (at least for now), to preserve the semantics of the file and avoid inadvertently destroying legitimate features. So yes, another converter (e.g. "printing" towards PDF/A) can complement it by removing all kinds of features. One day we might implement in Caradoc a more thorough converter that only keeps the core graphical content (similarly to the "print" feature that you mention). Also, bear in mind that some errors are ambiguous, i.e. they are interpreted differently by distinct PDF readers. In that case, the choice made by Caradoc is to reject the file as "unrecoverable".
If you trust pdfcreator to be robust against malformed files, it is also a good start.
In principle, "caradoc cleanup" gets rid of polyglot files by converting the low-level syntax. But the original polyglot needs to be close enough to a PDF file for the normalizer to work. It depends on whether you want broad coverage and accept weird polyglots, or prefer to be stricter about the inputs you accept.
I don't really understand what you mean here. Correct the errors manually?
There's no reason why it shouldn't work in a web environment. But the converter must be robust enough to not become a threat itself.
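As a rough illustration of an upload-time conversion, here is a minimal Python sketch that wraps the "caradoc cleanup <input> --out <output>" command from the original report in a subprocess with a timeout. The function names, the 30-second timeout, and the assumption that caradoc returns a non-zero exit status on failure are all mine and should be checked against the actual tool. A real deployment would also want to run the converter in a sandbox.

```python
import subprocess
from pathlib import Path


def build_cleanup_command(src, dst, binary="caradoc"):
    """Build the argument list for `caradoc cleanup <src> --out <dst>`."""
    return [binary, "cleanup", str(src), "--out", str(dst)]


def sanitize_upload(src, dst, binary="caradoc", timeout_s=30):
    """Return True only if cleanup finished in time and produced an output file.

    Assumption (not verified): caradoc exits with a non-zero status on error.
    """
    try:
        result = subprocess.run(
            build_cleanup_command(src, dst, binary),
            capture_output=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or a file that takes too long.
        return False
    return result.returncode == 0 and Path(dst).is_file()
```

Treating every failure as a rejection matches the strict approach described above: a file that cannot be normalized is refused instead of being passed through unchanged.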
PDF/A is a subset of the specification that may be relevant. However, just as Caradoc's input restrictions can be a problem, PDF/A may damage interesting files, depending on your use case and the features you want to support. Also, PDF/A conversion is somewhat orthogonal to the syntax sanitisation done by Caradoc, as PDF/A cares mostly about higher-level features (e.g. embedding all fonts inside the file). I am not an expert in PDF/A though, as it is yet another quite large specification. So "PDF/A printing" and "caradoc cleanup" are complementary operations. Thanks again for your feedback!
Hello,
I have a problem with a PDF.
An antivirus detects it as malformed, and I would like to know where it fails to respect the PDF structure.
I also think your tool should be able to clean it. Can you tell me how?
Thanks for your help!
./caradoc cleanup ../PDF_MALFORMED/KO/1/1.pdf --out ../PDF_MALFORMED/KO/1/2.pdf
PDF error : Expected positive integer in object trailer at entry /Prev at offset 1872031 [0x1c909f] in file !
This is the end of the PDF:
<< /Pages 1 0 R /Type /Catalog >>
endobj
xref
1 5
0001871801 00000 n
0000000208 00000 n
0001871655 00000 n
0000000012 00000 n
0001871861 00000 n
trailer
<< /Prev 0 /Root 5 0 R /Size 6 >>
startxref
1871913
%%EOF