How to Convert PDF Documents to XML?
Hello, I have a lot of PDF documents that I want to convert to XML format. I just need to automatically convert the incoming PDF files to XML on a server (automating Acrobat Standard's "Save As XML" function). Please suggest what to do. Thanks in advance for your replies.
Re: How to Convert PDF Documents to XML?
Hello, you can try PDF XML Converter (P2X), which extracts the text information from a PDF file and writes it to an XML file. All the functions are encapsulated in a COM component; the exposed methods and interface are the same as those of PDF Plain Text Extractor (P2T), but the output file is in XML format. You can integrate it into your own application and redistribute it royalty-free. The output XML format is defined in PDFDocument.xsd.
Output XML sample
Quote:
<?xml version="1.0" encoding="UTF-8"?>
<PDFDocument>
<PDFInfo>
<Title><![CDATA[ PDF Reference ]]></Title>
<Subject><![CDATA[PDF Reference 1.4]]></Subject>
<Author><![CDATA[Smith.H]]></Author>
<Creator><![CDATA[PDF Writer]]></Creator>
<Producer><![CDATA[Adobe Acrobat]]></Producer>
<CreateDate><![CDATA[2002/06/15]]></CreateDate>
<KeyWords><![CDATA[PDF Reference]]></KeyWords>
</PDFInfo>
<Pages>
<Page>
<PageNumber>1</PageNumber>
<PDFElement>
<Coordinate_X>12</Coordinate_X>
<Coordinate_Y>34</Coordinate_Y>
<DataString>
<![CDATA[
Hello, this is a data chunk with
special chars "~@@^%^$(^#\''"'and
line break.CDATA will deal with
this kind of data perfectly.
]]>
</DataString>
</PDFElement>
.
.
.
</Page>
.
.
.
</Pages>
</PDFDocument>
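If the converter's output follows the sample above, reading it back is straightforward. Here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The element names (`Page`, `PageNumber`, `PDFElement`, `Coordinate_X`, `Coordinate_Y`, `DataString`) are taken from the sample; the `extract_elements` helper and the embedded `SAMPLE` document are hypothetical and only for illustration, and the real output may contain elements not shown here.

```python
import xml.etree.ElementTree as ET

# A small well-formed document shaped like the sample output above.
# (Illustrative data only; not produced by the actual converter.)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<PDFDocument>
  <Pages>
    <Page>
      <PageNumber>1</PageNumber>
      <PDFElement>
        <Coordinate_X>12</Coordinate_X>
        <Coordinate_Y>34</Coordinate_Y>
        <DataString><![CDATA[
Hello, this is a data chunk with <special> & chars.
]]></DataString>
      </PDFElement>
    </Page>
  </Pages>
</PDFDocument>
"""


def extract_elements(xml_bytes):
    """Return one dict per PDFElement: page number, coordinates, and text."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for page in root.iter("Page"):
        page_no = int(page.findtext("PageNumber"))
        for element in page.iter("PDFElement"):
            rows.append({
                "page": page_no,
                "x": float(element.findtext("Coordinate_X")),
                "y": float(element.findtext("Coordinate_Y")),
                # CDATA content arrives as ordinary text; trim the padding.
                "text": (element.findtext("DataString") or "").strip(),
            })
    return rows


if __name__ == "__main__":
    for row in extract_elements(SAMPLE):
        print(row)
```

The parser treats CDATA sections as plain text, so special characters inside `DataString` need no extra handling on the reading side.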
Download it from here.
Re: How to Convert PDF Documents to XML?
Hello, after fooling around with several shareware programs that only convert the first few pages of an Acrobat file, or only work for a few days, I found an open-source utility on SourceForge that worked so nicely that I wanted to give it wider publicity among people who might find it handy. See http://pdftohtml.sourceforge.net and http://sourceforge.net/projects/pdftohtml. A Windows binary is available.
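For the original question (converting incoming PDFs on a server), pdftohtml can be driven from a script, since it has an `-xml` switch for XML output. Below is a rough sketch in Python, not a tested production setup. It assumes the `pdftohtml` executable is on the PATH and that `pdftohtml -xml input.pdf base` writes `base.xml`; check this against your build. The directory names and the `build_command` and `convert_incoming` helpers are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Build the pdftohtml command line for one PDF (XML output mode)."""
    pdf_path = Path(pdf_path)
    # pdftohtml takes an output base name; the .xml suffix is assumed
    # to be appended by the tool itself.
    out_base = Path(out_dir) / pdf_path.stem
    return ["pdftohtml", "-xml", str(pdf_path), str(out_base)]


def convert_incoming(in_dir, out_dir):
    """Convert every PDF in in_dir that has no matching XML in out_dir yet."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        target = out_dir / (pdf.stem + ".xml")
        if target.exists():
            continue  # already converted on an earlier run
        # check=True raises CalledProcessError if pdftohtml fails.
        subprocess.run(build_command(pdf, out_dir), check=True)
        converted.append(target)
    return converted


if __name__ == "__main__":
    # Hypothetical folder names; run this from a scheduled task or cron job.
    for path in convert_incoming("incoming", "converted"):
        print("wrote", path)
```

Running it on a schedule is the simplest way to pick up new files; skipping PDFs that already have an XML counterpart keeps repeated runs cheap.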
Re: How to Convert PDF Documents to XML?
The Investintech PDF-to-XML Conversion Software Development Kit (SDK) is a collection of methods compiled, linked, and stored in a dynamic-link library (DLL) file that is required for application development. The purpose of these methods is to convert files from the Portable Document Format (PDF) to XML.
The PDF-to-XML Conversion SDK can be used through its COM API from VB, .NET, Delphi, and C/C++ applications.
Download it from here.