1. Noncommercial Products:
http://www.etymon.com/epub.html
PJX features specifically include significantly faster reading and writing of PDF documents, thread safety, "on demand" reading and parsing of PDF objects (which greatly reduces memory usage and processing time), incremental update support to enable fast modification of PDF documents, reading PDF documents from either disk or memory, thorough documentation of the class library interface, support for J2SE collection classes and NIO, access to form/field objects, rudimentary support for insertion of images and watermarks, appending of large documents, and design patterns for recursive processing of PDF objects.
http://www.pdfbox.org/
PDFBox is a Java PDF library. The project allows access to all of the components in a PDF document. More PDF manipulation features will be added as the project matures. It ships with a utility that takes a PDF document and outputs a text file.
http://www.foolabs.com/xpdf/about.html
Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are also sometimes called 'Acrobat' files, from the name of Adobe's PDF software.) The Xpdf project also includes a PDF text extractor, PDF-to-PostScript converter, and various other utilities.
http://jakarta.apache.org/poi/index.html
The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format using pure Java. In short, you can read and write MS Excel files using Java. Soon, you'll be able to read and write Word files using Java. POI is your Java Excel solution as well as your Java Word solution. However, we have a complete API for porting other OLE 2 Compound Document formats and welcome others to participate.
2. Commercial Products:
http://tonicsystems.com/products/
Tonic Systems is the leading PowerPoint® automation specialist. Each of our products has been born from our experience developing solutions to real life business challenges, and each has been proven extensively in enterprise environments.
We have developed our range of products in response to customer demand. These 100% Java, server-side products are robust and scalable to meet the needs of the most demanding environments.
http://snowtide.com/home/PDFTextStream/
PDFTextStream is the ideal solution for Java applications and J2EE web services that need to rapidly and accurately extract text and document metadata from PDF files.