Showing posts with label document. Show all posts

Wednesday, 22 June 2011

Reading data from a .doc file using the Apache POI API

This program shows how to read the text of an MS Word file (.doc) line by line (more precisely, paragraph by paragraph) using Apache POI.
I explained what Apache POI is and why it is needed in a previous post; you can find that post here.
To run this program, download the Apache POI API and add its jar files to the classpath, including the poi-scratchpad jar, which contains the HWPF classes used below.

Example
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class NewDocReader {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        // Path to the Word document to read
        File docFile = new File("c:\\multi\\multi.doc");

        // Open an input stream on the file; try-with-resources closes it even if reading fails
        try (FileInputStream finStream = new FileInputStream(docFile)) {
            // HWPFDocument parses the binary .doc format and throws IOException on failure
            HWPFDocument doc = new HWPFDocument(finStream);
            // WordExtractor pulls the plain text out of the parsed document
            WordExtractor wordExtract = new WordExtractor(doc);
            // Each element of dataArray holds the text of one paragraph
            String[] dataArray = wordExtract.getParagraphText();
            for (int i = 0; i < dataArray.length; i++) {
                // Print each paragraph on its own line, prefixed with a marker
                System.out.println("\n--" + dataArray[i]);
            }
        }
    }
}
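
The strings that come back from getParagraphText() may carry trailing line-break characters, and empty paragraphs may show up as blank entries. Here is a minimal sketch of tidying such an array before printing it. It uses only the standard library, and the names ParagraphCleaner and clean are made up for this example; they are not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI
final class ParagraphCleaner {

    // Trims each paragraph and drops entries that are null or empty after trimming
    static List<String> clean(String[] paragraphs) {
        List<String> lines = new ArrayList<>();
        for (String paragraph : paragraphs) {
            if (paragraph == null) {
                continue;
            }
            String trimmed = paragraph.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the array returned by wordExtract.getParagraphText()
        String[] sample = {"First paragraph\r\n", "\r\n", "  Second paragraph \r\n"};
        for (String line : clean(sample)) {
            System.out.println(line);
        }
    }
}
```

In the program above, you could pass dataArray to clean() and loop over the returned list instead of the raw array.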

Tuesday, 21 June 2011

Apache POI API for Java people

It is one of the great tools from Apache.

First of all, I want to write about what POI is and why we need it.

What is POI?
“Poor Obfuscation Implementation”

Why do we need POI?
POI is for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel, using Java.
When I tried to read the data from .doc files with a plain DataInputStream, I got garbage data mixed in with the actual text.
I then started searching for a way to resolve this issue, and at last found that this API is very useful for working with Microsoft formats.
So far I have worked only with .doc files, reading data line by line; in the next post I will explain how to do that.
You can find more info about Apache POI on the wiki and on the Apache project site.
The Apache POI API can be downloaded from here.
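
The garbage data mentioned above appears because a .doc file is not plain text: it is a binary OLE2 compound document, which starts with a fixed 8-byte signature. The sketch below only illustrates that point by checking for the signature with the standard library. The names Ole2SignatureCheck and looksLikeOle2 are made up for this example and are not part of Apache POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI
final class Ole2SignatureCheck {

    // The 8-byte signature at the start of an OLE2 compound file (.doc, .xls, .ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true when the given bytes begin with the OLE2 signature
    static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Ole2SignatureCheck <file>");
            return;
        }
        byte[] header = new byte[OLE2_MAGIC.length];
        try (InputStream in = new FileInputStream(args[0])) {
            // Read the first bytes of the file; a very short file may return fewer than 8
            int read = in.read(header);
            boolean binary = looksLikeOle2(Arrays.copyOf(header, Math.max(read, 0)));
            System.out.println(binary
                    ? "Binary OLE2 file: use a library such as Apache POI to read it"
                    : "No OLE2 signature found");
        }
    }
}
```

A file that passes this check holds its text inside binary structures, which is why reading it byte by byte as characters prints garbage alongside the real content.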