Hi friends, this is a small effort to explain how to convert an HTML file to PDF through a Java program.
Let's take it step by step.
The open source Java project Flying Saucer (an XHTML renderer) helps us convert an XHTML file to a PDF file.
Usually, though, the input we have is in HTML format rather than XHTML.
So the first step is to convert the HTML file to XHTML.
JTidy (another open source Java project) helps us do this conversion.
Steps to convert HTML to XHTML:
1. Download JTidy from http://sourceforge.net/projects/jtidy/files/JTidy/r938/jtidy-r938.zip/download
2. Extract it. You will find jtidy-r938.jar.
3. Save the following HTML content as C:\sample.html:
<html>
<head>
<body>
<h2>JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer.</h2>
<p> Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. <br>
In addition, JTidy provides a DOM interface to the document that is being processed, <br>
which effectively makes you able to use JTidy as a DOM parser for real-world HTML.
</body>
</html>
4. Write and execute the following program (with jtidy-r938.jar on the classpath), which converts the above HTML file to XHTML:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.w3c.tidy.Tidy;

public class HtmlToXHtml {
    public static void main(String[] args) {
        // try-with-resources closes the stream and never passes a null stream to JTidy
        try (InputStream is = new FileInputStream("C:/sample.html")) {
            Tidy tidy = new Tidy();      // HTML parser and pretty printer
            tidy.setXHTML(true);         // true if Tidy should output XHTML
            tidy.parse(is, System.out);  // write the cleaned-up XHTML to standard output
        } catch (IOException e) {
            // Covers a missing file (FileNotFoundException) as well as read errors
            e.printStackTrace();
        }
    }
}
5. When you run it, the console shows JTidy's warnings followed by the XHTML output:
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 10 column 1 - Warning: inserting missing 'title' element
InputStream: Document content looks like HTML 2.0
2 warnings, no errors were found!
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" />
<title></title>
</head>
<body>
<h2>JTidy is a Java port of HTML Tidy, a HTML syntax checker and
pretty printer.</h2>
<p>Like its non-Java cousin, JTidy can be used as a tool for
cleaning up malformed and faulty HTML.<br />
In addition, JTidy provides a DOM interface to the document that is
being processed,<br />
which effectively makes you able to use JTidy as a DOM parser for
real-world HTML.</p>
</body>
</html>
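6. Compare this output with the input to see what JTidy changed: it added the DOCTYPE declaration and the xmlns attribute, completed the <head> section with a <meta> and an empty <title> element, closed the <p> element, and rewrote each <br> as <br />.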
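Since Flying Saucer works on XHTML, it can help to confirm that the converted markup really parses as well-formed XML before moving on to the PDF step. The sketch below is not part of JTidy or Flying Saucer. It uses only the XML parser that ships with the JDK, and the class name WellFormedCheck, the method isWellFormed and the shortened sample strings are made up for illustration. The DTD reference is resolved to an empty document so that the check does not need network access.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not an API of JTidy or Flying Saucer.
class WellFormedCheck {

    // Returns true when the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty document instead of downloading it.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // Replace the default error handler, which prints to System.err.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, so it is not well-formed XML.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not a verdict on the markup.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-ins for sample.html and for the JTidy output.
        String html = "<html><head><body><p>Hello<br>World</body></html>";
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title></title></head>"
                + "<body><p>Hello<br />World</p></body></html>";

        System.out.println("HTML well-formed:  " + isWellFormed(html));
        System.out.println("XHTML well-formed: " + isWellFormed(xhtml));
    }
}
```

The HTML string should be rejected because its <br> and <p> elements are never closed, while the XHTML string should pass. To check real JTidy output, read the generated file into a String and pass it to isWellFormed.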