Inserting HTML into Word documents

Document generation is one of the most useful things a system can do for its users, since users tend to think in terms of Microsoft Word and Excel documents. In my opinion the best library for producing them from Java is docx4j (http://www.docx4java.org/).

This library (I am using version 2.8) lets you build everything from scratch, but it's far easier to start with a template document, normally supplied by the client, and just substitute the values you want.

So let's do that.

First declare your “WordprocessingMLPackage”, which is the holding object for your Word document, and create a normal Java File object pointing at your template file.

Then load that file into your WordprocessingMLPackage:

WordprocessingMLPackage wordMLPackage;
// backslashes must be escaped in Java string literals
File templatefile = new File("C:\\mytemplatefile.docx");
wordMLPackage = WordprocessingMLPackage.load(templatefile);
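
A quick aside on Windows paths, since they bite here: backslash is the escape character in Java string literals, so a path needs doubled backslashes (or forward slashes). The compiler rejects some slips, such as "\m", as an illegal escape, but others are legal escapes and compile silently into the wrong string. A small JDK-only sketch; the class name `TemplatePaths` is made up for this post and the file names are just the ones used here:

```java
class TemplatePaths {
    // Doubled backslash: the string really contains C:\mytemplatefile.docx
    static final String ESCAPED = "C:\\mytemplatefile.docx";

    // Forward slashes avoid the escaping question altogether.
    static final String FORWARD = "C:/mytemplatefile.docx";

    // "\f" is a legal escape (form feed), so this compiles,
    // but the string holds a control character instead of a backslash.
    static final String BROKEN = "C:\finaldocument.docx";

    public static void main(String[] args) {
        System.out.println(ESCAPED.contains("\\"));  // true: the backslash is there
        System.out.println(BROKEN.contains("\\"));   // false: no backslash survived
        System.out.println(BROKEN.indexOf('\f'));    // 2: the form feed sits after "C:"
    }
}
```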

OK, so now we have a Word document, but it's just a generic document; it's not personalised to our client's needs. To personalise it we first need to put placeholders in the template for our data to go in. These are just text in the Word document in the format ${xxxxxxx}, as you can see below:

[Screenshot: docx4j1.png]

Once these are in, you can use them to substitute your text.
First build and populate a HashMap of the values you want substituted:

HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("Replace_Tex1", "This is some custom data");

As you can see, the first argument is the placeholder name and the second is the value you want inserted.
Once that is ready you can do the substitution, but first you need to convert the main document part of your WordprocessingMLPackage to an XML string so you have something easier to work with:

String xml = XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true);

With the resulting XML and the HashMap we can use the XmlUtils.unmarshallFromTemplate function to do the swap, then put the result back into the WordprocessingMLPackage:

wordMLPackage.getMainDocumentPart().setJaxbElement((org.docx4j.wml.Document) XmlUtils.unmarshallFromTemplate(xml, mappings));
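
To make the swap a little less magical, here is a rough plain-Java picture of what substituting ${...} markers in an XML string amounts to. This is only an illustration under my own assumptions, not docx4j's implementation: `PlaceholderSketch` and `substitute` are names invented for this post, unknown keys are simply left alone, and values are inserted raw (so XML-special characters in a value would need escaping). It also shows a gotcha that, as far as I know, is common in practice: Word sometimes splits a placeholder across several runs, so the marker no longer appears as one contiguous string in the XML and nothing gets replaced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Matches ${name} where name is letters, digits or underscores.
    private static final Pattern MARKER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String substitute(String xml, Map<String, String> mappings) {
        Matcher m = MARKER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = mappings.get(m.group(1));
            // Unknown keys are left exactly as they were found.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<String, String>();
        mappings.put("Replace_Tex1", "This is some custom data");

        // Marker in one piece: it is replaced.
        String whole = "<w:t>${Replace_Tex1}</w:t>";
        System.out.println(substitute(whole, mappings));

        // Marker split across two text elements: nothing matches, nothing changes.
        String split = "<w:t>${Replace_</w:t><w:t>Tex1}</w:t>";
        System.out.println(substitute(split, mappings));
    }
}
```

If a placeholder stubbornly refuses to be replaced, retyping it in the template in one go (rather than editing it piecemeal) is the first thing I would try.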

And that is us done. We can just use “SaveToZipFile” (docx files are basically renamed zip files with XML inside them) to save the file to the file system:

SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
// "\\" is needed here: an unescaped "\f" would be read as a form feed
saver.save("C:\\finaldocument.docx");
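
If you want to see the "renamed zip" claim for yourself, the JDK's own zip classes are enough. The sketch below lists the entries of any zip stream. `DocxZipPeek`, `entryNames` and `tinyArchive` are names invented for this example, and the demo builds a tiny stand-in archive in memory rather than a real Word file; in a real .docx the main body normally lives in an entry called word/document.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxZipPeek {
    // Returns the entry names of a zip stream in the order they are stored.
    static List<String> entryNames(InputStream in) {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a two-entry stand-in archive; it is NOT a valid Word document.
    static byte[] tinyArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(entryNames(new ByteArrayInputStream(tinyArchive())));
    }
}
```

To point it at the file you just saved, pass a `java.io.FileInputStream` opened on that path to `entryNames` instead of the in-memory archive.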

But where, you ask, is the HTML insertion you promised? You have just shown us plain text; where are the formatted HTML inserts?
For that, dear reader, we need to add a bunch more code.
There is no built-in function to do that for HTML, so we are going to have to:
1. Get a list of all the objects (such as paragraphs, lines of text, etc.) in the WordprocessingMLPackage.
2. Search through them to find the location of the text we want to replace (the placeholder).
3. Remove the placeholder text.
4. Wrap our HTML in a root element, convert it to DOCX XML and insert it at the correct place.
For No. 1 we will pinch from the fabulous Jos Dirksen's [Create complex Word documents programatically with docx4j](http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j) to get our list of objects:

private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}

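That done, we will use that function to go through and search each paragraph for the placeholder text. If it's found, the function returns the integer that tells us its location in the document (and while it's at it, it removes the paragraph holding the placeholder). Note that the match is exact: a text element's value has to equal the placeholder string.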

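// Returns the position of the paragraph holding the placeholder among the
// top-level content of the main document part, and removes that paragraph.
// Returns 0 when nothing matched, so a caller cannot tell "not found" apart
// from a placeholder sitting in the very first paragraph.
private int findPlaceHolder(String placeholder, WordprocessingMLPackage template) {
    int index = 0;
    List<Object> paragraphs = getAllElementFromObject(template.getMainDocumentPart(), P.class);
    P toReplace = null;
    for (Object p : paragraphs) {
        List<Object> texts = getAllElementFromObject(p, Text.class);
        for (Object t : texts) {
            Text content = (Text) t;
            // exact match: the whole text element must equal the placeholder
            if (placeholder.equals(content.getValue())) {
                toReplace = (P) p;
                // indexOf is -1 if the paragraph is not a direct child of the body
                index = template.getMainDocumentPart().getContent().indexOf(toReplace);
                break;
            }
        }
        if (toReplace != null) {
            break; // stop at the first match; the inner break only leaves the inner loop
        }
    }
    if (toReplace != null) {
        // remove the paragraph that held the placeholder
        ((ContentAccessor) toReplace.getParent()).getContent().remove(toReplace);
    }
    return index;
}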

This finally gives us a way to insert the HTML at the correct place, so we do our mapping again, but this time with HTML rather than plain text:

mappings.put("Replace_Tex1", "<b>This is some custom html data</b>");

Then loop through the HashMap, finding the correct location, converting the HTML and inserting the result into the WordprocessingMLPackage using the addAll function:

for (Map.Entry<String, String> mapEntry : mappings.entrySet()) {
    // the importer needs a single root element, so wrap the fragment in a div
    String xhtml = "<div>" + mapEntry.getValue() + "</div>";
    int locationOfItem = findPlaceHolder(mapEntry.getKey(), wordMLPackage);
    // findPlaceHolder gives 0 when nothing matched, so only positive positions are used
    if (locationOfItem > 0) {
        wordMLPackage.getMainDocumentPart().getContent().addAll(locationOfItem, XHTMLImporter.convert(xhtml, null, wordMLPackage));
    }
}
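
Stripped of the docx4j types, the whole trick is plain list surgery: find the position of the placeholder paragraph, remove it, then splice the new content in at the same position with addAll. A minimal sketch on a list of strings, where the strings stand in for paragraph objects; `ListSpliceSketch` and `replaceWithAll` are hypothetical names for this post. Unlike findPlaceHolder it reports "not found" as -1, so position 0 is not ambiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListSpliceSketch {
    // Replaces the first element equal to placeholder with all of replacement.
    // Returns the index where the replacement starts, or -1 if nothing matched.
    static int replaceWithAll(List<String> body, String placeholder, List<String> replacement) {
        int index = body.indexOf(placeholder);
        if (index < 0) {
            return -1;
        }
        body.remove(index);               // drop the placeholder "paragraph"
        body.addAll(index, replacement);  // splice the new content in its place
        return index;
    }

    public static void main(String[] args) {
        List<String> body = new ArrayList<String>(
                Arrays.asList("Dear client,", "${Replace_Tex1}", "Regards"));
        int at = replaceWithAll(body, "${Replace_Tex1}",
                Arrays.asList("First imported paragraph", "Second imported paragraph"));
        System.out.println(at + " " + body);
    }
}
```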

That's it: you can now insert any HTML you want, anywhere you want, into an existing document.
You can see that I had to wrap the HTML in a “DIV” (any root element will do, but DIV is easiest). You have to do this or the XHTMLImporter will fail its validation. Also, the smallest item the HTML will import as is a paragraph, so you cannot use this method to change text and formatting inside an existing line; you will always get a paragraph break before and after (not really a limitation given the use cases).
As always, yell if anything seems off.
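
A cheap way to catch bad fragments before they reach the importer is to check that the wrapped string is well-formed XML, which (my assumption about why the wrapper is needed) requires exactly one root element and properly closed tags. The sketch below uses only the JDK's XML parser. `XhtmlWrapCheck`, `wrap` and `isWellFormed` are names made up for this post, and passing this check does not guarantee that XHTMLImporter will accept the content.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlWrapCheck {
    // Gives the fragment a single root element.
    static String wrap(String fragment) {
        return "<div>" + fragment + "</div>";
    }

    // True if the string parses as well-formed XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<b>bold</b><i>italic</i>";
        System.out.println(isWellFormed(fragment));        // false: two root elements
        System.out.println(isWellFormed(wrap(fragment)));  // true: the div is the single root
        System.out.println(isWellFormed(wrap("one<br>two")));   // false: <br> is never closed
        System.out.println(isWellFormed(wrap("one<br/>two")));  // true
    }
}
```

The <br> case is the one that tends to surprise people coming from ordinary HTML: void tags have to be self-closed for the fragment to count as XHTML.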
