Word to HTML conversion with JOD and OpenOffice

30 April 2013, by Baptiste Autin

By combining OpenOffice with the JODConverter library, it is easy to add an Office-to-HTML (or Office-to-PDF) document conversion service to a J2EE/Spring application.

Here is the procedure to follow:

1. Install OpenOffice on your server
2. Copy the sample class ConvertorJod (see below) into your webapp, and modify your applicationContext.xml accordingly
3. In ConvertorJod, change the path to your OpenOffice installation, as well as the path of the template profile to use.
4. Start your webapp. A soffice.bin process should appear in your process list, and your logs should look something like this:

	DEBUG - Starting LibreOffice server...
	org.artofsolving.jodconverter.office.ProcessPoolOfficeManager
	INFO: ProcessManager implementation is PureJavaProcessManager
	org.artofsolving.jodconverter.office.OfficeProcess prepareInstanceProfileDir
	org.artofsolving.jodconverter.office.OfficeProcess start
	INFO: starting process with acceptString 'socket,host=127.0.0.1,port=8100,tcpNoDelay=1' and profileDir (...)
	org.artofsolving.jodconverter.office.OfficeProcess start
	INFO: started process
	org.artofsolving.jodconverter.office.OfficeConnection connect
	INFO: connected: 'socket,host=127.0.0.1,port=8100,tcpNoDelay=1'
	DEBUG [localhost-startStop-1] (ConvertorJod.java:48) [] - LibreOffice server started...

Once your webapp has started correctly, your document service bean is available to your other beans. Because OpenOffice is started only once (when the application context is initialized), conversion times are rather good (about one or two seconds for an average-size file).

Note that if you don’t explicitly specify a template folder, a default one is used instead, and if that one does not exist, the soffice.bin daemon process will not start.
This is particularly important for a webapp, as it is unlikely that your application server runs under an account for which an Office template folder exists.
You can copy the folder of an existing user (on Windows, it is stored under C:\Users\\AppData\Roaming\OpenOffice.org\).
Read/write access must also be granted on the folder, otherwise you will get the error:

java.net.ConnectException: connection failed: 'socket,host=127.0.0.1,port=8100,tcpNoDelay=1'; java.net.ConnectException: Connection refused: connect
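Because this error only shows up once soffice.bin has failed to start, it can help to validate the template folder before handing it to JODConverter. The following is a minimal sketch using only the JDK; the class ProfileDirCheck and its method requireUsableProfileDir are hypothetical names introduced for illustration and are not part of JODConverter.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of JODConverter): fails fast when the profile folder is unusable. */
final class ProfileDirCheck {

    private ProfileDirCheck() {
    }

    /**
     * Returns the folder as a File if it exists and the current account
     * can read and write it; throws IllegalStateException otherwise.
     */
    static File requireUsableProfileDir(Path dir) {
        if (!Files.isDirectory(dir)) {
            throw new IllegalStateException("Template profile folder not found: " + dir);
        }
        if (!Files.isReadable(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException("No read/write access to template profile folder: " + dir);
        }
        return dir.toFile();
    }
}
```

The returned File could then be passed to configuration.setTemplateProfileDir(...), so that a missing or read-only folder is reported with an explicit message at startup rather than as a refused connection.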

package test.convertor;

import java.io.File;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;

import org.apache.log4j.Logger;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeConnectionProtocol;
import org.artofsolving.jodconverter.office.OfficeException;
import org.artofsolving.jodconverter.office.OfficeManager;
import org.springframework.stereotype.Service;

@Service
public class ConvertorJod implements AbstractFileConvertor {

	protected final Logger logger = Logger.getLogger(getClass());

	private OfficeManager officeManager = null;
	private OfficeDocumentConverter converter = null;

	@PostConstruct
	protected void initOfficeManager() {
		logger.debug("Starting conversion service...");

		DefaultOfficeManagerConfiguration configuration = new DefaultOfficeManagerConfiguration();

		configuration.setPortNumber(8100);
		configuration.setConnectionProtocol(OfficeConnectionProtocol.SOCKET);

		configuration.setTemplateProfileDir(new File("D:\\openoffice\\3"));
		configuration.setOfficeHome(new File("C:\\Program Files (x86)\\OpenOffice.org 3"));

		configuration.setTaskExecutionTimeout(30000L);

		officeManager = configuration.buildOfficeManager();
		converter = new OfficeDocumentConverter(officeManager);

		officeManager.start();

		logger.debug("Conversion service started");
	}

	@PreDestroy
	protected void preDestroy() {
		logger.debug("Stopping conversion service...");
		officeManager.stop();
		logger.debug("Conversion service stopped");
	}

	@Override
	public void convertToHtml(final File source, final File destination) throws OfficeException {

		DocumentFormat outputFormat = converter.getFormatRegistry().getFormatByExtension("html");    // "html" or "pdf"

		logger.debug("Converting " + source.getName());

		converter.convert(source, destination, outputFormat);
	}
}

And here is the Java interface of the service (inject it into every business bean that needs HTML conversion):

package test.convertor;

import java.io.File;

import org.artofsolving.jodconverter.office.OfficeException;

public interface AbstractFileConvertor {

	void convertToHtml(File source, File destination) throws OfficeException;

}
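Since convertToHtml expects the caller to supply both the source and the destination file, a calling bean needs some way to choose the destination. One possible approach, sketched here with the JDK only, is to derive it from the source file name. The class ConversionTargets and the method targetFor are hypothetical names used for illustration.

```java
import java.io.File;

/** Hypothetical helper: derives the output file of a conversion from the source file name. */
final class ConversionTargets {

    private ConversionTargets() {
    }

    /**
     * Builds a file in outputDir named after source, with its extension
     * replaced by the given one (for example "html" or "pdf"). A name
     * without an extension simply gets the new extension appended.
     */
    static File targetFor(File source, File outputDir, String extension) {
        String name = source.getName();
        int dot = name.lastIndexOf('.');
        // dot > 0 leaves names such as ".profile" untouched instead of producing an empty base name
        String baseName = (dot > 0) ? name.substring(0, dot) : name;
        return new File(outputDir, baseName + "." + extension);
    }
}
```

A business bean holding an injected AbstractFileConvertor could then call convertToHtml(source, ConversionTargets.targetFor(source, outputDir, "html")), where outputDir is whatever folder the application uses for converted files.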

One comment on “Word to HTML conversion with JOD and OpenOffice”

  1. gio says:

    Word to HTML works pretty well. But when going from HTML to Word, images embedded as base64 in img tags are not converted in the doc.
