A recent discussion with a customer led me to rewrite the existing DOCX import into Plone (as it exists in the "old" Produce & Publish Authoring Environment) from scratch. The new functionality provides the following:
- an import form that can be called in the context of a Plone folder object
- the import form allows you to upload a Word DOCX document
- the conversion process in the background sends the file to the Produce & Publish Server, where it is converted to XHTML using unoconv
- the generated XHTML document is split into the contents and styles
- the html-ish content is stored as a Plone Document, the style sheets are stored as Plone files, and images are stored as Plone images
- a dedicated view renders the html-ish content together with the styles extracted from the DOCX document
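The splitting step in the list above can be sketched roughly as follows. This is only an illustration using Python's standard library, not the code used in the actual import; the names `XHTMLSplitter` and `split_xhtml` are made up for this example, and it assumes the styles arrive as `<style>` elements in the converted XHTML.

```python
from html.parser import HTMLParser


class XHTMLSplitter(HTMLParser):
    """Collect <style> contents and the raw markup found inside <body>.

    Hypothetical helper for illustration only. Comments and processing
    instructions inside <body> are dropped.
    """

    def __init__(self):
        # Keep entities such as &amp; untouched so the markup round-trips.
        super().__init__(convert_charrefs=False)
        self.styles = []      # one string per <style> element
        self.body_parts = []  # raw markup fragments inside <body>
        self._in_style = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            self.styles.append("")
        elif tag == "body":
            self._in_body = True
        elif self._in_body:
            self.body_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if self._in_body:
            self.body_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False
        elif tag == "body":
            self._in_body = False
        elif self._in_body:
            self.body_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_style:
            self.styles[-1] += data
        elif self._in_body:
            self.body_parts.append(data)

    def handle_entityref(self, name):
        if self._in_body:
            self.body_parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._in_body:
            self.body_parts.append(f"&#{name};")


def split_xhtml(xhtml):
    """Return (body_markup, list_of_style_sheets) for an XHTML string."""
    parser = XHTMLSplitter()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.body_parts), parser.styles
```

In such a setup the first return value would end up in the Plone Document and each entry of the second one in its own Plone file.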
The conversion result actually looks reasonably good.
A nice side effect: the names of DOCX styles ("Formatvorlage" in German) are preserved as CSS classes in the HTML. So when the original DOCX document contains a paragraph marked as "address", we will have <p class="address"> in the HTML. This makes it easy to apply global styles to common CSS classes, for example those defined in a company-wide DOCX template.
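To find out which style names actually made it into the converted HTML, and therefore which classes a global style sheet could target, something like the following could be used. Again just a sketch with the standard library; `ClassCollector` and `css_classes` are hypothetical names, not part of the import code.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Gather every CSS class name used in a piece of HTML (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # A class attribute may carry several space-separated names.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def css_classes(html):
    """Return the sorted list of class names found in the given HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.classes)
```

Running `css_classes()` over the imported content gives a quick overview of the DOCX styles in use before writing the shared CSS rules.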