How to Create Templates

Easy template editor

The core of Word Cleaner is its powerful template system. Here you can edit the built-in templates, duplicate templates and create your own. To open this section, click the radio button next to the Templates list on the main screen. Remember to first select the template you want to edit, duplicate or delete from the drop-down list.

Tip: Remember you can edit or duplicate the templates that come with Word Cleaner.

In the Easy Editor screen, just click an option on the right and choose the settings you need.

 

You can also preview how your template will work. The Template Preview tab (the 3rd tab) is at the top of the screen. There you can select a file to convert, preview its HTML code and see how it looks in the Internet Explorer browser. You can go back to the Easy Editor, make some changes, return to Template Preview and click the Preview file button again to see how Word Cleaner will now convert your document.

 

Template Overview

Converting format (1): Word Cleaner can save files in different formats. The main conversion formats are XHTML 1.0, HTML5 and HTML4, but Word Cleaner can also save files as text files (txt) or save document pages as image files (jpg, png, gif, bmp, wmf). When saving to images, you can also create either a single HTML file or a separate HTML file for each page saved as an image. The ‘Create an index HTML file for converted output file’ option is useful when you convert many files and want a list of <a> links to all the converted files. Word Cleaner creates an index.html file in the same folder as your source documents. Later you can upload all the converted HTML files to a website and use the links in the index file on a web page.
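To give an idea of what such an index file contains, here is a minimal Python sketch that builds an index page with one <a> link per converted file. It is not Word Cleaner's own code: the function name build_index_html and the page layout (a plain <ul> list) are assumptions.

```python
import html
from urllib.parse import quote


def build_index_html(converted_files, title="Converted files"):
    """Return an index page with one <a> link per converted file (hypothetical layout)."""
    items = []
    for name in converted_files:
        href = html.escape(quote(name))   # URL-encode the name, then escape it for the attribute
        text = html.escape(name)          # escape the visible link text
        items.append(f'  <li><a href="{href}">{text}</a></li>')
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        "<body>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_index_html(["about-us.html", "products.html"]))
```

Writing the returned string to index.html in the source folder gives a page you can upload next to the converted files.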

What conversion engine should I use? (2) Always try our built-in Apollo converter first.

Output file naming (3): One of the cool features of Word Cleaner is the ability to automatically rename output files to make them web friendly. By default, Word Cleaner saves documents with web-friendly file names; you can turn this off by unticking the ‘File name web friendly format’ checkbox.

By default, every space is changed to a dash character ‘-’ (Space replacement), but you can specify your own replacement character. All special characters that are not URL friendly are deleted from the file name. Word Cleaner also tries to convert language-specific characters to their ASCII equivalents (e.g. äöü.doc will be saved as aou.html).

You can also control the case of output file names: Word Cleaner can leave file names unchanged or convert them to lowercase or uppercase letters.
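The naming rules above can be sketched in Python as follows. This is an illustration under assumptions, not Word Cleaner's actual algorithm: the function name web_friendly_name, its parameters and the exact set of characters treated as URL friendly (letters, digits, dot, dash and underscore) are all hypothetical.

```python
import os
import re
import unicodedata


def web_friendly_name(source_name, space_replacement="-", case="unchanged", extension=".html"):
    """Turn a document name into a web-friendly output file name (illustrative rules)."""
    stem, _ = os.path.splitext(source_name)            # "äöü.doc" -> "äöü"
    # Decompose accented letters (ä -> a + combining mark), then drop what is not ASCII.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Delete characters that are not URL friendly; spaces survive for the next step.
    stem = re.sub(r"[^A-Za-z0-9 ._-]", "", stem)
    # Space replacement: every space becomes the chosen character (a dash by default).
    stem = stem.replace(" ", space_replacement)
    if case == "lower":
        stem = stem.lower()
    elif case == "upper":
        stem = stem.upper()
    return stem + extension
```

For example, web_friendly_name("About Us.docx", case="lower") gives about-us.html, and web_friendly_name("äöü.doc") gives aou.html as in the example above.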

The ‘Create new output file each time you convert’ option is useful if you want to create a new file each time you convert the same document, for example to compare how different templates produce different output. If you tick this option and the converted file name already exists in the folder, Word Cleaner will not overwrite it. For example, if your Word file is called aboutus.doc and aboutus.html already exists in the same folder, Word Cleaner will name the new converted file aboutus_1.html. Turning this option off will overwrite the existing HTML file, so be careful.
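A minimal Python sketch of this naming behaviour. The source only shows the first suffix (_1); the assumption here is that the counter keeps increasing (_2, _3, ...) while names are taken. The helper name unique_output_name is hypothetical.

```python
import os


def unique_output_name(file_name, existing_names):
    """Return file_name, or file_name with _1, _2, ... appended if it is already taken."""
    stem, ext = os.path.splitext(file_name)
    # Compare case-insensitively, as Windows file names are not case sensitive.
    taken = {name.lower() for name in existing_names}
    candidate, counter = file_name, 1
    while candidate.lower() in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate
```

In practice you would pass the contents of the output folder, e.g. unique_output_name("aboutus.html", os.listdir(folder)).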

By default, Word Cleaner saves files with the .html extension, but you can set your own extension if you want.

HTML input file options (4): ‘Backup html files before cleaning if input and output file names are the same’ – Ticking this makes Word Cleaner create a backup of each HTML file before cleaning it whenever the input and output HTML file names are the same. This prevents your original input HTML files from being lost when they are overwritten.

‘Process HTML input file by conversion engine’ – If you tick this option, your HTML will be processed by either our Internal Converter or MS Word (depending on the engine option you ticked above). This means you can use all the customisation options in the templates. If you do not tick this option, options such as Image Output Folder, CSS, Image and Metadata options will not work. You would untick this option if you find that the Internal Converter or MS Word conversion process is changing your HTML in an unwanted way. If you untick it, Word Cleaner essentially only runs the find and replace/delete commands on your files, so it is the safer option if you have problems with code being changed.

Template notes (5): here you can write your own reminder notes about the template you’re creating.

Conversion Options

Options for all engines

Ignore case – If you tick this option, Word Cleaner ignores case when looking for code. For example, if Ignore case is turned OFF and you ask Word Cleaner to find the code <p class="HEADING_1">, it will only find <p class="HEADING_1">, not <p class="heading_1"> (note that in the second example heading_1 is lowercase). Turning Ignore case ON means Word Cleaner ignores the case of the code, so in our example it would find both <p class="HEADING_1"> and the lowercase <p class="heading_1">.
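Word Cleaner uses .NET regular expressions. As a stand-in, Python's re module shows the same effect, with re.IGNORECASE playing the role of the Ignore case option (an analogy, not Word Cleaner's code):

```python
import re

html_source = '<p class="HEADING_1">A</p>\n<p class="heading_1">B</p>'
pattern = re.escape('<p class="HEADING_1">')  # escape so the tag is matched literally

# Ignore case OFF: only the exact uppercase form matches.
case_sensitive = re.findall(pattern, html_source)
# Ignore case ON: both the uppercase and the lowercase forms match.
case_insensitive = re.findall(pattern, html_source, flags=re.IGNORECASE)

print(len(case_sensitive), len(case_insensitive))  # 1 2
```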

Correct HTML structure before converting – With this option ticked, Word Cleaner fixes problems in the code structure before converting with the template. Use this option only if you know the input HTML may have structural problems; otherwise leave it unticked.

Body content only – useful for importing into a CMS – If you tick this option, Word Cleaner removes the head section of the HTML and keeps only the content of the body tag. This is useful if you want to paste the HTML into a content management system (CMS) or a template.
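As a rough illustration of this option, the Python sketch below keeps just what lies between the <body> and </body> tags. It is a regex-based simplification with a hypothetical function name; Word Cleaner's own implementation is not documented here, and a real HTML parser would be more robust.

```python
import re

# Everything between <body ...> and the last </body>, across line breaks.
BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)


def body_content_only(html):
    """Return the body content, or the input unchanged if there is no body tag."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html
```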

Clean HTML Conversion – This will remove all the formatting, e.g. styles, font size, font type etc. This is a good option if you want really clean HTML or you intend to use your own CSS styles. Note it will not remove bold and italic formatting.

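Convert bold and italic tags to strong and em – With this option ticked, Word Cleaner converts <b> tags to <strong> and <i> tags to <em>. <strong> and <em> are the semantic HTML elements for strong importance and emphasis, and are generally preferred over <b> and <i> for bold and italic text.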
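The conversion can be sketched with two regular expressions in Python. The patterns are written so that other tags starting with the same letter (<br>, <body>, <img>) are left alone. This is an illustrative assumption about how such a conversion could work, not Word Cleaner's code; to_strong_em is a hypothetical name.

```python
import re

# <b>, </b> and <b ...> but not <br>, <body> or <big>; likewise <i> but not <img> or <iframe>.
BOLD_RE = re.compile(r"<(/?)b(\s[^>]*)?>", re.IGNORECASE)
ITALIC_RE = re.compile(r"<(/?)i(\s[^>]*)?>", re.IGNORECASE)


def to_strong_em(html):
    """Replace b/i tags with strong/em, keeping any attributes."""
    html = BOLD_RE.sub(lambda m: f"<{m.group(1)}strong{m.group(2) or ''}>", html)
    return ITALIC_RE.sub(lambda m: f"<{m.group(1)}em{m.group(2) or ''}>", html)
```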

Delete empty lines – Word Cleaner will delete all empty lines found in the HTML file.

Apollo engine options

Convert web addresses and emails to links – Ticking this option converts any web or email addresses in the text to clickable hyperlinks. This option only works with the Apollo conversion engine. Please note that using this feature may increase conversion time.
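A simplified Python sketch of turning plain-text web and email addresses into links. The patterns are deliberately simple assumptions (they will not recognise every valid address) and expect plain text, not HTML that already contains <a> tags; autolink is a hypothetical name, and bare www. addresses are given an http:// prefix.

```python
import re

LINK_RE = re.compile(
    r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"   # name@example.com
    r'|(?P<url>(?:https?://|www\.)[^\s<>"]+)'    # http(s)://... or www....
)
TRAILING = ".,;:!?)"  # sentence punctuation kept outside the link


def autolink(text):
    """Wrap plain-text web and email addresses in <a> tags."""
    def to_anchor(m):
        if m.group("email"):
            address = m.group("email")
            return f'<a href="mailto:{address}">{address}</a>'
        url = m.group("url")
        stripped = url.rstrip(TRAILING)
        tail = url[len(stripped):]
        href = stripped if stripped.lower().startswith(("http://", "https://")) else "http://" + stripped
        return f'<a href="{href}">{stripped}</a>{tail}'

    return LINK_RE.sub(to_anchor, text)
```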

Preserve table layout when saving as plain text – Ticking this option helps preserve the table layout when saving to text format, e.g. cells from the same table row stay on the same line.

Convert field codes to plain text – Converts any document field codes into static text.

Headers and footers – Controls how document headers and footers are converted. ‘None’ – headers and footers are not exported. ‘Per section’ – primary headers and footers are exported at the beginning and end of each section. ‘First and last’ – the primary header of the first section is exported at the beginning of the document and the primary footer at the end.

MS Word Engine Options

Clean MS Word HTML – With this option ticked, Word Cleaner removes unneeded Word-specific HTML code from an existing MS Word HTML file. You don’t need this option with the Apollo engine, or when you convert files in doc/docx/rtf/odt format.
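Word's own HTML export typically contains conditional comments such as <!--[if gte mso 9]>, Office namespace tags such as <o:p></o:p>, mso- style properties and Mso class names such as MsoNormal. The Python sketch below strips those four kinds of markup with regular expressions. It only illustrates the idea; the exact list of code Word Cleaner removes is not documented here, and strip_word_markup is a hypothetical name.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL


def strip_word_markup(html):
    """Remove common Word-specific HTML (illustrative subset only)."""
    # Conditional comments such as <!--[if gte mso 9]>...<![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=FLAGS)
    # Office namespace paragraph tags <o:p> and </o:p>
    html = re.sub(r"</?o:p[^>]*>", "", html, flags=FLAGS)
    # mso-* declarations inside style attributes or style blocks
    html = re.sub(r"""mso-[\w-]+\s*:\s*[^;"']+;?""", "", html, flags=FLAGS)
    # style attributes left empty by the previous step
    html = re.sub(r'\s+style="\s*"', "", html, flags=FLAGS)
    # class="MsoNormal", class=MsoTitle and similar
    html = re.sub(r'\s+class="?Mso\w*"?', "", html, flags=FLAGS)
    return html
```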

Output Folders

Output Folder Options

Place converted files in the same folder as the files to convert – Word Cleaner will save all converted files in the same folder as the files to be converted.

Place files in a specific folder – You can specify your own output folder. Click the Browse button to select it, or type/paste a relative (e.g. ‘my converted files’) or absolute (e.g. ‘D:/My converted files’) folder path into the text box for this option.

Image Folder options

Place images in the same folder as the converted files – this default option saves images alongside the converted output file, with a separate image folder for each converted file. The folder name and the image file names are based on the converted output file name.

Place images in their own folder in the same location as the converted files – set your own relative image folder name (e.g. ‘images’). Please note that this one folder is used for the images of all converted documents.

Place images in this specific folder – set your own absolute image folder path (e.g. ‘D:/images’). Please note that this one folder is used for the images of all converted documents.

Image Options

Image options – if there are images in your Word file, you can select the image format to convert them to. The options are: auto (default), jpg, png, gif, bmp or wmf. Auto means the conversion engine decides which format to use when saving each image: jpg for photos and png for other images (charts, transparent images etc.). There is also an option to control the jpg compression level: the higher it is, the better the image quality, but the larger the file. Generally, use jpg for photos and png for clipart, graphics etc.

Highest image quality – this option generally increases the quality of images. By default the resolution is set to 96 dpi, but you can change this in the drop-down selector. 96 dpi is the recommended resolution for web images; any higher and your images will be larger and slower to load. The high quality option slows conversion down slightly, so if you are converting lots of files and image quality is not important to you, you can experiment with turning it off.
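To see why a higher DPI makes images larger, here is the arithmetic as a short Python sketch. It assumes a picture's size in the document is measured in inches and rendered linearly at the chosen DPI; how Word Cleaner renders internally is an assumption.

```python
def rendered_size(width_in, height_in, dpi=96):
    """Return (width_px, height_px) for a picture of the given size in inches at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (96, 150, 300):
    w, h = rendered_size(4, 3, dpi)
    print(f"{dpi} dpi: {w} x {h} px = {w * h:,} pixels")
```

For a 4 × 3 inch picture this prints 384 × 288 px at 96 dpi and 1200 × 900 px at 300 dpi, nearly ten times as many pixels to store and download.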

Getting small images?  Try turning off the highest image quality option.

Keep <img> tag height/width attributes – this option makes Word Cleaner keep the height and width image attributes in the HTML. Unticking it removes those attributes from the converted HTML.

Embed images – You can embed images directly into your HTML file so you do not need separate image files; this is a great way to make files self-contained. Please note this feature is only supported by newer browsers such as Firefox 3 and above, Google Chrome, Safari or Internet Explorer 8 and above.
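Embedded images are normally written as data URIs, where the src attribute holds the base64-encoded image itself. A minimal Python sketch, assuming that is how the embedding works; the function names and the MIME table (which leaves wmf out) are hypothetical:

```python
import base64
import html
import os

# MIME types for the image formats this sketch handles (wmf is left out on purpose).
MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
              ".gif": "image/gif", ".bmp": "image/bmp"}


def image_data_uri(image_bytes, file_name):
    """Return a data: URI holding the base64-encoded image."""
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"unsupported image type: {file_name}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{MIME_TYPES[ext]};base64,{encoded}"


def embedded_img_tag(image_bytes, file_name, alt=""):
    """Return an <img> tag whose src is the image itself rather than a file path."""
    return f'<img src="{image_data_uri(image_bytes, file_name)}" alt="{html.escape(alt)}">'
```

You would read the image in binary mode (open("logo.png", "rb")) and pass its bytes. Note that base64 makes the embedded data about a third larger than the original image file.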

Tip: Consider experimenting with png, as it is now supported by all major browsers e.g. IE5 and greater, Firefox, Opera, Safari etc.

CSS Options


No CSS – If you tick this option, no CSS will be used in the HTML file. Basic formatting such as bold and italic tags will still be kept.

Inline CSS – All CSS is placed in the style attribute of the HTML tags, not in a <style> tag.

Normal CSS – The CSS is placed in a <style> tag in the head section of the HTML file, and sometimes in a tag's style attribute.

Save CSS in a separate file – this will put all the CSS into a separate CSS file and link it to the HTML file.

 

Save CSS in separate custom folder/file name – this will put all the CSS into a separate custom CSS file name that you will specify and link it to the HTML file.

The ‘Delete CSS rules not being used in HTML’ option is useful with Normal CSS and both Save CSS options, to remove any CSS rules that are not used in the output HTML.
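The idea behind deleting unused CSS rules can be sketched in Python: collect the class names used in the HTML, then drop rules for classes nothing uses. This simplified version only removes single-class rules whose selector and opening brace are on one line (such as .MsoTitle { ... } or p.MsoTitle { ... }) and keeps everything else; the function names are hypothetical and Word Cleaner's real rules may differ.

```python
import re

# class="a b", class='a' or unquoted class=MsoNormal (as Word writes it)
CLASS_ATTR = re.compile(r"""class\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""", re.IGNORECASE)
# A rule whose only selector is .name or tag.name, e.g. "p.MsoTitle { ... }"
SIMPLE_CLASS_RULE = re.compile(r"(?m)^[ \t]*(?:[A-Za-z][\w-]*)?\.([\w-]+)[ \t]*\{[^}]*\}[ \t]*\n?")


def used_classes(html):
    """Collect every class name referenced in class attributes."""
    names = set()
    for double, single, bare in CLASS_ATTR.findall(html):
        names.update((double or single or bare).split())
    return names


def drop_unused_class_rules(css, html):
    """Delete simple class rules that no element in the HTML uses; keep everything else."""
    used = used_classes(html)
    return SIMPLE_CLASS_RULE.sub(lambda m: m.group(0) if m.group(1) in used else "", css)
```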

You can add your own CSS files to the converted HTML by entering one file name per line in the text box below:

In the box above, enter only the CSS file name and location, e.g. style.css or /css/style.css. Word Cleaner automatically adds the full CSS link tag to the HTML file. Note: with the Save CSS in a separate file options, the output will already contain a link tag for its CSS file.
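For reference, a standard stylesheet link looks like the tags this Python sketch builds from the box contents (one path per line, blank lines skipped). The exact markup Word Cleaner writes is an assumption, and css_link_tags is a hypothetical name.

```python
import html


def css_link_tags(css_files):
    """Turn CSS paths entered one per line into <link> tags (hypothetical markup)."""
    tags = []
    for line in css_files.splitlines():
        path = line.strip()
        if path:  # skip blank lines
            tags.append(f'<link rel="stylesheet" type="text/css" href="{html.escape(path)}">')
    return "\n".join(tags)
```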

You can add your own CSS rules to the style section or CSS file by entering them in the text field below:

CSS Find/Replace

CSS Customization options allow you to find and replace, rename or delete any CSS rule names in the html document that is being processed.

 

Step 1) Enter CSS rule names in the Find text box manually, or load rule names from a document by clicking the ‘Get CSS Rules From Document’ button. The drop-down list to the left of that button will then be filled with CSS rule names you can select.

Step 2) Enter the new rule name in the Replace with box, or leave it empty to delete the rule. You can also load CSS rules from an external CSS file, or from an HTML file that contains a <style> tag with CSS rules, by clicking the ‘Get CSS Rules From CSS File’ button.

Step 3) Click the ‘Add to Find/Replace List’ button.
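The effect of a Find/Replace list entry can be sketched in Python: the class is renamed (or, with an empty replacement, deleted) both in the CSS and in the class attributes of the HTML. This is an illustration under assumptions with a hypothetical function name; only simple one-line rules are deleted, and unquoted class values are rewritten with quotes.

```python
import re


def rename_css_class(css, html, old, new=""):
    """Rename CSS class `old` to `new` in a stylesheet and in class attributes.

    An empty `new` deletes the class: one-line rules such as `.old {...}` or
    `p.old {...}` are removed and the name is dropped from class attributes.
    """
    name = re.escape(old)
    if new:
        # Rename the class selector wherever it appears, e.g. ".old" or "p.old".
        css = re.sub(rf"\.{name}(?![\w-])", lambda m: "." + new, css)
    else:
        # Remove rules whose selector is just .old or tag.old.
        css = re.sub(rf"(?m)^[ \t]*[\w-]*\.{name}[ \t]*\{{[^}}]*\}}[ \t]*\n?", "", css)

    def fix_attribute(m):
        classes = [new if c == old else c for c in m.group(2).strip('"').split()]
        kept = " ".join(c for c in classes if c)
        return f'{m.group(1)}"{kept}"'

    # Quoted (class="a b") and unquoted (class=MsoNormal) attribute values.
    html = re.sub(r'(class\s*=\s*)("[^"]*"|[^\s>"]+)', fix_attribute, html, flags=re.IGNORECASE)
    return css, html
```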

Page/Tag Splitting

 

Split per tag – This splits the page based on a tag, for example h1. If you select h1, Word Cleaner looks for each h1 tag and places the code from that h1 tag onwards, up to the next h1, into its own file.

By default, split files are named with the file name plus the inner text of the tag. For example, if your file is called catalogue, the files will be called catalogue_introduction.html, catalogue_products.html etc. If you tick the numbered output file names option, the names will be catalogue_1.html, catalogue_2.html etc. This feature sets the Custom page title option to the #PAGESPLITTAGTEXT# constant under the Metadata/Page Title tab; you can add any custom text before or after this constant.
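A minimal Python sketch of splitting by tag, assuming each part runs from one h1 to the next and that the heading text is turned into a lowercase, dash-separated suffix. Text before the first h1 is ignored here; the function name split_by_tag and the exact naming rules are assumptions, not Word Cleaner's implementation.

```python
import re


def split_by_tag(html, base_name, tag="h1", numbered=False):
    """Split HTML into (file_name, content) parts, one part per <tag> section."""
    opening = re.compile(rf"<{tag}\b[^>]*>", re.IGNORECASE)
    starts = [m.start() for m in opening.finditer(html)]
    parts = []
    for index, start in enumerate(starts, 1):
        end = starts[index] if index < len(starts) else len(html)  # up to the next tag
        chunk = html[start:end]
        if numbered:
            suffix = str(index)
        else:
            # Use the heading's inner text, made web friendly, as the suffix.
            inner = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}\s*>", chunk, re.IGNORECASE | re.DOTALL)
            text = re.sub(r"<[^>]+>", "", inner.group(1)) if inner else ""
            suffix = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or str(index)
        parts.append((f"{base_name}_{suffix}.html", chunk))
    return parts
```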

Split per page – This converts each page of your document to its own HTML file. For example, if your document has 4 pages, Word Cleaner will create 4 separate .html files, one for each page.

Create index file – You can create an index file with links to the files created using the split page option. This option is in the Template Overview tab – Converting format section.

 

Encoding

Load and Save file encoding – these options allow you to set the input and output file character encoding. If you don’t need to change encoding just leave these options set to the AUTO value.

‘Save files with UTF8 BOM marker’ – If you tick this option, Word Cleaner saves all files with the UTF-8 byte order mark (BOM), a special sequence of bytes at the beginning of a text file that is used to detect UTF-8 encoding.
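The UTF-8 BOM is the three bytes EF BB BF. Python's utf-8-sig codec, used here only to illustrate what the option adds to a file, writes those bytes when encoding and strips them when decoding:

```python
import codecs

text = "<p>Héllo</p>"
with_bom = text.encode("utf-8-sig")      # UTF-8 bytes preceded by the BOM (EF BB BF)
without_bom = text.encode("utf-8")       # plain UTF-8, no marker

print(with_bom[:3] == codecs.BOM_UTF8)          # True
print(len(with_bom) - len(without_bom))         # 3
print(with_bom.decode("utf-8-sig") == text)     # True: the BOM is stripped on reading
```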

‘Notify me if AUTO encoding cannot determine encoding from html file’ – If you tick this option, Word Cleaner notifies you when the input file encoding cannot be determined.

Convert special chars to HTML Entities – these options convert non-ASCII and special characters in the output file to their equivalent HTML entities. For example, the copyright symbol © is &#169; as a numbered entity and &copy; as a named entity. If you do not know which option to select, just select do not convert.
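A short Python sketch of the two entity styles, using the standard library's table of named HTML entities. It only converts non-ASCII characters (characters such as < and & are normally handled by escaping, e.g. html.escape); the function name to_entities is hypothetical.

```python
from html.entities import codepoint2name


def to_entities(text, named=False):
    """Replace non-ASCII characters with numbered (&#169;) or named (&copy;) entities."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif named and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")  # numbered entity, also the fallback when no name exists
    return "".join(out)
```

For numbered entities alone, Python's built-in "©".encode("ascii", "xmlcharrefreplace") gives the same result, b"&#169;".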

By default, all Word Cleaner template files are encoded as UTF-8, and most English web pages are encoded as UTF-8 too. If you are cleaning existing HTML files that use a different encoding, e.g. ANSI, you will need to change the template encoding setting to match the encoding of the files you are cleaning (the encoding of the opened template is changed automatically to the selected Load Encoding).

Please do not modify Load Encoding in the Advanced Editor, because the template code will not be converted to the new encoding; use the Load Encoding option in the Easy Editor's Encoding section instead.

Delete Tags/Attributes

Remove HTML Tags – this deletes every occurrence of the tag in the file but keeps the content that was inside it. To delete the content as well, tick the remove tag with content option.

Remove all tag attributes – If you add the p tag, all attributes of every p tag are removed. For example, <p class="aboutustxt"> becomes <p>.

Remove attributes globally – If you add the style attribute, every style attribute is deleted from all tags in the file.

Remove empty tags – This option removes empty tags. For example, if you enter span, all empty span tags are deleted. Ticking the remove all empty tags option deletes every empty tag in the file without you having to specify the tags.
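The four delete options can be sketched with regular expressions in Python. These are simplified, hypothetical helpers for illustration: they work on the raw HTML text, do not handle nested tags of the same name when deleting content, and treat a tag containing only whitespace as empty, which may differ from Word Cleaner's own rules.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL


def remove_tag(html, tag, with_content=False):
    """Delete <tag ...> and </tag>; keep the inner content unless with_content is True."""
    if with_content:
        return re.sub(rf"<{tag}\b[^>]*>.*?</{tag}\s*>", "", html, flags=FLAGS)
    return re.sub(rf"</?{tag}\b[^>]*>", "", html, flags=FLAGS)


def remove_all_attributes(html, tag):
    """<p class="x" id="y"> becomes <p>; \\b keeps tags such as <pre> untouched."""
    return re.sub(rf"<{tag}\b[^>]*>", f"<{tag}>", html, flags=FLAGS)


def remove_attribute_globally(html, attribute):
    """Delete attribute="...", attribute='...' or an unquoted value from every tag."""
    return re.sub(rf"""\s+{attribute}\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", "", html, flags=FLAGS)


def remove_empty_tags(html, tag):
    """Delete <tag></tag> pairs that contain nothing but whitespace."""
    pattern = rf"<({tag})\b[^>]*>\s*</\1\s*>"
    previous = None
    while previous != html:  # repeat so nested empty tags are removed completely
        previous, html = html, re.sub(pattern, "", html, flags=FLAGS)
    return html
```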

Find and Replace/Delete

These options allow you to find and replace or delete any text or regular expression. For instance, to find all <p…> tags with any attributes in an HTML file and replace them with <p>, you need a regular expression such as <p\b[^<>]*> (the \b stops it from also matching tags like <pre>). Regular expressions are a very powerful tool for find and replace in any text file.
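A pattern such as <p\b[^<>]*> matches a p tag together with any attributes, and the \b stops it from also matching tags like <pre>. The Python snippet below tries it out; Python's re module uses the same syntax for this pattern as .NET, but it is only a stand-in for testing, not Word Cleaner itself.

```python
import re

html_source = '<p class="MsoNormal" style="margin:0">One</p><pre>code</pre><p>Two</p>'

# Replace every opening p tag (with or without attributes) by a bare <p>.
cleaned = re.sub(r"<p\b[^<>]*>", "<p>", html_source)
print(cleaned)  # <p>One</p><pre>code</pre><p>Two</p>
```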

Regular Expressions Support

We support .NET Regular Expressions (RegEx) in commands; see this page for more information.

 

Metadata/Page Title

Metadata options – allow you to add or edit the metadata information in the head section of the HTML file. Please note that if your document does not have any metadata, we cannot add it automatically. For example, if you tick the author option and enter the text ‘Mark Smith’, it will be inserted into the HTML as <meta name="author" content="Mark Smith" />.

Page title options – If you use this option, Word Cleaner changes the <title>…</title> tag in the HTML. For example, if you enter about us, the title becomes <title>about us</title>. You can enter a custom title, or have Word Cleaner use the file name as the title text. For example, if your file is called ‘my file.doc’, the title tag will be <title>my file</title>.

Please note that metadata variables can be used in the Metadata section, in Find and Replace/Delete and in the Advanced Editor. The following variable names can be used as parameters:

#TITLE#, #AUTHOR#, #SUBJECT#, #KEYWORDS#, #CATEGORY#, #COMMENTS#, #COMPANY#, #CREATIONDATE#, #LASTSAVETIME#, #MANAGER#, #PAGES#, #REVISIONNUMBER#

Example command:

add_html_after_opening_tag('HEAD','<meta name="title" content="#TITLE#">\r\n');
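Conceptually, the variables work like placeholders that are swapped for the document's metadata. A minimal Python sketch under that assumption; the function name expand_variables and the metadata dictionary are hypothetical, and HTML-escaping the values is a choice made here, not documented Word Cleaner behaviour.

```python
import html
import re


def expand_variables(text, metadata):
    """Replace #NAME# placeholders with document metadata values.

    Unknown names are left untouched; values are HTML-escaped so they are safe
    inside attribute values such as content="...".
    """
    def lookup(m):
        value = metadata.get(m.group(1))
        return m.group(0) if value is None else html.escape(str(value))

    return re.sub(r"#([A-Z]+)#", lookup, text)
```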

E-book Options

Word Cleaner can create an ebook in the EPUB format from a converted HTML file. Just enable the ‘Create ebook in Epub format’ option and, optionally, specify a cover image.

Cover image: this should be a jpg, gif or png file. The optimal size for Kindle is 600 pixels wide by 800 pixels tall. You need to resize the image before selecting it.

Table of contents: you need to create the table of contents in your Word file before you convert your file. Please refer to the Word help file on how to do this. We will use this table of contents to generate the table of contents for the ebook.

CSV Export Options

Word Cleaner can export converted files to CSV format, which you can then import into a database or Excel. For instance, you can configure a template to embed images in the HTML, split the document by h1 tag and create a CSV file in the WordPress posts table format. You can then import that CSV file with a WordPress CSV import plugin.
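A minimal Python sketch of writing split pages to a CSV file with the standard csv module. The column names post_title, post_content and post_status come from the WordPress posts table, but which columns a particular CSV import plugin expects is an assumption to check in its documentation; pages_to_csv is a hypothetical name.

```python
import csv
import io


def pages_to_csv(pages):
    """Write (title, html) pairs as CSV rows in a WordPress-posts-like layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields containing commas, quotes or newlines
    writer.writerow(["post_title", "post_content", "post_status"])
    for title, content in pages:
        writer.writerow([title, content, "draft"])
    return buffer.getvalue()
```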

Please note that this feature is only available in the Word Cleaner Business version.

Custom C# Code

Custom C# code can additionally process the converted content in any way you want. It’s a very powerful feature for users with C# and .NET programming knowledge. For instance, you can use C# code if your document contains special tags that you would like to replace dynamically with your own text or values (e.g. the current date/time, counter values or other generated text).

Please note that this feature is only available in the Word Cleaner Business version.

Replace Header/Footer

Specify your own HTML header (from the start of the HTML file to the <body> tag) and footer (from the </body> tag to the end of the file) sections.
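A Python sketch of what replacing the header and footer amounts to, assuming the header runs up to and including the <body> tag and the footer starts at the last </body> tag (the source does not say whether the tags themselves belong to the header and footer). replace_header_footer is a hypothetical name.

```python
import re


def replace_header_footer(html, new_header, new_footer):
    """Replace everything up to and including <body ...> and from the last </body> on."""
    start = re.search(r"<body\b[^>]*>", html, re.IGNORECASE)
    ends = list(re.finditer(r"</body\s*>", html, re.IGNORECASE))
    if start is None or not ends:
        return html  # no body tags found: leave the document unchanged
    return new_header + html[start.end():ends[-1].start()] + new_footer
```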

Please note that this feature is only available in the Word Cleaner Business version.

HTML Before/After Tags

Add any HTML before and after tags in converted files. For instance, you can add custom metadata, style links or script links in the <head> tag, or add JavaScript code just before the closing </body> tag.
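The idea can be sketched in Python with two hypothetical helpers: one inserts HTML right after the first opening tag (similar in spirit to the add_html_after_opening_tag template command, but not its implementation), the other right before the last closing tag.

```python
import re


def insert_after_opening_tag(html, tag, snippet):
    """Insert snippet right after the first <tag ...>; \\b keeps <header> from matching <head>."""
    return re.sub(rf"(<{tag}\b[^>]*>)", lambda m: m.group(1) + snippet, html,
                  count=1, flags=re.IGNORECASE)


def insert_before_closing_tag(html, tag, snippet):
    """Insert snippet right before the last </tag>."""
    matches = list(re.finditer(rf"</{tag}\s*>", html, re.IGNORECASE))
    if not matches:
        return html
    pos = matches[-1].start()
    return html[:pos] + snippet + html[pos:]
```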

Please note that this feature is only available in the Word Cleaner Business version.

 

Advanced Template editor screen

Need help? We can create your custom templates for you. Just tell us what you need it to do and we can code it. Basic advice is free, for more complex work there may be a small fee. Contact us for details.

For advanced users we have the advanced template editor. You access this by clicking the tab at the top of the template editor screen.

The first thing to be aware of is that creating templates is not as complicated as it first looks. We have created a custom template system that should cover the needs of most users.

At a basic level, think of the template system as a find and replace system: you look for specific code, then edit or delete it.

We support .NET Regular Expressions (RegEx) in commands, see this page for more information.

The screen grab below shows the template editor screen, which has several main parts:

Command area: this is where you create the command you want to add to the template.

  1. The first step is to select a command from the drop-down list. For example, say we want to delete the span tag from a document: we select the delete_tag command.
  2. The second step is to add the parameters. For example, we enter the span tag. You can enter the tag manually or select it from the drop-down tag/attribute list.
  3. The third step is to add the command to the template by clicking the Add Command button.

Edit/delete a command: to edit or delete a command, first click it in the editor area, then click the Edit or Delete Command button.

Template preview area: you can test your template by selecting a test file in the Template Preview tab. This lets you easily tweak and test your template until you are happy with it.

Editor area: here you can directly edit the template commands. Pressing Ctrl+Space shows a list of commands.

In the screen grab example below there is a sample command: delete_tag('span'); – this will delete the span tag from your file.