I have code that takes a Word document’s Unicode content and puts it (as raw, reformatted HTML) onto a web page hosted by a Unix box that speaks UTF-8.
(For many reasons, Word’s own HTML output is not the correct answer.)
Anyway, the document contains Unicode characters such as smart quotes, macron characters and user-defined bullets. My ideal solution is an automated conversion from Word’s Unicode text to UTF-8 that I can drive from VBA. My fallback is to write code that detects any character that is going to give me grief. This set will be small because we are really only dual-language, and I already detect all the Māori macron characters.
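To make the two options concrete, here is a minimal sketch of what I have in mind, not tested against my real documents. It assumes late-bound ADODB.Stream is available on the Windows machine running Word; the names SaveAsUtf8 and ListNonAscii are made up for illustration. As far as I know, ADODB.Stream writes a UTF-8 byte-order mark at the start of the file, which the web server may or may not want.

```vba
' Hypothetical helper: write a VBA (UTF-16) string to disk as UTF-8.
' Assumes ADODB.Stream is installed; late binding avoids a project reference.
Public Sub SaveAsUtf8(ByVal content As String, ByVal filePath As String)
    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")
    With stream
        .Type = 2                   ' 2 = adTypeText
        .Charset = "utf-8"          ' encode the text as UTF-8 on save
        .Open
        .WriteText content
        .SaveToFile filePath, 2     ' 2 = adSaveCreateOverWrite
        .Close
    End With
End Sub

' Hypothetical helper: report every character outside 7-bit ASCII,
' one per line, as "position: U+XXXX".
' Characters beyond U+FFFF show up as two surrogate code units.
Public Function ListNonAscii(ByVal source As String) As String
    Dim i As Long
    Dim code As Long
    Dim report As String

    For i = 1 To Len(source)
        code = AscW(Mid$(source, i, 1))
        If code < 0 Then code = code + 65536    ' AscW returns a signed 16-bit value
        If code > 127 Then
            report = report & i & ": U+" & Right$("0000" & Hex$(code), 4) & vbCrLf
        End If
    Next i

    ListNonAscii = report
End Function
```

Something like Debug.Print ListNonAscii(ActiveDocument.Range.Text) would then list the characters I still need to deal with.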
For instance, my current method of handling ‘known’ special characters such as the em-dash is to change them to their equivalent HTML entity (&mdash; for an em-dash).
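A generalised version of that approach might look like the sketch below: instead of a lookup of known characters, every character above ASCII is replaced with a numeric character reference. This is an assumption about how it could be done rather than my current code, and ToNumericEntities is a made-up name.

```vba
' Hypothetical helper: replace every non-ASCII character with an HTML
' numeric character reference (for example, an em-dash becomes &#8212;).
Public Function ToNumericEntities(ByVal source As String) As String
    Dim i As Long
    Dim code As Long
    Dim low As Long
    Dim result As String

    i = 1
    Do While i <= Len(source)
        code = AscW(Mid$(source, i, 1))
        If code < 0 Then code = code + 65536    ' AscW returns a signed 16-bit value

        ' Combine a UTF-16 surrogate pair into a single code point.
        If code >= &HD800& And code <= &HDBFF& And i < Len(source) Then
            low = AscW(Mid$(source, i + 1, 1))
            If low < 0 Then low = low + 65536
            If low >= &HDC00& And low <= &HDFFF& Then
                code = &H10000& + (code - &HD800&) * &H400& + (low - &HDC00&)
                i = i + 1                       ' skip the low surrogate
            End If
        End If

        If code > 127 Then
            result = result & "&#" & CStr(code) & ";"
        Else
            result = result & ChrW$(code)
        End If
        i = i + 1
    Loop

    ToNumericEntities = result
End Function
```

The output is pure ASCII, so it should survive whatever encoding the Unix box assumes, at the cost of less readable HTML source.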