-  中文XML论坛 (Chinese XML Forum) - professional XML technology discussion board  (http://bbs.xml.org.cn/index.asp)
--  『 WORD to XML, HTML to XML 』  (http://bbs.xml.org.cn/list.asp?boardid=13)
----  CambridgeDocs - a Word to XML tool  (http://bbs.xml.org.cn/dispbbs.asp?boardid=13&rootid=&id=14748)


--  Author: admin
--  Posted: 2/24/2005 12:13:00 AM

--  CambridgeDocs - a Word to XML tool
CambridgeDocs Technology Overview:
Driver for Microsoft Word

Paragraph Content (text)
Styles Information (style of each paragraph and of the text runs within paragraphs)
Formatting Information (font, font-color, font-size) of paragraphs and of text runs within paragraphs, wherever it deviates from the "Style" setting
Paragraph Format Information (leftindent, rightindent, spacebefore, spaceafter)
Frames - text frames can be extracted as a block-level <FRAME> tag that wraps the frame's contents.  The location of the frame on the page (x,y coordinates) can be extracted (if Pagination = true).
Images - bitmap images within the Word document can be extracted.  The location of the image on the page (x,y coordinates) can be extracted (if Pagination = true).
Superscripts - in-line superscripted text is extracted and marked, including references to footnotes.
WordArt - extracted as WMF files
Lists - numbered lists and bulleted lists are identified
Page Breaks - hard page breaks are inserted as block-level items
Word Fields - either the field text alone is extracted, or each field becomes an in-line <FIELD> tag carrying the field code together with the content of the field.
Tables - table information is extracted, including background color, column widths, row height, colspan, rowspan, and table borders (at the level of each cell), including border color.
Pagination - pagination can be set to true, in which case the entire document is divided into <PAGE> tags.
Footnotes and Endnotes can be extracted (they all become endnotes in the XML version of the document and are automatically renumbered)
Page Headers and Footers can be extracted, as <HEADER> and <FOOTER> elements.
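
To make the list above more concrete, here is a rough sketch of how paginated output of this kind might be read back with Python's standard library. The overview names only the <PAGE>, <FRAME>, <FIELD>, <HEADER> and <FOOTER> tags; the <DOCUMENT>, <P> and <IMAGE> elements, every attribute name, and the sample values are assumptions made up for illustration and are not taken from the CambridgeDocs documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical paginated output. Only PAGE, FRAME, FIELD, HEADER and FOOTER
# are named in the overview; DOCUMENT, P, IMAGE and all attributes are assumed.
SAMPLE = """\
<DOCUMENT>
  <PAGE number="1">
    <HEADER>Annual Report</HEADER>
    <P style="Heading 1">Introduction</P>
    <FRAME x="72" y="144"><P style="Normal">Sidebar text</P></FRAME>
    <IMAGE src="chart.bmp" x="300" y="400"/>
    <FOOTER>Page <FIELD code="PAGE">1</FIELD></FOOTER>
  </PAGE>
  <PAGE number="2">
    <P style="Normal">Body text continues.</P>
  </PAGE>
</DOCUMENT>
"""


def positioned_items(xml_text):
    """Return (page, tag, x, y) for every element that carries coordinates."""
    root = ET.fromstring(xml_text)
    items = []
    for page in root.iter("PAGE"):
        # page.iter() walks the page and all of its descendants in document order
        for elem in page.iter():
            if "x" in elem.attrib and "y" in elem.attrib:
                items.append(
                    (page.get("number"), elem.tag,
                     int(elem.get("x")), int(elem.get("y")))
                )
    return items


if __name__ == "__main__":
    for item in positioned_items(SAMPLE):
        print(item)
```

With pagination switched off there would be no <PAGE> wrapper and presumably no coordinates, so a reader like this one would simply return an empty list.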

The Microsoft Word driver built by CambridgeDocs was meant to extract as much information as possible from a Microsoft Word (.doc, .rtf, or other) file into XML.  This includes the content, the formatting and stylistic information, layout information, and graphics information. We refer to this as "non-lossy", because many of our customers want to use XML for multi-channel publishing, which means that after the conversion to XML, they may want to reconvert to HTML, to PDF, etc.

Depending on your needs, you can set options on or off for specific bits of information.  Our XML conversion also includes a pagination option, which preserves the pagination of the original document (especially useful for pages which have text frames and images positioned exactly on the page).

  
Word Driver FAQs

What XML format can I convert my Word documents into?

The driver initially converts into ppXML, our "intermediate format".  You can then convert into any further XML schema you like, including DocBook, LegalXML, or your own custom DTD/schema, either with an XSLT stylesheet or with the extraction and transformation rules of the xDoc Converter, our flagship product.
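
As an illustration of that second step, the sketch below maps styled paragraphs onto DocBook's article, section, title and para elements. It is written in plain Python with the standard library instead of XSLT, and it does not use the xDoc Converter. The input element and attribute names (<DOCUMENT>, <P>, style) and the "Heading 1" style name are assumptions, since the real ppXML vocabulary is not shown in this post.

```python
import xml.etree.ElementTree as ET


def to_docbook(pp_root):
    """Map styled paragraphs onto DocBook <section>, <title> and <para>."""
    article = ET.Element("article")
    current = article  # paragraphs before the first heading go under <article>
    for p in pp_root.iter("P"):
        text = "".join(p.itertext()).strip()
        if p.get("style") == "Heading 1":
            # Each top-level heading opens a new section
            current = ET.SubElement(article, "section")
            ET.SubElement(current, "title").text = text
        else:
            ET.SubElement(current, "para").text = text
    return article


# Hypothetical intermediate document; element and attribute names are assumed.
SAMPLE = (
    '<DOCUMENT>'
    '<P style="Heading 1">Scope</P>'
    '<P style="Normal">This section sets out the scope.</P>'
    '<P style="Heading 1">Terms</P>'
    '<P style="Normal">Defined terms follow.</P>'
    '</DOCUMENT>'
)

if __name__ == "__main__":
    article = to_docbook(ET.fromstring(SAMPLE))
    print(ET.tostring(article, encoding="unicode"))
```

The point of the sketch is that a structural target such as DocBook depends mostly on the style names preserved by the extraction, which is why the driver keeps them.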

What format can I render it into?

We provide an XSLT that can be used to convert it further - into XHTML so that it can be viewed in a browser.   You can see this in action by going to the "View as HTML" tab of the RUN/DEBUG window in the xDoc Converter, or by applying the XSLT in the XMLSpy plug-in.
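
The supplied XSLT itself is not reproduced here. Purely as an illustration of the kind of mapping such a rendering step performs, the Python sketch below turns the paragraph format information listed earlier (leftindent, rightindent, spacebefore, spaceafter) into inline CSS on an XHTML <p> element. That these values appear as attributes on a <P> element, and that they are measured in points, are both assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from paragraph-format attributes to CSS properties.
CSS_FOR = {
    "leftindent": "margin-left",
    "rightindent": "margin-right",
    "spacebefore": "margin-top",
    "spaceafter": "margin-bottom",
}


def paragraph_to_xhtml(p):
    """Render one paragraph element as an XHTML <p> with inline CSS."""
    declarations = [
        f"{css}: {p.get(attr)}pt"  # unit assumed to be points
        for attr, css in CSS_FOR.items()
        if p.get(attr) is not None
    ]
    text = escape("".join(p.itertext()))
    style = f' style="{"; ".join(declarations)}"' if declarations else ""
    return f"<p{style}>{text}</p>"


if __name__ == "__main__":
    para = ET.fromstring('<P leftindent="36" spaceafter="6">A &amp; B</P>')
    print(paragraph_to_xhtml(para))
```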

We also provide an XSLT that can transform ppXML into XSL-FO, which can be used to create PDF files, RTF files, etc.

Can I do a two-way conversion back into Word?

Yes, you can do a two-way conversion: from Word into XML, and then from XML back into Word using our XSL-FO and RTF rendering capabilities.  The xDoc Submit plug-in for Word will have this functionality built into it.  However, because of some limitations of XSL-FO rendering engines, you may not be able to convert some of the more advanced features of the Word driver both ways.



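--  Author: zhangshying
--  Posted: 4/25/2005 4:50:00 PM

--  
Is there a tool that automatically converts TXT files to Word files?

--  Author: cxh0926
--  Posted: 5/5/2005 10:18:00 PM

--  
Bump. I hope an expert can help with this; I was about to ask the same thing!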