Number: DocSII N12
Date: 2002-09-18
Editor: Keisuke Kamimura
Minutes - First Workshop for Asian Document Style Information Interchange
=========================================================================

*************************************************************************
17 September, 2002
Morning session

Opening
-------

Began at 10.00am.

Yawara Tomiyama's greetings
---------------------------

Thank you for joining the workshop. It is intended to collect requirements from Asian countries.

Address by Executive Director, Akio Kanaya (read by Tatsumi Tanaka)
-------------------------------------------------------------------

It is a pleasure to host the DocSII workshop. Before going into the workshop, let me explain the background of the project. CICC has been involved in creating a multilingual IT environment. The first step of this was input and output, and character codes. This step is almost final. The theme for 2002 is Document Style Information Interchange. The uniqueness of document style in Asian languages is to be focused on. To bridge the divide that Asian countries face, multilingual systems are necessary, and putting requirements from Asian countries into the ISO/IEC processes is quite worthwhile. It is expected that the outcomes of this workshop will be reflected in a TR in SC34. I hope that this workshop will help IT development in Asia.

Introduction of participants
----------------------------

* Yushi Komachi. Chair, DocSII. Convenor WG2, which is
* Jasmine Jang (Korea). KIC. Ecommerce, EDI, XML.... SC34/JTC1, ISO TC154
* Virach Sornletlamvanich (Thailand)
* Tatsuo Kobayashi (Japan)
* Phonpasit Phissamay (Lao PDR)
* Lin Chao (Singapore)
* Wo, Zhigang (China) ISO JTC1/SC32. EDI, XML, documentation standards.
* Keisuke Kamimura
* Takayuki Sato

----------------------------------------------------------------------

Chairman's address
------------------

See the preliminary web page. Let us share the background of DocSII. You can access the web page later at www.y-adagio.com/public/committees/docsii/index.htm

Interchange of documents on web pages, which could be multilingual.
Documents are structured in a document structure language, such as SGML or XML. To read a structured document, you need a style and layout to render the original document. Existing style/layout languages include HTML and CSS, but these are limited in function: they cannot readily express, e.g., dimension splitters and nested itemisation in legislative documents. Some of these layouts and styles can be supported with existing HTML tables in an "acrobatic" way. We need a more sophisticated and interchangeable "style specification" to express these features of documents. To address these issues, knowledge of style and layout based on cultural background is required. DocSII intends to solve the problem with the support of document style/layout experts in Asian countries.

Scope of DocSII. Three-step process.

1. Collect document style/layout information. Extract formatting objects.
2. Create a style language library.
3. Submit the outcome for international approval.

ISO/IEC TR 19758 is a preceding example.

Now open for questions.

Remarks from Takayuki Sato
--------------------------

IT is not the only topic for international cooperation. There are lots of issues which need cultural and local attention. Standard is finished. One possibility for the next step would be to make products based on the standard we worked on. The other alternative is to depart from characters for a new topic at a higher level. We chose the second alternative.

DocSII stands for document style information interchange. The mission carried over from the Katmandu meeting is "awareness" raising. Characters are so visible that people can readily understand where the problem is, while people tend not to understand the importance of document style. The previous approach may not work. We need a different approach from that of solving character issues. We decided on addressing the issues of Han characters first, and then moving on to the issues faced in South East Asia. Each year we would like to move "southwards" and on.
Homework called "action items" will be given. .... Will decide how the Asian Forum will be organised (the number of members, the scope). AFIT feeds back into the organisation of the second workshop of DocSII. If attention at AFIT is low, the workshop next year will be small, reflecting the activeness of AFIT. Please relax and enjoy.

----------------------------------------------------------------------

Country report: Japan (Tatsuo Kobayashi)
----------------------------------------

History of Japanese document style.

1. Renmentai. Ligature of hiragana characters.
2. Ruby. Annotation of pronunciation.
3. Demonstration of remnant functionalities in old-generation word processing software.

Renmentai ligature. See fig. 1, taken from a book ca. 1600. The ligature was realised in hot metal printing, which was advanced and sophisticated. New technology imitates features of older technology. The "renmentai" ligature is an imitation of handwriting. Every technology has its optimal solution.

Ruby annotation. Ruby annotation was introduced to show the reading of Han characters. But in the course of history, ruby came to be used as a way of sideline annotation in general. At the first Unicode meeting in Asia, Jun-ichiro Kida spoke about the tradition of ruby used in Japanese, and it drove Martin Duerst to draft a tag for ruby in HTML. This functionality may be helpful for other languages too.

Remnant functionalities. Double height. Double width. These literally double the height or width of characters. These functionalities are carried over to the latest version of Ichitaro. Why were these introduced? Not very long ago, outlined and scalable fonts were expensive. Instead of changing the size of characters by the point, the dimension was simply doubled. Because of backward compatibility, these functions are retained.

Document style and literary tradition are restricted by the technology available at the time. Technology may add aesthetic value to document style.
New technology sometimes imitates and simulates old technology. It needs a certain time to determine its optimised style and expression.

Q: How did you define the font size back in those days?
A: We did not have the notion of font "size" or "EM" back at the time.

Q: How do you put these functionalities in HTML or XML?
A: Doubled width and height are only kept for the sake of backward compatibility. If we are to put these in HTML or XML, we can add these to layout languages. Once a product is released in the market, the feature persists. The example of the renmentai ligature shows alignment line shifting is required, which is not addressed in the current

Wo, Zhigang (China)
-------------------

* Chinese policy on IT
* Organisation of IT standards
* Standardisation trends of documents based on XML

China's policy on IT is represented by the following principles.

* Timely research on standards.
* Tracing of technical trends.
* Self-determination on drafting standards.
* Timely publication of standards.
* Timely review of the utilisation of standards.

The China IT standard technology committee was founded in 1983 and is managed by the SAC, MII. It is responsible for the whole nation's IT standardisation.

XML in China. China GB/T14814-1993 accords with ISO 8879-1986: SGML. SGML is widely applied in large technology organisations and systems. With the emergence of the Internet, the focus moved on to HTML. But HTML is insufficient to carry the variety of business documents. Now XML has come in. China National IT Standardisation Technical Committee: domestic standardisation of XML in 1998. China's mandatory standards are GB2312, GB13000.1, GB18030. Both the code declaration and lang attributes should be specified. Besides, you sometimes need to specify pinyin in xml:lang.

Standardisation trends of documents based on XML. A couple of word processors whose formats are not interchangeable with each other are on the market. This is a barrier to implementing e-government. To bridge the barrier, XML is the key.

Q: How about XSL?
A: DSSSL is a draft national standard. Not XSL.

Q: How about the character set in XML in China?
A: GB2312 is in use in most cases. It takes time to migrate to Unicode.

Q: To lessen the effect of remnant functionality, migration to Unicode should happen as early as it can.

Q: What is the number of the national standard corresponding to DSSSL?
A: I will give you the details later.

Jasmine Jang, KIEC (Korea)
--------------------------

KIEC promotes e-document, XML/EDI, ebXML, EDI.

Major issues in e-document standards in Korea.

Different approaches to e-documents in Korea: EDMS, intranet groupware, e-Journal and e-Book, EDI and ebXML, and others.

Trends for web based e-document.

* Markup: HTML vs XML
* Service: Internet vs Intranet
* Objectives: e-business document vs more genuine digital contents

XML became the standard language of the government in 1999. Used in government official documents. Used in business documents as well, among 30 eMPs industries. In 2002, KIEC announced the Korean e Document Standardisation Guidelines (version 1.0) based on ebXML.

Metadata is another issue in document standards in Korea. Analysis of collaborative business process. Component based e-business document, which is the new generation of business information interchange. The third aspect is legally bound security.

The flow of the standardisation of e-business documents is as follows.

1. Requirements analysis of business process.
2. Business Process Modelling.
3. Extraction and selection.
4. Development of e-document.
5. Implementation of System.

Component based e-business document. A result of business process modelling. Library (a kind of list of records) for XML. Example of entities.

Legally bound security. Important but difficult to achieve. XACL, PKI, SSL, RCA, BCA are among the topics.

Other issues include web services, Topic Maps, Semantic Web, XML content management for web services, XML Metadata registry. Broken Korean characters, limited support for Korean fonts by major IT vendors.

KIEC launched the Basic Semantic Register (BSR).
Standard mapping table for sharing semantics.

Conclusion. We are at the initiating stage. Issues in flexibility and its control, security, business collaboration, multilingual issues.

Q: What does "broken Korean characters" mean?
A: Characters are shown as "???". Outlook Express. Encoding should be UTF-8.

Q: Give us more info on semantic registration. Is it like an entity definition?
A: Yes. A Basic Semantic Unit is a set of entities. A Basic Semantic Component aggregates BSUs.

Q: How is the e-business document displayed on the screen?
A: I'll give you an example later on.

*************************************************************************
17 September, 2002
Afternoon session

Phonpasit Phissamay (Lao PDR)
-----------------------------

No document style standards for the Lao language.

Brief report on the Lao language and script. 78 characters, composed of consonants, vowels, alternate consonants, tonal marks. Syllable structures. Characters are positioned at four different levels (heights).

Three standards in use for the Lao script: LSWin, IBM-8 1033, and ISO/IEC 10646.

Lao keyboard. More characters than key tops. Needs alternation by the shift key.

No Lao language platform. Fonts missing. No spacing between words in a single sentence. Support for switching to Roman when necessary. Four-level display should be considered on output.

LSWin is employed in more than 90 percent of the cases.

The keyboard layout is taken from the typewriter keyboard. Three varieties of the keyboard layout. Pronunciation based, character based....

VS: What did you use to produce the handout?
PP: LSWin
YM: Is it acceptable aesthetically?
PP: Yes....
YK: Underline is used in the newspaper, and the underline crosses the bottom part of a character.
YK: How about Thai?
VS: When you draw an underline under characters, you leave a blank where the bottom of a character sticks out through the baseline.

Cao Lin (Singapore)
-------------------

Talk about document style in Singapore.
Multilingual environment in Singapore. English, Chinese (second largest), Malay and Tamil.

GB2312 of mainland China is most popularly employed. But there are special characters of Singapore origin. Need additions to GB2312.

Incompatibility of OS platforms sometimes makes documents in the Chinese language uninterchangeable.

Document style issues in Singapore.

1) Mixture of vertical and horizontal writing. Mixture of vertical and horizontal composition is seen in places other than heading and body text. Inline mixture is also found. The majority of text flows horizontally. Horizontal paragraphs look more modern than vertical ones.
2) Numbering. Only Arabic numbers can be used. Need a library type of solution that facilitates convenience.
3) Dimension splitter

Virach Sornletlamvanich (Thailand)
----------------------------------

Document styles according to the six categories.

* External use
* Internal use
* Sealed documents
* Command
* Announcement
* Certificate

Thai digits must be used in official documents. And other rules apply. (See the table in the handout....) The seal of Garuda shows the document is official. The beginning of the date must be aligned at the center of the document.

Among official documents, some are signed, and others are just issued anonymously from a department.

Command. Three types: instructions, regulations and rules, where all the leading items are centered, with the rest of the document varying.

No Garuda signs for the news (informational documents).

Reorganisation of the government is taking place, and it may affect the styles of official documents.

Additional issues. Page connection. Line wrapping. Strike-out letters.

Q: What is the difference between the left aligned and center-centric documents?
Traditionally text is centered. Modern style tends to align text to the left.
Alignment of text in Japanese documents reflects the honorific (?) relationship between the receiver and the sender.
YK: Letters are a good example to show what document style is.
VS: Indentation. A paragraph is designated by indentation.
YK: Hyphenation.
VS: Small column documents may use hyphens to connect split words. It's not a general convention.
VS: When you want to break a word (phrase?) in a sentence, you may want to have a break marker. This raises a problem particularly in HTML. It's not addressed yet.

Round table discussion
----------------------

TK: Demonstration of dates, names, and addresses. Date. Year Month Day, whereas Day Month Year in English. Name. Surname and given, whereas given and surname in English. Address. Prefecture City Street, whereas Street City Prefecture (State) in English.
TK:
JJ: Demonstration of Korean. Postal code comes after the address. Spacing between prefecture and region, region and city, and so on.
VS: Demonstration of addressing in Thai. Small-to-large.
TS: Let us propose specific technical issues. Focus on addressing. Question. Does spacing between words occur because Hangul is illegible without spaces?
CL: Sometimes two spaces are put between the surname and given name. In Singapore, the small-to-large principle affects the convention in Chinese. Some people (better educated in English) use the small-to-large format even in Chinese.
PP: Small-to-large, but no postal code.
YK: 9.9.2002, "2002 nen 9 gatu 9 niti", 11/09/2002. ISO format would be 2002-09-09 (YYYY-MM-DD).
WZ: There are two different types of documentation. The first type is official documents. The other is non-official, free style documents. We need to focus on either of these. Our problem is the incompatibility among office software products in China, which relates to unofficial documents.
YK: Personally I would rather focus on the latter.
WZ: Documentation should be an object, and an object should have a number of attributes, such as horizontal/vertical text flow. When we agree on a common set of attributes, we may be able to standardise it as an international standard.
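The date and name conventions raised in the round table above can be sketched in a few lines of Python. This is only an illustration, not something presented at the workshop: the function names and style labels ("iso", "dotted", "japanese_romanised") are assumptions made for the example. It shows one stored value being rendered in the ISO form, the dotted form and the romanised Japanese form quoted by YK, and a name being rendered surname-first or given-first.

```python
from datetime import date


def format_date(d: date, style: str) -> str:
    """Render one stored date in several written conventions.

    The style labels are hypothetical names chosen for this sketch.
    """
    if style == "iso":                      # YYYY-MM-DD
        return d.isoformat()
    if style == "dotted":                   # day.month.year, no zero padding
        return f"{d.day}.{d.month}.{d.year}"
    if style == "japanese_romanised":       # year, month, day order
        return f"{d.year} nen {d.month} gatu {d.day} niti"
    raise ValueError(f"unknown date style: {style}")


def format_name(surname: str, given: str, surname_first: bool) -> str:
    """Order the two name components; the components themselves do not change."""
    return f"{surname} {given}" if surname_first else f"{given} {surname}"


if __name__ == "__main__":
    d = date(2002, 9, 9)
    for style in ("iso", "dotted", "japanese_romanised"):
        print(style, "->", format_date(d, style))
    print(format_name("SURNAME", "Given", surname_first=True))
    print(format_name("SURNAME", "Given", surname_first=False))
```

The stored components stay the same in every case; only the ordering and separators differ, which is the part a style specification would have to carry.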
CL: Title, addressing is one of the levels which the document has. The order between the surname and given name is another level of issues.
PP: You can keep it as homework. We cannot answer the question officially now.
YK: The final target is formatting. This is just the first meeting.
CL: Not much attention has been paid to layouts.
TK: Orders of addresses and personal names are not the same even in one country. This is just an example to raise our awareness. Through sharing this information,
TS: Our objective is not to find differences, but to find common features. Based on these features we will be able to make standards. Even if we can standardise the machine readable document, we will still have differences in the layout which human 'eye balls' look at. This cannot be covered by a simple XML standardisation.
TS: You define the internal order of addresses and names. But to decide where in the document you put those addresses and names is another (layout) issue.
YK: If you extract components of addressing and naming, you can process those with XSL and XSLT to get a final rendition.
WZ: What we are now discussing are issues of document structures, DTD.
YK: Cannot agree. Metadata is not within our scope.

(Sorry. I have missed quite a lot of discussion again....)

JJ: In Korea, the e-government effort faces problems because of the lack of a common structure.
YK: I am aware of those efforts, but I would like to focus on higher level issues, such as layout.
TK: There is another type of document than unstructured and structured documents. That is handwritten documents. I would like to incorporate the rich cultural tradition in handwriting into the hosts of our document styles.
YK: Unstructured documents are not interchangeable. We may be better off staying with structured documents.
JJ: In letter writing, items such as "sender", "reference", "title", and Addition (attention?) are structured, whereas the body is free style and authored with various word processors, which causes a serious lack of interchangeability.
TK: Why do you need to attach an application dependent document to an otherwise clean XML file? Why not mark up the body itself in XML too, or leave it in plain text?
JJ: For the sake of beauty.
TS: We need to consider how to make the main body richer in expression and layout. Items such as sender, receiver and subject are easier to solve than the body.

*************************************************************************
18 September, 2002
Morning session

Summary by Chair
----------------

1. Scope of DocSII
   * style specifications (e.g. XSL and DSSSL) for structured documents (XML and SGML)
   * composition styles and layouts for all kinds of documents including historical and business documents
2. Research items on styles and layouts of each country
   * List, ordered and unordered
   * Emphasizing mark
   * Heading
   * Country specific formatting objects
3. Mailing list for facilitating discussion

Round (Square?) Table Discussion
--------------------------------

TS:
- Lots of confusion yesterday. Most of the time was spent on the markup underlying the document. We want to discuss style and its cultural dependencies.
- Vertical and horizontal mixture is a good example of a style requirement. If there are matters that really need to be rectified, we want to focus on such issues.
YK:
- Could China give us an example of culturally dependent style? Such as emphasis marks. Sideline dots and commas.
CL:
- An enclosing box tells the editor, "please pass away this phrase (leave as this is?)"
YK:
- What happens when an emphasised portion is separated?
WZ: ....
YK:
- In Japanese, "wari chu" annotation, or multirow inline annotation, is employed in a certain type of text.
- Is emphasis with dots applicable to roman text?
CL:
- We may use little triangles at the bottom (in horizontally flowing text). Not sure about vertical writing.
YK:
- In Japanese, you put emphasis marks on the right hand side of text.
TS:
- From a standard viewpoint, you may not want to put "dot" or "triangle", on the "right hand side" or at the "bottom". Need to be more general.
WZ:
- May be a matter of personal choice.
TS:
- The shape of emphasis marks should be selectable. And the position where the marks are put should also be independent.
TK:
- JIS X 0213 has two emphasis marks, called sesame. A black one and a white one, reserved for nesting emphasis. Nested citing also needs similar attention.
YK:
- A large dot for emphasis means stronger emphasis, and a smaller dot means less.
VS: ....
PP: ....
VS: Thai only uses underline.
YK:
- A delimiter-like mark is found in newspapers.
VS:
- That shows a heading and ....
TK:
- In XML, two ways of making emphasis. One is a tag. The other is quotation marks.
YK:
- In XML, emphasis is to be marked with. And XSL processes how to render it.
WZ:
- (Demonstration of underlines and dots at the bottom of characters.) You can have choices over the attributes of a character.
VS:
- Is it a semantic or font attribute? We need to define what the semantic issues are and what the rendering and style issues are.
TS:
- You need to tell the recipient of the message that the portion is necessary.
YK:
- The distinction between font and layout issues is hard to make.
VS:
- Why can you tell a given symbol shows "emphasis"? You need to know the meaning of the symbol.
TS:
- For the sake of our discussion, we have enough information. The rendering of emphasis should be left for implementation. If we start with "right hand side emphasis" and "lines and dots at the bottom", we will be attached to the specific realisations of emphasis, and we would face trouble in the future.
- Story from my experience. When I talked about emphasis by dots at the bottom, some people suggested that they should have a separate font for emphasis. Is this good?
CL: When converting to an XML document with MS Word, character-bottom symbols disappear.
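The separation TS asks for above, with the shape and the position of an emphasis mark chosen independently of the emphasis itself, can be sketched as a small lookup. This is an illustrative assumption only: the class name, the language and flow keys, and the table entries are hypothetical, loosely based on the examples mentioned in the discussion (sesame marks on the right hand side in Japanese, dots at the bottom in Chinese, underline in Thai), and do not represent any agreed specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmphasisStyle:
    shape: str     # e.g. "dot", "triangle", "sesame", "underline"
    position: str  # e.g. "under", "over", "right", "left"


# Hypothetical style table keyed by (language, text flow).
# Entries are illustrative, taken loosely from the discussion above.
STYLES = {
    ("ja", "vertical"): EmphasisStyle("sesame", "right"),
    ("zh", "horizontal"): EmphasisStyle("dot", "under"),
    ("th", "horizontal"): EmphasisStyle("underline", "under"),
}

DEFAULT_STYLE = EmphasisStyle("underline", "under")


def resolve_emphasis(lang: str, flow: str) -> EmphasisStyle:
    """Pick how to render emphasis; the marked-up text only says 'emphasis'."""
    return STYLES.get((lang, flow), DEFAULT_STYLE)


if __name__ == "__main__":
    for key in (("ja", "vertical"), ("th", "horizontal"), ("lo", "horizontal")):
        print(key, "->", resolve_emphasis(*key))
```

The document would carry only the fact that a span is emphasised; which mark is drawn and where would be decided by the style table, so adding a new convention would mean adding a table entry rather than changing the document.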
TS:
- To express requirements, say you need overline emphasis, and say you need various characters, not one or two, for overline emphasis.
TK:
- Japanese word does not have overline characters. It's culturally dependent.
WZ:
- We need to define features. Let rendering software express and lay out these features.
PP:
- I need to know what the standards define. Then I can come up with examples...?
YK:
- We want layout and style requirements from each country first. Then we will be able to put them in the standard.

....

YK:
- What happens to the combination of italics and underlines? How far does the underline go? As far as the top right width? Or as far as the bottom right width?
TK:
- Japanese Word aligns the end of the underline at the bottom right point. Turned out to be arbitrary. In some cases, the underline ends under the top right end of the last character.
WZ:
- Chinese Word aligns to the top right end of a line.

....

YK:
- Next topic. Headlines. (Confirms the example of each country.)

Summing up
----------

YK:
- From yesterday's discussion, I came up with three points.
- Our scope would be style specification, and composition styles and layouts.
- DocSII research focuses on more common formatting objects, such as listing, emphasis and heading, and country-specific formatting objects. Bullets of a list may take various shapes. Heading of a paragraph is another example.
- Homework on identifying formatting objects. To get an idea of what a formatting object is, see the draft of ISO/IEC TR 19758:2002 on the web.

...Whispering...confirming homework items....

YK:
- Due at the end of November. Report on formatting objects in your country.
- (To the secretariat) Please create a mailing list.
- Please introduce us to appropriate experts in the field.
TS:
- By default, we work together this year.
YK:
- Any additional action items?
TS:
- For today, this is the beginning of the beginning. And most of you are computer experts and you may be happy with what we are now.
But we need expertise in "pre-computer age" tradition.
PP:
- When is it due?
YK:
- End of November, hopefully.
CL:
- When is the next SC34/WG2?
YK:
- November.
- What is the plan for next year?
TS:
- We expect to host a meeting similar to this one next September, although it depends on the budget. If we get the full budget we request, we may be able to host this workshop in Mongolia. If we have a bit of a budget restriction, the second workshop will be hosted in Tokyo, or in one of your countries, sometime in September to November.
- Next time, we would like to find examples in reality. Go out on the street, and find real life examples.
- Tasiro-san will help host a mailing list.
PP:
- When we need consultancy and advice, who should I contact, Komati-san or Sato-san?
TS:
- I would like to share your questions and ideas with other members on the mailing list.
YK:
- What would be the last item?
ME:
- Photograph.

[End of workshop]