Changes to Zope Developers Guide, Chapter 2, Object Publishing

* Add the following before the section 'HTTP Responses' under 'Stringifying the published object'

Character Encodings for Responses

If the published method returns an object of type 'string', a plain 8-bit character string, the publisher will use it directly as the body of the response.

Things are different if the published method returns a unicode string, because the publisher has to apply some character encoding. The published method can choose which character encoding it uses by setting a 'Content-Type' response header which includes a 'charset' property (setting response headers is explained later in this chapter). A common choice of character encoding is UTF-8. To cause the publisher to send unicode results as UTF-8 you need to set a 'Content-Type' header with the value 'text/html; charset=UTF-8'.

If the 'Content-Type' header does not include a charset property (or if this header has not been set by the published method) then the publisher will choose a default character encoding. Today this default is ISO-8859-1 (also known as Latin-1) for compatibility with old versions of Zope which did not include Unicode support. At some time in the future this default is likely to change to UTF-8.

* Inside the section 'Argument Conversion' is a list of type conversion marshalling tags. Insert the following definition of 'ustring' under 'string'

ustring
  Converts a variable to a Python unicode string.

* and insert this definition at the bottom of the list

ulines, utokens, utext
  Like lines, tokens, text, but using unicode strings instead of plain strings.

* Insert this section before 'Method Arguments'

Character Encodings for Arguments

The publisher needs to know what character encoding was used by the browser to encode form fields into the request.
That depends on whether the form was submitted using GET or POST (which the publisher can work out for itself) and on the character encoding used by the page which contained the form (for which the publisher needs your help).

In some cases you need to add a specification of the character encoding to each field's type converter. The full details of how this works are explained below, but most users do not need to deal with the full details:

1. If your pages all use the UTF-8 character encoding (or at least all the pages that contain forms) the browsers will always use UTF-8 for arguments. You need to add ':utf8' to every argument type converter. For example, a text field named 'name' that should arrive as a unicode string would be named 'name:utf8:ustring'.

2. If your pages all use a character encoding which has ASCII as a subset (such as Latin-1, UTF-8, etc.) then you do not need to specify any character encoding for boolean, int, long, float, and date types. You can also omit the character encoding type converter from string, tokens, lines, and text types if you only need to handle ASCII characters in that form field.

Character Encodings for Arguments; The Full Story

If you are not in one of those two easy categories, you first need to determine which character encoding will be used by the browser to encode the arguments in submitted forms.

1. Forms submitted using GET, or using POST with "application/x-www-form-urlencoded" (the default)

   1. Page uses an encoding of Unicode: Forms are submitted using UTF-8, as required by RFC 2718 2.2.5

   2. Page uses another regional 8-bit encoding: Forms are often submitted using the same encoding as the page. If you choose to use such an encoding then you should also verify how browsers behave.

2. Forms submitted using "multipart/form-data": According to HTML 4.01 (section 17.13.4) browsers should state which character encoding they are using for each field in a Content-Type header, however this is poorly supported. The current crop of browsers appears to use the same encoding as the page containing the form.
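
To illustrate why the publisher cannot guess the encoding from the request alone, here is a minimal sketch that percent-encodes the same form value the way a browser might for a UTF-8 page and for a Latin-1 page. It uses only the modern Python 3 standard library, not Zope itself, and the sample value is an arbitrary choice for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical form value containing one non-ASCII character.
value = "caf\u00e9"

# The same text produces different bytes on the wire depending on
# the character encoding the browser applies before percent-encoding.
as_utf8 = quote(value, encoding="utf-8")
as_latin1 = quote(value, encoding="latin-1")
print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9

# The receiving side sees only bytes. Decoding them with the wrong
# encoding silently yields the wrong text instead of an error.
raw = unquote_to_bytes(as_utf8)
right = raw.decode("utf-8")
wrong = raw.decode("latin-1")
print(right == value)  # True
print(wrong == value)  # False
```

Both requests are well formed, so only knowledge of the page's encoding tells the receiver which decoding is correct.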
Every field needs that character encoding name appended to its converter. The tag parser insists that tags must only use alphanumeric characters or an underscore, so you might need to use a short form of the encoding name from the Python 'encodings' library package (such as utf8 rather than UTF-8).
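
One way to find a usable short form is to ask Python's codec registry whether a tag-safe spelling resolves to the same codec as the official name. The sketch below is written for modern Python 3 and assumes the rule stated above (alphanumeric characters or underscore only). The helper name 'short_encoding_tag' is hypothetical and is not part of Zope.

```python
import codecs
import re

def short_encoding_tag(name):
    """Return a spelling of `name` that uses only letters, digits and
    underscores, after checking that Python resolves it to the same
    codec as the original name. Hypothetical helper, not a Zope API."""
    candidate = re.sub(r"[^A-Za-z0-9_]", "_", name).lower()
    # codecs.lookup raises LookupError if either spelling is unknown.
    if codecs.lookup(candidate).name != codecs.lookup(name).name:
        raise ValueError("no tag-safe alias found for %r" % name)
    return candidate

print(short_encoding_tag("UTF-8"))       # utf_8
print(short_encoding_tag("ISO-8859-1"))  # iso_8859_1

# The short form mentioned in the text names the same codec too.
print(codecs.lookup("utf8").name == codecs.lookup("UTF-8").name)  # True
```

Any alias the registry accepts will do, so 'utf8' and 'utf_8' are equally valid choices for a UTF-8 page.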